InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write
Blagoj Mitrevski† • Arina Rak† • Julian Schnitzler† • Chengkun Li† • Andrii Maksai‡ • Jesse Berent • Claudiu Musat
† First authors (random order) | ‡ Corresponding author: [email protected]
InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink using a Vision Transformer (ViT) image encoder combined with an mT5 encoder-decoder. By combining reading and writing priors in a multi-task training framework, our models handle diverse writing styles and backgrounds without requiring specialized capture hardware. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. This repository provides the Small-p model weights, the dataset, and example inference code.
Key capabilities:
- Offline-to-online handwriting conversion from photos
- Multi-language support with robust background handling
- Word-level and full-page text processing
- Vector-based digital ink output for editing and search
InkSight system architecture (animated version)
- June 2025: Paper accepted to TMLR (Transactions on Machine Learning Research)
- October 2024: Model weights and dataset released on Hugging Face
- October 2024: Featured on Google Research Blog
- February 2024: Interactive demo launched
Try InkSight on Hugging Face Space: Interactive Demo
Explore our example notebook with step-by-step inference examples.
Access our comprehensive dataset: InkSight Dataset on Hugging Face
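Both the expert traces in the dataset and the model outputs are digital ink, i.e. vector strokes (sequences of x/y points) rather than pixels. As a rough illustration of why this representation is editable and searchable, here is a minimal sketch that converts a toy stroke list into an SVG path; the stroke structure shown is illustrative only and is not the dataset's actual schema.

```python
# Minimal sketch: digital ink as vector strokes, rendered to an SVG path.
# The stroke structure below is illustrative, not the dataset's actual schema.
from typing import List, Tuple

Stroke = List[Tuple[float, float]]  # one pen-down trace: a sequence of (x, y) points


def strokes_to_svg(strokes: List[Stroke], width: int = 400, height: int = 200) -> str:
    """Convert a list of strokes into a standalone SVG document."""
    paths = []
    for stroke in strokes:
        if not stroke:
            continue
        # "M x y" moves the pen, "L x y" draws a line to the next point.
        d = f"M {stroke[0][0]:.1f} {stroke[0][1]:.1f} " + " ".join(
            f"L {x:.1f} {y:.1f}" for x, y in stroke[1:]
        )
        paths.append(f'<path d="{d}" fill="none" stroke="black" stroke-width="2"/>')
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(paths)
        + "</svg>"
    )


# Two toy strokes approximating the letter "v".
ink = [[(10.0, 10.0), (20.0, 40.0)], [(20.0, 40.0), (30.0, 10.0)]]
print(strokes_to_svg(ink))
```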
uv is a fast Python package and project manager that provides excellent dependency resolution and virtual environment management.
```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://github.com/google-research/inksight.git
cd inksight
uv sync
```
Alternatively, set up the environment with conda:

```bash
git clone https://github.com/google-research/inksight.git
cd inksight
conda env create -f environment.yml
conda activate inksight
```
Important: Use TensorFlow 2.15.0-2.17.0. Later versions may cause unexpected behavior.
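For example, the supported range can be pinned explicitly (the commands below assume either the uv or conda setup above; if the project files already constrain TensorFlow, syncing alone is enough):

```bash
# With uv: constrain TensorFlow in the project (covers 2.15.x-2.17.x) and verify
uv add "tensorflow>=2.15.0,<2.18"
uv run python -c "import tensorflow as tf; print(tf.__version__)"

# With conda/pip: install into the activated environment and verify
pip install "tensorflow>=2.15.0,<2.18"
python -c "import tensorflow as tf; print(tf.__version__)"
```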
For development or custom inference, run the Gradio playground locally:
```bash
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground
cd Model-Output-Playground
pip install -r requirements.txt
python app.py
```
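Once app.py is running, Gradio typically serves the playground at http://localhost:7860 (the framework's default port), unless the script configures a different host or port.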
- InkSight Dataset - Comprehensive collection of model outputs and expert traces
- Dataset Documentation - Detailed dataset description, format specifications, and usage guidelines
- Small-p model (CPU/GPU) - Optimized for standard inference (see the loading sketch after this list)
- Small-p model (TPU) - TPU-optimized version
- Inference Notebook - Word and page-level inference examples
- Sample Outputs - Visual examples of model results
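As a rough sketch of getting the Small-p weights locally, the snippet below downloads the checkpoint with huggingface_hub and loads it as a TensorFlow SavedModel. The repository id and the assumption that the repository root is a SavedModel directory are mine, not confirmed by this README; see the inference notebook for the exact preprocessing and decoding steps.

```python
# Sketch only: fetch the Small-p checkpoint and load it as a TF SavedModel.
import tensorflow as tf
from huggingface_hub import snapshot_download

# Hypothetical repo id; use the model link above for the actual repository.
local_dir = snapshot_download(repo_id="Derendering/InkSight-Small-p")

# Assumes the downloaded repository root contains a SavedModel (saved_model.pb).
model = tf.saved_model.load(local_dir)

# Inspect the exported serving signatures before calling the model;
# the inference notebook shows how the image and task prompt are prepared.
print(list(model.signatures.keys()))
```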
The inference code demonstrates both word-level and full-page processing. For full pages, word bounding boxes are obtained with open-source OCR engines (docTR or Tesseract) as alternatives to commercial OCR APIs, and each detected word is then converted individually.
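As a hedged sketch of that word-detection step, the snippet below uses docTR to find word boxes on a page photo and crop them; each crop would then be converted to digital ink individually, following the preprocessing shown in the inference notebook (the file name and the crop-then-convert flow here are illustrative).

```python
# Sketch: detect word boxes on a page photo with docTR and crop them,
# so each word image can be converted to digital ink separately.
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

page_path = "my_note.jpg"  # hypothetical input image
doc = DocumentFile.from_images(page_path)   # list with one HxWx3 uint8 page array
predictor = ocr_predictor(pretrained=True)  # detection + recognition pipeline
result = predictor(doc)

page = doc[0]
h, w = page.shape[:2]

word_crops = []
for block in result.pages[0].blocks:
    for line in block.lines:
        for word in line.words:
            # docTR geometries are relative ((xmin, ymin), (xmax, ymax)) in [0, 1].
            (xmin, ymin), (xmax, ymax) = word.geometry
            x0, y0 = int(xmin * w), int(ymin * h)
            x1, y1 = int(xmax * w), int(ymax * h)
            word_crops.append(page[y0:y1, x0:x1])

print(f"Detected {len(word_crops)} word crops")
# Each crop would then be fed to the Small-p model for word-level conversion.
```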
This code is released under the Apache 2.0 License.
If you use InkSight in your research, please cite our paper:
```bibtex
@article{mitrevski2025inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write},
  author={Blagoj Mitrevski and Arina Rak and Julian Schnitzler and Chengkun Li and Andrii Maksai and Jesse Berent and Claudiu Cristian Musat},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=pSyUfV5BqA}
}
```
- Project Page - Comprehensive project overview with examples and technical details
- Google Research Blog - Featured article explaining the research
This is not an officially supported Google product.