InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write
Blagoj Mitrevski† • Arina Rak† • Julian Schnitzler† • Chengkun Li† • Andrii Maksai‡ • Jesse Berent • Claudiu Musat
† First authors (random order) | ‡ Corresponding author: [email protected]
InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink using a Vision Transformer (ViT) image encoder combined with an mT5 encoder-decoder. By combining reading and writing priors in a multi-task training framework, our models handle diverse writing styles and backgrounds without requiring specialized capture hardware. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. This repository provides the Small-p model weights, the dataset, and example inference code.
Key capabilities:
- Offline-to-online handwriting conversion from photos
- Multi-language support with robust background handling
- Word-level and full-page text processing
- Vector-based digital ink output for editing and search
InkSight system architecture (animated version)
- June 2025: Paper accepted to TMLR (Transactions on Machine Learning Research)
- October 2024: Model weights and dataset released on Hugging Face
- October 2024: Featured on Google Research Blog
- February 2024: Interactive demo launched
Try InkSight on Hugging Face Space: Interactive Demo
Explore our example notebook with step-by-step inference examples.
Access our comprehensive dataset: InkSight Dataset on Hugging Face
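Both the expert traces in the dataset and the model outputs are digital ink, i.e. vector strokes (sequences of x/y points) rather than pixels. As a rough illustration of why this representation is editable and searchable, here is a minimal sketch that converts a toy stroke list into an SVG path; the stroke structure shown is illustrative only and is not the dataset's actual schema.

```python
# Minimal sketch: digital ink as vector strokes, rendered to an SVG path.
# The stroke structure below is illustrative, not the dataset's actual schema.
from typing import List, Tuple

Stroke = List[Tuple[float, float]]  # one pen-down trace: a sequence of (x, y) points


def strokes_to_svg(strokes: List[Stroke], width: int = 400, height: int = 200) -> str:
    """Convert a list of strokes into a standalone SVG document."""
    paths = []
    for stroke in strokes:
        if not stroke:
            continue
        # "M x y" moves the pen, "L x y" draws a line to the next point.
        d = f"M {stroke[0][0]:.1f} {stroke[0][1]:.1f} " + " ".join(
            f"L {x:.1f} {y:.1f}" for x, y in stroke[1:]
        )
        paths.append(f'<path d="{d}" fill="none" stroke="black" stroke-width="2"/>')
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(paths)
        + "</svg>"
    )


# Two toy strokes approximating the letter "v".
ink = [[(10.0, 10.0), (20.0, 40.0)], [(20.0, 40.0), (30.0, 10.0)]]
print(strokes_to_svg(ink))
```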
uv is a fast Python package and project manager that provides excellent dependency resolution and virtual environment management.
```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://github.com/google-research/inksight.git
cd inksight
uv sync
```
Alternatively, set up the environment with conda:

```bash
git clone https://github.com/google-research/inksight.git
cd inksight
conda env create -f environment.yml
conda activate inksight
```
Important: Use TensorFlow 2.15.0-2.17.0. Later versions may cause unexpected behavior.
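For example, the supported range can be pinned explicitly (the commands below assume either the uv or conda setup above; if the project files already constrain TensorFlow, syncing alone is enough):

```bash
# With uv: constrain TensorFlow in the project (covers 2.15.x-2.17.x) and verify
uv add "tensorflow>=2.15.0,<2.18"
uv run python -c "import tensorflow as tf; print(tf.__version__)"

# With conda/pip: install into the activated environment and verify
pip install "tensorflow>=2.15.0,<2.18"
python -c "import tensorflow as tf; print(tf.__version__)"
```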
For development or custom inference, run the Gradio playground locally:
```bash
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground
cd Model-Output-Playground
pip install -r requirements.txt
python app.py
```
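Once app.py is running, Gradio typically serves the playground at http://localhost:7860 (the framework's default port), unless the script configures a different host or port.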
- InkSight Dataset - Comprehensive collection of model outputs and expert traces
- Dataset Documentation - Detailed dataset description, format specifications, and usage guidelines
- Small-p model (CPU/GPU) - Optimized for standard inference (see the loading sketch after this list)
- Small-p model (TPU) - TPU-optimized version
- Inference Notebook - Word and page-level inference examples
- Sample Outputs - Visual examples of model results
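As a rough sketch of getting the Small-p weights locally, the snippet below downloads the checkpoint with huggingface_hub and loads it as a TensorFlow SavedModel. The repository id and the assumption that the repository root is a SavedModel directory are mine, not confirmed by this README; see the inference notebook for the exact preprocessing and decoding steps.

```python
# Sketch only: fetch the Small-p checkpoint and load it as a TF SavedModel.
import tensorflow as tf
from huggingface_hub import snapshot_download

# Hypothetical repo id; use the model link above for the actual repository.
local_dir = snapshot_download(repo_id="Derendering/InkSight-Small-p")

# Assumes the downloaded repository root contains a SavedModel (saved_model.pb).
model = tf.saved_model.load(local_dir)

# Inspect the exported serving signatures before calling the model;
# the inference notebook shows how the image and task prompt are prepared.
print(list(model.signatures.keys()))
```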
The inference code demonstrates both word-level and full-page processing. For full pages, word bounding boxes are obtained with open-source OCR engines (docTR or Tesseract) as alternatives to commercial OCR APIs, and each detected word is then converted individually.
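As a hedged sketch of that word-detection step, the snippet below uses docTR to find word boxes on a page photo and crop them; each crop would then be converted to digital ink individually, following the preprocessing shown in the inference notebook (the file name and the crop-then-convert flow here are illustrative).

```python
# Sketch: detect word boxes on a page photo with docTR and crop them,
# so each word image can be converted to digital ink separately.
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

page_path = "my_note.jpg"  # hypothetical input image
doc = DocumentFile.from_images(page_path)   # list with one HxWx3 uint8 page array
predictor = ocr_predictor(pretrained=True)  # detection + recognition pipeline
result = predictor(doc)

page = doc[0]
h, w = page.shape[:2]

word_crops = []
for block in result.pages[0].blocks:
    for line in block.lines:
        for word in line.words:
            # docTR geometries are relative ((xmin, ymin), (xmax, ymax)) in [0, 1].
            (xmin, ymin), (xmax, ymax) = word.geometry
            x0, y0 = int(xmin * w), int(ymin * h)
            x1, y1 = int(xmax * w), int(ymax * h)
            word_crops.append(page[y0:y1, x0:x1])

print(f"Detected {len(word_crops)} word crops")
# Each crop would then be fed to the Small-p model for word-level conversion.
```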
This code is released under the Apache 2.0 License.
If you use InkSight in your research, please cite our paper:
```bibtex
@article{mitrevski2025inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write},
  author={Blagoj Mitrevski and Arina Rak and Julian Schnitzler and Chengkun Li and Andrii Maksai and Jesse Berent and Claudiu Cristian Musat},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=pSyUfV5BqA}
}
```
- Project Page - Comprehensive project overview with examples and technical details
- Google Research Blog - Featured article explaining the research
This is not an officially supported Google product.