google-research/inksight

InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write

Blagoj Mitrevski†, Arina Rak†, Julian Schnitzler†, Chengkun Li†, Andrii Maksai‡, Jesse Berent, Claudiu Musat

† First authors (random order)   |   ‡ Corresponding author: [email protected]

Overview

InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink, using an encoder-decoder architecture built on a Vision Transformer (ViT) and mT5. By combining reading and writing priors in a multi-task training framework, the models process handwritten content without specialized equipment and handle diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, so physical notes can be digitized into searchable, editable formats. This repository provides the Small-p model weights, the dataset, and example inference code.

Key capabilities:

  • Offline-to-online handwriting conversion from photos
  • Multi-language support with robust background handling
  • Word-level and full-page text processing
  • Vector-based digital ink output for editing and search
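To make "vector-based digital ink" concrete, here is a minimal sketch that treats ink as a list of strokes, each stroke being a list of (x, y) points, and renders it as SVG. This representation and the helper names `stroke_to_svg_path` and `ink_to_svg` are assumptions for illustration; the format produced by the InkSight inference code may differ.

```python
from typing import List, Tuple

# Assumed representation (not taken from the repository):
Point = Tuple[float, float]
Stroke = List[Point]  # one pen-down to pen-up trace
Ink = List[Stroke]    # a word or page is a list of strokes


def stroke_to_svg_path(stroke: Stroke) -> str:
    """Encode one stroke as SVG path data: move to the first point, then draw lines."""
    if not stroke:
        return ""
    x0, y0 = stroke[0]
    parts = [f"M {x0:g} {y0:g}"]
    parts += [f"L {x:g} {y:g}" for x, y in stroke[1:]]
    return " ".join(parts)


def ink_to_svg(ink: Ink, width: int, height: int) -> str:
    """Wrap every non-empty stroke in a <path> element inside a single <svg> document."""
    paths = "".join(
        f'<path d="{stroke_to_svg_path(s)}" fill="none" stroke="black"/>'
        for s in ink
        if s
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">{paths}</svg>'
    )
```

Because the strokes stay as coordinates rather than pixels, they can be scaled, restyled, or edited point by point, which is what makes this kind of output suitable for editing.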

InkSight system architecture (animated version)

Quick Start

Online Demo

Try InkSight on Hugging Face Space: Interactive Demo

Jupyter Notebook

Explore our example notebook with step-by-step inference examples.

Dataset

Access our comprehensive dataset: InkSight Dataset on Hugging Face

Installation

Using uv (Recommended)

uv is a fast Python package and project manager that provides excellent dependency resolution and virtual environment management.

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and set up the project
git clone https://github.com/google-research/inksight.git
cd inksight
uv sync

Using Conda

git clone https://github.com/google-research/inksight.git
cd inksight
conda env create -f environment.yml
conda activate inksight

Important: Use a TensorFlow version from 2.15.0 to 2.17.0. Later versions may cause unexpected behavior.
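A quick way to confirm that an environment falls inside this range is a small check like the one below. It is a sketch, not part of the repository: the helper names `parse_version` and `is_supported_tf` are hypothetical, and the bounds are read literally as 2.15.0 through 2.17.0 inclusive.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '2.16.1' or '2.17.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_tf(version: str) -> bool:
    """True when the version lies in the inclusive range 2.15.0 to 2.17.0."""
    return (2, 15, 0) <= parse_version(version) <= (2, 17, 0)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        status = "supported" if is_supported_tf(tf.__version__) else "outside the recommended range"
        print(f"TensorFlow {tf.__version__}: {status}")
```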

Local Playground Setup

For development or custom inference, run the Gradio playground locally:

git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground
cd Model-Output-Playground
pip install -r requirements.txt
python app.py

Resources

💻 Code Examples

The inference code demonstrates both word-level and full-page text processing using open-source alternatives to commercial OCR APIs, including support for docTR and Tesseract OCR.
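One plausible shape for full-page processing is: an OCR engine such as docTR or Tesseract locates word boxes, each word crop is converted at the word level, and the resulting ink is placed back on the page. The sketch below illustrates only that last placement step. The function names, the dictionary keys, and the assumption that word ink is expressed in the pixel frame of its crop are all hypothetical and may not match the repository's inference code.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in page pixels


def place_word_ink(ink: List[Stroke], crop_size: Tuple[float, float], box: Box) -> List[Stroke]:
    """Map strokes from crop coordinates into the word's bounding box on the page.

    Each axis is scaled independently, so the aspect ratio is preserved only
    when the crop and the box share the same proportions.
    """
    crop_w, crop_h = crop_size
    left, top, right, bottom = box
    scale_x = (right - left) / crop_w
    scale_y = (bottom - top) / crop_h
    return [
        [(left + x * scale_x, top + y * scale_y) for x, y in stroke]
        for stroke in ink
    ]


def assemble_page(words: List[Dict]) -> List[Stroke]:
    """Concatenate the placed strokes of every word into one page-level ink."""
    page: List[Stroke] = []
    for word in words:
        page.extend(place_word_ink(word["ink"], word["crop_size"], word["box"]))
    return page
```

For example, a stroke running corner to corner in a 200×100 crop, placed into the box (10, 20, 110, 70), ends up running from (10, 20) to (110, 70) on the page.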

License and Citation

License

This code is released under the Apache 2.0 License.

Citation

If you use InkSight in your research, please cite our paper:

@article{mitrevski2025inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write},
  author={Blagoj Mitrevski and Arina Rak and Julian Schnitzler and Chengkun Li and Andrii Maksai and Jesse Berent and Claudiu Cristian Musat},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=pSyUfV5BqA},
  note={}
}

This is not an officially supported Google product.