This list contains links to great software tools, libraries, and literature related to Optical Character Recognition (OCR).
Contributions are welcome, as is feedback.
- tesseract - The definitive Open Source OCR engine, Apache 2.0
 - ocropus - OCR engine based on LSTM, Apache 2.0
 - ocropus 0.4 - Older v0.4 state of Ocropus, with tesseract 2.04 and iulib, C++
 - kraken - Ocropus fork with sane defaults
 - Ocrad - The GNU OCR, GPL
 - digit - OCR for numbers in meter displays, such as a power meter, using caffe
 - ocular - Machine-learning OCR for historic documents
 - SwiftOCR - fast and simple OCR library written in Swift
 - attention-ocr - OCR engine using visual attention mechanisms
 - RWTH-OCR - The RWTH Aachen University Optical Character Recognition System
 - simple-ocr-opencv and its fork - A simple pythonic OCR engine using opencv and numpy
 
- Clara OCR - Open source OCR in C, GPL
 - Cuneiform - CuneiForm OCR was developed by Cognitive Technologies
 - Eye - an experimental Java OCR (image-to-text) application
 - kognition - An omnifont OCR software for KDE
 - OCRchie - Modular Optical Character Recognition Software
 - ocre - o.c.r. easy
 - xplab - A GTK 2 tool for pattern matching
 - hebOCR - Hebrew character recognition library (previously named hocr, see Wikipedia article), GPL
 
- hocr-tools - Tools for doing various useful things with hOCR files, Apache 2.0
 - hocr-spec - hOCR 1.1 specification
 - ocr-transform - CLI tool to convert between hOCR and ALTO, MIT
 - hocr-parser - hOCR Specification Python Parser
 - hOCRTools - hOCR to ALTO conversion XSLT
 
- ALTO XML Schema - XML Schema and development of the ALTO XML format
 - ALTO XML Documentation - Documentation and use cases for ALTO
 - alto-tools - Various tools to work with ALTO files, Python
 - AbbyyToAlto - PHP script converting from Abbyy 6 to ALTO XML
 
- TEI-OCR - TEI customization for OCR generated layout and content information
 - TEI SIG on Libraries - Best Practices for TEI in Libraries
 - GDZ - METS/TEI-based GDZ document format
 
- OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
 - Ocrocis - Project manager interface for Ocropy, see also external project homepage
 
- moz-hocr-editor - Firefox add-on for editing hOCR files (discontinued)
 - qt-box-editor - Qt4 editor of tesseract-ocr box files.
 - ocr-gt-tools - Client-Server application for editing OCR ground truth.
 - Paperwork - Using scanners and OCR to grep paper documents the easy way.
 - Paperless - Scan, index, and archive all of your paper documents.
 - gImageReader - gImageReader is a simple Gtk/Qt front-end to tesseract-ocr.
 - VietOCR - A Java/.NET GUI frontend for the Tesseract OCR engine, including jTessBoxEditor, a graphical Tesseract box data editor
 - PoCoTo - Fast interactive batch corrections of complete OCR error series in OCR'ed historical documents.
 - OCRFeeder - GTK graphical user interface that allows the users to correct characters or bounding boxes, ODT export and more.
 - PRImA PAGE Viewer - Java based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and hOCR.
 - LAREX - A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
 - archiscribe - Web application for transcribing OCR ground truth from Archive.org. Deployed instance available at https://archiscribe.jbaiter.de/, results are available in @jbaiter/archiscribe-corpus.
 
- NoiseRemove.java in MathOCR - Java implementation of
 - binarize.c in ZBar - C implementations of two binarization algorithms, based on Sauvola
 - typeface-corpus - A repository for typefaces to train Tesseract and OCRopus for natural history collections and digital humanities.
 - binarizewolfjolion - Comparison of binarization algorithms. Blog post
 - crop_morphology.py in oldnyc - Cropping a page to just the text block
 - Whiteboard Picture Cleaner - Shell one-liner/script to clean up and beautify photos of whiteboards
 - Fred's ImageMagick script textcleaner - Processes a scanned document of text to clean the text background
 
- Open OCR - Run Tesseract in Docker containers
 - tesseract-web-service - An implementation of RESTful web service for tesseract-OCR using tornado.
 - docker-ocropy - A Docker container for running the ocropy OCR system.
 - ABBYY Cloud OCR SDK Code samples - Code samples for using the proprietary commercial ABBYY OCR API.
 - nidaba - An expandable and scalable OCR pipeline
 - gamera - A meta-framework for building document processing applications, e.g. OCR
 - ocr-tools - Project to provide CLI and web service interfaces to common OCR engines
 - ocrad-docker - Run the ocrad OCR engine in a docker container
 - kraken-docker - Run the kraken OCR engine in a docker container
 - ocr.space - Free Online OCR and OCR API by @a9t9 based on Tesseract (code is not open)
 
- ISRI OCR Evaluation Tools with a User Guide from 1996
    - isri-ocr-evaluation-tools - further development by @eddieantonio (2015, 2016)
    - ancientgreekocr-evaluation-tools - further development by @nickjwhite (2013, 2014)
 - ocrevalUAtion - Cross-format evaluation, CLI and GUI
 - ngram-ocr-eval - Brute and simple OCR evaluation using ngrams
 - quack - Quality-Assurance-tool for scans with corresponding ALTO-files
 
- gosseract - Golang OCR library, wrapping Tesseract-ocr.
 
- Tess4J - Java Native Access bindings to Tesseract.
 - tess-two - Tools for compiling Tesseract on Android and Java API.
 
- tesseract for .net - A .Net wrapper for tesseract-ocr.
 
- Tesseract OCR for PHP - Tesseract PHP bindings.
 
- pytesseract - A Python wrapper for Google Tesseract.
 - pyocr - A Python wrapper for Tesseract and Cuneiform.
 - ocrodjvu - A library and standalone tool for doing OCR on DjVu documents, wrapping Cuneiform, gocr, ocrad, ocropus and tesseract
 - tesserocr - A Python wrapper for the tesseract-ocr API
 
- ocracy - Pure JavaScript LSTM RNN implementation based on ocropus
 - gocr.js - JavaScript port (emscripten) of gocr
 - ocrad.js - JavaScript port (emscripten) of ocrad
 - tesseract.js - JavaScript port (emscripten) of Tesseract
 - node-tesseract - A simple wrapper for the Tesseract OCR package.
 - node-tesseract-native - C++ module for node providing OCR with tesseract and leptonica.
 
- rtesseract - Ruby library wrapping the tesseract and imagemagick executables.
 - ruby-tesseract - Native Tesseract bindings for Ruby MRI and JRuby
 - ocr_space - API wrapper for the free OCR service ocr.space. Includes a CLI.
 
- tesseract.rs - Rust bindings for tesseract OCR.
 
- glyph-miner - A system for extracting glyphs from early typeset prints
 
- IMPACT: Tools for text digitisation - List of software tools and projects, some related to OCR
 - OCR-D - List of OCR-related academic articles in the context of the OCR-D project. 🇩🇪
 - Mendeley Group "OCR - Optical Character Recognition" - Collection of 34 papers on OCR
 - eadh.org projects - List of Digital Humanities-related projects in Europe, some related to OCR
 - Wikipedia: Comparison of optical character recognition software
 - OCR [and Deep Learning] by @handong1587
 - Ocropus Wiki: Publications
 
- Tesseract Blends Old and New OCR Technology (2016) @theraysmith
    - Tutorial@DAS2016, Updated "What You Always Wanted to Know" slides
 - What You Always Wanted To Know About Tesseract (2014) @theraysmith
    - Tutorial@DAS2014, includes demos
 - Extracting text from an image using Ocropus (2015)
 - Training an Ocropus OCR model (2015) @danvk
 - Ocropus Wiki: Compute errors and confusions (2016) @zuphilip
 - Ocropus Wiki: Working with Ground Truth (2016) @zuphilip
 - OCRopus (2016) @jze
    - mostly on column separation in ocropus
 - 10 Tips for making your OCR project succeed (2013) @cneud
    - general things to consider for OCR projects
 - Overview of LEADTOOLS Image Cleanup and Pre-processing SDK Technology
    - feature list for a commercial image pre-processing library; has nice before-after samples for pre-processing steps related to OCR
 - Extracting Text from PDFs; Doing OCR; all within R @shawngraham
    - How to work with OCR from PDFs in the R programming environment
 - Tutorial: Command-line OCR on a Mac @bmschmidt
    - Tutorial on how to run tesseract on Mac OS X
 - Practical Experience with OCRopus Model Training (2016) @jze
 - Homemade Manuscript OCR (1): OCRopy (2017) @Jean-Baptiste-Camps
    - Tutorial on applying OCR to medieval manuscripts with OCRopy
 - Optimizing Binarization for OCRopus (2017) @jze
 - Prototype demo for OCR postfix in Danish Newspapers (2016) @thomasegense
 - How Can I OCR My Dictionary? (2016) @JessedeDoes
 - "Needlessly complex" blog (2016) @mzucker. Several image processing how-tos (Python based).
 - (Open-Source-)OCR-Workflows (2017) @wrznr 🇩🇪 - Overview of the state of the art in open source OCR and related technologies (binarisation, deskewing, layout recognition, etc.), with lots of example images and information on the @OCR-D project.
 
- abbyy-finereader-ocr-senate - Using OCR to parse scanned Senate Financial Disclosure forms.
 - cvOCR - An OCR system for recognizing resume or cv text, implemented in Python and C and based on tesseract
 - MathOCR - A printed scientific document recognition system, pre-alpha
 
- High performance document layout analysis (2003) Breuel
 - Adaptive degraded document image binarization (2006) Gatos, Pratikakis, Perantonis
 - [Internship Report] (2007) Gupta
 - OCRopus Addons (Internship Report) (2007) Dantrey
 
- Local Logistic Classifiers for Large Scale Learning (2012) Yousefi, Breuel
 
- High Performance OCR for Printed English and Fraktur using LSTM Networks (2013) Breuel, Ul-Hasan, Al Azawi, Shafait
 - Can we build language-independent OCR using LSTM networks? (2013) Ul-Hasan, Breuel
 - Offline Printed Urdu Nastaleeq Script Recognition with Bidirectional LSTM Networks (2013) Ul-Hasan, Ahmed, Rashid, Shafait, Breuel
 
- OCR of historical printings of Latin texts: Problems, Prospects, Progress. (2014) Springmann, Najock, Morgenroth, Schmid, Gotscharek, Fink
 - Correcting Noisy OCR: Context beats Confusion (2014) Evershed, Fitch
 
- TypeWright: An Experiment in Participatory Curation (2015) Bilansky
    - On crowd-sourcing OCR postcorrection
 - Benchmarking of LSTM Networks (2015) Breuel
 - Recognition of Historical Greek Polytonic Scripts Using LSTM (2015) Simistira, Ul-Hasan, Papavassiliou, Gatos, Katsouros, Liwicki
 - A Segmentation-Free Approach for Printed Devanagari Script Recognition (2015) Karayil, Ul-Hasan, Breuel
 - A Sequence Learning Approach for Multiple Script Identification (2015) Ul-Hasan, Afzal, Shafait, Liwicki, Breuel
 
- Important New Developments in Arabographic Optical Character Recognition (OCR) (2016) Romanov, Miller, Savant, Kiessling
    - on kraken
    - using OpenArabic/OCR_GS_Data for ground truth data
 - OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus (2016) Springmann, Lüdeling
 - Automatic quality evaluation and (semi-) automatic improvement of mixed models for OCR on historical documents (2016) Springmann, Fink, Schulz
 - Generic Text Recognition using Long Short-Term Memory Networks (2016) Ul-Hasan, Ph.D. thesis
 - OCRoRACT: A Sequence Learning OCR System Trained on Isolated Characters (2016) Dengel, Ul-Hasan, Bukhari
 
- Telugu OCR Framework using Deep Learning (2015/2017) Achanta, Hastie
    - see also TeluguOCR, banti_telugu_ocr, chamanti_ocr, #49