table-extractor

Extracting tables from PDFs and other files using deep learning, OCR and Tabula.

Support for Tabula is added in the tables.py file.

We use Mask R-CNN here, trained on images of PDF files, to detect tables.

Things to do:
1) Add a deep learning model that can detect the columns of tables.
2) Use the coordinates of detected tables and feed them to Tabula.
3) Add OCR support for when PDFs are not available.
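
The second to-do item, feeding detected table coordinates to Tabula, is mostly a unit conversion: a detector working on a rendered page image reports boxes in pixels, while tabula-py's `read_pdf()` takes an `area` of `[top, left, bottom, right]` in PDF points (1/72 inch). The helper below is a hypothetical sketch, not code from tables.py. It assumes the page was rasterized at a known DPI and that boxes use the `(y1, x1, y2, x2)` order of the Matterport Mask R-CNN implementation.

```python
# Hypothetical helper (not taken from tables.py): convert a table box that
# Mask R-CNN detected on a rendered page image into the `area` argument of
# tabula-py's read_pdf(). Assumes the page was rendered at `dpi` and that the
# box uses Matterport Mask R-CNN's (y1, x1, y2, x2) pixel order.

PDF_POINTS_PER_INCH = 72  # PDF coordinates are measured in points (1/72 inch)


def bbox_to_tabula_area(bbox, dpi, padding_pt=0.0):
    """Return [top, left, bottom, right] in PDF points, measured from the top-left."""
    y1, x1, y2, x2 = bbox

    def to_points(px):
        return px * PDF_POINTS_PER_INCH / dpi  # pixels -> points

    # Optional padding widens the box so border cells are not clipped;
    # top/left are clamped at the page edge.
    return [
        max(0.0, to_points(y1) - padding_pt),
        max(0.0, to_points(x1) - padding_pt),
        to_points(y2) + padding_pt,
        to_points(x2) + padding_pt,
    ]


if __name__ == "__main__":
    # A table detected at pixels (y1=400, x1=100, y2=1000, x2=1500)
    # on a page image rendered at 200 DPI.
    area = bbox_to_tabula_area((400, 100, 1000, 1500), dpi=200)
    print(area)  # [144.0, 36.0, 360.0, 540.0]
    # With tabula-py (and Java) installed, the area could then be passed on:
    #   import tabula
    #   tables = tabula.read_pdf("document.pdf", pages=1, area=area, guess=False)
```

The DPI passed here has to match whatever resolution was used when the PDF page was converted to an image; otherwise the area is scaled wrongly and Tabula reads the wrong region.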

Note: Until the pytesseract branch is merged, use Tabula if you need the data extracted from PDFs.
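
OCR output has to be turned into rows and cells before it is useful as table data. One common approach, sketched below as an assumption rather than the pytesseract branch's actual method, takes the per-word boxes returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` and groups them: words whose top edges are close form a row, and wide horizontal gaps split a row into cells. The function name and the pixel thresholds `row_tol` and `col_gap` are hypothetical and would need tuning for the rendering resolution.

```python
# Hypothetical sketch (not the pytesseract branch's code): rebuild table rows
# from OCR word boxes. The input mimics the dict returned by
#   pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
# which holds parallel lists under keys such as "text", "left", "top",
# "width", "height" and "conf". The pixel thresholds are assumptions.


def group_words_into_rows(data, row_tol=10, col_gap=25):
    """Return a list of rows, each a list of cell strings."""
    # Keep real words only; non-word entries have empty text and conf -1.
    words = [
        (data["top"][i], data["left"][i], data["width"][i], data["text"][i].strip())
        for i in range(len(data["text"]))
        if data["text"][i].strip() and float(data["conf"][i]) >= 0
    ]
    words.sort(key=lambda w: (w[0], w[1]))  # top-to-bottom, then left-to-right

    # 1) A word joins the current row if its top edge is within row_tol
    #    pixels of the top edge of that row's first word.
    rows = []
    for word in words:
        if rows and abs(word[0] - rows[-1][0][0]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])

    # 2) Inside a row, a horizontal gap wider than col_gap starts a new cell.
    table = []
    for row in rows:
        row.sort(key=lambda w: w[1])
        cells = [[row[0][3]]]
        prev_right = row[0][1] + row[0][2]
        for _top, left, width, text in row[1:]:
            if left - prev_right > col_gap:
                cells.append([text])
            else:
                cells[-1].append(text)
            prev_right = left + width
        table.append([" ".join(cell) for cell in cells])
    return table


if __name__ == "__main__":
    # Hand-written OCR output for a two-row table; in practice it would come
    # from image_to_data() on a cropped table image.
    sample = {
        "text": ["", "Name", "Qty", "Apple", "pie", "3"],
        "left": [0, 10, 200, 10, 60, 200],
        "top": [0, 5, 6, 40, 41, 40],
        "width": [0, 40, 30, 45, 25, 10],
        "height": [0, 12, 12, 12, 12, 12],
        "conf": [-1, 95, 93, 90, 91, 96],
    }
    print(group_words_into_rows(sample))  # [['Name', 'Qty'], ['Apple pie', '3']]
```

Gap-based splitting is simple but fragile on tables with tightly packed columns, which is one reason a column-detection model (the first to-do item) would help.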

Standalone APIs for extracting tables from PDFs or images of documents are either expensive (ABBYY) or not good enough (Tabula, Camelot, pdfminer, etc.). Documents contain many kinds of tables: some are easy to detect and put into databases, while others are very unorthodox. ABBYY uses deep learning to solve this problem and is probably the best API out there, but it is expensive.

Camelot and Tabula, on the other hand, only work on PDFs because they do not use OCR; instead they take a rule-based approach built on classic edge-detection algorithms and Ghostscript. They are free, but they do not work well when the table structure is poor or when a table continues onto another page. We plan to solve those problems with an approach that combines all of the above and gives you a free and flexible solution for your use case.
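
To make the limitation of rule-based extractors concrete, here is a toy sketch of the underlying idea: in a binarized page image, rows and columns that are mostly dark are treated as table ruling lines. It is a simplified illustration, not Camelot's or Tabula's real algorithm, and the function name and the `min_fraction` threshold are arbitrary assumptions. A bordered table yields its grid lines, while the same table without borders yields none, which is why such tools struggle when the table structure is weak.

```python
# Toy illustration of rule-based table detection (NOT Camelot's or Tabula's
# actual algorithm): rows and columns of a binarized page image that are
# mostly dark are treated as table ruling lines.


def find_ruling_lines(binary, min_fraction=0.8):
    """Return (row_indices, col_indices) whose share of dark pixels >= min_fraction.

    `binary` is a list of equal-length rows of 0/1 values, where 1 is a dark pixel.
    """
    height, width = len(binary), len(binary[0])
    rows = [y for y in range(height) if sum(binary[y]) >= min_fraction * width]
    cols = [
        x
        for x in range(width)
        if sum(binary[y][x] for y in range(height)) >= min_fraction * height
    ]
    return rows, cols


if __name__ == "__main__":
    # A 2x2 table drawn with borders: grid lines at rows/columns 0, 2 and 4.
    bordered = [
        [1, 1, 1, 1, 1],
        [1, 0, 1, 0, 1],
        [1, 1, 1, 1, 1],
        [1, 0, 1, 0, 1],
        [1, 1, 1, 1, 1],
    ]
    # The same four cells without borders: only isolated "text" pixels.
    borderless = [
        [0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(find_ruling_lines(bordered))    # ([0, 2, 4], [0, 2, 4])
    print(find_ruling_lines(borderless))  # ([], [])
```

A learned detector such as Mask R-CNN does not depend on drawn lines in this way, which is the motivation for combining it with Tabula and OCR.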

Repository contents:

- .idea
- models
- pdf_to_image
- poppler/bin
- table_detect
- text_sim
- .gitattributes
- README.md
- tables.py

Repository: karan171/table-extractor
