Jugal-lachhwani

…R-based PDF text extraction using Tesseract
- Supports custom page ranges and OCR configurations
- Includes comprehensive unit tests with 14 test cases
- Handles errors gracefully and filters empty pages
- Adds proper documentation and type hints

Jugal-lachhwani and others added 2 commits September 3, 2025 15:19
@Jugal-lachhwani
Author

I have added a document loader for extracting text from scanned PDFs using OCR. The document loaders already present for this task are not maintained and are not compatible with recent versions (I have tried them), so there is a need for an OCR-based loader. Without one, users have to rely on a third-party library, which doesn't provide proper metadata. I think this ocr_pdf document loader solves that problem for users.

Thank you
