Enhancing PDF extraction: multi-column layout and OCR #35

@JiaZhang42

Hi Eduard,

Thank you for creating such a powerful package!

I wonder if you plan to extend the PDF extraction functionality in llm_message() to automatically detect whether the PDF is multi-column or requires OCR and then apply the appropriate extraction method. From my experience, pdftools::pdf_text() does not currently handle these scenarios effectively.

Additionally, I noticed that pdf_page_batch() prepares both the text and image of each PDF page as a list of LLM messages. I'm new to multimodal functionality and wanted to ask: is the inclusion of both text and images primarily meant to capture the layout or structure of the PDF?
