Remove threaded pdf parsing #978

Merged
merged 1 commit into from
Jun 20, 2025
Conversation

mskarlin
Collaborator

Removing multi-threading from the PDF parsing, as PyMuPDF is not thread-safe per its documentation. Without this change, pqa was throwing segmentation faults when parsing many large files simultaneously (e.g. patents).

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. bug Something isn't working labels Jun 20, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Removes the use of asyncio.to_thread for PDF parsing due to PyMuPDF’s lack of thread-safety, switching to a direct call and updating the explanatory comment.

  • Call to asyncio.to_thread removed for parse_pdf_to_pages
  • Comment updated to reference PyMuPDF thread-safety documentation
  • PDF parsing now runs synchronously within read_doc
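
The change described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `parse_pdf_to_pages` here is a stand-in that only records which thread ran it, its signature is an assumption, and `read_doc_threaded` / `read_doc_direct` are hypothetical names for the before and after shapes of the call inside `read_doc`.

```python
import asyncio
import threading


def parse_pdf_to_pages(path: str) -> dict:
    """Stand-in for the real parser (assumed signature).

    It does no PDF work; it only reports the thread it ran on.
    """
    return {"path": path, "thread_id": threading.get_ident()}


async def read_doc_threaded(path: str) -> dict:
    # Before: parsing was handed to a worker thread via asyncio.to_thread,
    # so several parses could run in different threads at once.
    return await asyncio.to_thread(parse_pdf_to_pages, path)


async def read_doc_direct(path: str) -> dict:
    # After: parsing is a plain synchronous call on the event-loop thread,
    # so the parser is never entered from a worker thread.
    return parse_pdf_to_pages(path)


async def main() -> None:
    loop_thread = threading.get_ident()
    before = await read_doc_threaded("example.pdf")
    after = await read_doc_direct("example.pdf")
    print("threaded call ran on loop thread:", before["thread_id"] == loop_thread)
    print("direct call ran on loop thread:", after["thread_id"] == loop_thread)


if __name__ == "__main__":
    asyncio.run(main())
```

The trade-off of the direct call is that the event loop is blocked while a document is being parsed, so other coroutines on the same loop wait until the parse returns.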

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 20, 2025
@mskarlin mskarlin merged commit 2cf0d6c into main Jun 20, 2025
5 checks passed
@mskarlin mskarlin deleted the remove-threaded-pdf-parsing branch June 20, 2025 18:12
3 participants