@dan-and commented Sep 27, 2025

Add PDF detection to skip processing PDF files in fetch and playwright scrapers. This prevents raw PDF binary data from being dumped into HTML/markdown fields.

Fixes #28
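The description does not show how the detection works, so the following is only a minimal sketch of one way a scraper could recognize a PDF before treating the response as HTML. All function names here (`isPdfContentType`, `hasPdfExtension`, `hasPdfMagic`, `shouldSkipAsPdf`) are hypothetical and not taken from this PR's diff; the checks used (the `Content-Type` header, a `.pdf` URL path, and the `%PDF-` file signature) are assumptions about what such detection might look at.

```typescript
// Hypothetical helpers; names and structure are illustrative, not from the PR.
const PDF_MAGIC = "%PDF-";

/** True when the Content-Type header declares a PDF. */
function isPdfContentType(contentType: string | null | undefined): boolean {
  if (!contentType) return false;
  // Drop parameters such as "; charset=binary" before comparing.
  const [mediaType = ""] = contentType.split(";");
  return mediaType.trim().toLowerCase() === "application/pdf";
}

/** True when the URL path ends in ".pdf" (query string and fragment ignored). */
function hasPdfExtension(url: string): boolean {
  try {
    return new URL(url).pathname.toLowerCase().endsWith(".pdf");
  } catch {
    return false; // not a parseable absolute URL
  }
}

/** True when the body begins with the PDF signature "%PDF-". */
function hasPdfMagic(body: Uint8Array): boolean {
  if (body.length < PDF_MAGIC.length) return false;
  for (let i = 0; i < PDF_MAGIC.length; i++) {
    if (body[i] !== PDF_MAGIC.charCodeAt(i)) return false;
  }
  return true;
}

/** Combines the checks: any positive signal means "skip HTML/markdown processing". */
function shouldSkipAsPdf(
  url: string,
  contentType: string | null | undefined,
  body?: Uint8Array,
): boolean {
  return (
    isPdfContentType(contentType) ||
    hasPdfExtension(url) ||
    (body !== undefined && hasPdfMagic(body))
  );
}
```

A scraper could call `shouldSkipAsPdf(url, response.headers.get("content-type"))` right after the response headers arrive and return early instead of filling the HTML/markdown fields. Checking the header first avoids downloading the body at all; the signature check is a fallback for servers that send a wrong or missing `Content-Type`.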

Development

Successfully merging this pull request may close these issues.

[Bug] PDF Content Incorrectly Dumped into HTML/Markdown Fields During Web Craw