
awalker4 (Contributor)


This is a minor fix to improve our logging. When we buffer file-like input to disk in
`process_data_with_model`, we always name it `document.pdf`. This confused me when I found the
following in our logs:

```
2025-06-30 17:02:01,906 unstructured_inference INFO Reading image file: /var/folders/5k/frv076q97yl0ywybmzydhbsr0000gn/T/tmpc0uq7zde/document.pdf ...
2025-06-30 17:02:01,951 unstructured_api ERROR cannot identify image file '/private/var/folders/5k/frv076q97yl0ywybmzydhbsr0000gn/T/tmpc0uq7zde/document.pdf'
```

This path can hold either a PDF or an image, so let's just drop the extension and save ourselves
some confusion.

Also added a comment so we don't forget why it's using a temp dir, not a temp file.
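For reviewers less familiar with this code path, here is a minimal Python sketch of the pattern described above: copy file-like input into a temporary directory under a fixed, extension-less name, then hand the path to a processing callback. `process_with_buffered_file` and `BUFFERED_FILENAME` are hypothetical names, not the actual contents of `process_data_with_model`. The temp-dir rationale in the docstring is also an assumption, since the PR only says a comment was added without restating the reason here.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")

# Hypothetical fixed name with no extension: the buffered bytes may be a PDF or an image,
# so a ".pdf" suffix would be misleading in logs and error messages.
BUFFERED_FILENAME = "document"


def process_with_buffered_file(file: BinaryIO, process: Callable[[str], T]) -> T:
    """Copy a file-like object to disk and call `process` with the on-disk path.

    Assumption (not stated in the PR): one common reason to use a temporary
    directory instead of tempfile.NamedTemporaryFile is that an open named
    temporary file cannot always be reopened by name on Windows. Writing into
    a temp dir lets us close the file before `process` opens it, and the whole
    directory is removed when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, BUFFERED_FILENAME)
        # Stream the input to disk and close it before processing.
        with open(path, "wb") as out:
            shutil.copyfileobj(file, out)
        # `process` must finish here; the path is deleted after this returns.
        return process(path)


# Example: buffer some in-memory bytes and inspect the temporary path.
name, size = process_with_buffered_file(
    io.BytesIO(b"%PDF-1.7"),
    lambda path: (os.path.basename(path), os.path.getsize(path)),
)
print(name, size)  # document 8
```

With no extension, a log line such as "cannot identify image file '.../document'" no longer implies the input was a PDF.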
awalker4 requested a review from qued on June 30, 2025 21:09

qued left a comment


LGTM!

awalker4 merged commit 18c73ca into main on Jul 1, 2025
12 checks passed
awalker4 deleted the fix/pdf_filename_bug branch on July 1, 2025 20:20