Skip to content

fix: handle UTF BOM in config files #10154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 1, 2025

Conversation

devin-ai-integration[bot]
Copy link
Contributor

Fixes #9938.

This PR implements UTF BOM (Byte Order Marker) handling for Wrangler config files to resolve parsing errors when config files contain BOMs.

Changes

Core Implementation:

  • Modified readFileSync in parse.ts to read files as binary buffers first, then validate/strip BOMs before parsing
  • Added removeBOMAndValidate function that detects and handles different BOM types:
    • UTF-8 BOM (ef bb bf): Silently removed before parsing
    • UTF-16/UTF-32 BOMs: Throw descriptive ParseError with actionable guidance to save as UTF-8 without BOM

Error Handling:

  • Updated try-catch logic to re-throw ParseError instances directly, preserving detailed BOM error messages
  • Error messages include file path and clear instructions for users

Test Coverage:

  • Added comprehensive test suite covering UTF-8 BOM removal for both TOML and JSON configs
  • Tests for all BOM types (UTF-16 BE/LE, UTF-32 BE/LE) that should trigger errors
  • Verification that normal files without BOMs continue to work correctly

Key Review Points

⚠️ Critical Logic: BOM detection order is important - UTF-32 BOMs (4 bytes) must be checked before UTF-16 BOMs (2 bytes) to avoid false positives, since UTF-32 LE starts with the same bytes as UTF-16 LE.

⚠️ File Reading Strategy: Changed from reading files as UTF-8 strings to binary buffers first. This affects ALL config file reading in Wrangler.

⚠️ Error Handling: Modified ParseError re-throwing logic - please verify this doesn't break other error scenarios in config loading.

Testing

All BOM handling tests pass, including:

  • UTF-8 BOM removal (should work silently)
  • Error generation for unsupported BOMs (should show helpful messages)
  • Normal operation without BOMs (should be unchanged)

Link to Devin run: https://app.devin.ai/sessions/d8d27611f75e4c97b1be565945f1588c
Requested by: @petebacondarwin

  • Tests
    • Tests included
  • Public documentation
    • Documentation not necessary because: Internal bug fix with clear error messages for users
  • Wrangler V3 Backport
    • Not necessary because: This is a patch fix that should go through normal release process

@devin-ai-integration devin-ai-integration bot requested a review from a team as a code owner July 31, 2025 14:15
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link

changeset-bot bot commented Jul 31, 2025

🦋 Changeset detected

Latest commit: b4df46e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
wrangler Patch
@cloudflare/vite-plugin Patch
@cloudflare/vitest-pool-workers Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

Failed to automatically backport this PR's changes to Wrangler v3. Please manually create a PR targeting the v3-maintenance branch with your changes. Thank you for helping us keep Wrangler v3 supported!

Depending on your changes, running git rebase --onto v3-maintenance main devin/1753969999-utf-bom-handling might be a good starting point.

Notes:

  • your PR branch should be named v3-backport-10154
  • add the skip-v3-pr label to the current PR to stop this workflow from failing

Copy link

pkg-pr-new bot commented Jul 31, 2025

create-cloudflare

npm i https://pkg.pr.new/create-cloudflare@10154

@cloudflare/kv-asset-handler

npm i https://pkg.pr.new/@cloudflare/kv-asset-handler@10154

miniflare

npm i https://pkg.pr.new/miniflare@10154

@cloudflare/pages-shared

npm i https://pkg.pr.new/@cloudflare/pages-shared@10154

@cloudflare/unenv-preset

npm i https://pkg.pr.new/@cloudflare/unenv-preset@10154

@cloudflare/vite-plugin

npm i https://pkg.pr.new/@cloudflare/vite-plugin@10154

@cloudflare/vitest-pool-workers

npm i https://pkg.pr.new/@cloudflare/vitest-pool-workers@10154

@cloudflare/workers-editor-shared

npm i https://pkg.pr.new/@cloudflare/workers-editor-shared@10154

wrangler

npm i https://pkg.pr.new/wrangler@10154

commit: b4df46e

@vicb
Copy link
Contributor

vicb commented Jul 31, 2025

Hey Devin

e08929d was actually great - you can probably collapse the "name" and "encoding" properties as they are the same - you should keep encoding with has a nicer description.

Why did you not kept that commit?

devin-ai-integration bot and others added 6 commits July 31, 2025 16:07
- Remove UTF-8 BOM (ef bb bf) from config files before parsing
- Throw descriptive errors for other BOMs (UTF-16, UTF-32)
- Add comprehensive tests for BOM handling in TOML and JSON configs

Fixes #9938

Co-Authored-By: [email protected] <[email protected]>
- Replace individual BOM constants with UNSUPPORTED_BOMS array
- Use loop instead of separate if statements for each BOM type
- Reduces code duplication while maintaining exact same functionality

Addresses PR feedback from vicb

Co-Authored-By: [email protected] <[email protected]>
- Use TextDecoder().decode() first, which automatically strips UTF-8 BOMs
- Check for UTF-16/UTF-32 BOMs as replacement characters in decoded string
- Fall back to raw buffer inspection to determine specific BOM type
- Cleaner approach as suggested by user feedback
- All existing tests continue to pass

Co-Authored-By: [email protected] <[email protected]>
- TextDecoder().decode() first to automatically strip UTF-8 BOMs
- Check for replacement characters to detect non-UTF-8 BOMs
- Fall back to raw buffer inspection for specific BOM type identification
- All BOM tests pass (UTF-8 removal, UTF-16/UTF-32 error detection)
- Constant string approach not viable as all non-UTF-8 BOMs decode identically

Co-Authored-By: [email protected] <[email protected]>
- Apply prettier formatting to parse.ts
- No functional changes to BOM detection logic
- All BOM tests continue to pass locally

Co-Authored-By: [email protected] <[email protected]>
@devin-ai-integration devin-ai-integration bot force-pushed the devin/1753969999-utf-bom-handling branch from 35c15bf to 54428f1 Compare July 31, 2025 16:08
devin-ai-integration bot and others added 2 commits July 31, 2025 16:26
… and array-based BOM detection

- Update UTF-8 BOM tests to use Buffer.concat([Buffer.from([0xEF, 0xBB, 0xBF]), Buffer.from(configContent, 'utf-8')]) as suggested by petebacondarwin
- Revert to array-based BOM detection approach with collapsed properties (name + encoding) as suggested by vicb
- Maintain all existing functionality while following preferred patterns

Co-Authored-By: [email protected] <[email protected]>
@github-project-automation github-project-automation bot moved this from Untriaged to Approved in workers-sdk Jul 31, 2025
@petebacondarwin petebacondarwin added the skip-v3-pr Skip validation of presence of a v3 backport PR label Aug 1, 2025
@petebacondarwin petebacondarwin merged commit 502a8e0 into main Aug 1, 2025
40 of 42 checks passed
@petebacondarwin petebacondarwin deleted the devin/1753969999-utf-bom-handling branch August 1, 2025 07:15
@github-project-automation github-project-automation bot moved this from Approved to Done in workers-sdk Aug 1, 2025
@workers-devprod workers-devprod added the contribution [Holopin] Recognizes an open-source contribution, big or small label Aug 1, 2025
Copy link

holopin-bot bot commented Aug 1, 2025

Congratulations @devin-ai-integration[bot], the maintainer of this repository has issued you a holobyte! Here it is: https://holopin.io/holobyte/cmdshoz8f665807jydb6r3n2l

This badge can only be claimed by you, so make sure that your GitHub account is linked to your Holopin account. You can manage those preferences here: https://holopin.io/account.
Or if you're new to Holopin, you can simply sign up with GitHub, which will do the trick!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution [Holopin] Recognizes an open-source contribution, big or small skip-v3-pr Skip validation of presence of a v3 backport PR
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Invalid symbol at the start of wrangler.jsonc
3 participants