Specify encoding for txt files so that browser doesn't mangle special characters #2922

@estanfor

By default, browsers open TXT files in a tab instead of downloading them, but the browser doesn't seem to pick up the file's encoding, so some characters get mangled. The mangling happens only when the file is opened in the browser, not when it is downloaded. Example item: https://purl.stanford.edu/kc795fm0887
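
For illustration, this kind of mangling can be reproduced outside the browser. The sketch below assumes the file is UTF-8 and that the browser falls back to windows-1252 when no charset is declared. Both are assumptions, and the sample string is made up rather than taken from the example item.

```python
# Hypothetical sample text; not taken from the example item.
original = "café résumé naïve"

# Bytes as they would be stored in a UTF-8 file.
raw = original.encode("utf-8")

# Assumed fallback: the bytes get decoded as windows-1252 instead of UTF-8.
mangled = raw.decode("cp1252")

print(mangled)  # each accented letter shows up as two unrelated characters
```

If that is what's happening, the downloaded file would still be fine, because the bytes on disk are unchanged and only the decoding step differs.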

Steps to reproduce issue:

  1. Click on the "Download" button in the viewer for the txt file.
  2. The txt file will (probably) open in a new browser tab, looking like this:
     [Image: screenshot of the file in the browser tab]
  3. Go back to the "Download" button and right-click + "Save link as" to download the file to your computer.
  4. Open the file in a text editor. It should look like this, and be identified as a UTF-8 file:
     [Image: screenshot of the file in the text editor]

Is there a way to pass the correct encoding to the browser so that it can display the file correctly? I don't think we can assume that all files will be UTF-8, although UTF-8 is probably more common than any other encoding.
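
One possible approach is for the server to declare a charset in the `Content-Type` header only when it can verify one. The following is a minimal sketch, not the application's actual code: `content_type_for` is a hypothetical helper, and it assumes the server can read the file's bytes before responding. It checks whether the bytes decode as strict UTF-8 and, if not, leaves the charset off so the browser behaves as it does today.

```python
def content_type_for(data: bytes) -> str:
    """Return a Content-Type value for a plain-text file (hypothetical helper)."""
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Encoding unknown: don't claim a charset we can't verify.
        return "text/plain"
    return "text/plain; charset=utf-8"


# Sample inputs are made up for illustration.
utf8_header = content_type_for("café".encode("utf-8"))
latin1_header = content_type_for("café".encode("latin-1"))
print(utf8_header)
print(latin1_header)
```

Pure ASCII files also pass the UTF-8 check, which should be harmless since ASCII is a subset of UTF-8. Files in other encodings would still need a different signal, such as stored metadata, to be labeled correctly.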

I'm basing my interpretation of the problem on this very old Stack Overflow post, https://stackoverflow.com/questions/13537371/displayerror-of-textfile-in-browser, so the issue may be something else entirely.