-
Notifications
You must be signed in to change notification settings - Fork 357
Fix DecompressResponse middleware for multiple encodings and keep updated content-length header #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DecompressResponse middleware for multiple encodings and keep updated content-length header #809
Conversation
If multiple Content-Encodings were applied restoring the original message requires undoing all steps in reverse order. Thus we cannot continue when encountering an unsupported codec. Just skipping over one step will most likely just lead to decoding errors in the next supported step or incorrect results otherwise.
HEAD requests can be used to check the size of remote content to decide ahead of time whether it is worth fetching. Of course the size after decompression likely differs from the transfer size indicated in the content-length header, but depending on use case only the transfer size might be relevant. This obsoletes the empty-body special case in decompress_body previously added in 5bc9b82 since HEAD requests are now handled earlier. If we get an invalid empty body in a non-HEAD request we want to fail loudly.
Depending on context presence of this header is mandatory or at least strongly encouraged in HTTP/1.0 and HTTP/1.1 and some later processing steps might rely on or profit from its presence
fa0c87b
to
7f4a78b
Compare
Updated the patch to explicitly check for But now I’m not sure what to do with the existing empty-body special case in decompression added in 5bc9b82. Should we keep or drop it? It was motivated by |
If it's a HEAD request, it's caught earlier. If it's not a HEAD request with empty body + content-encoding, it should fail loudly. Unless I misunderstood https://datatracker.ietf.org/doc/html/rfc9110#name-head Should we merge it now 🤔 |
btw, when in doubt, keep adding tests 😄 |
7f4a78b
to
3fda9ab
Compare
Alright, updated the patch to remove the pre-existing empty-body special case then as it’s no longer needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 💜 Thank you so much for helping!
…dated content-length header (#809) * fix: stop decompressing response on first unknown codec If multiple Content-Encodings were applied restoring the original message requires undoing all steps in reverse order. Thus we cannot continue when encountering an unsupported codec. Just skipping over one step will most likely just lead to decoding errors in the next supported step or incorrect results otherwise. * fix: keep original content length and encoding headers for HEAD requests HEAD requests can be used to check the size of remote content to decide ahead of time whether it is worth fetching. Of course the size after decompression likely differs from the transfer size indicated in the content-length header, but depending on use case only the transfer size might be relevant. This obsoletes the empty-body special case in decompress_body previously added in 5bc9b82 since HEAD requests are now handled earlier. If we get an invalid empty body in a non-HEAD request we want to fail loudly. * fix: update existing content-length header after decompression Depending on context presence of this header is mandatory or at least strongly encouraged in HTTP/1.0 and HTTP/1.1 and some later processing steps might rely on or profit from its presence Signed-off-by: Yordis Prieto <[email protected]>
This addresses two issues:
First, until now if multiple encodings have been applied to the response body, the middleware would just skip over the unsupported algorithm and try to continue with the next. This likely crashes because the body now has the wrong data format for the next decompressor and is incorrect either way.
Furthermore, users may wish to use
HEAD
requests to check how much data would be transferred for aGET
to decide whether to (proactively) load a resource or not. But the middleware used to just delete anycontent-length
headers. Additionally,content-length
headers are mandatory in HTTP1.0 and unless doing chunked transfers HTTP1.1, thus its absence might confuse later processing steps even for other requests types. The second commit changes the middleware to instead retain or update thecontent-length
header.For the second part, there are two points worth pointing out:
content-length
header, this will now add one. I think this should be fine and it lead to simpler code, but if you think this should be changed let me knowHEAD
request and thus (effectively) also keep the originalcontent-encoding
header. Whether or not the content is compressed might also be of interest when doingHEAD
requests (though i guess you could also argue one should omit the middleware entirely in this case, but then one needs to manually keep theaccept-encoding
header in sync with what the middleware supports)