Description
When a StreamReader created without an explicit encoding (so it defaults to UTF-8) reaches the end of a stream that stops in the middle of a multi-byte UTF-8 character (one particular kind of invalid UTF-8 byte sequence), the handling changed from .NET 7 to .NET 8. I wasn't able to find docs mentioning this change.
Repro code:
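using System.Runtime.InteropServices;
using System.Text;
using System.Text.Json;

// Two spaces on each side of U+00B7, so the UTF-8 bytes are 20 20 C2 B7 20 20
// (six bytes, matching the six output lines below).
var str = "  \u00B7  ";
var bytes = Encoding.UTF8.GetBytes(str);
Console.WriteLine("Framework: " + RuntimeInformation.FrameworkDescription);

// Read every prefix of the byte array; the 3-byte prefix ends between C2 and B7.
for (var i = 1; i <= bytes.Length; i++)
{
    var range = bytes[0..i];
    var readByStreamReader = new StreamReader(new MemoryStream(range)).ReadToEnd();
    // Serialize as JSON so non-ASCII characters show up as \uXXXX escapes.
    Console.WriteLine(JsonSerializer.Serialize(readByStreamReader));
}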
Output in .NET 7 (no replacement character emitted):
Framework: .NET 7.0.14
" "
"  "
"  "
"  \u00B7"
"  \u00B7 "
"  \u00B7  "
Output in .NET 8 (replacement character emitted):
Framework: .NET 8.0.0
" "
"  "
"  \uFFFD"
"  \u00B7"
"  \u00B7 "
"  \u00B7  "
Version
.NET 8 GA
Previous behavior
I noticed this on .NET 8 GA. I did not test .NET 8 previews.
New behavior
A \uFFFD character (the Unicode replacement character) is now emitted by the StreamReader. Previously nothing was emitted.
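To make the two outcomes concrete, here is a minimal sketch that uses System.Text.Decoder directly instead of StreamReader. It assumes, without confirmation from the product team, that the difference comes down to whether the incomplete trailing bytes are flushed through the decoder at the end of the stream. `Decode` and `truncated` are names made up for this illustration.

```csharp
using System;
using System.Text;

// Two spaces followed by only the lead byte of U+00B7 (C2 B7): a character cut in half.
byte[] truncated = { 0x20, 0x20, 0xC2 };

// Hypothetical helper for this illustration: decode UTF-8 with or without a final flush.
static string Decode(byte[] input, bool flush)
{
    var decoder = Encoding.UTF8.GetDecoder();
    var chars = new char[Encoding.UTF8.GetMaxCharCount(input.Length)];
    int written = decoder.GetChars(input, 0, input.Length, chars, 0, flush);
    return new string(chars, 0, written);
}

// Without a flush, the dangling 0xC2 stays buffered in the decoder and nothing is emitted for it.
string withoutFlush = Decode(truncated, flush: false);

// With a flush, the default replacement fallback turns the dangling 0xC2 into U+FFFD.
string withFlush = Decode(truncated, flush: true);

Console.WriteLine(withoutFlush.Length);      // 2
Console.WriteLine(withFlush.Length);         // 3
Console.WriteLine(withFlush[2] == '\uFFFD'); // True
```

The unflushed result matches what I saw on .NET 7 and the flushed result matches .NET 8, but that only shows the outputs line up, not that this is what StreamReader actually does internally.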
Type of breaking change
- Binary incompatible: Existing binaries may encounter a breaking change in behavior, such as failure to load or execute, and if so, require recompilation.
- Source incompatible: When recompiled using the new SDK or component or to target the new runtime, existing source code may require source changes to compile successfully.
- Behavioral change: Existing binaries may behave differently at run time.
Reason for change
Product team can provide details I think.
Recommended action
Document the change.
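For callers who would rather reject truncated input than receive either a silent drop or a \uFFFD, one option is strict decoding. The sketch below uses the `UTF8Encoding(bool, bool)` constructor with `throwOnInvalidBytes: true`; `TryDecodeStrict` is a hypothetical helper name. It decodes a byte array in one call. The same encoding object can be passed to the `StreamReader(Stream, Encoding)` constructor, but I have not verified whether that throws for a truncated tail on every runtime version.

```csharp
using System;
using System.Text;

// Strict UTF-8: no BOM when encoding, throw on invalid bytes when decoding.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

// Hypothetical helper: returns false when the input is not well-formed UTF-8.
bool TryDecodeStrict(byte[] input, out string text)
{
    try
    {
        text = strictUtf8.GetString(input);
        return true;
    }
    catch (DecoderFallbackException)
    {
        text = string.Empty;
        return false;
    }
}

// Complete sequence: space + U+00B7.
bool completeOk = TryDecodeStrict(new byte[] { 0x20, 0xC2, 0xB7 }, out string complete);

// Truncated sequence: space + lead byte only.
bool truncatedOk = TryDecodeStrict(new byte[] { 0x20, 0xC2 }, out string truncatedText);

Console.WriteLine(completeOk);  // True
Console.WriteLine(truncatedOk); // False
```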
Feature area
Globalization
Affected APIs
System.IO.StreamReader