-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add multi mode encoding #676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds a private OptimizedLatin1DataSegment implementing mixed-mode segmentation and integrates fast-path checks in GenerateQrCode/CreateDataSegment to return it when input fits Latin‑1 (no UTF‑8/BOM/ECI). Adds tests and an approved ASCII snapshot for mixed-mode encoding. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Generator as QRCodeGenerator
participant Selector as CreateDataSegment
participant Optimized as OptimizedLatin1DataSegment
participant Generic as Generic DataSegment
Note over Selector,Optimized: fast-path selection when Latin‑1 encodable\nand no UTF‑8/BOM/ECI
Caller->>Generator: GenerateQrCode(plainText, ecc, forceUtf8, utf8BOM, eciMode, ...)
Generator->>Selector: CreateDataSegment(plainText, forceUtf8, utf8BOM, eciMode)
alt Not forcing UTF-8 && no BOM && ECI Default && CanEncode(plainText)
Selector->>Optimized: new OptimizedLatin1DataSegment(plainText)
Optimized-->>Generator: segment (optimized)
else
Selector->>Generic: determine encoding & create DataSegment
Generic-->>Generator: segment (fallback)
end
Generator->>Generator: segment.GetBitLength() / segment.WriteTo(bitArray)
Generator-->>Caller: QRCode (modules)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (5)
53-95
: Initial mode selection is correct; minor clarity tweak optionalLogic aligns with Annex J.2. For readability, consider using
!AlphanumericEncoder.CanEncode(nextChar)
instead of!CanEncodeNonDigit
in Line 71 to directly express “Byte-exclusive” check.- if (nextPos < text.Length && !AlphanumericEncoder.CanEncodeNonDigit(text[nextPos])) + if (nextPos < text.Length && !AlphanumericEncoder.CanEncode(text[nextPos])) return EncodingMode.Byte;
97-129
: Thresholds: consider centralizing and (optionally) adapting per version groupHard-coded minima (6/11/13) are valid but represent the lowest bracket in Annex J.2. Two improvements:
- Centralize these as named constants with a short comment referencing Annex J.2 brackets.
- Future enhancement: choose thresholds based on the eventual version group (1–9, 10–26, 27–40) after DetermineVersion, or use DP for truly optimal segmentation.
Also applies to: 131-162, 164-190
101-106
: Remove unused locals and inline declarations to satisfy formatter
var c = text[pos];
is unused in these loops, and a few variables can be inlined. This also resolves the IDE0018 “declaration can be inlined” pipeline failure.- while (pos < text.Length) - { - var c = text[pos]; + while (pos < text.Length) + { // Rule b.3 ... var numericCount = CountConsecutive(text, pos, IsNumeric); if (numericCount >= 6) { nextMode = EncodingMode.Numeric; return pos; } // Rule b.2 ... var alphanumericCount = CountConsecutive(text, pos, IsAlphanumeric); if (alphanumericCount >= 11) { nextMode = EncodingMode.Alphanumeric; return pos; } pos++; }Repeat the same removal of
var c = text[pos];
in ProcessAlphanumericMode and ProcessNumericMode.
Also consider inlining:- var segmentLength = segmentEnd - startPos; - var segmentData = mode switch + var segmentData = mode switch { - EncodingMode.Byte => PlainTextToBinaryByte(text.Substring(startPos, segmentLength), EciMode.Iso8859_1, false, false), - EncodingMode.Alphanumeric => AlphanumericEncoder.GetBitArray(text.Substring(startPos, segmentLength)), - EncodingMode.Numeric => PlainTextToBinaryNumeric(text.Substring(startPos, segmentLength)), + EncodingMode.Byte => PlainTextToBinaryByte(text.Substring(startPos, segmentEnd - startPos), EciMode.Iso8859_1, false, false), + EncodingMode.Alphanumeric => AlphanumericEncoder.GetBitArray(text.Substring(startPos, segmentEnd - startPos)), + EncodingMode.Numeric => PlainTextToBinaryNumeric(text.Substring(startPos, segmentEnd - startPos)), _ => throw new InvalidOperationException("Unsupported encoding mode") }; - var segment = new DataSegment(mode, segmentLength, segmentData, EciMode.Default); + var segment = new DataSegment(mode, segmentEnd - startPos, segmentData, EciMode.Default);Also applies to: 115-121, 137-147, 170-182
34-41
: Optional perf: avoid Substring allocationsNot urgent, but consider adding span-based overloads (e.g.,
PlainTextToBinaryNumeric(ReadOnlySpan<char>)
,AlphanumericEncoder.GetBitArray(ReadOnlySpan<char>)
) to eliminateSubstring
allocations on long inputs.Also applies to: 53-95, 97-129, 131-162, 164-206
25-33
: Formatting: run dotnet format to clear IDE0018 warningsThe CI “Format” job flagged IDE0018. After the minor inlining above, please run
dotnet format
to keep the pipeline green.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
QRCoder/QRCodeGenerator.cs
(1 hunks)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs
(1 hunks)QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
(1 hunks)QRCoderTests/QRGeneratorTests.cs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (1)
QRCoder/QRCodeGenerator.cs (1)
IsInRange
(766-767)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
QRCoder/QRCodeGenerator.cs (4)
QRCodeGenerator
(19-21)DataSegment
(129-135)EncodingMode
(746-761)IsInRange
(766-767)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (4)
QRCodeGenerator
(3-82)AlphanumericEncoder
(8-81)CanEncode
(46-46)CanEncodeNonDigit
(41-41)
🪛 GitHub Actions: Format (Pull Request)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
[error] 25-25: IDE0018: Variable declaration can be inlined. 'dotnet format' reported an issue at this location.
🪛 GitHub Check: format
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
[failure] 25-25:
Variable declaration can be inlined
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test .NET 5.0 Windows
🔇 Additional comments (4)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (1)
43-47
: New CanEncode helper looks correctMatches QR alphanumeric set and complements CanEncodeNonDigit. No issues.
QRCoderTests/QRGeneratorTests.cs (1)
479-479
: Expected bitstring updated for optimized segmentationChange aligns with new multi-mode path. Looks good.
QRCoder/QRCodeGenerator.cs (1)
115-118
: Optimization gating is soundGood guard conditions (Default ECI, no UTF‑8/BOM, ISO‑8859‑1). Falls back cleanly. No functional risks spotted.
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (1)
37-43
: Byte segment ECI choice is correct for ISO‑8859‑1Encoding with ISO‑8859‑1 and using ECI Default in DataSegment is standard-compliant (default Byte mode). LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (1)
43-47
: Public CanEncode(char) is a good addition; consider simplifying membership checksCurrent check does a range + array scan. Since the dict already contains all valid chars, you can use it directly and make NonDigit reuse it.
Example:
- public static bool CanEncode(char c) => IsInRange(c, '0', '9') || IsInRange(c, 'A', 'Z') || Array.IndexOf(_alphanumEncTable, c) >= 0; + public static bool CanEncode(char c) => _alphanumEncDict.ContainsKey(c);And optionally:
- public static bool CanEncodeNonDigit(char c) => IsInRange(c, 'A', 'Z') || Array.IndexOf(_alphanumEncTable, c) >= 0; + public static bool CanEncodeNonDigit(char c) => !IsInRange(c, '0', '9') && _alphanumEncDict.ContainsKey(c);Also, in CreateAlphanumEncDict, prefer iterating alphanumEncTable.Length instead of _alphanumEncTable.Length to avoid hidden coupling. [Based on learnings]
QRCoder/QRCodeGenerator.cs (1)
115-118
: Guarded switch to optimized segmentation is sensible; add safety to avoid regressionsGating on Default ECI/no BOM/ISO is good. However, fixed J.2 thresholds in the optimizer can, in rare cases (esp. higher versions), add more overhead than single‑mode. Consider a fallback compare to ensure the optimized path never produces a larger version/bitlength than the legacy path.
Minimal pattern:
- var segment = (!forceUtf8 && !utf8BOM && eciMode == EciMode.Default && IsValidISO(plainText)) - ? CreateOptimizedDataSegment(plainText) - : CreateDataSegment(plainText, forceUtf8, utf8BOM, eciMode); + DataSegment segment; + if (!forceUtf8 && !utf8BOM && eciMode == EciMode.Default && IsValidISO(plainText)) + { + var opt = CreateOptimizedDataSegment(plainText); + var leg = CreateDataSegment(plainText, forceUtf8, utf8BOM, eciMode); + var vOpt = DetermineVersion(opt, eccLevel, requestedVersion); + var vLeg = DetermineVersion(leg, eccLevel, requestedVersion); + // Prefer smaller version; if equal, prefer fewer bits + segment = vOpt < vLeg + ? opt + : vOpt > vLeg + ? leg + : (opt.ToBitArray(vOpt).Length <= leg.ToBitArray(vLeg).Length ? opt : leg); + } + else + { + segment = CreateDataSegment(plainText, forceUtf8, utf8BOM, eciMode); + }Please verify via tests/bench that the chosen path never increases required version compared to legacy.
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (3)
13-21
: Doc claim vs implementation: thresholds aren’t version‑awareYou cite ISO/IEC 18004:2015 Annex J.2, but the rules here use single (lower‑bound) thresholds (e.g., 6/11/13) instead of version‑grouped thresholds [1–9],[10–26],[27–40]. That’s valid and works, but it’s a heuristic, not a full J.2 implementation. Either:
- Clarify in the XML doc that heuristics use the lower‑bound cutoffs, or
- Make thresholds depend on the eventual version group (two‑pass or compare‑and‑choose).
55-97
: Initial mode rules: consider parameterizing thresholdsCurrent constants (Byte fallback <4, Alnum preference <7, Byte fallback for alnum <6) are the small‑version cutoffs. Exposing them as named constants (with comments for [1–9],[10–26],[27–40]) makes future tuning easier and avoids magic numbers.
101-131
: Remove unused locals and annotate thresholds
var c = text[pos];
isn’t used in these loops. Drop it. Also, extract 6/11/13 into named constants for readability and future tuning.Diff sketch:
- while (pos < text.Length) - { - var c = text[pos]; + while (pos < text.Length) + {Also applies to: 136-164, 168-192
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
QRCoder/QRCodeGenerator.cs
(1 hunks)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs
(1 hunks)QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
(1 hunks)QRCoderTests/QRGeneratorTests.cs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
QRCoder/QRCodeGenerator.cs (4)
QRCodeGenerator
(19-21)DataSegment
(129-135)EncodingMode
(746-761)IsInRange
(766-767)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (4)
QRCodeGenerator
(3-82)AlphanumericEncoder
(8-81)CanEncode
(46-46)CanEncodeNonDigit
(41-41)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (1)
QRCoder/QRCodeGenerator.cs (1)
IsInRange
(766-767)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test .NET 6.0 Windows
- GitHub Check: Test .NET 5.0 Windows
- GitHub Check: Test .NET Core 3.1
- GitHub Check: Test .NET Core 2.1
- GitHub Check: additional-tests
🔇 Additional comments (3)
QRCoderTests/QRGeneratorTests.cs (1)
479-479
: Updated expected matrix for multi‑mode path looks fineChange aligns with new segmentation; nothing else stands out here.
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
39-45
: Byte segment encoding choice is correct for the gated pathFor ISO‑valid input with ECI Default, forcing ISO‑8859‑1 bytes and ECI Default is correct. No change requested.
If you later allow non‑Default ECI into the optimized path, ensure Byte segments propagate that ECI consistently.
195-208
: Local helpers read well; naming matches usageCountConsecutive/IsNumeric/IsAlphanumeric are clear. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
36-44
: UseEciMode.Default
for consistency.Line 39 passes
EciMode.Iso8859_1
toPlainTextToBinaryByte
, while line 44 usesEciMode.Default
in theDataSegment
constructor. SinceCreateOptimizedDataSegment
is only invoked wheneciMode == EciMode.Default
(per line 116 in QRCodeGenerator.cs), both should useEciMode.Default
for consistency.Functionally, this doesn't affect correctness because the
IsValidISO
check inPlainTextToBinaryByte
(line 1110 in QRCodeGenerator.cs) will force ISO-8859-1 encoding regardless. However, usingDefault
would make the intent clearer.Apply this diff:
var segmentData = mode switch { - EncodingMode.Byte => PlainTextToBinaryByte(text, startPos, segmentLength, EciMode.Iso8859_1, false, false), + EncodingMode.Byte => PlainTextToBinaryByte(text, startPos, segmentLength, EciMode.Default, false, false), EncodingMode.Alphanumeric => AlphanumericEncoder.GetBitArray(text, startPos, segmentLength), EncodingMode.Numeric => PlainTextToBinaryNumeric(text, startPos, segmentLength), _ => throw new InvalidOperationException("Unsupported encoding mode") }; var segment = new DataSegment(mode, segmentLength, segmentData, EciMode.Default);
67-156
: Document the use of conservative thresholds.The implementation uses the smallest threshold values from the ISO/IEC 18004:2015 Annex J.2 ranges (e.g., 4 instead of [4,4,5] on line 70, 6 instead of [6,8,9] on line 111, etc.). This is a valid design choice that ensures correctness across all QR code versions while simplifying the implementation.
Consider adding a comment explaining this trade-off, such as:
// Note: This implementation uses the smallest (most conservative) threshold values from // the ISO standard ranges. While this works correctly for all versions, larger versions // could potentially achieve better optimization with version-specific thresholds.QRCoder/QRCodeGenerator.cs (1)
1105-1114
: Consider checkingIsValidISO
on the substring.Line 1110 calls
IsValidISO(plainText)
which checks the entire string rather than the substring being encoded ([offset, offset+count)
).Currently, this is not a bug because:
- The primary caller (
CreateOptimizedDataSegment
) only invokes this method when the entire input is known to be valid ISO-8859-1- If the entire string is valid ISO, all substrings are also valid ISO
However, for general-purpose use, this could lead to suboptimal encoding: if a substring is valid ISO but the full string is not, the code would skip ISO-8859-1 encoding even though the substring could benefit from it.
Consider adding a helper that checks a substring:
private static bool IsValidISO(string input, int offset, int count) { for (int i = offset; i < offset + count; i++) if (input[i] > 0xFF) return false; return true; }And updating line 1110:
- if (IsValidISO(plainText) && !forceUtf8) + if (IsValidISO(plainText, offset, count) && !forceUtf8)This would make the API more robust for future use cases.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
QRCoder/QRCodeGenerator.cs
(4 hunks)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs
(2 hunks)QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- QRCoder/QRCodeGenerator/AlphanumericEncoder.cs
🧰 Additional context used
🧬 Code graph analysis (2)
QRCoder/QRCodeGenerator.cs (1)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (2)
BitArray
(55-56)BitArray
(67-90)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
QRCoder/QRCodeGenerator.cs (4)
QRCodeGenerator
(19-21)DataSegment
(129-135)EncodingMode
(746-761)IsInRange
(766-767)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (4)
QRCodeGenerator
(3-92)AlphanumericEncoder
(8-91)CanEncode
(46-46)CanEncodeNonDigit
(41-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test .NET Core 3.1
- GitHub Check: Test .NET Framework 4.6.2
- GitHub Check: Test .NET Core 2.1
- GitHub Check: Test .NET 6.0 Windows
- GitHub Check: Test .NET 5.0 Windows
- GitHub Check: additional-tests
🔇 Additional comments (5)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
195-207
: LGTM!The helper functions are correctly implemented:
CountConsecutive
efficiently counts matching charactersIsNumeric
andIsAlphanumeric
properly delegate to utility methods
7-21
: Well-structured implementation.The recursive approach using local functions is clean and clearly follows the ISO/IEC 18004:2015 Annex J.2 algorithm. The documentation properly notes the Kanji mode limitation.
QRCoder/QRCodeGenerator.cs (3)
115-118
: Well-designed optimization condition.The conditional logic correctly restricts the optimized segmentation path to scenarios where:
- UTF-8 is not forced
- No byte-order mark is required
- Default ECI mode is used
- The input is ISO-8859-1 compatible
This ensures the optimization is only applied when safe and beneficial.
1012-1064
: LGTM!The range-aware numeric encoding implementation is correct:
- The default overload properly delegates to the parameterized version
- All bit array size calculations correctly use
count
- All substring/span extractions properly use
offset + i
- Remainder handling (for 1 or 2 remaining digits) correctly computes indices
The changes enable efficient multi-segment encoding while maintaining correctness.
1148-1156
: LGTM!The span-based and non-span encoding paths correctly use the
offset
andcount
parameters:
GetByteCount
andGetBytes
operate on the specified substring- Buffer allocation and slicing are properly sized
- The fallback
Substring
call uses the correct range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
QRCoder/QRCodeGenerator.cs (1)
1105-1115
: Minor efficiency note: IsValidISO checks full string.Line 1110 checks
IsValidISO(plainText)
for the entire string rather than just the substring being encoded. This is safe because the optimization path (line 116) only activates when the full string is ISO-valid, guaranteeing all substrings are also valid. Additionally, when called from the optimized path witheciMode == EciMode.Iso8859_1
, the encoding is forced via the switch statement (line 1118-1123), making the check redundant but harmless.Consider checking only the substring to avoid unnecessary full-string scans:
- if (IsValidISO(plainText) && !forceUtf8) + if (!forceUtf8 && IsValidISOSubstring(plainText, offset, count))Where
IsValidISOSubstring
checks only the specified range:private static bool IsValidISOSubstring(string input, int offset, int count) { for (int i = offset; i < offset + count; i++) { if (input[i] > 0xFF) return false; } return true; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
QRCoder/QRCodeGenerator.cs
(4 hunks)QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
QRCoder/QRCodeGenerator.cs (3)
QRCodeGenerator
(19-21)EncodingMode
(746-761)IsInRange
(766-767)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (4)
QRCodeGenerator
(3-92)AlphanumericEncoder
(8-91)CanEncode
(46-46)CanEncodeNonDigit
(41-41)
QRCoder/QRCodeGenerator.cs (1)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (2)
BitArray
(55-56)BitArray
(67-90)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test .NET 6.0 Windows
- GitHub Check: Test .NET 5.0 Windows
- GitHub Check: Test .NET Core 3.1
- GitHub Check: Test .NET Core 2.1
- GitHub Check: additional-tests
🔇 Additional comments (6)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (3)
7-51
: Well-structured recursive segmentation.The recursive approach correctly handles mode transitions and prevents infinite recursion by ensuring forward progress (segmentEnd > startPos). Empty string handling is appropriate.
133-192
: LGTM!Both
ProcessAlphanumericMode
andProcessNumericMode
correctly implement their respective ISO J.2 rules. The use ofSelectInitialMode
at line 182 for intelligent next-mode selection is a good design choice.
194-208
: LGTM!Helper functions are simple and correct. The predicate-based
CountConsecutive
design provides good flexibility for different character tests.QRCoder/QRCodeGenerator.cs (3)
115-118
: LGTM!The optimization path selection is appropriately conservative, only activating when all safety conditions are met. The fallback to
CreateDataSegment
preserves backward compatibility.
1012-1064
: LGTM!The offset/count-based numeric encoding correctly handles all cases (groups of 3, remainder of 2, remainder of 1). Span-based parsing on supported platforms provides good performance.
1144-1181
: LGTM!The span-based encoding path efficiently uses stackalloc for small buffers and ArrayPool for larger ones, with proper cleanup. BOM handling is correct, and the offset/count parameters are applied consistently throughout.
@gfoidl Should this be a minor version bump? (to version 1.8.0 vs 1.7.1) I assume so, just in case it has an unintended side-effect in an edge case, but what do you think? It does not change the public API and meets QR code spec, and only is applied in scenarios where it is "known" to be compatible, so it could be simply a revision. |
Note that the optimization algorithm is tailored for QR code versions 1-9. For versions 10-26 or 27-40 there would ideally be different breakpoints at which the algorithm should switch. This means for versions 10-40, it may be possible to write a carefully-constructed QR code which would take up more space than unoptimized. The content would have to be tailored to trigger a mode change exactly at specific points while also maxing out the capacity for the requested version, such as this sample code, which throws a data too large exception: var gen = new QRCodeGenerator();
var str = "111111a111111a111111a111111a111111a111111";
var s = new string('a', 119 - str.Length) + str;
_ = gen.CreateQrCode(s, ECCLevel.H, requestedVersion: 10); The existing code is written to encode to a bit array before selecting the version. And the selected version changes the size of the bit array, so it would be a problem to do it in the reverse order. For a truly ideal algorithm, we would need to split the calculation of the ending length from the encoding to a bit array, so it could be calculated multiple times while determining the ideal version. |
We could change the design so the most conservative breakpoints are used, such that it would not always be optimized for versions 1-26 but would never cause the issue described above in rare situations for versions 10-40. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
QRCoderTests/QRGeneratorTests.cs (2)
479-480
: Reduce brittleness: avoid giant literal bitstring; use hash-based assertion like other tests.Current assertion embeds a very long literal that’s noisy and fragile. Prefer the established pattern used above (SHA1 → Base64, then ShouldBe on the trimmed hash). Optionally also assert qrData.Version.
482-499
: Harden mixed-mode test for maintainability
- Retain Version.ShouldBe(4).
- Replace the inline bit-string assertion with a hash-based check to reduce noise.
- Normalize line endings in the approved file (e.g. pass endOfLine:"\n" to GetGraphic or scrub CRLFs) to prevent spurious diffs.
- Optionally drop one of the two payload checks (hash vs. approval) to avoid redundancy.
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (2)
75-94
: Add basic argument validation for substring overloads.Null, index, and count aren’t validated; misuse would surface as IndexOutOfRange/KeyNotFound later. Add cheap guards to fail fast with clear messages.
Apply within these methods:
public static BitArray GetBitArray(string plainText, int index, int count) { - var codeText = new BitArray(GetBitLength(count)); + if (plainText is null) throw new ArgumentNullException(nameof(plainText)); + if (index < 0) throw new ArgumentOutOfRangeException(nameof(index)); + if (count < 0) throw new ArgumentOutOfRangeException(nameof(count)); + if (index > plainText.Length - count) throw new ArgumentOutOfRangeException(nameof(count)); + var codeText = new BitArray(GetBitLength(count)); WriteToBitArray(plainText, index, count, codeText, 0); return codeText; } public static int WriteToBitArray(string plainText, BitArray codeText, int codeIndex) { - return WriteToBitArray(plainText, 0, plainText.Length, codeText, codeIndex); + if (plainText is null) throw new ArgumentNullException(nameof(plainText)); + return WriteToBitArray(plainText, 0, plainText.Length, codeText, codeIndex); }Also applies to: 105-108
110-141
: Guard substring writer preconditions.Validate inputs up front; this prevents subtle range bugs and produces clearer exceptions.
public static int WriteToBitArray(string plainText, int index, int count, BitArray codeText, int codeIndex) { + if (plainText is null) throw new ArgumentNullException(nameof(plainText)); + if (codeText is null) throw new ArgumentNullException(nameof(codeText)); + if (index < 0) throw new ArgumentOutOfRangeException(nameof(index)); + if (count < 0) throw new ArgumentOutOfRangeException(nameof(count)); + if (index > plainText.Length - count) throw new ArgumentOutOfRangeException(nameof(count)); + if (codeIndex < 0) throw new ArgumentOutOfRangeException(nameof(codeIndex)); // Process each pair of characters. while (count >= 2) { var dec = _alphanumEncDict[plainText[index++]] * 45 + _alphanumEncDict[plainText[index++]]; codeIndex = DecToBin(dec, 11, codeText, codeIndex); count -= 2; } if (count > 0) { codeIndex = DecToBin(_alphanumEncDict[plainText[index]], 6, codeText, codeIndex); } return codeIndex; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
QRCoder/QRCodeGenerator.cs
(1 hunks)QRCoder/QRCodeGenerator/AlphanumericEncoder.cs
(3 hunks)QRCoderTests/QRGeneratorTests.cs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- QRCoder/QRCodeGenerator.cs
🧰 Additional context used
🧬 Code graph analysis (2)
QRCoderTests/QRGeneratorTests.cs (4)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (1)
QRCodeGenerator
(3-142)QRCoder/QRCodeGenerator.cs (3)
QRCodeGenerator
(14-1031)QRCodeGenerator
(19-21)ECCLevel
(283-291)QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (1)
QRCodeGenerator
(5-215)QRCoder/ASCIIQRCode.cs (4)
AsciiQRCode
(8-153)AsciiQRCode
(13-13)AsciiQRCode
(19-19)GetGraphic
(31-36)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (2)
QRCoder/QRCodeGenerator.cs (2)
IsInRange
(772-773)BitArray
(999-1008)QRCoder/QRCodeGenerator/AlphanumericDataSegment.cs (1)
GetBitLength
(29-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test .NET 6.0 Windows
- GitHub Check: Test .NET Core 2.1
- GitHub Check: Test .NET 5.0 Windows
- GitHub Check: Test .NET Framework 4.6.2
- GitHub Check: Test .NET Core 3.1
- GitHub Check: additional-tests
🔇 Additional comments (2)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (2)
43-47
: Additive API looks good.
CanEncode
is correct and consistent withCanEncodeNonDigit
and_alphanumEncTable
.
55-66
: Bit-length helpers are correct and optimized.Delegation plus
(n/2)*11 + (n&1)*6
is correct and efficient.
Fixed; these notes are not applicable anymore. The revised code adapts to the version being used, recalculating the bit length for each version breakpoint. |
n/a |
Increased memory usage is fixed |
private static DataSegment CreateDataSegment(string plainText, bool forceUtf8, bool utf8BOM, EciMode eciMode) | ||
{ | ||
// Fast path: Use optimized Latin1 segment if conditions allow | ||
if (!forceUtf8 && !utf8BOM && eciMode == EciMode.Default && OptimizedLatin1DataSegment.CanEncode(plainText)) | ||
return new OptimizedLatin1DataSegment(plainText); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code should be effective in most usage scenarios, but does not change any semantics (whether correct or not) for strings that contain characters outside of ISO 8859-1, or when a specific ECI or UTF mode has been specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
QRCoderTests/QRGeneratorTests.cs (1)
795-805
: Consider renaming the test method for clarity.The test name
multi_mode_does_not_compress_longer_than_original
contains contradictory terminology (compression implies reduction, not expansion). Consider alternatives like:
multi_mode_does_not_expand_encoding
multi_mode_does_not_increase_encoded_length
multi_mode_encoding_is_not_worse_than_single_mode
The test itself is well-designed and appropriately validates the version 10 boundary case mentioned in the PR context, where the optimization algorithm transitions between breakpoint ranges.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
QRCoder/QRCodeGenerator.cs
(1 hunks)QRCoderTests/QRGeneratorTests.cs
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- QRCoder/QRCodeGenerator.cs
🧰 Additional context used
🧬 Code graph analysis (1)
QRCoderTests/QRGeneratorTests.cs (4)
QRCoder/QRCodeGenerator.cs (3)
QRCodeGenerator
(14-1033)QRCodeGenerator
(19-21)ECCLevel
(285-293)QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (1)
QRCodeGenerator
(5-337)QRCoder/QRCodeGenerator/CapacityTables.cs (1)
QRCodeGenerator
(3-444)QRCoder/ASCIIQRCode.cs (4)
AsciiQRCode
(8-153)AsciiQRCode
(13-13)AsciiQRCode
(19-19)GetGraphic
(31-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test .NET Core 2.1
- GitHub Check: Test .NET 5.0 Windows
- GitHub Check: Test .NET 6.0 Windows
- GitHub Check: Test .NET Framework 4.6.2
- GitHub Check: additional-tests
- GitHub Check: Test .NET Core 3.1
🔇 Additional comments (1)
QRCoderTests/QRGeneratorTests.cs (1)
482-500
: LGTM! Well-designed test for multi-mode encoding.The test effectively validates that mixed-mode encoding reduces the QR version from 5 to 4 by switching between alphanumeric, byte, and numeric modes. The ASCII snapshot validation provides good visual verification of the encoded output.
@gfoidl Ready for review. On a side note, due to recent PRs, memory usage in the 'before' case has been reduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
169-180
: Incorrect condition for ISO rule a.4.Line 178 checks
!IsAlphanumericNonDigit(text[nextPos])
, which is true for both numeric and byte-only characters. However, ISO/IEC 18004:2015 Annex J.2 rule a.4 states "followed by data from the exclusive subset of the Byte character set," meaning characters that cannot be encoded in Alphanumeric mode (byte-only). The condition will incorrectly select Byte mode when a numeric sequence of less than 4 digits is followed by another numeric character.Apply this diff to fix the condition:
if (numericCount < threshold) { var nextPos = startPos + numericCount; - if (nextPos < text.Length && !IsAlphanumericNonDigit(text[nextPos])) + if (nextPos < text.Length && !IsAlphanumeric(text[nextPos])) return EncodingMode.Byte; }
226-232
: Incorrect character counting for ISO rule b.2.Line 227 counts all alphanumeric characters (including numeric digits) using
IsAlphanumeric
. However, ISO/IEC 18004:2015 Annex J.2 rule b.2 specifies "characters from the exclusive subset of the Alphanumeric character set," meaning alphanumeric-only characters excluding numeric digits (A-Z and special characters). This will incorrectly include numeric runs in the alphanumeric count, potentially causing premature mode switches.Apply this diff to count only alphanumeric-exclusive characters:
// Rule b.2: If a sequence of at least [11,15,16] character from the exclusive subset of the Alphanumeric character set occurs before more data from the exclusive subset of the Byte character set, switch to Alphanumeric mode - var alphanumericCount = CountConsecutive(text, pos, IsAlphanumeric); + var alphanumericCount = CountConsecutive(text, pos, IsAlphanumericNonDigit); if (alphanumericCount >= alphaThreshold)
🧹 Nitpick comments (2)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
99-136
: LGTM!The
WriteTo
method correctly mirrors the structure ofGetBitLength
, ensuring consistency between bit length calculation and actual encoding. The empty string is handled gracefully.Optional refactor: Consider extracting shared segmentation logic.
The mode selection and segmentation logic is duplicated between
GetBitLength
andWriteTo
. While this is a known pattern in the codebase, extracting the shared logic into a helper method (e.g.,SegmentText
) that yields(mode, startPos, endPos)
tuples could reduce duplication and improve maintainability.
278-302
: Fix minor typo in comment.The
ProcessNumericMode
method correctly delegates toSelectInitialMode
for intelligent mode selection when exiting numeric mode, which is a good approach.Apply this diff to fix the typo:
- // Replaced by using the more intelligent intial mode logic: + // Replaced by using the more intelligent initial mode logic: if (!IsNumeric(c))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (2)
QRCoder/QRCodeGenerator/AlphanumericEncoder.cs (5)
QRCodeGenerator
(3-99)GetBitLength
(48-51)BitArray
(60-65)AlphanumericEncoder
(8-98)CanEncodeNonDigit
(41-41)QRCoder/QRCodeGenerator.cs (9)
QRCodeGenerator
(14-1032)QRCodeGenerator
(19-21)DataSegment
(126-142)IsValidISO
(977-989)EncodingMode
(753-768)GetCountIndicatorLength
(896-956)BitArray
(1000-1009)DecToBin
(878-888)IsInRange
(773-774)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test .NET 6.0 Windows
- GitHub Check: Test .NET Core 3.1
- GitHub Check: Test .NET 5.0 Windows
- GitHub Check: Test .NET Core 2.1
- GitHub Check: Test .NET Framework 4.6.2
- GitHub Check: additional-tests
- GitHub Check: format
🔇 Additional comments (9)
QRCoder/QRCodeGenerator/OptimizedDataSegment.cs (9)
1-36
: LGTM!The class structure,
CanEncode
validation, and constructor are correct. TheEncodingMode
property returningByte
is appropriate as a fallback for exception messages.
42-79
: LGTM!The
GetBitLength
method correctly iterates through the text, selecting modes and accumulating bit lengths. The empty string is handled gracefully, and the loop structure ensures the entire text is processed.
84-90
: LGTM!The
GetByteBitLength
helper correctly calculates the bit length for a byte mode segment, accounting for the mode indicator, count indicator, and ISO-8859-1 encoding (8 bits per character).
141-157
: LGTM!The
WriteByteSegment
helper correctly encodes a byte mode segment by writing the mode indicator, count indicator, and ISO-8859-1 encoded data to theBitArray
.
161-168
: LGTM!The implementation of ISO rules a.1 and a.3 is correct. Rule a.1 properly detects byte-only characters, and rule a.3 correctly handles alphanumeric sequences followed by non-alphanumeric characters.
Also applies to: 193-204
210-225
: LGTM!The implementation of ISO rule b.3 is correct, properly detecting numeric sequences that warrant a mode switch. The overall structure of
ProcessByteMode
is sound.Also applies to: 233-240
244-274
: LGTM!The
ProcessAlphanumericMode
method correctly implements ISO rules c.2 and c.3, detecting byte-only characters and numeric sequences that warrant mode switches.
309-327
: LGTM!The
GetBreakpoint
helper correctly implements version-dependent thresholds per ISO/IEC 18004:2015 Annex J.2, and theCountConsecutive
helper is a clean utility for counting matching characters.
329-336
: LGTM!The character classification helpers are correctly implemented.
IsNumeric
checks the digit range,IsAlphanumeric
correctly combines numeric and alphanumeric-non-digit checks, andIsAlphanumericNonDigit
appropriately delegates to the encoder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it reads really nice, and thanks for all the comments that make it easy to follow along.
Should this be a minor version bump?
Yes. It brings in an enhancement, so minor should be increased here (depending on when you plan to release 1,8.0).
Or put in other words. That change shouldn't go into 1.7.x.
This PR includes a basic algorithm for producing multi-mode QR codes. It is only activated when a QR code is created with default settings (apart from ECC level) and does not contain characters outside the Latin-1 character set.
The benchmarks below for CreateQRCodeMultiMode demonstrate if the 'after' code produced the same version (size) QR code. So in that example there is no net benefit to the user.
The sample below shows how fewer bits encoded may produce a smaller QR code, which would both encode and render faster.
This PR does not include any code to automatically increase the ECC level if there are extra bits available to do so.
Before
After
Sample Before - 664 bits encoded - version 5
Sample After - 526 bits encoded - version 4
Summary by CodeRabbit