

@larsbarring (Contributor) commented Oct 27, 2025

Closes issue #128.

  1. Match UTF-8 superscript numbers explicitly rather than with a byte range (line 114)

  2. Update the letter pattern to exclude superscript characters, using new UTF-8 patterns that exclude superscripts (lines 126-128)

  • The new utf8_2bytes_no_super letter pattern excludes:
    • \xc2\xb1 (± plus-minus; not a superscript, but adjacent to ² and ³)
    • \xc2\xb2 (² superscript 2)
    • \xc2\xb3 (³ superscript 3)
    • \xc2\xb6 (¶ pilcrow)
    • \xc2\xb7 (· middle dot)
    • \xc2\xb8 (¸ cedilla)
    • \xc2\xb9 (¹ superscript 1)
  • The new utf8_3bytes_no_super letter pattern excludes:
    • \xe2\x81\xb0 through \xe2\x81\xb9 (superscript digits ⁰ and ⁴-⁹; the range also covers U+2071 ⁱ and the unassigned U+2072-U+2073)
    • \xe2\x81\xba and \xe2\x81\xbb (superscript + and -)
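The byte values in the lists above can be double-checked mechanically. The following is a minimal C sketch of a UTF-8 encoder; `utf8_encode` is a hypothetical helper, not part of the lexer, and it assumes only code points below U+10000 need to be handled (true for every character listed here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a Unicode code point below U+10000 as UTF-8.
 * Writes at most 3 bytes to buf and returns the number written, or 0 if
 * cp is a surrogate (U+D800..U+DFFF) or outside the range handled here. */
size_t utf8_encode(uint32_t cp, unsigned char buf[3])
{
    assert(buf != NULL);
    if (cp < 0x80) {                     /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)    /* surrogates cannot be encoded */
        return 0;
    if (cp < 0x10000) {                  /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                            /* 4-byte sequences not needed here */
}
```

For example, U+00B6 (¶) encodes to C2 B6, U+00B8 (¸) to C2 B8, and U+2079 (⁹) to E2 81 B9.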

This ensures that UTF-8 superscript characters are not captured as letters by the {id} rule and can instead be properly matched by the {utf8_exponent} rule.
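The kind of check the exponent side performs can be sketched as a plain C function rather than a flex pattern. `utf8_superscript_len` is a hypothetical name, and the sketch assumes the exponent characters are exactly ¹ ² ³, ⁰, ⁴-⁹, ⁺ and ⁻; it lists the accepted sequences explicitly (item 1), so U+2071 ⁱ is not accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if the n bytes at s start with a UTF-8 superscript
 * digit or sign, return its length in bytes (2 or 3); otherwise return 0.
 * Accepted sequences (assumed set, listed explicitly):
 *   C2 B9, C2 B2, C2 B3       -> superscript 1, 2, 3 (Latin-1 Supplement)
 *   E2 81 B0                  -> superscript 0
 *   E2 81 B4 .. E2 81 B9      -> superscript 4 .. 9
 *   E2 81 BA, E2 81 BB        -> superscript plus, minus
 * E2 81 B1 (superscript i) and the unassigned B2, B3 are rejected. */
size_t utf8_superscript_len(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n >= 2 && s[0] == 0xC2 &&
        (s[1] == 0xB9 || s[1] == 0xB2 || s[1] == 0xB3))
        return 2;
    if (n >= 3 && s[0] == 0xE2 && s[1] == 0x81 &&
        (s[2] == 0xB0 || (s[2] >= 0xB4 && s[2] <= 0xBB)))
        return 3;
    return 0;
}
```

For `m²` (bytes 6D C2 B2), the changed letter patterns stop before C2 B2, and at that offset the function returns 2, so the exponent is left for the exponent rule instead of being absorbed into the identifier.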

@larsbarring (Contributor, Author)

A simple bash script and C code for testing are available here, shared by PR #134 (issue #128), PR #135 (issue #129), and PR #136 (issue #132).

1. Match UTF-8 superscript numbers explicitly rather than with a byte range (line 114)

2. Update the letter pattern to exclude superscript characters,
   using UTF-8 patterns that exclude superscripts (lines 126-128)

   The new utf8_2bytes_no_super pattern excludes:
         \xc2\xb1 (± plus-minus; not a superscript, but adjacent to ² and ³)
         \xc2\xb2 (² superscript 2)
         \xc2\xb3 (³ superscript 3)
         \xc2\xb6 (¶ pilcrow)
         \xc2\xb7 (· middle dot)
         \xc2\xb8 (¸ cedilla)
         \xc2\xb9 (¹ superscript 1)
   The new utf8_3bytes_no_super pattern excludes:
         \xe2\x81\xb0 through \xe2\x81\xb9 (superscript digits ⁰ and ⁴-⁹;
         the range also covers U+2071 ⁱ and the unassigned U+2072-U+2073)
         \xe2\x81\xba and \xe2\x81\xbb (superscript + and -)
   This ensures that UTF-8 superscript characters are not captured
   by the {id} rule and can instead be properly matched by the
   {utf8_exponent} rule.
