
Output of non-ASCII chars garbled if they match inverted character class #268

Open
@mbunkus


I have a file containing one non-ASCII character, e.g. the German umlaut "ö". When I match the "ö" directly, the output is fine. However, the output is garbled when the "ö" is matched by an inverted character class.

My use case is searching for files that still use encodings other than UTF-8; for that I use a character class that excludes all "known good" characters. However, the problem also occurs with UTF-8 encoded files.
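For the use case itself (finding files not encoded as UTF-8), a different technique from the inverted character class is to check whether each file's bytes decode as UTF-8 at all. The sketch below does not use ack; the function name `non_utf8_files` is hypothetical. ASCII-only files pass because ASCII is a subset of UTF-8, and a file in another encoding can still slip through if its bytes happen to form valid UTF-8.

```python
from pathlib import Path


def non_utf8_files(root: str) -> list[Path]:
    """Hypothetical helper: return files under root whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Reads the whole file into memory; fine for text files of modest size.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Usage: `for p in non_utf8_files("."): print(p)` lists every file under the current directory that fails strict UTF-8 decoding, such as a Latin-1 file where "ö" is the single byte 0xF6.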

Here's an example (copy & paste from the console):

[0 mbunkus@chai-latte ~] ack ö hallo.txt
Hallöle
[0 mbunkus@chai-latte ~] ack -i '[^a-z]' hallo.txt
Hall[0m�le
[0 mbunkus@chai-latte ~] cat hallo.txt
Hallöle
[0 mbunkus@chai-latte ~] locale
LANG=en_US.UTF-8
LC_CTYPE=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Note that the highlighted part is only the "ö" in the good case and "[0m�" in the bad case. This means the colorization is correct regarding which characters are highlighted and which aren't; only the characters that are output are wrong.
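One plausible cause (an assumption, not confirmed against ack's source): if the pattern is matched against the file's raw bytes, the UTF-8 "ö" is the two bytes 0xC3 0xB6. A literal "ö" in the pattern is also those two bytes and matches them as one unit, but `[^a-z]` matches each byte on its own, so each byte gets its own pair of color codes and the escape sequence ends up between the two halves of the character, leaving invalid UTF-8 for the terminal. The Python sketch below reproduces that mechanism; the `START`/`RESET` codes and the helper names are illustrative assumptions, not ack's actual values or functions.

```python
import re

# "Hallöle" as the raw bytes stored in the UTF-8 file; "ö" is the two bytes 0xC3 0xB6.
raw = "Hallöle".encode("utf-8")

# Assumed highlight codes (black on yellow, then reset); illustrative only.
START = b"\x1b[30;43m"
RESET = b"\x1b[0m"


def highlight(pattern: bytes, data: bytes) -> bytes:
    """Wrap each regex match in color codes, matching on bytes, case-insensitively."""
    return re.sub(pattern, lambda m: START + m.group(0) + RESET, data, flags=re.IGNORECASE)


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Good case: the pattern "ö" is itself two bytes, so both bytes form one match.
good = highlight("ö".encode("utf-8"), raw)

# Bad case: a byte-level [^a-z] matches 0xC3 and 0xB6 separately,
# so escape codes land between the two halves of "ö".
byte_matches = re.findall(rb"[^a-z]", raw, flags=re.IGNORECASE)
bad = highlight(rb"[^a-z]", raw)

# Matching the decoded text instead sees "ö" as a single character.
char_matches = re.findall(r"[^a-z]", raw.decode("utf-8"), flags=re.IGNORECASE)

print(byte_matches)         # [b'\xc3', b'\xb6']
print(is_valid_utf8(good))  # True
print(is_valid_utf8(bad))   # False
print(char_matches)         # ['ö']
```

If this is the cause, matching on decoded characters rather than bytes (as in the last step) would keep multi-byte characters intact in the highlighted output.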

This happens with both the ack 2.04 release and git at 3e498f7.

BTW: I accidentally filed this issue against the wrong ack repo (the old one) because a link to this repo is not easy to find on the home page.
