@furtib furtib commented Sep 26, 2025

Why:
In the current implementation of the skipfiles, every declared file also matches any other file whose name begins with the declared file's name, for example simple.c and simple.cpp. Specifying simple.c with the --file option results in the regex /path/to/simple.c.*

What:
Stopped appending '*' to the end of every regex given through --file or the skipfile.
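
A minimal sketch of the over-matching described above, assuming the skip entry is turned into a regex with fnmatch.translate as in the reviewed code. The helper names are hypothetical and not taken from the project.

```python
import fnmatch
import re


def compile_with_trailing_star(entry):
    """Hypothetical stand-in for the old behavior: always append '*'."""
    return re.compile(fnmatch.translate(entry + '*'))


def compile_exact(entry):
    """Hypothetical stand-in for the behavior proposed in this PR."""
    return re.compile(fnmatch.translate(entry))


old = compile_with_trailing_star('/path/to/simple.c')
new = compile_exact('/path/to/simple.c')

# The old pattern also matches the sibling file simple.cpp.
old_matches_cpp = old.match('/path/to/simple.cpp') is not None
# Without the trailing '*', only the declared file matches.
new_matches_cpp = new.match('/path/to/simple.cpp') is not None
new_matches_c = new.match('/path/to/simple.c') is not None
```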

Notes:
According to the documentation, this change is breaking, but I could see some users declaring directories in their skip file without placing a '*' at the end.

Fixes: #4664

@furtib furtib requested a review from Szelethus September 26, 2025 14:36
@furtib furtib self-assigned this Sep 26, 2025
@furtib furtib added bugfix 🔨 analyzer 📈 Related to the analyze commands (analysis driver) labels Sep 26, 2025
Collaborator

@barnabasdomozi barnabasdomozi left a comment

Good find, I think this should be the default behavior that we don't include an extra * at the end of the regex pattern.

However, as you mentioned, this change could easily break users' skipfiles, so this change should be clearly documented in the upcoming release notes.

@Szelethus
Contributor

However, as you mentioned, this change could easily break users' skipfiles, so this change should be clearly documented in the upcoming release notes.

I'm pretty anxious about this one. How widely is the trailing * known? Wouldn't it be friendly to emit a warning for files that would have been skipped with the trailing * but no longer are?

@furtib furtib added the WARN ⚠️: Backward compatibility breaker! MIND THE GAP! Merging this patch will mess up compatibility with the previous releases! label Oct 1, 2025
@furtib furtib requested a review from barnabasdomozi October 1, 2025 11:28
@furtib
Author

furtib commented Oct 1, 2025

I have added a clause to add the star to entries that are evidently folders. (filenames ending with '/')
Users will use the '/' to point to folders even on Windows.
This relies on the fact that no filename can end with the '/' character.
Filenames can't contain the '/' character on Windows.
On the Linux side of things, I have been unable to create such a file with: touch, vim, vscode, thunar. Also I found this on a forum
@barnabasdomozi, what are your thoughts?
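
A rough sketch of the trailing-slash clause described in this comment, under the assumption that entries are compiled with fnmatch.translate. The function name compile_skip_entry is hypothetical.

```python
import fnmatch
import re


def compile_skip_entry(entry):
    """Hypothetical helper: only entries ending in '/' get a wildcard."""
    if entry.endswith('/'):
        # An entry such as '/workspace/build/' is evidently a directory,
        # so everything below it should be matched as well.
        entry += '*'
    return re.compile(fnmatch.translate(entry))


directory = compile_skip_entry('/workspace/myproject/build/')
single_file = compile_skip_entry('/path/to/simple.c')

dir_matches_child = directory.match('/workspace/myproject/build/a.o') is not None
file_matches_sibling = single_file.match('/path/to/simple.cpp') is not None
```

With this rule, an entry written without the trailing '/' is treated as a plain file name and does not match anything below it.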

@barnabasdomozi
Collaborator

I have added a clause to add the star to entries that are evidently folders. (filenames ending with '/') Users will use the '/' to point to folders even on Windows. With this, I do respect that no filename may end with the '/' character. Filenames can't contain the '/' character on Windows. On the Linux side of things, I have been unable to create such a file with: touch, vim, vscode, thunar. Also I found this on a forum @barnabasdomozi, what are your thoughts?

What if the user specified a directory without a / character at the end?
E.g.

-/workspace/myproject/build

@furtib
Author

furtib commented Oct 2, 2025

I cannot be sure if the path you gave is for a file named build or for a folder, so I do not handle that case.

During a conversation with @HoBoIs, I got the suggestion that, instead of adding a * after a /, we could append an optional /.* to the end of our regex. This would look something like this: (?:\/.*)?. See an explanation for it here

This way, this change is NOT breaking anymore!

This would solve the issue of simple.c and simple.cc while also preserving the workings for a directory path defined like: /workspace/myproject/build

The only consideration to be taken now is: what if the user placed /* at the end?
I suspect that this would not cause an issue because /*/* is a subset of /*
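
The idea can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: the helper names are hypothetical, and the sketch accepts either \Z or \z as the end anchor because the exact output of fnmatch.translate may differ between Python versions.

```python
import fnmatch
import re


def strip_end_anchor(regex):
    """Remove a trailing end-of-input anchor if one is present."""
    for anchor in (r"\Z", r"\z"):
        if regex.endswith(anchor):
            return regex[:-len(anchor)]
    return regex


def compile_skip_entry(entry):
    """Hypothetical helper: match the entry itself or anything below it."""
    body = strip_end_anchor(fnmatch.translate(entry))
    # Optional "/<anything>" suffix, then the end of the input.
    return re.compile(body + r"(?:/.*)?\Z")


file_entry = compile_skip_entry('/path/to/simple.c')
dir_entry = compile_skip_entry('/workspace/myproject/build')
star_entry = compile_skip_entry('/workspace/myproject/build/*')

file_hits_cc = file_entry.match('/path/to/simple.cc') is not None
dir_hits_child = dir_entry.match('/workspace/myproject/build/obj/a.o') is not None
star_hits_child = star_entry.match('/workspace/myproject/build/obj/a.o') is not None
```

An entry that already ends in /* keeps working because the optional group can simply match nothing.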

Currently there is a slight issue with the \Z that fnmatch.translate places at the end of the user-defined regex, but I hope to iron this out.

@furtib furtib removed the WARN ⚠️: Backward compatibility breaker! MIND THE GAP! Merging this patch will mess up compatibility with the previous releases! label Oct 2, 2025
@furtib furtib marked this pull request as draft October 2, 2025 15:34
@furtib furtib marked this pull request as ready for review October 3, 2025 08:23
@furtib
Author

furtib commented Oct 3, 2025

I have fixed the regex; the issue was fnmatch.translate placing an end-of-input anchor (\Z) at the end of its output. I have moved this anchor to the end of the appended regex.

I also added a comment explaining what the regex does, so in the future, it will be easier to decipher.

We have been thinking about the Windows side of things too. Do the paths in compile_commands.json contain '\' or '/' on Windows?

Collaborator

@barnabasdomozi barnabasdomozi left a comment

Great work!
See my comment about a minor issue.

We should also investigate which slash type is used on Windows.

 # https://docs.python.org/3/library/os.path.html#os.path.normpath
 rexpr = re.compile(
-    fnmatch.translate(norm_skip_path + '*'))
+    fnmatch.translate(norm_skip_path)[:-2] + r"(?:/.*)?\Z")
Collaborator

If the implementation of fnmatch.translate() changes, e.g. it no longer puts \Z at the end, then [:-2] breaks the regex. So instead of [:-2], consider using rstrip().

Also, is there a reason we are using \Z in the end of the regex and not $?
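
For context on that question, a small sketch of how the two anchors differ in Python's re module when the input ends with a newline. The variable names are illustrative only.

```python
import re

dollar = re.compile(r"simple\.c$")
end_of_input = re.compile(r"simple\.c\Z")

name_with_newline = "simple.c\n"

# '$' also matches just before a trailing newline.
dollar_hit = dollar.match(name_with_newline) is not None
# '\Z' matches only at the very end of the string.
end_of_input_hit = end_of_input.match(name_with_newline) is not None
```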

Author

I wrote a simple check to see if \Z is there, before removing it.
rstrip(r"\Z") treats its argument as a set of characters, so it would also have stripped a trailing Z from the path itself.
e.g: "abcZ\Z" -> "abc"
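
A short sketch of the difference, assuming a plain suffix check is what is wanted. The function remove_end_anchor is a hypothetical name, not the one used in the patch.

```python
def remove_end_anchor(regex):
    """Drop a trailing \\Z only if the string really ends with it."""
    anchor = r"\Z"
    if regex.endswith(anchor):
        return regex[:-len(anchor)]
    return regex


# str.rstrip removes any trailing run of the given characters
# ('\\' and 'Z' here), not the two-character suffix as a unit.
over_stripped = r"abcZ\Z".rstrip(r"\Z")
exact = remove_end_anchor(r"abcZ\Z")
untouched = remove_end_anchor("abcZ")
```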

@furtib furtib requested a review from barnabasdomozi October 6, 2025 06:29
@furtib
Author

furtib commented Oct 6, 2025

I have added a check to the fnmatch.translate output, so as not to depend so much on its implementation.
I also tried to support Windows systems. os.path.normpath("/") will return "\\" (an escaped backslash) there. This way, we will check for an optional \\.* on Windows and /.* on Linux.
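
A sketch of how a separator-aware suffix could be built. It passes ntpath and posixpath explicitly so that both platforms can be tried on one machine; the helper names are hypothetical and the merged code may differ.

```python
import fnmatch
import ntpath
import posixpath
import re


def strip_end_anchor(regex):
    """Remove a trailing end-of-input anchor if one is present."""
    for anchor in (r"\Z", r"\z"):
        if regex.endswith(anchor):
            return regex[:-len(anchor)]
    return regex


def compile_skip_entry(entry, pathmod):
    """Hypothetical helper; pathmod stands in for os.path."""
    norm = pathmod.normpath(entry)
    # normpath("/") yields the platform's separator: "\\" or "/".
    sep = re.escape(pathmod.normpath("/"))
    body = strip_end_anchor(fnmatch.translate(norm))
    return re.compile(body + "(?:" + sep + ".*)?" + r"\Z")


win = compile_skip_entry("C:/proj/build", ntpath)
posix = compile_skip_entry("/proj/build", posixpath)

win_hits_child = win.match(r"C:\proj\build\a.o") is not None
posix_hits_child = posix.match("/proj/build/a.o") is not None
```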

@furtib furtib closed this Oct 6, 2025
@furtib furtib reopened this Oct 6, 2025
Collaborator

@barnabasdomozi barnabasdomozi left a comment

LGTM

Successfully merging this pull request may close these issues.

Analyze with --file flag runs for multiple files
3 participants