Skip to content

Remove comments from query when parsing SET #4959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: v2.7-bump_version_to_2.7.3
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions lib/set_parser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,8 @@ std::map<std::string,std::vector<std::string>> SetParser::parse1() {

re2::RE2 re0("^\\s*SET\\s+", *opt2);
re2::RE2::Replace(&query, re0, "");
re2::RE2 re2("--.*$", *opt2); // remove comments
re2::RE2::Replace(&query, re2, "");
Comment on lines +86 to +87

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This change correctly adds removal of -- comments for the parse1() method using the regex "--.*$".

However, the parse1v2() method (defined later in this file, e.g., lines 378-381 show its own SET prefix and suffix trimming) also parses SET statements but does not include similar logic to remove -- comments.

Could this lead to inconsistent behavior or parsing failures if parse1v2() encounters SET queries that include -- comments? If parse1v2() is intended to have feature parity with parse1() regarding comment handling for SET statements, it might require a similar update to ensure consistent parsing behavior.

Comment on lines +86 to +87

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This regex "--.*$" effectively handles -- comments by removing them up to the end of the line.

It's worth considering if other SQL comment types, such as # (hash) comments or /* ... */ block comments, can also appear in SET statements in a way that might interfere with the main parsing logic (the complex regex at line 93).

If these other comment types can also cause parsing issues within SET statements, would a more comprehensive comment removal strategy be beneficial for parse1()? This might involve leveraging or adapting the existing SetParser::remove_comments method (lines 572-612), which handles multiple comment types, or adding specific regexes for other comment styles to ensure robustness against various comment formats in SET queries.

re2::RE2 re1("(\\s|;)+$", *opt2); // remove trailing spaces and semicolon
re2::RE2::Replace(&query, re1, "");

Expand Down