Fix placeholder spans #1979

Merged · merged 4 commits into apache:main on Aug 1, 2025

Conversation

@xitep (Contributor) commented Jul 27, 2025

  • Optional support for :name as a parameter in queries; initially requested through Parameterized queries #291.
  • The naming convention chosen here is based on the Java Persistence API spec (https://jakarta.ee/specifications/persistence/3.2/jakarta-persistence-spec-3.2#a4401) and should fit in well.
  • The support here is implemented already at the level of the tokenizer, in order to get the span of the whole placeholder right (I'm actually interested in replacing the parameter as a whole in the original query; see the sketch after this list).
  • In fact, sqlparser's parser combines a colon token followed by a word token into a placeholder, but does not adjust the span of the resulting AST node :/ and it allows whitespace between the colon and the word. This is something I wanted to avoid, but I don't feel confident changing the parser without breaking existing functionality (if any) based on this behaviour.
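Since the main point of the span fix is being able to replace the parameter as a whole in the original query, here is a minimal sketch of that use case. It assumes a single-line query, 1-based columns in Location, and a span that covers the whole :var placeholder (i.e. the behaviour this PR is after); it is not code from this PR.

use std::ops::ControlFlow;
use sqlparser::ast::{visit_expressions, Expr, Value};
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {
    let sql = "select :var";
    let q = Parser::new(&GenericDialect)
        .try_with_sql(sql)
        .unwrap()
        .parse_query()
        .unwrap();
    let mut rewritten = sql.to_string();
    let _ = visit_expressions(&q, |expr: &Expr| {
        if let Expr::Value(v) = expr {
            if let Value::Placeholder(_) = &v.value {
                // With a span covering the whole placeholder, `:var` can be
                // replaced in place (columns are 1-based, end is exclusive).
                let start = v.span.start.column as usize - 1;
                let end = v.span.end.column as usize - 1;
                rewritten.replace_range(start..end, "$1");
            }
        }
        ControlFlow::<()>::Continue(())
    });
    println!("{rewritten}"); // expected: select $1
}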

Comment on lines 844 to 848
/// Returns true if this dialect allows colon placeholders,
/// e.g. `SELECT :var` (JPA named parameters)
fn supports_colon_placeholder(&self) -> bool {
    false
}
iffyio (Contributor)

in fact, sqlparser's parser combines a colon token followed by a word token into a placeholder, but does not adjust the span of the resulting ast node :/ and allows for whitespace between the colon and the word.

@xitep could you elaborate on this part? I'm not sure I followed the intent; is my understanding correct that the parser currently does handle e.g. SELECT :var? If so, what's the requirement for the tokenizer updates?

@xitep (Contributor, Author) commented Jul 28, 2025

Hello @iffyio, yes, the parser can already deal with things like :var, but I'm not sure whether this is a feature or rather just a side effect of the code pointed out above. Here's an example:

use std::ops::ControlFlow;
use sqlparser::ast::{Expr, Value};

fn main() {
    let q = sqlparser::parser::Parser::new(&sqlparser::dialect::GenericDialect)
        .try_with_sql("select :var")
        .unwrap()
        .parse_query()
        .unwrap();
    let _ = sqlparser::ast::visit_expressions(&q, |expr: &Expr| {
        if let Expr::Value(v) = expr
            && let Value::Placeholder(s) = &v.value
        {
            println!("value: {v:?}");
            println!("placeholder: {s:?}");
        }
        ControlFlow::<()>::Continue(())
    });
}
// OUTPUT:
// value: ValueWithSpan { value: Placeholder(":var"), span: Span(Location(1,8)..Location(1,9)) }
// placeholder: ":var"

Note:

  1. the span is incorrect; it covers only the colon
  2. the placeholder represents :<word> (which in fact is what I'm after)

However, you'll get the same output when using "select : var" or even "select : /*foobar*/ var" as input :-/ This is not what I want.

So the intention is to make :<word> a first-class token, a placeholder.

Does this help?

iffyio (Contributor)

I think that makes sense; it seems the handling of placeholders isn't very consistent right now, with some handled in the tokenizer and others later in the parser. It could be a good, more general improvement: if we feel placeholders should ideally be managed by the tokenizer, then we can parse and represent them consistently.

I did a quick pass through the existing support, and I can imagine something like the following representation to merge the different variants explicitly, wdyt?

enum PlaceholderKind {
	/// Example:
	/// ```sql
	/// SELECT ?, ?foo
	/// ```
	SingleQuestionMark,
	/// Example:
	/// ```sql
	/// SELECT @foo
	/// ```
	SingleAt,
	/// Example:
	/// ```sql
	/// SELECT $1, $foo
	/// ```
	SingleDollar,
	/// Example:
	/// ```sql
	/// SELECT :42, :foo
	/// ```
	Colon,
	/// Example:
	/// ```sql
	/// SELECT $$, $$abc$$, $abc$, $$$abc$$$
	/// ```
	DollarEnclosed {
		count: usize
	}
}

Token::Placeholder {
	kind: PlaceholderKind,
	value: Option<String>
}

It would also remove the need for this code:

// Not calling self.parse_identifier(false)? because only in placeholder we want to check numbers as identifiers
// This is because snowflake allows numbers as placeholders
let next_token = self.next_token();
let ident = match next_token.token {
    Token::Word(w) => Ok(w.into_ident(next_token.span)),
    Token::Number(w, false) => Ok(Ident::new(w)),
    _ => self.expected("placeholder", next_token),
}?;
let placeholder = tok.to_string() + &ident.value;
ok_value(Value::Placeholder(placeholder))
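To make the proposal concrete, here is a purely illustrative, self-contained sketch of how a few inputs might map onto the suggested representation. The PlaceholderKind and Token types below are local stand-ins mirroring the suggestion above; they are not existing sqlparser API.

#[derive(Debug)]
#[allow(dead_code)]
enum PlaceholderKind {
    SingleQuestionMark,
    SingleAt,
    SingleDollar,
    Colon,
    DollarEnclosed { count: usize },
}

#[derive(Debug)]
enum Token {
    Placeholder {
        kind: PlaceholderKind,
        value: Option<String>,
    },
}

fn main() {
    // How a few raw inputs might be represented under the proposal:
    let examples = [
        (":foo", Token::Placeholder { kind: PlaceholderKind::Colon, value: Some("foo".into()) }),
        ("?", Token::Placeholder { kind: PlaceholderKind::SingleQuestionMark, value: None }),
        ("$1", Token::Placeholder { kind: PlaceholderKind::SingleDollar, value: Some("1".into()) }),
        ("$$abc$$", Token::Placeholder { kind: PlaceholderKind::DollarEnclosed { count: 2 }, value: Some("abc".into()) }),
    ];
    for (input, token) in examples {
        println!("{input:>8} -> {token:?}");
    }
}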

@xitep (Contributor, Author) commented Jul 30, 2025

OK, I can try it out and update this PR accordingly. Here's one thought to clarify before I set off with it:

Token::Placeholder {
	kind: PlaceholderKind,
	value: Option<String>
}

This implies to me that "value" would be stripped of the placeholder demarcation character(s); e.g. for the input ":foo", value would read Some("foo"); for the input "$$foo$$" it would read Some("foo"); for the input "$", value would be None. Correct?

But then, what would sqlparser::ast::Value::Placeholder(..) look like? Clients surely might want to know exactly which placeholder style/kind was used; since sqlparser supports various formats, some clients might opt to reject styles they don't want their queries to contain.

All in all, it looks to me that in your proposal we want to end up with value being the original, full token as identified in the scanned input (which, naturally, is never None or empty), therefore:

Token::Placeholder {
	kind: PlaceholderKind,
	value: String,  // ~ preferably &'a str
}

Any feedback on this thought?

iffyio (Contributor)

Yeah, that's correct: the input would be split in two, the kind and the value, as in your examples.

Value::Placeholder I imagine would have the same representation, mirroring Token::Placeholder, since the goal becomes that placeholder parsing is handled by the tokenizer, i.e. Value::Placeholder(Token::Placeholder).

which, naturally, is never None or empty

I think we might not always have a variable name, hence the Option - SQLite for example uses anonymous ? in some contexts for placeholders, e.g. WHERE city IN (?, ?, ?, ?). In that scenario the representation becomes Token::Placeholder{kind: PlaceholderKind::SingleQuestionMark, value: None}.

value: String, // ~ preferably &'a str

Ah, heads up: I think a reference &str wouldn't be feasible here without introducing lifetimes to the Token/Value enums, which would be quite an invasive change and likely not worth it.

@xitep (Contributor, Author)

wrt ...

	DollarEnclosed {
		count: usize
	}

I'm seeing "$$foo$$" being treated as a dollar-quoted string, not as a placeholder. If dialect.supports_dollar_placeholder() == true, then $... is considered a placeholder, but the dollars at the start and end don't have to be balanced. This makes the DollarEnclosed variant essentially void.
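For reference, here is a small sketch of how one could observe that behaviour: a custom dialect that opts into dollar placeholders and then tokenizes an input. It only relies on the two required Dialect methods plus the supports_dollar_placeholder override mentioned above; the exact token output isn't asserted here, so treat it as an experiment, not a verified test.

use sqlparser::dialect::Dialect;
use sqlparser::tokenizer::Tokenizer;

#[derive(Debug)]
struct DollarPlaceholderDialect;

impl Dialect for DollarPlaceholderDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_ascii_alphabetic() || ch == '_'
    }
    fn is_identifier_part(&self, ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_'
    }
    // Opt into `$...` placeholders for this dialect.
    fn supports_dollar_placeholder(&self) -> bool {
        true
    }
}

fn main() {
    let dialect = DollarPlaceholderDialect;
    // With the override above, `$foo` is expected to come back as a
    // placeholder token rather than the start of a dollar-quoted string.
    let tokens = Tokenizer::new(&dialect, "select $foo")
        .tokenize()
        .unwrap();
    println!("{tokens:?}");
}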

@xitep (Contributor, Author) commented Jul 31, 2025

Alright, after a lot of struggle I came to the conclusion that the first-class placeholder approach is not feasible in sqlparser. After having the tokenizer and model changes ready, this test (now failing) convinced me that my concern cannot be resolved at the level of the tokenizer, since sqlparser needs to stay very flexible and determine the meaning only at the parser level. You can have a look here at how the change we discussed would have looked. (This would be the consequential change to the parser.)

Fortunately, I have found a far smaller change that satisfies my needs and will update this PR.

src/tokenizer.rs Outdated
/// Tokenizes an identifier followed immediately after a colon,
/// aka named query parameter, e.g. `:name`. The next char of the
/// processed char stream is to be an alphabetic - panics otherwise.
fn tokenize_colon_preceeded_placeholder(
iffyio (Contributor)

I think it would likely be ideal to use existing logic like tokenize_identifier_or_keyword or similar. Also, a general heads-up for the parser: returning an error when the input is invalid is preferred over panicking.

@xitep (Contributor, Author)

  • Wrt tokenize_identifier_or_keyword ... I guess it would make it quite hard to support the $ as part of the identifier (if enabled by Dialect::supports_dollar_placeholder, https://docs.rs/sqlparser/latest/sqlparser/dialect/trait.Dialect.html#method.supports_dollar_placeholder). I see the placeholder identifiers as distinct (i.e. with different restrictions) from other SQL identifiers.
  • Thanks for the heads-up; the panic here would be a programming error, as the contract says that the caller must make sure ... but I'm fine with the established convention and will make it return an error instead.

xitep added 2 commits July 31, 2025 07:47
1. Require placeholders to be whitespace/comment-less
2. Correctly report span covering the whole placeholder
@xitep (Contributor, Author) commented Jul 31, 2025

@iffyio, would you like to have another look? I changed the overall strategy.

xitep changed the title from "Support for colon preceeded placeholders" to "Fix placeholder spans" on Jul 31, 2025
xitep requested a review from iffyio on Jul 31, 2025 at 06:00
@iffyio (Contributor) left a comment

Thanks @xitep! The changes look reasonable to me overall, left some questions/comments

@@ -9586,14 +9586,14 @@ impl<'a> Parser<'a> {
             tok @ Token::Colon | tok @ Token::AtSign => {
                 // Not calling self.parse_identifier(false)? because only in placeholder we want to check numbers as idfentifies
                 // This because snowflake allows numbers as placeholders
-                let next_token = self.next_token();
+                let next_token = self.next_token_no_skip().unwrap_or(&EOF_TOKEN).clone();
iffyio (Contributor)

If I understand the intent correctly, is this saying that e.g. SELECT :foo is to be parsed as a placeholder but SELECT : foo should not be? If so, I think it would be good to mention that in a comment. Also, can we add a test case to demonstrate this change?
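A hedged sketch of what such a test case might look like under the new behaviour; the expected outcomes reflect the PR's stated intent (placeholder only when the word immediately follows the colon) and are assumptions rather than the actual test added to the repository:

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {
    // `:foo` with nothing between the colon and the word parses as a placeholder.
    assert!(Parser::new(&GenericDialect)
        .try_with_sql("SELECT :foo")
        .unwrap()
        .parse_query()
        .is_ok());

    // `: foo` (whitespace after the colon) should no longer be accepted.
    assert!(Parser::new(&GenericDialect)
        .try_with_sql("SELECT : foo")
        .unwrap()
        .parse_query()
        .is_err());
}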

                     _ => self.expected("placeholder", next_token),
                 }?;
-                let placeholder = tok.to_string() + &ident.value;
-                ok_value(Value::Placeholder(placeholder))
+                Ok(Value::Placeholder(tok.to_string() + &ident.value)
iffyio (Contributor)

Can we add a test case for this to avoid future regressions? We have some span test cases here that we could add to.
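Something along these lines, reusing the visitor approach from earlier in this thread, is roughly what such a span check could look like. The coordinates assume 1-based, end-exclusive locations as in the debug output shown above; this is a sketch, not the test that ended up in the PR:

use std::ops::ControlFlow;
use sqlparser::ast::{visit_expressions, Expr, Value};
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {
    let q = Parser::new(&GenericDialect)
        .try_with_sql("select :var")
        .unwrap()
        .parse_query()
        .unwrap();
    let _ = visit_expressions(&q, |expr: &Expr| {
        if let Expr::Value(v) = expr {
            if let Value::Placeholder(_) = &v.value {
                // The span should now cover the whole `:var`, not just the colon.
                assert_eq!((v.span.start.column, v.span.end.column), (8, 12));
            }
        }
        ControlFlow::<()>::Continue(())
    });
}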

@iffyio (Contributor) left a comment

LGTM! Thanks @xitep!
cc @alamb

iffyio merged commit 6932f4a into apache:main on Aug 1, 2025
10 checks passed
@xitep (Contributor, Author) commented Aug 1, 2025

many thanks for your support @iffyio 🙇

@alamb (Contributor) commented Aug 1, 2025

many thanks for your support @iffyio 🙇

I agree @iffyio is awesome 🙏
