Skip to content
This repository was archived by the owner on Sep 8, 2023. It is now read-only.

Conversation

sehrope
Copy link

@sehrope sehrope commented Jun 21, 2017

Three commits in this PR:

  1. First adds a failing test to showcase dollarQuotedString(...) failing to handle strings like $a$. As the dollar quote tags are randomly chosen it's possible nobody ever noticed this. In practice it's easy to generate and any web service that uses this to quote user input could be trivially bombarded with enough attempts to trigger the bug (it's a 1/20 chance each time).
  2. Fixes said bug in dollarQuotedString(...) and cleans up some tests. The new code also has the positive side effect of using a plain $$ as the first choice for dollar quoting (i.e. tagless). In the common case where it'd be acceptable (i.e. $$ does not appear in the input string), that would shrink the output by two characters.
  3. Changes the set of random chars used for tags to increase the number of choices (48 vs. the original 20) per char and also exclude vowels and vowel lookalikes to prevent the choosing of "naughty" words as dollar quote tags. With the expanded set of chars the probability of it is already relatively slim and the vowel exclusions eliminate it entirely.

sehrope added 3 commits June 21, 2017 09:37
Add a test for dollarQuotedString() to showcase how it fails to handle the
randomly chosen dollar quote tag appearing in the string to be quoted.

The test works by repeatedly trying to quote $a$ until the string itself is
the randomly chosen dollar quote tag.
Fixes a bug in dollarQuotedString() where it fails to account for the
case where the string being quoted contains the randomly chosen dollar
quote tag. Before the fix, if a string containing $a$ there is a one in
21 (number of values in randomTags) chance of it occurring.

With the fix the function now checks if the chosen dollar quote tag is
contained within the string to be quoted and if so chooses a new longer
random tag. The tag initially starts as the empty string so the initial
default choice for quoting becomes $$.

Random additions to the tag were chosen, over incrementing a counter, to
make it slightly more difficult for an attacker to supply an input string
that would require a large number of iterations.

As the new defualt is to use $$ for quoting, a number of the results for
unit tests have been simplified to match on the exact text that would be
returned.
Change the set of characters used to generate random tags for dollar
quotes to include both lower and upper case non-vowel characters and
the digits two (2) through nine (9).

The overall expanded set decreases the likelyhood of a randomly picked
tag existing in the contained input.

The exclusions are to ensure that generated tags do not randomly end
up including naughty words.
@ejcx ejcx self-requested a review June 21, 2017 15:49
@ejcx
Copy link

ejcx commented Jun 21, 2017

Can you please update this to more clearly show the problem and explain the bug?

@sehrope
Copy link
Author

sehrope commented Jun 21, 2017

If you try to dollar quote a literal string that contains $a$ (where "a" is any character in the randomTags) array, you have a 1/20 chance of generating an invalid string as the same string could be used as the dollar quote tag.

Here's an example. Say we have a web service that uses this lib like so:

// Get some user input
var value = req.query.userInput;
// Escape it with this lib
var escapedValue = escape.escapeDollarQuoted(value);
// Build a query with the escaped value
var sql = 'SELECT ' + escapedValue + ' AS x';
// Execute the query
var results = await db.query(sql);
// Send back results to our client
res.send(results);

Now assume that the random tag that gets picked is $a$ (there's a 1/20 chance that it is). That same code is now:

var value = 'some-user-input';
var escapedValue = '$a$' + value + '$a$';
// SELECT $a$some-user-input$a$ as x
var sql = 'SELECT ' + escapedValue + ' AS x';
// ... execute sql and return result to user

... and for the common case it's fine.

Now let's say the user input includes $a$.

var value = 'foo$a$bar';
var escapedValue = '$a$' + value + '$a$';
// SELECT $a$foo$a$bar$a$ as x
var sql = 'SELECT ' + escapedValue + ' AS x';
// ... execute sql and return result to user

... the quotes won't align and the SQL will most likely have a syntax error.

Now let's say the user knows the structure of what you're executing and includes $a$ maliciously:

var value = '$a$ as x UNION ALL SELECT row_to_json(t.*)::text FROM customer_stuff t UNION ALL SELECT $a$';
var escapedValue = '$a$' + value + '$a$';
var sql = 'SELECT ' + escapedValue + ' AS x';
// ... execute sql and return result to user

Now the SQL becomes:

SELECT $a$$a$ as x UNION ALL SELECT row_to_json(t.*)::text FROM customer_stuff t UNION ALL SELECT $a$$a$ AS x

...and all your customer secrets just got exported.

As it's only a 1/20 chance to match the dollar quoting tag, the malicious user would only need to make a handful of requests until it triggers the issue.

@kessler
Copy link
Contributor

kessler commented Jun 22, 2017

Increasing the pool size of the random strings is definitely a good idea. We could also add another layer of randomness, so when the library is initiated it will randomize the pool of tags as well.

IMO empty tags should not be allowed at all, as it completely defeats the purpose of having them in the first place.

@sehrope
Copy link
Author

sehrope commented Jun 22, 2017

We could also add another layer of randomness, so when the library is initiated it will randomize the pool of tags as well.

Randomizing the set of characters prior to randomly pulling a character from it does not make things any more random or secure.

In theory you could replace the Math.random(...) calls with a cryptographic source of random numbers from the crypto package but that is not needed for something like this nor would it obviate the need to check for existence of the chosen tag in the input text (unless the chosen tag is very long, like say 192+ bits of randomness).

IMO empty tags should not be allowed at all, as it completely defeats the purpose of having them in the first place.

No it doesn't. The point of the function is to coerce the input to a string and return back the dollar quoted version of it. The shortest, and thus best, choice for dollar quotes is the one without any tag whatsoever, $$. That leads to the shortest resulting string. You gain nothing by having a non-empty tag value.

For reference: https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

The choice of a non-empty tag value only comes into play when the original string contains $$. At that point you have to pick something else to prevent misaligned or maliciously placed quotes. It doesn't matter what you pick as long as it's not contained in the input value.

Choosing it via random addition of characters is a fine idea, as opposed to sequentially picking it, as it's less likely a malicious user could craft a string that would require a large number of iterations till a non-matching dollar quote tag is picked. If it was purely deterministic, say try $1$ then $2$ ..., then it'd be easy for a malicious user to craft a large string that includes all those values in an attempt to cause a large number of iterations. With a random string, even sourced from something cryptographically insecure like Math.random(), it's significantly more difficult as the set of possibilities for each iteration balloons by a factor of 48 (i.e. 48 choices for the first iteration, 2,304 for the second, 110,592 for the third...).

@kessler
Copy link
Contributor

kessler commented Jun 22, 2017

@sehrope Regarding the empty tag $$, I see your point, you are totally right.

I don't think dollar quoted strings were meant defeat a brute force sql injection attack. I suspect that the context of dollar quoted strings is to prevent accidental breakage of the sql rather than an intentional one - but it's just my assumption.

I hope this explains better my motivation behind using random tags this way.

@kessler
Copy link
Contributor

kessler commented Jun 22, 2017

@sehrope I also kinda like this idea:

In theory you could replace the Math.random(...) calls with a cryptographic source of random numbers from the crypto package but that is not needed for something like this nor would it obviate the need to check for existence of the chosen tag in the input text (unless the chosen tag is very long, like say 192+ bits of randomness).

Though it will beef up the sql size by a lot.

Do you really think we'll need to check for the existance of a 192 bits of random text inside the value ?

@sehrope
Copy link
Author

sehrope commented Jun 22, 2017

@kessler said:

I don't think dollar quoted strings were meant defeat a brute force sql injection attack. I suspect that the context of dollar quoted strings is to prevent accidental breakage of the sql rather than an intentional one - but it's just my assumption.

Sure but that's just wrong. A function that can "accidentally" breaks with chance of 1/20 isn't very useful. For it to be usable it needs to work in every situation.

Though it will beef up the sql size by a lot.

Do you really think we'll need to check for the existance of a 192 bits of random text inside the value ?

No and that's not how my proposed path to fix this works. I'd suggest you read the code changes in the PR.

That comment was in reference to picking a long enough, and cryptographically secure, random value for the tag that it would be unlikely to occur in any input text. With enough bits (and 192 is more than enough) you don't have to worry about the string randomly appearing in the input as it's statistically impossible.

Unfortunately to do that you'd need a very long string as with 48-choices per character, that only buys you log2(48) =~ 5.5 bits of entropy per character. So to get 192-bits you'd need 35 characters.

Something like that would be more suitable to a client dynamically building an escaped SQL without being able to analyze the input first. Say streaming it out. In that situation you can't guess a valid short dollar quote tag as you don't know the totality of the input yet, so you'd have to pick something obscenely large to make sure it can't be found in the input. Like say $<some-uuid>$.

Anyway, none of that is needed though. The proposal in the PR leads to the shortest possible quoting (i.e. just $$) in the majority of situations and, even if has to pick a tag character, will never give a wrong answer as it makes sure the tag does not exist in the input.

@kessler
Copy link
Contributor

kessler commented Jun 22, 2017

@sehrope

Sure but that's just wrong. A function that can "accidentally" breaks with chance of 1/20 isn't very useful. For it to be usable it needs to work in every situation.

Don't agree on that.

No and that's not how my proposed path to fix this works. I'd suggest you read the code changes in the PR.

I did not say that it's your proposed fix, just that I like the idea.

Really, I was just trying to create a friendly dialog here, but you are clearly set on showing you're the smartest person in the room... so, good luck with that.

@sehrope
Copy link
Author

sehrope commented Jun 22, 2017

Really, I was just trying to create a friendly dialog here, but you are clearly set on showing you're the smartest person in the room... so, good luck with that.

No, I'm just explaining why the existing code is wrong so that it gets fixed.

@kessler
Copy link
Contributor

kessler commented Jun 22, 2017

@sehrope You're just patronizing, since the beginning of this thread... and that's just sad.

When people engage in a dialog they do things like ask questions and listen to what other people have to say. I have quite a few arguments about why I think you're wrong in some of your assumptions, but clearly you are not interested in hearing anything but yourself.

I wouldn't be suprised if you're also the kind of guy that must have the last word in every argument.

alemh3 referenced this pull request in opencosmos/ocpgescape Jul 21, 2017
…ows the same rules as an unquoted identifier, except that it cannot contain a dollar sign."
@kessler
Copy link
Contributor

kessler commented Jul 30, 2017

@sehrope to proceed with this discussion. As I stated earlier I do see your point regarding the empty tag. I feel very strongly that you should be open to, at least, hear other people's opinions.

It is possible that a discussion will yield an even better solution.

@MyRedDice
Copy link

I found a small issue with the PR. The "should handle dollar quotes in the string being escaped without resorting to luck" test has, I think, a problem. Because it has no "$$" within it (only "$a$"), I think it is getting escaped from:
SELECT $a$test$a$
to:
SELECT $$$a$test$a$$$

Putting an additional $$ in the middle of the test string will correctly test the intended behavior. I made this change and it worked correctly.

That aside, this change looks good to me, and I think it is valuable in security terms. I hope it is accepted.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants