Fix dollarQuotedString() to handle inputs that contain the randomly chosen quote tag #16

sehrope · 2017-06-21T14:34:50Z

Three commits in this PR:

First adds a failing test to showcase dollarQuotedString(...) failing to handle strings like $a$ . As the dollar quote tags are randomly chosen it's possible nobody ever noticed this. In practice it's easy to generate and any web service that uses this to quote user input could be trivially bombarded with enough attempts to trigger the bug (it's a 1/20 chance each time).
Fixes said bug in dollarQuotedString(...) and cleans up some tests. The new code also has the positive side effect of using a plain $$ as the first choice for dollar quoting (i.e. tagless). In the common case where it'd be acceptable (i.e. $$ does not appear in the input string), that would shrink the output by two characters.
Changes the set of random chars used for tags to increase the number of choices (48 vs. the original 20) per char and also exclude vowels and vowel lookalikes to prevent the choosing of "naughty" words as dollar quote tags. With the expanded set of chars the probability of it is already relatively slim and the vowel exclusions eliminate it entirely.

Add a test for dollarQuotedString() to showcase how it fails to handle the randomly chosen dollar quote tag appearing in the string to be quoted. The test works by repeatedly trying to quote $a$ until the string itself is the randomly chosen dollar quote tag.

Fixes a bug in dollarQuotedString() where it fails to account for the case where the string being quoted contains the randomly chosen dollar quote tag. Before the fix, if a string containing $a$ there is a one in 21 (number of values in randomTags) chance of it occurring. With the fix the function now checks if the chosen dollar quote tag is contained within the string to be quoted and if so chooses a new longer random tag. The tag initially starts as the empty string so the initial default choice for quoting becomes $$. Random additions to the tag were chosen, over incrementing a counter, to make it slightly more difficult for an attacker to supply an input string that would require a large number of iterations. As the new defualt is to use $$ for quoting, a number of the results for unit tests have been simplified to match on the exact text that would be returned.

Change the set of characters used to generate random tags for dollar quotes to include both lower and upper case non-vowel characters and the digits two (2) through nine (9). The overall expanded set decreases the likelyhood of a randomly picked tag existing in the contained input. The exclusions are to ensure that generated tags do not randomly end up including naughty words.

ejcx · 2017-06-21T15:53:15Z

Can you please update this to more clearly show the problem and explain the bug?

sehrope · 2017-06-21T16:40:33Z

If you try to dollar quote a literal string that contains $a$ (where "a" is any character in the randomTags) array, you have a 1/20 chance of generating an invalid string as the same string could be used as the dollar quote tag.

Here's an example. Say we have a web service that uses this lib like so:

// Get some user input
var value = req.query.userInput;
// Escape it with this lib
var escapedValue = escape.escapeDollarQuoted(value);
// Build a query with the escaped value
var sql = 'SELECT ' + escapedValue + ' AS x';
// Execute the query
var results = await db.query(sql);
// Send back results to our client
res.send(results);

Now assume that the random tag that gets picked is $a$ (there's a 1/20 chance that it is). That same code is now:

var value = 'some-user-input';
var escapedValue = '$a$' + value + '$a$';
// SELECT $a$some-user-input$a$ as x
var sql = 'SELECT ' + escapedValue + ' AS x';
// ... execute sql and return result to user

... and for the common case it's fine.

Now let's say the user input includes $a$ .

var value = 'foo$a$bar';
var escapedValue = '$a$' + value + '$a$';
// SELECT $a$foo$a$bar$a$ as x
var sql = 'SELECT ' + escapedValue + ' AS x';
// ... execute sql and return result to user

... the quotes won't align and the SQL will most likely have a syntax error.

Now let's say the user knows the structure of what you're executing and includes $a$ maliciously:

var value = '$a$ as x UNION ALL SELECT row_to_json(t.*)::text FROM customer_stuff t UNION ALL SELECT $a$';
var escapedValue = '$a$' + value + '$a$';
var sql = 'SELECT ' + escapedValue + ' AS x';
// ... execute sql and return result to user

Now the SQL becomes:

SELECT $a$$a$ as x UNION ALL SELECT row_to_json(t.*)::text FROM customer_stuff t UNION ALL SELECT $a$$a$ AS x

...and all your customer secrets just got exported.

As it's only a 1/20 chance to match the dollar quoting tag, the malicious user would only need to make a handful of requests until it triggers the issue.

kessler · 2017-06-22T10:09:14Z

Increasing the pool size of the random strings is definitely a good idea. We could also add another layer of randomness, so when the library is initiated it will randomize the pool of tags as well.

IMO empty tags should not be allowed at all, as it completely defeats the purpose of having them in the first place.

sehrope · 2017-06-22T11:11:35Z

We could also add another layer of randomness, so when the library is initiated it will randomize the pool of tags as well.

Randomizing the set of characters prior to randomly pulling a character from it does not make things any more random or secure.

In theory you could replace the Math.random(...) calls with a cryptographic source of random numbers from the crypto package but that is not needed for something like this nor would it obviate the need to check for existence of the chosen tag in the input text (unless the chosen tag is very long, like say 192+ bits of randomness).

IMO empty tags should not be allowed at all, as it completely defeats the purpose of having them in the first place.

No it doesn't. The point of the function is to coerce the input to a string and return back the dollar quoted version of it. The shortest, and thus best, choice for dollar quotes is the one without any tag whatsoever, $$. That leads to the shortest resulting string. You gain nothing by having a non-empty tag value.

For reference: https://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

The choice of a non-empty tag value only comes into play when the original string contains $$. At that point you have to pick something else to prevent misaligned or maliciously placed quotes. It doesn't matter what you pick as long as it's not contained in the input value.

Choosing it via random addition of characters is a fine idea, as opposed to sequentially picking it, as it's less likely a malicious user could craft a string that would require a large number of iterations till a non-matching dollar quote tag is picked. If it was purely deterministic, say try $1$ then $2$ ..., then it'd be easy for a malicious user to craft a large string that includes all those values in an attempt to cause a large number of iterations. With a random string, even sourced from something cryptographically insecure like Math.random(), it's significantly more difficult as the set of possibilities for each iteration balloons by a factor of 48 (i.e. 48 choices for the first iteration, 2,304 for the second, 110,592 for the third...).

kessler · 2017-06-22T11:46:13Z

@sehrope Regarding the empty tag $$, I see your point, you are totally right.

I don't think dollar quoted strings were meant defeat a brute force sql injection attack. I suspect that the context of dollar quoted strings is to prevent accidental breakage of the sql rather than an intentional one - but it's just my assumption.

I hope this explains better my motivation behind using random tags this way.

kessler · 2017-06-22T11:51:46Z

@sehrope I also kinda like this idea:

In theory you could replace the Math.random(...) calls with a cryptographic source of random numbers from the crypto package but that is not needed for something like this nor would it obviate the need to check for existence of the chosen tag in the input text (unless the chosen tag is very long, like say 192+ bits of randomness).

Though it will beef up the sql size by a lot.

Do you really think we'll need to check for the existance of a 192 bits of random text inside the value ?

sehrope · 2017-06-22T12:02:54Z

@kessler said:

I don't think dollar quoted strings were meant defeat a brute force sql injection attack. I suspect that the context of dollar quoted strings is to prevent accidental breakage of the sql rather than an intentional one - but it's just my assumption.

Sure but that's just wrong. A function that can "accidentally" breaks with chance of 1/20 isn't very useful. For it to be usable it needs to work in every situation.

Though it will beef up the sql size by a lot.

Do you really think we'll need to check for the existance of a 192 bits of random text inside the value ?

No and that's not how my proposed path to fix this works. I'd suggest you read the code changes in the PR.

That comment was in reference to picking a long enough, and cryptographically secure, random value for the tag that it would be unlikely to occur in any input text. With enough bits (and 192 is more than enough) you don't have to worry about the string randomly appearing in the input as it's statistically impossible.

Unfortunately to do that you'd need a very long string as with 48-choices per character, that only buys you log2(48) =~ 5.5 bits of entropy per character. So to get 192-bits you'd need 35 characters.

Something like that would be more suitable to a client dynamically building an escaped SQL without being able to analyze the input first. Say streaming it out. In that situation you can't guess a valid short dollar quote tag as you don't know the totality of the input yet, so you'd have to pick something obscenely large to make sure it can't be found in the input. Like say $<some-uuid>$ .

Anyway, none of that is needed though. The proposal in the PR leads to the shortest possible quoting (i.e. just $$) in the majority of situations and, even if has to pick a tag character, will never give a wrong answer as it makes sure the tag does not exist in the input.

kessler · 2017-06-22T13:10:36Z

@sehrope

Sure but that's just wrong. A function that can "accidentally" breaks with chance of 1/20 isn't very useful. For it to be usable it needs to work in every situation.

Don't agree on that.

No and that's not how my proposed path to fix this works. I'd suggest you read the code changes in the PR.

I did not say that it's your proposed fix, just that I like the idea.

Really, I was just trying to create a friendly dialog here, but you are clearly set on showing you're the smartest person in the room... so, good luck with that.

sehrope · 2017-06-22T13:32:07Z

Really, I was just trying to create a friendly dialog here, but you are clearly set on showing you're the smartest person in the room... so, good luck with that.

No, I'm just explaining why the existing code is wrong so that it gets fixed.

kessler · 2017-06-22T13:53:50Z

@sehrope You're just patronizing, since the beginning of this thread... and that's just sad.

When people engage in a dialog they do things like ask questions and listen to what other people have to say. I have quite a few arguments about why I think you're wrong in some of your assumptions, but clearly you are not interested in hearing anything but yourself.

I wouldn't be suprised if you're also the kind of guy that must have the last word in every argument.

…ows the same rules as an unquoted identifier, except that it cannot contain a dollar sign."

kessler · 2017-07-30T18:00:12Z

@sehrope to proceed with this discussion. As I stated earlier I do see your point regarding the empty tag. I feel very strongly that you should be open to, at least, hear other people's opinions.

It is possible that a discussion will yield an even better solution.

MyRedDice · 2017-10-13T19:26:56Z

I found a small issue with the PR. The "should handle dollar quotes in the string being escaped without resorting to luck" test has, I think, a problem. Because it has no "$$" within it (only "$a$"), I think it is getting escaped from:
SELECT $a$test$a$
to:
SELECT $$$a$test$a$$$

Putting an additional $$ in the middle of the test string will correctly test the intended behavior. I made this change and it worked correctly.

That aside, this change looks good to me, and I think it is valuable in security terms. I hope it is accepted.

sehrope added 3 commits June 21, 2017 09:37

ejcx self-requested a review June 21, 2017 15:49

alemh3 referenced this pull request in opencosmos/ocpgescape Jul 21, 2017

pgdoc 9.1 §4.1.2.4 - "The tag, if any, of a dollar-quoted string foll…

2f00f01

…ows the same rules as an unquoted identifier, except that it cannot contain a dollar sign."

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix dollarQuotedString() to handle inputs that contain the randomly chosen quote tag #16

Fix dollarQuotedString() to handle inputs that contain the randomly chosen quote tag #16

Uh oh!

sehrope commented Jun 21, 2017

Uh oh!

ejcx commented Jun 21, 2017

Uh oh!

sehrope commented Jun 21, 2017

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

sehrope commented Jun 22, 2017

Uh oh!

kessler commented Jun 22, 2017 •

edited

Loading

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

sehrope commented Jun 22, 2017

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

sehrope commented Jun 22, 2017

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

kessler commented Jul 30, 2017 •

edited

Loading

Uh oh!

MyRedDice commented Oct 13, 2017

Uh oh!

Uh oh!

Fix dollarQuotedString() to handle inputs that contain the randomly chosen quote tag #16

Are you sure you want to change the base?

Fix dollarQuotedString() to handle inputs that contain the randomly chosen quote tag #16

Uh oh!

Conversation

sehrope commented Jun 21, 2017

Uh oh!

ejcx commented Jun 21, 2017

Uh oh!

sehrope commented Jun 21, 2017

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

sehrope commented Jun 22, 2017

Uh oh!

kessler commented Jun 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

sehrope commented Jun 22, 2017

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

sehrope commented Jun 22, 2017

Uh oh!

kessler commented Jun 22, 2017

Uh oh!

kessler commented Jul 30, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MyRedDice commented Oct 13, 2017

Uh oh!

Uh oh!

kessler commented Jun 22, 2017 •

edited

Loading

kessler commented Jul 30, 2017 •

edited

Loading