Skip to content

Conversation

nothingmuch
Copy link
Collaborator

This PR contains two changes, using - instead of + as the fragment parameter delimiter, and lexicographically ordering the fragment parameters.

This implements bitcoin/bips#1890

@nothingmuch nothingmuch requested a review from DanGould July 5, 2025 12:49
@coveralls
Copy link
Collaborator

coveralls commented Jul 5, 2025

Pull Request Test Coverage Report for Build 16131346449

Details

  • 93 of 105 (88.57%) changed or added relevant lines in 2 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.01%) to 85.317%

Changes Missing Coverage Covered Lines Changed/Added Lines %
payjoin/src/core/ohttp.rs 0 1 0.0%
payjoin/src/core/uri/url_ext.rs 93 104 89.42%
Files with Coverage Reduction New Missed Lines %
payjoin/src/core/uri/url_ext.rs 4 85.41%
Totals Coverage Status
Change from base Build 16124956763: -0.01%
Covered Lines: 7455
Relevant Lines: 8738

💛 - Coveralls

The `url_ext` mod is itself feature gated, so any declarations within it
are already implicitly v2 only.
Although RFC 3986 (URIs) does not assign any special meaning to `+`, in
fragment parameters or in general, RFC 1866 (HTML 2.0) section 7.5 uses
it as a delimiter for keywords in query parameters. As a result some URI
libraries interpret `+` in URIs as ` `, even in fragment parameters.

Although not insurmountable (such transformation BIP 77 URIs is
reversible because ` ` is not used and `+` was only used for fragment
parameter delimitation) this presents friction and is in general
confusion, so to improve compatibility with such libraries `-` is now
used instead. It has no reserved meaning as a sub-delimiter.

For the time being when parsing both `+` and `-` will be accepted, but
only `-` will be used when encoding fragment parameters.
@nothingmuch nothingmuch force-pushed the fragment-fixes branch 3 times, most recently from 19e9cf2 to 452e4b3 Compare July 6, 2025 19:29
Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure that the last commit needs to go in (other than the base64-bech32 typo fix). What kind of errors does that prevent that would not otherwise be caught by bech32 parsing?

Comment on lines 133 to 138
// check for allowed delimiters
if c == b'-' {
has_dash = true;
} else if c == b'+' {
has_plus = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be in the loop? Can't fragment.contains('-') and fragment.contains('+') be used outside of the loop in the match for greater legibility

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't agree that that's more legibile, since these conditions are mutually exclusive with the charset range so conceptually it seems even more confusing to put these characters in the range of characters that are also included

not that efficiency really matters, but also scanning through the fragment once instead of 3 times seems reasonable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i roughly implemented this suggestion but i think i still prefer the older approach since repeating the logic with c != '-' && c != '+' places this information about the allowed delimiters in two different places instead of just one, which i find less legible than the slightly clunkier if else stuff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I understand why you'd put it all together now and am comfortable with that though would merge this as-is at this point.

AmbiguousDelimiter,
}

fn check_fragment_delimiter(fragment: &str) -> Result<char, ParseFragmentError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does a hell of a lot more than check the fragment delimiter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please suggest a better name assuming the newly proposed behavior (check no fragment ambiguity and that the fragment ~= /^[A-Z0-9+\-]*$/)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_fragment_charset ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did not take the suggested name because Result<char, ParseFragmentError> is not self explanatory in terms of what this method returns, determining the delimiter to use is the important thing this function calls and checking the charset is kinda of a precondition

@nothingmuch
Copy link
Collaborator Author

I'm really not sure that the last commit needs to go in (other than the base64-bech32 typo fix).

Yeah I think I agree, by the time I wrote it, especially the error conversion boilerplate I had kinda regretted it, but I pushed anyway so we can discuss. I'm in favor of reverting to the more permissive behavior, or something intermediate.

What kind of errors does that prevent that would not otherwise be caught by bech32 parsing?

  1. mixed delimiters, which are indeed a contrivance (see the previous review comments)
  2. HRP strings that are valid according to bech32 but fall outside of the BIP77
  3. lowercase bech32

(1) can be done as fragment.contains('-') and +, and the match on pair of bools that arises from that more simply.

(2) seems unnecessary anyway, future extension mechanisms should be allowed as per BIP 77, and the behavior i implemented is too restrictive

(3) can be enforced as just checking that there's no lowercase chars

i will replace the last commit with a much more minimal one that does not attempt to do any bech32 charset validation as sketched in this reply

@DanGould
Copy link
Contributor

DanGould commented Jul 7, 2025

Sounds great. Looking forward to the more minimal final commit that does not attempt to do any bech32 charset validation.

@nothingmuch
Copy link
Collaborator Author

Also, long term I would prefer something that avoids this mutation based approach entirely, using a builder pattern to queue up fields, and then just joining them instead of parsing and mutating to set would be much simpler, but I didn't want to redesign, if you're cACK i'll write this up in an issue

Previously `set_param` would did not preserve order, but the way that
`set_param` was called ended up setting the RK, OH and EX fragment
parameters in reverse lexicographical order.

To avoid any privacy leaks from URI construction (revealing the specific
software the receiver is using) the spec now requires fragment
parameters to be ordered lexicographically, so `set_param` now ensures
this.
Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 51c4147

Ambiguity in the fragment parameter delimiter or any invalid characters
are no longer allowed.

The HRPs EX, OH, and RK are within the uppercase bech32 character set.
Only this character set along with the HRP delimiter `1` are now
allowed, with either `+` or `-` as a delimiter (but not both).
Comment on lines +153 to 159
if !(b'0'..b'9' + 1).contains(&c)
&& !(b'A'..b'Z' + 1).contains(&c)
&& c != b'-'
&& c != b'+'
{
return Err(ParseFragmentError::InvalidChar(c.into()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this syntax?

Suggested change
if !(b'0'..b'9' + 1).contains(&c)
&& !(b'A'..b'Z' + 1).contains(&c)
&& c != b'-'
&& c != b'+'
{
return Err(ParseFragmentError::InvalidChar(c.into()));
}
if !matches!(c, b'0'..=b'9' | b'A'..=b'Z' | b'-' | b'+') {
return Err(ParseFragmentError::InvalidChar(c.into()));
}


if !fragment.is_empty() {
fragment.push('+');
match (has_dash, has_plus) {
Copy link
Contributor

@DanGould DanGould Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I realize now the reason you had the has_dash, has_plus variables was to make the scanning operation O(1n) wrt the length of the fragment and not O(3n). If not that, why not?

Suggested change
match (has_dash, has_plus) {
match (fragment.contains('-'), fragment.contains('+')) {

Copy link
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0eb74b9

What changed last night after my prior ACK? Looks good to me.

@DanGould DanGould merged commit 70755a9 into payjoin:master Jul 9, 2025
8 checks passed
@DanGould
Copy link
Contributor

DanGould commented Jul 9, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants