-
Notifications
You must be signed in to change notification settings - Fork 5.1k
http conn man: Introduce host simplification rules #40874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
http conn man: Introduce host simplification rules #40874
Conversation
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
This is in draft at the moment because my primary build environment is on MacOS, which is just an absolute nightmare for running tests. Give me a bit to finish it up. |
697975a
to
35ee254
Compare
Today, Envoy only supports a single wildcard ("*") in virtual host domain entries, which must be either a prefix or suffix. This limitation greatly simplifies the virtual host matching logic, but it is also inherently limiting. This change introduces a repeated list of "host simplification rules" at the RouteConfiguration level, that provide a way to substitute away other dynamic portions of the domain without changing what is sent upstream. For example, to match something like `*.foo.*.example.org` you might write a simplification rule like: `([^.]+[.]foo[.])([^.]+)([.]example[.]org)` with a substitution of `\1bar\3`. This then allows a virtual host domain entry of `*.foo.bar.example.org` to match `baz.foo.bar.example.org` or `wowza.foo.qux.example.org`. Host simplification rules are processed in the order they are defined. Signed-off-by: Ryan Anderson <[email protected]>
35ee254
to
e6f6f9d
Compare
Rebased (for the last time, hopefully). @alyssawilk may be interested in this, since I'm pretty sure this is copying an idea she implemented years ago for GFE. |
// be used to implement multiple-wildcard matching, by converting | ||
// all but one of the wildcards into a static string. | ||
// This is similar to ignore_port_in_host_matching (above), but more flexible. | ||
repeated type.matcher.v3.RegexMatchAndSubstitute host_simplification_rules = 18; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could benefit from an example I think. regexes are pretty hard to understand (for most people), and we need to give some guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fair. I wrote more words, though I'm unsure on how the formatting will work out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also slightly improved the example from the one I used in the tests by putting the .
outside the capture groups, so it would be more obvious what the substitution itself was doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happens if there are multiple rules? Will the final result will be used to find vhost or every rule's result will be used one by one?
And at least give it a meaning for name like host_rewrite_for_matching
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final result; I think the updated comments explain that better now, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about simplification_rules_for_host_matching
?
I'm wary of host_rewrite...
because it doesn't do the rewriting (intentionally) for the upstream request.... though I suppose these rules don't have to be used for "simplification", either.
Along with some more words of explanation on how this all works. Signed-off-by: Ryan Anderson <[email protected]>
Do the GCC CI builds use a different regex engine? |
Signed-off-by: Ryan Anderson <[email protected]>
Signed-off-by: Ryan Anderson <[email protected]>
Signed-off-by: Ryan Anderson <[email protected]>
// An example may help: | ||
// > host_simplification_rules: | ||
// > - pattern: | ||
// > regex: "^(foo)[.]([^.]+)[.](example[.]org)$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember seeing some discussions about the disadvantages of using general-regex matching (for non-prefix/non-suffix) use-cases, but I cannot recall the details, and whether it was related to host-matching.
cc @yanavlasov @mattklein123 may have more context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw a proposal a while back about using domain_regex
as a list in virtual hosts; this is arguably somewhat similar, but hopefully done in a way that the host matching algorithm itself can be changed to use alternate structures (a trie comes to mind, for example), where a list of regexs in the virtualhosts doesn't allow that.
Signed-off-by: Ryan Anderson <[email protected]>
I'm not sure I understand how the fuzz tests work, and that seems to be failing on the corpus step. Is there a reference to how those are supposed to be updated? |
I don't understand why moving the lower casing to earlier makes the test pass, but it seems to, and, well, it makes as much sense as why they were failing in the first place. Signed-off-by: Ryan Anderson <[email protected]>
Signed-off-by: Ryan Anderson <[email protected]>
Signed-off-by: Ryan Anderson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. This do bring new complexity to our core route logic.
I am thinking is that possible to use the early_header_mutation_extensions
to cover this requirement? You can complete the rewrite of the host header in the extension and use the host header after rewrite for vhost searching.
If you don't want to modify the actual host
value. We can add a new vhost_header
field at the RouteConfigration
to specify the header key that be used to get host value for vhost searching. (by default, :authority
/host
). Then you can store the rewritten value in the specified header.
A simple vhost_header
would much easier to understand and could also achieve your target.
// be used to implement multiple-wildcard matching, by converting | ||
// all but one of the wildcards into a static string. | ||
// This is similar to ignore_port_in_host_matching (above), but more flexible. | ||
repeated type.matcher.v3.RegexMatchAndSubstitute host_simplification_rules = 18; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happens if there are multiple rules? Will the final result will be used to find vhost or every rule's result will be used one by one?
And at least give it a meaning for name like host_rewrite_for_matching
or something.
I don't see a way with the existing extensions to copy/mutate the value of one header by taking input from a different header, so that would also mean at least one new extension as well, possibly a few new extensions.
If it was just a |
ping @wbpcode I think this is waiting for your input |
I've also created #41280 for the alternate approach. I think I still like this one better, because it's more obvious how to use it to support multiple wildcards, but I should be able to work with the other. |
sorry for the delay. I am OOO for vacation.
Substitution formatter could be used in the header mutation rules. It could help you to mutate header with other headers/attributes. |
complexity will not disappear. But at least we can move it out from the core. |
Commit Message:
http conn man: Introduce host simplification rules
Today, Envoy only supports a single wildcard ("*") in virtual host domain entries, which must be either a prefix or suffix.. This limitation greatly simplifies the virtual host matching logic, but it is inherently limiting in some ways.
This change introduces a repeated list of "host simplification rules" at the RouteConfiguration level, that provide a way to substitute away other dynamic portions of the domain without changing what is sent upstream.
For example, to match something like
*.foo.*.example.org
you might write a simplification rule like:([^.]*[.]foo[.])([^.]+)([.]example[.]org)
with a substitution of\1bar\3
. This then allows a virtual host domain entry of*.foo.bar.example.org
to matchbaz.foo.bar.example.org
orwowza.foo.qux.example.org
.Host simplification rules are processed in the order they are defined.
Additional Description:
Risk Level: low (minimal new code, mostly reused, optional feature that has no impact if not enabled)
Testing: tests added
Docs Changes: proto docs update
Release Notes: changelog updated
Platform Specific Features: none
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]
I think everything is covered here.