Skip to content

Conversation

pugmajere
Copy link
Contributor

@pugmajere pugmajere commented Aug 28, 2025

Commit Message:
http conn man: Introduce host simplification rules

Today, Envoy only supports a single wildcard ("*") in virtual host domain entries, which must be either a prefix or suffix.. This limitation greatly simplifies the virtual host matching logic, but it is inherently limiting in some ways.

This change introduces a repeated list of "host simplification rules" at the RouteConfiguration level, that provide a way to substitute away other dynamic portions of the domain without changing what is sent upstream.

For example, to match something like
*.foo.*.example.org you might write a simplification rule like: ([^.]*[.]foo[.])([^.]+)([.]example[.]org) with a substitution of \1bar\3. This then allows a virtual host domain entry of *.foo.bar.example.org to match
baz.foo.bar.example.org or wowza.foo.qux.example.org.

Host simplification rules are processed in the order they are defined.

Additional Description:
Risk Level: low (minimal new code, mostly reused, optional feature that has no impact if not enabled)
Testing: tests added
Docs Changes: proto docs update
Release Notes: changelog updated
Platform Specific Features: none
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]
I think everything is covered here.

Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #40874 was opened by pugmajere.

see: more, trace.

Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #40874 was opened by pugmajere.

see: more, trace.

@pugmajere
Copy link
Contributor Author

This is in draft at the moment because my primary build environment is on MacOS, which is just an absolute nightmare for running tests. Give me a bit to finish it up.

@pugmajere pugmajere force-pushed the randerson-host-simplification-rules branch from 697975a to 35ee254 Compare September 5, 2025 21:20
@pugmajere pugmajere changed the title Introduce host simplification rules router: Introduce host simplification rules Sep 5, 2025
@pugmajere pugmajere changed the title router: Introduce host simplification rules http conn man: Introduce host simplification rules Sep 5, 2025
@pugmajere pugmajere marked this pull request as ready for review September 5, 2025 23:25
Today, Envoy only supports a single wildcard ("*") in virtual host
domain entries, which must be either a prefix or suffix. This
limitation greatly simplifies the virtual host matching logic, but it
is also inherently limiting.

This change introduces a repeated list of "host simplification rules"
at the RouteConfiguration level, that provide a way to substitute away
other dynamic portions of the domain without changing what is sent
upstream.

For example, to match something like
`*.foo.*.example.org` you might write a simplification rule like:
`([^.]+[.]foo[.])([^.]+)([.]example[.]org)` with a substitution of
`\1bar\3`. This then allows a virtual host domain entry of
`*.foo.bar.example.org` to match
`baz.foo.bar.example.org` or `wowza.foo.qux.example.org`.

Host simplification rules are processed in the order they are defined.

Signed-off-by: Ryan Anderson <[email protected]>
@pugmajere pugmajere force-pushed the randerson-host-simplification-rules branch from 35ee254 to e6f6f9d Compare September 5, 2025 23:28
@pugmajere
Copy link
Contributor Author

Rebased (for the last time, hopefully).

@alyssawilk may be interested in this, since I'm pretty sure this is copying an idea she implemented years ago for GFE.

// be used to implement multiple-wildcard matching, by converting
// all but one of the wildcards into a static string.
// This is similar to ignore_port_in_host_matching (above), but more flexible.
repeated type.matcher.v3.RegexMatchAndSubstitute host_simplification_rules = 18;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could benefit from an example I think. regexes are pretty hard to understand (for most people), and we need to give some guidance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fair. I wrote more words, though I'm unsure on how the formatting will work out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also slightly improved the example from the one I used in the tests by putting the . outside the capture groups, so it would be more obvious what the substitution itself was doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happens if there are multiple rules? Will the final result will be used to find vhost or every rule's result will be used one by one?

And at least give it a meaning for name like host_rewrite_for_matching or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final result; I think the updated comments explain that better now, at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simplification_rules_for_host_matching?
I'm wary of host_rewrite... because it doesn't do the rewriting (intentionally) for the upstream request.... though I suppose these rules don't have to be used for "simplification", either.

Along with some more words of explanation on how this all works.

Signed-off-by: Ryan Anderson <[email protected]>
@pugmajere
Copy link
Contributor Author

Do the GCC CI builds use a different regex engine?

Signed-off-by: Ryan Anderson <[email protected]>
Signed-off-by: Ryan Anderson <[email protected]>
Signed-off-by: Ryan Anderson <[email protected]>
// An example may help:
// > host_simplification_rules:
// > - pattern:
// > regex: "^(foo)[.]([^.]+)[.](example[.]org)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember seeing some discussions about the disadvantages of using general-regex matching (for non-prefix/non-suffix) use-cases, but I cannot recall the details, and whether it was related to host-matching.
cc @yanavlasov @mattklein123 may have more context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a proposal a while back about using domain_regex as a list in virtual hosts; this is arguably somewhat similar, but hopefully done in a way that the host matching algorithm itself can be changed to use alternate structures (a trie comes to mind, for example), where a list of regexs in the virtualhosts doesn't allow that.

Signed-off-by: Ryan Anderson <[email protected]>
@pugmajere
Copy link
Contributor Author

I'm not sure I understand how the fuzz tests work, and that seems to be failing on the corpus step. Is there a reference to how those are supposed to be updated?

I don't understand why moving the lower casing to earlier makes the
test pass, but it seems to, and, well, it makes as much sense as why
they were failing in the first place.

Signed-off-by: Ryan Anderson <[email protected]>
Copy link
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. This do bring new complexity to our core route logic.

I am thinking is that possible to use the early_header_mutation_extensions to cover this requirement? You can complete the rewrite of the host header in the extension and use the host header after rewrite for vhost searching.
If you don't want to modify the actual host value. We can add a new vhost_header field at the RouteConfigration to specify the header key that be used to get host value for vhost searching. (by default, :authority/host). Then you can store the rewritten value in the specified header.

A simple vhost_header would much easier to understand and could also achieve your target.

// be used to implement multiple-wildcard matching, by converting
// all but one of the wildcards into a static string.
// This is similar to ignore_port_in_host_matching (above), but more flexible.
repeated type.matcher.v3.RegexMatchAndSubstitute host_simplification_rules = 18;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happens if there are multiple rules? Will the final result will be used to find vhost or every rule's result will be used one by one?

And at least give it a meaning for name like host_rewrite_for_matching or something.

@wbpcode wbpcode self-assigned this Sep 18, 2025
@pugmajere
Copy link
Contributor Author

pugmajere commented Sep 19, 2025

I am thinking is that possible to use the early_header_mutation_extensions to cover this requirement? You can complete the rewrite of the host header in the extension and use the host header after rewrite for vhost searching. If you don't want to modify the actual host value. We can add a new vhost_header field at the RouteConfigration to specify the header key that be used to get host value for vhost searching. (by default, :authority/host). Then you can store the rewritten value in the specified header.

I don't see a way with the existing extensions to copy/mutate the value of one header by taking input from a different header, so that would also mean at least one new extension as well, possibly a few new extensions.

A simple vhost_header would much easier to understand and could also achieve your target.

If it was just a which header to use for host matching option, I'd agree, but adding early_header_mutation_extensions to the mix seems like it ends up being just as complicated, if not worse.

@KBaichoo
Copy link
Contributor

ping @wbpcode

I think this is waiting for your input

@pugmajere
Copy link
Contributor Author

I've also created #41280 for the alternate approach.

I think I still like this one better, because it's more obvious how to use it to support multiple wildcards, but I should be able to work with the other.

@wbpcode
Copy link
Member

wbpcode commented Oct 3, 2025

sorry for the delay. I am OOO for vacation.

I don't see a way with the existing extensions to copy/mutate the value of one header by taking input from a different header, so that would also mean at least one new extension as well, possibly a few new extensions.

Substitution formatter could be used in the header mutation rules. It could help you to mutate header with other headers/attributes.

@wbpcode
Copy link
Member

wbpcode commented Oct 3, 2025

If it was just a which header to use for host matching option, I'd agree, but adding early_header_mutation_extensions to the mix seems like it ends up being just as complicated, if not worse.

complexity will not disappear. But at least we can move it out from the core.

@wbpcode wbpcode added waiting api and removed api labels Oct 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants