Skip to content

This commit introduces three tokens that can be used in query rules regex #4939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: v3.0
Choose a base branch
from

Conversation

y-trudeau
Copy link

@y-trudeau y-trudeau commented May 6, 2025

These tokens are @USERNAME@, @HOSTNAME@ and @CN@. The username is readily available from MySQL Data Stream object but some extra code is needed to obtain @HOSTNAME@ and @CN@. The @HOSTNAME@ comes a DNS reverse lookup at the client connection when the new global variables mysql-dns_reverse_lookup or pgsql-dns_reverse_lookup are set set to true. The @CN@ value is obtained from the client SSL certificate when one is used. This allows query rules like:

"UPDATE Jobs SET.*WHERE hostname = '@CN@'"

The extra layer of authorization simplifies greatly the required ProxySQL query rules. The newly introduced Pg support in v3.0 has not been fully tested. The support for ipv6 is also missing.

…egular

expressions. These tokens are @USERNAME@, @hostname@ and @cn@.  The username
is readily available from MySQL Data Stream object but some extra code is needed
to obtain @hostname@ and @cn@. The @hostname@ comes a DNS reverse lookup at
the client connection when the new global variables mysql-dns_reverse_lookup or
pgsql-dns_reverse_lookup are set set to true. The @cn@ value is obtained from
the client SSL certificate when one is used. This allows query rules like:

"UPDATE Jobs SET.*WHERE hostname = '@cn@'"

The extra layer of authorization simplifies greatly the required ProxySQL
query rules.
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @y-trudeau, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces three new tokens (@USERNAME@, @hostname@, and @cn@) that can be used in query rules regex to simplify authorization. The username is obtained from the MySQL Data Stream object. The hostname requires a DNS reverse lookup, enabled by the new global variables mysql-dns_reverse_lookup or pgsql-dns_reverse_lookup. The CN value is extracted from the client SSL certificate when one is used. The changes affect both MySQL and PostgreSQL, adding new fields to store hostname and x509 subject name, and modifying the query processing logic to replace the tokens with the corresponding values before compiling the regex.

Highlights

  • New Tokens for Query Rules: Introduces @USERNAME@, @hostname@, and @cn@ tokens for use in query rules regex, enhancing authorization capabilities.
  • DNS Reverse Lookup: Adds a dns_reverse_lookup boolean variable to MySQL and PgSQL threads, enabling hostname resolution via DNS reverse lookup.
  • SSL Certificate Handling: Improves SSL certificate handling to extract the x509 subject name (CN) for use in query rules.
  • Query Processor Modification: Modifies the query processor to replace tokens in match_digest and match_pattern with actual values before regex compilation.

Changelog

Click here to see the changelog
  • include/MySQL_Data_Stream.h
    • Added x509_subject_name field to store the SSL certificate subject name.
    • Added hostname field to the addr struct for storing the resolved hostname.
  • include/MySQL_Thread.h
    • Added dns_reverse_lookup boolean variable to control DNS reverse lookup.
  • include/PgSQL_Data_Stream.h
    • Added x509_subject_name field to store the SSL certificate subject name.
    • Added hostname field to the addr struct for storing the resolved hostname.
  • include/PgSQL_Thread.h
    • Added dns_reverse_lookup boolean variable to control DNS reverse lookup.
  • include/proxysql_structs.h
    • Added pgsql_thread___dns_reverse_lookup and mysql_thread___dns_reverse_lookup thread-local boolean variables.
  • lib/MySQL_Thread.cpp
    • Included <netdb.h>, <string.h>, and <arpa/inet.h> for DNS reverse lookup functionality.
    • Added dns_reverse_lookup to the list of thread variables.
    • Initialized variables.dns_reverse_lookup to false in the MySQL_Threads_Handler constructor.
    • Added dns_reverse_lookup to the VariablesPointers_bool map.
    • Refreshed the dns_reverse_lookup variable during thread variable refresh.
    • Implemented DNS reverse lookup in listener_handle_new_connection when mysql_thread___dns_reverse_lookup is enabled.
  • lib/PgSQL_Data_Stream.cpp
    • Extracted and stored the SSL certificate subject name in x509_subject_name during SSL handshake.
    • Initialized addr.hostname to NULL in the constructor.
    • Freed addr.hostname and x509_subject_name in the destructor.
  • lib/PgSQL_Thread.cpp
    • Added dns_reverse_lookup to the list of thread variables.
    • Initialized variables.dns_reverse_lookup to false in the PgSQL_Threads_Handler constructor.
    • Added dns_reverse_lookup to the VariablesPointers_bool map.
    • Refreshed the pgsql_thread___dns_reverse_lookup variable during thread variable refresh.
    • Implemented DNS reverse lookup in listener_handle_new_connection when pgsql_thread___dns_reverse_lookup is enabled.
  • lib/Query_Processor.cpp
    • Included <string> for string manipulation.
    • Implemented replace_token and replace_all_token functions to replace tokens in query rules.
    • Modified compile_query_rule to use replace_all_token before compiling the regex, passing hostname, username, and x509_subject_name.
    • Modified process_query to pass hostname, username, and x509_subject_name to compile_query_rule.
  • lib/mysql_data_stream.cpp
    • Enabled proxy_info for SSL handshake completion.
    • Extracted and stored the SSL certificate subject name in x509_subject_name during SSL handshake.
    • Initialized addr.hostname to NULL in the constructor.
    • Freed addr.hostname and x509_subject_name in the destructor.
    • Added hostname to the JSON output in get_client_myds_info_json.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A token's journey,
From regex to runtime,
Rules now empowered.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@renecannao
Copy link
Contributor

Can one of the admins verify this patch?

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces tokens for use in query rules regex, which is a valuable addition. However, there are some areas that need attention before merging, including missing IPv6 support, untested PgSQL support, and non-optimal CN extraction. The code changes themselves seem reasonable, but thorough testing and optimization are necessary.

Summary of Findings

  • Missing IPv6 Support: The pull request description mentions that IPv6 support is missing. This should be addressed to ensure broader compatibility.
  • Untested PgSQL Support: The pull request description mentions that the newly introduced Pg support in v3.0 has not been fully tested. This should be addressed to ensure proper functionality.
  • Non-optimal CN Extraction: The code extracts the CN from the x509_subject_name every time, which is not optimal. It should be extracted in mysql_data_stream.cpp.

Merge Readiness

The pull request is not ready for merging. The missing IPv6 support and untested PgSQL support are significant issues that need to be addressed. Additionally, the non-optimal CN extraction should be optimized. I am unable to approve this pull request, and recommend that it not be merged until these issues are resolved and the code has been thoroughly tested.

@y-trudeau
Copy link
Author

All the tokens are upper case, I don't what happened with the formatting in git.

@renecannao
Copy link
Contributor

Hi @y-trudeau .
Thank you for the PR.

I edited your first comment to add quotes, so github doesn't change the formatting.

I didn't review it in details, but I have some initial feedback.

  1. in the new global variables name I would somehow specify that it refers to client connections. Maybe mysql-client_dns_reverse_lookup and pgsql-client_dns_reverse_lookup
  2. a DNS lookup is a blocking operation. And in fact, DNS lookups are the only blocking operations in ProxySQL. Historically, and especially in large deployments, when ProxySQL needed to perform DNS lookups to connect to backend, this could lead to DNS servers being overloaded that could in turn lead to an incident . Especially during temporary network glitches. This no matter the fact that ProxySQL tries to reuse backend connections as much as possible, thus limiting the need for new backend connections and DNS lookups. For this reason we introduced a DNS cache for backend connections. I believe the same is needed for frontend connections: actually, for frontend connections this is even more important, as frontend connections tend to be short lived. The good news is that we already have a "cache" used for frontend connections: client_host_cache . Right now this cache only counts errors, but it can be easily extended to store DNS reverse lookups
  3. I have some reservation about using @ as hardcoded string delimiter , or without a way to explicitly enable/disable the token replacement. I think it is easier to enable/disable the feature on a per query rule basis instead of making the delimiter configurable. I suggest to add add a new value in re_modifiers to specify that token replacement is enabled.

Although I understand what the feature is adding, it is still not clear to me what problem is trying to solve.
I mean, I understand it allows to match queries like "UPDATE Jobs SET.*WHERE hostname = '@CN@'" , but I don't understand what is the underlying use case it would solve.
Can you please clarify this?
Once I know the use case maybe I can suggest a different approach.

Thanks.

@y-trudeau
Copy link
Author

Hi @renecannao,
the patch has been developed for a customer which has a large fleet of application servers requesting work from the database based on their hostname. They essentially wants to add a layer of authorization in case a host becomes compromised. If a host is compromised, only the records belonging to that specific host would be at risk or at least, there would be a way to limit the impacts.

As of the changes you are proposing, I agree with them. To be honest, I noticed the client_host_cache but couldn't figure how to use it properly.

Regards

@y-trudeau
Copy link
Author

Hi, any progress with this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants