feat(pgsql): resort to servers with weight = 0 if all connections are used #4980
base: v3.0
Conversation
Somehow this is not part of build essentials, suggest it just in case
…r with weight=0, use it
Hello @TPXP, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini or gemini-code-assist here, providing a summary of this pull request.
This pull request introduces a new feature for PostgreSQL hostgroups in ProxySQL to handle scenarios where primary servers become saturated. The core change allows servers configured with a `weight` of 0 to act as fallback servers. Normally, servers with `weight=0` are never selected by the load balancing algorithm. With this change, if all servers with `weight > 0` within a hostgroup have reached their `max_connections` limit, ProxySQL will then consider and route queries to available servers that have `weight=0` (provided they themselves have not reached their `max_connections`). This enables a pattern useful for setups like Aurora PostgreSQL, where a fixed-size instance can handle baseline traffic (`weight > 0`) and a serverless instance can handle overflow traffic (`weight = 0`) when the primary is full.
Additionally, a minor improvement is included to provide a more helpful error message when OpenSSL headers are not found during the build process.
Highlights
- Fallback to Weight 0 Servers: Implements logic in the PostgreSQL hostgroup manager (`PgSQL_HostGroups_Manager.cpp`) to select servers with `weight=0` as a fallback mechanism when all servers with `weight > 0` in the same hostgroup have reached their `max_connections` limit.
- Improved Build Error Message: Updates the error message displayed when OpenSSL headers are not found during the build, suggesting that `pkg-config` might be missing.
Changelog
- `common_mk/openssl_flags.mk`
  - Modified the error message for missing OpenSSL headers (`SSL_IDIR`) to include a suggestion to check for `pkg-config`.
- `lib/PgSQL_HostGroups_Manager.cpp`
  - In the `get_random_MySrvC` function, added a conditional block (starting at line 1977) that checks if no servers with `weight > 0` were available (`sum == 0`) but there are still potential candidates (`num_candidates > 0`).
  - If the condition is met, it logs an informational message.
  - It then iterates through the available candidates (which must have `weight=0` in this scenario) and temporarily increases their `weight` by 1, making them eligible for selection in the subsequent weighted random distribution logic.
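The fallback step described in the changelog can be sketched as follows. This is a hypothetical simplification of the added block in `get_random_MySrvC`, not the PR's actual code: when the weight sum is 0 but candidates remain, each candidate's weight is bumped by 1 so the existing weighted-random path can select among them.

```cpp
#include <vector>

// Hypothetical, simplified candidate type (not ProxySQL's).
struct Candidate {
    int weight;
};

// Fallback sketch: if every weight>0 server was filtered out for
// being at max_connections, the remaining candidates necessarily
// have weight == 0. Bumping each weight by 1 makes them eligible
// for the weighted-random selection that follows.
// NOTE: as the review below points out, mutating the stored weight
// would persist beyond this one selection.
unsigned apply_weight_zero_fallback(std::vector<Candidate>& candidates,
                                    unsigned sum) {
    if (sum == 0 && !candidates.empty()) {
        for (auto& c : candidates) {
            c.weight += 1;  // weight was 0, becomes 1
            sum += 1;
        }
    }
    return sum;  // new total weight used by the selection loop
}
```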
Weight zero waits unseen,
Till max connections are keen.
Then fallback takes hold,
A story unfolds,
For traffic, a new routine.
Code Review
This pull request introduces a mechanism to allow servers with `weight == 0` to handle traffic when all servers with `weight > 0` in a hostgroup are at `max_connections`. This is a useful feature for handling overflow traffic, as described in the PR. The implementation modifies the server selection logic in `PgSQL_HostGroups_Manager.cpp`.

One minor improvement to an OpenSSL-related error message in the build system is also included and looks good.

The core logic change for server selection needs careful consideration regarding the persistence of weight modifications, which is detailed in a review comment.
Summary of Findings
- Persistent Weight Modification for Fallback Servers: The current implementation modifies the `weight` of `weight == 0` servers to make them selectable. This change appears to be persistent in memory, which could alter load balancing behavior after an overflow event, causing these servers to compete with primary servers even when primaries have capacity. This might not align with a strict "overflow-only" routing strategy.
- Improved OpenSSL Error Message (Not Commented): The change in `common_mk/openssl_flags.mk` to include `pkg-config` in the error message for missing OpenSSL headers is a good usability improvement. This was assessed as low severity and, per review settings, a direct comment was not added.
- Testing Protocol Clarity (Informational): The testing protocol describes Server B (weight 0) handling 15 queries while having `max_connections = 5`. This seems contradictory, as the code and PR description state `max_connections` is respected. This might be a typo in the test description or an aspect of the test setup (e.g., serverless scaling) that isn't fully captured by the `max_connections` parameter in ProxySQL for that specific server in the test. This was not commented on directly as it pertains to the test description rather than the code diff itself, but it is worth noting for clarity.
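One way to address the persistence concern raised above would be to compute an effective weight locally instead of mutating the server object, so the bump lasts only for the current selection. This is a hypothetical alternative sketch, not code from the PR:

```cpp
#include <vector>

// Hypothetical, simplified candidate type (not ProxySQL's).
struct Candidate {
    int weight;
};

// Non-persistent variant: derive per-call effective weights instead
// of mutating the candidates, so an overflow event does not change
// future load balancing for these servers.
std::vector<int> effective_weights(const std::vector<Candidate>& c) {
    unsigned sum = 0;
    for (const auto& s : c) sum += s.weight;
    std::vector<int> w;
    w.reserve(c.size());
    for (const auto& s : c)
        w.push_back(sum == 0 ? 1 : s.weight);  // bump only in the local copy
    return w;
}
```

The selection loop would then consume this local vector; the stored `weight` of each server stays 0, preserving strict overflow-only routing once the primaries free up capacity.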
Merge Readiness
The pull request introduces a valuable feature for handling traffic overflow. However, there's a significant concern regarding the persistent modification of server weights, which could lead to unintended load balancing behavior after an overflow event. I've marked this as a high-severity issue and recommend addressing it before merging. I am unable to approve pull requests, so please ensure further review and approval from other maintainers after addressing the feedback. The improvement to the OpenSSL error message is a nice touch.
Can one of the admins verify this patch?
Hello,
We have an Aurora PostgreSQL cluster with fixed-size instances, and sometimes have to face large traffic spikes (like 2x or 3x the usual one). Queries are hard to cache, so we've come to the conclusion that RDS Serverless v2 could be a good solution for our problem, as it's said to scale up very fast. Our plan is to use the fixed instance for normal traffic and send all overflow traffic (which we can detect when we reach a certain number of running queries on the fixed-size node) to the serverless instance, getting it to hopefully scale as needed and saving costs since we don't have to constantly over-provision.
Here's the query routing we want to implement
Query rules don't work here as they are applied before the query is routed.
Design
This Pull Request proposes to change the behavior of ProxySQL for servers with weight 0. Currently, servers with weight 0 are configured but are never selected by the load balancing algorithm. With this change, these servers can be selected if all other servers have reached `max_connections`. These servers' `max_connections` values are still respected (a server reaching `max_connections` is not added to the candidates list).

Alternative implementations
An alternative implementation could be to set a `priority` field on the server object and sort servers by this field when selecting them. My knowledge of the codebase is a bit too limited for that, though. This implementation works for me, but this could be a nice enhancement.

Testing protocol
Spin up 2 databases in the same hostgroup, give them weights 0 and 10, set `max_connections` to 5 and 100, then launch 20 slow queries in parallel (e.g. `select pg_sleep(2)`). The first server should process 5 of them and the other should process 15 of them.

Testing files: https://gist.github.com/TPXP/598c9135a57e957dea2cf8b495a61562
Other notes
I've also added a small improvement to the error message if we can't locate OpenSSL headers.