Skip to content

feat(pgsql): resort to servers with weight = 0 if all connections are used #4980

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: v3.0
Choose a base branch
from

Conversation

TPXP
Copy link

@TPXP TPXP commented Jun 4, 2025

Hello,

We have an Aurora PostgreSQL cluster with fixed-size instances, and sometimes have to face large traffic spikes (like 2x or 3x the usual one). Queries are hard to cache, we've come to the conclusion that RDS Serveless v2 could be a good solution for our problem as it's said to scale up very fast. Our plan is to use the fixed instance for normal traffic and send all overflow traffic (which we can detect when we reach a certain amount of running queries on the fixed size node) to the serverless instance, getting it to hopefully scale as needed and save costs as we don't have to constantly over-provision stuff.

Here's the query routing we want to implement

  • If max_connections hasn't been reached on server 1, send the query there
  • Else, send the query to server 2

Query rules don't work here as they are applied before the query is routed.

Design

This Pull Request proposes to change the behavior of ProxySQL for servers with weight 0. Currently, servers with weight 0 are configured but are never selected by the load balancing algorithm. With this change, these servers can be selected if all other servers have reached max_connections. These server's max_connections values are still respected (a server reaching max_connections is not added to the candidates list).

Alternative implementations

An alternative implementation could be to set a priority field on the server object and sort servers by this field when selecting them. My knowledge of the codebase is a it too limited for that though. This implementation works for me, but this could be a nice enhancement.

Testing protocol

Spin up 2 databases in the same hostgroup, give weights 0 and 10, set max_connections to 5 and 100 then launch 20 slow queries in parallel (e.g. select pg_sleep(2)). The first server should process 5 of them and the other should process 15 of them.

Testing files: https://gist.github.com/TPXP/598c9135a57e957dea2cf8b495a61562

docker compose up --build
# After it started, use another terminal
docker compose exec node bash
yarn
node test.js

Other notes

I've also added a small improvement to the error message if we can't locate OpenSSL headers.

TPXP added 2 commits June 4, 2025 12:16
Somehow this is not part of build essentials, suggest it just in case
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @TPXP, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini or gemini-code-assist here, providing a summary of this pull request.

This pull request introduces a new feature for PostgreSQL hostgroups in ProxySQL to handle scenarios where primary servers become saturated. The core change allows servers configured with a weight of 0 to act as fallback servers. Normally, servers with weight=0 are never selected by the load balancing algorithm. With this change, if all servers with weight > 0 within a hostgroup have reached their max_connections limit, ProxySQL will then consider and route queries to available servers that have weight=0 (provided they themselves have not reached their max_connections). This enables a pattern useful for setups like Aurora PostgreSQL, where a fixed-size instance can handle baseline traffic (weight > 0) and a serverless instance can handle overflow traffic (weight = 0) when the primary is full.

Additionally, a minor improvement is included to provide a more helpful error message when OpenSSL headers are not found during the build process.

Highlights

  • Fallback to Weight 0 Servers: Implements logic in the PostgreSQL hostgroup manager (PgSQL_HostGroups_Manager.cpp) to select servers with weight=0 as a fallback mechanism when all servers with weight > 0 in the same hostgroup have reached their max_connections limit.
  • Improved Build Error Message: Updates the error message displayed when OpenSSL headers are not found during the build, suggesting that pkg-config might be missing.

Changelog

  • common_mk/openssl_flags.mk
    • Modified the error message for missing OpenSSL headers (SSL_IDIR) to include a suggestion to check for pkg-config.
  • lib/PgSQL_HostGroups_Manager.cpp
    • In the get_random_MySrvC function, added a conditional block (starting at line 1977) that checks if no servers with weight > 0 were available (sum == 0) but there are still potential candidates (num_candidates > 0).
    • If the condition is met, it logs an informational message.
    • It then iterates through the available candidates (which must have weight=0 in this scenario) and temporarily increases their weight by 1, making them eligible for selection in the subsequent weighted random distribution logic.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Weight zero waits unseen,
Till max connections are keen.
Then fallback takes hold,
A story unfolds,
For traffic, a new routine.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to allow servers with weight == 0 to handle traffic when all servers with weight > 0 in a hostgroup are at max_connections. This is a useful feature for handling overflow traffic, as described in the PR. The implementation modifies the server selection logic in PgSQL_HostGroups_Manager.cpp.

One minor improvement to an OpenSSL-related error message in the build system is also included and looks good.

The core logic change for server selection needs careful consideration regarding the persistence of weight modifications, which is detailed in a review comment.

Summary of Findings

  • Persistent Weight Modification for Fallback Servers: The current implementation modifies the weight of weight == 0 servers to make them selectable. This change appears to be persistent in memory, which could alter load balancing behavior after an overflow event, causing these servers to compete with primary servers even when primaries have capacity. This might not align with a strict "overflow-only" routing strategy.
  • Improved OpenSSL Error Message (Not Commented): The change in common_mk/openssl_flags.mk to include pkg-config in the error message for missing OpenSSL headers is a good usability improvement. This was assessed as low severity and per review settings, a direct comment was not added.
  • Testing Protocol Clarity (Informational): The testing protocol describes Server B (weight 0) handling 15 queries while having max_connections = 5. This seems contradictory, as the code and PR description state max_connections is respected. This might be a typo in the test description or an aspect of the test setup (e.g., serverless scaling) that isn't fully captured by the max_connections parameter in ProxySQL for that specific server in the test. This was not commented on directly as it pertains to the test description rather than the code diff itself, but worth noting for clarity.

Merge Readiness

The pull request introduces a valuable feature for handling traffic overflow. However, there's a significant concern regarding the persistent modification of server weights, which could lead to unintended load balancing behavior after an overflow event. I've marked this as a high severity issue and recommend addressing it before merging. I am unable to approve pull requests, so please ensure further review and approval from other maintainers after addressing the feedback. The improvement to the OpenSSL error message is a nice touch.

@renecannao
Copy link
Contributor

Can one of the admins verify this patch?

@TPXP TPXP force-pushed the v3.0-backup-srv branch from 2c46b1a to 7bdb9ab Compare June 4, 2025 17:26
@TPXP TPXP changed the title feat(pgsql): resort to servers with weight == 0 if all connections are used feat(pgsql): resort to servers with weight = 0 if all connections are used Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants