feat(pgsql): resort to servers with weight = 0 if all connections are used #4980

TPXP · 2025-06-04T16:43:15Z

Hello,

We have an Aurora PostgreSQL cluster with fixed-size instances, and sometimes have to face large traffic spikes (like 2x or 3x the usual one). Queries are hard to cache, we've come to the conclusion that RDS Serveless v2 could be a good solution for our problem as it's said to scale up very fast. Our plan is to use the fixed instance for normal traffic and send all overflow traffic (which we can detect when we reach a certain amount of running queries on the fixed size node) to the serverless instance, getting it to hopefully scale as needed and save costs as we don't have to constantly over-provision stuff.

Here's the query routing we want to implement

If max_connections hasn't been reached on server 1, send the query there
Else, send the query to server 2

Query rules don't work here as they are applied before the query is routed.

Design

This Pull Request proposes to change the behavior of ProxySQL for servers with weight 0. Currently, servers with weight 0 are configured but are never selected by the load balancing algorithm. With this change, these servers can be selected if all other servers have reached max_connections. These server's max_connections values are still respected (a server reaching max_connections is not added to the candidates list).

Alternative implementations

An alternative implementation could be to set a priority field on the server object and sort servers by this field when selecting them. My knowledge of the codebase is a it too limited for that though. This implementation works for me, but this could be a nice enhancement.

Testing protocol

Spin up 2 databases in the same hostgroup, give weights 0 and 10, set max_connections to 5 and 100 then launch 20 slow queries in parallel (e.g. select pg_sleep(2)). The first server should process 5 of them and the other should process 15 of them.

Testing files: https://gist.github.com/TPXP/598c9135a57e957dea2cf8b495a61562

docker compose up --build
# After it started, use another terminal
docker compose exec node bash
yarn
node test.js

Other notes

I've also added a small improvement to the error message if we can't locate OpenSSL headers.

Somehow this is not part of build essentials, suggest it just in case

…r with weight=0, use it

gemini-code-assist

Hello @TPXP, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini or gemini-code-assist here, providing a summary of this pull request.

This pull request introduces a new feature for PostgreSQL hostgroups in ProxySQL to handle scenarios where primary servers become saturated. The core change allows servers configured with a weight of 0 to act as fallback servers. Normally, servers with weight=0 are never selected by the load balancing algorithm. With this change, if all servers with weight > 0 within a hostgroup have reached their max_connections limit, ProxySQL will then consider and route queries to available servers that have weight=0 (provided they themselves have not reached their max_connections). This enables a pattern useful for setups like Aurora PostgreSQL, where a fixed-size instance can handle baseline traffic (weight > 0) and a serverless instance can handle overflow traffic (weight = 0) when the primary is full.

Additionally, a minor improvement is included to provide a more helpful error message when OpenSSL headers are not found during the build process.

Highlights

Fallback to Weight 0 Servers: Implements logic in the PostgreSQL hostgroup manager (PgSQL_HostGroups_Manager.cpp) to select servers with weight=0 as a fallback mechanism when all servers with weight > 0 in the same hostgroup have reached their max_connections limit.
Improved Build Error Message: Updates the error message displayed when OpenSSL headers are not found during the build, suggesting that pkg-config might be missing.

Changelog

common_mk/openssl_flags.mk
- Modified the error message for missing OpenSSL headers (SSL_IDIR) to include a suggestion to check for pkg-config.
lib/PgSQL_HostGroups_Manager.cpp
- In the get_random_MySrvC function, added a conditional block (starting at line 1977) that checks if no servers with weight > 0 were available (sum == 0) but there are still potential candidates (num_candidates > 0).
- If the condition is met, it logs an informational message.
- It then iterates through the available candidates (which must have weight=0 in this scenario) and temporarily increases their weight by 1, making them eligible for selection in the subsequent weighted random distribution logic.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Weight zero waits unseen,
Till max connections are keen.
Then fallback takes hold,
A story unfolds,
For traffic, a new routine.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a mechanism to allow servers with weight == 0 to handle traffic when all servers with weight > 0 in a hostgroup are at max_connections. This is a useful feature for handling overflow traffic, as described in the PR. The implementation modifies the server selection logic in PgSQL_HostGroups_Manager.cpp.

One minor improvement to an OpenSSL-related error message in the build system is also included and looks good.

The core logic change for server selection needs careful consideration regarding the persistence of weight modifications, which is detailed in a review comment.

Summary of Findings

Persistent Weight Modification for Fallback Servers: The current implementation modifies the weight of weight == 0 servers to make them selectable. This change appears to be persistent in memory, which could alter load balancing behavior after an overflow event, causing these servers to compete with primary servers even when primaries have capacity. This might not align with a strict "overflow-only" routing strategy.
Improved OpenSSL Error Message (Not Commented): The change in common_mk/openssl_flags.mk to include pkg-config in the error message for missing OpenSSL headers is a good usability improvement. This was assessed as low severity and per review settings, a direct comment was not added.
Testing Protocol Clarity (Informational): The testing protocol describes Server B (weight 0) handling 15 queries while having max_connections = 5. This seems contradictory, as the code and PR description state max_connections is respected. This might be a typo in the test description or an aspect of the test setup (e.g., serverless scaling) that isn't fully captured by the max_connections parameter in ProxySQL for that specific server in the test. This was not commented on directly as it pertains to the test description rather than the code diff itself, but worth noting for clarity.

Merge Readiness

The pull request introduces a valuable feature for handling traffic overflow. However, there's a significant concern regarding the persistent modification of server weights, which could lead to unintended load balancing behavior after an overflow event. I've marked this as a high severity issue and recommend addressing it before merging. I am unable to approve pull requests, so please ensure further review and approval from other maintainers after addressing the feedback. The improvement to the OpenSSL error message is a nice touch.

lib/PgSQL_HostGroups_Manager.cpp

renecannao · 2025-06-04T16:46:04Z

Can one of the admins verify this patch?

TPXP added 2 commits June 4, 2025 12:16

enh: update missing openssl error to suggest missing pkg-config

c029e66

Somehow this is not part of build essentials, suggest it just in case

wip: if all servers have reached max_connections but there is a serve…

dbe0269

…r with weight=0, use it

gemini-code-assist bot reviewed Jun 4, 2025

View reviewed changes

gemini-code-assist bot suggested changes Jun 4, 2025

View reviewed changes

lib/PgSQL_HostGroups_Manager.cpp Outdated Show resolved Hide resolved

ref: revamp the weight==0 behaviour to avoid touching weights

7bdb9ab

TPXP force-pushed the v3.0-backup-srv branch from 2c46b1a to 7bdb9ab Compare June 4, 2025 17:26

TPXP changed the title ~~feat(pgsql): resort to servers with weight == 0 if all connections are used~~ feat(pgsql): resort to servers with weight = 0 if all connections are used Jun 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(pgsql): resort to servers with weight = 0 if all connections are used #4980

feat(pgsql): resort to servers with weight = 0 if all connections are used #4980

Uh oh!

TPXP commented Jun 4, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

renecannao commented Jun 4, 2025

Uh oh!

Uh oh!

feat(pgsql): resort to servers with weight = 0 if all connections are used #4980

Are you sure you want to change the base?

feat(pgsql): resort to servers with weight = 0 if all connections are used #4980

Uh oh!

Conversation

TPXP commented Jun 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design

Alternative implementations

Testing protocol

Other notes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Changelog

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Summary of Findings

Merge Readiness

Uh oh!

Uh oh!

renecannao commented Jun 4, 2025

Uh oh!

Uh oh!

TPXP commented Jun 4, 2025 •

edited

Loading