Skip to content

Extending support for rds_topology metadata handling #4992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: v3.0
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion include/MySQL_HostGroups_Manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -1077,7 +1077,7 @@ class MySQL_HostGroups_Manager : public Base_HostGroups_Manager<MyHGC> {
void set_Readyset_status(char *hostname, int port, enum MySerStatus status);
unsigned long long Get_Memory_Stats();

void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(const vector<tuple<string, int, int>>& new_servers);
void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(const vector<tuple<string, int, long, int>>& new_servers);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The addition of long to the tuple suggests that hostgroup IDs can now be larger than int. Ensure that all related code (e.g., database schema, other functions using hostgroup IDs) are updated to handle long values to prevent potential truncation or overflow issues.

void add_discovered_servers_to_mysql_servers_and_replication_hostgroups(const vector<tuple<string, int, long, int>>& new_servers);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aligned the data type with the existing extraction. Reading further in the code, it seems that this long is actually converted to uint_32 implicitly right after. Did this have a reason in the original implementation? Happy to change both the new and the existing variable to use int or unsigned int instead.


void update_group_replication_set_offline(char *_hostname, int _port, int _writer_hostgroup, char *error);
void update_group_replication_set_read_only(char *_hostname, int _port, int _writer_hostgroup, char *error);
Expand Down
31 changes: 25 additions & 6 deletions include/MySQL_Monitor.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,12 @@ struct cmp_str {
#define N_L_ASE 16

#define AWS_ENDPOINT_SUFFIX_STRING "rds.amazonaws.com"
#define QUERY_READ_ONLY_AND_AWS_TOPOLOGY_DISCOVERY "SELECT @@global.read_only read_only, id, endpoint, port from mysql.rds_topology"
#define QUERY_READ_ONLY_AND_AWS_RDS_TOPOLOGY_DISCOVERY "SELECT @@global.read_only read_only, id, endpoint, port from mysql.rds_topology"
#define QUERY_INNODB_READ_ONLY_AND_AWS_RDS_TOPOLOGY_DISCOVERY "SELECT @@global.innodb_read_only read_only, id, endpoint, port from mysql.rds_topology"
#define QUERY_READ_ONLY_AND_AWS_BLUE_GREEN_TOPOLOGY_DISCOVERY "SELECT @@global.read_only AS read_only, id, endpoint, port, role, status, version FROM mysql.rds_topology"
#define QUERY_INNODB_READ_ONLY_AND_AWS_BLUE_GREEN_TOPOLOGY_DISCOVERY "SELECT @@global.innodb_read_only AS read_only, id, endpoint, port, role, status, version FROM mysql.rds_topology"

#define SUPPORTED_AWS_RDS_TOPOLOGY_VERSION "1.0"
/*

Implementation of monitoring in AWS Aurora will be different than previous modules
Expand Down Expand Up @@ -203,7 +207,16 @@ enum MySQL_Monitor_State_Data_Task_Type {
MON_REPLICATION_LAG,
MON_GALERA,
MON_AWS_AURORA,
MON_READ_ONLY__AND__AWS_RDS_TOPOLOGY_DISCOVERY
MON_READ_ONLY__AND__AWS_RDS_TOPOLOGY_DISCOVERY,
MON_INNODB_READ_ONLY__AND__AWS_RDS_TOPOLOGY_DISCOVERY,
MON_READ_ONLY__AND__AWS_RDS_BLUE_GREEN_TOPOLOGY_DISCOVERY,
MON_INNODB_READ_ONLY__AND__AWS_RDS_BLUE_GREEN_TOPOLOGY_DISCOVERY,
};

enum MySQL_Monitor_Aws_Metadata_Check {
AWS_RDS_TOPOLOGY_CHECK,
AWS_RDS_BLUE_GREEN_DEPLOYMENT_STATE_CHECK,
NONE
};

enum class MySQL_Monitor_State_Data_Task_Result {
Expand Down Expand Up @@ -442,16 +455,20 @@ struct DNS_Resolve_Data {
unsigned int refresh_intv = 0;
};


class MySQL_Monitor {
public:
static std::string dns_lookup(const std::string& hostname, bool return_hostname_if_lookup_fails = true, size_t* ip_count = NULL);
static std::string dns_lookup(const char* hostname, bool return_hostname_if_lookup_fails = true, size_t* ip_count = NULL);
static bool update_dns_cache_from_mysql_conn(const MYSQL* mysql);
static void trigger_dns_cache_update();

void process_discovered_topology(const std::string& originating_server_hostname, const vector<MYSQL_ROW>& discovered_servers, int reader_hostgroup);
bool is_aws_rds_multi_az_db_cluster_topology(const std::vector<MYSQL_ROW>& discovered_servers);
void process_discovered_topology(const std::string& originating_server_hostname, const vector<MYSQL_ROW>& discovered_servers, const MySQL_Monitor_State_Data* mmsd, int num_fields);
bool is_aws_rds_multi_az_db_cluster_topology(const string& originating_server_hostname, const vector<tuple<string, int, long, int>>& discovered_servers);
bool is_aws_rds_topology_query_task(const MySQL_Monitor_State_Data_Task_Type& task_type);
bool mysql_row_matches_query_task(const unordered_set<string> &field_names, const MySQL_Monitor_State_Data_Task_Type &task_type);
void add_topology_query_to_task(MySQL_Monitor_State_Data_Task_Type &task_type);
bool is_aws_rds_topology_version_supported(const string& version);


private:
std::vector<table_def_t *> *tables_defs_monitor;
Expand Down Expand Up @@ -501,6 +518,8 @@ class MySQL_Monitor {
bool shutdown;
pthread_mutex_t mon_en_mutex;
bool monitor_enabled;
MySQL_Monitor_Aws_Metadata_Check rds_topology_check_type = MySQL_Monitor_Aws_Metadata_Check::AWS_RDS_TOPOLOGY_CHECK;
int topology_loop = 0;
SQLite3DB *admindb; // internal database
SQLite3DB *monitordb; // internal database
SQLite3DB *monitor_internal_db; // internal database
Expand Down Expand Up @@ -563,7 +582,7 @@ class MySQL_Monitor {
* Note: Calling init_async is mandatory before executing tasks asynchronously.
*/
void monitor_ping_async(SQLite3_result* resultset);
void monitor_read_only_async(SQLite3_result* resultset, bool do_discovery_check);
void monitor_read_only_async(SQLite3_result* resultset);
void monitor_replication_lag_async(SQLite3_result* resultset);
void monitor_group_replication_async();
void monitor_galera_async();
Expand Down
18 changes: 11 additions & 7 deletions lib/MySQL_HostGroups_Manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7355,23 +7355,27 @@ MySQLServers_SslParams * MySQL_HostGroups_Manager::get_Server_SSL_Params(char *h
* @param new_servers A vector of tuples where each tuple contains the values needed to add each new server.
*/
void MySQL_HostGroups_Manager::add_discovered_servers_to_mysql_servers_and_replication_hostgroups(
const vector<tuple<string, int, int>>& new_servers
const vector<tuple<string, int, long, int>>& new_servers
) {
int added_new_server = -1;

GloAdmin->mysql_servers_wrlock();
wrlock();

// Add the discovered server with default values
for (const tuple<string, int, int>& s : new_servers) {
for (const tuple<string, int, long, int>& s : new_servers) {
string host = std::get<0>(s);
uint16_t port = std::get<1>(s);
int port = std::get<1>(s);
long int hostgroup_id = std::get<2>(s);

srv_info_t srv_info { host.c_str(), port, "AWS RDS" };
srv_opts_t srv_opts { -1, -1, -1 };
int weight = std::get<3>(s);

srv_info_t srv_info { host.c_str(), (uint16_t)port, "AWS RDS" };
srv_opts_t srv_opts { weight, -1, -1 };

added_new_server = create_new_server_in_hg(hostgroup_id, srv_info, srv_opts);
int res = create_new_server_in_hg(hostgroup_id, srv_info, srv_opts);
if (added_new_server < 0) {
added_new_server = res;
}
Comment on lines +7375 to +7378

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for added_new_server has changed. Previously, it would reflect the status of the last server addition attempt. Now, it reflects the status of the first attempt if it was successful (0), or the error code of the first failure if all prior attempts also failed.

For example:

  • If create_new_server_in_hg returns 0 for success, -1 for failure:
    • Old: S, S, F -> added_new_server = -1
    • New: S, S, F -> added_new_server = 0 (from the first S)
    • Old: F, S, S -> added_new_server = 0
    • New: F, S, S -> added_new_server = -1 (from the first F)

Confirm if this change in how added_new_server is determined aligns with the intended behavior. If the goal is to know if any server was successfully added, a boolean flag might be clearer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but wouldn't F, S, S result in added_new_server = 0?
First server:
res = -1, added_new_server = -1 -> added_new_server = -1
Second server:
res = 0, added_new_server = -1 -> added_new_server = 0
Third server:
res = 0, added_new_server = 0 -> no change because added_new_server >= 0

The desired behavior is to update the internal structures if at least one new server was added. The previous implementation didn't do that.

}

// If servers were added, perform necessary updates to internal structures
Expand Down
Loading