Description
ProxySQL version 1.4.13-15-g69d4207, codename Truls
Ubuntu 14.04
I believe there is a bug in how the hostgroup manager or Monitor treats a recovered master backend: its status gets stuck as OFFLINE_HARD in the reader hostgroup until ProxySQL is restarted. This seems to violate the purpose of mysql-monitor_writer_is_also_reader.
Steps to reproduce:
- Begin with the master in hostgroup1, healthy and ONLINE, with mysql-monitor_writer_is_also_reader = false:
ProxySQL Admin> select * from mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1 | 10.86.179.223 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)
ProxySQL Admin> select * from stats_mysql_connection_pool;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1 | 10.86.179.223 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 690 |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
1 row in set (0.00 sec)
- Shut down the master: service mysql stop
- ProxySQL moves the master to hostgroup2, where the connection pool shows it as SHUNNED, and marks it OFFLINE_HARD in hostgroup1:
ProxySQL Admin> select * from mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2 | 10.86.179.223 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)
ProxySQL Admin> select * from stats_mysql_connection_pool;
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1 | 10.86.179.223 | 3306 | OFFLINE_HARD | 0 | 0 | 1 | 0 | 54 | 2172 | 55008 | 730 |
| 2 | 10.86.179.223 | 3306 | SHUNNED | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 730 |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
2 rows in set (0.00 sec)
- Start MySQL back up: service mysql start
- ProxySQL moves the master back to hostgroup1 and marks it ONLINE. That's good. But stats_mysql_connection_pool says it's still in hostgroup2 as OFFLINE_HARD:
ProxySQL Admin> select * from mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1 | 10.86.179.223 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)
ProxySQL Admin> select * from stats_mysql_connection_pool;
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1 | 10.86.179.223 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 650 |
| 2 | 10.86.179.223 | 3306 | OFFLINE_HARD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 650 |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
2 rows in set (0.00 sec)
Problem: after a recovery like this, the master stays in hostgroup2 as OFFLINE_HARD indefinitely. I have to restart ProxySQL before the connection pool stops listing the master in hostgroup2. This causes problems because we monitor stats_mysql_connection_pool to make sure all backends are ONLINE, so our monitor thinks we have a down backend. Any time the master has a brief outage, we get a false status in stats_mysql_connection_pool.
Note: the same problem happens if you switch read_only to ON and then OFF on the master instead of restarting MySQL completely.
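Until this is fixed, a monitoring check could work around the stale row by ignoring connection-pool entries that have no matching row in mysql_servers. The following is a minimal Python sketch of that filter, not part of ProxySQL. The function name and the (hostgroup, host, port, status) tuple layout are assumptions made for illustration, and the sample rows are copied from the output after the master was restarted.

```python
from typing import List, Tuple

# Hypothetical row layout for this sketch: (hostgroup, host, port, status)
Row = Tuple[int, str, int, str]


def split_pool_rows(servers: List[Row], pool: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split non-ONLINE connection-pool rows into (really_down, stale).

    A row is treated as stale when its (hostgroup, host, port) does not
    appear in mysql_servers, i.e. ProxySQL no longer lists the backend
    in that hostgroup.
    """
    configured = {(hg, host, port) for hg, host, port, _ in servers}
    really_down: List[Row] = []
    stale: List[Row] = []
    for row in pool:
        hg, host, port, status = row
        if status == "ONLINE":
            continue  # healthy rows need no attention
        if (hg, host, port) in configured:
            really_down.append(row)
        else:
            stale.append(row)
    return really_down, stale


# Rows as they appear after "service mysql start" in this report.
servers = [(1, "10.86.179.223", 3306, "ONLINE")]
pool = [
    (1, "10.86.179.223", 3306, "ONLINE"),
    (2, "10.86.179.223", 3306, "OFFLINE_HARD"),
]

really_down, stale = split_pool_rows(servers, pool)
print("down:", really_down)
print("stale:", stale)
```

With these rows the leftover hostgroup2 entry is classified as stale rather than down, so the check would not alert. This only hides the symptom in monitoring; the row still stays in stats_mysql_connection_pool until ProxySQL is restarted.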
Expected behavior: the connection pool should not show the master as OFFLINE_HARD when it is in fact ONLINE and healthy. Further, when mysql-monitor_writer_is_also_reader = false, the hostgroup manager, Monitor, or whichever component is responsible should remove the master/writer from the reader hostgroup once it has recovered. It probably shouldn't even be moved to the reader group at all; it should just be marked as OFFLINE in hostgroup1.
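The placement I expect for a reachable backend can be written as a small rule. This is a toy model of the expectation in this report, not ProxySQL's actual code. The function name is hypothetical, and treating hostgroup 1 as the writer group and hostgroup 2 as the reader group is an assumption based on the tables above.

```python
from typing import Set


def expected_hostgroups(read_only: bool, writer_is_also_reader: bool,
                        writer_hg: int = 1, reader_hg: int = 2) -> Set[int]:
    """Return the hostgroups a reachable backend is expected to appear in."""
    if read_only:
        # A read_only backend belongs only in the reader hostgroup.
        return {reader_hg}
    if writer_is_also_reader:
        # A writer that also serves reads appears in both hostgroups.
        return {writer_hg, reader_hg}
    # Writer only: no leftover entry in the reader hostgroup.
    return {writer_hg}


# Recovered master, read_only = OFF, mysql-monitor_writer_is_also_reader = false
print(expected_hostgroups(read_only=False, writer_is_also_reader=False))
```

For the recovered master in this report the rule yields only the writer hostgroup, which is why the remaining hostgroup2 row looks wrong.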