Bug: Hostgroup Manager doesn't remove master from reader group, stuck as OFFLINE_HARD #1817

Open
@jmosborn

ProxySQL version 1.4.13-15-g69d4207, codename Truls
Ubuntu 14.04

I believe there is a bug in how the Hostgroup Manager or Monitor treats a recovered master backend. Its status gets stuck as OFFLINE_HARD in the reader hostgroup until ProxySQL is restarted. This seems to violate the purpose of mysql-monitor_writer_is_also_reader.

Steps to reproduce:

  1. Begin with master in hostgroup1, healthy and ONLINE, mysql-monitor_writer_is_also_reader = false:
ProxySQL Admin> select * from mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | 10.86.179.223 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)

ProxySQL Admin> select * from stats_mysql_connection_pool;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1         | 10.86.179.223 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 690        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
1 row in set (0.00 sec)
  2. Shut down the master: service mysql stop

  3. ProxySQL moves the master to hostgroup2, marks it SHUNNED there, and marks it OFFLINE_HARD in hostgroup1:

ProxySQL Admin> select * from mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 2            | 10.86.179.223 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)

ProxySQL Admin> select * from stats_mysql_connection_pool;
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1         | 10.86.179.223 | 3306     | OFFLINE_HARD | 0        | 0        | 1      | 0       | 54      | 2172            | 55008           | 730        |
| 2         | 10.86.179.223 | 3306     | SHUNNED      | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 730        |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
2 rows in set (0.00 sec)
  4. Start MySQL back up: service mysql start

  5. ProxySQL moves the master back to hostgroup1 and marks it ONLINE. That's good. But stats_mysql_connection_pool says it is still in hostgroup2 as OFFLINE_HARD:

ProxySQL Admin> select * from mysql_servers;
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | 10.86.179.223 | 3306 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+---------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
1 row in set (0.00 sec)

ProxySQL Admin> select * from stats_mysql_connection_pool;
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 1         | 10.86.179.223 | 3306     | ONLINE       | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 650        |
| 2         | 10.86.179.223 | 3306     | OFFLINE_HARD | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 650        |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
2 rows in set (0.00 sec)

Problem: after a recovery like this, the master never leaves hostgroup2 and stays OFFLINE_HARD there. I have to restart ProxySQL before the connection pool stops listing the master in hostgroup2. This causes problems because we monitor stats_mysql_connection_pool to make sure all backends are ONLINE, and the stale row makes our monitor think a backend is down. Any time the master has a brief outage, we get a false status in the connection pool.
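As a possible stopgap on the monitoring side, the check could count only pool rows whose hostgroup/host/port is still present in mysql_servers. The sketch below is only an illustration of that idea: it uses an in-memory SQLite database as a stand-in for the two admin tables, loaded with the rows from step 5, and the function name unhealthy_backends is hypothetical. Whether the same join behaves identically on the ProxySQL admin interface is an assumption I have not verified.

```python
import sqlite3

# In-memory stand-in for the two ProxySQL admin tables (assumption: only
# the columns needed for the check are modelled). Rows mirror step 5.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE mysql_servers (
    hostgroup_id INT, hostname TEXT, port INT, status TEXT);
CREATE TABLE stats_mysql_connection_pool (
    hostgroup INT, srv_host TEXT, srv_port INT, status TEXT);
INSERT INTO mysql_servers VALUES (1, '10.86.179.223', 3306, 'ONLINE');
INSERT INTO stats_mysql_connection_pool VALUES
    (1, '10.86.179.223', 3306, 'ONLINE'),
    (2, '10.86.179.223', 3306, 'OFFLINE_HARD');
""")

def unhealthy_backends(conn):
    """Return pool rows that are not ONLINE and are still configured
    in mysql_servers for the same hostgroup, host and port."""
    return conn.execute("""
        SELECT p.hostgroup, p.srv_host, p.srv_port, p.status
        FROM stats_mysql_connection_pool p
        JOIN mysql_servers s
          ON s.hostgroup_id = p.hostgroup
         AND s.hostname = p.srv_host
         AND s.port = p.srv_port
        WHERE p.status <> 'ONLINE'
        ORDER BY p.hostgroup
    """).fetchall()

# The naive check flags the stale hostgroup2 row; the joined check does not.
naive = db.execute(
    "SELECT hostgroup, status FROM stats_mysql_connection_pool "
    "WHERE status <> 'ONLINE'").fetchall()
filtered = unhealthy_backends(db)
print("naive:", naive)
print("filtered:", filtered)
```

With the step 3 rows instead (master really down, listed in hostgroup2 in mysql_servers and SHUNNED in the pool), the joined check still reports the backend, so a real outage is not hidden. This only masks the symptom in our monitor; the stale OFFLINE_HARD row is still the bug being reported.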

Note: the same problem happens if you switch read_only to ON then OFF on the master instead of restarting mysql completely.

Expected behavior: The connection pool should not show it OFFLINE_HARD when it is in fact ONLINE and healthy. Further, when mysql-monitor_writer_is_also_reader = false, the hostgroup manager or Monitor or whoever is responsible should eliminate the master/writer from the reader group when it is recovered. It probably shouldn't even be moved to the reader group at all; it should just be marked as OFFLINE in hostgroup1.

proxysql.log
