diff --git a/enhancements/two-node-fencing/tnf.md b/enhancements/two-node-fencing/tnf.md
index 37abadda46..fcd1701b95 100644
--- a/enhancements/two-node-fencing/tnf.md
+++ b/enhancements/two-node-fencing/tnf.md
@@ -125,7 +125,7 @@ Upon rebooting, the RHEL-HA components ensure that a node remains inert (not run
 
 If the failed peer is likely to remain offline for an extended period, admin confirmation is required on the remaining node to allow it to start OpenShift. This functionality exists within RHEL-HA, but a wrapper will be provided to take care of the details.
 
-When starting etcd, the OCF script will use etcd's cluster ID and version counter to determine whether the existing data directory can be reused, or must be erased before joining an active peer.
+When starting etcd, the OCF script will use data on disk (e.g. etcd's cluster ID) and the current state of the cluster (e.g. which resource agent is already running) to determine whether the existing data directory can be reused, or must be erased before joining an active peer.
 
 ### Summary of Changes
 
@@ -441,7 +441,7 @@ For `platform: none` clusters, this will require customers to provide an ingress
 
 #### Graceful vs. Unplanned Reboots
 Events that have to be handled uniquely by a two-node cluster can largely be categorized into one of two buckets. In the first bucket, we have things that trigger graceful reboots. This includes events like upgrades, MCO-triggered reboots, and users sending a shutdown command to one of the nodes. In each of these cases - assuming a functioning two-node cluster - the node that is shutting down must wait for pacemaker to signal to etcd to remove the node from the etcd quorum to maintain e-quorum. When the node reboots, it must rejoin the etcd cluster and sync its database to the active node.
-Unplanned reboots include any event where one of the nodes cannot signal to etcd that it needs to leave the cluster. This includes situations such as a network disconnection between the nodes, power outages, or turning off a machine using a command like `poweroff -f`. The point is that a machine needs to be fenced so that the other node can perform a special recovery operation. This recovery involves pacemaker restarting the etcd on the surviving node with a new cluster ID as a cluster-of-one. This way, when the other node rejoins, it must reconcile its data directory and resync to the new cluster before it can rejoin as an active peer.
+Unplanned reboots include any event where one of the nodes cannot signal to etcd that it needs to leave the cluster. This includes situations such as a network disconnection between the nodes, power outages, or turning off a machine using a command like `poweroff -f`. The point is that a machine needs to be fenced so that the other node can perform a special recovery operation. This recovery involves pacemaker restarting the etcd on the surviving node as a cluster-of-one. This way, when the other node rejoins, it must reconcile its data directory and resync to the new cluster before it can rejoin as an active peer.
 
 #### Failure Scenario Timelines:
 This section provides specific steps for how two-node clusters would handle interesting events.