TQ: Implement prepare and commit for initial config #8682
```rust
//
// Nexus should only attempt to commit nodes that have acknowledged
// a `Prepare`. The most likely reason that this has occurred
// is that the node has lost its state on the M.2 drives. It can
```
I realized that recovery is not actually guaranteed here, as the drive could have been wiped after acking the latest configuration but before the keys were rotated. A new configuration could then have been issued that doesn't contain the encrypted rack secret for the unrotated keys' epoch. I think this is a rare scenario, but I also think we probably don't need to handle byzantine failure here. Instead, this should probably be an alarm state (similar to what was done in #8062), with a support call if the data on the M.2s is gone.
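For illustration, a minimal sketch of what such an alarm state could look like; `Alarm`, `NodeState`, and `raise_alarm` are hypothetical names for this sketch, not the actual omicron types:

```rust
/// Hypothetical sketch: reasons a node can enter a stuck,
/// support-required state. Not the actual omicron types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Alarm {
    /// A `Commit` arrived for an epoch this node never acked a
    /// `Prepare` for, and the on-disk state needed to recover is gone.
    CommitWithoutPrepare { epoch: u64 },
}

#[derive(Debug)]
pub enum NodeState {
    Running,
    /// Terminal: the node stops participating in the protocol until
    /// remedied via support.
    Alarmed(Alarm),
}

impl NodeState {
    /// Record an alarm rather than panicking: an errant message must
    /// not take down the sled-agent.
    pub fn raise_alarm(&mut self, alarm: Alarm) {
        *self = NodeState::Alarmed(alarm);
    }
}
```

The key design point is that the alarm is sticky: once entered, the node refuses further protocol progress rather than guessing at recovery.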
Builds upon #8682. This PR implements the ability to reconfigure the trust quorum after a commit. This includes the ability to fetch shares for the most recently committed configuration to recompute the rack secret, and then include that in encrypted form in the new configuration for key rotation purposes.

The cluster proptest was enhanced to allow this, and it generates enough races (even without crashing and restarting nodes) that it forced the handling of `CommitAdvance` messages to be implemented. This implementation includes the ability to construct key shares for a new configuration when a node misses a prepare and commit for that configuration. This required adding a `KeyShareComputer`, which collects key shares for the configuration returned in a `CommitAdvance` so that the node can construct its own key share and commit the newly learned configuration. Importantly, constructing a key share and coordinating a reconfiguration are mutually exclusive, and so a new invariant was added to the cluster test.

We also start keeping track of expunged nodes in the cluster test, although we don't yet inform them that they are expunged if they reach out to other nodes.

There are a few places in the code where a runtime invariant is violated and an error message is logged. This always occurs on message receipt, and we don't want to panic at runtime because of an errant message and take down the sled-agent. However, we'd like to be able to report these situations upstream. The first step is to detect when they are hit and put the node in an `Alarm` state, such that it is stuck until remedied via support. We should *never* see an `Alarm` in practice, but since the states are reachable, we should manage them appropriately. This will come in a follow-up PR and be similar to what I implemented in #8062.
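To make the `KeyShareComputer` idea concrete, here is a minimal sketch under assumed names and types; `NodeId`, `Share`, `on_share`, and the elided interpolation are all illustrative, not the actual omicron code:

```rust
use std::collections::BTreeMap;

// Illustrative aliases; the real types live in the trust quorum crate.
pub type NodeId = String;
pub type Share = Vec<u8>;

/// Hypothetical sketch of a `KeyShareComputer`: it collects key shares
/// for the configuration learned via `CommitAdvance` until it can
/// construct this node's own share.
pub struct KeyShareComputer {
    /// Epoch of the configuration learned from `CommitAdvance`.
    epoch: u64,
    /// How many distinct shares are needed to reconstruct.
    threshold: usize,
    collected: BTreeMap<NodeId, Share>,
}

impl KeyShareComputer {
    pub fn new(epoch: u64, threshold: usize) -> Self {
        Self { epoch, threshold, collected: BTreeMap::new() }
    }

    /// Record a share from a peer. Once `threshold` shares are held,
    /// compute this node's own share so the newly learned
    /// configuration can be committed locally.
    pub fn on_share(
        &mut self,
        from: NodeId,
        epoch: u64,
        share: Share,
    ) -> Option<Share> {
        // Ignore shares for any other configuration.
        if epoch != self.epoch {
            return None;
        }
        self.collected.insert(from, share);
        if self.collected.len() >= self.threshold {
            Some(self.compute_own_share())
        } else {
            None
        }
    }

    fn compute_own_share(&self) -> Share {
        // Placeholder: the real computation uses the secret-sharing
        // scheme's math (e.g. interpolation over the collected shares).
        Vec::new()
    }
}
```

Since constructing a key share and coordinating a reconfiguration are mutually exclusive, a node would hold at most one of a coordinator state or a `KeyShareComputer` at a time, which is exactly the new invariant the cluster test checks.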
Initial configurations can be prepared and committed with the implemented handlers. This is tested, along with Nexus-driven aborts for the case where the coordinator of the initial configuration has crashed, in a new property-based test.
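As a rough sketch of the prepare/commit shape described here (all names are hypothetical; the real handlers persist state and carry much more context):

```rust
/// Hypothetical sketch of the prepare/commit flow for an initial
/// configuration; not the actual omicron API.
#[derive(Debug, Clone)]
pub struct Configuration {
    pub epoch: u64,
    // Membership, encrypted rack secret, etc. elided.
}

#[derive(Default)]
pub struct Node {
    prepared: Option<Configuration>,
    committed: Option<Configuration>,
}

pub enum Response {
    PrepareAck { epoch: u64 },
    CommitAck { epoch: u64 },
    /// Nexus should only commit at nodes that acked a `Prepare`.
    NotPrepared { epoch: u64 },
}

impl Node {
    /// Record the configuration, then ack so Nexus can commit.
    pub fn handle_prepare(&mut self, config: Configuration) -> Response {
        let epoch = config.epoch;
        self.prepared = Some(config);
        Response::PrepareAck { epoch }
    }

    /// Only commit what was prepared; otherwise report, don't panic.
    pub fn handle_commit(&mut self, epoch: u64) -> Response {
        match &self.prepared {
            Some(c) if c.epoch == epoch => {
                self.committed = self.prepared.clone();
                Response::CommitAck { epoch }
            }
            _ => Response::NotPrepared { epoch },
        }
    }
}
```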
The new property-based test runs all possible nodes in the universe as the system under test (SUT), rather than running only the coordinator. This allows a full deterministic simulation of the protocol and checking of invariants at all nodes. It's also easier to write and understand, as we don't have to capture and mock replies to the coordinator. I had always intended to write this test, but started with modelling the coordinator first since I thought it would be easier to incrementally build the protocol that way. However, it appears just as easy to incrementally build with all nodes as the SUT.
The new test does not have a model of the system, which is exceedingly hard to build for such a protocol. Instead, the test checks invariants of the real state of the SUT after every action, and allows peppering in postconditions as necessary for each action or operation.
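The shape of such a test, sketched with the `proptest` crate; `Action`, `Cluster`, and the single invariant shown are stand-ins for the real generators and checks:

```rust
use proptest::prelude::*;
use proptest::test_runner::TestCaseError;

// Stand-in for the generated operations driven against all nodes.
#[derive(Debug, Clone)]
enum Action {
    DeliverMsg { from: usize, to: usize },
    CrashNode(usize),
    NexusPoll,
}

fn action_strategy() -> impl Strategy<Value = Action> {
    prop_oneof![
        (0..4usize, 0..4usize)
            .prop_map(|(from, to)| Action::DeliverMsg { from, to }),
        (0..4usize).prop_map(Action::CrashNode),
        Just(Action::NexusPoll),
    ]
}

// Stand-in for the real SUT: every node in the universe.
struct Cluster {
    committed_epochs: Vec<Option<u64>>,
}

impl Cluster {
    fn new(n: usize) -> Self {
        Self { committed_epochs: vec![None; n] }
    }

    fn apply(&mut self, _action: Action) {
        // The real test routes messages between nodes, crashes and
        // restarts them, and simulates Nexus here.
    }

    /// Example invariant: every node that has committed agrees on
    /// the committed epoch.
    fn check_invariants(&self) -> Result<(), TestCaseError> {
        let mut epochs = self.committed_epochs.iter().flatten();
        if let Some(first) = epochs.next() {
            for e in epochs {
                prop_assert_eq!(e, first);
            }
        }
        Ok(())
    }
}

proptest! {
    #[test]
    fn cluster_invariants_hold(
        actions in proptest::collection::vec(action_strategy(), 1..200)
    ) {
        let mut cluster = Cluster::new(4);
        for action in actions {
            cluster.apply(action);
            // No model of the system: assert over the real state of
            // every node after each action.
            cluster.check_invariants()?;
        }
    }
}
```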
The Node API has also changed to not concern itself with time at all; instead it deals in terms of connections and disconnections. This makes for simpler code IMO, and matches what was done for LRTQ. We are always operating over sprockets streams, which run over TLS over TCP, so it makes little sense to model things as if arbitrary packets can get dropped and reordered.
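A minimal sketch of what a connection-oriented, time-free API could look like; all names here are assumptions, not the actual Node API:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Illustrative stand-ins for the real peer and message types.
pub type PeerId = String;

#[derive(Debug, Clone)]
pub enum Msg {
    Prepare { epoch: u64 },
    PrepareAck { epoch: u64 },
}

#[derive(Default)]
pub struct Node {
    connected: BTreeSet<PeerId>,
    /// Messages waiting for a peer to (re)connect. A sprockets stream
    /// is ordered and reliable for the life of a connection, so only
    /// whole connections fail -- never individual packets.
    pending: BTreeMap<PeerId, Vec<Msg>>,
}

impl Node {
    /// A session to `peer` was established: flush anything queued.
    pub fn on_connect(
        &mut self,
        peer: PeerId,
        outbox: &mut Vec<(PeerId, Msg)>,
    ) {
        self.connected.insert(peer.clone());
        for msg in self.pending.remove(&peer).unwrap_or_default() {
            outbox.push((peer.clone(), msg));
        }
    }

    /// The session dropped: nothing to time out, just stop sending.
    pub fn on_disconnect(&mut self, peer: &PeerId) {
        self.connected.remove(peer);
    }

    /// Send now if connected, otherwise queue for the next connect.
    pub fn send(
        &mut self,
        peer: PeerId,
        msg: Msg,
        outbox: &mut Vec<(PeerId, Msg)>,
    ) {
        if self.connected.contains(&peer) {
            outbox.push((peer, msg));
        } else {
            self.pending.entry(peer).or_default().push(msg);
        }
    }
}
```

No timers or clocks appear anywhere in the API: retransmission collapses into "resend on the next `on_connect`", which is what makes the deterministic simulation in the proptest straightforward.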
As a result of the new proptest and the change in time usage, I've decided to drop the coordinator test altogether. It's too complicated for the value it adds, and urgency is a priority.