Create Live Snapshot without shutting down node #2816
@@ -523,6 +523,32 @@ func mainImpl() int {
		}
	}

	// TODO: after the consensus-execution split, remove this code block and modify LiveDBSnapshotter to
	// directly read from SIGUSR2. Currently execution will trigger snapshot creation on the consensus side once
	// its own snapshotting is complete.
	var executionDBSnapShotTrigger, consensusDBSnapShotTrigger chan struct{}
	if nodeConfig.Execution.LiveDBSnapshotter.Enable {
		executionDBSnapShotTrigger = make(chan struct{}, 1)
		go func() {
			sigusr := make(chan os.Signal, 1)
			signal.Notify(sigusr, syscall.SIGUSR2)
			for {
				select {
				case <-ctx.Done():
					return
				case <-sigusr:
					select {
					case executionDBSnapShotTrigger <- struct{}{}:
					default:
					}
				}
			}
		}()
	}
	if nodeConfig.Node.LiveDBSnapshotter.Enable {
		consensusDBSnapShotTrigger = make(chan struct{}, 1)
	}

	execNode, err := gethexec.CreateExecutionNode(
		ctx,
		stack,
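The hunk above turns a process signal into a non-blocking channel send: SIGUSR2 is relayed into a buffered trigger channel, and the inner default case drops the signal if a snapshot request is already pending. Below is a minimal standalone sketch of the same pattern; the names are illustrative and not taken from the PR.

package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Buffered with capacity 1 so at most one snapshot request can be pending.
	trigger := make(chan struct{}, 1)

	go func() {
		sigusr := make(chan os.Signal, 1)
		signal.Notify(sigusr, syscall.SIGUSR2)
		for {
			select {
			case <-ctx.Done():
				return
			case <-sigusr:
				select {
				case trigger <- struct{}{}: // request a snapshot
				default: // a request is already pending; drop the extra signal
				}
			}
		}
	}()

	// Consumer side: a snapshotter would block here and run one snapshot per trigger.
	for range trigger {
		fmt.Println("snapshot requested (send SIGUSR2 to this process, e.g. kill -USR2 <pid>)")
	}
}

Running this and sending kill -USR2 from another shell prints one line per accepted request, regardless of how many signals arrive while a request is still pending.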
@@ -531,6 +557,8 @@ func mainImpl() int {
		l1Client,
		func() *gethexec.Config { return &liveNodeConfig.Get().Execution },
		liveNodeConfig.Get().Node.TransactionStreamer.SyncTillBlock,
		executionDBSnapShotTrigger,
		consensusDBSnapShotTrigger, // execution will invoke consensus's db snapshotting
Reviewer: Today we have a circular call dependency between Consensus and Execution, but ideally we would like to get rid of that, and we are moving in that direction. In the current model, Consensus calls Execution: e.g., Consensus retrieves messages from different sources and then calls Execution to generate blocks for those messages. That said, I think we should move the snapshot trigger API from Execution to Consensus, and Consensus will trigger Execution's DB snapshot.

Reviewer: Also, in the near future we intend to split Consensus and Execution into different processes. Today we rely on functions on the ConsensusSequencer interface which Execution can use to call Consensus.

Author: Thanks for the context! But for a working snapshot to be generated we need to snapshot Execution's DBs first and then Consensus's DB, i.e. the block height in chainDB must be <= the one in arbDB. I understand your suggestion that Execution shouldn't call Consensus actions, but this is merely a signal transfer to start DB snapshotting. Moreover, it is implemented so that it can be easily removed after the split, the idea being that Consensus and Execution will each have a snapshotting API that snapshots their respective DBs. Making Consensus trigger Execution's database snapshot is essentially the reverse of what we currently have, just with the code moved to the Consensus side, and that code will need to be removed after the split anyway. If this is really a priority, I can decouple it completely so that the LiveDBSnapshotter returns a success signal after DB creation and nitro.go handles the signals entirely; let me know.

Reviewer: I will ping you so we can chat 🙂, it will be easier to give you more context, but here is a summary.

There is a common misconception that the procedure coordinating an action between multiple components should be placed where the action starts to happen. Nitro, in some parts, follows this colocation pattern today, which has some issues. That said, live snapshot coordination can be placed in Consensus, and we will still be able to snapshot Execution's DB before Consensus's DB.

I don't fully understand that 😕 Why will the code related to live snapshots be removed after the split? If I inferred correctly, you are saying that, only after the split, we will have an API in Consensus that triggers a snapshot of Consensus's DB, so a human node operator will be able to first trigger Execution's DB snapshot and after that trigger Consensus's DB snapshot, right? I am more inclined toward a UX in which the human node operator triggers a snapshot only once, and the software is responsible for coordinating the Execution and Consensus snapshots in the correct order. Also, we intend to have other Execution clients, such as one based on reth, but we do not intend to have multiple Consensus clients, at least in the near future.

Reviewer: Also, to snapshot Consensus's DB we will need to hold insertionMutex, right?
	)
	if err != nil {
		log.Error("failed to create execution node", "err", err)
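To illustrate the ordering constraint raised in the thread above (snapshot Execution's databases before Consensus's, so the chainDB block height never exceeds what arbDB records), here is a sketch of a coordinator. The Snapshotter interface and every other name below are hypothetical and not part of this PR, which instead wires plain chan struct{} triggers through nitro.go.

package main

import (
	"context"
	"fmt"
)

// Snapshotter is a hypothetical interface used only for this sketch.
type Snapshotter interface {
	Snapshot(ctx context.Context) error
}

type fakeDB struct{ name string }

func (f fakeDB) Snapshot(ctx context.Context) error {
	fmt.Println("snapshotting", f.name)
	return nil
}

// snapshotInOrder enforces the ordering from the review thread: Execution's
// databases (e.g. chainDB) are snapshotted before Consensus's database (arbDB),
// so the snapshotted chain height never exceeds what arbDB has recorded.
func snapshotInOrder(ctx context.Context, execution, consensus Snapshotter) error {
	if err := execution.Snapshot(ctx); err != nil {
		return fmt.Errorf("execution snapshot: %w", err)
	}
	return consensus.Snapshot(ctx)
}

func main() {
	if err := snapshotInOrder(context.Background(), fakeDB{"chainDB"}, fakeDB{"arbDB"}); err != nil {
		fmt.Println("snapshot failed:", err)
	}
}

Whether this coordinator lives in Consensus (as the reviewer suggests) or in nitro.go (closer to what the PR does) does not change the ordering it enforces.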
@@ -555,6 +583,7 @@ func mainImpl() int {
		fatalErrChan,
		new(big.Int).SetUint64(nodeConfig.ParentChain.ID),
		blobReader,
		consensusDBSnapShotTrigger,
	)
	if err != nil {
		log.Error("failed to create node", "err", err)
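One detail worth noting in the wiring above: when a LiveDBSnapshotter is disabled, its trigger is left as a nil channel but is still passed to the constructor. Assuming the consumer selects on the trigger, a nil channel never delivers a value, so the disabled path simply never fires. A small, self-contained demonstration of that Go behavior, for illustration only:

package main

import (
	"fmt"
	"time"
)

func main() {
	var disabled chan struct{} // nil channel: receiving from it blocks forever
	enabled := make(chan struct{}, 1)
	enabled <- struct{}{}

	for i := 0; i < 2; i++ {
		select {
		case <-disabled:
			fmt.Println("disabled trigger fired (never happens)")
		case <-enabled:
			fmt.Println("enabled trigger fired")
		case <-time.After(100 * time.Millisecond):
			fmt.Println("timed out; the nil trigger stays silent")
		}
	}
}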