-
Notifications
You must be signed in to change notification settings - Fork 910
Forward sync columns by root #7946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
N/A Extracts (3) from #7946. Prior to peerdas, a batch should never have been in `AwaitingDownload` state because we immediataly try to move from `AwaitingDownload` to `Downloading` state by sending batches. This was always possible as long as we had peers in the `SyncingChain` in the pre-peerdas world. However, this is no longer the case as a batch can be stuck waiting in `AwaitingDownload` state if we have no peers to request the columns from. This PR makes `AwaitingDownload` to be an allowable in between state. If a batch is found to be in this state, then we attempt to send the batch instead of erroring like before. Note to reviewer: We need to make sure that this doesn't lead to a bunch of batches stuck in `AwaitingDownload` state if the chain can be progressed. Backfill already retries all batches in AwaitingDownload state so we just need to make `AwaitingDownload` a valid state during processing and validation. This PR explicitly adds the same logic for forward sync to download batches stuck in `AwaitingDownload`. Apart from that, we also force download of the `processing_target` when sync stops progressing. This is required in cases where `self.batches` has > `BATCH_BUFFER_SIZE` batches that are waiting to get processed but the `processing_batch` has repeatedly failed at download/processing stage. This leads to sync getting stuck and never recovering.
c81ce28
to
e0d8f04
Compare
Some required checks have failed. Could you please take a look @pawanjay176? 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pawanjay176
I haven't managed to go through all the logic, but I've added comments for the findings I have so far. Let me know what you think. I'll continue next week.
.peers | ||
.read() | ||
.good_custody_subnet_peer_range_sync(data_column, batch_epoch) | ||
.next() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This always picks the first matching peer, would it be possible for the same peer to keep getting selected for the same column across different batches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a hashmap so every iteration in the peerdb would return a different order with new peers getting added and old peers getting removed I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This program prints the same result on every iteration BUT different across runs - it uses a randomised hasher but the iteration order is deterministic, so I'm not sure if we can't rely on this:
#[test]
fn main() {
let mut a = HashMap::new();
a.insert("a", 1);
a.insert("z", 2);
a.insert("b", 3);
a.insert("y", 4);
a.insert("c", 5);
a.insert("x", 6);
let get = || a.iter().filter(|(key,&val)| val % 2 == 0).map(|(key,val)| key);
for i in 0..100 {
println!("{:?}", get().next());
}
}
Even if it's really random, i think callers should not rely on the implementation details of peer_db
, and either (1) perform the random logic on the caller site (2) encapsulate the logic in a get_next_good_custody_subnet_peer
This way callers of the API don't make assumptions about the internal data structure of the DB, and prevents us from creating unexpected bugs, i.e. we don't break things if we change the internal data structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, i'll choose randomly at the call site too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 04398ad using .choose()
Some(resp) | ||
} | ||
// Reuse same logic that we use for coupling data columns for now. | ||
// todo(pawan): we should never get a coupling error here, so simplify this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that since we are requesting by root and also verifying the inclusion proof when adding a data column, by the time we reach here we should have caught all errors.
If we get a valid response for the data columns (no VerifyError), then there shouldn't be any issues with coupling
/// This is used for penalizing in case of invalid batches. | ||
#[derive(Debug, Clone)] | ||
pub struct ResponsiblePeers { | ||
pub block_blob: PeerId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean block_peer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe block_blob_peer
? the peer we request the block and blobs from are the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about block_and_blobs
? Or just block_peer
and a doc explaining that it serves both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the same strategy for backfill sync too? Would simplify the code. Do you believe it would make backfill sync much slower?
} | ||
} | ||
DataColumnsByRootRequester::RangeSync { parent } => { | ||
if let Some(resp) = self.network.on_data_columns_by_root_range_response( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to repeat the on_data_columns_by_root_range_response
call inside the match? You can do:
if let Some(_) = on_data_columns_by_root_range_response {
match req_id.requester { .. }
}
range_request_id.id, | ||
blocks, | ||
responsible_peers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should blocks_by_range_response
have the same order of arguments as on_block_response
?
} | ||
} | ||
|
||
impl<E: EthSpec> ActiveRequestItems for DataColumnsByRootRangeRequestItems<E> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can de-duplicate this code by making DataColumnsByRootRequestItems
take a vec of block roots
/// This is used for penalizing in case of invalid batches. | ||
#[derive(Debug, Clone)] | ||
pub struct ResponsiblePeers { | ||
pub block_blob: PeerId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about block_and_blobs
? Or just block_peer
and a doc explaining that it serves both.
/// | ||
/// This is used for penalizing in case of invalid batches. | ||
#[derive(Debug, Clone)] | ||
pub struct ResponsiblePeers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They could be very irresponsible if they serve bad data :)
What about SourcePeers
or ProviderPeers
or ServingPeers
? Or just BatchPeers
since in the context peers only do one thing and it's serving data
@@ -1163,6 +1190,28 @@ impl<T: BeaconChainTypes> SyncingChain<T> { | |||
self.send_batch(network, batch_id)?; | |||
} | |||
|
|||
// Force requesting the `processing_batch` to progress sync if required | |||
if !self.batches.contains_key(&self.processing_target) { | |||
debug!(?self.processing_target,"Forcing requesting processing_target to progress sync"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug!(?self.processing_target,"Forcing requesting processing_target to progress sync"); | |
debug!(?self.processing_target, "Forcing requesting processing_target to progress sync"); |
@@ -1051,6 +1077,8 @@ impl<T: BeaconChainTypes> SyncingChain<T> { | |||
} | |||
}, | |||
} | |||
} else { | |||
debug!(?self.to_be_downloaded, ?self.processing_target,"Did not get batch"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug!(?self.to_be_downloaded, ?self.processing_target,"Did not get batch"); | |
debug!(?self.to_be_downloaded, ?self.processing_target, "Did not get batch"); |
/// | ||
/// This function is used when we want to request data columns by root instead of range. | ||
/// Pre-fulu, it works similar to `Self::block_components_by_range_request`. | ||
pub fn block_components_by_range_request_without_components( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated code from block_components_by_range_request
you can add a new ByRangeRequestType
and re-use the function above
/// | ||
/// This function must be manually invoked at regular intervals or when a new peer | ||
/// gets added. | ||
pub fn retry_pending_root_range_requests(&mut self) -> Result<(), String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is good. To sum up, the new functionality added in this PR is:
New by_range request type to:
- Request blocks first
- Then with those roots request data columns by root
- If any of those requests fail or mismatch retry
All this logic can be grouped and moved into its own file (the coupling service) and make network_context.rs
less chaotic. I did so in my tree-sync WIP, and subjectively feels nicer https://github.com/dapplion/lighthouse/blob/47c93578c418bdac5c3beb3064ab5f675c3c177d/beacon_node/network/src/sync/network_context/block_components_by_range.rs
|
||
// Re-insert entries that still need to be retried | ||
self.pending_column_by_root_range_requests | ||
.extend(entries_to_keep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This retries should have either a retry counter or an expiry time. We should have metrics also to track the count of retries
column_requests: Vec<(DataColumnsByRootRequestId, Vec<ColumnIndex>)>, | ||
) -> Result<(), String> { | ||
// Nothing to insert, do not initialize | ||
if column_requests.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this ever happen?
|
||
Ok(()) | ||
} | ||
_ => Err("Invalid initialization".to_string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ => Err("Invalid initialization".to_string()), | |
_ => Err("Invalid state: expected DataColumnsFromRoot".to_string()), |
/// Note: this variant starts out in an uninitialized state because we typically make | ||
/// the column requests by root only **after** we have fetched the corresponding blocks. | ||
/// We can initialize this variant only after the columns requests have been made. | ||
DataColumnsFromRoot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a new variant for it?
Issue Addressed
N/A
The Problem
Our current strategy of syncing blocks + columns by range works roughly as follows for each batch:
SyncingChain
to fetch blocks from and send aBlocksByRange
requestDataColumnsByRange
request at the same timeblock_root
and thekzg_commitment
matches. If the coupling failed, try to re-request the failed columns.This strategy works decently well when the chain is finalizing as most of our peers are on the same chain. However, in times of non-finality we need to potentially sync multiple head chains.
This leads to issues with our current approach because the block peer and the data column peers might have a different view of the canonical chain due to multiple heads. So when we use the above approach, it is possible that the block peer returns us a batch of blocks for chain A while some or all data column peers send us the batch of data columns for a different chain B. Different data column peers might also be following different chains.
We initially tried to get around this problem by selecting column peers only from within the current
SyncingChain
. EachSyncingChain
represents ahead_root
that we are trying to sync to and we group peers based on samehead_root
. That way, we know for sure that the block and column peers are on the same chain. This works in theory, but in practice, during long periods of non-finality, we tend to create multiple head chains based on thehead_root
and split the global peerset. Pre-fulu, this isn't a big deal since all peers are supposed to have all the blob data.But splitting peers with peerdas is a big challenge due to not all peers having the full data available. There are supernodes, but during bad network conditions, supernodes would be getting way too many requests and not even have any incoming peer slots. As we saw on fusaka devnets, this strategy leads to sync getting stalled and not progressing.
Proposed Changes
1. Use
DataColumnsByRoot
instead ofDataColumnsByRange
to fetch columns for forward syncThis is the main change. The new strategy would go as follows:
SyncingChain
to fetch blocks from and send aBlocksByRange
requestblock_roots
and trigger aDataColumnsByRoot
request for every block in the response that has any blobs based on theexpected_kzg_commitments
field.(4) kinda assumes that most synced/advanced peers would have different chains in their fork choice to be able to serve specific by root requests. My hunch is that this is true, but we should validate this in a devnet 4 like chain split scenario.
Note: that we currently use this by root strategy only for forward sync, not for backfill. Backfill has to deal with only a single canonical chain so byrange requests should work well there.
2. ResponsiblePeers to attribute peer fault correctly
Adds the
ResponsiblePeers
struct which stores the block and the column peers that we made the download requests to.For most of our peer attributable errors, the processing error indicates whether the block peer was at fault or if the column peer was at fault.
We now communicate this information back to sync and downscore specific peers based on the fault type. This imo, is an improvement over current unstable where most of the time, we attribute fault to the peer that "completed" the request by being the last peer to respond.
Due to this ambiguity in fault attribution, we weren't downscoring pretty serious processing errors like
InvalidKzgProofs
,InvalidExecutionPayload
etc. I think this PR attributes the errors to the right peers. Reviewers please check that this claim is actually true.3. Make
AwaitingDownload
an allowable in-between stateNote: This has been extracted to its own PR here and merged #7984
Prior to peerdas, a batch should never have been in
AwaitingDownload
state because we immediataly try to move fromAwaitingDownload
toDownloading
state by sending batches. This was always possible as long as we had peers in theSyncingChain
in the pre-peerdas world.However, this is no longer the case as a batch can be stuck waiting in
AwaitingDownload
state if we have no peers to request the columns from. This PR makesAwaitingDownload
to be an allowable in between state. If a batch is found to be in this state, then we attempt to send the batch instead of erroring like before.Note to reviewer: We need to make sure that this doesn't lead to a bunch of batches stuck in
AwaitingDownload
state if the chain can be progressed.