Replies: 7 comments 8 replies
-
The only problem I can see with this approach, assuming Kademlia does pull from the Swarm cache to bootstrap, is that the peer store will be storing all peers we've connected to, not just the peers that were confirmed to support the Kademlia protocol and that are also in the Kademlia k-buckets. So this isn't as big of a speedup as it could be. If the peers persisted to disk were only the peers that are also in the Kademlia k-buckets, then we'd have the theoretical maximum speedup possible. With this approach only some fraction of the peers will be dialed and have their protocol confirmed, so they will be added to the Kademlia routing tables iff their protocol support is confirmed.
-
It seems like there could be a more straightforward and efficient way to:
-
FWIW the js peer store serializes supported protocols (from identify) along with other metadata, and at startup KAD-DHT only adds peers that have previously claimed to support the KAD-DHT protocol to the routing table; it doesn't add every peer in the peer store. By default, on loading from the peer store, peers that have not been successfully dialled in the last hour have their multiaddrs removed (requiring a peer routing lookup to ensure their addresses are current before dialing), and peers that have been without multiaddrs for six hours are removed entirely. So if it's a quick restart of a node, it'll take peers from the peer store, but if the node has been offline for more than six hours it'll go back to the bootstrappers to build its routing table.
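For illustration, those expiry rules translate to something like the following sketch. The `PersistedPeer` record is hypothetical, and using the last successful dial as the staleness clock is an approximation of what js-libp2p actually tracks:

```rust
use std::time::{Duration, SystemTime};
use libp2p::{Multiaddr, PeerId};

/// Hypothetical on-disk record; js-libp2p keys expiry off dial history,
/// approximated here by the last successful dial per peer.
struct PersistedPeer {
    peer_id: PeerId,
    addrs: Vec<Multiaddr>,
    last_dial_success: SystemTime,
}

fn expire(mut peers: Vec<PersistedPeer>, now: SystemTime) -> Vec<PersistedPeer> {
    const ADDR_TTL: Duration = Duration::from_secs(60 * 60); // 1 hour
    const PEER_TTL: Duration = Duration::from_secs(6 * 60 * 60); // 6 hours
    for p in &mut peers {
        let age = now.duration_since(p.last_dial_success).unwrap_or_default();
        if age > ADDR_TTL {
            // Stale addresses: keep the peer but force a fresh
            // peer-routing lookup before it can be dialed again.
            p.addrs.clear();
        }
    }
    // Peers left without addresses past PEER_TTL are dropped entirely.
    peers.retain(|p| {
        let age = now.duration_since(p.last_dial_success).unwrap_or_default();
        !p.addrs.is_empty() || age <= PEER_TTL
    });
    peers
}
```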
-
I'm starting to get the sense that peer information is fragmented too much in rust-libp2p. The Swarm has a cache. Kademlia has a cache. Gossipsub has a cache. On and on. They all store different things and have different heuristics for adding/removing/updating records. I'm wondering if the time has come to re-think peer information storage and interfacing, at least a little bit.

My main concern is that there doesn't seem to be a direct way to gather all of the peer information from the various caches, store it to disk, and later restore it from disk, emitting events that allow the various caches to restore their state directly. For instance, if I serialize Kademlia's peer and provider records to disk and later reload them, there's no code for getting that data back into the k-buckets/routing table and triggering a "filter and refresh" pass (see the sketch below for a partial workaround). It seems like there should be a way to do that. Operating on the assumption that a time-, network-, and energy-intensive bootstrap process must happen on every startup seems like a bad design.
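For what it's worth, a crude version of the save/restore half seems possible today with public APIs. A minimal sketch, assuming the snapshot is just `(PeerId, Vec<Multiaddr>)` pairs and leaving the on-disk serialization to the caller; `kbuckets()` and `add_address()` are existing `kad::Behaviour` methods, but there is no explicit "filter and refresh" trigger beyond Kademlia's normal maintenance:

```rust
use libp2p::{kad, Multiaddr, PeerId};

/// Walk the live k-buckets and copy out (PeerId, Multiaddr) pairs.
fn snapshot_routing_table(
    kademlia: &mut kad::Behaviour<kad::store::MemoryStore>,
) -> Vec<(PeerId, Vec<Multiaddr>)> {
    kademlia
        .kbuckets()
        .flat_map(|bucket| {
            bucket
                .iter()
                .map(|entry| {
                    (
                        *entry.node.key.preimage(),
                        entry.node.value.iter().cloned().collect::<Vec<_>>(),
                    )
                })
                .collect::<Vec<_>>()
        })
        .collect()
}

/// Feed a snapshot back in. Re-inserted addresses repopulate the
/// k-buckets; stale entries get purged by Kademlia's normal liveness
/// checks once queries start running.
fn restore_routing_table(
    kademlia: &mut kad::Behaviour<kad::store::MemoryStore>,
    snapshot: Vec<(PeerId, Vec<Multiaddr>)>,
) {
    for (peer, addrs) in snapshot {
        for addr in addrs {
            kademlia.add_address(&peer, addr);
        }
    }
}
```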
-
I can see 2 benefits of persisting peers:
I guess the main goal is 1. I don't know whether it is actually easy, but periodically persisting the Kademlia buckets (all PeerIds and associated multiaddrs) to disk should be enough. As @dhuseby said, you must have at least 1 bootstrapper, so on restart try to connect to all of these peers (irrespective of the actual latest connection), and if you can connect to at least 1 (that isn't a hardcoded bootstrapper), it's a win.

On restart, it is also possible to load the buckets as they were during the last snapshot and let them be refreshed/purged automatically. This means that the routing table will contain unreachable peers for a while, which is undesirable. Restarting by loading the last bucket snapshot may not be the fastest/most efficient way to bootstrap, since many timeouts are to be expected. However, trying to connect to all peers from the previous snapshot and using only the ones responding quickly could be a way to speed up the bootstrap while relying less on hardcoded bootstrappers (see the sketch below).
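A rough sketch of that "dial the snapshot, keep the fast responders" idea, assuming a tokio runtime and a hypothetical `MyBehaviour` with a `kademlia` field; the 10-second deadline is arbitrary:

```rust
use std::time::{Duration, Instant};
use libp2p::{futures::StreamExt, swarm::SwarmEvent, Multiaddr, PeerId, Swarm};

// Placeholder behaviour; substitute your own NetworkBehaviour.
#[derive(libp2p::swarm::NetworkBehaviour)]
struct MyBehaviour {
    kademlia: libp2p::kad::Behaviour<libp2p::kad::store::MemoryStore>,
}

/// Dial every snapshotted peer concurrently and only feed the ones that
/// connect before the deadline back into the Kademlia routing table.
async fn warm_start(
    swarm: &mut Swarm<MyBehaviour>,
    snapshot: Vec<(PeerId, Vec<Multiaddr>)>,
) {
    let deadline = Instant::now() + Duration::from_secs(10);
    for (peer, addrs) in &snapshot {
        for addr in addrs {
            swarm.add_peer_address(*peer, addr.clone());
        }
        let _ = swarm.dial(*peer); // fire-and-forget; failures just time out
    }
    // Stop waiting at the deadline; slow or dead peers never make it in.
    while let Ok(event) =
        tokio::time::timeout_at(deadline.into(), swarm.select_next_some()).await
    {
        if let SwarmEvent::ConnectionEstablished { peer_id, endpoint, .. } = event {
            swarm
                .behaviour_mut()
                .kademlia
                .add_address(&peer_id, endpoint.get_remote_address().clone());
        }
    }
}
```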
-
@elenaf9 I was talking to @achingbrain about how js-libp2p accelerates Kademlia startup. As I understand it, they use a peer store that can be persisted to disk and loaded again. It contains PeerIds, their associated multiaddrs, and a timestamp of the last time there was a connection on each multiaddr.
In general, when you load that state from disk, it adds the peer data to the peer store, where it gets filtered by age and some other details irrelevant to this discussion. After the stale PeerIds and multiaddrs have been expired, the remaining ones are announced to the rest of the protocols. The Kademlia protocol implementation picks those up and adds them as discovered peers, which at that point means they go into the protocol's buckets. Then Kademlia tries to dial/ping all of them to refresh that state.
I was looking at the recently added peer store and I think it can be used to do something similar in rust-libp2p.
On first startup with no persistent state, Kademlia will bootstrap, and each discovered peer will cause a `NewExternalAddrOfPeer` event to be emitted, which the peer store receives and records. If, in my peer behavior event loop, I handle `peer_store::RecordUpdated` events, I can use the `MemoryStore::insert_custom_data` and `MemoryStore::get_custom_data_mut` functions to maintain a mapping between Multiaddr and timestamps, mimicking the timestamping that js-libp2p does.

Then on shutdown, I can use `MemoryStore::record_iter` to get the stored PeerIds with their associated PeerRecord and custom data and serialize them to disk.

When running the peer again, this time with cached peer state, I can do the "expiring" of stale Multiaddrs similar to js-libp2p and then add the survivors to the swarm via `Swarm::add_peer_address`.

The only problem is that the Kademlia implementation in rust-libp2p is opaque about how it would pick up the added peers. Essentially, I'm adding peers into the Swarm's address cache, which I think the Kademlia protocol will use when bootstrapping if it knows of no other peers. I do know Kademlia will use the Swarm cache when trying to replace a disconnected peer in its k-buckets, because it tries to re-dial via PeerId only. However, I'm not 100% certain that the bootstrap process will grab peers from the Swarm cache. I'll need to test that.
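Until that's tested, one hedge is to not rely on the Swarm cache at all: hand the restored addresses to Kademlia directly and then kick off a bootstrap query. A sketch, reusing the hypothetical `MyBehaviour` and snapshot shape from the earlier comments; `add_address` and `bootstrap` are real `kad::Behaviour` methods:

```rust
fn seed_and_bootstrap(
    swarm: &mut libp2p::Swarm<MyBehaviour>,
    snapshot: Vec<(libp2p::PeerId, Vec<libp2p::Multiaddr>)>,
) {
    for (peer, addrs) in snapshot {
        for addr in addrs {
            swarm.add_peer_address(peer, addr.clone()); // Swarm-level cache
            swarm.behaviour_mut().kademlia.add_address(&peer, addr); // k-buckets
        }
    }
    // Errors with NoKnownPeers if nothing made it into the routing table.
    if let Err(e) = swarm.behaviour_mut().kademlia.bootstrap() {
        eprintln!("kademlia bootstrap failed: {e}");
    }
}
```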
Anyway, that's the current best answer on using persistent peer data to potentially accelerate Kademlia startup.