This is the tracking issue for the streams of work that need to be corrected or fixed as a result of the Kubo 0.17 release with the libp2p resource manager enabled by default: #8761
Must complete before 0.18 RC
Theme 2: reports of default limits not working well for users
- cannot reserve inbound connection: resource limit exceeded #9432. There are multiple problems buried in here, per #9432 (comment). This theme is about raising the default limits.
Theme 3: improve RM errors coming from other peers
This is related to reports of a disabled go-libp2p resource manager still managing resources. In fact, this is an error message from a remote go-libp2p peer exceeding its own resource limits, and when the message is printed on the local peer it's hard to differentiate between the two.
Things we can do:
- Adjust messaging for resource manager on the local Kubo side so it's easier to differentiate between local and remote peer resource manager errors.
- It looks like the message already is differentiable. Feel free to do more though.
- Add some docs on how to differentiate, and point to the go-libp2p issue for handling this better in go-libp2p: quic: Error from peer should be explicit that it is coming from remote peer rather than a local error libp2p/go-libp2p#1928
Theme 4: confusion around "magic values"
This is about how "4611686018427388000" looks like a random number when it is actually our "infinity".
Things we can do:
- Allow -1 to denote infinity for the go-libp2p resource manager. Requested: Enable specifying infinite limits with a more "intuitive" magic number like -1 libp2p/go-libp2p#1935
- Document what Kubo's magic value is:
- Use another number that is more obviously a magic number, like 999999999999999999.
- It is then also possible to search for and find the number in the code. (Right now we are computing it, which is what causes it not to show up in search.)
Options not on the table:
- Changing the output of ipfs swarm limits all to go from "4611686018427388000" to "infinity", because that isn't valid JSON.
- Similarly, we can't add a comment after the magic number, because that isn't valid JSON and the Go JSON parser we use doesn't allow parsing non-spec-compliant things like comments.
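A minimal sketch of why the value is hard to find by searching. It assumes (this issue does not confirm it) that the "infinity" is computed as half of the largest signed 64-bit integer and is later displayed by a consumer that stores JSON numbers as 64-bit floats. All names below are illustrative and are not Kubo identifiers.

```python
from decimal import Decimal

# Assumption: "infinity" is derived from the max signed 64-bit integer
# rather than written as a literal, so there is no literal to search for.
MAX_INT64 = 2**63 - 1
computed_infinity = MAX_INT64 // 2  # 4611686018427387903


def as_float64_json_number(value: int) -> int:
    """Return the integer a float64-based JSON consumer would display."""
    # repr() yields the shortest decimal string that round-trips the float,
    # which is how many JSON tools print large numbers.
    return int(Decimal(repr(float(value))))


displayed = as_float64_json_number(computed_infinity)
print(computed_infinity)
print(displayed)
```

Under these assumptions the displayed value matches neither the computed integer nor any literal in the code, which is consistent with the search problem described above.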
Theme 5: be clearer on startup about what limits are being set and why
Add a log message like:
First computing default go-libp2p resource manager limits based on Swarm.ResourceMgr.MaxMemory of {Swarm.ResourceMgr.MaxMemory} and then applying any user-supplied overrides on top. Run ipfs swarm limits all to see the resulting limits.
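The two-step order in that message (compute defaults, then layer user overrides on top) can be sketched as a recursive merge. This is an illustration only: the function name, the dictionary layout, and the numeric values are hypothetical and do not come from Kubo.

```python
import copy


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with user-supplied overrides layered on defaults."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested scopes key by key instead of replacing them whole.
            result[key] = apply_overrides(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical values, for illustration only.
defaults = {"System": {"ConnsInbound": 128, "Streams": 2048}}
overrides = {"System": {"ConnsInbound": 512}}
effective = apply_overrides(defaults, overrides)
print(effective)
```

Only the overridden key changes; every other computed default is kept, which is the behaviour a user would expect to verify in the resulting limits.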
Theme 6: Wrong tone about resource manager getting in the way (being a bug) vs. being a feature
UX angle about being clear that in general this is a feature not a bug:
- cannot reserve inbound connection: resource limit exceeded #9432 (comment)
- docs: libp2p resource management #9468
Theme 7: clarity around the "error message" meaning
There isn't clarity around what "system: cannot reserve inbound connection: resource limit exceeded" means. For this example, it means Swarm.ResourceMgr.Limits.System.ConnsInbound is exceeded.
Things we can do:
- Provide docs on how to interpret this message.
- Reverse engineer the message based on https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/scope.go so we can map it back to Swarm.ResourceMgr.Limits.$scope.$limit. If we do that, we can then print what the limit value is.
- 2022-12-06 maintainer conversation: not going to do this.
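As an aid for reading logs by hand (not something Kubo does), the interpretation above can be sketched as a small lookup. Only the single mapping stated in this theme is included; the function name and the assumed message shape of "scope: what failed: resource limit exceeded" are illustrative, and messages with other shapes are simply not recognised.

```python
from typing import Optional

# Only this entry is confirmed by the text above; anything else would be a guess.
KNOWN_MESSAGES = {
    ("system", "cannot reserve inbound connection"):
        "Swarm.ResourceMgr.Limits.System.ConnsInbound",
}


def limit_for_error(message: str) -> Optional[str]:
    """Map a resource manager error string to the config key it refers to."""
    parts = [part.strip() for part in message.split(":")]
    # Assumed shape: "<scope>: <what failed>: resource limit exceeded"
    if len(parts) != 3 or parts[2] != "resource limit exceeded":
        return None
    return KNOWN_MESSAGES.get((parts[0], parts[1]))


print(limit_for_error(
    "system: cannot reserve inbound connection: resource limit exceeded"
))
```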
Theme 8: Provide actionable advice when resource limits are hit
When a resource limit is hit, we point users to https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr. We could provide a better documentation path for how someone debugs this situation.
Theme 9: have the ConnMgr limits be set under Swarm.ResourceMgr.Limits.System.ConnsInbound
As discussed in #9468, this allows low-priority idle connections to get cleaned up to make space for higher-priority connections.
Things we can do:
- Lower the default limits
- Log when ConnMgr limits aren't under Swarm.ResourceMgr.Limits.System.ConnsInbound
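The logging idea can be sketched as a check over a parsed config. This is a hedged illustration: the function name and the sample values are hypothetical, the config is assumed to be a plain nested dictionary, and Swarm.ConnMgr.HighWater is taken as the ConnMgr limit being compared.

```python
def connmgr_warnings(config: dict) -> list:
    """Return warnings when ConnMgr.HighWater is not under ConnsInbound."""
    swarm = config.get("Swarm", {})
    high_water = swarm.get("ConnMgr", {}).get("HighWater")
    conns_inbound = (
        swarm.get("ResourceMgr", {})
        .get("Limits", {})
        .get("System", {})
        .get("ConnsInbound")
    )
    if high_water is None or conns_inbound is None:
        return []  # One side is unset, so there is nothing to compare.
    if high_water >= conns_inbound:
        return [
            f"Swarm.ConnMgr.HighWater ({high_water}) is not under "
            f"Swarm.ResourceMgr.Limits.System.ConnsInbound ({conns_inbound})"
        ]
    return []


# Hypothetical values, for illustration only.
risky = {
    "Swarm": {
        "ConnMgr": {"HighWater": 900},
        "ResourceMgr": {"Limits": {"System": {"ConnsInbound": 512}}},
    }
}
for warning in connmgr_warnings(risky):
    print(warning)
```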
Theme 10: resource manager doing its job of protecting a node is alarming
This was brought up throughout #9432, but generally users have found the resource manager "ERRORs" spammy. That they are printed as ERRORs also runs counter to our narrative about this being a feature. Should a feature doing its job be an error?
Things we can do:
Theme 11: fix bugs in the swarm stats command
Theme 12: remove additional footguns around (soft) ConnMgr and (hard) ResourceMgr limits and their interactions
- Computed default ResourceMgr limits account for ConnMgr HighWater and are sufficiently high #9545
- Fail if Swarm.ConnMgr.HighWater is above Swarm.ResourceMgr hard limits #9549
Theme 13: clarify and improve handling of zeroes
- Make it possible to set 0-value user-supplied resource manage limits #9564
- Resource Manager: ipfs swarm limit <scope> --reset is setting to zero all other scopes. #9559
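One way zero handling can go wrong, sketched under an assumption this issue does not spell out: that an explicit 0 is treated the same as "unset" and is therefore replaced by the default. The function names and values below are hypothetical.

```python
from typing import Optional


def resolve_limit_lossy(user_value: Optional[int], default: int) -> int:
    """Problematic variant: 0 is falsy, so an explicit 0 is silently lost."""
    return user_value or default


def resolve_limit(user_value: Optional[int], default: int) -> int:
    """Treat only None as 'unset', so an explicit 0 is honoured."""
    return default if user_value is None else user_value


print(resolve_limit_lossy(0, 128))  # the user's 0 is replaced by the default
print(resolve_limit(0, 128))        # the user's 0 is kept
print(resolve_limit(None, 128))     # unset falls back to the default
```

Keeping "unset" distinct from 0 is the design choice that makes a zero-value limit expressible at all.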
Ideally completing before the 0.18 final release
Theme 1: usability issues in entering config
It's too easy for someone to enter invalid config and not get any feedback that they have done so.
- fix: Avoid unknown fields on config #9438
- This will happen in 2023Q1 and won't make it in for 0.18
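A minimal sketch of the kind of feedback this theme asks for: walking a parsed config and reporting keys that are not in a known schema. The schema below is a tiny hypothetical subset written for this example, and the function is not Kubo code.

```python
def unknown_fields(config: dict, schema: dict, prefix: str = "") -> list:
    """Return dotted paths of config keys that the schema does not know."""
    unknown = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if key not in schema:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            # Recurse only where the schema describes nested keys.
            unknown.extend(unknown_fields(value, schema[key], path + "."))
    return unknown


# Hypothetical schema subset; None means "any value is accepted here".
SCHEMA = {
    "Swarm": {
        "ConnMgr": {"HighWater": None},
        "ResourceMgr": {"MaxMemory": None, "Limits": None},
    }
}

# A typo in a key name that would otherwise be silently ignored.
config = {"Swarm": {"ResourceManager": {"MaxMemory": "4GB"}}}
print(unknown_fields(config, SCHEMA))
```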