Peer closes connection before other peer has read all the data #5060

Open
@ShahakShama

Summary

Stream's poll_close function can return Ready on peer A before peer B has read all the data. If the connection then closes because peer A no longer has an open stream, peer B gets a ConnectionClosed event instead of the event it should have received after reading the data (which event that is depends on the behaviour in use).

Note that this only occurs when sending a lot of data.

I've created an executable that reproduces this: https://github.com/ShahakShama/libp2p_bug_example.
The executable implements the Request Response protocol. When writing a response it writes a lot of garbage bytes, and when reading a response it reads a lot of garbage bytes.
To run it, run this command in one terminal:
cargo run --release -- -l /ip4/127.0.0.1/tcp/11111 -m 1000000 -t 10
and this one in another terminal:
cargo run --release -- -l /ip4/127.0.0.1/tcp/22222 -d /ip4/127.0.0.1/tcp/11111 -m 1000000 -s -t 10
You might need to increase -m depending on your hardware. Note that -m must be identical in both processes.

Expected behavior

I know that setting a longer timeout with with_idle_connection_timeout fixes the issue, but I don't think data is meant to get lost when this timeout isn't set to a big enough value. (If I'm wrong, I'd love to hear an explanation why.)

IMO the connection should stay alive until the other peer has received all the data we've sent it (e.g. in TCP, until we've received an ACK for all the data we've sent).
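
To make the expectation concrete, here is a minimal sketch of "don't close until the peer has read everything", written against plain std::net TCP sockets rather than libp2p. It uses an application-level acknowledgement byte instead of TCP ACKs, and all function names (send_and_wait_for_ack, read_all_then_ack, run_demo) are made up for this illustration. It is not how libp2p's Stream works, only a picture of the behaviour I'd expect.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Peer A: send the payload, half-close the write side, then block until
/// peer B confirms with a single ack byte that it has read everything.
fn send_and_wait_for_ack(mut stream: TcpStream, payload: &[u8]) -> std::io::Result<()> {
    stream.write_all(payload)?;
    // Half-close: peer B sees EOF, but we can still read its ack.
    stream.shutdown(Shutdown::Write)?;
    let mut ack = [0u8; 1];
    stream.read_exact(&mut ack)?;
    Ok(()) // the stream is dropped (connection closed) only after the ack
}

/// Peer B: read until EOF, then acknowledge. Returns the number of bytes read.
fn read_all_then_ack(mut stream: TcpStream) -> std::io::Result<usize> {
    let mut buf = Vec::new();
    stream.read_to_end(&mut buf)?;
    stream.write_all(&[1])?;
    Ok(buf.len())
}

/// Run both peers over loopback and return how many bytes peer B received.
fn run_demo(len: usize) -> std::io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let reader = thread::spawn(move || -> std::io::Result<usize> {
        let (stream, _) = listener.accept()?;
        read_all_then_ack(stream)
    });
    let stream = TcpStream::connect(addr)?;
    send_and_wait_for_ack(stream, &vec![0u8; len])?;
    reader.join().expect("reader thread panicked")
}

fn main() -> std::io::Result<()> {
    let received = run_demo(1_000_000)?;
    println!("peer B read {received} bytes before peer A closed");
    Ok(())
}
```

The point of the sketch is the ordering: peer A's "close" only completes after peer B has confirmed delivery, so the size of the payload no longer matters.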

Actual behavior

The behaviour I'm seeing is that once our peer has sent all the data, it may close the connection before the data has reached the other peer.

Relevant log output

No response

Possible Solution

I don't know the inner implementation of libp2p well enough, but I think there are a few places where the code could change:

  1. In Stream::poll_close, return Ready only when the other peer has received all the data (as mentioned before with the TCP example).
  2. Change the connection_keep_alive logic to check whether the other peer has received all the data.
  3. Add functionality to Stream to check whether the other peer has received all the data. Behaviours that handle big messages could then check whether the stream can be safely dropped before dropping it. (This would require changing the Request Response behaviour alongside other behaviours.)
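
As a rough illustration of option 2, here is a toy model of the close decision. Everything in it is hypothetical: ConnectionState, unconfirmed_bytes and both functions are names I invented for this sketch, and the "as reported" rule is only my reading of the behaviour described above, not libp2p's actual keep-alive code.

```rust
use std::time::Duration;

/// Hypothetical snapshot of one connection as seen by peer A.
/// None of these names are libp2p types.
struct ConnectionState {
    open_streams: usize,
    idle_for: Duration,
    /// Bytes peer A has written that peer B has not yet confirmed reading.
    unconfirmed_bytes: u64,
}

/// The behaviour as described in this report: only open streams and the
/// idle timeout are considered, so data still in flight is ignored.
fn may_close_as_reported(state: &ConnectionState, idle_timeout: Duration) -> bool {
    state.open_streams == 0 && state.idle_for >= idle_timeout
}

/// Option 2: additionally require that nothing is still in flight.
fn may_close_with_delivery_check(state: &ConnectionState, idle_timeout: Duration) -> bool {
    may_close_as_reported(state, idle_timeout) && state.unconfirmed_bytes == 0
}

fn main() {
    // Peer A has just closed its last stream; 1 MB is not yet confirmed.
    let state = ConnectionState {
        open_streams: 0,
        idle_for: Duration::ZERO,
        unconfirmed_bytes: 1_000_000,
    };
    let timeout = Duration::ZERO;
    println!("as reported: {}", may_close_as_reported(&state, timeout));
    println!("with delivery check: {}", may_close_with_delivery_check(&state, timeout));
}
```

In this model the first rule allows the close while data is still unconfirmed, and the second does not. A longer idle timeout only makes the first rule false for a while, which matches the workaround mentioned under "Expected behavior".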

Version

0.53.2

Would you like to work on fixing this bug?

Yes
