Skip to content

Conversation

sunkuamzn
Copy link
Contributor

The first two messages were using different message sizes for some reason

@j-xiong
Copy link
Contributor

j-xiong commented Sep 10, 2025

Anything wrong with the original code? These messages are not part of the main test loop.

@sunkuamzn
Copy link
Contributor Author

@j-xiong the test was passing but the TX and RX sizes were mismatched. Making these sizes opts.transfer_size also increases test coverage in that the endpoint must be able to handle a receive of opts.transfer_size size before the call to fi_av_insert

@j-xiong
Copy link
Contributor

j-xiong commented Sep 11, 2025

Tx size and Rx size don't need to match, and testing of arbitrary-size message before fi_av_insert is not the goal of this test. What in the main test loop is what to be tested.

@sunkuamzn
Copy link
Contributor Author

@j-xiong Tx and Rx sizes don't need to match but it's confusing to have different sizes. Using addrlen for the Tx buffer size is especially confusing because the Tx buffer has nothing to do with the length of the address returned by fi_getname.

Do you not like opts.transfer_size?

@j-xiong
Copy link
Contributor

j-xiong commented Sep 11, 2025

It's common to post a fixed size rx buffer (in this case rx_size) to match different tx sizes. That should not be a source of confusion. addrlen is used for the tx size just because it's a reasonable length that has been initialized. Replacing addrlen with opts.transfer_size should work just fine, but is it necessary?

@aingerson
Copy link
Contributor

@sunkuamzn Yeah I agree with jianxin - the main test loop is the one testing the opts.transfer_size. The part you're changing is testing the unexpected unspec addr path and it just does a single send/recv of the address name. It doesn't need the larger size. It's not really adding extra coverage because we already test the larger sizes in the main loop.
And like jianxin said, the rx size that you post doesn't have to match. We use the larger size to make sure and make room for any size address. It'll just consume only what it needs.

@sunkuamzn
Copy link
Contributor Author

@aingerson @j-xiong are you at least ok with either (1) replacing addrlen with 1 in the first two tx calls or (2) replacing 1 with addrlen in the last tx call?

@aingerson
Copy link
Contributor

Oh I think I misread the code and missed the actual address exchange is happening OOB which makes sense. I'm fine with changing all the tx sends in this function to 1 but maybe rephrase to explain that the address exchange is OOB so these are just test messages and have nothing to do with address length? Sorry for the confusion!

@aingerson
Copy link
Contributor

(I am also fine with the changes in this PR as is now that I understand it)
ie if the user specified a 0 length transfer then I think this should work too

The first two messages were using different message sizes
for some reason

Signed-off-by: Sai Sunku <[email protected]>
@sunkuamzn
Copy link
Contributor Author

@aingerson updated Tx message sizes to 1

*/
ret = ft_post_tx_buf(ep, remote_fi_addr, addrlen, 0, &tx_ctx,
tx_buf, mr_desc, ft_tag);
ret = (int) ft_tx(ep, remote_fi_addr, 1, &tx_ctx);
Copy link
Contributor

@aingerson aingerson Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These both need to stay with the ft_post_tx_buf call which don't wait for the completion. ft_tx waits for it to complete which is going to conflict with the get_tx_comp(2) call on line 249 and make the test hang. The goal here is to post 2 sends and then sync with the other side to ensure that they go to the unexpected queue before the receives are posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants