fabtests: implement pausable timer #11507

alekswn · 2025-10-15T15:33:55Z

Replace global timespec variables with a flexible ft_timer union that supports pause/resume functionality. This allows for more accurate timing measurements in tests by excluding client-server synchronization time.

In particular, this change fixes issue with fi_rma_bw -o writedata -p efa -f efa-direct.
Currently bandwidth numbers produced by that benchmark are wrong. In case of writedata client sends a stream of write messages to server and server replies with a send after each window. The purpose of the benchmark is to measure bandwidth of writedata operation, not the synchronization time. However in some cases synchronization time can take up to 30% of overall timing, therefore bandwidth calculation is off by the same 30%.

Key changes:

Add ft_timer union with pause/resume capabilities
Replace timespec usage across all test files with new timer API
Update timing functions in benchmark and test utilities
bandwidth_rma -o writedata excludes reverse sync time from total timing
~~bandwidth_rma starts timer only after all completions from warm-up iterations had been received~~
~~Enable payload validation for warm-up iterations~~
~~Increase default warmup iterations from 10 to 64 to mach default window size~~
Fix a bug in uni_bandwidth due to dafaulting to a global var targs
Fixed a bug with timing in efa_gda fabtest

This improves timing precision in fabtests by allowing tests to exclude non-measurement periods from their timing calculations.

Edit: Crossed out changes caused increase in BW wall clock timings causing CI timeouts. Removing for now

Replace global timespec variables with a flexible ft_timer union that supports pause/resume functionality. This allows for more accurate timing measurements in tests by excluding syncronization time. Key changes: - Add ft_timer union with pause/resume capabilities - Replace timespec usage across all test files with new timer API - Update timing functions in benchmark and test utilities - `bandwidth_rma -o writedata` exclude reverse sync time from timing - Fix a bug in `uni_bandwidth` due to dafaulting to a global var `targs` - Fixed a bug with timing in `efa_gda` fabtest This improves timing precision in fabtests by allowing tests to exclude non-measurement periods from their timing calculations. Signed-off-by: Alexey Novikov <[email protected]>

aingerson

Fun changes! Thank you!

aingerson · 2025-10-21T13:41:44Z

fabtests/include/shared.h

+static inline void ft_timer_pause(union ft_timer *timer)
+{
+	timer->elapsed = ft_gettime_ns() - timer->start;
+}
+static inline void ft_timer_resume(union ft_timer *timer)
+{
+	timer->start = ft_gettime_ns() - timer->elapsed;
+}


Can we simplify this to just a start and stop where stop is
elapsed = gettime - start

and start is
start = gettime - elapsed

and elapsed is initialized to 0?

or else you could consider a model where
start is
start = gettime

and stop is
elapsed += gettime - start

either way, I don't think you need the pause/resume to be separate. I would rather keep it as simple as possible

Yeah, we can have just two state transition functions (start and stop). Do we want some initialization function to set elapsed to zero or should let caller to just zero initialize the timer ?
The later is not not going to work if we returned to global variable model, so the caller would have to reset the global timer before calling start.

Would not it be more convenient to reset timer in start() and still have resume() function which would just update elapsed?

In that case callers which do not need pause functionality can just call start() and stop() as before.

I'm ok with start/stop/resume with an init in start

aingerson · 2025-10-21T13:42:12Z

fabtests/include/shared.h

+}
+static inline uint64_t ft_timer_get_elapsed(union ft_timer timer, enum precision p)
+{
+	// assiming timer is stopped


aingerson · 2025-10-21T13:42:19Z

fabtests/include/shared.h

+}
+static inline bool ft_timer_is_elapsed(union ft_timer timer, uint64_t timeout, enum precision p)
+{
+	// assuming timer is running


typo, also couldn't hurt to add some asserts in the timer instead of having to comment this

I initially implemented ft_timer as struct and added asserts here, but converted it to a union then.
There is no way to add asserts having it as a union. Probably that was a premature optimization and a bad idea.
I'll revert it back.

Ah I missed the union! A fun optimization! But I think I would prefer the struct, especially if that adds the option of an assert to catch misuses of it. I wouldn't want performance numbers to get reported incorrectly if someone forgot to stop/resume in the correct order or something.

fabtests/ubertest/cq.c

aingerson · 2025-10-21T13:56:55Z

fabtests/include/shared.h

 }

-static inline void ft_start(void)
+static inline void ft_timer_start(union ft_timer *timer)


Is there a specific reason you changed this to pass in a timer instead of using a global? I'm not necessarily against it if you have a reason you want to update it like if you have a test you want to add that has multiple timers. It's just not necessary right now and makes for a lot of code changes without a clear reason.

The reason for that is to make it useful for cases like timeout in your previous comment. There are few places like that in fabtests.
Also, I have noticed global vars s and e are shadowed at lots of places across fabtests code. show_perf() function does function takes s and e as arguments and does not use globals. So, in my opinion, keeping it global might lead to tricky bugs in the future.
I'm fixing a similar shadowing bug with targs variable in the present PR rdm_bw_mt.c:444.

Sounds reasonable. Thanks!

aingerson · 2025-10-21T15:13:18Z

fabtests/benchmarks/benchmark_shared.c

+	if (timer)
+		ft_timer_pause(timer);
+	ret = ft_rx(ep, FT_RMA_SYNC_MSG_BYTES);
+	if (timer)
+		ft_timer_resume(timer);
+	return ret;


Could this give you false performance numbers? Depending on your completion level, the send may have not been completely sent so you could be reporting higher bandwidth. Same on the receive side for the RMA ops. The RMA ops aren't going to generate completions so if you stop the timer right after get_rx_comp you're not actually waiting for all the messages to come in. Unfortunately, for RMA ops you kind of have to wait for the sync message to ensure you're stopping the timer when all the data has arrived.

Makes sense. The rationale here was to exclude server -> client send latency from time calculation. For writedata benchmark we observed synchronization overhead of the same order of magnitude as overall timing due to connection setup delay for EFA.

Connection setup delay is only seen for a first few synchronizations, but we see 30% error in bandwidth measurements attributed to connection setup latency. The delay is negligible after first 5 synchronizations.

The reason the synchronization delay affects writedata test so much is asymmetry of the test. Client makes a lot of writes in the loop, but server replies with just a single send at the end of each window. We are warming up client->server path with the warm-up iterations, but do not warm up server->client connection.

If we want keep synchronization timing included, I think an alternative approach would be to warm up server->client path as well. Does it sound like a better approach?

Gotcha, thanks for the context. I think it's ok to warmup both sides. In my opinion, it's a little weird that the writedata function has such a different model than the other functions. I understand why (because of the rx completion) but I think it would be ok to have all ops behave in the same pattern. Can we change writedata to be two sided? Would that help as well? Or are you explicitly looking to have it remain a one-sided test?

Thanks for suggestion!
Will try to make it symmetric.

alekswn · 2025-10-21T20:45:14Z

Fun changes! Thank you!

Thanks for reviewing this PR and detailed feedback, Alexia!

If we don't want exclude synchronization time from bandwidth_rma timing introducing a new timer is not making much sense.
This PR also fixes a couple of timer-related bugs in efa_gda.c and multy_ep_mt.c. I think I'll cut separate PRs with fixes using the old timer.

alekswn requested review from a-szegel and shijin-aws October 15, 2025 15:33

alekswn force-pushed the pausable-timer branch 2 times, most recently from fadd60e to 892eff7 Compare October 16, 2025 23:13

sunkuamzn requested review from aingerson and j-xiong October 17, 2025 00:53

alekswn force-pushed the pausable-timer branch from 892eff7 to ebe3a5f Compare October 17, 2025 06:22

aingerson reviewed Oct 21, 2025

View reviewed changes

alekswn added the ⚠️ Do not merge label Oct 22, 2025

alekswn marked this pull request as draft October 22, 2025 23:36

fabtests: implement pausable timer #11507

Are you sure you want to change the base?

fabtests: implement pausable timer #11507

Conversation

alekswn commented Oct 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aingerson left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alekswn Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alekswn Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alekswn commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alekswn commented Oct 15, 2025 •

edited

Loading

alekswn Oct 21, 2025 •

edited

Loading

alekswn Oct 21, 2025 •

edited

Loading