fabtests: implement pausable timer #11507
Conversation
Replace global timespec variables with a flexible ft_timer union that supports pause/resume functionality. This allows for more accurate timing measurements in tests by excluding synchronization time.

Key changes:
- Add ft_timer union with pause/resume capabilities
- Replace timespec usage across all test files with the new timer API
- Update timing functions in benchmark and test utilities
- `bandwidth_rma -o writedata`: exclude reverse sync time from timing
- Fix a bug in `uni_bandwidth` due to defaulting to a global var `targs`
- Fix a bug with timing in the `efa_gda` fabtest

This improves timing precision in fabtests by allowing tests to exclude non-measurement periods from their timing calculations.

Signed-off-by: Alexey Novikov <[email protected]>
Fun changes! Thank you!
```c
static inline void ft_timer_pause(union ft_timer *timer)
{
	timer->elapsed = ft_gettime_ns() - timer->start;
}

static inline void ft_timer_resume(union ft_timer *timer)
{
	timer->start = ft_gettime_ns() - timer->elapsed;
}
```
Can we simplify this to just a start and stop, where stop is `elapsed = gettime - start`, start is `start = gettime - elapsed`, and elapsed is initialized to 0?

Or else you could consider a model where start is `start = gettime` and stop is `elapsed += gettime - start`.

Either way, I don't think you need the pause/resume to be separate. I would rather keep it as simple as possible.
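The second model the review describes can be sketched as follows. This is a minimal illustration, not the actual fabtests API: the `demo_*` names are hypothetical, and `ft_gettime_ns()` is stubbed with a fake clock so the behavior is deterministic.

```c
#include <stdint.h>

/* Fake monotonic clock so the sketch is deterministic. In fabtests the
 * real ft_gettime_ns() would read a monotonic clock instead. */
static uint64_t fake_now;

static uint64_t ft_gettime_ns(void)
{
	return fake_now;
}

struct demo_timer {
	uint64_t start;
	uint64_t elapsed; /* must be zero-initialized by the caller */
};

/* start: just record the current time */
static void demo_start(struct demo_timer *t)
{
	t->start = ft_gettime_ns();
}

/* stop: accumulate the interval since start into elapsed */
static void demo_stop(struct demo_timer *t)
{
	t->elapsed += ft_gettime_ns() - t->start;
}
```

With this shape, pausing is simply stop followed by a later start; any time spent between the two calls never enters `elapsed`.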
Yeah, we can have just two state transition functions (start and stop). Do we want some initialization function to set elapsed to zero, or should we let the caller just zero-initialize the timer?
The latter is not going to work if we returned to the global variable model, so the caller would have to reset the global timer before calling start.
Wouldn't it be more convenient to reset the timer in start() and still have a resume() function which would just update elapsed?
In that case, callers which do not need pause functionality can just call start() and stop() as before.
I'm ok with start/stop/resume with an init in start
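The agreed start/stop/resume shape could look like the sketch below. Names and the stubbed clock are illustrative assumptions, not the merged fabtests code: start() does the init, and resume() shifts the start time back so the next stop() folds in previously accumulated time.

```c
#include <stdint.h>

/* Fake clock stub standing in for a real monotonic ft_gettime_ns() */
static uint64_t fake_now;

static uint64_t ft_gettime_ns(void)
{
	return fake_now;
}

struct sketch_timer {
	uint64_t start;
	uint64_t elapsed;
};

/* start() initializes elapsed, so callers need no separate reset */
static void sketch_start(struct sketch_timer *t)
{
	t->elapsed = 0;
	t->start = ft_gettime_ns();
}

/* stop() captures total measured time so far */
static void sketch_stop(struct sketch_timer *t)
{
	t->elapsed = ft_gettime_ns() - t->start;
}

/* resume() backdates start so the next stop() includes prior elapsed */
static void sketch_resume(struct sketch_timer *t)
{
	t->start = ft_gettime_ns() - t->elapsed;
}
```

Callers that never pause just pair sketch_start() with sketch_stop(); callers that need to exclude a sync window call stop before it and resume after it.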
```c
static inline uint64_t ft_timer_get_elapsed(union ft_timer timer, enum precision p)
{
	// assiming timer is stopped
```
typo
```c
static inline bool ft_timer_is_elapsed(union ft_timer timer, uint64_t timeout, enum precision p)
{
	// assuming timer is running
```
typo, also couldn't hurt to add some asserts in the timer instead of having to comment this
I initially implemented ft_timer as a struct and added asserts here, but then converted it to a union.
There is no way to add asserts with it as a union. That was probably a premature optimization and a bad idea.
I'll revert it back.
Ah I missed the union! A fun optimization! But I think I would prefer the struct, especially if that adds the option of an assert to catch misuses of it. I wouldn't want performance numbers to get reported incorrectly if someone forgot to stop/resume in the correct order or something.
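A struct-based timer can carry a running flag so misuse is caught at the call site. This is a hypothetical sketch (the `checked_*` names and fake clock are not fabtests code) of how asserts could catch an out-of-order stop or a double resume:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fake clock stub standing in for a real monotonic ft_gettime_ns() */
static uint64_t fake_now;

static uint64_t ft_gettime_ns(void)
{
	return fake_now;
}

/* Struct variant: the extra running flag costs a few bytes but lets
 * asserts catch stop-without-start and resume-while-running bugs that
 * would otherwise silently corrupt reported performance numbers. */
struct checked_timer {
	uint64_t start;
	uint64_t elapsed;
	bool running;
};

static void checked_start(struct checked_timer *t)
{
	t->elapsed = 0;
	t->start = ft_gettime_ns();
	t->running = true;
}

static void checked_stop(struct checked_timer *t)
{
	assert(t->running); /* catches stop without a prior start/resume */
	t->elapsed = ft_gettime_ns() - t->start;
	t->running = false;
}

static void checked_resume(struct checked_timer *t)
{
	assert(!t->running); /* catches resume on an already-running timer */
	t->start = ft_gettime_ns() - t->elapsed;
	t->running = true;
}
```

In release builds the asserts compile out with `NDEBUG`, so the struct version need not cost anything on the measurement path.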
```diff
-static inline void ft_start(void)
+static inline void ft_timer_start(union ft_timer *timer)
```
Is there a specific reason you changed this to pass in a timer instead of using a global? I'm not necessarily against it if you have a reason you want to update it like if you have a test you want to add that has multiple timers. It's just not necessary right now and makes for a lot of code changes without a clear reason.
The reason for that is to make it useful for cases like the timeout in your previous comment. There are a few places like that in fabtests.
Also, I have noticed the global vars s and e are shadowed in lots of places across the fabtests code. The show_perf() function takes s and e as arguments and does not use the globals. So, in my opinion, keeping them global might lead to tricky bugs in the future.
I'm fixing a similar shadowing bug with the targs variable in the present PR (rdm_bw_mt.c:444).
Sounds reasonable. Thanks!
```c
	if (timer)
		ft_timer_pause(timer);
	ret = ft_rx(ep, FT_RMA_SYNC_MSG_BYTES);
	if (timer)
		ft_timer_resume(timer);
	return ret;
```
Could this give you false performance numbers? Depending on your completion level, the send may have not been completely sent so you could be reporting higher bandwidth. Same on the receive side for the RMA ops. The RMA ops aren't going to generate completions so if you stop the timer right after get_rx_comp you're not actually waiting for all the messages to come in. Unfortunately, for RMA ops you kind of have to wait for the sync message to ensure you're stopping the timer when all the data has arrived.
Makes sense. The rationale here was to exclude the server -> client send latency from the time calculation. For the writedata benchmark, we observed synchronization overhead of the same order of magnitude as the overall timing, due to connection setup delay for EFA.
The connection setup delay is only seen for the first few synchronizations, but we see a 30% error in bandwidth measurements attributed to connection setup latency. The delay is negligible after the first 5 synchronizations.
The reason the synchronization delay affects the writedata test so much is the asymmetry of the test. The client makes a lot of writes in the loop, but the server replies with just a single send at the end of each window. We warm up the client->server path with the warm-up iterations, but do not warm up the server->client connection.
If we want to keep synchronization timing included, I think an alternative approach would be to warm up the server->client path as well. Does that sound like a better approach?
Gotcha, thanks for the context. I think it's ok to warmup both sides. In my opinion, it's a little weird that the writedata function has such a different model than the other functions. I understand why (because of the rx completion) but I think it would be ok to have all ops behave in the same pattern. Can we change writedata to be two sided? Would that help as well? Or are you explicitly looking to have it remain a one-sided test?
Thanks for the suggestion!
Will try to make it symmetric.
Thanks for reviewing this PR and the detailed feedback, Alexia!
Replace global timespec variables with a flexible ft_timer union that supports pause/resume functionality. This allows for more accurate timing measurements in tests by excluding client-server synchronization time.

In particular, this change fixes an issue with `fi_rma_bw -o writedata -p efa -f efa-direct`. Currently, the bandwidth numbers produced by that benchmark are wrong. In the case of `writedata`, the client sends a stream of write messages to the server, and the server replies with a send after each window. The purpose of the benchmark is to measure the bandwidth of the writedata operation, not the synchronization time. However, in some cases synchronization time can take up to 30% of the overall timing, so the bandwidth calculation is off by the same 30%.

Key changes:
- `bandwidth_rma -o writedata` excludes reverse sync time from total timing
- `bandwidth_rma` starts the timer only after all completions from warm-up iterations have been received
- ~~Enable payload validation for warm-up iterations~~
- ~~Increase default warmup iterations from 10 to 64 to match default window size~~
- Fix a bug in `uni_bandwidth` due to defaulting to a global var `targs`
- Fix a bug with timing in the `efa_gda` fabtest

This improves timing precision in fabtests by allowing tests to exclude non-measurement periods from their timing calculations.

Edit: Crossed-out changes caused an increase in BW wall-clock timings, causing CI timeouts. Removing them for now.