-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Refactor uniq field skipping to avoid allocations #8703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Could you please the benchmark with hyperfine? With gnu, baseline and your version. |
CodSpeed Performance ReportMerging #8703 will improve performances by ×7.3Comparing Summary
Benchmarks breakdown
Footnotes
|
GNU testsuite comparison:
|
performance improvement |
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix the various job failures
GNU testsuite comparison:
|
GNU testsuite comparison:
|
Could you please share/run the benchmark with hyperfine? With gnu, baseline and your version. |
benchmark Report: uniq (uutils) vs system uniqObjective: Compare the performance of the uutils uniq against the system /usr/bin/uniq using hyperfine. Method:Command form: cat | > /dev/null Four datasets:small.txt (~5k lines, two values) Results (mean time, relative speed):small.txt: System uniq is ~1.11x faster (both ~5 ms; shell overhead significant) |
GNU testsuite comparison:
|
could you please share the full output of hyperfine ? |
i think your usage of codex is a bit too heavy currently, this won't be merged given all the changes you have made |
GNU testsuite comparison:
|
GNU testsuite comparison:
|
f6d61aa
to
f415c6d
Compare
GNU testsuite comparison:
|
- Add read error localization keys to en-US and fr-FR locales - Refactor print_uniq to use buffered line reading with LineMeta for case-insensitive comparisons, avoiding memory issues with large inputs - Improve error handling by detecting read failures and exiting appropriately
f415c6d
to
1e25202
Compare
GNU testsuite comparison:
|
Summary
Reworked skip_fields to walk the input buffer directly and return the borrowed tail slice, eliminating the repeated Vec allocations while preserving the existing whitespace/field semantics.
Updated cmp_key to operate on the borrowed slice so that byte-based --skip-chars handling and the invalid-UTF-8 fallback continue to match the original behavior without extra copies.
Testing
✅ cargo test uniq --features=default
Benchmark
Baseline target/debug/uniq_baseline --skip-fields=3 --skip-chars=10 uniq_bench_input.txt > /dev/null: 18.475 s, 18.113 s, 17.271 s (avg 17.95 s).
Refactored target/debug/uniq --skip-fields=3 --skip-chars=10 uniq_bench_input.txt > /dev/null: 1.348 s, 1.288 s, 1.339 s (avg 1.33 s, ≈13.5× faster)