Refactor uniq field skipping to avoid allocations #8703

mattsu2020 · 2025-09-22T10:09:03Z

Summary

Reworked skip_fields to walk the input buffer directly and return the borrowed tail slice, eliminating the repeated Vec allocations while preserving the existing whitespace/field semantics.
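Below is a minimal sketch of the allocation-free approach. It assumes a field is a run of blanks (space or tab) followed by a run of non-blank bytes, as in POSIX `uniq -f`. The name and signature of this `skip_fields` are illustrative and are not the code merged in this PR. It advances an index over the byte slice and returns a borrowed sub-slice, so nothing is copied.

```rust
/// Hypothetical sketch: return the tail of `line` after skipping `n` fields.
/// Assumption: a field is a run of blanks (space or tab) followed by a run of
/// non-blank bytes, mirroring POSIX `uniq -f`.
fn skip_fields(line: &[u8], n: usize) -> &[u8] {
    let is_blank = |b: u8| b == b' ' || b == b'\t';
    let mut i = 0;
    for _ in 0..n {
        // Skip the blanks that precede the field.
        while i < line.len() && is_blank(line[i]) {
            i += 1;
        }
        // Skip the field's non-blank bytes.
        while i < line.len() && !is_blank(line[i]) {
            i += 1;
        }
        if i == line.len() {
            break; // fewer than `n` fields: the key is empty
        }
    }
    // Borrowed sub-slice of the input: no Vec is allocated.
    &line[i..]
}
```

The returned tail keeps the blanks that precede the next field, so those blanks remain part of the comparison key.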

Updated cmp_key to operate on the borrowed slice so that byte-based --skip-chars handling and the invalid-UTF-8 fallback continue to match the original behavior without extra copies.
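Below is a hedged sketch of how the rest of the key handling can stay on the borrowed slice. The following are assumptions and are not confirmed by the PR: `--skip-chars` counts bytes, and the invalid-UTF-8 fallback applies to case-insensitive comparison, where valid UTF-8 is compared by Unicode lowercase and anything else is compared ASCII-case-insensitively. `skip_chars` and `keys_equal` are hypothetical names.

```rust
/// Hypothetical: byte-based --skip-chars, clamped to the slice length.
fn skip_chars(tail: &[u8], n: usize) -> &[u8] {
    &tail[n.min(tail.len())..]
}

/// Hypothetical: compare two keys without allocating.
/// With `ignore_case`, valid UTF-8 is compared char by char after Unicode
/// lowercasing; if either key is invalid UTF-8, fall back to an
/// ASCII-case-insensitive byte comparison (assumed fallback).
fn keys_equal(a: &[u8], b: &[u8], ignore_case: bool) -> bool {
    if !ignore_case {
        return a == b;
    }
    match (std::str::from_utf8(a), std::str::from_utf8(b)) {
        (Ok(x), Ok(y)) => x
            .chars()
            .flat_map(char::to_lowercase)
            .eq(y.chars().flat_map(char::to_lowercase)),
        _ => a.eq_ignore_ascii_case(b),
    }
}
```

Lowercasing through iterators avoids building `String` copies of each line.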

Testing

✅ cargo test uniq --features=default

Benchmark

Baseline target/debug/uniq_baseline --skip-fields=3 --skip-chars=10 uniq_bench_input.txt > /dev/null: 18.475 s, 18.113 s, 17.271 s (avg 17.95 s).

Refactored target/debug/uniq --skip-fields=3 --skip-chars=10 uniq_bench_input.txt > /dev/null: 1.348 s, 1.288 s, 1.339 s (avg 1.33 s, ≈13.5× faster)
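The averages and the ratio above can be re-derived from the three timings per build. Note that both builds are debug builds under `target/debug`. In this small sketch, `mean` and `summarize` are illustrative helpers.

```rust
/// Arithmetic mean of a slice of timings, in seconds.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Returns (baseline average, refactored average, speedup) for the
/// three runs of each debug build quoted above.
fn summarize() -> (f64, f64, f64) {
    let baseline = [18.475, 18.113, 17.271];
    let refactored = [1.348, 1.288, 1.339];
    let (b, r) = (mean(&baseline), mean(&refactored));
    (b, r, b / r)
}
```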

sylvestre · 2025-09-22T10:53:06Z

Could you please run the benchmark with hyperfine? With GNU, the baseline and your version.
It is much better for benchmarks.

codspeed-hq · 2025-09-22T10:54:34Z

CodSpeed Performance Report

Merging #8703 will improve performances by ×7.3

Comparing mattsu2020:performance_tuning (1e25202) with main (f7490ca)

Summary

⚡ 2 improvements
✅ 42 untouched
⏩ 64 skipped¹

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| ⚡ | `uniq_case_insensitive[10000]` | 6 ms | 2 ms | ×3 |
| ⚡ | `uniq_with_count[10000]` | 13.4 ms | 1.8 ms | ×7.3 |

¹ 64 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.

github-actions · 2025-09-22T11:08:52Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

mattsu2020 · 2025-09-22T12:47:10Z

performance improvement

github-actions · 2025-09-22T13:19:00Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

sylvestre

Please fix the various job failures.

github-actions · 2025-09-23T05:30:05Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

github-actions · 2025-09-23T06:00:25Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

sylvestre · 2025-09-23T06:20:41Z

Could you please share/run the benchmark with hyperfine? With gnu, baseline and your version.

mattsu2020 · 2025-09-23T06:39:16Z

Benchmark report: uniq (uutils) vs. system uniq

Objective: Compare the performance of the uutils uniq against the system /usr/bin/uniq using hyperfine.
Environment:
uutils binary: target/release/uniq
system binary: /usr/bin/uniq
OS: macOS 26.0
datasets: /tmp/uniq_bench/{small.txt, medium.txt, large_runs.txt, sorted.txt}
runner: hyperfine -w 3 -m 10 (Markdown outputs saved as /tmp/uniq_bench/*.md)

Method:

Command form: cat | > /dev/null

Four datasets:

small.txt (~5k lines, two values)
medium.txt (200k unique-ish 16-char tokens)
large_runs.txt (1M lines with long runs of A/B/C)
sorted.txt (500k sorted random tokens)

Results (mean time, relative speed):

small.txt: system uniq is ~1.11x faster (both take ~5 ms, so shell overhead is significant)
medium.txt: uutils uniq is ~2.16x faster
large_runs.txt: uutils uniq is ~2.63x faster
sorted.txt: uutils uniq is ~2.69x faster


github-actions · 2025-09-23T06:39:27Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

sylvestre · 2025-09-23T06:47:39Z

Could you please share the full output of hyperfine?

sylvestre · 2025-09-23T06:49:43Z

I think your usage of Codex is a bit too heavy: it completely rewrote the tool, which makes it hard to review.

Currently, this won't be merged given all the changes you have made.

mattsu2020 · 2025-09-23T06:55:04Z

small.md
sorted.md
medium.md
large_runs.md

github-actions · 2025-09-23T07:32:05Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

github-actions · 2025-09-23T08:26:21Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

github-actions · 2025-09-28T20:43:42Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

- Add read error localization keys to the en-US and fr-FR locales
- Refactor print_uniq to use buffered line reading with LineMeta for case-insensitive comparisons, avoiding memory issues with large inputs
- Improve error handling by detecting read failures and exiting appropriately
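Below is a minimal sketch of streaming, case-insensitive de-duplication with reused line buffers. Memory stays bounded by the longest line rather than by the input size, and read errors propagate to the caller. It does not reproduce the PR's `LineMeta` type. `uniq_ignore_case` and `trim_newline` are hypothetical, and case folding here is ASCII-only.

```rust
use std::io::{self, BufRead, Write};

/// Strip one trailing '\n', if present, without copying.
fn trim_newline(line: &[u8]) -> &[u8] {
    match line.last() {
        Some(&b'\n') => &line[..line.len() - 1],
        _ => line,
    }
}

/// Hypothetical sketch (not the PR's `LineMeta` code): print the first line of
/// each run of adjacent lines that are equal ignoring ASCII case.
fn uniq_ignore_case<R: BufRead, W: Write>(mut input: R, mut out: W) -> io::Result<()> {
    // Two buffers reused for every line: the run's first line and the current one.
    let mut prev: Vec<u8> = Vec::new();
    let mut cur: Vec<u8> = Vec::new();
    let mut have_prev = false;
    loop {
        cur.clear();
        // `?` returns read failures to the caller instead of ignoring them.
        if input.read_until(b'\n', &mut cur)? == 0 {
            break; // EOF
        }
        if have_prev && trim_newline(&prev).eq_ignore_ascii_case(trim_newline(&cur)) {
            continue; // same run: skip the duplicate
        }
        out.write_all(trim_newline(&cur))?;
        out.write_all(b"\n")?;
        // The new line starts a run; swap buffers instead of allocating.
        std::mem::swap(&mut prev, &mut cur);
        have_prev = true;
    }
    out.flush()
}
```

For example, the input `a\nA\nb\nB\nb\nc` yields `a\nb\nc\n`. Swapping the two `Vec`s keeps their capacity, so steady-state processing does not allocate.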

github-actions · 2025-09-29T19:18:15Z

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

sylvestre · 2025-09-29T19:37:21Z

Bravo for the perf improvements!

sylvestre requested changes Sep 22, 2025


mattsu2020 requested a review from sylvestre September 23, 2025 00:30

sylvestre force-pushed the performance_tuning branch 2 times, most recently from f6d61aa to f415c6d on September 28, 2025 20:21

mattsu2020 and others added 7 commits September 29, 2025 20:29

- refactor uniq skip_fields to avoid allocations (5f7a8cd)
- fix,build_count_prefix (4d8aa5a)
- ci test (fe2b1a8)
- fix &self (ae84112)
- fix fmt (b0a31c3)
- fix fmt (1e25202)

sylvestre force-pushed the performance_tuning branch from f415c6d to 1e25202 on September 29, 2025 18:29

sylvestre merged commit 31d1f06 into uutils:main Sep 29, 2025
97 of 98 checks passed

mattsu2020 deleted the performance_tuning branch October 3, 2025 23:31