Conversation

Nekrolm
Contributor

@Nekrolm Nekrolm commented Sep 21, 2025

Hi.
I was benchmarking several tools and found some low-hanging minor optimizations for input processing:

  • Don't look up in the hash map twice
  • Don't allocate extra arrays of input strings
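
As a minimal sketch of the first point (not the code from this PR), the snippet below contrasts a check-then-insert pattern, which hashes the key several times, with the `entry` API of `std::collections::HashMap`, which finds or creates the slot in one lookup. The `Node` struct and both `add_edge_*` functions are hypothetical names chosen for illustration. The keys are `&str` slices borrowed from the input, which also touches on the second point: no owned copies of the input strings are made.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified node type; not the actual uutils `tsort` struct.
#[derive(Default, Debug)]
struct Node<'a> {
    successors: Vec<&'a str>,
    predecessor_count: usize,
}

/// Check-then-insert: `contains_key`, `insert`, and `get_mut` each hash the key.
fn add_edge_double_lookup<'a>(
    graph: &mut HashMap<&'a str, Node<'a>>,
    from: &'a str,
    to: &'a str,
) {
    if !graph.contains_key(from) {
        graph.insert(from, Node::default());
    }
    graph.get_mut(from).unwrap().successors.push(to);

    if !graph.contains_key(to) {
        graph.insert(to, Node::default());
    }
    graph.get_mut(to).unwrap().predecessor_count += 1;
}

/// Entry API: a single lookup per endpoint finds or creates the node.
fn add_edge_entry<'a>(
    graph: &mut HashMap<&'a str, Node<'a>>,
    from: &'a str,
    to: &'a str,
) {
    graph.entry(from).or_default().successors.push(to);
    graph.entry(to).or_default().predecessor_count += 1;
}
```

Both functions build the same graph; the second one simply does less hashing per edge, which only starts to matter on very large inputs.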

@sylvestre
Contributor

Did you look at the actual perf wins?
(with hyperfine: the previous Rust impl, GNU, and this one)

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@sylvestre
Contributor

A job is failing; you need to run
cd fuzz && cargo build
to update Cargo.lock (yes, it is silly)

@Nekrolm
Contributor Author

Nekrolm commented Sep 21, 2025

It only optimizes the initial input processing, so any gain is visible only on large files, where the initial read is noticeable at all.

With a linear chain of 1 million edges (edges shuffled), before:

Benchmark 1: target/release/coreutils tsort line_shuf.txt || true
  Time (mean ± σ):     19.849 s ±  1.410 s    [User: 17.447 s, System: 2.365 s]
  Range (min … max):   17.963 s … 21.943 s    10 runs

After

Benchmark 1: target/release/coreutils tsort line_shuf.txt || true
  Time (mean ± σ):     19.577 s ±  0.652 s    [User: 17.293 s, System: 2.262 s]
  Range (min … max):   18.920 s … 20.719 s    10 runs

Actually, itertools isn't needed, and it's even better without it: the internal buffer of its chunks adapter overuses RefCell runtime checks.
I've updated the PR to drop it:

No itertools

Benchmark 1: target/release/coreutils tsort line_shuf.txt || true
  Time (mean ± σ):     18.866 s ±  0.453 s    [User: 16.725 s, System: 2.137 s]
  Range (min … max):   18.289 s … 19.867 s    10 runs
 

tsort (GNU coreutils) 9.4 (haven't checked the latest one)

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ hyperfine "tsort line_shuf.txt || true"
Benchmark 1: tsort line_shuf.txt || true
  Time (mean ± σ):     28.474 s ±  4.092 s    [User: 27.882 s, System: 0.498 s]
  Range (min … max):   25.950 s … 39.779 s    10 runs
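
The input file itself isn't attached to the thread. One way to produce a similar file (a linear chain of edges in shuffled order) is sketched below. This is an assumption about the input's shape, not the script that was actually used; the node names (plain integers), the tiny xorshift generator, and the function names are all hypothetical, chosen so the sketch needs no external crates.

```rust
/// Tiny xorshift64 PRNG so the sketch has no dependencies (not for cryptography).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Edges `i -> i + 1` for `i` in `0..n`, returned in shuffled order (Fisher-Yates).
fn shuffled_chain(n: usize, seed: u64) -> Vec<(usize, usize)> {
    let mut edges: Vec<(usize, usize)> = (0..n).map(|i| (i, i + 1)).collect();
    // xorshift must not start from a zero state.
    let mut rng = XorShift64(seed.max(1));
    for i in (1..edges.len()).rev() {
        // The slight modulo bias is irrelevant for a benchmark input.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        edges.swap(i, j);
    }
    edges
}

/// Render the edges as "from to" lines, the pair format that tsort reads.
fn render(edges: &[(usize, usize)]) -> String {
    let mut out = String::new();
    for (from, to) in edges {
        out.push_str(&format!("{from} {to}\n"));
    }
    out
}
```

Writing the result of `render(&shuffled_chain(1_000_000, 42))` to `line_shuf.txt` with `std::fs::write` would give an input of the size mentioned above; the seed value is arbitrary.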

@sylvestre
Contributor

please run hyperfine this way:
hyperfine "tsort line_shuf.txt || true" " target/release/coreutils tsort line_shuf.txt || true"
it works better this way for comparing implementations

codspeed-hq bot commented Sep 21, 2025

CodSpeed Performance Report

Merging #8694 will not alter performance

Comparing Nekrolm:main (4f09383) with main (7dbeb8f)

Summary

✅ 55 untouched
⏩ 1 skipped [1]

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

@Nekrolm
Contributor Author

Nekrolm commented Sep 21, 2025

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$  hyperfine "tsort line_shuf.txt || true" " target/release/coreutils tsort line_shuf.txt || true"
Benchmark 1: tsort line_shuf.txt || true
  Time (mean ± σ):     33.722 s ±  3.816 s    [User: 33.083 s, System: 0.620 s]
  Range (min … max):   27.813 s … 40.324 s    10 runs
 
Benchmark 2:  target/release/coreutils tsort line_shuf.txt || true
  Time (mean ± σ):     21.824 s ±  2.108 s    [User: 19.456 s, System: 2.356 s]
  Range (min … max):   18.467 s … 25.517 s    10 runs
 
Summary
   target/release/coreutils tsort line_shuf.txt || true ran
    1.55 ± 0.23 times faster than tsort line_shuf.txt || true

dmis@dmis-asus-N7600PC:~/WORKSPACE/coreutils$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 140
model name      : 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
stepping        : 1
microcode       : 0xbc
cpu MHz         : 843.122
cache size      : 12288 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 27
wp              : yes

@sylvestre
Contributor

well done!

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

_ => return Err(TsortError::NumTokensOdd(input.to_string_lossy().to_string()).into()),
}

let mut edge_tokens = data.split_whitespace();
Contributor

maybe document this a bit more to explain what it is doing ;)

Contributor Author

Expanded a bit
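
For readers following this thread, here is a rough sketch of what pairing tokens straight from `split_whitespace()` can look like without itertools and without collecting the tokens into an intermediate array. It is an illustration under assumptions, not the code merged in this PR: `OddTokenCount` stands in for the `TsortError::NumTokensOdd` variant seen in the diff, and `for_each_edge` is a hypothetical helper name.

```rust
/// Hypothetical stand-in for the `TsortError::NumTokensOdd` case in the diff.
#[derive(Debug, PartialEq)]
struct OddTokenCount;

/// Walk the whitespace-separated tokens of `data` two at a time and hand each
/// (from, to) pair to `visit`. Nothing is collected; every token is a slice
/// borrowed from `data`.
fn for_each_edge<'a>(
    data: &'a str,
    mut visit: impl FnMut(&'a str, &'a str),
) -> Result<(), OddTokenCount> {
    let mut edge_tokens = data.split_whitespace();
    loop {
        match (edge_tokens.next(), edge_tokens.next()) {
            // A complete pair: one edge.
            (Some(from), Some(to)) => visit(from, to),
            // Input exhausted exactly on a pair boundary.
            (None, _) => return Ok(()),
            // A dangling token without a partner: odd number of tokens.
            (Some(_), None) => return Err(OddTokenCount),
        }
    }
}
```

Because the iterator is consumed lazily, the only state kept is the position inside `data`, so no per-chunk buffer or runtime borrow check is involved.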

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

GNU testsuite comparison:

GNU test failed: tests/tail/overlay-headers. tests/tail/overlay-headers is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

GNU testsuite comparison:

GNU test failed: tests/tail/overlay-headers. tests/tail/overlay-headers is passing on 'main'. Maybe you have to rebase?

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
