Skip to content

Conversation

kimono-koans
Copy link
Contributor

@kimono-koans kimono-koans commented Sep 24, 2025

Would resolve: #8725

Copy link

codspeed-hq bot commented Sep 24, 2025

CodSpeed Performance Report

Merging #8728 will degrade performances by 5.24%

Comparing kimono-koans:recursive_loop (3e4b7eb) with main (32eef06)

Summary

❌ 10 regressions
✅ 83 untouched
⏩ 1 skipped1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
ls_recursive_balanced_tree[(3, 4, 8)] 436 µs 452.7 µs -3.7%
ls_recursive_balanced_tree[(4, 3, 6)] 428.7 µs 447.4 µs -4.19%
ls_recursive_balanced_tree[(5, 2, 10)] 435.1 µs 454.9 µs -4.35%
ls_recursive_long_all_balanced_tree[(4, 3, 6)] 514.5 µs 532.1 µs -3.3%
ls_recursive_long_all_balanced_tree[(5, 2, 10)] 525 µs 542.4 µs -3.2%
ls_recursive_long_all_wide_tree[(1000, 200)] 6.8 ms 7 ms -3.53%
ls_recursive_long_all_wide_tree[(5000, 500)] 32.5 ms 33.6 ms -3.29%
ls_recursive_mixed_tree 515.8 µs 527.9 µs -2.3%
ls_recursive_wide_tree[(1000, 200)] 5.3 ms 5.5 ms -2.98%
ls_recursive_wide_tree[(10000, 1000)] 48.4 ms 51.1 ms -5.24%

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

GNU test failed: tests/tail/overlay-headers. tests/tail/overlay-headers is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)

Copy link

GNU testsuite comparison:

GNU test failed: tests/tail/overlay-headers. tests/tail/overlay-headers is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@sylvestre
Copy link
Contributor

according to https://codspeed.io/uutils/coreutils/branches/kimono-koans%3Arecursive_loop
it regressed a bunch of benchmarks.
Up to 6%:
image
wdyt ?

@sylvestre sylvestre closed this Sep 26, 2025
@sylvestre sylvestre reopened this Sep 26, 2025
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@sylvestre
Copy link
Contributor

i rerun the benchmark:
#8728 (comment)

@kimono-koans
Copy link
Contributor Author

kimono-koans commented Sep 26, 2025

according to https://codspeed.io/uutils/coreutils/branches/kimono-koans%3Arecursive_loop it regressed a bunch of benchmarks. Up to 6%: image wdyt ?

I'd say performance isn't the primary driver of this 1st/2nd/3rd step? Of course recursion can be faster than an iterative approach, especially in trivial/small cases (which I'll admit are the mostly the scenario for ls). Recursion, after all, uses stack space instead of the heap! The problem is usually -- the recursive method also never deallocates and can cause either a stack overflow or run away memory usage. It is also naturally single threaded.

And, as discussed, this is a step:

This is also the first step to concurrent directory search.

In other PRs, I've removed mutable state, removed the DirEntry (a >1K bloat) from the PathData struct, so now we can clone PathData and move PathData onto a new thread (!), such that concurrent search is at least possible. Even a rayon::join() is nearly possible now (to see why not, see the DirEntry in the PathData struct), whereby we simply spawn a rayon thread to get the next read_dir ready, while enter_directory runs.

Despite recursion being perhaps the best for the common case, we have to prepare for the worst case. It's why we use a HashMap instead of a BTreeMap, which would undoubtedly be the faster method in the common case. We have to anticipate Firefox cache directories with 700K items, etc.

These are all iterative steps (har har) which may provide more flexibility to refactor into something better in the future. For instance, below, you have to imagine the 2nd is more cache friendly than the 1st, simply because the loops are tighter, but we need to get rid of &mut state.out everywhere to make this happen.

for loc in locs {
    let path_data = PathData::new(PathBuf::from(loc), None, None, config, true);

    // Getting metadata here is no big deal as it's just the CWD
    // and we really just want to know if the strings exist as files/dirs
    //
    // Proper GNU handling is don't show if dereferenced symlink DNE
    // but only for the base dir, for a child dir show, and print ?s
    // in long format
    if path_data.get_metadata(&mut state.out).is_none() {
        continue;
    }

    let show_dir_contents = match path_data.file_type(&mut state.out) {
        Some(ft) => !config.directory && ft.is_dir(),
        None => {
            set_exit_code(1);
            false
        }
    };

    if show_dir_contents {
        dirs.push(path_data);
    } else {
        files.push(path_data);
    }
}

let (mut dirs, mut files): (Vec<PathData>, Vec<PathData>) = locs
    .into_iter()
    .map(|loc| PathData::new(PathBuf::from(loc), None, None, config, true))
    .filter(|path_data| {
        // Getting metadata here is no big deal as it's just the CWD
        // and we really just want to know if the strings exist as files/dirs
        //
        // Proper GNU handling is don't show if dereferenced symlink DNE
        // but only for the base dir, for a child dir show, and print ?s
        // in long format
        path_data.get_metadata().is_none()
    })
    .partition(|path_data| {
        let show_dir_contents = match path_data.file_type() {
            Some(ft) => !config.directory && ft.is_dir(),
            None => {
                set_exit_code(1);
                return false;
            }
        };

        show_dir_contents
    });

I'd also be very pleased to add all my PRs into a mega update, a mega update which shows measurable performance gains, if you would accept a mega update? Or any other approach the maintainers would prefer.

But somethings just aren't going to last, if this program ever wants to be faster, like mutable state everywhere and certain fat structs, etc. Someone is going to have to refactor them out sometime to get anywhere with ls.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ls: enter_directory should use an iterative loop instead of recursing
3 participants