
wkliao
Collaborator

@wkliao wkliao commented Aug 19, 2025

After several tests with different versions of OpenMPI, it appears that
5.0.6 fixed the data-sieving problem we suspected. This PR demonstrates
it by building Darshan with both 5.0.5 and 5.0.6. Only 5.0.5 failed.

The small MPI test program mpi_file_write.c makes a single call to
MPI_File_write() and runs on 4 MPI processes. With OpenMPI 5.0.5, the test
program ran fine, but the contents of the generated Darshan log file were corrupted.

According to the OpenMPI 5.0.6 release notes:

  • Detailed Locking Protocol: Modified default file-locking protocols in
    UFS component to ensure data consistency, especially when using
    data-sieving operations, which require broader locking.
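
To illustrate why data sieving needs a lock wider than the bytes a process actually owns, here is a minimal sketch in plain C. It is not OpenMPI's implementation: the "file" is an in-memory buffer, and `piece_t`, `sieve_read`, `sieve_patch`, `sieve_write_back`, and `lost_update_demo` are hypothetical names made up for this example. It assumes the usual read-modify-write form of a data-sieving write: read one contiguous extent, patch the noncontiguous pieces, then write the whole extent back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one noncontiguous piece to be written. */
typedef struct {
    size_t offset;     /* byte offset in the file */
    size_t len;        /* number of bytes in this piece */
    const char *data;  /* bytes to store */
} piece_t;

/* Step 1: read the whole contiguous extent, holes included, into buf. */
static void sieve_read(const char *file, size_t start, size_t extent_len,
                       char *buf)
{
    memcpy(buf, file + start, extent_len);
}

/* Step 2: patch only the bytes this process owns inside buf. */
static void sieve_patch(char *buf, size_t start, size_t extent_len,
                        const piece_t *pieces, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(pieces[i].offset >= start);
        assert(pieces[i].offset - start + pieces[i].len <= extent_len);
        memcpy(buf + (pieces[i].offset - start), pieces[i].data,
               pieces[i].len);
    }
}

/* Step 3: write the whole extent back, overwriting the holes as well. */
static void sieve_write_back(char *file, size_t start, size_t extent_len,
                             const char *buf)
{
    memcpy(file + start, buf, extent_len);
}

/* Simulates an interleaving where only the owned bytes are protected.
 * Process A owns bytes 0-1 and 4-5; process B writes bytes 2-3 between
 * A's read and A's write-back. Returns the byte at offset 2 afterwards. */
char lost_update_demo(void)
{
    char file[8];
    char buf[6];
    const piece_t pieces[2] = { { 0, 2, "AA" }, { 4, 2, "AA" } };

    memset(file, '.', sizeof file);

    sieve_read(file, 0, sizeof buf, buf);            /* A reads bytes 0-5  */
    memcpy(file + 2, "BB", 2);                       /* B writes the hole  */
    sieve_patch(buf, 0, sizeof buf, pieces, 2);      /* A patches its copy */
    sieve_write_back(file, 0, sizeof buf, buf);      /* A writes bytes 0-5 */

    return file[2];  /* stale byte from A's copy, B's write is gone */
}
```

In this simulated interleaving, process B's bytes in the hole are overwritten by the stale copy that process A read earlier. Holding a lock over the entire extent from the read to the write-back would prevent that, which is consistent with the "broader locking" wording in the release note.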

The most likely relevant fix in OpenMPI 5.0.6 is open-mpi/ompi#12759.

@github-actions github-actions bot added the CI continuous integration label Aug 19, 2025
@wkliao wkliao added DO NOT MERGE Tests only. CI continuous integration and removed CI continuous integration labels Aug 19, 2025