-
Notifications
You must be signed in to change notification settings - Fork 35
Duplicate mmap log file at the beginning of darshan_core_shutdown #1053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, just a few minor suggestions
darshan-runtime/lib/darshan-core.c
Outdated
final_core->mmap_log_name); | ||
mmap_fd = open(dup_log_fame, O_CREAT | O_WRONLY, 0644); | ||
if (mmap_fd != -1) { | ||
write(mmap_fd, fileSize, buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need to check the return code of the write here as well. If it fails we just do a best effort cleanup (unlink etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I on purpose ignore all the error codes in this code block, as we
do not want any error to affect the log processing after this code block.
The duplicated file will be unlinked at the end, line 809, any way.
If this is OK, I can add some comments into the codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that strategy makes sense. It may be good to add a comment explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments have been added in commit b3ce870.
I also notice that no CI is testing the configure option |
The solution proposed in this PR only works when |
Yeah we really should. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Let's see if Mikaila can test it. If it works for her then this can be merged.
This is in case an interrupt occurs while processing the log file, resulting the log file corrupted.
darshan-runtime/lib/darshan-core.c
Outdated
if (mmap_fd != -1) { | ||
int dup_mmap_fd; | ||
void *buf = (void*) malloc(final_core->mmap_size); | ||
read(mmap_fd, buf, final_core->mmap_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this block of code produces the following warning with gcc 14.2:
../../../darshan-runtime/lib/darshan-core.c: In function ‘darshan_core_shutdown’:
../../../darshan-runtime/lib/darshan-core.c:494:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
494 | read(mmap_fd, buf, final_core->mmap_size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../darshan-runtime/lib/darshan-core.c:500:13: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
500 | write(dup_mmap_fd, buf, final_core->mmap_size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ignored them on purpose, as these codes are for file duplication, wouldn't
want any error stop the duplication.
Anyway, I added checks in 14a755c.
Strangely, I also run gcc 14.2.1, but did not get those warnings.
Please see if they are silenced.
darshan-runtime/lib/darshan-core.c
Outdated
void *buf = (void*) malloc(final_core->mmap_size); | ||
read(mmap_fd, buf, final_core->mmap_size); | ||
close(mmap_fd); | ||
snprintf(dup_log_fame, strlen(final_core->mmap_log_name), "%s.dup", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This snprintf truncates the files name to something like /tmp/carns_python3_id1445957_mmap-log-3779006568372861245-0.darsha
because the 2nd argument to is calling strlen()
on the original mmap_log_name
. Maybe this size limit on the snprintf should be based on the size of the dup_log_fname
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed in 6a5c60f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I have one other request here. Sorry this didn't occur to me sooner. I like that the duplicate file has "dup" in the file name, but it may be better if it weren't used as the file extension. I find myself doing things like ls /tmp/*.darshan
to look for intermediate files, when with this patch it would really need to be ls /tmp/*.darshan /tmp/*.darshan.dup
to find them all.
Can we change the naming convention so that even the duplicate file names end with .darshan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way is to run "ls /tmp/*.darshan*
".
The log file name is constructed in:
darshan/darshan-runtime/lib/darshan-core.c
Lines 830 to 832 in b7e4e89
snprintf(core->mmap_log_name, __DARSHAN_PATH_MAX, | |
"/%s/%s_%s_id%d_mmap-log-%" PRIu64 "-%d.darshan", | |
core->config.mmap_log_path, cuser, __progname, jobid, logmod, my_rank); |
Where do you suggest to insert substring "dup"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be other things in /tmp with "darshan" in the name that aren't log files, though (I have a bunch for sure, though I'm an outlier because I've been experimenting with a lot of Darshan things :))
I'm not picky about where to put dup in the name (other than not at the end). Maybe just making the extension .dup.darshan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Please give it a try.
This is in case an interrupt occurs while processing the log file, resulting the log file corrupted.
See issue #1052.