-
Notifications
You must be signed in to change notification settings - Fork 8
Modifies the parallelization to use more cpu resources #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
rmatsum836
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far! Just a couple minor changes. I pushed a commit which ran black on van_hove.py to enforce style. This can be done by running black van_hove.py on the command line (you may need to install black).
For the tests, I think we should add to the current test cases the parallel=True option.
If a user were to pass in a higher cpu_count than available resources, would this raise an error by multiprocessing?
scattering/van_hove.py
Outdated
| trj : mdtraj.Trajectory | ||
| trajectory on which to compute the Van Hove function | ||
| chunk_length : int | ||
| chunk_length : int, defualt=10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo for default
scattering/van_hove.py
Outdated
| chunk_starts : int array-like, shape=(n_chunks,), optional, default=[chunk_length * i for i in range(trj.n_frames//chunk_length)] | ||
| The first frame of each chunk to be analyzed. | ||
| cpu_count : int, optional, default=min(multiprocessing.cpu_count(), total system memory in GB) | ||
| The number of cpu process to run at once if parallel is True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of cpu *processes to simultaneously run if parallel=True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll get the typos. If a user passes a higher number for cpu_count than is available in the system, it just means more processes will be spawned by multiprocessing to compete for resources. That might cause a memory error or be slow if they pass a much bigger number, but other than that it shouldn't break anything. Depending on how the system handles threads, it may actually speed the code up on certain architectures.
|
Just added |
scattering/van_hove.py
Outdated
| pairs = trj.top.select_pairs(selection1=selection1, selection2=selection2) | ||
|
|
||
| n_chunks = int(trj.n_frames / chunk_length) | ||
| if chunk_starts is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to move this line up before chunk_starts is iterated on. I believe this is why tests are currently failing.
…ering into new_parallel
|
@lisankim0321 could you review this PR as well? Thanks! |
…ering into new_parallel
|
Tests seem to be passing locally but will fail on GHA until a new version of MDTraj is released. |
scattering/van_hove.py
Outdated
| n_bins=None, | ||
| self_correlation=True, | ||
| periodic=True, | ||
| num_concurrent_paris=100000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_concurrent_pairs has a spelling typo. Could this also be changed to n_concurrent_pairs?
lisankim0321
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestion for the output of dictionary keys - other than that, look good to me.
| ("element {}".format(elem1.symbol), "element {}".format(elem2.symbol)) | ||
| ] = g_r_t_partial | ||
| partial_dict[ | ||
| ("element {}".format(elem1.symbol), "element {}".format(elem2.symbol)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps the dictionary key could simply be the symbol, not element {symbol}, to match better with the dictionary key format expected on vhf_from_pvhf
|
Partial test is going to fail, need to check if the assertion statement is accurate for the "self" case. |
This will add chunk-wise parallelization to compute_partial_van_hove. Now, each cpu thread can work on its own chunk, roughly increasing the speed by a factor of cpu_count.