Modifies the parallelization to use more cpu resources #53

jacoblumpkins · 2021-07-09T15:18:29Z

This adds chunk-wise parallelization to compute_partial_van_hove. Each CPU process can now work on its own chunk, which should speed up the calculation by roughly a factor of cpu_count.
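As a rough illustration of the chunk-wise idea (not the code in this PR), the sketch below splits a frame range into fixed-length chunks and hands each chunk to a worker process through multiprocessing.Pool. The names chunk_sum_of_squares, split_into_chunks, and run_chunks are hypothetical, and the per-chunk work is a stand-in for the actual Van Hove calculation.

```python
from multiprocessing import Pool


def chunk_sum_of_squares(chunk):
    """Hypothetical stand-in for the per-chunk Van Hove calculation."""
    return sum(frame * frame for frame in chunk)


def split_into_chunks(n_frames, chunk_length):
    """Return consecutive, non-overlapping frame ranges of length chunk_length.

    Trailing frames that do not fill a whole chunk are dropped.
    """
    last_start = n_frames - chunk_length
    return [
        range(start, start + chunk_length)
        for start in range(0, last_start + 1, chunk_length)
    ]


def run_chunks(n_frames, chunk_length, cpu_count=2):
    """Process every chunk in its own worker, cpu_count workers at a time."""
    chunks = split_into_chunks(n_frames, chunk_length)
    with Pool(processes=cpu_count) as pool:
        # pool.map returns the results in the same order as the chunks.
        return pool.map(chunk_sum_of_squares, chunks)


if __name__ == "__main__":
    print(run_chunks(n_frames=100, chunk_length=10, cpu_count=2))
```

Because the chunks are independent, the results can be averaged or summed afterwards. The speedup approaches cpu_count only when the per-chunk work dominates the cost of starting processes and transferring data.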

rmatsum836

Looks good so far! Just a couple minor changes. I pushed a commit which ran black on van_hove.py to enforce style. This can be done by running black van_hove.py on the command line (you may need to install black).

For the tests, I think we should add to the current test cases the parallel=True option.
If a user were to pass in a higher cpu_count than available resources, would this raise an error by multiprocessing?

rmatsum836 · 2021-07-30T16:26:37Z

scattering/van_hove.py

    trj : mdtraj.Trajectory
        trajectory on which to compute the Van Hove function
-    chunk_length : int
+    chunk_length : int, defualt=10


typo for default

rmatsum836 · 2021-07-30T16:30:38Z

scattering/van_hove.py

+    chunk_starts : int array-like, shape=(n_chunks,), optional, default=[chunk_length * i for i in range(trj.n_frames//chunk_length)]
+        The first frame of each chunk to be analyzed.
+    cpu_count : int, optional, default=min(multiprocessing.cpu_count(), total system memory in GB)
+        The number of cpu process to run at once if parallel is True


The number of cpu *processes to simultaneously run if parallel=True.

jacoblumpkins · 2021-07-30 (edited)

I'll get the typos. If a user passes a cpu_count higher than what the system has available, multiprocessing simply spawns more processes, which then compete for resources. A much larger number might cause a memory error or slow things down, but otherwise it shouldn't break anything. Depending on how the system handles threads, it may actually speed the code up on certain architectures.
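For reference, the default described in the cpu_count docstring above (the smaller of the logical CPU count and the total system memory in GB) could be sketched as follows. This is only an assumption about how such a default might be computed: the function name default_cpu_count, the fallback when psutil is missing, and the floor of one process are all illustrative and not taken from the PR.

```python
import multiprocessing

try:
    import psutil  # optional: used only to read total system memory
except ImportError:
    psutil = None


def default_cpu_count(total_memory_bytes=None, logical_cpus=None):
    """Return min(logical CPUs, total memory in whole GB), but at least 1.

    Both arguments are hypothetical overrides so the logic can be checked
    without depending on the machine it runs on.
    """
    if logical_cpus is None:
        logical_cpus = multiprocessing.cpu_count()
    if total_memory_bytes is None:
        if psutil is None:
            # Assumption: without psutil, fall back to the CPU count alone.
            return logical_cpus
        total_memory_bytes = psutil.virtual_memory().total
    memory_gb = total_memory_bytes // 1024**3
    # Assumption: never return fewer than one process.
    return max(1, min(logical_cpus, memory_gb))
```

A user-supplied cpu_count above this value would still run, as noted in the comment above; the default only guards against high-core-count machines with comparatively little memory.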

rmatsum836 · 2021-07-30T16:42:52Z

Just added psutil to dependencies. We may want to add a skipif decorator for the parallel tests if psutil isn't installed.
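A minimal sketch of skipping a test when psutil is unavailable is shown below. It uses the standard library's unittest.skipUnless so that it runs without extra packages; with pytest, the equivalent decorator would be pytest.mark.skipif(not HAS_PSUTIL, reason="..."). The test class and its body are hypothetical placeholders, not the project's actual tests.

```python
import importlib.util
import unittest

# True only if psutil can be imported in the current environment.
HAS_PSUTIL = importlib.util.find_spec("psutil") is not None


class TestParallelPath(unittest.TestCase):
    @unittest.skipUnless(HAS_PSUTIL, "psutil is required for the parallel code path")
    def test_total_memory_is_positive(self):
        # Placeholder body: a real test would call the function under test
        # with parallel=True instead.
        import psutil

        self.assertGreater(psutil.virtual_memory().total, 0)
```

When psutil is missing, the test is reported as skipped rather than failed, so the serial tests can still pass on a minimal install.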

rmatsum836 · 2021-07-30T16:50:43Z

scattering/van_hove.py

    pairs = trj.top.select_pairs(selection1=selection1, selection2=selection2)

-    n_chunks = int(trj.n_frames / chunk_length)
+    if chunk_starts is None:


I think you need to move this line up before chunk_starts is iterated on. I believe this is why tests are currently failing.
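The ordering issue can be illustrated with a small sketch: the None default has to be resolved before anything iterates over chunk_starts. The helper name resolve_chunk_starts is hypothetical; the default list follows the docstring quoted earlier, and the bounds check is an assumption loosely based on the "error check for late start times" commit.

```python
def resolve_chunk_starts(n_frames, chunk_length, chunk_starts=None):
    """Fill in the default chunk starts, then validate them."""
    # Resolve the default first; iterating over None would raise a TypeError.
    if chunk_starts is None:
        chunk_starts = [chunk_length * i for i in range(n_frames // chunk_length)]

    # Only now is it safe to iterate.
    for start in chunk_starts:
        if start + chunk_length > n_frames:
            raise ValueError(
                f"chunk starting at frame {start} runs past the last frame"
            )
    return list(chunk_starts)
```

For example, resolve_chunk_starts(35, 10) gives the starts 0, 10, and 20, while an explicit start of 30 would be rejected because frames 30 through 39 do not all exist.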

rmatsum836 · 2021-07-30T19:31:22Z

@lisankim0321 could you review this PR as well? Thanks!

rmatsum836 · 2021-07-30T20:48:26Z

Tests seem to be passing locally but will fail on GHA until a new version of MDTraj is released.

rmatsum836 · 2021-07-31T20:56:53Z

scattering/van_hove.py

    n_bins=None,
    self_correlation=True,
    periodic=True,
+    num_concurrent_paris=100000,


num_concurrent_pairs has a spelling typo. Could this also be changed to n_concurrent_pairs?

lisankim0321

Small suggestion for the output of dictionary keys - other than that, looks good to me.

lisankim0321 · 2021-07-30T20:35:36Z

scattering/van_hove.py

-                ("element {}".format(elem1.symbol), "element {}".format(elem2.symbol))
-            ] = g_r_t_partial
+        partial_dict[
+            ("element {}".format(elem1.symbol), "element {}".format(elem2.symbol))


Perhaps the dictionary key could simply be the symbol, not "element {symbol}", to better match the dictionary key format expected by vhf_from_pvhf.
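To make the two key formats concrete, here is a small sketch using plain strings in place of the element objects. The helper names are hypothetical, and the assumption that the downstream function looks up plain-symbol tuples is taken only from the comment above.

```python
def prefixed_key(symbol1, symbol2):
    """Key format currently in the diff: ("element O", "element H")."""
    return ("element {}".format(symbol1), "element {}".format(symbol2))


def symbol_key(symbol1, symbol2):
    """Suggested key format: ("O", "H")."""
    return (symbol1, symbol2)


def strip_element_prefix(key, prefix="element "):
    """Convert a prefixed key to a plain-symbol key."""
    return tuple(
        name[len(prefix):] if name.startswith(prefix) else name for name in key
    )


partial_dict = {prefixed_key("O", "H"): "g_r_t for O-H"}

# A lookup with the plain-symbol key misses unless the keys are converted.
converted = {strip_element_prefix(k): v for k, v in partial_dict.items()}
```

Using the plain symbols from the start avoids this conversion step altogether.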

rmatsum836 · 2021-08-13T21:47:00Z

The partial test is going to fail; we need to check whether the assertion statement is accurate for the "self" case.

Jacob Lumpkins and others added 20 commits June 18, 2021 11:20

created new file to parallelize the vhf by chunk

bef37af

Major bugfixes to para_van_hove.py

1d0f5dc

Added pool to make the code neater and faster, work in progress

f6c2f66

More bugfixes and cleaning to para_van_hove.py

f8805ba

added support for the newest version of van_hove.py

536c437

Merge branch 'mattwthompson:master' into new_parallel

c43e6c5

added flag to specify number of cpu cores used

ecd1f16

Merge branch 'mattwthompson:master' into new_parallel

d7a7467

Add a new approach to parallelization to the partial vhf method

ddf43a6

Merged from upstream

2e646b5

removed unnecessary parallelization file

9239e04

remove debug print command

a588f61

Added flag for num_concurrent_pairs

fada06d

Added logic to prevent high core count systems from overusing memory

0cfdc52

rewrote to remove the maneger and dictionary, cleaned up

b5a7a48

Refactored to only generate some objects as needed for memory purposes

90a7838

Rearranged to move the progress bar into main

ccc87e7

Added max_value arg to progress bar to display correctly

e17fe44

Changed close to terminate in case the pool parent gets exited early

adca397

Changed variable name to match the previous convention

06b0843

rmatsum836 mentioned this pull request Jul 27, 2021

FIx progressbar for multiprocessing #38

Open

Changed the order of arguemnts to better match importance

dcda1a5

rmatsum836 mentioned this pull request Jul 29, 2021

Parallelize partial VHF calculations #19

Open

jacoblumpkins and others added 4 commits July 29, 2021 16:11

Merge branch 'master' into new_parallel

0e69443

code cleaning, added error check for late start times

6a86086

merge from master

d52421e

Run black on van_hove.py

e19e19c

rmatsum836 requested changes Jul 30, 2021

Add psutil to dependencies

5db9819

rmatsum836 reviewed Jul 30, 2021

jacoblumpkins added 6 commits July 30, 2021 11:56

typos and cleaning the code

e8bcc4f

Merge branch 'new_parallel' of https://github.com/jacoblumpkins/scattering into new_parallel

6409b44

formatting

70e1a0d

fixed an incorrect function call in the serial implementation

9f2da3e

Add alphaetization for consistency

2bb92a1

Add alphabetization for consistency

151df14

Merge branch 'new_parallel' of https://github.com/jacoblumpkins/scattering into new_parallel

941c96d

rmatsum836 reviewed Jul 31, 2021

fixing a typo

30ed908

lisankim0321 reviewed Aug 3, 2021

jacoblumpkins and others added 4 commits August 4, 2021 16:54

Changed where the trajectory splitting takes place to reduce memory usage

e156e7a

Merge branch 'master' into new_parallel

7967292

Merge branch 'master' into new_parallel

a6b493b

Update logic for self only correlations

b74766d

Fix self_correlation being passed in _data

ac50f3a
