
# Weeks 8 and 9

*(Image; source: http://learnyouahaskell.com, cc Miran Lipovača)*

# 8th Week

I realized that in my original timeline I forgot to include July and jumped directly from June to August; I'm sorry for that! But this way, I gained a lot of time ;)
So the milestone for the 8th and 9th week is to finish improving the vector version of the program and to start looking at parallelization.

## Work To Do

Look closely at the merge1 function (and probably some others) and try to find ways to reduce its space consumption, so that it becomes possible to run the program on our usual computers.

## Work Done

The original transalign program uses a dynamic programming approach to find the alignment with the highest score. During the last week I looked at the merge1 function, which is part of the forward tracking through the DP matrix. In this step, an 'accumulator' column is built, and this process mainly needs cons operations. Since cons is O(n) for vectors but only O(1) for lists, I rewrote the accumulator column as a list, as it is done in the original program.
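
To illustrate why this matters, here is a minimal sketch of the asymptotics (not the actual merge1 code, which lives in the repository; the names here are only illustrative):

```haskell
import qualified Data.Vector as V

-- Prepending to an immutable vector copies all existing elements,
-- so building a column of n entries this way costs O(n^2) overall.
consVector :: a -> V.Vector a -> V.Vector a
consVector = V.cons                -- O(n) per cons

-- Prepending to a list is a single allocation: O(1) per cons,
-- O(n) for the whole column.
consList :: a -> [a] -> [a]
consList = (:)

-- If a vector is needed afterwards, build the column as a list and
-- pay the O(n) conversion only once at the end.
buildColumn :: Int -> V.Vector Int
buildColumn n = V.fromList (foldr consList [] [1 .. n])
```
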
The following profiling statistics were collected with +RTS -N (parallel runtime) -A4M (allocation area size) -H1G (suggested heap size) -s as additional runtime (RTS) options.
Overall, the current version has become faster than the original version of the program, and the total memory in use decreased from 53 GB for the original program to 37 GB for the current version.
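
For reference, a full invocation with these options looks like the following (the binary name and input paths are taken from the detailed profiling run shown further below):

```
transalign_prof +RTS -N -A4M -H1G -s -RTS ../transalign/test-data/452.xml ../transalign/test-data/U50_vs_SP.xml
```
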

### Profiling statistics - original version of the program

To allow a direct comparison, the following profiling statistics are based on the original code.

```
   1,158,707,563,256 bytes allocated in the heap
   1,454,589,789,224 bytes copied during GC
      19,455,934,400 bytes maximum residency (92 sample(s))
         400,005,664 bytes maximum slop
               53419 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)    Avg pause    Max pause
  Gen  0     2238545 colls,     0 par    784.90s    828.29s     0.0004s     0.1411s
  Gen  1          92 colls,     0 par    382.56s    510.63s     5.5503s    19.5180s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time  766.71s  (798.76s elapsed)
  GC      time  1167.46s  (1338.92s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.00s  (  0.00s elapsed)
  EXIT    time    0.00s  (  0.13s elapsed)
  Total   time  1934.17s  (2137.82s elapsed)

  %GC     time      60.4%  (62.6% elapsed)

  Alloc rate    1,511,272,271 bytes per MUT second

  Productivity  39.6% of total user, 35.9% of total elapsed
```

### Profiling statistics - current version of the program

```
   1,360,179,341,888 bytes allocated in the heap
     708,349,399,608 bytes copied during GC
      12,136,272,040 bytes maximum residency (50 sample(s))
       4,809,563,584 bytes maximum slop
               37587 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)    Avg pause    Max pause
  Gen  0      326664 colls,     0 par    647.22s    684.31s     0.0021s     1.1773s
  Gen  1          50 colls,     0 par     96.78s    159.72s     3.1944s    18.9590s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time  585.38s  (621.03s elapsed)
  GC      time  744.00s  (844.03s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.00s  (  0.00s elapsed)
  EXIT    time    0.01s  (  0.01s elapsed)
  Total   time  1329.39s  (1465.07s elapsed)

  Alloc rate    2,323,591,494 bytes per MUT second

  Productivity  44.0% of total user, 40.0% of total elapsed
```

### Detailed profiling statistics

As the following detailed profiling statistics show, both time and space consumption decreased for merge1. It is now again group_al''' that accounts for most of both.

```
transalign_prof +RTS -N -A4M -H1G -p -RTS ../transalign/test-data/452.xml ../transalign/test-data/U50_vs_SP.xml
```

```
        total time  =      810.01 secs   (810012 ticks @ 1000 us, 1 processor)
        total alloc = 1,159,483,304,848 bytes  (excludes profiling overheads)

COST CENTRE        MODULE     %time %alloc

group_al'''        Align       79.7   67.7
trans_align        Align       10.3   21.5
readAlignmentCache BlastCache   2.5    3.3
merge_group        Align        1.3    0.0
match2align        Blast        1.3    2.8
sort_column        Align        1.2    1.6
groups             Align        1.1    0.1
readAlignments     Blast        1.0    1.3
merge1             Align        0.7    1.5
```

### Space profiling

This is the space consumption plot of the current version. The main part is consumed by ARR_WORDS, which is the internal name for ByteArray#, in this case backing the ByteStrings that the input is converted to.

*(Plot: space consumption of the current version, broken down by closure type)*
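
Heap profiles like this one can be regenerated by running the program with a heap-profiling RTS option and rendering the resulting .hp file. A sketch (the exact flags used for this particular plot are not recorded on this page; -hT breaks the heap down by closure type, which is what surfaces ARR_WORDS):

```
transalign_prof +RTS -hT -RTS ../transalign/test-data/452.xml ../transalign/test-data/U50_vs_SP.xml
hp2ps -c transalign_prof.hp   # renders the heap profile to transalign_prof.ps
```
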

## Overview

The following plots show the development of time and space consumption over the last weeks. The numbers on the X-axis give the week of GSoC, with the first week showing the data from the original version of the program. It can be seen that the current version is faster but still needs more space than the original one.

### Time consumption

*(Plot: time consumption across the weeks)*

### Space consumption

*(Plot: space consumption across the weeks)*

# 9th Week

The milestone for the 8th and 9th week is to finish improving the vector version of the program and to start looking at parallelization.

## Work To Do

In some places, lists were reintroduced into the vector version of the program. We are thinking about adding a new data structure to the program to take care of these lists.
Additionally, there are some options to improve group_al''' a bit more, e.g. replacing the IntMap by Judy arrays or HashMaps.

## Work Done

As a first step, I replaced the IntMap with a HashMap. To be able to compare both versions easily, I wrote an additional function called group_al4 which uses the HashMap. Afterwards, I did some time and space profiling; the results can be seen in the following plots. Compared to last week's version, the HashMap version is a bit slower (but still faster than the original version), while the total memory in use again decreased, from 37 GB to 34 GB.
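
The swap itself is mostly mechanical. A minimal sketch of the idea, assuming Int keys and Data.HashMap.Strict from unordered-containers (the real group_al'''/group_al4 code lives in the repository; the names and the (+) combining function here are only illustrative):

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.HashMap.Strict as HM

-- IntMap-backed grouping, as in group_al''': inserts cost
-- O(min(n, W)) on the big-endian Patricia trie (W = word size).
groupIM :: [(Int, Int)] -> IM.IntMap Int
groupIM = IM.fromListWith (+)

-- HashMap-backed grouping, as in group_al4: expected O(1) per
-- insert on the hash array mapped trie, at the cost of hashing.
groupHM :: [(Int, Int)] -> HM.HashMap Int Int
groupHM = HM.fromListWith (+)
```

Since only the import and the result type change, both variants can be kept side by side, which keeps the week-to-week comparison cheap.
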

### Time profiling

```
   1,422,297,359,944 bytes allocated in the heap
     673,587,351,560 bytes copied during GC
      11,549,725,664 bytes maximum residency (66 sample(s))
       4,755,821,272 bytes maximum slop
               34524 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)    Avg pause    Max pause
  Gen  0      342973 colls,     0 par    870.90s    914.44s     0.0027s     1.1391s
  Gen  1          66 colls,     0 par    113.79s    193.54s     2.9325s    17.9332s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 327 (0 converted, 0 overflowed, 0 dud, 9 GC'd, 318 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time  719.96s  (847.34s elapsed)
  GC      time  984.69s  (1107.98s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.00s  (  0.00s elapsed)
  EXIT    time    0.02s  (  0.02s elapsed)
  Total   time  1704.68s  (1955.35s elapsed)

  Alloc rate    1,975,530,983 bytes per MUT second

  Productivity  42.2% of total user, 36.8% of total elapsed
```

### Space profiling

*(Plot: space consumption of the HashMap version)*

### Overview plots

I updated the overview plots to include the 9th week and switched the y-axis to a logarithmic scale.

*(Plots: updated overview of time and space consumption across the weeks, logarithmic y-axis)*

# Next Weeks

Here you can find the entry for the next two weeks.

# References

- My GitHub account for transalign
- Compiler flags for profiling
- Judy arrays on Wikipedia
- Strict HashMaps
- Unordered containers

Back to main page.
Previous weeks.
