-
Notifications
You must be signed in to change notification settings - Fork 1
Weeks_8and9
(source: http://learnyouahaskell.com,
cc Miran Lipovača)
#8th Week
I recognized that in my original timeline I forgot to include July, so I directly jumped from June to August, I'm sorry for that! But in this way, I gained a lot of time ;) .
So the milestone for the 8th and 9th week is to finish improving the vector-version of the program and start looking at the parallelization.
##Work To Do
Look really close at the merge1 function (and probably some others) and try to find possibilities to reduce space consumption such that it will be possible to run the program on our usual computers.
##Work Done
The original transalign program uses a dynamic programming approach to find the alignment with the highest score. During the last week I looked at the merge1 function, which is part of the forward tracking through the DP matrix. In this step, an 'accumulator' column is built. This process mainly needs cons operations. Since cons needs O(n) for vectors, but only O(1) for lists, I rewrote the accumulator column as a list as it is done in the original program.
The following profiling statistics is done using +RTS -N (parallel GC) -A4M (allocation area size) -H1G (heap size for GC) -s as additional compiler flags.
In total it can be seen that the current version became faster than the original version of the program. The amount of memory in use decreased from 53 GB needed by the original program to 37 GB for the current version.
###Profiling statistic - original version of the program###
To be able to compare directly, the following profiling statistics is the one based on the original code.
1,158,707,563,256 bytes allocated in the heap
1,454,589,789,224 bytes copied during GC
19,455,934,400 bytes maximum residency (92 sample(s))
400,005,664 bytes maximum slop
53419 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2238545 colls, 0 par 784.90s 828.29s 0.0004s 0.1411s
Gen 1 92 colls, 0 par 382.56s 510.63s 5.5503s 19.5180s
INIT time 0.00s ( 0.00s elapsed)
MUT time 766.71s (798.76s elapsed)
GC time 1167.46s (1338.92s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.13s elapsed)
Total time 1934.17s (2137.82s elapsed)
%GC time 60.4% (62.6% elapsed)
Alloc rate 1,511,272,271 bytes per MUT second
Productivity 39.6% of total user, 35.9% of total elapsed
###Profiling statistic - current version of the program###
1,360,179,341,888 bytes allocated in the heap
708,349,399,608 bytes copied during GC
12,136,272,040 bytes maximum residency (50 sample(s))
4,809,563,584 bytes maximum slop
37587 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 326664 colls, 0 par 647.22s 684.31s 0.0021s 1.1773s
Gen 1 50 colls, 0 par 96.78s 159.72s 3.1944s 18.9590s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 585.38s (621.03s elapsed)
GC time 744.00s (844.03s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.01s ( 0.01s elapsed)
Total time 1329.39s (1465.07s elapsed)
Alloc rate 2,323,591,494 bytes per MUT second
Productivity 44.0% of total user, 40.0% of total elapsed
###Detailed profiling statistic###
As it can be seen in the following detailed profiling statistic, time and space consumption decreased for merge1. Now it is again group_al''' which needs most of it.
transalign_prof +RTS -N -A4M -H1G -p -RTS ../transalign/test-data/452.xml ../transalign/test-data/U50_vs_SP.xml
total time = 810.01 secs (810012 ticks @ 1000 us, 1 processor)
total alloc = 1,159,483,304,848 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
group_al''' Align 79.7 67.7
trans_align Align 10.3 21.5
readAlignmentCache BlastCache 2.5 3.3
merge_group Align 1.3 0.0
match2align Blast 1.3 2.8
sort_column Align 1.2 1.6
groups Align 1.1 0.1
readAlignments Blast 1.0 1.3
merge1 Align 0.7 1.5
###Space profiling###
This is the space consumption plot of the current version. It can be seen that the main part is consumes by ARR_WORDS which is the internal name for ByteArray#, in this case needed for ByteStrings the input is converted to.
##Overview## The following plots show the development of the time and space consumption during the last weeks. The numbers on the X-axis show the weeks during GSoC, whereas the first week shows the data from the original version of the program. It can be seen that the current version is faster but stills needs more space than the original one.
###Time consumption###
###Space consumption###
#9th Week The milestone for the 8th and 9th week is to finish improving the vector-version of the program and start looking at the parallelization.
##Work To Do
At some parts, list were reincluded in the vector version of the program. We think about adding a new data structure to the program to take care of the lists.
Additionally, there are some options to improve the group_al''' a bit more, e.g. replacing the IntMap by Judy arrays or HashMaps.
##Work Done As a first step, I replaced the IntMap with a HashMap. To be able to compare both versions easily, I wrote an additional method called group_al4 which includes the HashMap. Afterwards, I did some time and space profiling. The results can be seen in the following plots. In comparison to last week's version, the version including the HashMap is a bit slower (but still faster than the original version) and the amount of total memory in use again decreased from 37 GB to 34 GB.
###Time profiling
1,422,297,359,944 bytes allocated in the heap
673,587,351,560 bytes copied during GC
11,549,725,664 bytes maximum residency (66 sample(s))
4,755,821,272 bytes maximum slop
34524 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 342973 colls, 0 par 870.90s 914.44s 0.0027s 1.1391s
Gen 1 66 colls, 0 par 113.79s 193.54s 2.9325s 17.9332s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 327 (0 converted, 0 overflowed, 0 dud, 9 GC'd, 318 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 719.96s (847.34s elapsed)
GC time 984.69s (1107.98s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.02s ( 0.02s elapsed)
Total time 1704.68s (1955.35s elapsed)
Alloc rate 1,975,530,983 bytes per MUT second
Productivity 42.2% of total user, 36.8% of total elapsed
###Space profiling
###Overview plots I updated the overview plots, including the 9th week now, and used a logarithmic scale for the y-axis.
#Next Weeks Here, you can find the entry for the next two weeks.
#References
My github account for transalign.
Compiler flags for profiling
Judy arrays on Wikipedia
Strict HashMaps
Unordered containers
Back to main page.
Previous weeks.