A Dynamic Programming Approach to Segment ONT Signals. Dynamont is a segmentation/resquiggling tool for ONT signals. Dynamont is currently designed for RNA002 and RNA004 data!
- The DNA R10 models currently reuse the transition parameters trained for the RNA004 model. These parameters still need to be fine-tuned for DNA.
pip install dynamont
conda config --add channels jannessp # to install all dependencies from the correct channel
conda create -n dynamont jannessp::dynamont
conda activate dynamont
# segment a dataset
dynamont-resquiggle -r <path/to/pod5/dataset/> -b <basecalls.bam> --mode basic -o <output.csv> -p <pore>
# train model
dynamont-train -r <path/to/pod5/dataset/> -b <basecalls.bam> --mode basic -o <output/path> -p <pore>
# Choosing a pore automatically loads the default model for that pore. To use a custom model, pass --pore_model <model/path>.
- rna_r9 (tested)
- rna_rp4 (tested)
- dna_r9 (not available yet)
- dna_r10.4.1 260 bps (not tested)
- dna_r10.4.1 400 bps (not tested)
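
When Dynamont is called from a pipeline, it can help to assemble the command line in one place. The following is a minimal sketch, not part of Dynamont itself: it only uses the flags shown above (`-r`, `-b`, `--mode`, `-o`, `-p`, `--pore_model`). The helper name `build_resquiggle_command` and the example paths are hypothetical, and passing `--pore_model` together with `-p` is an assumption.

```python
from typing import List, Optional


def build_resquiggle_command(
    pod5_dir: str,
    bam: str,
    output_csv: str,
    pore: str,
    pore_model: Optional[str] = None,
) -> List[str]:
    """Assemble a dynamont-resquiggle call from the flags listed in this README."""
    cmd = [
        "dynamont-resquiggle",
        "-r", pod5_dir,
        "-b", bam,
        "--mode", "basic",
        "-o", output_csv,
        "-p", pore,
    ]
    # Assumption: a custom model is given in addition to the pore type.
    if pore_model is not None:
        cmd += ["--pore_model", pore_model]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_resquiggle_command("pod5/", "basecalls.bam", "segments.csv", "rna_r9")
print(" ".join(cmd))

# To actually run it (requires Dynamont to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.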
Dynamont produces a tabular output with the following columns:
Column Name | Description |
---|---|
readid | Unique identifier for the read. |
signalid | Identifier for the signal corresponding to the read. |
start | Start position of the signal segment in the read. |
end | End position of the signal segment in the read. |
basepos | Reference base position in the genomic sequence. |
base | The detected base at this position. |
motif | The surrounding sequence motif in which the base appears. |
state | The methylation state (or modification state) of the base. |
posterior_probability | Probability assigned to the predicted segment. |
polish | Polished kmer, only available in resquiggle mode. |
Below is an example of the output generated by Dynamont:
readid,signalid,start,end,basepos,base,motif,state,posterior_probability,polish
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12762,12777,53,A,AAAAAAAAA,M,0.12434,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12777,12791,52,A,AAAAAAAAA,M,0.12146,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12791,12806,51,A,AAAAAAAAA,M,0.11881,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12806,12820,50,A,AAAAAAAAA,M,0.11665,NA
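
The rows above can be loaded with Python's standard `csv` module. This is a minimal sketch that parses the example rows and derives the number of signal points per segment. The helper name `read_segments` and the added `length` field are illustrative, and treating `end` as exclusive is an assumption based on consecutive segments sharing a boundary value.

```python
import csv
import io

# Example rows copied from the output shown above.
SAMPLE = """readid,signalid,start,end,basepos,base,motif,state,posterior_probability,polish
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12762,12777,53,A,AAAAAAAAA,M,0.12434,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12777,12791,52,A,AAAAAAAAA,M,0.12146,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12791,12806,51,A,AAAAAAAAA,M,0.11881,NA
476b4ed2-7865-4f81-9f78-82d614fb40a2,476b4ed2-7865-4f81-9f78-82d614fb40a2,12806,12820,50,A,AAAAAAAAA,M,0.11665,NA
"""


def read_segments(handle):
    """Parse Dynamont-style CSV rows into dicts with typed numeric fields."""
    segments = []
    for row in csv.DictReader(handle):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["basepos"] = int(row["basepos"])
        row["posterior_probability"] = float(row["posterior_probability"])
        # Assumption: `end` is exclusive, so the length is end - start.
        row["length"] = row["end"] - row["start"]
        segments.append(row)
    return segments


segments = read_segments(io.StringIO(SAMPLE))
for seg in segments:
    print(seg["basepos"], seg["base"], seg["length"])
```

For a real output file, replace `io.StringIO(SAMPLE)` with `open("<output.csv>", newline="")`.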
- -11: Segmentation fault
- -9: Out of Memory error. Decrease the number of processes or move to a system with more memory.
- -6: std::bad_alloc
- 1: resquiggle mode specific: alignment score (Z) does not match between forward and backward run in preprocessing on signal (T) and read (N).
- 2: resquiggle mode specific: alignment score (Z) does not match between forward and backward run in preprocessing on signal (T) and error correction (C).
- 3: Alignment score (Z) does not match between forward and backward pass or is -Infinity
- 4: Input signal is missing or not found in stdin stream
- 5: Input read is missing or not found in stdin stream
- 6: Raw file does not exist
- 7: Invalid model path was provided
- 8: Provided ONT signal is too short
- 9: Read is too short
- 10: Signal is smaller than read
- 11: Read is smaller than kmerSize of provided pore model
- 20: Terminated using KeyboardInterrupt (Ctrl + C)
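
For scripts that wrap Dynamont, the codes above can be kept in a lookup table. This is a minimal sketch, not Dynamont's own error handling: the names `EXIT_CODES` and `describe_exit_code` are hypothetical, and treating `0` as success is an assumption based on the usual process convention. The negative values match how Python's `subprocess` reports a process killed by a signal (for example `-9` for SIGKILL).

```python
# Messages taken from the list above. The mapping itself is illustrative.
EXIT_CODES = {
    -11: "Segmentation fault",
    -9: "Out of Memory error. Decrease the number of processes or move to a system with more memory.",
    -6: "std::bad_alloc",
    1: "resquiggle mode: alignment score (Z) mismatch between forward and backward run on signal (T) and read (N)",
    2: "resquiggle mode: alignment score (Z) mismatch between forward and backward run on signal (T) and error correction (C)",
    3: "Alignment score (Z) does not match between forward and backward pass or is -Infinity",
    4: "Input signal is missing or not found in stdin stream",
    5: "Input read is missing or not found in stdin stream",
    6: "Raw file does not exist",
    7: "Invalid model path was provided",
    8: "Provided ONT signal is too short",
    9: "Read is too short",
    10: "Signal is smaller than read",
    11: "Read is smaller than kmerSize of provided pore model",
    20: "Terminated using KeyboardInterrupt (Ctrl + C)",
}


def describe_exit_code(code: int) -> str:
    """Return a readable message for an exit code listed in this README."""
    if code == 0:
        # Assumption: 0 means success, as for most command-line tools.
        return "Success"
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


print(describe_exit_code(8))
print(describe_exit_code(-9))
```

A wrapper could call `describe_exit_code(result.returncode)` on the object returned by `subprocess.run` to log a readable reason for each failed read.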