danbing-tk v1.0
Improvements:
- Improved length estimation accuracy through multi-boundary expansion, which maps orthologous VNTRs across haplotypes more accurately.
- More stringent QC on VNTR size, number of supporting haplotypes, consistency of liftover coordinates, etc.
- Slightly expanded the VNTR set from 29,111 to 32,138 loci.
- Added a more user-friendly length estimation script.
- Added an option for alignment output, enabled by passing -a to danbing-tk align.
- Created a DOI using Zenodo.
Additional resources:
- Repeat-pangenome graph encoded as pan.tr.kmers, pan.ntr.kmers and pan.graph.kmers in RPGG.tar.gz
- 84,411 raw VNTR coordinates tr.84411.bed
- 32,138 raw VNTR coordinates (high-confidence genotypable set) tr.good.bed
- 397 non-VNTR regions ctrl.bed
- Locus-specific biases of VNTR and non-VNTR regions LSB.tsv
- Summary of eGene discoveries Alltissue.egenes.tsv
- Comprehensive VNTR statistics vntr.statistics.tsv and vntr.statistics.README
- 13 PacBio CLR assemblies (26 haplotypes) *.h?.fasta.gz
- 32,138 boundary-expanded VNTR coordinates in the 26 haplotypes pan.tr.mbe.no_CCS.bed and pan.tr.mbe.no_CCS.README
- 73,582 boundary-expanded VNTR coordinates pan.tr.73582.mbe.no_CCS.bed
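
To sanity-check a downloaded coordinate file against the locus counts listed above, a small script along these lines may help. It is only a sketch: it assumes the file follows the standard tab-separated BED layout (chromosome, 0-based start, exclusive end in the first three columns), which these notes do not confirm for every file. In particular, the multi-haplotype files may use a different layout described in their README. The function name summarize_bed and the sample records are illustrative and not part of danbing-tk.

```python
import csv
import io


def summarize_bed(handle):
    """Return (interval count, total bases, median length) for a BED-like stream.

    Assumes columns 1-3 are chrom, start, end (0-based, half-open).
    Any further columns are ignored.
    """
    lengths = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and optional header/comment lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        start, end = int(row[1]), int(row[2])
        lengths.append(end - start)

    lengths.sort()
    n = len(lengths)
    if n == 0:
        return 0, 0, None
    if n % 2:
        median = lengths[n // 2]
    else:
        median = (lengths[n // 2 - 1] + lengths[n // 2]) / 2
    return n, sum(lengths), median


# Made-up records for illustration only, not taken from the released files.
sample = "chr1\t100\t250\nchr1\t1000\t1040\nchr2\t500\t800\n"
print(summarize_bed(io.StringIO(sample)))
```

To run it on a real file, open the file and pass the handle, for example `with open("tr.good.bed") as fh: summarize_bed(fh)`. If the layout assumption holds, the first returned value should match the locus count stated for that file.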
Example analyses:
- QC of multi-boundary expansion 202011.MultiBoundaryExpansion.QC.ipynb
- Measuring length prediction accuracy 202012.Acc.pan.ipynb
- Contrasting the most informative kmer between populations 202012.mikmer.ipynb
- eQTL mapping 202012.eQTL.32138.ipynb
- Sample QC on locus-specific bias LSB_analysis.ipynb
- Heritability analysis of SNP vs. SNP+VNTR models 202011.sg.joint.ipynb
- Miscellaneous analyses in the original manuscript 202012.revision.supp.ipynb