danbing-tk v1.0
Improvements:
- Improved length estimation accuracy using multi-boundary expansion, which gives more accurate orthology mapping of VNTRs across haplotypes.
- More stringent QC on VNTR size, number of supporting haplotypes, consistency of liftover coordinates, etc.
- Slightly expanded the VNTR set from 29,111 to 32,138 loci.
- Added a more user-friendly length estimation script.
- Added option for alignment output by using `-a` with `danbing-tk align`.
- DOI created using Zenodo.
Additional resources:
- Repeat-pangenome graph encoded as `pan.tr.kmers`, `pan.ntr.kmers` and `pan.graph.kmers` in `RPGG.tar.gz`
- 84,411 raw VNTR coordinates: `tr.84411.bed`
- 32,138 raw VNTR coordinates (high-confidence genotypable set): `tr.good.bed`
- 397 non-VNTR regions: `ctrl.bed`
- Locus-specific biases of VNTR and non-VNTR regions: `LSB.tsv`
- Summary of eGene discoveries: `Alltissue.egenes.tsv`
- Comprehensive VNTR statistics: `vntr.statistics.tsv` and `vntr.statistics.README`
- 13 PacBio CLR assemblies (26 haplotypes): `*.h?.fasta.gz`
- 32,138 boundary-expanded VNTR coordinates in the 26 haplotypes: `pan.tr.mbe.no_CCS.bed` and `pan.tr.mbe.no_CCS.README`
- 73,582 boundary-expanded VNTR coordinates: `pan.tr.73582.mbe.no_CCS.bed`
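
As a quick sanity check on the coordinate files listed above, the following is a minimal sketch that counts loci and summarizes interval lengths in a BED file. It assumes only the three standard BED columns (chrom, start, end) and ignores any others; the exact column layout of the released files is not described here, so check the accompanying README files first. The helper name `summarize_bed` is hypothetical and is not part of danbing-tk.

```python
from statistics import median
from typing import Iterable, NamedTuple


class BedSummary(NamedTuple):
    n_loci: int
    min_len: int
    median_len: float
    max_len: int


def summarize_bed(lines: Iterable[str]) -> BedSummary:
    """Summarize interval lengths from BED-formatted lines.

    Assumption: columns 1-3 are chrom, start, end (standard BED);
    any additional columns are ignored.
    """
    lengths = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        # BED intervals are 0-based and half-open, so length is end - start.
        lengths.append(end - start)
    if not lengths:
        raise ValueError("no BED records found")
    return BedSummary(len(lengths), min(lengths), median(lengths), max(lengths))


# Example usage with a local copy of one of the files listed above:
#     with open("tr.good.bed") as fh:
#         print(summarize_bed(fh))
```

If the file matches the description above, `n_loci` for `tr.good.bed` would be expected to equal 32,138.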
Example analyses:
- QC of multi-boundary expansion: `202011.MultiBoundaryExpansion.QC.ipynb`
- Measuring length prediction accuracy: `202012.Acc.pan.ipynb`
- Contrasting the most informative kmer between populations: `202012.mikmer.ipynb`
- eQTL mapping: `202012.eQTL.32138.ipynb`
- Sample QC on locus-specific bias: `LSB_analysis.ipynb`
- Heritability analysis of SNP vs. SNP+VNTR models: `202011.sg.joint.ipynb`
- Miscellaneous analyses in the original manuscript: `202012.revision.supp.ipynb`