This repository contains all code used to analyze the data and plot the figures in the paper:
Barbitoff YA, Polev DE, Shcherbakova EA, Kiselev AM, Glotov AS, Serebryakova EA, Kostareva AA, Glotov AS, Glotov OS, Predeus AV. Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage. Scientific Reports 2020, 10(1), 1-13.
A preprint is available at bioRxiv.
./coverage_analysis - all scripts used to manipulate alignment and coverage data
./coverage_analysis/multimap/ - scripts to analyze coverage differences introduced by mapping quality filtering (MQ > 10)
./coverage_analysis/norm_curves/ - scripts to calculate normalized coverage profiles from BEDGRAPH and histogram files generated by collect_coverage_data.sh
./coverage_analysis/wie_profiles/ - scripts to compute mean per-platform WIE profiles for a selection of samples
./Fig_1 to ./Fig_5 - R scripts and data files used to create the figures
./variant_analysis - scripts used to analyze variant calling results
./linear_predictions - scripts and dataset for predicting normalized coverage with GLM and random forest models
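The scripts in ./coverage_analysis/norm_curves/ are not reproduced here. As a rough illustration only, the Python sketch below shows one way a bedGraph could be turned into a normalized coverage curve. It assumes (this is not taken from the repository) that normalized depth means per-base depth divided by the length-weighted mean depth of the file, and that the curve reports the fraction of bases at or above each normalized-depth threshold. All function names are hypothetical.

```python
"""Hypothetical sketch: normalized coverage curve from a bedGraph.

Assumption (not the repository's method): normalized depth = depth divided by
the length-weighted mean depth over all intervals in the file.
"""
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Interval:
    chrom: str
    start: int    # 0-based start (bedGraph convention)
    end: int      # exclusive end
    depth: float  # coverage value from column 4


def parse_bedgraph(lines: Iterable[str]) -> List[Interval]:
    """Parse 4-column bedGraph records, skipping blank, track, browser and comment lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        intervals.append(Interval(chrom, int(start), int(end), float(depth)))
    return intervals


def mean_depth(intervals: Sequence[Interval]) -> float:
    """Length-weighted mean depth: each record counts once per base it spans."""
    total_bases = sum(iv.end - iv.start for iv in intervals)
    if total_bases == 0:
        raise ValueError("no bases in input")
    return sum((iv.end - iv.start) * iv.depth for iv in intervals) / total_bases


def coverage_curve(intervals: Sequence[Interval], thresholds: Sequence[float]) -> List[float]:
    """Fraction of bases whose normalized depth (depth / mean depth) is >= each threshold."""
    mu = mean_depth(intervals)
    if mu == 0:
        raise ValueError("mean depth is zero; cannot normalize")
    total_bases = sum(iv.end - iv.start for iv in intervals)
    return [
        sum(iv.end - iv.start for iv in intervals if iv.depth / mu >= t) / total_bases
        for t in thresholds
    ]


# Toy input: 40 bases in total with depths 20, 40 and 0.
EXAMPLE = """track type=bedGraph
chr1\t100\t110\t20
chr1\t110\t120\t40
chr1\t120\t140\t0
"""

if __name__ == "__main__":
    ivs = parse_bedgraph(EXAMPLE.splitlines())
    print(mean_depth(ivs))                            # 15.0
    print(coverage_curve(ivs, [0.0, 0.5, 1.0, 2.0]))  # [1.0, 0.5, 0.5, 0.25]
```

Weighting by interval length matters because a bedGraph merges runs of equal depth into a single record, so a plain average over records would misstate the per-base mean.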
For Fig_3, some larger data files are available for download via Google Drive.
If you have any questions, please contact Yury A Barbitoff (barbitoff at bioinf me) or Alexander V Predeus (predeus at bioinf me).
