This is the repository for the book chapter "Recombination rate estimation with pyrho".
Here we describe the included files, and how to reproduce the results in the chapter.
The paper directory contains the LaTeX source code, a .pdf of the paper,
and all of the figures.
The code directory contains a Jupyter notebook that can generate all
of the plots (except for the schematic in Figure 1, which was created in Keynote),
along with all of the data necessary to create the plots.
The results of all of our simulations and inference runs are contained
in combined_results.csv. To regenerate those results (up to some
stochasticity from not setting random seeds in convert_ts_to_vcf.py
and pyrho hyperparam), one would first need to run the Snakefile, but
beware that this will require hundreds of core hours. The Snakefile
does, however, contain all of the relevant snippets for calling pyrho,
so it may be useful to look through. The Snakefile
calls the script convert_ts_to_vcf.py, which takes the simulated output
from stdpopsim, downsamples it, adds switch errors, and masks some
genotypes as missing. It also calls the script compute_corr.py, which
compares the recombination maps inferred in the course of running the
Snakefile to the recombination map used to simulate the data,
PyrhoYRI_GRCh38_chr1.txt. Once the Snakefile
has finished running, one can collect all of the results by running
munge.py, which will create a file called combined_results.csv
(overwriting the one currently in the repo!).
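
For readers who want a feel for the three perturbations that
convert_ts_to_vcf.py is described as applying (downsampling, switch
errors, and missing genotypes), here is a minimal sketch on a plain NumPy
haplotype matrix. It is not the code from convert_ts_to_vcf.py: the
function names, the (haplotypes x sites) 0/1 matrix layout with rows 2i
and 2i+1 forming one diploid individual, and the per-site error model
are all assumptions made for illustration.

```python
import numpy as np


def downsample(haps, num_individuals, rng):
    """Keep a random subset of diploid individuals (rows 2i and 2i+1)."""
    total = haps.shape[0] // 2
    keep = np.sort(rng.choice(total, size=num_individuals, replace=False))
    # Interleave the two haplotype rows of every kept individual.
    rows = np.stack([2 * keep, 2 * keep + 1], axis=1).ravel()
    return haps[rows]


def add_switch_errors(haps, switch_rate, rng):
    """Swap an individual's two haplotypes downstream of random sites."""
    out = haps.copy()
    num_sites = haps.shape[1]
    for i in range(0, haps.shape[0], 2):
        # Each site starts a phase switch with probability switch_rate.
        switches = rng.random(num_sites) < switch_rate
        # After an odd number of switches the two haplotypes are swapped.
        flipped = np.cumsum(switches) % 2 == 1
        out[i, flipped] = haps[i + 1, flipped]
        out[i + 1, flipped] = haps[i, flipped]
    return out


def mask_missing(haps, missing_rate, rng):
    """Mark whole diploid genotypes as missing (-1) at random."""
    out = haps.astype(np.int8)  # signed copy so that -1 is representable
    num_individuals = haps.shape[0] // 2
    mask = rng.random((num_individuals, haps.shape[1])) < missing_rate
    # Repeat each individual's mask row so both haplotypes are masked.
    out[np.repeat(mask, 2, axis=0)] = -1
    return out
```

Passing an explicit generator such as np.random.default_rng(0) to each
function makes a run repeatable, which is the kind of seeding whose
absence causes the run-to-run stochasticity mentioned above.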
Note that a few of the jobs without the --fastmissing flag ran extremely
slowly (taking longer than 10 hours). Those will no longer get generated
in this version of the Snakefile, but could be added back by editing
line 57 of the Snakefile.
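
As a rough illustration of the kind of comparison compute_corr.py makes
between an inferred recombination map and the map used for simulation,
the sketch below averages two piecewise-constant maps over common
windows and reports their Pearson correlation. This is not the code from
compute_corr.py: the function names, the breakpoint/rate array
representation, and the choice of Pearson correlation at a single window
size are assumptions made for illustration.

```python
import numpy as np


def window_average(breakpoints, rates, window_edges):
    """Average a piecewise-constant map over each window.

    rates[i] applies on [breakpoints[i], breakpoints[i + 1]), so
    breakpoints has one more entry than rates.
    """
    breakpoints = np.asarray(breakpoints, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # Cumulative genetic distance at every breakpoint.
    cum = np.concatenate([[0.0], np.cumsum(rates * np.diff(breakpoints))])
    # Linear interpolation is exact for a piecewise-constant rate.
    cum_at_edges = np.interp(window_edges, breakpoints, cum)
    return np.diff(cum_at_edges) / np.diff(window_edges)


def map_correlation(bp_a, rates_a, bp_b, rates_b, window_size):
    """Pearson correlation of two maps averaged over shared windows."""
    start = max(bp_a[0], bp_b[0])
    end = min(bp_a[-1], bp_b[-1])
    # Windows cover the overlap; a trailing partial window is dropped.
    edges = np.arange(start, end + 1, window_size, dtype=float)
    a = window_average(bp_a, rates_a, edges)
    b = window_average(bp_b, rates_b, edges)
    return np.corrcoef(a, b)[0, 1]
```

Averaging through the cumulative map means the two inputs do not need
to share breakpoints, which matters because inferred maps and the
simulation map are generally defined on different intervals.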