Indexing scripts
This page shows how to run the scripts that generate two types of indexes: sequence indexes and analysis indexes.
The script used to create a sequence index is named create_seq_index.py. It is run as follows:
create_seq_index.py --studies SRP000031 --output 1000genomes_pilot1.sequence.index -s settings.ini --analysis_group malaria_low
Where:
- --studies: the ENA study accession id (or several ids, comma-separated) to include in the index
- --output: the file name given to the new index
- -s: the file with the configuration settings needed to run this script
- --analysis_group: the analysis group name (it will appear in the last column of the index)
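When several studies go into one index, the --studies value is a single comma-separated string. As a minimal sketch, the helper below assembles that command line in Python. The name build_seq_index_cmd is hypothetical and the second accession is only a placeholder; the flags are the ones described above. The helper only builds the argument list and does not run the script.

```python
def build_seq_index_cmd(studies, output, settings, analysis_group):
    """Return the argument list for create_seq_index.py (hypothetical helper)."""
    if not studies:
        raise ValueError("at least one study accession is required")
    return [
        "create_seq_index.py",
        "--studies", ",".join(studies),  # multiple ids are passed comma-separated
        "--output", output,
        "-s", settings,
        "--analysis_group", analysis_group,
    ]


cmd = build_seq_index_cmd(
    ["SRP000031", "SRP000032"],  # second accession is a placeholder, not from this page
    "1000genomes_pilot1.sequence.index",
    "settings.ini",
    "malaria_low",
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) if the script is on the PATH.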
Note:
If you see a warning similar to:
INFO:__main__:No population defined for SAMEA2031116. Will be set to 'NA'
It means that this particular sample accession id (SAMEA2031116) does not have a population defined in the ENA and that the population column will be set to NA for this index record.
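If the script's log output has been saved, the affected samples can be collected from these messages. The sketch below is one possible way to do it: the pattern is copied from the message shown above and may not match other versions of the script, and the function name and the second log line are made up for illustration.

```python
import re

# Pattern taken from the message quoted above; the exact wording
# may differ between versions of the script (assumption).
NO_POP = re.compile(r"No population defined for (\S+)\. Will be set to 'NA'")


def samples_without_population(log_lines):
    """Return sample accessions named in the warnings, in order, without duplicates."""
    seen = []
    for line in log_lines:
        match = NO_POP.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


log = [
    "INFO:__main__:No population defined for SAMEA2031116. Will be set to 'NA'",
    "INFO:__main__:an unrelated line (made up for this example)",
]
missing = samples_without_population(log)
print(missing)
```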
The population information for these samples can be added later using a helper script named add_missing_pop.py. This script is run as follows:
add_missing_pop.py -i 1000genomes_pilot1.sequence.index --host mysql-igsr-web -u g1kro -P 4641 -d igsr_website_v2 --output 1000genomes_pilot1.sequence.new.index
Where:
- -i: the sequence index generated using create_seq_index.py, without population information
- --host, -u, -P and -d: the connection details for the MySQL IGSR website database containing the population information for the relevant samples
- --output: the name of the new index file with population information
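After running add_missing_pop.py it can be useful to confirm that no record is left with NA as its population. The following is only a sketch under assumptions this page does not state: that the index is tab-delimited, that header or comment lines start with #, and that the caller knows the position of the population column. The rows below are made up and do not reflect the real index layout.

```python
import csv
import io


def rows_with_na(index_text, column):
    """Return data rows whose field at position `column` is exactly 'NA'."""
    reader = csv.reader(io.StringIO(index_text), delimiter="\t")
    return [
        row for row in reader
        if row
        and not row[0].startswith("#")  # skip header/comment lines (assumption)
        and len(row) > column
        and row[column] == "NA"
    ]


# Made-up three-column rows (run, sample, population) for illustration only.
before = "#RUN\tSAMPLE\tPOPULATION\nrun1\tSAMEA2031116\tNA\nrun2\tsampleB\tpopX\n"
after = "#RUN\tSAMPLE\tPOPULATION\nrun1\tSAMEA2031116\tpopY\nrun2\tsampleB\tpopX\n"

print(len(rows_with_na(before, 2)), len(rows_with_na(after, 2)))
```

For a real index, the file contents would be read from disk and the column position adjusted to wherever the population field actually sits.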
The script used to create an analysis index is named create_analysis_index.py. It is run as follows:
create_analysis_index.py --studies ERP124807 --output bionano.analysis.index -s settings.ini
Where:
- --studies: the ENA study accession id (or several ids, comma-separated) to include in the index
- --output: the file name given to the new index
- -s: the file with the configuration settings needed to run this script