This repository provides the Python scripts used to produce the figures in the manuscript by Young et al. "Defining the Rhizobium leguminosarum species complex", currently submitted for publication.
The scripts are provided here to document the data analysis used in the study. They were written for 'single use' with file names embedded in the code. Those file names have been changed so that the scripts can be tested using local files downloaded from the repository. The following folders are included.
The following scripts are provided here. Each script includes comments describing its function.
find_genes.py used for Figures 1, 3, 6, S1-S4
concat_seqs.py used for Figures 1, 3, S2
ANI_square_plot.py used for Figure 4
ANI_facetgrid_plot.py used for Figure 5
gene_sharing_versus_ANI.py used for Figure 8
gene_sharing_tables+plot.py used for Figure S5
chrom_nonchrom_orthocore.py used for Figure 7
chr_v_non_ANI_facetgrid_plot.py used for Figure 7
01_ANI.R used for Figure 2
Example data that can be used to test the scripts.
5 genome sequences (.fna)
2 Orthogroups.GeneCount.tsv files showing number of copies of each orthogroup in each genome
445_ANI.tab with the complete set of pairwise average nucleotide identity values
ANI_plot.csv ANI against USDA2370, with taxonomic groups
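As a rough illustration of how a gene-count table of this kind might be inspected, the sketch below counts the orthogroups that have at least one copy in each of two genomes. It assumes a tab-separated layout with an `Orthogroup` column, one column per genome, and a final `Total` column. The genome names and counts are invented for the example, and this is not the code used in the repository's scripts.

```python
import csv
import io

# Hypothetical stand-in for an Orthogroups.GeneCount.tsv file.
# Assumed layout: "Orthogroup", one column per genome, then "Total".
SAMPLE_TSV = (
    "Orthogroup\tgenomeA\tgenomeB\tTotal\n"
    "OG0000000\t2\t1\t3\n"
    "OG0000001\t1\t0\t1\n"
    "OG0000002\t0\t3\t3\n"
    "OG0000003\t1\t1\t2\n"
)


def shared_orthogroups(handle, genome_1, genome_2):
    """Return the orthogroup IDs with at least one copy in both genomes."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["Orthogroup"]
        for row in reader
        if int(row[genome_1]) > 0 and int(row[genome_2]) > 0
    ]


shared = shared_orthogroups(io.StringIO(SAMPLE_TSV), "genomeA", "genomeB")
print(len(shared), "orthogroups shared:", shared)
```

To try this on a real file, the `io.StringIO(...)` argument could be replaced with an open file handle, and the genome names with column names taken from that file's header row.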
Files that the scripts need to refer to.
bac120_core_genes_Rhizobium.fas - the 120 core proteins (Figures 1, 3)
16S_rRNA_USDA2370.fas - reference 16S rRNA gene (band in Figure 3)
NodC_stds.fas - reference NodC proteins for symbiovars viciae, trifolii, phaseoli (Figure S4)
NodA_stds.fas - ditto NodA (no figure)
NodD_stds.fas - ditto NodD (no figure)
RecA_AtpD_GyrB.fasta - proteins representing three housekeeping genes (Figures 6, S2)
rpoB_recA_stds.fas - high-throughput amplicon sequences from two genes (Figure S3)
ordered_orthocore4.fas - 3215 genes that are usually chromosomal in the Rlc (Figure 7)
names_in_itol_order.txt strain names in the order of appearance in the phylogeny in Figure 3
strain_info_440.csv strain list including assignment to genospecies
Type_strain_list.tab list of the reference strains representing each genospecies
Type_strain_list_anh.tab ditto but with R. anhuiense type added
gs_colour_list.csv
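Before running the scripts, it can be useful to check that a FASTA reference file has the expected number of records (for example, 120 for the core protein set). The following is a minimal sketch using only the Python standard library. The sample sequences are made up, `read_fasta` is a hypothetical helper and not a function from the repository's scripts, and the parser assumes plain, uncompressed FASTA text.

```python
import io

# Hypothetical two-record FASTA text standing in for a reference file.
SAMPLE_FASTA = ">protein_1 example\nMKT\nAAL\n>protein_2 example\nMSE\n"


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record after the last line.
    if header is not None:
        yield header, "".join(chunks)


records = list(read_fasta(io.StringIO(SAMPLE_FASTA)))
print(len(records), "records read")
```

For a file on disk, the same function can be given an open text handle, e.g. `with open(path) as handle: records = list(read_fasta(handle))`, where `path` points to one of the files listed above.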
An empty folder that will receive the output when the scripts are run
A folder with the output files generated by the scripts
Phylogenies were displayed using iTOL, and are available at https://itol.embl.de/shared/rhizobium
Created by Peter Young - [email protected]