jpwyoung/Rlc

Rlc: scripts for phylogenies and average nucleotide identity

This repository provides the scripts (Python, plus one R script) used to produce the figures in the manuscript by Young et al., "Defining the Rhizobium leguminosarum species complex", currently submitted for publication.

The scripts are provided to document the data analysis used in the study. They were written for single use, with file names embedded; those file names have been changed so that the scripts can be tested on local files downloaded from this repository. The following folders are included.

code


The following scripts are provided here. Each script includes comments describing its function.

find_genes.py used for Figures 1, 3, 6, S1-S4

concat_seqs.py used for Figures 1, 3, S2

ANI_square_plot.py used for Figure 4

ANI_facetgrid_plot.py used for Figure 5

gene_sharing_versus_ANI.py used for Figure 8

gene_sharing_tables+plot.py used for Figure S5

chrom_nonchrom_orthocore.py used for Figure 7

chr_v_non_ANI_facetgrid_plot.py used for Figure 7

01_ANI.R used for Figure 2

data


Example data that can be used to test the scripts.

5 genome sequences (.fna)

2 Orthogroups.GeneCount.tsv files showing number of copies of each orthogroup in each genome

445_ANI.tab with the complete set of pairwise average nucleotide identity values

ANI_plot.csv ANI against USDA2370, with taxonomic groups
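As a quick way to inspect the example Orthogroups.GeneCount.tsv files, the sketch below loads one table and computes how many orthogroups two genomes share. It assumes the usual OrthoFinder layout (tab-separated, an "Orthogroup" column first, one column per genome, a final "Total" column). The function names and the Jaccard-style sharing measure are illustrative assumptions, not necessarily what gene_sharing_versus_ANI.py does.

```python
import csv


def load_gene_counts(path):
    """Read a GeneCount table into {genome: {orthogroup: copy_number}}.

    Assumed layout: tab-separated, first column "Orthogroup",
    last column "Total", one column per genome in between.
    """
    counts = {}
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        genomes = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
        for genome in genomes:
            counts[genome] = {}
        for row in reader:
            for genome in genomes:
                counts[genome][row["Orthogroup"]] = int(row[genome])
    return counts


def shared_fraction(counts, genome_a, genome_b):
    """Jaccard index of orthogroup presence: shared / present in either genome."""
    present_a = {og for og, n in counts[genome_a].items() if n > 0}
    present_b = {og for og, n in counts[genome_b].items() if n > 0}
    union = present_a | present_b
    return len(present_a & present_b) / len(union) if union else 0.0
```

Calling `load_gene_counts` on one of the files in the data folder, then `shared_fraction` with two genome column names from that file, gives a single sharing value that can be set beside the corresponding ANI value.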

reference


Files that the scripts need to refer to.

Sequence files used as blast queries

bac120_core_genes_Rhizobium.fas - the 120 core proteins (Figures 1, 3)

16S_rRNA_USDA2370.fas - reference 16S rRNA gene (band in Figure 3)

NodC_stds.fas - reference NodC proteins for symbiovars viciae, trifolii, phaseoli (Figure S4)

NodA_stds.fas - ditto NodA (no figure)

NodD_stds.fas - ditto NodD (no figure)

RecA_AtpD_GyrB.fasta - proteins representing three housekeeping genes (Figures 6, S2)

rpoB_recA_stds.fas - high-throughput amplicon sequences from two genes (Figure S3)

ordered_orthocore4.fas - 3215 genes that are usually chromosomal in the Rlc (Figure 7)
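Before using one of these query files, it can be useful to check how many sequences it contains. The following is a minimal sketch of a FASTA reader using only the standard library. The function name `read_fasta` is hypothetical and is not taken from the repository scripts.

```python
def read_fasta(path):
    """Return {header: sequence} for a FASTA file.

    The header is the text after '>' on each record line. If two records
    have identical headers, the later one replaces the earlier one.
    """
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}
```

For example, `len(read_fasta("reference/bac120_core_genes_Rhizobium.fas"))` reports the number of records in the core protein file, assuming the script is run from the top of the repository.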

Lists of strains

names_in_itol_order.txt strain names in the order of appearance in the phylogeny in Figure 3

strain_info_440.csv strain list including assignment to genospecies

Type_strain_list.tab list of the reference strains representing each genospecies

Type_strain_list_anh.tab ditto but with R. anhuiense type added

Colours assigned to the genospecies

gs_colour_list.csv

output


An empty folder that will receive the output when the scripts are run.

expected_output


A folder with the output files that the scripts are expected to generate, for comparison with the contents of the output folder after a test run.
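One way to check a test run is to compare the files in output against those in expected_output. The sketch below does a byte-for-byte comparison with the standard library. The function name `compare_outputs` is hypothetical, and the default folder names assume the script is run from the top of the repository. Plots and other binary files may differ at the byte level even when they look identical, so a reported difference is only a prompt to inspect the file.

```python
import filecmp
from pathlib import Path


def compare_outputs(output_dir="output", expected_dir="expected_output"):
    """Return (missing, differing) lists of paths relative to expected_dir."""
    expected = Path(expected_dir)
    produced = Path(output_dir)
    missing, differing = [], []
    for ref in sorted(p for p in expected.rglob("*") if p.is_file()):
        rel = ref.relative_to(expected)
        candidate = produced / rel
        if not candidate.is_file():
            missing.append(str(rel))  # expected file was not produced
        elif not filecmp.cmp(ref, candidate, shallow=False):
            differing.append(str(rel))  # contents differ byte for byte
    return missing, differing
```

Both lists being empty means every expected file was reproduced exactly.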



NOTE

Phylogenies were displayed using iTOL, and are available at https://itol.embl.de/shared/rhizobium


Created by Peter Young - [email protected]
