Hey,
Many people are asking about orthogroup annotation (#373, #411, #440), so I'm sharing my scripts.
I have finished a script that takes the N0.tsv produced by OrthoFinder and blasts one gene from each orthogroup against a reference set of well-annotated genes (this is a tblastn, so it searches against nucleotide gene sequences, not proteins). For each orthogroup it outputs the name of the best hit, along with the identity, length and coordinates of the alignment.
The script is here (the download will come as .txt, please rename it to .py):
annotate_ogroups_vs_ref.py
You run it like this:
annotate_ogroups_vs_ref.py N0.tsv ortho_dir/ ref_genes.fasta
- ortho_dir contains the FASTA files for all species in N0. It's the input directory of the OrthoFinder run.
- ref_genes.fasta is a set of well-annotated genes, for example all the Arabidopsis genes.
Also, do this first, on the same reference file you pass to the script: makeblastdb -dbtype nucl -in ref_genes.fasta
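For anyone who wants to see the general idea before downloading, here is a minimal sketch of the two core steps: picking one representative gene per orthogroup from N0.tsv, and building the tblastn call. This is not the attached script, just an illustration. The function names `pick_representatives` and `tblastn_command` are made up, the sketch assumes the first three columns of N0.tsv are HOG, OG and Gene Tree Parent Clade followed by one column per species, and the output columns and e-value cutoff are arbitrary choices.

```python
import csv
import io


def pick_representatives(n0_text):
    """Return {orthogroup_id: (species_column, gene_id)}.

    Takes the first gene listed in the first non-empty species column
    of each row (an arbitrary choice for this sketch).
    """
    reader = csv.reader(io.StringIO(n0_text), delimiter="\t")
    header = next(reader)
    # Assumption: columns 0-2 are HOG, OG, Gene Tree Parent Clade.
    species = header[3:]
    reps = {}
    for row in reader:
        if not row:
            continue
        for name, cell in zip(species, row[3:]):
            genes = [g.strip() for g in cell.split(",") if g.strip()]
            if genes:
                reps[row[0]] = (name, genes[0])
                break
    return reps


def tblastn_command(query_fasta, db):
    """Build a tblastn argument list (protein query vs. nucleotide db)."""
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", db,
        # Tabular output with hit name, identity, length and coordinates.
        "-outfmt", "6 qseqid sseqid pident length sstart send evalue bitscore",
        "-evalue", "1e-5",  # assumed cutoff, adjust to taste
    ]
```

The command list can be passed to `subprocess.run` once the representative protein sequences have been written to a query FASTA. The best hit per query would then be the row with the highest bitscore.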
We also have a script that extrapolates the annotations of the individual genes in each orthogroup to the orthogroup as a whole, in case you have many well-studied proteins in your OrthoFinder run.
Happy annotating,
Ricardo