Annotating Orthogroups: Blasting against a set of known genes

Hey,

Many people have been asking about orthogroup annotation (https://github.com/davidemms/OrthoFinder/issues/373 , https://github.com/davidemms/OrthoFinder/issues/411 , https://github.com/davidemms/OrthoFinder/issues/440 ), so I'm sharing my scripts.

I have finished a script that takes the N0.tsv file produced by OrthoFinder and blasts one gene from each orthogroup against a reference set of well-annotated genes. The search uses tblastn, so each protein query is compared against nucleotide (DNA) reference sequences, not against proteins. The script outputs each orthogroup with the name of its best hit, plus the identity, length and coordinates of the alignment.
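For readers who want to see the idea before downloading anything, here is a minimal sketch of the two bookkeeping steps around the BLAST search. It is not the linked script. It assumes the usual N0.tsv layout (three leading columns `HOG`, `OG`, `Gene Tree Parent Clade`, then one column per species with comma-separated gene IDs) and BLAST's default tabular output (`-outfmt 6`). Taking the first gene listed in each row as the representative and ranking hits by bitscore are my assumptions for illustration, and the function names are hypothetical.

```python
import csv


def pick_representatives(n0_handle):
    """Return {orthogroup id: (species, gene)} using the first gene listed in each row.

    Assumes columns 0-2 are HOG, OG and Gene Tree Parent Clade, and that the
    remaining columns hold comma-separated gene IDs, one column per species.
    """
    reader = csv.reader(n0_handle, delimiter="\t")
    header = next(reader)
    species = header[3:]
    representatives = {}
    for row in reader:
        if not row:
            continue
        for sp, cell in zip(species, row[3:]):
            genes = [g.strip() for g in cell.split(",") if g.strip()]
            if genes:
                # Assumption: any member will do, so take the first one found.
                representatives[row[0]] = (sp, genes[0])
                break
    return representatives


def best_hits(blast_handle):
    """Keep the highest-bitscore hit per query from BLAST tabular output (-outfmt 6).

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in blast_handle:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = {
            "subject": f[1],
            "pident": float(f[2]),
            "length": int(f[3]),
            "qstart": int(f[6]),
            "qend": int(f[7]),
            "sstart": int(f[8]),
            "send": int(f[9]),
            "evalue": float(f[10]),
            "bitscore": float(f[11]),
        }
        query = f[0]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best
```

Joining the two dictionaries on the gene ID gives one row per orthogroup with its best reference hit, identity, alignment length and coordinates.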

The script is here (it downloads as .txt; please rename it to .py):
[annotate_ogroups_vs_ref.py](https://github.com/davidemms/OrthoFinder/files/5174435/annotate_ogroups_vs_ref.txt)

You run it like this:
`annotate_ogroups_vs_ref.py N0.tsv ortho_dir/ ref_genes.fasta`
 - ortho_dir/ contains the FASTA files for all species in N0.tsv. It is the input directory of the OrthoFinder run.
 - ref_genes.fasta is a set of well-annotated genes, for example all the [Arabidopsis](https://www.arabidopsis.org/download_files/Genes/Araport11_genome_release/Araport11_blastsets/Araport11_genes.201606.cds.fasta.gz) genes.

Also, build the nucleotide BLAST database for the reference first: `makeblastdb -dbtype nucl -in ref_genes.fasta`
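If you prefer to drive BLAST+ from Python, the database build and the search could be wrapped roughly like this. This is a sketch, not what the linked script does internally: the helper names, the query and output file names, and the e-value cutoff are all hypothetical. Only the BLAST+ options themselves (`-dbtype`, `-in`, `-query`, `-db`, `-outfmt`, `-evalue`, `-out`) are standard.

```python
import subprocess


def makeblastdb_cmd(ref_fasta):
    """Command that builds a nucleotide BLAST database from the reference FASTA."""
    return ["makeblastdb", "-dbtype", "nucl", "-in", ref_fasta]


def tblastn_cmd(query_fasta, ref_fasta, out_tsv, evalue="1e-5"):
    """Command that searches protein queries against the nucleotide database.

    The e-value cutoff is an illustrative assumption; -outfmt 6 asks for
    tab-separated output with the default twelve columns.
    """
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", ref_fasta,
        "-outfmt", "6",
        "-evalue", evalue,
        "-out", out_tsv,
    ]


def run(cmd):
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

For example, `run(makeblastdb_cmd("ref_genes.fasta"))` followed by `run(tblastn_cmd("representatives.faa", "ref_genes.fasta", "hits.tsv"))`, where `representatives.faa` and `hits.tsv` are placeholder names, requires BLAST+ to be installed and on your `PATH`.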

We also have a script that [extrapolates the annotations of the genes in each orthogroup to an annotation for that orthogroup](https://github.com/davidemms/OrthoFinder/pull/434), which is useful if your OrthoFinder run includes many well-studied proteins.


Happy annotating,
Ricardo

