
Annotating Orthogroups: Blasting against set of known genes #451

@ViriatoII


Hey,

Many people are asking about orthogroup annotation (#373, #411, #440), so I'm sharing my scripts.

I have finished a script that takes the N0.tsv file produced by OrthoFinder and, for each orthogroup, searches one of its genes against a reference set of well-annotated genes. The search is a tblastn (protein query against a translated nucleotide database), so the reference genes must be DNA sequences, not proteins. The script outputs each orthogroup with the name of its best hit, along with the percent identity, alignment length and alignment coordinates.
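A minimal sketch of the first step, for readers who want to adapt the idea (this is not the attached script's code). It assumes N0.tsv has the columns HOG, OG and Gene Tree Parent Clade, followed by one column per species holding comma-separated gene IDs, and it simply takes the first gene it finds as the query for each orthogroup; how the attached script chooses its gene is not stated here. The function name pick_one_gene_per_og is hypothetical.

```python
import csv

def pick_one_gene_per_og(lines):
    """Return {HOG id: one gene ID} from the lines of an N0.tsv file.

    Assumption: columns 0-2 are HOG, OG and Gene Tree Parent Clade;
    every later column is a species whose cell lists genes as "g1, g2, ...".
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    species_cols = range(3, len(header))
    picked = {}
    for row in reader:
        for i in species_cols:
            if i < len(row) and row[i].strip():
                # Take the first gene of the first non-empty species cell.
                picked[row[0]] = row[i].split(",")[0].strip()
                break
    return picked
```

Usage: `with open("N0.tsv") as fh: picked = pick_one_gene_per_og(fh)`. Picking the first listed gene is the simplest choice; choosing, say, the longest sequence or a gene from a preferred species would be equally easy to add.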

The script is attached here (GitHub serves the download as .txt; rename it to .py):
annotate_ogroups_vs_ref.py

You run it like this:
annotate_ogroups_vs_ref.py N0.tsv ortho_dir/ ref_genes.fasta

  • ortho_dir contains the FASTA files for all species in N0.tsv; it is the input directory of the OrthoFinder run.
  • ref_genes.fasta is a set of well-annotated genes (nucleotide sequences), for example all Arabidopsis genes.
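To search a gene, its protein sequence has to be pulled out of the FASTA files in ortho_dir. The sketch below is one way to do that, assuming the gene IDs in N0.tsv match the first word of each FASTA header and that the species files use one of the listed extensions; both are assumptions, as are the helper names read_fasta, collect_sequences, iter_fasta_files and queries_fasta.

```python
from pathlib import Path

# Assumed set of FASTA extensions found in ortho_dir.
FASTA_SUFFIXES = {".fa", ".faa", ".fasta", ".fas", ".pep"}

def read_fasta(lines):
    """Yield (id, sequence) pairs; the ID is the header's first word."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)

def collect_sequences(wanted, line_sources):
    """Return {gene ID: sequence} for the wanted IDs across all sources."""
    found = {}
    for lines in line_sources:
        for seq_id, seq in read_fasta(lines):
            if seq_id in wanted:
                found[seq_id] = seq
    return found

def iter_fasta_files(ortho_dir):
    """Yield an open handle for each FASTA file in ortho_dir."""
    for path in sorted(Path(ortho_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            with open(path) as fh:
                yield fh

def queries_fasta(picked, found):
    """Build a query FASTA whose IDs are the orthogroup IDs."""
    return "".join(
        f">{og} {gene}\n{found[gene]}\n"
        for og, gene in picked.items()
        if gene in found
    )
```

For a real run, pass `iter_fasta_files("ortho_dir/")` as `line_sources` and write the string returned by `queries_fasta` to a file. Using the orthogroup ID as the query ID means the tblastn output can be joined straight back to the orthogroups.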

Before running it, build the BLAST database: makeblastdb -dbtype nucl -in ref_genes.fasta
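A sketch of the search step, assuming BLAST+ is on the PATH: it calls tblastn through subprocess against the database built with makeblastdb, then keeps the highest-bitscore hit per query from the tabular (-outfmt 6) output. By default makeblastdb names the database after the -in file, so `-db ref_genes.fasta` points at it. The chosen output columns, the e-value cutoff and the helper names tblastn_cmd, best_hits and run_tblastn are illustrative assumptions, not taken from the attached script.

```python
import subprocess

# Custom tabular columns: identity, alignment length and coordinates.
OUTFMT = "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore"

def tblastn_cmd(query_fasta, db, out_tsv, evalue=1e-5, threads=1):
    """Build the tblastn command line (protein query vs. nucleotide db)."""
    return [
        "tblastn", "-query", query_fasta, "-db", db, "-out", out_tsv,
        "-outfmt", OUTFMT, "-evalue", str(evalue), "-num_threads", str(threads),
    ]

def best_hits(lines):
    """Keep the highest-bitscore hit per query from outfmt-6 lines."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hit = {
            "subject": f[1], "pident": float(f[2]), "length": int(f[3]),
            "qstart": int(f[4]), "qend": int(f[5]),
            "sstart": int(f[6]), "send": int(f[7]),
            "evalue": float(f[8]), "bitscore": float(f[9]),
        }
        if f[0] not in best or hit["bitscore"] > best[f[0]]["bitscore"]:
            best[f[0]] = hit
    return best

def run_tblastn(query_fasta, db, out_tsv):
    """Run tblastn, then parse its output into best hits per orthogroup."""
    subprocess.run(tblastn_cmd(query_fasta, db, out_tsv), check=True)
    with open(out_tsv) as fh:
        return best_hits(fh)
```

Usage: `hits = run_tblastn("queries.fasta", "ref_genes.fasta", "hits.tsv")`. Ranking by bitscore rather than e-value keeps the comparison independent of database size.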

We also have a script that transfers the annotations of the genes in each orthogroup to the orthogroup itself, which helps if your OrthoFinder run already includes many well-studied proteins.
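One simple way to transfer gene annotations to an orthogroup is a majority vote, sketched below. This is not necessarily how the script mentioned above works; the strict-majority rule, the threshold parameter and the name annotate_orthogroups are hypothetical choices for illustration.

```python
from collections import Counter

def annotate_orthogroups(og_members, gene_annotations, threshold=0.5):
    """Assign each orthogroup the most common label among its annotated genes.

    og_members: {orthogroup: [gene IDs]}
    gene_annotations: {gene ID: label} for the well-studied genes
    A label is kept only if it covers more than `threshold` of the
    annotated members; returns {orthogroup: (label, support)}.
    """
    result = {}
    for og, genes in og_members.items():
        labels = [gene_annotations[g] for g in genes if g in gene_annotations]
        if not labels:
            continue  # no annotated member, nothing to transfer
        label, count = Counter(labels).most_common(1)[0]
        support = count / len(labels)
        if support > threshold:
            result[og] = (label, support)
    return result
```

Requiring a strict majority leaves conflicting orthogroups unannotated instead of picking a label arbitrarily; lowering the threshold trades that caution for coverage.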

Happy annotating,
Ricardo
