Output Files
Douglas Slotta edited this page Feb 28, 2019 · 16 revisions
Here are the expected output files for a successful annotation. Note that these outputs are not suitable for submission to GenBank (we are working on this part!).
- annot.fna: Genomic sequence(s) in FASTA format, as provided on input
- annot.faa: Protein products annotated on the genome, in FASTA format. Each FASTA definition line consists of a general-type identifier (gnl|extdb|<locus_tag>) followed by the product name. You can provide the locus tag prefix of your choice in the input metadata YAML file (see Input Files).
- annot.gbk: Annotated genomic sequence(s) in GenBank flat file format. Genes use the <locus_tag>, and protein_ids use the format extdb:<locus_tag>
- annot.gff: Annotation of the genomic sequence(s) in Generic Feature Format Version 3 (GFF3). Sequence identifiers (column 1) correspond to the identifiers in the input FASTA file. Identifiers for genes use the format gene-locus_tag (gene-<locus_tag>), and identifiers for CDSs use the format cds-locus_tag (cds-<locus_tag>), matching the locus tags in the annot.gbk file. Protein_ids use the format extdb:<locus_tag>, similarly to the annot.faa file. Additional information about NCBI's GFF files is available at README_GFF3.txt.
- annot-gb.ent: Annotated genomic sequence(s) in ASN.1 format.
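
The identifier conventions above all derive from the locus tag, so one file's records can be cross-referenced with another's. The following is a minimal Python sketch, not part of the pipeline itself: it splits an annot.faa definition line into its locus tag and product name, then builds the matching annot.gff and annot.gbk identifiers. The function names and the locus tag `MYTAG_000001` are hypothetical and used only for illustration.

```python
def parse_faa_defline(defline):
    """Split a definition line of the form '>gnl|extdb|<locus_tag> product name'.

    Returns (locus_tag, product). Raises ValueError if the identifier is not
    a three-part general identifier starting with 'gnl'.
    """
    seq_id, _, product = defline.lstrip(">").strip().partition(" ")
    parts = seq_id.split("|")
    if len(parts) != 3 or parts[0] != "gnl":
        raise ValueError(f"not a general identifier: {seq_id!r}")
    return parts[2], product.strip()


def related_ids(locus_tag):
    """Build the identifiers that the other output files use for a locus tag."""
    return {
        "gff_gene_id": f"gene-{locus_tag}",   # gene ID in annot.gff
        "gff_cds_id": f"cds-{locus_tag}",     # CDS ID in annot.gff
        "protein_id": f"extdb:{locus_tag}",   # protein_id in annot.gbk / annot.gff
    }


# Hypothetical definition line; MYTAG is an assumed locus tag prefix.
locus_tag, product = parse_faa_defline(">gnl|extdb|MYTAG_000001 hypothetical protein")
ids = related_ids(locus_tag)
print(locus_tag, product, ids["gff_cds_id"])
```

A lookup like this can be handy when joining protein sequences from annot.faa to their CDS features in annot.gff, assuming the files come from the same annotation run.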