# Submitting UCE Data to NCBI Genbank
Title: Submitting UCE Data to NCBI Genbank
Project: faircloth-lab documentation project
Author: Carl Oliveros, Brant Faircloth
Affiliation: faircloth-lab
Web: http://faircloth-lab.org
Date: 22 June 2017
These are the steps to follow to submit data from enriched UCE contigs (and other sorts of enriched, contig-like data) to the NCBI Targeted Locus Study (TLS) database, which is part of NCBI GenBank.
1. Register an NCBI BioProject (if you have not already done so).
2. Register NCBI BioSamples for the BioProject (if you have not already done so).
3. Find the alignment files for the incomplete matrix containing data from ALL of the loci you enriched (these should be the untrimmed contigs).
4. Prepare an `ncbi.conf` file that contains metadata for all contigs of each sample. It should look similar to:

```
[metadata]
molecule:DNA
moltype:genomic
specimen_voucher:{}
bioproject:PRJNA304409
biosample:{}

[organisms]
abroscopus_albogularis_28164:Abroscopus albogularis
acanthiza_murina_12152:Acanthiza murina

[biosamples]
abroscopus_albogularis_28164:SAMN04301695
acanthiza_murina_12152:SAMN04301696

[vouchers]
abroscopus_albogularis_28164:KU:28164
acanthiza_murina_12152:KU:12152
```

Use only valid institution codes (which can be found here) in the voucher information. You can also include an `[exclude taxa]` or an `[exclude loci]` section in the `ncbi.conf` file if you wish to exclude some samples or loci.
5. Run the `phyluce_ncbi_prep_uce_align_files_for_ncbi_targeted_locus_db` program against the config file and the folder of alignments that you are annotating:

```
phyluce_ncbi_prep_uce_align_files_for_ncbi_targeted_locus_db \
    --alignments /path/to/your/alignments \
    --conf ncbi.conf \
    --output fsatbl_directory \
    --input-format nexus
```

This program will create one `.fsa` file and one `.tbl` file for each of your samples.
6. Create a submission template (metadata file) using the form at http://www.ncbi.nlm.nih.gov/WebSub/template.cgi. Fill in the form and download the result as `template.sbt`.
7. Create a text file named `comment.txt` with contents similar to:

```
We identified thousands of ultra-conserved elements (UCEs) in 106 birds to examine songbird diversification. All of the UCEs from one bird are included in a single TLS project.
```
8. Use the wizard at https://submit.ncbi.nlm.nih.gov/structcomment/nongenomes/ to create an assembly structured comment and save it as `assembly.cmt`.
9. Get `tbl2asn`. For a description of all of its command-line arguments, see http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/. Make a separate folder named `sequin` that will hold the `*.sqn` files you are about to generate. Then run `tbl2asn` on each of the files in `fsatbl_directory` using the command:

```
tbl2asn -t template.sbt \
    -p fsatbl_directory \
    -Y comment.txt \
    -w assembly.cmt \
    -H y -a s -V v -r sequin
```

Check the validation files (`*.val`) in the `sequin` folder and correct any errors. You can ignore warnings.
10. Submit all `*.sqn` files using SequinMacroSend, filling in your information and adding the following text as a "note":
```
This submission is meant for the Targeted Locus Study database and the contigs
associated with this submission are from target enriched ultraconserved element
loci (sensu Faircloth et al. 2012). Several of these UCE sequences may be <200
bp in length. In previous emails between Brant Faircloth, Michael Baxter, Rich
McVeigh, and DeAnne Olsen Cravaritis, it was decided (not sure who) that for
these types of loci (associated with ultra-conserved elements) accepting
sequences < 200 bp was allowed.
```
If you have more than 19 files, it appears you need to upload them in batches. Once you have submitted, GenBank staff will provide your accession numbers and/or feedback on your submission.
11. Provided NCBI staff do not notify you of any problems, YOU ARE DONE!! You can safely ignore the steps below; you should not have to deal with Sequin.
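The `ncbi.conf` example above lists only two samples; writing the `[organisms]`, `[biosamples]`, and `[vouchers]` sections by hand for a hundred samples invites typos. The following is a minimal Python sketch, not part of phyluce, that builds the same `key:value` layout with the standard-library `configparser` module. The `SAMPLES` rows and the `build_ncbi_conf`/`render` helpers are hypothetical; the rows are copied from the example config and would be replaced with your own sample table.

```python
"""Sketch: write an ncbi.conf file from per-sample metadata.

Assumption: SAMPLES holds hypothetical placeholder rows copied from the
example config in this document; replace them with your own table.
"""
import configparser
import io

# Hypothetical per-sample metadata: (sample name, organism, BioSample, voucher)
SAMPLES = [
    ("abroscopus_albogularis_28164", "Abroscopus albogularis", "SAMN04301695", "KU:28164"),
    ("acanthiza_murina_12152", "Acanthiza murina", "SAMN04301696", "KU:12152"),
]


def build_ncbi_conf(samples, bioproject):
    """Return a ConfigParser laid out like the example ncbi.conf."""
    # ':' is the only key/value delimiter; '%' interpolation is disabled so
    # values are written verbatim, and optionxform=str keeps key case as-is.
    conf = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    conf.optionxform = str
    conf["metadata"] = {
        "molecule": "DNA",
        "moltype": "genomic",
        "specimen_voucher": "{}",
        "bioproject": bioproject,
        "biosample": "{}",
    }
    conf["organisms"] = {name: organism for name, organism, _, _ in samples}
    conf["biosamples"] = {name: biosample for name, _, biosample, _ in samples}
    conf["vouchers"] = {name: voucher for name, _, _, voucher in samples}
    return conf


def render(conf):
    """Serialize to text as 'key:value' lines with no spaces around ':'."""
    buffer = io.StringIO()
    conf.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render(build_ncbi_conf(SAMPLES, "PRJNA304409")), end="")
```

Running it as `python make_ncbi_conf.py > ncbi.conf` (the script name is hypothetical) writes the file; compare the output with the example layout before passing it to phyluce. Any `[exclude taxa]` or `[exclude loci]` sections would still be added by hand.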
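The phyluce step above should leave one `.fsa` and one `.tbl` file per sample in `fsatbl_directory`. Before running `tbl2asn`, a quick Python sketch like this one can flag samples that are missing one of the two files. It assumes that both files for a sample share the same base name (e.g. `sample.fsa` and `sample.tbl`); that naming is an assumption, and `unpaired` is a hypothetical helper.

```python
"""Sketch: confirm every sample has both a .fsa and a .tbl file.

Assumption: the two files for one sample share the same base name.
"""
import sys
from pathlib import Path


def unpaired(names):
    """Return base names that have a .fsa without a .tbl, or vice versa."""
    names = list(names)  # allow generators, which can only be read once
    fsa = {Path(n).stem for n in names if n.endswith(".fsa")}
    tbl = {Path(n).stem for n in names if n.endswith(".tbl")}
    return sorted(fsa ^ tbl)  # symmetric difference: present in only one set


if __name__ == "__main__":
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else "fsatbl_directory")
    if not folder.is_dir():
        print(f"{folder} is not a folder")
    else:
        missing = unpaired(p.name for p in folder.iterdir())
        print("all samples paired" if not missing else "unpaired: " + ", ".join(missing))
```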
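The `tbl2asn` step asks you to fix errors in the `*.val` files but ignore warnings, which is tedious to do by eye across many files. The Python sketch below walks the `sequin` folder and prints only the lines that need attention. It assumes each message line begins with a severity keyword followed by a colon (for example `WARNING:` or `ERROR:`) and treats `ERROR`, `REJECT`, and `FATAL` as must-fix; confirm that format against a few of your own `.val` files before relying on it. `severe_lines` and `scan_folder` are hypothetical helpers.

```python
"""Sketch: list must-fix messages from tbl2asn *.val files.

Assumption: each message line starts with a severity keyword and a colon,
e.g. "ERROR: valid [...] ...". Verify against your own .val files.
"""
import sys
from pathlib import Path

# Severities treated as errors to fix; INFO and WARNING lines are skipped.
MUST_FIX = ("ERROR", "REJECT", "FATAL")


def severe_lines(text):
    """Return the lines of one .val file whose severity is in MUST_FIX."""
    hits = []
    for line in text.splitlines():
        severity = line.split(":", 1)[0].strip().upper()
        if severity in MUST_FIX:
            hits.append(line.strip())
    return hits


def scan_folder(folder):
    """Map each *.val file name in `folder` to its must-fix lines."""
    report = {}
    for path in sorted(Path(folder).glob("*.val")):
        hits = severe_lines(path.read_text(errors="replace"))
        if hits:
            report[path.name] = hits
    return report


if __name__ == "__main__":
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else "sequin")
    if not folder.is_dir():
        print(f"{folder} is not a folder")
    else:
        report = scan_folder(folder)
        for name, hits in report.items():
            print(name)
            for hit in hits:
                print("    " + hit)
        print(f"{len(report)} file(s) with errors")
```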
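For the note about uploading more than 19 files in batches, a short Python sketch can group the `*.sqn` files in the `sequin` folder into batches of at most 19. The batch size comes from the observation in this document, not from a documented NCBI limit, so `BATCH_SIZE` (a hypothetical name, like the `batches` helper) may need adjusting if the submission page accepts a different number.

```python
"""Sketch: group *.sqn files into upload batches.

Assumption: at most 19 files per upload, based on the observation in this
document; this is not a documented NCBI limit.
"""
import sys
from pathlib import Path

BATCH_SIZE = 19


def batches(items, size=BATCH_SIZE):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else "sequin")
    sqn_files = sorted(p.name for p in folder.glob("*.sqn")) if folder.is_dir() else []
    for number, batch in enumerate(batches(sqn_files), start=1):
        print(f"Batch {number}: {len(batch)} file(s)")
        for name in batch:
            print("    " + name)
```

Names are sorted as plain strings, so `split_10.sqn` lists before `split_2.sqn`; that ordering does not matter for the upload itself.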
If NCBI staff ask you to perform vector screening, you will need to follow these steps:
1. Open the Sequin program and, for each `*.sqn` file in the `sequin` folder (yes, each and every one of them), perform the following steps:
   - Open the file using the "Read Existing Record" button.
   - Select the "Edit" menu, then select "Edit submitter info". Enter a release date on the Submission tab, then click "Accept".
   - Select the `Search` menu, then select `Vector Screen` and `Vector Search & Trim Tool`. Click the `Search Univec` button and wait for the search results.
   - After the vector search has completed, click `Select Only Strong and Moderate`, then click `Trim Selected Sequences`. After trimming has completed, click `Dismiss` and close the `Trimmed Locations` window.
   - Select the `Search` menu, then select `Validate`. Resolve any errors that are found.
   - Select the `File` menu, then select `Save As`. Add a `-final` suffix to the file name (e.g. `split_1-final.sqn`). Click `Yes` when asked to "propagate descriptors", then close the window.
2. Repeat the above steps for all `*.sqn` files in the `sequin` folder.