Description
I want to generate a JSON file that maps each taxonID to its scientific name and synonyms (for simplicity, "synonyms" here covers synonyms, common names and the Genbank common name). It would have the following structure:
{
taxonid1 : ["sciName, syn1, syn2.."],
taxonid2 : ["sciName, syn1, syn2.."],
......
}
I want to do this for all descendants of a given group (e.g. Viridiplantae), based on their taxonIDs.
To get the desired results, I first used ncbi.get_descendant_taxa():
descendants = ncbi.get_descendant_taxa('Viridiplantae', intermediate_nodes=True)
This gave me the list of taxonIDs. Afterwards I downloaded the names.dmp file (which contains the synonyms) from NCBI and extracted the information I needed from it.
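Roughly, the extraction step can be sketched as below. This is only a minimal illustration, not part of ete3: the function name `collect_names` and the `WANTED_CLASSES` set are hypothetical names chosen for this example, and the parsing assumes the usual names.dmp layout of four fields (tax_id, name, unique name, name class) separated by `\t|\t`, with each row ending in `\t|`.

```python
import json

# Name classes kept in the output; "synonyms" in the broad sense used above.
# (Illustrative choice; names.dmp contains other classes such as "authority".)
WANTED_CLASSES = {"scientific name", "synonym", "common name", "genbank common name"}


def collect_names(lines, taxids):
    """Map each requested taxid to [scientific name, other names...].

    `lines` is any iterable of names.dmp rows (e.g. an open file handle),
    `taxids` is an iterable of integer taxon IDs.
    """
    wanted = set(taxids)
    scientific = {}
    others = {}
    for line in lines:
        row = line.rstrip("\n")
        # Drop the trailing "\t|" row terminator before splitting.
        if row.endswith("\t|"):
            row = row[:-2]
        fields = row.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        taxid, name, name_class = int(fields[0]), fields[1], fields[3]
        if taxid not in wanted or name_class not in WANTED_CLASSES:
            continue
        if name_class == "scientific name":
            scientific[taxid] = name
        else:
            others.setdefault(taxid, []).append(name)
    # Scientific name first, then the remaining names in file order.
    return {t: [scientific[t]] + others.get(t, []) for t in scientific}


# Intended use (file paths are placeholders):
#   with open("names.dmp") as fh:
#       names = collect_names(fh, descendants)
#   with open("names.json", "w") as out:
#       json.dump(names, out, indent=2)
```

Note that `json.dump` writes the integer taxonIDs as string keys, since JSON object keys are always strings.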
It feels like I'm duplicating work here, since ete3 already downloads the dump files and stores them in an SQLite database. But I had to follow this approach because, when I looked in the database, I didn't find all the synonyms. For instance, Triticum aestivum has the following synonyms:
Scientific name: Triticum aestivum L.
Genbank common name: bread wheat
Synonym: Triticum aestivum subsp. aestivum; Triticum vulgare L.
Common name: Canadian hard winter wheat; Common wheat; Wheat
My question is: would it be possible to add all this information when the database is created? For instance, if we called
ncbi.get_common_names([4565])
we would get:
{4565: ["Canadian hard winter wheat", "Common wheat", "Wheat"]}
And could the same be done for synonyms and the other name types?
Thank you!