Retrieve Genbanks From Taxid
12.5 years ago
JJK ▴ 60

I know the manual way of doing it, but is there an automated way to retrieve the GenBank files of all species belonging to a taxid such as 51291?

It should be something like this: first retrieve the taxonomy IDs of all strains below this parent taxid, then find the GenBank file belonging to each of those taxids. But so far I couldn't find a proper way of doing that.

Retrieving the NC_XXXX IDs would be sufficient as well, since I already have a GenBank download script.

With efetch I know I can retrieve the parent ID and the lineage. However, I have not yet found an option to retrieve the children.

handle = Entrez.efetch(db="Taxonomy", id=taxId, retmode="xml")

Here is some extra code I am working on now. I added some == checks to direct the flow of the program.

from Bio import Entrez

def get_TaxonomyChild():
    # Find every species-rank taxon below Chlamydiales.
    handle = Entrez.esearch(db="Taxonomy", term="Chlamydiales [subtree] AND species[rank]", retmax="100000")
    record = Entrez.read(handle)
    IdListOrganisms = record["IdList"]
    for organism in IdListOrganisms:
        if organism == "813":  # only follow this one species for now
            # Search for the taxa below this species (the strains).
            handle = Entrez.esearch(db="Taxonomy", term="txid" + organism + "[Organism]", retmax="100000")
            record = Entrez.read(handle)
            StrainList = record["IdList"]
            for Strain in StrainList:
                if Strain == "471472":  # only print this one strain for now
                    print(Strain)
biopython taxonomy genbank • 6.0k views
12.5 years ago

According to the taxonomy FAQ (http://www.ncbi.nlm.nih.gov/books/NBK54428/), you can find all species belonging to a taxon as follows:

How do I find all of the species in GenBank that belong to a particular group?

You can use Entrez queries to find taxa of a particular rank in a given lineage, e.g.:

Amphibia[subtree] AND species[rank]

You can restrict the output of this list to species with formal Linnaean binomial names:

Amphibia[subtree] AND species[rank] AND specified[prop]

To download the list of species names (1) click ‘Send to’ (2) select ‘File’ (3) switch Format to ‘Taxon name’ and (4) click ‘Create File’. This will create a file named “taxonomy_result” in your download directory.

So for your example, you can do a search for: Chlamydiales [subtree] AND species[rank]

Download all the species names, then use that list to EFetch all sequences belonging to each species.
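
The same taxonomy query can also be run through Biopython instead of the web page, which avoids the manual download step. This is only a minimal sketch, assuming Biopython is installed. The helper names species_query and find_species_taxids and the placeholder email address are made up for illustration, and retmax=10000 is an assumed cap on the number of IDs returned in one call:

```python
def species_query(group):
    """Build the taxonomy search term from the FAQ for a named group."""
    return f"{group}[subtree] AND species[rank]"


def find_species_taxids(group, email, retmax=10000):
    """Return the taxonomy IDs of all species-rank taxa below `group`."""
    # Imported here so species_query() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address with each request
    handle = Entrez.esearch(db="taxonomy", term=species_query(group), retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])
```

Calling find_species_taxids("Chlamydiales", "you@example.org") with your own address should give a list of taxonomy ID strings, each of which can then be used in a sequence search.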


I am able to get all taxonomy IDs from every species via your method indeed. But one species can consist of many strains and somehow I am unable to retrieve the data.

12.5 years ago
Peter 6.0k

What is your question? If you just want to get the GenBank files for a given taxonomy ID, then (more or less as you showed) you can do this with a single ESearch term like txid12345[orgn] - see also http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/
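
As a rough illustration of that single search: as far as I know, an [orgn] term on the sequence databases also matches records from the taxa below the given ID, so strains should be covered without listing them one by one. The sketch below assumes Biopython; the function names, the placeholder email and the optional srcdb_refseq[prop] filter (meant to restrict hits to RefSeq records, which include the NC_ accessions) are illustrative choices, not something taken from this thread:

```python
def organism_term(taxid, refseq_only=False):
    """Build a sequence search term for one taxonomy ID."""
    term = f"txid{taxid}[orgn]"
    if refseq_only:
        # Assumed filter: keep only records that come from RefSeq.
        term += " AND srcdb_refseq[prop]"
    return term


def find_accessions(taxid, email, refseq_only=False, retmax=10000):
    """Return accession.version strings of nucleotide records for `taxid`."""
    # Imported here so organism_term() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address with each request
    handle = Entrez.esearch(
        db="nucleotide",
        term=organism_term(taxid, refseq_only),
        idtype="acc",  # ask for accessions rather than GI numbers
        retmax=retmax,
    )
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])
```

For the taxid in the question this would be find_accessions("51291", "you@example.org", refseq_only=True), and the resulting list can be handed to an existing download script.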

There can be complications when you have lots of records and want to download them all (e.g. network errors), see this thread: http://lists.open-bio.org/pipermail/biopython/2012-April/007943.html
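
One common way to cope with this is to download in batches through the Entrez history server and to retry a batch when it fails. The following is a sketch under several assumptions: Biopython is installed, efetch returns text for retmode="text" (as recent Biopython versions do), and the names with_retries and download_genbank, the batch size and the delay are all illustrative:

```python
import time


def with_retries(func, attempts=3, delay=15.0):
    """Call func(); on a network error wait `delay` seconds and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:  # urllib's HTTPError and URLError are OSError subclasses
            if attempt == attempts:
                raise
            time.sleep(delay)


def download_genbank(term, out_path, email, batch_size=500):
    """Write all GenBank records matching `term` to `out_path`; return the count."""
    # Imported here so with_retries() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address with each request
    handle = Entrez.esearch(db="nucleotide", term=term, usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    count = int(search["Count"])

    with open(out_path, "w") as out:
        for start in range(0, count, batch_size):

            def fetch_batch(start=start):
                # Fetch one slice of the stored search result as GenBank text.
                fetch_handle = Entrez.efetch(
                    db="nucleotide", rettype="gb", retmode="text",
                    retstart=start, retmax=batch_size,
                    webenv=search["WebEnv"], query_key=search["QueryKey"],
                )
                try:
                    return fetch_handle.read()
                finally:
                    fetch_handle.close()

            out.write(with_retries(fetch_batch))
    return count
```

Keeping the retry logic in its own small function means it can be tried out with a fake failing function before pointing it at NCBI.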
