Question

Retrieve Genbanks From Taxid

12.8 years ago

JJK ▴ 60

I know how to do this manually, but is there an automated way to retrieve the GenBank files of all species belonging to a taxid such as 51291?

I imagine it would first retrieve the taxonomy IDs of all strains under this higher-level taxid and then find the GenBank file for each of those taxids, but so far I couldn't find a proper way of doing that.

Retrieving the NC_XXXX IDs would be sufficient as well, since I already have a GenBank download script.

With efetch I know I can retrieve the parent ID and lineage, but I have not yet found an option to retrieve the children:

handle = Entrez.efetch(db="Taxonomy", id=taxId, retmode="xml")

Here is some extra code I am working on now. I added a few == comparisons to direct the flow of the program.

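from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder address; NCBI asks for a contact email

def get_TaxonomyChild():
    # All species-rank taxa below Chlamydiales
    handle = Entrez.esearch(db="Taxonomy", term="Chlamydiales[subtree] AND species[rank]", retmax=100000)
    record = Entrez.read(handle)
    handle.close()
    IdListOrganisms = record["IdList"]
    for organism in IdListOrganisms:
        if organism == "813":  # == check used to steer the flow
            # Look up the taxa listed under this species
            handle = Entrez.esearch(db="Taxonomy", term="txid" + organism + "[Organism]", retmax=100000)
            record = Entrez.read(handle)
            handle.close()
            StrainList = record["IdList"]
            for Strain in StrainList:
                if Strain == "471472":  # == check used to steer the flow
                    print(Strain)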

biopython taxonomy genbank • 6.2k views

highly similar: http://www.biostars.org/post/show/18706

12.8 years ago by Pierre Lindenbaum 166k

score 0 · Answer 1 · 2012-07-05

According to the taxonomy FAQ (http://www.ncbi.nlm.nih.gov/books/NBK54428/), you can find all species belonging to a taxon as follows:

How do I find all of the species in GenBank that belong to a particular group?

You can use Entrez queries to find taxa of a particular rank in a given lineage, e.g.:

Amphibia[subtree] AND species[rank]

You can restrict the output of this list to species with formal Linnaean binomial names:

Amphibia[subtree] AND species[rank] AND specified[prop]

To download the list of species names (1) click ‘Send to’ (2) select ‘File’ (3) switch Format to ‘Taxon name’ and (4) click ‘Create File’. This will create a file named “taxonomy_result” in your download directory.

So for your example, you can do a search for: Chlamydiales [subtree] AND species[rank]

Download all the species names. Then use the list of species names to EFetch all sequences belonging to those species.
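The same two steps can probably be scripted instead of clicked through. The following is only a rough sketch, assuming Biopython is installed; the function names (subtree_term, organism_term, search_ids, nucleotide_ids_by_species) are made up for illustration, and it looks sequences up by taxid rather than by downloaded species name, which is a swap on my part.

```python
def subtree_term(group, rank="species"):
    """Taxonomy query for all taxa of one rank below a group (FAQ syntax)."""
    return "{}[subtree] AND {}[rank]".format(group, rank)


def organism_term(taxid):
    """Sequence-database query for all records assigned to one taxid."""
    return "txid{}[orgn]".format(taxid)


def search_ids(db, term, email, retmax=10000):
    """Run one ESearch and return the matching IDs (needs network access)."""
    # Imported lazily so the two term builders above work without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])


def nucleotide_ids_by_species(group, email):
    """Map each species taxid below `group` to its nucleotide record IDs."""
    species_taxids = search_ids("taxonomy", subtree_term(group), email)
    return {
        taxid: search_ids("nucleotide", organism_term(taxid), email)
        for taxid in species_taxids
    }
```

Calling nucleotide_ids_by_species("Chlamydiales", "you@example.org") would issue one ESearch per species, so for a large group it is worth pausing between requests.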

score 0 · Answer 2 · 2012-07-08

What is your question? If you just want to get the GenBank files for a given taxonomy ID, then (more or less as you showed) you can do this with a single ESearch term like txid12345[orgn] - see also http://news.open-bio.org/news/2009/06/ncbi-einfo-biopython/
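As a rough illustration of the single-term approach, something like the following might work, assuming Biopython is installed. The function names are hypothetical, and the optional srcdb_refseq[prop] filter is my guess at how to limit results to RefSeq (NC_XXXX style) records, not something stated in this thread.

```python
def genbank_term(taxid, refseq_only=False):
    """Build the ESearch term for every record under one taxonomy ID."""
    term = "txid{}[orgn]".format(taxid)
    if refseq_only:
        # Assumption: restrict to RefSeq records such as NC_ accessions.
        term += " AND srcdb_refseq[prop]"
    return term


def fetch_genbank_text(taxid, email, refseq_only=False, retmax=20):
    """Search the nucleotide database and return GenBank flat-file text."""
    # Imported lazily so genbank_term() can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(
        db="nucleotide", term=genbank_term(taxid, refseq_only), retmax=retmax
    )
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return ""
    # rettype="gb" with retmode="text" asks for GenBank flat-file format.
    handle = Entrez.efetch(
        db="nucleotide", id=",".join(ids), rettype="gb", retmode="text"
    )
    text = handle.read()
    handle.close()
    return text
```

The small default retmax keeps a first test run short; raise it once the term returns what you expect.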

There can be complications when you have lots of records and want to download them all (e.g. network errors), see this thread: http://lists.open-bio.org/pipermail/biopython/2012-April/007943.html
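One common way to cope with large result sets is to let NCBI keep the search result on its history server and fetch it in batches, retrying a batch when the connection fails. This is a minimal sketch under the assumption that Biopython is installed; batch_starts, with_retries and download_genbank are illustrative names, and the batch size, attempt count and delay are arbitrary values.

```python
import time


def batch_starts(total, batch_size):
    """Start offsets that cover `total` records in chunks of `batch_size`."""
    return list(range(0, total, batch_size))


def with_retries(func, attempts=3, delay=5.0):
    """Call func(); on a network-style OSError wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            if attempt == attempts:
                raise
            time.sleep(delay)


def download_genbank(term, out_path, email, batch_size=500):
    """Write all GenBank records matching `term` to out_path, batch by batch."""
    # Imported lazily so the two helpers above work without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=term, usehistory="y")
    result = Entrez.read(handle)
    handle.close()
    total = int(result["Count"])

    with open(out_path, "w") as out:
        for start in batch_starts(total, batch_size):

            def fetch_batch(start=start):
                # Reuse the stored search via WebEnv and QueryKey.
                handle = Entrez.efetch(
                    db="nucleotide", rettype="gb", retmode="text",
                    retstart=start, retmax=batch_size,
                    webenv=result["WebEnv"], query_key=result["QueryKey"],
                )
                try:
                    return handle.read()
                finally:
                    handle.close()

            out.write(with_retries(fetch_batch))
```

Because each batch is written as soon as it arrives, a failure late in the run does not lose the records already saved.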