I know the manual way of doing it, but is there an automated way of retrieving the GenBank files of all species belonging to a taxid such as 51291?
It should be something like this: first retrieve the taxonomy IDs of all strains below this higher-level taxid, then find the GenBank file belonging to each of those taxids. So far I couldn't find a proper way of doing that.
Retrieving the NC_XXXX IDs would be sufficient as well, since I already have a GenBank download script.
With efetch I know I can retrieve the parent ID and lineage. However, I have not yet found an option for finding the children.

    handle = Entrez.efetch(db="Taxonomy", id=taxId, retmode="xml")
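For reference, this is roughly how the parent ID and lineage can be read from that efetch result. It is a minimal sketch, assuming the record keys Biopython produces for Taxonomy XML (`TaxId`, `ScientificName`, `ParentTaxId`, `Lineage`); the function names and the `email` argument are made up for illustration.

```python
def summarize_taxon(taxon):
    """Pick the parent ID and lineage out of one parsed Taxonomy record."""
    return {
        "taxid": str(taxon["TaxId"]),
        "name": str(taxon["ScientificName"]),
        "parent": str(taxon["ParentTaxId"]),
        # Lineage arrives as one "a; b; c" string; split it into a list
        "lineage": [part.strip() for part in str(taxon["Lineage"]).split(";")],
    }


def fetch_taxon(tax_id, email):
    """Fetch one taxon from NCBI and summarize it (needs network access)."""
    # Imported here so summarize_taxon can be used without Biopython installed
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="Taxonomy", id=str(tax_id), retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    return summarize_taxon(records[0])
```

Something like `fetch_taxon(51291, "you@example.org")["parent"]` would then give the parent taxid, but this only walks up the tree, not down.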
Here is some extra code I am working on now. I added some == checks to direct the flow of the program while testing.

    from Bio import Entrez

    def get_TaxonomyChild():
        # All species-rank taxa below Chlamydiales
        handle = Entrez.esearch(db="Taxonomy", term="Chlamydiales [subtree] AND species[rank]", retmax="100000")
        record = Entrez.read(handle)
        IdListOrganisms = record["IdList"]
        for organism in IdListOrganisms:
            if organism == "813":  # only follow one species for now
                # Search the Taxonomy database again using this species' taxid
                handle = Entrez.esearch(db="Taxonomy", term="txid" + organism + "[Organism]", retmax="100000")
                record = Entrez.read(handle)
                StrainList = record["IdList"]
                for Strain in StrainList:
                    if Strain == "471472":  # only print one strain for now
                        print(Strain)

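A possible shortcut for getting the NC_XXXX IDs directly, instead of looping over taxonomy IDs, might be to search the nucleotide database with the taxid "exploded" to its whole subtree. This is an untested sketch, assuming that `txid51291[Organism:exp]` and `refseq[filter]` behave as Entrez query terms the way I expect and that efetch with `rettype="acc"` returns one accession per line. The function names, the `email` argument and the `retmax` default are hypothetical.

```python
def refseq_term(tax_id):
    """Build an Entrez query for RefSeq records anywhere below tax_id."""
    return "txid{0}[Organism:exp] AND refseq[filter]".format(tax_id)


def keep_nc_accessions(accessions):
    """Keep only accessions with the NC_ prefix."""
    return [acc for acc in accessions if acc.startswith("NC_")]


def fetch_nc_accessions(tax_id, email, retmax=10000):
    """Search the nucleotide database and return NC_ accessions (needs network access)."""
    # Imported here so the two helpers above work without Biopython installed
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=refseq_term(tax_id), retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    ids = record["IdList"]
    if not ids:
        return []
    # Translate the numeric IDs into accession.version strings
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids), rettype="acc", retmode="text")
    accessions = handle.read().split()
    handle.close()
    return keep_nc_accessions(accessions)
```

The resulting list could then be fed to the existing GenBank download script. If there are more hits than `retmax`, the search would have to be paged with `retstart`.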
highly similar: http://www.biostars.org/post/show/18706