Retrieving taxonomy from an Entrez search in Biopython
2.4 years ago
pramirez ▴ 10

I annotated a list of protein sequences using NCBI, so I now have the proteins and their corresponding NCBI accession numbers. I want to use Biopython to look up the taxonomy of each sequence and print only the phylum. I wrote a script that successfully fetches the entries from the protein database, but it returns all the information in each record. Does anyone know how I can obtain the phylum only?

import pandas as pd
from Bio import Entrez

df = pd.read_csv('final.csv', sep='\t', decimal='.')
Entrez.email = '#####'

species_list = ['OGI11933.1']

# Lists that collect the results for each accession
taxid_list = []
data_list = []

def get_tax_data(taxid):
    # Fetch one protein record as XML and parse it into Python objects
    search = Entrez.efetch(id = taxid, db = "Protein", retmode = "xml")
    return Entrez.read(search)

for species in species_list:
    taxid = species  # use the current accession, not the whole list
    data = get_tax_data(taxid)

    #lineage = {d['Rank']:d['ScientificName'] for d in data[0]['GBSeq_taxonomy'] if d['Rank'] in ['phylum']}
    taxid_list.append(taxid) # Append the data to lists already initiated
    data_list.append(data)

print(data)

This returns all the information on the entry:

[{'GBSeq_locus': 'OGI11933', 'GBSeq_length': '230', 'GBSeq_moltype': 'AA', 'GBSeq_topology': 'linear', 'GBSeq_division': 'ENV', 'GBSeq_update-date': '19-OCT-2016', 'GBSeq_create-date': '19-OCT-2016', 'GBSeq_definition': 'MAG: 30S ribosomal protein S3 [Candidatus Micrarchaeota archaeon RBG_16_36_9]', 'GBSeq_primary-accession': 'OGI11933', 'GBSeq_accession-version': 'OGI11933.1', 'GBSeq_other-seqids': ['gb|OGI11933.1|', 'gnl|WGS:MFRR|A3K64_00470', 'gi|1083728961'], 'GBSeq_project': 'PRJNA288027', 'GBSeq_keywords': ['ENV', 'Metagenome Assembled Genome', 'MAG'], 'GBSeq_source': 'Candidatus Micrarchaeota archaeon RBG_16_36_9 (subsurface metagenome)', 'GBSeq_organism': 'Candidatus Micrarchaeota archaeon RBG_16_36_9', 'GBSeq_taxonomy': 'Archaea; Candidatus Micrarchaeota', 'GBSeq_references':}]

Thanks!
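
For what it's worth, here is a minimal sketch of one possible approach, not a confirmed solution. It rests on an assumption: the `GBSeq_taxonomy` field shown above is a plain lineage string with no ranks, so the sketch reads the taxon ID from the `db_xref` qualifier of the record's `source` feature and then queries the taxonomy database, whose records carry a ranked `LineageEx` list. The helper names (`extract_taxid`, `phylum_from_taxon`, `fetch_phylum`) are hypothetical and made up for illustration. Some lineages have no node ranked `phylum`, in which case the lookup returns `None`.

```python
def extract_taxid(protein_record):
    """Return the NCBI taxon ID from a parsed GBSeq protein record, or None."""
    for feature in protein_record.get("GBSeq_feature-table", []):
        if feature.get("GBFeature_key") != "source":
            continue
        for qual in feature.get("GBFeature_quals", []):
            value = qual.get("GBQualifier_value", "")
            # The source feature usually carries a qualifier like "taxon:<id>"
            if qual.get("GBQualifier_name") == "db_xref" and value.startswith("taxon:"):
                return value.split(":", 1)[1]
    return None


def phylum_from_taxon(taxon_record):
    """Return the phylum name from a parsed taxonomy record, or None."""
    # The taxon itself may be the phylum, so check it along with its ancestors.
    candidates = list(taxon_record.get("LineageEx", [])) + [taxon_record]
    for node in candidates:
        if node.get("Rank") == "phylum":
            return node.get("ScientificName")
    return None


def fetch_phylum(accession, email):
    """Look up the phylum for one protein accession (needs network access)."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email

    # Step 1: fetch the protein record and pull out its taxon ID.
    handle = Entrez.efetch(db="protein", id=accession, retmode="xml")
    protein_records = Entrez.read(handle)
    handle.close()
    taxid = extract_taxid(protein_records[0])
    if taxid is None:
        return None

    # Step 2: fetch the taxonomy record, which includes ranked ancestors.
    handle = Entrez.efetch(db="taxonomy", id=taxid, retmode="xml")
    taxon_records = Entrez.read(handle)
    handle.close()
    return phylum_from_taxon(taxon_records[0])
```

Calling `fetch_phylum('OGI11933.1', email)` inside the loop over `species_list`, with a real contact address for `email`, should then give one phylum name (or `None`) per accession. Each accession costs two requests this way, so for a long list it is probably worth batching the IDs or pausing between calls.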

ncbi python biopython taxonomy entrez • 1.0k views