Hello, I'm currently working with Biopython's Entrez module and finding it very frustrating and poorly documented. I'm just trying to find all the sequence data for lacY in E. coli and load it with SeqIO.
https://www.ncbi.nlm.nih.gov/gene/949083
from Bio import Entrez, SeqIO
# Search for e.coli lacY gene and find id's (GenBank Accession Numbers)
handle = Entrez.esearch(db='nucleotide', retmax=10, term='Escherichia coli[Orgn] AND lacY[Gene]')
record = Entrez.read(handle)
id_list = record['IdList']
# Efetch genbank data
handle1 = Entrez.efetch(db='nucleotide', id=id_list[0], rettype='gb', retmode='text')
print(handle1.read())
record1 = SeqIO.read(handle1, "genbank")
print(record1.seq)
# Error occurs, "ValueError: No records found in handle"
The library is able to download the id_list, and printing the output of handle1.read() shows it has found the record for the link provided, but it can't download the actual FASTA data from it. I'm interested to know if anyone else has been able to solve this programmatically. I could always download the FASTA files manually, but this was only a test run for a larger project I'm working on.
Thanks!
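One likely cause, judging from the snippet above: print(handle1.read()) consumes the handle, so by the time SeqIO.read runs there is nothing left to parse, which would match the "No records found in handle" error. Below is a minimal sketch of a workaround, assuming that is the whole problem: read the text once, then parse from an in-memory copy. The helper name fetch_genbank_record and its email argument are made up for illustration, and the sketch assumes a recent Biopython where efetch with retmode='text' returns a text handle. The io.StringIO demo at the end only illustrates the read-once behaviour and needs no network.

```python
import io


def fetch_genbank_record(uid, email):
    """Fetch one nucleotide record and return (raw_text, SeqRecord).

    Hypothetical helper, not part of Biopython.
    Needs Biopython installed and network access to NCBI.
    """
    # Imported here so the offline demo below runs without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(db="nucleotide", id=uid, rettype="gb", retmode="text")
    try:
        # Assumes a text-mode handle; it can only be read through once.
        raw_text = handle.read()
    finally:
        handle.close()

    # Parse from a fresh in-memory handle instead of the exhausted one.
    record = SeqIO.read(io.StringIO(raw_text), "genbank")
    return raw_text, record


# Offline demonstration of the read-once behaviour with a stand-in handle.
demo_handle = io.StringIO("LOCUS       PLACEHOLDER\n//\n")
first_read = demo_handle.read()   # returns the whole text
second_read = demo_handle.read()  # returns "" because the stream is exhausted
```

With that in place, something like raw, rec = fetch_genbank_record(id_list[0], "you@example.org") followed by print(rec.seq) should behave as expected. Simply deleting the print(handle1.read()) line from the original snippet would be the smaller change.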
Hmm, I'm still having issues finding FASTA info related to this link (lacY lactose permease). Is there something weird about the way NCBI accesses its databases? The link contains 'gene', so when I esearch for it I get the correct id in my list, but once I use efetch with that id it fails. Is there a super secret id that efetch actually uses to find FASTA info? My assumptions are: 1. I've picked a very weird edge case where the data I'm looking for is actually in another database, and trying to access it using its gene id fails because it's not actually there. 2. I'm missing some crucial info about how the efetch API actually works and I'm using it very wrong. I'll keep looking through the E-utilities documentation, but it seems pretty vague on these details. Thanks for the help!
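On the id question: the number in the URL (949083) is a UID in the Gene database, whereas esearch with db='nucleotide' returns nucleotide UIDs, so the two are not interchangeable. One way to go from a Gene UID to nucleotide records is Entrez.elink. A rough sketch follows. The names linked_ids and nucleotide_ids_for_gene are hypothetical helpers, the sample data at the bottom is made up, and the linked records may well be whole-genome entries rather than lacY alone, so treat this as a starting point and not a confirmed recipe.

```python
def linked_ids(linksets):
    """Collect every linked UID from a parsed elink result.

    Expects the list-of-dicts shape that Entrez.read returns for elink:
    each linkset has a "LinkSetDb" list, each entry holding a "Link" list.
    """
    ids = []
    for linkset in linksets:
        for linksetdb in linkset.get("LinkSetDb", []):
            for link in linksetdb.get("Link", []):
                ids.append(link["Id"])
    return ids


def nucleotide_ids_for_gene(gene_uid, email):
    """Hypothetical helper: map a Gene UID to linked nucleotide UIDs.

    Needs Biopython installed and network access to NCBI.
    """
    # Imported here so the offline check below runs without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.elink(dbfrom="gene", db="nuccore", id=gene_uid)
    try:
        linksets = Entrez.read(handle)
    finally:
        handle.close()
    return linked_ids(linksets)


# Offline check with made-up data shaped like a parsed elink result.
sample = [{"LinkSetDb": [{"Link": [{"Id": "111"}, {"Id": "222"}]}]}]
sample_ids = linked_ids(sample)
```

The UIDs returned by nucleotide_ids_for_gene("949083", "you@example.org") could then be passed to efetch with db='nucleotide', exactly as in the original snippet.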
Hi, I came across the very same problem. Did you maybe find a solution, or do you have a follow-up?