11.5 years ago
Bright
Hello,
I am using Biopython's "Entrez.esearch" and "Entrez.read" functions to download a list of gene IDs from NCBI. The code runs without errors, and I have been able to download a couple of IDs. However, Biopython cannot find some of the gene names I provide, even though I can find their IDs by searching on the NCBI website. There are many genes in my list, so I want to automate the search.
Is there a reason for this problem? Is there another way I could use Biopython to access the IDs from NCBI?
Thanks.
It's very hard to help without examples. Can you provide a couple of IDs that you can find via the Entrez web interface but not via Entrez.esearch?
Hi,
Here are the IDs (and corresponding genes) that can be found via the web interface but not from BioPython.
453232348, JC8.14, 'IV: 13253845..13254201'
453232067, F35G12.1, 'III: 4568306..4568878'
453232767, snoRNA:ZK994.7, 'V: 8500206..8500537'
Here is example code that returns no IDs:
from Bio import Entrez
Entrez.email = "myemail" # Always tell NCBI who you are
search = Entrez.read(Entrez.esearch(db='nucleotide', term='JC8.14[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))
print(search["IdList"])
Output: []
Here is an example where we do get IDs:
from Bio import Entrez
Entrez.email = "myemail" # Always tell NCBI who you are
search = Entrez.read(Entrez.esearch(db='nucleotide', term='sgk-1[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))
print(search["IdList"])
Output: ['413004852', '453232919', '392928192', '449020132']
Your search doesn't return anything in the web interface either. If you check out the records returned by just searching on JC8.14 you'll see it's not a gene name...
The way it has been represented in the code given above is just BioPython syntax. Searching for JC8.14 returns some results. And here is the desired result in FASTA format - http://www.ncbi.nlm.nih.gov/nuccore/453232348?report=fasta
The query format isn't Biopython-specific; it is the same Entrez query syntax the website uses. If you look at that record (which is a whole chromosome) in GenBank format, you'll see JC8.14 isn't a gene name, so restricting the search to the gene field won't find it.
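One way to handle identifiers that are not indexed as gene names is to try the [gene] field first and fall back to an unrestricted search. The following is only a sketch: `build_term` and `search_ids` are hypothetical helper names, not part of Biopython, and the fallback order is an assumption rather than a recommended NCBI practice.

```python
def build_term(name, organism, field=None):
    """Build an Entrez query string; field=None searches all fields."""
    name_part = f"{name}[{field}]" if field else name
    return f'{name_part} AND "{organism}"[orgn]'


def search_ids(name, organism, email, fields=("gene", None)):
    """Try each field in turn and return (field, ids) for the first hit.

    Needs network access and Biopython; imported here so build_term
    can be used on its own.
    """
    from Bio import Entrez

    Entrez.email = email  # always tell NCBI who you are
    for field in fields:
        handle = Entrez.esearch(db="nucleotide",
                                term=build_term(name, organism, field))
        result = Entrez.read(handle)
        handle.close()
        if result["IdList"]:
            return field, list(result["IdList"])
    return None, []
```

For example, `search_ids("JC8.14", "Caenorhabditis elegans", "myemail")` would first try `JC8.14[gene]` and then plain `JC8.14`. An unrestricted search can return records that merely mention the name, so the hits still need checking.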
I went through the results again and your observation is correct. Thank you very much for the clarification and your help. Do you know of any other way we could pull gene information and sequences from NCBI?
Everything you can get from the website you can get via Entrez - it's just a matter of having the right IDs and searching them against the right fields. Without knowing what you are trying to do, it's not really possible to provide more specific help.
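As an illustration of going from a known ID to a sequence, here is a minimal sketch that fetches one region of a nucleotide record as FASTA using Entrez.efetch with its seq_start and seq_stop parameters. The helper names `parse_location` and `fetch_region` are made up for this example, and it assumes the coordinates quoted earlier in the thread (e.g. 'IV: 13253845..13254201') are 1-based positions on the record with that ID.

```python
import re


def parse_location(location):
    """Split a string like 'IV: 13253845..13254201' into (chromosome, start, stop)."""
    match = re.fullmatch(r"\s*(\w+):\s*(\d+)\.\.(\d+)\s*", location)
    if match is None:
        raise ValueError(f"unrecognised location: {location!r}")
    chromosome, start, stop = match.groups()
    return chromosome, int(start), int(stop)


def fetch_region(record_id, location, email):
    """Fetch one region of a nucleotide record as FASTA text.

    Needs network access and Biopython; imported here so parse_location
    can be used on its own.
    """
    from Bio import Entrez

    Entrez.email = email  # always tell NCBI who you are
    _, start, stop = parse_location(location)
    handle = Entrez.efetch(db="nucleotide", id=record_id,
                           rettype="fasta", retmode="text",
                           seq_start=start, seq_stop=stop)
    try:
        return handle.read()
    finally:
        handle.close()
```

A call such as `fetch_region("453232348", "IV: 13253845..13254201", "myemail")` should then return only that slice rather than the whole chromosome, provided the assumption about the coordinates holds.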
Thank you for your responses, David. They were very helpful.
I'll echo David's point that unless you give some specific examples we can't help you. In all the cases like this that I have looked at, the user wasn't doing the same search on the website and via Biopython; often there was a subtle difference such as missing quotes.
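A quick way to check for such differences is to look at how NCBI actually interpreted the query. The parsed esearch result includes a "QueryTranslation" entry, which can be compared with the "Search details" shown on the website. This is just a sketch, and `describe_search` is a hypothetical helper, not a Biopython function.

```python
def describe_search(result):
    """Summarise a parsed esearch result for comparison with the website."""
    return {
        "count": int(result["Count"]),
        "translation": result["QueryTranslation"],
        "ids": list(result["IdList"]),
    }


def run_search(term, email, db="nucleotide"):
    """Run an esearch and summarise it (needs network access and Biopython)."""
    from Bio import Entrez

    Entrez.email = email  # always tell NCBI who you are
    handle = Entrez.esearch(db=db, term=term)
    result = Entrez.read(handle)
    handle.close()
    return describe_search(result)
```

Printing `run_search('JC8.14[gene] "Caenorhabditis elegans"[orgn]', "myemail")["translation"]` and comparing it with the website's search details should show whether the two searches are really the same.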