Hello everyone: I'm having a problem trying to download gene sequences from the Gene database on the NCBI website using Biopython. I started with a basic test search for two gene records in the "gene" database for S. coelicolor (txid100226).
from Bio import Entrez
Entrez.email = "chief@marsstation.com"
handle = Entrez.esearch(db="gene",term="txid100226[Organism]",retmax=2)
record = Entrez.read(handle)
The ID of the first hit from this search is:
record_list = record["IdList"]
print(record_list[0])
1096915
I then used this first ID to try to download the gene of interest:
seq = Entrez.efetch(db="gene",id=record_list[0],rettype="fasta").read()
However the result stored in "seq" is the following:
http://www.ncbi.nlm.nih.gov/data_specs/dtd/NCBI_Entrezgene.dtd">
<Entrezgene-Set>SCO1489 –DNA-binding protein [Streptomyces coelicolor A3(2)]
DNA-binding protein
- Other Aliases:
- SCO1489, SC9C5.13, bldD
- Genomic context:
- Chromosome
- Annotation:
- NC_003888.3 (1592381..1592884)
- ID:
- 1096915
</Entrezgene-Set>
If I put db="protein" instead of gene I get the correct protein sequence.
I realize that one way to get the DNA sequence is to download it manually, directly from the S. coelicolor contig NC_003888.3 at positions 1592381..1592884 for this particular ID. That location is part of the text stored in "seq".
So here is the question: Is there any method (or trick) to download that DNA sequence using biopython? How can I solve this problem?
JFC
Even if I change the rettype, it doesn't work. The gene sequence in this example lies within the contig sequence, so the GI for this sequence points to the whole contig. I don't know how to solve it, but thank you for your answer.
Well no, changing rettype won't work. The only valid rettype for db=Gene is gene_table; valid retmodes are asn.1, xml and text. In short: sequences cannot be retrieved from the Gene database.
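One possible workaround, since the Gene record does report where the gene sits (NC_003888.3, 1592381..1592884), is to request just that slice from the nucleotide database instead. The following is only a minimal sketch under some assumptions: the helper names parse_gene_location and fetch_gene_dna are made up for illustration, the location string is assumed to look like the "Annotation" line in the output above (optionally ending in ", complement" for the minus strand), and the seq_start, seq_stop and strand arguments are the E-utilities efetch parameters that Biopython passes through as keyword arguments.

```python
import re


def parse_gene_location(annotation):
    """Split a location such as 'NC_003888.3 (1592381..1592884)' into parts.

    Hypothetical helper: the accepted format is an assumption based on the
    'Annotation' line of the Gene text output shown in the question.
    Returns (accession, start, stop, strand) with strand 1 = plus, 2 = minus.
    """
    pattern = r"\s*(\S+)\s*\((\d+)\.\.(\d+)(,\s*complement)?\)"
    match = re.match(pattern, annotation)
    if match is None:
        raise ValueError("unrecognised location: %r" % annotation)
    accession, start, stop, complement = match.groups()
    return accession, int(start), int(stop), 2 if complement else 1


def fetch_gene_dna(annotation, email):
    """Fetch the DNA for one gene location as FASTA text from db='nuccore'."""
    # Imported here so the parsing helper above works without Biopython.
    from Bio import Entrez

    Entrez.email = email
    accession, start, stop, strand = parse_gene_location(annotation)
    # seq_start/seq_stop restrict the record to the gene's coordinates;
    # strand=2 asks for the reverse complement.
    handle = Entrez.efetch(
        db="nuccore",
        id=accession,
        rettype="fasta",
        retmode="text",
        seq_start=start,
        seq_stop=stop,
        strand=strand,
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

With the values from the question, fetch_gene_dna("NC_003888.3 (1592381..1592884)", "chief@marsstation.com") should return a single FASTA record covering only that region of the contig rather than the whole chromosome. The strand still has to be checked against the Gene record, since the text output shown above does not make it obvious.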