Question

How to Retrieve Multiple Sequences in One Python Script

Asked 11.0 years ago by ahmedakhokhar

I have a list of Entrez gene IDs, and I want to retrieve the flanking regions of a mutation in each gene (one mutation per gene). Previously I was using the following code to retrieve a single entry:

from Bio import Entrez, SeqIO

Entrez.email = 'v.v@biw.kuleuven.be'
out_handle = open("example.txt", "w")
# Fetch bases 4000100-4000200 (plus strand) of nucleotide record 186972394 as FASTA
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="fasta", strand=1, seq_start=4000100, seq_stop=4000200, retmode='text')
record = SeqIO.parse(handle, "fasta")
SeqIO.write(record, out_handle, "fasta")
handle.close()
out_handle.close()

How can I do this for the whole list in one script? I am totally new to Python, so any help is appreciated. Thanks.
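Because each region is defined by a mutation position plus a flank on either side, the `seq_start`/`seq_stop` values passed to `efetch` can be computed instead of typed by hand. Below is a minimal sketch, assuming 1-based positions (as `seq_start`/`seq_stop` use) and a hypothetical helper `flank_window`. The mutation at position 4000150 is a made-up example, chosen so the window matches the 4000100 to 4000200 range in the code above.

```python
def flank_window(position, flank):
    """Return (seq_start, seq_stop) covering `flank` bases on each side of a
    1-based mutation `position`, including the mutated base itself."""
    if position < 1 or flank < 0:
        raise ValueError("position must be >= 1 and flank must be >= 0")
    # Clamp the start at 1 so windows near the beginning of a record stay valid;
    # the stop is not clamped because the record length is not known here.
    start = max(1, position - flank)
    stop = position + flank
    return start, stop


# Hypothetical mutation at 4000150 with 50 bp on each side
print(flank_window(4000150, 50))  # → (4000100, 4000200)
```

The window contains `2 * flank + 1` bases, so a 50 bp flank gives a 101 bp region centred on the mutation.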

Last updated 22 months ago by Ram.

Comment by Andrzej Zielezinski, 11.0 years ago: for a solution, see "Retrieving FASTA sequences from NCBI using Biopython".

score 0 · Answer 1 · 2014-01-24

Assuming you have the start and end coordinates of each region, format the input file as tab-separated lines of the form Entrez_GeneID\tBegin\tEnd and try this:


from Bio import Entrez, SeqIO

Entrez.email = 'v.v@biw.kuleuven.be'

# Open the file with your IDs and the output file; "with" closes both automatically
with open("path/to/the/genelist") as input_file, open("example.txt", "w") as out_handle:
    # This loop goes through every single line of your file
    for line in input_file:
        # Assuming each line is of the format ID\tBegin\tEnd
        fields = line.strip().split('\t')
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        handle = Entrez.efetch(db="nucleotide", id=fields[0], rettype="fasta", strand=1,
                               seq_start=int(fields[1]), seq_stop=int(fields[2]), retmode='text')
        records = SeqIO.parse(handle, "fasta")
        SeqIO.write(records, out_handle, "fasta")
        handle.close()
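With `db="nucleotide"`, the first column has to be a nucleotide record ID (such as 186972394 in the question), not a Gene ID from the `gene` database. A malformed line would otherwise only surface as an error partway through the fetching loop. The sketch below separates parsing from fetching so the input can be checked before any request is sent. `parse_regions` is a hypothetical helper, and skipping lines that start with `#` is an added assumption, not part of the original format.

```python
def parse_regions(lines):
    """Yield (seq_id, start, stop) tuples from lines formatted ID<TAB>Begin<TAB>End.

    Blank lines and lines starting with '#' are skipped; a line with fewer than
    three columns, non-integer coordinates, or Begin > End raises ValueError.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated columns, got {len(fields)}")
        seq_id, start, stop = fields[0], int(fields[1]), int(fields[2])
        if start > stop:
            raise ValueError(f"line {lineno}: Begin ({start}) is after End ({stop})")
        yield seq_id, start, stop


example = ["186972394\t4000100\t4000200\n", "\n"]
print(list(parse_regions(example)))  # → [('186972394', 4000100, 4000200)]
```

In the loop above, `for seq_id, start, stop in parse_regions(input_file):` could then replace the manual splitting, passing `id=seq_id, seq_start=start, seq_stop=stop` to `Entrez.efetch`.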