It is very simple, but you have to look up the sequence's GI number rather than use the chromosome number.
You can find the GI in NCBI's Nucleotide database.
For example, mouse chromosome 6 has GI = 307603377, and say you want the plus-strand sequence from 4000100 to 4000200:
from Bio import Entrez, SeqIO
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
# Fetch the requested region of the plus strand as FASTA
handle = Entrez.efetch(db="nucleotide",
                       id="307603377",
                       rettype="fasta",
                       strand=1,
                       seq_start=4000100,
                       seq_stop=4000200)
record = SeqIO.read(handle, "fasta")
handle.close()
print(record.seq)
Parameter descriptions from NCBI's efetch help:
strand - what strand of DNA to show (1 = plus or 2 = minus)
seq_start - show sequence starting from this base number
seq_stop - show sequence ending on this base number
complexity - how much of the record to return; a GI is often part of a larger biological blob that contains other GIs
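If your coordinates come from a BED-style export, they are probably 0-based with an exclusive end, while seq_start and seq_stop are 1-based and inclusive. Here is a rough sketch of the conversion, standard library only. The function name bed_to_efetch_params and the input interval are my own illustration, not part of Biopython or NCBI, and it assumes the '+'/'-' strand maps to efetch's 1/2 as described above:

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for efetch
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def bed_to_efetch_params(gi, chrom_start, chrom_end, strand="+"):
    """Translate a BED-style interval into efetch query parameters.

    Hypothetical helper: chrom_start is 0-based and chrom_end is exclusive
    (BED convention); efetch wants 1-based, inclusive seq_start/seq_stop.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 0 <= chrom_start < chrom_end:
        raise ValueError("need 0 <= chrom_start < chrom_end")
    return {
        "db": "nucleotide",
        "id": str(gi),
        "rettype": "fasta",
        "strand": 1 if strand == "+" else 2,  # 1 = plus, 2 = minus
        "seq_start": chrom_start + 1,         # shift to 1-based
        "seq_stop": chrom_end,                # exclusive end == inclusive 1-based end
    }


# Illustrative interval covering the same bases as the example above
params = bed_to_efetch_params("307603377", 4000099, 4000200, "+")
print(params["seq_start"], params["seq_stop"])
print(EFETCH_URL + "?" + urlencode(params))
```

The same dictionary should work as keyword arguments, i.e. Entrez.efetch(**params), so you can loop over many intervals without retyping the call.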
I'm looking at some miRNA sequences for TFBS and was going to ask a similar question, being a Python newbie myself (although the Biopython cookbook has been helping). Anyway, great timing!
No, it's not homework. Thanks for your suggestion.
I'm currently doing some research on 3' UTR regions. I got the 3' UTR coordinates from UCSC and need the sequences for them.
I know this can be done using Galaxy.
As Galaxy is written in Python, I just wonder whether there is a module within Biopython that can do the same job.
Thanks a lot for your editing and rephrasing, Eric.