How To Fetch A Genomic Sequence Using Coordinates In Biopython
3
9
13.3 years ago
dustar1986 ▴ 380

Hi everyone,

I'm new to Biopython. My question may be basic, but I would appreciate your help.

I want to use the chromosome number, start position, end position, and strand to fetch the corresponding sequence from the mouse genome.

How can this be done with Biopython connecting to the NCBI database? Could anyone help me please?

Thanks a lot.

biopython sequence retrieval entrez database • 20k views
0

Thanks a lot for your editing and rephrasing, Eric.

22
13.3 years ago
Alex ★ 1.5k

It is very simple, but you have to use the sequence GI instead of the chromosome number. You can find the GI in NCBI's Nucleotide database.

For example, mouse chromosome 6 has GI = 307603377, and you want to get the plus-strand sequence from 4000100 to 4000200:

from Bio import Entrez, SeqIO

Entrez.email = "A.N.Other@example.com"     # Always tell NCBI who you are

# Request only the region of interest, as FASTA, from the plus strand
handle = Entrez.efetch(db="nucleotide",
                       id="307603377",
                       rettype="fasta",
                       strand=1,
                       seq_start=4000100,
                       seq_stop=4000200)
record = SeqIO.read(handle, "fasta")       # the response holds a single record
handle.close()
print(record.seq)

Parameters description from NCBI's efetch help:

strand - what strand of DNA to show (1 = plus or 2 = minus)
seq_start - show sequence starting from this base number
seq_stop - show sequence ending on this base number
complexity - gi is often a part of a biological blob, containing other gis
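
The parameters above can be wrapped in a small helper. This is only a sketch, not part of the original answer: the function names `efetch_region_params` and `fetch_region` are made up for illustration, and it assumes that `seq_start` and `seq_stop` are 1-based and inclusive, as in the NCBI efetch documentation. It maps a "+" or "-" strand symbol to the 1/2 codes described above and passes everything to `Entrez.efetch`.

```python
def efetch_region_params(accession, start, end, strand="+"):
    """Build efetch keyword arguments for a 1-based, inclusive region.

    Hypothetical helper for illustration; not part of Biopython.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    strand_codes = {"+": 1, "-": 2}  # efetch: 1 = plus, 2 = minus
    if strand not in strand_codes:
        raise ValueError("strand must be '+' or '-'")
    return {
        "db": "nucleotide",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "strand": strand_codes[strand],
        "seq_start": start,
        "seq_stop": end,
    }


def fetch_region(accession, start, end, strand="+",
                 email="A.N.Other@example.com"):
    """Fetch one region from NCBI and return its sequence (needs network)."""
    # Imported here so the parameter helper above works without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # replace the placeholder with your own address
    handle = Entrez.efetch(**efetch_region_params(accession, start, end, strand))
    try:
        return SeqIO.read(handle, "fasta").seq
    finally:
        handle.close()
```

Calling `fetch_region("307603377", 4000100, 4000200, "+")` should issue the same request as the example above; with "-" the request asks NCBI for the minus strand instead.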
0

This is great.

I'm looking at some miRNA sequences for TFBS and was going to ask a similar question, being a Python newbie myself (although the Biopython cookbook was helping). Anyway, great timing!

0

Extremely helpful. Thanks a lot.

0

Very helpful. I'm also working on promoter analysis of TFBS. Thanks!

0

The Human chromosomes follow this pattern: "NC_000001", "NC_000002", ..., "NC_000023" (X), "NC_000024" (Y)
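
As a rough sketch of that pattern, a chromosome name can be turned into an accession string like this. The function name `human_chrom_accession` is hypothetical, and it covers human chromosomes only (the mouse example in the answer above uses a different identifier). The accession carries no version suffix, so which assembly version NCBI returns for it is not something this thread states.

```python
def human_chrom_accession(chrom):
    """Map '1'..'22', 'X', 'Y' (optionally 'chr'-prefixed) to 'NC_0000NN'.

    Hypothetical helper illustrating the naming pattern; human only.
    """
    name = str(chrom).upper()
    if name.startswith("CHR"):
        name = name[3:]
    special = {"X": 23, "Y": 24}  # X and Y take the numbers after 22
    number = special.get(name)
    if number is None:
        if not name.isdigit():
            raise ValueError("unrecognised chromosome: %r" % (chrom,))
        number = int(name)
    if not 1 <= number <= 24:
        raise ValueError("chromosome out of range: %r" % (chrom,))
    return "NC_%06d" % number
```

The returned string can then be passed as the `id` argument of `Entrez.efetch` in place of a GI.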

0

How can we get sequences for a specific genome build and group label? For example, Homo sapiens, hg19, GRCh37.p10? Thanks.

2
13.3 years ago
Leszek 4.2k

Another homework?
Use a combination of googling and reading, please. Here you are: the Biopython cookbook.

4

@Leszek: This should have been a comment, not an answer.

3

No, it's not homework. Thanks for your suggestion. I'm currently doing some research on 3' UTR regions. I got the 3' UTR coordinates from UCSC and need the corresponding sequences. I know this can be done using Galaxy. As Galaxy is written in Python, I just wonder whether there is a module within Biopython that can do the same work.
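
One thing to watch when taking coordinates from UCSC: table and BED exports use 0-based, half-open intervals, while efetch's `seq_start` and `seq_stop` are 1-based and inclusive. A minimal conversion sketch follows, assuming the coordinates come from such an export (positions typed into the UCSC browser are already 1-based) and that the sequence record fetched belongs to the same assembly as the coordinates. The function name `ucsc_to_efetch_coords` is made up for illustration.

```python
def ucsc_to_efetch_coords(chrom_start, chrom_end, strand="+"):
    """Convert a 0-based, half-open UCSC interval to efetch arguments.

    Hypothetical helper; assumes BED/table-style UCSC coordinates.
    """
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("expected 0 <= chrom_start < chrom_end")
    strand_codes = {"+": 1, "-": 2}  # efetch: 1 = plus, 2 = minus
    if strand not in strand_codes:
        raise ValueError("strand must be '+' or '-'")
    return {
        "seq_start": chrom_start + 1,  # shift the 0-based start to 1-based
        "seq_stop": chrom_end,         # half-open end equals inclusive end
        "strand": strand_codes[strand],
    }
```

The resulting dictionary can be unpacked into an `Entrez.efetch(db="nucleotide", id=..., rettype="fasta", **coords)` call.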

1
13.3 years ago

I think you can also use Ensembl (and NCBI, I believe) via the PyCogent toolkit to do this in Python.

Check out http://pycogent.sourceforge.net/ - the examples and cookbook contain some decent code that may be helpful :-)

0

Thanks a lot. It's helpful.
