Question

Extracting An Upstream Or Downstream Sequence From A Given Position

3

14.2 years ago

Janake ▴ 170

I need to extract the upstream and downstream sequences flanking putative small RNA regions. Does anyone know of a Python script to do this? Thanks,

small data genomics sequence • 9.0k views

ADD COMMENT • link updated 14.2 years ago by brentp 24k • written 14.2 years ago by Janake ▴ 170

2

Unclear: what is a "small RNA region"? A segment of chromosome encoding putative small RNA transcript?

ADD REPLY • link 14.2 years ago by Neilfws 49k

0

I am trying to find conserved small RNAs using ESTs. I have BLASTed known miRNAs against the EST collection, so I have the locations where each miRNA matches an EST. Now I need to get the upstream and downstream regions from these ESTs and see if they fold like miRNA precursors. Hope this is clear.

ADD REPLY • link 14.2 years ago by Janake ▴ 170

0

I am working on a similar project. Please message me if you still need help. I'm a student at CWRU - Biology Dept.

ADD REPLY • link 14.2 years ago by Moss ▴ 20

score 12 · Answer 1 · 2010-09-10

Is it internal/unpublished sequence or from one of the existing genomes? If it is a sequenced genome, this is possible with either BioMart or a UCSC Genome Browser-style browser. If it is an internal project, the script jockeys can help you, I'm sure.

BioMart (or the specific Mart for your species if it's out there):

Go to BioMart.org
Click Martview
Select the database and data set.
In Filters, set the region. Use this item to enter your list: Multiple Chromosomal Regions (Chr:Start:End:Strand)
In Attributes, choose Sequences. You also need to pick a nucleotide box* here, but I'm not sure which for this application.
Set an upstream and downstream flank.
Click results.

*Although now that I look at Sequences I'm not sure which radio to choose for this purpose. I'm going to have to ask them that. I think I'd choose exon for now.

UCSC: http://genome.ucsc.edu/

Go to the Table Browser
Set appropriate species/assembly stuff.
On Region line, set items in Define Region button.
Choose output format = sequence
click Get Output.
On next page say Genomic
Set upstream/downstream length
Get Sequence button

Or you can do all of either one from Galaxy (http://www.usegalaxy.org), and you can store that as a workflow you can always go back to later if you need it again. There is also a "Get Flanks" in the Galaxy menu "Operate on Genomic Intervals".
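
Whichever tool is used, the flank coordinates it needs follow from simple arithmetic. The sketch below is only an illustration of that arithmetic and is not taken from BioMart, UCSC or Galaxy: the function name flank_intervals and its parameters are hypothetical, coordinates are assumed to be 1-based and inclusive, and "upstream" is assumed to mean 5' of the feature on its own strand.

```python
def flank_intervals(start, end, strand, flank, chrom_length=None):
    """Return (upstream, downstream) as 1-based inclusive (start, end) tuples.

    Hypothetical helper for illustration. A flank that would be empty
    (feature at the very edge of the sequence) is returned as None.
    """
    # Interval on the low-coordinate side, clamped at position 1.
    left = (max(1, start - flank), start - 1)
    # Interval on the high-coordinate side, clamped at the chromosome end
    # when its length is known.
    right_end = end + flank
    if chrom_length is not None:
        right_end = min(chrom_length, right_end)
    right = (end + 1, right_end)

    left = left if left[0] <= left[1] else None
    right = right if right[0] <= right[1] else None

    # On the minus strand the 5' (upstream) side has the higher coordinates.
    if strand == "-":
        return right, left
    return left, right


if __name__ == "__main__":
    print(flank_intervals(1000, 1021, "+", 100))
    print(flank_intervals(1000, 1021, "-", 100))
```

The strand swap is the part that is easiest to get wrong by hand: for a minus-strand feature the interval with the larger coordinates is the upstream one.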

Ram · Answer 2 · 2010-09-10

7

14.2 years ago

brentp 24k

here's a "script-jockey" python solution (untested, but should get you close).

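If the EST is the subject in your BLAST output rather than the query, substitute query with subject and q(start|stop) with s(start|stop). Note that BLAST reports sstart > send for minus-strand hits on the subject, so those coordinates would need swapping first.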

import sys
from pyfasta import Fasta

# usage: python <this_script>.py <est.fasta> <blast_tabular.txt>
est_fasta_file = sys.argv[1]

# i'll assume the blast output is tab-delimited (-m 8 / -outfmt 6)...
# and est is the query.
est_mirna_blast = sys.argv[2]

# take 100bp up/down-stream.
xstream = 100

est_fasta = Fasta(est_fasta_file)

for line in open(est_mirna_blast):
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[:2]
    # convert the 1-based blast coords to int and 0-based.
    qstart, qstop, sstart, sstop = [int(x) - 1 for x in fields[6:10]]
    # clamp at 0 so the slice never starts before the sequence.
    up = max(0, qstart - xstream)
    # qstop is the last matched base, so downstream starts one past it.
    down = qstop + 1 + xstream

    est_upstream = est_fasta[query][up:qstart]
    est_downstream = est_fasta[query][qstop + 1:down]

    print(">%s_up" % query)
    print(est_upstream)
    print(">%s_down" % query)
    print(est_downstream)
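
For checking the coordinate arithmetic without pyfasta or any input files, the same idea can be written as a small pure function. This is a sketch under stated assumptions, not part of the script above: the names hit_flanks and reverse_complement are hypothetical, it targets Python 3, coordinates are 1-based and inclusive as in BLAST tabular output, and start > end is taken to mean a minus-strand hit (which happens when the EST is the subject).

```python
# Complement table for DNA; any other character is left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hit_flanks(seq, start, end, flank=100):
    """Return (upstream, downstream) flanks of a hit within seq.

    start and end are 1-based inclusive. If start > end the hit is treated
    as minus-strand: the flanks are swapped and reverse-complemented so
    they read 5'->3' relative to the hit.
    """
    minus = start > end
    lo, hi = (end, start) if minus else (start, end)
    # 0-based slice for the flank left of the hit, clamped at index 0.
    left = seq[max(0, lo - 1 - flank):lo - 1]
    # The hit's last base is index hi - 1, so the right flank starts at hi.
    right = seq[hi:hi + flank]
    if minus:
        return reverse_complement(right), reverse_complement(left)
    return left, right


if __name__ == "__main__":
    est = "TTGACCCCCGGATT"
    print(hit_flanks(est, 5, 9, flank=3))  # plus-strand hit on CCCCC
    print(hit_flanks(est, 9, 5, flank=3))  # the same hit on the minus strand
```

Python slicing already truncates at the end of the string, so only the left boundary needs an explicit clamp.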

ADD COMMENT • link updated 5.3 years ago by Ram 44k • written 14.2 years ago by brentp 24k

0

I'm running ubuntu and am having trouble running pyfasta... any suggestions? I am unfamiliar with python, so I am not sure how to "make" "install".

ADD REPLY • link 14.1 years ago by Moss ▴ 20

0

@Moss just run "sudo python setup.py install" or if you have setuptools: "sudo easy_install pyfasta"

If you have any trouble, send me an email: bpederse[?]gmail.com

ADD REPLY • link 14.1 years ago by brentp 24k

0

Hello brentp, I would like to do the same thing and tried the script you wrote above. It gives me the following error message when I run it:

python seq_ext_coordinate.py 
Traceback (most recent call last):
  File "seq_ext_coordinate.py", line 5, in <module>
    EST_030516_lnr_tab.fa = sys.argv[1]
IndexError: list index out of range

Could you please tell me what I did wrong?

ADD REPLY • link 8.5 years ago by tcf.hcdg ▴ 70
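
A note on the traceback above: "IndexError: list index out of range" on a sys.argv[1] line is what Python raises when the script is started with no command-line arguments. The file names are meant to be passed on the command line rather than written into the script in place of the variable name. A minimal sketch of an argument check that turns this into a readable message follows; parse_args is a hypothetical helper name, not part of the original script.

```python
import sys


def parse_args(argv):
    """Return (est_fasta_file, est_mirna_blast) taken from argv.

    Exits with a usage message instead of an IndexError when the two
    expected file names are missing.
    """
    if len(argv) != 3:
        raise SystemExit(
            "usage: python %s <est.fasta> <blast_tabular.txt>" % argv[0]
        )
    return argv[1], argv[2]


# In the script, the two sys.argv lookups would then become:
# est_fasta_file, est_mirna_blast = parse_args(sys.argv)
```

With this in place the script would be run as "python seq_ext_coordinate.py EST_030516_lnr_tab.fa <blast output file>", where the second argument stands for whatever file holds the tabular BLAST results.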