I need to extract the upstream and downstream sequences flanking putative small RNA regions. Does anyone know of a Python script to do this? Thanks.
Is it an internal/unpublished sequence or from one of the existing genomes? If it is a sequenced genome, this is possible from either BioMart or a UCSC Genome Browser-style browser. If it is an internal project, I'm sure the script jockeys can help you.
BioMart (or the specific Mart for your species if it's out there):
*Although now that I look at Sequences I'm not sure which radio to choose for this purpose. I'm going to have to ask them that. I think I'd choose exon for now.
UCSC: http://genome.ucsc.edu/
Or you can do all of either one from Galaxy (http://www.usegalaxy.org), and you can store that as a workflow you can always go back to later if you need it again. There is also a "Get Flanks" in the Galaxy menu "Operate on Genomic Intervals".
Here's a "script-jockey" Python solution (untested, but it should get you close). If the EST is the subject in your BLAST output, substitute subject for query and s(start|stop) for q(start|stop).
import sys
from pyfasta import Fasta

# usage: python <script> <est_fasta> <blast_tabular>
est_fasta_file = sys.argv[1]
# I'll assume the BLAST output is tab-delimited
# and that the EST is the query.
est_mirna_blast = sys.argv[2]
# take 100bp up/down-stream.
xstream = 100
est_fasta = Fasta(est_fasta_file)
for line in open(est_mirna_blast):
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[:2]
    # convert to int and 0-based coords.
    qstart, qstop, sstart, sstop = [int(x) - 1 for x in fields[6:10]]
    # max() (not min()) clamps the window at the start of the EST.
    up = max(0, qstart - xstream)
    down = qstop + xstream + 1
    est_upstream = est_fasta[query][up:qstart]
    # qstop is the last matched base, so downstream starts one past it.
    est_downstream = est_fasta[query][qstop + 1:down]
    print(">%s_up" % query)
    print(est_upstream)
    print(">%s_down" % query)
    print(est_downstream)
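The coordinate arithmetic is the part that is easiest to get wrong, so here is a minimal sketch of it on its own, with no pyfasta dependency. The function name `flank_slices` and the toy sequence are made up for illustration. It assumes 1-based inclusive hit coordinates, as in BLAST tabular output, and returns the flanks in forward-strand coordinates, so for a reverse-strand hit "upstream" and "downstream" would still need to be swapped.

```python
def flank_slices(start, stop, seq_len, flank=100):
    """Return ((up_start, up_end), (down_start, down_end)) as 0-based,
    end-exclusive slice bounds for a hit given in 1-based inclusive
    coordinates."""
    # BLAST can report start > stop for reverse-strand subject hits.
    lo, hi = min(start, stop), max(start, stop)
    hit_start = lo - 1  # 0-based index of the first matched base
    hit_end = hi        # 0-based index one past the last matched base
    # Clamp both windows to the ends of the sequence.
    up = (max(0, hit_start - flank), hit_start)
    down = (hit_end, min(seq_len, hit_end + flank))
    return up, down


# Toy EST (hypothetical): the "hit" is the run of C at positions 51-70.
seq = "A" * 50 + "C" * 20 + "G" * 50
up, down = flank_slices(51, 70, len(seq), flank=10)
upstream = seq[up[0]:up[1]]
downstream = seq[down[0]:down[1]]
```

When the hit sits near either end of the EST, the corresponding flank simply comes back shorter than `flank` instead of wrapping around or raising an error.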
Hello brentp, I would like to do the same thing, and I tried the script you wrote above. It gives me the following error message when I run it:
python seq_ext_coordinate.py
Traceback (most recent call last):
File "seq_ext_coordinate.py", line 5, in <module>
EST_030516_lnr_tab.fa = sys.argv[1]
IndexError: list index out of range
Could you please tell me what I did wrong?
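Judging from the traceback, the script was started with no command-line arguments, so `sys.argv` held only the script name and `sys.argv[1]` did not exist. The variable name on the left of the `=` also appears to have been replaced by a file name; the file names are meant to go on the command line instead. A small sketch of an argument check that would turn the `IndexError` into a usage message follows. The helper `parse_args` and the file name `hits.tab` are made up for illustration.

```python
import sys


def parse_args(argv):
    """Return (est_fasta_file, est_mirna_blast), or exit with a usage message."""
    # argv[0] is the script name, so two arguments means len(argv) == 3.
    if len(argv) != 3:
        sys.exit("usage: python %s <est_fasta> <blast_tabular>" % argv[0])
    return argv[1], argv[2]


# In the real script this would be parse_args(sys.argv); an explicit list
# is used here so the example runs without command-line arguments.
est_fasta_file, est_mirna_blast = parse_args(
    ["seq_ext_coordinate.py", "EST_030516_lnr_tab.fa", "hits.tab"]
)
```

On the command line this corresponds to `python seq_ext_coordinate.py EST_030516_lnr_tab.fa hits.tab`, with `hits.tab` standing in for your tab-delimited BLAST output.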
Unclear: what is a "small RNA region"? A segment of a chromosome encoding a putative small RNA transcript?
I am trying to find conserved small RNAs using ESTs. I have BLASTed known miRNAs against the EST collection, so I have the locations where each miRNA matches an EST. Now I need to get the upstream and downstream regions from these ESTs and see whether they fold like miRNA precursors. I hope this is clear.
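For that workflow, where the miRNA is the BLAST query and the EST is the subject, the piece to fold is presumably the hit together with its flanks. A rough sketch of pulling that window out of one tabular BLAST line is below. It assumes the default 12-column tabular layout (query id, subject id, ..., qstart, qend, sstart, send, ...) and that the ESTs are already loaded into a plain dict; `candidate_precursor`, the header format, and the toy data are all hypothetical.

```python
def candidate_precursor(blast_line, ests, flank=100):
    """Return (header, sequence) for the hit plus `flank` bases on each side."""
    fields = blast_line.rstrip("\n").split("\t")
    mirna, est_id = fields[0], fields[1]
    # Columns 9 and 10 (1-based) hold the subject start and end.
    sstart, send = int(fields[8]), int(fields[9])
    # Subject coordinates are reversed for minus-strand hits.
    lo, hi = min(sstart, send), max(sstart, send)
    seq = ests[est_id]
    begin = max(0, lo - 1 - flank)   # 0-based, clamped at the EST start
    end = min(len(seq), hi + flank)  # end-exclusive, clamped at the EST end
    header = ">%s|%s|%d-%d" % (est_id, mirna, begin + 1, end)
    return header, seq[begin:end]


# Toy data (hypothetical): a 22-nt hit at positions 31-52 of est1.
ests = {"est1": "A" * 30 + "ACGTACGTACGTACGTACGTAC" + "T" * 30}
line = "miR-x\test1\t100.00\t22\t0\t0\t1\t22\t31\t52\t1e-5\t44.1"
header, window = candidate_precursor(line, ests, flank=10)
```

The resulting FASTA records could then be handed to whatever folding program you use. The sequence is returned in forward-strand orientation, so minus-strand hits would still need to be reverse-complemented before folding.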
I am working on a similar project. Please message me if you still need help. I'm a student at CWRU - Biology Dept.