How To Find All RNA Stems In Genomic Data?
14.1 years ago
None ▴ 30

I am searching for RNA stems of approximately 10 to 1000 bases. Is there a fast, BLAST-like tool for scanning DNA sequences of, for instance, 10 kb?

As I want to extend this method to whole-genome searches, exact algorithms are too time-consuming. I am only looking for sequence similarity. Energy values such as MFE should not be taken into account. Unfortunately, BLAST lacks the ability to detect G-U wobble pairs. Thus, very short stems are undetectable.

Is there any alignment tool I can use?

Thank you!

rna rna alignment genome search • 6.0k views
14.1 years ago
Hanif Khalak ★ 1.3k

Something that might be relevant is a package which looks for bacterial transcriptional stop sites (short stem loops with certain characteristics) called TransTerm.

I co-wrote the original version and went through a lot of iterations, ranging from exact local matching of the DNA with its reverse complement to dynamic-programming-based alignment (which is what works best). It is tuned to predict sites with the specific bacterial terminator characteristics, but it may still be worth a whirl.
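
To make the dynamic programming idea concrete, here is a minimal sketch of a wobble-aware local alignment of a sequence against its own reverse complement. It is not TransTerm's actual code or scoring; the function names, the score values (`match`, `wobble`, `mismatch`, `gap`) and the `min_loop` cutoff are all assumptions chosen for illustration. It runs in quadratic time, so it is only meant for short windows, not whole genomes.

```python
# Sketch only: Smith-Waterman of a DNA window against its reverse complement.
# All scores and names are illustrative assumptions, not tuned values.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def pair_score(x, y_rc, match=2, wobble=1, mismatch=-3):
    """Score base x against base y_rc from the reverse complement.

    x == y_rc means x forms a Watson-Crick pair with the original base.
    G over A corresponds to G-T (G-U in RNA); T over C to T-G (U-G in RNA).
    """
    if x == y_rc:
        return match
    if (x, y_rc) in (("G", "A"), ("T", "C")):
        return wobble
    return mismatch

def best_stem(seq, gap=-4, min_loop=3):
    """Return (score, left_arm, right_arm); arms are half-open intervals on seq."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    # Each cell holds (score, i0, j0): the alignment covers seq[i0:i] and rc[j0:j].
    prev = [(0, 0, j) for j in range(n + 1)]
    best = (0, None, None)
    for i in range(1, n + 1):
        cur = [(0, i, 0)]
        for j in range(1, n + 1):
            cell = (0, i, j)
            # Keep the left arm 5' of the right arm, with at least min_loop bases between.
            if i + j <= n - min_loop:
                d, u, l = prev[j - 1], prev[j], cur[j - 1]
                options = [
                    (d[0] + pair_score(seq[i - 1], rc[j - 1]), d[1], d[2]),
                    (u[0] + gap, u[1], u[2]),
                    (l[0] + gap, l[1], l[2]),
                ]
                top = max(options, key=lambda c: c[0])
                if top[0] > 0:
                    cell = top
            cur.append(cell)
            if cell[0] > best[0]:
                # rc[j0:j] maps back to seq[n - j : n - j0]
                best = (cell[0], (cell[1], i), (n - j, n - cell[2]))
        prev = cur
    return best

print(best_stem("GTGGAAAACCGC"))
```

In the example call, the T at position 1 sits opposite a G in the other arm, so that column is scored as a wobble pair rather than a mismatch.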

You might also try WU-BLAST using a custom substitution matrix. Links can be found in this unrelated answer.

14.1 years ago
Michael 55k

Not totally sure here: a stem-loop is an inverted repeat sequence, right?

So you would need a tool that finds inverted repeats, for example Inverted Repeats Finder: http://tandem.bu.edu/irf/irf.download.html

I haven't done this myself, nor have I used the program; I only refined the search terms. Good luck.
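
For illustration, the inverted-repeat idea can be sketched as a BLAST-style exact seed search: index every k-mer, then look up the reverse complement of each k-mer downstream. This is not how IRF works internally; the function name and the parameters `k`, `max_span` and `min_loop` are assumptions. Exact seeds ignore G-U wobble pairs, so those would have to be recovered in a later extension step.

```python
# Sketch only: exact k-mer seeds for inverted repeats (candidate stem arms).
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def inverted_repeat_seeds(seq, k=8, max_span=1000, min_loop=3):
    """Return (left_start, right_start) pairs where seq[left:left+k]
    is the reverse complement of seq[right:right+k]."""
    seq = seq.upper()
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    hits = []
    for left in range(len(seq) - k + 1):
        partner = reverse_complement(seq[left:left + k])
        for right in index.get(partner, ()):
            loop = right - (left + k)          # bases between the two arms
            if loop >= min_loop and right + k - left <= max_span:
                hits.append((left, right))
    return hits

print(inverted_repeat_seeds("GGGGAAAACCCC", k=4))
```

Each hit is only a candidate; a seed would normally be extended and rescored before being reported as a stem.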

14.1 years ago
Qdjm 1.9k

Try FastA. There's an option to include your own scoring matrices described in this documentation page. I have not tried it myself but have heard of it being used for a similar task.

If you want the best performance, consider modeling base stacking interactions, not just single nucleotide pairings. You can do it by scoring dinucleotide pairings.
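
A rough sketch of what scoring dinucleotide pairings could look like for a gapless stem: instead of scoring each base pair on its own, every two adjacent pairs (one stack) are looked up in a table. The function name, the table layout and the weights below are made up for illustration; they are not measured stacking energies.

```python
# Sketch only: score a gapless stem by stacked (dinucleotide) pairs.
# Table keys are (left dinucleotide 5'->3', right dinucleotide 5'->3').

PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}

def stack_score(left_arm, right_arm, stack_table, default=0.0):
    """Sum stack scores; left_arm[k] pairs with right_arm[-1 - k]."""
    if len(left_arm) != len(right_arm):
        raise ValueError("gapless stem arms must have equal length")
    pairs = [(left_arm[k], right_arm[-1 - k]) for k in range(len(left_arm))]
    total = 0.0
    for (a, b), (c, d) in zip(pairs, pairs[1:]):
        # A stack only counts if both neighbouring positions can pair.
        if (a, b) in PAIRS and (c, d) in PAIRS:
            total += stack_table.get((a + c, d + b), default)
    return total

# Hypothetical weights, for demonstration only.
toy_table = {("GG", "CC"): 3.0, ("GT", "GC"): 1.5}
print(stack_score("GGG", "CCC", toy_table))
```

With a table like this, an isolated base pair flanked by mismatches contributes nothing, which is the main behavioral difference from single-nucleotide scoring.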


I don't think this is a case for an alignment tool, is it? AFAIK the OP doesn't have a database of known 'stem' sequences to search for in a genome. A stem-loop is an RNA secondary structure that requires an inverted repeat sequence within the genome, which AFAIK cannot be found by FASTA.


@Michael Dondrup -- You can use alignment tools to find inverted repeats: BLAST short stretches of the genome against their reverse complements. If you set your scoring matrix (and your gap penalty) appropriately, then your BLAST score can be a good estimate of the free energy of the stem. There are better solutions, and maybe IRF is one of them, but this is one way to do it.
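
As a sketch of what "appropriately" could mean: when a query is aligned against its reverse complement, identical letters stand for Watson-Crick pairs, a query G over a subject A stands for a G-T (G-U) pair, and a query T over a subject C stands for a T-G (U-G) pair. The score values and function names here are assumptions, and the printed layout is only a generic text table. Whether a particular aligner reads this layout, or honors an asymmetric matrix at all, needs to be checked against its documentation.

```python
# Sketch only: an asymmetric nucleotide scoring matrix for query vs.
# reverse-complemented subject. Score values are illustrative assumptions.

BASES = "ACGT"

def wobble_matrix(match=5, wobble=2, mismatch=-4):
    """Keys are (query base, base in the reverse-complemented subject)."""
    m = {}
    for q in BASES:
        for s in BASES:
            if q == s:
                m[q, s] = match        # Watson-Crick pair
            elif (q, s) in (("G", "A"), ("T", "C")):
                m[q, s] = wobble       # G-U or U-G wobble pair
            else:
                m[q, s] = mismatch
    return m

def format_matrix(m):
    lines = ["   " + " ".join(f"{b:>3}" for b in BASES)]
    for q in BASES:
        lines.append(f"{q:>2} " + " ".join(f"{m[q, s]:>3}" for s in BASES))
    return "\n".join(lines)

print(format_matrix(wobble_matrix()))
```

Note that the G/A cell is positive while the A/G cell is not: the matrix is deliberately not symmetric.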


I didn't know that. Can you do this for a whole genome?


@Michael Dondrup -- I don't see why not. There's some messy scripting involved to break the genome up into overlapping chunks, run the separate BLAST/FastA processes and then compile the results. Plus some sanity checking to make sure that the two sides of the stem don't overlap. You'd also have to check both strands for stems because the G-U wobble makes the calculation non-symmetric. It's not pretty. The good news is that it's easy to parallelize.
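
A minimal sketch of the bookkeeping part, with hypothetical helper names and default sizes: cut the genome into overlapping windows, reject hits whose two arms overlap, and shift window coordinates back to genome coordinates. The overlap is assumed to be at least as long as the largest stem span of interest, so that no stem is split across a window boundary. The alignment step itself (BLAST/FastA per window, on both strands) is left out.

```python
# Sketch only: window bookkeeping for a genome-wide stem scan.
# Intervals are half-open (start, end); names and defaults are assumptions.

def overlapping_chunks(length, chunk=10_000, overlap=1_000):
    """Yield (start, end) windows that cover [0, length)."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be >= 0 and smaller than chunk")
    if length <= 0:
        return
    step = chunk - overlap
    start = 0
    while True:
        end = min(start + chunk, length)
        yield start, end
        if end >= length:
            break
        start += step

def arms_overlap(left, right):
    """True if the two stem arms share at least one position."""
    return left[0] < right[1] and right[0] < left[1]

def to_genome(arms, chunk_start):
    """Shift window-relative arm intervals to genome coordinates."""
    return tuple((s + chunk_start, e + chunk_start) for s, e in arms)

print(list(overlapping_chunks(25, chunk=10, overlap=4)))
```

Because neighbouring windows overlap, the same stem can be reported twice; collecting the genome-coordinate hits in a set is one simple way to deduplicate them. Each window is independent, which is what makes the scan easy to parallelize.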

14.1 years ago
Mary 11k

I don't know if this would help, I haven't used it--but I was aware that on a gene details page at UCSC they provide a section called "mRNA Secondary Structure of 3' and 5' UTRs". So it's only selected sections of the sequences, but it offers various outputs for that data. It must have been a genome-wide survey (but does include free energy, which you don't want).

They say it relies on the Vienna RNA Package. On that page there are a number of different programs and strategies. I don't know if any would suit your needs. There's also a web server and it looks like other programs might be available there. That RNAz one has a "Genomic screen modus".

Maybe you know about all these and they aren't right. But figured I'd mention them.

Here is the TP53 details page I got this from.


Generally Mfold or ViennaRNA would be the ones to use for RNA structure, but the OP does not want to use MFE (min. free energy)


Yeah, that would be why I wrote "which you don't want". But there are a number of other options over in those two links. I didn't know if all of them use that.

14.1 years ago
None • 0

Thanks for all the replies.

@Michael Dondrup: Well, you are right, I am looking for inverted repeats, but not only exact matches: the search should also allow G-U wobble pairs as equivalents.

@Hanif Khalak: As far as I understand, BLAST provides some pre-compiled matrices, but only for amino acids. blastn does not offer such an option.

PS: I am sorry, but without registration it seems impossible to reply directly to someone's post.


Did you try the IRF software then?
