I am searching for RNA stems of approx. 10 to 1000 bases. Is there a fast, BLAST-like tool for scanning DNA sequences of, for instance, 10 kb?
As I want to extend this method to whole-genome searches, exact algorithms are too time-consuming. I am only looking for sequence similarity; energy values such as MFE should not be taken into account. Unfortunately, BLAST cannot detect G-U wobble pairs, so very short stems are undetectable.
Is there any alignment tool I can use?
Thank you!
I don't think this is a case for an alignment tool, is it? afaik the OP doesn't have a database of known 'stem' sequences to search for in a genome. A stem-loop is an RNA secondary structure that requires an inverted repeat within the genome, which afaik cannot be found by FASTA.
@Michael Dondrup -- You can use alignment tools to find inverted repeats: BLAST short stretches of the genome against their reverse complements. If you set your scoring matrix (and your gap penalty) appropriately, your BLAST score can be a good estimate of the free energy of the stem. There are better solutions, maybe IRF is one of them, but this is one way to do it.
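To make the "align against the reverse complement" idea concrete, here is a minimal sketch that uses a pure-Python Smith-Waterman local alignment as a stand-in for BLAST (no BLAST API is involved). The helper names `revcomp`, `pair_score` and `best_stem` and the scores (match 2, wobble 1, mismatch -3, gap -4, minimum loop 3) are hypothetical choices, not calibrated free energies. The point it shows is that, once the second sequence is already complemented, a G-U (G-T) wobble appears as G vs A or T vs C, so the scoring matrix has to be asymmetric.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMP)[::-1]


def pair_score(a, b, match=2, wobble=1, mismatch=-3):
    """Score base `a` of seq against base `b` of revcomp(seq).

    a == b is a Watson-Crick pair. Because b is already complemented,
    a G-T (G-U) wobble shows up as G vs A or T vs C, so the matrix is
    deliberately asymmetric: score(G, A) != score(A, G).
    """
    if a == b:
        return match
    if (a, b) in (("G", "A"), ("T", "C")):
        return wobble
    return mismatch


def best_stem(seq, gap=-4, min_loop=3):
    """Best local alignment of seq vs revcomp(seq), read as a stem.

    Returns (score, (l0, l1), (r0, r1)): the 5' arm is seq[l0:l1] and the
    3' arm is seq[r0:r1]. Cell (i, j) pairs seq[i-1] with seq[n-j]; cells
    that would leave fewer than min_loop unpaired bases between the two
    bases are never filled (stay 0), so the arms cannot overlap.
    """
    rc, n = revcomp(seq), len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, n - min_loop - i + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair_score(seq[i - 1], rc[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Standard Smith-Waterman traceback to the start of the local alignment.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(seq[i - 1], rc[j - 1]):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, bi), (n - bj, n - j)
```

For a 10-bp perfect hairpin such as `best_stem("GCGTACGATC" + "AAAA" + "GATCGTACGC")`, the two arms come back as `seq[0:10]` and `seq[14:24]`. The table is O(n²), so this only makes sense on short stretches, which is exactly why the genome would be cut into chunks.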
I didn't know that. Can you do this for a whole genome?
@Michael Dondrup -- I don't see why not. There's some messy scripting involved to break the genome up into overlapping chunks, run the separate BLAST/FastA processes and then compile the results. Plus some sanity checking to make sure that the two sides of the stem don't overlap. You'd also have to check both strands for stems because the G-U wobble makes the calculation non-symmetric. It's not pretty. The good news is that it's easy to parallelize.
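The chunking workflow described above might look roughly like the following sketch. Everything in it is hypothetical: `scan_window` is a simple ungapped stem finder standing in for the per-chunk BLAST/FASTA run, and `windows`, `scan_chunk`, `scan_genome` and their defaults are illustrative. It shows the overlap rule (the chunk overlap must exceed the span of the longest stem you want to recover), discarding hits cut off at an internal chunk edge, de-duplicating hits reported by two overlapping chunks, and scanning both strands, since G-T (G-U) counts as a pair while its complement A-C does not.

```python
from concurrent.futures import ThreadPoolExecutor

# Allowed pairs in DNA letters; G-T / T-G stand for the RNA G-U wobble.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}
COMP = str.maketrans("ACGT", "TGCA")


def scan_window(seq, min_stem=4, min_loop=3):
    """Ungapped, maximal stems in seq as (outer_left, outer_right, n_pairs)."""
    n, hits = len(seq), []
    for a in range(n):
        for b in range(a + 2 * min_stem + min_loop - 1, n):
            # Start only at the outermost pair; if (a - 1, b + 1) also pairs,
            # the stem is reported from that enclosing pair instead.
            if a > 0 and b < n - 1 and (seq[a - 1], seq[b + 1]) in PAIRS:
                continue
            k = 0
            # Extend inward while bases pair and the loop keeps >= min_loop bases.
            while (b - a - 2 * k - 1 >= min_loop
                   and (seq[a + k], seq[b - k]) in PAIRS):
                k += 1
            if k >= min_stem:
                hits.append((a, b, k))
    return hits


def windows(n, size, overlap):
    """Yield (start, end) of overlapping chunks that cover range(n)."""
    start = 0
    while True:
        end = min(start + size, n)
        yield start, end
        if end == n:
            return
        start += size - overlap


def scan_chunk(task):
    strand, chunk, ws, we, n = task
    kept = []
    for a, b, k in scan_window(chunk):
        # A hit touching an internal chunk edge may be truncated; with enough
        # overlap, a neighbouring chunk reports the complete stem.
        if (a == 0 and ws > 0) or (b == len(chunk) - 1 and we < n):
            continue
        kept.append((ws + a, ws + b, k))
    return strand, kept


def scan_genome(genome, size=200, overlap=120, workers=4):
    """Return sorted (strand, left, right, n_pairs) in + strand coordinates.

    overlap must exceed the span of the longest stem you want recovered.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    n = len(genome)
    rc = genome.translate(COMP)[::-1]
    tasks = [(strand, seq[ws:we], ws, we, n)
             for strand, seq in (("+", genome), ("-", rc))
             for ws, we in windows(n, size, overlap)]
    hits = set()  # overlapping chunks can report the same stem twice
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for strand, found in pool.map(scan_chunk, tasks):
            for a, b, k in found:
                if strand == "-":
                    a, b = n - 1 - b, n - 1 - a  # back to + coordinates
                hits.add((strand, a, b, k))
    return sorted(hits)
```

The thread pool only keeps the sketch portable: it gives no speed-up for this pure-Python scanner. For real runs you would switch to `ProcessPoolExecutor`, or have each worker launch an external BLAST/FASTA process on its chunk.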