Question

Best Reference For Small RNA Alignment And Annotation

4

14.0 years ago

Doctoroots ▴ 800

From a wet lab experiment, an unknown small RNA was detected in a sample. The sample was later size-selected for the RNA's approximate size and sequenced to try to find out what it is. I'm now working on this sequence data.

I wanted to ask what you suggest is the best reference to use when aligning and annotating this read data.

I was thinking of a few options:

Align to the whole-genome reference and then annotate the regions with the most aligned reads.
Align to known ncRNA sequences and see if I get lucky and one of them is the unknown RNA.
Use BioMart to get all unspliced gene sequences, with flanking sequence as long as my reads, and align to those.
Use BioMart to get all unspliced transcript sequences, align to them, and then see which transcript is most abundant.
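
For the first option, "regions with the most aligned reads" could be found roughly as follows. This is only a minimal sketch: it assumes the alignments have already been reduced to (chromosome, 0-based start) pairs, and `rank_regions`, the 100-base window and the example hits are all hypothetical.

```python
from collections import Counter

def rank_regions(alignments, window=100):
    """Count aligned reads per fixed-size window, most covered window first.

    alignments: iterable of (chrom, start) tuples with 0-based starts.
    Returns a list of ((chrom, window_start), read_count) tuples.
    """
    counts = Counter()
    for chrom, start in alignments:
        # Round the start down to the beginning of its window.
        counts[(chrom, start // window * window)] += 1
    return counts.most_common()

# Made-up alignment starts, for illustration only.
hits = [("chr1", 1005), ("chr1", 1010), ("chr1", 1090), ("chr2", 50)]
for region, n_reads in rank_regions(hits):
    print(region, n_reads)
```

Fixed windows are a simplification; a real pipeline would more likely merge overlapping alignments into clusters before ranking them.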

Any thoughts or suggestions?

small alignment annotation reference • 4.4k views

Updated 14.0 years ago by Michael 55k • written 14.0 years ago by Doctoroots ▴ 800

Ram · Answer 1 · 2010-12-03

3

14.0 years ago

Giovanni M Dall'Olio 28k

Did you try to translate it? Do you know whether it is a coding mRNA or another type of transcript?

The best way to align an mRNA to a genome is to use BLAT, which has models that take introns and splicing signals into account. You can also do it with exonerate; you may want to have a look at this other question.

You can also try to align the RNA to other known RNAs; it is best to use software designed for this, since RNA aligners rely more on the secondary structure than on the RNA sequence.

Updated 5.2 years ago by Ram 44k • written 14.0 years ago by Giovanni M Dall'Olio 28k

Hi Giovanni - I didn't translate it, since I don't know which of the reads align to the unknown RNA. The same goes for BLAT: I can check the most common unique reads with BLAT, but doing it for all the reads seems cumbersome. About the RNA aligners: since I'm dealing with small RNA, shouldn't I treat it the way I treat miRNA (regular alignment to a reference sequence)?

14.0 years ago by Doctoroots ▴ 800
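
Picking out "the most common unique reads" before running BLAT or BLAST could look something like the sketch below. It assumes plain four-line FASTQ records with no wrapped sequence lines; `collapse_reads` and the example reads are hypothetical.

```python
from collections import Counter

def collapse_reads(fastq_lines):
    """Collapse identical reads; return (sequence, count), most abundant first.

    Assumes strict 4-line FASTQ records, so the sequence is always the
    second line of each record.
    """
    seqs = (line.strip() for i, line in enumerate(fastq_lines) if i % 4 == 1)
    return Counter(seqs).most_common()

# Made-up records, for illustration only.
fastq = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "TTGACCAA", "+", "IIIIIIII",
]
for seq, count in collapse_reads(fastq):
    print(count, seq)
```

With a real file, the same function can be fed an open file handle, and only the top few sequences then need to be searched by hand.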

How short is it? If it is very short and you are sure it is not an artifact of a restriction in the technique, then it is unlikely to be a coding sequence.

14.0 years ago by Giovanni M Dall'Olio 28k

The segment is around 70 bp long; the lab that ordered this analysis doesn't believe it is a coding sequence.

14.0 years ago by Doctoroots ▴ 800

Do you have one 70 bp sequence and want to determine what it is likely to be? How about a BLAST search? http://blast.ncbi.nlm.nih.gov/Blast.cgi

14.0 years ago by Brad Chapman 9.7k

score 2 · Answer 2 · 2010-12-03

I'm not certain I would try BLAT if the sequence in question is, say, 40 bp or smaller. Similarity search algorithms that accelerate match finding sacrifice sensitivity for short queries.

My inclination is to align to the genome and annotate those regions that show perfect to nearly perfect matches - I'd collect nearly perfect because I'm not sure of the quality of the sequence data.
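
Collecting the perfect to nearly perfect matches might be sketched like this, assuming the aligner writes SAM output with the optional `NM` (edit distance) tag. The function name `near_perfect_hits`, the one-edit threshold and the example records are illustrative choices, not part of any particular tool.

```python
def near_perfect_hits(sam_lines, max_edits=1):
    """Yield (read, chrom, pos, edits) for mapped reads with NM <= max_edits.

    pos is the 1-based leftmost position, as stored in SAM. Reads without
    an NM tag are skipped because their edit distance is unknown.
    """
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 4:        # FLAG bit 0x4: read is unmapped
            continue
        edits = None
        for tag in fields[11:]:       # optional tags start at column 12
            if tag.startswith("NM:i:"):
                edits = int(tag[5:])
                break
        if edits is not None and edits <= max_edits:
            yield fields[0], fields[2], int(fields[3]), edits

# Made-up SAM records, for illustration only.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\t*\tNM:i:0",
    "r2\t0\tchr1\t500\t60\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\t*\tNM:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
    "r4\t16\tchr2\t42\t60\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\t*\tNM:i:1",
]
for hit in near_perfect_hits(sam):
    print(hit)
```

Raising `max_edits` trades specificity for tolerance of sequencing errors, which is the reason for keeping nearly perfect hits as well as perfect ones.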

That said, it is not clear from your question how many queries you have. Is this a case of one small RNA or do you have thousands of reads that assemble into dozens or hundreds of distinct small RNA species?

In the end, it will be important to be very certain that the species you've sequenced are either from non-coding RNA genes or introns of protein-coding genes or something else.

Ram · Answer 3 · 2010-12-03

2

14.0 years ago

Michael 55k

Align to the genome using a sensitive alignment method (e.g. BLAT, LASTZ, FASTA with short word sizes) and see if it overlaps a gene or ORF, or if it is intronic or intergenic.
BLAST against NT and look for similarities in other related genomes.
Search RNA databases, e.g. Rfam (http://rfam.janelia.org/).
Look here: Identified Potential Non-Coding Rna, And Then?
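
The first step, deciding whether a genomic hit overlaps a gene or is intronic or intergenic, could be sketched as below. It assumes 0-based, half-open coordinates and a hypothetical in-memory gene list; `classify_hit` and the example gene are made up for illustration, and a real analysis would load the annotation from a GTF/GFF file.

```python
def classify_hit(chrom, start, end, genes):
    """Classify a hit as 'exonic', 'intronic' or 'intergenic'.

    Coordinates are 0-based, half-open. genes is a list of dicts with
    'chrom', 'start', 'end' and 'exons' (a list of (start, end) tuples).
    Strand is ignored in this sketch.
    """
    def overlaps(a_start, a_end, b_start, b_end):
        return a_start < b_end and b_start < a_end

    label = "intergenic"
    for gene in genes:
        if gene["chrom"] != chrom:
            continue
        if not overlaps(start, end, gene["start"], gene["end"]):
            continue
        if any(overlaps(start, end, s, e) for s, e in gene["exons"]):
            return "exonic"
        # Inside the gene span but touching no exon of this gene.
        label = "intronic"
    return label

# Made-up annotation, for illustration only.
genes = [{"chrom": "chr1", "start": 1000, "end": 5000,
          "exons": [(1000, 1200), (4000, 5000)]}]
print(classify_hit("chr1", 2000, 2070, genes))
```

An exonic hit in any overlapping gene takes priority over an intronic one, which is a design choice rather than a rule.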

Updated 5.2 years ago by Ram 44k • written 14.0 years ago by Michael 55k