In a wet lab experiment, an unknown small RNA was detected in a sample. The sample was later size-selected for the approximate length of that RNA and sequenced to try to find out what it is. I'm now working on this sequence data.
I wanted to ask what you think is the best reference to use when aligning and annotating these reads.
I was thinking of a few options:
- Align to the whole genome reference and then annotate the regions with the most aligned reads.
- Align to ncRNA sequences and see if I get lucky and one of them is the unknown RNA.
- Use BioMart to get all the unspliced gene sequences, with flanking sequence as long as my read length, and align to those.
- Use BioMart to get all the unspliced transcript sequences, align to them, and then see which transcript is most abundant.
Any thoughts or suggestions?
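To make the first option concrete, here is a rough sketch of how "regions with the most aligned reads" could be ranked once an aligner has produced SAM output. It only relies on the standard SAM columns (FLAG, RNAME, POS); the function name `top_windows` and the 100-base window size are placeholders I made up for illustration, not part of any tool.

```python
from collections import Counter

def top_windows(sam_lines, window=100, n=5):
    """Count mapped reads per fixed-size genomic window; return the n busiest.

    sam_lines: iterable of SAM text lines (headers allowed).
    window:    window size in bases (hypothetical default, tune to the RNA size).
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:                    # FLAG bit 0x4 means the read is unmapped
            continue
        # POS is 1-based in SAM; convert to a 0-based window start
        start = (pos - 1) // window * window
        counts[(chrom, start)] += 1
    return counts.most_common(n)
```

The top windows can then be looked up in a genome browser or intersected with an annotation file to see what overlaps them.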
Hi Giovanni, I didn't translate it, since I don't know which of the reads come from the unknown RNA. The same goes for BLAT: I can check the most common unique reads with BLAT, but doing that for all the reads seems cumbersome. As for the RNA aligners, since I'm dealing with small RNA, shouldn't I treat it the way I treat miRNA (regular alignment to a reference sequence)?
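For the "most common unique reads" idea, a minimal sketch of collapsing the reads and writing the top few as FASTA for pasting into BLAT could look like this. It assumes a plain 4-line-per-record FASTQ (no wrapped sequences) and that adapters have already been trimmed; `collapse_reads` and `to_fasta` are hypothetical helper names, not functions from an existing package.

```python
from collections import Counter

def collapse_reads(fastq_lines):
    """Count identical read sequences in an iterable of FASTQ lines.

    Assumes exactly 4 lines per record, so the sequence is always
    the 2nd line of each record.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:                      # sequence line of the record
            counts[line.strip().upper()] += 1
    return counts

def to_fasta(counts, n=10):
    """Format the n most abundant unique sequences as FASTA text."""
    records = []
    for rank, (seq, count) in enumerate(counts.most_common(n), start=1):
        # Header carries the rank and how many reads had this sequence
        records.append(f">read{rank}_count{count}\n{seq}")
    return "\n".join(records)
```

With a real file this would be used along the lines of `with open("reads.fastq") as fh: print(to_fasta(collapse_reads(fh)))`, where `reads.fastq` is a placeholder file name. If one sequence of roughly the expected size dominates the counts, that is the obvious first candidate to look up.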
How short is it? If it is very short, and you are sure it is not an artifact due to a restriction in the technique, then it is unlikely to be a coding sequence.
The segment is around 70 bp long, and the lab that ordered this analysis doesn't believe it is a coding sequence.
Do you have one 70bp sequence and want to determine what it is likely to be? How about a BLAST search? http://blast.ncbi.nlm.nih.gov/Blast.cgi