Question

How Can I Accurately Identify Mirna Sequences From Small Rna Seq Results

7

Entering edit mode

11.3 years ago

Ryan Thompson ★ 3.6k

I have a dataset of small RNA-seq reads that have their 3-prime adapter sequence trimmed such that the remaining seqeunce precisely represents the bases present in the biological fragment that was sequenced. Many of these sequences represent mature miRNAs, and I want to count how many times each known miRNA sequence occurs in the dataset. Obviously I can perform an exact string match and get a good estimate, but that ignores the possibility of sequencing errors and imprecise end cleavage. So I think I want to run an alignment of all my reads against every known human mature miRNA sequence allowing for gaps and mismatches and pick the best match for each one (if there is any match good enough). Does anyone know of a fast and efficient way to do this, preferable using R?

mirna rna-seq alignment r • 11k views

ADD COMMENT • link updated 11.3 years ago by Mikael Huss 4.8k • written 11.3 years ago by Ryan Thompson ★ 3.6k

score 3 · Answer 1 · 2013-07-12

I don't know how to do it in R. But here is how I do it:

1) Select reads that range between 18-30 bp and aligned them against reference genome and only keep those reads that originated from the reference genome or got aligned. You can be little easy here with the mismatches. I will go with 2 mismatches.

2) Filtered reads that got aligned to database of small RNA other than miRNA. Either you can use a gtf file for the location of small RNAs or create a database of small rna and remove reads that mapped to them.

The first two steps can be combined or the first step can be totally skipped. I do it as a part of QC.

3) Align the filtered reads against the miRBase using SHRiMP2 under the miRNA mode using default settings. I used precursor microRNA database. I am not sure why I did this but somewhere I read that is advisable to align short reads against precursor miRNA rather than mature miRNA.

4) Then you can use some RPKM generator tool to get counts of various miRNAs from sam or bam file. I use my own script as sam files for miRNA samples are small. In case, a read gets aligned against two miRNAs , then i choose the alignment with the least mismatches.

score 3 · Answer 2 · 2013-07-12

3

Entering edit mode

11.3 years ago

Mikael Huss 4.8k

You could try to map against the mature.fa sequence file from miRBase. It is easy to find out which of those sequences are human (the hsa-XXX sequences). Note that you may need to substitute Ts for Us in the miRBase sequences. If you want to map your reads using R, you could use the subread aligner via the Rsubread package.

ADD COMMENT • link 11.3 years ago by Mikael Huss 4.8k

0

Entering edit mode

Yes, but I'd like to know what alignment tool to use. Can subread or any other short read aligner handle the case where the reference is shorter than the read?

ADD REPLY • link 11.3 years ago by Ryan Thompson ★ 3.6k

0

Entering edit mode

Good question. I have used Bowtie and Bowtie2 for mapping to miRBase and they work (it is not surprising that Bowtie2 works because it includes a local mapping option, but plain Bowtie is also fine). I haven't tried Subread for miRNA mapping. It does have something called "local read alignment" (p. 18 in http://www.bioconductor.org/packages/2.12/bioc/vignettes/Rsubread/inst/doc/SubreadUsersGuide.pdf), but I am not sure if it's equivalent to Bowtie2's local mapping.

ADD REPLY • link 11.3 years ago by Mikael Huss 4.8k

0

Entering edit mode

Could you please tell me which parameter did you use for bowtie ? thanks

ADD REPLY • link 11.2 years ago by Nicolas Rosewick 11k

0

Entering edit mode

Sorry for the late reply ... I played around with different parameter settings and it seems I ultimately used "-l 20 -n 0 -v 2 -a --best --strata" (disregarding 'irrelevant' parameters about threads, output format, quality scale etc)

ADD REPLY • link 11.2 years ago by Mikael Huss 4.8k

0

Entering edit mode

What kind of Alignment rate are you guys getting?

% uniquely mapped

% multimapped

% no mapped

I know that I will vary quite a lot but just wondering, not a lot is posted out there...

I'm getting ~30, 30, 30 for human.

ADD REPLY • link 9.7 years ago by RemiToAmigo • 0