Question

While generating the % of mapped reads to all miRNAs, do I need to consider unique or multi-match reads?

0

Entering edit mode

6.6 years ago

bioinforesearchquestions ▴ 370

Hi folks,

I am trying to create a table for the distribution of mapped RNA sequenced reads across miRBase v.21 for each isomiRs.

I am using contents from this miRBase.mrd file to generate the reads mapped to isomiRs. I have copied the sample output from miRBase.mrd file generated using miRDeep2.

I have written a perl code to identify the isomirs using miRBase.mrd file.

As per the above output, some of the sequenced reads are present in both "ssc-let-7a-1" and "ssc-let-7c".

For example,

seq_3228221_x43

seq_3218517_x44

seq_2697760_x253

Note: seq_3228221 is ID and x43 refers to 43 times this sequence present in the FASTQ file.

I am aware that miRDeep2's mapper file uses following bowtie command "bowtie -p 1 -f -n 0 -e 80 -l 18 -a -m 5 --best --strata " to generate the alignment file. This command allows the same sequence to map to multiple miRNAs.

While generating the percentage of mapped reads for all miRNAs, do I need to consider only unique reads (seq_2697760_x253 only once) or reads with multiple matches then it becomes 2*253=506 reads?

For reference, this is the miRDeep2's final read count output file for all miRNAs

miRNA mirdeep2 RNAseq • 1.7k views

ADD COMMENT • link 6.6 years ago by bioinforesearchquestions ▴ 370

0

Entering edit mode

I think you should count every read only once. It may map to 100 miRNA, but it is still one read. (I understand you are talking about "sequences" and not reads. I assume that "sequences" are just clustered identical reads, thus the situation doesn't change). The best thing, however, would be to do a very small experiment, if you have time. Try to do the analysis with only 1 sequence (mapping in multiple positions). From there, you will easily understand what is the correct way of treating it.