Hi Biostars,
I've aligned small RNA-Seq reads (22-50 nt) from a total RNA-Seq experiment with Bowtie, and almost 90% of the reads are multi-mapped. I then counted reads per gene with mmquant using the Ensembl GTF file. mmquant is designed for counting when the rate of multi-mapped reads is high; HTSeq-count and featureCounts do not count multi-mapped reads by default, which is why I used mmquant. From the resulting count matrix, I used a shell script to sum the reads per biotype class (protein_coding, lincRNA, rRNA, ...). About 80% of the alignments fall in intergenic regions (lincRNA), and only 6% of my reads correspond to protein_coding!
Can I continue the downstream analysis with such results?
Any ideas?
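The per-biotype summary described above (done with a shell script in the original post) could be sketched in Python roughly as follows. This is a minimal illustration, not the poster's script. It makes several assumptions. First, the count matrix has been reduced to a dict mapping feature name to count. Second, the GTF is an Ensembl file whose `gene` records carry `gene_id` and `gene_biotype` attributes. Third, mmquant names features it merged because of shared multi-mapped reads by joining gene IDs with `--`. That last convention is an assumption, so check it against your own output. The function names and the `ambiguous` bucket are hypothetical. Merged features whose genes have different biotypes are kept separate rather than split across biotypes.

```python
import re
from collections import defaultdict

# Matches quoted GTF attributes such as: gene_id "ENSG00000223972";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines):
    """Map gene_id -> biotype using the 'gene' records of a GTF file."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Ensembl GTFs use gene_biotype; GENCODE GTFs use gene_type instead.
        biotypes[gene_id] = attrs.get("gene_biotype") or attrs.get("gene_type") or "unknown"
    return biotypes


def counts_per_biotype(feature_counts, biotypes, sep="--"):
    """Sum counts per biotype and return {biotype: (count, percent_of_total)}.

    Assumption: merged features are named 'geneA--geneB'. A merged feature
    whose genes share one biotype is credited to that biotype; otherwise it
    goes to a hypothetical 'ambiguous' bucket.
    """
    totals = defaultdict(float)
    for feature, count in feature_counts.items():
        kinds = {biotypes.get(g, "unknown") for g in feature.split(sep)}
        label = kinds.pop() if len(kinds) == 1 else "ambiguous"
        totals[label] += count
    grand_total = sum(totals.values())
    return {
        biotype: (n, 100.0 * n / grand_total if grand_total else 0.0)
        for biotype, n in totals.items()
    }


# Toy input (made-up genes and counts, for illustration only).
gtf = [
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
    '1\tensembl\tgene\t300\t400\t.\t-\t.\tgene_id "G2"; gene_biotype "lincRNA";',
    '1\tensembl\tgene\t500\t600\t.\t+\t.\tgene_id "G3"; gene_biotype "lincRNA";',
]
counts = {"G1": 10, "G2": 60, "G2--G3": 20, "G1--G2": 10}
summary = counts_per_biotype(counts, gene_biotypes(gtf))

for biotype, (n, pct) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{biotype}\t{n:g}\t{pct:.1f}%")
```

Keeping the `ambiguous` class visible shows how much of the per-biotype picture depends on reads that mmquant could not assign to a single gene. With roughly 90% multi-mapped reads, that fraction may be large.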
Did I understand correctly that you have sequenced small RNAs such as miRNA and expect protein coding genes?
It is a total RNA-Seq experiment. The sequencing was done on degraded RNA samples (single-end) with a particular library preparation protocol, which is why the reads are very short. We don't target any particular class of RNA.
If it's total RNA, I would expect >80% rRNA.
Even if the rRNAs were removed during the experiment with an rRNA depletion kit?
You did not include that critical piece of information in the original post. If that is true (and if the depletion worked as expected), it is unclear why you have 90% multi-mapped reads (per featureCounts/htseq-count?).
Since my reads are 22-50 nt long, I think it is clear why I got a high rate of multi-mapped reads. A very short read of 25 nt will align to more locations on the genome than a longer read. Am I right?
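Here is a back-of-the-envelope check of the read-length argument. Under a uniform random-sequence model, the expected number of chance exact matches of a k-nt read, counting both strands of a genome of length G, is about 2(G - k + 1)/4^k. The sketch below assumes a roughly human-sized genome (3.1 Gb), because the organism is not stated in the thread. It also ignores mismatches, which Bowtie can allow. Real genomes are far from random, so this model only gives a floor on ambiguity. It shows that chance matches fall off steeply with read length: above 1 at 16 nt, and well below 1e-3 by 22 nt. So at 22-50 nt, multi-mapping mostly reflects genuinely repeated sequence, such as repeat elements and multi-copy gene families, rather than short length alone.

```python
def expected_random_hits(read_len, genome_len):
    """Expected chance exact matches of a random read_len-mer on both strands
    of a random genome of genome_len bases (uniform base composition assumed)."""
    positions = genome_len - read_len + 1  # start positions per strand
    return 2 * positions / 4 ** read_len


GENOME_LEN = 3.1e9  # assumption: roughly human-sized genome; organism not stated

for k in (16, 22, 25, 50):
    print(f"{k:>2} nt: {expected_random_hits(k, GENOME_LEN):.3g} expected chance hits")
```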
No, in that case no. Sorry, I forgot that option.
Just to confirm: you are expecting to get small RNA reads from a total RNA-Seq dataset only because you are aligning with Bowtie v1?