I have some small RNA-seq data and I want to predict the miRNA sequences. A genome sequence is available for my organism. After preprocessing (adapter removal and removal of redundant reads), I kept only reads 18-24 nt long for further analysis. I aligned these reads to the genome and kept only those that aligned for the next step. I then aligned the reads against RFAM, plant repeats and the mRNA sequences of my genome to discard any contamination. The final reads (those that did not align to RFAM, plant repeats or mRNA) were aligned to the mature miRNA sequences obtained from the miRBase database, but the bowtie2 result looks like this:
2562148 (100.00%) were unpaired; of these:
2560555 (99.94%) aligned 0 times
264 (0.01%) aligned exactly 1 time
1329 (0.05%) aligned >1 times
0.06% overall alignment rate
Is it OK to get such a low number of reads? I use this command for bowtie alignment:
bowtie2 -N 0 --un unaligned/unaligned_mirbase_mature.fasta -x mature_mirna/mirbase_mature -f aligned/aligned_to_genome.fasta
Thanks
What is the protocol for library preparation? Did it involve a poly-A selection?
Yes, it includes that. Thanks
Do mature microRNAs have a poly-A tail?
It is not clear to me that he is using an appropriate library preparation protocol, but I have little experience with small RNA. And no, I believe mature microRNAs do not have poly-A tails.
Why would you remove redundant reads? That's kinda counterproductive. I highly recommend that you first read publications about small RNA-Seq and microRNAs before performing any further analysis.
My tip: Think about the protocol and think about the molecules you sequenced, would you expect redundancy?
He removed redundant reads, that explains the low result :) He just discarded them all and left one read for each miR.
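To illustrate the point about redundancy, here is a minimal sketch of collapsing identical reads while keeping their counts, instead of throwing the duplicates away. It assumes the reads are already adapter-trimmed and held as plain strings. The function names, the toy sequences and the `seqN_xCOUNT` header format are made up for this example, not taken from any particular tool.

```python
from collections import Counter


def collapse_with_counts(reads):
    """Collapse identical reads but remember how often each one was seen."""
    return Counter(reads)


def to_collapsed_fasta(counts):
    """Write one FASTA record per unique sequence, with its count in the header.

    The '>seqN_xCOUNT' header layout is a hypothetical convention for this sketch.
    """
    records = []
    for i, (seq, n) in enumerate(counts.most_common(), start=1):
        records.append(f">seq{i}_x{n}\n{seq}")
    return "\n".join(records)


# Toy input: one sequence seen three times, another seen once.
reads = ["TGACAGAAGAGAGTGAGCAC"] * 3 + ["TCGGACCAGGCTTCATTCCCC"]

counts = collapse_with_counts(reads)
fasta = to_collapsed_fasta(counts)
print(fasta)
```

With the counts kept in the header, only the unique sequences need to be aligned, but the abundance of each one is still available afterwards. Summing the counts gives back the original number of reads.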
Be careful with discarding repeat and mRNA hits, as some miRNAs have a transposon or exon origin and you will lose those.
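One way to act on this is to annotate reads with the databases they hit rather than dropping them straight away. The sketch below assumes you have already extracted, per database, the set of read IDs that aligned. The read IDs, the database labels and the choice of which databases count as a hard filter are all hypothetical and only there for illustration.

```python
def annotate_reads(read_ids, hits_by_db):
    """Map each read ID to the sorted list of databases it aligned to."""
    return {
        rid: sorted(db for db, ids in hits_by_db.items() if rid in ids)
        for rid in read_ids
    }


def partition(annotations, hard_filters):
    """Drop only reads that hit a database listed in hard_filters.

    Reads hitting other databases are kept, with their annotation attached.
    """
    kept, dropped = {}, {}
    for rid, dbs in annotations.items():
        if any(db in hard_filters for db in dbs):
            dropped[rid] = dbs
        else:
            kept[rid] = dbs
    return kept, dropped


# Hypothetical read IDs and hit sets, e.g. parsed from alignment output.
read_ids = ["r1", "r2", "r3", "r4"]
hits_by_db = {
    "rfam": {"r2"},
    "repeats": {"r3"},
    "mrna": {"r3", "r4"},
}

annotations = annotate_reads(read_ids, hits_by_db)
# Treating only "rfam" as a hard filter is an assumption made for this example.
kept, dropped = partition(annotations, hard_filters={"rfam"})
print(kept)
print(dropped)
```

Reads that overlap repeats or mRNA then stay in the candidate set with a flag, so they can be inspected later instead of being lost before the miRNA prediction step.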