Hi all,
I really hope I can interest some folks in helping me figure this out once and for all. I manage a NGS sequencing core and we process quite a bit of miRNA libraries on our MiSeq. These reads are from mouse. Here is the confusing part:
When I map the reads to the mouse genome (GRCm38) using Bowtie:
bowtie --wrapper basic-0 -n 0 -l 15 -e 99999 -k 200 --best --strata
I get over 30 million mapped reads but I only have a little over 2 million reads.
When I map the reads to the mouse genome using Bowtie2:
bowtie2 --local -p 8 -q --phred33 -D 20 -R 3 -N 0 -L 8 -i S,1,0.50
I get about 1.5 million mapped reads (seemingly much more reasonable).
After mapping I use HtSeq-count to count how many of these mapped reads are miRNAs, obviously the Bowtie mapping renders much better results but how should I interpret this? Is is reasonable to count the same read more than one time. Any thoughts or knowledge on this topic would be helpful. Thanks so much.
Since you don't expect gaps (miRNA are small) bowtie1 is appropriate for the alignments. I assume you are seeing multiple mapping for some of the reads (which may be why you have 30 million mappings for 2M reads). See: A: miRNA alignment to the genome
Yes, I understand that these are multiple mapping but how should these be interpreted for downstream analysis. Should all of the alignments be counted and considered for differential expression?
That's the point to not just blindly mapping to the genome and using htseq-count. Don't reinvent the wheel, use something already written for miRNA quantification.
So do you mean don't use htseq to count the miRNAs, what tool do you recommend instead?
That's correct