Hi! I have some experience working with RNA-seq, but this is the first time I've been asked to do a miRNA-seq analysis and I'm having some trouble with it. Here's the workflow I'm following and the problems I found.
1. Sample preparation and sequencing: Samples are miRNA extracted from human serum. A spike-in (cel-miR-39) was added in the extraction step. The miRNAs were sent to BIG genomics for sequencing; they returned an .fq file with the reads already trimmed for adapters. Mean read length is around 21 bp.
2. Quality check with FastQC: no problems here. Read quality is good, with high Phred scores and no adapters detected. The samples had a high duplication rate, but maybe this is usual for miRNAs.
3a. Pseudoaligning with salmon: here's where the problems begin. After pseudoaligning with salmon to the miRBase mature index, the mapping rate varies from 5% to 40% among samples.
3b. Mapping and counting with Bowtie2 and HTSeq: After seeing the result with salmon, I also tried mapping to the human genome with Bowtie2 and counting with HTSeq. Although the mapping rate increased when mapping to the genome (a sample at 5% rose to about 15%), after counting the miRNAs it dropped back to a value similar to salmon's.
At this point I'm a bit lost and I don't know what my next step should be. Can I continue to a differential expression analysis with these low-mapping samples? Are these samples useless?
Thanks for any help!
You should use an aligner like bowtie v.1.x, which does ungapped alignments. You should also make sure that you have removed any adapter/extraneous sequences from your data before you align the data. Here is an example miRNA pipeline to try.
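Before re-aligning, it may help to confirm that the trimming really worked. The Python sketch below is only an illustration, not part of any particular pipeline: it tallies read lengths and counts reads that still contain an adapter fragment. The function names, the 18-25 nt "miRNA-sized" window, and the `ADAPTER_PREFIX` value are all assumptions made for the example; replace the adapter with the 3' adapter sequence of your own library kit.

```python
import io
from collections import Counter

# Placeholder adapter fragment (assumption): use the 3' adapter of your own kit.
ADAPTER_PREFIX = "TGGAATTCTCGG"


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def summarize_reads(sequences, adapter_prefix, min_len=18, max_len=25):
    """Tally read lengths and reads that still contain the adapter fragment."""
    lengths = Counter()
    with_adapter = 0
    total = 0
    for seq in sequences:
        total += 1
        lengths[len(seq)] += 1
        if adapter_prefix in seq:
            with_adapter += 1
    # Reads whose length falls inside the assumed miRNA-sized window.
    in_window = sum(n for length, n in lengths.items()
                    if min_len <= length <= max_len)
    return {
        "total": total,
        "length_counts": dict(lengths),
        "fraction_mirna_sized": in_window / total if total else 0.0,
        "fraction_with_adapter": with_adapter / total if total else 0.0,
    }


# Tiny in-memory FASTQ with made-up reads, standing in for a real .fq file.
demo_reads = ["A" * 22, "C" * 22, "ACGTACGTACGT" + ADAPTER_PREFIX, "ACGTACGTAC"]
demo_fastq = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(demo_reads)
)
summary = summarize_reads(read_fastq_sequences(io.StringIO(demo_fastq)),
                          ADAPTER_PREFIX)
print(summary)
```

For a real file, pass an opened handle instead of the `io.StringIO` object, e.g. `with open("sample.fq") as fh: summarize_reads(read_fastq_sequences(fh), ADAPTER_PREFIX)` (the file name here is hypothetical). If a large share of reads falls outside the miRNA-sized window or still carries the adapter, that would be worth sorting out before looking at mapping rates.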