I've been looking for an answer in similar questions, but I can't find what I suppose is the problem. I have 9 files of small RNA-seq reads from human. I aligned them with Bowtie2 and got a .sam file for each sample. Now I am counting with featureCounts, and the results show 0% assigned reads for all the files. For the alignment I used the Ensembl human genome (GRCh38). My .gff3 file is from miRBase v22 (hsa.gff3). I ran featureCounts with the following command:
featureCounts -t miRNA -g 'Name' -a /path/to/hsa.gff3 -o /path_to_all/*.sam
The output is similar for all 9 samples:
|| Process SAM file sample_name.sam...
|| Single-end reads are included.
|| Total alignments : 58350348
|| Successfully assigned alignments : 26377 (0.0%)
|| Running time : 4.79 minutes
I'm guessing your question is: why do you have ~58M alignments but only ~26k assigned to features?
The answer is probably your GFF file combined with your sequencing strategy. Try using a more "generic" GFF file to see what's been sequenced/aligned.
But another issue I see is that you're using Bowtie2, which is not splice-aware. I'm not sure about small RNAs, but you're better off using STAR or another splice-aware aligner.
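Before swapping annotations, it may also help to look at what the GFF3 file itself contains: the value passed to -t has to match a feature type in column 3, and the sequence names in column 1 have to match the ones used in the SAM files. Here is a minimal Python sketch, assuming a standard nine-column, tab-separated GFF3. The function name summarize_gff3 is made up for this example and is not part of featureCounts.

```python
from collections import Counter


def summarize_gff3(lines):
    """Count sequence names (column 1) and feature types (column 3)."""
    seqids, types = Counter(), Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed feature line
        seqids[fields[0]] += 1
        types[fields[2]] += 1
    return seqids, types


# Example use (path taken from the question):
# with open("/path/to/hsa.gff3") as fh:
#     seqids, types = summarize_gff3(fh)
# print(types.most_common())
# print(seqids.most_common(5))
```

Comparing the printed sequence names with the @SQ lines in one of the SAM headers should show quickly whether the annotation and the alignments use the same naming.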
Actually, for microRNAs you want to align without gaps, since you are looking at small reads, typically 20-30 bp. So bowtie v.1.x would be a better choice of aligner.
Thanks for the correction. What's the difference between micro and small RNA? Or are they different terms for the same thing?
Small RNAs are a superset covering all short RNAs (<200 nt), whereas miRNAs are much smaller (~22 nt) and thus require ungapped alignment to detect.
OK, thanks for the advice. I'll try using Homo_sapiens.GRCh38.98.gtf to check in more detail. Also, the main reason for choosing Bowtie2 was this publication: doi/10.1261/rna.055509.115. I also tried different tools and aligners and got better results with Bowtie2, as the publication suggests.
A couple of things to check in this situation:
What % of reads are uniquely mapped? Remember that, by default, featureCounts doesn't count multimapping reads. Is your sequencing paired or single end? Bowtie has issues with paired-end data when the two ends overlap too much, which would almost certainly always be the case with miRNA-seq.
Finally, it's possible that your reads are running over the end of the miRNA annotation, in which case featureCounts will ignore them. I think there is a setting to tell it not to do this.
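For the first check, a rough tally can be made straight from the SAM file using the FLAG and MAPQ columns. This is only a sketch: low MAPQ is used as a proxy for ambiguous placement, the cutoff of 10 is an arbitrary assumption, and tally_sam_lines is a name invented for this example. It is not how featureCounts itself decides what to count.

```python
from collections import Counter


def tally_sam_lines(lines, mapq_cutoff=10):
    """Classify SAM records by FLAG bits and MAPQ (heuristic only)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & 0x4:
            counts["unmapped"] += 1
        elif flag & 0x100 or flag & 0x800:
            counts["secondary_or_supplementary"] += 1
        elif mapq < mapq_cutoff:
            counts["low_mapq"] += 1  # assumed cutoff, likely ambiguous
        else:
            counts["confident"] += 1
    return counts


# Example use (file name taken from the log in the question):
# with open("sample_name.sam") as fh:
#     print(tally_sam_lines(fh))
```

If most records land in the unmapped or low_mapq bins, the low assignment rate is at least partly an alignment issue rather than an annotation issue.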
My library is single-end. I already tried the advice from Amar (a "generic" GFF file) and figured out that many of my reads come from snoRNAs, so that's the main reason for the low % when I use hsa.gff3. Anyway, thank you for your answers.
It's completely normal for a large fraction of your reads to be snoRNA, snRNA or other categories of small RNA, but I still wouldn't expect the amount mapping to miRNA to be that small.
Did you trim the reads before mapping? What was the post-trimming size distribution (as measured by FastQC)?
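If the FastQC report isn't to hand, the length distribution can be approximated directly from the trimmed FASTQ. A minimal Python sketch, assuming an uncompressed FASTQ with exactly four lines per record. The function names and the file name in the usage comment are hypothetical, and the 20-30 nt window is the typical miRNA read size mentioned earlier in the thread.

```python
from collections import Counter


def read_length_distribution(fastq_lines):
    """Count read lengths, assuming four lines per FASTQ record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


def fraction_in_range(counts, lo=20, hi=30):
    """Fraction of reads whose length falls in [lo, hi]."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in counts.items() if lo <= length <= hi)
    return in_range / total


# Example use (hypothetical file name):
# with open("sample_trimmed.fastq") as fh:
#     dist = read_length_distribution(fh)
# for length in sorted(dist):
#     print(length, dist[length])
# print(fraction_in_range(dist))
```

A distribution with most reads at the full sequencing length would suggest the adapters were not removed.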