I recently started working with RNA seq analysis and despite having some expertise in R, I'm very new in the linux environment! I've been trying to analyze miRNA seq data (illumina) from a human cell line and I've not been very lucky. My sequences are about 50 nucleotides long and my workflow has been: adapter removal (cutadapt), mapping with both bowtie and bowtie2 (i'm using Homo_sapiens.GRCh38.ncrna.fa as a reference genome in this step) and finally HTseq-count, trying both GRCh38.p13_genomic.gff.gz and Homo_sapiens.GRCh38.99.gtf.gz reference genome. Although my alignment results from bowtie is showing around 25% of mapping, all my htseq results only give me 0 counts. Am I using the wrong reference genome for aligning or counting (or both)? Any suggestion? Thank you!
So, are you aligning to a transcript database and try to count features based on Genomic data? This doesn't work in that way, either you align to transcriptome and count hits in transcripts or align to the genome and use the genome features to count.
Did you trim your data appropriately? Most miRNA kits have specific instructions to remove adapters. You would also want to stick with
bowtie v.1.x
for ungapped alignments since these miRNA reads are expected to be small (20-25 bp). Check to see howhtseq-count
deals with multi-mapping reads. It may not be counting then for that reason.is there a chance you entered the wrong strand orientation for your reads? Specifically it's the
--stranded
option when you run htseq-count.