When counting RNA-seq reads with featureCounts, I got an extremely low assignment rate: Successfully assigned alignments : 134418 (0.4%).
This is weird, because the hisat2 overall mapping rate is quite high (94.8%), and even the unique mapping rate is 45.0%.
I looked at the summary file, and a large share of the unassigned reads fall under multimapping and no features,
as shown in the figure below:
I checked five other samples from the same species, and the results were very similar.
The commands I used for mapping and counting are below:
nohup hisat2 --new-summary -p 3 -x ~/Fman/index/index -1 1.clean_data/31-L-2-A_1.fq.gz -2 1.clean_data/31-L-2-A_2.fq.gz -S 31-L-rep2.sam --rna-strandness RF --dta &
samtools sort -o 31-L-rep2.bam 31-L-rep2.sam
featureCounts -T 10 -p -t exon -g gene_id -s 2 -a ~/Fman/EVM.final.gene.gtf -o 31-L-rep2_featureCounts.txt 31-L-rep2.bam
I'm quite sure my library is strand-specific (dUTP method), and my sample is tetraploid. My questions are:
1. Why are the hisat2 mapping rate and the featureCounts assignment rate so different?
2. Did I set any of the parameters incorrectly?
3. Or is this simply normal for polyploid species?
Have you visually inspected the alignments? Are they properly nested under exons, or are they scattered all over? DNA contamination is a rare but possible cause. It would lead to good alignment rates but poor assignments/counts.
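If a genome browser is not handy, a rough numeric version of that check is to count how many primary alignments overlap any annotated exon. This is only a sketch under some assumptions: samtools is installed, the GTF and the BAM use the same contig names, and the function names and the exons.bed file name are made up here for illustration. It ignores strand and counts alignments rather than fragments, so treat the number as a hint, not a measurement.

```shell
# Convert the exon records of a GTF (1-based, inclusive) to BED (0-based, half-open).
gtf_exons_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" } $0 !~ /^#/ && $3 == "exon" { print $1, $4 - 1, $5 }' "$1"
}

# Print the fraction of primary mapped alignments that overlap at least one exon.
# -F 0x904 drops unmapped, secondary and supplementary records.
exon_overlap_fraction() {
    bam=$1; bed=$2
    total=$(samtools view -c -F 0x904 "$bam")
    in_exon=$(samtools view -c -F 0x904 -L "$bed" "$bam")
    awk -v a="$in_exon" -v b="$total" 'BEGIN { if (b > 0) printf "%.3f\n", a / b }'
}

# Example with the files from the question (exons.bed is a hypothetical name):
# gtf_exons_to_bed ~/Fman/EVM.final.gene.gtf > exons.bed
# exon_overlap_fraction 31-L-rep2.bam exons.bed
```

A low fraction would suggest reads scattered outside the annotation (DNA contamination or an incomplete GTF). A high fraction combined with few assigned reads would instead point back at the strandedness setting or at multi-mapping.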