Question

featureCounts: extremely low rate of 'Successfully assigned alignments'

5.0 years ago

Yi Chu • 0

When counting RNA-seq reads with featureCounts, I got an extremely low rate of successfully assigned alignments: 134418 (0.4%). This is weird, because the hisat2 overall mapping rate is quite high (94.8%), and even the unique mapping rate is 45.0%. I looked at the summary file, and a large percentage of the unassigned reads fall under multimapping and no features, as shown in the figure below:

enter image description here

I checked out other five samples from the same species, and the results were extremely similar.

enter image description here

The commands I used for mapping and counting are below:

nohup hisat2 --new-summary -p 3 -x ~/Fman/index/index -1 1.clean_data/31-L-2-A_1.fq.gz -2 1.clean_data/31-L-2-A_2.fq.gz -S 31-L-rep2.sam --rna-strandness RF --dta &
samtools sort -o 31-L-rep2.bam 31-L-rep2.sam
featureCounts -T 10 -p -t exon -g gene_id -s 2 -a ~/Fman/EVM.final.gene.gtf -o 31-L-rep2_featureCounts.txt 31-L-rep2.bam

I'm quite sure my library is strand-specific, prepared with the dUTP method, and my sample is tetraploid. My questions are:

1. Why are the results from hisat2 and featureCounts so different?
2. Did I set any of the parameters incorrectly?
3. Or is this simply normal for polyploid species?

RNA-Seq featureCounts • 5.1k views


Have you visually inspected the alignments? Are they properly nested under exons, or are they scattered all over? DNA contamination is a rare but possible problem. It would lead to good alignments but poor assignments/counts.

5.0 years ago by GenoMax 152k

score 2 · Answer 1 · 2020-07-09

You are mixing up two concepts: hisat2 maps the reads to a reference genome, while featureCounts assigns the mapped reads to genomic features, typically genes. So even a good mapping rate doesn't necessarily mean you will get high counts for the annotated features.
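To see where the alignments actually go, it may help to turn the featureCounts summary into percentages. The following is a minimal sketch, assuming the usual two-column, tab-separated `.summary` file that featureCounts writes next to its output for a single BAM; the function name `summary_percent` is made up for illustration.

```shell
# Print every non-zero featureCounts status category with its share of
# all alignments. Assumes a tab-separated .summary file with one BAM column.
summary_percent() {
    awk -F'\t' '
        NR == 1 { next }    # skip the "Status <bam>" header line
        { count[$1] = $2; order[++n] = $1; total += $2 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 0)
                    printf "%s\t%d\t%.1f%%\n", order[i], count[order[i]],
                           100 * count[order[i]] / total
        }
    ' "$1"
}

# Example call (file name taken from the -o option in the question):
#   summary_percent 31-L-rep2_featureCounts.txt.summary
```

A large `Unassigned_NoFeatures` share points towards the annotation or the strandedness setting, while a large `Unassigned_MultiMapping` share points towards repeated sequence in the reference.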

Multiple factors can cause this, and you will have to investigate further to find the cause. You said the species you are interested in is tetraploid; do you know whether the reference genome is a haploid, diploid, or tetraploid representation? Did you check whether the genome annotation has duplicated feature IDs on different chromosomes, or many overlapping features? Did you check whether featureCounts -s 0 or -s 1 improves the assignment rate?
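The duplicated-ID check can be sketched with awk. This is only an illustration: it assumes a standard GTF in which column 9 carries a quoted `gene_id "..."` attribute and the counted feature type is `exon`, and the function name `genes_on_multiple_seqs` is hypothetical.

```shell
# List gene_id values whose exon records occur on more than one sequence
# (chromosome or scaffold), together with the number of sequences.
genes_on_multiple_seqs() {
    awk -F'\t' '
        $0 !~ /^#/ && $3 == "exon" {
            if (match($9, /gene_id "[^"]+"/)) {
                # strip the leading `gene_id "` (9 chars) and the closing quote
                id = substr($9, RSTART + 9, RLENGTH - 10)
                key = id SUBSEP $1
                if (!(key in seen)) { seen[key] = 1; nseq[id]++ }
            }
        }
        END { for (id in nseq) if (nseq[id] > 1) print id "\t" nseq[id] }
    ' "$1" | sort
}

# Example call (annotation path taken from the question):
#   genes_on_multiple_seqs ~/Fman/EVM.final.gene.gtf
```

No output means no gene ID is shared between sequences. If the file contains no `exon` records at all, the output is also empty, so it is worth confirming that the feature type passed to `-t` exists in column 3.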