Low featureCounts assigned rate
Posted by weather, 10 months ago

Hi, I am running an RNA-seq analysis. I finished the hisat2 alignment and got the .bam files. The alignment rate looks fine, averaging above 90%.

However, when I ran featureCounts on these BAM files, the assigned rate was low, averaging about 55%. I'm wondering whether this is normal or whether I need to change my analysis parameters.

Below are the hisat2 and featureCounts outputs for the same sample.

hisat2 output:

29441125 reads; of these:
  29441125 (100.00%) were paired; of these:
    3201971 (10.88%) aligned concordantly 0 times
    23447466 (79.64%) aligned concordantly exactly 1 time
    2791688 (9.48%) aligned concordantly >1 times
    ----
    3201971 pairs aligned concordantly 0 times; of these:
      648706 (20.26%) aligned discordantly 1 time
    ----
    2553265 pairs aligned 0 times concordantly or discordantly; of these:
      5106530 mates make up the pairs; of these:
        2960349 (57.97%) aligned 0 times
        1638099 (32.08%) aligned exactly 1 time
        508082 (9.95%) aligned >1 times
94.97% overall alignment rate
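For paired-end data, hisat2's "overall alignment rate" is computed over individual mates, not pairs. The small sketch below reproduces the 94.97% from the log above (2,960,349 mates aligned 0 times out of 2 × 29,441,125 mates) and, for comparison, the fraction of pairs that aligned concordantly, a fragment-level figure that is noticeably lower. This is only arithmetic on the numbers in the log; the function name is made up for illustration.

```python
def overall_alignment_rate(total_pairs: int, unaligned_mates: int) -> float:
    """Fraction of individual mates aligned at least once (how hisat2 reports paired data)."""
    total_mates = 2 * total_pairs
    return (total_mates - unaligned_mates) / total_mates


# Counts copied from the hisat2 log above
pairs = 29441125
mate_rate = overall_alignment_rate(pairs, unaligned_mates=2960349)
concordant_rate = (23447466 + 2791688) / pairs  # concordant exactly 1 time + >1 times

print(f"overall (per mate): {mate_rate:.2%}")        # 94.97%
print(f"concordant pairs:   {concordant_rate:.2%}")  # 89.12%
```

Since the featureCounts command below uses `-p --countReadPairs -B` (count fragments, and only pairs with both ends aligned), the per-pair view is the more comparable starting point than the per-mate 94.97%.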

And the featureCounts output:

Assigned    21949831
Unassigned_Unmapped 1138346
Unassigned_Read_Type    0
Unassigned_Singleton    891940
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 9894324
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   2531103
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    0
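As a check on what "assigned rate" means here: the percentage featureCounts prints appears to be Assigned divided by the sum of all rows in the summary (the `<output_file>.summary` file written alongside the counts). A minimal Python sketch, assuming a summary for a single BAM file with two whitespace-separated columns, as pasted above; a real `.summary` file also starts with a `Status <bam path>` header row, which the parser skips. The function name is hypothetical.

```python
SUMMARY = """\
Assigned    21949831
Unassigned_Unmapped 1138346
Unassigned_Read_Type    0
Unassigned_Singleton    891940
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 9894324
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   2531103
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    0
"""


def parse_summary(text: str) -> dict[str, int]:
    """Parse 'category count' rows, skipping the 'Status <bam>' header of a real .summary file."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0] == "Status":
            continue  # header, blank line, or a multi-BAM row this sketch does not handle
        counts[fields[0]] = int(fields[1])
    return counts


counts = parse_summary(SUMMARY)
total = sum(counts.values())
assigned_rate = counts["Assigned"] / total
print(f"total={total}, assigned={assigned_rate:.2%}")
```

For this sample that gives a total of 36,405,544 and an assigned rate of 60.29%. The total is larger than the 29,441,125 input pairs, which suggests that multi-mapping fragments are tallied once per reported alignment rather than once per fragment; if so, dividing Assigned by the input pairs instead (roughly 74.6%) is a fairer per-fragment view.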

And these are the parameters I used for featureCounts:

featureCounts -a "$annotation_file" -o "$output_file" -p --countReadPairs -B -O -T 8 "$bamfile"

Thanks.

Answer, 10 months ago

Typically, the two main reasons reads are unassigned are:

  1. The read does not overlap any feature in the annotation.
  2. The read has multiple equally good mappings.

You can check both visually in IGV to see what might be the cause.
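The two causes roughly correspond to two rows of the featureCounts summary posted in the question: Unassigned_NoFeatures (no overlapping feature) and Unassigned_MultiMapping (reads with more than one reported alignment, which featureCounts does not count unless `-M` is given). A minimal Python sketch, with the non-zero counts copied from that summary, computes each cause's share of all unassigned fragments. For this sample it comes out at about 68% for multi-mapping and about 17.5% for no feature, so multi-mapping is the main driver here.

```python
# Non-zero rows copied from the featureCounts summary in the question
summary = {
    "Assigned": 21949831,
    "Unassigned_Unmapped": 1138346,
    "Unassigned_Singleton": 891940,
    "Unassigned_MultiMapping": 9894324,
    "Unassigned_NoFeatures": 2531103,
}

# Approximate mapping from the two causes above to featureCounts counters
reasons = {
    "no overlap with annotation": "Unassigned_NoFeatures",
    "multiple equal mappings": "Unassigned_MultiMapping",
}

unassigned_total = sum(v for k, v in summary.items() if k.startswith("Unassigned_"))
shares = {reason: summary[counter] / unassigned_total for reason, counter in reasons.items()}

for reason, share in shares.items():
    print(f"{reason}: {share:.1%} of unassigned")
```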

50% might not be that low, though; it depends on how complete the annotation is and what kind of RNA was captured.


Thanks for answering. I used the GTF file from NCBI, which should be a high-quality reference annotation. I did notice that the Sequence Duplication Levels were pretty high when I ran FastQC, and I'm wondering whether that could be the reason.


"Duplication Levels were pretty high when I ran fastqc"

Probably not. In a counting experiment like RNA-seq you expect some duplication (e.g. many transcript copies from the same gene).

Are you using a matching genome sequence and annotation? You generally should not mix a genome and an annotation from different sources.

Have you checked your alignments in a viewer and are the reads piling up under exons?
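One quick, partial check for the genome/annotation mismatch raised above is to compare the sequence names in the GTF (column 1) with the `@SQ SN:` names in the BAM header: features on sequences whose names never appear in the BAM cannot receive any reads. A minimal Python sketch, assuming you have the GTF as text and the header text from `samtools view -H sample.bam`; the function names and the toy inputs are hypothetical. Matching names do not prove matching assembly versions, so still check the alignments in a viewer.

```python
def gtf_seqnames(gtf_text: str) -> set[str]:
    """Sequence names from column 1 of a GTF, skipping '#' comment/header lines."""
    names: set[str] = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def bam_header_seqnames(header_text: str) -> set[str]:
    """Sequence names from the SN: field of @SQ lines in a SAM/BAM header."""
    names: set[str] = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


# Hypothetical toy inputs: one annotated sequence is named differently in the reference
toy_gtf = (
    "#!made-up example\n"
    'NC_000001.11\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'
    'NC_000002.12\ttoy\texon\t300\t400\t.\t-\t.\tgene_id "g2";\n'
)
toy_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:NC_000002.12\tLN:2000\n"

annotated = gtf_seqnames(toy_gtf)
aligned = bam_header_seqnames(toy_header)
print("in annotation only:", sorted(annotated - aligned))
print("in BAM header only:", sorted(aligned - annotated))
```

With real files, a long "annotation only" list points to a naming or source mismatch between the GTF and the genome used to build the hisat2 index.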
