Question

FeatureCounts: Low assigned reads, can I proceed? Do I need to do another pre-processing step?

0

2.4 years ago

beginner123 • 0

I'm getting "Successfully assigned reads" of only around 30-45% from featureCounts. STAR gave me alignment rates of about 70-80%, which is good. I know that featureCounts counts how many of the reads aligned by STAR overlap known genes, so it's normal for this percentage to be lower than the STAR alignment rate. However, I don't understand how it gets this low. Is there something I could have done to prevent this after running FastQC, such as trimming? Would that increase the percentage of assigned reads? And if I don't need to change anything, how can I explain this low percentage? Can I just proceed with my analysis?

Load annotation file /mnt/storage/data/resources/genomes/hg38/hg38.ref ... ||
||    Features : 831683                                                       ||
||    Meta-features : 28278                                                   ||
||    Chromosomes/contigs : 367                                               ||
||                                                                            ||
|| Process BAM file C1.bam...                                                 ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 35409751                                                  ||
||    Successfully assigned reads : **13794555 (39.0%)**                          ||
||    Running time : 0.47 minutes                                             ||
||                                                                            ||
|| Process BAM file C2.bam...                                                 ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 33118524                                                  ||
||    Successfully assigned reads : **14166283 (42.8%)**                          ||
||    Running time : 0.45 minutes                                             ||
||                                                                            ||
|| Process BAM file S1.bam...                                                 ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 31476163                                                  ||
||    Successfully assigned reads : **13113126 (41.7%)**                          ||
||    Running time : 0.42 minutes                                             ||
||                                                                            ||
|| Process BAM file S2.bam...                                                 ||
||    Single-end reads are included.                                          ||
||    Assign reads to features...                                             ||
||    Total reads : 38025040                                                  ||
||    Successfully assigned reads : **12831852 (33.7%)**                          ||
||    Running time : 0.51 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
|| Summary of counting results can be found in file "all.counts.summary"

featureCounts LowAssignedReads STAR • 3.2k views

ADD COMMENT • link 2.4 years ago by beginner123 • 0

score 1 · Answer 1 · 2022-12-04

1

2.4 years ago

dbpzdbpz ▴ 220

The summarization file, all.counts.summary in your case, gives the numbers of reads assigned to genes/not assigned to genes for different reasons. It can give some clue on the lower-than-expected percentages of assigned reads.

ADD COMMENT • link 2.4 years ago by dbpzdbpz ▴ 220

0

This is what it gives me. What do you recommend I do? I'm new to this, so I'm not sure what my next steps should be.

Status                         C1.bam    C2.bam    S1.bam    S2.bam
Assigned                       13794555  14166283  13113126  12831852
Unassigned_Unmapped            0         0         0         0
Unassigned_MappingQuality      19433175  16759053  16007608  22902234
Unassigned_Chimera             0         0         0         0
Unassigned_FragmentLength      0         0         0         0
Unassigned_Duplicate           0         0         0         0
Unassigned_MultiMapping        0         0         0         0
Unassigned_Secondary           0         0         0         0
Unassigned_Nonjunction         0         0         0         0
Unassigned_NoFeatures          1606056   1623631   1637217   1694758
Unassigned_Overlapping_Length  0         0         0         0
Unassigned_Ambiguity           575965    569557    718212    596196

ADD REPLY • link 2.4 years ago by beginner123 • 0

1

It seems that more than 50% of the reads in each sample were not assigned because the mapping quality reported in the BAM files did not meet the mapping quality requirement in featureCounts.

Did you specify the minMQS parameter (-Q on the command line) when you ran featureCounts?
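
To see where the "more than 50%" comes from, each row of the summary can be expressed as a percentage of that sample's total reads. Below is a minimal Python sketch of that arithmetic. The counts are copied from the table posted above; the function name `summary_percentages` is just a hypothetical helper for illustration and is not part of featureCounts.

```python
# Counts copied from the all.counts.summary table in this thread
# (rows that are zero for every sample are omitted; they do not change the totals).
SUMMARY = """\
Status C1.bam C2.bam S1.bam S2.bam
Assigned 13794555 14166283 13113126 12831852
Unassigned_MappingQuality 19433175 16759053 16007608 22902234
Unassigned_NoFeatures 1606056 1623631 1637217 1694758
Unassigned_Ambiguity 575965 569557 718212 596196
"""


def summary_percentages(text):
    """Return {sample: {status: percent of that sample's total reads}}."""
    rows = [line.split() for line in text.strip().splitlines()]
    samples = rows[0][1:]  # header row: "Status" followed by one column per BAM
    counts = {sample: {} for sample in samples}
    for status, *values in rows[1:]:
        for sample, value in zip(samples, values):
            counts[sample][status] = int(value)

    result = {}
    for sample, by_status in counts.items():
        total = sum(by_status.values())  # equals "Total reads" in the log
        result[sample] = {k: 100.0 * v / total for k, v in by_status.items()}
    return result


percentages = summary_percentages(SUMMARY)
for sample, by_status in percentages.items():
    print(sample, {k: round(v, 1) for k, v in by_status.items()})
```

The same function should work on the real summary file read from disk, since it splits on any whitespace, including the tabs featureCounts writes.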

ADD REPLY • link 2.4 years ago by dbpzdbpz ▴ 220

0

This is what I ran. I set the minimum mapping quality score to 10:

featureCounts \
    -Q 10 \
    -g gene_name \
    -a /mnt/storage/data/resources/genomes/hg38/hg38.refGene.agat.gtf \
    -o all.counts \
    C1.bam C2.bam S1.bam S2.bam

ADD REPLY • link 2.4 years ago by beginner123 • 0

1

Because you specified -Q 10 in the command, featureCounts only assigns reads whose mapping quality score in the BAM files is 10 or higher. Reads with mapping quality scores lower than 10 were counted in the Unassigned_MappingQuality category of the summary file. A substantial fraction of the reads in your BAM files apparently had mapping quality scores below 10, so they were not assigned to any gene.
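
One way to check this yourself is to tally the MAPQ values (column 5 of each SAM record) and see what fraction would pass a given threshold. Below is a rough Python sketch of that tally. The helper names and the example records are made up for illustration; they are not taken from your data. As far as I know, STAR with default settings writes MAPQ 255 for uniquely mapped reads and 0 to 3 for multi-mapped reads, so the example uses those values, but treat that as an assumption to verify against your own BAM files.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Count alignments per MAPQ value (column 5 of a SAM record)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        counts[int(fields[4])] += 1
    return counts


def fraction_passing(counts, min_mapq=10):
    """Fraction of alignments with MAPQ >= min_mapq (same comparison as -Q)."""
    total = sum(counts.values())
    passing = sum(n for mapq, n in counts.items() if mapq >= min_mapq)
    return passing / total if total else 0.0


def fake_record(name, mapq):
    # Hypothetical minimal SAM record; only the MAPQ column matters here.
    return "\t".join(
        [name, "0", "chr1", "100", str(mapq), "4M", "*", "0", "0", "ACGT", "IIII"]
    )


# Illustrative input only: two "unique" alignments and three low-MAPQ ones.
EXAMPLE_SAM = [fake_record(f"read{i}", q) for i, q in enumerate([255, 255, 3, 1, 0])]

hist = mapq_histogram(EXAMPLE_SAM)
print(sorted(hist.items()))
print(fraction_passing(hist, min_mapq=10))
```

For real data, the lines could come from the text output of `samtools view C1.bam` instead of `EXAMPLE_SAM`. If nearly all alignments below 10 turn out to have MAPQ 0 to 3, that would suggest the unassigned reads are mostly multi-mapped ones.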

ADD REPLY • link 2.4 years ago by dbpzdbpz ▴ 220

0

Yes, indeed. Can I continue with this, or should I lower my mapping quality threshold? What does this low percentage mean for my downstream analysis?