Counting gene vs exons in featureCounts
0
2
Entering edit mode
2.2 years ago
Meisam ▴ 250

I know that for RNASeq the default option is to count reads based on exons (-t 'exon' in case of counting with featureCounts), and based on my understanding from previous questions like here and here and -poorly described- subread documentation I have realised that by counting the reads based on genes (-t 'gene') we would end up including transcripts aligned to intron regions to our final count table for each gene as well. If this is correct I would assume that by changing 'exon' to 'gene' one should end up with more counts for most genes (both cases we summarise at gene level or -g 'gene_id') and overall more read counts, however it seems that by setting feature type to 'gene' we actually end up with less reads assigned:

analysing a test BAM file with:

featureCounts -p --countReadPairs -M -T 8 \
        -t 'gene' \
        -g 'gene_name' \
        -a gencode.v38.annotation.gtf \
        -o test.txt \
        test.bam

would result in:


//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file gencode.v38.annotation.gtf ...                        ||
||    Features : 60649                                                        ||
||    Meta-features : 59385                                                   ||
||    Chromosomes/contigs : 25                                                ||
||                                                                            ||
|| Process BAM file test.bam...                                               ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 30471464                                             ||
||    Successfully assigned alignments : 24302041 (79.8%)                     ||
||    Running time : 0.29 minutes                                             ||

while analysing the same BAM file with same arguments except for default exon feature type:

featureCounts -p --countReadPairs -M -T 8 \
        -t 'exon' \
        -g 'gene_name' \
        -a gencode.v38.annotation.gtf \
        -o test.txt \
        test.bam

results in 6% much more assigned reads:

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file gencode.v38.annotation.gtf ...                        ||
||    Features : 1499012                                                      ||
||    Meta-features : 59385                                                   ||
||    Chromosomes/contigs : 25                                                ||
||                                                                            ||
|| Process BAM file test.bam...                                               ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 30471464                                             ||
||    Successfully assigned alignments : 26053717 (85.5%)                     ||
||    Running time : 0.25 minutes                                             ||

Could someone please explain to me what I'm missing here?

RNA-Seq subread annotation featureCounts • 2.2k views
ADD COMMENT
0
Entering edit mode

Yeah , it is expected right. when you summarise against gene exon-exon spanning reads are counted once but in the exon level you count as twice. Since the split reads originated from two exons. Check the junction reads please.

ADD REPLY

Login before adding your answer.

Traffic: 1857 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6