Question

Read counts generated from featureCounts

3.8 years ago

basuanubhav ▴ 140

Hey all, I have a question about the output generated by featureCounts. What would be the difference in output between the following commands:

featureCounts -p -s 2 -F GTF -t exon -a annotation.gtf -o counts.txt sample.bam 

featureCounts -p -s 2 -F GTF -t gene -a annotation.gtf -o counts.txt sample.bam

From what I understand, the second command will only read the lines in the GTF which have 'gene' in the 3rd column while the first command will count the exon lines. Also, the sum of the reads for all exons of a gene should be the same as the number of reads for the gene as a whole. So, should the output file have the same counts for both?

Thanks,

featureCounts • 1.4k views

ADD COMMENT • link updated 3.8 years ago by Carlo Yague 8.9k • written 3.8 years ago by basuanubhav ▴ 140

score 4 · Accepted Answer · 2021-02-04

3.8 years ago

Carlo Yague 8.9k

From what I understand, the second command will only read the lines in the GTF which have 'gene' in the 3rd column while the first command will count the exon lines.

This is correct. In both cases the counts are aggregated per gene_id meta-feature (the default for option -g): with -t gene this is a 1-to-1 relationship (one gene line per gene_id), and with -t exon it is an n-to-1 relationship (several exon lines per gene_id).
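To make the 1-to-1 versus n-to-1 grouping concrete, here is a minimal Python sketch. It is not featureCounts code: the annotation records, gene names and the helper `features_per_meta_feature` are all hypothetical, and only the idea of grouping features of one type by gene_id is taken from the answer above.

```python
from collections import defaultdict

# Hypothetical toy annotation (an assumption for illustration only):
# (feature_type, start, end, gene_id), coordinates 1-based and inclusive.
ANNOTATION = [
    ("gene", 100, 900, "geneA"),
    ("exon", 100, 200, "geneA"),
    ("exon", 400, 500, "geneA"),
    ("exon", 800, 900, "geneA"),
    ("gene", 2000, 2300, "geneB"),
    ("exon", 2000, 2300, "geneB"),
]


def features_per_meta_feature(annotation, feature_type):
    """Keep only rows of one feature type and group them by gene_id.

    This mimics, in spirit, selecting rows with -t <feature_type> and
    aggregating them on the gene_id attribute (the default for -g).
    """
    groups = defaultdict(list)
    for ftype, start, end, gene_id in annotation:
        if ftype == feature_type:
            groups[gene_id].append((start, end))
    return dict(groups)


gene_groups = features_per_meta_feature(ANNOTATION, "gene")
exon_groups = features_per_meta_feature(ANNOTATION, "exon")

# Number of features feeding each gene_id meta-feature.
print({g: len(f) for g, f in gene_groups.items()})
print({g: len(f) for g, f in exon_groups.items()})
```

With the gene rows each gene_id is backed by exactly one interval, while with the exon rows geneA is backed by three intervals that all feed the same counter.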

Also, the sum of the reads for all exons of a gene should be the same as the number of reads for the gene as a whole.

This is not necessarily true. -t gene will also count reads mapped to introns. Usually these account for only a small share of the reads because introns are mostly spliced out, but there are many cases of intron retention, and in some species (such as mammals) the average intron is much longer than the average exon. Also, because of splicing dynamics, unspliced or partially spliced pre-mRNA can end up being sequenced. So in general, you should not expect -t exon and -t gene to produce the same counts.
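The effect of intronic reads can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the coordinates, the reads, and the simplified rule that a read counts once for the gene if it overlaps any selected feature by at least one base. Strandedness, paired-end handling and spliced alignments are ignored.

```python
def overlaps(read, feature):
    """True if two closed intervals (start, end) share at least one base."""
    return read[0] <= feature[1] and feature[0] <= read[1]


# Hypothetical gene with three exons and two introns in between.
GENE = (100, 900)
EXONS = [(100, 200), (400, 500), (800, 900)]

# Hypothetical reads as (start, end) alignments on the same chromosome.
READS = [
    (120, 170),  # inside exon 1
    (180, 230),  # starts in exon 1, runs into intron 1
    (250, 300),  # entirely inside intron 1
    (600, 650),  # entirely inside intron 2
    (820, 870),  # inside exon 3
]


def count_reads(reads, features):
    """Count each read once if it overlaps at least one of the features."""
    return sum(any(overlaps(r, f) for f in features) for r in reads)


exon_count = count_reads(READS, EXONS)   # analogous to -t exon
gene_count = count_reads(READS, [GENE])  # analogous to -t gene

print("exon-based count:", exon_count)
print("gene-based count:", gene_count)
```

In this toy data the two purely intronic reads are counted only when the whole gene interval is used, so the gene-based count (5) is higher than the exon-based count (3).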

ADD COMMENT • link 3.8 years ago by Carlo Yague 8.9k

Ah, thanks for the nice explanation. I hadn't considered that reads aligning to introns could skew the results. So, ideally, for an RNA-seq experiment, I would want to use -t exon, correct?

ADD REPLY • link 3.8 years ago by basuanubhav ▴ 140

Yes, definitely, in most cases!

ADD REPLY • link 3.8 years ago by Carlo Yague 8.9k

Thanks a lot, cheers!

ADD REPLY • link 3.8 years ago by basuanubhav ▴ 140

PS: unrelated to your question, but we usually do not use the -O option (capital O, which assigns a read to every feature it overlaps), because it is often better to discard reads that cannot be unambiguously assigned, at least for differential expression analysis.
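As a rough illustration of what discarding ambiguous reads means compared with the -O behaviour (capital O), here is a toy Python sketch. The gene intervals, the reads and the `assign` helper are hypothetical; the sketch only models the rule that a read overlapping more than one gene is either dropped or counted for every gene it overlaps.

```python
def overlaps(read, feature):
    """True if two closed intervals (start, end) share at least one base."""
    return read[0] <= feature[1] and feature[0] <= read[1]


# Two hypothetical genes whose annotations overlap between 450 and 500.
GENES = {"geneA": (100, 500), "geneB": (450, 900)}

# Hypothetical reads: one per gene, plus one in the shared region.
READS = [(120, 170), (460, 490), (600, 650)]


def assign(reads, genes, allow_multi_overlap=False):
    """Assign reads to genes.

    allow_multi_overlap=False: a read hitting several genes is left
    unassigned (counted as ambiguous).
    allow_multi_overlap=True: such a read is counted once for each gene
    it overlaps, which is the idea behind -O.
    """
    counts = {g: 0 for g in genes}
    ambiguous = 0
    for read in reads:
        hits = [g for g, interval in genes.items() if overlaps(read, interval)]
        if len(hits) == 1 or (hits and allow_multi_overlap):
            for g in hits:
                counts[g] += 1
        elif len(hits) > 1:
            ambiguous += 1
    return counts, ambiguous


strict_counts, strict_ambiguous = assign(READS, GENES)
multi_counts, multi_ambiguous = assign(READS, GENES, allow_multi_overlap=True)

print(strict_counts, strict_ambiguous)
print(multi_counts, multi_ambiguous)
```

In the strict mode the read in the shared region is set aside as ambiguous, whereas in the multi-overlap mode it raises the count of both genes, so a single fragment contributes twice to the count table.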

ADD REPLY • link 3.8 years ago by Carlo Yague 8.9k

Yes, absolutely! I don't use the -O option either. The -o I used is just to define the output file.

ADD REPLY • link 3.8 years ago by basuanubhav ▴ 140