Hello,
I have a question regarding featureCounts behavior.
Assume my GTF file has 1 gene, comprising 2 transcripts (G1T1, G1T2), with a total of 3 exons (E1,E2,E3):
G1T1 consists of exons E1 and E2, while G1T2 consists of exons E3 and E2.
Assume my SAM file has 4 reads, as shown below.
- R1 overlaps E1;
- R2 is spliced between E1 and E3;
- R3 is spliced between E1 and E2;
- R4 is spliced between E3 and E2.
|____E1____| |____E2____| Transcript G1T1
|____E3____| |____E2____| transcript G1T2
R1------>
R2---------....--->
R3---------....................---->
R4-----.....---->
- When using featureCounts with option
-g transcript_id
, only R1 is assigned and the counts for G1T1 is 1. - When using featureCoutns with option
-g transcript_id -O
, all reads are assigned and the counts for G1T1 is 4 (R1,R2,R3,R4), whereas the count for G1T2 is 3 (R2,R3,R4).
I understand the logic behind allowing multiply overlapping features (-O
), however I would expect that at the transcript level R3 is uniquely assigned to G1T1 and R4 uniquely assigned to G1T2. In other words, I would expect that even though the parts of a spliced read are mapped to multiple features, when the reads are counted at a level where a distinction between metafeatures is possible, this distinction is used to uniquely assign the reads.
I feel that without "-O", the counts are unnecessarily low and with the option -O
the counts are unnecessarily inflated. What am I missing here?
Thanks in advance!