featureCounts with exon_id has counts backwards for reversely stranded genes
0
0
Entering edit mode
2.4 years ago
ksh4718 ▴ 10

Hello. I am trying to count at the exon level using Subread's (v1.6.3) featureCounts for my RNA-seq data (reversely stranded library prep). I noticed that for reversely stranded genes, the exon counts are backwards (i.e. for a 22 exon gene, exon 1 is actually exon 22, exon 2 is exon 21, and so on). Is there a way to fix this?

Here is my code [from part of a loop so $i is my file and $gtf is the path to the gtf file with exon_id:

featureCounts   -p \
                -F GTF \
                -s 2 \
                -a $gtf \
                -O \
                -t exon -g exon_id \
                -o counts/$i'_counts.txt' \
                ../aligned/$i'_mapped.sam'

Attached is a photo of IGV for an example gene and the output counts for this gene. You can see that I expect 0 counts for exons 2-4 and many counts for exon 22 but the data shows it backwards.

IGV screenshot: Thbs2-IGV-screenshot

Counts for the transcript in IGV screenshot: Thbs2-counts

Any guidance is appreciated. I want to ultimately do differential expression at the exon level so summarizing by gene is not an option.

subread exons rna-seq featureCounts • 1.2k views
ADD COMMENT
0
Entering edit mode

Wouldn't you be comparing exon expression between samples but still in the same gene? That should take care of possible problems with the exon numbering convention. Exons are functional at mRNA level, which is why the exon with the most 3' genomic position gets named the first exon, as it's the most 5' in the pre-mRNA. As long as feature IDs are unique and consistent across samples, differential expression should not be affected by the name any feature is given.

ADD REPLY
0
Entering edit mode

@Ram Yes that's true. I thought it would be more clear to present 'exon 1' as the first exon in the mRNA (closest to the start codon)

ADD REPLY
0
Entering edit mode

Exon 1 is still closest to the start codon no matter what the orientation is. The whole purpose of "reverse numbering" exons is to name the exon closest to start codon as exon-1. Am I misunderstanding something?

ADD REPLY
0
Entering edit mode

In the table that I uploaded, ENSMUST00000170872.2.22 is actually exon 1 and ENSMUST00000170872.2.1 is actually Exon 22. This is my issue.

ADD REPLY
1
Entering edit mode

Ah I see. This is going to be a little tricky. You should start by getting a list of genes on the minus strand, then use R to create lookup tables that contain number of exons per transcript. With that, you can do something like corrected_exon_number = abs(current_exon_number - total_exons) + 1 to generate new IDs for the exons.

ADD REPLY
0
Entering edit mode

That's a good idea. Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 4205 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6