Question

Feature Counts - potential bug with outfile.jcounts

0

Entering edit mode

7 months ago

chrisk • 0

Greetings Biostars community and those familiar with Feature Counts,

We have one RNA sequencing, paired-end, 150bp file for which we estimate gene and junction counts using Feature Counts.

Feature Counts version: 2.0.8

Feature Counts code:

$FCOUNTS -a GCF_000001405.40_GRCh38.p14_genomic.gtf -G GCF_000001405.40_GRCh38.p14_genomic.fna  -T 8 -J -M -s 0 -p --countReadPairs --largestOverlap -o outfile infile.bam

We have documented the following instance in a junction counts output file (outfile.jcounts) :

Genes:

PNKD TMBIM1

Situation (see attached picture in IGV):

PNKD gene overlaps TMBIM1 gene, and all transcripts.
All TMBIM1 transcripts are in the PNKD intron.
All TMBIM1 junction reads ~1000 are splitting between PNKD and TMBIM1 for multiple intron junctions.
In outfile.jcounts PNKD is the primary gene, however has no exons located in this region.
When we remove the PNKD gene from the GTF and recount with Feature Counts, the reads are assigned perfectly to TMBIM1, as the primary gene with NA as secondary gene, as it should.
A check of gene counts using exon as feature correctly assigns counts to genes. This appears to be only a junction counting issue.
The .gtf and .fna are Refseq 1405.40 files (as noted above) and are untouched.
We use this as a simple verification of raw junction counts before analysing PSI.
As you will also note, the reverse is also true with PNKD exon-exon junctions sharing read counts with a partially overlapped gene CATIP-AS2.
We have encountered others and believe this may be a widespread event.

Here is the outfile.jcounts of that region:

PNKD    NA  NC_000002.12    218270602   +   NC_000002.12    218271381   +   17
PNKD    NA  NC_000002.12    218270602   +   NC_000002.12    218271424   +   4
PNKD    NA  NC_000002.12    218271549   +   NC_000002.12    218272570   +   14
PNKD    TMBIM1  NC_000002.12    218275621   -   NC_000002.12    218276026   -   74
PNKD    TMBIM1  NC_000002.12    218276079   -   NC_000002.12    218277004   -   46
PNKD    TMBIM1  NC_000002.12    218277099   -   NC_000002.12    218277366   -   90
PNKD    TMBIM1  NC_000002.12    218277453   -   NC_000002.12    218277633   -   96
PNKD    TMBIM1  NC_000002.12    218277670   -   NC_000002.12    218277935   -   73
PNKD    TMBIM1  NC_000002.12    218277974   -   NC_000002.12    218278515   -   123
PNKD    TMBIM1  NC_000002.12    218277992   -   NC_000002.12    218278515   -   1
PNKD    TMBIM1  NC_000002.12    218278118   -   NC_000002.12    218278515   -   2
PNKD    TMBIM1  NC_000002.12    218278565   -   NC_000002.12    218279038   -   128
PNKD    TMBIM1  NC_000002.12    218279091   -   NC_000002.12    218279289   -   101
PNKD    TMBIM1  NC_000002.12    218279091   -   NC_000002.12    218280026   -   6
PNKD    TMBIM1  NC_000002.12    218279353   -   NC_000002.12    218280026   -   109
PNKD    TMBIM1,MIR6513  NC_000002.12    218280126   -   NC_000002.12    218281940   -   26
PNKD    TMBIM1,MIR6513  NC_000002.12    218280126   NA  NC_000002.12    218292466   NA  34
PNKD    TMBIM1  NC_000002.12    218282181   -   NC_000002.12    218284039   -   1
PNKD    TMBIM1  NC_000002.12    218282181   -   NC_000002.12    218285763   -   1
PNKD    TMBIM1  NC_000002.12    218282181   -   NC_000002.12    218287207   -   1
PNKD    TMBIM1  NC_000002.12    218282181   NA  NC_000002.12    218292466   NA  71
PNKD    LOC105373881    NC_000002.12    218316364   +   NC_000002.12    218385147   +   3
PNKD    LOC105373880,LOC105373881   NC_000002.12    218316883   +   NC_000002.12    218317955   +   2
PNKD    CATIP-AS2   NC_000002.12    218323428   NA  NC_000002.12    218341591   NA  5
PNKD    CATIP-AS2   NC_000002.12    218323431   +   NC_000002.12    218339783   +   52
PNKD    CATIP-AS2   NC_000002.12    218339898   +   NC_000002.12    218340029   +   27
PNKD    CATIP-AS2   NC_000002.12    218340141   +   NC_000002.12    218340728   +   8
PNKD    CATIP-AS2   NC_000002.12    218340786   +   NC_000002.12    218341534   +   5
PNKD    CATIP-AS2   NC_000002.12    218341626   +   NC_000002.12    218341981   +   12
PNKD    CATIP-AS2   NC_000002.12    218342144   +   NC_000002.12    218343500   +   12
PNKD    CATIP-AS2   NC_000002.12    218343586   +   NC_000002.12    218344455   +   15
PNKD    CATIP-AS2   NC_000002.12    218344570   +   NC_000002.12    218344808   +   28

Has anyone encountered this before and know where things have gone wrong? It would be great to hear from Prof Smyth on this as well, just in case we have missed something.

Thanks in advance, Chris.

IGV_PNKD_TMBIM1

FeatureCounts • 1.4k views

ADD COMMENT • link 7 months ago by chrisk • 0

0

Entering edit mode

These two genes are on opposite strands. Based on your command you appear to have unstranded data or are doing the counting with that setting (-s 0). If you have strand specific RNA libraries you should use the correct option.

2       ensembl_havana  gene    218269651       218346793       .       +       .       gene_id "ENSG00000127838"; gene_version "15"; gene_name "PNKD"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
2       ensembl_havana  gene    218274197       218292586       .       -       .       gene_id "ENSG00000135926"; gene_version "15"; gene_name "TMBIM1"; gene_source "ensembl_havana"; gene_biotype "protein_coding";

ADD REPLY • link 7 months ago by GenoMax 152k

0

Entering edit mode

Thanks for the comment @GenoMax. We too note, those genes are on opposite strands. For context, we have an unstranded library. Cheers, Chris.

ADD REPLY • link 7 months ago by chrisk • 0

0

Entering edit mode

Tagging Gordon Smyth

ADD REPLY • link 7 months ago by GenoMax 152k

0

Entering edit mode

Thank you GenoMax for tagging Gordon Smyth to this post. Looking forward to his feedback (as well as the biostars' community), on what might be occuring.

Regards, Chris

ADD REPLY • link 7 months ago by chrisk • 0

1

Entering edit mode

I am not the appropriate person to respond to a detailed question of this type. If you want to submit a (potential) bug report to the featureCounts authors, please post to the Bioconductor forum, as the Subread documentation advises you to do. Biostars is for getting opinions/experience from the Biostats community, not for communicating with the featureCounts authors.

ADD REPLY • link 7 months ago by Gordon Smyth ★ 8.2k

0

Entering edit mode

Thanks for the note @gordonsmyth.

Will open this up as a potential issue on Bioconductor as advised. I was hesitant to post on Bioconductor because I still I have an outstanding query for Feature Counts : https://support.bioconductor.org/p/9156945/

Here is the Bioconductor link for the query if anyone interested in following the thread : https://support.bioconductor.org/p/9160807/

Chris

ADD REPLY • link 7 months ago by chrisk • 0

0

Entering edit mode

What do the alignments say for these read pairs?

Is it possible that you have read pairs that start on features assigned to PNKD then end on those that are assigned to TMBIM1?

My expectation was always that featureCounts tabulates what exists in the alignments, and if there are reads that map across incompatible features then it will report those as such,

ADD REPLY • link 7 months ago by Istvan Albert 102k

0

Entering edit mode

Hi Istvan Albert

Thank you for the feedback.

Sorry, it is hard to see the span of reads on the IGV screenshot.

Here is an alternative shot; viewed as pairs, and sorted by start coordinate.

The paired end reads all start and end within the gene region of TMBIM1. The paired end reads over junctions are within the intronic region of 2 of PNKD's 5 transcripts. The 3 other transcripts are completely up/downstream of the TMBIM1 gene domain.

When we tested count calculation at gene level, (where the gene is meta, and exon is feature) the counts on genes look correct. There are ~1000 reads to count unfortunately.

When we remove the PNKD gene from the GTF and rerun Feature Counts, those junction reads are assigned perfectly to TMBIM1, as the primary gene, with NA as the secondary gene.

Have you encountered this before?

When we count the same file for junctions with Regtools, those junctions are given the correct counts, and PNKD does not have any junctions counted in that region (as expected).

As Gordon suggested, will move this to Bioconductor. Link is in the reply to Gordon if you are interested.

Cheers, Chris

PNKD_TMBIM1_closeup

ADD REPLY • link 7 months ago by chrisk • 0