Question

featureCounts: difference between assigned reads in the summary file and summed reads in the count matrix

3.6 years ago

Carambakaracho ★ 3.3k

Dear all,

this might be a naive question, but my Google-fu fails me. I start from a BAM aligned with STAR against a custom hg19 genome and processed with Picard MarkDuplicates, and I count reads assigned to exons using a slightly customized version of the NCBI reference annotation GFF. The customization is mostly adaptation to the genome and propagation of some tags from gene level to exon level.

I count the reads with featureCounts, where ${bam} expands to all the BAMs in the pipeline. There are quite a lot of transcript variants per gene, and multimapping is allowed on purpose; I wanted to account for both with --fraction.

featureCounts \
    -p -f -T 4 \
    -O -M --fraction \
    -a hg19.gff \
    -F GTF \
    -t "exon" \
    -g "ID" \
    -s 2 \
    --extraAttributes "toplevel_id,gene,transcript_id,GeneID,gbkey,gene_biotype,description,tag" \
    -o out.tsv \
    ${bam}

When I sum up the assigned counts in the count table, the totals differ from what the corresponding .summary file reports as assigned reads. Is this a known / expected side effect of fractional counting with multiple exons and multimapping?
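For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how the two totals could be computed side by side. It assumes the usual featureCounts layout: a leading `#` comment line in the count table, a `Status` header row in the `.summary` file, and identical per-BAM column names in both files. The function names (`read_assigned`, `sum_counts`, `compare`) are made up for this illustration and are not part of featureCounts. As far as I read the help text, `--fraction` gives each alignment of a multi-mapper a weight of 1/x and each overlapping feature 1/y, so a gap between the two totals would not be surprising, but that interpretation is an assumption rather than something confirmed by the developers.

```python
import csv


def read_assigned(summary_lines):
    """Return {sample: value} taken from the 'Assigned' row of a .summary file."""
    rows = csv.reader(summary_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(rows)  # "Status", then one column per BAM
    for row in rows:
        if row and row[0] == "Assigned":
            return {name: float(value) for name, value in zip(header[1:], row[1:])}
    raise ValueError("no 'Assigned' row found in summary")


def sum_counts(table_lines, samples):
    """Sum each sample column of the count table, skipping '#' comment lines."""
    data = (line for line in table_lines if not line.startswith("#"))
    rows = csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE)
    totals = dict.fromkeys(samples, 0.0)
    for row in rows:
        for name in samples:
            # Counts are floats when --fraction is used.
            totals[name] += float(row[name])
    return totals


def compare(table_lines, summary_lines):
    """Return {sample: (matrix_total, summary_assigned, difference)}."""
    assigned = read_assigned(summary_lines)
    totals = sum_counts(table_lines, assigned)
    return {s: (totals[s], assigned[s], totals[s] - assigned[s]) for s in assigned}
```

With the command above, this could be called as `compare(open("out.tsv"), open("out.tsv.summary"))`. Sample columns are looked up by name rather than by position, so the extra attribute columns added by `--extraAttributes` do not get in the way.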

RNAseq featureCounts • 1.5k views

I'd say that this is a subtle and specialized use case that only the implementer knows for sure. It feels like one of those issues where the various definitions of terms can be reconciled in multiple ways, and the different reports use them differently.

Perhaps asking on the software's issue tracker would be more appropriate.

FWIW, I would sort of ignore the discrepancy as something "expected".

3.6 years ago by Istvan Albert 102k

Thanks, that sort of matches my gut feeling. Except for the “write-the-developer” part, obviously…

3.6 years ago by Carambakaracho ★ 3.3k