how to calculate gene length for TPM
4 days ago
adi.gershon1 ▴ 10

Hi, I have raw counts from featureCounts and want to calculate TPM. For that I need the gene length, and I'm not sure which definition is correct: the total genomic span of the gene, or the sum of its non-overlapping exonic regions?

For example, is it OK to run this command on a GTF file?

    awk '$3 == "exon" { match($0, /gene_id "([^"]+)"/, gene); gene_id = gene[1]; len[gene_id] += $5 - $4 + 1 } END { for (id in len) print id, len[id] }' gencode.v34.chr_patch_hapl_scaff.annotation.gtf > gene_lengths_cleaned.tsv

counts tpm reads • 220 views
4 days ago
ATpoint 86k

featureCounts already reports a gene length (the Length column of its output), so use that. No naive approach does better. This post explains how it is computed: https://support.bioconductor.org/p/88133/#88135
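To make the difference concrete, here is a minimal Python sketch, not featureCounts' actual code. It assumes the reported length is the number of exonic bases of a gene with overlapping bases counted once, and that TPM is computed as the length-normalized count rescaled to sum to one million. The function names `union_exon_length` and `tpm` are made up for this illustration.

```python
def union_exon_length(exons):
    """Bases covered by a gene's exons, counting overlapping bases once.

    `exons` is a list of (start, end) pairs, 1-based and inclusive as in GTF.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # No overlap with the current block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current block instead of adding bases twice.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def tpm(counts, lengths):
    """TPM per gene from raw counts and gene lengths (dicts keyed by gene ID)."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    scale = sum(rates.values())  # assumes at least one non-zero count
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


# Two transcripts of one gene share bases 51-100.
exons = [(1, 100), (51, 150)]
naive = sum(end - start + 1 for start, end in exons)  # sums every exon line
merged = union_exon_length(exons)                     # counts each base once
```

Summing every exon line, as the awk command in the question does, counts an exon once per transcript that contains it, so genes with many isoforms end up with inflated lengths. Merging the intervals first avoids that.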

The "more correct" way is the one that tools like salmon use. They estimate the expression of each transcript of a gene and then, in combination with something like tximport, give you the average length of all expressed transcripts. You don't have that with featureCounts, and no naive method can easily mimic it. So just use what featureCounts gives you.
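As a rough illustration of the idea only, the sketch below computes an abundance-weighted average transcript length for one gene in one sample. This is not tximport's code: the function name `weighted_gene_length` is hypothetical, the input numbers are invented, and the fallback for a gene with zero abundance is an assumption of this sketch.

```python
def weighted_gene_length(transcripts):
    """Average transcript length of one gene, weighted by abundance.

    `transcripts` is a list of (length, abundance) pairs for a single gene
    in a single sample, e.g. abundance = TPM estimated per transcript.
    """
    total_abundance = sum(abundance for _, abundance in transcripts)
    if total_abundance == 0:
        # Assumption of this sketch: fall back to the plain mean length
        # when no transcript of the gene is expressed.
        return sum(length for length, _ in transcripts) / len(transcripts)
    weighted_sum = sum(length * abundance for length, abundance in transcripts)
    return weighted_sum / total_abundance


# Invented example: the short isoform carries 90% of the expression.
gene = [(1000, 90.0), (3000, 10.0)]
length_in_sample = weighted_gene_length(gene)
```

The resulting length changes from sample to sample depending on which isoforms are expressed. That per-transcript information is not available from gene-level counts, so a single fixed length per gene cannot reproduce it.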
