What are some references of works that use a TPM expression threshold for filtering samples/genes?
4.9 years ago
n,n ▴ 370

I'm struggling to find papers that use transcripts per million (TPM) in their pre-processing steps to filter out non-expressed or very lowly expressed genes. I'm aware that filtering is usually recommended on raw read counts, since they provide more information for the decision, but sometimes the raw read counts are not available. I'm mostly interested in what authors consider expressed (say, a TPM of at least 1 or at least 5) and what they consider a lowly expressed gene (say, x percent of a gene's TPM values across samples fail the expression criterion). I know the heuristic that TPM = 5 corresponds roughly to 1 transcript being present in a cell at any given time, but I haven't seen it stated in any citable work.
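To make the "x percent of samples" criterion concrete, here is a minimal Python sketch. The function names, the toy TPM values, and the cutoffs (`min_tpm=1.0`, `min_fraction=0.5`) are all illustrative assumptions, not values taken from any publication.

```python
def fraction_expressed(tpms, min_tpm=1.0):
    """Return the fraction of samples in which the gene reaches min_tpm."""
    if not tpms:
        raise ValueError("need at least one sample")
    return sum(1 for value in tpms if value >= min_tpm) / len(tpms)


def keep_gene(tpms, min_tpm=1.0, min_fraction=0.5):
    """Keep a gene if enough samples meet the (assumed) TPM cutoff."""
    return fraction_expressed(tpms, min_tpm) >= min_fraction


# Toy data: gene -> TPM per sample (hypothetical values).
tpm_table = {
    "geneA": [0.0, 0.2, 0.1, 0.0, 0.4],      # never reaches 1 TPM
    "geneB": [0.3, 1.5, 2.0, 1.2, 0.1],      # reaches 1 TPM in 3 of 5 samples
    "geneC": [12.0, 8.5, 20.1, 15.3, 9.9],   # expressed everywhere
}

kept = [gene for gene, tpms in tpm_table.items() if keep_gene(tpms)]
print(kept)
```

Both parameters are free choices; changing `min_tpm` to 5 or `min_fraction` to another value gives the other variants mentioned above.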

So far I've managed to find this article, which investigates tibial nerve samples available in the GTEx project. They filter out genes with a median TPM less than 0.5 or a maximum TPM less than 1 across samples. The GTEx project is a good example of a situation where you would want to filter by TPM, since the raw read counts have already been processed to a high standard and researchers may start directly from the TPMs. Does anyone know of more papers in which filtering is applied directly to TPM values?
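The filter described above (drop a gene when its median TPM is below 0.5 or its maximum TPM is below 1) can be sketched in a few lines of Python. Only the two cutoffs come from the description; the function name and the toy values are hypothetical, and this is not the article's actual code.

```python
from statistics import median


def passes_tpm_filter(tpms, min_median=0.5, min_max=1.0):
    """Keep a gene only if median TPM >= min_median and max TPM >= min_max."""
    return median(tpms) >= min_median and max(tpms) >= min_max


# Toy data: gene -> TPM per sample (hypothetical values).
tpm_table = {
    "gene_low":   [0.0, 0.1, 0.2, 0.3, 0.0],  # low median and low max
    "gene_spiky": [0.0, 0.0, 0.1, 0.2, 9.0],  # high max, but median 0.1
    "gene_flat":  [0.6, 0.7, 0.8, 0.9, 0.7],  # median 0.7, but max below 1
    "gene_ok":    [0.4, 0.6, 1.2, 3.0, 0.9],  # median 0.9, max 3.0
}

kept = [gene for gene, tpms in tpm_table.items() if passes_tpm_filter(tpms)]
print(kept)
```

Requiring both conditions means a single outlier sample cannot rescue a gene, and a gene that is uniformly just above the median cutoff is still dropped if it never reaches 1 TPM.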

RNA-Seq • 3.4k views

If I recall correctly, TPM is one of the main outputs of kallisto. Maybe try looking at papers that use kallisto.


Related discussion: TPM values of expressed genes

4.8 years ago
dsull ★ 7.0k

In my opinion, it's impossible to reliably determine what exactly is non-expressed or lowly expressed without further experiments (e.g. those including spike-ins). We only have heuristics, which are probably extrapolated from previously published works or experiments (often without citation), but there is no golden rule, and I can't think of any studies that reliably validate these heuristics.

See the following blog post (from the kallisto author) for a discussion of where the idea of using such thresholds might have arisen: https://liorpachter.wordpress.com/2014/04/30/estimating-number-of-transcripts-from-rna-seq-measurements-and-why-i-believe-in-paywall/
