Threshold for a value of RPKM considered significant
17 months ago
Chris ▴ 340

Hi all,

My lab doesn't have the money for biological replicates for bulk RNA-seq, so I can't perform differential expression (DEG) analysis and can only get an RPKM table. I found this question asked before, but I am not sure whether there has been any update since then. What RPKM threshold is used to consider a gene expressed? Are there cases where we can't say a gene is expressed even though it has a high RPKM value? Thank you so much!

Minimum Or Optimal Rpkm Value To Find If A Transcript Is Significant

RNA-seq • 2.2k views

Thank you for sharing! So a transcript needs an RPKM value of at least 0.5, and any gene with an RPKM value below 0.5 is considered not expressed, is that correct?

Let's just say that it's a first order approximation ;)

Could a single transcript in a cell affect a pathway or change the function of the cell?

A single transcript copy per cell could possibly have regulatory activity in cis. We see this with some lncRNAs.

Of course, keep in mind the kinetic effects too: an RNAseq snapshot might show you, on average, one copy per cell. But transcription isn't that simple. Take transcriptional bursting into account: a transcript might have a large burst size but a very low burst frequency.

Thank you so much! I am surprised that a single molecule can change a phenotype.

It really depends, and this is the problem. There is imo no meaningful way to make any statements based only on a table of RPKMs. These thresholds are arbitrary. You will easily get hundreds to thousands more "expressed" genes depending on whether you take an RPKM of 0.5, 0.1 or 1, and on how you calculate RPKM, without any of these thresholds being better or worse in a justifiable way, because there is no direct justification beyond the observation that a ranked RPKM curve may show some sort of inflection at this point. The cell doesn't care about inflection points. The density of genes around these thresholds is high, so you easily get more or fewer, depending on the cutoff. Some meaningful genes can work at low expression levels, and others might not translate into biology at high expression levels.

There is really no reliable way to infer anything in the way you're trying to. You need a precise question to answer. Blind OMICS exploration might work if you have lots of data and simply want to find and describe some patterns in the data, but this again is barely possible with unreplicated experiments. Genes that are not highly expressed especially tend to have more variability and as such require replication to assess the reliability of expression patterns.

Not sure what question you're trying to answer, but it reads to me like you're blindly going down a rabbit hole and will end up chasing ghosts. What is the question you want to answer with this RPKM table? Not what you want to hear, but I wonder why you even do experiments that you cannot afford. What if a reviewer asks for replication at some point, how will you tackle such a request?
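To make the point about arbitrary cutoffs concrete, here is a minimal Python sketch that counts how many genes would be called "expressed" at each of the three cutoffs mentioned above. The gene names and RPKM values are made-up toy numbers, not real data, and `count_expressed` is a hypothetical helper, not part of any package.

```python
from typing import Dict, Iterable


def count_expressed(rpkm_by_gene: Dict[str, float],
                    cutoffs: Iterable[float]) -> Dict[float, int]:
    """For each cutoff, count the genes whose RPKM is at or above it."""
    return {
        cutoff: sum(1 for value in rpkm_by_gene.values() if value >= cutoff)
        for cutoff in cutoffs
    }


# Hypothetical toy values, chosen only to illustrate the effect.
toy_rpkm = {
    "geneA": 0.05, "geneB": 0.12, "geneC": 0.3, "geneD": 0.45,
    "geneE": 0.6, "geneF": 0.9, "geneG": 1.5, "geneH": 25.0,
}

counts = count_expressed(toy_rpkm, [0.1, 0.5, 1.0])
for cutoff, n_genes in counts.items():
    print(f"RPKM >= {cutoff}: {n_genes} genes called 'expressed'")
```

Because many genes sit close to these cutoffs, the count shifts a lot with the choice of threshold, and nothing in the table itself says which choice is right.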

Thanks ATpoint for your detailed reply! The purpose of the experiment is to compare gene expression between 2 conditions, and depending on the result, the PI will decide whether to spend more money on replicates. So I can't say anything reliable about gene expression using an RPKM table?

If you're comparing gene expression, why are you even using RPKM? No one uses that anymore, which makes me concerned about what software you're using to generate your gene expression values. And why are you even trying to use thresholding?

In any case, do a search: there are plenty of answered questions on this site about how to do "exploratory" analyses when you don't have replicates.
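For readers unsure how RPKM and TPM differ, a minimal Python sketch of the two standard formulas follows. It assumes a gene-level raw count table and one length in base pairs per gene; the function names, gene names, counts and lengths are hypothetical toy values. Quantification tools typically use effective lengths, which this sketch ignores.

```python
from typing import Dict


def rpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Reads per kilobase per million mapped reads: scale by library size, then by length."""
    total_reads = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total_reads) for g in counts}


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: scale by length first, then rescale so the sample sums to 1e6."""
    reads_per_kb = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(reads_per_kb.values())
    return {g: rate / scale * 1e6 for g, rate in reads_per_kb.items()}


# Hypothetical toy inputs for a single sample.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

rpkm_values = rpkm(toy_counts, toy_lengths)
tpm_values = tpm(toy_counts, toy_lengths)
print("RPKM:", rpkm_values)
print("TPM: ", tpm_values)
print("TPM sum:", sum(tpm_values.values()))
```

Within one sample TPM is just RPKM rescaled, but TPM always sums to one million per sample while the RPKM total varies from sample to sample, which is one reason TPM is usually preferred for comparing proportions across samples.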

Thanks dsull! I found some:
DeSeq2 analysis of samples with no replicates?
DESeq2 (or EdgeR) Exploratory Analysis with no Replicates

It seems I can use edgeR or NOISeq, but the result is not reliable, so it is not worth trying, is that correct? So can I use TPM instead?

It's not reliable in getting you a list of differentially expressed genes. But of course if you're, say, differentiating embryonic stem (ES) cells into liver tissue, one would hope that ES marker genes have higher TPMs in the ES cells and liver marker genes have higher TPMs in liver tissue. You can at least do that sort of exploratory analysis as a "quality control" before investing more money. I think such pilot experiments, if you at least have some sort of positive control validation, are useful for this reason. You can probably even figure these things out with a cheap iSeq run.

On the other hand, if you're doing some completely novel experiment with unknown results, then I have no idea why money was wasted on something that's unreliable since there's really no way to confidently discover anything new.
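The marker-gene sanity check described above could be sketched roughly as follows in Python: for each known marker, compare its TPM in one condition against the same gene's TPM in the other condition and check that the change goes in the expected direction. The marker names, TPM values, pseudocount and the `check_markers` helper are all hypothetical placeholders, and with no replicates this is only a qualitative check, not a statistical test.

```python
import math
from typing import Dict, Iterable, Tuple


def log2_fold_change(tpm_from: float, tpm_to: float, pseudocount: float = 1.0) -> float:
    """log2((tpm_to + pseudocount) / (tpm_from + pseudocount)); the pseudocount avoids division by zero."""
    return math.log2((tpm_to + pseudocount) / (tpm_from + pseudocount))


def check_markers(tpm_es: Dict[str, float], tpm_liver: Dict[str, float],
                  es_markers: Iterable[str],
                  liver_markers: Iterable[str]) -> Dict[str, Tuple[float, bool]]:
    """Return, per marker, the ES-to-liver log2 fold change and whether its sign is as expected."""
    report = {}
    for gene in es_markers:
        lfc = log2_fold_change(tpm_es[gene], tpm_liver[gene])
        report[gene] = (lfc, lfc < 0)  # ES markers should go down in liver
    for gene in liver_markers:
        lfc = log2_fold_change(tpm_es[gene], tpm_liver[gene])
        report[gene] = (lfc, lfc > 0)  # liver markers should go up in liver
    return report


# Hypothetical toy TPMs; replace with real marker genes and values.
toy_tpm_es = {"es_marker_1": 127.0, "liver_marker_1": 0.0}
toy_tpm_liver = {"es_marker_1": 3.0, "liver_marker_1": 255.0}

report = check_markers(toy_tpm_es, toy_tpm_liver, ["es_marker_1"], ["liver_marker_1"])
for gene, (lfc, as_expected) in report.items():
    print(f"{gene}: log2FC = {lfc:.2f}, expected direction: {as_expected}")
```

The comparison is always the same gene across the two conditions, not one gene against another within a sample.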

Thanks dsull. Just to make sure I understand your example: if ES cells are differentiated into liver cells, which genes do I compare the TPMs of the ES marker genes against? And similarly, which genes do I compare the TPMs of the liver marker genes against? Or can I compare the TPM of gene A in condition 1 vs the TPM of gene A in condition 2? Would you share how to calculate TPM from a raw count table using tximport? I read the vignette but am very confused about the input files used: https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html

ok, sorry, i need to end it here. I think you need to take a course on bioinformatics or read a resource or ask a bioinformatician to do the entire analysis for you before proceeding further. A big part of RNAseq is understanding how the analysis works (what's being done to the ATCGs that come out of the machine, what biases might exist, what are the best practice workflows, etc.) and doing it. RNAseq requires domain knowledge of nucleic acid biology, bioinformatics, and the chemistry of sequencing runs. One big challenge of RNAseq is the low number of replicates (in your case, none) which requires much more careful analysis given the high dimensionality of the data.

Doing it incorrectly wastes money, leads to journal papers with wrong results, etc.

Many resources exist, including The Biostar Handbook: A bioinformatics e-book for beginners.

Good luck!
