Question

best value of lfc threshold



6.9 years ago

rthapa ▴ 90

What is the best value to use for the lfc threshold when using the DESeq2 package? With an lfc threshold of 1, I got more than 3000 upregulated genes. Any suggestions, please? Thanks

RNA-Seq • 4.5k views

updated 6.9 years ago by Kevin Blighe 88k • written 6.9 years ago by rthapa ▴ 90

score 6 · Answer 1 · 2018-01-13

In DESeq2, the 'lfc' values are on the log base 2 scale (log2fc).

This is an open-ended question. Ask 100 people and you'll get very different answers.

Log2fc of 1 is equivalent to linear fold change of 2
Log2fc of 2 is equivalent to linear fold change of 4
Log2fc of 3 is equivalent to linear fold change of 8
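
To make the conversion above concrete, here is a minimal sketch in plain Python (not DESeq2 code; the function names are just illustrative). It assumes nothing beyond the definition that log2fc is the base-2 logarithm of the linear fold change.

```python
import math


def log2fc_to_fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change (always positive)."""
    return 2.0 ** log2fc


def fold_change_to_log2fc(fold_change: float) -> float:
    """Convert a positive linear fold change back to the log2 scale."""
    return math.log2(fold_change)


# Print the linear fold change for a few log2fc values, including a negative one.
for lfc in (1, 2, 3, -1):
    print(f"log2fc {lfc:+d} -> linear fold change {log2fc_to_fold_change(lfc):g}")
```

Note that a negative log2fc is a decrease, not a negative fold change: log2fc of -1 corresponds to a linear fold change of 0.5, i.e. a halving.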

People tend to adopt whatever cut-off the first trusted person in their career gave them. The mistake they then make is to adhere to it rigidly and to think that it is the only answer. Some people do not use a fold-change cut-off at all: they filter on adjusted P-values (Q values) alone and then rank the statistically significant genes by fold change. As I recall, the first trusted voice in my own career told me 'FDR Q<0.05 and absolute log2fc>2', but that was at a time when RNA-seq was not even available.
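
The "no fold-change cut-off, just rank" approach could look roughly like the sketch below. It is plain Python working on a toy table, not DESeq2 itself. The column names `log2FoldChange` and `padj` match what DESeq2 writes in its results table, but the gene names and numbers are made up, and `rank_significant` is a hypothetical helper.

```python
# Toy rows standing in for an exported DESeq2 results table (all values made up).
results = [
    {"gene": "geneA", "log2FoldChange": 3.1, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -0.4, "padj": 0.03},
    {"gene": "geneC", "log2FoldChange": 2.5, "padj": 0.20},
    {"gene": "geneD", "log2FoldChange": -1.8, "padj": 0.0004},
    {"gene": "geneE", "log2FoldChange": 0.9, "padj": None},  # padj can be NA
]


def rank_significant(rows, alpha=0.05):
    """Keep rows with padj < alpha, then sort by absolute log2fc, largest first."""
    significant = [
        row for row in rows
        if row["padj"] is not None and row["padj"] < alpha
    ]
    return sorted(significant, key=lambda row: abs(row["log2FoldChange"]), reverse=True)


ranked = rank_significant(results)
for row in ranked:
    print(row["gene"], row["log2FoldChange"], row["padj"])
```

Here geneC has a large log2fc but is dropped because its adjusted P-value is not significant, while geneB is kept despite a small log2fc and simply ends up at the bottom of the ranking.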

There really is no answer, though, and it depends on many factors, including:

The normalisation type: methods that produce FPKM/RPKM expression levels give unrealistically large log2fc values; with quantile normalisation, or the geometric (median-of-ratios) normalisation used in DESeq2, log2fc values will be lower than with FPKM expression levels and will be balanced between negative and positive fold-changes
How many genes you want to include for downstream analysis
Previous literature on how many transcripts to expect in the kind of comparison that you're conducting
The adjusted P value that you are using as a cut-off. For example, even at FDR Q<0.05 and log2fc=2, many of the transcripts will not look very different when you visualise the normalised counts across the groups being compared (this only holds in certain experimental setups, though)
The variance of your data (high variance = unreliable log2fc values in any setting)
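
Since the choice partly comes down to how many genes you want downstream, one practical way to explore it is to count how many genes survive at several cut-offs. The sketch below does that in plain Python on a made-up table (tuples of gene, log2fc, padj); `count_de_genes` is a hypothetical helper. It is a post hoc filter on the results table, which is not the same thing as DESeq2's `lfcThreshold` argument to `results()`, which changes the hypothesis being tested.

```python
# Toy table: (gene, log2FoldChange, padj). All values are made up for illustration.
rows = [
    ("g1", 0.6, 0.01),
    ("g2", 1.2, 0.02),
    ("g3", 2.4, 0.001),
    ("g4", -0.7, 0.04),
    ("g5", -1.5, 0.003),
    ("g6", -2.2, 0.30),
    ("g7", 3.0, 0.0001),
    ("g8", 0.3, 0.60),
]


def count_de_genes(table, lfc_cutoff, alpha=0.05):
    """Return (n_up, n_down) for genes with padj < alpha and |log2fc| > lfc_cutoff."""
    up = down = 0
    for _gene, log2fc, padj in table:
        if padj is None or padj >= alpha:
            continue  # not statistically significant (or padj is NA)
        if log2fc > lfc_cutoff:
            up += 1
        elif log2fc < -lfc_cutoff:
            down += 1
    return up, down


# Compare the gene counts at a few candidate cut-offs.
for cutoff in (0.0, 1.0, 2.0):
    n_up, n_down = count_de_genes(rows, cutoff)
    print(f"|log2fc| > {cutoff:g}: {n_up} up, {n_down} down")
```

Running something like this on your own results shows how quickly the list shrinks as the cut-off rises, and whether up- and down-regulated genes stay roughly balanced.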

So, the message? There is absolutely no standard cut-off. Use what is most appropriate for your data and what works best.

Kevin