Question

DESeq2 (or edgeR) Exploratory Analysis with no Replicates

0

6.5 years ago

gabriel.jabud ▴ 40

My pipeline so far is hisat2 -> featureCounts -> DESeq2. I have generated heatmaps of the genes with the most variance after rlog and log2 transformation, which are somewhat meaningful. What I really want to do is compare every sample to the control sample and take the genes with the largest log fold change in either direction. I've read through the DESeq2 vignette and haven't found a good example of that. Maybe I do this through the design parameter when running DESeqDataSetFromMatrix()? So far I've only set the design parameter to ~condition, as I'm a little shaky on how that parameter works.

Maybe this is more of an R problem than a DESeq2 one? Is edgeR the better tool, since it allows you to do some analysis with no biological replicates by setting the dispersion value?
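For readers unsure how a design such as ~condition behaves, here is a minimal base R sketch of what the formula expands to. The sample names and condition labels are hypothetical, and this only illustrates the model matrix and reference level; it is not a DESeq2 run.

```r
# Hypothetical sample sheet: one control and three treatments, no replicates.
coldata <- data.frame(
  condition = factor(c("control", "drugA", "drugB", "drugC")),
  row.names = c("s1", "s2", "s3", "s4")
)

# Make "control" the reference level so each coefficient reads "<level> vs control".
coldata$condition <- relevel(coldata$condition, ref = "control")

# ~condition expands to an intercept (the reference level) plus one
# indicator column per non-reference level.
design <- model.matrix(~ condition, data = coldata)
print(colnames(design))

# With one sample per condition there are as many coefficients as samples,
# so no residual degrees of freedom are left to estimate variability.
residual_df <- nrow(design) - qr(design)$rank
print(residual_df)
```

The zero residual degrees of freedom are the reason a design like this cannot yield meaningful test statistics without replicates, whatever the reference level is.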

deseq2 edger RNA-Seq • 13k views

ADD COMMENT • link updated 3.9 years ago by Roman Feldbauer • 0 • written 6.5 years ago by gabriel.jabud ▴ 40

2

You can do a DESeq2 analysis with no replicates; the stats are just essentially meaningless, as they would be for any other tool or package trying to compare RNA-seq between single samples.

ADD REPLY • link 6.5 years ago by jared.andrews07 ★ 19k

0

Yes, so I can make heatmaps from the log-normalized counts and do things like PCA (and I have). My question is more about what other analyses I can do and how I can compare everything to the control sample in DESeq2. For example, say I want a list of the most differentially expressed genes vs the control sample, starting from the featureCounts matrix, which I've imported. Currently I'm not comparing everything to the control, but to each other. So I can get the list of genes with the most variance with something like:

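library("genefilter")  # provides rowVars()
library("pheatmap")    # provides pheatmap()
# rld is the rlog-transformed DESeq2 object, e.g. rld <- rlog(dds)
topVarGenes <- head(order(-rowVars(assay(rld))), 20)  # 20 most variable genes
mat <- assay(rld)[topVarGenes, ]
mat <- mat - rowMeans(mat)  # center each gene on its mean across samples
pheatmap(mat, show_rownames = TRUE, cluster_cols = FALSE)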

but that's not as meaningful as the genes that are most different from control.

Running results(dds) on the data actually gives an error saying that DESeq2 no longer supports experiments with only one replicate, so I don't get the nice summary that a well-designed experiment would give.
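As a rough illustration of ranking genes by their difference from a control sample, here is a base R sketch. The count matrix and the column name "control" are hypothetical, and log2 counts per million with a pseudocount of 1 stands in for the rlog values; the same subtraction could be applied to assay(rld). These are descriptive log fold changes only, with no statistical support.

```r
# Hypothetical raw count matrix (genes x samples); in practice this would be
# the imported featureCounts table with one column for the control sample.
counts <- matrix(
  c(100, 200, 100,  50,
     10,  10,  80,  10,
    500, 480, 510, 495,
      0,   5,   0,  40),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("control", "drugA", "drugB", "drugC"))
)

# Scale each sample to counts per million, then log2-transform with a pseudocount.
cpm <- t(t(counts) / colSums(counts)) * 1e6
logcpm <- log2(cpm + 1)

# Subtract the control column from every other column (recycled per gene).
treated <- colnames(logcpm) != "control"
lfc <- logcpm[, treated, drop = FALSE] - logcpm[, "control"]

# Rank genes by their largest absolute log fold change in any sample.
max_abs_lfc <- apply(abs(lfc), 1, max)
top_genes <- names(sort(max_abs_lfc, decreasing = TRUE))
print(round(lfc, 2))
print(top_genes)
```

Genes with zero or very low counts in the control, such as geneD here, dominate this kind of ranking, so a minimum-count filter before ranking is probably worth considering.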

ADD REPLY • link 6.5 years ago by gabriel.jabud ▴ 40

1

Please use the search function and read through what you can find on the BioC support page and Google. I understand it is frustrating to analyse underpowered/unpowered experiments, but this question really has been discussed a hundred times before. Please go through the previous discussions and see what you can take away from them. Don't be surprised if this question gets closed by a different moderator for the aforementioned reason.

ADD REPLY • link 6.5 years ago by ATpoint 90k

0

Do you have a particular thread in mind? I have looked at all those pages pretty extensively and none really cover what I'm looking for.

ADD REPLY • link 6.5 years ago by gabriel.jabud ▴ 40

0

As ATpoint highlights, there is a lot of material and discussion out there. Just search via your search engine of choice. For one, there is the edgeR manual (see '2.11 What to do if you have no replicates'):

edgeR is primarily intended for use with data including biological replication. Nevertheless, RNA-Seq and ChIP-Seq are still expensive technologies, so it sometimes happens that only one library can be created for each treatment condition. In these cases there are no replicate libraries from which to estimate biological variability. In this situation, the data analyst is faced with the following choices, none of which are ideal. We do not recommend any of these choices as a satisfactory alternative for biological replication. Rather, they are the best that can be done at the analysis stage, and options 2–4 may be better than assuming that biological variability is absent.
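One of the options the manual goes on to describe is to pick a plausible dispersion value and pass it to the test directly. A minimal sketch of that idea follows, assuming the edgeR package is installed. The simulated counts, the sample labels, and the BCV of 0.2 are all assumptions for illustration, and the resulting p-values depend entirely on that guessed value.

```r
library(edgeR)

# Assumed biological coefficient of variation: a guess, not estimated from data.
bcv <- 0.2

# Hypothetical two-sample count matrix, simulated purely for illustration.
set.seed(1)
counts <- matrix(
  rnbinom(40, size = 1 / bcv^2, mu = 10), nrow = 20, ncol = 2,
  dimnames = list(paste0("gene", 1:20), c("control", "treated"))
)

# One library per group; "control" is the first level, so logFC is treated vs control.
group <- factor(c("control", "treated"), levels = c("control", "treated"))
y <- DGEList(counts = counts, group = group)

# Exact test with the dispersion fixed at the assumed value (BCV squared).
et <- exactTest(y, dispersion = bcv^2)
top <- topTags(et, n = 5)
print(top)
```

Changing bcv changes which genes appear significant, which is why the manual presents this as a fallback rather than a substitute for replication.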

As for ideas other than heatmaps, etc., I am going to put a question back to you: why did you do the experiment in the first place if you did not even know the analysis plan that was going to be carried out? Perhaps I missed this somewhere in your original question? Would running a few cDNA microarrays not have been better?

ADD REPLY • link 6.5 years ago by Kevin Blighe 89k

0

I didn't design the experiment; I inherited the data from a previous researcher and want to make use of it. I did read the edgeR manual and I will try to generate useful figures from that next. I guess I should have restated my original question as "How do I view log fold changes vs a control sample with no replicates using DESeq2?" It seems like people are misinterpreting my original question.

ADD REPLY • link 6.5 years ago by gabriel.jabud ▴ 40

2

Perhaps we are misinterpreting it; however, I, personally, want to put a stop to the propagation of 'noise' in research. Poor experimental design is one of the key reasons why so many published works that research the same thing are not reproducible.

ADD REPLY • link 6.5 years ago by Kevin Blighe 89k

score 1 · Answer 1 · 2019-09-15

1

6.2 years ago

Konstantinos Yeles ▴ 120

In addition to the edgeR manual, you could use the NOISeq package, which has a function precisely for cases without biological replication. See the manual (https://bioconductor.org/packages/release/bioc/vignettes/NOISeq/inst/doc/NOISeq.pdf), chapter 5.1.2, "NOISeq-sim: no replicates available".

Good luck with the analysis.

ADD COMMENT • link 6.2 years ago by Konstantinos Yeles ▴ 120

0

The link says this function is for simulating technical replicates.

ADD REPLY • link 3.9 years ago by Roman Feldbauer • 0