Question

DESeq2 design matrix

1

Entering edit mode

5.1 years ago

mickey_95 ▴ 110

Hi,

I am trying to use DESeq2 for differential analysis of ATAC-seq data. I have performed peak calling, obtained a consensus peak data set and, using featureCounts, generated a count matrix with my samples as columns and the peak coordinates as rows. I have two conditions, 4 biological and 2 technical replicates each; so 16 samples in total. I am interested in the differences between my two conditions, so I was using DESeq2::DESeqDataSetFromMatrix with

design = ~condition

Now I am wondering about two things: 1. Should I collapse my technical replicates before running the DE pipeline? 2. Should I also include my biological replicates in the design formula? And also an interaction term (condition:BioRep)?

I am very new to this, so apologies if this is a very basic question. Appreciate input and help!

DESeq2 ATAC-seq DE analysis • 2.7k views

ADD COMMENT • link updated 5.1 years ago by ATpoint 85k • written 5.1 years ago by mickey_95 ▴ 110

1

Entering edit mode

For collapsing technical replicates in DESeq2 workflow, use collapseReplicates (https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/collapseReplicates) function.

ADD REPLY • link 5.1 years ago by cpad0112 21k

score 1 · Answer 1 · 2019-10-31

Yes, I would pool technical replicates together since you really are just interested in the biological reproducibility.

Your biological replicates don't need to be in the design formula. You have two conditions and each biological replicate will be assigned to one of the two conditions (e.g. 4 treated and 4 control).

score 1 · Answer 2 · 2019-10-31

1

Entering edit mode

5.1 years ago

ATpoint 85k

Should I collapse my technical replicates before running the DE pipeline

If technical means sequencing the exact same DNA on different lanes or flow cells then yes, collapse them. You can run plotPCA after normalizing the count matrix with vst to see if they indeed cluster close together (or are more or less identical) to confirm there is nothing odd going on.

Should I also include my biological replicates in the design formula? And also an interaction term (condition:BioRep)? No, the replication itself is listed in the colData that describes your experiment like:

Sample  Condition
foo_rep1    condition1
foo_rep2    condition1
foo_rep3    condition1
bar_rep1    condition2
bar_rep2    condition2
bar_rep3    condition2

The design would be ~Condition and this already contains the replication information. I suggest you go through the DESeq2 (and maybe the edgeR) manual as these have extensive examples on how to make designs plus some useful background infmrmation on differential analysis.

ADD COMMENT • link 5.1 years ago by ATpoint 85k

0

Entering edit mode

Thank you for the quick reply! My technical replicates cluster closely together in a PCA plot, so I will collapse them. I ran de DESeq2 pipeline with the design you suggested and am very surprised to see that no differentially accessible regions pop up. This seems very strange to me and I do not understand what I might be doing wrong. I have been through the manuals of both DESeq2 and edgeR and understand the basics of the theoretical concepts behind them.

ADD REPLY • link 5.1 years ago by mickey_95 ▴ 110

0

Entering edit mode

Well, the reason might be that either there are indeed no DEGs or 2) there is not enough power (=not enough replicates). Can you show the outputs of plotPCA and plotMA? What kind of samples are this, primary or cell line, species, setup?

ADD REPLY • link 5.1 years ago by ATpoint 85k

0

Entering edit mode

These are primary cells from mice, control and drug-treated. The samples are all over the place in the PCA plot, there is no clear clustering of the two groups (ctrl and treated). I guess this variability could be the explanation?

ADD REPLY • link 5.1 years ago by mickey_95 ▴ 110

0

Entering edit mode

Yes, that sounds reasonable towards why there are no DEGs. A look at the MA-plot could help decide if there might be a chance to get DEGs at higher replicate numbers.

ADD REPLY • link 5.1 years ago by ATpoint 85k

0

Entering edit mode

I don't have access to the data at the moment, I will try to upload a picture later during the week. But in the meantime: How would I assess whether the experiment is underpowered using the MA plot?

ADD REPLY • link 5.1 years ago by mickey_95 ▴ 110