I am trying to use DESeq2 for differential analysis of ATAC-seq data. I have performed peak calling, obtained a consensus peak data set and, using featureCounts, generated a count matrix with my samples as columns and the peak coordinates as rows.
I have two conditions, 4 biological and 2 technical replicates each; so 16 samples in total. I am interested in the differences between my two conditions, so I was using DESeq2::DESeqDataSetFromMatrix with
design = ~condition
Now I am wondering about two things:
1. Should I collapse my technical replicates before running the DE pipeline?
2. Should I also include my biological replicates in the design formula? And also an interaction term (condition:BioRep)?
I am very new to this, so apologies if this is a very basic question. Appreciate input and help!
Yes, I would pool technical replicates together since you really are just interested in the biological reproducibility.
Your biological replicates don't need to be in the design formula. You have two conditions and each biological replicate will be assigned to one of the two conditions (e.g. 4 treated and 4 control).
Should I collapse my technical replicates before running the DE pipeline
If technical means sequencing the exact same DNA on different lanes or flow cells then yes, collapse them. You can run plotPCA after normalizing the count matrix with vst to see if they indeed cluster close together (or are more or less identical) to confirm there is nothing odd going on.
Should I also include my biological replicates in the design formula? And also an interaction term (condition:BioRep)?
No, the replication itself is listed in the colData that describes your experiment like:
The design would be ~Condition and this already contains the replication information.
I suggest you go through the DESeq2 (and maybe the edgeR) manual as these have extensive examples on how to make designs plus some useful background infmrmation on differential analysis.
Thank you for the quick reply!
My technical replicates cluster closely together in a PCA plot, so I will collapse them. I ran de DESeq2 pipeline with the design you suggested and am very surprised to see that no differentially accessible regions pop up. This seems very strange to me and I do not understand what I might be doing wrong. I have been through the manuals of both DESeq2 and edgeR and understand the basics of the theoretical concepts behind them.
Well, the reason might be that either there are indeed no DEGs or 2) there is not enough power (=not enough replicates). Can you show the outputs of plotPCA and plotMA? What kind of samples are this, primary or cell line, species, setup?
These are primary cells from mice, control and drug-treated.
The samples are all over the place in the PCA plot, there is no clear clustering of the two groups (ctrl and treated). I guess this variability could be the explanation?
Yes, that sounds reasonable towards why there are no DEGs. A look at the MA-plot could help decide if there might be a chance to get DEGs at higher replicate numbers.
I don't have access to the data at the moment, I will try to upload a picture later during the week. But in the meantime: How would I assess whether the experiment is underpowered using the MA plot?
For collapsing technical replicates in DESeq2 workflow, use collapseReplicates (https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/collapseReplicates) function.