Hi everyone,
I'm looking for help and guidance with interpreting my DESeq2 output since I cannot figure out what I'm seeing. I've done a bulk RNASeq experiment looking at differences between three materials (A, B, C) and two time-points (Day 0 and 14). I have six groups in total: A_0, A_14, B_0, B_14, C_0 and C_14. I have used the same cell-line for all samples and have four biological replicates per group (n=4). The questions I'm investigating:
- Is there a difference in DEGs between A_14, B_14 and C_14 (between materials at the end-point)?
- Is there a difference in DEGs between A_0 vs A_14, B_0 vs B_14 and C_0 vs C_14 (time-points within each material)?
I used kallisto for pseudoalignment and looking at the QC output can at least say that I have decent read counts and alignment per sample. Initially, I thought a Wald test would be sufficient since this is not really a time-course experiment (only two time points) and I can specify the contrasts I want for each pair-wise comparison. Following the DESEq2 vignette, I created a new variable for each sample and the contrasts as such:
dds <- DESeqDataSetFromTximport(Txi_gene, colData=sample_file, design=~group)
dds <- DESeq(dds)
dds$group <- factor(paste0(dds$condition, dds$time))
design(dds) <- ~ group
res1 <- results(dds, contrast=c("group","A_14","B_14"))
res2 <- results(dds, contrast=c("group","A_14","A_0"))
*etc.*
Now my PCA plot is quite disappointing in terms of variance and my dispersion and MA plots are quite strange (see below). I'm not sure what could account for the missing variance, but will have a look at RUVSeq package to see if that might improve things a bit more. For the dispersion and MA plots I'm now wondering if I have not set up the model correctly and if anyone else has seen similar plots?
Thank you in advance for your time and consideration.
I think you probably have a batch effect here that manifests at PC2 as samples of the same group separate in the y-axis direction. You can use the mentioned RUVseq to see if you find evidence for that and can compensate for this. The MA-plots probably indicate that you have a lot of genes with consistently low counts. You can filter for e.g. only retaining genes that have at least 10 counts in some samples, or use
filterByExpr
from edgeR which does this aware of the design.Thank you for your reply @ATpoint! RUVSeq improved the MA plots slightly but did nothing for the PCA or dispersion plot. Similarly, filtering for the >10 genes had no discernible effect. Should I look into other methods to remove batch effect or accept it as is?
Get get a lot of DEGs, what exactly is the problem?
There is no strong evidence of any batch effect, nor of any other effects that may need removing. As mentioned by swbarnes2, if anything, PC1 is picking up an effect of
time
. I don't see any justification for removing effects that may not even exist - by doing this, you may introduce bias that previously had not existed.Please show some output from your results tables.
I do agree to ensure that you filter for low count genes prior to any normalisation step.