Help with DESeq2 output
2
0
Entering edit mode
2.6 years ago
randlkc • 0

Hi everyone,

I'm looking for help and guidance with interpreting my DESeq2 output since I cannot figure out what I'm seeing. I've done a bulk RNASeq experiment looking at differences between three materials (A, B, C) and two time-points (Day 0 and 14). I have six groups in total: A_0, A_14, B_0, B_14, C_0 and C_14. I have used the same cell-line for all samples and have four biological replicates per group (n=4). The questions I'm investigating:

  1. Is there a difference in DEGs between A_14, B_14 and C_14 (between materials at the end-point)?
  2. Is there a difference in DEGs between A_0 vs A_14, B_0 vs B_14 and C_0 vs C_14 (time-points within each material)?

I used kallisto for pseudoalignment and looking at the QC output can at least say that I have decent read counts and alignment per sample. Initially, I thought a Wald test would be sufficient since this is not really a time-course experiment (only two time points) and I can specify the contrasts I want for each pair-wise comparison. Following the DESEq2 vignette, I created a new variable for each sample and the contrasts as such:

 dds <- DESeqDataSetFromTximport(Txi_gene, colData=sample_file, design=~group)
 dds <- DESeq(dds)
 dds$group <- factor(paste0(dds$condition, dds$time))
 design(dds) <- ~ group

res1 <- results(dds, contrast=c("group","A_14","B_14"))
res2 <- results(dds, contrast=c("group","A_14","A_0"))

*etc.*

Now my PCA plot is quite disappointing in terms of variance and my dispersion and MA plots are quite strange (see below). I'm not sure what could account for the missing variance, but will have a look at RUVSeq package to see if that might improve things a bit more. For the dispersion and MA plots I'm now wondering if I have not set up the model correctly and if anyone else has seen similar plots?

Thank you in advance for your time and consideration.

PCA plot

Dispersion plot

MA-plot prior to shrinkage

MA-plot shrunken

expression DESEq2 differential RNASeq DEG • 1.2k views
ADD COMMENT
0
Entering edit mode

I think you probably have a batch effect here that manifests at PC2 as samples of the same group separate in the y-axis direction. You can use the mentioned RUVseq to see if you find evidence for that and can compensate for this. The MA-plots probably indicate that you have a lot of genes with consistently low counts. You can filter for e.g. only retaining genes that have at least 10 counts in some samples, or use filterByExpr from edgeR which does this aware of the design.

ADD REPLY
0
Entering edit mode

Thank you for your reply @ATpoint! RUVSeq improved the MA plots slightly but did nothing for the PCA or dispersion plot. Similarly, filtering for the >10 genes had no discernible effect. Should I look into other methods to remove batch effect or accept it as is?

MA plot

ADD REPLY
0
Entering edit mode

Get get a lot of DEGs, what exactly is the problem?

ADD REPLY
0
Entering edit mode

There is no strong evidence of any batch effect, nor of any other effects that may need removing. As mentioned by swbarnes2, if anything, PC1 is picking up an effect of time. I don't see any justification for removing effects that may not even exist - by doing this, you may introduce bias that previously had not existed.

Please show some output from your results tables.

I do agree to ensure that you filter for low count genes prior to any normalisation step.

ADD REPLY
1
Entering edit mode
2.6 years ago

I think you need to consider that your ABC treatment might not do much (though PC1 seems to sort of correlate with time), and RUVSeq might not be able to magically change that.

ADD COMMENT
0
Entering edit mode
2.6 years ago
randlkc • 0

Thank you for your input everyone! Apologies for the late reply, as I was traveling. I do agree that the ABC treatment might not be doing much. However, I still find the dispersion plot a little strange. So Kevin Blighe you do not recommend to do any RUVSeq at all? This is my results output, called with results(dds):

log2 fold change (MLE): group C_0 vs A_14 
Wald test p-value: group C_0 vs A_14 
DataFrame with 23340 rows and 6 columns
         baseMean log2FoldChange     lfcSE       stat      pvalue        padj
        <numeric>      <numeric> <numeric>  <numeric>   <numeric>   <numeric>
A1BG    272.40644      -0.107687  0.345739  -0.311468 7.55445e-01 0.901042476
A1CF      6.80456      -0.607053  0.724083  -0.838374 4.01821e-01 0.679598948
A2M    1457.03024       2.153635  0.472669   4.556325 5.20564e-06 0.000200291
A2ML1    22.42323      -1.117698  0.426748  -2.619106 8.81606e-03 0.063308770
A2MP1     5.12486       2.033289  1.073490   1.894093 5.82126e-02 0.222458122
...           ...            ...       ...        ...         ...         ...
ZYG11A    30.8967     -1.1283068  0.378537 -2.9807074  0.00287583   0.0284164
ZYG11B   857.3701     -0.2375091  0.186873 -1.2709677  0.20374013   0.4745415
ZYX     3178.4129     -0.3161235  0.224067 -1.4108416  0.15829133   0.4118275
ZZEF1    864.6738      0.0189854  0.241094  0.0787466  0.93723418   0.9780265
ZZZ3     907.3641     -0.2388627  0.158636 -1.5057267  0.13213734   0.3692228

Also, in regards to filtering: I thought this was a function inherent to DESeq2 and therefore does not need to be done explicitly?

ADD COMMENT

Login before adding your answer.

Traffic: 2107 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6