Question

identify DEGs across all conditions and per specific conditions

0

Entering edit mode

15 months ago

camillab. ▴ 160

Hi,

I am analyzing a bulk-RNAseq and I want to analyse the dataset using Deseq2. I am very confused so apologies if it's a stupid question. My dataset has 12 samples (3 per condition). the conditions are: treatment and control and 2 time points (0hr, 12hrs). So I wanted to identify A) DEGs across all condition and B) DEGs per condition and across time point (so treat vs control at time point1 and treat vs control at time point2). So I created a file with my metadata that look like with mix column dividing the 12 samples per condition and time:

SampleID Condition Time  mix     
  <chr>    <fct>     <fct> <fct>   
1 C1D20    Control   time1 control1
2 C2D20    Control   time1 control1
3 C3D20    Control   time1 control1
4 P1D20    treat   time1 treat1
5 P2D20    treat   time1 treat1
6 P3D20    treat  time1 treat1

and this is my code:

library( "DESeq2" )
library(ggplot2)

library(readxl)
metaData <- read_excel("Desktop/forR.xls", sheet = "GroupAssignmentsTable")
metaData$Time = factor(metaData$Time)
metaData$Condition = factor(metaData$Condition)
metaData$mix = factor(metaData$mix)#convert into fctors
head(metaData)

countData2 <-  read_excel("Desktop/forR.xls")
#remove columb
countData <- countData2[ -c(1) ] #remove geneid columb

countData[is.na(countData)] <- 0 #na into 0

dds <- DESeqDataSetFromMatrix(countData=countData,
                              colData=metaData, 
                              design= ~ mix)

rownames(dds) <- countData2$Geneid
# run deseq2
dds <- DESeq(dds)

res <- results(dds)
head(results(dds, tidy=TRUE))  #let's look at the results table
summary(res) #summary of results
res <- res[order(res$padj),]
head(res)

and this is what I get:

log2 fold change (MLE): mix treat2 vs control1 
Wald test p-value: mix treat2 vs control1 
DataFrame with 6 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                   <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
ENSG00000177469.13   6373.26        5.28825  0.118647   44.5713  0.00000e+00  0.00000e+00
ENSG00000125730.18   4440.13        5.40412  0.159754   33.8278 7.69185e-251 1.32496e-246
ENSG00000163359.17  26483.80        3.36964  0.103306   32.6180 2.28045e-233 2.61879e-229
ENSG00000174498.14   6609.97       -5.43614  0.168546  -32.2531 3.18097e-228 2.73969e-224
ENSG00000171867.18   4485.77        3.38421  0.106702   31.7163 9.25174e-221 6.37463e-217
ENSG00000124749.18   2830.87        6.37862  0.203390   31.3616 6.76592e-216 3.88488e-212

Is this mean that he calculate the log2FC and p-value between treat2 and control1 only? and if so, where are all the other comparison?

I checked also

mcols(res)$description

and I got basically the same results:

[1] "mean of normalized counts for all samples"        "log2 fold change (MLE): mix treat2 vs control1"
[3] "standard error: mix treat2 vs control1"         "Wald statistic: mix treat2 vs control1"        
[5] "Wald test p-value: mix treat2 vs control1"      "BH adjusted p-values"

if I want DEGs for time1 control vs treatment and for time2 control vs treatment do I have to create another file separating those samples? What I am doing wrong?

Thank you
Camilla

R DEseq2 • 1.1k views

ADD COMMENT • link updated 15 months ago by Yogi ▴ 70 • written 15 months ago by camillab. ▴ 160

1

Entering edit mode

@camillab, Please look into Kevin Blighe's response in this post hope that helps.

ADD REPLY • link 15 months ago by luffy ▴ 130

1

Entering edit mode

15 months ago

Barry Digby ★ 1.3k

I know you are using DESeq2 but I have found this paper from limma to be very helpful and well worth investing time in: A guide to creating design matrices for gene expression experiments.

N.B: The section `Studies with multiple factors' exactly mimics your question

ADD COMMENT • link 15 months ago by Barry Digby ★ 1.3k

0

Entering edit mode

Thank you! I will look into it!

ADD REPLY • link 15 months ago by camillab. ▴ 160

score 2 · Accepted Answer · 2023-09-26

2

Entering edit mode

15 months ago

Yogi ▴ 70

Couple of things. (Feel free to correct if I'm misunderstanding).

"DESeq2" is meant for 1v1 comparisons. If you're trying to compare many different conditions simultaneously, you'll have to think a bit deeper statistically.

For example, if you have 3 categories that are ordered (e.g. "healthy control", "normal disease severity", "serious disease severity"), you could build an ordinal logistic regression model using the counts matrix (this makes some statistical assumptions).

Hope this helps!

ADD COMMENT • link 15 months ago by Yogi ▴ 70

0

Entering edit mode

oh thank you! I missed that! now everything make much more sense!

ADD REPLY • link 15 months ago by camillab. ▴ 160

0

Entering edit mode

Glad to hear!

ADD REPLY • link 15 months ago by Yogi ▴ 70