Hi,
I am analyzing a bulk-RNAseq and I want to analyse the dataset using Deseq2
. I am very confused so apologies if it's a stupid question. My dataset has 12 samples (3 per condition). the conditions are: treatment and control and 2 time points (0hr, 12hrs). So I wanted to identify A) DEGs across all condition and B) DEGs per condition and across time point (so treat vs control at time point1 and treat vs control at time point2). So I created a file with my metadata that look like with mix column dividing the 12 samples per condition and time:
SampleID Condition Time mix
<chr> <fct> <fct> <fct>
1 C1D20 Control time1 control1
2 C2D20 Control time1 control1
3 C3D20 Control time1 control1
4 P1D20 treat time1 treat1
5 P2D20 treat time1 treat1
6 P3D20 treat time1 treat1
and this is my code:
library( "DESeq2" )
library(ggplot2)
library(readxl)
metaData <- read_excel("Desktop/forR.xls", sheet = "GroupAssignmentsTable")
metaData$Time = factor(metaData$Time)
metaData$Condition = factor(metaData$Condition)
metaData$mix = factor(metaData$mix)#convert into fctors
head(metaData)
countData2 <- read_excel("Desktop/forR.xls")
#remove columb
countData <- countData2[ -c(1) ] #remove geneid columb
countData[is.na(countData)] <- 0 #na into 0
dds <- DESeqDataSetFromMatrix(countData=countData,
colData=metaData,
design= ~ mix)
rownames(dds) <- countData2$Geneid
# run deseq2
dds <- DESeq(dds)
res <- results(dds)
head(results(dds, tidy=TRUE)) #let's look at the results table
summary(res) #summary of results
res <- res[order(res$padj),]
head(res)
and this is what I get:
log2 fold change (MLE): mix treat2 vs control1
Wald test p-value: mix treat2 vs control1
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000177469.13 6373.26 5.28825 0.118647 44.5713 0.00000e+00 0.00000e+00
ENSG00000125730.18 4440.13 5.40412 0.159754 33.8278 7.69185e-251 1.32496e-246
ENSG00000163359.17 26483.80 3.36964 0.103306 32.6180 2.28045e-233 2.61879e-229
ENSG00000174498.14 6609.97 -5.43614 0.168546 -32.2531 3.18097e-228 2.73969e-224
ENSG00000171867.18 4485.77 3.38421 0.106702 31.7163 9.25174e-221 6.37463e-217
ENSG00000124749.18 2830.87 6.37862 0.203390 31.3616 6.76592e-216 3.88488e-212
Is this mean that he calculate the log2FC and p-value between treat2 and control1 only? and if so, where are all the other comparison?
I checked also
mcols(res)$description
and I got basically the same results:
[1] "mean of normalized counts for all samples" "log2 fold change (MLE): mix treat2 vs control1"
[3] "standard error: mix treat2 vs control1" "Wald statistic: mix treat2 vs control1"
[5] "Wald test p-value: mix treat2 vs control1" "BH adjusted p-values"
if I want DEGs for time1 control vs treatment and for time2 control vs treatment do I have to create another file separating those samples? What I am doing wrong?
Thank you
Camilla
@camillab, Please look into Kevin Blighe's response in this post hope that helps.