Hello, I have to do my first differential expression analysis. A company has already done it, and I'm repeating it just to practice and compare results. They used DESeq2 and I want to use it too. I got the count tables using the STAR mapper and quantifier with the option --quantMode GeneCounts. The experiment is:
5 conditions (3 of them are naive or mock controls):
- plant not infected (NAIVE), control
- plant infected (with fly)
- plant infected (with bacteria)
- plant not infected but exposed to fly (mock), control
- plant not infected but exposed to bacteria (mock), control
All these samples were taken at 2, 7, 14 and 21 dpi, so I'm thinking about building a count matrix for each dpi and doing 4 differential expression analyses. (After that I will do a temporal analysis, so for now I just want to compare results.)
Is it reasonable to build a matrix for each dpi like this?
GENE_ID COUNT NAIVE21 INFECTED_FLY21 INFECTED_B21 NOTINFECTED_FLY NOTINFECTED_BACT
Also, I have 3 replicates for each condition. What do I do with them? (I'm new to this kind of analysis and have no idea what to do with the 3 replicates of each sample; it would be a huge table if I add NAIVE21a, NAIVE21b, NAIVE21c, and so on...)
Thank you so much
Pilar
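On the replicates: DESeq2 works on one column per sample, so the three replicates of a condition stay as three separate columns rather than being summed or averaged. Below is a minimal base R sketch of how the per-sample STAR ReadsPerGene.out.tab files could be combined that way. The sample names, the made-up demo files and the helper name read_star_counts are all hypothetical, and the choice of column 2 (unstranded counts) is an assumption; the right column depends on the strandedness of the library.

```r
# Sketch: combine STAR ReadsPerGene.out.tab files into one count matrix,
# one column per replicate. Helper name and defaults are illustrative only.
read_star_counts <- function(files, count_col = 2) {
  # count_col: 2 = unstranded, 3 = read strand matches RNA, 4 = reverse strand
  per_sample <- lapply(files, function(f) {
    tab <- read.delim(f, header = FALSE, stringsAsFactors = FALSE)
    # The first four rows are summary lines (N_unmapped, N_multimapping, ...)
    stopifnot(all(grepl("^N_", tab[[1]][1:4])))
    tab <- tab[-(1:4), ]
    setNames(tab[[count_col]], tab[[1]])
  })
  genes <- names(per_sample[[1]])
  # All files should list the same genes in the same order; check anyway
  stopifnot(all(vapply(per_sample,
                       function(x) identical(names(x), genes), logical(1))))
  cts <- do.call(cbind, per_sample)
  colnames(cts) <- names(files)
  cts
}

# Tiny made-up files, only to show the shape of the result
make_fake <- function(name, counts) {
  path <- file.path(tempdir(), paste0(name, "ReadsPerGene.out.tab"))
  tab <- data.frame(
    id = c("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous",
           "geneA", "geneB"),
    unstranded = c(10, 5, 3, 1, counts),
    fwd = 0,
    rev = 0
  )
  write.table(tab, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

files <- c(N21a = make_fake("N21a", c(12, 0)),
           N21b = make_fake("N21b", c(15, 2)),
           N21c = make_fake("N21c", c(9, 1)))
cts <- read_star_counts(files)
print(cts)
```

With 5 conditions and 3 replicates, the matrix for one dpi would have 15 count columns. That is a normal size for this kind of analysis.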
Could you clarify your experimental design?
My guess would be that dpi is days post infection, and that you have 3 biological replicates for each condition (infected with bacteria, naive...) at different times (2 days, 7 days...). If that is right, the experimental design is crucial for creating a DESeq2 object. Could you update the question with a clearer explanation of the design so that we can help you?
Hello, yes, I'll explain better:
Samples were taken at 2, 7, 14 and 21 days post infection. There are 5 conditions, and 3 plants (biological replicates) for each condition.
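The design described here can be written down as a sample sheet with one row per plant. This is only a sketch in base R: the condition prefixes (Bm, BTY, Mm, MTY, N) are taken from the sample names used in the code further down, the naming scheme prefix + dpi + replicate letter is assumed to hold for every time point, and putting N first as the reference level is an assumption.

```r
# 5 conditions x 4 time points x 3 replicates = 60 samples
conditions <- c("Bm", "BTY", "Mm", "MTY", "N")
dpi        <- c(2, 7, 14, 21)
reps       <- c("a", "b", "c")

# expand.grid varies the first argument fastest: replicate, then condition, then dpi
design <- expand.grid(replicate = reps, condition = conditions, dpi = dpi,
                      stringsAsFactors = FALSE)
design$sample <- paste0(design$condition, design$dpi, design$replicate)
rownames(design) <- design$sample

# Factors for the model; N (naive) first is an assumed choice of reference level
design$condition <- factor(design$condition,
                           levels = c("N", "Bm", "BTY", "Mm", "MTY"))
design$dpi <- factor(design$dpi)

# Every condition/time point combination should have exactly 3 replicates
print(table(design$condition, design$dpi))

# Subset for a single time point, e.g. the 14 dpi analysis
day14 <- design[design$dpi == "14", ]
print(rownames(day14))
```

A per-dpi analysis then uses the matching 15 rows and 15 count columns, with the row names of the sample sheet in the same order as the columns of the count matrix.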
I decided to do a differential expression analysis for each dpi independently with DESeq2, and I did the following:
sampleTable <- data.frame(
  row.names = c("Bm14a", "Bm14b", "Bm14c", "BTY14a", "BTY14b", "BTY14c",
                "Mm14a", "Mm14b", "Mm14c", "MTY14a", "MTY14b", "MTY14c",
                "N14a", "N14b", "N14c"),
  condition = as.factor(c(rep("Bm14", 3), rep("BTY14", 3), rep("Mm14", 3),
                          rep("MTY14", 3), rep("N14", 3)))
)
dds <- DESeqDataSetFromMatrix(countData = cts, colData = sampleTable, design = ~ condition)
Then, for every comparison (there are 7) I did this:
Comp_1 <- results(dds, contrast = c("condition", "N14", "Bm14"))
...
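Two details of this step are worth checking. First, results() only works on an object that has been through DESeq(), a call the code above does not show. Second, in contrast = c(factor, numerator, denominator) the second element is the numerator, so c("condition", "N14", "Bm14") reports N14 relative to Bm14: a positive log2FoldChange means higher expression in N14. The sketch below illustrates both points on simulated data from DESeq2's makeExampleDESeqDataSet(), which has two conditions called "A" and "B". It is not the real experiment.

```r
library(DESeq2)

set.seed(42)
# Simulated data only: 1000 genes, 6 samples, conditions "A" and "B" (3 each)
dds_demo <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Estimates size factors and dispersions and fits the model;
# results() needs this step to have been run
dds_demo <- DESeq(dds_demo)

# contrast = c(factor name, numerator level, denominator level)
res_B_vs_A <- results(dds_demo, contrast = c("condition", "B", "A"))
res_A_vs_B <- results(dds_demo, contrast = c("condition", "A", "B"))

# Swapping the last two elements flips the sign of log2FoldChange
print(head(cbind(B_vs_A = res_B_vs_A$log2FoldChange,
                 A_vs_B = res_A_vs_B$log2FoldChange)))
```

If the comparison of interest is mock or infected relative to naive, the naive level would go last in the contrast.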
And to count the total number of differentially expressed genes for each comparison I did:
Comp_1_resSig <- Comp_1[which(Comp_1$padj < 0.1), ]
head(Comp_1_resSig[order(Comp_1_resSig$log2FoldChange, decreasing = TRUE),])
nrow(Comp_1_resSig)
Does this make sense? Or did I do something wrong?
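The filtering step can be illustrated without DESeq2. Wrapping the test in which() matters because padj is NA for genes that DESeq2 filtered out, and which() skips those rows. The padj < 0.1 cutoff also matches the default alpha = 0.1 of results(); with a different cutoff, alpha would be set to the same value. In the sketch below the table is a made-up stand-in for a results object, and the split into up and down is an optional extra.

```r
# Made-up stand-in for a DESeq2 results table; only the two columns used here
res <- data.frame(
  row.names      = paste0("gene", 1:6),
  log2FoldChange = c(2.1, -1.4, 0.3, -0.2, 3.0, 1.2),
  padj           = c(0.001, 0.04, 0.60, NA, 0.09, 0.20)
)

# which() skips rows where padj is NA instead of returning NA rows
sig <- res[which(res$padj < 0.1), ]

# Direction is relative to the denominator level of the contrast
n_up   <- sum(sig$log2FoldChange > 0)
n_down <- sum(sig$log2FoldChange < 0)

print(c(total = nrow(sig), up = n_up, down = n_down))
print(sig[order(sig$log2FoldChange, decreasing = TRUE), ])
```

On a real results object, summary() prints similar up and down counts at the chosen alpha.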