Hi,
I am currently trying to use DESeq2 to look at differential abundance in my OTU data. My problem is that I have a small data set (18 samples in total) with only two biological replicates per group (3 groups on 3 different days; an example is shown below for day 3):
S19 = E.ten (Infected), Day 3; S20 = E.ten (Infected), Day 3; S21 = E.max (Infected), Day 3; S22 = E.max (Infected), Day 3; S23 = Control, Day 7; S24 = Control, Day 3
To analyse differences between, for example, the E.ten infected group and the Control, I took only the day 3 samples and compared one of the infected groups to the controls using the following code:
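library(DESeq2)
# Directory containing one count file per sample
directory<-"/Users/sarah/16s/NGS/Qiime_Nov_2015/DeSeq2_practice"
sampleFiles<-c("Sample19_E.ten_infected.txt","Sample20a_E.ten_infected.txt","Sample23a_Cont_Uninfected.txt","Sample24a_Cont_Uninfected.txt")
sampleCondition<-c("treated","treated","untreated","untreated")
# Sample table (sample name, file name, condition); it was used below but never defined
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
# Make "untreated" the reference level, so fold changes are treated vs untreated
colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))
dds<-DESeq(ddsHTSeq)
res<-results(dds)
# Sort by adjusted p-value, most significant first
res<-res[order(res$padj),]
head(res)
# Show what each column of the results table means
mcols(res,use.names=TRUE)
write.csv(as.data.frame(res),file="sim_condition_treated_results_deseq2.csv")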
The CSV file returns the Wald test p-value for untreated vs treated.
Is this the p-value I should use to indicate whether there is statistical significance here?
Or is there a better way to do this?
Thank you,
Sarah
For statistical significance you should use the adjusted p-values (padj, the last column), which account for multiple testing.
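To illustrate why the adjusted values matter, here is a minimal base R sketch with made-up p-values (they are not from your data). It uses p.adjust with the Benjamini-Hochberg method, which is the default adjustment in results(). DESeq2 also applies independent filtering by default, so the padj column in your table will not necessarily match a plain p.adjust call on the pvalue column.

```r
# Hypothetical raw p-values for five OTUs (illustration only)
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.20)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
padj <- p.adjust(pvals, method = "BH")

# Compare how many OTUs pass a 0.05 cutoff before and after adjustment
n_raw <- sum(pvals < 0.05)  # 4 OTUs pass on raw p-values
n_adj <- sum(padj < 0.05)   # 2 OTUs pass after adjustment

print(data.frame(pvalue = pvals, padj = padj))
```

With the res object from your code, something like subset(res, padj < 0.05) would keep only the OTUs that pass the adjusted threshold; the 0.05 cutoff is just an example choice.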
Apart from this, I don't see any mistake in your approach. It would probably be useful to analyze all your data points together, adding "time" as a variable in your linear model, but that depends on what your aim is.
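As a rough sketch of what adding time to the model means, the base R snippet below builds a hypothetical sample sheet for all 18 samples and shows the design matrices that two candidate formulas would produce. The column names (group, day) and the day labels (t1, t2, t3) are assumptions for illustration, not taken from your files.

```r
# Hypothetical sample sheet: 3 groups x 3 days x 2 replicates = 18 samples
coldata <- expand.grid(
  replicate = 1:2,
  group     = c("Control", "E.ten", "E.max"),
  day       = c("t1", "t2", "t3"),   # placeholder day labels
  stringsAsFactors = FALSE
)
# Put the reference levels first (Control and the first day)
coldata$group <- factor(coldata$group, levels = c("Control", "E.ten", "E.max"))
coldata$day   <- factor(coldata$day, levels = c("t1", "t2", "t3"))

# Additive model: group effects estimated after accounting for day
design_additive <- model.matrix(~ day + group, data = coldata)

# Interaction model: lets the group effect differ from day to day
design_interaction <- model.matrix(~ day * group, data = coldata)

print(colnames(design_additive))
print(dim(design_interaction))
```

The same formula (for example design = ~ day + group) could then be given to DESeq2 in place of ~ condition, assuming your sample table has columns with those names. With only two replicates per group and day, the interaction model leaves few degrees of freedom, so the additive design may be the more realistic starting point.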