Entering edit mode
9.3 years ago
CuriusScientist
▴
50
I have three samples each with 3 biological replicates i.e. one control and other at dose 12ug/ml and 25ug/ml at 24hr
> library('DESeq2')
> directory<-"./data"
> sampleFiles <- grep("Htseq",list.files(directory),value=TRUE)
> sampleCondition<-c("treated","treated","treated","treated","treated","treated","untreated","untreated","untreated")
> sampleGroup <-c ("A","A","A","B","B","B","C","C","C")
> sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition, group=sampleGroup)
> sampleTable
sampleName fileName condition group
1 24h_12ug_1.count 24h_12ug_1.count treated A
2 24h_12ug_2.count 24h_12ug_2.count treated A
3 24h_12ug_3.count 24h_12ug_3.count treated A
4 24h_25ug_1.count 24h_25ug_1.count treated B
5 24h_25ug_2.count 24h_25ug_2.count treated B
6 24h_25ug_3.count 24h_25ug_3.count treated B
7 24h_ctrl_1.count 24h_ctrl_1.count untreated C
8 24h_ctrl_2.count 24h_ctrl_2.count untreated C
9 24h_ctrl_3.count 24h_ctrl_3.count untreated C
What should be good design if I want to see the differential expressed genes between treated vs untreated as well between different groups?
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group)
OR
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
What will be the difference between the two designs?
I tried the following design but it gave me error
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group + condition + condition:group)
Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed
How will the design will change if I add three samples each with 3 biological replicates i.e. one control and other at dose 12ug/ml and 25ug/ml at 6hr
Yes, I posted at SEQanswers but was unaware of the fact that I can't delete that post.
Thanks for the reply which will lead to better understanding of DESeq2 model design
I made the model according to following design
gave me results
However, when I tired contrast for trated vs untreated comparison it gave me an error
When I tried
resultsNames(object)
I got following result which didn't haveconditiontreated
andconditionuntreated
Another query:
is
results(dds, contrast=c("group","A","C"))
EQUAL TOresults(dds, contrast=c("group","C","A"))
? As I know first value is numerator and second one is denominator. If not, then how to decide which one to have as numerator or denominator?To compare conditions you would compare the average of groups A and B with group C (a vector of
c(0.5, 0.5, -1)
would probably work). You can't usecontrast=c("condition", ...)
becausecondition
isn't in your model.Yes swapping A and C in the contrast will produce essentially the same results, just with the sign on the log fold change swapped (since you're swapping the numerator and denominator).
Thanks for the reply.
When I used
results(dds, contrast=c(0.5,0.5,-1))
I got an errorBut it worked with
results(dds, contrast=c(0,0.5,0.5,-1))
Is this right?
I didn't get why we used -1 instead of 1. can you explain more about it.
Now if
sampleTable
looks likeThe comparison of the groups will be done as previously
to compare the treated versus untreated will the following command be correct
Correct me if I am wrong. In the first scenario, when we had only one time and different dose,
res
will give differential expressed genes due to different treatment type. However, when in second case, when I added samples from 6 hour thenres
will give differential expressed genes across different treatment and time points. Moreover, we will use contrast if we want to compare different groups.I guess resultNames() includes the intercept, I didn't check. The values in the vector are coefficients. So to see if the average of A and B are different from see, we need to ask if
(A/2)+(B/2)-C != 0
. So, you have coefficients of 0.5, 0.5, and -1. The coefficients should always sum to 0 (otherwise you'll get wrong p-values).The second contrast would then be
c(0,0.25, 0.25, -0.5, 0.25, 0.25, -0.5)
, which corresponds to(A+B+D+E)/4 - (C+F)/2
. You might want to consult/collaborate with a local statistician as your models get more complex as it's otherwise very easy to get confused by these.thanks for the reply.
now things are getting more clear. I can certainly collaborate with a statistician but I need to understand things myself.
will read "Statistical Models" by Freedman mentioned by you in order to get improve my stats.
thanks for all the help.
Hello,
I have a nearly identical experimental design as imsharmanitin; 2 treatment groups and 2 batches, but one of the batches is exclusively one treatment.
It is a poor experimental design, but unfortunately it is the data that I currently must work with. For Devon's point #2 above: I do want to see "What's the difference due to treating at all?" To account for my the potential batch effect of the B batch, I am using the design formula:
The results give me many less DE genes than if I simply ignored the batches and only use ~ treatment. This makes sense, because according to PCA and clustering, there is a batch effect in my samples. I've read the DESeq2 manual and many posts, but am not a statistician and would love to hear feedback for the design I'm using here. Thank you!