I'm working on a case/control study that also includes other variables, such as genotypes. One thing I want to do is a subgroup analysis of the genotypes within just the case group. The brute-force method is to subset my data to the case group and run DESeq2 on that. I'm very new to DESeq2 and to the very idea of contrasts, but from my reading it seems that contrasts offer a more elegant way to do subgroup analysis without having to rerun DESeq2 multiple times on differently subsetted data.
Suppose I have the following variables:
- Case: [Case, Control]
- GeneA: [YY, YN, NN]
- GeneB: [YY, YN, NN]
I run DESeq2 with the design `~ Case + GeneA + GeneB + GeneA:GeneB`.
How would I write contrasts that are equivalent to:
- Subgroup just those with the disease (`Case == "Case"`)
- Subgroup just homozygous genotypes (GeneA = [YY, NN], GeneB = [YY, NN])
- Ask the question: Is the strongest effect due to GeneA, GeneB, or the interaction?
So, for instance, given my design, to subgroup `Case == "Case"`, my contrast would be `contrast=c(?,?,?)`, etc.
Thanks!
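A contrast can only combine coefficients that the design actually estimates, so one quick check is to list the columns of the model matrix for this formula. A minimal base-R sketch using `model.matrix()`; the all-combinations table built with `expand.grid()` and the choice of `Control` as the reference level are assumptions for illustration, not the real sample sheet (DESeq2's `resultsNames(dds)` uses its own naming scheme, but the columns play the same roles):

```r
# Every combination of the three factors (assumed levels; Control as reference)
design_df <- expand.grid(
  Case  = factor(c("Control", "Case"), levels = c("Control", "Case")),
  GeneA = factor(c("NN", "YN", "YY")),
  GeneB = factor(c("NN", "YN", "YY"))
)

# Same right-hand side as the DESeq2 design in the question
mm <- model.matrix(~ Case + GeneA + GeneB + GeneA:GeneB, data = design_df)
print(colnames(mm))

# Flag any column that mixes Case with a genotype term
case_by_gene <- grepl("Case", colnames(mm)) & grepl("Gene", colnames(mm))
print(any(case_by_gene))
```

Here no column combines `Case` with a genotype, so under this design the genotype coefficients are shared by cases and controls, and no choice of `contrast=` can restrict them to the cases. That needs `Case`-by-genotype terms or a merged grouping factor.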
Here is a putative study design to work with:
```
> fake_study_design = data.frame(
+   Case=sample(c('Case', 'Control'), 10, replace=TRUE),
+   GeneA=sample(c('YY', 'YN', 'NN'), 10, replace=TRUE),
+   GeneB=sample(c('YY', 'YN', 'NN'), 10, replace=TRUE)
+ )
> fake_study_design
      Case GeneA GeneB
1     Case    YY    YY
2     Case    YN    NN
3  Control    YY    NN
4  Control    YY    YN
5  Control    YN    YY
6  Control    NN    NN
7  Control    NN    YY
8     Case    NN    YN
9  Control    YY    NN
10 Control    YY    YY
```
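For the last question (GeneA, GeneB, or the interaction), one common framing is a series of nested-model comparisons. In DESeq2 that is the per-gene likelihood ratio test, e.g. `DESeq(dds, test = "LRT", reduced = ~ Case + GeneA + GeneB)` to test the `GeneA:GeneB` term under the full design in the question. The sketch below swaps in ordinary `lm()`/`anova()` F-tests on one simulated log-scale gene, with a hypothetical balanced design (three replicates of every combination) and a made-up effect size, only to show which models get compared:

```r
set.seed(1)

# Hypothetical balanced design: every Case x GeneA x GeneB combination, 3 replicates.
# (The 10-sample fake design has no YN/YN sample and as many samples as coefficients.)
d <- expand.grid(
  Case  = factor(c("Control", "Case"), levels = c("Control", "Case")),
  GeneA = factor(c("NN", "YN", "YY")),
  GeneB = factor(c("NN", "YN", "YY")),
  rep   = 1:3
)

# One simulated log-scale gene with a deliberately large GeneA == "YY" shift (made-up numbers)
d$logexpr <- 5 + 2 * (d$GeneA == "YY") + rnorm(nrow(d), sd = 0.5)

full     <- lm(logexpr ~ Case + GeneA + GeneB + GeneA:GeneB, data = d)
additive <- lm(logexpr ~ Case + GeneA + GeneB, data = d)
no_geneA <- lm(logexpr ~ Case + GeneB, data = d)
no_geneB <- lm(logexpr ~ Case + GeneA, data = d)

# anova(reduced, larger) gives an F-test for the terms the reduced model drops
p_values <- c(
  interaction = anova(additive, full)[2, "Pr(>F)"],
  GeneA       = anova(no_geneA, additive)[2, "Pr(>F)"],
  GeneB       = anova(no_geneB, additive)[2, "Pr(>F)"]
)
print(signif(p_values, 3))
```

A small p-value only says that dropping the term makes the fit worse; if "strongest" means effect size, comparing the estimated fold changes is a separate step.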
Have you searched for the answer? DESeq2 contrasts are one of the most common topics both here and on the Bioconductor support forum. The vignette also has all the information that you need. Go to a search engine and search for `deseq2 vignette`. A useful skill in bioinformatics is knowing how to seek out information.

@Kevin: Sorry if I seemed too noob for you. I'm pretty well versed in reading manuals, and I have read the vignette section on contrasts many times. It shows how to pull out the log fold change between two levels of a given covariate, not how to subset to an entire covariate level. Google has not turned up a general discussion of DESeq2 contrast design either. On the other hand, from the sound of your indignation, you must be an expert in this and could probably help me through the answer. So, please proceed.
I am not an 'expert' in DESeq2; I am an experienced end user of it. If anyone other than the developer (Michael Love) calls themselves an expert in DESeq2, then you know that they are lying.
You will have to clarify what you mean when you type 'subgroup'. Usually people want to compare, e.g., Case versus Control. Why have you even got Controls if you are not intending to use them? It looks like you will require a merged variable for `GeneAGeneB` and then an interaction between it and `Case`. You could just clarify what you mean by 'subgroup', though.

The point is that we already have the data from the patients. The 0th-order question is Case vs. Control. However, we also want to look at the effect of the genes in just the subgroup with the disease (Case). That is just one of the many questions we want to ask: we have about a dozen genotypes to look at, as well as other variables. Rather than subsetting my data N times and running DESeq2 on each subset, it would be more efficient to use contrasts to pull the comparisons I want out of a single DESeq2 object. Based on the "interactions" section in the vignette, I agree that I would need a merged variable, but I can't figure out how to set one up; there is not enough information or enough examples in the vignette. Have you ever done it?
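A merged variable can be built by pasting the factors together, as the vignette's interactions section does with `paste0`. A minimal base-R sketch on the ten-sample fake design from the question; the `_` separator and the column name `group` are choices made here for readability, not requirements:

```r
# Fake sample sheet from the question (rows copied from the printout)
coldata <- data.frame(
  Case  = c("Case", "Case", "Control", "Control", "Control",
            "Control", "Control", "Case", "Control", "Control"),
  GeneA = c("YY", "YN", "YY", "YY", "YN", "NN", "NN", "NN", "YY", "YY"),
  GeneB = c("YY", "NN", "NN", "YN", "YY", "NN", "YY", "YN", "NN", "YY")
)

# One merged factor; underscores keep the level names syntactically valid
coldata$group <- factor(paste(coldata$Case, coldata$GeneA, coldata$GeneB, sep = "_"))
print(table(coldata$group))

# Samples that are cases AND homozygous at both genes
homozygous <- c("YY", "NN")
keep <- with(coldata, Case == "Case" & GeneA %in% homozygous & GeneB %in% homozygous)
print(unique(as.character(coldata$group[keep])))
```

In DESeq2 the merged factor would then become the design (`design(dds) <- ~ group`), and cases-only comparisons become contrasts between `group` levels. In this toy sheet only one sample (`Case_YY_YY`) is a homozygous case, so the real data would need replicates in each level being compared.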
Hey, I think that I know what you mean now, thanks! I have limited time right now but will look again in a couple of hours. I think that you can create the interaction `Case:GeneAGeneB` and then select out different contrasts involving just the cases.

In such a case, your contrast could be something like:
The interactions part of the vignette probably could indeed be expanded. However, there are further examples in the manual page: take a look at the very bottom of the manual entry for `?results`.
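Once the merged factor is the design, a cases-only comparison is a difference of two group coefficients. In DESeq2 that would be something like `results(dds, contrast = c("group", "Case_YY_YY", "Case_NN_NN"))`, and `results()` also accepts a numeric contrast vector with one weight per element of `resultsNames(dds)`. The sketch below shows the same arithmetic with ordinary `lm()` on one simulated log-scale gene instead of DESeq2's negative binomial model; the four group names, three replicates and effect size are all hypothetical:

```r
set.seed(2)

# Hypothetical balanced sample sheet: 3 replicates of each merged group
groups <- c("Case_YY_YY", "Case_NN_NN", "Control_YY_YY", "Control_NN_NN")
d <- data.frame(group = factor(rep(groups, each = 3), levels = groups))

# One simulated log-scale gene, shifted only in homozygous YY cases (made-up numbers)
d$logexpr <- 5 + 1.5 * (d$group == "Case_YY_YY") + rnorm(nrow(d), sd = 0.3)

# Cell-means model: one coefficient per group, no intercept
fit <- lm(logexpr ~ 0 + group, data = d)

# Numeric contrast over the coefficients: cases only, YY/YY minus NN/NN
cvec <- setNames(rep(0, length(coef(fit))), names(coef(fit)))
cvec[c("groupCase_YY_YY", "groupCase_NN_NN")] <- c(1, -1)

estimate <- sum(cvec * coef(fit))
se <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))
print(c(estimate = estimate, se = se))

# With a cell-means fit the contrast equals the difference of the two group means
means <- tapply(d$logexpr, d$group, mean)
print(isTRUE(all.equal(estimate, as.numeric(means["Case_YY_YY"] - means["Case_NN_NN"]))))
```

Weights other than `1`/`-1` (for example, averaging several homozygous case groups) follow the same pattern, which is what the numeric form of `contrast=` is for.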