From the discussions above, I believe you are being asked to test the difference between the samples in group A and the average of all the samples. This is, in fact, the traditional (before R) way to test contrasts.
This, it turns out, is fairly easy to code up, and simply relies on using a different contrast coding scheme for your model matrix.
Create a conditions frame/factor for your groups (A or B) and set its contrasts model to contr.sum:
cluster <- factor(c("A", "A", "B", "B"))
contrasts(cluster) <- contr.sum(2)
You can now create your model matrix as usual:
design <- model.matrix(~ 1 + cluster)
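To see what the sum-to-zero coding actually puts into the design, here is a minimal base-R sketch using the same four-sample toy factor as above (two samples per cluster is just the illustrative layout from the snippet, not your real data):

```r
# Two samples per cluster, as in the snippet above
cluster <- factor(c("A", "A", "B", "B"))

# Sum-to-zero coding: level A is coded +1, level B is coded -1
contrasts(cluster) <- contr.sum(2)

# Intercept column plus one column for the cluster effect
design <- model.matrix(~ 1 + cluster)
print(design)
```

The first column is all 1s (intercept); the second column is +1 for the A samples and -1 for the B samples, which is what makes the intercept a grand mean rather than the mean of a reference level.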
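When you fit your linear model, you will fit two coefficients: the intercept, which is the grand mean of the two groups, (A+B)/2, and the cluster coefficient, which is the deviation of group A from that grand mean, A - (A+B)/2 = (A-B)/2 (i.e. membership of cluster A). Testing this coefficient against zero is therefore equivalent to testing A against B. There is no need to fit a contrast in your edgeR workflow; the coefficient of interest will be coef=2 in your glmLRT call.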
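A quick way to convince yourself of what the two coefficients mean is to fit the same design to a single gene with base R's lm. The expression values below are hypothetical and chosen only for illustration; lm stands in for edgeR here, and in edgeR's GLM the coefficients should carry the same meaning on the log scale.

```r
cluster <- factor(c("A", "A", "B", "B"))
contrasts(cluster) <- contr.sum(2)

# Hypothetical log-expression values for one gene (illustration only)
y <- c(5.0, 5.2, 3.0, 3.2)
fit <- lm(y ~ cluster)
print(coef(fit))

# The same quantities computed directly from the group means
mean_A <- mean(y[cluster == "A"])
mean_B <- mean(y[cluster == "B"])
print(c(intercept = (mean_A + mean_B) / 2,   # grand mean of the two groups
        cluster1  = (mean_A - mean_B) / 2))  # deviation of A from the grand mean
```

Both printouts agree (about 4.1 and 1.0): the second coefficient is half the A-versus-B difference, so testing it against zero asks the same question as A versus B.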
Isn't A versus B+A just a contrast on B versus zero? All you'll get is a summary of average expression.
Yes. If you wish to compare the expression in A to the expression in A plus the expression in B, you are really testing whether the expression in group B is zero:
H0: A = A + B
=> A - A = A + B - A
=> 0 = B
Do you mean that some samples have received treatment A and some samples have received treatment A and B?
Unfortunately, the samples do not receive any treatment or specific condition, which is why a request like this seems strange to me. Anyway, I think they want the relative expression of A, i.e. a sort of delta of A over A+B. I cannot figure out the rationale.
I don't mean to pry, but could you give us a bit more detail about the actual study? It may simply be that your collaborators have made a mistake in explaining what they want you to compare. For example, they may be asking you to compare expression in the set A against expression in the set A ∪ B. IMO biologists / medics don't talk in terms of fitted coefficients.
No problem! I have around 100 primary breast cancer samples. They performed RNA-seq and then clustered the samples, identifying clusters of patients (which I called groups here). Then they asked me for the comparison I already explained.
Were they clustered using one subset of the genes, and now you're running diffex on a separate subset of the genes?
I think you need to talk to them about what their biological question is. There are screening methods where things like A/(A+B) are used as a measure of effect size, but you would test whether this was equal to zero, not whether A was equal to A + B. And I can't see this being a meaningful comparison in something like RNA-seq.
I totally agree with you!
Let me see if I understood this correctly: you have one sample in group A, one sample in group B, and you want to compare A vs B+A? Does this even make sense?
I agree with you... Anyway, suppose you have a gene "x" and you want to perform the DGE analysis on 10 samples from group A and 13 from group B. They asked me for DGE of A versus B plus A itself: if the expression of x is 40 in A and 20 in B, then A+B is 60, so the DGE comparison would be 40 vs 60. Although this is mathematically clear, I find it difficult to write the contrast properly.
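Plugging the toy numbers from this post into plain arithmetic shows why the replies above are skeptical. This is only a sketch of the arithmetic, not a DGE test: on the additive scale, A - (A+B) is just -B (the "0 = B" result derived earlier), and on the ratio scale A/(A+B) is a function of B/A alone, so it carries no information beyond the ordinary A-versus-B comparison.

```r
# Toy numbers from the example: expression of gene x in A and in B
a <- 40
b <- 20
a_plus_b <- a + b

# Additive scale: A - (A + B) reduces to -B
print(a - a_plus_b)   # -20

# Ratio scale: A / (A + B) equals 1 / (1 + B / A), i.e. it depends only on B / A
print(a / a_plus_b)
print(1 / (1 + b / a))
```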