In my differential expression analysis dataset (comparing proteomes of healthy cells vs diseased cells of patients with defined genotypes) I have a few genotypes with only 2 samples instead of the usual 3.
To give it a try I included them in the makeContrasts call for limma and it runs without complaint. Also, when I check proteins that I would expect to be less abundant, such as the ones mutated in the patients, they are significant in those groups (and even specifically in those groups only).
So my question is whether this approach is OK and valid, or a total no-go that won't be accepted by any reviewer.
Thanks for your comments! Sebastian
Thanks for your comments. Unfortunately I am working on a very rare disease (congenital neutropenia) and it already feels like quite an accomplishment to have 8 of the disease genotypes in my cohort. So I won't get more samples, and there are no other datasets as we are the first to perform this kind of analysis, but thanks a lot for your comments. I will go on with the study and validate the proteome findings at the genetic level. But good to hear that you actually consider it OK (with reservations) and don't reject it straight out :)
Especially in your case, with rare diseases, it would be acceptable to use n=2 I think. Good luck.
For a given gene, the within-group variance is assumed constant across the groups, so it is estimated from all samples in the model, not just from the two groups being compared. You have a range of different groups, some (most?) with >= 3 samples, so this isn't a classical n=2 experimental design. Even within a given n=2 versus n=2 contrast, the study is a bit better than it would be if you only had the samples for those two groups, because the other groups contribute residual degrees of freedom to the variance estimate. As a result, you can get pretty good estimates in the experiment as described and it should be acceptable.
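To make the pooled-variance point concrete, here is a minimal sketch in plain Python of an ordinary pooled-variance t-statistic for one protein. The group names, group sizes and intensity values are made up for illustration and are not from the study. It also leaves out limma's empirical Bayes moderation, which shrinks each variance towards a value estimated across all proteins. The only thing it is meant to show is how the residual degrees of freedom change when the other groups are part of the model.

```python
from statistics import mean

# Hypothetical log-intensities for a single protein (values invented for illustration).
groups = {
    "healthy": [10.1, 9.8, 10.3],
    "genotype_A": [8.9, 9.2, 9.0],
    "genotype_B": [7.5, 7.9],  # n=2 group
    "genotype_C": [9.6, 9.9],  # n=2 group
}


def pooled_variance(groups):
    """Within-group variance pooled over all groups, with its residual df."""
    ss = 0.0
    n_total = 0
    for values in groups.values():
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        n_total += len(values)
    df = n_total - len(groups)  # residual df = samples minus number of group means
    return ss / df, df


def contrast_t(groups, a, b):
    """Ordinary t-statistic for mean(a) - mean(b) using the pooled variance."""
    s2, df = pooled_variance(groups)
    diff = mean(groups[a]) - mean(groups[b])
    se = (s2 * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
    return diff / se, df


# Same n=2 vs n=2 contrast, fitted with all groups and with only the two groups.
t_all, df_all = contrast_t(groups, "genotype_B", "genotype_C")
only_two = {k: groups[k] for k in ("genotype_B", "genotype_C")}
t_two, df_two = contrast_t(only_two, "genotype_B", "genotype_C")

print(f"all groups in the model: residual df = {df_all}")
print(f"only the two n=2 groups: residual df = {df_two}")
```

With these made-up group sizes the contrast between the two n=2 groups is tested on 6 residual degrees of freedom when all four groups are in the model, versus 2 when only those two groups are available. The gain depends on the constant-variance assumption holding reasonably well across groups.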