Hi! I've just received some RNA-seq data from a biologist. There are 4 patients, each with one tumor tissue sample and one normal tissue sample and no replicates, so 8 samples in total.
I don't like not having biological replicates, but here is my thinking: I'm interested in genes that are differentially expressed in the same way in all 4 tumors. So can't I just treat these data as one experiment with 4 biological replicates? There will probably be much more variance between the samples than there normally is between biological replicates (here I would think of biological replicates as 2 or more samples from the same tumor or normal tissue of the same patient), but still, do you think this is possible? That is, which of these approaches would you suggest?
a) Use my favorite differential expression software and simply use my patients as biological replicates.
b) Compute the fold change in each gene's read count separately for each patient and try to perform the statistical tests myself (possibly some "special" statistics?)
c) Use my favorite differential expression software for each patient separately as it is not so tragic not to have replicates and report genes found significant in all 4 patients.
d) Only compute the fold change in each gene's read count separately for each patient, since it is a disaster not to have replicates and my favorite differential expression software would not give me meaningful results, and then pray that subsequent lab tests will find some of the genes that score highest in all 4 patients interesting.
e) Something else.
Thanks a lot!
Why can't you interpret your samples as biological replicates? I think they are. Your assumption about increased variability compared to samples from a single patient is probably correct, and it reduces the power to detect something. However, if you do find some genes differentially expressed, the results are more transferable to other patients. This scenario is definitely better than having only 4 samples from a single patient. So I'd say a) and go ahead (possibly b) too).
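To make option b) a bit more concrete, here is a minimal Python sketch of the per-patient idea for a single gene: take the log2 fold change (tumor vs. normal) within each patient, then compute a paired t statistic across the 4 patients. The counts are made up for illustration, they are assumed to be already normalized for library size, and the pseudocount of 1 is an arbitrary choice. This is not what edgeR or limma do internally (they use negative binomial models and moderated statistics); it only shows why pairing within patients removes patient-to-patient baseline differences.

```python
import math
import statistics


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Per-patient log2(tumor / normal) for one gene.

    The pseudocount (an assumption, not a recommendation) avoids log(0).
    """
    return [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor, normal)
    ]


def paired_t_statistic(lfcs):
    """One-sample t statistic on the per-patient log fold changes.

    This is the classic paired t-test statistic with len(lfcs) - 1
    degrees of freedom.
    """
    n = len(lfcs)
    mean = statistics.mean(lfcs)
    sd = statistics.stdev(lfcs)  # sample standard deviation
    return mean / (sd / math.sqrt(n))


# Hypothetical normalized counts for one gene, one value per patient.
# Baseline expression differs a lot between patients, but the
# tumor/normal ratio is similar in all of them.
tumor = [230.0, 510.0, 95.0, 1200.0]
normal = [100.0, 260.0, 40.0, 640.0]

lfcs = log2_fold_changes(tumor, normal)
t_stat = paired_t_statistic(lfcs)

print("per-patient log2 fold changes:", [round(x, 3) for x in lfcs])
print("paired t statistic (3 degrees of freedom):", round(t_stat, 2))
```

With only 4 patients there are just 3 degrees of freedom per gene, which is one reason dedicated tools that share information across genes are usually preferred over a gene-by-gene test like this.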
That's a completely normal setup for a cancer experiment. There's no need for per-patient replicates because you really aren't interested in finding (just controlling for) per-patient differences. As Michael mentioned, option (a) is the right way to go (in fact, there are likely examples of this sort of analysis in the edgeR or limma vignettes).
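"Controlling for" per-patient differences usually means putting the patient into the model as a blocking factor next to the tumor/normal effect. In R formula notation this is commonly written as something like `~ patient + tissue`. The Python sketch below only builds the corresponding design matrix by hand so you can see its shape; the patient labels and column names are hypothetical, and the actual fitting would be done by the differential expression software.

```python
# Hypothetical sample sheet: 4 patients, each with a normal and a tumor sample.
patients = ["P1", "P2", "P3", "P4"]
samples = [(p, tissue) for p in patients for tissue in ("normal", "tumor")]


def design_row(patient, tissue):
    """One row of an additive 'patient + tissue' design matrix.

    Columns: intercept, one indicator per patient except the first
    (the reference patient), and one indicator for tumor tissue.
    """
    intercept = [1]
    patient_dummies = [1 if patient == p else 0 for p in patients[1:]]
    tumor = [1 if tissue == "tumor" else 0]
    return intercept + patient_dummies + tumor


columns = ["intercept"] + [f"patient_{p}" for p in patients[1:]] + ["tumor"]
design = [design_row(p, t) for p, t in samples]

# Residual degrees of freedom = samples - fitted coefficients.
residual_df = len(design) - len(columns)

print(columns)
for (p, t), row in zip(samples, design):
    print(p, t, row)
print("residual degrees of freedom:", residual_df)
```

The coefficient of interest is the last column (`tumor`): it is the tumor vs. normal effect after each patient's own baseline has been accounted for. With 8 samples and 5 coefficients, 3 residual degrees of freedom remain for estimating the variability.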
Thanks for your answers, that's good news. This is much clearer to me now.