7.6 years ago
Poorya Parvizi
I have tried to use TCGA glioblastoma RNA-seq samples for a differential expression analysis. However, I realized that the number of "Solid Tissue Normal" samples is much lower than the number of "Primary Tumor" samples: for the glioblastoma FPKM files, there are 6 normal and 161 primary tumor samples.
Is this true? Am I missing something?
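One way to double-check such counts is to read the sample type directly from the TCGA barcodes of the downloaded files. The sketch below assumes you already have the barcodes as a list of strings. It relies on the TCGA barcode convention that the first two characters of the fourth field encode the sample type (01 = Primary Tumor, 11 = Solid Tissue Normal). It counts distinct samples rather than files, because one sample can have several vials or aliquots. The helper names and the demo barcodes are hypothetical, for illustration only.

```python
from collections import Counter

# Two-digit TCGA sample-type codes (first two characters of the fourth
# barcode field) for the categories in the question; any other code is
# reported as "other (<code>)".
SAMPLE_TYPES = {"01": "Primary Tumor", "11": "Solid Tissue Normal"}


def sample_id(barcode: str) -> str:
    """Truncate a barcode to project-TSS-participant-sampletype, e.g. TCGA-XX-0001-01."""
    project, tss, participant, sample_vial = barcode.split("-")[:4]
    return f"{project}-{tss}-{participant}-{sample_vial[:2]}"


def count_sample_types(barcodes):
    """Count distinct samples per sample type (extra vials/aliquots are merged)."""
    samples = {sample_id(b) for b in barcodes}
    return Counter(SAMPLE_TYPES.get(s[-2:], f"other ({s[-2:]})") for s in samples)


if __name__ == "__main__":
    # Made-up barcodes; replace them with the ones from your own download.
    demo = [
        "TCGA-XX-0001-01A-01R-0001-01",
        "TCGA-XX-0001-11A-01R-0001-01",
        "TCGA-XX-0002-01A-01R-0001-01",
        "TCGA-XX-0002-01B-01R-0002-01",  # second vial of the same tumor sample
        "TCGA-XX-0003-02A-01R-0001-01",  # recurrent tumor, counted as "other (02)"
    ]
    print(count_sample_types(demo))
```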
I wouldn't be surprised to see few normal samples, since you're dealing with brain tissue here. Normal brain tissue is rarely removed, whether from the cancer patient or from anyone else.
You are right, but 124 of the patients are deceased. So do you think a differential expression analysis is statistically valid in this situation?
Samples are usually obtained during surgery, while trying to keep people alive. That's what I would assume unless there are more details on the samples' provenance.

Regarding the different group sizes, statistical tests generally do not require the groups to be of equal size. In particular, as long as the assumptions of the test hold, the type I error rate (i.e. the probability of calling a difference statistically significant when there is none) is not affected. However, the power of the test (i.e. the probability of rejecting the null hypothesis when it is false) is reduced, which means that the probability of a type II error (i.e. concluding there is no difference when there really is one) increases. In less mathematical terms, larger samples make it easier to detect smaller differences, and here the 6 normal samples are the bottleneck: adding more tumor samples helps very little. Also keep in mind that statistical significance and biological relevance are not linked a priori.
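To make the type I error versus power distinction concrete, here is a minimal sketch. It assumes normally distributed expression values with a known, common standard deviation and a two-sided two-sample z-test. This is a deliberate simplification: count-based tools such as DESeq2 or edgeR use different models. The function name `z_test_power` and the effect sizes are hypothetical. With `delta = 0` the function returns `alpha` whatever `n1` and `n2` are, so the false-positive rate is unchanged. With a true difference, power depends mostly on the smaller group.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(n1: int, n2: int, delta: float, sigma: float = 1.0,
                 alpha: float = 0.05) -> float:
    """Rejection probability of a two-sided two-sample z-test (known, equal sigma).

    delta is the true difference in group means. With delta == 0 the result
    is the type I error rate; with delta != 0 it is the power.
    """
    se = sigma * (1 / n1 + 1 / n2) ** 0.5       # standard error of the mean difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = delta / se                          # mean of the test statistic when delta != 0
    # Reject when |Z| > z_crit, where Z ~ N(shift, 1).
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n_normal, n_tumor in [(6, 161), (6, 1000), (50, 161), (161, 161)]:
        print(
            n_normal, n_tumor,
            round(z_test_power(n_normal, n_tumor, delta=0.0), 3),  # type I error
            round(z_test_power(n_normal, n_tumor, delta=0.5), 3),  # power, delta = 0.5 sigma
        )
```

Comparing the rows `(6, 161)` and `(6, 1000)` shows why the smaller group is the bottleneck: the standard error is driven by the `1 / n1` term, so increasing the number of tumor samples barely changes the power.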