I am looking at differential gene expression across the five breast cancer PAM50 subtypes. The distribution of subtypes in my cohort is similar to that of the general breast cancer population.
When comparing the gene expression of a given subtype to that of the rest of the population ("others"), is it more meaningful for the "others" group to contain an equally sized sample of each remaining subtype, or to keep the sample sizes "as is" (i.e., representative of the general population)? By "meaningful", I mean that the resulting differentially expressed genes should be the ones most characteristic of the subtype being examined.
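To make the two options concrete, here is a minimal sketch of why the choice matters for a single gene. It relies only on the fact that the mean of a pooled "others" group is the size-weighted average of the per-subtype means. The subtype labels are shorthand, and the expression values, the proportions, and the helper `others_mean` are all made up for illustration; none of them come from real data.

```python
# Illustrative only: hypothetical mean log2 expression of one gene per subtype.
subtype_means = {"LumA": 8.0, "LumB": 7.0, "HER2": 4.0, "Basal": 2.0, "Normal": 6.0}

# Hypothetical "as is" subtype proportions (assumed values, not real frequencies).
as_is_weights = {"LumA": 0.50, "LumB": 0.20, "HER2": 0.10, "Basal": 0.15, "Normal": 0.05}

# Balanced design: every subtype contributes the same number of samples.
equal_weights = {name: 1.0 for name in subtype_means}


def others_mean(means, weights, target):
    """Weighted mean expression over all subtypes except `target`."""
    others = [name for name in means if name != target]
    total_weight = sum(weights[name] for name in others)
    return sum(weights[name] * means[name] for name in others) / total_weight


target = "Basal"

# Difference in mean log2 expression (target minus "others") under each design.
diff_as_is = subtype_means[target] - others_mean(subtype_means, as_is_weights, target)
diff_equal = subtype_means[target] - others_mean(subtype_means, equal_weights, target)

print(f"as-is others:    {diff_as_is:.3f}")
print(f"balanced others: {diff_equal:.3f}")
```

With "as is" sizes, the estimated difference is dominated by whichever subtypes are most common in "others"; with equal sizes, every remaining subtype pulls on the comparison equally. That is the trade-off I am asking about.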