It is typical to do gene set enrichment analysis, where the differentially expressed genes are divided into subgroups and each group is tested for significance.
I saw someone select a dozen or so overexpressed genes and test whether their overexpression is associated with certain phenotypic traits. In this case, they are all membrane-expressed genes.
Further, different combinations of a handful of these genes (in triplets or pairs) were tested for association between their expression levels (high vs. low) and phenotypes.
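To make the concern about the combinations concrete, here is a rough sketch of how many tests this could involve. The gene count of 12 is only my stand-in for "a dozen or so", the function names are made up for illustration, and treating the tests as independent is a simplifying assumption (combinations that share genes are clearly correlated), so the numbers are indicative at best.

```python
from math import comb

def n_combination_tests(n_genes, sizes=(2, 3)):
    """Number of distinct gene combinations of the given sizes (pairs and triplets by default)."""
    return sum(comb(n_genes, k) for k in sizes)

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests,
    ASSUMING the tests are independent (not true for overlapping gene sets)."""
    return 1.0 - (1.0 - alpha) ** n_tests

n_genes = 12  # assumption: "a dozen or so" selected genes
n_tests = n_combination_tests(n_genes)  # 66 pairs + 220 triplets = 286

print("combinations tested:", n_tests)
print("family-wise error rate at alpha=0.05:", familywise_error_rate(n_tests))
print("Bonferroni-adjusted per-test threshold:", 0.05 / n_tests)
```

If every pair and triplet were tested at an unadjusted 0.05 level, at least one "significant" association would be almost guaranteed by chance alone, and this does not even account for the genes having been picked from the differentially expressed list in the first place.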
But is it OK to hand-pick a small subset from thousands of differentially expressed genes for statistical analysis? Is it statistically sound? What needs to be considered when doing so?
This has probably been discussed before, but I am having trouble finding the right keywords to search for.
Thanks!