I have a microarray expriment (Affymetrix) with 50 samples (each belonging to one of 5 disease subtypes) and about 5000 genes after filtering. Now, I would like to find a list of, let's say, 20 genes whose expression can be used to correctly classify the disease subtypes.
For now, I have created a design matrix in R that defines which subtype a sample belongs to. I then create a contrast matrix to distinguish between class A and the other classes, use eBayes to get the p-values, adjust them for multiple testing:
contrast.matrix = makeContrasts(A-(B+C+D+E)/4, levels=design)
fit = eBayes(contrasts.fit(lmFit(myFilteredGenes, design), contrast.matrix))
This is repeated for each class, so I end up with a 5000x5 matrix of p-values (where one column corresponds to how significant the current class expression compared to the average of all others is). How can I use this matrix to extract my class-discriminant genes? Or [preferred] how can I apply statistical testing on all classes instead of just 2 at a time?
Note: this is (1) for all classes at once (I'm not searching for pairwise comparison) and (2) taking the 4 highest p-values for each class might not be the smartest approach (or is it?).
I often use randomForests for this type of thing, though I cannot say that I have objective evidence that it is better than any other. It gives ranked gene lists as output (importance) and can also give sample weights for each class.
I'm currently classifying with SVMs on the whole filtered set, if there is a way to read the predictor strengths from the model this would already be great - I'll look into it, thanks.
SVM weight extraction: http://stackoverflow.com/questions/1899008/weights-from-linear-svm-model-in-r