Dear all,
I'm a complete newbie to PCA, so here is my question:
I'm working with a list of genes from a microarray gene expression analysis, with genes in rows and samples in columns. I ran a PCA in R using princomp in order to reduce the dimensionality of the gene set (approx. 400 genes). As I understand it, I should keep the components that explain the largest share of the total variance, that is, the first two.
The problem arises when I have to choose the genes that contribute most to the variance in each component: can I use the scores for each gene? And should I pick these genes from the first component only, or from both components?
Thanks
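For what it's worth, one common way to approach the gene-selection question is to look at the loadings rather than the scores. Below is a minimal sketch in R, not a definitive recipe. It assumes the matrix is transposed so that samples are the observations and genes are the variables, and it uses prcomp instead of princomp, because princomp stops with an error when there are more variables than observations. The data are simulated, and the names expr, top_genes and the cutoff of 20 genes are made up for illustration.

```r
set.seed(1)

# Hypothetical data: 400 genes (rows) x 12 samples (columns), simulated
expr <- matrix(rnorm(400 * 12), nrow = 400,
               dimnames = list(paste0("gene", 1:400),
                               paste0("sample", 1:12)))

# Transpose so that samples are observations and genes are variables
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Loadings: one row per gene, one column per component
loadings <- pca$rotation

# Rank genes by absolute loading on one component and keep the top n
# (n = 20 is an arbitrary cutoff chosen for this example)
top_genes <- function(loadings, pc, n = 20) {
  ranked <- sort(abs(loadings[, pc]), decreasing = TRUE)
  names(ranked)[seq_len(n)]
}

top_pc1 <- top_genes(loadings, "PC1")
top_pc2 <- top_genes(loadings, "PC2")
```

Whether to take genes from PC1 only or from both components depends on what each component separates in your samples, so it is worth checking var_explained and a plot of the sample scores (pca$x) before deciding.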
Just a note: even though PC1 captures the largest share of the variance, it is not always the most biologically interesting component. Sometimes PC1 reflects things of no biological interest, such as technical artifacts or batch effects, so some caution is required in interpretation.
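A quick way to check for this, if the processing batch of each sample is known, is to compare the PC1 sample scores between batches. The sketch below is only an illustration on simulated data: the batch factor, the size of the shift, and the number of affected genes are all assumptions, not anything taken from the original dataset.

```r
set.seed(2)

# Hypothetical design: 12 samples processed in two batches
batch <- factor(rep(c("A", "B"), each = 6))

# Simulated expression matrix: 400 genes (rows) x 12 samples (columns)
expr <- matrix(rnorm(400 * 12), nrow = 400,
               dimnames = list(paste0("gene", 1:400),
                               paste0("sample", 1:12)))

# Add an artificial batch shift to the first 100 genes in batch B
expr[1:100, batch == "B"] <- expr[1:100, batch == "B"] + 3

# PCA with samples as observations
pca <- prcomp(t(expr), center = TRUE)
pc1 <- pca$x[, "PC1"]   # one PC1 score per sample

# Mean PC1 score per batch, and a two-sample t-test between batches;
# a clear separation suggests that PC1 is tracking batch, not biology
pc1_by_batch <- tapply(pc1, batch, mean)
batch_test <- t.test(pc1 ~ batch)
```

With real data, the same comparison can be run against any recorded technical variable (array date, operator, RNA extraction run) and repeated for PC2.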