These are microarray datasets. I have 20 response variables Y=(Y1,...,Y20) and 1600 predictor variables X=(X1,...,X1600). There are 128 observations. I want to know which pairs of X variables best predict each Y.
So I generated all combinations of (Yi,Xj,Xk) and ran a linear regression for each combination to get its R-squared. Based on R-squared, I extracted the top 100 combinations to further analyze which pairs of X are the best predictors of Y.
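A minimal sketch of the procedure described above, assuming NumPy and a small synthetic dataset in place of the real arrays. The function names `r_squared` and `top_pairs` are hypothetical, and the model is assumed to be ordinary least squares with an intercept:

```python
import heapq
from itertools import combinations

import numpy as np


def r_squared(y, xj, xk):
    """R-squared of the least-squares fit y ~ intercept + xj + xk."""
    design = np.column_stack([np.ones_like(xj), xj, xk])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def top_pairs(X, Y, top_n=100):
    """Return the top_n (r2, i, j, k) tuples over all (Yi, Xj, Xk) with j < k."""
    scores = (
        (r_squared(Y[:, i], X[:, j], X[:, k]), i, j, k)
        for i in range(Y.shape[1])
        for j, k in combinations(range(X.shape[1]), 2)
    )
    return heapq.nlargest(top_n, scores)


# Synthetic stand-in (assumption): 128 observations, but far fewer variables
# than the real 1600 predictors and 20 responses, so the loop stays fast.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 12))
Y = rng.normal(size=(128, 3))
# Plant one true pair so the ranking has something to find.
Y[:, 0] = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.1 * rng.normal(size=128)

best = top_pairs(X, Y, top_n=5)
```

With the real dimensions this enumerates 20 × C(1600, 2) = 25,584,000 regressions, so the top 100 are selected from a very large number of fits.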
I haven't considered multicollinearity between any pair of predictors. Should I?
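For reference, with only two predictors the collinearity between Xj and Xk reduces to their pairwise correlation r, and the variance inflation factor for either predictor is 1/(1 - r^2). A minimal sketch of that check, assuming NumPy and synthetic vectors (the helper name `pair_vif` is hypothetical):

```python
import numpy as np


def pair_vif(xj, xk):
    """Variance inflation factor for a two-predictor model: 1 / (1 - r^2)."""
    r = np.corrcoef(xj, xk)[0, 1]
    return 1.0 / (1.0 - r ** 2)


# Synthetic vectors (assumption), 128 observations as in the real data.
rng = np.random.default_rng(1)
a = rng.normal(size=128)
b = rng.normal(size=128)             # generated independently of a
c = a + 0.1 * rng.normal(size=128)   # nearly a copy of a

vif_independent = pair_vif(a, b)     # close to 1 when r is near 0
vif_collinear = pair_vif(a, c)       # large when r is near 1
```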
My goal is to find the best pairs (Xj, Xk) for predicting each Yi. Can you suggest ways to improve this procedure so that it is statistically valid?
I think this is a statistics question, not a bioinformatics one. You should try asking here: http://stats.stackexchange.com/