Variable selection for multiple regression from a large number of predictors


8.9 years ago

cjgunase ▴ 50

These are microarray datasets. I have 20 response variables Y=(Y1,...,Y20) and 1600 predictor variables X=(X1,...,X1600). There are 128 observations. I want to know which pairs of X best predict each Y.

So I generated all the combinations (Yi, Xj, Xk) and fitted a linear regression of Yi on Xj and Xk for each combination to get its R-squared. Based on R-squared, I extracted the top 100 combinations to further analyse which pairs of X are the best predictors of Y.
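To make the procedure concrete, here is a minimal sketch of the exhaustive pair search for a single response, assuming the predictors sit in a NumPy array `X` (observations by predictors) and the response in a vector `y`. The function names `pair_r_squared` and `top_pairs` and the synthetic data are made up for illustration; they are not from the original analysis. Note that 1600 predictors give 1600 × 1599 / 2 = 1,279,200 pairs per response, so about 25.6 million regressions across the 20 responses.

```python
from itertools import combinations

import numpy as np


def pair_r_squared(y, x1, x2):
    """R-squared of the least-squares fit y ~ intercept + x1 + x2."""
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def top_pairs(X, y, n_top=100):
    """Rank every predictor pair (j, k) with j < k by R-squared for one response."""
    scores = [
        (pair_r_squared(y, X[:, j], X[:, k]), j, k)
        for j, k in combinations(range(X.shape[1]), 2)
    ]
    scores.sort(reverse=True)  # highest R-squared first
    return scores[:n_top]


# Synthetic stand-in data: 128 observations, 30 predictors (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(scale=0.5, size=128)

best = top_pairs(X, y, n_top=5)
```

Each entry of `best` is a tuple `(r_squared, j, k)`. Keep in mind that picking the top pairs out of over a million candidates means the selected R-squared values are optimistically biased.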

I haven't considered multicollinearity between any pair of predictors. Should I take it into account?
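For a model with only two predictors, collinearity comes down to the correlation r between Xj and Xk, and the variance inflation factor of either predictor is 1 / (1 - r²). A minimal sketch of that check follows; the helper name `pair_collinearity` and the synthetic vectors are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def pair_collinearity(x1, x2):
    """Return (Pearson r, VIF) for a two-predictor model; VIF = 1 / (1 - r^2)."""
    r = float(np.corrcoef(x1, x2)[0, 1])
    vif = float("inf") if abs(r) >= 1.0 else 1.0 / (1.0 - r ** 2)
    return r, vif


# Synthetic examples (made-up data): one nearly duplicated pair, one unrelated pair.
rng = np.random.default_rng(1)
a = rng.normal(size=128)
b = a + rng.normal(scale=0.1, size=128)  # almost a copy of a
c = rng.normal(size=128)                 # generated independently of a

r_ab, vif_ab = pair_collinearity(a, b)
r_ac, vif_ac = pair_collinearity(a, c)
```

A pair like `(a, b)` can still reach a high R-squared, but its individual coefficients are unstable, so a top-ranked pair with a large VIF may effectively be a single predictor counted twice.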

My goal is to find the best pairs Xj, Xk that predict a given Yi. Can you suggest ways to improve this procedure and make it statistically valid?

gene ChIP-Seq • 1.9k views

updated 2.3 years ago by Ram 44k • written 8.9 years ago by cjgunase ▴ 50


I think this is a statistics question rather than a bioinformatics one. You should try asking here: http://stats.stackexchange.com/

updated 4.9 years ago by Ram 44k • written 8.9 years ago by mkulecka ▴ 360
