Variable selection for multiple regression from a large number of predictors
8.9 years ago
cjgunase ▴ 50

These are microarray datasets. I have 20 response variables Y=(Y1,...,Y20) and 1600 predictor variables X=(X1,...,X1600). There are 128 observations. I want to know which pairs of X best predict each Y.

So I generated all the combinations (Yi, Xj, Xk) and fitted a linear regression of Yi on Xj and Xk for each combination to obtain its R-squared. Based on R-squared, I extracted the top 100 combinations to analyse further which pairs of X are the best predictors of Y.
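The exhaustive pair search described above can be sketched as follows. This is only an illustration on toy data, not the code actually used: the function names (`pearson`, `pair_r_squared`, `top_pairs`) and the simulated data are assumptions made for the example. It relies on the standard identity that, for a regression with an intercept and two predictors, R-squared can be computed from the three pairwise correlations, so no model has to be fitted explicitly.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def pair_r_squared(r_yj, r_yk, r_jk):
    """R-squared of y ~ x_j + x_k (with intercept) from pairwise correlations."""
    denom = 1.0 - r_jk ** 2
    if denom < 1e-12:
        # x_j and x_k are (numerically) collinear: the pair carries
        # no more information than a single predictor.
        return r_yj ** 2
    return (r_yj ** 2 + r_yk ** 2 - 2.0 * r_yj * r_yk * r_jk) / denom


def top_pairs(y, predictors, top_n):
    """Score every predictor pair for one response; return the top_n
    as (r_squared, j, k) tuples, best first."""
    r_y = [pearson(y, x) for x in predictors]
    scored = []
    for j, k in itertools.combinations(range(len(predictors)), 2):
        r_jk = pearson(predictors[j], predictors[k])
        scored.append((pair_r_squared(r_y[j], r_y[k], r_jk), j, k))
    scored.sort(reverse=True)
    return scored[:top_n]


# Toy data (hypothetical): 128 observations, 6 predictors,
# and a response that truly depends on predictors 0 and 3.
random.seed(0)
n_obs = 128
xs = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(6)]
y = [2 * xs[0][i] - xs[3][i] + random.gauss(0, 0.5) for i in range(n_obs)]

for r2, j, k in top_pairs(y, xs, 3):
    print(f"X{j + 1}, X{k + 1}: R^2 = {r2:.3f}")
```

With 1600 predictors there are 1600 × 1599 / 2 = 1,279,200 pairs per response, so about 25.6 million models across the 20 responses. A pure-Python loop like this one is meant to show the logic; at that scale a vectorised correlation matrix would be the more practical route.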

I haven't considered multicollinearity between the two predictors in a pair. Should I take it into account?
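One way to check a pair for multicollinearity is the variance inflation factor (VIF). With only two predictors, both share the same VIF, 1 / (1 - r^2), where r is the correlation between Xj and Xk. The sketch below is a minimal illustration; the helper names `pearson` and `pair_vif` are assumptions made for the example.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def pair_vif(x_j, x_k):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson(x_j, x_k)
    denom = 1.0 - r * r
    # Perfectly collinear predictors give an unbounded VIF.
    return math.inf if denom <= 1e-12 else 1.0 / denom


# Uncorrelated toy predictors give a VIF of 1; collinear ones give infinity.
print(pair_vif([1, 2, 3, 4], [1, -1, -1, 1]))
print(pair_vif([1, 2, 3], [2, 4, 6]))
```

A commonly quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, although the cut-off is a convention rather than a formal test.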

My goal is to find the best pairs (Xj, Xk) for predicting each Yi. Can you suggest how to improve this procedure so that it is statistically valid?


I think this is a statistics question, not a bioinformatics one. You should try asking here: http://stats.stackexchange.com/
