Correct way of reducing predictive model complexity
sativus · 7 months ago

Hi Biostars!

I have run into an issue with a predictive model for gene expression data that I am trying to build. It is a binomial (logistic) model: I used the filtered DEG results as the input matrix and the corresponding phenotypes as the response vector. The genes are then reduced further by cross-validated lasso regression with the glmnet package (alpha = 1.0, nfolds = 10), and the final model genes are those with nonzero coefficients at lambda.1se.

The problem is that the selected "best" model is often still too complex, resulting in Pr(>|z|) values close or equal to 1.0 and an AUC of 1.0 for the model. Reducing the number of predictors seems to solve this, but I am unsure of the correct way to do so.
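To make the selection step concrete, here is a minimal NumPy sketch of the same idea: L1-penalised logistic regression, 10-fold cross-validation over a decreasing lambda path, and the one-standard-error rule for picking lambda.1se. It is written from scratch and is not glmnet's algorithm: glmnet uses coordinate descent and standardises predictors by default, while this sketch uses proximal gradient descent (ISTA) on data that is already roughly unit-variance. The function names (`fit_lasso_logistic`, `cv_lasso_1se`), the binomial-deviance CV measure, the lambda grid and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function (no overflow for large |z|).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_lasso_logistic(X, y, lam, beta0=None, b0=0.0, n_iter=500):
    """ISTA for: mean binomial negative log-likelihood + lam * ||beta||_1.
    The intercept b0 is not penalised."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    X1 = np.column_stack([np.ones(n), X])
    # Step size 1/L, where L bounds the Lipschitz constant of the loss gradient.
    step = 4.0 * n / np.linalg.norm(X1, 2) ** 2
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - y
        b0 -= step * resid.mean()
        z = beta - step * (X.T @ resid) / n
        # Soft-thresholding: the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b0, beta


def binomial_deviance(X, y, b0, beta):
    # Mean binomial deviance = 2 * mean negative log-likelihood.
    eta = b0 + X @ beta
    return 2.0 * np.mean(np.logaddexp(0.0, eta) - y * eta)


def cv_lasso_1se(X, y, n_lambda=20, n_folds=10, seed=0):
    n = X.shape[0]
    # Smallest penalty at which every coefficient is zero.
    lam_max = np.max(np.abs(X.T @ (y - y.mean()))) / n
    lambdas = lam_max * np.logspace(0, -2, n_lambda)  # decreasing path
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    errs = np.zeros((n_folds, n_lambda))
    for k, test in enumerate(folds):
        train = np.setdiff1d(np.arange(n), test)
        b0, beta = 0.0, np.zeros(X.shape[1])
        for j, lam in enumerate(lambdas):
            # Warm start from the solution at the previous (larger) lambda.
            b0, beta = fit_lasso_logistic(X[train], y[train], lam, beta, b0)
            errs[k, j] = binomial_deviance(X[test], y[test], b0, beta)
    mean = errs.mean(axis=0)
    se = errs.std(axis=0, ddof=1) / np.sqrt(n_folds)
    j_min = int(np.argmin(mean))
    # 1-SE rule: largest lambda whose CV error is within one SE of the minimum.
    j_1se = int(np.min(np.where(mean <= mean[j_min] + se[j_min])[0]))
    return lambdas[j_min], lambdas[j_1se]


# Hypothetical data: 200 samples, 30 "genes", only the first 3 informative.
rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -3.0, 2.0]
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

lam_min, lam_1se = cv_lasso_1se(X, y)
b0, beta = fit_lasso_logistic(X, y, lam_1se, n_iter=2000)
selected = np.flatnonzero(beta)
print(f"lambda.min = {lam_min:.4f}, lambda.1se = {lam_1se:.4f}")
print("selected columns:", selected.tolist())
```

Because the lambda path is sorted from largest to smallest, the smallest qualifying index in `cv_lasso_1se` is the largest (sparsest) lambda within one standard error, which is the intent of lambda.1se.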

I have considered performing stepwise regression based on AIC on the final model genes after the cross-validated lasso regression, or simply choosing the predictors that adhere most closely to the glm regression line and dropping predictors until Pr(>|z|) < 0.05 for each of them. However, as I am new to predictive modeling, I am not sure whether either approach is valid from a statistical point of view.
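For reference, the stepwise option can be sketched as backward elimination by AIC, with AIC = 2k − 2 log L and k counting the predictors plus the intercept. This is a minimal NumPy sketch under stated assumptions, not R's `step()`: the helper names (`logistic_loglik`, `aic`, `backward_aic`) and the synthetic data are hypothetical, and the unpenalised fit uses plain Newton-Raphson. Under complete separation the unpenalised maximum-likelihood estimate does not exist, so the Newton loop is capped at `max_iter`; in that case the log-likelihood, and hence the AIC, depends on where the iteration stops.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logistic_loglik(X, y, max_iter=50, tol=1e-8):
    """Maximised log-likelihood of an unpenalised logistic regression with an
    intercept, fitted by Newton-Raphson (capped at max_iter iterations)."""
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(max_iter):
        prob = sigmoid(X1 @ beta)
        hess = X1.T @ (X1 * (prob * (1.0 - prob))[:, None])
        # A tiny ridge keeps the solve well-posed if the Hessian is near-singular.
        delta = np.linalg.solve(hess + 1e-10 * np.eye(len(beta)), X1.T @ (y - prob))
        beta += delta
        if np.max(np.abs(delta)) < tol:
            break
    eta = X1 @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def aic(X, y, cols):
    # AIC = 2k - 2 log L, with k = number of kept predictors + 1 (intercept).
    idx = np.asarray(cols, dtype=int)
    return 2 * (len(cols) + 1) - 2 * logistic_loglik(X[:, idx], y)


def backward_aic(X, y):
    """Repeatedly drop the predictor whose removal lowers AIC the most;
    stop when no single removal lowers it."""
    current = list(range(X.shape[1]))
    best = aic(X, y, current)
    while current:
        trials = [(aic(X, y, [c for c in current if c != drop]), drop) for drop in current]
        cand_aic, cand_drop = min(trials)
        if cand_aic >= best:
            break
        best = cand_aic
        current.remove(cand_drop)
    return current, best


# Hypothetical data: 2 informative predictors and 4 noise predictors.
rng = np.random.default_rng(0)
n = 300
X = rng.standard_normal((n, 6))
y = (rng.random(n) < sigmoid(1.5 * X[:, 0] - 1.5 * X[:, 1])).astype(float)

kept, final_aic = backward_aic(X, y)
print("kept columns:", kept, "AIC:", round(final_aic, 2))
```

Each step refits every candidate submodel, so the cost grows roughly quadratically with the number of starting predictors; that is manageable for the handful of genes left after the lasso.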

Any and all input regarding this is highly appreciated.
