Elastic net (glmnet) predictive modeling: signal lost in noise
7.2 years ago
lordoftheowl ▴ 10

Hello,

I am doing some predictive modeling of gene expression from SNP genotypes. I have about 500 expression values (centered and scaled) and about 3000 SNPs (a matrix of 0, 1, 2 genotype codes). When I run my elastic net (cv.glmnet, alpha = 0.5, 10-fold CV), the model "fails" to identify any predictive SNPs, i.e., it assigns a weight of 0 to every SNP.

However, I also have a smaller subset of ~40 SNPs that I have prior reason to believe are good predictors of expression for this gene. When I run elastic net on just these predictors, I have no problem getting out a decent model that includes most of these SNPs.

So it seems to me that I have a true signal that I can't detect once enough other SNPs are added.
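The situation described above can be sketched on simulated data. This is only an illustration under stated assumptions: the genotypes, allele frequency, effect sizes, and the choice of the first 40 columns as the "prior" subset are all made up, and only the dimensions, alpha = 0.5, and 10-fold CV come from the post. Whether the full model actually drops to zero selected SNPs depends on the assumed effect sizes.

```r
library(glmnet)

set.seed(1)
n <- 500; p <- 3000; k <- 40            # sizes taken from the post

# Simulated 0/1/2 genotypes; allele frequency 0.3 is an assumption
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
colnames(X) <- paste0("snp", seq_len(p))

causal <- seq_len(k)                     # hypothetical "prior" SNP subset
beta   <- rnorm(k, sd = 0.1)             # assumed small effect sizes
y <- as.numeric(scale(X[, causal] %*% beta + rnorm(n)))  # centered and scaled

# Same settings on all SNPs and on the subset only
fit_all <- cv.glmnet(X, y, alpha = 0.5, nfolds = 10)
fit_sub <- cv.glmnet(X[, causal], y, alpha = 0.5, nfolds = 10)

# Count non-zero SNP weights (intercept dropped) at a given lambda
n_nonzero <- function(fit, s = "lambda.min") {
  b <- as.matrix(coef(fit, s = s))[-1, 1]
  sum(b != 0)
}

n_all <- n_nonzero(fit_all)
n_sub <- n_nonzero(fit_sub)
cat("selected with all SNPs:", n_all, " selected with subset:", n_sub, "\n")
```

Comparing n_all and n_sub for a few values of the assumed effect size gives a rough feel for how weak a signal can be before it is shrunk away among 3000 candidates.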

Ultimately, the goal would be to detect the best predictive eQTL SNPs in an unbiased way. Are there ways to optimize my input or algorithm to avoid these false negatives?

~misha

R prediction machine learning • 2.5k views
7.2 years ago

Try setting alpha to lower values to reduce the contribution of the lasso penalty. Note that cv.glmnet cross-validates only lambda for the alpha you supply, and the cross-validated penalty is a compromise between variable selection and prediction, so it may not be optimal when you're concerned with variable selection. You may be interested in reading this blog on when the lasso fails and this paper evaluating elastic net for GWAS studies.
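One way to try the lower-alpha suggestion is to refit cv.glmnet over a small grid of alpha values while reusing the same fold assignment (the foldid argument), so that the alphas are compared on identical splits. The sketch below uses simulated data; the dimensions, effect sizes, and the alpha grid are arbitrary assumptions, not values from the thread.

```r
library(glmnet)

set.seed(2)
n <- 200; p <- 500                       # assumed toy dimensions
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
y <- as.numeric(scale(X[, 1:10] %*% rnorm(10, sd = 0.3) + rnorm(n)))

# Fix the fold assignment so every alpha sees the same 10 folds
foldid <- sample(rep(1:10, length.out = n))
alphas <- c(0.05, 0.1, 0.25, 0.5, 1)     # arbitrary grid; 1 is the pure lasso

res <- do.call(rbind, lapply(alphas, function(a) {
  fit <- cv.glmnet(X, y, alpha = a, foldid = foldid)
  b <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]  # drop intercept
  data.frame(alpha      = a,
             lambda_min = fit$lambda.min,
             cv_mse     = min(fit$cvm),
             n_selected = sum(b != 0))
}))
print(res)
```

The n_selected column shows how many SNPs keep a non-zero weight at each alpha, next to the cross-validated error, which makes the trade-off between selection and prediction visible.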

7.2 years ago
aquaq ▴ 40

You could use the caret package to tune the glmnet parameters. Here is a nice example of the process: http://rstudio-pubs-static.s3.amazonaws.com/14372_1700240153ae4c2190feb1c5ced2d1e5.html
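A minimal sketch of what such tuning might look like with caret, assuming simulated data and an arbitrary grid (this is not taken from the linked example). caret's "glmnet" method tunes alpha and lambda jointly by resampling.

```r
library(caret)
library(glmnet)

set.seed(3)
n <- 200; p <- 300                       # assumed toy dimensions
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
colnames(X) <- paste0("snp", seq_len(p))
y <- as.numeric(scale(X[, 1:10] %*% rnorm(10, sd = 0.3) + rnorm(n)))

# 10-fold CV over a joint grid of alpha and lambda (grid values are arbitrary)
ctrl <- trainControl(method = "cv", number = 10)
grid <- expand.grid(alpha  = c(0.1, 0.5, 1),
                    lambda = 10^seq(-3, 0, length.out = 20))

tuned <- train(x = X, y = y, method = "glmnet",
               trControl = ctrl, tuneGrid = grid)
print(tuned$bestTune)                    # alpha/lambda pair with the lowest CV RMSE
```

By default caret picks the pair with the best resampled RMSE, so the same caveat applies: the setting that predicts best is not necessarily the one that selects variables best.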
