Hello,
I am doing some predictive modeling of gene expression from SNP genotypes. I have about 500 expression values (centered and scaled) and about 3000 SNPs (a genotype matrix coded 0, 1, 2). When I run my elastic net (cv.glmnet, alpha = 0.5, 10-fold CV), the model "fails" to identify any predictive SNPs, i.e., it assigns a weight of 0 to every SNP.
However, I also have a smaller subset of ~40 SNPs that I have prior reason to believe are good predictors of expression for this gene. When I run elastic net on just these predictors, I get a decent model that includes most of these SNPs.
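In case it helps, here is a minimal sketch of the two fits described above. The object names (geno, expr, prior_idx) are placeholders rather than my real objects, and the data are simulated purely so the snippet runs. Reading the coefficients at lambda.min is also an assumption made for illustration; lambda.1se would be the other common choice.

```r
library(glmnet)

set.seed(1)
n <- 500   # number of expression values
p <- 3000  # number of SNPs

# Simulated stand-ins (NOT real data): genotypes coded 0/1/2 and a
# centered, scaled expression vector.
geno <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n, ncol = p)
expr <- as.numeric(scale(rnorm(n)))

# Hypothetical column indices of the ~40 SNPs with prior support.
prior_idx <- 1:40

# Fit 1: all SNPs, elastic net with alpha = 0.5 and 10-fold CV.
cv_all <- cv.glmnet(geno, expr, alpha = 0.5, nfolds = 10)
b_all <- as.matrix(coef(cv_all, s = "lambda.min"))[-1, 1]  # drop intercept
n_nonzero_all <- sum(b_all != 0)  # SNPs given a non-zero weight

# Fit 2: only the prior subset, same settings.
cv_sub <- cv.glmnet(geno[, prior_idx], expr, alpha = 0.5, nfolds = 10)
b_sub <- as.matrix(coef(cv_sub, s = "lambda.min"))[-1, 1]
n_nonzero_sub <- sum(b_sub != 0)
```

With my real data, the first count is 0 and the second covers most of the ~40 SNPs.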
So it seems to me that I have a true signal that I can't detect once enough other SNPs are added.
Ultimately, the goal would be to detect the best predictive eQTL SNPs in an unbiased way. Are there ways to optimize my input or algorithm to avoid these false negatives?
~misha