Dear All,
I am conducting research on the effects of DNA repeats on gene expression. I have 20,000 observations of gene expression and about 1,200 DNA repeats (predictor variables) that affect gene expression. I need to build a multiple regression model for this study. I have found some techniques for variable selection, for example LASSO regression. My question is: are there other techniques for this, and which method is best? BTW, in my case the number of predictor variables p is less than the number of observations n.
Thanks, Sean. The dependent variable in my study, gene expression, is continuous and approximately normally distributed, but most of the predictors are counts that take the value 0 or 1. I don't know whether there are any other assumptions for ElasticNet and LASSO that should be satisfied before running them for variable selection.
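
To make the question concrete, here is a minimal sketch of the kind of LASSO selection I mean, run on simulated data rather than my real data. Everything in it is an assumption for illustration: the function names, the simulated effect sizes, the 0/1 predictors drawn with probability 0.5, and the fixed penalty `lam`. It uses plain cyclic coordinate descent with soft-thresholding; in practice I would expect to use an existing package and choose the penalty by cross-validation.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.

    X is a list of rows, y a list of responses. Returns (intercept, beta).
    """
    n, p = len(X), len(X[0])
    # Centre y and each column so the intercept can be recovered at the end.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new_b
    intercept = y_mean - sum(col_means[j] * beta[j] for j in range(p))
    return intercept, beta


# Simulated example (hypothetical): 10 binary predictors, only 0 and 3 matter.
random.seed(0)
n, p = 200, 10
X = [[float(random.random() < 0.5) for _ in range(p)] for _ in range(n)]
y = [1.0 + 2.0 * row[0] - 1.5 * row[3] + random.gauss(0.0, 0.5) for row in X]

intercept, beta = lasso_coordinate_descent(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected predictor indices:", selected)
```

Predictors whose coefficient is shrunk exactly to zero are dropped, and the rest are "selected". A larger `lam` drops more predictors, which is why the choice of penalty matters so much for which DNA repeats end up in the model.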