I have a data set that consists of one response variable (e.g., response time) and 20-30 predictor variables, a mixture of categorical and continuous data. I code the categorical data into dummy variables, split the data into training and test sets, then perform a ridge regression on the training set:
fit.ridge <- glmnet(x.train, y.train, family="gaussian", alpha=0)
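For context, here is a minimal sketch of that preprocessing on synthetic data. The data frame, the column names (`rt`, `epoch`, `role`), the 70/30 split and the use of `cv.glmnet` to choose lambda are all assumptions made for illustration, not my actual pipeline:

```r
library(glmnet)

set.seed(1)
# Synthetic stand-in for the real data (all names here are hypothetical)
n <- 200
dat <- data.frame(
  rt    = rnorm(n, mean = 7000, sd = 50),
  epoch = runif(n, 1.4e9, 1.5e9),
  role  = factor(sample(c("bta", "bpe", "bpt"), n, replace = TRUE))
)

# Dummy-code the factor; drop the intercept column because glmnet adds its own
x <- model.matrix(rt ~ epoch + role, data = dat)[, -1]
y <- dat$rt

# Assumed 70/30 train/test split
train <- sample(seq_len(n), size = floor(0.7 * n))
x.train <- x[train, ]
y.train <- y[train]
x.test  <- x[-train, ]
y.test  <- y[-train]

# alpha = 0 selects the ridge penalty; cv.glmnet picks lambda by cross-validation
cv.ridge <- cv.glmnet(x.train, y.train, family = "gaussian", alpha = 0)
ridge.coef <- coef(cv.ridge, s = "lambda.min")
print(ridge.coef)
```

Note that after `model.matrix` each level of `role` becomes its own numeric 0/1 column, which matters for the `aov` calls further down.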
I was curious how I could do planned comparisons with this fit. I tried building an anova model in which the ridge coefficients enter as fixed offsets.
The coefficients from coef(fit.ridge), to be used as offsets in the anova:
$opt_coef
                                coef      abscoef r_squared
epoch                  -4.428626e-09 4.428626e-09  0.874085
request_length          2.665573e-07 2.665573e-07  0.874085
PUT.tot_count          -3.965862e-06 3.965862e-06  0.874085
stage02.init            1.882785e-04 1.882785e-04  0.874085
upstream10.131.170.102 -2.551089e-04 2.551089e-04  0.874085
PUT.role_count          3.996994e-04 3.996994e-04  0.874085
stage06.2ndreclaim      5.491315e-04 5.491315e-04  0.874085
PUT.upstream_count     -6.168589e-04 6.168589e-04  0.874085
PUT.client_count        1.197713e-03 1.197713e-03  0.874085
stage05.2ndrun          2.124676e-03 2.124676e-03  0.874085
stage04.reclaim        -2.316906e-03 2.316906e-03  0.874085
stage03.delete         -3.258289e-03 3.258289e-03  0.874085
upstream10.131.170.103 -5.956466e-03 5.956466e-03  0.874085
upstream10.131.170.104 -1.578803e-02 1.578803e-02  0.874085
rolebta                 1.736442e-02 1.736442e-02  0.874085
rolebpe                 2.324909e-02 2.324909e-02  0.874085
rolebpt                -2.943251e-02 2.943251e-02  0.874085
rolemmw                -3.597777e-02 3.597777e-02  0.874085
rolebdr                -4.376095e-02 4.376095e-02  0.874085
rolefti                -4.913617e-02 4.913617e-02  0.874085
rolebez                -7.276036e-02 7.276036e-02  0.874085
rolebed                -7.697812e-02 7.697812e-02  0.874085
rolebpp                 8.460213e-02 8.460213e-02  0.874085
rolefts                 1.461163e-01 1.461163e-01  0.874085
(Intercept)             6.944585e+03 6.944585e+03  0.874085
a1 <- aov(formula = response_variable ~ 1 + (coef_var1 * var1) + (coef_var2 * var2) ... )
Then calculate a comparison for a predictor variable:
TukeyHSD(a1, "var2")
However, I'm running into some issues when I try to generate the Tukey test:
> a1 <- aov(rt ~ 1 + offset(-4.428626e-09 * epoch) + rolebta, data=data)
> summary(a1)
              Df Sum Sq Mean Sq F value   Pr(>F)
rolebta        1   0.56  0.5597   16.38 5.25e-05 ***
Residuals   7324 250.30  0.0342
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(a1, "rolebta")
Error in TukeyHSD.aov(a1, "rolebta") : no factors in the fitted model
In addition: Warning message:
In replications(paste("~", xx), data = mf) : non-factors ignored: rolebta
> TukeyHSD(a1, data$rolebta)
Error in TukeyHSD.aov(a1, data$rolebta) : no factors in the fitted model
In addition: Warning message:
In replications(paste("~", xx), data = mf) : non-factors ignored: rolebta
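A small reproduction on synthetic data, in case it helps pin down the cause. It is only a sketch under assumptions: the data are made up, `role` is a hypothetical multi-level factor standing in for the column the `role*` dummies came from, and `epoch` enters as an ordinary covariate rather than as an offset to keep the example simple. The same kind of call with a numeric 0/1 dummy appears to fail in the same way as above, while a model containing the factor itself can be passed to `TukeyHSD`:

```r
set.seed(1)
# Synthetic data; `role` is a hypothetical multi-level factor column
n <- 300
d <- data.frame(
  epoch = runif(n),
  role  = factor(sample(c("bta", "bpe", "bpt"), n, replace = TRUE))
)
d$rt <- 2 * d$epoch + ifelse(d$role == "bta", 0.5, 0) + rnorm(n)

# Numeric 0/1 dummy, as produced by dummy-coding for the design matrix
d$rolebta <- as.numeric(d$role == "bta")

# Numeric dummy: aov treats it as a covariate, so there is no factor to compare
a_num <- aov(rt ~ epoch + rolebta, data = d)
res_num <- try(TukeyHSD(a_num, "rolebta"), silent = TRUE)
print(inherits(res_num, "try-error"))

# Factor term: TukeyHSD can form pairwise comparisons between its levels
# (a "non-factors ignored: epoch" warning is expected for the covariate)
a_fac <- aov(rt ~ epoch + role, data = d)
res_fac <- TukeyHSD(a_fac, "role")
print(res_fac)
```

This only shows which inputs `TukeyHSD` accepts; it does not settle whether such a comparison is statistically appropriate when the other coefficients come from a ridge fit.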
The reason I'm trying to do it this way, rather than with, say, pairwise.t.test(), is that I'd like to account for any variance that is not attributable to my predictor of interest. In the example above, I am correcting for the variable "epoch" before doing the planned contrasts on the "rolebta" factor.
I feel like I'm missing something obvious. Could someone help me figure out what I've overlooked?