Question

Planned comparison with a ridge regression.

5.7 years ago

mforde84 ★ 1.4k

I have a data set that consists of one response variable (e.g., response time) and 20-30 predictor variables, a mixture of categorical and continuous data. I code the categorical data into dummy variables, split the data into training and test sets, then perform a ridge regression on the training set:

fit.ridge <- glmnet(x.train, y.train, family="gaussian", alpha=0)
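For context, the preprocessing and fit described above could look roughly like the sketch below. The simulated data frame, its column names, the 70/30 split, and the use of cv.glmnet() to pick a penalty are all assumptions for illustration, not the actual data or pipeline.

```r
library(glmnet)

set.seed(42)

# Simulated stand-in for the real data (hypothetical columns).
n <- 300
d <- data.frame(
  epoch = runif(n, 0, 1000),
  role  = factor(sample(c("bta", "bpe", "bpt"), n, replace = TRUE))
)
d$rt <- 5 + 0.001 * d$epoch + rnorm(n)

# model.matrix() expands factors into dummy columns; drop the intercept
# column because glmnet adds its own intercept.
x <- model.matrix(rt ~ epoch + role, data = d)[, -1]
y <- d$rt

# Hold out 30% of the rows as a test set.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
x.train <- x[train_idx, ]
y.train <- y[train_idx]

# alpha = 0 selects the ridge penalty.
fit.ridge <- glmnet(x.train, y.train, family = "gaussian", alpha = 0)

# Assumption: choose lambda by cross-validation, then read off the coefficients.
cv.ridge <- cv.glmnet(x.train, y.train, family = "gaussian", alpha = 0)
ridge.coef <- coef(fit.ridge, s = cv.ridge$lambda.min)
print(ridge.coef)
```

Note that glmnet fits a whole path of penalties, so coef() needs a value of s to return a single set of coefficients.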

I was curious how I could do planned comparisons with this fit. My idea was to take the ridge coefficients from coef(fit.ridge) and enter them as fixed offsets in an aov model. The coefficients are:

$opt_coef
                                coef      abscoef r_squared
epoch                  -4.428626e-09 4.428626e-09  0.874085
request_length          2.665573e-07 2.665573e-07  0.874085
PUT.tot_count          -3.965862e-06 3.965862e-06  0.874085
stage02.init            1.882785e-04 1.882785e-04  0.874085
upstream10.131.170.102 -2.551089e-04 2.551089e-04  0.874085
PUT.role_count          3.996994e-04 3.996994e-04  0.874085
stage06.2ndreclaim      5.491315e-04 5.491315e-04  0.874085
PUT.upstream_count     -6.168589e-04 6.168589e-04  0.874085
PUT.client_count        1.197713e-03 1.197713e-03  0.874085
stage05.2ndrun          2.124676e-03 2.124676e-03  0.874085
stage04.reclaim        -2.316906e-03 2.316906e-03  0.874085
stage03.delete         -3.258289e-03 3.258289e-03  0.874085
upstream10.131.170.103 -5.956466e-03 5.956466e-03  0.874085
upstream10.131.170.104 -1.578803e-02 1.578803e-02  0.874085
rolebta                 1.736442e-02 1.736442e-02  0.874085
rolebpe                 2.324909e-02 2.324909e-02  0.874085
rolebpt                -2.943251e-02 2.943251e-02  0.874085
rolemmw                -3.597777e-02 3.597777e-02  0.874085
rolebdr                -4.376095e-02 4.376095e-02  0.874085
rolefti                -4.913617e-02 4.913617e-02  0.874085
rolebez                -7.276036e-02 7.276036e-02  0.874085
rolebed                -7.697812e-02 7.697812e-02  0.874085
rolebpp                 8.460213e-02 8.460213e-02  0.874085
rolefts                 1.461163e-01 1.461163e-01  0.874085
(Intercept)             6.944585e+03 6.944585e+03  0.874085

The model I have in mind offsets the fixed terms and leaves the predictor of interest free:

a1 <- aov(formula = response_variable ~ 1 + offset(coef_var1 * var1) + var2 ... )

Then calculate a comparison for a predictor variable:

TukeyHSD(a1, "var2")

However, I'm running into some issues when I try to generate the Tukey test:

> a1 <- aov(rt ~ 1 + offset(-4.428626e-09 * epoch) + rolebta, data=data)
> summary(a1)
              Df Sum Sq Mean Sq F value   Pr(>F)    
rolebta        1   0.56  0.5597   16.38 5.25e-05 ***
Residuals   7324 250.30  0.0342                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(a1, "rolebta")
Error in TukeyHSD.aov(a1, "rolebta") : no factors in the fitted model
In addition: Warning message:
In replications(paste("~", xx), data = mf) : non-factors ignored: rolebta
> TukeyHSD(a1, data$rolebta)
Error in TukeyHSD.aov(a1, data$rolebta) : no factors in the fitted model
In addition: Warning message:
In replications(paste("~", xx), data = mf) : non-factors ignored: rolebta
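The error can be reproduced on simulated data. TukeyHSD() only compares levels of terms that are stored as factors, so a numeric 0/1 dummy column such as rolebta gives it nothing to compare. The sketch below assumes the original multi-level role column is still available as a factor. It subtracts the fixed epoch term from the response by hand, which plays the same role as the offset, and then runs the Tukey test on the factor. The data, the value of b_epoch, and the level names are made up.

```r
set.seed(1)

# Simulated stand-in for the real data (hypothetical columns and effects).
n <- 300
d <- data.frame(
  epoch = runif(n, 0, 1000),
  role  = factor(sample(c("bta", "bpe", "bpt"), n, replace = TRUE))
)
d$rt <- 5 + 0.002 * d$epoch + 0.5 * (d$role == "bta") + rnorm(n)

# Hypothetical fixed coefficient for epoch, e.g. taken from a ridge fit.
b_epoch <- 0.002

# Remove the fixed epoch contribution from the response.
d$rt_adj <- d$rt - b_epoch * d$epoch

# Numeric dummy: aov() treats it as a covariate, so TukeyHSD() fails.
d$rolebta <- as.numeric(d$role == "bta")
a_num <- aov(rt_adj ~ rolebta, data = d)
res_num <- tryCatch(
  suppressWarnings(TukeyHSD(a_num, "rolebta")),
  error = function(e) e
)
print(inherits(res_num, "error"))

# Factor: TukeyHSD() now has levels to compare.
a_fac <- aov(rt_adj ~ role, data = d)
res_fac <- TukeyHSD(a_fac, "role")
print(res_fac)
```

With a two-level factor built from a single dummy, e.g. factor(rolebta), the Tukey test reduces to one pairwise comparison, so the multi-level factor is usually the more useful input.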

I'm doing it this way, rather than with, say, pairwise.t.test(), because I'd like to account for any variance that is not attributable to my predictor of interest. In the example above, we correct for the variable "epoch" before running the planned contrasts on the "rolebta" factor.
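To make the idea of correcting for epoch concrete: fixing a coefficient with offset() gives the same estimates for the remaining terms as subtracting that term from the response first. A small check on simulated data, where the data and the value of b_epoch are assumptions:

```r
set.seed(2)

# Simulated stand-in for the real data (hypothetical columns).
n <- 200
d <- data.frame(
  epoch = runif(n, 0, 1000),
  role  = factor(sample(c("bta", "bpe", "bpt"), n, replace = TRUE))
)
d$rt <- 5 + 0.002 * d$epoch + rnorm(n)

b_epoch <- 0.002  # hypothetical fixed coefficient, not estimated here

# Same model written two ways: epoch enters with a fixed coefficient.
m_offset   <- lm(rt ~ offset(b_epoch * epoch) + role, data = d)
m_subtract <- lm(I(rt - b_epoch * epoch) ~ role, data = d)

# The estimated intercept and role effects should agree.
same_coef <- isTRUE(all.equal(coef(m_offset), coef(m_subtract)))
print(same_coef)
```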

I feel like I'm missing something obvious. Could someone help me figure out what I've overlooked?

M
