I am building a descriptive model using OLS multiple linear regression. I have a couple dozen candidate predictors, but only around 200 cases.
Since I wanted at least 10 cases per variable in the global model, I chose 20 variables and left the rest out of the model.
This global model was subjected to backward stepwise variable selection, informed by leave-one-out cross-validation. The final model was then refitted using all the complete cases in the sample.
I was then advised to plot the residuals against the values of each of the initially excluded variables. I found clear patterns in some of those plots, which I interpret to mean that those variables are somewhat associated with the outcome or with some of the predictors in the model.
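To make the residual check concrete, here is a minimal sketch on simulated data, not my actual data. The names `x1` (a predictor kept in the model), `z` (a predictor left out), and the data-generating process are made up for illustration, and a one-predictor fit stands in for the selected model. It only measures a linear pattern through a correlation; a plot would also reveal curved patterns.

```python
import random

def mean(v):
    return sum(v) / len(v)

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Simulated data (assumption): the outcome depends on both x1 and z.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]  # kept in the model
z = [random.gauss(0, 1) for _ in range(n)]   # initially excluded
y = [1 + 2 * a + 1.5 * b + random.gauss(0, 1) for a, b in zip(x1, z)]

# Fit the model without z and compute its residuals.
intercept, slope = simple_ols(x1, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x1, y)]

# OLS residuals are uncorrelated with included predictors by construction,
# so a pattern against an excluded variable is the informative one.
r_in = pearson(resid, x1)
r_out = pearson(resid, z)
print(f"corr(resid, x1) = {r_in:.3f}")
print(f"corr(resid, z)  = {r_out:.3f}")
```

In this toy setup the residuals are essentially uncorrelated with `x1` but clearly related to `z`, which is the kind of pattern I saw in some of my plots.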
I do not know what to do next. I have not been able to find a clear answer on how to deal with those omitted but relevant variables. In some cases, they have simply been added to the model, which is then compared with the previous model through some criterion (MSE, AIC), but I have seen such examples only in the context of econometrics.
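For reference, the "add it and compare" approach I have seen amounts to something like the following sketch. It uses simulated data and hypothetical names (`x1`, `z`, and the helpers `solve`, `ols_rss`, `aic`), and the AIC is the Gaussian form up to an additive constant. It is only an illustration of the comparison, not a claim that this is the right procedure after variable selection.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_rss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    X = [[1.0] + list(row) for row in zip(*columns)]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def aic(rss, n, n_coef):
    """Gaussian AIC up to an additive constant; the +1 counts the error variance."""
    return n * math.log(rss / n) + 2 * (n_coef + 1)

# Simulated data (assumption): the outcome depends on both x1 and z.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]  # already in the model
z = [random.gauss(0, 1) for _ in range(n)]   # omitted candidate
y = [1 + 2 * a + 1.5 * b + random.gauss(0, 1) for a, b in zip(x1, z)]

rss_small = ols_rss([x1], y)     # model without z
rss_big = ols_rss([x1, z], y)    # model with z added
aic_small = aic(rss_small, n, 2)
aic_big = aic(rss_big, n, 3)
print(f"AIC without z: {aic_small:.1f}")
print(f"AIC with z:    {aic_big:.1f}")
```

A lower AIC for the larger model would be read as support for adding `z`.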
Somehow, it rubs me the wrong way. It seems too simple. Could anyone point me towards references on this topic?