I am fitting a GAM with the mgcv package in R to model a discrete variable scaled to [0, 1]. The variable appeared to follow a beta distribution (checked with fitdistrplus). I ran the model diagnostics with appraise() from gratia. The results looked good to me, but I would still like to ask a few fundamental questions to make sure I understand the diagnostics correctly.
1. Given that the GAM models the response variable with a beta distribution, should the residuals follow a normal or a beta distribution? My understanding is that there is no assumption about the distribution of the model residuals. Is that correct?
2. In the Q-Q plot of residuals, are the theoretical quantiles calculated from the beta distribution? If not, which distribution is used?
3. In the residuals-vs-linear-predictor plot, what is the linear predictor on the x-axis? How is it calculated?
4. This question is related to question 1. In the histogram of residuals, what is the expected distribution of the residuals? My understanding is that there is no assumption about the distribution of residuals, but if there is one, what are the general rules? For instance, should we expect the residuals to follow a beta distribution if the GAM models the response variable with a beta distribution?
5. Looking at the following diagnostics, are there any problems with the model? Can I go ahead and make inferences with it?
Technical details about the dataset: it comes from a survey about factors driving innovation in start-ups. The response variable is the innovation score (scaled to [0, 1]). The explanatory variables take discrete numeric values.
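library(mgcv)    # gam(), betar()
library(gratia)  # appraise()
library(ggpubr)  # theme_pubr()

k <- 7  # basis dimension for each smooth
gam_reduced_mixlinear_beta <- gam(
  Innovation ~ s(Experience, k = k) +
    s(Partnership, k = k) +
    Government,
  data = dat_scale,
  family = betar(),
  method = "REML"
)

# Diagnostic plots for the fitted model
appraise(gam_reduced_mixlinear_beta, guides = "auto", line_col = "blue") &
  theme_pubr()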
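
To make question 3 concrete, here is a minimal sketch of how the linear predictor and the residuals can be pulled out of a beta GAM directly with mgcv. It uses simulated stand-in data, not my survey data: `sim_dat`, `x`, `y`, `mu`, and `phi` are hypothetical names, and I am assuming the logit link (the default for `betar`) and deviance residuals.

```r
library(mgcv)

set.seed(1)
n <- 200

# Hypothetical stand-in data: one discrete predictor, response strictly in (0, 1)
x   <- sample(1:7, n, replace = TRUE)
mu  <- plogis(-1 + 0.3 * x)   # true mean on the response scale
phi <- 10                     # assumed precision for the simulation
y   <- rbeta(n, mu * phi, (1 - mu) * phi)
sim_dat <- data.frame(x = x, y = y)

fit <- gam(y ~ s(x, k = 5),
           data = sim_dat,
           family = betar(link = "logit"),
           method = "REML")

# Linear predictor: fitted values on the link (logit) scale
eta <- predict(fit, type = "link")

# Fitted means on the response scale, i.e. the inverse link applied to eta
mu_hat <- predict(fit, type = "response")

# Deviance residuals (assumed here; other types can be requested via `type`)
dev_res <- residuals(fit, type = "deviance")

# Check that the inverse logit of the linear predictor gives the fitted mean
all.equal(as.numeric(plogis(eta)), as.numeric(mu_hat))
```

Plotting `dev_res` against `eta` should give a plot of the same kind as the residuals-vs-linear-predictor panel, although I have not verified that it matches the `appraise()` output exactly.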