I am struggling to find the right modeling method for my data. A short explanation of the dataset: I have a response variable called alpha diversity (alpha diversity refers to the average species diversity in a habitat or specific area; it is a local measure), and we want to see the effect of nine environmental variables on it.
Here are the scatter plots of each variable against alpha diversity (Shannon). Because of the complexity of the data, I decided to use a GAM in order to capture linear, non-linear and absent relationships within the dataset.
In the following plot you can see the scatter plots of each variable:
- blue line: linear fit
- red line: loess (using geom_smooth)
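For readers who want to reproduce this kind of plot, here is a minimal sketch of one way to overlay a linear fit and a loess curve with geom_smooth. It runs on simulated data, not on my dataset: the data frame `sim`, its values, and the long-format helper `long` are made up for illustration, and only two of the nine predictors are shown.

```r
library(ggplot2)

set.seed(1)
n <- 100

# Simulated stand-in for data_stats_model (hypothetical values)
sim <- data.frame(
  NDVI = runif(n, 0, 1),
  Temperature_Celsius = rnorm(n, mean = 20, sd = 3)
)
sim$Shannon <- 1 + sin(3 * sim$NDVI) +
  0.05 * sim$Temperature_Celsius + rnorm(n, sd = 0.3)

# Long format so that each predictor gets its own panel
long <- rbind(
  data.frame(variable = "NDVI",
             value = sim$NDVI, Shannon = sim$Shannon),
  data.frame(variable = "Temperature_Celsius",
             value = sim$Temperature_Celsius, Shannon = sim$Shannon)
)

p <- ggplot(long, aes(x = value, y = Shannon)) +
  geom_point(alpha = 0.5) +
  # blue: straight-line fit
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue") +
  # red: loess smoother
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red") +
  facet_wrap(~ variable, scales = "free_x")

print(p)
```

Where the red and blue lines diverge clearly, a smooth term is probably worth considering for that predictor.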
GAMs
1. First, I ran a GAM using smooth functions for all the variables, as follows:
    library(mgcv)

    # All nine predictors enter as cubic regression splines
    GAM1 <- gam(Shannon ~ s(Distance_from_city_centre, bs = 'cr', k = 5) +
                  s(Light_complete_100m, bs = 'cr', k = 5) +
                  s(Temperature_Celsius, bs = 'cr', k = 5) +
                  s(Human_presence, bs = 'cr', k = 5) +
                  s(NDVI, bs = 'cr', k = 5) +
                  s(Sound_dbC, bs = 'cr', k = 5) +
                  s(Closest_Road_m, bs = 'cr', k = 5) +
                  s(Closest_Path_m, bs = 'cr', k = 5) +
                  s(Tree_cover, bs = 'cr', k = 5),
                data = data_stats_model, method = "REML")
and the results:
From the edf I see that many of the variables are close to linear (edf ≈ 1). For these variables I therefore dropped the smoother and reran the model as follows:
    # Near-linear predictors enter as parametric terms; the rest stay smooth
    GAM4 <- gam(Shannon ~ s(Distance_from_city_centre, bs = 'cr', k = 20) +
                  Light_complete_100m +
                  Temperature_Celsius +
                  s(Human_presence, bs = 'cr', k = 25) +
                  NDVI +
                  Sound_dbC +
                  s(Closest_Road_m, bs = 'cr', k = 5) +
                  s(Closest_Path_m, bs = 'cr', k = 5) +
                  Tree_cover,
                data = data_stats_model, method = "REML")
and the results are:
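To make the step from the first model to the second concrete, here is a minimal sketch on simulated data, assuming mgcv is installed. It reads the edf of each smooth from the model summary, then refits with the near-linear term as a parametric term. The variables `x_lin` and `x_nonlin` and the objects `m_smooth` and `m_mixed` are made up for illustration and are not part of my dataset.

```r
library(mgcv)

set.seed(42)
n <- 200

# Simulated stand-in data: one linear and one clearly non-linear effect
sim <- data.frame(x_lin = runif(n), x_nonlin = runif(n))
sim$y <- 2 * sim$x_lin + sin(2 * pi * sim$x_nonlin) + rnorm(n, sd = 0.3)

# Step 1: smooth every predictor (same idea as GAM1)
m_smooth <- gam(y ~ s(x_lin, bs = "cr", k = 5) +
                  s(x_nonlin, bs = "cr", k = 5),
                data = sim, method = "REML")

# Effective degrees of freedom per smooth term;
# values close to 1 suggest an approximately linear effect
edf_table <- summary(m_smooth)$s.table[, "edf"]
print(round(edf_table, 2))

# Step 2: keep the smoother only where the edf is clearly above 1
# (same idea as GAM4)
m_mixed <- gam(y ~ x_lin + s(x_nonlin, bs = "cr", k = 5),
               data = sim, method = "REML")
print(summary(m_mixed))
```

In this toy example the smooth of `x_lin` should be shrunk towards a straight line, while the smooth of `x_nonlin` should keep a noticeably higher edf.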
So my questions are:
- Is this the right logic?
- If yes, how should I handle multicollinearity? When I ran an LMM on the same data I used the VIF strategy.
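For context, the VIF strategy mentioned above can be sketched in base R as follows. This is only an illustration on simulated predictors (`x1`, `x2`, `x3` and `vif_manual` are made-up names), computed by hand from the definition VIF_j = 1 / (1 - R²_j) rather than with any particular package. Note that it only measures linear dependence between predictors, so it may not capture non-linear relationships between smooth terms.

```r
set.seed(7)
n <- 150

# Simulated stand-in predictors; x2 is deliberately correlated with x1
preds <- data.frame(x1 = rnorm(n))
preds$x2 <- 0.9 * preds$x1 + rnorm(n, sd = 0.3)
preds$x3 <- rnorm(n)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_manual <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  fit <- lm(reformulate(others, response = v), data = preds)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_manual, 2))
```

Here `x1` and `x2` should show clearly inflated values, while `x3` should stay close to 1.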
I would appreciate any help, Anna