Hi,
Does anyone have experience with optimizing the smoothing parameter for loess or smooth.spline?
In loess and smooth.spline, the smoothing parameter (span for loess, spar for smooth.spline) strongly affects the fit. I have tried the following method to optimize spar for smooth.spline:
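# crossval() is assumed to come from the 'bootstrap' package, whose
# crossval(x, y, theta.fit, theta.predict, ..., ngroup) signature matches the call below
library(bootstrap)
tuneSpline <- function(x, y, span.vals = seq(0.1, 1, by = 0.05), fold = 10) {
    # mean absolute cross-validation error for each candidate spar
    mae <- numeric(length(span.vals))
    fun.fit <- function(x, y, span) smooth.spline(x = x, y = y, spar = span)
    fun.predict <- function(fit, x0) predict(fit, x0)$y
    for (ii in seq_along(span.vals)) {
        y.cv <- crossval(x, y, fun.fit, fun.predict, span = span.vals[ii], ngroup = fold)$cv.fit
        fltr <- !is.na(y.cv)   # ignore points without a prediction
        save(fltr, y.cv, y, file = "tmp.rda")   # debugging dump
        mae[ii] <- mean(abs(y[fltr] - y.cv[fltr]))
    }
    # candidate with the smallest error
    span.vals[which.min(mae)]
}
tuneSpline(cars$speed, cars$dist, fold = nrow(cars))   # leave-one-out
## returns 0.1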
But the optimal spar found by this method is always 0.1 (the minimum value in span.vals).
That seems odd, and I suspect the result is not right.
Can anyone help me with this issue?
Best regards!
XianWu
Your code contains a couple of typos, namely a missing argument 'y' on the third line and a missing opening bracket on the line with the 'fltr' variable assignment. After fixing these, I ran the code and got 0.75 as the optimal value.
PS: I don't see how this question relates to bioinformatics; it's more suitable for StackOverflow.
Pay attention to what you're doing. Why would you want to 'tune' the span of loess?
loess is used to draw a curve that doesn't exactly hit all the points. If you measure accuracy as how far the points are from the curve, then it follows that the optimal parameter approaches zero: you have connected all the dots and made a zigzag. People use loess because they want a smooth curve that may miss points that are assumed to carry error. An optimal value is a subjective matter of simplicity, or perhaps of the second derivative (curve sharpness).
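To illustrate the point about error measured on the fitted points themselves, here is a minimal sketch using smooth.spline (the function from the question) on the built-in cars data. The names spar.vals and train.rss are made up for this example. It computes the residual sum of squares on the same points the spline was fitted to, which should only grow as spar increases, so the smallest spar always looks best by this in-sample measure.

```r
# In-sample residual sum of squares for a grid of spar values.
# No data are held out here, so this is NOT cross-validation.
spar.vals <- seq(0.1, 1, by = 0.05)

train.rss <- sapply(spar.vals, function(s) {
  fit <- smooth.spline(cars$speed, cars$dist, spar = s)
  fitted.dist <- predict(fit, cars$speed)$y
  sum((cars$dist - fitted.dist)^2)
})

# The in-sample error is smallest at the least smooth fit
print(data.frame(spar = spar.vals, rss = train.rss))
print(spar.vals[which.min(train.rss)])
```

This only describes the in-sample error; error measured on held-out points does not have to behave the same way.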
It's a valid method of non-parametric estimation because of the use of cross-validation. See e.g. http://www.uab.ro/auajournal/upload/49_539_Nicoleta_Breaz-2.pdf for details.
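For comparison, here is a rough sketch of leave-one-out cross-validation written with base R only, so it does not depend on whichever crossval function the original code used. The helper loo.mae and the variables cv.mae and best.spar are hypothetical names for this example, and the mean absolute error is kept as the criterion because that is what the question uses. Each point is predicted from a spline fitted without it, so the error is no longer guaranteed to be smallest at the smallest spar.

```r
# Leave-one-out mean absolute error for one value of spar.
# Point i is predicted from a smooth.spline fitted to all other points.
loo.mae <- function(x, y, spar) {
  pred <- vapply(seq_along(y), function(i) {
    fit <- smooth.spline(x[-i], y[-i], spar = spar)
    predict(fit, x[i])$y
  }, numeric(1))
  mean(abs(y - pred))
}

spar.vals <- seq(0.1, 1, by = 0.05)
cv.mae <- vapply(spar.vals,
                 function(s) loo.mae(cars$speed, cars$dist, s),
                 numeric(1))

# Candidate with the smallest held-out error
best.spar <- spar.vals[which.min(cv.mae)]
print(data.frame(spar = spar.vals, cv.mae = cv.mae))
print(best.spar)
```

Note that cars has repeated speed values, so leaving out one row still leaves other observations at the same x in the training set; treat the result as an illustration rather than a definitive choice of spar.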
If you run smooth.spline with the same data but without passing a spar parameter, is the fitted smoothing parameter that is returned by R less than 0.1?
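A quick way to check this, sketched here on the cars data from the question: when spar is not supplied, smooth.spline chooses the smoothing parameter itself (by generalized cross-validation unless cv = TRUE is set), and the fitted object reports what it picked. The variable name auto.fit is just for this example.

```r
# Let smooth.spline choose the smoothing parameter on its own
auto.fit <- smooth.spline(cars$speed, cars$dist)

# Inspect what was selected
print(auto.fit$spar)     # selected spar
print(auto.fit$lambda)   # corresponding penalty lambda
print(auto.fit$df)       # equivalent degrees of freedom

# Compare against the lower end of the grid used in the question
print(auto.fit$spar < 0.1)
```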
Where did you get crossval from? The params for crossval::crossval are (predfun, X, Y, K, B, verbose, ...), so your ordering doesn't match that. Please ignore this comment.