Question: Optimize the parameter span for loess (or spar for smooth.spline)
9.9 years ago
wenbo • 0

Hi,

Does anyone know how to optimize the smoothing parameter for loess or smooth.spline?

In loess and smooth.spline, the parameter span (spar for smooth.spline) strongly affects the fit. I have tried the following method to optimize spar for smooth.spline:

tuneSpline = function(x,y,span.vals=seq(0.1,1,by=0.05),fold=10){
    mae <- numeric(length(span.vals))
    fun.fit <- function(x,y,span) {smooth.spline(x = x,y = y,spar = span)}
    fun.predict <- function(fit,x0) {predict(fit,x0)$y}

    ii = 0
    for(span in span.vals){
        ii <- ii+1
        y.cv <- crossval(x,y,fun.fit,fun.predict,span=span,ngroup = fold)$cv.fit
        fltr <- !is.na(y.cv)
        save(fltr,y.cv,y,file="tmp.rda")
        mae[ii] <- mean(abs(y[fltr]-y.cv[fltr]))
    }
    span <- span.vals[which.min(mae)]
    return(span)
}
require(graphics)
attach(cars)
tuneSpline(speed,dist,fold = length(dist))
## return 0.1

But the spar selected by this method is always 0.1 (the minimum value in span.vals).

That seems odd, and I suspect the result is not right.

Can anyone help me with this issue?

Best regards!

XianWu

loess R smooth.spline • 6.7k views

Your code contains a couple of typos, namely a missing argument 'y' on the third line and a missing opening bracket on the line with the 'fltr' variable assignment. After fixing these, I ran the code and got 0.75 as the optimal value.

PS: I don't see how this question relates to bioinformatics, it's more suitable for StackOverflow


Pay attention to what you're doing. Why would you want to 'tune' the span of loess?

loess is used to draw a line that doesn't exactly hit all the points. If you measure accuracy as how far the points are from the line, then it follows that the optimal parameter approaches zero, and you end up connecting all the dots into a zigzag. People use loess because they want a smooth curve that may miss points that are assumed to carry error. The optimal value is then a subjective judgment about simplicity, or perhaps about the second derivative (curve sharpness).
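Whether the smallest value wins depends on which points the error is measured on. The sketch below uses only base R and the built-in cars data to compare the error on the fitted points with a leave-one-out error over the same grid of spar values. The variable names are illustrative, and the hand-rolled leave-one-out loop is an assumption standing in for whatever cross-validation routine is actually used.

```r
# Sketch: in-sample error vs. leave-one-out error for smooth.spline on `cars`.
x <- cars$speed
y <- cars$dist
spar.vals <- seq(0.1, 1, by = 0.05)

# Error measured on the same points the spline was fitted to
mae.train <- sapply(spar.vals, function(s) {
  fit <- smooth.spline(x, y, spar = s)
  mean(abs(y - predict(fit, x)$y))
})

# Error measured on points left out of the fit (leave-one-out)
mae.loo <- sapply(spar.vals, function(s) {
  pred <- sapply(seq_along(x), function(i) {
    fit <- smooth.spline(x[-i], y[-i], spar = s)
    predict(fit, x[i])$y
  })
  mean(abs(y - pred))
})

# Side-by-side comparison of the two error curves
data.frame(spar = spar.vals, train = mae.train, loo = mae.loo)
```

If the in-sample column keeps shrinking as spar decreases while the leave-one-out column does not, the zigzag concern applies to the first criterion only.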


It's a valid method of non-parametric estimation because of the use of cross-validation. See e.g. http://www.uab.ro/auajournal/upload/49_539_Nicoleta_Breaz-2.pdf for details.


If you run smooth.spline with the same data but without passing a spar parameter, is the fitted smoothing parameter that R returns less than 0.1?

Where did you get crossval from? The parameters for crossval::crossval are (predfun, X, Y, K, B, verbose, ...), so your argument order doesn't match that.

Please ignore this comment

9.9 years ago
russhh 5.7k

With a working implementation your approach works fine.

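tuneSpline <- function(x, y, span.vals = seq(0.1, 1, by = 0.05), fold = 10) {
    # crossval() comes from the bootstrap package
    require(bootstrap)
    # Fit a smoothing spline with a fixed spar, and predict at new x values
    fun.fit <- function(x, y, span) smooth.spline(x = x, y = y, spar = span)
    fun.predict <- function(fit, x0) predict(fit, x0)$y

    # Cross-validated mean absolute error for each candidate spar
    mae <- sapply(span.vals, function(span) {
        y.cv <- bootstrap::crossval(
            x, y, fun.fit, fun.predict, span = span, ngroup = fold
        )$cv.fit
        fltr <- which(!is.na(y.cv))
        mean(abs(y[fltr] - y.cv[fltr]))
    })
    # Return the spar with the smallest cross-validated error
    span.vals[which.min(mae)]
}

# fold = number of observations gives leave-one-out cross-validation
tuneSpline(cars$speed, cars$dist, fold = nrow(cars))

# 0.75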
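
The question title also asks about span for loess, which the code above does not cover. The following is a minimal sketch of the same idea applied to loess, using only base R. tuneLoess is a hypothetical helper written for this illustration. The span grid starting at 0.3, the random fold assignment, and the use of surface = "direct" (so that predictions outside the training range are not NA) are all assumptions rather than recommendations.

```r
# Sketch: choose a loess span by K-fold cross-validation (base R only).
# tuneLoess is a hypothetical helper, not part of any package.
tuneLoess <- function(x, y, span.vals = seq(0.3, 1, by = 0.05), fold = 10) {
  n <- length(x)
  # Randomly assign each observation to one of `fold` groups
  group <- sample(rep(seq_len(fold), length.out = n))
  dat <- data.frame(x = x, y = y)

  mae <- sapply(span.vals, function(span) {
    pred <- rep(NA_real_, n)
    for (k in seq_len(fold)) {
      test <- group == k
      # surface = "direct" lets predict() extrapolate beyond the training range
      fit <- suppressWarnings(loess(
        y ~ x, data = dat[!test, ], span = span,
        control = loess.control(surface = "direct")
      ))
      pred[test] <- suppressWarnings(predict(fit, newdata = dat[test, ]))
    }
    # Mean absolute error over all held-out predictions
    mean(abs(y - pred), na.rm = TRUE)
  })
  span.vals[which.min(mae)]
}

set.seed(1)  # fold assignment is random, so fix the seed for repeatability
tuneLoess(cars$speed, cars$dist)
```

Because the folds are drawn at random, the selected span can change from run to run unless the seed is fixed or fold is set to the number of observations.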

Do you think this method is good?


When I ran smooth.spline(speed,dist)$spar, I got a spar value of 0.7801305, which is very close to the value returned by tuneSpline. It looks like smooth.spline can do the optimization itself and select the best spar value for the fit.
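
For reference, here is a short sketch of inspecting that built-in selection. According to the smooth.spline documentation, when neither spar, lambda, nor df is supplied, the smoothing parameter is chosen by generalized cross-validation (the default, cv = FALSE). The variable names below are illustrative.

```r
# Sketch: let smooth.spline pick the smoothing parameter itself.
x <- cars$speed
y <- cars$dist

# No spar/lambda/df supplied, so the parameter is chosen by GCV (cv = FALSE)
fit <- smooth.spline(x, y)

fit$spar     # selected smoothing parameter on the spar scale
fit$lambda   # the corresponding penalty weight
fit$df       # equivalent degrees of freedom of the fit
fit$cv.crit  # value of the GCV criterion at the selected parameter
```

Setting cv = TRUE switches to ordinary leave-one-out cross-validation, but R warns that this is doubtful when x contains tied values, as speed does in cars.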
