
My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as in An Introduction to Statistical Learning. Any help is appreciated.


Here's the code. I simulated some data (a straight line with noise) and fit a GAM multiple times using the bootstrap. (It took me a while to figure out how to plot multiple GAM fits in one graph. Thanks to Sam's answer in this post, and to this post.)

library(gam)

N = 1e2

set.seed(123)

dat = data.frame(x = 1:N,
                 y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))


gamFit = vector('list', 5)

for (ii in 1:5){

        ind = sample(1:N, N, replace = TRUE)  # bootstrap sample of row indices
        gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)

        par(new = TRUE)  # draw the next plot on top of the existing one

        plot(gamFit[[ii]], col = 'blue',
             xlim = c(1,100), ylim = c(-5,10),
             axes = FALSE, xlab = '', ylab = '')
}
YJZ
  • I don't have an answer exactly, but if you remove the `xlim` and `ylim` from both calls to `plot`, then the problem goes away. Still trying to figure out what the exact issue is, however. – Joel Carlson Dec 13 '15 at 14:35

1 Answer


The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which is described as:

a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.

This is the issue: you are not using the range of the function being plotted (i.e. the range of y is not -5 to 10). So what you need to do is change

plot(gamFit[[ii]], col = 'blue',
     xlim = c(1,100), ylim = c(-5,10),
     axes = FALSE, xlab = '', ylab = '')

to

plot(gamFit[[ii]], col = 'blue',
     scale = 15,
     axes = FALSE, xlab = '', ylab = '')

And you get:

[Figure: gam]
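The value 15 used for scale is just the height of the y-window from the question, diff(c(-5, 10)). Here is a minimal sketch that derives it from the window instead of hard-coding it, assuming the same simulated data as in the question. The names yWindow and plotScale are made up for illustration.

```r
library(gam)

set.seed(123)
N <- 100
dat <- data.frame(x = 1:N,
                  y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))

yWindow   <- c(-5, 10)       # y-range used when plotting the raw data
plotScale <- diff(yWindow)   # height of that window in y units

fit <- gam(y ~ s(x, 10), data = dat)

# Ask plot() to cover at least plotScale units on the y axis
plot(fit, col = "blue", scale = plotScale)
```

If the y-window changes later, the scale follows it automatically.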

Or you can just remove the xlim and ylim parameters from both calls to plot, and the automatic setting of plot to use the full range of the data will make everything work.
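Another option, sketched here under assumptions and not taken from the answer above: as far as I can tell, plot() on a gam fit draws the smooth term s(x, 10) centred around zero, not the fitted values on the scale of y. If that is right, the curves can be put on the same axes as the data by calling predict() on a grid of x values and adding the result with lines(), which avoids par(new = TRUE) altogether. The names grid and preds are introduced only for illustration.

```r
library(gam)

set.seed(123)
N <- 100
dat <- data.frame(x = 1:N,
                  y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))

# Plot the raw data once; every curve is then added to these same axes
plot(dat$x, dat$y, xlim = c(1, 100), ylim = c(-5, 10))

nBoot <- 5
nGrid <- 200
preds <- matrix(NA_real_, nrow = nGrid, ncol = nBoot)

for (ii in 1:nBoot) {
  ind <- sample(1:N, N, replace = TRUE)   # bootstrap sample of row indices
  fit <- gam(y ~ s(x, 10), data = dat, subset = ind)

  # Stay inside the range of the resampled x to avoid extrapolation
  grid <- data.frame(x = seq(min(dat$x[ind]), max(dat$x[ind]),
                             length.out = nGrid))

  # predict() returns fitted values on the scale of y (intercept included)
  preds[, ii] <- as.numeric(predict(fit, newdata = grid))
  lines(grid$x, preds[, ii], col = "blue")
}
```

Predicting inside the loop, right after each fit, is deliberate: the fit's call refers to ind by name, so ind should still hold the sample that was used for fitting when predict() is called.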

Joel Carlson