Remote effects spatial process (RESP) model example

At a glance

Introduction

We fit the RESP model to average monthly precipitation over Colorado in winter, as described here. The R package telefit (hosted on CRAN and Github) implements the RESP model. Basic use of the telefit package is demonstrated below.

You may need to install a few packages before proceeding and you can use the devtools package to install telefit. The telefit package has several dependencies and uses C++; it may take several minutes to install.

# simple looping
if (!require("foreach")) install.packages("foreach")

# parallelization
if (!require("doMC")) install.packages("doMC")
if (!require("doRNG")) install.packages("doRNG")

# data manipulation and plotting
if (!require("dplyr")) install.packages("dplyr")
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("cowplot")) install.packages("cowplot")

# MCMC tools
if (!require("coda")) install.packages("coda")

# RESP model
if (!require("telefit")) install.packages("telefit")

# development version of RESP model
# if (!require("devtools")) install.packages("devtools")
# if (!require("telefit")) devtools::install_github('jmhewitt/telefit')

Data

Load the example data resp_coex.RData in an R session. The dat object contains the response variable (precipitation) and covariates. Other objects contain settings and sample model output. The dat.test and dat.train objects are associated with fcst.test and fit.train, which are used later to illustrate posterior predictions.

load('resp_coex.RData')
str(dat)
## List of 9
##  $ tLabs   : chr [1:33] "1981" "1982" "1983" "1984" ...
##  $ coords.s: num [1:240, 1:2] -110 -109 -109 -108 -108 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "x" "y"
##  $ coords.r: num [1:5252, 1:2] -179 -178 -176 -175 -174 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:2] "x" "y"
##  $ X       : num [1:240, 1:2, 1:33] 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "dimnames")=List of 3
##   .. ..$ : chr [1:240] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:2] "(Intercept)" "T"
##   .. ..$ : NULL
##  $ Y       : num [1:240, 1:33] 0.0422 0.6052 0.4764 -0.4575 -0.4064 ...
##  $ Z       : num [1:5252, 1:33] 0.000128 0.000161 0.000187 0.000191 0.000136 ...
##  $ X.lab   : chr "TCWV"
##  $ Y.lab   : chr "PRISM Precip. (mm)"
##  $ Z.lab   : chr "SSTK (K)"
##  - attr(*, "class")= chr "stData"

Modeling

Fit the RESP model

The RESP model may take several hours to fit because it is a hierarchical Bayesian spatial model. The file resp_coex.RData contains sample (sub)model output fit, fit.local, fit.remote if you want to skip this step.

# sampler parameters (Note: cov.r "nugget" is not currently estimated)
maxIt = 21000
burn = 1000
priors = list(
  beta = list( Lambda = diag(10, ncol(dat$X[,,1]))), 
  cov.s = list( smoothness = .5, range = c(1, 600), variance = c(2, 30), 
                nugget = c(2, 1) ),
  cov.r = list( smoothness = .5, range = c(1, 600), variance = c(2,1e-1), 
                nugget = c(2,1) )
)

# parallelize estimation of (sub)models
ncores = detectCores() - 1
registerDoMC(ncores)

# fit (sub)models in parallel
ret = foreach(i=1:3) %dorng% {
  if(i==1) { # RESP model
    stFit(dat, priors, maxIt, coords.knots = coords.knots)
  } else if (i==2) { # SP submodel
    stFit(dat, priors, maxIt, coords.knots = coords.knots, localOnly = T)
  } else if (i==3) { # RE submodel
    stFit(dat, priors, maxIt, coords.knots = coords.knots, remoteOnly = T)
  }
}

# extract model fits
fit = ret[[1]]
fit.local = ret[[2]]
fit.remote = ret[[3]]
rm(ret)

The telefit package saves posterior samples of the RESP model’s mean and covariance parameters. Teleconnection effects are estimated separately, via composition sampling as spatial random effects. The posterior predictive distribution is also sampled via composition. Posterior predictive samples for fcst, fcst.local, and fcst.remote are not included in resp_coex.RData to reduce the file’s size. However, posterior sampling is relatively quick because it is easily parallelized.

# posterior parameter samples
str(fit$parameters$samples)
## List of 8
##  $ beta         : num [1:21000, 1:2] -0.00943 -0.00378 -0.00467 -0.01297 0.00962 ...
##  $ sigmasq_y    : num [1:21000, 1] 0.0683 0.0673 0.0702 0.0727 0.0725 ...
##  $ sigmasq_r    : num [1:21000, 1] 1.6 1.58 1.79 1.79 3.32 ...
##  $ sigmasq_r_eps: num [1:21000, 1] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sigmasq_eps  : num [1:21000, 1] 3.51 3.51 3.11 2.99 2.64 ...
##  $ rho_y        : num [1:21000, 1] 73.1 73.1 73.1 73.1 73.1 ...
##  $ rho_r        : num [1:21000, 1] 606 606 606 596 596 ...
##  $ ll           : num [1:21000, 1] 889 900 1100 1164 1377 ...
# estimate teleconnection effects and make predictions
fcst.local = stPredict(stFit = fit.local, stData = dat, 
                       stDataNew = dat, burn = burn, ncores = ncores)
fcst.remote = stPredict(stFit = fit.remote, stData = dat, stDataNew = dat, 
                        burn = burn, ncores = ncores, returnFullAlphas = F)
fcst = stPredict(stFit = fit, stData = dat, stDataNew = dat, burn = burn, 
                 ncores = ncores, returnFullAlphas = T)

Evaluate the RESP model

The stEval() function in telefit evaluates predictions with respect to several measures. Variance inflation factors are also included in the stVIF() function.

# evaluate predictions
clim = rowMeans(dat$Y)
fcst = stEval(fcst, dat$Y, clim)
fcst.local = stEval(fcst.local, dat$Y, clim)
fcst.remote = stEval(fcst.remote, dat$Y, clim)

# look at errors for predictions at first timepoint
str(fcst$pred[[1]]$err)
## List of 10
##  $ mspe          : num 0.613
##  $ ppl           : num 211
##  $ r2            : num -0.0136
##  $ cor           : num -0.00967
##  $ coverage      : num 0.954
##  $ cat.correct   : num 0.25
##  $ cat.heidke    : num -0.0592
##  $ cat.heidke.alt: num -0.125
##  $ crps.cat      : num 0.481
##  $ bss           : num -0.00302
# compute VIFs
vif = stVIF(stData = dat, stFit = fit, burn = 1000)
vif$beta
## (Intercept)           T 
##    1.000000    1.071419
summary(vif$alpha)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9615  0.9724  0.9776  0.9787  0.9852  0.9979

Plots and output

Teleconnection schematic

The telefit package supports basic plotting and labeling of climate data. Complex graphics can be built using additional functions in ggplot2.

# unnormalize data
dat$Z = dat$Z*5252

# set plotting label
dat$Z.lab = 'z'

# set plot year
t = 1982

# build base plot of sea surface temperatures
x = plot(dat, type='remote', t=t, boxsize = 6.5) + ggtitle('')

# re-normalize data
dat$Z = dat$Z/5252

# add local precipitation
x = x + 
  geom_point(aes(x=lon, y=lat, colour = precip), inherit.aes = F, size = .6,
             stroke = 0, pch = 15,
             data = data.frame(lon = dat$coords.s[,1], lat = dat$coords.s[,2],
                               precip = dat$Y[,which(dat$tLabs==t)])) +
  scale_color_distiller('Y', palette = 'PuBuGn', direction = 1) +
  theme(legend.box = 'horizontal',
        legend.box.margin = margin(t=0, r=-8, b=0, l=0, unit='pt'))

# set left endpoints of arrows for schematic
arrows.start = data.frame(
  x = c(-100, -150, -200, -180),
  y = c(-5, 5, 35, 55)
)[c(3,2,1,4),] %>% mutate(x=ifelse(x<=0,x,x-360))

# set arrow color
col = 'forestgreen'

# plot after adding schematic arrows and labels
x + geom_curve(aes(x = x1, y = y1, xend = x2, yend = y2),
               curvature = .3, arrow = arrow(length = unit(0.03, "npc")),
               inherit.aes = F, lwd = 1, col=col,
               data = data.frame(x1=arrows.start$x[1:3],
                                 y1=arrows.start$y[1:3],
                                 x2=dat$coords.s[c(161,187,219),1],
                                 y2=dat$coords.s[c(161,187,219),2])
               ) +
  geom_curve(aes(x = x1, y = y1, xend = x2, yend = y2),
             curvature = -.3, arrow = arrow(length = unit(0.03, "npc")),
             inherit.aes = F, lwd = 1, col=col,
             data = data.frame(x1=arrows.start$x[4],
                               y1=arrows.start$y[4],
                               x2=dat$coords.s[23,1],
                               y2=dat$coords.s[23,2])
             ) +
  geom_point(aes(x=x, y=y), size=3, col=col,
             data = arrows.start, inherit.aes = F, pch=19) +
  geom_label(aes(x=x,y=y,label=label), inherit.aes = F,
             data = data.frame(
               x=c(-185,-110),
               y=c(15,dat$coords.s[94,2]+10),
               label=c('z(r,t)','x(s,t), Y(s,t)')
             ), col=col, size=5)

Exploratory teleconnection analysis

The telefit package also supports basic exploratory data analysis for teleconnection, including functions that compute and plot empirical orthogonal functions and their correlations with local response variables.

plot_grid(
  plot(dat, type='eof') +
    ggtitle('SST Empirical orthogonal function 1 (EOF 1)') +
    theme(text = element_text(size=8)),
  plot(dat, type='eof_cor', signif.level = .05, signif.telecon = T,
       lwd=.7, alpha=.6) +
    ggtitle('Correlation between Precip. and EOF 1 score') +
    theme(text = element_text(size=8)),
  ncol=2,
  labels=paste(LETTERS[1:2], ')', sep='')
)

RESP Teleconnection estimates

Of course, the telefit package plots posterior estimates of teleconnection effects.

# build base plot of teleconnection effect estimates
resp = plot(fcst, type='eof_alpha_knots', stFit = fit, stData = dat,
            signif.telecon = T, lwd=0.7, alpha=.6) +
  scale_fill_gradient2(expression(hat(alpha)~"'"(~bold(".")~","~1)),
                       low = "#0571b0", mid = '#f7f7f7', high = '#ca0020') +
  ggtitle('Estimated teleconnection effects associated with EOF 1') 

# plot after adding denver as a point of reference
resp +
  geom_point(aes(x=lon, y=lat), 
             data = data.frame(lon = -104.9903, lat = 39.7392),
             inherit.aes = F, size = 2) + 
  geom_label(aes(x=lon, y=lat, label=label),
             data = data.frame(lon = -103.5, lat = 39.7392, label = 'Denver'),
             inherit.aes = F, alpha = .85) + 
  theme(text = element_text(size=8))

Model comparison

The resp_coex.RData file contains model comparison information in the errs object.

# set grouping labels
ilabs = c('Submodels', 'Common models')

# build plot data
errs = errs %>% mutate(model = factor(model, levels = c('RESP', 'RE', 'SP', 
                                                        'SVC', 'ENSO-T', 'CCA', 
                                                        'CLIM')),
         crps.rel = crps.cat / median(errs$crps.cat[errs$model=='CLIM']),
         comp = recode(model, 'RE'=ilabs[1], 'SP'=ilabs[1], 'RESP'='',
                       'SVC'=ilabs[2], 'ENSO-T'=ilabs[2], 'CCA'=ilabs[2], 
                       'CLIM'=ilabs[2]))

# plot error comparison
errs %>% ggplot(aes(x=model, y=crps.rel)) +
  geom_boxplot() +
  geom_hline(yintercept = 1, lty = 2) +
  xlab('') +
  ylab('Relative RPS') +
  scale_y_continuous(breaks = c(.5, 1, 1.5, 2, 3)) +
  facet_grid(~comp, scales = 'free_x', space = 'free_x') +
  theme(strip.background = element_blank(),
        strip.text = element_text(colour = 'gray15'),
        panel.border = element_rect(colour = 'gray60', linetype = 1, size = .5),
        axis.line.y = element_blank(),
        panel.spacing = unit(0, 'pt'))

errs %>% ggplot(aes(x=model, y=cat.heidke.alt)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, lty=2) +
  xlab('') +
  ylab('Heidke skill score') +
  coord_equal(ratio = 2) +
  facet_grid(~comp, scales = 'free_x', space = 'free_x') +
  theme(strip.background = element_blank(),
        strip.text = element_text(colour = 'gray15'),
        panel.border = element_rect(colour = 'gray60', linetype = 1, size = .5),
        axis.line.y = element_blank(),
        panel.spacing = unit(0, 'pt'))

Parameter estimates

The coda package can easily process posterior parameter samples to develop point estimates and credible intervals.

maxIt = 21e3
burn = 1e3

# extract posterior samples of parameters
params = data.frame(fit$parameters$samples) %>% 
  select(-ll, -sigmasq_r_eps) %>%
  slice(burn:maxIt)
colnames(params)[1:ncol(dat$X)] = colnames(dat$X)

# "unexpand" the nugget parameter
params[,5] = params[,5] * params[,3]

# compute highest posterior density intervals
hpds = HPDinterval(mcmc(params))
hpds = round(hpds, 2)

# compute posterior estimates
ests = round(colMeans(mcmc(params)),2)

# stronger rounding for range parameters
hpds[6:7,] = round(hpds[6:7,], 0)
ests[6:7] = round(ests[6:7], 0)

# assemble parameter estimates table
df = data.frame(
  Param = c('$\\beta_0$', '$\\beta_T$', '$\\sigma^2_w$', '$\\sigma^2_\\alpha$',
            '$\\sigma^2_\\varepsilon$', '$\\rho_w$', '$\\rho_\\alpha$'),
  Est=ests,
  HPD=paste('(', paste(hpds[,1], hpds[,2], sep=', '), ')', sep='')
)

# text formatting
df$Est = gsub('-', '$-$',sprintf('%.2f', df$Est))
df[,1] = as.character(df[,1])
df[,3] = gsub('-','$-$',as.character(df[,3]))

colnames(df) = c('', 'Posterior mean', '95\\% HPD')
rownames(df) = c('Local effects',' ',  '  ', '   ',
                 'Covariance', '    ', '     ')

df
##                                        Posterior mean          95\\% HPD
## Local effects               $\\beta_0$        $-$0.00    ($-$0.14, 0.14)
##                             $\\beta_T$        $-$0.18 ($-$0.24, $-$0.12)
##                          $\\sigma^2_w$           0.55       (0.49, 0.62)
##                    $\\sigma^2_\\alpha$           6.05      (1.04, 14.81)
## Covariance    $\\sigma^2_\\varepsilon$           0.01       (0.01, 0.01)
##                              $\\rho_w$         248.00         (220, 280)
##                        $\\rho_\\alpha$         509.00         (266, 799)

Posterior predictive maps

Sample map of posterior predictions (B) and uncertainty (C), compared to truth (A).

plot_grid(
  plot(dat.test, fill.lab=expression(Y)) + ggtitle('Observed (1982)'),
  plot(fcst, fill.lab=expression(hat(Y))) + ggtitle('Forecast') +
    theme(axis.title.y = element_text(color='white')),
  NULL,
  plot(fcst, type='se', fill.lab=expression(SE(hat(Y)))) + 
    ggtitle('Uncertainty'),
  ncol=2, labels = c('A)','B)', '', 'C)')
)

Similar maps for discretized posterior predictions.

# compute posterior logits
logits = qlogis(fcst.test$samples$cat_probs)
logits.est = matrix(nrow = nrow(logits), ncol = ncol(logits))
for(i in 1:nrow(logits)) {
  logits.est[i,] = rowMeans(logits[i,,])
}

# transfer logits to a plottable object
logitDat = dat.test
logitDat$Y = apply(logits.est, 1, max) 

# build plots

p = plot(dat.test, type='cat.response', 
         category.breaks = fcst$category.breaks, fill.lab='') +
  ggtitle('Observed (1982)')

uncty = plot(logitDat, fill.lab=expression(hat(logit)(tilde(Y)))) +
  ggtitle('Uncertainty')

# assemble and display plots
ggdraw(plot_grid(
  plot_grid(
    plot_grid(
      p + theme(legend.position = 'none'),
      NULL,
      plot(fcst.test, type='cat.pred', fill.lab='') +
        ggtitle('Forecast') +
        theme(axis.title.y = element_text(color='white'),
              legend.position = 'none'),
      nrow=1, align='v', rel_widths = c(1,.03,1), labels=c('A)','','B)')),
    get_legend(p),
  rel_widths=c(1, 0.2)
  ),
    plot_grid(
    plot_grid(
      NULL,
      NULL,
      uncty + theme(legend.position = 'none'),
      nrow=1, align='v', rel_widths = c(1,.03,1), labels=c('','','C)')),
    get_legend(uncty),
  rel_widths=c(1, 0.2)
  ),
  nrow=2
))

Knot locations

For completeness, we can view the knot locations used during estimation.

dat$Z.lab = 'SST'
plot(dat, type='remote', coords.knots = coords.knots, t = 1982) +
  ggtitle('')

Acknowledgements

This material is based upon work supported by the National Science Foundation under grant number AGS 1419558. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.