R/fit_hglm_occupancy_models.R
fit_hglm_occupancy_models.Rd
Estimate probability of occupancy for a set of features in a set of
planning units. Models are fitted as hierarchical generalized linear models
that account for imperfect detection (following Royle & Link 2006)
using JAGS (via runjags::run.jags()). To limit over-fitting,
covariate coefficients are sampled using a Laplace prior distribution
(equivalent to L1 regularization used in machine learning contexts)
(Park & Casella 2008).
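The equivalence between a Laplace prior and L1 regularization can be seen from the prior's negative log-density, which (up to an additive constant) is the lasso penalty lambda * |beta|. A minimal sketch in R; the lambda value here is purely illustrative:

```r
# The Laplace (double exponential) density with rate lambda is
#   p(beta) = (lambda / 2) * exp(-lambda * |beta|)
# so its negative log-density is lambda * |beta| - log(lambda / 2),
# i.e. an L1 penalty on beta plus a constant (the Bayesian lasso)
lambda <- 2 # illustrative regularization strength
beta <- seq(-3, 3, by = 0.5)
neg_log_prior <- -log((lambda / 2) * exp(-lambda * abs(beta)))
all.equal(neg_log_prior, lambda * abs(beta) - log(lambda / 2))
#> [1] TRUE
```

This is why sampling coefficients under a ddexp prior (as in the JAGS model below) shrinks weak covariate effects toward zero, much like the lasso.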
fit_hglm_occupancy_models(
site_data,
feature_data,
site_detection_columns,
site_n_surveys_columns,
site_env_vars_columns,
feature_survey_sensitivity_column,
feature_survey_specificity_column,
jags_n_samples = rep(10000, length(site_detection_columns)),
jags_n_burnin = rep(1000, length(site_detection_columns)),
jags_n_thin = rep(100, length(site_detection_columns)),
jags_n_adapt = rep(1000, length(site_detection_columns)),
jags_n_chains = rep(4, length(site_detection_columns)),
n_folds = rep(5, length(site_detection_columns)),
n_threads = 1,
seed = 500,
verbose = FALSE
)
site_data: sf::sf() object with site data.
feature_data: base::data.frame() object with feature data.
site_detection_columns: character names of numeric columns in the argument to site_data that contain the proportion of surveys conducted within each site that detected each feature. Each column should correspond to a different feature and contain a proportion value (between zero and one). If a site has not previously been surveyed, a value of zero should be used.
site_n_surveys_columns: character names of numeric columns in the argument to site_data that contain the total number of surveys conducted for each feature within each site. Each column should correspond to a different feature and contain a non-negative integer (e.g. 0, 1, 2, 3). If a site has not previously been surveyed, a value of zero should be used.
site_env_vars_columns: character names of columns in the argument to site_data that contain environmental information for fitting updated occupancy models based on possible survey outcomes. Each column should correspond to a different environmental variable, and contain numeric, factor, or character data. No missing (NA) values are permitted in these columns.
feature_survey_sensitivity_column: character name of the column in the argument to feature_data that contains the probability of future surveys correctly detecting a presence of each feature in a given site (i.e. the sensitivity of the survey methodology). This column should have numeric values between zero and one. No missing (NA) values are permitted in this column.
feature_survey_specificity_column: character name of the column in the argument to feature_data that contains the probability of future surveys correctly detecting an absence of each feature in a given site (i.e. the specificity of the survey methodology). This column should have numeric values between zero and one. No missing (NA) values are permitted in this column.
jags_n_samples: integer number of samples to generate per chain for MCMC analyses. See documentation for the sample parameter in runjags::run.jags() for more information. Defaults to 10,000 for each feature.
jags_n_burnin: integer number of warm-up iterations for MCMC analyses. See documentation for the burnin parameter in runjags::run.jags() for more information. Defaults to 1,000 for each feature.
jags_n_thin: integer number of thinning iterations for MCMC analyses. See documentation for the thin parameter in runjags::run.jags() for more information. Defaults to 100 for each feature.
jags_n_adapt: integer number of adaptation iterations for MCMC analyses. See documentation for the adapt parameter in runjags::run.jags() for more information. Defaults to 1,000 for each feature.
jags_n_chains: integer total number of chains for MCMC analyses. See documentation for the n.chains parameter in runjags::run.jags() for more information. Defaults to 4 for each fold for each feature.
n_folds: numeric number of folds to split the training data into when fitting models for each feature. Defaults to 5 for each feature.
n_threads: integer number of threads to use for parameter tuning. Defaults to 1.
seed: integer initial random number generator state for model fitting. Defaults to 500.
verbose: logical indicating if information should be printed during computations. Defaults to FALSE.
A list object containing:

models: list of list objects containing the fitted models.

predictions: tibble::tibble() object containing predictions for each feature.

performance: tibble::tibble() object containing the performance of the best models for each feature. It contains the following columns:
feature: name of the feature.

max_mpsrf: maximum multi-variate potential scale reduction factor (MPSRF) value across the models. An MPSRF value below 1.05 means that all coefficients in a given model have converged, so a value below 1.05 in this column means that all the models fitted for a given feature have successfully converged.

train_tss_mean: mean TSS statistic for models calculated using training data in cross-validation.

train_tss_std: standard deviation in TSS statistics for models calculated using training data in cross-validation.

train_sensitivity_mean: mean sensitivity statistic for models calculated using training data in cross-validation.

train_sensitivity_std: standard deviation in sensitivity statistics for models calculated using training data in cross-validation.

train_specificity_mean: mean specificity statistic for models calculated using training data in cross-validation.

train_specificity_std: standard deviation in specificity statistics for models calculated using training data in cross-validation.

test_tss_mean: mean TSS statistic for models calculated using test data in cross-validation.

test_tss_std: standard deviation in TSS statistics for models calculated using test data in cross-validation.

test_sensitivity_mean: mean sensitivity statistic for models calculated using test data in cross-validation.

test_sensitivity_std: standard deviation in sensitivity statistics for models calculated using test data in cross-validation.

test_specificity_mean: mean specificity statistic for models calculated using test data in cross-validation.

test_specificity_std: standard deviation in specificity statistics for models calculated using test data in cross-validation.
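The TSS (true skill statistic) values reported in these columns are derived from the corresponding sensitivity and specificity values. A minimal sketch (the input values below are made up for illustration):

```r
# TSS = sensitivity + specificity - 1
# ranges from -1 to 1; 1 indicates perfect discrimination and
# 0 indicates performance no better than random
tss <- function(sensitivity, specificity) {
  sensitivity + specificity - 1
}
tss(0.9, 0.8)
#> [1] 0.7
```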
This function (i) prepares the data for model fitting, (ii) fits the models, and (iii) assesses the performance of the models. These analyses are performed separately for each feature. For a given feature:
The data are prepared for model fitting by partitioning the data using k-fold cross-validation (set via the argument to n_folds). The training and evaluation folds are constructed such that each training and evaluation fold contains at least one presence and one absence observation.
A model is fitted separately for each fold (see inst/jags/model.jags for the model code). To assess convergence, the multi-variate potential scale reduction factor (MPSRF) statistic is calculated for each model.
The performance of the cross-validation models is evaluated. Specifically, the TSS, sensitivity, and specificity statistics are calculated using the models' training and evaluation folds. To assess convergence, the maximum MPSRF statistic across the models fitted for each feature is calculated.
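The fold construction described above can be sketched as follows. This is not the package's internal routine, just an illustration of randomly assigning sites to folds and resampling until every fold contains at least one presence and one absence (make_folds is a hypothetical helper):

```r
# randomly assign sites to k folds, resampling until each fold
# contains at least one presence and one absence observation
# (illustrative only; not the package's internal implementation)
make_folds <- function(presence, k) {
  stopifnot(sum(presence == 1) >= k, sum(presence == 0) >= k)
  repeat {
    folds <- sample(rep(seq_len(k), length.out = length(presence)))
    ok <- all(vapply(seq_len(k), function(i) {
      any(presence[folds == i] == 1) && any(presence[folds == i] == 0)
    }, logical(1)))
    if (ok) return(folds)
  }
}
set.seed(1)
presence <- rbinom(30, 1, 0.4)
folds <- make_folds(presence, k = 5)
# every fold now contains both a presence and an absence
```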
This function requires the JAGS software to be installed. For information on installing the JAGS software, please consult the documentation for the rjags package.
Park T & Casella G (2008) The Bayesian lasso. Journal of the American Statistical Association, 103: 681--686.
Royle JA & Link WA (2006) Generalized site occupancy models allowing for false positive and false negative errors. Ecology, 87: 835--841.
# \dontrun{
# set seed for reproducibility
set.seed(123)
# simulate data for 30 sites, 2 features, and 3 environmental variables
site_data <- simulate_site_data(n_sites = 30, n_features = 2, prop = 0.1)
feature_data <- simulate_feature_data(n_features = 2, prop = 1)
# print JAGS model code
cat(readLines(system.file("jags", "model.jags", package = "surveyvoi")),
sep = "\n")
#> model {
#> # data
#> ## int<lower=1> n_vars
#> ## int<lower=1> train_n_sites
#> ## real sens
#> ## real spec
#> ## real[train_n_sites, n_vars] train_model_matrix
#> ## real[train_n_sites] train_n_obs
#> ## matrix[train_n_sites, max(train_n_obs)] train_obs
#>
#> # priors
#> lambda ~ dunif(0.001, 10) # regularization parameter
#> coefs[1] ~ dnorm(0, 0.001) # intercept parameter
#> for (i in 2:n_vars) {
#> coefs[i] ~ ddexp(0, lambda) # covariate parameters
#> }
#>
#> # transformed parameters
#> for (i in 1:train_n_sites) {
#> logit(yhat[i]) <- coefs %*% train_model_matrix[i, ]
#> }
#>
#> # likelihood
#> for (i in 1:train_n_sites) {
#> occ[i] ~ dbern(yhat[i])
#> for (j in 1:train_n_obs[i]) {
#> train_obs[i, j] ~ dbern((occ[i] * sens) + ((1 - occ[i]) * spec))
#> }
#> }
#> }
# fit models
# note that we use a small number of MCMC iterations so that the example
# finishes quickly; you probably want to use the defaults for real work
results <- fit_hglm_occupancy_models(
site_data, feature_data,
c("f1", "f2"), c("n1", "n2"), c("e1", "e2", "e3"),
"survey_sensitivity", "survey_specificity",
n_folds = rep(5, 2),
jags_n_samples = rep(250, 2), jags_n_burnin = rep(250, 2),
jags_n_thin = rep(1, 2), jags_n_adapt = rep(100, 2),
n_threads = 1)
#>
|
| | 0%
#> Loading required namespace: rjags
#> Compiling rjags model...
#> Calling the simulation using the rjags method...
#> Adapting the model for 100 iterations...
#> Burning in the model for 250 iterations...
#> Running the model for 250 iterations...
#> Simulation complete
#> Calculating summary statistics...
#> Finished running the simulation
#> 
  |                                                                            
  |======================================================================| 100%
#> Calculating summary statistics...
#> Calculating the Gelman-Rubin statistic for 3 variables....
# print model predictions
print(results$predictions)
#> # A tibble: 30 × 2
#> f1 f2
#> <dbl> <dbl>
#> 1 5.88e- 2 0.948
#> 2 1.00e+ 0 0.00000153
#> 3 5.37e-10 1.00
#> 4 1.00e+ 0 0.0000000615
#> 5 1.00e+ 0 0.0000633
#> 6 3.23e- 3 0.998
#> 7 6.25e- 3 0.881
#> 8 9.97e- 1 0.0608
#> 9 1.89e- 4 0.999
#> 10 2.89e- 1 0.829
#> # ℹ 20 more rows
# print model performance
print(results$performance, width = Inf)
#> # A tibble: 2 × 14
#> feature max_mpsrf train_tss_mean train_tss_std train_sensitivity_mean
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 f1 1.17 0.866 0.0651 0.958
#> 2 f2 2.57 0.836 0.0812 0.967
#> train_sensitivity_std train_specificity_mean train_specificity_std
#> <dbl> <dbl> <dbl>
#> 1 0.0421 0.909 0.0371
#> 2 0.0453 0.869 0.0581
#> test_tss_mean test_tss_std test_sensitivity_mean test_sensitivity_std
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.818 0.209 0.882 0.161
#> 2 0.655 0.222 0.881 0.163
#> test_specificity_mean test_specificity_std
#> <dbl> <dbl>
#> 1 0.935 0.0926
#> 2 0.774 0.230
# }