Calculate the expected value of the management decision given survey information. This metric describes the value of the management decision that is expected when the decision maker surveys a set of sites to inform the decision. To speed up the calculations, an approximation method is used.

approx_evdsi(
  site_data,
  feature_data,
  site_detection_columns,
  site_n_surveys_columns,
  site_probability_columns,
  site_management_cost_column,
  site_survey_scheme_column,
  site_survey_cost_column,
  feature_survey_column,
  feature_survey_sensitivity_column,
  feature_survey_specificity_column,
  feature_model_sensitivity_column,
  feature_model_specificity_column,
  feature_target_column,
  total_budget,
  site_management_locked_in_column = NULL,
  site_management_locked_out_column = NULL,
  prior_matrix = NULL,
  n_approx_replicates = 100,
  n_approx_outcomes_per_replicate = 10000,
  seed = 500
)

Arguments

site_data

sf::sf() object with site data.

feature_data

base::data.frame() object with feature data.

site_detection_columns

character names of numeric columns in the argument to site_data that contain the proportion of surveys conducted within each site that detected each feature. Each column should correspond to a different feature, and contain a proportion value (between zero and one). If a site has not previously been surveyed, a value of zero should be used.

site_n_surveys_columns

character names of numeric columns in the argument to site_data that contain the total number of surveys conducted for each feature within each site. Each column should correspond to a different feature, and contain a non-negative integer number (e.g. 0, 1, 2, 3). If a site has not previously been surveyed, a value of zero should be used.

site_probability_columns

character names of numeric columns in the argument to site_data that contain modeled probabilities of occupancy for each feature in each site. Each column should correspond to a different feature, and contain probability data (values between zero and one). No missing (NA) values are permitted in these columns.

site_management_cost_column

character name of column in the argument to site_data that contains costs for managing each site for conservation. This column should have numeric values that are equal to or greater than zero. No missing (NA) values are permitted in this column.

site_survey_scheme_column

character name of the logical (TRUE / FALSE) column in the argument to site_data that indicates which sites are selected in the survey scheme. No missing (NA) values are permitted in this column. Additionally, only sites that are missing data can be selected for surveying (as per the argument to site_detection_columns).

site_survey_cost_column

character name of column in the argument to site_data that contains costs for surveying each site. This column should have numeric values that are equal to or greater than zero. No missing (NA) values are permitted in this column.

feature_survey_column

character name of the column in the argument to feature_data that contains logical (TRUE / FALSE) values indicating if the feature will be surveyed in the planned surveys or not. Note that considering additional features will rapidly increase computational burden, and so it is only recommended to consider features that are of specific conservation interest. No missing (NA) values are permitted in this column.

feature_survey_sensitivity_column

character name of the column in the argument to feature_data that contains the probability of future surveys correctly detecting a presence of each feature in a given site (i.e. the sensitivity of the survey methodology). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column.

feature_survey_specificity_column

character name of the column in the argument to feature_data that contains the probability of future surveys correctly detecting an absence of each feature in a given site (i.e. the specificity of the survey methodology). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column.

feature_model_sensitivity_column

character name of the column in the argument to feature_data that contains the probability of the initial models correctly predicting a presence of each feature in a given site (i.e. the sensitivity of the models). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column. This should ideally be calculated using fit_xgb_occupancy_models() or fit_hglm_occupancy_models().

feature_model_specificity_column

character name of the column in the argument to feature_data that contains the probability of the initial models correctly predicting an absence of each feature in a given site (i.e. the specificity of the models). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column. This should ideally be calculated using fit_xgb_occupancy_models() or fit_hglm_occupancy_models().

feature_target_column

character name of the column in the argument to feature_data that contains the target values used to parametrize the conservation benefit of managing each feature. This column should have numeric values that are equal to or greater than zero. No missing (NA) values are permitted in this column.

total_budget

numeric maximum expenditure permitted for conducting surveys and managing sites for conservation.

site_management_locked_in_column

character name of the column in the argument to site_data that contains logical (TRUE / FALSE) values indicating whether each site is locked in for conservation management (TRUE) or not (FALSE). No missing (NA) values are permitted in this column. This is useful if some sites have already been earmarked for conservation, or if some sites are already being managed for conservation. Defaults to NULL such that no sites are locked in.

site_management_locked_out_column

character name of the column in the argument to site_data that contains logical (TRUE / FALSE) values indicating whether each site is locked out from conservation management (TRUE) or not (FALSE). No missing (NA) values are permitted in this column. This is useful if some sites could potentially be surveyed to improve model predictions even if they cannot be managed for conservation. Defaults to NULL such that no sites are locked out.

prior_matrix

numeric matrix containing the prior probability of each feature occupying each site. Rows correspond to features, and columns correspond to sites. Defaults to NULL such that prior data is calculated automatically using prior_probability_matrix().

n_approx_replicates

integer number of replicates to use for approximating the expected value calculations. Defaults to 100.

n_approx_outcomes_per_replicate

integer number of outcomes to use per replicate for approximation calculations. Defaults to 10000.

seed

integer state of the random number generator for simulating outcomes when conducting the value of information analyses. Defaults to 500.
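The detection and survey-count columns described above can be derived from raw survey tallies. The following is a minimal sketch in base R for a single feature. The data frame raw and its columns detections_f1 and surveys_f1 are hypothetical names invented for this illustration and are not part of the package. Eligibility for surveying is judged here from one feature only, which is a simplifying assumption.

```r
# hypothetical raw tallies for one feature (f1) across four sites;
# the object and column names are assumptions made for this illustration
raw <- data.frame(
  detections_f1 = c(3, 0, 3, 0),  # number of surveys that detected f1
  surveys_f1    = c(3, 0, 5, 0)   # total number of surveys for f1
)

# survey counts: non-negative integers, zero for unsurveyed sites
n1 <- raw$surveys_f1

# detection proportions: detections / surveys, zero for unsurveyed sites
# (ifelse() replaces the NaN that 0 / 0 would otherwise produce)
f1 <- ifelse(n1 > 0, raw$detections_f1 / n1, 0)

# sites without existing survey data are candidates for a survey scheme
eligible <- n1 == 0

print(data.frame(f1 = f1, n1 = n1, eligible = eligible))
```

Columns built this way would then be attached to the site data and referenced through site_detection_columns and site_n_surveys_columns.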

Value

A numeric vector containing the expected values for each replicate.

Details

This function uses approximation methods to estimate the expected value. The accuracy of the estimate depends on the arguments to n_approx_replicates and n_approx_outcomes_per_replicate, so you may need to increase these arguments for large problems.
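One way to judge whether the approximation settings are adequate is to look at how much the replicate values vary. The sketch below summarizes a vector of replicate values by its mean and the standard error of that mean. The numbers in replicate_values are made up for illustration and stand in for the vector returned by approx_evdsi(). Using the standard error as the diagnostic is an assumption of this sketch, not a recommendation taken from the package.

```r
# hypothetical replicate estimates standing in for the output of
# approx_evdsi() (these numbers are invented for illustration)
replicate_values <- c(2.98, 3.01, 2.99, 3.00, 2.97, 3.02)

# point estimate: mean across replicates
estimate <- mean(replicate_values)

# standard error of the mean across replicates
std_error <- sd(replicate_values) / sqrt(length(replicate_values))

# relative precision of the estimate
relative_error <- std_error / estimate

print(c(estimate = estimate, std_error = std_error,
        relative_error = relative_error))
```

If the relative error is larger than is acceptable for the decision at hand, increasing the approximation arguments and rerunning the calculation is the natural next step.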

Examples

# set seeds for reproducibility
set.seed(123)

# load example site data
data(sim_sites)
print(sim_sites)
#> Simple feature collection with 6 features and 13 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 0.10513 ymin: 0.04556193 xmax: 0.9764926 ymax: 0.8637977
#> CRS:           NA
#> # A tibble: 6 × 14
#>   survey_cost management_cost    f1    f2    f3    n1    n2    n3      e1     e2
#>         <dbl>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl>
#> 1          14             102     1   1       1     3     3     3  1.00   -0.848
#> 2          25              90     0   0       0     0     0     0 -1.44    1.27 
#> 3          25             165     1   0.6     0     5     5     5  1.25    0.817
#> 4          17             104     0   0       0     0     0     0 -0.484  -0.292
#> 5          18             100     0   0       0     0     0     0  0.0135  0.380
#> 6          15              94     0   0       0     0     0     0 -0.347  -1.33 
#> # ℹ 4 more variables: p1 <dbl>, p2 <dbl>, p3 <dbl>, geometry <POINT>

# load example feature data
data(sim_features)
print(sim_features)
#> # A tibble: 3 × 7
#>   name  survey survey_sensitivity survey_specificity model_sensitivity
#>   <chr> <lgl>               <dbl>              <dbl>             <dbl>
#> 1 f1    TRUE                0.954              0.886             0.718
#> 2 f2    TRUE                0.974              0.875             0.705
#> 3 f3    TRUE                0.956              0.823             0.768
#> # ℹ 2 more variables: model_specificity <dbl>, target <dbl>

# set total budget for managing sites for conservation
# (i.e. 50% of the cost of managing all sites)
total_budget <- sum(sim_sites$management_cost) * 0.5

# create a survey scheme that samples the first two sites that
# are missing data
sim_sites$survey_site <- FALSE
sim_sites$survey_site[which(sim_sites$n1 < 0.5)[1:2]] <- TRUE

# calculate expected value of management decision given the survey
# information using approximation method
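approx_ev_survey <- approx_evdsi(
  site_data = sim_sites,
  feature_data = sim_features,
  site_detection_columns = c("f1", "f2", "f3"),
  site_n_surveys_columns = c("n1", "n2", "n3"),
  site_probability_columns = c("p1", "p2", "p3"),
  site_management_cost_column = "management_cost",
  site_survey_scheme_column = "survey_site",
  site_survey_cost_column = "survey_cost",
  feature_survey_column = "survey",
  feature_survey_sensitivity_column = "survey_sensitivity",
  feature_survey_specificity_column = "survey_specificity",
  feature_model_sensitivity_column = "model_sensitivity",
  feature_model_specificity_column = "model_specificity",
  feature_target_column = "target",
  total_budget = total_budget)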

# print mean value
print(mean(approx_ev_survey))
#> [1] 2.99216
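Because total_budget is described as covering both surveys and management, it can help to check how much of the budget the survey scheme consumes. The following sketch repeats the arithmetic by hand, using cost values copied from the printed sim_sites output above. It assumes that survey costs are deducted from the same budget before management, which is how the description of total_budget reads but is not verified against the package internals. The object names below are hypothetical.

```r
# cost values copied from the printed sim_sites output above
management_cost <- c(102, 90, 165, 104, 100, 94)
survey_cost     <- c(14, 25, 25, 17, 18, 15)
n1              <- c(3, 0, 5, 0, 0, 0)

# total budget: 50% of the cost of managing all sites
total_budget <- sum(management_cost) * 0.5

# survey scheme: the first two sites that are missing data
survey_site <- rep(FALSE, length(n1))
survey_site[which(n1 < 0.5)[1:2]] <- TRUE

# cost of the survey scheme, and the budget left over for management
# (assumes surveys and management draw on the same budget)
survey_spend <- sum(survey_cost[survey_site])
remaining    <- total_budget - survey_spend

print(c(total_budget = total_budget, survey_spend = survey_spend,
        remaining = remaining))
```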