Approximately optimal survey scheme
Source: R/approx_optimal_survey_scheme.R
Find the optimal survey scheme that maximizes the value of information. This function uses an approximation method to calculate the expected value of the decision given a survey scheme.
Usage
approx_optimal_survey_scheme(
  site_data,
  feature_data,
  site_detection_columns,
  site_n_surveys_columns,
  site_probability_columns,
  site_management_cost_column,
  site_survey_cost_column,
  feature_survey_column,
  feature_survey_sensitivity_column,
  feature_survey_specificity_column,
  feature_model_sensitivity_column,
  feature_model_specificity_column,
  feature_target_column,
  total_budget,
  survey_budget,
  site_management_locked_in_column = NULL,
  site_management_locked_out_column = NULL,
  site_survey_locked_out_column = NULL,
  prior_matrix = NULL,
  n_approx_replicates = 100,
  n_approx_outcomes_per_replicate = 10000,
  seed = 500,
  n_threads = 1,
  verbose = FALSE
)
Arguments
- site_data
- sf::sf() object with site data.
- feature_data
- base::data.frame() object with feature data.
- site_detection_columns
- character names of numeric columns in the argument to site_data that contain the proportion of surveys conducted within each site that detected each feature. Each column should correspond to a different feature and contain a proportion value (between zero and one). If a site has not previously been surveyed, a value of zero should be used.
- site_n_surveys_columns
- character names of numeric columns in the argument to site_data that contain the total number of surveys conducted for each feature within each site. Each column should correspond to a different feature and contain a non-negative integer (e.g. 0, 1, 2, 3). If a site has not previously been surveyed, a value of zero should be used.
- site_probability_columns
- character names of numeric columns in the argument to site_data that contain modeled probabilities of occupancy for each feature in each site. Each column should correspond to a different feature and contain probability data (values between zero and one). No missing (NA) values are permitted in these columns.
- site_management_cost_column
- character name of the column in the argument to site_data that contains the cost of managing each site for conservation. This column should have numeric values that are equal to or greater than zero. No missing (NA) values are permitted in this column.
- site_survey_cost_column
- character name of the column in the argument to site_data that contains the cost of surveying each site. This column should have numeric values that are equal to or greater than zero. No missing (NA) values are permitted in this column.
- feature_survey_column
- character name of the column in the argument to feature_data that contains logical (TRUE/FALSE) values indicating whether each feature will be surveyed in the planned surveys. Note that considering additional features will rapidly increase the computational burden, so it is only recommended to consider features that are of specific conservation interest. No missing (NA) values are permitted in this column.
- feature_survey_sensitivity_column
- character name of the column in the argument to feature_data that contains the probability of future surveys correctly detecting a presence of each feature in a given site (i.e. the sensitivity of the survey methodology). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column.
- feature_survey_specificity_column
- character name of the column in the argument to feature_data that contains the probability of future surveys correctly detecting an absence of each feature in a given site (i.e. the specificity of the survey methodology). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column.
- feature_model_sensitivity_column
- character name of the column in the argument to feature_data that contains the probability of the initial models correctly predicting a presence of each feature in a given site (i.e. the sensitivity of the models). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column. This should ideally be calculated using fit_xgb_occupancy_models() or fit_hglm_occupancy_models().
- feature_model_specificity_column
- character name of the column in the argument to feature_data that contains the probability of the initial models correctly predicting an absence of each feature in a given site (i.e. the specificity of the models). This column should have numeric values that are between zero and one. No missing (NA) values are permitted in this column. This should ideally be calculated using fit_xgb_occupancy_models() or fit_hglm_occupancy_models().
- feature_target_column
- character name of the column in the argument to feature_data that contains the \(target\) values used to parametrize the conservation benefit of managing each feature. This column should have numeric values that are equal to or greater than zero. No missing (NA) values are permitted in this column.
- total_budget
- numeric maximum expenditure permitted for conducting surveys and managing sites for conservation.
- survey_budget
- numeric maximum expenditure permitted for conducting surveys.
- site_management_locked_in_column
- character name of the column in the argument to site_data that contains logical (TRUE/FALSE) values indicating whether each site should be locked in (TRUE) for management for conservation or not (FALSE). No missing (NA) values are permitted in this column. This is useful if some sites have already been earmarked for conservation, or if some sites are already being managed for conservation. Defaults to NULL such that no sites are locked in.
- site_management_locked_out_column
- character name of the column in the argument to site_data that contains logical (TRUE/FALSE) values indicating whether each site should be locked out (TRUE) from being managed for conservation or not (FALSE). No missing (NA) values are permitted in this column. This is useful if some sites could potentially be surveyed to improve model predictions even if they cannot be managed for conservation. Defaults to NULL such that no sites are locked out.
- site_survey_locked_out_column
- character name of the column in the argument to site_data that contains logical (TRUE/FALSE) values indicating whether each site should be locked out (TRUE) from being selected for future surveys or not (FALSE). No missing (NA) values are permitted in this column. This is useful if some sites will never be considered for future surveys (e.g. because they are too costly to survey, or have a low chance of containing the target species). Defaults to NULL such that no sites are locked out.
- prior_matrix
- numeric matrix containing the prior probability of each feature occupying each site. Rows correspond to features, and columns correspond to sites. Defaults to NULL such that prior probabilities are calculated automatically using prior_probability_matrix().
- n_approx_replicates
- integer number of replicates to use for approximating the expected value calculations. Defaults to 100.
- n_approx_outcomes_per_replicate
- integer number of outcomes to use per replicate for approximation calculations. Defaults to 10000.
- seed
- integer state of the random number generator for simulating outcomes when conducting the value of information analyses. Defaults to 500.
- n_threads
- integer number of threads to use for computation. Defaults to 1.
- verbose
- logical indicating if information should be printed during processing. Defaults to FALSE.
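The prior_matrix argument expects one row per feature and one column per site, which is the transpose of how the probability columns are laid out in site_data. The sketch below only illustrates that layout: it builds a naive prior by transposing the modeled probability columns. It is not what prior_probability_matrix() computes; the package default is likely to also account for the survey history and the accuracy of the models and surveys. The data frame, its values and the feature names are hypothetical, and a plain data.frame stands in for the sf object because only the columns are read here.

```r
# Hypothetical site data: a plain data.frame stands in for the sf object,
# and only the modeled probability columns are used
site_df <- data.frame(
  p1 = c(0.9, 0.1, 0.5),
  p2 = c(0.2, 0.8, 0.4)
)
probability_columns <- c("p1", "p2")

# Naive prior (illustration only): transpose the modeled probabilities so
# that rows correspond to features and columns correspond to sites
naive_prior <- t(as.matrix(site_df[, probability_columns]))
rownames(naive_prior) <- c("f1", "f2")  # hypothetical feature names

# Check the documented requirements: features x sites, values in [0, 1], no NA
stopifnot(
  nrow(naive_prior) == length(probability_columns),
  ncol(naive_prior) == nrow(site_df),
  all(naive_prior >= 0 & naive_prior <= 1),
  !anyNA(naive_prior)
)
```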
Value
A matrix of logical (TRUE/FALSE) values indicating whether each site is selected in each scheme. Columns correspond to sites, and rows correspond to different schemes. If there is only one optimal survey scheme, then the matrix will contain a single row.
This matrix also has a numeric "ev" attribute that contains a matrix of approximate expected values. Within this attribute, each row corresponds to a different survey scheme and each column corresponds to a different replicate.
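Because each column of the "ev" matrix is a separate replicate, averaging across columns gives one approximate expected value per returned scheme, and the spread across columns gives a rough sense of the approximation error. The sketch below uses a hand-made result with the documented layout (two tied schemes over three sites); the values are invented for illustration and are not output from approx_optimal_survey_scheme().

```r
# Hypothetical result with the documented layout: rows = schemes, columns = sites
result <- matrix(
  c(TRUE, FALSE, FALSE,
    FALSE, TRUE, FALSE),
  nrow = 2, byrow = TRUE
)
# Invented expected values: rows = schemes, columns = replicates
attr(result, "ev") <- matrix(
  c(1.70, 1.72, 1.71, 1.73,
    1.71, 1.70, 1.72, 1.73),
  nrow = 2, byrow = TRUE
)

ev <- attr(result, "ev")
# Mean approximate expected value for each scheme (averaged over replicates)
ev_mean <- rowMeans(ev)
# Standard deviation across replicates, a rough gauge of approximation error
ev_sd <- apply(ev, 1, sd)
# Indices of the sites selected for surveying in each scheme (always a list)
selected_sites <- lapply(seq_len(nrow(result)), function(i) which(result[i, ]))
```

A list is used for selected_sites because different schemes can select different numbers of sites.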
Details
The "approximately" optimal survey scheme is determined using a brute-force algorithm. Initially, all feasible (valid) survey schemes are identified given the survey costs and the survey budget (using feasible_survey_schemes()). Next, the expected value of each and every feasible survey scheme is approximated (using approx_evdsi()). Finally, the greatest expected value is identified, and all survey schemes that share this greatest expected value are returned. Due to the nature of this algorithm, it can take a very long time to complete.
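The three steps above can be sketched in base R under simplifying assumptions. Enumerating all subsets with expand.grid() stands in for feasible_survey_schemes(), and toy_value() is a hypothetical scoring function that replaces approx_evdsi(); it does not compute the value of information. The survey costs and budget mirror the example data below. The sketch also shows why run times grow quickly: with n sites there are 2^n candidate schemes to screen.

```r
# Survey costs and survey budget matching the example data (sim_sites)
survey_costs <- c(19, 22, 13, 19, 9, 12)
survey_budget <- sum(survey_costs) * 0.4  # 37.6

# Step 1: enumerate every subset of sites (2^6 = 64 schemes) and keep
# those whose total survey cost is within the survey budget
all_schemes <- as.matrix(expand.grid(
  rep(list(c(FALSE, TRUE)), length(survey_costs))
))
scheme_costs <- apply(all_schemes, 1, function(s) sum(survey_costs[s]))
feasible <- all_schemes[scheme_costs <= survey_budget, , drop = FALSE]

# Step 2: score each feasible scheme
# (hypothetical placeholder for approx_evdsi(); NOT a value of information model)
toy_value <- function(scheme) sum(c(3, 1, 2, 2, 4, 1)[scheme])
values <- apply(feasible, 1, toy_value)

# Step 3: return every scheme that ties for the greatest value
best <- feasible[values == max(values), , drop = FALSE]
```

With these toy scores, two schemes tie (surveying sites 1 and 5, or sites 3, 5 and 6), so best has two rows, mirroring how the function returns all schemes that share the greatest expected value.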
Dependencies
Please note that this function requires the Gurobi optimization software (https://www.gurobi.com/) and the gurobi R package if different sites have different survey costs. Installation instructions are available online for Linux, Windows, and Mac OS (see https://support.gurobi.com/hc/en-us/articles/4534161999889-How-do-I-install-Gurobi-Optimizer).
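Before running a long analysis, it can help to check up front whether this requirement applies. The sketch below encodes the rule stated above in a hypothetical helper, needs_gurobi(), and uses base R's requireNamespace() to test whether the gurobi R package is installed. Note that requireNamespace() only confirms that the R package can be loaded; it does not verify that the Gurobi software or its licence is set up correctly.

```r
# Hypothetical helper: Gurobi is only required when survey costs differ among sites
needs_gurobi <- function(survey_costs) {
  length(unique(survey_costs)) > 1
}

survey_costs <- c(19, 22, 13, 19, 9, 12)  # survey costs from sim_sites
if (needs_gurobi(survey_costs) &&
    !requireNamespace("gurobi", quietly = TRUE)) {
  message("Survey costs vary among sites, but the gurobi R package is not installed")
}
```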
Examples
# set seeds for reproducibility
set.seed(123)
# load example site data
data(sim_sites)
print(sim_sites)
#> Simple feature collection with 6 features and 13 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 0.02541313 ymin: 0.07851093 xmax: 0.9888107 ymax: 0.717068
#> CRS:           NA
#> # A tibble: 6 × 14
#>   survey_cost management_cost    f1    f2    f3    n1    n2    n3     e1     e2
#>         <dbl>           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#> 1          19              99     0     0  0        0     0     0  1.13   0.535
#> 2          22              87     0     1  0.25     4     4     4 -1.37  -1.45 
#> 3          13              94     1     1  0        1     1     1  0.155 -0.867
#> 4          19              61     0     0  0        0     0     0 -0.792  1.32 
#> 5           9             105     0     0  0        0     0     0 -0.194  0.238
#> 6          12             136     0     0  0        0     0     0  1.07   0.220
#> # ℹ 4 more variables: p1 <dbl>, p2 <dbl>, p3 <dbl>, geometry <POINT>
# load example feature data
data(sim_features)
print(sim_features)
#> # A tibble: 3 × 7
#>   name  survey survey_sensitivity survey_specificity model_sensitivity
#>   <chr> <lgl>               <dbl>              <dbl>             <dbl>
#> 1 f1    TRUE                0.951              0.854             0.711
#> 2 f2    TRUE                0.990              0.832             0.722
#> 3 f3    TRUE                0.986              0.808             0.772
#> # ℹ 2 more variables: model_specificity <dbl>, target <dbl>
# set total budget for surveying and managing sites for conservation
# (i.e. 50% of the cost of managing all sites)
total_budget <- sum(sim_sites$management_cost) * 0.5
# set budget for surveying sites
# (i.e. 40% of the cost of surveying all sites)
survey_budget <- sum(sim_sites$survey_cost) * 0.4
# \dontrun{
# find optimal survey scheme using approximate method
# (this uses the default of 100 replicates, so it may take a while to complete)
approx_opt_survey <- approx_optimal_survey_scheme(
  sim_sites, sim_features,
  c("f1", "f2", "f3"), c("n1", "n2", "n3"), c("p1", "p2", "p3"),
  "management_cost", "survey_cost",
  "survey", "survey_sensitivity", "survey_specificity",
  "model_sensitivity", "model_specificity",
  "target", total_budget, survey_budget)
# print result
print(approx_opt_survey)
#>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
#> [1,] FALSE FALSE FALSE FALSE FALSE FALSE
#> attr(,"ev")
#>        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,21]  [,22]  [,23]  [,24]  [,25]  [,26]  [,27]  [,28]  [,29]  [,30]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,31]  [,32]  [,33]  [,34]  [,35]  [,36]  [,37]  [,38]  [,39]  [,40]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,41]  [,42]  [,43]  [,44]  [,45]  [,46]  [,47]  [,48]  [,49]  [,50]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,51]  [,52]  [,53]  [,54]  [,55]  [,56]  [,57]  [,58]  [,59]  [,60]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,61]  [,62]  [,63]  [,64]  [,65]  [,66]  [,67]  [,68]  [,69]  [,70]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,71]  [,72]  [,73]  [,74]  [,75]  [,76]  [,77]  [,78]  [,79]  [,80]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,81]  [,82]  [,83]  [,84]  [,85]  [,86]  [,87]  [,88]  [,89]  [,90]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
#>       [,91]  [,92]  [,93]  [,94]  [,95]  [,96]  [,97]  [,98]  [,99] [,100]
#> [1,] 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165 1.7165
# }