Prioritize conservation projects by maximizing expected weighted species richness using exact algorithms

Prioritize funding for conservation projects using species weights and exact algorithms. Unlike other algorithms for solving the 'Project Prioritization Protocol' (Joseph, Maloney & Possingham 2009), this method can identify solutions that are guaranteed to be optimal (or within a pre-specified optimality gap; see Underhill 1994; Rodrigues & Gaston 2002). As a consequence, it is strongly recommended to use this method for developing project prioritizations.

ppp_exact_spp_solution(x, y, spp, budget, project_column_name,
  success_column_name, action_column_name, cost_column_name,
  species_column_name, weight_column_name, locked_in_column_name = NULL,
  locked_out_column_name = NULL, gap = 1e-06, threads = 1L,
  number_solutions = 1L, time_limit = .Machine$integer.max,
  number_approx_points = 300, verbose = FALSE)

Arguments

x	`data.frame` or `tibble` table containing project data. Here, each row should correspond to a different project and columns should contain data that correspond to each project. This object should contain data that denote (i) the name of each project (specified in the argument to `project_column_name`), (ii) the probability that each project will succeed if all of its actions are funded (specified in the argument to `success_column_name`), (iii) the enhanced probability that each species will persist if the project is funded, and (iv) which actions are associated with which projects (specified in the action names in the argument to `y`). To account for the combined benefits of multiple actions (e.g. baiting and trapping different invasive species in the same area), additional projects should be created that indicate the combined cost and corresponding species' persistence probabilities. Furthermore, this object must have a baseline project, with a zero cost, that represents the probability that each species will persist if no other conservation project is funded.
y	`data.frame` or `tibble` table containing the action data. Here, each row should correspond to a different action and columns should contain data that correspond to each action. This object should contain data that denote (i) the name of each action (specified in the argument to `action_column_name`) and (ii) the cost of each action (specified in the argument to `cost_column_name`). If certain actions should be locked in or out of the solution, then this object should also contain data that denote (iii) which actions should be locked in (specified using the argument to `locked_in_column_name` if relevant) and (iv) which actions should be locked out (specified using the argument to `locked_out_column_name` if relevant).
spp	`data.frame` or `tibble` table containing the species data. Here, each row should correspond to a different species and columns should contain data that correspond to each species. This object should contain data that denote (i) the name of each species (specified in the argument to `species_column_name`). It may also contain (ii) the weight for each species (specified in the argument to `weight_column_name` if relevant).
budget	`numeric` value that represents the total budget available for funding conservation actions.
project_column_name	`character` name of column that contains the name for each conservation project. This argument corresponds to the argument to `x`. Note that the project names must not contain any duplicates or missing values.
success_column_name	`character` name of column that denotes the probability that each project will succeed. This argument corresponds to the argument to `x`. This column must have `numeric` values which lie between zero and one. No missing values are permitted.
action_column_name	`character` name of column that contains the name for each conservation action. This argument corresponds to the argument to `y`. Note that the action names must not contain any duplicates or missing values.
cost_column_name	`character` name of column that indicates the cost for funding each action. This argument corresponds to the argument to `y`. This column must have `numeric` values which are equal to or greater than zero. No missing values are permitted.
species_column_name	`character` name of the column that contains the name for each species. This argument corresponds to the argument to `spp`.
weight_column_name	`character` name of the column that contains the weight for each species. This argument corresponds to the argument to `spp`. This argument defaults to `NULL`, such that all species are assigned an equal weighting.
locked_in_column_name	`character` name of column that indicates which actions should be locked into the funding scheme. This argument corresponds to the argument to `y`. For example, it may be desirable to mandate that projects for iconic species are funded in the prioritization. This column should contain `logical` values, and actions associated with `TRUE` values are locked into the solution. No missing values are permitted. Defaults to `NULL` such that no actions are locked into the solution.
locked_out_column_name	`character` name of column that indicates which actions should be locked out of the funding scheme. This argument corresponds to the argument to `y`. For example, it may be desirable to lock out projects for certain species that are expected to have little support from the public. This column should contain `logical` values, and actions associated with `TRUE` values are locked out of the solution. No missing values are permitted. Defaults to `NULL` such that no actions are locked out of the solution.
gap	`numeric` optimality gap. This gap should be expressed as a proportion. For example, to find a solution that is within 10 % of optimality, supply `0.1`. No missing values are permitted. Defaults to `1e-06`, so that the returned solution is effectively optimal.
threads	`numeric` number of threads for computational processing. No missing values are permitted. Defaults to `1`.
number_solutions	`numeric` number of solutions to return. If the argument is greater than `1`, then the output will contain the set number of solutions that are closest to optimality. No missing values are permitted. Defaults to `1`.
time_limit	`numeric` maximum number of seconds that should be spent searching for a solution after formatting the data. No missing values are permitted. Defaults to `.Machine$integer.max`, which is effectively no time limit.
number_approx_points	`numeric` number of points to use for approximating the probability that branches will go extinct. Larger values increase the precision of these calculations. No missing values are permitted. Defaults to `300`.
verbose	`logical` should information be printed while solving the problem? No missing values are permitted. Defaults to `FALSE`.
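
The column requirements listed above can be checked before calling the function. The following is a minimal sketch in base R and is not part of the package: the helper name `check_ppp_inputs` and the toy tables are hypothetical, and the function itself may apply additional or different validation.

```r
# Hypothetical helper (not part of the package): checks a few of the
# documented column requirements for the project (x) and action (y) tables.
check_ppp_inputs <- function(x, y, project_column_name, success_column_name,
                             action_column_name, cost_column_name) {
  projects <- x[[project_column_name]]
  success <- x[[success_column_name]]
  actions <- y[[action_column_name]]
  cost <- y[[cost_column_name]]
  stopifnot(
    # project and action names must be unique and non-missing
    !anyNA(projects), anyDuplicated(projects) == 0,
    !anyNA(actions), anyDuplicated(actions) == 0,
    # success probabilities must lie between zero and one
    is.numeric(success), !anyNA(success), all(success >= 0 & success <= 1),
    # costs must be equal to or greater than zero
    is.numeric(cost), !anyNA(cost), all(cost >= 0),
    # each action name should appear as a column in the project table
    all(actions %in% names(x))
  )
  invisible(TRUE)
}

# Made-up toy tables for illustration only.
toy_projects <- data.frame(
  name = c("A_project", "baseline_project"),
  success = c(0.9, 1),
  A = c(0.8, 0.3),
  A_action = c(TRUE, FALSE),
  baseline_action = c(FALSE, TRUE)
)
toy_actions <- data.frame(
  name = c("A_action", "baseline_action"),
  cost = c(100, 0)
)

# Raises an error if any check fails, otherwise returns TRUE invisibly.
check_ppp_inputs(toy_projects, toy_actions, "name", "success", "name", "cost")
```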

Value

A tibble object containing the solution(s) data. Each row corresponds to a different solution, and each column describes a different property of the solution. The object contains a column for each project (based on the argument to project_column_name) which contains logical values indicating if the project was prioritized for funding (TRUE) or not (FALSE) in a given solution. Additionally, the object also contains the following columns:

"solution": integer solution identifier.
"method": character name of method used to produce the solution(s).
"budget": numeric budget used for generating each of the solution(s).
"obj": numeric objective value. If phylogenetic data were input, then this column contains the expected phylogenetic diversity (Faith 2008) associated with each of the solutions. Otherwise, this column contains the expected weighted species richness (i.e. the sum of the product between the species' persistence probabilities and their weights).
"cost": numeric total cost associated with each of the solution(s).
"optimal": logical indicating if each of the solution(s) is known to be optimal (TRUE) or not (FALSE). Missing values (NA) indicate that optimality is unknown (i.e. because the method used to produce the solution(s) does not provide any bounds on their quality).
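
As an illustration of how the returned table might be read, the sketch below builds a made-up table with the documented columns and lists the items flagged `TRUE` in the first solution. The object `sol` and all of its values are invented for this example; a real solution would come from the function call.

```r
# Made-up solution table mimicking the documented layout (illustrative only).
sol <- data.frame(
  solution = 1L, method = "exact", obj = 1.5, budget = 300, cost = 200,
  optimal = TRUE, A_action = TRUE, B_action = FALSE, baseline_action = TRUE
)

# Columns that describe the solution rather than funding decisions.
meta_cols <- c("solution", "method", "obj", "budget", "cost", "optimal")
funding_cols <- setdiff(names(sol), meta_cols)

# Names of the items funded in the first solution.
funded <- funding_cols[unlist(sol[1, funding_cols])]
print(funded)

# Budget left unspent by the first solution.
unspent <- sol$budget[1] - sol$cost[1]
print(unspent)
```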

Details

This function works by formulating the 'Project Prioritization Protocol' as a mixed integer programming problem (MIP) and solving it using the Gurobi optimization software suite. Although Gurobi is commercial software, academics can obtain a special license at no cost. After downloading and installing the Gurobi software suite (https://www.gurobi.com), the gurobi package will also need to be installed (see instructions for Linux, Mac OSX, and Windows operating systems).

This problem aims to maximize expected weighted species richness given a budget. Let $S$ denote the set of species (indexed by $s$), and let $W_s$ denote the weight for each species. Additionally, let $E_s$ denote the probability that each species will go extinct given the funded conservation projects. The objective can be expressed as: $$ \sum_{s \in S} (1 - E_s) W_s $$ For the complete mathematical formulation, please refer to the formulation for maximizing expected phylogenetic diversity (i.e. ppp_exact_phylo_solution). This is because maximizing expected weighted species richness is merely a special case of maximizing expected phylogenetic diversity: instead of using a complete phylogeny, expected weighted species richness simply uses a star phylogeny with branch lengths set according to the species' weights.
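
To make the objective concrete, the following sketch evaluates it for three species. The extinction probabilities and weights are made-up values chosen for illustration and do not come from the built-in data.

```r
# Made-up extinction probabilities (E_s) and weights (W_s) for three species.
extinction_prob <- c(S1 = 0.2, S2 = 0.5, S3 = 0.9)
weight <- c(S1 = 1, S2 = 2, S3 = 0.5)

# Expected weighted species richness: sum over species of (1 - E_s) * W_s.
expected_richness <- sum((1 - extinction_prob) * weight)
print(expected_richness)
#> [1] 1.85
```

Here the heavily weighted species S2 contributes the most (0.5 * 2 = 1), even though S1 has the highest persistence probability.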

References

Faith DP (2008) Threatened species and the potential loss of phylogenetic diversity: conservation scenarios based on estimated extinction probabilities and phylogenetic risk analysis. Conservation Biology, 22: 1461--1470.

Joseph LN, Maloney RF & Possingham HP (2009) Optimal allocation of resources among threatened species: A project prioritization protocol. Conservation Biology, 23: 328--338.

Rodrigues AS & Gaston KJ (2002) Optimisation in reserve selection procedures---why not? Biological Conservation, 107: 123--129.

Underhill LG (1994) Optimal and suboptimal reserve selection algorithms. Biological Conservation, 70: 85--87.

Examples

# load built-in data
data(sim_project_data, sim_action_data, sim_species_data)

# print simulated project data set
print(sim_project_data)
#> # A tibble: 6 x 13
#>   name  success    S1    S2     S3    S4    S5 S1_action S2_action S3_action
#>   <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <lgl>     <lgl>     <lgl>    
#> 1 S1_p~   0.919 0.791 0     0      0     0     TRUE      FALSE     FALSE    
#> 2 S2_p~   0.923 0     0.888 0      0     0     FALSE     TRUE      FALSE    
#> 3 S3_p~   0.829 0     0     0.502  0     0     FALSE     FALSE     TRUE     
#> 4 S4_p~   0.848 0     0     0      0.690 0     FALSE     FALSE     FALSE    
#> 5 S5_p~   0.814 0     0     0      0     0.617 FALSE     FALSE     FALSE    
#> 6 base~   1     0.298 0.250 0.0865 0.249 0.182 FALSE     FALSE     FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# print simulated action data
print(sim_action_data)
#> # A tibble: 6 x 4
#>   name             cost locked_in locked_out
#>   <chr>           <dbl> <lgl>     <lgl>     
#> 1 S1_action        94.4 FALSE     FALSE     
#> 2 S2_action       101.  FALSE     FALSE     
#> 3 S3_action       103.  TRUE      FALSE     
#> 4 S4_action        99.2 FALSE     FALSE     
#> 5 S5_action        99.9 FALSE     TRUE      
#> 6 baseline_action   0   FALSE     FALSE     

# print simulated species data
print(sim_species_data)
#> # A tibble: 5 x 2
#>   name  weight
#>   <chr>  <dbl>
#> 1 S3     0.211
#> 2 S1     0.211
#> 3 S5     0.221
#> 4 S2     0.630
#> 5 S4     1.59 

# verify that the gurobi package is installed
if (!require(gurobi, quietly = TRUE))
 stop("the gurobi R package is not installed.")

# find a solution that meets a budget of 300
s1 <- ppp_exact_spp_solution(sim_project_data, sim_action_data,
                             sim_species_data, 300, "name", "success",
                             "name", "cost", "name", "weight")

# print solution
print(s1)
#> # A tibble: 1 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 exact   1.66    300  295. TRUE    TRUE      TRUE      FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# plot solution
ppp_plot_spp_solution(sim_project_data, sim_action_data, sim_species_data,
                      s1, "name", "success", "name", "cost", "name",
                      "weight")

# find a solution that meets a budget of 300 and allocates
# funding for the "S3_action" action. For instance, species "S3" might
# be an iconic species that has cultural and economic importance.
sim_action_data2 <- sim_action_data
sim_action_data2$locked_in <- sim_action_data2$name == "S3_action"
s2 <- ppp_exact_spp_solution(sim_project_data, sim_action_data2,
                             sim_species_data, 300, "name", "success",
                             "name", "cost", "name", "weight",
                             locked_in_column_name = "locked_in")

# print solution
print(s2)
#> # A tibble: 1 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 exact   1.37    300  297. TRUE    TRUE      FALSE     TRUE     
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# plot solution
ppp_plot_spp_solution(sim_project_data, sim_action_data2, sim_species_data,
                      s2, "name", "success", "name", "cost", "name",
                      "weight")

# find a solution that meets a budget of 300 and does not allocate
# funding for the "S2_action" action. For instance, species "S2"
# might have very little cultural or economic importance. Broadly speaking,
# though, it is better to "lock in" "important" species rather than
# "lock out" unimportant species.
sim_action_data3 <- sim_action_data
sim_action_data3$locked_out <- sim_action_data3$name == "S2_action"
s3 <- ppp_exact_spp_solution(sim_project_data, sim_action_data3,
                             sim_species_data, 300, "name", "success",
                             "name", "cost", "name", "weight",
                             locked_out_column_name = "locked_out")

# print solution
print(s3)
#> # A tibble: 1 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 exact   1.37    300  294. TRUE    TRUE      FALSE     FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# plot solution
ppp_plot_spp_solution(sim_project_data, sim_action_data3, sim_species_data,
                      s3, "name", "success", "name", "cost", "name",
                      "weight")

# find the top solutions
s4 <- ppp_exact_spp_solution(sim_project_data, sim_action_data,
                             sim_species_data, 300, "name", "success",
                             "name", "cost", "name", "weight",
                             number_solutions = 1000)
#> Warning: although 1000 requested, only 18 solutions exist.

# print solutions
print(s4)
#> # A tibble: 18 x 12
#>    solution method   obj budget  cost optimal S1_action S2_action S3_action
#>       <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#>  1        1 exact   1.66    300 295.  TRUE    TRUE      TRUE      FALSE    
#>  2        2 exact   1.60    300 295.  FALSE   TRUE      TRUE      FALSE    
#>  3        3 exact   1.57    300 200.  FALSE   FALSE     TRUE      FALSE    
#>  4        4 exact   1.45    300 200.  FALSE   FALSE     TRUE      FALSE    
#>  5        5 exact   1.37    300 294.  FALSE   TRUE      FALSE     FALSE    
#>  6        6 exact   1.37    300 297.  FALSE   TRUE      FALSE     TRUE     
#>  7        7 exact   1.30    300 194.  FALSE   TRUE      FALSE     FALSE    
#>  8        8 exact   1.28    300 199.  FALSE   FALSE     FALSE     FALSE    
#>  9        9 exact   1.28    300 202.  FALSE   FALSE     FALSE     TRUE     
#> 10       10 exact   1.21    300  99.2 FALSE   FALSE     FALSE     FALSE    
#> 11       11 exact   1.20    300 294.  FALSE   TRUE      FALSE     FALSE    
#> 12       12 exact   1.20    300 295.  FALSE   TRUE      TRUE      FALSE    
#> 13       13 exact   1.19    300 299.  FALSE   TRUE      TRUE      TRUE     
#> 14       14 exact   1.17    300 297.  FALSE   TRUE      FALSE     TRUE     
#> 15       15 exact   1.12    300 195.  FALSE   TRUE      TRUE      FALSE    
#> 16       16 exact   1.10    300 201.  FALSE   FALSE     TRUE      FALSE    
#> 17       17 exact   1.10    300 204.  FALSE   FALSE     TRUE      TRUE     
#> 18       18 exact   1.08    300 194.  FALSE   TRUE      FALSE     FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>