Prioritize conservation projects by maximizing expected weighted species richness using heuristic algorithms

Prioritize funding for conservation projects using the 'Project Prioritization Protocol' (Joseph, Maloney & Possingham 2009) with species weights and using a backwards heuristic algorithm (Bennett et al. 2014). Although this algorithm can deliver solutions that perform better than random, it is extremely unlikely to identify solutions that are optimal (Underhill 1994; Rodrigues & Gaston 2002).

ppp_heuristic_spp_solution(x, y, spp, budget, project_column_name,
  success_column_name, action_column_name, cost_column_name,
  species_column_name, weight_column_name = NULL,
  locked_in_column_name = NULL, locked_out_column_name = NULL,
  number_solutions = 1L)

Arguments

x	`data.frame` or `tibble` table containing project data. Here, each row should correspond to a different project and columns should contain data that correspond to each project. This object should contain data that denote (i) the name of each project (specified in the argument to `project_column_name`), (ii) the probability that each project will succeed if all of its actions are funded (specified in the argument to `success_column_name`), (iii) the enhanced probability that each species will persist if the project is funded, and (iv) which actions are associated with which projects (specified in the action names in the argument to `y`). To account for the combined benefits of multiple actions (e.g. baiting and trapping different invasive species in the same area), additional projects should be created that indicate the combined cost and corresponding species' persistence probabilities. Furthermore, this object must have a baseline project, with a zero cost, that represents the probability that each species will persist if no other conservation project is funded.
y	`data.frame` or `tibble` table containing the action data. Here, each row should correspond to a different action and columns should contain data that correspond to each action. This object should contain data that denote (i) the name of each action (specified in the argument to `action_column_name`) and (ii) the cost of each action (specified in the argument to `cost_column_name`). If certain actions should be locked in or out of the solution, then this object should also contain data that denote (iii) which actions should be locked in (specified using the argument to `locked_in_column_name` if relevant) and (iv) which actions should be locked out (specified using the argument to `locked_out_column_name` if relevant).
spp	`data.frame` or `tibble` table containing the species data. Here, each row should correspond to a different species and columns should contain data that correspond to each species. This object should contain data that denote (i) the name of each species (specified in the argument to `species_column_name`). It may also contain (ii) the weight for each species (specified in the argument to `weight_column_name` if relevant).
budget	`numeric` value that represents the total budget available for funding conservation actions.
project_column_name	`character` name of column that contains the name for each conservation project. This argument corresponds to the argument to `x`. Note that the project names must not contain any duplicates or missing values.
success_column_name	`character` name of column that denotes the probability that each project will succeed. This argument corresponds to the argument to `x`. This column must have `numeric` values which lie between zero and one. No missing values are permitted.
action_column_name	`character` name of column that contains the name for each conservation action. This argument corresponds to the argument to `y`. Note that the action names must not contain any duplicates or missing values.
cost_column_name	`character` name of column that indicates the cost for funding each action. This argument corresponds to the argument to `y`. This column must have `numeric` values which are equal to or greater than zero. No missing values are permitted.
species_column_name	`character` name of the column that contains the name for each species. This argument corresponds to the argument to `spp`.
weight_column_name	`character` name of the column that contains the weight for each species. This argument corresponds to the argument to `spp`. This argument defaults to `NULL`, such that all species are assigned an equal weighting.
locked_in_column_name	`character` name of column that indicates which actions should be locked into the funding scheme. This argument corresponds to the argument to `y`. For example, it may be desirable to mandate that actions for iconic species are funded in the prioritization. This column should contain `logical` values, and actions associated with `TRUE` values are locked into the solution. No missing values are permitted. Defaults to `NULL` such that no actions are locked into the solution.
locked_out_column_name	`character` name of column that indicates which actions should be locked out of the funding scheme. This argument corresponds to the argument to `y`. For example, it may be desirable to lock out actions for certain species that are expected to have little support from the public. This column should contain `logical` values, and actions associated with `TRUE` values are locked out of the solution. No missing values are permitted. Defaults to `NULL` such that no actions are locked out of the solution.
number_solutions	`numeric` number of solutions to return. If the argument is greater than `1`, then the output will contain the set number of solutions that are closest to optimality. No missing values are permitted. Defaults to `1`.
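
The input requirements listed above can be illustrated with a minimal sketch. All table contents below (species, project, and action names, costs, and probabilities) are hypothetical values invented for illustration, and the checks are only one informal way of testing the documented requirements, not the function's own validation. The final call runs only if `ppp_heuristic_spp_solution` is available in the session.

```r
# Hypothetical project table (x): one row per project, including a
# zero-cost baseline project. All names and values are illustrative.
projects <- data.frame(
  name = c("kiwi_project", "gecko_project", "baseline_project"),
  success = c(0.90, 0.80, 1.00),
  kiwi = c(0.85, 0.00, 0.20),   # persistence of "kiwi" under each project
  gecko = c(0.00, 0.70, 0.10),  # persistence of "gecko" under each project
  trap_action = c(TRUE, FALSE, FALSE),
  fence_action = c(FALSE, TRUE, FALSE),
  baseline_action = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical action table (y): one row per action.
actions <- data.frame(
  name = c("trap_action", "fence_action", "baseline_action"),
  cost = c(120, 80, 0),
  stringsAsFactors = FALSE
)

# Hypothetical species table (spp): one row per species.
species <- data.frame(
  name = c("kiwi", "gecko"),
  weight = c(2, 1),
  stringsAsFactors = FALSE
)

# Cost of each project = summed cost of the actions it requires.
needs <- as.matrix(projects[, actions$name])
project_cost <- apply(needs, 1, function(r) sum(actions$cost[r]))

# Informal checks of the documented requirements.
stopifnot(
  anyDuplicated(projects$name) == 0, !anyNA(projects$name),
  anyDuplicated(actions$name) == 0, !anyNA(actions$name),
  all(projects$success >= 0 & projects$success <= 1),
  all(actions$cost >= 0), !anyNA(actions$cost),
  all(actions$name %in% names(projects)),
  all(species$name %in% names(projects)),
  any(project_cost == 0)  # a zero-cost baseline project exists
)

# Run the prioritization only if the function is available.
if (exists("ppp_heuristic_spp_solution")) {
  s <- ppp_heuristic_spp_solution(projects, actions, species, 150,
                                  "name", "success", "name", "cost",
                                  "name", "weight")
  print(s)
}
```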

Value

A tibble object containing the solution(s) data. Each row corresponds to a different solution, and each column describes a different property of the solution. The object contains a column for each action (based on the argument to action_column_name) which contains logical values indicating if the action was prioritized for funding (TRUE) or not (FALSE) in a given solution. Additionally, the object also contains the following columns:

"solution": integer solution identifier.
"method": character name of method used to produce the solution(s).
"budget": numeric budget used for generating each of the solution(s).
"obj": numeric objective value. If phylogenetic data were input, then this column contains the expected phylogenetic diversity (Faith 2008) associated with each of the solutions. Otherwise, this column contains the expected weighted species richness (i.e. the sum of the product between the species' persistence probabilities and their weights).
"cost": numeric total cost associated with each of the solution(s).
"optimal": logical indicating if each of the solution(s) is known to be optimal (TRUE) or not (FALSE). Missing values (NA) indicate that optimality is unknown (i.e. because the method used to produce the solution(s) does not provide any bounds on their quality).

Details

This algorithm aims to identify a set of conservation projects, each associated with a set of conservation actions, that should be funded to maximize the expected weighted species richness (i.e. the sum of each species' probability of persisting into the future, multiplied by its weight). Briefly, this algorithm works by starting with all conservation actions selected for funding and then iteratively defunding (removing) actions until the budget is met (Joseph, Maloney & Possingham 2009; Bennett et al. 2014). In a given iteration, each action is evaluated in terms of the amount of expected weighted species richness that would be lost per unit cost if the action were not funded, and the action associated with the lowest utility is defunded. Projects are only considered funded when all of their associated actions are funded, and species only receive benefits from funded projects rather than from individual conservation actions. By iteratively removing actions according to their expected utility, this algorithm may therefore identify cost-effective funding schemes. Note, however, that this algorithm is extremely unlikely to identify optimal solutions.
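
The procedure described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: the toy tables, the helper names `expected_richness` and `backwards_heuristic`, and the rule of never defunding zero-cost actions are all hypothetical choices made for this example, and a non-negative budget is assumed.

```r
# Hypothetical toy data (not the built-in data sets).
projects <- data.frame(
  name = c("P1", "P2", "baseline"),
  success = c(0.9, 0.8, 1.0),
  S1 = c(0.8, 0.0, 0.2),  # persistence of species S1 under each project
  S2 = c(0.0, 0.7, 0.1),  # persistence of species S2 under each project
  A1 = c(TRUE, FALSE, FALSE),
  A2 = c(FALSE, TRUE, FALSE),
  A0 = c(FALSE, FALSE, TRUE)
)
actions <- data.frame(name = c("A1", "A2", "A0"), cost = c(100, 50, 0),
                      stringsAsFactors = FALSE)
weights <- c(S1 = 1, S2 = 2)

# Expected weighted species richness for a named logical vector of
# funded actions (in the same order as actions$name).
expected_richness <- function(funded) {
  needs <- as.matrix(projects[, actions$name])
  # a project counts as funded only if all of its actions are funded
  ok <- apply(needs, 1, function(r) all(funded[r]))
  # success probability times persistence probability, per project/species
  p <- projects$success[ok] *
    as.matrix(projects[ok, names(weights), drop = FALSE])
  best <- apply(p, 2, max)  # best funded project for each species
  sum(best * weights[names(best)])
}

# Start with everything funded, then repeatedly defund the action whose
# removal loses the least expected richness per unit cost.
backwards_heuristic <- function(budget) {
  funded <- setNames(rep(TRUE, nrow(actions)), actions$name)
  cost <- setNames(actions$cost, actions$name)
  while (sum(cost[funded]) > budget) {
    # zero-cost actions (e.g. the baseline) are never defunded here
    candidates <- names(funded)[funded & cost > 0]
    current <- expected_richness(funded)
    utility <- vapply(candidates, function(l) {
      trial <- funded
      trial[l] <- FALSE
      (current - expected_richness(trial)) / cost[[l]]
    }, numeric(1))
    funded[names(which.min(utility))] <- FALSE
  }
  funded
}

result <- backwards_heuristic(budget = 60)
print(result)
```

In this toy case, defunding `A1` loses 0.52 units of expected richness for a cost of 100, whereas defunding `A2` loses 0.92 units for a cost of 50, so `A1` has the lower utility and is removed first, which brings the total cost within the budget.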

The calculations that underpin this algorithm can be expressed mathematically. To calculate the utility for funding a given action ($l$) among a set of actions ($L$), let the expected weighted species richness that will persist into the future when all the actions are funded be expressed as $A(L)$. Also, let the expected weighted species richness that will persist into the future when all the remaining actions are funded except for action $l$ be expressed as $A(L - l)$. Furthermore, allow the cost for funding action $l$ to be $C_l$. Given this, the relative benefit (or utility) for funding action $l$ in a given iteration ($U_l$) can be expressed as:

$$U_l = \frac{A(L) - A(L - l)}{C_l}$$
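
As a rough numerical illustration of this formula, the snippet below plugs in rounded values read from the printed output in the Examples section, where the first two solutions returned for the all-solutions case appear to differ by the "S3_action" action. Because the inputs are rounded, the result is only approximate, and the variable names are hypothetical.

```r
# Rounded values taken from the printed example output (assumed inputs).
A_L <- 1.80          # objective with all actions funded
A_L_minus_l <- 1.73  # objective after defunding "S3_action"
C_l <- 103           # cost of "S3_action"

# Utility of funding the action: richness lost per unit cost if defunded.
U_l <- (A_L - A_L_minus_l) / C_l
print(U_l)  # approximately 6.8e-4
```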

To calculate the expected weighted species richness that will persist into the future for a given set of funded actions, we will adopt a new set of definitions to avoid confusion. Let $I$ represent a given set of funded actions (indexed by $i$). For example, $I$ could denote all of the actions in a given iteration ($L$) or all of the actions in a given iteration except for a specific action ($L - l$). Next, let $S$ represent the set of species (indexed by $s$). Additionally, let $J$ denote the set of funded conservation projects (indexed by $j$) given the set of funded actions $I$. Let $P_j$ represent the probability of project $j$ being successful if it is funded. To represent the conservation outcome for funding the projects $J$, let $B_{sj}$ denote the probability of persistence for the species $s$ if project $j$ is funded and project $j$ is used to conserve that species (i.e. it is the funded project which best improves the persistence probability for that species). Furthermore, let $W_s$ denote the weight for each species. This weight may reflect cultural, economic, or taxonomic importance.

The probability that each species will go extinct ($E_s$) when a given set of projects is funded ($J$) can then be expressed as:

$$E_s = 1 - \mathrm{max}(P_1 \times B_{s1}, \ldots, P_J \times B_{sJ})$$

The weighted species richness that is expected to persist when a given set of projects are funded can then be expressed as:

$$A(I) = \sum_{s = 1}^{S} (1 - E_s) W_s$$
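
These two equations can be checked by hand against the example data. The sketch below uses rounded values copied from the printed `sim_project_data` and `sim_species_data` in the Examples section, and it relies on a simplification specific to that data: each non-baseline project benefits exactly one species. The function name `expected_richness` and its argument are hypothetical, and the results only match the reported objective values up to rounding.

```r
# Rounded values copied from the printed example tables (species S1..S5).
success <- c(S1 = 0.919, S2 = 0.923, S3 = 0.829, S4 = 0.848, S5 = 0.814)
benefit <- c(S1 = 0.791, S2 = 0.888, S3 = 0.502, S4 = 0.690, S5 = 0.617)
baseline <- c(S1 = 0.298, S2 = 0.250, S3 = 0.0865, S4 = 0.249, S5 = 0.182)
weight <- c(S1 = 0.211, S2 = 0.630, S3 = 0.211, S4 = 1.59, S5 = 0.221)

# Expected weighted species richness, given the species whose
# single-species projects are funded (the baseline is always funded).
expected_richness <- function(funded_species) {
  # P_j * B_sj for each species' own project if funded, otherwise 0
  p_funded <- ifelse(names(success) %in% funded_species,
                     success * benefit, 0)
  # 1 - E_s: the best of the funded project and the baseline project
  # (the baseline project has a success probability of 1)
  persist <- pmax(p_funded, baseline)
  sum(persist * weight)
}

# Baseline only; compare with the zero-cost solution (obj of about 0.675).
print(expected_richness(character(0)))

# Projects for S1, S2 and S4; compare with the budget-300 solution
# (obj of about 1.66).
print(expected_richness(c("S1", "S2", "S4")))
```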

References

Bennett JR, Elliott G, Mellish B, Joseph LN, Tulloch AI, Probert WJ, ... & Maloney R (2014) Balancing phylogenetic diversity and species numbers in conservation prioritization, using a case study of threatened species in New Zealand. Biological Conservation, 174: 47--54.

Faith DP (2008) Threatened species and the potential loss of phylogenetic diversity: conservation scenarios based on estimated extinction probabilities and phylogenetic risk analysis. Conservation Biology, 22: 1461--1470.

Joseph LN, Maloney RF & Possingham HP (2009) Optimal allocation of resources among threatened species: a project prioritization protocol. Conservation Biology, 23: 328--338.

Rodrigues AS & Gaston KJ (2002) Optimisation in reserve selection procedures---why not? Biological Conservation, 107: 123--129.

Underhill LG (1994) Optimal and suboptimal reserve selection algorithms. Biological Conservation, 70: 85--87.

Examples

# set seed for reproducibility
set.seed(500)

# load built-in data
data(sim_project_data, sim_action_data, sim_species_data)

# print simulated project data
print(sim_project_data)
#> # A tibble: 6 x 13
#>   name  success    S1    S2     S3    S4    S5 S1_action S2_action S3_action
#>   <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <lgl>     <lgl>     <lgl>    
#> 1 S1_p~   0.919 0.791 0     0      0     0     TRUE      FALSE     FALSE    
#> 2 S2_p~   0.923 0     0.888 0      0     0     FALSE     TRUE      FALSE    
#> 3 S3_p~   0.829 0     0     0.502  0     0     FALSE     FALSE     TRUE     
#> 4 S4_p~   0.848 0     0     0      0.690 0     FALSE     FALSE     FALSE    
#> 5 S5_p~   0.814 0     0     0      0     0.617 FALSE     FALSE     FALSE    
#> 6 base~   1     0.298 0.250 0.0865 0.249 0.182 FALSE     FALSE     FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# print simulated action data
print(sim_action_data)
#> # A tibble: 6 x 4
#>   name             cost locked_in locked_out
#>   <chr>           <dbl> <lgl>     <lgl>     
#> 1 S1_action        94.4 FALSE     FALSE     
#> 2 S2_action       101.  FALSE     FALSE     
#> 3 S3_action       103.  TRUE      FALSE     
#> 4 S4_action        99.2 FALSE     FALSE     
#> 5 S5_action        99.9 FALSE     TRUE      
#> 6 baseline_action   0   FALSE     FALSE     

# print simulated species data
print(sim_species_data)
#> # A tibble: 5 x 2
#>   name  weight
#>   <chr>  <dbl>
#> 1 S3     0.211
#> 2 S1     0.211
#> 3 S5     0.221
#> 4 S2     0.630
#> 5 S4     1.59 

# find a solution that meets a budget of 300
s1 <- ppp_heuristic_spp_solution(sim_project_data, sim_action_data,
                                 sim_species_data, 300, "name", "success",
                                 "name", "cost", "name", "weight")

# print solution
print(s1)
#> # A tibble: 1 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 heuri~  1.66    300  295. NA      TRUE      TRUE      FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# print the names of which actions were funded
print(sim_action_data$name[which(unlist(s1[1, sim_action_data$name]))])
#> [1] "S1_action"       "S2_action"       "S4_action"       "baseline_action"

# plot solution
ppp_plot_spp_solution(sim_project_data, sim_action_data, sim_species_data,
                      s1, "name", "success", "name", "cost", "name",
                      "weight")

# find a solution that meets a budget of 300 and allocates
# funding for the "S3_action" action. For instance, species "S3" might
# be an iconic species that has cultural and economic importance.
sim_action_data2 <- sim_action_data
sim_action_data2$locked_in <- sim_action_data2$name == "S3_action"
s2 <- ppp_heuristic_spp_solution(sim_project_data, sim_action_data2,
                                 sim_species_data, 300, "name", "success",
                                 "name", "cost", "name", "weight",
                                 locked_in_column_name = "locked_in")

# print solution
print(s2)
#> # A tibble: 1 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 heuri~  1.28    300  202. NA      FALSE     FALSE     TRUE     
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# plot solution
ppp_plot_spp_solution(sim_project_data, sim_action_data2,
                      sim_species_data, s2, "name", "success", "name",
                      "cost", "name", "weight")

# find a solution that meets a budget of 300 and does not allocate
# funding for the "S2_action" action. For instance, species "S2"
# might have very little cultural or economic importance. Broadly speaking,
# though, it is better to "lock in" "important" species rather than
# "lock out" unimportant species.
sim_action_data3 <- sim_action_data
sim_action_data3$locked_out <- sim_action_data3$name == "S2_action"
s3 <- ppp_heuristic_spp_solution(sim_project_data, sim_action_data3,
                                 sim_species_data, 300, "name", "success",
                                 "name", "cost", "name", "weight",
                                 locked_out_column_name = "locked_out")

# print solution
print(s3)
#> # A tibble: 1 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 heuri~  1.37    300  294. NA      TRUE      FALSE     FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# plot solution
ppp_plot_spp_solution(sim_project_data, sim_action_data3, sim_species_data,
                      s3, "name", "success", "name", "cost", "name",
                      "weight")

# find all solutions from the heuristic algorithm
# note we can set the budget higher than the total cost of all the
# actions, and the number of solutions to the total number of
# actions to achieve this
s4 <- ppp_heuristic_spp_solution(sim_project_data, sim_action_data,
                                 sim_species_data,
                                 sum(sim_action_data$cost) * 1.1,
                                 "name", "success", "name", "cost",
                                 "name", "weight",
                                 number_solutions = nrow(sim_action_data))

# print solutions
print(s4)
#> # A tibble: 6 x 12
#>   solution method   obj budget  cost optimal S1_action S2_action S3_action
#>      <int> <chr>  <dbl>  <dbl> <dbl> <lgl>   <lgl>     <lgl>     <lgl>    
#> 1        1 heuri~ 1.80    548. 498.  NA      TRUE      TRUE      TRUE     
#> 2        2 heuri~ 1.73    548. 395.  NA      TRUE      TRUE      FALSE    
#> 3        3 heuri~ 1.66    548. 295.  NA      TRUE      TRUE      FALSE    
#> 4        4 heuri~ 1.57    548. 200.  NA      FALSE     TRUE      FALSE    
#> 5        5 heuri~ 1.21    548.  99.2 NA      FALSE     FALSE     FALSE    
#> 6        6 heuri~ 0.675   548.   0   NA      FALSE     FALSE     FALSE    
#> # ... with 3 more variables: S4_action <lgl>, S5_action <lgl>,
#> #   baseline_action <lgl>

# plot solution cost against expected weighted species richness
plot(obj ~ cost, data = s4,
     main = "Heuristic solutions", xlab = "Cost ($)",
     ylab = "Expected weighted species richness")