This package is still under development and not ready for use. Please do not use this package yet.
The optimalppp R package provides methods for prioritizing funding of conservation projects using the ‘Protect Prioritization Protocol’. Projects can be prioritized by maximizing expected species richness or expected phylogenetic diversity. Prioritizations can be generated using a range of methods, such as exact algorithms which can identify optimal solutions, heuristic algorithms which have conventionally been used to identify solutions, and by randomly funding projects. This package also provides the functionality to visualize how well solutions maintain biodiversity.
The latest development version of the optimalppp R package can be installed using the following code. Please note that you will need install the ggtree package from Bioconductor since it is not available on The Comprehensive R Archive Network.
if (!require(devtools))
install.packages("devtools")
devtools::install_github("prioritizr/optimalppp")
if (!require(ggtree))
devtools::install_bioc("ggtree")
Please note that you will also need to install the Gurobi optimization suite and gurobi R package. For instructions on installing these softwares, please refer to this installation guide.
Here we will provide a short example showing how the optimalppp R package can be used to prioritize funding for conservation projects. To start off, we will set the seed for the random number generator to ensure you get the same results as shown here, set some default behavior for the R session, load the optimalppp R package, and load the ggtree R package to plot phylogenetic trees.
Now we will load some data sets that are distributed with the package. First, we will load the sim_tree
object. This object describes the evolutionary relationships between 5 simulated species (named S1, S2, S3, …, S5). The length of each phylogenetic branch corresponds to millions of years of evolutionary history, and the last common ancestor for all these species occurred approximately 2 million years ago.
# load data
data(sim_tree)
# plot tree
ggtree(sim_tree) +
geom_tiplab(size = 2.5) +
geom_treescale(width = 2, x = 0, offset = 0.1)
Next, we will load the sim_action_data
object. This object stores information about various conservation actions in a tabular format (i.e. tibble
). Each row corresponds to a different action, and each column describes different properties associated with the actions. These actions correspond to specific management actions that have known costs. For example, they may relate to baiting or trapping sites of conservation importance. In this table, the "name"
column contains the name of each action, and the "cost"
action denotes the cost of funding each project. It also contains additional columns for customizing the solutions, but we will ignore them for now. Note that the last project—the "baseline_action"
—has a zero cost and is used subsequently to represent the baseline probability for species when no conservation actions are funded for them.
# load data
data(sim_action_data)
# print table
head(as.data.frame(sim_action_data))
## name cost locked_in locked_out
## 1 S1_action 94.39929 FALSE FALSE
## 2 S2_action 100.99137 FALSE FALSE
## 3 S3_action 103.22583 TRUE FALSE
## 4 S4_action 99.24274 FALSE FALSE
## 5 S5_action 99.90791 FALSE TRUE
## 6 baseline_action 0.00000 FALSE FALSE
Additionally, we will load the sim_project_data
object. This object stores information about various conservation projects in a tabular format (i.e. tibble
). Each row corresponds to a different project, and each column describes various properties associated with the projects. These projects correspond to groups of conservation actions. For example, a conservation project may pertain to a set of conservation actions that relate to a single species or single geographic locality. In this table, the "name"
column contains the name of each project, the "success"
column denotes the probability of each project succeeding if it is funded, the "S1"
–"SN"
columns show the enhanced probability of each species persisting if the project is funded, and the "S1_action"
–"SN_action"
columns indicate which actions are associated with which project. Note that the last project—the "baseline_project"
—is associated with the "baseline_action"
action. This project has a zero cost and represents the baseline probability of each species persisting if no other project is funded. Finally, although most projects in this example directly relate to a single species, you can input projects that directly affect the persistence of multiple species.
# load data
data(sim_project_data)
# print table
head(as.data.frame(sim_project_data))
## name success S1 S2 S3 S4
## 1 S1_project 0.9190985 0.7905800 0.0000000 0.0000000 0.0000000
## 2 S2_project 0.9232556 0.0000000 0.8881011 0.0000000 0.0000000
## 3 S3_project 0.8293499 0.0000000 0.0000000 0.5020887 0.0000000
## 4 S4_project 0.8475053 0.0000000 0.0000000 0.0000000 0.6899938
## 5 S5_project 0.8137868 0.0000000 0.0000000 0.0000000 0.0000000
## 6 baseline_project 1.0000000 0.2977965 0.2500224 0.0864612 0.2489246
## S5 S1_action S2_action S3_action S4_action S5_action
## 1 0.0000000 TRUE FALSE FALSE FALSE FALSE
## 2 0.0000000 FALSE TRUE FALSE FALSE FALSE
## 3 0.0000000 FALSE FALSE TRUE FALSE FALSE
## 4 0.0000000 FALSE FALSE FALSE TRUE FALSE
## 5 0.6166465 FALSE FALSE FALSE FALSE TRUE
## 6 0.1820005 FALSE FALSE FALSE FALSE FALSE
## baseline_action
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 TRUE
Let us assume that our resources are limited such that we can only spend, at most, $200 on funding conservation projects. In other words, our budget is capped at $200. Now, given the project data (sim_project_data
), the species’ evolutionary relationships (sim_tree
), and this budget (200
), So, let’s cut to the chase and find an optimal solution.
# solve problem
s1 <- ppp_exact_phylo_solution(x = sim_project_data, y = sim_action_data,
tree = sim_tree, budget = 200,
project_column_name = "name",
success_column_name = "success",
action_column_name = "name",
cost_column_name = "cost")
# print solution
head(as.data.frame(s1))
## solution method obj budget cost optimal S1_action S2_action
## 1 1 exact 2.447308 200 193.642 TRUE TRUE FALSE
## S3_action S4_action S5_action baseline_action
## 1 FALSE TRUE FALSE TRUE
The object s1
contains the solution and also various statistics associated with the solution in a tabular format (i.e. tibble
). Here, each row corresponds to a different solution. Specifically, the "solution"
column contains an identifier for the solution (which may be useful for methods that output multiple solutions), the "obj"
column contains the objective value (i.e. the expected phylogenetic diversity for this problem; Faith 2008), the "budget"
column stores the budget used for generating the solution, the "cost"
column stores the cost of the solution, the "optimal"
column indicates if the solution is known to be optimal (NA
values mean the optimality is unknown), and the "method"
column contains the name of the method used to generate the solution. The remaining columns ("S1_project"
, "S2_project"
, "S3_project"
, …, "SN_project"
, and "baseline_project"
) indicate if each project was prioritized for funding in the solution.
Here, the objective value (in the "obj"
column) denotes the amount of evolutionary history that is expected to persist (i.e. 2.447 million years). Put simply, solutions that are expected to result in better conservation outcomes will be associated with a greater objective value. Since tabular data can be difficult to intuit, let’s visualize how well this solution would maintain the different branches in the phylogeny. Note that species which receive any funding are denoted with an asterisk.
# visualize solution
ppp_plot_phylo_solution(sim_project_data, sim_action_data, sim_tree, s1,
project_column_name = "name",
success_column_name = "success",
action_column_name = "name",
cost_column_name = "cost") +
geom_treescale(width = 2, x = 0, offset = 0.1)
This has just been a taster of the optimalppp R package. For more information, see the package vignette.
Please use the following citation to cite the optimalppp R package in publications:
This package is still under development and not ready for use. Please do not use this package yet.
Hanson JO, Schuster R, Strimas-Mackey M, Bennett J, (2018). optimalppp: Optimal Project Prioritization Protocol. R package version 0.0.0.4. Available at https://github.com/prioritizr/optimalppp.