R/add_max_phylo_div_objective.R
add_max_phylo_div_objective.Rd
Set the objective of a project prioritization problem() to maximize the
phylogenetic diversity that is expected to persist into the future, whilst
ensuring that the cost of the solution is within a pre-specified budget
(Bennett et al. 2014, Faith 2008).
add_max_phylo_div_objective(x, budget, tree)
x: ProjectProblem object.
budget: numeric budget for funding actions.
tree: ape::phylo() phylogenetic tree describing the evolutionary relationships between the features. Note that the argument to tree must contain every feature, and only the features, present in the argument to x.
ProjectProblem object with the objective added to it.
A problem objective is used to specify the overall goal of the project prioritization problem. Here, the maximum phylogenetic diversity objective seeks to find the set of actions that maximizes the amount of evolutionary history that is expected to persist into the future given the evolutionary relationships between the features (e.g. populations, species). Let \(I\) represent the set of conservation actions (indexed by \(i\)). Let \(C_i\) denote the cost for funding action \(i\), and let \(m\) denote the maximum expenditure (i.e. the budget). Also, let \(F\) represent the set of features (indexed by \(f\)), \(W_f\) represent the weight for each feature \(f\) (defaults to zero for each feature unless specified otherwise), and \(E_f\) denote the probability that each feature \(f\) will go extinct given the funded conservation projects.
To describe the evolutionary relationships between the features \(f \in F\), consider a phylogenetic tree that contains features \(f \in F\) with branches of known lengths. This tree can be described using mathematical notation by letting \(B\) represent the branches (indexed by \(b\)) with lengths \(L_b\) and letting \(T_{bf}\) indicate which features \(f \in F\) are associated with which phylogenetic branches \(b \in B\) using zeros and ones. Ideally, the set of features \(F\) would contain all of the species in the study area (including non-threatened species) to fully account for the benefits of funding different actions.
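As an illustration of the \(T_{bf}\) notation, the following base R sketch builds the branch-by-feature matrix for a small hypothetical tree. The tree, the feature names, and the helper `descendant_tips()` are assumptions made for this example; the edge matrix simply mimics the (parent, child) layout of an ape::phylo() object and is not taken from the package data.

```r
# Hypothetical three-feature tree: ((F1:1, F2:1):2, F3:3)
# Tips are numbered 1-3, the root is node 4, and the internal node is 5,
# mimicking the (parent, child) edge layout of an ape::phylo() object.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
edge_length <- c(2, 1, 1, 3)   # L_b for each branch b
tip_label <- c("F1", "F2", "F3")
n_tips <- length(tip_label)

# Hypothetical helper: find the tips descended from a node
descendant_tips <- function(node) {
  if (node <= n_tips) return(node)
  children <- edge[edge[, 1] == node, 2]
  sort(unlist(lapply(children, descendant_tips)))
}

# T_bf: one row per branch, one column per feature, using zeros and ones
T_bf <- matrix(0, nrow = nrow(edge), ncol = n_tips,
               dimnames = list(paste0("b", seq_len(nrow(edge))), tip_label))
for (b in seq_len(nrow(edge))) {
  T_bf[b, descendant_tips(edge[b, 2])] <- 1
}
print(T_bf)
```

Here the first branch is shared by F1 and F2, so its row contains two ones, whereas each tip branch is associated with a single feature.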
To guide the prioritization, the conservation actions are organized into
conservation projects. Let \(J\) denote the set of conservation projects
(indexed by \(j\)), and let \(A_{ij}\) denote which actions
\(i \in I\) comprise each conservation project
\(j \in J\) using zeros and ones. Next, let \(P_j\) represent
the probability of project \(j\) being successful if it is funded. Also,
let \(B_{fj}\) denote the enhanced probability that each feature
\(f \in F\) associated with the project \(j \in J\)
will persist if all of the actions that comprise project \(j\) are funded
and that project is allocated to feature \(f\).
For convenience, let \(Q_{fj}\) denote the actual probability that each
feature \(f \in F\) associated with the project \(j \in J\)
is expected to persist if the project is funded. If the argument
to adjust_for_baseline in the problem function was set to TRUE
(the default behavior), then
\(Q_{fj} = (P_{j} \times B_{fj}) + \bigg(\big(1 - (P_{j} B_{fj})\big) \times (P_{n} \times B_{fn})\bigg)\),
where n corresponds to the baseline "do nothing" project. This means
that the probability of a feature persisting if a project is allocated
to a feature depends on (i) the probability of the project succeeding,
(ii) the probability of the feature persisting if the project does not
fail, and (iii) the probability of the feature persisting even if the
project fails. Otherwise, if the argument is set to FALSE, then
\(Q_{fj} = P_{j} \times B_{fj}\).
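The two definitions of \(Q_{fj}\) can be compared with a short base R sketch. All values below are hypothetical. The sketch also assumes that the baseline adjustment is applied only to non-baseline projects that are associated with a feature (i.e. \(B_{fj} > 0\)); this is one reading of the text above, not a description of the package internals.

```r
# Hypothetical inputs: two projects plus a baseline "do nothing" project
P <- c(project_1 = 0.9, project_2 = 0.8, baseline = 1.0)   # P_j
B <- matrix(c(0.7, 0.0, 0.2,     # feature F1
              0.0, 0.6, 0.1),    # feature F2
            nrow = 2, byrow = TRUE,
            dimnames = list(c("F1", "F2"), names(P)))       # B_fj
n <- "baseline"

# P_j * B_fj for every feature-project pair
PB <- sweep(B, 2, P, `*`)

# adjust_for_baseline = FALSE: Q_fj = P_j * B_fj
Q_unadjusted <- PB

# adjust_for_baseline = TRUE:
# Q_fj = (P_j * B_fj) + (1 - P_j * B_fj) * (P_n * B_fn)
# (assumption: only applied where feature f is associated with project j)
Q_adjusted <- PB
for (j in setdiff(colnames(B), n)) {
  linked <- B[, j] > 0
  Q_adjusted[linked, j] <- PB[linked, j] + (1 - PB[linked, j]) * PB[linked, n]
}

print(Q_unadjusted)
print(Q_adjusted)
```

With these numbers, the adjustment raises the persistence probability of F1 under project_1 from 0.63 to 0.704, because the feature retains a 0.2 chance of persisting under the baseline even if the project does not deliver.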
The binary control variables \(X_i\) in this problem indicate whether each action \(i \in I\) is funded or not. The decision variables in this problem are the \(Y_{j}\), \(Z_{fj}\), \(E_f\), and \(R_b\) variables. Specifically, the binary \(Y_{j}\) variables indicate if project \(j\) is funded or not based on which actions are funded; the binary \(Z_{fj}\) variables indicate if project \(j\) is used to manage feature \(f\) or not; the semi-continuous \(E_f\) variables denote the probability that feature \(f\) will go extinct; and the semi-continuous \(R_b\) variables denote the probability that phylogenetic branch \(b\) will remain in the future.
Now that we have defined all the data and variables, we can formulate the problem. For convenience, let the symbol used to denote each set also represent its cardinality (e.g. if there are ten features, let \(F\) represent the set of ten features and also the number ten).
$$ \mathrm{Maximize} \space (\sum_{b = 0}^{B} L_b R_b) + \sum_{f = 0}^{F} (1 - E_f) W_f \space \mathrm{(eqn \space 1a)} \\ \mathrm{Subject \space to} \space \sum_{i = 0}^{I} C_i X_i \leq m \space \mathrm{(eqn \space 1b)} \\ R_b = 1 - \prod_{f = 0}^{F} \mathrm{ifelse}(T_{bf} == 1, \space E_f, \space 1) \space \forall \space b \in B \space \mathrm{(eqn \space 1c)} \\ E_f = 1 - \sum_{j = 0}^{J} Z_{fj} Q_{fj} \space \forall \space f \in F \space \mathrm{(eqn \space 1d)} \\ Z_{fj} \leq Y_{j} \space \forall \space f \in F, j \in J \space \mathrm{(eqn \space 1e)} \\ \sum_{j = 0}^{J} Z_{fj} \times \mathrm{ceil}(Q_{fj}) = 1 \space \forall \space f \in F \space \mathrm{(eqn \space 1f)} \\ A_{ij} Y_{j} \leq X_{i} \space \forall \space i \in I, j \in J \space \mathrm{(eqn \space 1g)} \\ E_{f}, R_{b} \geq 0, E_{f}, R_{b} \leq 1 \space \forall \space b \in B, f \in F \space \mathrm{(eqn \space 1h)} \\ X_{i}, Y_{j}, Z_{fj} \in \{0, 1\} \space \forall \space i \in I, j \in J, f \in F \space \mathrm{(eqn \space 1i)} $$
The objective (eqn 1a) is to maximize the expected phylogenetic diversity (Faith 2008) plus the probability that each feature will remain multiplied by its weight (noting that the feature weights default to zero). Constraint (eqn 1b) limits the maximum expenditure (i.e. ensures that the cost of the funded actions does not exceed the budget). Constraints (eqn 1c) calculate the probability that each branch (including tips that correspond to a single feature) will remain, based on the probability that all of the features which share a given branch will go extinct. Constraints (eqn 1d) calculate the probability that each feature will go extinct according to its allocated project. Constraints (eqn 1e) ensure that a feature can only be allocated to projects that have all of their actions funded. Constraints (eqn 1f) state that each feature is allocated to exactly one project. Constraints (eqn 1g) ensure that a project cannot be funded unless all of its actions are funded. Constraints (eqn 1h) ensure that the probability variables (\(E_f\) and \(R_b\)) are bounded between zero and one. Constraints (eqn 1i) ensure that the action funding (\(X_i\)), project funding (\(Y_j\)), and project allocation (\(Z_{fj}\)) variables are binary.
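To make the roles of eqns 1a, 1c, 1d, and 1f concrete, the following base R sketch evaluates the objective for one fixed allocation of features to projects. It does not optimize anything, and every number in it (branch lengths, \(Q_{fj}\) values, and the allocation \(Z_{fj}\)) is hypothetical.

```r
# Hypothetical data for three features on the tree ((F1, F2), F3)
L <- c(2, 1, 1, 3)                       # L_b: branch lengths
T_bf <- matrix(c(1, 1, 0,                # branch shared by F1 and F2
                 1, 0, 0,                # tip branch of F1
                 0, 1, 0,                # tip branch of F2
                 0, 0, 1),               # tip branch of F3
               nrow = 4, byrow = TRUE)
W <- c(0, 0, 0)                          # W_f: feature weights (default zero)
Q <- matrix(c(0.7, 0.0, 0.2,
              0.0, 0.6, 0.1,
              0.0, 0.0, 0.5),
            nrow = 3, byrow = TRUE)      # Q_fj: last column is the baseline
Z <- matrix(c(1, 0, 0,
              0, 0, 1,
              0, 0, 1),
            nrow = 3, byrow = TRUE)      # Z_fj: allocation of features to projects

# eqn 1f: each feature is allocated to exactly one project
stopifnot(all(rowSums(Z * ceiling(Q)) == 1))

# eqn 1d: extinction probability of each feature
E <- 1 - rowSums(Z * Q)

# eqn 1c: probability that each branch remains
R <- vapply(seq_len(nrow(T_bf)),
            function(b) 1 - prod(E[T_bf[b, ] == 1]),
            numeric(1))

# eqn 1a: expected phylogenetic diversity plus weighted persistence
objective <- sum(L * R) + sum((1 - E) * W)
print(objective)
```

In this allocation only F1 receives a dedicated project, so the shared branch of F1 and F2 remains with probability 1 - (0.3 × 0.9) = 0.73, which is higher than the persistence probability of either feature alone.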
Although this formulation is a mixed integer quadratically constrained programming problem (due to eqn 1c), it can be approximated using linear terms and then solved using commercial mixed integer programming solvers. This can be achieved by substituting the product of the feature extinction probabilities (eqn 1c) with the sum of the log feature extinction probabilities and using piecewise linear approximations (described in Hillier & Price 2005 pp. 390--392) to approximate the exponential of this sum.
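The numerical idea behind this substitution can be sketched in base R. The example below only shows that a product of probabilities equals the exponential of a sum of logs, and that the exponential can be approximated by linear interpolation between breakpoints. The extinction probabilities, the breakpoint range, and the number of breakpoints are arbitrary choices for illustration, and the sketch does not reproduce how the package or the solver encodes the approximation.

```r
# Hypothetical extinction probabilities for the features sharing one branch
E <- c(0.3, 0.9, 0.5)

# Exact product from eqn 1c
exact_product <- prod(E)

# The same product rewritten as the exponential of a sum of logs
log_sum <- sum(log(E))

# Piecewise linear approximation of exp() between evenly spaced breakpoints
breaks <- seq(log(1e-4), 0, length.out = 1000)
approx_product <- approx(x = breaks, y = exp(breaks), xout = log_sum)$y

# Probability that the branch remains, exact and approximated
R_exact <- 1 - exact_product
R_approx <- 1 - approx_product
print(c(exact = R_exact, approximate = R_approx))
```

Using more breakpoints reduces the approximation error at the cost of a larger model, which is presumably the trade-off faced when choosing the resolution of the piecewise linear terms.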
Bennett JR, Elliott G, Mellish B, Joseph LN, Tulloch AI, Probert WJ, Di Fonzo MMI, Monks JM, Possingham HP & Maloney R (2014) Balancing phylogenetic diversity and species numbers in conservation prioritization, using a case study of threatened species in New Zealand. Biological Conservation, 174: 47--54.
Faith DP (2008) Threatened species and the potential loss of phylogenetic diversity: conservation scenarios based on estimated extinction probabilities and phylogenetic risk analysis. Conservation Biology, 22: 1461--1470.
Hillier FS & Price CC (2005) International series in operations research & management science. Springer.
# load package
library(oppr)
# load data
data(sim_projects, sim_features, sim_actions, sim_tree)
# plot tree
plot(sim_tree)
# build problem with maximum phylogenetic diversity objective and $200 budget
p1 <- problem(sim_projects, sim_actions, sim_features,
              "name", "success", "name", "cost", "name") %>%
      add_max_phylo_div_objective(budget = 200, tree = sim_tree) %>%
      add_binary_decisions()
# \dontrun{
# solve problem
s1 <- solve(p1)
#> Set parameter Username
#> Set parameter TimeLimit to value 2147483647
#> Set parameter MIPGap to value 0
#> Set parameter NumericFocus to value 3
#> Set parameter Presolve to value 2
#> Set parameter Threads to value 1
#> Set parameter PoolSolutions to value 1
#> Set parameter PoolSearchMode to value 2
#> Academic license - for non-commercial use only - expires 2025-04-21
#> Gurobi Optimizer version 11.0.2 build v11.0.2rc0 (linux64 - "Ubuntu 22.04.4 LTS")
#>
#> CPU model: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, instruction set [SSE2|AVX|AVX2|AVX512]
#> Thread count: 4 physical cores, 8 logical processors, using up to 1 threads
#>
#> Optimize a model with 50 rows, 50 columns and 123 nonzeros
#> Model fingerprint: 0x8b6e070d
#> Model has 3 piecewise-linear objective terms
#> Variable types: 3 continuous, 42 integer (42 binary)
#> Semi-Variable types: 5 continuous, 0 integer
#> Coefficient statistics:
#> Matrix range [9e-02, 1e+02]
#> Objective range [2e-01, 2e+00]
#> Bounds range [1e+00, 1e+00]
#> RHS range [1e+00, 2e+02]
#> PWLObj x range [6e-01, 5e+00]
#> PWLObj obj range [5e-03, 1e+00]
#> Found heuristic solution: objective 1.7230501
#> Presolve removed 16 rows and 12 columns
#> Presolve time: 0.00s
#> Presolved: 37 rows, 2530 columns, 2572 nonzeros
#> Variable types: 2495 continuous, 35 integer (35 binary)
#> Root relaxation presolved: 34 rows, 2527 columns, 2566 nonzeros
#>
#>
#> Root relaxation: objective 2.638346e+00, 23 iterations, 0.00 seconds (0.01 work units)
#>
#> Nodes | Current Node | Objective Bounds | Work
#> Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
#>
#> 0 0 2.63835 0 6 1.72305 2.63835 53.1% - 0s
#> H 0 0 1.9924876 2.62002 31.5% - 0s
#> H 0 0 2.0656903 2.62002 26.8% - 0s
#> H 0 0 2.1724148 2.62002 20.6% - 0s
#> H 0 0 2.5726722 2.62002 1.84% - 0s
#> 0 0 infeasible 0 2.57267 2.57267 0.00% - 0s
#>
#> Explored 1 nodes (28 simplex iterations) in 0.02 seconds (0.04 work units)
#> Thread count was 1 (of 8 available processors)
#>
#> Solution count 1: 2.57267
#> No other solutions better than 2.57267
#>
#> Optimal solution found (tolerance 0.00e+00)
#> Best objective 2.572672188340e+00, best bound 2.572672188340e+00, gap 0.0000%
# print solution
print(s1)
#> # A tibble: 1 × 21
#> solution status obj cost F1_action F2_action F3_action F4_action F5_action
#> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 OPTIMAL 2.57 194. 1 0 0 0 1
#> # ℹ 12 more variables: baseline_action <dbl>, F1_project <dbl>,
#> # F2_project <dbl>, F3_project <dbl>, F4_project <dbl>, F5_project <dbl>,
#> # baseline_project <dbl>, F1 <dbl>, F2 <dbl>, F3 <dbl>, F4 <dbl>, F5 <dbl>
# plot solution
plot(p1, s1)
# build another problem that includes feature weights
p2 <- p1 %>%
      add_feature_weights("weight")
# solve problem with feature weights
s2 <- solve(p2)
#> Set parameter Username
#> Set parameter TimeLimit to value 2147483647
#> Set parameter MIPGap to value 0
#> Set parameter NumericFocus to value 3
#> Set parameter Presolve to value 2
#> Set parameter Threads to value 1
#> Set parameter PoolSolutions to value 1
#> Set parameter PoolSearchMode to value 2
#> Academic license - for non-commercial use only - expires 2025-04-21
#> Gurobi Optimizer version 11.0.2 build v11.0.2rc0 (linux64 - "Ubuntu 22.04.4 LTS")
#>
#> CPU model: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, instruction set [SSE2|AVX|AVX2|AVX512]
#> Thread count: 4 physical cores, 8 logical processors, using up to 1 threads
#>
#> Optimize a model with 50 rows, 50 columns and 123 nonzeros
#> Model fingerprint: 0x6f543abe
#> Model has 3 piecewise-linear objective terms
#> Variable types: 3 continuous, 42 integer (42 binary)
#> Semi-Variable types: 5 continuous, 0 integer
#> Coefficient statistics:
#> Matrix range [9e-02, 1e+02]
#> Objective range [4e-01, 3e+00]
#> Bounds range [1e+00, 1e+00]
#> RHS range [1e+00, 2e+02]
#> PWLObj x range [6e-01, 5e+00]
#> PWLObj obj range [5e-03, 1e+00]
#> Found heuristic solution: objective 2.3885146
#> Presolve removed 16 rows and 12 columns
#> Presolve time: 0.00s
#> Presolved: 37 rows, 2530 columns, 2572 nonzeros
#> Variable types: 2495 continuous, 35 integer (35 binary)
#> Root relaxation presolved: 34 rows, 2527 columns, 2566 nonzeros
#>
#>
#> Root relaxation: objective 4.074411e+00, 24 iterations, 0.00 seconds (0.01 work units)
#>
#> Nodes | Current Node | Objective Bounds | Work
#> Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
#>
#> 0 0 4.07441 0 6 2.38851 4.07441 70.6% - 0s
#> H 0 0 4.0560715 4.05682 0.02% - 0s
#> H 0 0 4.0567449 4.05682 0.00% - 0s
#> 0 0 cutoff 0 4.05674 4.05674 0.00% - 0s
#>
#> Explored 1 nodes (25 simplex iterations) in 0.01 seconds (0.03 work units)
#> Thread count was 1 (of 8 available processors)
#>
#> Solution count 1: 4.05674
#> No other solutions better than 4.05674
#>
#> Optimal solution found (tolerance 0.00e+00)
#> Best objective 4.056744940702e+00, best bound 4.056744940702e+00, gap 0.0000%
# print solution based on feature weights
print(s2)
#> # A tibble: 1 × 21
#> solution status obj cost F1_action F2_action F3_action F4_action F5_action
#> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 OPTIMAL 4.06 199. 0 0 0 1 1
#> # ℹ 12 more variables: baseline_action <dbl>, F1_project <dbl>,
#> # F2_project <dbl>, F3_project <dbl>, F4_project <dbl>, F5_project <dbl>,
#> # baseline_project <dbl>, F1 <dbl>, F2 <dbl>, F3 <dbl>, F4 <dbl>, F5 <dbl>
# plot solution based on feature weights
plot(p2, s2)
# }