ecological-agent-skills 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (217)
  1. package/AGENT_CONTEXT.md +191 -0
  2. package/CATALOG.md +329 -0
  3. package/LICENSE +692 -0
  4. package/README.md +347 -0
  5. package/bin/install.mjs +168 -0
  6. package/docs/comparison-with-alternatives.md +38 -0
  7. package/docs/global-examples-index.md +103 -0
  8. package/docs/repository-statistics.md +101 -0
  9. package/docs/theoretical-foundations.md +188 -0
  10. package/environment.yaml +106 -0
  11. package/examples/community/arctic_tundra_vegetation_example.md +247 -0
  12. package/examples/community/bird_landuse_example.md +63 -0
  13. package/examples/community/phytoplankton_reservoir_example.md +60 -0
  14. package/examples/community/reef_fish_indopacific_example.md +221 -0
  15. package/examples/impact/baci_road_example.md +57 -0
  16. package/examples/impact/ecosystem_services_atlantic_forest.md +83 -0
  17. package/examples/impact/forest_loss_borneo_timeseries_example.md +225 -0
  18. package/examples/occupancy/puma_camera_example.md +61 -0
  19. package/examples/occupancy/snow_leopard_himalayas_example.md +204 -0
  20. package/examples/reproducible/whittaker_biome_sdm_example.md +406 -0
  21. package/examples/sdm/anteater_cerrado_example.md +69 -0
  22. package/examples/sdm/jaguar_amazon_example.md +80 -0
  23. package/examples/sdm/koala_climate_change_example.md +170 -0
  24. package/examples/sdm/wolf_recolonization_europe_example.md +193 -0
  25. package/package.json +43 -0
  26. package/renv.lock +194 -0
  27. package/skills/SKILL_INDEX.json +1020 -0
  28. package/skills/acoustic-monitoring/SKILL.md +163 -0
  29. package/skills/acoustic-monitoring/examples/example-prompts.md +100 -0
  30. package/skills/acoustic-monitoring/examples/temperate_forest_birds_example.md +285 -0
  31. package/skills/acoustic-monitoring/resources/acoustic-indices-reference.md +93 -0
  32. package/skills/acoustic-monitoring/resources/soundscape-ecology-guide.md +90 -0
  33. package/skills/acoustic-monitoring/resources/species-id-tools-comparison.md +89 -0
  34. package/skills/acoustic-monitoring/scripts/batch_species_detection.py +360 -0
  35. package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.R +235 -0
  36. package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.py +374 -0
  37. package/skills/biostatistics-workbench/SKILL.md +140 -0
  38. package/skills/biostatistics-workbench/examples/example-prompts.md +39 -0
  39. package/skills/biostatistics-workbench/resources/effect-size-reference.md +81 -0
  40. package/skills/biostatistics-workbench/resources/glm-family-link-reference.md +47 -0
  41. package/skills/biostatistics-workbench/resources/test-selection-guide.md +93 -0
  42. package/skills/biostatistics-workbench/scripts/glm_pipeline.R +78 -0
  43. package/skills/biostatistics-workbench/scripts/glm_pipeline.py +210 -0
  44. package/skills/camera-trap-processing/SKILL.md +159 -0
  45. package/skills/camera-trap-processing/examples/example-prompts.md +103 -0
  46. package/skills/camera-trap-processing/examples/leopard_serengeti_example.md +231 -0
  47. package/skills/camera-trap-processing/resources/activity-patterns-reference.md +113 -0
  48. package/skills/camera-trap-processing/resources/camtrapR-workflow-guide.md +130 -0
  49. package/skills/camera-trap-processing/resources/detection-event-definition-guide.md +89 -0
  50. package/skills/camera-trap-processing/scripts/estimate_activity.R +169 -0
  51. package/skills/camera-trap-processing/scripts/process_camtrap_data.R +179 -0
  52. package/skills/camera-trap-processing/scripts/process_camtrap_data.py +192 -0
  53. package/skills/community-ecology-ordination/SKILL.md +133 -0
  54. package/skills/community-ecology-ordination/examples/example-prompts.md +35 -0
  55. package/skills/community-ecology-ordination/resources/dissimilarity-metric-guide.md +53 -0
  56. package/skills/community-ecology-ordination/resources/nmds-interpretation-guide.md +104 -0
  57. package/skills/community-ecology-ordination/scripts/__pycache__/community_analysis.cpython-311.pyc +0 -0
  58. package/skills/community-ecology-ordination/scripts/community_analysis.R +143 -0
  59. package/skills/community-ecology-ordination/scripts/community_analysis.py +231 -0
  60. package/skills/ecological-data-foundation/SKILL.md +129 -0
  61. package/skills/ecological-data-foundation/examples/example-prompts.md +40 -0
  62. package/skills/ecological-data-foundation/resources/coordinate-cleaning-flags.md +66 -0
  63. package/skills/ecological-data-foundation/resources/darwin-core-glossary.md +91 -0
  64. package/skills/ecological-data-foundation/resources/data-citation-guide.md +265 -0
  65. package/skills/ecological-data-foundation/resources/gbif-data-citation-guide.md +193 -0
  66. package/skills/ecological-data-foundation/resources/qa-checklist.md +83 -0
  67. package/skills/ecological-data-foundation/scripts/__pycache__/clean_occurrences.cpython-311.pyc +0 -0
  68. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_ebird.cpython-311.pyc +0 -0
  69. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_inat.cpython-311.pyc +0 -0
  70. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_iucn.cpython-311.pyc +0 -0
  71. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_obis.cpython-311.pyc +0 -0
  72. package/skills/ecological-data-foundation/scripts/clean_occurrences.R +230 -0
  73. package/skills/ecological-data-foundation/scripts/clean_occurrences.py +268 -0
  74. package/skills/ecological-data-foundation/scripts/download_from_ebird.R +251 -0
  75. package/skills/ecological-data-foundation/scripts/download_from_ebird.py +364 -0
  76. package/skills/ecological-data-foundation/scripts/download_from_gbif.R +315 -0
  77. package/skills/ecological-data-foundation/scripts/download_from_gbif.py +407 -0
  78. package/skills/ecological-data-foundation/scripts/download_from_inat.R +238 -0
  79. package/skills/ecological-data-foundation/scripts/download_from_inat.py +304 -0
  80. package/skills/ecological-data-foundation/scripts/download_from_iucn.R +273 -0
  81. package/skills/ecological-data-foundation/scripts/download_from_iucn.py +344 -0
  82. package/skills/ecological-data-foundation/scripts/download_from_obis.R +248 -0
  83. package/skills/ecological-data-foundation/scripts/download_from_obis.py +318 -0
  84. package/skills/ecological-impact-assessment/SKILL.md +123 -0
  85. package/skills/ecological-impact-assessment/examples/example-prompts.md +32 -0
  86. package/skills/ecological-impact-assessment/resources/baci-design-guide.md +55 -0
  87. package/skills/ecological-impact-assessment/resources/fragmentation-metrics-reference.md +86 -0
  88. package/skills/ecological-impact-assessment/resources/pressure-index-template.md +78 -0
  89. package/skills/ecological-impact-assessment/resources/study-design-guide.md +168 -0
  90. package/skills/ecological-impact-assessment/scripts/baci_analysis.R +161 -0
  91. package/skills/ecological-impact-assessment/scripts/fragmentation_analysis.py +141 -0
  92. package/skills/ecological-impact-assessment/scripts/power_analysis_baci.R +274 -0
  93. package/skills/ecosystem-services-assessment/SKILL.md +125 -0
  94. package/skills/ecosystem-services-assessment/examples/example-prompts.md +24 -0
  95. package/skills/ecosystem-services-assessment/resources/es-indicator-reference.md +45 -0
  96. package/skills/ecosystem-services-assessment/resources/invest-parameter-guide.md +86 -0
  97. package/skills/ecosystem-services-assessment/resources/rusle-coefficients.md +88 -0
  98. package/skills/ecosystem-services-assessment/scripts/__pycache__/compute_es.cpython-311.pyc +0 -0
  99. package/skills/ecosystem-services-assessment/scripts/compute_es.py +189 -0
  100. package/skills/ecosystem-services-assessment/scripts/tradeoff_analysis.R +161 -0
  101. package/skills/environmental-time-series/SKILL.md +125 -0
  102. package/skills/environmental-time-series/examples/example-prompts.md +33 -0
  103. package/skills/environmental-time-series/resources/anomaly-indices-reference.md +88 -0
  104. package/skills/environmental-time-series/resources/bfast-parameter-guide.md +69 -0
  105. package/skills/environmental-time-series/scripts/__pycache__/recovery_trajectory.cpython-311.pyc +0 -0
  106. package/skills/environmental-time-series/scripts/__pycache__/trend_analysis.cpython-311.pyc +0 -0
  107. package/skills/environmental-time-series/scripts/recovery_trajectory.R +305 -0
  108. package/skills/environmental-time-series/scripts/recovery_trajectory.py +178 -0
  109. package/skills/environmental-time-series/scripts/trend_analysis.R +192 -0
  110. package/skills/environmental-time-series/scripts/trend_analysis.py +184 -0
  111. package/skills/geoprocessing-for-ecology/SKILL.md +123 -0
  112. package/skills/geoprocessing-for-ecology/examples/example-prompts.md +32 -0
  113. package/skills/geoprocessing-for-ecology/resources/crs-reference.md +62 -0
  114. package/skills/geoprocessing-for-ecology/resources/global-predictor-sources.md +331 -0
  115. package/skills/geoprocessing-for-ecology/resources/resampling-methods.md +57 -0
  116. package/skills/geoprocessing-for-ecology/scripts/__pycache__/download_predictors.cpython-311.pyc +0 -0
  117. package/skills/geoprocessing-for-ecology/scripts/download_predictors.R +239 -0
  118. package/skills/geoprocessing-for-ecology/scripts/download_predictors.py +379 -0
  119. package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.R +224 -0
  120. package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.py +172 -0
  121. package/skills/landscape-connectivity/SKILL.md +170 -0
  122. package/skills/landscape-connectivity/examples/example-prompts.md +96 -0
  123. package/skills/landscape-connectivity/examples/jaguar_mesoamerica_corridor_example.md +271 -0
  124. package/skills/landscape-connectivity/resources/circuitscape-parameter-guide.md +155 -0
  125. package/skills/landscape-connectivity/resources/graph-theory-for-ecology.md +134 -0
  126. package/skills/landscape-connectivity/resources/resistance-surface-guide.md +141 -0
  127. package/skills/landscape-connectivity/scripts/connectivity_analysis.py +387 -0
  128. package/skills/landscape-connectivity/scripts/connectivity_metrics.R +274 -0
  129. package/skills/landscape-connectivity/scripts/resistance_surface.R +239 -0
  130. package/skills/model-validation-and-uncertainty/SKILL.md +131 -0
  131. package/skills/model-validation-and-uncertainty/examples/example-prompts.md +30 -0
  132. package/skills/model-validation-and-uncertainty/resources/extrapolation-risk-guide.md +236 -0
  133. package/skills/model-validation-and-uncertainty/resources/metric-selection-guide.md +52 -0
  134. package/skills/model-validation-and-uncertainty/resources/threshold-selection-guide.md +64 -0
  135. package/skills/model-validation-and-uncertainty/scripts/__pycache__/validate_model.cpython-311.pyc +0 -0
  136. package/skills/model-validation-and-uncertainty/scripts/extrapolation_risk.R +315 -0
  137. package/skills/model-validation-and-uncertainty/scripts/validate_model.py +226 -0
  138. package/skills/model-validation-and-uncertainty/scripts/validate_sdm.R +162 -0
  139. package/skills/occupancy-and-detection/SKILL.md +126 -0
  140. package/skills/occupancy-and-detection/examples/example-prompts.md +33 -0
  141. package/skills/occupancy-and-detection/resources/detection-history-format.md +100 -0
  142. package/skills/occupancy-and-detection/resources/occupancy-study-design.md +47 -0
  143. package/skills/occupancy-and-detection/scripts/__pycache__/occupancy_analysis.cpython-311.pyc +0 -0
  144. package/skills/occupancy-and-detection/scripts/occupancy_analysis.R +160 -0
  145. package/skills/occupancy-and-detection/scripts/occupancy_analysis.py +159 -0
  146. package/skills/population-viability-analysis/SKILL.md +161 -0
  147. package/skills/population-viability-analysis/examples/african_elephant_pva_example.md +266 -0
  148. package/skills/population-viability-analysis/examples/example-prompts.md +95 -0
  149. package/skills/population-viability-analysis/resources/extinction-risk-thresholds.md +128 -0
  150. package/skills/population-viability-analysis/resources/matrix-model-guide.md +139 -0
  151. package/skills/population-viability-analysis/resources/sensitivity-elasticity-reference.md +182 -0
  152. package/skills/population-viability-analysis/scripts/matrix_pva.R +258 -0
  153. package/skills/population-viability-analysis/scripts/pva_analysis.py +442 -0
  154. package/skills/population-viability-analysis/scripts/stochastic_pva.R +353 -0
  155. package/skills/predictive-modeling-best-practices/SKILL.md +136 -0
  156. package/skills/predictive-modeling-best-practices/examples/example-prompts.md +58 -0
  157. package/skills/predictive-modeling-best-practices/resources/collinearity-decision-tree.md +65 -0
  158. package/skills/predictive-modeling-best-practices/resources/sampling-bias-correction.md +267 -0
  159. package/skills/predictive-modeling-best-practices/resources/spatial-cv-guide.md +73 -0
  160. package/skills/predictive-modeling-best-practices/scripts/__pycache__/spatial_cv.cpython-311.pyc +0 -0
  161. package/skills/predictive-modeling-best-practices/scripts/collinearity_check.R +112 -0
  162. package/skills/predictive-modeling-best-practices/scripts/spatial_cv.py +182 -0
  163. package/skills/reproducible-ecology-pipeline/SKILL.md +139 -0
  164. package/skills/reproducible-ecology-pipeline/examples/example-prompts.md +35 -0
  165. package/skills/reproducible-ecology-pipeline/resources/directory-structure-template.md +94 -0
  166. package/skills/reproducible-ecology-pipeline/resources/params-yaml-template.yaml +84 -0
  167. package/skills/reproducible-ecology-pipeline/resources/reproducibility-checklist-template.md +66 -0
  168. package/skills/reproducible-ecology-pipeline/scripts/generate_file_manifest.py +110 -0
  169. package/skills/reproducible-ecology-pipeline/scripts/init_project.sh +53 -0
  170. package/skills/spatial-prioritization/SKILL.md +162 -0
  171. package/skills/spatial-prioritization/examples/biodiversity_hotspot_prioritization_example.md +289 -0
  172. package/skills/spatial-prioritization/examples/example-prompts.md +93 -0
  173. package/skills/spatial-prioritization/resources/cost-surface-reference.md +130 -0
  174. package/skills/spatial-prioritization/resources/marxan-vs-prioritizr-comparison.md +125 -0
  175. package/skills/spatial-prioritization/resources/prioritizr-formulation-guide.md +188 -0
  176. package/skills/spatial-prioritization/resources/representation-targets-guide.md +186 -0
  177. package/skills/spatial-prioritization/scripts/prioritization_sensitivity.R +320 -0
  178. package/skills/spatial-prioritization/scripts/run_prioritization.R +336 -0
  179. package/skills/species-distribution-modeling/SKILL.md +139 -0
  180. package/skills/species-distribution-modeling/examples/example-prompts.md +36 -0
  181. package/skills/species-distribution-modeling/resources/algorithm-comparison.md +25 -0
  182. package/skills/species-distribution-modeling/resources/calibration-area-guide.md +71 -0
  183. package/skills/species-distribution-modeling/resources/climate-scenario-preparation.md +170 -0
  184. package/skills/species-distribution-modeling/resources/maxent-calibration-guide.md +211 -0
  185. package/skills/species-distribution-modeling/resources/sdm-checklist.md +37 -0
  186. package/skills/species-distribution-modeling/scripts/predict_distribution.R +236 -0
  187. package/skills/species-distribution-modeling/scripts/predict_distribution.py +286 -0
  188. package/skills/species-distribution-modeling/scripts/prepare_future_layers.R +351 -0
  189. package/skills/species-distribution-modeling/scripts/project_scenarios.R +220 -0
  190. package/skills/species-distribution-modeling/scripts/run_ensemble_sdm.R +99 -0
  191. package/skills/species-distribution-modeling/scripts/sdm_pipeline.py +318 -0
  192. package/skills/species-distribution-modeling/scripts/tune_maxnet.R +344 -0
  193. package/templates/SKILL_TEMPLATE.md +225 -0
  194. package/templates/checklists/data-submission-checklist.md +38 -0
  195. package/templates/checklists/post-analysis-checklist.md +55 -0
  196. package/templates/checklists/pre-analysis-checklist.md +31 -0
  197. package/templates/prompts/debug-skill.md +47 -0
  198. package/templates/prompts/invoke-skill.md +34 -0
  199. package/templates/prompts/invoke-workflow.md +45 -0
  200. package/templates/reports/technical-report-template.md +80 -0
  201. package/templates/scripts/logger_setup.R +79 -0
  202. package/templates/scripts/logger_setup.py +119 -0
  203. package/templates/scripts/params_loader.R +28 -0
  204. package/templates/scripts/params_loader.py +38 -0
  205. package/workflows/analyze-community-structure/WORKFLOW.md +72 -0
  206. package/workflows/analyze-environmental-change/WORKFLOW.md +73 -0
  207. package/workflows/assess-ecological-impact/WORKFLOW.md +75 -0
  208. package/workflows/assess-ecosystem-services/WORKFLOW.md +68 -0
  209. package/workflows/assess-landscape-connectivity/WORKFLOW.md +84 -0
  210. package/workflows/build-fire-risk-map/WORKFLOW.md +79 -0
  211. package/workflows/produce-technical-report/WORKFLOW.md +113 -0
  212. package/workflows/run-camera-trap-occupancy/WORKFLOW.md +87 -0
  213. package/workflows/run-conservation-prioritization/WORKFLOW.md +89 -0
  214. package/workflows/run-multispecies-screening/WORKFLOW.md +197 -0
  215. package/workflows/run-occupancy-analysis/WORKFLOW.md +74 -0
  216. package/workflows/run-population-viability/WORKFLOW.md +90 -0
  217. package/workflows/run-sdm-study/WORKFLOW.md +99 -0
@@ -0,0 +1,353 @@
+ # ecological-agent-skills / Copyright (C) 2026 Francisco Diego Barros Barata
+ # SPDX-License-Identifier: GPL-3.0-or-later
+
+ # Usage: Rscript stochastic_pva.R <vital_rates_csv> <output_dir>
+ #        [n_init] [t_max] [n_sim] [quasi_ext]
+ #
+ # Stochastic PVA via Monte Carlo simulation. Vital rates are drawn each year
+ # from Beta distributions (survival/stasis) or Lognormal distributions
+ # (fecundity), parameterised from observed inter-annual variation.
+ #
+ # Outputs:
+ #   stochastic_pva_results.csv - P(quasi-extinction), MTE, lambda_s per threshold
+ #   extinction_curve.csv - P(ext) vs time
+ #   trajectory_plot.png - 200 stochastic trajectories + median
+ #   extinction_curve.png - Cumulative extinction probability over time
+ #   iucn_criterion_e.csv - IUCN Criterion E classification
+
+ # ── Inline logger ─────────────────────────────────────────────────────────────
+ SKILL_NAME <- "population-viability-analysis"
+ .log_ts <- function() format(Sys.time(), "[%Y-%m-%d %H:%M:%S]")
+ log_info <- function(...) message(.log_ts(), " [INFO] ", sprintf(...))
+ log_warn <- function(...) message(.log_ts(), " [WARN] ", sprintf(...))
+ log_error <- function(...) message(.log_ts(), " [ERROR] ", sprintf(...))
+ log_step <- function(n, d) log_info("-- STEP %d: %s", n, d)
+ log_decision <- function(v, val, why) log_info("DECISION | %s = %s | %s", v, val, why)
+ dir.create("logs", recursive=TRUE, showWarnings=FALSE)
+
+ suppressPackageStartupMessages(library(popbio))
+ suppressPackageStartupMessages(library(dplyr))
+ suppressPackageStartupMessages(library(ggplot2))
+
+ args <- commandArgs(trailingOnly = TRUE)
+ if (length(args) < 2) {
+   cat("Usage: Rscript stochastic_pva.R <vital_rates_csv> <output_dir>",
+       "[n_init] [t_max] [n_sim] [quasi_ext]\n")
+   quit(status = 1)
+ }
+
+ vr_path <- args[1]
+ output_dir <- args[2]
+ n_init <- if (length(args) >= 3) as.integer(args[3]) else NA_integer_
+ t_max <- if (length(args) >= 4) as.integer(args[4]) else 100L
+ n_sim <- if (length(args) >= 5) as.integer(args[5]) else 1000L
+ quasi_ext <- if (length(args) >= 6) as.numeric(args[6]) else 50
+
+ # ── Input precondition checks ─────────────────────────────────────────────────
+ if (!file.exists(vr_path)) {
+   log_error("Input not found: %s\nLikely cause: the previous step did not complete.\nCheck: outputs of the previous skill.\nPrevious skill: population-viability-analysis (matrix_pva)", vr_path)
+   stop("Missing input: ", vr_path)
+ }
+
+ log_decision("t_max", t_max, "Time horizon in years for stochastic simulation")
+ log_decision("n_sim", n_sim, "Number of Monte Carlo simulation replicates")
+ log_decision("quasi_ext", quasi_ext, "Quasi-extinction threshold in individuals (IUCN Criterion E basis)")
+ log_decision("n_init", ifelse(is.na(n_init), "from_data", n_init), "Initial N; NA means read from vital_rates_csv last year or default 1000")
+
+ dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
+
+ # ── Load vital rates ─────────────────────────────────────────────────────────
+ log_step(1, "Load vital rates CSV and detect matrix structure")
+ tryCatch({
+   vr <- read.csv(vr_path)
+   log_info("Loaded vital rates: %d rows, %d columns.", nrow(vr), ncol(vr))
+
+   mat_cols <- grep("^a_[0-9]+_[0-9]+$", names(vr), value = TRUE)
+   if (length(mat_cols) == 0) {
+     log_error("No matrix element columns (a_i_j) found in the CSV.\nLikely cause: wrong format for the vital rates file.\nCheck: CSV column names.\nPrevious skill: population-viability-analysis (matrix_pva)")
+     stop("No matrix element columns (a_i_j) found in vital_rates_csv.")
+   }
+
+   if (nrow(vr) < 5) {
+     log_warn("Short time series (%d years). Variance estimates may be imprecise; a high n_sim is recommended.", nrow(vr))
+   }
+
+   indices <- regmatches(mat_cols, gregexpr("[0-9]+", mat_cols))
+   k <- max(sapply(indices, function(x) max(as.integer(x))))
+   log_info("Matrix dimension detected: %d x %d", k, k)
+   log_decision("k", k, "Matrix dimension inferred from max index in vital rate column names")
+ }, error = function(e) {
+   log_error("Failure in load_vital_rates: %s\nLikely cause: malformed or missing CSV file.\nCheck: path and format of the vital rates CSV.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
+ # Per-element mean and variance
+ log_step(2, "Compute per-element mean and variance for distribution parameterisation")
+ tryCatch({
+   vr_stats <- lapply(mat_cols, function(col) {
+     x <- vr[[col]][!is.na(vr[[col]])]
+     mu <- mean(x)
+     sig2 <- var(x)
+     list(col = col, mu = mu, sig2 = sig2)
+   })
+   names(vr_stats) <- mat_cols
+   log_info("Vital rate statistics computed for %d matrix elements.", length(vr_stats))
+ }, error = function(e) {
+   log_error("Failure in compute_vr_stats: %s\nLikely cause: columns where every value is NA.\nCheck: completeness of the vital rates data.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
+ # ── Beta distribution parameterisation (for survival/stasis: bounded [0,1]) ──
+ beta_params <- function(mu, sig2) {
+   # Also reject means so close to 0 or 1 that the variance cap below would go non-positive
+   if (is.na(sig2) || sig2 <= 0 || mu <= 0 || mu >= 1 || mu * (1 - mu) <= 1e-6) return(NULL)
+   # Cap the variance below mu * (1 - mu), the upper limit for a Beta variance,
+   # so both shape parameters stay positive
+   max_sig2 <- mu * (1 - mu) - 1e-6
+   sig2_use <- min(sig2, max_sig2 * 0.95)
+   denom <- mu * (1 - mu) / sig2_use - 1
+   list(shape1 = mu * denom, shape2 = (1 - mu) * denom)
+ }
+
+ # Lognormal parameterisation (for fecundity: unbounded > 0)
+ lnorm_params <- function(mu, sig2) {
+   if (is.na(sig2) || sig2 <= 0 || mu <= 0) return(NULL)
+   sigma2_ln <- log(1 + sig2 / mu^2)
+   mu_ln <- log(mu) - sigma2_ln / 2
+   list(meanlog = mu_ln, sdlog = sqrt(sigma2_ln))
+ }
+
+ # ── Draw a random matrix ──────────────────────────────────────────────────────
+ draw_matrix <- function() {
+   A <- matrix(0, k, k)
+   for (vs in vr_stats) {
+     idx <- as.integer(regmatches(vs$col, gregexpr("[0-9]+", vs$col))[[1]])
+     i <- idx[1]; j <- idx[2]
+     # Every positive row-1 element is treated as fecundity (lognormal), so a
+     # stasis term a_1_1 is drawn the same way; all other rows are survival/stasis (beta)
+     if (i == 1 && vs$mu > 0) {
+       params <- lnorm_params(vs$mu, vs$sig2)
+       val <- if (!is.null(params)) max(0, rlnorm(1, params$meanlog, params$sdlog))
+              else vs$mu
+     } else {
+       params <- beta_params(vs$mu, vs$sig2)
+       val <- if (!is.null(params)) rbeta(1, params$shape1, params$shape2)
+              else min(max(vs$mu, 0), 1)
+     }
+     A[i, j] <- val
+   }
+   A
+ }
+
+ # ── Initial population vector ─────────────────────────────────────────────────
+ log_step(3, "Compute mean matrix and stable stage distribution for initial vector")
+ tryCatch({
+   A_mean <- matrix(0, k, k)
+   for (vs in vr_stats) {
+     idx <- as.integer(regmatches(vs$col, gregexpr("[0-9]+", vs$col))[[1]])
+     A_mean[idx[1], idx[2]] <- vs$mu
+   }
+   SS <- stable.stage(A_mean)
+
+   n0 <- if (!is.na(n_init)) n_init else {
+     if ("population_N" %in% names(vr)) as.integer(tail(vr$population_N, 1))
+     else 1000L
+   }
+   log_decision("n0", n0, "Initial N; from CLI arg or last year in vital_rates_csv or default 1000")
+
+   if (n0 <= quasi_ext) {
+     log_warn("n0 (%d) <= quasi_ext (%g). The population starts at or below the quasi-extinction threshold.", n0, quasi_ext)
+   }
+ }, error = function(e) {
+   log_error("Failure in compute_initial_vector: %s\nLikely cause: singular matrix or complex eigenvalues in A_mean.\nCheck: structure of the transition matrix.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
+ # ── Monte Carlo simulation ────────────────────────────────────────────────────
+ log_step(4, "Monte Carlo stochastic simulation")
+ log_info("Running %d stochastic simulations (t_max = %d, Ne = %g, N0 = %d)...",
+          n_sim, t_max, quasi_ext, n0)
+
+ tryCatch({
+   all_N <- matrix(NA_real_, n_sim, t_max + 1)
+   ext_times <- rep(NA_integer_, n_sim)
+
+   for (s in seq_len(n_sim)) {
+     n_vec <- round(n0 * SS)
+     N_total <- numeric(t_max + 1)
+     N_total[1] <- sum(n_vec)
+     extinct <- FALSE
+
+     for (t in seq_len(t_max)) {
+       if (!extinct) {
+         A_t <- draw_matrix()
+         n_vec <- A_t %*% n_vec
+         N_t <- sum(n_vec)
+         N_total[t + 1] <- N_t
+
+         if (N_t <= quasi_ext) {
+           extinct <- TRUE
+           ext_times[s] <- t
+           N_total[(t + 1):length(N_total)] <- 0
+         }
+       }
+     }
+     all_N[s, ] <- N_total
+   }
+   log_info("Simulations complete. Extinctions observed: %d / %d (%.1f%%)",
+            sum(!is.na(ext_times)), n_sim, 100 * mean(!is.na(ext_times)))
+ }, error = function(e) {
+   log_error("Failure in monte_carlo_simulation: %s\nLikely cause: parameter sampling error or numeric overflow.\nCheck: distribution parameters (mu, sig2) for each matrix element.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
+ # ── Extinction probability over time ──────────────────────────────────────────
+ log_step(5, "Compute extinction probability curve over time")
+ tryCatch({
+   ext_curve <- numeric(t_max)
+   for (t in seq_len(t_max)) {
+     # Replicates that never went extinct have NA ext_times and count as FALSE
+     ext_curve[t] <- mean(!is.na(ext_times) & ext_times <= t)
+   }
+
+   ext_df <- data.frame(time = seq_len(t_max), p_extinction = ext_curve)
+   write.csv(ext_df, file.path(output_dir, "extinction_curve.csv"), row.names = FALSE)
+   log_info("Extinction curve written. P(ext at t=%d) = %.4f", t_max, ext_curve[t_max])
+ }, error = function(e) {
+   log_error("Failure in extinction_curve: %s\nLikely cause: malformed ext_times vector or inaccessible output directory.\nCheck: output_dir and the simulation results.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
+ # ── IUCN thresholds ───────────────────────────────────────────────────────────
+ log_step(6, "Classify IUCN Criterion E category")
+ tryCatch({
+   # IUCN Criterion E time horizons (generation time = t_max / 5 as approximation)
+   gen_time <- t_max / 5  # placeholder; adjust if generation time known
+   log_decision("gen_time", gen_time, "Approximated as t_max/5; replace with known generation time if available")
+
+   iucn_df <- data.frame(
+     category = c("CR", "EN", "VU"),
+     threshold = c(0.50, 0.20, 0.10),
+     time_horizon = c(min(100, max(10, 3 * gen_time)),
+                      min(100, max(20, 5 * gen_time)),
+                      100)
+   )
+   iucn_df$p_extinction <- sapply(round(iucn_df$time_horizon), function(horizon) {
+     # Horizons beyond t_max fall back to P(ext) at t_max
+     t_use <- min(horizon, t_max)
+     ext_curve[t_use]
+   })
+   iucn_df$qualifies <- iucn_df$p_extinction >= iucn_df$threshold
+   write.csv(iucn_df, file.path(output_dir, "iucn_criterion_e.csv"), row.names = FALSE)
+
+   # Overall risk category
+   risk_cat <- if (iucn_df$qualifies[iucn_df$category == "CR"]) "CR" else
+               if (iucn_df$qualifies[iucn_df$category == "EN"]) "EN" else
+               if (iucn_df$qualifies[iucn_df$category == "VU"]) "VU" else "LC/NT"
+
+   log_info("P(quasi-extinction <= %g at t = %d yr) = %.3f", quasi_ext, t_max, ext_curve[t_max])
+   log_info("IUCN Criterion E category: %s", risk_cat)
+
+   if (risk_cat %in% c("CR", "EN")) {
+     log_warn("IUCN category %s detected. Consider urgent conservation measures.", risk_cat)
+   }
+ }, error = function(e) {
+   log_error("Failure in iucn_classification: %s\nLikely cause: empty extinction curve or invalid time horizons.\nCheck: simulation results and the t_max parameter.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
+ # ── Results summary ────────────────────────────────────────────────────────────
+ log_step(7, "Compute MTE, stochastic lambda_s, and write results summary")
+ tryCatch({
+   # MTE (mean time to quasi-extinction, over replicates that went extinct)
+   valid_ext <- ext_times[!is.na(ext_times)]
+   mte_mean <- if (length(valid_ext) > 0) mean(valid_ext) else Inf
+   mte_ci <- if (length(valid_ext) >= 10)
+     quantile(valid_ext, c(0.025, 0.975)) else c(NA, NA)
+
+   if (length(valid_ext) < 10) {
+     log_warn("Fewer than 10 extinctions observed (%d). MTE confidence interval not computed.", length(valid_ext))
+   }
+
+   # Stochastic growth rate lambda_s, from replicates whose final N exceeds 1
+   # (extinct runs are zeroed above and drop out here)
+   log_Ns <- log(all_N[, ncol(all_N)])
+   log_Ns <- log_Ns[is.finite(log_Ns) & log_Ns > 0]
+   lambda_s <- if (length(log_Ns) > 0) exp(mean(log_Ns - log(n0)) / t_max) else NA
+   log_decision("lambda_s", ifelse(is.na(lambda_s), "NA", round(lambda_s, 4)),
+                "Stochastic growth rate estimated from surviving simulation endpoints")
+
+   # Results summary
+   results_df <- data.frame(
+     metric = c("n_simulations", "n_init", "quasi_ext_threshold", "t_max",
+                "p_extinction", "mte_mean_yr", "mte_CI_2.5", "mte_CI_97.5",
+                "lambda_s", "iucn_category"),
+     value = c(n_sim, n0, quasi_ext, t_max,
+               round(ext_curve[t_max], 4),
+               round(mte_mean, 1), round(mte_ci[1], 1), round(mte_ci[2], 1),
+               round(lambda_s, 4), risk_cat)
+   )
+   write.csv(results_df, file.path(output_dir, "stochastic_pva_results.csv"),
+             row.names = FALSE)
+   log_info("Results written.")
+   log_info("Summary:\n%s", paste(capture.output(print(results_df)), collapse = "\n"))
+ }, error = function(e) {
+   log_error("Failure in results_summary: %s\nLikely cause: error computing MTE or lambda_s.\nCheck: simulation results all_N and ext_times.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
+   stop(e)
+ })
+
293
+ # ── Trajectory plot ───────────────────────────────────────────────────────────
294
+ log_step(8, "Generate stochastic trajectory plot")
295
+ tryCatch({
296
+ # Sample up to 200 trajectories for the plot
297
+ plot_idx <- sample(seq_len(n_sim), min(200, n_sim))
298
+ traj_df <- data.frame(
299
+ time = rep(0:t_max, length(plot_idx)),
300
+ N = as.vector(t(all_N[plot_idx, ])),
301
+ sim = rep(plot_idx, each = t_max + 1)
302
+ )
303
+
304
+ med_N <- apply(all_N, 2, median, na.rm = TRUE)
305
+ med_df <- data.frame(time = 0:t_max, N = med_N)
306
+
307
+ p_traj <- ggplot() +
308
+ geom_line(data = traj_df,
309
+ aes(x = time, y = N, group = sim),
310
+ alpha = 0.08, colour = "#2166AC", linewidth = 0.4) +
311
+ geom_line(data = med_df, aes(x = time, y = N),
312
+ colour = "darkblue", linewidth = 1.5) +
313
+ geom_hline(yintercept = quasi_ext, linetype = "dashed", colour = "red") +
314
+ scale_y_continuous(labels = scales::comma, limits = c(0, NA)) +
315
+ labs(x = "Time (years)", y = "Population size (N)",
316
+ title = sprintf("Stochastic PVA (%d simulations, N0 = %d, Ne = %g)",
317
+ n_sim, n0, quasi_ext),
318
+ subtitle = sprintf("P(extinction at t=%d) = %.3f | Category: %s",
319
+ t_max, ext_curve[t_max], risk_cat)) +
320
+ theme_minimal(base_size = 11)
321
+
322
+ ggsave(file.path(output_dir, "trajectory_plot.png"), p_traj,
323
+ width = 9, height = 5, dpi = 150)
324
+ log_info("Trajectory plot saved.")
325
+ }, error = function(e) {
326
+ log_error("Failure in trajectory_plot: %s\nLikely cause: ggplot2 error or invalid trajectory data.\nCheck: ggplot2 installation and the all_N matrix.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
327
+ stop(e)
328
+ })
329
+
330
+ # ── Extinction curve plot ─────────────────────────────────────────────────────
331
+ log_step(9, "Generate extinction probability curve plot")
332
+ tryCatch({
333
+ p_ext <- ggplot(ext_df, aes(x = time, y = p_extinction)) +
334
+ geom_line(linewidth = 1.2, colour = "#D73027") +
335
+ geom_hline(yintercept = c(0.10, 0.20, 0.50),
336
+ linetype = "dashed", colour = c("goldenrod", "orange", "red")) +
337
+ annotate("text", x = t_max * 0.02, y = c(0.52, 0.22, 0.12),
338
+ label = c("CR >= 50%", "EN >= 20%", "VU >= 10%"),
339
+ colour = c("red", "orange", "goldenrod"), hjust = 0, size = 3) +
340
+ labs(x = "Time (years)", y = "Cumulative P(quasi-extinction)",
341
+ title = sprintf("Extinction probability curve (Ne threshold = %g)", quasi_ext)) +
342
+ coord_cartesian(ylim = c(0, 1)) +
343
+ theme_minimal(base_size = 11)
344
+
345
+ ggsave(file.path(output_dir, "extinction_curve.png"), p_ext,
346
+ width = 8, height = 5, dpi = 150)
347
+ log_info("Extinction curve plot saved.")
348
+ }, error = function(e) {
349
+ log_error("Failure in extinction_curve_plot: %s\nLikely cause: ggplot2 error or invalid extinction curve data.\nCheck: ggplot2 installation and ext_df.\nPrevious skill: population-viability-analysis (matrix_pva)", conditionMessage(e))
350
+ stop(e)
351
+ })
352
+
353
+ log_info("Stochastic PVA complete.")
@@ -0,0 +1,136 @@
1
+ ---
2
+ name: predictive-modeling-best-practices
3
+ description: "Guides predictor selection, collinearity checks, cross-validation strategy, and hyperparameter tuning for ecological predictive models. Use this skill when the user mentions VIF, collinearity, feature selection, spatial cross-validation, block CV, hyperparameter tuning, overfitting prevention, data leakage auditing, background point selection, pseudo-absence generation, ENMeval tuning, regularization, or spatial autocorrelation correction."
4
+ skill_version: 1.0.0
5
+ ---
6
+
7
+ # Skill: predictive-modeling-best-practices
8
+
9
+ **Domain:** CV · Tuning · Leakage · Collinearity · Overfitting
10
+ **Phase:** 1 — Foundation
11
+ **Used by:** run-sdm-study, build-fire-risk-map
12
+
13
+ ---
14
+
15
+ ## Purpose
16
+
17
+ Ensures that any predictive model in the project is built with sound ML practices: proper data splitting, cross-validation strategy, hyperparameter tuning, collinearity reduction, leakage prevention, and overfitting diagnosis.
18
+
19
+ ---
20
+
21
+ ## When to Invoke
22
+
23
+ - Before fitting any algorithmic model (MaxEnt, BRT, Random Forest, ANN, GLM for prediction)
24
+ - When designing the validation strategy for a modeling study
25
+ - When the user asks about feature selection, predictor reduction, or model tuning
26
+
27
+ ---
28
+
29
+ ## Inputs
30
+
31
+ | Input | Format | Required |
32
+ |-------|--------|----------|
33
+ | Feature matrix (predictors) | CSV, data frame | Yes |
34
+ | Target variable | Vector (binary, continuous, multiclass) | Yes |
35
+ | Spatial coordinates (if applicable) | lat/lon columns | Recommended |
36
+ | Candidate model list | Text description | Optional |
37
+
38
+ ---
39
+
40
+ ## Outputs
41
+
42
+ | Output | Description |
43
+ |--------|-------------|
44
+ | `cv_strategy.md` | Chosen CV method with rationale |
45
+ | `collinearity_report.csv` | VIF and pairwise correlation for all predictors |
46
+ | `selected_predictors.txt` | Final predictor set after reduction |
47
+ | `tuning_results.csv` | Hyperparameter grid search results |
48
+ | `leakage_audit.md` | Confirmation of no data leakage |
49
+ | `modeling_plan.md` | Complete modeling plan document |
50
+
51
+ ---
52
+
53
+ ## Steps
54
+
55
+ ### 1. Define the Modeling Objective
56
+ - Regression, binary classification, or multiclass?
57
+ - Interpolation within the study area or extrapolation to new areas/times?
58
+ - Primary metric: AUC, TSS, RMSE, R², F1?
59
+
60
+ ### 2. Data Splitting Strategy
61
+
62
+ **For non-spatial data:**
63
+ - Random split: 70% train / 15% validation / 15% test (or use k-fold CV)
64
+
65
+ **For spatial data (required for SDMs and most ecological models):**
66
+ - Spatial block cross-validation (checkerboard or custom blocks)
67
+ - Block size should exceed the spatial autocorrelation range
68
+ - Never use random splits for spatially autocorrelated data
69
+
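The spatial rules above can be illustrated with a minimal, stdlib-only sketch. It is not how `blockCV` works internally; it only shows the idea of assigning whole blocks, rather than single points, to folds. The function names, the systematic `(col + row) % k` assignment, and the toy coordinates are assumptions made for illustration. Coordinates and `block_size` are assumed to share units (for example metres in a projected CRS), and `block_size` should be chosen larger than the estimated autocorrelation range.

```python
import math


def assign_block_folds(coords, block_size, k=2):
    """Assign each (x, y) point to a fold via the square block it falls in.

    With k=2 this gives a checkerboard; larger k gives diagonal stripes.
    """
    folds = []
    for x, y in coords:
        col = math.floor(x / block_size)
        row = math.floor(y / block_size)
        folds.append((col + row) % k)
    return folds


def spatial_cv_splits(coords, block_size, k=2):
    """Yield (train_idx, test_idx) pairs; a block is never split across both."""
    folds = assign_block_folds(coords, block_size, k)
    for fold in range(k):
        test_idx = [i for i, f in enumerate(folds) if f == fold]
        train_idx = [i for i, f in enumerate(folds) if f != fold]
        yield train_idx, test_idx


# Hypothetical toy coordinates, block size of 1 unit.
toy_coords = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.5, 1.5), (0.2, 1.9)]
toy_folds = assign_block_folds(toy_coords, block_size=1.0, k=2)
```

Points 0 and 1 share a block, so they always end up on the same side of every split, which is the property a random split cannot guarantee.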
70
+ **For temporal data:**
71
+ - Forward-chaining (walk-forward) CV; never shuffle temporal order
72
+
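Forward-chaining can be sketched in a few lines. The function below is a hypothetical helper, not part of any library named in this skill; it assumes the samples are already sorted by time and uses an expanding training window.

```python
def walk_forward_splits(n_samples, n_folds, min_train=1):
    """Yield (train_idx, test_idx) where every test block follows its training data.

    Samples are assumed to be sorted in temporal order (index 0 is the oldest).
    """
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for i in range(n_folds):
        train_end = min_train + i * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Example: 10 time-ordered samples, 3 folds, at least 4 samples to train on.
example_splits = list(walk_forward_splits(10, 3, min_train=4))
```

Each training window ends exactly where its test block starts, so no future observation can inform a past prediction.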
73
+ ### 3. Collinearity Assessment
74
+ - Compute Pearson/Spearman correlation matrix for all predictors
75
+ - Flag pairs with |r| > 0.7
76
+ - Compute VIF; flag predictors with VIF > 5
77
+ - Reduce collinear predictors using:
78
+ - Ecological/domain knowledge priority
79
+ - PCA (when interpretability is not critical)
80
+ - VIF-stepwise removal
81
+
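As a rough illustration of the first two bullets, the sketch below computes pairwise Pearson correlations and flags pairs above the 0.7 threshold. The helper names and the toy predictor values are made up for the example; in practice `cor()` in R or an equivalent library routine would be used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(predictors, 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical values at five sites (not real data).
toy_predictors = {
    "bio1": [20.0, 22.0, 24.0, 26.0, 28.0],
    "bio11": [15.0, 17.5, 19.0, 21.5, 23.0],
    "bio12": [1500.0, 900.0, 2100.0, 1200.0, 1800.0],
}
flagged_pairs = flag_collinear_pairs(toy_predictors)
```

In this toy set only the `bio1`/`bio11` pair is flagged; which member of a flagged pair to keep is then decided by the reduction options listed above.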
82
+ ### 4. Leakage Audit
83
+ - Confirm that target variable information does not appear in any predictor
84
+ - Confirm that future information is not used for past predictions
85
+ - Confirm that validation/test data were not used during feature engineering or scaling
86
+
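The third check is the one most often violated in practice. A minimal sketch of leakage-safe standardisation follows, with hypothetical helper names: the scaling parameters are estimated from the training fold only and then reused unchanged on the held-out fold.

```python
import statistics


def fit_scaler(train_values):
    """Estimate mean and standard deviation from the training fold only."""
    return statistics.fmean(train_values), statistics.stdev(train_values)


def apply_scaler(values, mean, sd):
    """Standardise values with parameters that were fitted elsewhere."""
    return [(v - mean) / sd for v in values]


# Toy example: the held-out value never influences mean or sd.
train = [10.0, 20.0, 30.0]
held_out = [40.0]
mean, sd = fit_scaler(train)
train_scaled = apply_scaler(train, mean, sd)
held_out_scaled = apply_scaler(held_out, mean, sd)
```

Fitting the scaler on `train + held_out` instead would shift the mean and standard deviation toward the held-out data, which is exactly the leakage the audit is meant to rule out.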
87
+ ### 5. Hyperparameter Tuning
88
+ - Define tuning grid for each candidate algorithm
89
+ - Use the training set + CV folds only; never touch the test set
90
+ - Report best hyperparameters and CV performance curve
91
+ - Flag overfitting: large gap between train and CV performance
92
+
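One way to sketch the tuning loop above: iterate over the grid, score each combination on the CV folds only, and flag combinations whose train score exceeds the CV score by more than a chosen gap. The `evaluate` callback, the `max_gap` default of 0.1, and the toy scoring function are all assumptions for illustration; a real study would plug in its own model fitting and metric.

```python
from itertools import product
from statistics import fmean


def grid_search(param_grid, folds, evaluate, max_gap=0.1):
    """Score every parameter combination on the CV folds.

    evaluate(params, train_idx, val_idx) must return (train_score, val_score),
    with higher scores being better. The test set is never passed in here.
    """
    names = list(param_grid)
    results = []
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = [evaluate(params, tr, va) for tr, va in folds]
        train_mean = fmean([s[0] for s in scores])
        cv_mean = fmean([s[1] for s in scores])
        results.append({
            "params": params,
            "train_score": train_mean,
            "cv_score": cv_mean,
            "overfit": (train_mean - cv_mean) > max_gap,
        })
    best = max(results, key=lambda r: r["cv_score"])
    return best, results


def toy_evaluate(params, train_idx, val_idx):
    """Made-up scores: deeper models fit training data better but peak at depth 2."""
    depth = params["depth"]
    return min(1.0, 0.6 + 0.1 * depth), 0.8 - 0.05 * abs(depth - 2)


toy_folds = [([0, 1], [2]), ([0, 2], [1])]
best, results = grid_search({"depth": [1, 2, 4]}, toy_folds, toy_evaluate)
```

The `results` list doubles as the content of `tuning_results.csv`, and the `overfit` column makes the train-versus-CV gap explicit for each grid point.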
93
+ ### 6. Feature Importance Pre-selection (optional)
94
+ - Run a preliminary model to rank feature importance
95
+ - Remove predictors with near-zero importance AND high collinearity burden
96
+ - Re-run CV with reduced predictor set; confirm no performance loss
97
+
98
+ ### 7. Finalize and Document Modeling Plan
99
+ - Chosen algorithm(s)
100
+ - CV strategy
101
+ - Final predictor set
102
+ - Tuned hyperparameters
103
+ - Primary evaluation metric
104
+
105
+ ---
106
+
107
+ ## Key Decisions to Document
108
+
109
+ - CV strategy and block size (for spatial CV)
110
+ - Collinearity threshold used
111
+ - Predictor selection method
112
+ - Tuning method (grid search, random search, Bayesian)
113
+ - Train/validation/test split sizes
114
+
115
+ ---
116
+
117
+ ## Tools and Libraries
118
+
119
+ **R:** `caret`, `tidymodels`, `blockCV`, `ENMeval`, `corrplot`, `usdm`
120
+ **Python:** `scikit-learn`, `optuna`, `shap`, `scipy.spatial`
121
+
122
+ ---
123
+
124
+ ## Resources
125
+
126
+ - `resources/spatial-cv-guide.md` — spatial block CV configuration guide
127
+ - `resources/collinearity-decision-tree.md` — when and how to remove predictors
128
+ - `examples/` — worked tuning examples for BRT and Random Forest
129
+
130
+ ---
131
+
132
+ ## Notes
133
+
134
+ - Spatial CV is mandatory for SDMs and any model with spatially autocorrelated responses
135
+ - Report both training and CV/test performance; never report training performance alone
136
+ - Regularisation (LASSO, ridge) is preferred over manual stepwise selection
@@ -0,0 +1,58 @@
1
+ # Example Invocation Prompts — predictive-modeling-best-practices
2
+
3
+ ## Full Pre-Modeling Assessment
4
+
5
+ ```
6
+ Load skill: predictive-modeling-best-practices
7
+ Task: Pre-modeling assessment for a jaguar SDM.
8
+ Predictor stack: data/predictors_stack.tif (19 bioclim + NDVI + slope = 21 variables)
9
+ Occurrence points: data/occ_clean.csv (n = 347)
10
+ Background points: 10,000 random points within the Amazon biome.
11
+
12
+ 1. Assess collinearity (threshold VIF < 5, |r| < 0.7). Use domain knowledge: prioritise
13
+ bio1, bio4, bio12, bio15, bio5, NDVI, slope.
14
+ 2. Define spatial CV strategy using blockCV. Study area is Amazon (~5 million km²).
15
+ 3. Design BRT and MaxEnt tuning grids.
16
+ 4. Produce: cv_strategy.md, collinearity_report.csv, selected_predictors.txt, modeling_plan.md
17
+ ```
18
+
19
+ ## Collinearity Check Only
20
+
21
+ ```
22
+ Load skill: predictive-modeling-best-practices
23
+ Task: Run collinearity check only on the environmental matrix in data/env_matrix.csv.
24
+ Threshold: VIF < 10 (lenient). Output: collinearity_report.csv.
25
+ Do NOT run CV or tuning.
26
+ ```
27
+
28
+ ## Sampling Bias Detection and Correction
29
+
30
+ ```
31
+ Load skill: predictive-modeling-best-practices
32
+ Task: Detect and correct sampling bias in jaguar occurrence records.
33
+ Occurrences: data/occ_clean.csv (n = 420, from GBIF)
34
+ Env stack: data/predictors_stack.tif
35
+ Study area: data/study_area/amazon_biome.shp
36
+
37
+ 1. Generate kernel density bias map from occurrence coordinates.
38
+ 2. Run KS test comparing environmental distribution of occurrences vs. background.
39
+ 3. If bias detected: apply target-group background using Carnivora GBIF records.
40
+ 4. As fallback: apply kernel density weighting to background.
41
+ 5. Output: bias_map.png, ks_test_results.csv, bg_weighted.csv, bias_correction_report.md
42
+ ```
43
+
44
+ ## Environmental Filtering (Thin in Environmental Space)
45
+
46
+ ```
47
+ Load skill: predictive-modeling-best-practices
48
+ Task: Apply environmental thinning to reduce bioclimatic over-representation.
49
+ Occurrences: data/occ_clean.csv (n = 850, many records from Cerrado)
50
+ Env stack: data/predictors_stack.tif (bio1, bio4, bio12, bio15 selected)
51
+
52
+ 1. Extract env values at all occurrence points.
53
+ 2. Run PCA on env values (first 2 axes).
54
+ 3. Grid sample in PC1/PC2 space (cell size = 0.5 SD units).
55
+ 4. Keep 1 record per environmental cell (random, seed = 42).
56
+ 5. Report: n before / n after, PCA variance explained, env coverage plot.
57
+ Output: occ_env_thinned.csv, env_thinning_report.md
58
+ ```
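Steps 3 and 4 of the prompt above can be sketched as follows. This is a stdlib-only illustration with a hypothetical function name; it assumes the PCA has already been run and that the PC1/PC2 scores are expressed in SD units, so that a cell size of 0.5 matches the prompt.

```python
import math
import random


def thin_in_env_space(pc_scores, cell_size=0.5, seed=42):
    """Keep one randomly chosen record index per grid cell in PC1/PC2 space."""
    cells = {}
    for idx, (pc1, pc2) in enumerate(pc_scores):
        key = (math.floor(pc1 / cell_size), math.floor(pc2 / cell_size))
        cells.setdefault(key, []).append(idx)
    rng = random.Random(seed)  # fixed seed so the thinning is reproducible
    return sorted(rng.choice(members) for members in cells.values())


# Hypothetical PC scores for five records; records 0 and 1 share a cell.
toy_scores = [(0.1, 0.1), (0.2, 0.3), (0.6, 0.1), (-0.1, 0.1), (1.4, 1.4)]
kept = thin_in_env_space(toy_scores)
```

Reporting `len(toy_scores)` and `len(kept)` gives the "n before / n after" figures requested in step 5.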
@@ -0,0 +1,65 @@
1
+ # Collinearity Management — Decision Guide
2
+
3
+ ## Step 1: Compute Correlations
4
+
5
+ ```r
6
+ library(usdm)
7
+ env_matrix <- terra::values(predictor_stack) |> na.omit() # assumes predictor_stack is a terra SpatRaster
8
+
9
+ # Pairwise Pearson correlation
10
+ cor_matrix <- cor(env_matrix, method = "pearson")
11
+
12
+ # Stepwise VIF: drop the highest-VIF predictor, recompute, and repeat
13
+ vif_results <- vifstep(env_matrix, th = 5) # remove until all VIF < 5
14
+ print(vif_results)
15
+ ```
16
+
17
+ ## Step 2: Apply the Decision Tree
18
+
19
+ ```
20
+ Compute pairwise |r| for all predictors
21
+
22
+ Any |r| > 0.7?
23
+ NO → Proceed; compute VIF as confirmation
24
+ YES → Apply reduction strategy (Step 3)
25
+
26
+ Any VIF > 5?
27
+ NO → Predictor set is acceptable
28
+ YES → Continue removing highest-VIF predictors
29
+ ```
30
+
31
+ ## Step 3: Reduction Strategies
32
+
33
+ ### A. Domain Knowledge Priority (preferred)
34
+ - List all predictors; mark those most ecologically relevant to the target species/process
35
+ - When two correlated predictors must be reduced, keep the one with stronger ecological rationale
36
+ - Document the justification for each kept predictor
37
+
38
+ ### B. VIF Stepwise Removal
39
+ - Iteratively remove the predictor with the highest VIF until all VIF < 5 (or < 10 for lenient threshold)
40
+ - `usdm::vifstep()` automates this
41
+
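To make the stepwise procedure concrete, here is a pure-Python sketch. It is not the `usdm` implementation; it relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix. All function names and the toy values are assumptions for illustration.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    def centre(col):
        m = sum(col) / len(col)
        return [v - m for v in col]

    centred = [centre(c) for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(predictors):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(predictors)
    inv = invert(correlation_matrix([predictors[n] for n in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


def vif_stepwise(predictors, threshold=5.0):
    """Drop the highest-VIF predictor until all remaining VIFs are <= threshold."""
    kept, removed = dict(predictors), []
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        removed.append(worst)
        del kept[worst]
    return list(kept), removed


# Hypothetical values at five sites (not real data).
toy_predictors = {
    "bio1": [20.0, 22.0, 24.0, 26.0, 28.0],
    "bio11": [15.0, 17.5, 19.0, 21.5, 23.0],
    "bio12": [1500.0, 900.0, 2100.0, 1200.0, 1800.0],
}
kept_names, removed_names = vif_stepwise(toy_predictors, threshold=5.0)
```

Because VIFs are recomputed after every removal, a predictor whose VIF is inflated only by a collinear partner can drop back below the threshold once that partner is gone, which is why removal proceeds one variable at a time.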
42
+ ### C. PCA (last resort, when interpretability is secondary)
43
+ - Apply PCA to collinear predictor block
44
+ - Retain axes explaining ≥ 90% of variance
45
+ - Trade-off: loses direct interpretation of individual predictors
46
+
47
+ ## Step 4: Document
48
+
49
+ Record in `collinearity_report.csv` (the rows below are illustrative):
50
+
51
+ | Predictor | VIF | Max_pairwise_r | Decision | Justification |
52
+ |-----------|-----|----------------|----------|---------------|
53
+ | bio1 | 2.3 | 0.61 | Keep | Key temperature variable |
54
+ | bio4 | 8.7 | 0.83 | Remove | Collinear with bio1 |
55
+ | bio12 | 1.9 | 0.45 | Keep | Key precipitation variable |
56
+
57
+ ## Common Collinear Groups in Bioclimatic Variables
58
+
59
+ | Group | Variables | Keep |
60
+ |-------|-----------|------|
61
+ | Temperature mean | bio1, bio11 | bio1 |
62
+ | Temperature seasonality | bio4, bio7 | bio4 |
63
+ | Precipitation total | bio12, bio13, bio14 | bio12 |
64
+ | Precipitation seasonality | bio15, bio3 | bio15 |
65
+ | Thermal extremes | bio5, bio6, bio8, bio9 | bio5 or bio6 (context-dependent) |