ecological-agent-skills 3.1.0
- package/AGENT_CONTEXT.md +191 -0
- package/CATALOG.md +329 -0
- package/LICENSE +692 -0
- package/README.md +347 -0
- package/bin/install.mjs +168 -0
- package/docs/comparison-with-alternatives.md +38 -0
- package/docs/global-examples-index.md +103 -0
- package/docs/repository-statistics.md +101 -0
- package/docs/theoretical-foundations.md +188 -0
- package/environment.yaml +106 -0
- package/examples/community/arctic_tundra_vegetation_example.md +247 -0
- package/examples/community/bird_landuse_example.md +63 -0
- package/examples/community/phytoplankton_reservoir_example.md +60 -0
- package/examples/community/reef_fish_indopacific_example.md +221 -0
- package/examples/impact/baci_road_example.md +57 -0
- package/examples/impact/ecosystem_services_atlantic_forest.md +83 -0
- package/examples/impact/forest_loss_borneo_timeseries_example.md +225 -0
- package/examples/occupancy/puma_camera_example.md +61 -0
- package/examples/occupancy/snow_leopard_himalayas_example.md +204 -0
- package/examples/reproducible/whittaker_biome_sdm_example.md +406 -0
- package/examples/sdm/anteater_cerrado_example.md +69 -0
- package/examples/sdm/jaguar_amazon_example.md +80 -0
- package/examples/sdm/koala_climate_change_example.md +170 -0
- package/examples/sdm/wolf_recolonization_europe_example.md +193 -0
- package/package.json +43 -0
- package/renv.lock +194 -0
- package/skills/SKILL_INDEX.json +1020 -0
- package/skills/acoustic-monitoring/SKILL.md +163 -0
- package/skills/acoustic-monitoring/examples/example-prompts.md +100 -0
- package/skills/acoustic-monitoring/examples/temperate_forest_birds_example.md +285 -0
- package/skills/acoustic-monitoring/resources/acoustic-indices-reference.md +93 -0
- package/skills/acoustic-monitoring/resources/soundscape-ecology-guide.md +90 -0
- package/skills/acoustic-monitoring/resources/species-id-tools-comparison.md +89 -0
- package/skills/acoustic-monitoring/scripts/batch_species_detection.py +360 -0
- package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.R +235 -0
- package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.py +374 -0
- package/skills/biostatistics-workbench/SKILL.md +140 -0
- package/skills/biostatistics-workbench/examples/example-prompts.md +39 -0
- package/skills/biostatistics-workbench/resources/effect-size-reference.md +81 -0
- package/skills/biostatistics-workbench/resources/glm-family-link-reference.md +47 -0
- package/skills/biostatistics-workbench/resources/test-selection-guide.md +93 -0
- package/skills/biostatistics-workbench/scripts/glm_pipeline.R +78 -0
- package/skills/biostatistics-workbench/scripts/glm_pipeline.py +210 -0
- package/skills/camera-trap-processing/SKILL.md +159 -0
- package/skills/camera-trap-processing/examples/example-prompts.md +103 -0
- package/skills/camera-trap-processing/examples/leopard_serengeti_example.md +231 -0
- package/skills/camera-trap-processing/resources/activity-patterns-reference.md +113 -0
- package/skills/camera-trap-processing/resources/camtrapR-workflow-guide.md +130 -0
- package/skills/camera-trap-processing/resources/detection-event-definition-guide.md +89 -0
- package/skills/camera-trap-processing/scripts/estimate_activity.R +169 -0
- package/skills/camera-trap-processing/scripts/process_camtrap_data.R +179 -0
- package/skills/camera-trap-processing/scripts/process_camtrap_data.py +192 -0
- package/skills/community-ecology-ordination/SKILL.md +133 -0
- package/skills/community-ecology-ordination/examples/example-prompts.md +35 -0
- package/skills/community-ecology-ordination/resources/dissimilarity-metric-guide.md +53 -0
- package/skills/community-ecology-ordination/resources/nmds-interpretation-guide.md +104 -0
- package/skills/community-ecology-ordination/scripts/__pycache__/community_analysis.cpython-311.pyc +0 -0
- package/skills/community-ecology-ordination/scripts/community_analysis.R +143 -0
- package/skills/community-ecology-ordination/scripts/community_analysis.py +231 -0
- package/skills/ecological-data-foundation/SKILL.md +129 -0
- package/skills/ecological-data-foundation/examples/example-prompts.md +40 -0
- package/skills/ecological-data-foundation/resources/coordinate-cleaning-flags.md +66 -0
- package/skills/ecological-data-foundation/resources/darwin-core-glossary.md +91 -0
- package/skills/ecological-data-foundation/resources/data-citation-guide.md +265 -0
- package/skills/ecological-data-foundation/resources/gbif-data-citation-guide.md +193 -0
- package/skills/ecological-data-foundation/resources/qa-checklist.md +83 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/clean_occurrences.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_ebird.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_inat.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_iucn.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_obis.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/clean_occurrences.R +230 -0
- package/skills/ecological-data-foundation/scripts/clean_occurrences.py +268 -0
- package/skills/ecological-data-foundation/scripts/download_from_ebird.R +251 -0
- package/skills/ecological-data-foundation/scripts/download_from_ebird.py +364 -0
- package/skills/ecological-data-foundation/scripts/download_from_gbif.R +315 -0
- package/skills/ecological-data-foundation/scripts/download_from_gbif.py +407 -0
- package/skills/ecological-data-foundation/scripts/download_from_inat.R +238 -0
- package/skills/ecological-data-foundation/scripts/download_from_inat.py +304 -0
- package/skills/ecological-data-foundation/scripts/download_from_iucn.R +273 -0
- package/skills/ecological-data-foundation/scripts/download_from_iucn.py +344 -0
- package/skills/ecological-data-foundation/scripts/download_from_obis.R +248 -0
- package/skills/ecological-data-foundation/scripts/download_from_obis.py +318 -0
- package/skills/ecological-impact-assessment/SKILL.md +123 -0
- package/skills/ecological-impact-assessment/examples/example-prompts.md +32 -0
- package/skills/ecological-impact-assessment/resources/baci-design-guide.md +55 -0
- package/skills/ecological-impact-assessment/resources/fragmentation-metrics-reference.md +86 -0
- package/skills/ecological-impact-assessment/resources/pressure-index-template.md +78 -0
- package/skills/ecological-impact-assessment/resources/study-design-guide.md +168 -0
- package/skills/ecological-impact-assessment/scripts/baci_analysis.R +161 -0
- package/skills/ecological-impact-assessment/scripts/fragmentation_analysis.py +141 -0
- package/skills/ecological-impact-assessment/scripts/power_analysis_baci.R +274 -0
- package/skills/ecosystem-services-assessment/SKILL.md +125 -0
- package/skills/ecosystem-services-assessment/examples/example-prompts.md +24 -0
- package/skills/ecosystem-services-assessment/resources/es-indicator-reference.md +45 -0
- package/skills/ecosystem-services-assessment/resources/invest-parameter-guide.md +86 -0
- package/skills/ecosystem-services-assessment/resources/rusle-coefficients.md +88 -0
- package/skills/ecosystem-services-assessment/scripts/__pycache__/compute_es.cpython-311.pyc +0 -0
- package/skills/ecosystem-services-assessment/scripts/compute_es.py +189 -0
- package/skills/ecosystem-services-assessment/scripts/tradeoff_analysis.R +161 -0
- package/skills/environmental-time-series/SKILL.md +125 -0
- package/skills/environmental-time-series/examples/example-prompts.md +33 -0
- package/skills/environmental-time-series/resources/anomaly-indices-reference.md +88 -0
- package/skills/environmental-time-series/resources/bfast-parameter-guide.md +69 -0
- package/skills/environmental-time-series/scripts/__pycache__/recovery_trajectory.cpython-311.pyc +0 -0
- package/skills/environmental-time-series/scripts/__pycache__/trend_analysis.cpython-311.pyc +0 -0
- package/skills/environmental-time-series/scripts/recovery_trajectory.R +305 -0
- package/skills/environmental-time-series/scripts/recovery_trajectory.py +178 -0
- package/skills/environmental-time-series/scripts/trend_analysis.R +192 -0
- package/skills/environmental-time-series/scripts/trend_analysis.py +184 -0
- package/skills/geoprocessing-for-ecology/SKILL.md +123 -0
- package/skills/geoprocessing-for-ecology/examples/example-prompts.md +32 -0
- package/skills/geoprocessing-for-ecology/resources/crs-reference.md +62 -0
- package/skills/geoprocessing-for-ecology/resources/global-predictor-sources.md +331 -0
- package/skills/geoprocessing-for-ecology/resources/resampling-methods.md +57 -0
- package/skills/geoprocessing-for-ecology/scripts/__pycache__/download_predictors.cpython-311.pyc +0 -0
- package/skills/geoprocessing-for-ecology/scripts/download_predictors.R +239 -0
- package/skills/geoprocessing-for-ecology/scripts/download_predictors.py +379 -0
- package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.R +224 -0
- package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.py +172 -0
- package/skills/landscape-connectivity/SKILL.md +170 -0
- package/skills/landscape-connectivity/examples/example-prompts.md +96 -0
- package/skills/landscape-connectivity/examples/jaguar_mesoamerica_corridor_example.md +271 -0
- package/skills/landscape-connectivity/resources/circuitscape-parameter-guide.md +155 -0
- package/skills/landscape-connectivity/resources/graph-theory-for-ecology.md +134 -0
- package/skills/landscape-connectivity/resources/resistance-surface-guide.md +141 -0
- package/skills/landscape-connectivity/scripts/connectivity_analysis.py +387 -0
- package/skills/landscape-connectivity/scripts/connectivity_metrics.R +274 -0
- package/skills/landscape-connectivity/scripts/resistance_surface.R +239 -0
- package/skills/model-validation-and-uncertainty/SKILL.md +131 -0
- package/skills/model-validation-and-uncertainty/examples/example-prompts.md +30 -0
- package/skills/model-validation-and-uncertainty/resources/extrapolation-risk-guide.md +236 -0
- package/skills/model-validation-and-uncertainty/resources/metric-selection-guide.md +52 -0
- package/skills/model-validation-and-uncertainty/resources/threshold-selection-guide.md +64 -0
- package/skills/model-validation-and-uncertainty/scripts/__pycache__/validate_model.cpython-311.pyc +0 -0
- package/skills/model-validation-and-uncertainty/scripts/extrapolation_risk.R +315 -0
- package/skills/model-validation-and-uncertainty/scripts/validate_model.py +226 -0
- package/skills/model-validation-and-uncertainty/scripts/validate_sdm.R +162 -0
- package/skills/occupancy-and-detection/SKILL.md +126 -0
- package/skills/occupancy-and-detection/examples/example-prompts.md +33 -0
- package/skills/occupancy-and-detection/resources/detection-history-format.md +100 -0
- package/skills/occupancy-and-detection/resources/occupancy-study-design.md +47 -0
- package/skills/occupancy-and-detection/scripts/__pycache__/occupancy_analysis.cpython-311.pyc +0 -0
- package/skills/occupancy-and-detection/scripts/occupancy_analysis.R +160 -0
- package/skills/occupancy-and-detection/scripts/occupancy_analysis.py +159 -0
- package/skills/population-viability-analysis/SKILL.md +161 -0
- package/skills/population-viability-analysis/examples/african_elephant_pva_example.md +266 -0
- package/skills/population-viability-analysis/examples/example-prompts.md +95 -0
- package/skills/population-viability-analysis/resources/extinction-risk-thresholds.md +128 -0
- package/skills/population-viability-analysis/resources/matrix-model-guide.md +139 -0
- package/skills/population-viability-analysis/resources/sensitivity-elasticity-reference.md +182 -0
- package/skills/population-viability-analysis/scripts/matrix_pva.R +258 -0
- package/skills/population-viability-analysis/scripts/pva_analysis.py +442 -0
- package/skills/population-viability-analysis/scripts/stochastic_pva.R +353 -0
- package/skills/predictive-modeling-best-practices/SKILL.md +136 -0
- package/skills/predictive-modeling-best-practices/examples/example-prompts.md +58 -0
- package/skills/predictive-modeling-best-practices/resources/collinearity-decision-tree.md +65 -0
- package/skills/predictive-modeling-best-practices/resources/sampling-bias-correction.md +267 -0
- package/skills/predictive-modeling-best-practices/resources/spatial-cv-guide.md +73 -0
- package/skills/predictive-modeling-best-practices/scripts/__pycache__/spatial_cv.cpython-311.pyc +0 -0
- package/skills/predictive-modeling-best-practices/scripts/collinearity_check.R +112 -0
- package/skills/predictive-modeling-best-practices/scripts/spatial_cv.py +182 -0
- package/skills/reproducible-ecology-pipeline/SKILL.md +139 -0
- package/skills/reproducible-ecology-pipeline/examples/example-prompts.md +35 -0
- package/skills/reproducible-ecology-pipeline/resources/directory-structure-template.md +94 -0
- package/skills/reproducible-ecology-pipeline/resources/params-yaml-template.yaml +84 -0
- package/skills/reproducible-ecology-pipeline/resources/reproducibility-checklist-template.md +66 -0
- package/skills/reproducible-ecology-pipeline/scripts/generate_file_manifest.py +110 -0
- package/skills/reproducible-ecology-pipeline/scripts/init_project.sh +53 -0
- package/skills/spatial-prioritization/SKILL.md +162 -0
- package/skills/spatial-prioritization/examples/biodiversity_hotspot_prioritization_example.md +289 -0
- package/skills/spatial-prioritization/examples/example-prompts.md +93 -0
- package/skills/spatial-prioritization/resources/cost-surface-reference.md +130 -0
- package/skills/spatial-prioritization/resources/marxan-vs-prioritizr-comparison.md +125 -0
- package/skills/spatial-prioritization/resources/prioritizr-formulation-guide.md +188 -0
- package/skills/spatial-prioritization/resources/representation-targets-guide.md +186 -0
- package/skills/spatial-prioritization/scripts/prioritization_sensitivity.R +320 -0
- package/skills/spatial-prioritization/scripts/run_prioritization.R +336 -0
- package/skills/species-distribution-modeling/SKILL.md +139 -0
- package/skills/species-distribution-modeling/examples/example-prompts.md +36 -0
- package/skills/species-distribution-modeling/resources/algorithm-comparison.md +25 -0
- package/skills/species-distribution-modeling/resources/calibration-area-guide.md +71 -0
- package/skills/species-distribution-modeling/resources/climate-scenario-preparation.md +170 -0
- package/skills/species-distribution-modeling/resources/maxent-calibration-guide.md +211 -0
- package/skills/species-distribution-modeling/resources/sdm-checklist.md +37 -0
- package/skills/species-distribution-modeling/scripts/predict_distribution.R +236 -0
- package/skills/species-distribution-modeling/scripts/predict_distribution.py +286 -0
- package/skills/species-distribution-modeling/scripts/prepare_future_layers.R +351 -0
- package/skills/species-distribution-modeling/scripts/project_scenarios.R +220 -0
- package/skills/species-distribution-modeling/scripts/run_ensemble_sdm.R +99 -0
- package/skills/species-distribution-modeling/scripts/sdm_pipeline.py +318 -0
- package/skills/species-distribution-modeling/scripts/tune_maxnet.R +344 -0
- package/templates/SKILL_TEMPLATE.md +225 -0
- package/templates/checklists/data-submission-checklist.md +38 -0
- package/templates/checklists/post-analysis-checklist.md +55 -0
- package/templates/checklists/pre-analysis-checklist.md +31 -0
- package/templates/prompts/debug-skill.md +47 -0
- package/templates/prompts/invoke-skill.md +34 -0
- package/templates/prompts/invoke-workflow.md +45 -0
- package/templates/reports/technical-report-template.md +80 -0
- package/templates/scripts/logger_setup.R +79 -0
- package/templates/scripts/logger_setup.py +119 -0
- package/templates/scripts/params_loader.R +28 -0
- package/templates/scripts/params_loader.py +38 -0
- package/workflows/analyze-community-structure/WORKFLOW.md +72 -0
- package/workflows/analyze-environmental-change/WORKFLOW.md +73 -0
- package/workflows/assess-ecological-impact/WORKFLOW.md +75 -0
- package/workflows/assess-ecosystem-services/WORKFLOW.md +68 -0
- package/workflows/assess-landscape-connectivity/WORKFLOW.md +84 -0
- package/workflows/build-fire-risk-map/WORKFLOW.md +79 -0
- package/workflows/produce-technical-report/WORKFLOW.md +113 -0
- package/workflows/run-camera-trap-occupancy/WORKFLOW.md +87 -0
- package/workflows/run-conservation-prioritization/WORKFLOW.md +89 -0
- package/workflows/run-multispecies-screening/WORKFLOW.md +197 -0
- package/workflows/run-occupancy-analysis/WORKFLOW.md +74 -0
- package/workflows/run-population-viability/WORKFLOW.md +90 -0
- package/workflows/run-sdm-study/WORKFLOW.md +99 -0
@@ -0,0 +1,139 @@
+---
+name: species-distribution-modeling
+description: "Runs the complete species distribution modeling (SDM/ENM) pipeline: occurrence preparation, model fitting (MaxEnt, ensemble), thresholding, projection under climate scenarios, and interpretation. Use this skill when the user mentions habitat suitability, niche modeling, MaxEnt, biomod2, potential distribution, range maps, suitable area mapping, climate projections, invasion risk, range shift analysis, suitability mapping, ENM, ecological niche model, or calibration area definition."
+skill_version: 1.0.0
+---
+
+# Skill: species-distribution-modeling
+
+**Domain:** SDM · ENM · MaxEnt · Ensemble · Projection
+**Phase:** 2 — Modeling
+**Used by:** run-sdm-study
+
+---
+
+## Purpose
+
+Guides the agent through the complete species distribution / ecological niche modeling pipeline: from occurrence and predictor preparation to model fitting, ensemble building, thresholding, projection, and interpretation.
+
+---
+
+## When to Invoke
+
+- Modeling the potential or realised distribution of one or more species
+- Projecting distributions under climate or land-use scenarios
+- Comparing niche overlap between taxa or time periods
+- Assessing invasion risk or connectivity
+
+---
+
+## Inputs
+
+| Input | Format | Required |
+|-------|--------|----------|
+| Occurrence records (cleaned) | CSV with lat/lon | Yes |
+| Environmental predictor stack | GeoTIFF (multiband or stack) | Yes |
+| Study area / calibration area | SHP, GPKG | Yes |
+| Future/alternative scenario rasters | GeoTIFF | Optional |
+| Background / pseudo-absence points | CSV | Optional |
+
+---
+
+## Outputs
+
+| Output | Description |
+|--------|-------------|
+| `suitability_current.tif` | Continuous suitability map (current) |
+| `suitability_binary.tif` | Thresholded binary map |
+| `suitability_scenarios/` | Projected maps per scenario |
+| `ensemble_sd.tif` | Uncertainty (SD across algorithms) |
+| `variable_importance.csv` | Predictor contributions |
+| `response_curves.png` | Marginal response per predictor |
+| `sdm_report.md` | Full methodological narrative |
+
+---
+
+## Steps
+
+### 1. Occurrence Curation
+- Apply spatial thinning to reduce sampling bias (minimum distance = target resolution)
+- Split into calibration and evaluation partitions using spatial blocks
+- Report final occurrence count after thinning
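The thinning rule in step 1 can be sketched in Python as follows. This is a minimal illustration, assuming greedy selection in input order and haversine (great-circle) distance; the skill specifies neither choice, and the function names and coordinates are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def thin_occurrences(points, min_dist_km):
    """Keep a record only if it lies at least min_dist_km from every record already kept."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_dist_km for q in kept):
            kept.append(p)
    return kept

# Hypothetical (lat, lon) records: two tight pairs roughly 78 km apart
occ = [(-3.00, -60.00), (-3.01, -60.01), (-3.50, -60.50), (-3.51, -60.49)]
thinned = thin_occurrences(occ, min_dist_km=10)
print(f"{len(occ)} records before thinning, {len(thinned)} after")
```

Greedy thinning depends on record order, so shuffling the input and repeating the procedure is a common refinement when the retained count matters.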
+
+### 2. Background / Pseudo-absence Sampling
+- Sample background within the calibration area (or a bias-corrected version)
+- Ratio: 1:1 to 1:10 (occurrences : background); document choice
+- For pseudo-absence methods: apply geographic or environmental constraints
+
+### 3. Predictor Selection
+- Apply `predictive-modeling-best-practices` skill for collinearity reduction
+- Prefer ecologically justified predictor subsets over data-driven selection alone
+- Document final predictor set and sources
+
+### 4. Algorithm Selection
+- Run at least 3 algorithms for the ensemble:
+  - MaxEnt (presence-background)
+  - BRT / GBM (presence-absence or presence-background)
+  - Random Forest
+  - GLM (baseline)
+  - Additional: SVM, ANN, GAM as needed
+
+### 5. Model Fitting and Tuning
+- Tune regularisation/complexity per algorithm using spatial CV
+- Store all tuned model objects and parameters
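Spatial CV in step 5 relies on folds in which nearby records stay together. The sketch below is one simple way to build such folds, assuming projected coordinates in kilometres, square grid blocks, and systematic block-to-fold assignment. The function names and coordinates are hypothetical, and a dedicated package will offer more assignment strategies.

```python
import math

def block_id(x_km, y_km, block_size_km):
    """Grid block containing a point given in projected kilometres."""
    return (math.floor(x_km / block_size_km), math.floor(y_km / block_size_km))

def assign_spatial_folds(points_km, block_size_km, k):
    """Assign each point a fold (0..k-1) so that points in one block share a fold."""
    blocks = sorted({block_id(x, y, block_size_km) for x, y in points_km})
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y, block_size_km)] for x, y in points_km]

# Hypothetical projected coordinates (km); the first two share a 300 km block
points = [(10, 10), (50, 20), (310, 40), (620, 10), (100, 350)]
folds = assign_spatial_folds(points, block_size_km=300, k=3)
print(folds)
```

Each fold is then held out in turn while the tuning parameters are evaluated on the remaining folds.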
+
+### 6. Ensemble Building
+- Combine algorithms using weighted average (weights = TSS or AUC per algorithm)
+- Report ensemble weights
+- Compute ensemble SD as uncertainty layer
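For a single raster cell, the weighted average and the uncertainty layer in step 6 reduce to a few lines. This is a sketch under stated assumptions: weights are TSS scores normalised to sum to 1, the SD is the unweighted sample standard deviation across algorithms, and all scores and predictions are made-up values.

```python
import statistics

def tss_weights(tss_scores):
    """Normalise per-algorithm TSS scores so the weights sum to 1."""
    total = sum(tss_scores.values())
    return {alg: score / total for alg, score in tss_scores.items()}

def ensemble_cell(predictions, weights):
    """Return (weighted mean suitability, SD across algorithms) for one cell."""
    mean = sum(weights[alg] * predictions[alg] for alg in predictions)
    sd = statistics.stdev(list(predictions.values()))  # sample SD, unweighted
    return mean, sd

# Hypothetical evaluation scores and one cell's predictions
tss_scores = {"maxent": 0.8, "brt": 0.6, "rf": 0.6}
cell = {"maxent": 0.9, "brt": 0.7, "rf": 0.5}

weights = tss_weights(tss_scores)
mean, sd = ensemble_cell(cell, weights)
print(weights, round(mean, 3), round(sd, 3))
```

Applying the same function to every cell yields the ensemble suitability map and `ensemble_sd.tif`; whether the SD should itself be weighted is a design decision to document.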
+
+### 7. Thresholding
+- Apply chosen threshold to produce binary map
+- Report area predicted suitable (km²) above threshold
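As one example of a threshold rule, the sketch below picks the threshold that maximises TSS (sensitivity + specificity − 1) and then sums the area of cells at or above it. It assumes evaluation scores at presence and background points, a constant cell area (an equal-area projection), and hypothetical values throughout.

```python
def tss(threshold, presence_scores, background_scores):
    """True skill statistic when cells with score >= threshold count as suitable."""
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    specificity = sum(s < threshold for s in background_scores) / len(background_scores)
    return sensitivity + specificity - 1

def max_tss_threshold(presence_scores, background_scores):
    """Candidate thresholds are the observed scores; return the one with the highest TSS."""
    candidates = sorted(set(presence_scores) | set(background_scores))
    return max(candidates, key=lambda t: tss(t, presence_scores, background_scores))

def suitable_area_km2(suitability, threshold, cell_area_km2):
    """Area of cells at or above the threshold, assuming equal-area cells."""
    return sum(v >= threshold for v in suitability) * cell_area_km2

# Hypothetical evaluation scores and a tiny six-cell suitability map
presence = [0.9, 0.8, 0.7, 0.5, 0.2]
background = [0.45, 0.4, 0.3, 0.2, 0.1]
threshold = max_tss_threshold(presence, background)
area = suitable_area_km2([0.1, 0.5, 0.6, 0.3, 0.9, 0.45], threshold, cell_area_km2=1.0)
print(threshold, area)
```

With rasters in geographic coordinates, cell area varies with latitude, so the constant `cell_area_km2` would need to be replaced by a per-cell area.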
+
+### 8. Projection
+- Project ensemble to future/alternative scenarios
+- Mask extrapolation areas (MESS or ExDet) to flag novel environments
+- Report change in suitable area between current and projected
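The MESS mask in step 8 flags cells whose predictor values fall outside the training range. The sketch below follows the multivariate environmental similarity surface definition of Elith et al. (2010) for a single cell. Function names and values are hypothetical, and it assumes every reference variable has a nonzero range.

```python
def variable_similarity(value, reference):
    """Similarity of one predictor value to the reference (training) values."""
    lo, hi = min(reference), max(reference)
    # f: percentage of reference values strictly below the projected value
    f = 100.0 * sum(r < value for r in reference) / len(reference)
    if f == 0:
        return 100.0 * (value - lo) / (hi - lo)   # at or below the training minimum
    if f <= 50:
        return 2.0 * f
    if f < 100:
        return 2.0 * (100.0 - f)
    return 100.0 * (hi - value) / (hi - lo)       # above the training maximum

def mess(cell, reference):
    """MESS for one cell: the minimum similarity over all predictors."""
    return min(variable_similarity(cell[v], reference[v]) for v in cell)

# Hypothetical training values for two predictors
reference = {"bio1": [20, 22, 24, 26, 28], "bio12": [1000, 1500, 2000, 2500, 3000]}

inside = mess({"bio1": 24, "bio12": 2000}, reference)
novel = mess({"bio1": 30, "bio12": 2000}, reference)
print(inside, novel)
```

Negative values mark cells where at least one predictor lies outside the training range; those are the cells to mask or flag in the projected maps.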
+
+### 9. Interpretation
+- Identify the 3 most important predictors
+- Describe response curve shapes in ecological terms
+- Flag any ecologically implausible responses
+- Discuss model limitations and transferability
+
+---
+
+## Key Decisions to Document
+
+- Spatial thinning distance
+- Calibration area definition method
+- Background sampling strategy
+- Algorithm set and tuning ranges
+- Ensemble weighting method
+- Threshold selection method
+- MESS/ExDet extrapolation masking
+
+---
+
+## Tools and Libraries
+
+**R:** `biomod2`, `ENMeval`, `dismo`, `maxnet`, `sdm`, `kuenm`
+**Python:** `elapid`, `pysdm`, `sklearn`
+
+---
+
+## Resources
+
+- `resources/sdm-checklist.md` — SDM reporting checklist (based on ODMAP protocol)
+- `resources/calibration-area-guide.md` — M area selection methods
+- `resources/algorithm-comparison.md` — algorithm strengths and limitations
+- `examples/sdm/` — full worked example
+
+---
+
+## Notes
+
+- Follow the ODMAP (Overview, Data, Model, Assessment, Prediction) reporting standard
+- Always mask predictions to the calibration area unless explicitly projecting to novel regions
+- Climate projections should use multiple GCMs and report uncertainty across models
@@ -0,0 +1,36 @@
+# Example Invocation Prompts — species-distribution-modeling
+
+## Full SDM Pipeline
+
+```
+Load skill: species-distribution-modeling
+Task: Build a habitat suitability model for Panthera onca in the Amazon biome.
+
+Inputs:
+- Occurrences: data/processed/data_clean.csv (n = 523 after cleaning)
+- Predictors: data/predictors_stack.tif (bio1, bio4, bio12, bio15, NDVI, slope — 6 vars, collinearity checked)
+- Calibration area: data/spatial/amazon_buffered.shp
+- Background: 10,000 points sampled within calibration area
+
+Steps:
+1. Spatial thinning: 10 km minimum distance
+2. Spatial CV: 5 blocks (blockCV, block size = 300 km)
+3. Algorithms: MaxEnt (maxnet), BRT (gbm), Random Forest
+4. Ensemble: weighted average by TSS score per algorithm
+5. Threshold: MaxTSS and P10
+6. Uncertainty: SD across algorithms
+
+Output: suitability_current.tif, suitability_binary.tif, variable_importance.csv, response_curves.png, sdm_report.md
+```
+
+## Projection to 2050
+
+```
+Load skill: species-distribution-modeling
+Task: Project the fitted jaguar SDM to 2050 conditions.
+Scenario rasters: data/predictors_2050_ssp245/ (same variable names as current stack)
+Calibration area: data/spatial/amazon_buffered.shp
+Apply MESS mask to flag areas outside training range.
+Compute change in suitable area (km²) between current and 2050.
+Output: suitability_2050_ssp245.tif, mess_mask.tif, change_summary.csv
+```
@@ -0,0 +1,25 @@
+# SDM Algorithm Comparison
+
+| Algorithm | Type | Presence data | Absence needed | Overfitting risk | Extrapolation | Interpretability |
+|-----------|------|--------------|----------------|-----------------|---------------|-----------------|
+| MaxEnt | ML (max entropy) | Presence-background | No | Moderate | Poor (clamping) | Moderate |
+| BRT/GBM | Ensemble tree | PA or PB | Recommended | High if untuned | Poor | Moderate |
+| Random Forest | Ensemble tree | PA or PB | Recommended | Low–moderate | Poor | Low |
+| GLM | Statistical | PA | Yes | Low | Good | High |
+| GAM | Statistical | PA | Yes | Moderate | Good | High |
+| BIOCLIM | Envelope | Presence only | No | Low | Poor | High |
+| Mahalanobis | Distance | Presence only | No | Low | Moderate | High |
+| SVM | ML | PA or PB | Recommended | Moderate | Moderate | Low |
+| ANN / MLP | Deep learning | PA | Recommended | Very high | Poor | Very low |
+
+## Recommendations by Context
+
+| Context | Recommended algorithms |
+|---------|----------------------|
+| Presence-only data | MaxEnt + BIOCLIM (ensemble) |
+| < 50 occurrences | MaxEnt + GLM (regularised) |
+| 50–200 occurrences | MaxEnt + BRT + GLM |
+| > 200 occurrences | BRT + RF + GLM + MaxEnt (full ensemble) |
+| Need for projection to novel climates | GLM + GAM (better extrapolation) |
+| Regulatory / conservation decision | Minimum 3 algorithms in ensemble |
+| Publication | Ensemble ≥ 3 algorithms + uncertainty map |
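The sample-size rows of the recommendation table can be encoded as a small lookup, which may be convenient when an agent has to pick an algorithm set programmatically. This is only a sketch: the function name is hypothetical, and giving the novel-climate row precedence over the occurrence count is an assumption, since the table does not rank its contexts.

```python
def recommend_algorithms(n_occurrences, novel_climate=False):
    """Return the algorithm set suggested by the table for a given context."""
    if novel_climate:
        # Assumption: projection to novel climates overrides the sample-size rows
        return ["GLM", "GAM"]
    if n_occurrences < 50:
        return ["MaxEnt", "GLM (regularised)"]
    if n_occurrences <= 200:
        return ["MaxEnt", "BRT", "GLM"]
    return ["BRT", "RF", "GLM", "MaxEnt"]

# 523 is the occurrence count used in the example invocation prompt
print(recommend_algorithms(523))
print(recommend_algorithms(30))
```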
@@ -0,0 +1,71 @@
+# Calibration Area (M Area) Selection Guide
+
+The calibration area (M, accessible area, or training region) defines the geographic extent from which background or pseudo-absence points are sampled and within which the model is calibrated. It is one of the most consequential decisions in SDM.
+
+## Why M Matters
+
+- Background points sampled outside M misrepresent the environmental conditions available to the species
+- An M that is too large includes environments never accessible to the species → inflated model performance, incorrect niche characterisation
+- An M that is too small may omit accessible habitats → truncated niche, poor transferability
+
+## Common Delimitation Methods
+
+### 1. Biotic Region / Biome
+Use biogeographic boundaries (biome, ecoregion) that match the species' known biogeographic history.
+- **Best for:** Continental or regional studies with well-known biogeographic context
+- **R:** IBGE biome polygons, WWF Ecoregions (rnaturalearth)
+
+### 2. Minimum Convex Polygon (MCP) + Buffer
+Convex hull around all occurrence points, expanded by a fixed buffer distance (e.g., 200–500 km).
+- **Best for:** Species with limited occurrence data; simple implementation
+- **Caution:** Can include barriers (oceans, mountain ranges) that limit dispersal
+
+```r
+library(sf)
+occ_sf <- st_as_sf(occ, coords = c("decimalLongitude","decimalLatitude"), crs = 4326)
+hull <- st_convex_hull(st_union(occ_sf))
+M_area <- st_buffer(hull, dist = 200000)  # 200 km buffer in metres (lon/lat input assumes sf's s2 engine is enabled)
+```
+
+### 3. Dispersal Simulation (Circuitscape / BioModelos)
+Model the area reachable by the species over a defined time period given dispersal rate and landscape permeability.
+- **Best for:** Species with known dispersal capacity; connectivity studies
+
+### 4. Watershed / Drainage Basin (aquatic species)
+- **Best for:** Freshwater fish, invertebrates, plants with hydrological dispersal
+
+### 5. Political / Administrative Unit
+Use only when the species is genuinely limited to that unit (e.g., island endemics).
+- **Avoid for:** Most terrestrial species with cross-border ranges
+
+## Recommended Workflow
+
+1. Start with the known biome or ecoregion containing all occurrences
+2. Expand by one adjacent ecoregion or 200–500 km buffer to include accessible but unsampled habitats
+3. Clip to ecologically meaningful barriers (e.g., remove ocean from terrestrial species M)
+4. Verify that the background environment within M spans the full range of occurrence conditions (convex hull check in environmental space)
+
+## Environmental Space Check
+
+```r
+library(terra)
+env_bg <- extract(predictor_stack, bg_points)
+env_occ <- extract(predictor_stack, occ_points)
+
+# Check: are all occurrence environmental conditions within the background range?
+for (v in setdiff(names(env_bg), "ID")) {  # skip the ID column that extract() adds for point vectors
+  occ_range <- range(env_occ[[v]], na.rm = TRUE)
+  bg_range <- range(env_bg[[v]], na.rm = TRUE)
+  cat(v, "| occ:", round(occ_range, 2),
+      "| bg:", round(bg_range, 2), "\n")
+}
+# If occ range exceeds bg range → M is too small for that variable
+```
+
+## Reporting
+
+Always report:
+- M delimitation method and rationale
+- Area of M in km²
+- Proportion of occurrences inside M (should be 100%)
+- Environmental coverage (does bg span all occurrence conditions?)
@@ -0,0 +1,170 @@
+# Climate Scenario Preparation Guide
+
+Preparing future climate layers for SDM projection: sources, SSPs, time horizons, and pipeline.
+
+---
+
+## 1. Sources of Future Climate Data
+
+| Source | Resolution | SSPs available | Temporal coverage | Recommended for |
+|---|---|---|---|---|
+| **CHELSA-Future** | ~1 km (30 arc-sec) | SSP1-2.6, SSP3-7.0, SSP5-8.5 | 2011–2040, 2041–2070, 2071–2100 | **Primary recommendation** — high resolution, bias-corrected |
+| **WorldClim v2.1 future** | ~18 km (10 arc-min) | SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 | 2021–2040, 2041–2060, 2061–2080, 2081–2100 | Regional studies where 10-min resolution is acceptable |
+| **CMIP6 raw** | ~100 km (1°) | All SSPs from each GCM | Monthly, 1850–2100 | Research requiring specific GCM; requires statistical downscaling |
+| **TerraClimate** | ~4 km | N/A (historical only) | 1958–present | Historical baseline calibration |
|
|
15
|
+
|
|
16
|
+
**Key reference:** Karger et al. 2017 (CHELSA v1). DOI: [10.1038/sdata.2017.122](https://doi.org/10.1038/sdata.2017.122)
|
|
17
|
+
|
|
18
|
+
**Download portals:**
|
|
19
|
+
- CHELSA: https://chelsa-climate.org/downloads/
|
|
20
|
+
- WorldClim: https://worldclim.org/data/cmip6/cmip6climate.html
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## 2. SSPs — Shared Socioeconomic Pathways
|
|
25
|
+
|
|
26
|
+
| SSP | Common name | Radiative forcing (2100) | Temperature anomaly (°C, 2100) | When to use |
|
|
27
|
+
|---|---|---|---|---|
|
|
28
|
+
| **SSP1-2.6** | Optimistic / sustainability | 2.6 W/m² | ~1.8°C | Show "best case" scenario |
|
|
29
|
+
| **SSP2-4.5** | Intermediate | 4.5 W/m² | ~2.7°C | **Always include** — most likely trajectory |
|
|
30
|
+
| **SSP3-7.0** | Regional rivalry | 7.0 W/m² | ~3.6°C | Regional fragmentation scenarios |
|
|
31
|
+
| **SSP5-8.5** | High emissions / fossil fuel | 8.5 W/m² | ~4.4°C | **Always include** — worst-case bound |
|
|
32
|
+
|
|
33
|
+
**Minimum reporting standard:** Always include at least **SSP2-4.5** and **SSP5-8.5** to capture intermediate and worst-case trajectories. Including SSP1-2.6 (optimistic) is recommended to illustrate the conservation value of mitigation.
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
## 3. Standard Time Horizons
|
|
38
|
+
|
|
39
|
+
| Label | Period | Commonly called |
|
|
40
|
+
|---|---|---|
|
|
41
|
+
| Near future | 2021–2040 | "2030" |
|
|
42
|
+
| Mid-century | 2041–2060 | **"2050"** (most used) |
|
|
43
|
+
| Late-century | 2061–2080 | **"2070"** (most used) |
|
|
44
|
+
| End-of-century | 2081–2100 | "2100" |
|
|
45
|
+
|
|
46
|
+
**Recommendation:** Report at minimum **2050** and **2070** for each SSP. This yields 4 projection scenarios minimum (SSP2-4.5 × 2050, SSP2-4.5 × 2070, SSP5-8.5 × 2050, SSP5-8.5 × 2070).
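The minimum scenario set can be enumerated in code so that no combination is missed. This is a small base-R sketch; the SSP and horizon labels follow the text above, while the output file-name pattern is only an assumption for illustration.

```r
# Minimum scenario set: 2 SSPs x 2 horizons
scenarios <- expand.grid(
  ssp     = c("ssp245", "ssp585"),
  horizon = c("2050", "2070"),
  stringsAsFactors = FALSE
)

# Assumed output label pattern (SSP + year), one file per scenario
scenarios$file <- sprintf("future_stack_%s_%s.tif", scenarios$ssp, scenarios$horizon)
print(scenarios)
```

Adding SSP1-2.6 or a further horizon only requires extending the two vectors.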
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## 4. Mandatory Preparation Pipeline
|
|
51
|
+
|
|
52
|
+
Follow these steps in order before passing any future layer to a model:
|
|
53
|
+
|
|
54
|
+
### Step 1 — Download future layers
|
|
55
|
+
Download the same set of bioclimatic/environmental variables that were selected
|
|
56
|
+
during calibration. **Never add or remove variables between current and future.**
|
|
57
|
+
|
|
58
|
+
```r
|
|
59
|
+
# Example: listing CHELSA-Future files for a specific SSP and GCM
|
|
60
|
+
# Files follow the naming convention: CHELSA_{var}_{period}_{gcm}_{ssp}_V.2.1.tif
|
|
61
|
+
chelsa_files <- list.files("data/chelsa_future/ssp585_2050/", pattern = "\\.tif$",
|
|
62
|
+
full.names = TRUE)
|
|
63
|
+
```
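Once files for several SSPs have been listed, it is worth confirming that the same GCMs are present under every SSP before going further. The sketch below assumes a small inventory table built by hand or parsed from the file names; the GCM labels are placeholders, not real model names.

```r
# Hypothetical inventory of downloaded files: one row per SSP x GCM
inventory <- data.frame(
  ssp = c("ssp245", "ssp245", "ssp585", "ssp585"),
  gcm = c("gcm_a", "gcm_b", "gcm_a", "gcm_c"),
  stringsAsFactors = FALSE
)

# GCMs available under each SSP
gcms_by_ssp <- split(inventory$gcm, inventory$ssp)

# GCMs present under every SSP: only these give comparable projections
common_gcms <- Reduce(intersect, gcms_by_ssp)

# TRUE only if every SSP has exactly the common set
consistent <- all(vapply(gcms_by_ssp,
                         function(g) setequal(g, common_gcms),
                         logical(1)))
print(common_gcms)
print(consistent)
```

If `consistent` is `FALSE`, either download the missing GCMs or restrict the analysis to `common_gcms`.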
|
|
64
|
+
|
|
65
|
+
### Step 2 — Reproject to calibration CRS
|
|
66
|
+
|
|
67
|
+
```r
|
|
68
|
+
suppressPackageStartupMessages(library(terra))
|
|
69
|
+
|
|
70
|
+
# Load reference (calibration) stack
|
|
71
|
+
ref_stack <- rast("data/predictors/env_train.tif")
|
|
72
|
+
future_stack <- rast(chelsa_files)
|
|
73
|
+
|
|
74
|
+
# Reproject to match calibration CRS
|
|
75
|
+
future_stack <- project(future_stack, crs(ref_stack), method = "bilinear")
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
### Step 3 — Clip and mask to projection area (G area)
|
|
79
|
+
|
|
80
|
+
The projection area (G area) is typically the full study continent or biome.
|
|
81
|
+
It must be the same for all SSPs and time periods.
|
|
82
|
+
|
|
83
|
+
```r
|
|
84
|
+
study_area <- vect("data/study_area/g_area.shp")
|
|
85
|
+
future_stack <- crop(future_stack, study_area)
|
|
86
|
+
future_stack <- mask(future_stack, study_area)
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
### Step 4 — Verify identical geometry with terra::compareGeom()
|
|
90
|
+
|
|
91
|
+
This is **critical** — silent geometry mismatches cause wrong predictions.
|
|
92
|
+
|
|
93
|
+
```r
|
|
94
|
+
# compareGeom returns TRUE if extent, resolution, CRS all match
|
|
95
|
+
if (!compareGeom(ref_stack, future_stack, stopOnError = FALSE)) {
|
|
96
|
+
# Resample to exactly match reference grid
|
|
97
|
+
future_stack <- resample(future_stack, ref_stack, method = "bilinear")
|
|
98
|
+
|
|
99
|
+
# Verify again
|
|
100
|
+
if (!compareGeom(ref_stack, future_stack, stopOnError = FALSE)) {
|
|
101
|
+
stop("Geometry mismatch persists after resampling. Check CRS and extent.")
|
|
102
|
+
}
|
|
103
|
+
}
|
|
104
|
+
message("Geometry check passed.")
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
### Step 5 — Verify layer names match calibration stack
|
|
108
|
+
|
|
109
|
+
Layer names are **critical for maxnet and biomod2**: both match predictors by name.
A missing name stops prediction with an error, while a mislabelled layer (e.g., bio12 stored under the name bio1) is silently used as the wrong variable.
|
|
111
|
+
|
|
112
|
+
```r
|
|
113
|
+
# Check names
|
|
114
|
+
ref_names <- names(ref_stack)
|
|
115
|
+
future_names <- names(future_stack)
|
|
116
|
+
|
|
117
|
+
missing_in_future <- setdiff(ref_names, future_names)
|
|
118
|
+
extra_in_future <- setdiff(future_names, ref_names)
|
|
119
|
+
|
|
120
|
+
if (length(missing_in_future) > 0) {
|
|
121
|
+
stop("Future stack is missing layers present in calibration: ",
|
|
122
|
+
paste(missing_in_future, collapse = ", "))
|
|
123
|
+
}
|
|
124
|
+
|
|
125
|
+
if (length(extra_in_future) > 0) {
|
|
126
|
+
message("Extra layers in future stack (will be dropped): ",
|
|
127
|
+
paste(extra_in_future, collapse = ", "))
|
|
128
|
+
future_stack <- future_stack[[ref_names]] # subset to calibration variables
|
|
129
|
+
}
|
|
130
|
+
|
|
131
|
+
# Reorder to match calibration
|
|
132
|
+
future_stack <- future_stack[[ref_names]]
|
|
133
|
+
message("Layer names verified and ordered.")
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
---
|
|
137
|
+
|
|
138
|
+
## 5. Quick-Reference Checklist
|
|
139
|
+
|
|
140
|
+
| Step | Check | Status |
|
|
141
|
+
|---|---|---|
|
|
142
|
+
| Same variables as calibration | All bioclim variables identical | ☐ |
|
|
143
|
+
| CRS matches calibration stack | `crs(future) == crs(train)` | ☐ |
|
|
144
|
+
| Resolution matches | `res(future) == res(train)` | ☐ |
|
|
145
|
+
| Extent matches after crop | `compareGeom()` returns TRUE | ☐ |
|
|
146
|
+
| Layer names identical and in same order | `names(future) == names(train)` | ☐ |
|
|
147
|
+
| Masked to G area polygon | `mask()` applied | ☐ |
|
|
148
|
+
| Output file saved with SSP+year label | `future_stack_ssp585_2050.tif` | ☐ |
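Several rows of this checklist can be tested in one call. The sketch below assumes `terra` is installed and uses two synthetic rasters in place of the real stacks; the helper name `check_future_stack()` is hypothetical. It uses `same.crs()` rather than comparing CRS strings with `==`, because two descriptions of the same CRS can differ as text.

```r
suppressPackageStartupMessages(library(terra))

# Hypothetical helper: one logical per checklist row that can be tested in code
check_future_stack <- function(future, train) {
  c(
    crs_match   = same.crs(future, train),
    res_match   = isTRUE(all.equal(res(future), res(train))),
    geom_match  = compareGeom(future, train, stopOnError = FALSE),
    names_match = identical(names(future), names(train))
  )
}

# Synthetic two-layer rasters standing in for the real stacks
train <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
              nlyrs = 2, crs = "EPSG:4326")
names(train) <- c("bio1", "bio12")

future <- rast(train)                 # same geometry, no values
names(future) <- c("bio12", "bio1")   # deliberately in the wrong order

checks <- check_future_stack(future, train)
print(checks)
```

In this toy case the geometry checks pass and only `names_match` fails, which is the situation Step 5 is meant to catch.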
|
|
149
|
+
|
|
150
|
+
---
|
|
151
|
+
|
|
152
|
+
## 6. Common Errors
|
|
153
|
+
|
|
154
|
+
- **Using different variables between current and future:** e.g., using bio1–bio5 for calibration but bio1–bio3 for future. Prediction with maxnet then fails because predictors used in training are missing.
|
|
155
|
+
- **Not clipping to G area:** projecting to a wider area than intended inflates apparent suitable area and may increase extrapolation.
|
|
156
|
+
- **Silent CRS incompatibilities:** `terra::project()` will reproject, but if you skip this step and the CRS differs by even the datum, predictions will be geographically offset.
|
|
157
|
+
- **Using a different GCM for each SSP:** GCMs differ in climate sensitivity, so using different GCMs per SSP conflates SSP and GCM effects. Use the same GCM(s) across all SSPs.
|
|
158
|
+
- **Layer name mismatch when layers are not renamed:** CHELSA uses long names like `CHELSA_bio1_2041-2070_mpi-esm1-2-hr_ssp585_V.2.1`; rename layers to the short calibration names (`bio1`, `bio2`, …) immediately after loading.
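The renaming mentioned in the last item can be done with a regular expression. This is a minimal base-R sketch on a character vector; the long names are hypothetical examples, and the pattern assumes every variable token has the form `bio` followed by digits.

```r
# Hypothetical long layer names, as they might appear right after loading
long_names <- c(
  "CHELSA_bio1_2041-2070_mpi-esm1-2-hr_ssp585_V.2.1",
  "CHELSA_bio12_2041-2070_mpi-esm1-2-hr_ssp585_V.2.1"
)

# Keep only the variable token (bio + digits)
short_names <- regmatches(long_names, regexpr("bio[0-9]+", long_names))

# Guard against silent problems: every name must match, and none may repeat
stopifnot(length(short_names) == length(long_names),
          !anyDuplicated(short_names))
print(short_names)
```

With a `terra` raster the result would then be applied as `names(future_stack) <- short_names`, before the name check in Step 5.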
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## 7. References
|
|
163
|
+
|
|
164
|
+
| Citation | DOI |
|
|
165
|
+
|---|---|
|
|
166
|
+
| Karger et al. 2017. Sci. Data 4:170122 (CHELSA) | [10.1038/sdata.2017.122](https://doi.org/10.1038/sdata.2017.122) |
|
|
167
|
+
| Fick & Hijmans 2017. Int. J. Climatol. (WorldClim 2) | [10.1002/joc.5086](https://doi.org/10.1002/joc.5086) |
|
|
168
|
+
| Eyring et al. 2016. Geosci. Model Dev. (CMIP6) | [10.5194/gmd-9-1937-2016](https://doi.org/10.5194/gmd-9-1937-2016) |
|
|
169
|
+
| O'Neill et al. 2016. Geosci. Model Dev. (SSPs) | [10.5194/gmd-9-3461-2016](https://doi.org/10.5194/gmd-9-3461-2016) |
|
|
170
|
+
| Zurell et al. 2020. Ecography (ODMAP) | [10.1111/ecog.04960](https://doi.org/10.1111/ecog.04960) |
|
|
@@ -0,0 +1,211 @@
|
|
|
1
|
+
# MaxEnt Calibration Guide
|
|
2
|
+
|
|
3
|
+
Regularization, feature classes, model selection, and calibration grid for maxnet/ENMeval.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## 1. Regularization Multiplier (RM) and Feature Classes (FC)
|
|
8
|
+
|
|
9
|
+
MaxEnt (and its R equivalent **maxnet**) controls model complexity through two parameters:
|
|
10
|
+
|
|
11
|
+
### Regularization Multiplier (RM)
|
|
12
|
+
|
|
13
|
+
RM penalises model complexity. Think of it as a smoothing parameter:
|
|
14
|
+
|
|
15
|
+
| RM value | Effect |
|
|
16
|
+
|---|---|
|
|
17
|
+
| < 1 | Under-regularised — risk of **overfitting** (model memorises noise in occurrence data) |
|
|
18
|
+
| 1 (default) | Often appropriate for large, clean datasets |
|
|
19
|
+
| 1.5 – 2 | Recommended starting point for typical GBIF datasets |
|
|
20
|
+
| 3 – 6 | High smoothing — appropriate for very small n, noisy data, or wide-ranging species |
|
|
21
|
+
|
|
22
|
+
### Feature Classes (FC)
|
|
23
|
+
|
|
24
|
+
FC determines which response curve shapes are available to the model:
|
|
25
|
+
|
|
26
|
+
| FC code | Name | Response shape allowed | Risk |
|
|
27
|
+
|---|---|---|---|
|
|
28
|
+
| **L** | Linear | Monotonic linear relationship | Very simple; may underfit |
|
|
29
|
+
| **Q** | Quadratic | Unimodal response (bell curve) | Good default for most ecological variables |
|
|
30
|
+
| **H** | Hinge | Piecewise linear with breakpoint | Flexible; needs more data to fit reliably |
|
|
31
|
+
| **P** | Product | Interactions between pairs of predictors | Can model synergies; risk of overfitting |
|
|
32
|
+
| **T** | Threshold | Step function | Extreme flexibility; high overfitting risk |
|
|
33
|
+
|
|
34
|
+
Combinations are additive: `LQ` allows both linear and quadratic features; `LQHPT` allows all five.
|
|
35
|
+
|
|
36
|
+
**Typical recommendation:** Start with `LQ` or `LQH`; add `P` and `T` only with > 100 clean occurrences and only if AICc improves substantially.
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
## 2. Model Selection Criteria
|
|
41
|
+
|
|
42
|
+
### Why AUC alone is insufficient
|
|
43
|
+
|
|
44
|
+
AUC measures discrimination (can the model rank presences above background points?) but does **not**
|
|
45
|
+
detect overfitting to the calibration area. A highly over-fitted model can have AUC = 0.99
|
|
46
|
+
in calibration but fail completely in independent validation.
|
|
47
|
+
|
|
48
|
+
### Recommended criteria (in priority order)
|
|
49
|
+
|
|
50
|
+
| Criterion | What it measures | Threshold / rule |
|
|
51
|
+
|---|---|---|
|
|
52
|
+
| **OR10** (Omission Rate at 10%) | Proportion of withheld (test) occurrences that fall below the suitability threshold excluding the lowest 10% of training occurrences | Expected value is 0.10; values well above it indicate overfitting |
|
|
53
|
+
| **AICc** | Model complexity-penalised fit (Akaike Information Criterion, corrected) | Select model(s) with lowest AICc; models within delta_AICc < 2 are equivalent |
|
|
54
|
+
| **Partial ROC** | AUC computed only over the biologically relevant part of the ROC curve | Ratio > 1 (compared to random); higher is better |
|
|
55
|
+
| **AUC (train)** | Training set AUC | Supplementary only — not a selection criterion |
|
|
56
|
+
|
|
57
|
+
**Peterson et al. 2008** argued that omission error deserves more weight than commission
error when evaluating niche models, and proposed the partial ROC for that purpose.
|
|
59
|
+
DOI: [10.1016/j.ecolmodel.2007.11.008](https://doi.org/10.1016/j.ecolmodel.2007.11.008)
|
|
60
|
+
|
|
61
|
+
**Warren & Seifert 2011** showed that AICc is an effective criterion for choosing MaxEnt
model complexity. The OR_AICc approach pairs it with an omission-rate filter: select models
with low omission rate AND low AICc to balance predictive accuracy and parsimony.
|
|
63
|
+
DOI: [10.1890/10-1171.1](https://doi.org/10.1890/10-1171.1)
|
|
64
|
+
|
|
65
|
+
### OR_AICc selection rule (recommended)
|
|
66
|
+
|
|
67
|
+
1. Keep only models where `OR10 <= 0.10 + tolerance` (tolerance = 0.05 recommended)
|
|
68
|
+
2. Among passing models, select those with `delta_AICc < 2`
|
|
69
|
+
3. If multiple models pass, report ensemble or select the most parsimonious (fewest parameters)
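The three steps above can be sketched as a small filter on a results table. The example is base R only; the column names mirror those of an ENMeval results table (`or.10p.avg`, `AICc`, `ncoef`), but the table itself and the helper `select_or_aicc()` are hypothetical. Note that delta AICc is recomputed among the models that pass step 1, as the rule is worded.

```r
# Hypothetical results table with illustrative values
res_demo <- data.frame(
  tune.args  = c("fc.L_rm.1", "fc.LQ_rm.2", "fc.LQH_rm.2", "fc.LQHP_rm.0.5"),
  or.10p.avg = c(0.09, 0.12, 0.14, 0.31),
  AICc       = c(1012.4, 1005.1, 1006.3, 998.7),
  ncoef      = c(4, 7, 12, 25),
  stringsAsFactors = FALSE
)

select_or_aicc <- function(res, expected = 0.10, tolerance = 0.05) {
  # Step 1: keep models whose omission rate is within tolerance
  pass <- res[which(res$or.10p.avg <= expected + tolerance), ]
  if (nrow(pass) == 0) return(pass)
  # Step 2: delta AICc recomputed among the passing models only
  pass$delta_pass <- pass$AICc - min(pass$AICc, na.rm = TRUE)
  pass <- pass[which(pass$delta_pass < 2), ]
  # Step 3: most parsimonious (fewest coefficients) first
  pass[order(pass$ncoef), ]
}

selected <- select_or_aicc(res_demo)
print(selected)
```

In this toy table the model with the lowest overall AICc is rejected in step 1 because of its high omission rate, so a filter on the global delta AICc alone would have returned nothing.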
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## 3. Recommended Calibration Grid
|
|
74
|
+
|
|
75
|
+
Use this 35-model grid as the default for `tune_maxnet.R`:
|
|
76
|
+
|
|
77
|
+
```
|
|
78
|
+
RM = c(0.5, 1, 1.5, 2, 3, 4, 6)
|
|
79
|
+
FC = c("L", "LQ", "LQH", "LQHP", "LQHPT")
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Total combinations: 7 × 5 = **35 models**
|
|
83
|
+
|
|
84
|
+
For species with n_clean < 50, restrict to:
|
|
85
|
+
```
|
|
86
|
+
RM = c(1, 2, 3, 4, 6)
|
|
87
|
+
FC = c("L", "LQ", "LQH")
|
|
88
|
+
```
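Choosing between the two grids can be wrapped in a helper so the rule is applied consistently. This is a base-R sketch; the function name `make_tune_grid()` is hypothetical, and the threshold of 50 and the grid values are taken from the text above.

```r
# Hypothetical helper: returns the tuning grid for a given number of
# cleaned occurrences, plus the number of models it implies
make_tune_grid <- function(n_clean) {
  if (n_clean < 50) {
    rm_values <- c(1, 2, 3, 4, 6)
    fc_values <- c("L", "LQ", "LQH")
  } else {
    rm_values <- c(0.5, 1, 1.5, 2, 3, 4, 6)
    fc_values <- c("L", "LQ", "LQH", "LQHP", "LQHPT")
  }
  list(tune.args = list(rm = rm_values, fc = fc_values),
       n_models  = length(rm_values) * length(fc_values))
}

full_grid  <- make_tune_grid(120)
small_grid <- make_tune_grid(30)
print(c(full = full_grid$n_models, small = small_grid$n_models))
```

The `tune.args` element has the shape expected by the `tune.args` argument used in the R code example of this guide.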
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## 4. kuenm vs ENMeval + maxnet
|
|
93
|
+
|
|
94
|
+
| Feature | kuenm (maxent.jar) | ENMeval + maxnet |
|
|
95
|
+
|---|---|---|
|
|
96
|
+
| Engine | MaxEnt Java `.jar` | maxnet (R-native, no Java) |
|
|
97
|
+
| Java required | **Yes** | **No** |
|
|
98
|
+
| CRAN installable | No (manual download) | **Yes** |
|
|
99
|
+
| Calibration parallelism | Yes (via Java threads) | Yes (via `parallel` package) |
|
|
100
|
+
| Output metrics | OR, AUC, AICc, pROC | OR, AUC, AICc |
|
|
101
|
+
| Partial ROC | Built-in | Not built in; can be added through the `user.eval` argument of `ENMevaluate()` |
|
|
102
|
+
| Reproducibility | Depends on MaxEnt version | Fully reproducible in R |
|
|
103
|
+
| Recommended for this repo | No | **Yes** |
|
|
104
|
+
|
|
105
|
+
**Why ENMeval in this repo:**
|
|
106
|
+
- No Java dependency → simpler CI, reproducible environments
|
|
107
|
+
- `maxnet` fits the same model as MaxEnt using `glmnet` and gives closely comparable predictions (Phillips et al. 2017)
|
|
108
|
+
- `ENMeval >= 2.0` provides a standardised, tidy interface
|
|
109
|
+
- Easily integrated with `terra` and `sf` pipelines
|
|
110
|
+
|
|
111
|
+
---
|
|
112
|
+
|
|
113
|
+
## 5. Interpreting Calibration Results
|
|
114
|
+
|
|
115
|
+
### delta_AICc table
|
|
116
|
+
|
|
117
|
+
```
|
|
118
|
+
delta_AICc < 2 → Model is equivalent to best model; all are candidates
|
|
119
|
+
delta_AICc 2–7 → Some support for this model; use with caution
|
|
120
|
+
delta_AICc > 7 → Little to no support; exclude from ensemble
|
|
121
|
+
```
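The bands above can be applied to a vector of AICc values as follows. This is a base-R sketch with illustrative numbers; the class labels are shorthand for the three bands, and a value of exactly 7 is assigned to the middle band by assumption.

```r
# AICc values for a set of candidate models (illustrative numbers)
aicc <- c(m1 = 1005.1, m2 = 1006.3, m3 = 1009.8, m4 = 1015.0)

# Difference from the best (lowest) AICc
delta <- aicc - min(aicc)

# Support classes following the bands: < 2, 2 to 7, > 7
support <- ifelse(delta < 2, "equivalent",
           ifelse(delta <= 7, "some support", "little support"))
print(data.frame(delta = round(delta, 1), support = support))
```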
|
|
122
|
+
|
|
123
|
+
### Omission rate × AUC plot
|
|
124
|
+
|
|
125
|
+
```
Good model:   OR10 near or below 0.10 (y-axis) + high validation AUC (x-axis) → lower-right quadrant
Overfitted:   OR10 well above 0.10 (withheld occurrences omitted), often with training AUC far above validation AUC
Underfitted:  OR10 close to expected but low validation AUC (predictions too broad to discriminate)
```

When reading the plot: **prioritise OR10 first, then AUC**. A model with OR10 = 0.10
and AUC = 0.82 is better than one with OR10 = 0.25 and AUC = 0.91 (the latter omits a quarter of withheld occurrences and is likely overfitted).
|
|
133
|
+
|
|
134
|
+
### Quick-reference checklist
|
|
135
|
+
|
|
136
|
+
| Check | Pass condition |
|
|
137
|
+
|---|---|
|
|
138
|
+
| OR10 ≤ 0.15 | Model does not severely omit known occurrences |
|
|
139
|
+
| delta_AICc < 2 for selected model | Model is parsimonious |
|
|
140
|
+
| Training AUC > 0.70 | Model performs above random |
|
|
141
|
+
| No single FC dominates (e.g., all T features) | Model not over-complex |
|
|
142
|
+
| Selected RM in middle of grid (not 0.5 or 6) | RM not at boundary — extend grid if needed |
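Most rows of this checklist reduce to simple comparisons on the selected model. The sketch below is base R; the helper `check_selected_model()` and its argument names are hypothetical, and the feature-class row is left out because it needs inspection of the fitted coefficients.

```r
# Hypothetical helper applying the pass conditions above to one selected model
check_selected_model <- function(or10, delta_aicc, auc_train, rm_selected, rm_grid) {
  c(
    or10_ok     = or10 <= 0.15,
    aicc_ok     = delta_aicc < 2,
    auc_ok      = auc_train > 0.70,
    # RM strictly inside the grid, i.e. not at either boundary
    rm_interior = rm_selected > min(rm_grid) && rm_selected < max(rm_grid)
  )
}

rm_grid <- c(0.5, 1, 1.5, 2, 3, 4, 6)
checks  <- check_selected_model(or10 = 0.11, delta_aicc = 0.8,
                                auc_train = 0.84, rm_selected = 6,
                                rm_grid = rm_grid)
print(checks)
```

Here the model passes the first three checks but sits at the upper edge of the RM grid, so the grid should be extended and the calibration re-run.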
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## 6. R Code Example
|
|
147
|
+
|
|
148
|
+
```r
|
|
149
|
+
suppressPackageStartupMessages(library(ENMeval))
|
|
150
|
+
suppressPackageStartupMessages(library(terra))
|
|
151
|
+
|
|
152
|
+
# Load data
|
|
153
|
+
occ <- read.csv("tests/data/points_with_env.csv")
|
|
154
|
+
env <- rast("data/predictors/env_stack.tif")
|
|
155
|
+
|
|
156
|
+
occ_pts <- occ[, c("decimalLongitude", "decimalLatitude")]
|
|
157
|
+
env_vals <- as.data.frame(extract(env, occ_pts))
|
|
158
|
+
|
|
159
|
+
# Background points (10,000 random within study area)
|
|
160
|
+
bg <- spatSample(env, size = 10000, method = "random",
|
|
161
|
+
na.rm = TRUE, as.df = TRUE, xy = TRUE)
|
|
162
|
+
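bg_pts <- bg[, c("x", "y")]
# ENMevaluate() expects occs and bg to use the same column names
colnames(occ_pts) <- colnames(bg_pts)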
|
|
163
|
+
|
|
164
|
+
# Calibration grid
|
|
165
|
+
eval_out <- ENMevaluate(
|
|
166
|
+
occs = occ_pts,
|
|
167
|
+
envs = env,
|
|
168
|
+
bg = bg_pts,
|
|
169
|
+
algorithm = "maxnet",
|
|
170
|
+
partitions = "block", # spatial cross-validation
|
|
171
|
+
tune.args = list(
|
|
172
|
+
rm = c(0.5, 1, 1.5, 2, 3, 4, 6),
|
|
173
|
+
fc = c("L", "LQ", "LQH", "LQHP", "LQHPT")
|
|
174
|
+
)
|
|
175
|
+
)
|
|
176
|
+
|
|
177
|
+
# View results table
|
|
178
|
+
res <- eval.results(eval_out)
|
|
179
|
+
head(res[order(res$AICc), ])
|
|
180
|
+
|
|
181
|
+
# Select best model by OR_AICc criterion
|
|
182
|
+
best <- res[res$or.10p.avg <= 0.15 & res$delta.AICc < 2, ]
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
---
|
|
186
|
+
|
|
187
|
+
## 7. Common Pitfalls
|
|
188
|
+
|
|
189
|
+
- **RM too low (e.g., 0.5):** model memorises training localities; performs poorly in
|
|
190
|
+
independent validation. Symptom: very jagged, patchy suitability map.
|
|
191
|
+
- **RM too high (e.g., 6):** model is overly smooth and may miss real habitat patches.
|
|
192
|
+
Symptom: very broad, featureless map regardless of species ecology.
|
|
193
|
+
- **Using AUC as the sole selection criterion:** rewards discrimination but not calibration.
|
|
194
|
+
Always pair with OR10.
|
|
195
|
+
- **Not running spatial CV:** random k-fold CV inflates AUC due to spatial autocorrelation.
|
|
196
|
+
Always use `partitions="block"` or `"checkerboard"` in ENMeval.
|
|
197
|
+
- **Grid boundary effects:** if best RM is 0.5 or 6 (grid edges), extend the grid and re-run.
|
|
198
|
+
- **Not checking layer names:** maxnet uses column names from the training data to match
|
|
199
|
+
projection layers. Mismatch causes silent errors or wrong predictions.
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
## 8. References
|
|
204
|
+
|
|
205
|
+
| Citation | DOI |
|
|
206
|
+
|---|---|
|
|
207
|
+
| Peterson et al. 2008. Ecol. Model. 213: 63–72 | [10.1016/j.ecolmodel.2007.11.008](https://doi.org/10.1016/j.ecolmodel.2007.11.008) |
|
|
208
|
+
| Warren & Seifert 2011. Ecol. Appl. 21: 335–342 | [10.1890/10-1171.1](https://doi.org/10.1890/10-1171.1) |
|
|
209
|
+
| Phillips et al. 2017. Ecography 40: 887–893 (maxnet) | [10.1111/ecog.03049](https://doi.org/10.1111/ecog.03049) |
|
|
210
|
+
| Muscarella et al. 2014. Meth. Ecol. Evol. (ENMeval v1) | [10.1111/2041-210X.12261](https://doi.org/10.1111/2041-210X.12261) |
|
|
211
|
+
| Kass et al. 2021. Meth. Ecol. Evol. (ENMeval v2) | [10.1111/2041-210X.13628](https://doi.org/10.1111/2041-210X.13628) |
|