ecological-agent-skills 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (217)
  1. package/AGENT_CONTEXT.md +191 -0
  2. package/CATALOG.md +329 -0
  3. package/LICENSE +692 -0
  4. package/README.md +347 -0
  5. package/bin/install.mjs +168 -0
  6. package/docs/comparison-with-alternatives.md +38 -0
  7. package/docs/global-examples-index.md +103 -0
  8. package/docs/repository-statistics.md +101 -0
  9. package/docs/theoretical-foundations.md +188 -0
  10. package/environment.yaml +106 -0
  11. package/examples/community/arctic_tundra_vegetation_example.md +247 -0
  12. package/examples/community/bird_landuse_example.md +63 -0
  13. package/examples/community/phytoplankton_reservoir_example.md +60 -0
  14. package/examples/community/reef_fish_indopacific_example.md +221 -0
  15. package/examples/impact/baci_road_example.md +57 -0
  16. package/examples/impact/ecosystem_services_atlantic_forest.md +83 -0
  17. package/examples/impact/forest_loss_borneo_timeseries_example.md +225 -0
  18. package/examples/occupancy/puma_camera_example.md +61 -0
  19. package/examples/occupancy/snow_leopard_himalayas_example.md +204 -0
  20. package/examples/reproducible/whittaker_biome_sdm_example.md +406 -0
  21. package/examples/sdm/anteater_cerrado_example.md +69 -0
  22. package/examples/sdm/jaguar_amazon_example.md +80 -0
  23. package/examples/sdm/koala_climate_change_example.md +170 -0
  24. package/examples/sdm/wolf_recolonization_europe_example.md +193 -0
  25. package/package.json +43 -0
  26. package/renv.lock +194 -0
  27. package/skills/SKILL_INDEX.json +1020 -0
  28. package/skills/acoustic-monitoring/SKILL.md +163 -0
  29. package/skills/acoustic-monitoring/examples/example-prompts.md +100 -0
  30. package/skills/acoustic-monitoring/examples/temperate_forest_birds_example.md +285 -0
  31. package/skills/acoustic-monitoring/resources/acoustic-indices-reference.md +93 -0
  32. package/skills/acoustic-monitoring/resources/soundscape-ecology-guide.md +90 -0
  33. package/skills/acoustic-monitoring/resources/species-id-tools-comparison.md +89 -0
  34. package/skills/acoustic-monitoring/scripts/batch_species_detection.py +360 -0
  35. package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.R +235 -0
  36. package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.py +374 -0
  37. package/skills/biostatistics-workbench/SKILL.md +140 -0
  38. package/skills/biostatistics-workbench/examples/example-prompts.md +39 -0
  39. package/skills/biostatistics-workbench/resources/effect-size-reference.md +81 -0
  40. package/skills/biostatistics-workbench/resources/glm-family-link-reference.md +47 -0
  41. package/skills/biostatistics-workbench/resources/test-selection-guide.md +93 -0
  42. package/skills/biostatistics-workbench/scripts/glm_pipeline.R +78 -0
  43. package/skills/biostatistics-workbench/scripts/glm_pipeline.py +210 -0
  44. package/skills/camera-trap-processing/SKILL.md +159 -0
  45. package/skills/camera-trap-processing/examples/example-prompts.md +103 -0
  46. package/skills/camera-trap-processing/examples/leopard_serengeti_example.md +231 -0
  47. package/skills/camera-trap-processing/resources/activity-patterns-reference.md +113 -0
  48. package/skills/camera-trap-processing/resources/camtrapR-workflow-guide.md +130 -0
  49. package/skills/camera-trap-processing/resources/detection-event-definition-guide.md +89 -0
  50. package/skills/camera-trap-processing/scripts/estimate_activity.R +169 -0
  51. package/skills/camera-trap-processing/scripts/process_camtrap_data.R +179 -0
  52. package/skills/camera-trap-processing/scripts/process_camtrap_data.py +192 -0
  53. package/skills/community-ecology-ordination/SKILL.md +133 -0
  54. package/skills/community-ecology-ordination/examples/example-prompts.md +35 -0
  55. package/skills/community-ecology-ordination/resources/dissimilarity-metric-guide.md +53 -0
  56. package/skills/community-ecology-ordination/resources/nmds-interpretation-guide.md +104 -0
  57. package/skills/community-ecology-ordination/scripts/__pycache__/community_analysis.cpython-311.pyc +0 -0
  58. package/skills/community-ecology-ordination/scripts/community_analysis.R +143 -0
  59. package/skills/community-ecology-ordination/scripts/community_analysis.py +231 -0
  60. package/skills/ecological-data-foundation/SKILL.md +129 -0
  61. package/skills/ecological-data-foundation/examples/example-prompts.md +40 -0
  62. package/skills/ecological-data-foundation/resources/coordinate-cleaning-flags.md +66 -0
  63. package/skills/ecological-data-foundation/resources/darwin-core-glossary.md +91 -0
  64. package/skills/ecological-data-foundation/resources/data-citation-guide.md +265 -0
  65. package/skills/ecological-data-foundation/resources/gbif-data-citation-guide.md +193 -0
  66. package/skills/ecological-data-foundation/resources/qa-checklist.md +83 -0
  67. package/skills/ecological-data-foundation/scripts/__pycache__/clean_occurrences.cpython-311.pyc +0 -0
  68. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_ebird.cpython-311.pyc +0 -0
  69. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_inat.cpython-311.pyc +0 -0
  70. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_iucn.cpython-311.pyc +0 -0
  71. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_obis.cpython-311.pyc +0 -0
  72. package/skills/ecological-data-foundation/scripts/clean_occurrences.R +230 -0
  73. package/skills/ecological-data-foundation/scripts/clean_occurrences.py +268 -0
  74. package/skills/ecological-data-foundation/scripts/download_from_ebird.R +251 -0
  75. package/skills/ecological-data-foundation/scripts/download_from_ebird.py +364 -0
  76. package/skills/ecological-data-foundation/scripts/download_from_gbif.R +315 -0
  77. package/skills/ecological-data-foundation/scripts/download_from_gbif.py +407 -0
  78. package/skills/ecological-data-foundation/scripts/download_from_inat.R +238 -0
  79. package/skills/ecological-data-foundation/scripts/download_from_inat.py +304 -0
  80. package/skills/ecological-data-foundation/scripts/download_from_iucn.R +273 -0
  81. package/skills/ecological-data-foundation/scripts/download_from_iucn.py +344 -0
  82. package/skills/ecological-data-foundation/scripts/download_from_obis.R +248 -0
  83. package/skills/ecological-data-foundation/scripts/download_from_obis.py +318 -0
  84. package/skills/ecological-impact-assessment/SKILL.md +123 -0
  85. package/skills/ecological-impact-assessment/examples/example-prompts.md +32 -0
  86. package/skills/ecological-impact-assessment/resources/baci-design-guide.md +55 -0
  87. package/skills/ecological-impact-assessment/resources/fragmentation-metrics-reference.md +86 -0
  88. package/skills/ecological-impact-assessment/resources/pressure-index-template.md +78 -0
  89. package/skills/ecological-impact-assessment/resources/study-design-guide.md +168 -0
  90. package/skills/ecological-impact-assessment/scripts/baci_analysis.R +161 -0
  91. package/skills/ecological-impact-assessment/scripts/fragmentation_analysis.py +141 -0
  92. package/skills/ecological-impact-assessment/scripts/power_analysis_baci.R +274 -0
  93. package/skills/ecosystem-services-assessment/SKILL.md +125 -0
  94. package/skills/ecosystem-services-assessment/examples/example-prompts.md +24 -0
  95. package/skills/ecosystem-services-assessment/resources/es-indicator-reference.md +45 -0
  96. package/skills/ecosystem-services-assessment/resources/invest-parameter-guide.md +86 -0
  97. package/skills/ecosystem-services-assessment/resources/rusle-coefficients.md +88 -0
  98. package/skills/ecosystem-services-assessment/scripts/__pycache__/compute_es.cpython-311.pyc +0 -0
  99. package/skills/ecosystem-services-assessment/scripts/compute_es.py +189 -0
  100. package/skills/ecosystem-services-assessment/scripts/tradeoff_analysis.R +161 -0
  101. package/skills/environmental-time-series/SKILL.md +125 -0
  102. package/skills/environmental-time-series/examples/example-prompts.md +33 -0
  103. package/skills/environmental-time-series/resources/anomaly-indices-reference.md +88 -0
  104. package/skills/environmental-time-series/resources/bfast-parameter-guide.md +69 -0
  105. package/skills/environmental-time-series/scripts/__pycache__/recovery_trajectory.cpython-311.pyc +0 -0
  106. package/skills/environmental-time-series/scripts/__pycache__/trend_analysis.cpython-311.pyc +0 -0
  107. package/skills/environmental-time-series/scripts/recovery_trajectory.R +305 -0
  108. package/skills/environmental-time-series/scripts/recovery_trajectory.py +178 -0
  109. package/skills/environmental-time-series/scripts/trend_analysis.R +192 -0
  110. package/skills/environmental-time-series/scripts/trend_analysis.py +184 -0
  111. package/skills/geoprocessing-for-ecology/SKILL.md +123 -0
  112. package/skills/geoprocessing-for-ecology/examples/example-prompts.md +32 -0
  113. package/skills/geoprocessing-for-ecology/resources/crs-reference.md +62 -0
  114. package/skills/geoprocessing-for-ecology/resources/global-predictor-sources.md +331 -0
  115. package/skills/geoprocessing-for-ecology/resources/resampling-methods.md +57 -0
  116. package/skills/geoprocessing-for-ecology/scripts/__pycache__/download_predictors.cpython-311.pyc +0 -0
  117. package/skills/geoprocessing-for-ecology/scripts/download_predictors.R +239 -0
  118. package/skills/geoprocessing-for-ecology/scripts/download_predictors.py +379 -0
  119. package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.R +224 -0
  120. package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.py +172 -0
  121. package/skills/landscape-connectivity/SKILL.md +170 -0
  122. package/skills/landscape-connectivity/examples/example-prompts.md +96 -0
  123. package/skills/landscape-connectivity/examples/jaguar_mesoamerica_corridor_example.md +271 -0
  124. package/skills/landscape-connectivity/resources/circuitscape-parameter-guide.md +155 -0
  125. package/skills/landscape-connectivity/resources/graph-theory-for-ecology.md +134 -0
  126. package/skills/landscape-connectivity/resources/resistance-surface-guide.md +141 -0
  127. package/skills/landscape-connectivity/scripts/connectivity_analysis.py +387 -0
  128. package/skills/landscape-connectivity/scripts/connectivity_metrics.R +274 -0
  129. package/skills/landscape-connectivity/scripts/resistance_surface.R +239 -0
  130. package/skills/model-validation-and-uncertainty/SKILL.md +131 -0
  131. package/skills/model-validation-and-uncertainty/examples/example-prompts.md +30 -0
  132. package/skills/model-validation-and-uncertainty/resources/extrapolation-risk-guide.md +236 -0
  133. package/skills/model-validation-and-uncertainty/resources/metric-selection-guide.md +52 -0
  134. package/skills/model-validation-and-uncertainty/resources/threshold-selection-guide.md +64 -0
  135. package/skills/model-validation-and-uncertainty/scripts/__pycache__/validate_model.cpython-311.pyc +0 -0
  136. package/skills/model-validation-and-uncertainty/scripts/extrapolation_risk.R +315 -0
  137. package/skills/model-validation-and-uncertainty/scripts/validate_model.py +226 -0
  138. package/skills/model-validation-and-uncertainty/scripts/validate_sdm.R +162 -0
  139. package/skills/occupancy-and-detection/SKILL.md +126 -0
  140. package/skills/occupancy-and-detection/examples/example-prompts.md +33 -0
  141. package/skills/occupancy-and-detection/resources/detection-history-format.md +100 -0
  142. package/skills/occupancy-and-detection/resources/occupancy-study-design.md +47 -0
  143. package/skills/occupancy-and-detection/scripts/__pycache__/occupancy_analysis.cpython-311.pyc +0 -0
  144. package/skills/occupancy-and-detection/scripts/occupancy_analysis.R +160 -0
  145. package/skills/occupancy-and-detection/scripts/occupancy_analysis.py +159 -0
  146. package/skills/population-viability-analysis/SKILL.md +161 -0
  147. package/skills/population-viability-analysis/examples/african_elephant_pva_example.md +266 -0
  148. package/skills/population-viability-analysis/examples/example-prompts.md +95 -0
  149. package/skills/population-viability-analysis/resources/extinction-risk-thresholds.md +128 -0
  150. package/skills/population-viability-analysis/resources/matrix-model-guide.md +139 -0
  151. package/skills/population-viability-analysis/resources/sensitivity-elasticity-reference.md +182 -0
  152. package/skills/population-viability-analysis/scripts/matrix_pva.R +258 -0
  153. package/skills/population-viability-analysis/scripts/pva_analysis.py +442 -0
  154. package/skills/population-viability-analysis/scripts/stochastic_pva.R +353 -0
  155. package/skills/predictive-modeling-best-practices/SKILL.md +136 -0
  156. package/skills/predictive-modeling-best-practices/examples/example-prompts.md +58 -0
  157. package/skills/predictive-modeling-best-practices/resources/collinearity-decision-tree.md +65 -0
  158. package/skills/predictive-modeling-best-practices/resources/sampling-bias-correction.md +267 -0
  159. package/skills/predictive-modeling-best-practices/resources/spatial-cv-guide.md +73 -0
  160. package/skills/predictive-modeling-best-practices/scripts/__pycache__/spatial_cv.cpython-311.pyc +0 -0
  161. package/skills/predictive-modeling-best-practices/scripts/collinearity_check.R +112 -0
  162. package/skills/predictive-modeling-best-practices/scripts/spatial_cv.py +182 -0
  163. package/skills/reproducible-ecology-pipeline/SKILL.md +139 -0
  164. package/skills/reproducible-ecology-pipeline/examples/example-prompts.md +35 -0
  165. package/skills/reproducible-ecology-pipeline/resources/directory-structure-template.md +94 -0
  166. package/skills/reproducible-ecology-pipeline/resources/params-yaml-template.yaml +84 -0
  167. package/skills/reproducible-ecology-pipeline/resources/reproducibility-checklist-template.md +66 -0
  168. package/skills/reproducible-ecology-pipeline/scripts/generate_file_manifest.py +110 -0
  169. package/skills/reproducible-ecology-pipeline/scripts/init_project.sh +53 -0
  170. package/skills/spatial-prioritization/SKILL.md +162 -0
  171. package/skills/spatial-prioritization/examples/biodiversity_hotspot_prioritization_example.md +289 -0
  172. package/skills/spatial-prioritization/examples/example-prompts.md +93 -0
  173. package/skills/spatial-prioritization/resources/cost-surface-reference.md +130 -0
  174. package/skills/spatial-prioritization/resources/marxan-vs-prioritizr-comparison.md +125 -0
  175. package/skills/spatial-prioritization/resources/prioritizr-formulation-guide.md +188 -0
  176. package/skills/spatial-prioritization/resources/representation-targets-guide.md +186 -0
  177. package/skills/spatial-prioritization/scripts/prioritization_sensitivity.R +320 -0
  178. package/skills/spatial-prioritization/scripts/run_prioritization.R +336 -0
  179. package/skills/species-distribution-modeling/SKILL.md +139 -0
  180. package/skills/species-distribution-modeling/examples/example-prompts.md +36 -0
  181. package/skills/species-distribution-modeling/resources/algorithm-comparison.md +25 -0
  182. package/skills/species-distribution-modeling/resources/calibration-area-guide.md +71 -0
  183. package/skills/species-distribution-modeling/resources/climate-scenario-preparation.md +170 -0
  184. package/skills/species-distribution-modeling/resources/maxent-calibration-guide.md +211 -0
  185. package/skills/species-distribution-modeling/resources/sdm-checklist.md +37 -0
  186. package/skills/species-distribution-modeling/scripts/predict_distribution.R +236 -0
  187. package/skills/species-distribution-modeling/scripts/predict_distribution.py +286 -0
  188. package/skills/species-distribution-modeling/scripts/prepare_future_layers.R +351 -0
  189. package/skills/species-distribution-modeling/scripts/project_scenarios.R +220 -0
  190. package/skills/species-distribution-modeling/scripts/run_ensemble_sdm.R +99 -0
  191. package/skills/species-distribution-modeling/scripts/sdm_pipeline.py +318 -0
  192. package/skills/species-distribution-modeling/scripts/tune_maxnet.R +344 -0
  193. package/templates/SKILL_TEMPLATE.md +225 -0
  194. package/templates/checklists/data-submission-checklist.md +38 -0
  195. package/templates/checklists/post-analysis-checklist.md +55 -0
  196. package/templates/checklists/pre-analysis-checklist.md +31 -0
  197. package/templates/prompts/debug-skill.md +47 -0
  198. package/templates/prompts/invoke-skill.md +34 -0
  199. package/templates/prompts/invoke-workflow.md +45 -0
  200. package/templates/reports/technical-report-template.md +80 -0
  201. package/templates/scripts/logger_setup.R +79 -0
  202. package/templates/scripts/logger_setup.py +119 -0
  203. package/templates/scripts/params_loader.R +28 -0
  204. package/templates/scripts/params_loader.py +38 -0
  205. package/workflows/analyze-community-structure/WORKFLOW.md +72 -0
  206. package/workflows/analyze-environmental-change/WORKFLOW.md +73 -0
  207. package/workflows/assess-ecological-impact/WORKFLOW.md +75 -0
  208. package/workflows/assess-ecosystem-services/WORKFLOW.md +68 -0
  209. package/workflows/assess-landscape-connectivity/WORKFLOW.md +84 -0
  210. package/workflows/build-fire-risk-map/WORKFLOW.md +79 -0
  211. package/workflows/produce-technical-report/WORKFLOW.md +113 -0
  212. package/workflows/run-camera-trap-occupancy/WORKFLOW.md +87 -0
  213. package/workflows/run-conservation-prioritization/WORKFLOW.md +89 -0
  214. package/workflows/run-multispecies-screening/WORKFLOW.md +197 -0
  215. package/workflows/run-occupancy-analysis/WORKFLOW.md +74 -0
  216. package/workflows/run-population-viability/WORKFLOW.md +90 -0
  217. package/workflows/run-sdm-study/WORKFLOW.md +99 -0
@@ -0,0 +1,226 @@
+ #!/usr/bin/env python3
+ # ecological-agent-skills / Copyright (C) 2026 Francisco Diego Barros Barata
+ # SPDX-License-Identifier: GPL-3.0-or-later
+
+ """
+ validate_model.py
+ Compute AUC, TSS, calibration for binary predictions.
+ Usage: python validate_model.py <predictions_csv> <output_dir>
+ Requires: pandas, numpy, scikit-learn, matplotlib
+ """
+ import logging
+ import sys
+ from datetime import datetime
+ from pathlib import Path
+
+ # Log to stdout and to a timestamped file under ./logs
+ SKILL_NAME = "model-validation-and-uncertainty"
+ _LOG_DIR = Path("logs")
+ _LOG_DIR.mkdir(parents=True, exist_ok=True)
+ _log_file = _LOG_DIR / f"skill_{SKILL_NAME}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
+ logging.basicConfig(
+     level=logging.INFO,
+     format="[%(asctime)s] [%(levelname)s] [" + SKILL_NAME + "] %(message)s",
+     datefmt="%Y-%m-%d %H:%M:%S",
+     handlers=[
+         logging.StreamHandler(sys.stdout),
+         logging.FileHandler(_log_file, encoding="utf-8"),
+     ],
+ )
+ logger = logging.getLogger(SKILL_NAME)
+
+ def log_step(n: int, desc: str) -> None:
+     logger.info("-- STEP %d: %s", n, desc)
+
+ def log_decision(var: str, val, why: str) -> None:
+     logger.info("DECISION | %s = %s | %s", var, val, why)
+
+ import numpy as np
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ from sklearn.metrics import roc_auc_score, roc_curve
+
+
+ def compute_tss(y_true, y_pred_prob):
+     # Scan 101 evenly spaced thresholds and keep the first one that maximises TSS
+     thresholds = np.linspace(0, 1, 101)
+     best_tss, best_thresh = -1, 0
+     for th in thresholds:
+         y_bin = (y_pred_prob >= th).astype(int)
+         tp = ((y_bin == 1) & (y_true == 1)).sum()
+         fp = ((y_bin == 1) & (y_true == 0)).sum()
+         tn = ((y_bin == 0) & (y_true == 0)).sum()
+         fn = ((y_bin == 0) & (y_true == 1)).sum()
+         sens = tp / (tp + fn) if (tp + fn) > 0 else 0
+         spec = tn / (tn + fp) if (tn + fp) > 0 else 0
+         tss = sens + spec - 1
+         if tss > best_tss:
+             best_tss, best_thresh = tss, th
+     return best_tss, best_thresh
+
+ def calibration_plot(y_true, y_pred, output_path, n_bins=10):
+     # Assign each prediction to one of n_bins equal-width bins on [0, 1]
+     bins = np.linspace(0, 1, n_bins + 1)
+     bin_ids = np.digitize(y_pred, bins) - 1
+     bin_ids = np.clip(bin_ids, 0, n_bins - 1)
+     mean_pred, obs_rate, counts = [], [], []
+     for b in range(n_bins):
+         mask = bin_ids == b
+         if mask.sum() > 0:
+             mean_pred.append(y_pred[mask].mean())
+             obs_rate.append(y_true[mask].mean())
+             counts.append(mask.sum())
+     fig, ax = plt.subplots(figsize=(6, 5))
+     ax.plot([0, 1], [0, 1], "k--", label="Perfect calibration")
+     sc = ax.scatter(mean_pred, obs_rate, c=counts, cmap="Blues", s=80, edgecolor="navy", zorder=3)
+     ax.plot(mean_pred, obs_rate, color="steelblue")
+     plt.colorbar(sc, ax=ax, label="n per bin")
+     ax.set_xlabel("Mean predicted probability")
+     ax.set_ylabel("Observed rate")
+     ax.set_title("Calibration Plot")
+     ax.legend()
+     plt.tight_layout()
+     plt.savefig(output_path, dpi=150)
+     plt.close()
+
+ def roc_plot(y_true, y_pred, auc_val, output_path):
+     fpr, tpr, _ = roc_curve(y_true, y_pred)
+     plt.figure(figsize=(5, 5))
+     plt.plot(fpr, tpr, label=f"AUC = {auc_val:.3f}", color="steelblue")
+     plt.plot([0, 1], [0, 1], "k--")
+     plt.xlabel("False Positive Rate")
+     plt.ylabel("True Positive Rate")
+     plt.title("ROC Curve")
+     plt.legend()
+     plt.tight_layout()
+     plt.savefig(output_path, dpi=150)
+     plt.close()
+
+ def main():
+     pred_file = sys.argv[1] if len(sys.argv) > 1 else "outputs/predictions.csv"
+     output_dir = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("outputs/validation")
+
+     log_step(1, "Validate inputs")
+     if not Path(pred_file).exists():
+         logger.error(
+             "Predictions file not found: %s\n"
+             "Probable cause: incorrect path, or the model has not generated predictions yet\n"
+             "Check: the predictions_csv argument and that the model has been fitted\n"
+             "Previous skill: species-distribution-modeling",
+             pred_file
+         )
+         sys.exit(1)
+
+     output_dir.mkdir(parents=True, exist_ok=True)
+
+     log_step(2, "Load predictions data")
+     try:
+         dat = pd.read_csv(pred_file)
+     except Exception as e:
+         logger.error(
+             "Unexpected error in load data: %s\n"
+             "Probable cause: malformed CSV or insufficient permissions\n"
+             "Check: encoding and structure of the predictions file\n"
+             "Previous skill: species-distribution-modeling",
+             e
+         )
+         raise
+
+     if "observed" not in dat.columns or "predicted" not in dat.columns:
+         logger.error(
+             "Required columns missing. Expected: 'observed', 'predicted'. Found: %s\n"
+             "Probable cause: non-standard CSV header\n"
+             "Check: that the file has columns 'observed' (0/1) and 'predicted' (probability)\n"
+             "Previous skill: species-distribution-modeling",
+             list(dat.columns)
+         )
+         sys.exit(1)
+
+     y_true = dat["observed"].values
+     y_pred = dat["predicted"].values
+
+     logger.info("Loaded %d predictions. Prevalence: %.3f", len(dat), y_true.mean())
+
+     n_pos = (y_true == 1).sum()
+     n_neg = (y_true == 0).sum()
+     log_decision("evaluation_metrics", "AUC + MaxTSS + calibration plot", "standard triad for binary SDM evaluation")
+     logger.info("Presences: %d | Absences: %d", n_pos, n_neg)
+
+     if n_pos < 10:
+         logger.warning(
+             "Only %d presence records. AUC and TSS estimates will be highly uncertain.", n_pos
+         )
+     if n_neg < 10:
+         logger.warning(
+             "Only %d absence/background records. Consider increasing background sample size.", n_neg
+         )
+     if np.any((y_pred < 0) | (y_pred > 1)):
+         logger.warning("Some predicted values are outside [0, 1]. Check that predictions are probabilities.")
+
+     log_step(3, "Compute AUC-ROC")
+     try:
+         auc = roc_auc_score(y_true, y_pred)
+         logger.info("AUC-ROC: %.4f", auc)
+         if auc < 0.7:
+             logger.warning(
+                 "AUC = %.4f is below 0.70. Model discrimination is poor. "
+                 "Consider revisiting predictors or sampling design.", auc
+             )
+     except Exception as e:
+         logger.error(
+             "Unexpected error in AUC-ROC: %s\n"
+             "Probable cause: only one class in 'observed', or NA values\n"
+             "Check: that 'observed' contains both 0 and 1 and 'predicted' has no NA\n"
+             "Previous skill: species-distribution-modeling",
+             e
+         )
+         raise
+
+     log_step(4, "Compute MaxTSS and optimal threshold")
+     log_decision("threshold_method", "MaxTSS", "maximises sensitivity + specificity; robust for SDMs")
+     try:
+         tss, thresh = compute_tss(y_true, y_pred)
+         logger.info("Max TSS: %.4f (threshold = %.2f)", tss, thresh)
+         if tss < 0.4:
+             logger.warning(
+                 "MaxTSS = %.4f is low. Model may have poor predictive performance.", tss
+             )
+     except Exception as e:
+         logger.error(
+             "Unexpected error in TSS computation: %s\n"
+             "Probable cause: NA values or a single class in 'observed'\n"
+             "Check: that 'observed' contains both 0 and 1\n"
+             "Previous skill: species-distribution-modeling",
+             e
+         )
+         raise
+
+     log_step(5, "Save performance metrics")
+     try:
+         # Round the threshold too, so grid values such as 0.35000000000000003 are written as 0.35
+         metrics = pd.DataFrame({"metric": ["AUC-ROC", "MaxTSS", "Threshold_MaxTSS"],
+                                 "value": [round(auc, 4), round(tss, 4), round(thresh, 2)]})
+         metrics.to_csv(output_dir / "performance_metrics.csv", index=False)
+         logger.info("Performance metrics saved.")
+     except Exception as e:
+         logger.error(
+             "Unexpected error in save metrics: %s\n"
+             "Probable cause: directory without write permission\n"
+             "Check: output_dir and file system permissions\n"
+             "Previous skill: model-validation-and-uncertainty (metrics computation)",
+             e
+         )
+         raise
+
+     log_step(6, "Generate diagnostic plots")
+     try:
+         calibration_plot(y_true, y_pred, output_dir / "calibration_plot.png")
+         logger.info("Calibration plot saved.")
+         roc_plot(y_true, y_pred, auc, output_dir / "roc_curve.png")
+         logger.info("ROC curve saved.")
+     except Exception as e:
+         logger.error(
+             "Unexpected error in diagnostic plots: %s\n"
+             "Probable cause: insufficient data per bin or matplotlib backend unavailable\n"
+             "Check: distribution of predicted values and matplotlib configuration\n"
+             "Previous skill: model-validation-and-uncertainty (metrics computation)",
+             e
+         )
+         raise
+
+     logger.info("Outputs written to: %s", output_dir)
+
+ if __name__ == "__main__":
+     main()
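Note on the MaxTSS search in validate_model.py: the threshold scan can be illustrated without any third-party dependency. The following is a minimal sketch and not part of the package; the function name `max_tss`, its `n_thresholds` parameter, and the toy data are hypothetical. Unlike `compute_tss`, which falls back to 0 for sensitivity or specificity when a class is missing, this sketch assumes it is acceptable to raise an error in that case.

```python
def max_tss(observed, predicted, n_thresholds=101):
    """Return (best_tss, best_threshold) over an evenly spaced grid on [0, 1].

    Hypothetical helper for illustration only; it mirrors the grid search in
    compute_tss but uses plain Python lists instead of NumPy arrays.
    """
    pairs = list(zip(observed, predicted))
    n_pos = sum(1 for o, _ in pairs if o == 1)
    n_neg = sum(1 for o, _ in pairs if o == 0)
    if n_pos == 0 or n_neg == 0:
        # Assumption: fail loudly rather than report a TSS for a single class
        raise ValueError("observed must contain both 0 and 1")

    best_tss, best_threshold = -1.0, 0.0
    for i in range(n_thresholds):
        th = i / (n_thresholds - 1)
        # A record is classified as a presence when its prediction is >= th
        tp = sum(1 for o, p in pairs if o == 1 and p >= th)
        tn = sum(1 for o, p in pairs if o == 0 and p < th)
        tss = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if tss > best_tss:  # strict ">" keeps the lowest threshold on ties
            best_tss, best_threshold = tss, th
    return best_tss, best_threshold


# Toy data (made up for illustration): one presence scores below one absence
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
tss, threshold = max_tss(observed, predicted)
print(f"MaxTSS = {tss:.3f} at threshold = {threshold:.2f}")
```

Because ties are resolved in favour of the lowest threshold, a plateau of equally good thresholds reports its lower edge, which is the same behaviour as the strict comparison in `compute_tss`.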
@@ -0,0 +1,162 @@
1
+ # ecological-agent-skills / Copyright (C) 2026 Francisco Diego Barros Barata
2
+ # SPDX-License-Identifier: GPL-3.0-or-later
3
+
4
+ # Usage: Rscript validate_sdm.R <model.rds> <test_data.csv> <output_dir> [threshold_method]
5
+ # Compute AUC, TSS, Boyce index and calibration for SDM predictions
6
+ # Usage: Rscript validate_sdm.R <predictions_csv> <output_dir>
7
+ # Requires: PresenceAbsence, ecospat, dplyr, ggplot2
8
+
9
+ # ── Inline logger ─────────────────────────────────────────────────────────────
10
+ SKILL_NAME <- "model-validation-and-uncertainty"
11
+ .log_ts <- function() format(Sys.time(), "[%Y-%m-%d %H:%M:%S]")
12
+ log_info <- function(...) message(.log_ts(), " [INFO] ", sprintf(...))
13
+ log_warn <- function(...) message(.log_ts(), " [WARN] ", sprintf(...))
14
+ log_error<- function(...) message(.log_ts(), " [ERROR] ", sprintf(...))
15
+ log_step <- function(n, d) log_info("-- STEP %d: %s", n, d)
16
+ log_decision <- function(v, val, why) log_info("DECISION | %s = %s | %s", v, val, why)
17
+ dir.create("logs", recursive=TRUE, showWarnings=FALSE)
18
+
19
+ suppressPackageStartupMessages({
20
+ library(dplyr)
21
+ library(ggplot2)
22
+ })
23
+
24
+ args <- commandArgs(trailingOnly = TRUE)
25
+ pred_file <- ifelse(length(args) >= 1, args[1], "outputs/predictions.csv")
26
+ output_dir <- ifelse(length(args) >= 2, args[2], "outputs/validation")
27
+
28
+ log_step(1, "Validate inputs")
29
+ if (!file.exists(pred_file)) {
30
+ log_error(
31
+ "Falha em validate inputs: arquivo de predicoes nao encontrado: %s\nCausa provavel: caminho incorreto ou modelo nao gerou predicoes ainda\nVerifique: o argumento predictions_csv e que o modelo foi ajustado\nSkill anterior: species-distribution-modelling",
32
+ pred_file
33
+ )
34
+ stop("Predictions file not found.")
35
+ }
36
+
37
+ dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
38
+
39
+ log_step(2, "Load predictions data")
40
+ tryCatch({
41
+ dat <- read.csv(pred_file)
42
+ }, error = function(e) {
43
+ log_error(
44
+ "Falha em load data: %s\nCausa provavel: CSV malformado ou permissoes insuficientes\nVerifique: encoding e estrutura do arquivo de predicoes\nSkill anterior: species-distribution-modelling",
45
+ conditionMessage(e)
46
+ )
47
+ stop(e)
48
+ })
49
+
50
+ if (!all(c("observed", "predicted") %in% names(dat))) {
51
+ log_error(
52
+ "Falha em validate columns: colunas obrigatorias ausentes. Esperado: 'observed', 'predicted'. Encontrado: %s\nCausa provavel: cabecalho do CSV nao padronizado\nVerifique: que o arquivo tem colunas 'observed' (0/1) e 'predicted' (probabilidade)\nSkill anterior: species-distribution-modelling",
53
+ paste(names(dat), collapse = ", ")
54
+ )
55
+ stop("Required columns 'observed' and 'predicted' not found.")
56
+ }
57
+
58
+ log_info("Loaded %d predictions. Prevalence: %.3f", nrow(dat), mean(dat$observed))
59
+
60
+ n_pos <- sum(dat$observed == 1)
61
+ n_neg <- sum(dat$observed == 0)
62
+ log_decision("evaluation_approach", "AUC + MaxTSS + calibration", "standard triad for binary SDM evaluation")
63
+ log_info("Presences: %d | Absences: %d", n_pos, n_neg)
64
+
65
+ if (n_pos < 10) {
66
+ log_warn("Only %d presence records. AUC and TSS estimates will be highly uncertain with so few presences.", n_pos)
67
+ }
68
+ if (n_neg < 10) {
69
+ log_warn("Only %d absence/background records. Consider increasing background sample size.", n_neg)
70
+ }
71
+
72
+ if (any(dat$predicted < 0 | dat$predicted > 1, na.rm = TRUE)) {
73
+ log_warn("Some predicted values are outside [0, 1]. Check that predictions are probabilities.")
74
+ }
75
+
76
+ log_step(3, "Compute AUC-ROC")
77
+ tryCatch({
78
+ roc_data <- dat |> arrange(desc(predicted))
79
+ roc_data$tpr <- cumsum(roc_data$observed == 1) / n_pos
80
+ roc_data$fpr <- cumsum(roc_data$observed == 0) / n_neg
81
+ auc <- abs(sum(diff(roc_data$fpr) * (roc_data$tpr[-1] + roc_data$tpr[-nrow(roc_data)]) / 2))
82
+ log_info("AUC-ROC: %.3f", auc)
83
+ if (auc < 0.7) {
84
+ log_warn("AUC = %.3f is below 0.70. Model discrimination is poor. Consider revisiting predictors or sampling design.", auc)
85
+ }
86
+ }, error = function(e) {
87
+ log_error(
88
+ "Falha em AUC-ROC: %s\nCausa provavel: valores NA em 'observed' ou 'predicted', ou apenas uma classe\nVerifique: que 'observed' contem 0 e 1 e 'predicted' nao tem NA\nSkill anterior: species-distribution-modelling",
89
+ conditionMessage(e)
90
+ )
91
+ stop(e)
92
+ })
93
+
94
+ log_step(4, "Compute MaxTSS and optimal threshold")
95
+ log_decision("threshold_method", "MaxTSS", "maximises sensitivity + specificity; robust for SDMs")
96
+ tryCatch({
97
+ thresholds <- seq(0, 1, by = 0.01)
98
+ tss_vals <- sapply(thresholds, function(th) {
99
+ pred_bin <- as.integer(dat$predicted >= th)
100
+ tp <- sum(pred_bin == 1 & dat$observed == 1)
101
+ fp <- sum(pred_bin == 1 & dat$observed == 0)
102
+ tn <- sum(pred_bin == 0 & dat$observed == 0)
103
+ fn <- sum(pred_bin == 0 & dat$observed == 1)
104
+ sens <- if ((tp + fn) > 0) tp / (tp + fn) else 0
105
+ spec <- if ((tn + fp) > 0) tn / (tn + fp) else 0
106
+ sens + spec - 1
107
+ })
108
+ best_tss_idx <- which.max(tss_vals)
109
+ best_thresh <- thresholds[best_tss_idx]
110
+ best_tss <- tss_vals[best_tss_idx]
111
+ log_info("MaxTSS: %.3f at threshold: %.2f", best_tss, best_thresh)
112
+ if (best_tss < 0.4) {
113
+ log_warn("MaxTSS = %.3f is low. Model may have poor predictive performance.", best_tss)
114
+ }
115
+ }, error = function(e) {
116
+ log_error(
117
+ "TSS computation failed: %s\nLikely cause: NA values or a single class in 'observed'\nCheck: that 'observed' contains both 0 and 1\nPrevious skill: species-distribution-modelling",
118
+ conditionMessage(e)
119
+ )
120
+ stop(e)
121
+ })
122
+
123
+ log_step(5, "Save performance metrics")
124
+ tryCatch({
125
+ metrics <- data.frame(
126
+ metric = c("AUC-ROC", "MaxTSS", "Threshold_MaxTSS"),
127
+ value = round(c(auc, best_tss, best_thresh), 4)
128
+ )
129
+ write.csv(metrics, file.path(output_dir, "performance_metrics.csv"), row.names = FALSE)
130
+ log_info("Performance metrics saved.")
131
+ }, error = function(e) {
132
+ log_error(
133
+ "Saving metrics failed: %s\nLikely cause: directory is not writable\nCheck: output_dir and file-system permissions\nPrevious skill: model-validation-and-uncertainty (metrics computation)",
134
+ conditionMessage(e)
135
+ )
136
+ stop(e)
137
+ })
138
+
139
+ log_step(6, "Generate calibration plot")
140
+ tryCatch({
141
+ dat$bin <- cut(dat$predicted, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
142
+ cal <- dat |>
143
+ group_by(bin) |>
144
+ summarise(mean_pred = mean(predicted), obs_rate = mean(observed), n = n(), .groups = "drop")
145
+
146
+ p_cal <- ggplot(cal, aes(x = mean_pred, y = obs_rate)) +
147
+ geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "grey50") +
148
+ geom_point(aes(size = n), colour = "#2166ac") +
149
+ geom_line(colour = "#2166ac") +
150
+ scale_size_area(max_size = 8) +
151
+ labs(title = "Calibration Plot", x = "Mean Predicted Probability", y = "Observed Rate",
152
+ size = "n") +
153
+ theme_bw()
154
+ ggsave(file.path(output_dir, "calibration_plot.png"), p_cal, width = 6, height = 5, dpi = 150)
155
+ log_info("Calibration plot written.")
156
+ }, error = function(e) {
157
+ log_error(
158
+ "Calibration plot failed: %s\nLikely cause: too few records per bin or extreme predicted values\nCheck: the distribution of predicted values and the number of records\nPrevious skill: model-validation-and-uncertainty (metrics computation)",
159
+ conditionMessage(e)
160
+ )
161
+ stop(e)
162
+ })
@@ -0,0 +1,126 @@
1
+ ---
2
+ name: occupancy-and-detection
3
+ description: "Fits single-season and dynamic occupancy models that account for imperfect detection in wildlife survey data. Use this skill when the user mentions occupancy estimation, detection probability, imperfect detection, detection histories, repeated visits, MacKenzie models, psi estimation, dynamic occupancy (colonization/extinction), goodness-of-fit testing (c-hat), site occupancy, or unmarked package analyses."
4
+ skill_version: 1.0.0
5
+ ---
6
+
7
+ # Skill: occupancy-and-detection
8
+
9
+ **Domain:** Occupancy models · Imperfect detection · Replicate surveys
10
+ **Phase:** 3 — Specialist
11
+ **Used by:** run-occupancy-analysis
12
+
13
+ ---
14
+
15
+ ## Purpose
16
+
17
+ Guides the agent through the design and analysis of occupancy studies that account for imperfect detection. Covers single-season and dynamic occupancy models, covariate specification, goodness-of-fit testing, and result interpretation.
18
+
19
+ ---
20
+
21
+ ## When to Invoke
22
+
23
+ - Species were surveyed at multiple sites with repeated visits
24
+ - Detection probability is likely < 1 and must be estimated separately from occupancy
25
+ - The goal is to estimate ψ (occupancy) and p (detection) and their covariates
26
+ - Designing a new monitoring protocol where detection needs to be modelled
27
+
28
+ ---
29
+
30
+ ## Inputs
31
+
32
+ | Input | Format | Required |
33
+ |-------|--------|----------|
34
+ | Detection history matrix (sites × occasions) | CSV (1/0/NA) | Yes |
35
+ | Site-level covariates (ψ covariates) | CSV | Recommended |
36
+ | Observation-level covariates (p covariates) | CSV or 3D array | Recommended |
37
+ | Number of seasons (for dynamic models) | Integer | Conditional |
38
+
39
+ ---
40
+
41
+ ## Outputs
42
+
43
+ | Output | Description |
44
+ |--------|-------------|
45
+ | `occupancy_estimates.csv` | ψ estimates per site (if site-level) |
46
+ | `detection_estimates.csv` | p estimates per occasion |
47
+ | `model_selection_table.csv` | AIC table for all candidate models |
48
+ | `covariate_effects.csv` | Beta coefficients with 95% CIs |
49
+ | `gof_report.md` | MacKenzie-Bailey χ² goodness-of-fit |
50
+ | `occupancy_map.tif` | Predicted occupancy surface (if spatial) |
51
+
52
+ ---
53
+
54
+ ## Steps
55
+
56
+ ### 1. Assess Study Design
57
+ - Confirm: multiple sites, multiple repeat surveys per site within a season
58
+ - Confirm: population is closed within season (single-season) or document seasons
59
+ - Calculate naive occupancy (proportion of sites with ≥1 detection) as a baseline
60
+ - Report detection rates per occasion
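
As a rough illustration of the baseline quantities in this step, the sketch below computes naive occupancy and per-occasion detection rates in base R. The matrix `y` is made-up toy data, not part of the package; sites that were never surveyed are left out of the denominator, which is an assumption about how the baseline should be defined.

```r
# Toy detection history (hypothetical data): 5 sites x 4 occasions
# 1 = detected, 0 = surveyed but not detected, NA = not surveyed
y <- matrix(c( 1, 0,  1, NA,
               0, 0,  0,  0,
              NA, 1,  0,  0,
               0, 0, NA,  0,
               0, 0,  0,  0),
            nrow = 5, byrow = TRUE)

# Sites with at least one survey, and sites with at least one detection
surveyed_sites <- rowSums(!is.na(y)) > 0
detected_sites <- rowSums(y, na.rm = TRUE) > 0

# Naive occupancy: share of surveyed sites with >= 1 detection
naive_occupancy <- sum(detected_sites) / sum(surveyed_sites)

# Detection rate per occasion, ignoring occasions that were not surveyed
detection_rate_by_occasion <- colMeans(y, na.rm = TRUE)

print(naive_occupancy)
print(detection_rate_by_occasion)
```

Naive occupancy ignores imperfect detection, so it is a lower-bound style baseline to compare against the model-based ψ rather than an estimate in its own right.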
61
+
62
+ ### 2. Format the Detection History
63
+ - Rows = sites, columns = survey occasions
64
+ - Values: 1 (detected), 0 (surveyed, not detected), NA (not surveyed)
65
+ - Standardise continuous covariates (mean = 0, SD = 1)
66
+
67
+ ### 3. Define Candidate Models
68
+ - Build candidate model set based on a priori ecological hypotheses
69
+ - Include a null model (ψ(.), p(.)) as baseline
70
+ - Typical covariate hypotheses for ψ: habitat quality, elevation, disturbance index
71
+ - Typical covariate hypotheses for p: observer, time of day, weather, survey effort
72
+ - Avoid all-subsets model selection; as a rule of thumb, limit the set to at most K candidates (K = sample size / 10)
73
+
74
+ ### 4. Fit Models
75
+ - Use maximum likelihood (unmarked package) or Bayesian (JAGS/Stan) estimation
76
+ - For single-season: `occu(~ p_covariates ~ psi_covariates, data = umf)` (detection formula first, occupancy formula second)
77
+ - For dynamic (multi-season): specify colonisation (γ) and extinction (ε) parameters
78
+ - Check for convergence warnings
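
A minimal sketch of the single-season fit with `unmarked`, assuming simulated data: the covariate names (`forest_cover`, `effort`), the true parameter values, and the sample sizes are all hypothetical and chosen only so the example runs on its own.

```r
library(unmarked)

set.seed(42)
n_sites <- 100
n_occ   <- 4

# Simulated (hypothetical) standardised covariates
forest_cover <- rnorm(n_sites)                                  # site level
effort       <- matrix(rnorm(n_sites * n_occ), n_sites, n_occ)  # observation level

# Simulate true occupancy state and detections
psi <- plogis(0.2 + 1.0 * forest_cover)
z   <- rbinom(n_sites, 1, psi)
p   <- plogis(-0.5 + 0.8 * effort)
y   <- matrix(rbinom(n_sites * n_occ, 1, z * p), n_sites, n_occ)

umf <- unmarkedFrameOccu(
  y        = y,
  siteCovs = data.frame(forest_cover = forest_cover),
  obsCovs  = list(effort = effort)
)

# Detection formula comes first, occupancy formula second
fm_null <- occu(~ 1 ~ 1, data = umf)
fm_cov  <- occu(~ effort ~ forest_cover, data = umf)

# optim() convergence code: 0 indicates successful convergence
print(fm_cov@opt$convergence)

# AIC-based comparison of the two candidate models
print(modSel(fitList(null = fm_null, cov = fm_cov)))

# Occupancy on the probability scale for the intercept-only model
print(backTransform(fm_null, type = "state"))
```

Dynamic models follow the same pattern with `colext()` and an `unmarkedMultFrame`, where colonisation and extinction get their own formulas.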
79
+
80
+ ### 5. Goodness-of-Fit
81
+ - Apply MacKenzie-Bailey χ² test (parametric bootstrap, n = 1000 iterations)
82
+ - Report ĉ (overdispersion factor); if ĉ > 1.5, use QAICc instead of AICc
83
+ - Visualise observed vs expected detection frequencies
84
+
85
+ ### 6. Model Selection
86
+ - Rank by AICc (or QAICc)
87
+ - Report ΔAICc (or ΔQAICc) and Akaike weights
88
+ - If several models fall within ΔAICc < 2 of the top-ranked model, use model averaging
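
To make the ranking step concrete, here is a small base-R sketch that turns a vector of AICc values into ΔAICc and Akaike weights. The function name `akaike_table` and the AICc values are hypothetical, for illustration only.

```r
# Build a model selection table from a named vector of AICc values
akaike_table <- function(aicc) {
  model <- names(aicc)
  aicc  <- unname(aicc)
  delta <- aicc - min(aicc)            # difference from the best model
  rel_lik <- exp(-0.5 * delta)         # relative likelihood of each model
  weight  <- rel_lik / sum(rel_lik)    # Akaike weights sum to 1
  out <- data.frame(model = model, AICc = aicc,
                    delta_AICc = delta, weight = weight)
  out[order(out$delta_AICc), ]
}

# Hypothetical AICc values for three candidate models
aicc <- c(null = 412.6, forest = 401.3, forest_road = 402.1)
tab  <- akaike_table(aicc)
print(tab)

# Models close enough to the best one to consider for model averaging
close_models <- tab$model[tab$delta_AICc < 2]
print(close_models)
```

With these made-up values, two models fall within ΔAICc < 2, which is the situation where the step above suggests model averaging instead of best-model inference.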
89
+
90
+ ### 7. Interpret Results
91
+ - Report ψ with 95% CI on the probability scale
92
+ - Report p with 95% CI; discuss implications for survey design
93
+ - Report covariate effects as odds ratios or backtransformed probabilities
94
+ - Compute minimum number of surveys needed to confirm absence (given estimated p)
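
The last bullet can be sketched as follows, assuming detection is constant and independent across surveys: the probability of missing an occupied site in all K surveys is (1 - p)^K, so the smallest K with (1 - p)^K ≤ 1 - confidence is the required number of surveys. The function names are hypothetical.

```r
# Probability of at least one detection in k surveys at an occupied site
cumulative_detection <- function(p, k) {
  1 - (1 - p)^k
}

# Smallest number of surveys so that an occupied site is detected
# at least once with the requested confidence
min_surveys <- function(p, confidence = 0.95) {
  stopifnot(p > 0, p < 1, confidence > 0, confidence < 1)
  ceiling(log(1 - confidence) / log(1 - p))
}

k <- min_surveys(0.25)              # per-survey detection probability of 0.25
print(k)
print(cumulative_detection(0.25, k))
```

If p varies with covariates such as effort or observer, the calculation would need the product of occasion-specific non-detection probabilities instead of a single power.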
95
+
96
+ ---
97
+
98
+ ## Key Decisions to Document
99
+
100
+ - Closure assumption justification
101
+ - Candidate model set rationale
102
+ - Goodness-of-fit result and action taken (e.g., use QAICc)
103
+ - Model averaging vs. best-model inference
104
+
105
+ ---
106
+
107
+ ## Tools and Libraries
108
+
109
+ **R:** `unmarked`, `RPresence` (R interface to the PRESENCE software), `jagsUI`, `rstan`
110
+ **Python:** `pyoccupancy` (limited), interface to JAGS via `pyjags`
111
+
112
+ ---
113
+
114
+ ## Resources
115
+
116
+ - `resources/occupancy-study-design.md` — required replicates for target power
117
+ - `resources/detection-history-format.md` — how to format the input matrix
118
+ - `examples/` — worked single-season and dynamic occupancy examples
119
+
120
+ ---
121
+
122
+ ## Notes
123
+
124
+ - At least 3 repeat surveys per site are recommended for reliable p estimation
125
+ - Spatial replication (many sites) is more important than temporal replication per site
126
+ - Dynamic models assume closure within each season; check and justify that assumption for every season
@@ -0,0 +1,33 @@
1
+ # Example Invocation Prompts — occupancy-and-detection
2
+
3
+ ## Single-Season Occupancy
4
+
5
+ ```
6
+ Load skill: occupancy-and-detection
7
+ Task: Single-season occupancy analysis for puma (Puma concolor) from camera trap data.
8
+
9
+ Files:
10
+ - data/detection_history.csv (80 sites × 6 survey occasions; 1/0/NA)
11
+ - data/site_covariates.csv (elevation, forest_cover, dist_to_road)
12
+ - data/obs_covariates.csv (effort_nights per occasion, observer_id)
13
+
14
+ Candidate models (occupancy ~ ..., detection ~ ...):
15
+ ψ(forest_cover), p(effort)
16
+ ψ(forest_cover + dist_to_road), p(effort)
17
+ ψ(elevation + forest_cover), p(effort + observer)
18
+ ψ(.), p(.) ← null model
19
+
20
+ Run goodness-of-fit (MacKenzie-Bailey χ², 1000 bootstraps).
21
+ Select by AICc. If ĉ > 1.5, use QAICc.
22
+ Report ψ and p estimates with 95% CIs on probability scale.
23
+ ```
24
+
25
+ ## Power Analysis
26
+
27
+ ```
28
+ Load skill: occupancy-and-detection
29
+ Task: Power analysis for a proposed camera trap study.
30
+ Expected occupancy (ψ): 0.4. Expected detection per occasion (p): 0.25.
31
+ Target: 80% power to detect a 20% decline in occupancy.
32
+ How many sites and survey occasions are needed?
33
+ ```
@@ -0,0 +1,100 @@
1
+ # Detection History Format Reference
2
+
3
+ The detection history matrix is the primary input for all occupancy models. Correct formatting is essential.
4
+
5
+ ## Matrix Structure
6
+
7
+ - **Rows** = sites (sampling units)
8
+ - **Columns** = survey occasions (within season)
9
+ - **Values**: `1` (detected), `0` (surveyed, not detected), `NA` (not surveyed)
10
+
11
+ ```
12
+ occ1 occ2 occ3 occ4 occ5 occ6
13
+ site_01 1 0 1 1 0 NA
14
+ site_02 0 0 0 0 0 0
15
+ site_03 NA 1 1 0 1 1
16
+ site_04 0 NA 0 NA 0 0
17
+ site_05 0 0 0 0 0 0
18
+ ```
19
+
20
+ ## Critical Rules
21
+
22
+ 1. `0` means the site WAS surveyed but species was NOT detected — not the same as `NA`
23
+ 2. `NA` means the site was NOT surveyed on that occasion (equipment failure, weather, etc.)
24
+ 3. A row of all zeros = site was surveyed on all occasions, never detected
25
+ 4. A row of all `NA` = site was never surveyed; **remove this row**
26
+ 5. Occasions must be within the closure period (population assumed closed)
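
The rules above can be checked mechanically before modelling. The following base-R sketch uses a hypothetical helper, `validate_det_history()`, that counts values other than 0, 1, or `NA` and drops sites that were never surveyed; the example matrix is made up.

```r
# Check a detection history matrix against the formatting rules
validate_det_history <- function(y) {
  stopifnot(is.matrix(y))
  # Anything that is not 0, 1, or NA violates the coding scheme
  bad_values <- !is.na(y) & !(y %in% c(0, 1))
  # Rows that are entirely NA were never surveyed and should be removed
  never_surveyed <- rowSums(!is.na(y)) == 0
  list(
    n_bad_values   = sum(bad_values),
    never_surveyed = rownames(y)[never_surveyed],
    cleaned        = y[!never_surveyed, , drop = FALSE]
  )
}

# Hypothetical example: site_02 was never surveyed, site_03 has a -9 code
y <- matrix(c( 1,  0, NA,
              NA, NA, NA,
               0, -9,  0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("site_01", "site_02", "site_03"),
                            c("occ1", "occ2", "occ3")))

res <- validate_det_history(y)
print(res$n_bad_values)
print(res$never_surveyed)
print(rownames(res$cleaned))
```

The closure rule (rule 5) cannot be checked from the matrix alone; it depends on the survey dates and the species' biology.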
27
+
28
+ ## Building the Matrix in R
29
+
30
+ ```r
31
+ library(dplyr)
32
+ library(tidyr)
33
+
34
+ # Assuming raw_data has columns: site_id, occasion, detected (0/1), surveyed (TRUE/FALSE)
35
+ det_history <- raw_data |>
36
+ mutate(value = ifelse(!surveyed, NA, detected)) |>
37
+ pivot_wider(id_cols = site_id, names_from = occasion, values_from = value,
38
+ names_prefix = "occ") |>
39
+ tibble::column_to_rownames("site_id") |>  # column_to_rownames() comes from tibble, not dplyr or tidyr
40
+ as.matrix()
41
+
42
+ # Sanity checks
43
+ cat("Sites:", nrow(det_history), "\n")
44
+ cat("Occasions:", ncol(det_history), "\n")
45
+ cat("Detection rate:", mean(det_history, na.rm = TRUE), "\n")
46
+ cat("Sites with ≥1 detection:", sum(rowSums(det_history, na.rm=TRUE) > 0), "\n")
47
+ cat("Sites never surveyed (remove):", sum(rowSums(!is.na(det_history)) == 0), "\n")
48
+ ```
49
+
50
+ ## Site and Observation Covariates
51
+
52
+ Site-level covariates (for ψ): one row per site, one column per variable.
53
+ Observation-level covariates (for p): a matrix with the same dimensions as the detection history matrix, or a named list of such matrices (one per covariate).
54
+
55
+ ```r
56
+ # Site covariates (same row order as detection history)
57
+ site_covs <- data.frame(
58
+ forest_cover = c(0.82, 0.45, 0.91, 0.33, 0.71), # 0–1
59
+ elevation_m = c(450, 230, 680, 150, 520),
60
+ row.names = rownames(det_history)
61
+ )
62
+
63
+ # Observation covariates (matrix: same dimensions as detection history)
64
+ effort_nights <- matrix(
65
+ c(3, 3, NA, 3, 3, 3, # site 1
66
+ 3, 3, 3, 3, 3, 3, # site 2
67
+ ...),
68
+ nrow = nrow(det_history), byrow = TRUE
69
+ )
70
+
71
+ # Build unmarkedFrame
72
+ library(unmarked)
73
+ umf <- unmarkedFrameOccu(
74
+ y = det_history,
75
+ siteCovs = site_covs,
76
+ obsCovs = list(effort = effort_nights)
77
+ )
78
+ summary(umf)
79
+ ```
80
+
81
+ ## Standardising Covariates
82
+
83
+ Always standardise continuous covariates to mean = 0, SD = 1 before modelling:
84
+
85
+ ```r
86
+ site_covs_std <- site_covs |>
87
+ mutate(across(where(is.numeric), ~ as.vector(scale(.))))
88
+ ```
89
+
90
+ This improves numerical stability and allows direct comparison of coefficient magnitudes.
91
+
92
+ ## Common Formatting Errors
93
+
94
+ | Error | Symptom | Fix |
95
+ |-------|---------|-----|
96
+ | Using -9 or 999 as NA code | Model fails to converge | Replace with `NA` |
97
+ | Occasions in wrong order | Apparent temporal patterns are artefacts | Sort by date within site |
98
+ | Site covariate row order mismatched | Covariates assigned to wrong sites | Use row names to match |
99
+ | Including surveys from outside the closure period | Changes in occupancy are confounded with non-detection, biasing ψ and p | Only use surveys within the closure period |
100
+ | Zero variance in a covariate | Model rank deficiency | Remove constant covariates |
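
Two of the fixes in the table lend themselves to a short base-R sketch: recoding sentinel values to `NA` and finding constant covariates. The helper names and the default sentinel codes (`-9`, `999`) are assumptions for illustration; adjust them to whatever codes the raw data actually uses.

```r
# Replace sentinel codes (assumed here to be -9 and 999) with NA
recode_sentinels <- function(y, sentinels = c(-9, 999)) {
  y[y %in% sentinels] <- NA
  y
}

# Names of covariates with at most one distinct non-missing value
constant_covariates <- function(covs) {
  is_constant <- vapply(
    covs,
    function(x) length(unique(x[!is.na(x)])) <= 1,
    logical(1)
  )
  names(covs)[is_constant]
}

# Hypothetical inputs
y_raw <- matrix(c(1, -9, 0,
                  0,  0, 999),
                nrow = 2, byrow = TRUE)
y_clean <- recode_sentinels(y_raw)
print(y_clean)

site_covs <- data.frame(elevation_m = c(450, 230),
                        region      = c("north", "north"))
to_drop <- constant_covariates(site_covs)
print(to_drop)
```

Recoding before building the `unmarkedFrame` keeps sentinel codes from being read as detections or as extreme covariate values.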