ecological-agent-skills 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (217)
  1. package/AGENT_CONTEXT.md +191 -0
  2. package/CATALOG.md +329 -0
  3. package/LICENSE +692 -0
  4. package/README.md +347 -0
  5. package/bin/install.mjs +168 -0
  6. package/docs/comparison-with-alternatives.md +38 -0
  7. package/docs/global-examples-index.md +103 -0
  8. package/docs/repository-statistics.md +101 -0
  9. package/docs/theoretical-foundations.md +188 -0
  10. package/environment.yaml +106 -0
  11. package/examples/community/arctic_tundra_vegetation_example.md +247 -0
  12. package/examples/community/bird_landuse_example.md +63 -0
  13. package/examples/community/phytoplankton_reservoir_example.md +60 -0
  14. package/examples/community/reef_fish_indopacific_example.md +221 -0
  15. package/examples/impact/baci_road_example.md +57 -0
  16. package/examples/impact/ecosystem_services_atlantic_forest.md +83 -0
  17. package/examples/impact/forest_loss_borneo_timeseries_example.md +225 -0
  18. package/examples/occupancy/puma_camera_example.md +61 -0
  19. package/examples/occupancy/snow_leopard_himalayas_example.md +204 -0
  20. package/examples/reproducible/whittaker_biome_sdm_example.md +406 -0
  21. package/examples/sdm/anteater_cerrado_example.md +69 -0
  22. package/examples/sdm/jaguar_amazon_example.md +80 -0
  23. package/examples/sdm/koala_climate_change_example.md +170 -0
  24. package/examples/sdm/wolf_recolonization_europe_example.md +193 -0
  25. package/package.json +43 -0
  26. package/renv.lock +194 -0
  27. package/skills/SKILL_INDEX.json +1020 -0
  28. package/skills/acoustic-monitoring/SKILL.md +163 -0
  29. package/skills/acoustic-monitoring/examples/example-prompts.md +100 -0
  30. package/skills/acoustic-monitoring/examples/temperate_forest_birds_example.md +285 -0
  31. package/skills/acoustic-monitoring/resources/acoustic-indices-reference.md +93 -0
  32. package/skills/acoustic-monitoring/resources/soundscape-ecology-guide.md +90 -0
  33. package/skills/acoustic-monitoring/resources/species-id-tools-comparison.md +89 -0
  34. package/skills/acoustic-monitoring/scripts/batch_species_detection.py +360 -0
  35. package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.R +235 -0
  36. package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.py +374 -0
  37. package/skills/biostatistics-workbench/SKILL.md +140 -0
  38. package/skills/biostatistics-workbench/examples/example-prompts.md +39 -0
  39. package/skills/biostatistics-workbench/resources/effect-size-reference.md +81 -0
  40. package/skills/biostatistics-workbench/resources/glm-family-link-reference.md +47 -0
  41. package/skills/biostatistics-workbench/resources/test-selection-guide.md +93 -0
  42. package/skills/biostatistics-workbench/scripts/glm_pipeline.R +78 -0
  43. package/skills/biostatistics-workbench/scripts/glm_pipeline.py +210 -0
  44. package/skills/camera-trap-processing/SKILL.md +159 -0
  45. package/skills/camera-trap-processing/examples/example-prompts.md +103 -0
  46. package/skills/camera-trap-processing/examples/leopard_serengeti_example.md +231 -0
  47. package/skills/camera-trap-processing/resources/activity-patterns-reference.md +113 -0
  48. package/skills/camera-trap-processing/resources/camtrapR-workflow-guide.md +130 -0
  49. package/skills/camera-trap-processing/resources/detection-event-definition-guide.md +89 -0
  50. package/skills/camera-trap-processing/scripts/estimate_activity.R +169 -0
  51. package/skills/camera-trap-processing/scripts/process_camtrap_data.R +179 -0
  52. package/skills/camera-trap-processing/scripts/process_camtrap_data.py +192 -0
  53. package/skills/community-ecology-ordination/SKILL.md +133 -0
  54. package/skills/community-ecology-ordination/examples/example-prompts.md +35 -0
  55. package/skills/community-ecology-ordination/resources/dissimilarity-metric-guide.md +53 -0
  56. package/skills/community-ecology-ordination/resources/nmds-interpretation-guide.md +104 -0
  57. package/skills/community-ecology-ordination/scripts/__pycache__/community_analysis.cpython-311.pyc +0 -0
  58. package/skills/community-ecology-ordination/scripts/community_analysis.R +143 -0
  59. package/skills/community-ecology-ordination/scripts/community_analysis.py +231 -0
  60. package/skills/ecological-data-foundation/SKILL.md +129 -0
  61. package/skills/ecological-data-foundation/examples/example-prompts.md +40 -0
  62. package/skills/ecological-data-foundation/resources/coordinate-cleaning-flags.md +66 -0
  63. package/skills/ecological-data-foundation/resources/darwin-core-glossary.md +91 -0
  64. package/skills/ecological-data-foundation/resources/data-citation-guide.md +265 -0
  65. package/skills/ecological-data-foundation/resources/gbif-data-citation-guide.md +193 -0
  66. package/skills/ecological-data-foundation/resources/qa-checklist.md +83 -0
  67. package/skills/ecological-data-foundation/scripts/__pycache__/clean_occurrences.cpython-311.pyc +0 -0
  68. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_ebird.cpython-311.pyc +0 -0
  69. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_inat.cpython-311.pyc +0 -0
  70. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_iucn.cpython-311.pyc +0 -0
  71. package/skills/ecological-data-foundation/scripts/__pycache__/download_from_obis.cpython-311.pyc +0 -0
  72. package/skills/ecological-data-foundation/scripts/clean_occurrences.R +230 -0
  73. package/skills/ecological-data-foundation/scripts/clean_occurrences.py +268 -0
  74. package/skills/ecological-data-foundation/scripts/download_from_ebird.R +251 -0
  75. package/skills/ecological-data-foundation/scripts/download_from_ebird.py +364 -0
  76. package/skills/ecological-data-foundation/scripts/download_from_gbif.R +315 -0
  77. package/skills/ecological-data-foundation/scripts/download_from_gbif.py +407 -0
  78. package/skills/ecological-data-foundation/scripts/download_from_inat.R +238 -0
  79. package/skills/ecological-data-foundation/scripts/download_from_inat.py +304 -0
  80. package/skills/ecological-data-foundation/scripts/download_from_iucn.R +273 -0
  81. package/skills/ecological-data-foundation/scripts/download_from_iucn.py +344 -0
  82. package/skills/ecological-data-foundation/scripts/download_from_obis.R +248 -0
  83. package/skills/ecological-data-foundation/scripts/download_from_obis.py +318 -0
  84. package/skills/ecological-impact-assessment/SKILL.md +123 -0
  85. package/skills/ecological-impact-assessment/examples/example-prompts.md +32 -0
  86. package/skills/ecological-impact-assessment/resources/baci-design-guide.md +55 -0
  87. package/skills/ecological-impact-assessment/resources/fragmentation-metrics-reference.md +86 -0
  88. package/skills/ecological-impact-assessment/resources/pressure-index-template.md +78 -0
  89. package/skills/ecological-impact-assessment/resources/study-design-guide.md +168 -0
  90. package/skills/ecological-impact-assessment/scripts/baci_analysis.R +161 -0
  91. package/skills/ecological-impact-assessment/scripts/fragmentation_analysis.py +141 -0
  92. package/skills/ecological-impact-assessment/scripts/power_analysis_baci.R +274 -0
  93. package/skills/ecosystem-services-assessment/SKILL.md +125 -0
  94. package/skills/ecosystem-services-assessment/examples/example-prompts.md +24 -0
  95. package/skills/ecosystem-services-assessment/resources/es-indicator-reference.md +45 -0
  96. package/skills/ecosystem-services-assessment/resources/invest-parameter-guide.md +86 -0
  97. package/skills/ecosystem-services-assessment/resources/rusle-coefficients.md +88 -0
  98. package/skills/ecosystem-services-assessment/scripts/__pycache__/compute_es.cpython-311.pyc +0 -0
  99. package/skills/ecosystem-services-assessment/scripts/compute_es.py +189 -0
  100. package/skills/ecosystem-services-assessment/scripts/tradeoff_analysis.R +161 -0
  101. package/skills/environmental-time-series/SKILL.md +125 -0
  102. package/skills/environmental-time-series/examples/example-prompts.md +33 -0
  103. package/skills/environmental-time-series/resources/anomaly-indices-reference.md +88 -0
  104. package/skills/environmental-time-series/resources/bfast-parameter-guide.md +69 -0
  105. package/skills/environmental-time-series/scripts/__pycache__/recovery_trajectory.cpython-311.pyc +0 -0
  106. package/skills/environmental-time-series/scripts/__pycache__/trend_analysis.cpython-311.pyc +0 -0
  107. package/skills/environmental-time-series/scripts/recovery_trajectory.R +305 -0
  108. package/skills/environmental-time-series/scripts/recovery_trajectory.py +178 -0
  109. package/skills/environmental-time-series/scripts/trend_analysis.R +192 -0
  110. package/skills/environmental-time-series/scripts/trend_analysis.py +184 -0
  111. package/skills/geoprocessing-for-ecology/SKILL.md +123 -0
  112. package/skills/geoprocessing-for-ecology/examples/example-prompts.md +32 -0
  113. package/skills/geoprocessing-for-ecology/resources/crs-reference.md +62 -0
  114. package/skills/geoprocessing-for-ecology/resources/global-predictor-sources.md +331 -0
  115. package/skills/geoprocessing-for-ecology/resources/resampling-methods.md +57 -0
  116. package/skills/geoprocessing-for-ecology/scripts/__pycache__/download_predictors.cpython-311.pyc +0 -0
  117. package/skills/geoprocessing-for-ecology/scripts/download_predictors.R +239 -0
  118. package/skills/geoprocessing-for-ecology/scripts/download_predictors.py +379 -0
  119. package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.R +224 -0
  120. package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.py +172 -0
  121. package/skills/landscape-connectivity/SKILL.md +170 -0
  122. package/skills/landscape-connectivity/examples/example-prompts.md +96 -0
  123. package/skills/landscape-connectivity/examples/jaguar_mesoamerica_corridor_example.md +271 -0
  124. package/skills/landscape-connectivity/resources/circuitscape-parameter-guide.md +155 -0
  125. package/skills/landscape-connectivity/resources/graph-theory-for-ecology.md +134 -0
  126. package/skills/landscape-connectivity/resources/resistance-surface-guide.md +141 -0
  127. package/skills/landscape-connectivity/scripts/connectivity_analysis.py +387 -0
  128. package/skills/landscape-connectivity/scripts/connectivity_metrics.R +274 -0
  129. package/skills/landscape-connectivity/scripts/resistance_surface.R +239 -0
  130. package/skills/model-validation-and-uncertainty/SKILL.md +131 -0
  131. package/skills/model-validation-and-uncertainty/examples/example-prompts.md +30 -0
  132. package/skills/model-validation-and-uncertainty/resources/extrapolation-risk-guide.md +236 -0
  133. package/skills/model-validation-and-uncertainty/resources/metric-selection-guide.md +52 -0
  134. package/skills/model-validation-and-uncertainty/resources/threshold-selection-guide.md +64 -0
  135. package/skills/model-validation-and-uncertainty/scripts/__pycache__/validate_model.cpython-311.pyc +0 -0
  136. package/skills/model-validation-and-uncertainty/scripts/extrapolation_risk.R +315 -0
  137. package/skills/model-validation-and-uncertainty/scripts/validate_model.py +226 -0
  138. package/skills/model-validation-and-uncertainty/scripts/validate_sdm.R +162 -0
  139. package/skills/occupancy-and-detection/SKILL.md +126 -0
  140. package/skills/occupancy-and-detection/examples/example-prompts.md +33 -0
  141. package/skills/occupancy-and-detection/resources/detection-history-format.md +100 -0
  142. package/skills/occupancy-and-detection/resources/occupancy-study-design.md +47 -0
  143. package/skills/occupancy-and-detection/scripts/__pycache__/occupancy_analysis.cpython-311.pyc +0 -0
  144. package/skills/occupancy-and-detection/scripts/occupancy_analysis.R +160 -0
  145. package/skills/occupancy-and-detection/scripts/occupancy_analysis.py +159 -0
  146. package/skills/population-viability-analysis/SKILL.md +161 -0
  147. package/skills/population-viability-analysis/examples/african_elephant_pva_example.md +266 -0
  148. package/skills/population-viability-analysis/examples/example-prompts.md +95 -0
  149. package/skills/population-viability-analysis/resources/extinction-risk-thresholds.md +128 -0
  150. package/skills/population-viability-analysis/resources/matrix-model-guide.md +139 -0
  151. package/skills/population-viability-analysis/resources/sensitivity-elasticity-reference.md +182 -0
  152. package/skills/population-viability-analysis/scripts/matrix_pva.R +258 -0
  153. package/skills/population-viability-analysis/scripts/pva_analysis.py +442 -0
  154. package/skills/population-viability-analysis/scripts/stochastic_pva.R +353 -0
  155. package/skills/predictive-modeling-best-practices/SKILL.md +136 -0
  156. package/skills/predictive-modeling-best-practices/examples/example-prompts.md +58 -0
  157. package/skills/predictive-modeling-best-practices/resources/collinearity-decision-tree.md +65 -0
  158. package/skills/predictive-modeling-best-practices/resources/sampling-bias-correction.md +267 -0
  159. package/skills/predictive-modeling-best-practices/resources/spatial-cv-guide.md +73 -0
  160. package/skills/predictive-modeling-best-practices/scripts/__pycache__/spatial_cv.cpython-311.pyc +0 -0
  161. package/skills/predictive-modeling-best-practices/scripts/collinearity_check.R +112 -0
  162. package/skills/predictive-modeling-best-practices/scripts/spatial_cv.py +182 -0
  163. package/skills/reproducible-ecology-pipeline/SKILL.md +139 -0
  164. package/skills/reproducible-ecology-pipeline/examples/example-prompts.md +35 -0
  165. package/skills/reproducible-ecology-pipeline/resources/directory-structure-template.md +94 -0
  166. package/skills/reproducible-ecology-pipeline/resources/params-yaml-template.yaml +84 -0
  167. package/skills/reproducible-ecology-pipeline/resources/reproducibility-checklist-template.md +66 -0
  168. package/skills/reproducible-ecology-pipeline/scripts/generate_file_manifest.py +110 -0
  169. package/skills/reproducible-ecology-pipeline/scripts/init_project.sh +53 -0
  170. package/skills/spatial-prioritization/SKILL.md +162 -0
  171. package/skills/spatial-prioritization/examples/biodiversity_hotspot_prioritization_example.md +289 -0
  172. package/skills/spatial-prioritization/examples/example-prompts.md +93 -0
  173. package/skills/spatial-prioritization/resources/cost-surface-reference.md +130 -0
  174. package/skills/spatial-prioritization/resources/marxan-vs-prioritizr-comparison.md +125 -0
  175. package/skills/spatial-prioritization/resources/prioritizr-formulation-guide.md +188 -0
  176. package/skills/spatial-prioritization/resources/representation-targets-guide.md +186 -0
  177. package/skills/spatial-prioritization/scripts/prioritization_sensitivity.R +320 -0
  178. package/skills/spatial-prioritization/scripts/run_prioritization.R +336 -0
  179. package/skills/species-distribution-modeling/SKILL.md +139 -0
  180. package/skills/species-distribution-modeling/examples/example-prompts.md +36 -0
  181. package/skills/species-distribution-modeling/resources/algorithm-comparison.md +25 -0
  182. package/skills/species-distribution-modeling/resources/calibration-area-guide.md +71 -0
  183. package/skills/species-distribution-modeling/resources/climate-scenario-preparation.md +170 -0
  184. package/skills/species-distribution-modeling/resources/maxent-calibration-guide.md +211 -0
  185. package/skills/species-distribution-modeling/resources/sdm-checklist.md +37 -0
  186. package/skills/species-distribution-modeling/scripts/predict_distribution.R +236 -0
  187. package/skills/species-distribution-modeling/scripts/predict_distribution.py +286 -0
  188. package/skills/species-distribution-modeling/scripts/prepare_future_layers.R +351 -0
  189. package/skills/species-distribution-modeling/scripts/project_scenarios.R +220 -0
  190. package/skills/species-distribution-modeling/scripts/run_ensemble_sdm.R +99 -0
  191. package/skills/species-distribution-modeling/scripts/sdm_pipeline.py +318 -0
  192. package/skills/species-distribution-modeling/scripts/tune_maxnet.R +344 -0
  193. package/templates/SKILL_TEMPLATE.md +225 -0
  194. package/templates/checklists/data-submission-checklist.md +38 -0
  195. package/templates/checklists/post-analysis-checklist.md +55 -0
  196. package/templates/checklists/pre-analysis-checklist.md +31 -0
  197. package/templates/prompts/debug-skill.md +47 -0
  198. package/templates/prompts/invoke-skill.md +34 -0
  199. package/templates/prompts/invoke-workflow.md +45 -0
  200. package/templates/reports/technical-report-template.md +80 -0
  201. package/templates/scripts/logger_setup.R +79 -0
  202. package/templates/scripts/logger_setup.py +119 -0
  203. package/templates/scripts/params_loader.R +28 -0
  204. package/templates/scripts/params_loader.py +38 -0
  205. package/workflows/analyze-community-structure/WORKFLOW.md +72 -0
  206. package/workflows/analyze-environmental-change/WORKFLOW.md +73 -0
  207. package/workflows/assess-ecological-impact/WORKFLOW.md +75 -0
  208. package/workflows/assess-ecosystem-services/WORKFLOW.md +68 -0
  209. package/workflows/assess-landscape-connectivity/WORKFLOW.md +84 -0
  210. package/workflows/build-fire-risk-map/WORKFLOW.md +79 -0
  211. package/workflows/produce-technical-report/WORKFLOW.md +113 -0
  212. package/workflows/run-camera-trap-occupancy/WORKFLOW.md +87 -0
  213. package/workflows/run-conservation-prioritization/WORKFLOW.md +89 -0
  214. package/workflows/run-multispecies-screening/WORKFLOW.md +197 -0
  215. package/workflows/run-occupancy-analysis/WORKFLOW.md +74 -0
  216. package/workflows/run-population-viability/WORKFLOW.md +90 -0
  217. package/workflows/run-sdm-study/WORKFLOW.md +99 -0
@@ -0,0 +1,129 @@
1
+ ---
2
+ name: ecological-data-foundation
3
+ description: "Cleans, validates, and standardizes ecological occurrence records and downloads biodiversity data from global repositories. Use this skill when the user needs data cleaning, coordinate validation, duplicate removal, outlier detection, taxonomic harmonization, or downloads from GBIF, iNaturalist, eBird, OBIS, or IUCN Red List. Also triggers for Darwin Core formatting, data quality reports, flagged records, and biodiversity data preparation."
4
+ skill_version: 1.0.0
5
+ ---
6
+
7
+ # Skill: ecological-data-foundation
8
+
9
+ **Domain:** Data ingestion · QA · Schema · Metadata
10
+ **Phase:** 1 — Foundation
11
+ **Used by:** All workflows
12
+
13
+ ---
14
+
15
+ ## Purpose
16
+
17
+ This skill guides the agent through the first mandatory step of any quantitative ecology project: getting raw data into a clean, well-documented, analysis-ready state. It covers ingestion of heterogeneous sources, structural validation, duplicate and outlier detection, schema standardisation, and metadata generation.
18
+
19
+ ---
20
+
21
+ ## When to Invoke
22
+
23
+ - A new dataset arrives (CSV, XLSX, GDB, Shapefile, database export, API pull)
24
+ - Before any spatial operation, statistical test, or model fitting
25
+ - When the user asks to "clean", "validate", "check", or "prepare" ecological data
26
+ - When merging datasets from different sources or institutions
27
+
28
+ ---
29
+
30
+ ## Inputs
31
+
32
+ | Input | Format | Required |
33
+ |-------|--------|----------|
34
+ | Raw occurrence or survey data | CSV, XLSX, TSV, GDB | Yes |
35
+ | Data dictionary or field protocol | PDF, DOCX, TXT | Recommended |
36
+ | Environmental layers (if applicable) | GeoTIFF, NetCDF | Optional |
37
+ | Existing metadata record | EML, JSON, YAML | Optional |
38
+
39
+ ---
40
+
41
+ ## Outputs
42
+
43
+ | Output | Description |
44
+ |--------|-------------|
45
+ | `data_clean.csv` | Validated, deduplicated, standardised dataset |
46
+ | `qa_report.md` | Summary of issues found and actions taken |
47
+ | `schema.yaml` | Field definitions, types, valid ranges, units |
48
+ | `metadata.xml` | EML or Dublin Core metadata record |
49
+ | `flagged_records.csv` | Records removed or flagged, with reason codes |
50
+
51
+ ---
52
+
53
+ ## Steps
54
+
55
+ ### 1. Ingest and Inspect
56
+ - Load all source files; report file sizes, row counts, column names, and data types
57
+ - Identify encoding issues, BOM characters, delimiter inconsistencies
58
+ - Document source provenance (institution, date, version, license)
59
+
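The inspection step above can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's own script: the function name `inspect_delimited_bytes`, the report keys, and the candidate delimiter set are assumptions made for the example.

```python
import codecs
import csv


def inspect_delimited_bytes(raw, candidate_delimiters=",;\t|"):
    """Summarise encoding and structure of a delimited text file given as bytes."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    # "utf-8-sig" drops a leading BOM if present; undecodable bytes become U+FFFD.
    text = raw.decode("utf-8-sig", errors="replace")
    # Guess the delimiter from the first few kilobytes (hypothetical sample size).
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidate_delimiters)
    rows = list(csv.reader(text.splitlines(), dialect))
    return {
        "has_bom": has_bom,
        "undecodable_chars": text.count("\ufffd"),
        "delimiter": dialect.delimiter,
        "columns": rows[0],
        "row_count": len(rows) - 1,  # exclude the header row
    }


# Illustrative input: a semicolon-delimited export with a UTF-8 BOM.
sample = codecs.BOM_UTF8 + (
    "species;decimalLatitude;decimalLongitude\n"
    "Panthera onca;-15.7801;-47.9292\n"
    "Puma concolor;-10.3330;-53.2000\n"
).encode("utf-8")
report = inspect_delimited_bytes(sample)
```

In practice the bytes would come from `open(path, "rb").read()`, and the resulting report could feed the ingestion section of `qa_report.md`.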
60
+ ### 2. Schema Standardisation
61
+ - Map field names to a canonical schema (Darwin Core preferred for biodiversity)
62
+ - Enforce consistent data types (dates as ISO-8601, coordinates as decimal degrees WGS84)
63
+ - Flag or convert non-standard units
64
+
65
+ ### 3. Duplicate Detection
66
+ - Identify exact duplicates (all fields identical)
67
+ - Identify spatial-temporal near-duplicates (same species, same coordinates, same date ± N days)
68
+ - Document resolution strategy (keep first, keep most complete, remove all)
69
+
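A minimal sketch of the near-duplicate rule described above (same species, same coordinates, same date ± N days). The record keys (`species`, `lat`, `lon`, `date`), the rounding precision, and the function name are assumptions for illustration; the resolution strategy is deliberately left to the caller.

```python
from datetime import date


def find_near_duplicates(records, max_days=1, decimals=4):
    """Return index pairs (i, j) of records that share species and rounded
    coordinates and whose dates differ by at most max_days."""
    groups = {}
    for idx, rec in enumerate(records):
        key = (rec["species"], round(rec["lat"], decimals), round(rec["lon"], decimals))
        groups.setdefault(key, []).append(idx)

    pairs = []
    for indices in groups.values():
        # Compare every pair inside a species/coordinate group.
        for a in range(len(indices)):
            for b in range(a + 1, len(indices)):
                i, j = indices[a], indices[b]
                if abs((records[i]["date"] - records[j]["date"]).days) <= max_days:
                    pairs.append((i, j))
    return pairs


records = [
    {"species": "Panthera onca", "lat": -15.7801, "lon": -47.9292, "date": date(2023, 7, 15)},
    {"species": "Panthera onca", "lat": -15.7801, "lon": -47.9292, "date": date(2023, 7, 16)},
    {"species": "Panthera onca", "lat": -15.7801, "lon": -47.9292, "date": date(2023, 8, 30)},
    {"species": "Puma concolor", "lat": -15.7801, "lon": -47.9292, "date": date(2023, 7, 15)},
]
near_duplicates = find_near_duplicates(records, max_days=1)
```

Returning pairs rather than dropping rows keeps the decision (keep first, keep most complete, remove all) separate from detection, so the chosen strategy can be documented on its own.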
70
+ ### 4. Coordinate and Spatial QA
71
+ - Check coordinate ranges (latitude: −90 to 90; longitude: −180 to 180)
72
+ - Flag records with coordinates at country centroids, capital cities, or institution headquarters (likely georeferencing errors)
73
+ - Flag records with zero coordinates (0,0)
74
+ - Validate records fall within the stated country/region polygon
75
+
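The range and zero-coordinate checks above reduce to a few comparisons. The sketch below is illustrative only; the reason codes (`MISSING`, `OUT_OF_RANGE`, `ZERO_COORDINATES`) are hypothetical labels, not codes defined by this skill.

```python
def coordinate_flags(lat, lon):
    """Return a list of reason codes for one coordinate pair (empty if clean)."""
    if lat is None or lon is None:
        return ["MISSING"]
    flags = []
    # Valid ranges: latitude -90 to 90, longitude -180 to 180.
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        flags.append("OUT_OF_RANGE")
    # (0, 0) is a common placeholder for missing coordinates.
    if lat == 0 and lon == 0:
        flags.append("ZERO_COORDINATES")
    return flags
```

Centroid, capital-city, and country-polygon checks need reference geometries and are not covered by this sketch.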
76
+ ### 5. Taxonomic QA
77
+ - Check species names against a reference taxonomy (GBIF Backbone, Catalogue of Life)
78
+ - Flag synonyms, misspellings, and higher-rank-only identifications
79
+ - Resolve to accepted name + authorship
80
+
81
+ ### 6. Temporal QA
82
+ - Check for dates in the future or before plausible survey era
83
+ - Flag records with only year-level precision when finer precision is needed
84
+
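A minimal sketch of the temporal checks above, assuming `eventDate` holds a single ISO-8601 value (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`). The reason codes and the `earliest_year` default are assumptions chosen for the example; date ranges are not handled.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def temporal_flags(event_date, today, earliest_year=1900):
    """Return reason codes for one ISO-8601 date string (empty list if clean)."""
    match = ISO_DATE.fullmatch(event_date.strip())
    if not match:
        return ["UNPARSEABLE_DATE"]
    year, month, day = match.groups()
    flags = []
    if month is None:
        flags.append("YEAR_ONLY")  # only year-level precision available
    try:
        # Missing month or day defaults to 1 for comparison purposes only.
        parsed = date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return ["UNPARSEABLE_DATE"]
    if parsed > today:
        flags.append("FUTURE_DATE")
    if parsed.year < earliest_year:
        flags.append("BEFORE_SURVEY_ERA")
    return flags
```

Passing `today` explicitly keeps the check reproducible: re-running the QA later with the recorded run date gives the same flags.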
85
+ ### 7. Attribute QA
86
+ - Check numeric fields for out-of-range values (e.g., abundance < 0, DBH > 500 cm)
87
+ - Check categorical fields for invalid entries
88
+ - Assess missing value rates per field; flag fields above threshold (default 20%)
89
+
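The missing-value assessment above can be sketched as follows. The set of tokens treated as missing and the function names are assumptions; "above threshold" is read here as strictly greater than the 20% default.

```python
def missing_value_rates(rows, missing_tokens=("", "NA", "NaN", "null")):
    """Return {field: fraction of rows where the field is missing}."""
    if not rows:
        return {}
    rates = {}
    for field in rows[0].keys():  # assumes every row shares the same fields
        missing = sum(
            1
            for row in rows
            if row.get(field) is None or str(row.get(field)).strip() in missing_tokens
        )
        rates[field] = missing / len(rows)
    return rates


def fields_above_threshold(rates, threshold=0.20):
    """Return the fields whose missing rate strictly exceeds the threshold."""
    return sorted(field for field, rate in rates.items() if rate > threshold)


rows = [
    {"scientificName": "Panthera onca", "individualCount": "3", "sex": ""},
    {"scientificName": "Panthera onca", "individualCount": "NA", "sex": ""},
    {"scientificName": "Puma concolor", "individualCount": "1", "sex": "female"},
    {"scientificName": "Puma concolor", "individualCount": "2", "sex": "NA"},
    {"scientificName": "Puma concolor", "individualCount": "1", "sex": "male"},
]
rates = missing_value_rates(rows)
flagged_fields = fields_above_threshold(rates)
```

Here `individualCount` is missing in exactly one of five rows, which sits on the threshold and is therefore not flagged; `sex` is missing in three of five and is.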
90
+ ### 8. Generate Outputs
91
+ - Write `data_clean.csv` with action codes in an appended `QA_status` column
92
+ - Write `qa_report.md` summarising each issue category, count, and resolution
93
+ - Write `schema.yaml` with field definitions
94
+ - Write `metadata.xml` in EML format
95
+
96
+ ---
97
+
98
+ ## Key Decisions to Document
99
+
100
+ - Duplicate resolution strategy
101
+ - Coordinate uncertainty threshold for exclusion
102
+ - Temporal precision requirement
103
+ - Missing value threshold per field
104
+ - Taxonomy backbone version used
105
+
106
+ ---
107
+
108
+ ## Tools and Libraries
109
+
110
+ **R:** `dplyr`, `readr`, `janitor`, `CoordinateCleaner`, `taxize`, `EML`
111
+ **Python:** `pandas`, `pyjanitor`, `geopy`, `pycountry`, `dwca-reader`
112
+ **CLI:** `csvkit`, `miller (mlr)`
113
+
114
+ ---
115
+
116
+ ## Resources
117
+
118
+ - `resources/darwin-core-glossary.md` — Darwin Core field reference
119
+ - `resources/qa-checklist.md` — printable QA checklist
120
+ - `resources/coordinate-cleaning-flags.md` — flag code reference
121
+ - `examples/` — example prompt invocations
122
+
123
+ ---
124
+
125
+ ## Notes
126
+
127
+ - Always preserve the original raw file; never overwrite it
128
+ - Record the exact software versions used for reproducibility
129
+ - Large raster datasets should be handled by the `geoprocessing-for-ecology` skill after this step
@@ -0,0 +1,40 @@
1
+ # Example Invocation Prompts — ecological-data-foundation
2
+
3
+ ## Basic Cleaning
4
+
5
+ ```
6
+ Load skill: ecological-data-foundation
7
+ Task: Clean and validate the occurrence dataset at data/raw/occurrences_raw.csv.
8
+ Apply all standard QA checks. The target taxon is mammals. Use GBIF Backbone for taxonomy.
9
+ Output results to data/processed/.
10
+ ```
11
+
12
+ ## Merging Multiple Sources
13
+
14
+ ```
15
+ Load skill: ecological-data-foundation
16
+ Task: I have three occurrence datasets from different institutions:
17
+ - data/raw/mnrj_mammals.csv (MNRJ museum collection export)
18
+ - data/raw/gbif_download.csv (GBIF Darwin Core)
19
+ - data/raw/fieldwork_2023.xlsx (our field data)
20
+ Merge them into a single Darwin Core dataset. Remove duplicates and apply full QA.
21
+ Report how many records each source contributed after cleaning.
22
+ ```
23
+
24
+ ## Targeted Check
25
+
26
+ ```
27
+ Load skill: ecological-data-foundation
28
+ Task: I already cleaned my data but want to run just the coordinate checks.
29
+ File: data/processed/occ_v1.csv. Apply CoordinateCleaner flags and report.
30
+ Do NOT modify the file; just produce a flag report.
31
+ ```
32
+
33
+ ## Schema Validation
34
+
35
+ ```
36
+ Load skill: ecological-data-foundation
37
+ Task: Validate that data/processed/occ_clean.csv conforms to Darwin Core.
38
+ List any fields that are missing, misnamed, or have incorrect data types.
39
+ Generate schema.yaml from the current file.
40
+ ```
@@ -0,0 +1,66 @@
1
+ # Coordinate Cleaning Flag Reference
2
+
3
+ Based on the `CoordinateCleaner` R package flag system.
4
+
5
+ ## Flag Codes and Descriptions
6
+
7
+ | Flag | Description | Default threshold | Action |
8
+ |------|-------------|------------------|--------|
9
+ | `.cap` | Coordinates at capital city | 10 km radius (`capitals_rad = 10000`) | Flag and investigate |
10
+ | `.cen` | Coordinates at country centroid | 1 km radius (`centroids_rad = 1000`) | Flag and investigate |
11
+ | `.gbif` | Coordinates at GBIF headquarters | 1 km radius | Remove |
12
+ | `.inst` | Coordinates at known herbarium/museum | 100 m radius (`inst_rad = 100`) | Flag and investigate |
13
+ | `.sea` | Coordinates in the ocean (for terrestrial taxa) | — | Flag and investigate |
14
+ | `.val` | Coordinates outside valid range | lat [-90,90], lon [-180,180] | Remove |
15
+ | `.zero` | Coordinates exactly at (0, 0) | — | Remove |
16
+ | `.equ` | Identical lat and lon values | — | Flag and investigate |
17
+ | `.dup` | Identical coordinates to another record | — | Flag; keep one |
18
+ | `.env` | Environmental outlier (extreme value in predictor space) | Mahalanobis distance p < 0.025 | Flag and investigate |
19
+ | `.out` | Spatial outlier (geographic distance from main cluster) | 7 MADs from median | Flag and investigate |
20
+
21
+ ## Recommended Workflow (R)
22
+
23
+ ```r
24
+ library(CoordinateCleaner)
25
+
26
+ flags <- clean_coordinates(
27
+ x = occ_df,
28
+ lon = "decimalLongitude",
29
+ lat = "decimalLatitude",
30
+ species = "species",
31
+ tests = c("capitals", "centroids", "equal", "gbif",
32
+ "institutions", "seas", "urban", "validity", "zeros"),
33
+ capitals_rad = 10000, # 10 km radius
34
+ centroids_rad = 1000, # 1 km radius
35
+ seas_ref = buffland # buffered land polygon shipped with CoordinateCleaner (pass the object, not a string)
36
+ )
37
+
38
+ # Inspect flags
39
+ summary(flags)
40
+ occ_clean <- occ_df[flags$.summary, ]
41
+ occ_flagged <- occ_df[!flags$.summary, ]
42
+ ```
43
+
44
+ ## Known Country Centroid Coordinates (South America)
45
+
46
+ | Country | Approx centroid lat | Approx centroid lon |
47
+ |---------|--------------------|--------------------|
48
+ | Brazil | -10.333 | -53.200 |
49
+ | Colombia | 4.099 | -72.888 |
50
+ | Peru | -9.190 | -75.016 |
51
+ | Bolivia | -16.290 | -63.589 |
52
+ | Argentina | -34.000 | -64.000 |
53
+ | Paraguay | -23.442 | -58.444 |
54
+
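For workflows that do not run `CoordinateCleaner`, a centroid proximity check against the table above can be sketched with a haversine distance. This is an illustration under stated assumptions: the 1 km radius mirrors the `centroids_rad = 1000` setting used in the R example, the centroid dictionary copies two rows from the table, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

# Approximate centroids copied from the table above (lat, lon).
CENTROIDS = {
    "Brazil": (-10.333, -53.200),
    "Colombia": (4.099, -72.888),
}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_centroid(lat, lon, radius_m=1000):
    """Return the country whose centroid lies within radius_m, else None."""
    for country, (clat, clon) in CENTROIDS.items():
        if haversine_m(lat, lon, clat, clon) <= radius_m:
            return country
    return None
```

A match is a reason to investigate the record, not proof of a georeferencing error: a genuine observation can fall near a centroid.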
55
+ ## Ocean Check for Terrestrial Taxa
56
+
57
+ Use `cc_sea()` with a buffered land polygon (0.5–1° buffer) to avoid incorrectly flagging coastal records:
58
+
59
+ ```r
60
+ occ_sea_checked <- cc_sea(
61
+ x = occ_df,
62
+ lon = "decimalLongitude",
63
+ lat = "decimalLatitude",
64
+ ref = buffland # load from CoordinateCleaner package
65
+ )
66
+ ```
@@ -0,0 +1,91 @@
1
+ # Darwin Core Field Glossary
2
+
3
+ Essential fields for biodiversity occurrence data following the [Darwin Core standard](https://dwc.tdwg.org/).
4
+
5
+ ## Identity Fields
6
+
7
+ | Field | Type | Description | Example |
8
+ |-------|------|-------------|---------|
9
+ | `occurrenceID` | string | Globally unique identifier for the occurrence | `urn:uuid:a948571f-...` |
10
+ | `catalogNumber` | string | Institution-assigned identifier | `MNRJ-12345` |
11
+ | `recordedBy` | string | Observer name(s) | `"Silva, J.R."` |
12
+ | `recordNumber` | string | Field number assigned by the observer | `JRS-2023-001` |
13
+
14
+ ## Taxonomic Fields
15
+
16
+ | Field | Type | Description | Example |
17
+ |-------|------|-------------|---------|
18
+ | `scientificName` | string | Full scientific name with authorship | `Panthera onca (Linnaeus, 1758)` |
19
+ | `kingdom` | string | | `Animalia` |
20
+ | `phylum` | string | | `Chordata` |
21
+ | `class` | string | | `Mammalia` |
22
+ | `order` | string | | `Carnivora` |
23
+ | `family` | string | | `Felidae` |
24
+ | `genus` | string | | `Panthera` |
25
+ | `specificEpithet` | string | Species epithet only | `onca` |
26
+ | `taxonRank` | string | Lowest rank of the name | `species` |
27
+ | `vernacularName` | string | Common name | `jaguar` |
28
+ | `taxonID` | string | Taxon identifier in a reference system | `gbif:5219243` |
29
+
30
+ ## Occurrence Fields
31
+
32
+ | Field | Type | Description | Example |
33
+ |-------|------|-------------|---------|
34
+ | `basisOfRecord` | string | Nature of the record | `HumanObservation`, `PreservedSpecimen`, `MachineObservation` |
35
+ | `occurrenceStatus` | string | Presence or absence | `present`, `absent` |
36
+ | `individualCount` | integer | Number of individuals | `3` |
37
+ | `sex` | string | | `male`, `female`, `undetermined` |
38
+ | `lifeStage` | string | | `adult`, `juvenile`, `larva` |
39
+ | `behavior` | string | Observed behavior | `foraging` |
40
+
41
+ ## Location Fields
42
+
43
+ | Field | Type | Description | Example |
44
+ |-------|------|-------------|---------|
45
+ | `decimalLatitude` | float | Latitude in decimal degrees (WGS84) | `-15.7801` |
46
+ | `decimalLongitude` | float | Longitude in decimal degrees (WGS84) | `-47.9292` |
47
+ | `geodeticDatum` | string | Datum for coordinates | `WGS84` |
48
+ | `coordinateUncertaintyInMeters` | float | Radius of coordinate uncertainty | `100` |
49
+ | `coordinatePrecision` | float | Precision of the coordinates, as a decimal fraction of a degree | `0.0001` |
50
+ | `countryCode` | string | ISO 3166-1 alpha-2 | `BR` |
51
+ | `stateProvince` | string | State or province | `Mato Grosso` |
52
+ | `county` | string | County or municipality | `Cáceres` |
53
+ | `locality` | string | Specific location description | `"Fazenda São José, margem do rio"` |
54
+ | `verbatimElevation` | string | Original elevation text | `"850-900 m"` |
55
+ | `minimumElevationInMeters` | float | | `850` |
56
+ | `maximumElevationInMeters` | float | | `900` |
57
+
58
+ ## Event Fields
59
+
60
+ | Field | Type | Description | Example |
61
+ |-------|------|-------------|---------|
62
+ | `eventDate` | string | ISO-8601 date or date range | `2023-07-15`, `2023-07/2023-08` |
63
+ | `year` | integer | | `2023` |
64
+ | `month` | integer | | `7` |
65
+ | `day` | integer | | `15` |
66
+ | `eventTime` | string | Time of observation | `14:30:00-03:00` |
67
+ | `samplingProtocol` | string | Method used | `"point count"`, `"camera trap"` |
68
+ | `samplingEffort` | string | Effort description | `"3 nights, 1 trap"` |
69
+
70
+ ## Data Quality Fields
71
+
72
+ | Field | Type | Description |
73
+ |-------|------|-------------|
74
+ | `identificationQualifier` | string | Uncertainty in identification (`cf.`, `aff.`) |
75
+ | `identifiedBy` | string | Who identified the specimen |
76
+ | `dateIdentified` | string | When identification was made |
77
+ | `datasetName` | string | Source dataset name |
78
+ | `institutionCode` | string | Institution acronym |
79
+ | `license` | string | Data license (CC BY 4.0, etc.) |
80
+ | `rightsHolder` | string | Owner of the data rights |
81
+
82
+ ## Common Issues and Fixes
83
+
84
+ | Issue | Detection | Fix |
85
+ |-------|-----------|-----|
86
+ | Coordinates swapped (lat/lon) | `decimalLatitude` outside [-90, 90]; catches only swaps where the absolute longitude exceeds 90 | Swap columns |
87
+ | Comma as decimal separator | `"-15,78"` | Replace `,` → `.` |
88
+ | DMS instead of decimal | `"15°46'48"S"` | Convert to decimal |
89
+ | Missing datum | No `geodeticDatum` | Assume WGS84; flag |
90
+ | Future date | `eventDate` > today | Flag and investigate |
91
+ | Country centroid | Coordinates = known centroid | Flag as `COUNTRY_CENTROID` |
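The first three fixes in the table can be sketched in base R as follows. This is an illustration on toy data, not the package's own cleaning routine: the helper `dms_to_decimal` is hypothetical, and the swap check only catches records whose latitude falls outside [-90, 90].

```r
# Hypothetical helper: parse a DMS string such as 15°46'48"S into decimal degrees.
dms_to_decimal <- function(x) {
  parts <- as.numeric(regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))[[1]])
  parts <- c(parts, 0, 0)[1:3]                       # pad missing minutes/seconds
  value <- parts[1] + parts[2] / 60 + parts[3] / 3600
  if (grepl("[SWsw]\\s*$", x)) -value else value     # south and west are negative
}

# Toy records: row 1 uses a decimal comma, row 2 has latitude and longitude swapped.
occ <- data.frame(
  decimalLatitude  = c("-15,78", "-122.4194", "-15.60"),
  decimalLongitude = c("-47,93", "37.7749", "-56.10"),
  stringsAsFactors = FALSE
)

# Comma as decimal separator: replace "," with "." before converting to numeric.
occ$decimalLatitude  <- as.numeric(sub(",", ".", occ$decimalLatitude,  fixed = TRUE))
occ$decimalLongitude <- as.numeric(sub(",", ".", occ$decimalLongitude, fixed = TRUE))

# Swapped coordinates: latitude out of range while longitude would be a valid latitude.
swapped <- abs(occ$decimalLatitude) > 90 & abs(occ$decimalLongitude) <= 90
tmp <- occ$decimalLatitude[swapped]
occ$decimalLatitude[swapped]  <- occ$decimalLongitude[swapped]
occ$decimalLongitude[swapped] <- tmp

# DMS instead of decimal degrees.
lat_from_dms <- dms_to_decimal("15°46'48\"S")
```

Keeping a logical column such as `swapped` alongside the corrected values makes it possible to review every automatic fix later.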
@@ -0,0 +1,265 @@
1
+ # Data Citation Guide for Ecological Occurrence Data
2
+
3
+ Reference for citing biodiversity data sources correctly in publications, reports, and analytical workflows. Covers citation formats, data-use policies, and licence requirements for the six primary sources supported by this skill library.
4
+
5
+ ---
6
+
7
+ ## 1. GBIF — Global Biodiversity Information Facility
8
+
9
+ **Website**: https://www.gbif.org
10
+ **API**: https://api.gbif.org/v1/
11
+
12
+ ### Citation formats
13
+
14
+ #### `occ_download` / `occ.download` (preferred for publications)
15
+ GBIF issues a citable DOI for every download submitted via `occ_download` (R) or `occ.download` (Python). The DOI resolves to a stable dataset snapshot.
16
+
17
+ ```
18
+ GBIF.org (YEAR) GBIF Occurrence Download.
19
+ https://doi.org/10.15468/dl.XXXXXXX
20
+ Accessed on YYYY-MM-DD.
21
+ ```
22
+
23
+ Example (R):
24
+ ```r
25
+ meta <- rgbif::occ_download_meta(dl_key)  # dl_key: key returned by occ_download()
26
+ doi <- meta$doi
27
+ cat("Citation: GBIF.org (", format(Sys.Date(), "%Y"), ") GBIF Occurrence Download. ",
28
+ "https://doi.org/", doi, " Accessed on ", format(Sys.Date()), ".\n", sep = "")  # format(): cat() prints a bare Date as a number
29
+ ```
30
+
31
+ #### `occ_search` / `occ.search` (no DOI)
32
+ `occ_search` does not generate a persistent DOI, so its results **should not be cited in publications**. Use it for exploratory analysis only, and record the query date and parameters manually.
33
+
34
+ ```
35
+ GBIF.org (YEAR) Occurrence records for [species]. Query via GBIF API (occ_search).
36
+ Accessed on YYYY-MM-DD. [Not citable — re-run with occ_download for publication.]
37
+ ```
38
+
39
+ ### Data use policy
40
+ - Most records: **CC BY 4.0** or **CC0**
41
+ - Some datasets: **CC BY-NC 4.0** (check individual dataset licences at https://www.gbif.org/dataset)
42
+ - Always attribute the original data provider, not only GBIF
43
+ - Follow the GBIF citation guidelines: https://www.gbif.org/citation-guidelines
44
+
45
+ ---
46
+
47
+ ## 2. iNaturalist
48
+
49
+ **Website**: https://www.inaturalist.org
50
+ **API**: https://api.inaturalist.org/v1/
51
+
52
+ ### Citation format
53
+
54
+ iNaturalist does **not** issue download DOIs. Record the access date precisely.
55
+
56
+ ```
57
+ iNaturalist contributors and the California Academy of Sciences (YEAR).
58
+ iNaturalist Research-grade Observations. iNaturalist.org.
59
+ Accessed on YYYY-MM-DD. https://www.inaturalist.org
60
+ ```
61
+
62
+ Or, if re-exported through GBIF:
63
+ ```
64
+ iNaturalist (YEAR) Occurrence records exported via GBIF.
65
+ https://doi.org/10.15468/ab3s5x
66
+ Accessed on YYYY-MM-DD.
67
+ ```
68
+
69
+ ### Quality grades
70
+ | Grade | Meaning | SDM recommendation |
71
+ |-------|---------|-------------------|
72
+ | **Research** | Community ID agreed by ≥2 identifiers; has date, coordinates, and photo or sound; not captive or cultivated | Use by default |
73
+ | **Needs ID** | Single ID or disagreement | Avoid for SDM |
74
+ | **Casual** | Missing fields, captive, or cultivated | Exclude |
75
+
76
+ ### Data use policy
77
+ - Research-grade observations: **CC BY-NC** by default (individual users may choose CC0 or CC BY)
78
+ - Commercial use requires CC0 or CC BY records only
79
+ - Bulk data exports for research: permitted; cite as above
80
+
81
+ ---
82
+
83
+ ## 3. eBird (Cornell Lab of Ornithology)
84
+
85
+ **Website**: https://ebird.org
86
+ **Data portal**: https://ebird.org/data/download
87
+
88
+ ### Citation format
89
+
90
+ eBird data requires a **signed Data Use Agreement**. Access is free but must be requested.
91
+
92
+ ```
93
+ eBird Basic Dataset (EBD). Version: YEAR-MM.
94
+ Cornell Lab of Ornithology, Ithaca, New York.
95
+ https://ebird.org/data/download
96
+ Accessed on YYYY-MM-DD.
97
+ ```
98
+
99
+ ### Data use policy
100
+ - **Non-commercial research** only under the eBird Data Access Agreement
101
+ - Data may not be redistributed or used to create competing products
102
+ - Publications must acknowledge eBird and include the citation above
103
+ - Restricted species data (sensitive species, breeding season) may be withheld
104
+
105
+ ### Filtering recommendations for SDM
106
+ | Filter | Recommended value | Rationale |
107
+ |--------|------------------|-----------|
108
+ | Protocol | Stationary, Traveling | Standardised effort |
109
+ | Approved | TRUE | Quality-reviewed by eBird |
110
+ | Complete checklist | TRUE | Absence implied for non-detected species |
111
+ | Effort distance | ≤5 km | Reduces localisation uncertainty |
112
+ | Duration | 5–300 min | Avoids very short/long checklists |
113
+
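The filters in the table could be applied to a checklist table roughly as follows. This is a sketch on toy data: the column names (`protocol_type`, `approved`, `all_species_reported`, `effort_distance_km`, `duration_minutes`) are assumptions and should be matched to the columns of the eBird export actually in use. Stationary checklists are assumed to have no distance value, so a missing distance is allowed through.

```r
# Toy checklist table; column names are assumptions, not a documented schema.
checklists <- data.frame(
  protocol_type        = c("Stationary", "Traveling", "Traveling", "Incidental", "Traveling"),
  approved             = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  all_species_reported = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  effort_distance_km   = c(NA, 2.5, 12.0, NA, 1.0),
  duration_minutes     = c(30, 60, 90, 10, 45),
  stringsAsFactors     = FALSE
)

keep <- with(checklists,
  protocol_type %in% c("Stationary", "Traveling") &             # standardised protocols
  approved &                                                     # reviewed records only
  all_species_reported &                                         # complete checklists
  (is.na(effort_distance_km) | effort_distance_km <= 5) &        # distance of at most 5 km
  !is.na(duration_minutes) &
  duration_minutes >= 5 & duration_minutes <= 300                # 5 to 300 minutes
)

filtered <- checklists[keep, ]
```

Only the first two toy checklists pass every filter; the others fail on distance, protocol, or approval status.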
114
+ ---
115
+
116
+ ## 4. OBIS — Ocean Biodiversity Information System
117
+
118
+ **Website**: https://obis.org
119
+ **API**: https://api.obis.org/v3/
120
+
121
+ ### Citation format
122
+
123
+ ```
124
+ OBIS (YEAR). Ocean Biodiversity Information System.
125
+ Intergovernmental Oceanographic Commission of UNESCO.
126
+ www.obis.org. Accessed on YYYY-MM-DD.
127
+ ```
128
+
129
+ For specific datasets within OBIS, also cite the original dataset DOI available in each record's `datasetName` and `resourceID` fields.
130
+
131
+ ### Data use policy
132
+ - Licence is set per dataset: **CC0**, **CC BY**, or **CC BY-NC**; check each dataset's metadata before reuse
133
+ - Users encouraged (but not required) to share results with OBIS
134
+ - Do not use OBIS data to identify precise locations of sensitive species (use aggregated records)
135
+
136
+ ### Quality flags to exclude
137
+ | Flag | Meaning |
138
+ |------|---------|
139
+ | `NO_COORD` | Missing coordinates |
140
+ | `ZERO_COORD` | Coordinates are 0,0 |
141
+ | `ON_LAND` | Marine record mapped to land |
142
+ | `DEPTH_EXCEEDS_BATH` | Depth exceeds bathymetry |
143
+ | `COORDINATE_MISMATCH` | Textual and coordinate locations conflict |
144
+
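A sketch of how records carrying any of the flags above might be dropped. It assumes each record has a `flags` column holding a comma-separated list of flag names, and it compares case-insensitively because the casing of flag names in a given export is not confirmed here.

```r
# Flags listed in the table above.
exclude_flags <- c("NO_COORD", "ZERO_COORD", "ON_LAND",
                   "DEPTH_EXCEEDS_BATH", "COORDINATE_MISMATCH")

# Toy records; the `flags` column layout is an assumption.
obis <- data.frame(
  id    = 1:4,
  flags = c(NA, "ON_LAND", "NO_DEPTH", "no_depth,zero_coord"),
  stringsAsFactors = FALSE
)

# TRUE when at least one of a record's flags is in the exclusion list.
has_bad_flag <- vapply(
  strsplit(toupper(obis$flags), ",", fixed = TRUE),
  function(f) any(trimws(f) %in% exclude_flags),
  logical(1)
)

obis_clean <- obis[!has_bad_flag, ]
```

Records with no flags, or only with flags outside the list (such as `NO_DEPTH` in the toy data), are kept.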
145
+ ---
146
+
147
+ ## 5. IUCN Red List
148
+
149
+ **Website**: https://www.iucnredlist.org
150
+ **API**: https://apiv3.iucnredlist.org
151
+ **API key**: Required — apply at https://apiv3.iucnredlist.org/
152
+
153
+ ### Citation format
154
+
155
+ ```
156
+ IUCN (YEAR). The IUCN Red List of Threatened Species. Version YEAR-N.
157
+ https://www.iucnredlist.org. Accessed on YYYY-MM-DD.
158
+ ```
159
+
160
+ For species-specific assessments:
161
+ ```
162
+ [Author(s)] (YEAR). [Species name]. The IUCN Red List of Threatened Species YEAR:
163
+ [Category]. https://dx.doi.org/10.2305/IUCN.UK.[version].RLTS.[TXID].en.
164
+ Accessed on YYYY-MM-DD.
165
+ ```
166
+
167
+ ### Data use policy
168
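+ - Governed by the **IUCN Red List Terms and Conditions of Use** (custom terms, not a Creative Commons licence); commercial use requires prior written permission from IUCN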
169
+ - Distribution maps and species data may not be used to create competing databases
170
+ - API key must not be shared; each user must register independently
171
+ - Sensitive species (CR, EN) distribution data may be partially obscured
172
+
173
+ ### Red List categories
174
+ | Code | Category |
175
+ |------|---------|
176
+ | EX | Extinct |
177
+ | EW | Extinct in the Wild |
178
+ | CR | Critically Endangered |
179
+ | EN | Endangered |
180
+ | VU | Vulnerable |
181
+ | NT | Near Threatened |
182
+ | LC | Least Concern |
183
+ | DD | Data Deficient |
184
+ | NE | Not Evaluated |
185
+
186
+ ---
187
+
188
+ ## 6. WorldClim / CHELSA (Predictor Data)
189
+
190
+ ### WorldClim v2.1
191
+
192
+ ```
193
+ Fick, S.E. & Hijmans, R.J. (2017). WorldClim 2: new 1-km spatial resolution climate surfaces
194
+ for global land areas. International Journal of Climatology 37(12): 4302–4315.
195
+ https://doi.org/10.1002/joc.5086
196
+ ```
197
+
198
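+ **Licence**: custom terms. Free for academic and other non-commercial use; redistribution or commercial use requires prior permission.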
199
+
200
+ ### CHELSA v2.1
201
+
202
+ ```
203
+ Karger, D.N. et al. (2017). Climatologies at high resolution for the earth's land surface areas.
204
+ Scientific Data 4: 170122. https://doi.org/10.1038/sdata.2017.122
205
+ ```
206
+
207
+ **Licence**: CC BY 4.0
208
+
209
+ ---
210
+
211
+ ## Combining Multiple Sources
212
+
213
+ When merging occurrence data from multiple sources, include the `source` and `datasetName` columns in your output so records can be traced back to their origin.
214
+
215
+ ### Recommended merge workflow
216
+ ```r
217
+ library(dplyr); library(readr)  # read_csv() comes from readr, not dplyr
218
+ occ_all <- bind_rows(
219
+ read_csv("output/gbif/occurrences_raw_GBIF_Panthera_onca_20260101.csv"),
220
+ read_csv("output/inat/occurrences_raw_iNat_Panthera_onca_20260101.csv"),
221
+ read_csv("output/obis/occurrences_raw_OBIS_Chelonia_mydas_20260101.csv")
222
+ ) |>
223
+ distinct(species, decimalLatitude, decimalLongitude, eventDate, .keep_all = TRUE)
224
+ ```
225
+
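Binding the files together does not by itself record where each row came from. The base-R sketch below shows one way to add a `source` column before merging, using toy data frames in place of the CSV files; the helper `tag_source` and the example dataset names are hypothetical.

```r
# Hypothetical helper: label every record in a data frame with its source.
tag_source <- function(df, source_name) {
  df$source <- source_name
  df
}

# Toy records standing in for the per-source CSV files.
gbif <- data.frame(
  species          = c("Panthera onca", "Panthera onca"),
  decimalLatitude  = c(-15.78, -3.10),
  decimalLongitude = c(-47.93, -60.02),
  eventDate        = c("2023-07-15", "2022-03-02"),
  datasetName      = "Example GBIF dataset",
  stringsAsFactors = FALSE
)
inat <- data.frame(
  species          = "Panthera onca",
  decimalLatitude  = -15.78,
  decimalLongitude = -47.93,
  eventDate        = "2023-07-15",
  datasetName      = "iNaturalist Research-grade Observations",
  stringsAsFactors = FALSE
)

occ_all <- rbind(tag_source(gbif, "GBIF"), tag_source(inat, "iNaturalist"))

# Same de-duplication key as distinct() in the workflow; the first occurrence wins.
key <- c("species", "decimalLatitude", "decimalLongitude", "eventDate")
occ_unique <- occ_all[!duplicated(occ_all[, key]), ]
```

Because the first occurrence is kept, the order in which sources are bound decides which `source` and `datasetName` survive for a duplicated record, so it is worth placing the preferred source first.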
226
+ ### Master data citation block (for Methods section)
227
+ ```
228
+ Occurrence data were downloaded from GBIF (GBIF.org YEAR, doi:...), iNaturalist
229
+ (iNaturalist contributors and CAS YEAR, accessed YYYY-MM-DD), and OBIS (OBIS YEAR,
230
+ accessed YYYY-MM-DD). Records were merged and spatially thinned to one record per
231
+ [resolution] grid cell. Final dataset: [n] records of [n_species] species, [year_range].
232
+ ```
233
+
234
+ ---
235
+
236
+ ## Licence Compatibility Matrix
237
+
238
+ | Source | Licence | Commercial use | Redistribute | Attribution required |
239
+ |--------|---------|----------------|-------------|----------------------|
240
+ | GBIF (CC0) | CC0 | Yes | Yes | Strongly recommended |
241
+ | GBIF (CC BY) | CC BY 4.0 | Yes | Yes | Yes |
242
+ | GBIF (CC BY-NC) | CC BY-NC 4.0 | **No** | Yes | Yes |
243
+ | iNaturalist | CC BY-NC | **No** | Yes | Yes |
244
+ | eBird | Custom DUA | **No** | **No** | Yes |
245
+ | OBIS | CC0, CC BY, or CC BY-NC (per dataset) | Depends on dataset licence | Yes | Yes for CC BY and CC BY-NC; recommended for CC0 |
246
+ | IUCN | Custom terms of use | **No** (without written permission) | Yes* | Yes |
247
+ | WorldClim | Custom (free for non-commercial use) | **No** (without permission) | **No** (without permission) | Yes |
248
+ | CHELSA | CC BY 4.0 | Yes | Yes | Yes |
249
+
250
+ *IUCN: redistribution of the full database is not permitted; individual species data may be shared with attribution.
251
+
252
+ ---
253
+
254
+ ## Quick Reference: occ_search vs occ_download (GBIF)
255
+
256
+ | Aspect | `occ_search` | `occ_download` |
257
+ |--------|-------------|---------------|
258
+ | Speed | Immediate | Minutes to hours |
259
+ | Record limit | 100,000 | Unlimited |
260
+ | DOI generated | **No** | **Yes** |
261
+ | Reproducible | **No** | **Yes** |
262
+ | Recommended for | Exploration | Publication |
263
+ | Credentials needed | No | Yes (GBIF account) |
264
+
265
+ **Rule of thumb**: Use `occ_search` for initial exploration and data assessment. Switch to `occ_download` before any analysis intended for publication.