ecological-agent-skills 3.1.0
- package/AGENT_CONTEXT.md +191 -0
- package/CATALOG.md +329 -0
- package/LICENSE +692 -0
- package/README.md +347 -0
- package/bin/install.mjs +168 -0
- package/docs/comparison-with-alternatives.md +38 -0
- package/docs/global-examples-index.md +103 -0
- package/docs/repository-statistics.md +101 -0
- package/docs/theoretical-foundations.md +188 -0
- package/environment.yaml +106 -0
- package/examples/community/arctic_tundra_vegetation_example.md +247 -0
- package/examples/community/bird_landuse_example.md +63 -0
- package/examples/community/phytoplankton_reservoir_example.md +60 -0
- package/examples/community/reef_fish_indopacific_example.md +221 -0
- package/examples/impact/baci_road_example.md +57 -0
- package/examples/impact/ecosystem_services_atlantic_forest.md +83 -0
- package/examples/impact/forest_loss_borneo_timeseries_example.md +225 -0
- package/examples/occupancy/puma_camera_example.md +61 -0
- package/examples/occupancy/snow_leopard_himalayas_example.md +204 -0
- package/examples/reproducible/whittaker_biome_sdm_example.md +406 -0
- package/examples/sdm/anteater_cerrado_example.md +69 -0
- package/examples/sdm/jaguar_amazon_example.md +80 -0
- package/examples/sdm/koala_climate_change_example.md +170 -0
- package/examples/sdm/wolf_recolonization_europe_example.md +193 -0
- package/package.json +43 -0
- package/renv.lock +194 -0
- package/skills/SKILL_INDEX.json +1020 -0
- package/skills/acoustic-monitoring/SKILL.md +163 -0
- package/skills/acoustic-monitoring/examples/example-prompts.md +100 -0
- package/skills/acoustic-monitoring/examples/temperate_forest_birds_example.md +285 -0
- package/skills/acoustic-monitoring/resources/acoustic-indices-reference.md +93 -0
- package/skills/acoustic-monitoring/resources/soundscape-ecology-guide.md +90 -0
- package/skills/acoustic-monitoring/resources/species-id-tools-comparison.md +89 -0
- package/skills/acoustic-monitoring/scripts/batch_species_detection.py +360 -0
- package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.R +235 -0
- package/skills/acoustic-monitoring/scripts/compute_acoustic_indices.py +374 -0
- package/skills/biostatistics-workbench/SKILL.md +140 -0
- package/skills/biostatistics-workbench/examples/example-prompts.md +39 -0
- package/skills/biostatistics-workbench/resources/effect-size-reference.md +81 -0
- package/skills/biostatistics-workbench/resources/glm-family-link-reference.md +47 -0
- package/skills/biostatistics-workbench/resources/test-selection-guide.md +93 -0
- package/skills/biostatistics-workbench/scripts/glm_pipeline.R +78 -0
- package/skills/biostatistics-workbench/scripts/glm_pipeline.py +210 -0
- package/skills/camera-trap-processing/SKILL.md +159 -0
- package/skills/camera-trap-processing/examples/example-prompts.md +103 -0
- package/skills/camera-trap-processing/examples/leopard_serengeti_example.md +231 -0
- package/skills/camera-trap-processing/resources/activity-patterns-reference.md +113 -0
- package/skills/camera-trap-processing/resources/camtrapR-workflow-guide.md +130 -0
- package/skills/camera-trap-processing/resources/detection-event-definition-guide.md +89 -0
- package/skills/camera-trap-processing/scripts/estimate_activity.R +169 -0
- package/skills/camera-trap-processing/scripts/process_camtrap_data.R +179 -0
- package/skills/camera-trap-processing/scripts/process_camtrap_data.py +192 -0
- package/skills/community-ecology-ordination/SKILL.md +133 -0
- package/skills/community-ecology-ordination/examples/example-prompts.md +35 -0
- package/skills/community-ecology-ordination/resources/dissimilarity-metric-guide.md +53 -0
- package/skills/community-ecology-ordination/resources/nmds-interpretation-guide.md +104 -0
- package/skills/community-ecology-ordination/scripts/__pycache__/community_analysis.cpython-311.pyc +0 -0
- package/skills/community-ecology-ordination/scripts/community_analysis.R +143 -0
- package/skills/community-ecology-ordination/scripts/community_analysis.py +231 -0
- package/skills/ecological-data-foundation/SKILL.md +129 -0
- package/skills/ecological-data-foundation/examples/example-prompts.md +40 -0
- package/skills/ecological-data-foundation/resources/coordinate-cleaning-flags.md +66 -0
- package/skills/ecological-data-foundation/resources/darwin-core-glossary.md +91 -0
- package/skills/ecological-data-foundation/resources/data-citation-guide.md +265 -0
- package/skills/ecological-data-foundation/resources/gbif-data-citation-guide.md +193 -0
- package/skills/ecological-data-foundation/resources/qa-checklist.md +83 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/clean_occurrences.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_ebird.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_inat.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_iucn.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/__pycache__/download_from_obis.cpython-311.pyc +0 -0
- package/skills/ecological-data-foundation/scripts/clean_occurrences.R +230 -0
- package/skills/ecological-data-foundation/scripts/clean_occurrences.py +268 -0
- package/skills/ecological-data-foundation/scripts/download_from_ebird.R +251 -0
- package/skills/ecological-data-foundation/scripts/download_from_ebird.py +364 -0
- package/skills/ecological-data-foundation/scripts/download_from_gbif.R +315 -0
- package/skills/ecological-data-foundation/scripts/download_from_gbif.py +407 -0
- package/skills/ecological-data-foundation/scripts/download_from_inat.R +238 -0
- package/skills/ecological-data-foundation/scripts/download_from_inat.py +304 -0
- package/skills/ecological-data-foundation/scripts/download_from_iucn.R +273 -0
- package/skills/ecological-data-foundation/scripts/download_from_iucn.py +344 -0
- package/skills/ecological-data-foundation/scripts/download_from_obis.R +248 -0
- package/skills/ecological-data-foundation/scripts/download_from_obis.py +318 -0
- package/skills/ecological-impact-assessment/SKILL.md +123 -0
- package/skills/ecological-impact-assessment/examples/example-prompts.md +32 -0
- package/skills/ecological-impact-assessment/resources/baci-design-guide.md +55 -0
- package/skills/ecological-impact-assessment/resources/fragmentation-metrics-reference.md +86 -0
- package/skills/ecological-impact-assessment/resources/pressure-index-template.md +78 -0
- package/skills/ecological-impact-assessment/resources/study-design-guide.md +168 -0
- package/skills/ecological-impact-assessment/scripts/baci_analysis.R +161 -0
- package/skills/ecological-impact-assessment/scripts/fragmentation_analysis.py +141 -0
- package/skills/ecological-impact-assessment/scripts/power_analysis_baci.R +274 -0
- package/skills/ecosystem-services-assessment/SKILL.md +125 -0
- package/skills/ecosystem-services-assessment/examples/example-prompts.md +24 -0
- package/skills/ecosystem-services-assessment/resources/es-indicator-reference.md +45 -0
- package/skills/ecosystem-services-assessment/resources/invest-parameter-guide.md +86 -0
- package/skills/ecosystem-services-assessment/resources/rusle-coefficients.md +88 -0
- package/skills/ecosystem-services-assessment/scripts/__pycache__/compute_es.cpython-311.pyc +0 -0
- package/skills/ecosystem-services-assessment/scripts/compute_es.py +189 -0
- package/skills/ecosystem-services-assessment/scripts/tradeoff_analysis.R +161 -0
- package/skills/environmental-time-series/SKILL.md +125 -0
- package/skills/environmental-time-series/examples/example-prompts.md +33 -0
- package/skills/environmental-time-series/resources/anomaly-indices-reference.md +88 -0
- package/skills/environmental-time-series/resources/bfast-parameter-guide.md +69 -0
- package/skills/environmental-time-series/scripts/__pycache__/recovery_trajectory.cpython-311.pyc +0 -0
- package/skills/environmental-time-series/scripts/__pycache__/trend_analysis.cpython-311.pyc +0 -0
- package/skills/environmental-time-series/scripts/recovery_trajectory.R +305 -0
- package/skills/environmental-time-series/scripts/recovery_trajectory.py +178 -0
- package/skills/environmental-time-series/scripts/trend_analysis.R +192 -0
- package/skills/environmental-time-series/scripts/trend_analysis.py +184 -0
- package/skills/geoprocessing-for-ecology/SKILL.md +123 -0
- package/skills/geoprocessing-for-ecology/examples/example-prompts.md +32 -0
- package/skills/geoprocessing-for-ecology/resources/crs-reference.md +62 -0
- package/skills/geoprocessing-for-ecology/resources/global-predictor-sources.md +331 -0
- package/skills/geoprocessing-for-ecology/resources/resampling-methods.md +57 -0
- package/skills/geoprocessing-for-ecology/scripts/__pycache__/download_predictors.cpython-311.pyc +0 -0
- package/skills/geoprocessing-for-ecology/scripts/download_predictors.R +239 -0
- package/skills/geoprocessing-for-ecology/scripts/download_predictors.py +379 -0
- package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.R +224 -0
- package/skills/geoprocessing-for-ecology/scripts/stack_and_extract.py +172 -0
- package/skills/landscape-connectivity/SKILL.md +170 -0
- package/skills/landscape-connectivity/examples/example-prompts.md +96 -0
- package/skills/landscape-connectivity/examples/jaguar_mesoamerica_corridor_example.md +271 -0
- package/skills/landscape-connectivity/resources/circuitscape-parameter-guide.md +155 -0
- package/skills/landscape-connectivity/resources/graph-theory-for-ecology.md +134 -0
- package/skills/landscape-connectivity/resources/resistance-surface-guide.md +141 -0
- package/skills/landscape-connectivity/scripts/connectivity_analysis.py +387 -0
- package/skills/landscape-connectivity/scripts/connectivity_metrics.R +274 -0
- package/skills/landscape-connectivity/scripts/resistance_surface.R +239 -0
- package/skills/model-validation-and-uncertainty/SKILL.md +131 -0
- package/skills/model-validation-and-uncertainty/examples/example-prompts.md +30 -0
- package/skills/model-validation-and-uncertainty/resources/extrapolation-risk-guide.md +236 -0
- package/skills/model-validation-and-uncertainty/resources/metric-selection-guide.md +52 -0
- package/skills/model-validation-and-uncertainty/resources/threshold-selection-guide.md +64 -0
- package/skills/model-validation-and-uncertainty/scripts/__pycache__/validate_model.cpython-311.pyc +0 -0
- package/skills/model-validation-and-uncertainty/scripts/extrapolation_risk.R +315 -0
- package/skills/model-validation-and-uncertainty/scripts/validate_model.py +226 -0
- package/skills/model-validation-and-uncertainty/scripts/validate_sdm.R +162 -0
- package/skills/occupancy-and-detection/SKILL.md +126 -0
- package/skills/occupancy-and-detection/examples/example-prompts.md +33 -0
- package/skills/occupancy-and-detection/resources/detection-history-format.md +100 -0
- package/skills/occupancy-and-detection/resources/occupancy-study-design.md +47 -0
- package/skills/occupancy-and-detection/scripts/__pycache__/occupancy_analysis.cpython-311.pyc +0 -0
- package/skills/occupancy-and-detection/scripts/occupancy_analysis.R +160 -0
- package/skills/occupancy-and-detection/scripts/occupancy_analysis.py +159 -0
- package/skills/population-viability-analysis/SKILL.md +161 -0
- package/skills/population-viability-analysis/examples/african_elephant_pva_example.md +266 -0
- package/skills/population-viability-analysis/examples/example-prompts.md +95 -0
- package/skills/population-viability-analysis/resources/extinction-risk-thresholds.md +128 -0
- package/skills/population-viability-analysis/resources/matrix-model-guide.md +139 -0
- package/skills/population-viability-analysis/resources/sensitivity-elasticity-reference.md +182 -0
- package/skills/population-viability-analysis/scripts/matrix_pva.R +258 -0
- package/skills/population-viability-analysis/scripts/pva_analysis.py +442 -0
- package/skills/population-viability-analysis/scripts/stochastic_pva.R +353 -0
- package/skills/predictive-modeling-best-practices/SKILL.md +136 -0
- package/skills/predictive-modeling-best-practices/examples/example-prompts.md +58 -0
- package/skills/predictive-modeling-best-practices/resources/collinearity-decision-tree.md +65 -0
- package/skills/predictive-modeling-best-practices/resources/sampling-bias-correction.md +267 -0
- package/skills/predictive-modeling-best-practices/resources/spatial-cv-guide.md +73 -0
- package/skills/predictive-modeling-best-practices/scripts/__pycache__/spatial_cv.cpython-311.pyc +0 -0
- package/skills/predictive-modeling-best-practices/scripts/collinearity_check.R +112 -0
- package/skills/predictive-modeling-best-practices/scripts/spatial_cv.py +182 -0
- package/skills/reproducible-ecology-pipeline/SKILL.md +139 -0
- package/skills/reproducible-ecology-pipeline/examples/example-prompts.md +35 -0
- package/skills/reproducible-ecology-pipeline/resources/directory-structure-template.md +94 -0
- package/skills/reproducible-ecology-pipeline/resources/params-yaml-template.yaml +84 -0
- package/skills/reproducible-ecology-pipeline/resources/reproducibility-checklist-template.md +66 -0
- package/skills/reproducible-ecology-pipeline/scripts/generate_file_manifest.py +110 -0
- package/skills/reproducible-ecology-pipeline/scripts/init_project.sh +53 -0
- package/skills/spatial-prioritization/SKILL.md +162 -0
- package/skills/spatial-prioritization/examples/biodiversity_hotspot_prioritization_example.md +289 -0
- package/skills/spatial-prioritization/examples/example-prompts.md +93 -0
- package/skills/spatial-prioritization/resources/cost-surface-reference.md +130 -0
- package/skills/spatial-prioritization/resources/marxan-vs-prioritizr-comparison.md +125 -0
- package/skills/spatial-prioritization/resources/prioritizr-formulation-guide.md +188 -0
- package/skills/spatial-prioritization/resources/representation-targets-guide.md +186 -0
- package/skills/spatial-prioritization/scripts/prioritization_sensitivity.R +320 -0
- package/skills/spatial-prioritization/scripts/run_prioritization.R +336 -0
- package/skills/species-distribution-modeling/SKILL.md +139 -0
- package/skills/species-distribution-modeling/examples/example-prompts.md +36 -0
- package/skills/species-distribution-modeling/resources/algorithm-comparison.md +25 -0
- package/skills/species-distribution-modeling/resources/calibration-area-guide.md +71 -0
- package/skills/species-distribution-modeling/resources/climate-scenario-preparation.md +170 -0
- package/skills/species-distribution-modeling/resources/maxent-calibration-guide.md +211 -0
- package/skills/species-distribution-modeling/resources/sdm-checklist.md +37 -0
- package/skills/species-distribution-modeling/scripts/predict_distribution.R +236 -0
- package/skills/species-distribution-modeling/scripts/predict_distribution.py +286 -0
- package/skills/species-distribution-modeling/scripts/prepare_future_layers.R +351 -0
- package/skills/species-distribution-modeling/scripts/project_scenarios.R +220 -0
- package/skills/species-distribution-modeling/scripts/run_ensemble_sdm.R +99 -0
- package/skills/species-distribution-modeling/scripts/sdm_pipeline.py +318 -0
- package/skills/species-distribution-modeling/scripts/tune_maxnet.R +344 -0
- package/templates/SKILL_TEMPLATE.md +225 -0
- package/templates/checklists/data-submission-checklist.md +38 -0
- package/templates/checklists/post-analysis-checklist.md +55 -0
- package/templates/checklists/pre-analysis-checklist.md +31 -0
- package/templates/prompts/debug-skill.md +47 -0
- package/templates/prompts/invoke-skill.md +34 -0
- package/templates/prompts/invoke-workflow.md +45 -0
- package/templates/reports/technical-report-template.md +80 -0
- package/templates/scripts/logger_setup.R +79 -0
- package/templates/scripts/logger_setup.py +119 -0
- package/templates/scripts/params_loader.R +28 -0
- package/templates/scripts/params_loader.py +38 -0
- package/workflows/analyze-community-structure/WORKFLOW.md +72 -0
- package/workflows/analyze-environmental-change/WORKFLOW.md +73 -0
- package/workflows/assess-ecological-impact/WORKFLOW.md +75 -0
- package/workflows/assess-ecosystem-services/WORKFLOW.md +68 -0
- package/workflows/assess-landscape-connectivity/WORKFLOW.md +84 -0
- package/workflows/build-fire-risk-map/WORKFLOW.md +79 -0
- package/workflows/produce-technical-report/WORKFLOW.md +113 -0
- package/workflows/run-camera-trap-occupancy/WORKFLOW.md +87 -0
- package/workflows/run-conservation-prioritization/WORKFLOW.md +89 -0
- package/workflows/run-multispecies-screening/WORKFLOW.md +197 -0
- package/workflows/run-occupancy-analysis/WORKFLOW.md +74 -0
- package/workflows/run-population-viability/WORKFLOW.md +90 -0
- package/workflows/run-sdm-study/WORKFLOW.md +99 -0
|
@@ -0,0 +1,129 @@
---
name: ecological-data-foundation
description: "Cleans, validates, and standardizes ecological occurrence records and downloads biodiversity data from global repositories. Use this skill when the user needs data cleaning, coordinate validation, duplicate removal, outlier detection, taxonomic harmonization, or downloads from GBIF, iNaturalist, eBird, OBIS, or IUCN Red List. Also triggers for Darwin Core formatting, data quality reports, flagged records, and biodiversity data preparation."
skill_version: 1.0.0
---

# Skill: ecological-data-foundation

**Domain:** Data ingestion · QA · Schema · Metadata
**Phase:** 1 — Foundation
**Used by:** All workflows

---

## Purpose

This skill guides the agent through the first mandatory step of any quantitative ecology project: getting raw data into a clean, well-documented, analysis-ready state. It covers ingestion of heterogeneous sources, structural validation, duplicate and outlier detection, schema standardisation, and metadata generation.

---

## When to Invoke

- A new dataset arrives (CSV, XLSX, GDB, Shapefile, database export, API pull)
- Before any spatial operation, statistical test, or model fitting
- When the user asks to "clean", "validate", "check", or "prepare" ecological data
- When merging datasets from different sources or institutions

---

## Inputs

| Input | Format | Required |
|-------|--------|----------|
| Raw occurrence or survey data | CSV, XLSX, TSV, GDB | Yes |
| Data dictionary or field protocol | PDF, DOCX, TXT | Recommended |
| Environmental layers (if applicable) | GeoTIFF, NetCDF | Optional |
| Existing metadata record | EML, JSON, YAML | Optional |

---

## Outputs

| Output | Description |
|--------|-------------|
| `data_clean.csv` | Validated, deduplicated, standardised dataset |
| `qa_report.md` | Summary of issues found and actions taken |
| `schema.yaml` | Field definitions, types, valid ranges, units |
| `metadata.xml` | EML or Dublin Core metadata record |
| `flagged_records.csv` | Records removed or flagged, with reason codes |

---

## Steps

### 1. Ingest and Inspect
- Load all source files; report file sizes, row counts, column names, and data types
- Identify encoding issues, BOM characters, delimiter inconsistencies
- Document source provenance (institution, date, version, license)

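The inspection step could be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not the package's own script: the function name `inspect_delimited`, the candidate delimiter list, and the assumption that files are UTF-8 are all hypothetical choices.

```python
import csv
import io


def inspect_delimited(raw: bytes) -> dict:
    """Report BOM presence, delimiter, column names and row counts for a delimited file."""
    has_bom = raw.startswith(b"\xef\xbb\xbf")
    # "utf-8-sig" strips a leading BOM if present; other encodings raise
    # UnicodeDecodeError, which is itself a useful encoding-issue signal.
    text = raw.decode("utf-8-sig")
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # Pick the candidate delimiter that occurs most often in the header line.
    delimiter = max([",", ";", "\t"], key=first_line.count)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = (rows[0], rows[1:]) if rows else ([], [])
    return {
        "has_bom": has_bom,
        "delimiter": delimiter,
        "columns": header,
        "n_rows": len(data),
        # Rows whose field count differs from the header hint at delimiter problems.
        "ragged_rows": sum(1 for row in data if len(row) != len(header)),
    }
```

In practice the bytes would come from something like `pathlib.Path(path).read_bytes()`, and the returned dictionary could feed the ingestion section of `qa_report.md`.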
### 2. Schema Standardisation
- Map field names to a canonical schema (Darwin Core preferred for biodiversity)
- Enforce consistent data types (dates as ISO-8601, coordinates as decimal degrees WGS84)
- Flag or convert non-standard units

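A small sketch of the field mapping and date normalisation described above, in plain Python. The mapping table, the list of accepted date formats (day-first is assumed), and the `UNPARSED_DATE` issue code are hypothetical and would need adapting to each source.

```python
from datetime import datetime

# Hypothetical mapping from source column names to Darwin Core terms.
FIELD_MAP = {
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "long": "decimalLongitude",
    "longitude": "decimalLongitude",
    "date": "eventDate",
    "collector": "recordedBy",
}

# Assumed input formats, tried in order; day-first is an assumption about the source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def to_iso_date(value):
    """Return an ISO-8601 date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardise_record(record):
    """Rename known fields to Darwin Core terms and normalise eventDate.

    Returns the new record and a list of issue codes.
    """
    out = {FIELD_MAP.get(k.strip().lower(), k.strip()): v for k, v in record.items()}
    issues = []
    if "eventDate" in out:
        iso = to_iso_date(str(out["eventDate"]))
        if iso is None:
            issues.append("UNPARSED_DATE")  # keep the verbatim value for review
        else:
            out["eventDate"] = iso
    return out, issues
```

Unparseable dates are left untouched and reported rather than guessed, which keeps the verbatim value available for manual review.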
### 3. Duplicate Detection
- Identify exact duplicates (all fields identical)
- Identify spatial-temporal near-duplicates (same species, same coordinates, same date ± N days)
- Document resolution strategy (keep first, keep most complete, remove all)

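The two duplicate categories above can be illustrated with a short standard-library sketch. It assumes records are dictionaries with Darwin Core keys and ISO-8601 `eventDate` values; the function name `find_duplicates` and the default window of one day are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_duplicates(records, max_days=1):
    """Return (exact, near) lists of index pairs (i, j) with i < j.

    exact: all fields identical.
    near:  same species and coordinates, dates at most `max_days` apart.
    """
    # Group record indices by species and coordinates so only candidates are compared.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        key = (rec["scientificName"], rec["decimalLatitude"], rec["decimalLongitude"])
        groups[key].append(idx)

    exact, near = [], []
    for indices in groups.values():
        for pos, j in enumerate(indices):
            for i in indices[:pos]:
                if records[i] == records[j]:
                    exact.append((i, j))
                    continue
                gap = abs(
                    (date.fromisoformat(records[j]["eventDate"])
                     - date.fromisoformat(records[i]["eventDate"])).days
                )
                if gap <= max_days:
                    near.append((i, j))
    return exact, near
```

The function only reports pairs; which member of a pair to keep is the resolution strategy that the last bullet asks to be documented.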
### 4. Coordinate and Spatial QA
- Check coordinate ranges (latitude: −90 to 90; longitude: −180 to 180)
- Flag records with coordinates at country centroids, capital cities, or institution headquarters (likely georeferencing errors)
- Flag records with zero coordinates (0,0)
- Validate records fall within the stated country/region polygon

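As a rough illustration of the range, zero-coordinate, and region checks, the sketch below returns a list of flag codes for one coordinate pair. The flag names are hypothetical, and a bounding box is used as a coarse stand-in for a real country polygon test; the Brazil box is an approximate, illustrative value rather than an authoritative boundary.

```python
# Approximate bounding box (min_lon, min_lat, max_lon, max_lat); illustrative only.
BRAZIL_BBOX = (-74.0, -34.0, -34.0, 5.5)


def coordinate_flags(lat, lon, bbox=None):
    """Return hypothetical flag codes for a latitude/longitude pair in decimal degrees."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        # Invalid values make the remaining checks meaningless.
        return ["OUT_OF_RANGE"]
    flags = []
    if lat == 0 and lon == 0:
        flags.append("ZERO_COORD")
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            flags.append("OUTSIDE_REGION_BBOX")
    return flags
```

A record that passes the bounding-box test can still lie outside the actual border, so a point-in-polygon test against a proper boundary layer remains necessary for the final validation.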
### 5. Taxonomic QA
- Check species names against a reference taxonomy (GBIF Backbone, Catalogue of Life)
- Flag synonyms, misspellings, and higher-rank-only identifications
- Resolve to accepted name + authorship

### 6. Temporal QA
- Check for dates in the future or before plausible survey era
- Flag records with only year-level precision when finer precision is needed

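The temporal checks might look like the following sketch, which assumes `eventDate` holds a single ISO-8601 value (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`). Date intervals are not handled and come back as unparseable. The flag names and the default earliest year of 1900 are hypothetical.

```python
from datetime import date


def temporal_flags(event_date, today, earliest_year=1900, need_day=True):
    """Return hypothetical flag codes for a single ISO-8601 date string."""
    parts = event_date.strip().split("-")
    try:
        numbers = [int(p) for p in parts]
        year = numbers[0]
        month = numbers[1] if len(numbers) > 1 else 1
        day = numbers[2] if len(numbers) > 2 else 1
        parsed = date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return ["UNPARSEABLE_DATE"]

    flags = []
    if parsed > today:
        flags.append("FUTURE_DATE")
    if year < earliest_year:
        flags.append("BEFORE_SURVEY_ERA")
    if need_day and len(parts) < 3:
        flags.append("LOW_PRECISION")  # year-only or year-month precision
    return flags
```

Passing `today` explicitly, for example `date.today()`, keeps the check reproducible when a QA report is regenerated later.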
### 7. Attribute QA
- Check numeric fields for out-of-range values (e.g., abundance < 0, DBH > 500 cm)
- Check categorical fields for invalid entries
- Assess missing value rates per field; flag fields above threshold (default 20%)

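A minimal sketch of the missing-value and range checks, assuming records are dictionaries. The set of tokens treated as missing and the function names are assumptions; the 20% default mirrors the threshold stated above, and a field is flagged only when its rate is strictly above it.

```python
MISSING_TOKENS = {"", "NA", "NaN", "null"}  # assumed spellings of "missing"


def is_missing(value):
    """True for None or for any string token listed in MISSING_TOKENS."""
    return value is None or str(value).strip() in MISSING_TOKENS


def missing_value_report(records, threshold=0.20):
    """Per-field missing rate, flagged when the rate exceeds `threshold`."""
    if not records:
        return {}
    fields = sorted({key for rec in records for key in rec})
    report = {}
    for field in fields:
        n_missing = sum(1 for rec in records if is_missing(rec.get(field)))
        rate = n_missing / len(records)
        report[field] = {"missing_rate": rate, "flagged": rate > threshold}
    return report


def out_of_range(records, field, low, high):
    """Indices of records whose numeric value for `field` lies outside [low, high]."""
    return [
        i for i, rec in enumerate(records)
        if not is_missing(rec.get(field)) and not (low <= float(rec[field]) <= high)
    ]
```

For example, `out_of_range(records, "individualCount", 0, 1000)` would list records with negative abundance, with the upper bound chosen per study.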
### 8. Generate Outputs
- Write `data_clean.csv` with action codes in an appended `QA_status` column
- Write `flagged_records.csv` listing removed or flagged records with reason codes
- Write `qa_report.md` summarising each issue category, count, and resolution
- Write `schema.yaml` with field definitions
- Write `metadata.xml` in EML format

---

## Key Decisions to Document

- Duplicate resolution strategy
- Coordinate uncertainty threshold for exclusion
- Temporal precision requirement
- Missing value threshold per field
- Taxonomy backbone version used

---

## Tools and Libraries

**R:** `dplyr`, `readr`, `janitor`, `CoordinateCleaner`, `taxize`, `EML`
**Python:** `pandas`, `pyjanitor`, `geopy`, `pycountry`, `python-dwca-reader`
**CLI:** `csvkit`, Miller (`mlr`)

---

## Resources

- `resources/darwin-core-glossary.md` — Darwin Core field reference
- `resources/qa-checklist.md` — printable QA checklist
- `resources/coordinate-cleaning-flags.md` — flag code reference
- `examples/` — example prompt invocations

---

## Notes

- Always preserve the original raw file; never overwrite it
- Record the exact software versions used for reproducibility
- Large raster datasets should be handled by the `geoprocessing-for-ecology` skill after this step
@@ -0,0 +1,40 @@
# Example Invocation Prompts — ecological-data-foundation

## Basic Cleaning

```
Load skill: ecological-data-foundation
Task: Clean and validate the occurrence dataset at data/raw/occurrences_raw.csv.
Apply all standard QA checks. The target taxon is mammals. Use GBIF Backbone for taxonomy.
Output results to data/processed/.
```

## Merging Multiple Sources

```
Load skill: ecological-data-foundation
Task: I have three occurrence datasets from different institutions:
- data/raw/mnrj_mammals.csv (MNRJ museum collection export)
- data/raw/gbif_download.csv (GBIF Darwin Core)
- data/raw/fieldwork_2023.xlsx (our field data)
Merge them into a single Darwin Core dataset. Remove duplicates and apply full QA.
Report how many records each source contributed after cleaning.
```

## Targeted Check

```
Load skill: ecological-data-foundation
Task: I already cleaned my data but want to run just the coordinate checks.
File: data/processed/occ_v1.csv. Apply CoordinateCleaner flags and report.
Do NOT modify the file; just produce a flag report.
```

## Schema Validation

```
Load skill: ecological-data-foundation
Task: Validate that data/processed/occ_clean.csv conforms to Darwin Core.
List any fields that are missing, misnamed, or have incorrect data types.
Generate schema.yaml from the current file.
```
@@ -0,0 +1,66 @@
# Coordinate Cleaning Flag Reference

Based on the `CoordinateCleaner` R package flag system.

## Flag Codes and Descriptions

| Flag | Description | Default threshold | Action |
|------|-------------|-------------------|--------|
| `.cap` | Coordinates at capital city | 10 km radius (`capitals_rad = 10000`) | Flag and investigate |
| `.cen` | Coordinates at country centroid | 1 km radius (`centroids_rad = 1000`) | Flag and investigate |
| `.gbf` | Coordinates at GBIF headquarters | Package default buffer | Remove |
| `.inst` | Coordinates at known herbarium/museum | 100 m radius (`inst_rad = 100`) | Flag and investigate |
| `.sea` | Coordinates in the ocean (for terrestrial taxa) | — | Flag and investigate |
| `.val` | Coordinates outside valid range | lat [-90,90], lon [-180,180] | Remove |
| `.zer` | Zero latitude or longitude, or coordinates near (0, 0) | 0.5° radius (`zeros_rad = 0.5`) | Remove |
| `.equ` | Identical lat and lon values | — | Flag and investigate |
| `.dpl` | Identical coordinates to another record | — | Flag; keep one |
| `.env` | Environmental outlier (extreme value in predictor space) | Mahalanobis distance p < 0.025 | Flag and investigate |
| `.otl` | Spatial outlier (geographic distance from main cluster) | Quantile method, 5 × IQR (`outliers_mtp = 5`) | Flag and investigate |

## Recommended Workflow (R)

```r
library(CoordinateCleaner)

# Buffered land polygon shipped with the package, used by the "seas" test.
# Depending on the installed version it may need converting to a terra SpatVector.
data("buffland")

flags <- clean_coordinates(
  x = occ_df,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  species = "species",
  # Coordinate validity is always checked, so it is not listed as a test.
  tests = c("capitals", "centroids", "equal", "gbif",
            "institutions", "seas", "urban", "zeros"),
  capitals_rad = 10000,    # 10 km radius
  centroids_rad = 1000,    # 1 km radius
  seas_ref = buffland      # pass the object itself, not its name as a string
)

# Inspect flags
summary(flags)
occ_clean   <- occ_df[flags$.summary, ]   # records that passed every test
occ_flagged <- occ_df[!flags$.summary, ]  # records that failed at least one test
```

## Known Country Centroid Coordinates (South America)

| Country | Approx centroid lat | Approx centroid lon |
|---------|---------------------|---------------------|
| Brazil | -10.333 | -53.200 |
| Colombia | 4.099 | -72.888 |
| Peru | -9.190 | -75.016 |
| Bolivia | -16.290 | -63.589 |
| Argentina | -34.000 | -64.000 |
| Paraguay | -23.442 | -58.444 |

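For pipelines that do not call `CoordinateCleaner`, a centroid proximity check against the table above could be sketched in Python as follows. The haversine formula on a spherical Earth is an approximation, and the function names and the 1 km default radius are assumptions made for illustration.

```python
import math

# Approximate centroids copied from the table above: (latitude, longitude).
CENTROIDS = {
    "Brazil": (-10.333, -53.200),
    "Colombia": (4.099, -72.888),
    "Peru": (-9.190, -75.016),
    "Bolivia": (-16.290, -63.589),
    "Argentina": (-34.000, -64.000),
    "Paraguay": (-23.442, -58.444),
}

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_centroid(lat, lon, radius_km=1.0):
    """Return (country, distance_km) if a listed centroid lies within radius_km, else None."""
    name, dist = min(
        ((country, haversine_km(lat, lon, c_lat, c_lon))
         for country, (c_lat, c_lon) in CENTROIDS.items()),
        key=lambda item: item[1],
    )
    return (name, dist) if dist <= radius_km else None
```

Because the listed centroids are themselves approximate, a match is a prompt to inspect the record's locality text rather than proof of a georeferencing error.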
## Ocean Check for Terrestrial Taxa

Use `cc_sea()` with a buffered land polygon (0.5–1° buffer) to avoid incorrectly flagging coastal records:

```r
occ_sea_checked <- cc_sea(
  x = occ_df,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  ref = buffland  # load from CoordinateCleaner package
)
```
@@ -0,0 +1,91 @@
# Darwin Core Field Glossary

Essential fields for biodiversity occurrence data following the [Darwin Core standard](https://dwc.tdwg.org/).

## Identity Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `occurrenceID` | string | Globally unique identifier for the occurrence | `urn:uuid:a948571f-...` |
| `catalogNumber` | string | Institution-assigned identifier | `MNRJ-12345` |
| `recordedBy` | string | Observer name(s) | `"Silva, J.R."` |
| `recordNumber` | string | Field number assigned by the observer | `JRS-2023-001` |

## Taxonomic Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `scientificName` | string | Full scientific name with authorship | `Panthera onca (Linnaeus, 1758)` |
| `kingdom` | string | | `Animalia` |
| `phylum` | string | | `Chordata` |
| `class` | string | | `Mammalia` |
| `order` | string | | `Carnivora` |
| `family` | string | | `Felidae` |
| `genus` | string | | `Panthera` |
| `specificEpithet` | string | Species epithet only | `onca` |
| `taxonRank` | string | Lowest rank of the name | `species` |
| `vernacularName` | string | Common name | `jaguar` |
| `taxonID` | string | Taxon identifier in a reference system | `gbif:5219243` |

## Occurrence Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `basisOfRecord` | string | Nature of the record | `HumanObservation`, `PreservedSpecimen`, `MachineObservation` |
| `occurrenceStatus` | string | Presence or absence | `present`, `absent` |
| `individualCount` | integer | Number of individuals | `3` |
| `sex` | string | | `male`, `female`, `undetermined` |
| `lifeStage` | string | | `adult`, `juvenile`, `larva` |
| `behavior` | string | Observed behavior | `foraging` |

## Location Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `decimalLatitude` | float | Latitude in decimal degrees (WGS84) | `-15.7801` |
| `decimalLongitude` | float | Longitude in decimal degrees (WGS84) | `-47.9292` |
| `geodeticDatum` | string | Datum for coordinates | `WGS84` |
| `coordinateUncertaintyInMeters` | float | Radius of coordinate uncertainty, in metres | `100` |
| `coordinatePrecision` | float | Precision of the coordinates, expressed as a decimal fraction of a degree | `0.0001` |
| `countryCode` | string | ISO 3166-1 alpha-2 | `BR` |
| `stateProvince` | string | State or province | `Mato Grosso` |
| `county` | string | County or municipality | `Cáceres` |
| `locality` | string | Specific location description | `"Fazenda São José, margem do rio"` |
| `verbatimElevation` | string | Original elevation text | `"850-900 m"` |
| `minimumElevationInMeters` | float | | `850` |
| `maximumElevationInMeters` | float | | `900` |

|
|
58
|
+
## Event Fields
|
|
59
|
+
|
|
60
|
+
| Field | Type | Description | Example |
|
|
61
|
+
|-------|------|-------------|---------|
|
|
62
|
+
| `eventDate` | string | ISO-8601 date or date range | `2023-07-15`, `2023-07/2023-08` |
|
|
63
|
+
| `year` | integer | | `2023` |
|
|
64
|
+
| `month` | integer | | `7` |
|
|
65
|
+
| `day` | integer | | `15` |
|
|
66
|
+
| `eventTime` | string | Time of observation | `14:30:00-03:00` |
|
|
67
|
+
| `samplingProtocol` | string | Method used | `"point count"`, `"camera trap"` |
|
|
68
|
+
| `samplingEffort` | string | Effort description | `"3 nights, 1 trap"` |
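
The two `eventDate` forms in the table (a single day and a month range) can be normalised to explicit start and end dates. The following is a minimal sketch, not part of the package: the function name `parse_event_date` is hypothetical, and it only handles `YYYY`, `YYYY-MM`, `YYYY-MM-DD` and `/`-separated ranges of those forms (no times or time zones).

```python
import calendar
from datetime import date


def _bounds(part: str) -> tuple[date, date]:
    """Return the first and last day covered by YYYY, YYYY-MM, or YYYY-MM-DD."""
    pieces = [int(p) for p in part.split("-")]
    if len(pieces) == 1:
        return date(pieces[0], 1, 1), date(pieces[0], 12, 31)
    if len(pieces) == 2:
        year, month = pieces
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(pieces) == 3:
        day = date(*pieces)
        return day, day
    raise ValueError(f"Unsupported date component: {part!r}")


def parse_event_date(event_date: str) -> tuple[date, date]:
    """Split an eventDate string into (start, end); hypothetical helper."""
    start_text, _, end_text = event_date.strip().partition("/")
    start, single_end = _bounds(start_text)
    # A range takes its end from the second component; otherwise the
    # end is the last day covered by the single component.
    end = _bounds(end_text)[1] if end_text else single_end
    return start, end
```

Storing both bounds keeps imprecise dates usable for temporal filtering without pretending they are exact.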
|
|
69
|
+
|
|
70
|
+
## Data Quality Fields
|
|
71
|
+
|
|
72
|
+
| Field | Type | Description |
|
|
73
|
+
|-------|------|-------------|
|
|
74
|
+
| `identificationQualifier` | string | Uncertainty in identification (`cf.`, `aff.`) |
|
|
75
|
+
| `identifiedBy` | string | Who identified the specimen |
|
|
76
|
+
| `dateIdentified` | string | When identification was made |
|
|
77
|
+
| `datasetName` | string | Source dataset name |
|
|
78
|
+
| `institutionCode` | string | Institution acronym |
|
|
79
|
+
| `license` | string | Data license (CC BY 4.0, etc.) |
|
|
80
|
+
| `rightsHolder` | string | Owner of the data rights |
|
|
81
|
+
|
|
82
|
+
## Common Issues and Fixes
|
|
83
|
+
|
|
84
|
+
| Issue | Detection | Fix |
|
|
85
|
+
|-------|-----------|-----|
|
|
86
|
+
| Coordinates swapped (lat/lon) | `decimalLatitude` > 90 or < -90 | Swap columns, then re-check both ranges |
|
|
87
|
+
| Comma as decimal separator | `"-15,78"` | Replace `,` → `.` |
|
|
88
|
+
| DMS instead of decimal | `"15°46'48"S"` | Convert to decimal |
|
|
89
|
+
| Missing datum | No `geodeticDatum` | Assume WGS84; flag |
|
|
90
|
+
| Future date | `eventDate` > today | Flag and investigate |
|
|
91
|
+
| Country centroid | Coordinates = known centroid | Flag as `COUNTRY_CENTROID` |
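
Three of the fixes above (comma decimal separator, DMS conversion, swapped columns) are mechanical enough to sketch in code. This is an illustrative example only: `parse_coordinate` and `looks_swapped` are hypothetical names, and the DMS pattern covers just the `15°46'48"S` style shown in the table.

```python
import re

# Degrees, minutes, seconds and a hemisphere letter, e.g. 15°46'48"S.
DMS_PATTERN = re.compile(
    r"""(\d+(?:\.\d+)?)°\s*(\d+(?:\.\d+)?)'\s*(\d+(?:\.\d+)?)"\s*([NSEW])""",
    re.IGNORECASE,
)


def parse_coordinate(text: str) -> float:
    """Parse decimal text (point or comma separator) or DMS into decimal degrees."""
    cleaned = text.strip().replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        pass
    match = DMS_PATTERN.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"Unrecognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative in decimal degrees.
    return -value if hemisphere.upper() in ("S", "W") else value


def looks_swapped(latitude: float, longitude: float) -> bool:
    """True when latitude is out of range but would be valid as a longitude
    and the longitude would be valid as a latitude."""
    return abs(latitude) > 90 and abs(longitude) <= 90
```

A swap is only detectable when the longitude lies outside ±90; pairs where both values fall within ±90 need a country or range check instead.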
|
|
@@ -0,0 +1,265 @@
|
|
|
1
|
+
# Data Citation Guide for Ecological Occurrence Data
|
|
2
|
+
|
|
3
|
+
Reference for citing biodiversity data sources correctly in publications, reports, and analytical workflows. Covers citation formats, data-use policies, and licence requirements for the six primary sources supported by this skill library.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## 1. GBIF — Global Biodiversity Information Facility
|
|
8
|
+
|
|
9
|
+
**Website**: https://www.gbif.org
|
|
10
|
+
**API**: https://api.gbif.org/v1/
|
|
11
|
+
|
|
12
|
+
### Citation formats
|
|
13
|
+
|
|
14
|
+
#### `occ_download` / `occ.download` (preferred for publications)
|
|
15
|
+
GBIF issues a citable DOI for every download submitted via `occ_download` (R) or `occ.download` (Python). The DOI resolves to a stable dataset snapshot.
|
|
16
|
+
|
|
17
|
+
```
|
|
18
|
+
GBIF.org (YEAR) GBIF Occurrence Download.
|
|
19
|
+
https://doi.org/10.15468/dl.XXXXXXX
|
|
20
|
+
Accessed on YYYY-MM-DD.
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
Example (R):
|
|
24
|
+
```r
|
|
25
|
+
library(rgbif)
meta <- occ_download_meta(dl_key)  # dl_key: the key returned by occ_download()
|
|
26
|
+
doi <- meta$doi
|
|
27
|
+
cat("Citation: GBIF.org (", format(Sys.Date(), "%Y"), ") GBIF Occurrence Download. ",
|
|
28
|
+
    "https://doi.org/", doi, " Accessed on ", format(Sys.Date(), "%Y-%m-%d"), ".\n", sep = "")
|
|
29
|
+
```
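
For Python workflows, the same citation string can be assembled once the download DOI is known. This sketch deliberately makes no GBIF client calls: `gbif_download_citation` is a hypothetical helper, and the DOI is assumed to have been obtained from the download metadata beforehand.

```python
from datetime import date


def gbif_download_citation(doi: str, download_year: int, accessed: date) -> str:
    """Format a GBIF download citation following the template above."""
    # Accept either a bare DOI or a full https://doi.org/ URL.
    bare_doi = doi.removeprefix("https://doi.org/")
    return (
        f"GBIF.org ({download_year}) GBIF Occurrence Download. "
        f"https://doi.org/{bare_doi} Accessed on {accessed.isoformat()}."
    )
```

Passing the download year explicitly avoids citing the current year for a download made in an earlier one.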
|
|
30
|
+
|
|
31
|
+
#### `occ_search` / `occ.search` (no DOI)
|
|
32
|
+
`occ_search` (R) and `occ.search` (Python) do not generate a persistent DOI, so their results **should not be cited in publications**. Use them for exploratory analysis only, and record the query date and parameters manually.
|
|
33
|
+
|
|
34
|
+
```
|
|
35
|
+
GBIF.org (YEAR) Occurrence records for [species]. Query via GBIF API (occ_search).
|
|
36
|
+
Accessed on YYYY-MM-DD. [Not citable — re-run with occ_download for publication.]
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
### Data use policy
|
|
40
|
+
- Most records: **CC BY 4.0** or **CC0**
|
|
41
|
+
- Some datasets: **CC BY-NC 4.0** (check individual dataset licences at https://www.gbif.org/dataset)
|
|
42
|
+
- Always attribute the original data provider, not only GBIF
|
|
43
|
+
- Follow the GBIF citation guidelines: https://www.gbif.org/citation-guidelines
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## 2. iNaturalist
|
|
48
|
+
|
|
49
|
+
**Website**: https://www.inaturalist.org
|
|
50
|
+
**API**: https://api.inaturalist.org/v1/
|
|
51
|
+
|
|
52
|
+
### Citation format
|
|
53
|
+
|
|
54
|
+
iNaturalist does **not** issue download DOIs. Record the access date precisely.
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
iNaturalist contributors and the California Academy of Sciences (YEAR).
|
|
58
|
+
iNaturalist Research-grade Observations. iNaturalist.org.
|
|
59
|
+
Accessed on YYYY-MM-DD. https://www.inaturalist.org
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
Or, if re-exported through GBIF:
|
|
63
|
+
```
|
|
64
|
+
iNaturalist (YEAR) Occurrence records exported via GBIF.
|
|
65
|
+
https://doi.org/10.15468/ab3s5x
|
|
66
|
+
Accessed on YYYY-MM-DD.
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
### Quality grades
|
|
70
|
+
| Grade | Meaning | SDM recommendation |
|
|
71
|
+
|-------|---------|-------------------|
|
|
72
|
+
| **Research** | ID agreed by ≥2 users, has coordinates, is not captive | Use by default |
|
|
73
|
+
| **Needs ID** | Single ID or disagreement | Avoid for SDM |
|
|
74
|
+
| **Casual** | Missing fields, captive, or cultivated | Exclude |
|
|
75
|
+
|
|
76
|
+
### Data use policy
|
|
77
|
+
- Research-grade observations: **CC BY-NC** by default (individual users may choose CC0 or CC BY)
|
|
78
|
+
- Commercial use requires CC0 or CC BY records only
|
|
79
|
+
- Bulk data exports for research: permitted; cite as above
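
The quality-grade and licence rules above can be combined into a single filter. The sketch below is illustrative: it assumes each observation is a dict with `quality_grade` and `license_code` keys and values such as `research`, `cc0`, `cc-by` and `cc-by-nc`. Treat those names as assumptions to verify against the export you actually use; `select_observations` is a hypothetical helper.

```python
# Licence codes that permit commercial use (assumed spelling).
COMMERCIAL_OK = {"cc0", "cc-by"}


def select_observations(observations: list[dict], commercial_use: bool = False) -> list[dict]:
    """Keep research-grade observations; for commercial use, also require
    a CC0 or CC BY licence."""
    kept = []
    for obs in observations:
        if obs.get("quality_grade") != "research":
            continue  # drops "needs_id" and "casual" records
        if commercial_use and obs.get("license_code") not in COMMERCIAL_OK:
            continue  # CC BY-NC or unlicensed records are not usable commercially
        kept.append(obs)
    return kept
```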
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## 3. eBird (Cornell Lab of Ornithology)
|
|
84
|
+
|
|
85
|
+
**Website**: https://ebird.org
|
|
86
|
+
**Data portal**: https://ebird.org/data/download
|
|
87
|
+
|
|
88
|
+
### Citation format
|
|
89
|
+
|
|
90
|
+
eBird data requires a **signed Data Use Agreement**. Access is free but must be requested.
|
|
91
|
+
|
|
92
|
+
```
|
|
93
|
+
eBird Basic Dataset (EBD). Version: YEAR-MM.
|
|
94
|
+
Cornell Lab of Ornithology, Ithaca, New York.
|
|
95
|
+
https://ebird.org/data/download
|
|
96
|
+
Accessed on YYYY-MM-DD.
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### Data use policy
|
|
100
|
+
- **Non-commercial research** only under the eBird Data Access Agreement
|
|
101
|
+
- Data may not be redistributed or used to create competing products
|
|
102
|
+
- Publications must acknowledge eBird and include the citation above
|
|
103
|
+
- Restricted species data (sensitive species, breeding season) may be withheld
|
|
104
|
+
|
|
105
|
+
### Filtering recommendations for SDM
|
|
106
|
+
| Filter | Recommended value | Rationale |
|
|
107
|
+
|--------|------------------|-----------|
|
|
108
|
+
| Protocol | Stationary, Traveling | Standardised effort |
|
|
109
|
+
| Approved | TRUE | Quality-reviewed by eBird |
|
|
110
|
+
| Complete checklist | TRUE | Absence implied for non-detected species |
|
|
111
|
+
| Effort distance | ≤5 km | Reduces localisation uncertainty |
|
|
112
|
+
| Duration | 5–300 min | Avoids very short/long checklists |
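
The five filters in the table translate directly into a predicate over checklist records. This is a minimal sketch under stated assumptions: the dict keys (`protocol`, `approved`, `complete_checklist`, `effort_distance_km`, `duration_minutes`) are hypothetical names, not the column headers of the eBird Basic Dataset, and a missing distance is treated as zero because stationary counts cover no distance.

```python
ALLOWED_PROTOCOLS = {"Stationary", "Traveling"}


def passes_sdm_filters(checklist: dict) -> bool:
    """Apply the recommended SDM filters to one checklist record."""
    # Stationary checklists carry no distance; treat missing as 0 km.
    distance_km = checklist.get("effort_distance_km") or 0.0
    duration = checklist.get("duration_minutes")
    return (
        checklist.get("protocol") in ALLOWED_PROTOCOLS
        and checklist.get("approved") is True
        and checklist.get("complete_checklist") is True
        and distance_km <= 5.0
        and duration is not None
        and 5 <= duration <= 300
    )
```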
|
|
113
|
+
|
|
114
|
+
---
|
|
115
|
+
|
|
116
|
+
## 4. OBIS — Ocean Biodiversity Information System
|
|
117
|
+
|
|
118
|
+
**Website**: https://obis.org
|
|
119
|
+
**API**: https://api.obis.org/v3/
|
|
120
|
+
|
|
121
|
+
### Citation format
|
|
122
|
+
|
|
123
|
+
```
|
|
124
|
+
OBIS (YEAR). Ocean Biodiversity Information System.
|
|
125
|
+
Intergovernmental Oceanographic Commission of UNESCO.
|
|
126
|
+
https://obis.org. Accessed on YYYY-MM-DD.
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
For specific datasets within OBIS, also cite the original dataset DOI available in each record's `datasetName` and `resourceID` fields.
|
|
130
|
+
|
|
131
|
+
### Data use policy
|
|
132
|
+
- **CC0 1.0** (public domain dedication)
|
|
133
|
+
- Users encouraged (but not required) to share results with OBIS
|
|
134
|
+
- Do not use OBIS data to identify precise locations of sensitive species (use aggregated records)
|
|
135
|
+
|
|
136
|
+
### Quality flags to exclude
|
|
137
|
+
| Flag | Meaning |
|
|
138
|
+
|------|---------|
|
|
139
|
+
| `NO_COORD` | Missing coordinates |
|
|
140
|
+
| `ZERO_COORD` | Coordinates are 0,0 |
|
|
141
|
+
| `ON_LAND` | Marine record mapped to land |
|
|
142
|
+
| `DEPTH_EXCEEDS_BATH` | Depth exceeds bathymetry |
|
|
143
|
+
| `COORDINATE_MISMATCH` | Textual and coordinate locations conflict |
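
Excluding flagged records is a set-intersection test. The sketch below assumes each record exposes its quality flags in a `flags` field, either as a comma-separated string or as a list; that layout is an assumption to check against the actual OBIS response. `record_flags` and `drop_flagged` are hypothetical helper names.

```python
# Flags from the table above that should exclude a record.
EXCLUDED_FLAGS = {
    "NO_COORD",
    "ZERO_COORD",
    "ON_LAND",
    "DEPTH_EXCEEDS_BATH",
    "COORDINATE_MISMATCH",
}


def record_flags(record: dict) -> set[str]:
    """Return the record's flags as a set, accepting a string or a list."""
    raw = record.get("flags") or ""
    if isinstance(raw, str):
        raw = raw.split(",")
    return {flag.strip() for flag in raw if flag.strip()}


def drop_flagged(records: list[dict]) -> list[dict]:
    """Keep only records that carry none of the excluded flags."""
    return [r for r in records if not (record_flags(r) & EXCLUDED_FLAGS)]
```

Flags outside the exclusion set are ignored, so records with unrelated warnings are retained.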
|
|
144
|
+
|
|
145
|
+
---
|
|
146
|
+
|
|
147
|
+
## 5. IUCN Red List
|
|
148
|
+
|
|
149
|
+
**Website**: https://www.iucnredlist.org
|
|
150
|
+
**API**: https://apiv3.iucnredlist.org
|
|
151
|
+
**API key**: Required — apply at https://apiv3.iucnredlist.org/
|
|
152
|
+
|
|
153
|
+
### Citation format
|
|
154
|
+
|
|
155
|
+
```
|
|
156
|
+
IUCN (YEAR). The IUCN Red List of Threatened Species. Version YEAR-N.
|
|
157
|
+
https://www.iucnredlist.org. Accessed on YYYY-MM-DD.
|
|
158
|
+
```
|
|
159
|
+
|
|
160
|
+
For species-specific assessments:
|
|
161
|
+
```
|
|
162
|
+
[Author(s)] (YEAR). [Species name]. The IUCN Red List of Threatened Species YEAR:
|
|
163
|
+
[Category]. https://dx.doi.org/10.2305/IUCN.UK.[version].RLTS.[TXID].en.
|
|
164
|
+
Accessed on YYYY-MM-DD.
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
### Data use policy
|
|
168
|
+
- **CC BY 4.0**
|
|
169
|
+
- Distribution maps and species data may not be used to create competing databases
|
|
170
|
+
- API key must not be shared; each user must register independently
|
|
171
|
+
- Sensitive species (CR, EN) distribution data may be partially obscured
|
|
172
|
+
|
|
173
|
+
### Red List categories
|
|
174
|
+
| Code | Category |
|
|
175
|
+
|------|---------|
|
|
176
|
+
| EX | Extinct |
|
|
177
|
+
| EW | Extinct in the Wild |
|
|
178
|
+
| CR | Critically Endangered |
|
|
179
|
+
| EN | Endangered |
|
|
180
|
+
| VU | Vulnerable |
|
|
181
|
+
| NT | Near Threatened |
|
|
182
|
+
| LC | Least Concern |
|
|
183
|
+
| DD | Data Deficient |
|
|
184
|
+
| NE | Not Evaluated |
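
When Red List codes are joined to occurrence tables, a small lookup avoids repeating the category names. This sketch encodes the table above; `describe_category` is a hypothetical helper, and the threatened set (CR, EN, VU) follows the standard IUCN grouping.

```python
RED_LIST_CATEGORIES = {
    "EX": "Extinct",
    "EW": "Extinct in the Wild",
    "CR": "Critically Endangered",
    "EN": "Endangered",
    "VU": "Vulnerable",
    "NT": "Near Threatened",
    "LC": "Least Concern",
    "DD": "Data Deficient",
    "NE": "Not Evaluated",
}

# The three categories IUCN groups together as "threatened".
THREATENED = {"CR", "EN", "VU"}


def describe_category(code: str) -> tuple[str, bool]:
    """Return (category name, is_threatened) for a Red List code."""
    normalised = code.strip().upper()
    return RED_LIST_CATEGORIES[normalised], normalised in THREATENED
```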
|
|
185
|
+
|
|
186
|
+
---
|
|
187
|
+
|
|
188
|
+
## 6. WorldClim / CHELSA (Predictor Data)
|
|
189
|
+
|
|
190
|
+
### WorldClim v2.1
|
|
191
|
+
|
|
192
|
+
```
|
|
193
|
+
Fick, S.E. & Hijmans, R.J. (2017). WorldClim 2: new 1-km spatial resolution climate surfaces
|
|
194
|
+
for global land areas. International Journal of Climatology 37(12): 4302–4315.
|
|
195
|
+
https://doi.org/10.1002/joc.5086
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
**Licence**: CC BY 4.0
|
|
199
|
+
|
|
200
|
+
### CHELSA v2.1
|
|
201
|
+
|
|
202
|
+
```
|
|
203
|
+
Karger, D.N. et al. (2017). Climatologies at high resolution for the earth's land surface areas.
|
|
204
|
+
Scientific Data 4: 170122. https://doi.org/10.1038/sdata.2017.122
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
**Licence**: CC BY 4.0
|
|
208
|
+
|
|
209
|
+
---
|
|
210
|
+
|
|
211
|
+
## Combining Multiple Sources
|
|
212
|
+
|
|
213
|
+
When merging occurrence data from multiple sources, include the `source` and `datasetName` columns in your output so records can be traced back to their origin.
|
|
214
|
+
|
|
215
|
+
### Recommended merge workflow
|
|
216
|
+
```r
|
|
217
|
+
library(dplyr)
library(readr)  # provides read_csv()
|
|
218
|
+
occ_all <- bind_rows(
|
|
219
|
+
read_csv("output/gbif/occurrences_raw_GBIF_Panthera_onca_20260101.csv"),
|
|
220
|
+
read_csv("output/inat/occurrences_raw_iNat_Panthera_onca_20260101.csv"),
|
|
221
|
+
read_csv("output/obis/occurrences_raw_OBIS_Chelonia_mydas_20260101.csv")
|
|
222
|
+
) |>
|
|
223
|
+
distinct(species, decimalLatitude, decimalLongitude, eventDate, .keep_all = TRUE)
|
|
224
|
+
```
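
The same merge can be sketched in plain Python, this time adding the `source` column explicitly so that each retained record stays traceable. The example assumes records are already loaded as dicts keyed by Darwin Core field names; `merge_occurrences` is a hypothetical helper, and when duplicates occur the record from the first-listed source wins.

```python
# Fields that together identify a duplicate occurrence.
KEY_FIELDS = ("species", "decimalLatitude", "decimalLongitude", "eventDate")


def merge_occurrences(batches: dict[str, list[dict]]) -> list[dict]:
    """Merge per-source record lists, tag each record with its source,
    and drop duplicates on KEY_FIELDS (first occurrence is kept)."""
    seen = set()
    merged = []
    for source, records in batches.items():
        for record in records:
            key = tuple(record.get(field) for field in KEY_FIELDS)
            if key in seen:
                continue
            seen.add(key)
            # Overwrites any existing "source" value with the batch label.
            merged.append({**record, "source": source})
    return merged
```

Because dicts preserve insertion order, listing the most trusted source first decides which copy of a duplicate survives.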
|
|
225
|
+
|
|
226
|
+
### Master data citation block (for Methods section)
|
|
227
|
+
```
|
|
228
|
+
Occurrence data were downloaded from GBIF (GBIF.org YEAR, doi:...), iNaturalist
|
|
229
|
+
(iNaturalist contributors and CAS YEAR, accessed YYYY-MM-DD), and OBIS (OBIS YEAR,
|
|
230
|
+
accessed YYYY-MM-DD). Records were merged and spatially thinned to one record per
|
|
231
|
+
[resolution] grid cell. Final dataset: [n] records of [n_species] species, [year_range].
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
---
|
|
235
|
+
|
|
236
|
+
## Licence Compatibility Matrix
|
|
237
|
+
|
|
238
|
+
| Source | Licence | Commercial use | Redistribute | Attribution required |
|
|
239
|
+
|--------|---------|----------------|-------------|----------------------|
|
|
240
|
+
| GBIF (CC0) | CC0 | Yes | Yes | Strongly recommended |
|
|
241
|
+
| GBIF (CC BY) | CC BY 4.0 | Yes | Yes | Yes |
|
|
242
|
+
| GBIF (CC BY-NC) | CC BY-NC 4.0 | **No** | Yes | Yes |
|
|
243
|
+
| iNaturalist | CC BY-NC | **No** | Yes | Yes |
|
|
244
|
+
| eBird | Custom DUA | **No** | **No** | Yes |
|
|
245
|
+
| OBIS | CC0 | Yes | Yes | Recommended |
|
|
246
|
+
| IUCN | CC BY 4.0 | Yes | Yes* | Yes |
|
|
247
|
+
| WorldClim | CC BY 4.0 | Yes | Yes | Yes |
|
|
248
|
+
| CHELSA | CC BY 4.0 | Yes | Yes | Yes |
|
|
249
|
+
|
|
250
|
+
*IUCN: redistribution of the full database is not permitted; individual species data may be shared with attribution.
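
When several sources are combined, the most restrictive licence governs the merged dataset. The sketch below encodes the commercial-use and redistribution columns of the matrix and takes the logical AND across the chosen sources. `combined_permissions` is a hypothetical helper, and the values simply mirror the table rather than constituting legal advice.

```python
# (commercial use allowed, redistribution allowed), copied from the matrix.
LICENCE_MATRIX = {
    "GBIF (CC0)": (True, True),
    "GBIF (CC BY)": (True, True),
    "GBIF (CC BY-NC)": (False, True),
    "iNaturalist": (False, True),
    "eBird": (False, False),
    "OBIS": (True, True),
    "IUCN": (True, True),  # individual species data only; see footnote
    "WorldClim": (True, True),
    "CHELSA": (True, True),
}


def combined_permissions(sources: list[str]) -> dict[str, bool]:
    """A merged dataset is only as permissive as its strictest source."""
    return {
        "commercial_use": all(LICENCE_MATRIX[s][0] for s in sources),
        "redistribute": all(LICENCE_MATRIX[s][1] for s in sources),
    }
```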
|
|
251
|
+
|
|
252
|
+
---
|
|
253
|
+
|
|
254
|
+
## Quick Reference: occ_search vs occ_download (GBIF)
|
|
255
|
+
|
|
256
|
+
| Aspect | `occ_search` | `occ_download` |
|
|
257
|
+
|--------|-------------|---------------|
|
|
258
|
+
| Speed | Immediate | Minutes to hours |
|
|
259
|
+
| Record limit | 100,000 | Unlimited |
|
|
260
|
+
| DOI generated | **No** | **Yes** |
|
|
261
|
+
| Reproducible | **No** | **Yes** |
|
|
262
|
+
| Recommended for | Exploration | Publication |
|
|
263
|
+
| Credentials needed | No | Yes (GBIF account) |
|
|
264
|
+
|
|
265
|
+
**Rule of thumb**: Use `occ_search` for initial exploration and data assessment. Switch to `occ_download` before any analysis intended for publication.
|