ngiab-data-preprocess 4.6.5.tar.gz → 4.6.7.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/PKG-INFO +13 -12
  2. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/README.md +12 -11
  3. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/dataset_utils.py +6 -2
  4. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/gpkg_utils.py +2 -1
  5. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/css/toggle.css +3 -3
  6. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/js/data_processing.js +8 -7
  7. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_cli/__main__.py +1 -1
  8. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_cli/arguments.py +10 -2
  9. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_preprocess.egg-info/PKG-INFO +13 -12
  10. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_preprocess.egg-info/SOURCES.txt +2 -1
  11. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/pyproject.toml +1 -1
  12. ngiab_data_preprocess-4.6.7/tests/test_ngiab_data_cli_regression.py +375 -0
  13. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/.github/pull_request_template.md +0 -0
  14. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/.github/workflows/build_only.yml +0 -0
  15. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/.github/workflows/publish.yml +0 -0
  16. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/.gitignore +0 -0
  17. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/LICENSE +0 -0
  18. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/ciroh-bgsafe.png +0 -0
  19. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/create_realization.py +0 -0
  20. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/dask_utils.py +0 -0
  21. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/datasets.py +0 -0
  22. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/file_paths.py +0 -0
  23. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/forcings.py +0 -0
  24. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/graph_utils.py +0 -0
  25. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/s3fs_utils.py +0 -0
  26. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_processing/subset.py +0 -0
  27. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/cfe-nowpm-realization-template.json +0 -0
  28. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/cfe-template.ini +0 -0
  29. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/forcing_template.nc +0 -0
  30. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/lstm-catchment-template.yml +0 -0
  31. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/lstm-realization-template.json +0 -0
  32. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/lstm-rust-realization-template.json +0 -0
  33. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/ngen-routing-template.yaml +0 -0
  34. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/noah-owp-modular-init.namelist.input +0 -0
  35. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/source_validation.py +0 -0
  36. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/template.sql +0 -0
  37. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/data_sources/triggers.sql +0 -0
  38. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/__init__.py +0 -0
  39. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/__main__.py +0 -0
  40. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/css/console.css +0 -0
  41. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/css/main.css +0 -0
  42. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/js/console.js +0 -0
  43. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/js/main.js +0 -0
  44. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/resources/loading.gif +0 -0
  45. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/static/resources/screenshot.jpg +0 -0
  46. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/templates/index.html +0 -0
  47. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/map_app/views.py +0 -0
  48. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_cli/custom_logging.py +0 -0
  49. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_cli/forcing_cli.py +0 -0
  50. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_preprocess.egg-info/dependency_links.txt +0 -0
  51. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_preprocess.egg-info/entry_points.txt +0 -0
  52. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_preprocess.egg-info/requires.txt +0 -0
  53. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/modules/ngiab_data_preprocess.egg-info/top_level.txt +0 -0
  54. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/setup.cfg +0 -0
  55. {ngiab_data_preprocess-4.6.5 → ngiab_data_preprocess-4.6.7}/tests/test_nan_impute.py +0 -0
PKG-INFO (+13 -12)
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ngiab_data_preprocess
-Version: 4.6.5
+Version: 4.6.7
 Summary: Graphical Tools for creating Next Gen Water model input data.
 Author-email: Josh Cunningham <jcunningham8@ua.edu>
 Project-URL: Homepage, https://github.com/CIROH-UA/NGIAB_data_preprocess
@@ -75,13 +75,13 @@ This repository contains tools for preparing data to run a [NextGen](https://git

 ## What does this tool do?

-This tool prepares data to run a NextGen-based simulation by creating a run package that can be used with NGIAB.
-It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) more information on [all data sources here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
+This tool prepares data to run a NextGen-based simulation by creating a run package that can be used with NGIAB.
+It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) more information on [all data sources here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
 The raw forcing data is [nwm retrospective v3 forcing](https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html#CONUS/zarr/forcing/) data or the [AORC 1km gridded data](https://noaa-nws-aorc-v1-1-1km.s3.amazonaws.com/index.html) depending on user input

-1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath etc) from the hydrofabric. This subset is output as a geopackage (.gpkg).
-2. Calculates **forcings** as a weighted mean of the gridded NWM or AORC forcings. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and computed with numpy.
-3. Creates **configuration files** for a default NGIAB model run.
+1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath etc) from the hydrofabric. This subset is output as a geopackage (.gpkg).
+2. Calculates **forcings** as a weighted mean of the gridded NWM or AORC forcings. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and computed with numpy.
+3. Creates **configuration files** for a default NGIAB model run.
 - realization.json - ngen model configuration
 - troute.yaml - routing configuration.
 - **per catchment** model configuration
@@ -136,13 +136,13 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 # Create a virtual environment in the current directory
 uv venv

-# Install the tool in the virtual environment
+# Install the tool in the virtual environment
 uv pip install ngiab_data_preprocess

 # To run the cli
 uv run cli --help

-# To run the map
+# To run the map
 uv run map_app
 ```

@@ -160,7 +160,7 @@ UV automatically detects any virtual environments in the current directory and w
 (notebook) jovyan@jupyter-user:~$ conda deactivate
 jovyan@jupyter-user:~$
 # The interactive map won't work on 2i2c
-```
+```

 ```bash
 # This tool is likely to not work without a virtual environment
@@ -205,7 +205,7 @@ To install and run the tool, follow these steps:

 Running the `map_app` tool will open the app in a new browser tab.

-Install-free: `uvx --from ngiab-data-preprocess map_app`
+Install-free: `uvx --from ngiab-data-preprocess map_app`
 Installed with uv: `uv run map_app`

 ## Using the map interface
@@ -225,7 +225,7 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneat

 ## Running the CLI

-Install-free: `uvx ngiab-prep`
+Install-free: `uvx ngiab-prep`
 Installed with uv: `uv run cli`

 ## Arguments
@@ -236,6 +236,7 @@ Installed with uv: `uv run cli`
 - `-l`, `--latlon`: Use latitude and longitude instead of catid. Expects comma-separated values via the CLI, e.g., `python -m ngiab_data_cli -i 54.33,-69.4 -l -s`.
 - `-g`, `--gage`: Use gage ID instead of catid. Expects a single gage ID via the CLI, e.g., `python -m ngiab_data_cli -i 01646500 -g -s`.
 - `-s`, `--subset`: Subset the hydrofabric to the given feature.
+- `--subset_type`: Specify the subset type. `nexus`: get everything flowing into the downstream nexus of the selected catchment. `catchment`: get everything flowing into the selected catchment.
 - `-f`, `--forcings`: Generate forcings for the given feature.
 - `-r`, `--realization`: Create a realization for the given feature.
 - `--lstm`: Configures the data for the [python lstm](https://github.com/ciroh-ua/lstm/).
@@ -259,7 +260,7 @@ Installed with uv: `uv run cli`

 1. Prepare everything for an NGIAB run at a given gage:
 ```bash
-uvx ngiab-prep -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28
+uvx ngiab-prep -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28
 # add --run or replace -sfr with --all to run NGIAB, too
 # to name the folder, add -o folder_name
 ```
README.md (+12 -11)
@@ -35,13 +35,13 @@ This repository contains tools for preparing data to run a [NextGen](https://git

 ## What does this tool do?

-This tool prepares data to run a NextGen-based simulation by creating a run package that can be used with NGIAB.
-It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) more information on [all data sources here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
+This tool prepares data to run a NextGen-based simulation by creating a run package that can be used with NGIAB.
+It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) more information on [all data sources here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
 The raw forcing data is [nwm retrospective v3 forcing](https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html#CONUS/zarr/forcing/) data or the [AORC 1km gridded data](https://noaa-nws-aorc-v1-1-1km.s3.amazonaws.com/index.html) depending on user input

-1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath etc) from the hydrofabric. This subset is output as a geopackage (.gpkg).
-2. Calculates **forcings** as a weighted mean of the gridded NWM or AORC forcings. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and computed with numpy.
-3. Creates **configuration files** for a default NGIAB model run.
+1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath etc) from the hydrofabric. This subset is output as a geopackage (.gpkg).
+2. Calculates **forcings** as a weighted mean of the gridded NWM or AORC forcings. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and computed with numpy.
+3. Creates **configuration files** for a default NGIAB model run.
 - realization.json - ngen model configuration
 - troute.yaml - routing configuration.
 - **per catchment** model configuration
@@ -96,13 +96,13 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 # Create a virtual environment in the current directory
 uv venv

-# Install the tool in the virtual environment
+# Install the tool in the virtual environment
 uv pip install ngiab_data_preprocess

 # To run the cli
 uv run cli --help

-# To run the map
+# To run the map
 uv run map_app
 ```

@@ -120,7 +120,7 @@ UV automatically detects any virtual environments in the current directory and w
 (notebook) jovyan@jupyter-user:~$ conda deactivate
 jovyan@jupyter-user:~$
 # The interactive map won't work on 2i2c
-```
+```

 ```bash
 # This tool is likely to not work without a virtual environment
@@ -165,7 +165,7 @@ To install and run the tool, follow these steps:

 Running the `map_app` tool will open the app in a new browser tab.

-Install-free: `uvx --from ngiab-data-preprocess map_app`
+Install-free: `uvx --from ngiab-data-preprocess map_app`
 Installed with uv: `uv run map_app`

 ## Using the map interface
@@ -185,7 +185,7 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneat

 ## Running the CLI

-Install-free: `uvx ngiab-prep`
+Install-free: `uvx ngiab-prep`
 Installed with uv: `uv run cli`

 ## Arguments
@@ -196,6 +196,7 @@ Installed with uv: `uv run cli`
 - `-l`, `--latlon`: Use latitude and longitude instead of catid. Expects comma-separated values via the CLI, e.g., `python -m ngiab_data_cli -i 54.33,-69.4 -l -s`.
 - `-g`, `--gage`: Use gage ID instead of catid. Expects a single gage ID via the CLI, e.g., `python -m ngiab_data_cli -i 01646500 -g -s`.
 - `-s`, `--subset`: Subset the hydrofabric to the given feature.
+- `--subset_type`: Specify the subset type. `nexus`: get everything flowing into the downstream nexus of the selected catchment. `catchment`: get everything flowing into the selected catchment.
 - `-f`, `--forcings`: Generate forcings for the given feature.
 - `-r`, `--realization`: Create a realization for the given feature.
 - `--lstm`: Configures the data for the [python lstm](https://github.com/ciroh-ua/lstm/).
@@ -219,7 +220,7 @@ Installed with uv: `uv run cli`

 1. Prepare everything for an NGIAB run at a given gage:
 ```bash
-uvx ngiab-prep -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28
+uvx ngiab-prep -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28
 # add --run or replace -sfr with --all to run NGIAB, too
 # to name the folder, add -o folder_name
 ```
modules/data_processing/dataset_utils.py (+6 -2)
@@ -107,9 +107,13 @@ def clip_dataset_to_bounds(
     """
     # check time range here in case just this function is imported and not the whole module
     start_time, end_time = validate_time_range(dataset, start_time, end_time)
+    samplex = dataset.x.values[:2]
+    intervalx = samplex[1] - samplex[0]
+    sampley = dataset.y.values[:2]
+    intervaly = sampley[1] - sampley[0]
     dataset = dataset.sel(
-        x=slice(bounds[0], bounds[2]),
-        y=slice(bounds[1], bounds[3]),
+        x=slice(bounds[0]-intervalx, bounds[2]+intervalx),
+        y=slice(bounds[1]-intervaly, bounds[3]+intervaly),
         time=slice(start_time, end_time),
     )
     logger.info("Selected time range and clipped to bounds")
modules/data_processing/gpkg_utils.py (+2 -1)
@@ -530,7 +530,7 @@ def get_cat_to_nhd_feature_id(gpkg: Path = FilePaths.conus_hydrofabric) -> Dict[
     )

     table_name = list(tables)[0]
-    sql_query = f"SELECT divide_id, hf_id FROM {table_name} WHERE divide_id IS NOT NULL AND hf_id IS NOT NULL"
+    sql_query = f"SELECT divide_id, hf_id FROM {table_name} WHERE divide_id IS NOT NULL AND hf_id IS NOT NULL ORDER BY hf_hydroseq DESC"

     with sqlite3.connect(gpkg) as conn:
         result: List[Tuple[str, str]] = conn.execute(sql_query).fetchall()
@@ -539,6 +539,7 @@ def get_cat_to_nhd_feature_id(gpkg: Path = FilePaths.conus_hydrofabric) -> Dict[
     for cat, feature in result:
         # the ids are stored as floats this converts to int to match nwm output
         # numeric ids should be stored as strings.
+        # Because of the ORDER BY above, the lowest hf_hydroseq "wins"
        mapping[cat] = int(feature)

     return mapping
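The `ORDER BY hf_hydroseq DESC` is doing real work here, as the new comment notes: when several rows share a `divide_id`, the loop overwrites `mapping[cat]` on each hit, so the row sorted last, the one with the lowest `hf_hydroseq`, survives. A toy reproduction with an in-memory SQLite table (the schema and values are invented for the demo):

```python
# Sketch only: "last write wins" over a descending sort, mirroring the
# query change in get_cat_to_nhd_feature_id. Table contents are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE net (divide_id TEXT, hf_id REAL, hf_hydroseq REAL)")
conn.executemany(
    "INSERT INTO net VALUES (?, ?, ?)",
    [("cat-1", 101.0, 5.0), ("cat-1", 102.0, 2.0), ("cat-2", 201.0, 7.0)],
)

rows = conn.execute(
    "SELECT divide_id, hf_id FROM net "
    "WHERE divide_id IS NOT NULL AND hf_id IS NOT NULL "
    "ORDER BY hf_hydroseq DESC"
).fetchall()

mapping = {}
for cat, feature in rows:
    mapping[cat] = int(feature)  # later (lower hydroseq) rows overwrite earlier ones

print(mapping)  # {'cat-2': 201, 'cat-1': 102} -- cat-1 keeps its lowest-hydroseq hf_id
```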
modules/map_app/static/css/toggle.css (+3 -3)
@@ -79,7 +79,7 @@
 .toggle-input:checked + .toggle-label .toggle-text-left {
     color: #888; /* Grey color for non-selected text */
 }
-
+
 .toggle-input:checked + .toggle-label .toggle-text-right {
     color: #888; /* Grey color for non-selected text */
 }
@@ -91,7 +91,7 @@
     top: -40px;
 }

-.menu-switch__input {
+#menuToggle .menu-switch__input {
     opacity: 0;
     width: 100%;
     height: 100%;
@@ -162,7 +162,7 @@
     height: 1.4em;
     position: relative;
     top: -0.2em;
-    margin-right: 1em;
+    margin-right: 1em;
     vertical-align: top;
     cursor: pointer;
     text-align: center;
modules/map_app/static/js/data_processing.js (+8 -7)
@@ -19,12 +19,13 @@ async function subset() {
             document.getElementById('output-path').innerHTML = "Subset canceled. Geopackage located at " + filename;
             return;
         }
-    }
+    }
     // check what kind of subset
-    // get the position of the subset toggle
-    // false means subset by nexus, true means subset by catchment
-    var nexus_catchment = document.getElementById('subset-toggle').checked;
-    var subset_type = nexus_catchment ? 'catchment' : 'nexus';
+    if (document.getElementById('radio-nexus').checked) {
+        var subset_type = 'nexus'
+    } else {
+        var subset_type = 'catchment'
+    }

     const startTime = performance.now(); // Start the timer
     fetch('/subset', {
@@ -126,7 +127,7 @@ async function forcings() {
         body: JSON.stringify(forcing_dir),
     })
     .then(async (response) => response.text())
-    .then(progressFile => {
+    .then(progressFile => {
         pollForcingsProgress(progressFile); // Start polling for progress
     })
     fetch('/forcings', {
@@ -138,7 +139,7 @@ async function forcings() {
     .catch(error => {
         console.error('Error:', error);
     }).finally(() => {
-        document.getElementById('forcings-button').disabled = false;
+        document.getElementById('forcings-button').disabled = false;
     });
     } else {
         alert('No existing geopackage found. Please subset the data before getting forcings');
modules/ngiab_data_cli/__main__.py (+1 -1)
@@ -177,7 +177,7 @@ def main() -> None:
     else:
         logging.info("Subsetting hydrofabric")
         include_outlet = True
-        if args.gage:
+        if args.gage or args.subset_type == "catchment":
             include_outlet = False
         subset(
             feature_to_subset,
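Read together with the `--subset_type` flag added below in `arguments.py`, this means the outlet is now excluded in two cases rather than one: gage-based subsets (as before) and explicit `--subset_type catchment` runs, while the default `nexus` subset type keeps the outlet.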
modules/ngiab_data_cli/arguments.py (+10 -2)
@@ -3,6 +3,7 @@ from datetime import datetime

 # Constants
 DATE_FORMAT = "%Y-%m-%d" # used for datetime parsing
+DATE_FORMAT2 = "%Y-%m-%d %H:%M" # used for datetime parsing
 DATE_FORMAT_HINT = "YYYY-MM-DD" # printed in help message


@@ -91,13 +92,13 @@ def parse_arguments() -> argparse.Namespace:
     parser.add_argument(
         "--start_date",
         "--start",
-        type=lambda s: datetime.strptime(s, DATE_FORMAT),
+        type=lambda s: datetime.strptime(s, DATE_FORMAT) if len(s) == 10 else datetime.strptime(s, DATE_FORMAT2),
         help=f"Start date for forcings/realization (format {DATE_FORMAT_HINT})",
     )
     parser.add_argument(
         "--end_date",
         "--end",
-        type=lambda s: datetime.strptime(s, DATE_FORMAT),
+        type=lambda s: datetime.strptime(s, DATE_FORMAT) if len(s) == 10 else datetime.strptime(s, DATE_FORMAT2),
         help=f"End date for forcings/realization (format {DATE_FORMAT_HINT})",
     )
     parser.add_argument(
@@ -147,6 +148,13 @@ def parse_arguments() -> argparse.Namespace:
         choices=["aorc", "nwm"],
         default="nwm",
     )
+    parser.add_argument(
+        "--subset_type",
+        type=str,
+        help="By nexus: get everything flowing into the downstream nexus of the selected catchment. By catchment: get everything flowing into the selected catchment.",
+        choices=["nexus", "catchment"],
+        default="nexus",
+    )
     parser.add_argument(
         "-a",
         "--all",
modules/ngiab_data_preprocess.egg-info/PKG-INFO (+13 -12)
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ngiab_data_preprocess
-Version: 4.6.5
+Version: 4.6.7
 Summary: Graphical Tools for creating Next Gen Water model input data.
 Author-email: Josh Cunningham <jcunningham8@ua.edu>
 Project-URL: Homepage, https://github.com/CIROH-UA/NGIAB_data_preprocess
@@ -75,13 +75,13 @@ This repository contains tools for preparing data to run a [NextGen](https://git

 ## What does this tool do?

-This tool prepares data to run a NextGen-based simulation by creating a run package that can be used with NGIAB.
-It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) more information on [all data sources here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
+This tool prepares data to run a NextGen-based simulation by creating a run package that can be used with NGIAB.
+It uses geometry and model attributes from the [v2.2 hydrofabric](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) more information on [all data sources here](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
 The raw forcing data is [nwm retrospective v3 forcing](https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html#CONUS/zarr/forcing/) data or the [AORC 1km gridded data](https://noaa-nws-aorc-v1-1-1km.s3.amazonaws.com/index.html) depending on user input

-1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath etc) from the hydrofabric. This subset is output as a geopackage (.gpkg).
-2. Calculates **forcings** as a weighted mean of the gridded NWM or AORC forcings. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and computed with numpy.
-3. Creates **configuration files** for a default NGIAB model run.
+1. **Subsets** (delineates) everything upstream of your point of interest (catchment, gage, flowpath etc) from the hydrofabric. This subset is output as a geopackage (.gpkg).
+2. Calculates **forcings** as a weighted mean of the gridded NWM or AORC forcings. Weights are calculated using [exact extract](https://isciences.github.io/exactextract/) and computed with numpy.
+3. Creates **configuration files** for a default NGIAB model run.
 - realization.json - ngen model configuration
 - troute.yaml - routing configuration.
 - **per catchment** model configuration
@@ -136,13 +136,13 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 # Create a virtual environment in the current directory
 uv venv

-# Install the tool in the virtual environment
+# Install the tool in the virtual environment
 uv pip install ngiab_data_preprocess

 # To run the cli
 uv run cli --help

-# To run the map
+# To run the map
 uv run map_app
 ```

@@ -160,7 +160,7 @@ UV automatically detects any virtual environments in the current directory and w
 (notebook) jovyan@jupyter-user:~$ conda deactivate
 jovyan@jupyter-user:~$
 # The interactive map won't work on 2i2c
-```
+```

 ```bash
 # This tool is likely to not work without a virtual environment
@@ -205,7 +205,7 @@ To install and run the tool, follow these steps:

 Running the `map_app` tool will open the app in a new browser tab.

-Install-free: `uvx --from ngiab-data-preprocess map_app`
+Install-free: `uvx --from ngiab-data-preprocess map_app`
 Installed with uv: `uv run map_app`

 ## Using the map interface
@@ -225,7 +225,7 @@ Once all the steps are finished, you can run NGIAB on the folder shown underneat

 ## Running the CLI

-Install-free: `uvx ngiab-prep`
+Install-free: `uvx ngiab-prep`
 Installed with uv: `uv run cli`

 ## Arguments
@@ -236,6 +236,7 @@ Installed with uv: `uv run cli`
 - `-l`, `--latlon`: Use latitude and longitude instead of catid. Expects comma-separated values via the CLI, e.g., `python -m ngiab_data_cli -i 54.33,-69.4 -l -s`.
 - `-g`, `--gage`: Use gage ID instead of catid. Expects a single gage ID via the CLI, e.g., `python -m ngiab_data_cli -i 01646500 -g -s`.
 - `-s`, `--subset`: Subset the hydrofabric to the given feature.
+- `--subset_type`: Specify the subset type. `nexus`: get everything flowing into the downstream nexus of the selected catchment. `catchment`: get everything flowing into the selected catchment.
 - `-f`, `--forcings`: Generate forcings for the given feature.
 - `-r`, `--realization`: Create a realization for the given feature.
 - `--lstm`: Configures the data for the [python lstm](https://github.com/ciroh-ua/lstm/).
@@ -259,7 +260,7 @@ Installed with uv: `uv run cli`

 1. Prepare everything for an NGIAB run at a given gage:
 ```bash
-uvx ngiab-prep -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28
+uvx ngiab-prep -i gage-10154200 -sfr --start 2022-01-01 --end 2022-02-28
 # add --run or replace -sfr with --all to run NGIAB, too
 # to name the folder, add -o folder_name
 ```
modules/ngiab_data_preprocess.egg-info/SOURCES.txt (+2 -1)
@@ -49,4 +49,5 @@ modules/ngiab_data_preprocess.egg-info/dependency_links.txt
 modules/ngiab_data_preprocess.egg-info/entry_points.txt
 modules/ngiab_data_preprocess.egg-info/requires.txt
 modules/ngiab_data_preprocess.egg-info/top_level.txt
-tests/test_nan_impute.py
+tests/test_nan_impute.py
+tests/test_ngiab_data_cli_regression.py
pyproject.toml (+1 -1)
@@ -19,7 +19,7 @@ filterwarnings = [
 ]
 [project]
 name = "ngiab_data_preprocess"
-version = "v4.6.5"
+version = "v4.6.7"
 authors = [{ name = "Josh Cunningham", email = "jcunningham8@ua.edu" }]
 description = "Graphical Tools for creating Next Gen Water model input data."
 readme = "README.md"
tests/test_ngiab_data_cli_regression.py (new file, +375 -0)
@@ -0,0 +1,375 @@
+import logging
+import shutil
+import subprocess
+from pathlib import Path
+
+import geopandas as gpd
+import numpy as np
+import pytest
+import xarray as xr
+from data_processing.file_paths import FilePaths
+
+logger = logging.getLogger(__name__)
+logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+
+CONFIG_PATH = FilePaths.config_file
+
+
+def run_cli(input_id, start_date, end_date, output_name, source="aorc"):
+    """Run the CLI and return output paths."""
+    # Read config to get output root
+    with open(CONFIG_PATH, "r") as f:
+        output_root = Path(f.readline().strip()).expanduser()
+
+    output_path = output_root / output_name
+
+    # Clean up any existing output directory
+    if output_path.exists():
+        shutil.rmtree(output_path)
+
+    cmd = [
+        "uv",
+        "run",
+        "cli",
+        "-i",
+        input_id,
+        "-s",
+        "-f",
+        "--start_date",
+        start_date,
+        "--end_date",
+        end_date,
+        "--source",
+        source,
+        "-o",
+        output_name,
+    ]
+    try:
+        subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=600)
+    except subprocess.CalledProcessError as e:
+        pytest.fail(f"CLI failed: {e.stderr}")
+    except subprocess.TimeoutExpired:
+        pytest.fail("CLI timed out")
+
+    assert output_path.exists(), f"Output directory not created: {output_path}"
+    return {
+        "output_dir": output_path,
+        "start_date": start_date,
+        "end_date": end_date,
+        "gpkg_path": output_path / "config" / f"{output_name}_subset.gpkg",
+        "raw_nc": output_path / "forcings" / "raw_gridded_data.nc",
+        "forcings_nc": output_path / "forcings" / "forcings.nc",
+    }
+
+
+@pytest.fixture(scope="module")
+def cat_1555522_output():
+    """Single catchment test: cat-1555522, 1 day."""
+    return run_cli("cat-1555522", "2020-01-01", "2020-01-02", "test_cat_1555522")
+
+
+@pytest.fixture(scope="module")
+def gage_10109001_output():
+    """Multi-catchment gage test: gage-10109001, 9 days."""
+    return run_cli("gage-10109001", "2019-10-01", "2019-10-10", "test_gage_10109001")
+
+
+# =============================================================================
+# Test configurations
+# =============================================================================
+
+GEOPACKAGE_LAYERS = [
+    "divides",
+    "divide-attributes",
+    "flowpath-attributes",
+    "flowpath-attributes-ml",
+    "flowpaths",
+    "hydrolocations",
+    "nexus",
+    "pois",
+    "lakes",
+    "network",
+]
+
+FORCING_VARS = [
+    "SPFH_2maboveground",
+    "DSWRF_surface",
+    "VGRD_10maboveground",
+    "DLWRF_surface",
+    "APCP_surface",
+    "UGRD_10maboveground",
+    "PRES_surface",
+    "TMP_2maboveground",
+    "precip_rate",
+    "ids",
+    "Time",
+]
+
+PHYSICAL_RANGES = {
+    "TMP_2maboveground": (200, 330),
+    "PRES_surface": (50000, 110000),
+    "SPFH_2maboveground": (0, 0.05),
+    "DSWRF_surface": (0, 1400),
+    "DLWRF_surface": (0, 600),
+    "APCP_surface": (0, 500),
+    "precip_rate": (0, 0.2),
+}
+
+CAT_1555522_REGRESSION = {
+    "dims": {"catchment-id": 1, "time": 25},
+    "catchment_ids": ["cat-1555522"],
+    "table_counts": {"divides": 1, "flowpaths": 1, "nexus": 1},
+    "stats": {
+        "TMP_2maboveground": {"min": 270.04, "max": 287.06, "mean": 276.30},
+        "PRES_surface": {"min": 96235.0, "max": 97941.0, "mean": 97159.5},
+        "DSWRF_surface": {"min": 0.0, "max": 366.91, "mean": 85.40},
+        "DLWRF_surface": {"min": 207.95, "max": 248.33, "mean": 222.67},
+        "SPFH_2maboveground": {"min": 0.0024, "max": 0.00464, "mean": 0.00313},
+    },
+    "sample_values": {
+        "TMP_2maboveground": [275.938, 274.598, 273.735, 272.985, 272.476],
+        "PRES_surface": [97941.0, 97872.3, 97857.2, 97852.3, 97829.1],
+    },
+    "time_values": [1577836800, 1577840400, 1577844000, 1577847600, 1577851200],
+}
+
+GAGE_10109001_REGRESSION = {
+    "dims": {"catchment-id": 88, "time": 217},
+    "catchment_ids": [
+        "cat-2861379",
+        "cat-2861380",
+        "cat-2861387",
+        "cat-2861414",
+        "cat-2861421",
+        "cat-2861429",
+        "cat-2861431",
+        "cat-2861436",
+        "cat-2861438",
+        "cat-2861442",
+    ],  # First 10 for spot check
+    "table_counts": {"divides": 88, "flowpaths": 88, "nexus": 38},
+    "stats": {
+        "TMP_2maboveground": {"min": 266.08, "max": 293.25, "mean": 276.13},
+        "PRES_surface": {"min": 72895.4, "max": 85003.4, "mean": 77537.8},
+        "DSWRF_surface": {"min": 0.0, "max": 711.17, "mean": 179.39},
+        "DLWRF_surface": {"min": 177.51, "max": 322.51, "mean": 222.13},
+        "SPFH_2maboveground": {"min": 0.00122, "max": 0.00588, "mean": 0.00333},
+        "APCP_surface": {"min": 0.0, "max": 4.696, "mean": 0.0233},
+    },
+    "sample_values": {
+        "TMP_2maboveground": [274.370, 272.429, 270.498, 268.974, 269.294],
+        "PRES_surface": [74866.3, 74861.7, 74884.5, 74898.7, 74877.5],
+    },
+    "time_values": [1569888000, 1569891600, 1569895200, 1569898800, 1569902400],
+}
+
+
+# =============================================================================
+# cat-1555522 Tests (Single Catchment)
+# =============================================================================
+
+
+class TestCat1555522Geopackage:
+    """Geopackage tests for cat-1555522."""
+
+    def test_geopackage_layers(self, cat_1555522_output):
+        gpkg = cat_1555522_output["gpkg_path"]
+        assert gpkg.exists()
+        actual = set(gpd.list_layers(gpkg)["name"])
+        assert not (set(GEOPACKAGE_LAYERS) - actual), (
+            f"Missing layers: {set(GEOPACKAGE_LAYERS) - actual}"
+        )
+
+    @pytest.mark.parametrize("layer", ["divides", "flowpaths", "nexus"])
+    def test_table_row_counts(self, cat_1555522_output, layer):
+        gdf = gpd.read_file(cat_1555522_output["gpkg_path"], layer=layer)
+        assert len(gdf) == CAT_1555522_REGRESSION["table_counts"][layer]
+
+
+class TestCat1555522GriddedForcings:
+    """Raw gridded forcing tests for cat-1555522."""
+
+    def test_netcdf_structure(self, cat_1555522_output):
+        nc = cat_1555522_output["raw_nc"]
+        assert nc.exists()
+        with xr.open_dataset(nc) as ds:
+            assert "time" in ds.dims
+            assert any(d in ds.dims for d in ("x", "lon"))
+            assert any(d in ds.dims for d in ("y", "lat"))
+
+    def test_netcdf_time_range(self, cat_1555522_output):
+        with xr.open_dataset(cat_1555522_output["raw_nc"]) as ds:
+            assert ds.time.min().values >= np.datetime64(cat_1555522_output["start_date"])
+            assert ds.time.max().values <= np.datetime64(cat_1555522_output["end_date"])
+
+
+class TestCat1555522ProcessedForcings:
+    """Processed forcing tests for cat-1555522."""
+
+    def test_structure(self, cat_1555522_output):
+        nc = cat_1555522_output["forcings_nc"]
+        assert nc.exists()
+        with xr.open_dataset(nc) as ds:
+            assert ds.sizes["catchment-id"] == CAT_1555522_REGRESSION["dims"]["catchment-id"]
+            assert ds.sizes["time"] == CAT_1555522_REGRESSION["dims"]["time"]
+            for var in FORCING_VARS:
+                assert var in ds.data_vars or var in ds.coords
+
+    def test_catchment_ids(self, cat_1555522_output):
+        gpkg_ids = set(gpd.read_file(cat_1555522_output["gpkg_path"], layer="divides")["divide_id"])
+        with xr.open_dataset(cat_1555522_output["forcings_nc"]) as ds:
+            nc_ids = set(ds["ids"].values)
+        assert gpkg_ids == nc_ids
+
+    def test_value_ranges(self, cat_1555522_output):
+        with xr.open_dataset(cat_1555522_output["forcings_nc"]) as ds:
+            for var, (lo, hi) in PHYSICAL_RANGES.items():
+                if var in ds.data_vars:
+                    data = ds[var].values
+                    assert np.nanmin(data) >= lo, f"{var} below min"
+                    assert np.nanmax(data) <= hi, f"{var} above max"
+
+    def test_regression_stats(self, cat_1555522_output):
+        with xr.open_dataset(cat_1555522_output["forcings_nc"]) as ds:
+            for var, expected in CAT_1555522_REGRESSION["stats"].items():
+                data = ds[var].values
+                np.testing.assert_allclose(np.nanmin(data), expected["min"], rtol=0.01)
+                np.testing.assert_allclose(np.nanmax(data), expected["max"], rtol=0.01)
+                np.testing.assert_allclose(np.nanmean(data), expected["mean"], rtol=0.01)
+
+    def test_regression_sample_values(self, cat_1555522_output):
+        with xr.open_dataset(cat_1555522_output["forcings_nc"]) as ds:
+            for var, expected in CAT_1555522_REGRESSION["sample_values"].items():
+                actual = ds[var].isel({"catchment-id": 0, "time": slice(0, 5)}).values
+                np.testing.assert_allclose(actual, expected, rtol=0.001)
+
+    def test_regression_time_values(self, cat_1555522_output):
+        with xr.open_dataset(cat_1555522_output["forcings_nc"]) as ds:
+            actual = ds["Time"].isel({"catchment-id": 0, "time": slice(0, 5)}).values.tolist()
+        assert actual == CAT_1555522_REGRESSION["time_values"]
+
+
+# =============================================================================
+# gage-10109001 Tests (Multi-Catchment)
+# =============================================================================
+
+
+class TestGage10109001Geopackage:
+    """Geopackage tests for gage-10109001."""
+
+    def test_geopackage_layers(self, gage_10109001_output):
+        gpkg = gage_10109001_output["gpkg_path"]
+        assert gpkg.exists()
+        actual = set(gpd.list_layers(gpkg)["name"])
+        assert not (set(GEOPACKAGE_LAYERS) - actual)
+
+    @pytest.mark.parametrize("layer", ["divides", "flowpaths", "nexus"])
+    def test_table_row_counts(self, gage_10109001_output, layer):
+        gdf = gpd.read_file(gage_10109001_output["gpkg_path"], layer=layer)
+        assert len(gdf) == GAGE_10109001_REGRESSION["table_counts"][layer]
+
+
+class TestGage10109001GriddedForcings:
+    """Raw gridded forcing tests for gage-10109001."""
+
+    def test_netcdf_structure(self, gage_10109001_output):
+        nc = gage_10109001_output["raw_nc"]
+        assert nc.exists()
+        with xr.open_dataset(nc) as ds:
+            assert "time" in ds.dims
+            assert any(d in ds.dims for d in ("x", "lon"))
+            assert any(d in ds.dims for d in ("y", "lat"))
+
+    def test_netcdf_time_range(self, gage_10109001_output):
+        with xr.open_dataset(gage_10109001_output["raw_nc"]) as ds:
+            assert ds.time.min().values >= np.datetime64(gage_10109001_output["start_date"])
+            assert ds.time.max().values <= np.datetime64(gage_10109001_output["end_date"])
+
+
+class TestGage10109001ProcessedForcings:
+    """Processed forcing tests for gage-10109001."""
+
+    def test_structure(self, gage_10109001_output):
+        nc = gage_10109001_output["forcings_nc"]
+        assert nc.exists()
+        with xr.open_dataset(nc) as ds:
+            assert ds.sizes["catchment-id"] == GAGE_10109001_REGRESSION["dims"]["catchment-id"]
+            assert ds.sizes["time"] == GAGE_10109001_REGRESSION["dims"]["time"]
+            for var in FORCING_VARS:
+                assert var in ds.data_vars or var in ds.coords
+
+    def test_catchment_ids_subset(self, gage_10109001_output):
+        """Check that expected catchment IDs are present (spot check first 10)."""
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            nc_ids = set(ds["ids"].values)
+        for cat_id in GAGE_10109001_REGRESSION["catchment_ids"]:
+            assert cat_id in nc_ids
+
+    def test_catchment_ids_match_gpkg(self, gage_10109001_output):
+        gpkg_ids = set(
+            gpd.read_file(gage_10109001_output["gpkg_path"], layer="divides")["divide_id"]
+        )
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            nc_ids = set(ds["ids"].values)
+        assert gpkg_ids == nc_ids
+
+    def test_value_ranges(self, gage_10109001_output):
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            for var, (lo, hi) in PHYSICAL_RANGES.items():
+                if var in ds.data_vars:
+                    data = ds[var].values
+                    assert np.nanmin(data) >= lo, f"{var} below min"
+                    assert np.nanmax(data) <= hi, f"{var} above max"
+
+    def test_no_all_nan(self, gage_10109001_output):
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            for var in ds.data_vars:
+                if ds[var].dtype in (np.float32, np.float64):
+                    assert not np.all(np.isnan(ds[var].values)), f"{var} is all NaN"
+
+    def test_regression_stats(self, gage_10109001_output):
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            for var, expected in GAGE_10109001_REGRESSION["stats"].items():
+                data = ds[var].values
+                np.testing.assert_allclose(np.nanmin(data), expected["min"], rtol=0.01)
+                np.testing.assert_allclose(np.nanmax(data), expected["max"], rtol=0.01)
+                np.testing.assert_allclose(np.nanmean(data), expected["mean"], rtol=0.01)
+
+    def test_regression_sample_values(self, gage_10109001_output):
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            for var, expected in GAGE_10109001_REGRESSION["sample_values"].items():
+                actual = ds[var].isel({"catchment-id": 0, "time": slice(0, 5)}).values
+                np.testing.assert_allclose(actual, expected, rtol=0.001)
+
+    def test_regression_time_values(self, gage_10109001_output):
+        with xr.open_dataset(gage_10109001_output["forcings_nc"]) as ds:
+            actual = ds["Time"].isel({"catchment-id": 0, "time": slice(0, 5)}).values.tolist()
+        assert actual == GAGE_10109001_REGRESSION["time_values"]
+
+
+# =============================================================================
+# End-to-End Tests
+# =============================================================================
+
+
+class TestEndToEnd:
+    """End-to-end integration tests."""
+
+    @pytest.mark.parametrize("fixture_name", ["cat_1555522_output", "gage_10109001_output"])
+    def test_complete_pipeline(self, fixture_name, request):
+        output = request.getfixturevalue(fixture_name)
+        assert output["gpkg_path"].exists()
+        assert output["raw_nc"].exists()
+        assert output["forcings_nc"].exists()
+
+    @pytest.mark.parametrize("fixture_name", ["cat_1555522_output", "gage_10109001_output"])
+    def test_output_size_reasonable(self, fixture_name, request):
+        output = request.getfixturevalue(fixture_name)
+        size_mb = sum(f.stat().st_size for f in output["output_dir"].rglob("*") if f.is_file()) / (
+            1024 * 1024
+        )
+        assert 0.1 < size_mb < 1000, f"Suspicious output size: {size_mb:.2f} MB"
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v", "-s"])
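The suite shells out to the real CLI (`uv run cli ...` with a 600-second timeout per invocation), so it downloads hydrofabric and forcing data and is closer to an integration test than a unit test. Assuming the repository's normal dev environment, it should be runnable with pytest in the usual way, e.g. `uv run pytest tests/test_ngiab_data_cli_regression.py -v`, or directly via the `pytest.main` hook at the bottom of the file.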