xarray-dbd 0.2.3__tar.gz → 0.2.6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62) hide show
  1. xarray_dbd-0.2.6/CHANGELOG.md +126 -0
  2. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/PKG-INFO +136 -7
  3. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/README.md +135 -6
  4. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/ColumnData.C +6 -2
  5. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Decompress.C +21 -6
  6. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Decompress.H +6 -1
  7. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Header.C +25 -1
  8. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Header.H +3 -0
  9. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/KnownBytes.C +28 -21
  10. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/MyException.H +2 -2
  11. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Sensor.C +11 -5
  12. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Sensors.C +9 -6
  13. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/SensorsMap.C +13 -1
  14. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/SensorsMap.H +1 -1
  15. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/config.h +1 -1
  16. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/dbd_python.cpp +17 -8
  17. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/pyproject.toml +2 -2
  18. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/tests/test_backend.py +93 -0
  19. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/tests/test_cli.py +101 -0
  20. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/tests/test_cpp_backend.py +73 -0
  21. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/tests/test_dbdreader2.py +98 -0
  22. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/_dbd_cpp.pyi +1 -0
  23. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/backend.py +154 -60
  24. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/csv.py +15 -2
  25. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/dbd2nc.py +39 -3
  26. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/mkone.py +44 -9
  27. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/sensors.py +3 -3
  28. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/dbdreader2/_core.py +67 -18
  29. xarray_dbd-0.2.3/CHANGELOG.md +0 -47
  30. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/.clang-tidy +0 -0
  31. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/.gitignore +0 -0
  32. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/.pre-commit-config.yaml +0 -0
  33. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/CMakeLists.txt +0 -0
  34. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/CONTRIBUTING.md +0 -0
  35. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/License.txt +0 -0
  36. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/benchmark_performance.py +0 -0
  37. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/conda/recipe.yaml +0 -0
  38. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/ColumnData.H +0 -0
  39. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Data.C +0 -0
  40. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Data.H +0 -0
  41. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/FileInfo.H +0 -0
  42. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/KnownBytes.H +0 -0
  43. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Logger.H +0 -0
  44. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Sensor.H +0 -0
  45. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/Sensors.H +0 -0
  46. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/lz4.c +0 -0
  47. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/csrc/lz4.h +0 -0
  48. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/examples/README.md +0 -0
  49. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/scripts/README.md +0 -0
  50. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/tests/conftest.py +0 -0
  51. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/__init__.py +0 -0
  52. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/__init__.py +0 -0
  53. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/cache.py +0 -0
  54. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/logger.py +0 -0
  55. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/main.py +0 -0
  56. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/cli/missions.py +0 -0
  57. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/dbdreader2/__init__.py +0 -0
  58. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/dbdreader2/_cache.py +0 -0
  59. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/dbdreader2/_errors.py +0 -0
  60. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/dbdreader2/_list.py +0 -0
  61. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/dbdreader2/_util.py +0 -0
  62. {xarray_dbd-0.2.3 → xarray_dbd-0.2.6}/xarray_dbd/py.typed +0 -0
@@ -0,0 +1,126 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [0.2.6] - 2026-03-30
9
+
10
+ ### Added
11
+
12
+ - `--list-sensors` flag for `dbd2nc` CLI to print available sensors without conversion
13
+ - `batch_size` parameter for `write_multi_dbd_netcdf()` (was hardcoded at 100)
14
+ - Signal handling in `mkone` — Ctrl+C now terminates child processes cleanly
15
+ - "Working with Glider Data" section in README (sensor discovery, time conversion, fill values)
16
+ - Tests for `get_CTD_sync`, `determine_ctd_type`, `get_global_time_range`, file ordering, batch boundaries
17
+
18
+ ### Changed
19
+
20
+ - `get_sync()` logs interpolation failures at WARNING level instead of INFO
21
+ - Streaming writer logs summary when batches are skipped due to errors
22
+ - `set_time_limits()` accepts numeric epoch seconds in addition to date strings
23
+ - C++ `SensorsMap::setUpForData()` validates sensor byte sizes across files
24
+
25
+ ### Fixed
26
+
27
+ - **Data loss in streaming writer**: removed Python-side double-skip at batch boundaries (C++ already handles `skip_first_record`)
28
+ - **dbdreader2 file ordering**: pass `presorted=True` to `read_dbd_files` so C++ respects chronological order from `DBDList.sort()`
29
+ - **mkone worker error propagation**: workers now exit non-zero on failure so parent detects errors
30
+ - **`_get_with_source` time ordering**: results now sorted by time for consistency with normal `get()` path
31
+ - **`sci_extensions` missing `.sbd`**: file pairing now recognizes `.sbd` as a science file type
32
+ - **`set_time_limits` falsy check**: epoch time 0 no longer causes spurious ValueError
33
+ - **inf-to-NaN for repeated values**: code=1 (repeat) now converts infinity consistently with code=2 (new value)
34
+ - Removed unused `"j"` dimension from `DBDDataStore.get_dimensions()`
35
+ - Fixed `--skip-first` help text (was stale after skip semantics change)
36
+ - Fixed README: CLI command names, removed false wildcard `to_keep` claim
37
+
38
+ ## [0.2.5] - 2026-03-30
39
+
40
+ ### Added
41
+
42
+ - `sort` parameter for `open_multi_dbd_dataset()` and `write_multi_dbd_netcdf()` with three modes: `"header_time"` (default, sort by `fileopen_time` from each file's DBD header), `"lexicographic"`, and `"none"` (preserve caller's order)
43
+ - `--sort` CLI flag for `dbd2nc`, `mkone`, and `2csv` commands
44
+ - `presorted` parameter for `read_dbd_files()` C++ binding to skip internal lexicographic sort when files are pre-sorted by Python
45
+ - `sensor_size` attribute on variables from `open_multi_dbd_dataset()`, matching single-file behavior
46
+ - `--skip-first` flag for `mkone` as consistent alias for the inverse `--keep-first`
47
+ - Duplicate file detection and deduplication with warning in multi-file functions
48
+ - Output directory auto-creation in `write_multi_dbd_netcdf()`
49
+ - "Choosing an API" and "Slocum File Types" sections in README
50
+ - Fill value and CF-compliance guidance in README Known Limitations
51
+
52
+ ### Changed
53
+
54
+ - `skip_first_record` in `read_dbd_files()` now skips the first record of **all** files (including the first), matching Lucas Merckelbach's dbdreader behavior
55
+ - Streaming NetCDF writer keeps a single file handle open instead of reopening per batch
56
+
57
+ ### Fixed
58
+
59
+ - File ordering for TWR-style filenames (e.g. `ce_1137-2026-085-1-10.dbd` incorrectly sorting before `-2.dbd` under lexicographic sort)
60
+ - `_parse_fileopen_time()` now logs a warning instead of silently sorting unparseable files to end
61
+ - `DBD.get_fileopen_time()` no longer raises on unparseable header values
62
+ - Thread-safe random number generator in C++ cache file creation
63
+ - Integer overflow guard in C++ column capacity doubling
64
+
65
+ ## [0.2.3] - 2026-02-23
66
+
67
+ ### Added
68
+
69
+ - `include_source` support in `MultiDBD.get()` — returns per-record source DBD references, matching dbdreader's API
70
+ - `continue_on_reading_error` parameter for `MultiDBD.get()` — skip corrupted files instead of raising, matching dbdreader v0.5.9
71
+ - `DBD_ERROR_READ_ERROR` error code (14) for compatibility with dbdreader
72
+ - Python 3.14 pre-built wheels for all platforms (Linux, macOS, Windows)
73
+ - Attribution to Lucas Merckelbach's [dbdreader](https://github.com/smerckel/dbdreader) in README
74
+
75
+ ## [0.2.2] - 2026-02-23
76
+
77
+ ### Added
78
+
79
+ - `preload` parameter for `DBD` and `MultiDBD` constructors
80
+ - Changelog configuration and tag/version validation in publish workflow
81
+
82
+ ### Fixed
83
+
84
+ - mypy errors: `datetime.UTC`, tuple assignments, type annotations
85
+ - ruff formatting compliance
86
+
87
+ ## [0.2.1] - 2026-02-22
88
+
89
+ ### Added
90
+
91
+ - Streaming NetCDF writer (`write_multi_dbd_netcdf`) for low-memory batch conversion
92
+ - dbdreader-compatible API layer (`DBD` and `MultiDBD` classes in `xarray_dbd.dbdreader2`)
93
+ - Unified CLI under `xdbd` command with subcommands (`2nc`, `mkone`, `2csv`, `missions`, `cache`)
94
+ - Monotonicity check in `get_sync()` to prevent silent wrong results from `np.interp`
95
+
96
+ ### Changed
97
+
98
+ - CLI restructured: standalone `dbd2nc` and `mkone` commands replaced by `xdbd 2nc` and `xdbd mkone`
99
+ - Streaming mode is now the default for non-append `2nc` and `mkone` (requires netCDF4)
100
+ - Fill values corrected: -127 for int8, -32768 for int16 (matching C++ dbd2netCDF standalone)
101
+ - Multi-file reader uses read-copy-discard strategy to reduce peak memory ~53%
102
+ - Replaced inf with NaN in float reads to match C++ dbd2netCDF behavior
103
+
104
+ ### Fixed
105
+
106
+ - Multi-file parse dropping records from unfactored DBD files
107
+ - Corrupted file recovery: discard partial record on I/O error
108
+
109
+ ## [0.1.0] - 2026-02-20
110
+
111
+ ### Added
112
+
113
+ - C++ backend via pybind11 wrapping [dbd2netCDF](https://github.com/mousebrains/dbd2netcdf) parser
114
+ - Native xarray engine integration (`xr.open_dataset(f, engine="dbd")`)
115
+ - Multi-file reading with `open_multi_dbd_dataset()` using C++ SensorsMap two-pass approach
116
+ - CLI tools: `dbd2nc` for single/multi-file conversion, `mkone` for batch directory processing
117
+ - Native dtype support: int8, int16, float32, float64 columns (no double-conversion overhead)
118
+ - LZ4 decompression for compressed `.?cd` files
119
+ - Sensor filtering (`to_keep`), mission filtering (`skip_missions`/`keep_missions`)
120
+ - Corrupted file recovery with `repair=True`
121
+ - Python 3.10+ and free-threaded Python (PEP 703) support
122
+
123
+ ### Changed
124
+
125
+ - Replaced pure-Python parser with C++ pybind11 extension for ~5x performance improvement
126
+ - Fill values: NaN for float32/float64, -127 for int8, -32768 for int16 (matching C++ dbd2netCDF)
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: xarray-dbd
3
- Version: 0.2.3
3
+ Version: 0.2.6
4
4
  Summary: Efficient xarray backend for reading glider DBD files
5
5
  Keywords: glider,oceanography,dbd,slocum,xarray,netcdf
6
6
  Author-Email: Pat Welch <pat@mousebrains.com>
@@ -41,7 +41,7 @@ Description-Content-Type: text/markdown
41
41
  [![License](https://img.shields.io/pypi/l/xarray-dbd)](License.txt)
42
42
  [![CI](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml)
43
43
  [![CodeQL](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml)
44
- [![Codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/branch/main/graph/badge.svg)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
44
+ [![codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/graph/badge.svg?token=EJQEIVEB0U)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
45
45
  [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
46
46
 
47
47
  An efficient xarray backend for reading Dinkum Binary Data (DBD) files from
@@ -74,7 +74,7 @@ pip install xarray-dbd
74
74
  For the CLI tools only:
75
75
 
76
76
  ```bash
77
- pipx install xarray-dbd # installs dbd2nc and mkone commands
77
+ pipx install xarray-dbd # installs xdbd command (xdbd 2nc, xdbd mkone, etc.)
78
78
  ```
79
79
 
80
80
  Or install from source (requires a C++ compiler and CMake):
@@ -147,6 +147,30 @@ ds = xdbd.open_multi_dbd_dataset(
147
147
  )
148
148
  ```
149
149
 
150
+ ### File sort order
151
+
152
+ By default, files are sorted by the `fileopen_time` timestamp in each file's
153
+ header, which is correct regardless of filename convention. Alternative sort
154
+ modes are available:
155
+
156
+ ```python
157
+ # Default: sort by header timestamp (universally correct)
158
+ ds = xdbd.open_multi_dbd_dataset(files)
159
+
160
+ # Sort by filename (lexicographic)
161
+ ds = xdbd.open_multi_dbd_dataset(files, sort="lexicographic")
162
+
163
+ # Preserve the caller's order (no sorting)
164
+ ds = xdbd.open_multi_dbd_dataset(files, sort="none")
165
+ ```
166
+
167
+ The `--sort` flag is also available on all CLI commands:
168
+
169
+ ```bash
170
+ xdbd 2nc --sort lexicographic -C cache -o output.nc *.dbd
171
+ xdbd mkone --sort none --output-prefix /path/to/output/ /path/to/raw/
172
+ ```
173
+
150
174
  ### Advanced options
151
175
 
152
176
  ```python
@@ -154,7 +178,7 @@ ds = xdbd.open_dbd_dataset(
154
178
  'test.sbd',
155
179
  skip_first_record=True, # Skip first record (default)
156
180
  repair=True, # Attempt to repair corrupted data
157
- to_keep=['m_*'], # Keep sensors matching pattern (future feature)
181
+ to_keep=['m_depth', 'm_lat'], # Keep only these sensors
158
182
  criteria=['m_present_time'], # Sensors for record selection
159
183
  )
160
184
  ```
@@ -189,6 +213,7 @@ Open a single DBD file as an xarray Dataset.
189
213
  - `to_keep` (list of str): Sensor names to keep (default: all)
190
214
  - `criteria` (list of str): Sensor names for selection criteria
191
215
  - `drop_variables` (list of str): Variables to exclude
216
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
192
217
 
193
218
  **Returns:** `xarray.Dataset`
194
219
 
@@ -204,9 +229,32 @@ Open multiple DBD files as a single concatenated xarray Dataset.
204
229
  - `criteria` (list of str): Sensor names for selection criteria
205
230
  - `skip_missions` (list of str): Mission names to skip
206
231
  - `keep_missions` (list of str): Mission names to keep
232
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
233
+ - `sort` (str): File sort order — `"header_time"` (default, sort by `fileopen_time` from each file's header), `"lexicographic"`, or `"none"` (preserve caller's order).
207
234
 
208
235
  **Returns:** `xarray.Dataset`
209
236
 
237
+ ### `write_multi_dbd_netcdf(filenames, output, **kwargs)`
238
+
239
+ Stream multiple DBD files directly to a NetCDF file without loading all data
240
+ into memory. Preferred for large datasets (100+ files).
241
+
242
+ **Parameters:**
243
+ - `filenames` (iterable): Paths to DBD files (duplicates removed automatically)
244
+ - `output` (str or Path): Output NetCDF file path (parent directory created if needed)
245
+ - `skip_first_record` (bool): Skip first record in each file (default: True)
246
+ - `repair` (bool): Attempt to repair corrupted records (default: False)
247
+ - `to_keep` (list of str): Sensor names to keep (default: all)
248
+ - `criteria` (list of str): Sensor names for selection criteria
249
+ - `skip_missions` (list of str): Mission names to skip
250
+ - `keep_missions` (list of str): Mission names to keep
251
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
252
+ - `compression` (int): Zlib compression level 0-9 (default: 5, 0 disables)
253
+ - `sort` (str): File sort order (default: `"header_time"`)
254
+ - `batch_size` (int): Files per batch (default: 100; smaller reduces peak memory)
255
+
256
+ **Returns:** `tuple[int, int]` — (n_records, n_files)
257
+
210
258
  ## Migration from dbdreader
211
259
 
212
260
  The dbdreader2 API is derived from Lucas Merckelbach's
@@ -353,9 +401,8 @@ mdbd = dbdreader.MultiDBD(
353
401
  to batch additional sensors into the first `get()` call.
354
402
 
355
403
  - **`skip_initial_line` semantics.** When reading multiple files, the
356
- first contributing file keeps all its records; subsequent files skip
357
- their first record. dbdreader skips the first record of every file.
358
- Multi-file record counts may therefore differ by up to N-1.
404
+ first record of every file is skipped (matching dbdreader). Multi-file
405
+ record counts should match dbdreader exactly.
359
406
 
360
407
  - **Float64 output.** `get()` always returns float64 arrays, matching
361
408
  dbdreader's behavior. Integer fill values (-127 for int8, -32768 for
@@ -472,6 +519,7 @@ print(f"Depth units: {ds['m_depth'].attrs['units']}")
472
519
  ### Working with trajectories
473
520
 
474
521
  ```python
522
+ from pathlib import Path
475
523
  import xarray_dbd as xdbd
476
524
  import matplotlib.pyplot as plt
477
525
 
@@ -492,6 +540,7 @@ plt.show()
492
540
  ### Extracting science data
493
541
 
494
542
  ```python
543
+ from pathlib import Path
495
544
  # Read full resolution science data
496
545
  files = sorted(Path('.').glob('*.ebd'))
497
546
  ds = xdbd.open_multi_dbd_dataset(
@@ -504,6 +553,74 @@ df = ds.to_dataframe()
504
553
  print(df.describe())
505
554
  ```
506
555
 
556
+ ## Choosing an API
557
+
558
+ | Scenario | Recommended API |
559
+ |----------|----------------|
560
+ | Single file, quick look | `xr.open_dataset(f, engine="dbd")` |
561
+ | Multiple files, < 1 GB | `xdbd.open_multi_dbd_dataset(files, to_keep=[...])` |
562
+ | Multiple files, large dataset | `xdbd.write_multi_dbd_netcdf(files, "out.nc")` |
563
+ | Interactive / Jupyter | `xdbd.MultiDBD(filenames=files)` with `.get()` (lazy) |
564
+ | Batch processing 1000+ files | `xdbd mkone` CLI (multiprocessing) |
565
+ | Drop-in dbdreader replacement | `import xarray_dbd.dbdreader2 as dbdreader` |
566
+
567
+ ## Slocum File Types
568
+
569
+ | Extension | Name | Contents |
570
+ |-----------|------|----------|
571
+ | `.dbd` / `.dcd` | Flight | Vehicle sensors: depth, attitude, speed, GPS |
572
+ | `.ebd` / `.ecd` | Science | Payload sensors: CTD, optics, oxygen |
573
+ | `.sbd` / `.scd` | Short burst | Surface telemetry summary records |
574
+ | `.tbd` / `.tcd` | Technical | Detailed engineering telemetry |
575
+ | `.mbd` / `.mcd` | Mini | Compact engineering subset |
576
+ | `.nbd` / `.ncd` | Narrow | Compact science subset |
577
+
578
+ Compressed variants (`.?cd`) use LZ4 framing and are handled transparently.
579
+
580
+ ## Working with Glider Data
581
+
582
+ ### Discovering available sensors
583
+
584
+ ```python
585
+ import xarray_dbd as xdbd
586
+
587
+ # xarray API
588
+ ds = xdbd.open_dbd_dataset("file.dbd", cache_dir="cache")
589
+ for var in sorted(ds.data_vars):
590
+ print(f" {var:30s} {ds[var].attrs.get('units', '')}")
591
+
592
+ # dbdreader2 API
593
+ dbd = xdbd.MultiDBD(pattern="*.dbd", cacheDir="cache")
594
+ for name in sorted(dbd.parameterNames["eng"]):
595
+ print(f" {name:30s} {dbd.parameterUnits.get(name, '')}")
596
+ ```
597
+
598
+ Sensor naming conventions are documented in
599
+ [TWR's masterdata files](https://gliderfs2.ceoas.oregonstate.edu/gliderweb/masterdata/).
600
+
601
+ ### Time conversion
602
+
603
+ `m_present_time` contains UTC seconds since 1970-01-01 (Unix epoch, float64):
604
+
605
+ ```python
606
+ import pandas as pd
607
+
608
+ time = pd.to_datetime(ds["m_present_time"].values, unit="s", utc=True)
609
+ ```
610
+
611
+ ### Handling fill values
612
+
613
+ Float sensors use NaN for missing data. Integer sensors use sentinel fill
614
+ values (-127 for int8, -32768 for int16). Filter them out:
615
+
616
+ ```python
617
+ # xarray — replace sentinels with NaN
618
+ ds = ds.where(ds != -32768)
619
+
620
+ # dbdreader2 — automatic filtering (default)
621
+ t, v = dbd.get("m_depth") # return_nans=False by default
622
+ ```
623
+
507
624
  ## Known Limitations
508
625
 
509
626
  - **Python 3.10+ required** — uses `from __future__ import annotations` for modern type-hint syntax.
@@ -514,6 +631,18 @@ print(df.describe())
514
631
  - **No lazy loading for xarray API** — `open_dataset()` reads all sensor data
515
632
  into memory. For very large deployments, use `to_keep` to select only needed
516
633
  sensors. The dbdreader2 API (`DBD`/`MultiDBD`) uses lazy incremental loading.
634
+ - **Fill values in xarray output** — Integer sensors use sentinel fill values
635
+ (-127 for int8, -32768 for int16) rather than NaN. Between dives, science
636
+ sensors may contain these sentinels or NaN. Filter with
637
+ `ds.where(ds != -32768)` or use the dbdreader2 `get(return_nans=False)` API
638
+ which filters automatically.
639
+ - **Not CF-compliant** — NetCDF output preserves sensor `units` but does not
640
+ add CF attributes (`standard_name`, `axis`, `calendar`). Add metadata
641
+ post-hoc for publication, e.g.:
642
+ ```python
643
+ ds["m_present_time"].attrs["axis"] = "T"
644
+ ds["m_present_time"].attrs["units"] = "seconds since 1970-01-01"
645
+ ```
517
646
 
518
647
  ## Troubleshooting
519
648
 
@@ -5,7 +5,7 @@
5
5
  [![License](https://img.shields.io/pypi/l/xarray-dbd)](License.txt)
6
6
  [![CI](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml)
7
7
  [![CodeQL](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml)
8
- [![Codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/branch/main/graph/badge.svg)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
8
+ [![codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/graph/badge.svg?token=EJQEIVEB0U)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
9
9
  [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
10
10
 
11
11
  An efficient xarray backend for reading Dinkum Binary Data (DBD) files from
@@ -38,7 +38,7 @@ pip install xarray-dbd
38
38
  For the CLI tools only:
39
39
 
40
40
  ```bash
41
- pipx install xarray-dbd # installs dbd2nc and mkone commands
41
+ pipx install xarray-dbd # installs xdbd command (xdbd 2nc, xdbd mkone, etc.)
42
42
  ```
43
43
 
44
44
  Or install from source (requires a C++ compiler and CMake):
@@ -111,6 +111,30 @@ ds = xdbd.open_multi_dbd_dataset(
111
111
  )
112
112
  ```
113
113
 
114
+ ### File sort order
115
+
116
+ By default, files are sorted by the `fileopen_time` timestamp in each file's
117
+ header, which is correct regardless of filename convention. Alternative sort
118
+ modes are available:
119
+
120
+ ```python
121
+ # Default: sort by header timestamp (universally correct)
122
+ ds = xdbd.open_multi_dbd_dataset(files)
123
+
124
+ # Sort by filename (lexicographic)
125
+ ds = xdbd.open_multi_dbd_dataset(files, sort="lexicographic")
126
+
127
+ # Preserve the caller's order (no sorting)
128
+ ds = xdbd.open_multi_dbd_dataset(files, sort="none")
129
+ ```
130
+
131
+ The `--sort` flag is also available on all CLI commands:
132
+
133
+ ```bash
134
+ xdbd 2nc --sort lexicographic -C cache -o output.nc *.dbd
135
+ xdbd mkone --sort none --output-prefix /path/to/output/ /path/to/raw/
136
+ ```
137
+
114
138
  ### Advanced options
115
139
 
116
140
  ```python
@@ -118,7 +142,7 @@ ds = xdbd.open_dbd_dataset(
118
142
  'test.sbd',
119
143
  skip_first_record=True, # Skip first record (default)
120
144
  repair=True, # Attempt to repair corrupted data
121
- to_keep=['m_*'], # Keep sensors matching pattern (future feature)
145
+ to_keep=['m_depth', 'm_lat'], # Keep only these sensors
122
146
  criteria=['m_present_time'], # Sensors for record selection
123
147
  )
124
148
  ```
@@ -153,6 +177,7 @@ Open a single DBD file as an xarray Dataset.
153
177
  - `to_keep` (list of str): Sensor names to keep (default: all)
154
178
  - `criteria` (list of str): Sensor names for selection criteria
155
179
  - `drop_variables` (list of str): Variables to exclude
180
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
156
181
 
157
182
  **Returns:** `xarray.Dataset`
158
183
 
@@ -168,9 +193,32 @@ Open multiple DBD files as a single concatenated xarray Dataset.
168
193
  - `criteria` (list of str): Sensor names for selection criteria
169
194
  - `skip_missions` (list of str): Mission names to skip
170
195
  - `keep_missions` (list of str): Mission names to keep
196
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
197
+ - `sort` (str): File sort order — `"header_time"` (default, sort by `fileopen_time` from each file's header), `"lexicographic"`, or `"none"` (preserve caller's order).
171
198
 
172
199
  **Returns:** `xarray.Dataset`
173
200
 
201
+ ### `write_multi_dbd_netcdf(filenames, output, **kwargs)`
202
+
203
+ Stream multiple DBD files directly to a NetCDF file without loading all data
204
+ into memory. Preferred for large datasets (100+ files).
205
+
206
+ **Parameters:**
207
+ - `filenames` (iterable): Paths to DBD files (duplicates removed automatically)
208
+ - `output` (str or Path): Output NetCDF file path (parent directory created if needed)
209
+ - `skip_first_record` (bool): Skip first record in each file (default: True)
210
+ - `repair` (bool): Attempt to repair corrupted records (default: False)
211
+ - `to_keep` (list of str): Sensor names to keep (default: all)
212
+ - `criteria` (list of str): Sensor names for selection criteria
213
+ - `skip_missions` (list of str): Mission names to skip
214
+ - `keep_missions` (list of str): Mission names to keep
215
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
216
+ - `compression` (int): Zlib compression level 0-9 (default: 5, 0 disables)
217
+ - `sort` (str): File sort order (default: `"header_time"`)
218
+ - `batch_size` (int): Files per batch (default: 100; smaller reduces peak memory)
219
+
220
+ **Returns:** `tuple[int, int]` — (n_records, n_files)
221
+
174
222
  ## Migration from dbdreader
175
223
 
176
224
  The dbdreader2 API is derived from Lucas Merckelbach's
@@ -317,9 +365,8 @@ mdbd = dbdreader.MultiDBD(
317
365
  to batch additional sensors into the first `get()` call.
318
366
 
319
367
  - **`skip_initial_line` semantics.** When reading multiple files, the
320
- first contributing file keeps all its records; subsequent files skip
321
- their first record. dbdreader skips the first record of every file.
322
- Multi-file record counts may therefore differ by up to N-1.
368
+ first record of every file is skipped (matching dbdreader). Multi-file
369
+ record counts should match dbdreader exactly.
323
370
 
324
371
  - **Float64 output.** `get()` always returns float64 arrays, matching
325
372
  dbdreader's behavior. Integer fill values (-127 for int8, -32768 for
@@ -436,6 +483,7 @@ print(f"Depth units: {ds['m_depth'].attrs['units']}")
436
483
  ### Working with trajectories
437
484
 
438
485
  ```python
486
+ from pathlib import Path
439
487
  import xarray_dbd as xdbd
440
488
  import matplotlib.pyplot as plt
441
489
 
@@ -456,6 +504,7 @@ plt.show()
456
504
  ### Extracting science data
457
505
 
458
506
  ```python
507
+ from pathlib import Path
459
508
  # Read full resolution science data
460
509
  files = sorted(Path('.').glob('*.ebd'))
461
510
  ds = xdbd.open_multi_dbd_dataset(
@@ -468,6 +517,74 @@ df = ds.to_dataframe()
468
517
  print(df.describe())
469
518
  ```
470
519
 
520
+ ## Choosing an API
521
+
522
+ | Scenario | Recommended API |
523
+ |----------|----------------|
524
+ | Single file, quick look | `xr.open_dataset(f, engine="dbd")` |
525
+ | Multiple files, < 1 GB | `xdbd.open_multi_dbd_dataset(files, to_keep=[...])` |
526
+ | Multiple files, large dataset | `xdbd.write_multi_dbd_netcdf(files, "out.nc")` |
527
+ | Interactive / Jupyter | `xdbd.MultiDBD(filenames=files)` with `.get()` (lazy) |
528
+ | Batch processing 1000+ files | `xdbd mkone` CLI (multiprocessing) |
529
+ | Drop-in dbdreader replacement | `import xarray_dbd.dbdreader2 as dbdreader` |
530
+
531
+ ## Slocum File Types
532
+
533
+ | Extension | Name | Contents |
534
+ |-----------|------|----------|
535
+ | `.dbd` / `.dcd` | Flight | Vehicle sensors: depth, attitude, speed, GPS |
536
+ | `.ebd` / `.ecd` | Science | Payload sensors: CTD, optics, oxygen |
537
+ | `.sbd` / `.scd` | Short burst | Surface telemetry summary records |
538
+ | `.tbd` / `.tcd` | Technical | Detailed engineering telemetry |
539
+ | `.mbd` / `.mcd` | Mini | Compact engineering subset |
540
+ | `.nbd` / `.ncd` | Narrow | Compact science subset |
541
+
542
+ Compressed variants (`.?cd`) use LZ4 framing and are handled transparently.
543
+
544
+ ## Working with Glider Data
545
+
546
+ ### Discovering available sensors
547
+
548
+ ```python
549
+ import xarray_dbd as xdbd
550
+
551
+ # xarray API
552
+ ds = xdbd.open_dbd_dataset("file.dbd", cache_dir="cache")
553
+ for var in sorted(ds.data_vars):
554
+ print(f" {var:30s} {ds[var].attrs.get('units', '')}")
555
+
556
+ # dbdreader2 API
557
+ dbd = xdbd.MultiDBD(pattern="*.dbd", cacheDir="cache")
558
+ for name in sorted(dbd.parameterNames["eng"]):
559
+ print(f" {name:30s} {dbd.parameterUnits.get(name, '')}")
560
+ ```
561
+
562
+ Sensor naming conventions are documented in
563
+ [TWR's masterdata files](https://gliderfs2.ceoas.oregonstate.edu/gliderweb/masterdata/).
564
+
565
+ ### Time conversion
566
+
567
+ `m_present_time` contains UTC seconds since 1970-01-01 (Unix epoch, float64):
568
+
569
+ ```python
570
+ import pandas as pd
571
+
572
+ time = pd.to_datetime(ds["m_present_time"].values, unit="s", utc=True)
573
+ ```
574
+
575
+ ### Handling fill values
576
+
577
+ Float sensors use NaN for missing data. Integer sensors use sentinel fill
578
+ values (-127 for int8, -32768 for int16). Filter them out:
579
+
580
+ ```python
581
+ # xarray — replace sentinels with NaN
582
+ ds = ds.where(ds != -32768)
583
+
584
+ # dbdreader2 — automatic filtering (default)
585
+ t, v = dbd.get("m_depth") # return_nans=False by default
586
+ ```
587
+
471
588
  ## Known Limitations
472
589
 
473
590
  - **Python 3.10+ required** — uses `from __future__ import annotations` for modern type-hint syntax.
@@ -478,6 +595,18 @@ print(df.describe())
478
595
  - **No lazy loading for xarray API** — `open_dataset()` reads all sensor data
479
596
  into memory. For very large deployments, use `to_keep` to select only needed
480
597
  sensors. The dbdreader2 API (`DBD`/`MultiDBD`) uses lazy incremental loading.
598
+ - **Fill values in xarray output** — Integer sensors use sentinel fill values
599
+ (-127 for int8, -32768 for int16) rather than NaN. Between dives, science
600
+ sensors may contain these sentinels or NaN. Filter with
601
+ `ds.where(ds != -32768)` or use the dbdreader2 `get(return_nans=False)` API
602
+ which filters automatically.
603
+ - **Not CF-compliant** — NetCDF output preserves sensor `units` but does not
604
+ add CF attributes (`standard_name`, `axis`, `calendar`). Add metadata
605
+ post-hoc for publication, e.g.:
606
+ ```python
607
+ ds["m_present_time"].attrs["axis"] = "T"
608
+ ds["m_present_time"].attrs["units"] = "seconds since 1970-01-01"
609
+ ```
481
610
 
482
611
  ## Troubleshooting
483
612
 
@@ -123,7 +123,7 @@ ColumnDataResult read_columns(std::istream& is,
123
123
  qKeep |= sensor.qCriteria();
124
124
  const int oi = outIndex[i];
125
125
  if (oi >= 0) {
126
- // Copy previous value into current row
126
+ // Copy previous value into current row, converting inf to NaN
127
127
  std::visit([nRows, oi](auto& col_vec, const auto& prev_vec) {
128
128
  using T = typename std::decay_t<decltype(col_vec)>::value_type;
129
129
  using PT = typename std::decay_t<decltype(prev_vec)>::value_type;
@@ -136,7 +136,11 @@ ColumnDataResult read_columns(std::istream& is,
136
136
  else
137
137
  col_vec.resize(col_vec.size() * 2, NAN);
138
138
  }
139
- col_vec[nRows] = prev_vec[0];
139
+ T val = prev_vec[0];
140
+ if constexpr (std::is_floating_point_v<T>) {
141
+ if (std::isinf(val)) val = NAN;
142
+ }
143
+ col_vec[nRows] = val;
140
144
  }
141
145
  }, columns[oi], prevValues[oi]);
142
146
  }
@@ -38,17 +38,20 @@ int DecompressTWRBuf::underflow() {
38
38
  if (!this->mIS.read(frame.data(), n)) { // EOF
39
39
  return std::char_traits<char>::eof();
40
40
  }
41
- const int j = LZ4_decompress_safe(frame.data(), this->mBuffer, static_cast<int>(n), sizeof(this->mBuffer));
41
+ const int j(LZ4_decompress_safe(frame.data(), this->mBuffer, static_cast<int>(n), sizeof(this->mBuffer)));
42
42
  if (j < 0) { // LZ4 decompression error
43
+ LOG_ERROR("LZ4 decompression failed (error {}) in {} (block size {})",
44
+ j, this->mFilename, n);
43
45
  return std::char_traits<char>::eof();
44
46
  }
45
- if (static_cast<size_t>(j) > sizeof(this->mBuffer)) { // Probably a corrupted file
46
- return std::char_traits<char>::eof();
47
- }
48
- this->setg(this->mBuffer, this->mBuffer, this->mBuffer + j);
47
+ const size_t decompressedSize(static_cast<size_t>(j));
48
+ this->setg(this->mBuffer, this->mBuffer, this->mBuffer + decompressedSize);
49
+ this->mPos += decompressedSize;
49
50
  } else { // Not compressed
50
51
  if (this->mIS.read(this->mBuffer, sizeof(this->mBuffer)) || this->mIS.gcount()) {
51
- this->setg(this->mBuffer, this->mBuffer, this->mBuffer + this->mIS.gcount());
52
+ const auto n = this->mIS.gcount();
53
+ this->setg(this->mBuffer, this->mBuffer, this->mBuffer + n);
54
+ this->mPos += static_cast<size_t>(n);
52
55
  } else {
53
56
  return std::char_traits<char>::eof();
54
57
  }
@@ -57,6 +60,18 @@ int DecompressTWRBuf::underflow() {
57
60
  return std::char_traits<char>::to_int_type(*this->gptr());
58
61
  }
59
62
 
63
+ DecompressTWRBuf::pos_type
64
+ DecompressTWRBuf::seekoff(off_type off, std::ios_base::seekdir dir,
65
+ std::ios_base::openmode /*which*/) {
66
+ // Only support tellg(): seekoff(0, cur)
67
+ if (dir == std::ios_base::cur && off == 0) {
68
+ // mPos is total bytes loaded; subtract unread bytes remaining in buffer
69
+ const auto remaining = this->egptr() - this->gptr();
70
+ return static_cast<pos_type>(this->mPos - static_cast<size_t>(remaining));
71
+ }
72
+ return pos_type(off_type(-1)); // Seeking not supported
73
+ }
74
+
60
75
  bool qCompressed(const std::string& fn) {
61
76
  const std::string suffix(fs::path(fn).extension().string());
62
77
  const bool q((suffix.size() == 4) && (std::tolower(static_cast<unsigned char>(suffix[2])) == 'c'));
@@ -11,6 +11,7 @@ class DecompressTWRBuf: public std::streambuf {
11
11
  const bool mqCompressed;
12
12
  char mBuffer[65536];
13
13
  const std::string mFilename;
14
+ size_t mPos = 0; // Total decompressed bytes loaded into buffer
14
15
  public:
15
16
  DecompressTWRBuf(const std::string& fn, const bool qCompressed)
16
17
  : mIS(fn.c_str(), std::ios::binary)
@@ -23,7 +24,11 @@ public:
23
24
 
24
25
  void close() {mIS.close();}
25
26
 
26
- int underflow();
27
+ int underflow() override;
28
+
29
+ protected:
30
+ pos_type seekoff(off_type off, std::ios_base::seekdir dir,
31
+ std::ios_base::openmode which = std::ios_base::in) override;
27
32
  };
28
33
 
29
34
  class DecompressTWR: public std::istream {