xarray-dbd 0.2.3__tar.gz → 0.2.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62) hide show
  1. xarray_dbd-0.2.5/CHANGELOG.md +96 -0
  2. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/PKG-INFO +67 -5
  3. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/README.md +66 -4
  4. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Sensors.C +2 -3
  5. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/config.h +1 -1
  6. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/dbd_python.cpp +17 -8
  7. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/pyproject.toml +1 -1
  8. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/tests/test_cli.py +52 -0
  9. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/tests/test_cpp_backend.py +73 -0
  10. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/_dbd_cpp.pyi +1 -0
  11. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/backend.py +138 -56
  12. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/csv.py +14 -1
  13. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/dbd2nc.py +9 -0
  14. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/mkone.py +19 -3
  15. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/sensors.py +3 -3
  16. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/dbdreader2/_core.py +8 -2
  17. xarray_dbd-0.2.3/CHANGELOG.md +0 -47
  18. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/.clang-tidy +0 -0
  19. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/.gitignore +0 -0
  20. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/.pre-commit-config.yaml +0 -0
  21. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/CMakeLists.txt +0 -0
  22. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/CONTRIBUTING.md +0 -0
  23. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/License.txt +0 -0
  24. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/benchmark_performance.py +0 -0
  25. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/conda/recipe.yaml +0 -0
  26. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/ColumnData.C +0 -0
  27. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/ColumnData.H +0 -0
  28. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Data.C +0 -0
  29. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Data.H +0 -0
  30. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Decompress.C +0 -0
  31. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Decompress.H +0 -0
  32. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/FileInfo.H +0 -0
  33. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Header.C +0 -0
  34. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Header.H +0 -0
  35. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/KnownBytes.C +0 -0
  36. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/KnownBytes.H +0 -0
  37. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Logger.H +0 -0
  38. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/MyException.H +0 -0
  39. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Sensor.C +0 -0
  40. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Sensor.H +0 -0
  41. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/Sensors.H +0 -0
  42. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/SensorsMap.C +0 -0
  43. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/SensorsMap.H +0 -0
  44. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/lz4.c +0 -0
  45. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/csrc/lz4.h +0 -0
  46. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/examples/README.md +0 -0
  47. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/scripts/README.md +0 -0
  48. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/tests/conftest.py +0 -0
  49. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/tests/test_backend.py +0 -0
  50. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/tests/test_dbdreader2.py +0 -0
  51. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/__init__.py +0 -0
  52. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/__init__.py +0 -0
  53. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/cache.py +0 -0
  54. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/logger.py +0 -0
  55. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/main.py +0 -0
  56. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/cli/missions.py +0 -0
  57. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/dbdreader2/__init__.py +0 -0
  58. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/dbdreader2/_cache.py +0 -0
  59. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/dbdreader2/_errors.py +0 -0
  60. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/dbdreader2/_list.py +0 -0
  61. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/dbdreader2/_util.py +0 -0
  62. {xarray_dbd-0.2.3 → xarray_dbd-0.2.5}/xarray_dbd/py.typed +0 -0
@@ -0,0 +1,96 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [0.2.5] - 2026-03-30
9
+
10
+ ### Added
11
+
12
+ - `sort` parameter for `open_multi_dbd_dataset()` and `write_multi_dbd_netcdf()` with three modes: `"header_time"` (default, sort by `fileopen_time` from each file's DBD header), `"lexicographic"`, and `"none"` (preserve caller's order)
13
+ - `--sort` CLI flag for `dbd2nc`, `mkone`, and `2csv` commands
14
+ - `presorted` parameter for `read_dbd_files()` C++ binding to skip internal lexicographic sort when files are pre-sorted by Python
15
+ - `sensor_size` attribute on variables from `open_multi_dbd_dataset()`, matching single-file behavior
16
+ - `--skip-first` flag for `mkone` as consistent alias for the inverse `--keep-first`
17
+ - Duplicate file detection and deduplication with warning in multi-file functions
18
+ - Output directory auto-creation in `write_multi_dbd_netcdf()`
19
+ - "Choosing an API" and "Slocum File Types" sections in README
20
+ - Fill value and CF-compliance guidance in README Known Limitations
21
+
22
+ ### Changed
23
+
24
+ - `skip_first_record` in `read_dbd_files()` now skips the first record of **all** files (including the first), matching Lucas Merckelbach's dbdreader behavior
25
+ - Streaming NetCDF writer keeps a single file handle open instead of reopening per batch
26
+
27
+ ### Fixed
28
+
29
+ - File ordering for TWR-style filenames (e.g. `ce_1137-2026-085-1-10.dbd` incorrectly sorting before `-2.dbd` under lexicographic sort)
30
+ - `_parse_fileopen_time()` now logs a warning instead of silently sorting unparseable files to end
31
+ - `DBD.get_fileopen_time()` no longer raises on unparseable header values
32
+ - Thread-safe random number generator in C++ cache file creation
33
+ - Integer overflow guard in C++ column capacity doubling
34
+
35
+ ## [0.2.3] - 2026-02-23
36
+
37
+ ### Added
38
+
39
+ - `include_source` support in `MultiDBD.get()` — returns per-record source DBD references, matching dbdreader's API
40
+ - `continue_on_reading_error` parameter for `MultiDBD.get()` — skip corrupted files instead of raising, matching dbdreader v0.5.9
41
+ - `DBD_ERROR_READ_ERROR` error code (14) for compatibility with dbdreader
42
+ - Python 3.14 pre-built wheels for all platforms (Linux, macOS, Windows)
43
+ - Attribution to Lucas Merckelbach's [dbdreader](https://github.com/smerckel/dbdreader) in README
44
+
45
+ ## [0.2.2] - 2026-02-23
46
+
47
+ ### Added
48
+
49
+ - `preload` parameter for `DBD` and `MultiDBD` constructors
50
+ - Changelog configuration and tag/version validation in publish workflow
51
+
52
+ ### Fixed
53
+
54
+ - mypy errors: `datetime.UTC`, tuple assignments, type annotations
55
+ - ruff formatting compliance
56
+
57
+ ## [0.2.1] - 2026-02-22
58
+
59
+ ### Added
60
+
61
+ - Streaming NetCDF writer (`write_multi_dbd_netcdf`) for low-memory batch conversion
62
+ - dbdreader-compatible API layer (`DBD` and `MultiDBD` classes in `xarray_dbd.dbdreader2`)
63
+ - Unified CLI under `xdbd` command with subcommands (`2nc`, `mkone`, `2csv`, `missions`, `cache`)
64
+ - Monotonicity check in `get_sync()` to prevent silent wrong results from `np.interp`
65
+
66
+ ### Changed
67
+
68
+ - CLI restructured: standalone `dbd2nc` and `mkone` commands replaced by `xdbd 2nc` and `xdbd mkone`
69
+ - Streaming mode is now the default for non-append `2nc` and `mkone` (requires netCDF4)
70
+ - Fill values corrected: -127 for int8, -32768 for int16 (matching C++ dbd2netCDF standalone)
71
+ - Multi-file reader uses read-copy-discard strategy to reduce peak memory ~53%
72
+ - Replaced inf with NaN in float reads to match C++ dbd2netCDF behavior
73
+
74
+ ### Fixed
75
+
76
+ - Multi-file parse dropping records from unfactored DBD files
77
+ - Corrupted file recovery: discard partial record on I/O error
78
+
79
+ ## [0.1.0] - 2026-02-20
80
+
81
+ ### Added
82
+
83
+ - C++ backend via pybind11 wrapping [dbd2netCDF](https://github.com/mousebrains/dbd2netcdf) parser
84
+ - Native xarray engine integration (`xr.open_dataset(f, engine="dbd")`)
85
+ - Multi-file reading with `open_multi_dbd_dataset()` using C++ SensorsMap two-pass approach
86
+ - CLI tools: `dbd2nc` for single/multi-file conversion, `mkone` for batch directory processing
87
+ - Native dtype support: int8, int16, float32, float64 columns (no double-conversion overhead)
88
+ - LZ4 decompression for compressed `.?cd` files
89
+ - Sensor filtering (`to_keep`), mission filtering (`skip_missions`/`keep_missions`)
90
+ - Corrupted file recovery with `repair=True`
91
+ - Python 3.10+ and free-threaded Python (PEP 703) support
92
+
93
+ ### Changed
94
+
95
+ - Replaced pure-Python parser with C++ pybind11 extension for ~5x performance improvement
96
+ - Fill values: NaN for float32/float64, -127 for int8, -32768 for int16 (matching C++ dbd2netCDF)
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: xarray-dbd
3
- Version: 0.2.3
3
+ Version: 0.2.5
4
4
  Summary: Efficient xarray backend for reading glider DBD files
5
5
  Keywords: glider,oceanography,dbd,slocum,xarray,netcdf
6
6
  Author-Email: Pat Welch <pat@mousebrains.com>
@@ -41,7 +41,7 @@ Description-Content-Type: text/markdown
41
41
  [![License](https://img.shields.io/pypi/l/xarray-dbd)](License.txt)
42
42
  [![CI](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml)
43
43
  [![CodeQL](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml)
44
- [![Codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/branch/main/graph/badge.svg)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
44
+ [![codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/graph/badge.svg?token=EJQEIVEB0U)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
45
45
  [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
46
46
 
47
47
  An efficient xarray backend for reading Dinkum Binary Data (DBD) files from
@@ -147,6 +147,30 @@ ds = xdbd.open_multi_dbd_dataset(
147
147
  )
148
148
  ```
149
149
 
150
+ ### File sort order
151
+
152
+ By default, files are sorted by the `fileopen_time` timestamp in each file's
153
+ header, which is correct regardless of filename convention. Alternative sort
154
+ modes are available:
155
+
156
+ ```python
157
+ # Default: sort by header timestamp (universally correct)
158
+ ds = xdbd.open_multi_dbd_dataset(files)
159
+
160
+ # Sort by filename (lexicographic)
161
+ ds = xdbd.open_multi_dbd_dataset(files, sort="lexicographic")
162
+
163
+ # Preserve the caller's order (no sorting)
164
+ ds = xdbd.open_multi_dbd_dataset(files, sort="none")
165
+ ```
166
+
167
+ The `--sort` flag is also available on all CLI commands:
168
+
169
+ ```bash
170
+ dbd2nc --sort lexicographic -C cache -o output.nc *.dbd
171
+ mkone --sort none --output-prefix /path/to/output/ /path/to/raw/
172
+ ```
173
+
150
174
  ### Advanced options
151
175
 
152
176
  ```python
@@ -189,6 +213,7 @@ Open a single DBD file as an xarray Dataset.
189
213
  - `to_keep` (list of str): Sensor names to keep (default: all)
190
214
  - `criteria` (list of str): Sensor names for selection criteria
191
215
  - `drop_variables` (list of str): Variables to exclude
216
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
192
217
 
193
218
  **Returns:** `xarray.Dataset`
194
219
 
@@ -204,6 +229,8 @@ Open multiple DBD files as a single concatenated xarray Dataset.
204
229
  - `criteria` (list of str): Sensor names for selection criteria
205
230
  - `skip_missions` (list of str): Mission names to skip
206
231
  - `keep_missions` (list of str): Mission names to keep
232
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
233
+ - `sort` (str): File sort order — `"header_time"` (default, sort by `fileopen_time` from each file's header), `"lexicographic"`, or `"none"` (preserve caller's order).
207
234
 
208
235
  **Returns:** `xarray.Dataset`
209
236
 
@@ -353,9 +380,8 @@ mdbd = dbdreader.MultiDBD(
353
380
  to batch additional sensors into the first `get()` call.
354
381
 
355
382
  - **`skip_initial_line` semantics.** When reading multiple files, the
356
- first contributing file keeps all its records; subsequent files skip
357
- their first record. dbdreader skips the first record of every file.
358
- Multi-file record counts may therefore differ by up to N-1.
383
+ first record of every file is skipped (matching dbdreader). Multi-file
384
+ record counts should match dbdreader exactly.
359
385
 
360
386
  - **Float64 output.** `get()` always returns float64 arrays, matching
361
387
  dbdreader's behavior. Integer fill values (-127 for int8, -32768 for
@@ -504,6 +530,30 @@ df = ds.to_dataframe()
504
530
  print(df.describe())
505
531
  ```
506
532
 
533
+ ## Choosing an API
534
+
535
+ | Scenario | Recommended API |
536
+ |----------|----------------|
537
+ | Single file, quick look | `xr.open_dataset(f, engine="dbd")` |
538
+ | Multiple files, < 1 GB | `xdbd.open_multi_dbd_dataset(files, to_keep=[...])` |
539
+ | Multiple files, large dataset | `xdbd.write_multi_dbd_netcdf(files, "out.nc")` |
540
+ | Interactive / Jupyter | `xdbd.MultiDBD(filenames=files)` with `.get()` (lazy) |
541
+ | Batch processing 1000+ files | `mkone` CLI (multiprocessing) |
542
+ | Drop-in dbdreader replacement | `import xarray_dbd.dbdreader2 as dbdreader` |
543
+
544
+ ## Slocum File Types
545
+
546
+ | Extension | Name | Contents |
547
+ |-----------|------|----------|
548
+ | `.dbd` / `.dcd` | Flight | Vehicle sensors: depth, attitude, speed, GPS |
549
+ | `.ebd` / `.ecd` | Science | Payload sensors: CTD, optics, oxygen |
550
+ | `.sbd` / `.scd` | Short burst | Surface telemetry summary records |
551
+ | `.tbd` / `.tcd` | Technical | Detailed engineering telemetry |
552
+ | `.mbd` / `.mcd` | Mini | Compact engineering subset |
553
+ | `.nbd` / `.ncd` | Narrow | Compact science subset |
554
+
555
+ Compressed variants (`.?cd`) use LZ4 framing and are handled transparently.
556
+
507
557
  ## Known Limitations
508
558
 
509
559
  - **Python 3.10+ required** — uses `from __future__ import annotations` for modern type-hint syntax.
@@ -514,6 +564,18 @@ print(df.describe())
514
564
  - **No lazy loading for xarray API** — `open_dataset()` reads all sensor data
515
565
  into memory. For very large deployments, use `to_keep` to select only needed
516
566
  sensors. The dbdreader2 API (`DBD`/`MultiDBD`) uses lazy incremental loading.
567
+ - **Fill values in xarray output** — Integer sensors use sentinel fill values
568
+ (-127 for int8, -32768 for int16) rather than NaN. Between dives, science
569
+ sensors may contain these sentinels or NaN. Filter with
570
+ `ds.where(ds != -32768)` or use the dbdreader2 `get(return_nans=False)` API
571
+ which filters automatically.
572
+ - **Not CF-compliant** — NetCDF output preserves sensor `units` but does not
573
+ add CF attributes (`standard_name`, `axis`, `calendar`). Add metadata
574
+ post-hoc for publication, e.g.:
575
+ ```python
576
+ ds["m_present_time"].attrs["axis"] = "T"
577
+ ds["m_present_time"].attrs["units"] = "seconds since 1970-01-01"
578
+ ```
517
579
 
518
580
  ## Troubleshooting
519
581
 
@@ -5,7 +5,7 @@
5
5
  [![License](https://img.shields.io/pypi/l/xarray-dbd)](License.txt)
6
6
  [![CI](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/ci.yml)
7
7
  [![CodeQL](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml/badge.svg)](https://github.com/mousebrains/dbd2netcdf-python/actions/workflows/codeql.yml)
8
- [![Codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/branch/main/graph/badge.svg)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
8
+ [![codecov](https://codecov.io/gh/mousebrains/dbd2netcdf-python/graph/badge.svg?token=EJQEIVEB0U)](https://codecov.io/gh/mousebrains/dbd2netcdf-python)
9
9
  [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
10
10
 
11
11
  An efficient xarray backend for reading Dinkum Binary Data (DBD) files from
@@ -111,6 +111,30 @@ ds = xdbd.open_multi_dbd_dataset(
111
111
  )
112
112
  ```
113
113
 
114
+ ### File sort order
115
+
116
+ By default, files are sorted by the `fileopen_time` timestamp in each file's
117
+ header, which is correct regardless of filename convention. Alternative sort
118
+ modes are available:
119
+
120
+ ```python
121
+ # Default: sort by header timestamp (universally correct)
122
+ ds = xdbd.open_multi_dbd_dataset(files)
123
+
124
+ # Sort by filename (lexicographic)
125
+ ds = xdbd.open_multi_dbd_dataset(files, sort="lexicographic")
126
+
127
+ # Preserve the caller's order (no sorting)
128
+ ds = xdbd.open_multi_dbd_dataset(files, sort="none")
129
+ ```
130
+
131
+ The `--sort` flag is also available on all CLI commands:
132
+
133
+ ```bash
134
+ dbd2nc --sort lexicographic -C cache -o output.nc *.dbd
135
+ mkone --sort none --output-prefix /path/to/output/ /path/to/raw/
136
+ ```
137
+
114
138
  ### Advanced options
115
139
 
116
140
  ```python
@@ -153,6 +177,7 @@ Open a single DBD file as an xarray Dataset.
153
177
  - `to_keep` (list of str): Sensor names to keep (default: all)
154
178
  - `criteria` (list of str): Sensor names for selection criteria
155
179
  - `drop_variables` (list of str): Variables to exclude
180
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
156
181
 
157
182
  **Returns:** `xarray.Dataset`
158
183
 
@@ -168,6 +193,8 @@ Open multiple DBD files as a single concatenated xarray Dataset.
168
193
  - `criteria` (list of str): Sensor names for selection criteria
169
194
  - `skip_missions` (list of str): Mission names to skip
170
195
  - `keep_missions` (list of str): Mission names to keep
196
+ - `cache_dir` (str, Path, or None): Directory for sensor cache files
197
+ - `sort` (str): File sort order — `"header_time"` (default, sort by `fileopen_time` from each file's header), `"lexicographic"`, or `"none"` (preserve caller's order).
171
198
 
172
199
  **Returns:** `xarray.Dataset`
173
200
 
@@ -317,9 +344,8 @@ mdbd = dbdreader.MultiDBD(
317
344
  to batch additional sensors into the first `get()` call.
318
345
 
319
346
  - **`skip_initial_line` semantics.** When reading multiple files, the
320
- first contributing file keeps all its records; subsequent files skip
321
- their first record. dbdreader skips the first record of every file.
322
- Multi-file record counts may therefore differ by up to N-1.
347
+ first record of every file is skipped (matching dbdreader). Multi-file
348
+ record counts should match dbdreader exactly.
323
349
 
324
350
  - **Float64 output.** `get()` always returns float64 arrays, matching
325
351
  dbdreader's behavior. Integer fill values (-127 for int8, -32768 for
@@ -468,6 +494,30 @@ df = ds.to_dataframe()
468
494
  print(df.describe())
469
495
  ```
470
496
 
497
+ ## Choosing an API
498
+
499
+ | Scenario | Recommended API |
500
+ |----------|----------------|
501
+ | Single file, quick look | `xr.open_dataset(f, engine="dbd")` |
502
+ | Multiple files, < 1 GB | `xdbd.open_multi_dbd_dataset(files, to_keep=[...])` |
503
+ | Multiple files, large dataset | `xdbd.write_multi_dbd_netcdf(files, "out.nc")` |
504
+ | Interactive / Jupyter | `xdbd.MultiDBD(filenames=files)` with `.get()` (lazy) |
505
+ | Batch processing 1000+ files | `mkone` CLI (multiprocessing) |
506
+ | Drop-in dbdreader replacement | `import xarray_dbd.dbdreader2 as dbdreader` |
507
+
508
+ ## Slocum File Types
509
+
510
+ | Extension | Name | Contents |
511
+ |-----------|------|----------|
512
+ | `.dbd` / `.dcd` | Flight | Vehicle sensors: depth, attitude, speed, GPS |
513
+ | `.ebd` / `.ecd` | Science | Payload sensors: CTD, optics, oxygen |
514
+ | `.sbd` / `.scd` | Short burst | Surface telemetry summary records |
515
+ | `.tbd` / `.tcd` | Technical | Detailed engineering telemetry |
516
+ | `.mbd` / `.mcd` | Mini | Compact engineering subset |
517
+ | `.nbd` / `.ncd` | Narrow | Compact science subset |
518
+
519
+ Compressed variants (`.?cd`) use LZ4 framing and are handled transparently.
520
+
471
521
  ## Known Limitations
472
522
 
473
523
  - **Python 3.10+ required** — uses `from __future__ import annotations` for modern type-hint syntax.
@@ -478,6 +528,18 @@ print(df.describe())
478
528
  - **No lazy loading for xarray API** — `open_dataset()` reads all sensor data
479
529
  into memory. For very large deployments, use `to_keep` to select only needed
480
530
  sensors. The dbdreader2 API (`DBD`/`MultiDBD`) uses lazy incremental loading.
531
+ - **Fill values in xarray output** — Integer sensors use sentinel fill values
532
+ (-127 for int8, -32768 for int16) rather than NaN. Between dives, science
533
+ sensors may contain these sentinels or NaN. Filter with
534
+ `ds.where(ds != -32768)` or use the dbdreader2 `get(return_nans=False)` API
535
+ which filters automatically.
536
+ - **Not CF-compliant** — NetCDF output preserves sensor `units` but does not
537
+ add CF attributes (`standard_name`, `axis`, `calendar`). Add metadata
538
+ post-hoc for publication, e.g.:
539
+ ```python
540
+ ds["m_present_time"].attrs["axis"] = "T"
541
+ ds["m_present_time"].attrs["units"] = "seconds since 1970-01-01"
542
+ ```
481
543
 
482
544
  ## Troubleshooting
483
545
 
@@ -156,9 +156,8 @@ Sensors::mkFilename(const std::string& dir) const
156
156
  namespace {
157
157
  // Generate a unique temporary filename suffix
158
158
  std::string uniqueTempSuffix() {
159
- static std::random_device rd;
160
- static std::mt19937 gen(rd());
161
- static std::uniform_int_distribution<> dis(100000, 999999);
159
+ thread_local std::mt19937 gen(std::random_device{}());
160
+ thread_local std::uniform_int_distribution<> dis(100000, 999999);
162
161
  return std::to_string(dis(gen));
163
162
  }
164
163
  }
@@ -3,7 +3,7 @@
3
3
  #ifndef INC_CONFIG_H_
4
4
  #define INC_CONFIG_H_
5
5
 
6
- #define VERSION "1.6.10"
6
+ #define VERSION "1.7.0"
7
7
  #define MAINTAINER "pat@mousebrains.com"
8
8
 
9
9
  #define HAVE_INT8_T
@@ -178,14 +178,17 @@ MultiFileResult parse_multiple_files(
178
178
  const std::vector<std::string>& skip_missions,
179
179
  const std::vector<std::string>& keep_missions,
180
180
  bool skip_first_record,
181
- bool repair)
181
+ bool repair,
182
+ bool presorted)
182
183
  {
183
184
  if (filenames.empty()) {
184
185
  return {{}, {}, 0, 0};
185
186
  }
186
187
 
187
188
  std::vector<std::string> sorted_files(filenames);
188
- std::sort(sorted_files.begin(), sorted_files.end());
189
+ if (!presorted) {
190
+ std::sort(sorted_files.begin(), sorted_files.end());
191
+ }
189
192
 
190
193
  Header::tMissions skipSet, keepSet;
191
194
  for (const auto& m : skip_missions) Header::addMission(m, skipSet);
@@ -280,7 +283,7 @@ MultiFileResult parse_multiple_files(
280
283
 
281
284
  size_t n = result.n_records;
282
285
  size_t start = 0;
283
- if (skip_first_record && fileCount > 0 && n > 0) {
286
+ if (skip_first_record && n > 0) {
284
287
  start = 1;
285
288
  n -= 1;
286
289
  }
@@ -288,7 +291,8 @@ MultiFileResult parse_multiple_files(
288
291
  if (n > 0) {
289
292
  // Grow union columns if needed (doubling strategy)
290
293
  if (offset + n > capacity) {
291
- capacity = std::max(offset + n, capacity * 2);
294
+ size_t doubled = (capacity <= SIZE_MAX / 2) ? capacity * 2 : SIZE_MAX;
295
+ capacity = std::max(offset + n, doubled);
292
296
  grow_union_columns(unionColumns, unionInfo, capacity);
293
297
  }
294
298
 
@@ -573,14 +577,15 @@ PYBIND11_MODULE(_dbd_cpp, m, py::mod_gil_not_used()) {
573
577
  const std::vector<std::string>& skip_missions,
574
578
  const std::vector<std::string>& keep_missions,
575
579
  bool skip_first_record,
576
- bool repair) -> py::dict {
580
+ bool repair,
581
+ bool presorted) -> py::dict {
577
582
  MultiFileResult result;
578
583
  {
579
584
  py::gil_scoped_release release;
580
585
  result = parse_multiple_files(filenames, cache_dir, to_keep,
581
586
  criteria, skip_missions,
582
587
  keep_missions, skip_first_record,
583
- repair);
588
+ repair, presorted);
584
589
  }
585
590
  return multi_result_to_python(std::move(result));
586
591
  },
@@ -592,10 +597,11 @@ PYBIND11_MODULE(_dbd_cpp, m, py::mod_gil_not_used()) {
592
597
  py::arg("keep_missions") = std::vector<std::string>(),
593
598
  py::arg("skip_first_record") = true,
594
599
  py::arg("repair") = false,
600
+ py::arg("presorted") = false,
595
601
  "Read multiple DBD files with sensor union and return concatenated data.\n\n"
596
602
  "Uses a two-pass approach: pass 1 scans headers and builds a unified\n"
597
603
  "sensor list via SensorsMap, pass 2 reads data and merges into union\n"
598
- "columns. Files are sorted internally.\n\n"
604
+ "columns. Files are sorted internally unless presorted is True.\n\n"
599
605
  "Parameters\n"
600
606
  "----------\n"
601
607
  "filenames : list of str\n"
@@ -614,7 +620,10 @@ PYBIND11_MODULE(_dbd_cpp, m, py::mod_gil_not_used()) {
614
620
  "    If True (default), drop the first record of every file\n"
615
621
  "    (including the first), matching dbdreader.\n"
616
622
  "repair : bool, optional\n"
617
- " If True, attempt to recover data from corrupted records.\n\n"
623
+ " If True, attempt to recover data from corrupted records.\n"
624
+ "presorted : bool, optional\n"
625
+ " If True, skip internal lexicographic sort and process files\n"
626
+ " in the order given. Default False.\n\n"
618
627
  "Returns\n"
619
628
  "-------\n"
620
629
  "dict\n"
@@ -4,7 +4,7 @@ build-backend = "scikit_build_core.build"
4
4
 
5
5
  [project]
6
6
  name = "xarray-dbd"
7
- version = "0.2.3"
7
+ version = "0.2.5"
8
8
  description = "Efficient xarray backend for reading glider DBD files"
9
9
  readme = "README.md"
10
10
  requires-python = ">=3.10"
@@ -876,6 +876,7 @@ def _base_args(**overrides) -> Namespace:
876
876
  "mail_from": None,
877
877
  "mail_subject": None,
878
878
  "smtp_host": "localhost",
879
+ "sort": "header_time",
879
880
  }
880
881
  defaults.update(overrides)
881
882
  return Namespace(**defaults)
@@ -1056,6 +1057,57 @@ class TestDbd2ncRun:
1056
1057
  assert len(ds.data_vars) > 0
1057
1058
  ds.close()
1058
1059
 
1060
+ def test_dbd2nc_run_sort_header_time(self, tmp_path):
1061
+ """Streaming write with --sort header_time produces valid output."""
1062
+ import xarray as xr
1063
+
1064
+ from xarray_dbd.cli.dbd2nc import run
1065
+
1066
+ dcd_files = sorted(DBD_DIR.glob("*.dcd"))[:3]
1067
+ outfile = tmp_path / "out.nc"
1068
+ args = _base_args(
1069
+ files=dcd_files,
1070
+ cache=Path(CACHE_DIR),
1071
+ output=outfile,
1072
+ append=False,
1073
+ sensors=None,
1074
+ sensor_output=None,
1075
+ skip_mission=None,
1076
+ keep_mission=None,
1077
+ skip_first=True,
1078
+ repair=False,
1079
+ compression=5,
1080
+ sort="header_time",
1081
+ )
1082
+ rc = run(args)
1083
+ assert rc == 0
1084
+ ds = xr.open_dataset(str(outfile), decode_timedelta=False)
1085
+ assert len(ds.data_vars) > 0
1086
+ ds.close()
1087
+
1088
+ def test_dbd2nc_run_sort_none(self, tmp_path):
1089
+ """Streaming write with --sort none produces valid output."""
1090
+ from xarray_dbd.cli.dbd2nc import run
1091
+
1092
+ dcd_files = sorted(DBD_DIR.glob("*.dcd"))[:2]
1093
+ outfile = tmp_path / "out.nc"
1094
+ args = _base_args(
1095
+ files=dcd_files,
1096
+ cache=Path(CACHE_DIR),
1097
+ output=outfile,
1098
+ append=False,
1099
+ sensors=None,
1100
+ sensor_output=None,
1101
+ skip_mission=None,
1102
+ keep_mission=None,
1103
+ skip_first=True,
1104
+ repair=False,
1105
+ compression=5,
1106
+ sort="none",
1107
+ )
1108
+ rc = run(args)
1109
+ assert rc == 0
1110
+
1059
1111
  def test_dbd2nc_run_no_compression(self, tmp_path):
1060
1112
  from xarray_dbd.cli.dbd2nc import run
1061
1113
 
@@ -119,6 +119,79 @@ def test_open_multi_dbd_dataset():
119
119
  ds.close()
120
120
 
121
121
 
122
+ def test_read_dbd_files_presorted():
123
+ """read_dbd_files with presorted=True preserves caller's file order."""
124
+ files = sorted(str(f) for f in DBD_DIR.glob("*.dcd"))[:5]
125
+ if len(files) < 2:
126
+ pytest.skip("Need at least 2 test files")
127
+
128
+ # Normal (lexicographic) order
129
+ result_lex = read_dbd_files(files, cache_dir=CACHE_DIR, skip_first_record=True)
130
+
131
+ # Reversed order with presorted=True — should produce different data order
132
+ result_rev = read_dbd_files(
133
+ list(reversed(files)),
134
+ cache_dir=CACHE_DIR,
135
+ skip_first_record=True,
136
+ presorted=True,
137
+ )
138
+
139
+ # Both should have the same total records and sensor names
140
+ assert result_lex["n_records"] == result_rev["n_records"]
141
+ assert set(result_lex["sensor_names"]) == set(result_rev["sensor_names"])
142
+
143
+
144
+ def test_open_multi_dbd_dataset_sort_header_time():
145
+ """open_multi_dbd_dataset with sort='header_time' produces valid output."""
146
+ files = sorted(DBD_DIR.glob("*.dcd"))[:5]
147
+ if len(files) < 2:
148
+ pytest.skip("Need at least 2 test files")
149
+
150
+ ds = xdbd.open_multi_dbd_dataset(
151
+ files,
152
+ skip_first_record=True,
153
+ cache_dir=CACHE_DIR,
154
+ sort="header_time",
155
+ )
156
+ assert len(ds.data_vars) > 0
157
+ assert len(ds.i) > 0
158
+
159
+ # Compare record count with lexicographic sort — should be the same
160
+ ds_lex = xdbd.open_multi_dbd_dataset(
161
+ files,
162
+ skip_first_record=True,
163
+ cache_dir=CACHE_DIR,
164
+ sort="lexicographic",
165
+ )
166
+ assert len(ds.i) == len(ds_lex.i)
167
+ ds.close()
168
+ ds_lex.close()
169
+
170
+
171
+ def test_open_multi_dbd_dataset_sort_none():
172
+ """open_multi_dbd_dataset with sort='none' preserves caller's order."""
173
+ files = sorted(DBD_DIR.glob("*.dcd"))[:3]
174
+ if len(files) < 2:
175
+ pytest.skip("Need at least 2 test files")
176
+
177
+ ds = xdbd.open_multi_dbd_dataset(
178
+ files,
179
+ skip_first_record=True,
180
+ cache_dir=CACHE_DIR,
181
+ sort="none",
182
+ )
183
+ assert len(ds.data_vars) > 0
184
+ assert len(ds.i) > 0
185
+ ds.close()
186
+
187
+
188
+ def test_open_multi_dbd_dataset_sort_invalid():
189
+ """open_multi_dbd_dataset rejects invalid sort values."""
190
+ files = sorted(DBD_DIR.glob("*.dcd"))[:1]
191
+ with pytest.raises(ValueError, match="sort must be one of"):
192
+ xdbd.open_multi_dbd_dataset(files, cache_dir=CACHE_DIR, sort="bogus")
193
+
194
+
122
195
  def test_nan_fill_for_floats():
123
196
  """Float columns use NaN for absent values, int columns use 0."""
124
197
  files = sorted(str(f) for f in DBD_DIR.glob("*.dcd"))[:5]
@@ -49,6 +49,7 @@ def read_dbd_files(
49
49
  keep_missions: list[str] = ...,
50
50
  skip_first_record: bool = True,
51
51
  repair: bool = False,
52
+ presorted: bool = False,
52
53
  ) -> _MultiResult: ...
53
54
  def scan_sensors(
54
55
  filenames: list[str],