PyPI - samplesheet-parser - Versions diffs - 0.3.2__tar.gz → 0.3.4__tar.gz - Mend

samplesheet-parser 0.3.2__tar.gz → 0.3.4__tar.gz


Files changed (68)

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/.github/workflows/ci.yml RENAMED

@@ -67,3 +67,18 @@ jobs:
         uses: pypa/gh-action-pypi-publish@release/v1
         with:
           password: ${{ secrets.PYPI_API_TOKEN }}
+  release:
+    needs: publish
+    runs-on: ubuntu-latest
+    if: startsWith(github.ref, 'refs/tags/v')
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v4
+      - name: Create GitHub Release
+        uses: softprops/action-gh-release@v2
+        with:
+          generate_release_notes: true

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/.gitignore RENAMED

@@ -2,6 +2,8 @@
 BLOGPOST.md
 tests/fixtures/outputs/
 demo_output.txt
+**/CSBJ Submission/
+**/.claude/
 # Cache and build artifacts
 __pycache__/

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/CHANGELOG.md RENAMED

@@ -6,6 +6,60 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 ---
+## [0.3.4] - 2026-04-04
+### Added
+- **`samplesheet info` CLI command** — prints a concise summary of any V1 or
+  V2 sample sheet (format, sample count, lanes, index type, read lengths,
+  adapters, experiment name, instrument). Supports `--format json` for
+  machine-readable output; exits 0 on success, 2 on unreadable files.
+- **Configurable Hamming distance threshold** — `SampleSheetValidator.validate()`
+  now accepts a `min_hamming_distance` keyword argument (default: 3) so labs
+  using longer indexes can enforce stricter thresholds without changing the
+  module-level constant.
+  - `SampleSheetMerger` accepts the same parameter in `__init__()` and applies
+    it to both the intra-sheet and cross-sheet Hamming checks as well as the
+    post-merge validation step.
+  - `samplesheet validate` exposes `--min-hamming N` (must be ≥ 1; exits 2 on
+    invalid input). The JSON output includes `min_hamming_distance` for
+    auditability.
+- **`normalize_index_lengths()` utility** — normalizes index sequence lengths
+  across a list of sample dicts (output of `sheet.samples()`) to a consistent
+  length before merging sheets with mixed-length indexes.
+  - `strategy="trim"` — trims all indexes to the shortest sequence length.
+  - `strategy="pad"` — pads shorter indexes to the longest length using `"N"`
+    wildcard characters (supported by BCLConvert ≥ 3.9 and bcl2fastq ≥ 2.20).
+  - Auto-detects V1-style (`index`/`index2`) and V2-style (`Index`/`Index2`)
+    field names; explicit `index1_key`/`index2_key` overrides supported.
+  - Exported from the top-level package as `normalize_index_lengths`.
+- **CI / pre-commit integration guide** in README — GitHub Actions workflow
+  and pre-commit hook configuration for automatic sample sheet validation on
+  every commit or pull request that touches a `SampleSheet.csv`.
+### Fixed
+- `_detect_key()` in `index_utils` now selects the key with at least one
+  non-empty value before falling back to key presence, preventing silent
+  normalization skip when a key exists but all its values are `None` or `""`.
+### Changed
+- `--min-hamming` CLI option default and help text are now derived from the
+  `MIN_HAMMING_DISTANCE` constant in `validators.py` to prevent drift.
+---
+## [0.3.3] - 2026-03-13
+### Documentation
+- Add architecture diagram showing full library structure including CLI and SampleSheetMerger
+- Update README with architecture overview, solid vs dashed line legend
+- Add `[Custom_Sections*]` to V1 and V2 format descriptions
 ## [0.3.2] - 2026-03-12
 ### Added

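Reviewer note on the configurable Hamming threshold in the 0.3.4 changelog entry: the sketch below illustrates what raising `min_hamming_distance` changes. It is a standalone illustration and not the library's implementation. The helper names `hamming_distance` and `find_close_pairs` are hypothetical, and it assumes a pair is flagged when its distance is strictly below the threshold and that unequal-length indexes are compared over their shared prefix.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions between two index sequences."""
    # Assumption: unequal-length indexes are compared over the shorter length.
    return sum(x != y for x, y in zip(a, b))


def find_close_pairs(indexes, min_hamming_distance=3):
    """Return (sample_a, sample_b, distance) for each pair below the threshold."""
    if min_hamming_distance < 1:
        raise ValueError("min_hamming_distance must be >= 1")
    close = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(indexes.items(), 2):
        distance = hamming_distance(seq_a, seq_b)
        # Assumption: a distance strictly below the threshold counts as a collision.
        if distance < min_hamming_distance:
            close.append((name_a, name_b, distance))
    return close


# Made-up 10 bp indexes; S1 and S2 differ at exactly three positions.
indexes = {"S1": "ATTACTCGAT", "S2": "ATTACTCCTA", "S3": "TCCGGAGAGC"}

default_hits = find_close_pairs(indexes)
strict_hits = find_close_pairs(indexes, min_hamming_distance=4)
```

Under these assumptions the default threshold of 3 accepts the S1/S2 pair, while a threshold of 4 reports it, which is the stricter behaviour the changelog describes for labs with longer indexes.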
{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/CITATION.cff RENAMED

@@ -1,23 +1,33 @@
 cff-version: 1.2.0
-message: "If you use this software, please cite it as below."
+message: "If you use this software, please cite it using the metadata below."
 type: software
 title: "samplesheet-parser"
+abstract: >
+  A Python library for parsing, validating, converting, and merging
+  Illumina SampleSheet V1 and V2 files for BCLConvert and bcl2fastq.
+  Provides format auto-detection, bidirectional V1/V2 conversion,
+  structural and index validation, sheet diffing, and cross-project
+  merging with Hamming-distance collision detection.
 version: 0.3.2
 date-released: 2026-03-12
 license: Apache-2.0
 url: "https://github.com/chaitanyakasaraneni/samplesheet-parser"
 repository-code: "https://github.com/chaitanyakasaraneni/samplesheet-parser"
-abstract: >
-  A Python library for parsing, validating, converting, and merging
-  Illumina SampleSheet V1 and V2 files for BCLConvert and bcl2fastq.
+repository-artifact: "https://pypi.org/project/samplesheet-parser/"
 keywords:
   - bioinformatics
   - Illumina
   - SampleSheet
   - BCLConvert
+  - bcl2fastq
   - demultiplexing
   - genomics
+  - sequencing
   - Python
+identifiers:
+  - type: doi
+    value: 10.5281/zenodo.18989694
+    description: Concept DOI (all versions)
 authors:
   - family-names: Kasaraneni
     given-names: Chaitanya Krishna

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: samplesheet-parser
-Version: 0.3.2
+Version: 0.3.4
 Summary: Format-agnostic parser for Illumina SampleSheet.csv files — supports IEM V1 and BCLConvert V2
 Project-URL: Homepage, https://github.com/chaitanyakasaraneni/samplesheet-parser
 Project-URL: Documentation, https://illumina-samplesheet.readthedocs.io
@@ -60,8 +60,9 @@ Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConver
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-yellow.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Tests](https://github.com/chaitanyakasaraneni/samplesheet-parser/actions/workflows/ci.yml/badge.svg)](https://github.com/chaitanyakasaraneni/samplesheet-parser/actions)
 [![codecov](https://codecov.io/gh/chaitanyakasaraneni/samplesheet-parser/branch/main/graph/badge.svg?token=CODECOV_TOKEN)](https://codecov.io/gh/chaitanyakasaraneni/samplesheet-parser)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18989694.svg)](https://doi.org/10.5281/zenodo.18989694)
-![samplesheet-parser overview](https://raw.githubusercontent.com/chaitanyakasaraneni/samplesheet-parser/main/images/samplesheet_parser_overview.png)
+![samplesheet-parser overview](https://raw.githubusercontent.com/chaitanyakasaraneni/samplesheet-parser/main/images/samplesheet_parser_arch_v03.png)
 *`SampleSheetFactory` auto-detects the format and routes to the correct parser. Both formats share a common interface — `SampleSheetConverter` handles bidirectional conversion, `SampleSheetValidator` catches index and adapter issues, `SampleSheetDiff` compares two sheets across any combination of V1/V2 formats, `SampleSheetMerger` combines multiple per-project sheets into one, and `SampleSheetWriter` builds or edits sheets programmatically. The `samplesheet` CLI exposes all of this from the shell.*
@@ -556,6 +557,80 @@ sheet.get_read_structure()   # → ReadStructure dataclass
 ---
+## CI / pre-commit integration
+The CLI exits with meaningful codes (`0` = clean, `1` = issues, `2` = error), making it easy to wire into automated pipelines.
+### GitHub Actions
+Add a validation step to any workflow that touches `SampleSheet.csv`:
+```yaml
+# .github/workflows/validate-samplesheet.yml
+name: Validate SampleSheet
+on:
+  push:
+    paths:
+      - '**/SampleSheet.csv'
+  pull_request:
+    paths:
+      - '**/SampleSheet.csv'
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+      - run: pip install "samplesheet-parser[cli]"
+      - name: Validate SampleSheet
+        run: samplesheet validate SampleSheet.csv --format json
+```
+### pre-commit hook
+Gate commits that touch any `SampleSheet.csv` in the repository:
+```yaml
+# .pre-commit-config.yaml
+repos:
+  - repo: local
+    hooks:
+      - id: samplesheet-validate
+        name: Validate SampleSheet.csv
+        entry: samplesheet validate
+        language: python
+        additional_dependencies: ["samplesheet-parser[cli]"]
+        files: SampleSheet\.csv$
+        pass_filenames: true
+```
+Install and run once to verify:
+```bash
+pip install pre-commit
+pre-commit install
+pre-commit run samplesheet-validate --all-files
+```
+### Stricter Hamming distance in CI
+If your lab uses longer indexes (10 bp+), raise the minimum Hamming distance threshold to catch borderline cases earlier:
+```bash
+samplesheet validate SampleSheet.csv --min-hamming 4
+```
+This is especially useful in CI where you want to prevent runs that will likely fail demultiplexing.
+---
 ## Contributing
 ```bash
@@ -584,7 +659,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full local testing guide and PR c
   title   = {samplesheet-parser: Format-agnostic parser for Illumina SampleSheet.csv},
   year    = {2026},
   url     = {https://github.com/chaitanyakasaraneni/samplesheet-parser},
-  version = {0.3.2}
+  version = {0.3.4}
 }
 ```

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/README.md RENAMED

@@ -9,8 +9,9 @@ Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConver
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-yellow.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Tests](https://github.com/chaitanyakasaraneni/samplesheet-parser/actions/workflows/ci.yml/badge.svg)](https://github.com/chaitanyakasaraneni/samplesheet-parser/actions)
 [![codecov](https://codecov.io/gh/chaitanyakasaraneni/samplesheet-parser/branch/main/graph/badge.svg?token=CODECOV_TOKEN)](https://codecov.io/gh/chaitanyakasaraneni/samplesheet-parser)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18989694.svg)](https://doi.org/10.5281/zenodo.18989694)
-![samplesheet-parser overview](https://raw.githubusercontent.com/chaitanyakasaraneni/samplesheet-parser/main/images/samplesheet_parser_overview.png)
+![samplesheet-parser overview](https://raw.githubusercontent.com/chaitanyakasaraneni/samplesheet-parser/main/images/samplesheet_parser_arch_v03.png)
 *`SampleSheetFactory` auto-detects the format and routes to the correct parser. Both formats share a common interface — `SampleSheetConverter` handles bidirectional conversion, `SampleSheetValidator` catches index and adapter issues, `SampleSheetDiff` compares two sheets across any combination of V1/V2 formats, `SampleSheetMerger` combines multiple per-project sheets into one, and `SampleSheetWriter` builds or edits sheets programmatically. The `samplesheet` CLI exposes all of this from the shell.*
@@ -505,6 +506,80 @@ sheet.get_read_structure()   # → ReadStructure dataclass
 ---
+## CI / pre-commit integration
+The CLI exits with meaningful codes (`0` = clean, `1` = issues, `2` = error), making it easy to wire into automated pipelines.
+### GitHub Actions
+Add a validation step to any workflow that touches `SampleSheet.csv`:
+```yaml
+# .github/workflows/validate-samplesheet.yml
+name: Validate SampleSheet
+on:
+  push:
+    paths:
+      - '**/SampleSheet.csv'
+  pull_request:
+    paths:
+      - '**/SampleSheet.csv'
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+      - run: pip install "samplesheet-parser[cli]"
+      - name: Validate SampleSheet
+        run: samplesheet validate SampleSheet.csv --format json
+```
+### pre-commit hook
+Gate commits that touch any `SampleSheet.csv` in the repository:
+```yaml
+# .pre-commit-config.yaml
+repos:
+  - repo: local
+    hooks:
+      - id: samplesheet-validate
+        name: Validate SampleSheet.csv
+        entry: samplesheet validate
+        language: python
+        additional_dependencies: ["samplesheet-parser[cli]"]
+        files: SampleSheet\.csv$
+        pass_filenames: true
+```
+Install and run once to verify:
+```bash
+pip install pre-commit
+pre-commit install
+pre-commit run samplesheet-validate --all-files
+```
+### Stricter Hamming distance in CI
+If your lab uses longer indexes (10 bp+), raise the minimum Hamming distance threshold to catch borderline cases earlier:
+```bash
+samplesheet validate SampleSheet.csv --min-hamming 4
+```
+This is especially useful in CI where you want to prevent runs that will likely fail demultiplexing.
+---
 ## Contributing
 ```bash
@@ -533,7 +608,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full local testing guide and PR c
   title   = {samplesheet-parser: Format-agnostic parser for Illumina SampleSheet.csv},
   year    = {2026},
   url     = {https://github.com/chaitanyakasaraneni/samplesheet-parser},
-  version = {0.3.2}
+  version = {0.3.4}
 }
 ```

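Reviewer note on the exit codes documented in the new README section (`0` = clean, `1` = issues, `2` = error): a pipeline step written in Python could branch on them roughly as sketched below. The wrapper names `classify_exit` and `run_validate` are hypothetical. Only the `samplesheet validate` command and the `--format json` and `--min-hamming` options come from the diff, and the sketch assumes the CLI is installed and on `PATH`.

```python
import subprocess
from typing import Optional

# Exit-code meanings as stated in the README section of this diff.
EXIT_MEANINGS = {0: "clean", 1: "issues", 2: "error"}


def classify_exit(returncode: int) -> str:
    """Map a CLI exit code to the README's label, or 'unknown' otherwise."""
    return EXIT_MEANINGS.get(returncode, "unknown")


def run_validate(path: str, min_hamming: Optional[int] = None) -> str:
    """Run `samplesheet validate` on one file and classify its exit code."""
    command = ["samplesheet", "validate", path, "--format", "json"]
    if min_hamming is not None:
        command += ["--min-hamming", str(min_hamming)]
    # check=False: a non-zero exit is an expected outcome here, not an exception.
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return classify_exit(completed.returncode)
```

Treating `"issues"` and `"error"` separately lets a job fail fast on an unreadable sheet while still reporting validation findings on a readable one.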
samplesheet_parser-0.3.4/images/samplesheet_parser_arch_v03.png ADDED

Binary file

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "samplesheet-parser"
-version = "0.3.2"
+version = "0.3.4"
 description = "Format-agnostic parser for Illumina SampleSheet.csv files — supports IEM V1 and BCLConvert V2"
 readme = "README.md"
 license = { file = "LICENSE" }

{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/samplesheet_parser/__init__.py RENAMED

@@ -36,6 +36,7 @@ from samplesheet_parser.converter import SampleSheetConverter
 from samplesheet_parser.diff import DiffResult, SampleSheetDiff
 from samplesheet_parser.enums import IndexType, SampleSheetVersion
 from samplesheet_parser.factory import SampleSheetFactory
+from samplesheet_parser.index_utils import normalize_index_lengths
 from samplesheet_parser.merger import MergeResult, SampleSheetMerger
 from samplesheet_parser.parsers.v1 import SampleSheetV1
 from samplesheet_parser.parsers.v2 import SampleSheetV2
@@ -56,5 +57,6 @@ __all__ = [
     "SampleSheetWriter",
     "SampleSheetMerger",
     "MergeResult",
+    "normalize_index_lengths",
     "__version__",
 ]

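Reviewer note on the newly exported `normalize_index_lengths`: its source is not part of this excerpt, so the sketch below only illustrates the two strategies and the key-detection fix as the changelog describes them. The function names `detect_key` and `normalize_lengths` are hypothetical stand-ins, only the first index column is handled, and padding on the right is an assumption.

```python
def detect_key(samples, candidates=("index", "Index")):
    """Pick the first candidate key that has a non-empty value in any sample."""
    for key in candidates:
        if any(sample.get(key) for sample in samples):
            return key
    # Fall back to key presence only when no candidate holds a value.
    for key in candidates:
        if any(key in sample for sample in samples):
            return key
    return None


def normalize_lengths(samples, strategy="trim", key=None):
    """Return copies of the sample dicts with index sequences of equal length."""
    if strategy not in ("trim", "pad"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    key = key or detect_key(samples)
    lengths = [len(s[key]) for s in samples if key and s.get(key)]
    if not lengths:
        return [dict(s) for s in samples]
    target = min(lengths) if strategy == "trim" else max(lengths)
    normalized = []
    for sample in samples:
        updated = dict(sample)
        sequence = updated.get(key)
        if sequence:
            # Slicing trims; ljust pads with "N" on the right (an assumption).
            updated[key] = sequence[:target].ljust(target, "N")
        normalized.append(updated)
    return normalized


# Made-up sample dicts with V1-style field names.
samples = [
    {"Sample_ID": "A", "index": "ATTACTCG"},
    {"Sample_ID": "B", "index": "TCCGGAGAGC"},
]
trimmed = normalize_lengths(samples, strategy="trim")
padded = normalize_lengths(samples, strategy="pad")
```

Trimming discards bases and can reduce the Hamming distance between indexes, so a validation pass after normalization remains worthwhile.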
{samplesheet_parser-0.3.2 → samplesheet_parser-0.3.4}/samplesheet_parser/cli.py RENAMED

@@ -5,6 +5,7 @@ Entry point: ``samplesheet`` (configured in ``pyproject.toml``).
 Commands
 --------
+info        Show a quick summary of a sample sheet.
 validate    Validate a sheet — exit 0 if clean, exit 1 if errors.
 convert     Convert between V1 and V2 formats.
 diff        Diff two sheets — exit 1 if changes detected.
@@ -20,8 +21,12 @@ Usage
 -----
 ::
+    samplesheet info SampleSheet.csv
+    samplesheet info SampleSheet.csv --format json
     samplesheet validate SampleSheet.csv
     samplesheet validate SampleSheet.csv --format json
+    samplesheet validate SampleSheet.csv --min-hamming 4
     samplesheet convert SampleSheet_v1.csv --to v2 --output SampleSheet_v2.csv
     samplesheet convert SampleSheet_v2.csv --to v1 --output SampleSheet_v1.csv
@@ -50,6 +55,7 @@ except ImportError:  # pragma: no cover
     _TYPER_AVAILABLE = False
 from samplesheet_parser.enums import SampleSheetVersion
+from samplesheet_parser.validators import MIN_HAMMING_DISTANCE as _MIN_HAMMING_DEFAULT
 if _TYPER_AVAILABLE:
     app = typer.Typer(
@@ -115,6 +121,87 @@ if _TYPER_AVAILABLE:
             typer.echo(f"Error: unknown format '{fmt}'. Use 'text' or 'json'.", err=True)
             raise typer.Exit(code=2)
+    # ---------------------------------------------------------------------------
+    # info
+    # ---------------------------------------------------------------------------
+    @app.command()
+    def info(
+        path: Annotated[Path, typer.Argument(help="Path to SampleSheet.csv.", metavar="FILE")],
+        fmt: _FormatOption = "text",
+    ) -> None:
+        """Display a quick summary of a sample sheet without full validation.
+        Shows format version, sample count, lanes, index type, read lengths,
+        and adapter sequences at a glance.
+        Exits 0 on success, 2 on unreadable files.
+        """
+        from samplesheet_parser.factory import SampleSheetFactory
+        from samplesheet_parser.parsers.v1 import SampleSheetV1
+        _validate_fmt(fmt)
+        if not path.exists():
+            typer.echo(f"Error: file not found: {path}", err=True)
+            raise typer.Exit(code=2)
+        try:
+            factory = SampleSheetFactory()
+            sheet = factory.create_parser(str(path), parse=True, clean=False)
+        except Exception as exc:
+            typer.echo(f"Error: could not parse {path}: {exc}", err=True)
+            raise typer.Exit(code=2) from exc
+        if factory.version is None:  # pragma: no cover
+            raise RuntimeError("SampleSheetFactory.version must be set after create_parser")
+        samples = sheet.samples()
+        lanes = sorted({str(s.get("lane") or "") for s in samples} - {""}) or ["(none)"]
+        index_type = sheet.index_type()
+        adapters: list[str] = getattr(sheet, "adapters", []) or []
+        experiment_name: str | None = getattr(sheet, "experiment_name", None)
+        if isinstance(sheet, SampleSheetV1):
+            read_lengths = [str(r) for r in (sheet.read_lengths or [])]
+            instrument = sheet.instrument_type
+        else:
+            reads_dict = sheet.reads or {}
+            read_lengths = [
+                str(reads_dict[k])
+                for k in ("Read1Cycles", "Read2Cycles")
+                if k in reads_dict
+            ]
+            instrument = sheet.instrument_platform
+        if fmt == "json":
+            _print_json({
+                "file": str(path),
+                "format": factory.version.value,
+                "sample_count": len(samples),
+                "lanes": lanes,
+                "index_type": index_type,
+                "read_lengths": read_lengths,
+                "adapters": adapters,
+                "experiment_name": experiment_name,
+                "instrument": instrument,
+            })
+        else:
+            typer.echo(f"File:          {path}")
+            typer.echo(f"Format:        {factory.version.value}")
+            typer.echo(f"Samples:       {len(samples)}")
+            typer.echo(f"Lanes:         {', '.join(lanes)}")
+            typer.echo(f"Index type:    {index_type}")
+            typer.echo(
+                f"Read lengths:  {' + '.join(read_lengths) if read_lengths else '(not set)'}"
+            )
+            typer.echo(f"Adapters:      {', '.join(adapters) if adapters else '(none)'}")
+            if experiment_name:
+                typer.echo(f"Experiment:    {experiment_name}")
+            if instrument:
+                typer.echo(f"Instrument:    {instrument}")
+        raise typer.Exit(code=0)
     # ---------------------------------------------------------------------------
     # validate
     # ---------------------------------------------------------------------------
@@ -123,6 +210,17 @@ if _TYPER_AVAILABLE:
     def validate(
         path: Annotated[Path, typer.Argument(help="Path to SampleSheet.csv.", metavar="FILE")],
         fmt: _FormatOption = "text",
+        min_hamming: Annotated[
+            int,
+            typer.Option(
+                "--min-hamming",
+                help=(
+                    f"Minimum Hamming distance between indexes "
+                    f"(default: {_MIN_HAMMING_DEFAULT}, must be >= 1)."
+                ),
+                metavar="N",
+            ),
+        ] = _MIN_HAMMING_DEFAULT,
     ) -> None:
         """Validate a sample sheet for index, adapter, and structural issues.
@@ -134,6 +232,11 @@ if _TYPER_AVAILABLE:
         from samplesheet_parser.validators import SampleSheetValidator
         _validate_fmt(fmt)
+        if min_hamming < 1:
+            typer.echo(
+                f"Error: --min-hamming must be >= 1, got {min_hamming}.", err=True
+            )
+            raise typer.Exit(code=2)
         if not path.exists():
             typer.echo(f"Error: file not found: {path}", err=True)
             raise typer.Exit(code=2)
@@ -149,13 +252,14 @@ if _TYPER_AVAILABLE:
             raise RuntimeError("SampleSheetFactory.version must be set after create_parser")
         version = factory.version
-        result = SampleSheetValidator().validate(sheet)
+        result = SampleSheetValidator().validate(sheet, min_hamming_distance=min_hamming)
         if fmt == "json":
             _print_json({
                 "file": str(path),
                 "version": version.value,
                 "is_valid": result.is_valid,
+                "min_hamming_distance": min_hamming,
                 "errors": [
                     {"code": e.code, "message": e.message, "context": e.context}
                     for e in result.errors

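Reviewer note on the lane summary in the new `info` command: the one-line set expression there has two behaviours worth knowing. The sketch below lifts the same expression into a hypothetical helper named `summarize_lanes` and runs it on made-up sample dicts.

```python
def summarize_lanes(samples):
    """Collect distinct, non-empty lane values as sorted strings."""
    # Same expression as in the `info` hunk: missing or falsy lanes become ""
    # and are dropped; an empty result falls back to the "(none)" placeholder.
    lanes = {str(sample.get("lane") or "") for sample in samples}
    return sorted(lanes - {""}) or ["(none)"]


mixed = summarize_lanes([{"lane": 2}, {"lane": "1"}, {"lane": None}, {}])
no_lanes = summarize_lanes([{"Sample_ID": "A"}])
two_digit = summarize_lanes([{"lane": 10}, {"lane": 2}])
```

Because the values are stringified before sorting, the order is lexicographic, so lane `10` sorts before lane `2`. A falsy lane value such as integer `0` is treated the same as a missing lane.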