PyPI - samplesheet-parser - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

samplesheet-parser 0.2.0__tar.gz → 0.3.0__tar.gz

Files changed (58)

{samplesheet_parser-0.2.0 → samplesheet_parser-0.3.0}/.github/workflows/ci.yml RENAMED

@@ -13,7 +13,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.10", "3.11", "3.12"]
+        python-version: ["3.12"]
     steps:
       - uses: actions/checkout@v4
@@ -24,7 +24,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
-        run: pip install -e ".[dev]"
+        run: pip install -e ".[dev,cli]"
       - name: Lint with ruff
         run: ruff check samplesheet_parser/

{samplesheet_parser-0.2.0 → samplesheet_parser-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: samplesheet-parser
-Version: 0.2.0
+Version: 0.3.0
 Summary: Format-agnostic parser for Illumina SampleSheet.csv files — supports IEM V1 and BCLConvert V2
 Project-URL: Homepage, https://github.com/chaitanyakasaraneni/samplesheet-parser
 Project-URL: Documentation, https://illumina-samplesheet.readthedocs.io
@@ -33,26 +33,27 @@ Classifier: Intended Audience :: Developers
 Classifier: Intended Audience :: Science/Research
 Classifier: License :: OSI Approved :: Apache Software License
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
 Classifier: Typing :: Typed
-Requires-Python: >=3.10
+Requires-Python: >=3.12
 Requires-Dist: loguru>=0.7
+Provides-Extra: cli
+Requires-Dist: typer>=0.9; extra == 'cli'
 Provides-Extra: dev
 Requires-Dist: black>=24.0; extra == 'dev'
 Requires-Dist: mypy>=1.8; extra == 'dev'
 Requires-Dist: pytest-cov>=4.1; extra == 'dev'
 Requires-Dist: pytest>=7.4; extra == 'dev'
 Requires-Dist: ruff>=0.3; extra == 'dev'
+Requires-Dist: typer>=0.9; extra == 'dev'
 Description-Content-Type: text/markdown
 # samplesheet-parser
 **Format-agnostic parser for Illumina SampleSheet.csv files.**
-Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConvert V2 format (NovaSeq X series) — with automatic format detection, bidirectional conversion, index validation, Hamming distance checking, diff comparison, and programmatic sheet creation.
+Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConvert V2 format (NovaSeq X series) — with automatic format detection, bidirectional conversion, index validation, Hamming distance checking, diff comparison, multi-sheet merging, programmatic sheet creation, and a full-featured CLI.
 [![PyPI version](https://img.shields.io/pypi/v/samplesheet-parser.svg)](https://pypi.org/project/samplesheet-parser/)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
@@ -62,7 +63,7 @@ Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConver
 ![samplesheet-parser overview](https://raw.githubusercontent.com/chaitanyakasaraneni/samplesheet-parser/main/images/samplesheet_parser_overview.png)
-*`SampleSheetFactory` auto-detects the format and routes to the correct parser. Both formats share a common interface — `SampleSheetConverter` handles bidirectional conversion, `SampleSheetValidator` catches index and adapter issues, `SampleSheetDiff` compares two sheets across any combination of V1/V2 formats, and `SampleSheetWriter` builds or edits sheets programmatically.*
+*`SampleSheetFactory` auto-detects the format and routes to the correct parser. Both formats share a common interface — `SampleSheetConverter` handles bidirectional conversion, `SampleSheetValidator` catches index and adapter issues, `SampleSheetDiff` compares two sheets across any combination of V1/V2 formats, `SampleSheetMerger` combines multiple per-project sheets into one, and `SampleSheetWriter` builds or edits sheets programmatically. The `samplesheet` CLI exposes all of this from the shell.*
 ---
@@ -77,10 +78,14 @@ Existing tools either hard-code one format or require the caller to know which f
 ## Installation
 ```bash
+# Core library only
 pip install samplesheet-parser
+# With the CLI (adds typer)
+pip install "samplesheet-parser[cli]"
 ```
-Requires Python 3.10+. No mandatory dependencies beyond `loguru`.
+Requires Python 3.10+. No mandatory runtime dependencies beyond `loguru`.
 ---
@@ -227,6 +232,106 @@ converts format while editing.
 ---
+### Merge multiple sheets
+Combine per-project sheets from a single run into one merged sheet.
+Conflicts (index collisions, read-length mismatches, adapter disagreements)
+are surfaced as structured results rather than silent failures.
+```python
+from samplesheet_parser import SampleSheetMerger
+from samplesheet_parser.enums import SampleSheetVersion
+result = (
+    SampleSheetMerger(target_version=SampleSheetVersion.V2)
+    .add("ProjectA.csv")
+    .add("ProjectB.csv")
+    .add("ProjectC.csv")
+    .merge("SampleSheet_combined.csv")
+)
+print(result.summary())
+# Merged 3 sheet(s) → SampleSheet_combined.csv (12 samples) — 0 conflict(s), 0 warning(s)
+if result.has_conflicts:
+    for c in result.conflicts:
+        print(c)
+    # [CONFLICT] INDEX_COLLISION: Index 'ATTACTCG+TATAGCCT' in lane 1
+    #   appears in both ProjectA.csv and ProjectB.csv
+for w in result.warnings:
+    print(w)
+    # [WARNING] MIXED_FORMAT: Input sheets are a mix of V1 and V2 formats.
+    #   All will be converted to V2 for output.
+```
+Mixed V1/V2 inputs are automatically converted to the target format.
+Pass `abort_on_conflicts=False` to write output even when conflicts exist.
+---
+## CLI
+Install the CLI extra and use the `samplesheet` command directly from the shell:
+```bash
+pip install "samplesheet-parser[cli]"
+```
+### validate
+```bash
+# Text output — exit 0 if clean, exit 1 if errors
+samplesheet validate SampleSheet.csv
+# JSON output for CI pipelines
+samplesheet validate SampleSheet.csv --format json
+```
+### convert
+```bash
+samplesheet convert SampleSheet_v1.csv --to v2 --output SampleSheet_v2.csv
+samplesheet convert SampleSheet_v2.csv --to v1 --output SampleSheet_v1.csv
+```
+### diff
+```bash
+# Exit 0 if identical, exit 1 if any differences detected
+samplesheet diff old/SampleSheet.csv new/SampleSheet.csv
+# JSON output for scripting
+samplesheet diff old/SampleSheet.csv new/SampleSheet.csv --format json
+```
+### merge
+```bash
+# Clean merge — exit 0
+samplesheet merge ProjectA.csv ProjectB.csv --output combined.csv
+# Merge three sheets to V1 format
+samplesheet merge ProjectA.csv ProjectB.csv ProjectC.csv --to v1 --output combined.csv
+# Write output even if conflicts are found
+samplesheet merge ProjectA.csv ProjectB.csv --output combined.csv --force
+# JSON output
+samplesheet merge ProjectA.csv ProjectB.csv --output combined.csv --format json
+```
+**Exit codes** (all commands):
+| Code | Meaning |
+|---|---|
+| `0` | Success / no issues |
+| `1` | Errors found (invalid sheet, conflicts, differences detected) |
+| `2` | Usage error (missing file, bad argument) |
+---
 ## Format detection logic
 The factory uses a three-step detection strategy — no format hints required from the caller:
@@ -274,6 +379,22 @@ result = ValidationResult()
 SampleSheetValidator()._check_index_distances(samples, result, min_distance=4)
 ```
+---
+## Merger conflict and warning codes
+| Code | Level | Description |
+|---|---|---|
+| `PARSE_ERROR` | conflict | An input sheet could not be parsed |
+| `INDEX_COLLISION` | conflict | The same index appears in the same lane across two sheets |
+| `READ_LENGTH_CONFLICT` | conflict | Sheets specify different read lengths or cycle counts |
+| `MERGE_VALIDATION_ERROR` | conflict | Post-merge validation of the combined sheet failed |
+| `MIXED_FORMAT` | warning | Input sheets are a mix of V1 and V2 formats |
+| `INDEX_DISTANCE_TOO_LOW` | warning | Cross-sheet index pair has Hamming distance below threshold |
+| `ADAPTER_CONFLICT` | warning | Adapter sequences differ between sheets (primary sheet adapters are used) |
+| `INCOMPLETE_SAMPLE_RECORD` | warning | A sample row is missing `Sample_ID` or index and was skipped |
 ---
 ## Diff
@@ -411,12 +532,36 @@ sheet.get_read_structure()   # → ReadStructure dataclass
 ---
+---
+### `SampleSheetMerger`
+| Method / attribute | Returns | Description |
+|---|---|---|
+| `SampleSheetMerger(target_version=)` | — | Instantiate; default target is `SampleSheetVersion.V2` |
+| `add(path)` | `self` | Register an input sheet path (fluent) |
+| `merge(output_path, *, validate=True, abort_on_conflicts=True)` | `MergeResult` | Run the merge and write output |
+### `MergeResult`
+| Attribute / method | Type | Description |
+|---|---|---|
+| `has_conflicts` | `bool` | `True` if any conflict was recorded |
+| `sample_count` | `int` | Number of samples in the merged output |
+| `output_path` | `Path \| None` | Path written; `None` if write was aborted |
+| `source_versions` | `dict[str, str]` | Per-input-file detected format version |
+| `conflicts` | `list[MergeConflict]` | Structured conflict records |
+| `warnings` | `list[MergeConflict]` | Structured warning records |
+| `summary()` | `str` | Human-readable one-line summary |
+---
 ## Contributing
 ```bash
 git clone https://github.com/chaitanyakasaraneni/samplesheet-parser
 cd samplesheet-parser
-pip install -e ".[dev]"
+pip install -e ".[dev,cli]"
 # Run tests
 pytest tests/ -v
@@ -439,7 +584,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full local testing guide and PR c
   title   = {samplesheet-parser: Format-agnostic parser for Illumina SampleSheet.csv},
   year    = {2026},
   url     = {https://github.com/chaitanyakasaraneni/samplesheet-parser},
-  version = {0.2.0}
+  version = {0.3.0}
 }
 ```
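
The `INDEX_COLLISION` and `INDEX_DISTANCE_TOO_LOW` codes added in this file describe cross-sheet checks on indexes that share a lane. The following is a minimal sketch of that kind of check, not the library's implementation. It assumes samples are plain dicts with hypothetical `lane` and `index` keys and uses an illustrative threshold of 3; the function names are invented for this example.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions over the shared length of two index strings."""
    return sum(x != y for x, y in zip(a, b))


def cross_sheet_index_issues(sheets, min_distance=3):
    """Return (code, lane, sheet_a, sheet_b) tuples for cross-sheet index problems.

    `sheets` maps a sheet name to a list of {"lane": ..., "index": ...} dicts
    (hypothetical shape). Only pairs from different sheets in the same lane
    are compared.
    """
    flat = [
        (name, sample["lane"], sample["index"])
        for name, samples in sheets.items()
        for sample in samples
    ]
    issues = []
    for (name_a, lane_a, idx_a), (name_b, lane_b, idx_b) in combinations(flat, 2):
        if name_a == name_b or lane_a != lane_b:
            continue  # same sheet or different lane: not a cross-sheet clash
        if idx_a == idx_b:
            issues.append(("INDEX_COLLISION", lane_a, name_a, name_b))
        elif hamming(idx_a, idx_b) < min_distance:
            issues.append(("INDEX_DISTANCE_TOO_LOW", lane_a, name_a, name_b))
    return issues


# Illustrative input; the index sequences are made up for the example.
example_sheets = {
    "ProjectA.csv": [{"lane": 1, "index": "ATTACTCG"}],
    "ProjectB.csv": [
        {"lane": 1, "index": "ATTACTCG"},  # identical index, same lane
        {"lane": 1, "index": "ATTACTCC"},  # one mismatch, same lane
        {"lane": 2, "index": "ATTACTCG"},  # identical index, different lane
    ],
}
example_issues = cross_sheet_index_issues(example_sheets)
```

Treating an exact match as a conflict and a near match as a warning mirrors the levels in the table above; how the real merger handles dual indexes or unequal index lengths is not shown in this diff.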

{samplesheet_parser-0.2.0 → samplesheet_parser-0.3.0}/README.md RENAMED

@@ -2,7 +2,7 @@
 **Format-agnostic parser for Illumina SampleSheet.csv files.**
-Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConvert V2 format (NovaSeq X series) — with automatic format detection, bidirectional conversion, index validation, Hamming distance checking, diff comparison, and programmatic sheet creation.
+Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConvert V2 format (NovaSeq X series) — with automatic format detection, bidirectional conversion, index validation, Hamming distance checking, diff comparison, multi-sheet merging, programmatic sheet creation, and a full-featured CLI.
 [![PyPI version](https://img.shields.io/pypi/v/samplesheet-parser.svg)](https://pypi.org/project/samplesheet-parser/)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
@@ -12,7 +12,7 @@ Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConver
 ![samplesheet-parser overview](https://raw.githubusercontent.com/chaitanyakasaraneni/samplesheet-parser/main/images/samplesheet_parser_overview.png)
-*`SampleSheetFactory` auto-detects the format and routes to the correct parser. Both formats share a common interface — `SampleSheetConverter` handles bidirectional conversion, `SampleSheetValidator` catches index and adapter issues, `SampleSheetDiff` compares two sheets across any combination of V1/V2 formats, and `SampleSheetWriter` builds or edits sheets programmatically.*
+*`SampleSheetFactory` auto-detects the format and routes to the correct parser. Both formats share a common interface — `SampleSheetConverter` handles bidirectional conversion, `SampleSheetValidator` catches index and adapter issues, `SampleSheetDiff` compares two sheets across any combination of V1/V2 formats, `SampleSheetMerger` combines multiple per-project sheets into one, and `SampleSheetWriter` builds or edits sheets programmatically. The `samplesheet` CLI exposes all of this from the shell.*
 ---
@@ -27,10 +27,14 @@ Existing tools either hard-code one format or require the caller to know which f
 ## Installation
 ```bash
+# Core library only
 pip install samplesheet-parser
+# With the CLI (adds typer)
+pip install "samplesheet-parser[cli]"
 ```
-Requires Python 3.10+. No mandatory dependencies beyond `loguru`.
+Requires Python 3.10+. No mandatory runtime dependencies beyond `loguru`.
 ---
@@ -177,6 +181,106 @@ converts format while editing.
 ---
+### Merge multiple sheets
+Combine per-project sheets from a single run into one merged sheet.
+Conflicts (index collisions, read-length mismatches, adapter disagreements)
+are surfaced as structured results rather than silent failures.
+```python
+from samplesheet_parser import SampleSheetMerger
+from samplesheet_parser.enums import SampleSheetVersion
+result = (
+    SampleSheetMerger(target_version=SampleSheetVersion.V2)
+    .add("ProjectA.csv")
+    .add("ProjectB.csv")
+    .add("ProjectC.csv")
+    .merge("SampleSheet_combined.csv")
+)
+print(result.summary())
+# Merged 3 sheet(s) → SampleSheet_combined.csv (12 samples) — 0 conflict(s), 0 warning(s)
+if result.has_conflicts:
+    for c in result.conflicts:
+        print(c)
+    # [CONFLICT] INDEX_COLLISION: Index 'ATTACTCG+TATAGCCT' in lane 1
+    #   appears in both ProjectA.csv and ProjectB.csv
+for w in result.warnings:
+    print(w)
+    # [WARNING] MIXED_FORMAT: Input sheets are a mix of V1 and V2 formats.
+    #   All will be converted to V2 for output.
+```
+Mixed V1/V2 inputs are automatically converted to the target format.
+Pass `abort_on_conflicts=False` to write output even when conflicts exist.
+---
+## CLI
+Install the CLI extra and use the `samplesheet` command directly from the shell:
+```bash
+pip install "samplesheet-parser[cli]"
+```
+### validate
+```bash
+# Text output — exit 0 if clean, exit 1 if errors
+samplesheet validate SampleSheet.csv
+# JSON output for CI pipelines
+samplesheet validate SampleSheet.csv --format json
+```
+### convert
+```bash
+samplesheet convert SampleSheet_v1.csv --to v2 --output SampleSheet_v2.csv
+samplesheet convert SampleSheet_v2.csv --to v1 --output SampleSheet_v1.csv
+```
+### diff
+```bash
+# Exit 0 if identical, exit 1 if any differences detected
+samplesheet diff old/SampleSheet.csv new/SampleSheet.csv
+# JSON output for scripting
+samplesheet diff old/SampleSheet.csv new/SampleSheet.csv --format json
+```
+### merge
+```bash
+# Clean merge — exit 0
+samplesheet merge ProjectA.csv ProjectB.csv --output combined.csv
+# Merge three sheets to V1 format
+samplesheet merge ProjectA.csv ProjectB.csv ProjectC.csv --to v1 --output combined.csv
+# Write output even if conflicts are found
+samplesheet merge ProjectA.csv ProjectB.csv --output combined.csv --force
+# JSON output
+samplesheet merge ProjectA.csv ProjectB.csv --output combined.csv --format json
+```
+**Exit codes** (all commands):
+| Code | Meaning |
+|---|---|
+| `0` | Success / no issues |
+| `1` | Errors found (invalid sheet, conflicts, differences detected) |
+| `2` | Usage error (missing file, bad argument) |
+---
 ## Format detection logic
 The factory uses a three-step detection strategy — no format hints required from the caller:
@@ -224,6 +328,22 @@ result = ValidationResult()
 SampleSheetValidator()._check_index_distances(samples, result, min_distance=4)
 ```
+---
+## Merger conflict and warning codes
+| Code | Level | Description |
+|---|---|---|
+| `PARSE_ERROR` | conflict | An input sheet could not be parsed |
+| `INDEX_COLLISION` | conflict | The same index appears in the same lane across two sheets |
+| `READ_LENGTH_CONFLICT` | conflict | Sheets specify different read lengths or cycle counts |
+| `MERGE_VALIDATION_ERROR` | conflict | Post-merge validation of the combined sheet failed |
+| `MIXED_FORMAT` | warning | Input sheets are a mix of V1 and V2 formats |
+| `INDEX_DISTANCE_TOO_LOW` | warning | Cross-sheet index pair has Hamming distance below threshold |
+| `ADAPTER_CONFLICT` | warning | Adapter sequences differ between sheets (primary sheet adapters are used) |
+| `INCOMPLETE_SAMPLE_RECORD` | warning | A sample row is missing `Sample_ID` or index and was skipped |
 ---
 ## Diff
@@ -361,12 +481,36 @@ sheet.get_read_structure()   # → ReadStructure dataclass
 ---
+---
+### `SampleSheetMerger`
+| Method / attribute | Returns | Description |
+|---|---|---|
+| `SampleSheetMerger(target_version=)` | — | Instantiate; default target is `SampleSheetVersion.V2` |
+| `add(path)` | `self` | Register an input sheet path (fluent) |
+| `merge(output_path, *, validate=True, abort_on_conflicts=True)` | `MergeResult` | Run the merge and write output |
+### `MergeResult`
+| Attribute / method | Type | Description |
+|---|---|---|
+| `has_conflicts` | `bool` | `True` if any conflict was recorded |
+| `sample_count` | `int` | Number of samples in the merged output |
+| `output_path` | `Path \| None` | Path written; `None` if write was aborted |
+| `source_versions` | `dict[str, str]` | Per-input-file detected format version |
+| `conflicts` | `list[MergeConflict]` | Structured conflict records |
+| `warnings` | `list[MergeConflict]` | Structured warning records |
+| `summary()` | `str` | Human-readable one-line summary |
+---
 ## Contributing
 ```bash
 git clone https://github.com/chaitanyakasaraneni/samplesheet-parser
 cd samplesheet-parser
-pip install -e ".[dev]"
+pip install -e ".[dev,cli]"
 # Run tests
 pytest tests/ -v
@@ -389,7 +533,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for the full local testing guide and PR c
   title   = {samplesheet-parser: Format-agnostic parser for Illumina SampleSheet.csv},
   year    = {2026},
   url     = {https://github.com/chaitanyakasaraneni/samplesheet-parser},
-  version = {0.2.0}
+  version = {0.3.0}
 }
 ```
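
The exit-code table in the new CLI section (0 success, 1 errors found, 2 usage error) is what a calling script would branch on. Below is a hedged sketch of a thin Python wrapper around the `samplesheet` command. The `ExitCode`, `describe`, and `run_samplesheet` names are hypothetical helpers for this example, and the wrapper assumes the `cli` extra is installed so that `samplesheet` is on `PATH`.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in the README table."""

    OK = 0       # success / no issues
    ISSUES = 1   # invalid sheet, conflicts, or differences detected
    USAGE = 2    # missing file or bad argument


def describe(returncode: int) -> str:
    """Map a process return code to a label, tolerating codes outside the table."""
    try:
        return ExitCode(returncode).name
    except ValueError:
        return f"UNKNOWN({returncode})"


def run_samplesheet(*args: str) -> tuple[str, str]:
    """Run the `samplesheet` CLI and return (label, stdout).

    check=False because exit code 1 is an expected outcome, not a crash.
    """
    proc = subprocess.run(
        ["samplesheet", *args],
        capture_output=True,
        text=True,
        check=False,
    )
    return describe(proc.returncode), proc.stdout
```

A CI step could call `run_samplesheet("validate", "SampleSheet.csv", "--format", "json")` and fail the job only when the label is not `OK`.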

{samplesheet_parser-0.2.0 → samplesheet_parser-0.3.0}/examples/parse_examples.py RENAMED

@@ -6,7 +6,8 @@ Run from the repo root:
     python examples/parse_examples.py
 Demonstrates auto-detection, samples(), index_type(), UMI extraction,
-and validation for every example sheet in examples/sample_sheets/.
+validation, and custom section parsing for every example sheet in
+examples/sample_sheets/.
 """
 from __future__ import annotations
@@ -22,18 +23,23 @@ from samplesheet_parser import SampleSheetFactory, SampleSheetValidator
 SHEETS_DIR = Path(__file__).parent / "sample_sheets"
 # Ordered for readability: V1 first, then V2
-EXAMPLE_FILES = [
-    "v1_dual_index.csv",
-    "v1_single_index.csv",
-    "v1_multi_lane.csv",
-    "v2_novaseq_x_dual_index.csv",
-    "v2_with_index_umi.csv",
-    "v2_with_read_umi.csv",
-    "v2_nextseq_single_index.csv",
+# Each entry is (filename, list of custom section names to demo, or [])
+EXAMPLE_FILES: list[tuple[str, list[str]]] = [
+    ("v1_dual_index.csv",            []),
+    ("v1_single_index.csv",          []),
+    ("v1_multi_lane.csv",            []),
+    ("v1_with_manifests.csv",        ["Manifests"]),
+    ("v1_with_lab_qc_settings.csv",  ["Lab_QC_Settings"]),
+    ("v2_novaseq_x_dual_index.csv",  []),
+    ("v2_with_index_umi.csv",        []),
+    ("v2_with_read_umi.csv",         []),
+    ("v2_nextseq_single_index.csv",  []),
+    ("v2_with_cloud_settings.csv",   ["Cloud_Settings"]),
+    ("v2_with_pipeline_settings.csv", ["Pipeline_Settings"]),
 ]
-def parse_sheet(path: Path) -> None:
+def parse_sheet(path: Path, custom_sections: list[str]) -> None:
     print(f"\n{'='*60}")
     print(f"  {path.name}")
     print(f"{'='*60}")
@@ -70,6 +76,18 @@ def parse_sheet(path: Path) -> None:
         print(f"  UMI location    : {rs.umi_location}")
         print(f"  Read structure  : {rs.read_structure}")
+    # Custom sections
+    if custom_sections:
+        print("\n  Custom sections:")
+        for section_name in custom_sections:
+            data = sheet.parse_custom_section(section_name)
+            if data:
+                print(f"    [{section_name}]")
+                for key, value in data.items():
+                    print(f"      {key:<28} {value}")
+            else:
+                print(f"    [{section_name}] — (empty or not present)")
     # Samples table
     samples = sheet.samples()
     print(f"\n  Samples ({len(samples)} total):")
@@ -97,14 +115,14 @@ def main() -> None:
     print("samplesheet-parser — Example Sheet Demo")
     print(f"Parsing {len(EXAMPLE_FILES)} example sheets from {SHEETS_DIR}\n")
-    missing = [f for f in EXAMPLE_FILES if not (SHEETS_DIR / f).exists()]
+    missing = [f for f, _ in EXAMPLE_FILES if not (SHEETS_DIR / f).exists()]
     if missing:
         print(f"Warning: missing files: {missing}")
-    for filename in EXAMPLE_FILES:
+    for filename, custom_sections in EXAMPLE_FILES:
         path = SHEETS_DIR / filename
         if path.exists():
-            parse_sheet(path)
+            parse_sheet(path, custom_sections)
     print(f"\n{'='*60}")
     print("Done.")
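
The example script now calls `sheet.parse_custom_section(section_name)` and iterates over the result with `.items()`, which suggests a key/value mapping. The sketch below shows how a key/value section could be read from sample sheet text. It is an illustration under that assumption, not the library's parser: `read_key_value_section` and the `MinQ30` and `Operator` keys are invented for the example.

```python
import csv
import io


def read_key_value_section(text: str, section_name: str) -> dict[str, str]:
    """Collect `key,value` rows that follow a `[section_name]` header line.

    Returns an empty dict when the section is absent or has no rows.
    """
    wanted = f"[{section_name}]".lower()
    in_section = False
    data: dict[str, str] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # blank line or a row of empty cells
        if cells[0].startswith("[") and cells[0].endswith("]"):
            # Any section header toggles whether we are inside the wanted one.
            in_section = cells[0].lower() == wanted
            continue
        if in_section:
            data[cells[0]] = cells[1] if len(cells) > 1 else ""
    return data


# Made-up sheet text; the custom section keys are hypothetical.
EXAMPLE_SHEET = (
    "[Header]\n"
    "IEMFileVersion,5\n"
    "\n"
    "[Lab_QC_Settings]\n"
    "MinQ30,85\n"
    "Operator,\n"
    "\n"
    "[Data]\n"
    "Sample_ID,index\n"
    "S1,ATTACTCG\n"
)
qc_settings = read_key_value_section(EXAMPLE_SHEET, "Lab_QC_Settings")
```

An empty result for a missing section matches the `(empty or not present)` branch in the script above, which treats a falsy return value as nothing to print.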
