samplesheet-parser (PyPI) 0.1.5.tar.gz → 0.2.0.tar.gz


Files changed (49)

samplesheet_parser-0.2.0/.github/workflows/copilot-instructions.md ADDED

@@ -0,0 +1,53 @@
+# Copilot Review Instructions
+## Project context
+`samplesheet-parser` is a Python library for parsing, validating, converting,
+and writing Illumina SampleSheet.csv files. The codebase follows strict typing
+(mypy strict mode), ruff linting, and pytest for tests.
+---
+## What to focus on
+- Logic bugs in CSV rendering that would produce malformed output
+- Correctness of V1 ↔ V2 format conversion
+- Missing edge cases in tests (multi-lane, empty sections, malformed input)
+- Violations of the shared V1/V2 interface contract
+- Actual type errors or missing None checks
+---
+## What to skip
+**Do not suggest `_validate_field` or defensive input validation on fields
+with naturally constrained input domains**, including:
+- `set_software_version(version)` — version strings like `"4.2.7"` do not
+  contain commas by definition
+- `set_reads(read1, read2, index1, index2)` — integer parameters
+- `set_override_cycles(override)` — semicolon-delimited cycle strings where
+  commas are not valid syntax anyway
+- Any integer, enum, or boolean parameter
+`_validate_field` is intentionally applied only to free-text string fields
+where user input is genuinely unpredictable (sample IDs, project names,
+index sequences, custom column values).
+**Do not flag unused-looking constants or helpers without checking all call
+sites**, including private methods prefixed with `_`.
+**Do not suggest adding `__all__` exports for internal helpers** (functions
+or classes prefixed with `_`).
+**Do not suggest docstring changes** unless the docstring is factually wrong.
+---
+## Code style expectations
+- Line length: 88 (ruff default)
+- Type annotations: required on all public methods
+- Test naming: `test_<what>_<condition>_<expected>` pattern
+- No `assert` in production code — raise `ValueError` with a descriptive message
+- Method chaining: all configuration methods return `self`
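
Review note: the instructions above combine three conventions: free-text fields go through `_validate_field`, production code raises `ValueError` instead of using `assert`, and configuration methods return `self`. A minimal sketch of how these might fit together follows. The class `SheetBuilder`, the method `set_run_name`, and the exact set of rejected characters are assumptions made for illustration; this is not the package's implementation.

```python
from __future__ import annotations


class SheetBuilder:
    """Hypothetical stand-in for a writer class (not the library's code)."""

    # Assumption: commas, newlines and double quotes are the rejected characters.
    _FORBIDDEN = (",", "\n", "\r", '"')

    def __init__(self) -> None:
        self.run_name: str | None = None
        self.samples: list[dict[str, str]] = []

    @classmethod
    def _validate_field(cls, name: str, value: str) -> str:
        # Raise ValueError instead of assert so the check survives `python -O`.
        for ch in cls._FORBIDDEN:
            if ch in value:
                raise ValueError(
                    f"{name} contains forbidden character {ch!r}: {value!r}"
                )
        return value

    def set_run_name(self, run_name: str) -> SheetBuilder:
        self.run_name = self._validate_field("run_name", run_name)
        return self  # returning self enables method chaining

    def add_sample(self, sample_id: str, index: str) -> SheetBuilder:
        self.samples.append(
            {
                "Sample_ID": self._validate_field("sample_id", sample_id),
                "Index": self._validate_field("index", index),
            }
        )
        return self


builder = SheetBuilder().set_run_name("Run1").add_sample("S1", "ACGTACGT")
```

Validating at input time, rather than at render time, means a bad value is reported at the call that introduced it and never reaches the CSV output.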

{samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/.gitignore RENAMED

@@ -1,4 +1,9 @@
+# SampleSheet Parser local files to ignore
 BLOGPOST.md
+tests/fixtures/outputs/
+demo_output.txt
+# Cache and build artifacts
 __pycache__/
 *.py[cod]
 *.pyo

samplesheet_parser-0.2.0/CHANGELOG.md ADDED

@@ -0,0 +1,126 @@
+# Changelog
+All notable changes to `samplesheet-parser` are documented here.
+The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+---
+## [0.2.0] - 2026-02-25
+### Added
+- **`SampleSheetWriter`** — programmatic creation and editing of IEM V1 and
+  BCLConvert V2 sample sheets.
+  - Build sheets from scratch with a fluent API: `set_header()`, `set_reads()`,
+    `set_adapter()`, `set_override_cycles()`, `set_software_version()`,
+    `set_setting()`, `add_sample()`.
+  - `from_sheet(sheet, version=)` class method — load any parsed V1/V2 sheet,
+    edit in place, and write back; pass a different `version` to convert format
+    while editing.
+  - `remove_sample(sample_id, lane=)` and `update_sample(sample_id, **fields)`
+    for surgical edits to existing sheets.
+  - `write(path, validate=True)` — runs `SampleSheetValidator` before writing
+    by default; raises `ValueError` with the full error list if validation fails.
+  - `to_string()` — serialise to a string without writing to disk (useful for
+    testing and inspection).
+  - CSV safety: `_validate_field` rejects commas, newlines, and quotes in all
+    free-text inputs (`sample_id`, `index`, `project`, `run_name`, adapter
+    sequences, custom column keys/values, etc.) at input time with a clear
+    error message.
+  - `SampleSheetWriter` is now exported from the top-level package.
+- **`SampleSheetDiff`** — structured comparison of two sample sheets across
+  any combination of V1 and V2 formats.
+  - Compares header, reads, settings, and samples in a single `compare()` call.
+  - Returns a `DiffResult` dataclass with `header_changes`, `samples_added`,
+    `samples_removed`, and `sample_changes`.
+  - V1-only metadata columns (`I7_Index_ID`, `I5_Index_ID`, `Sample_Name`,
+    `Description`) are suppressed during cross-format comparison to avoid
+    format-noise diffs.
+  - `DiffResult.summary()` and `DiffResult.has_changes` for quick inspection.
+- **`INDEX_DISTANCE_TOO_LOW` validation check** — `SampleSheetValidator` now
+  computes the Hamming distance between every pair of index sequences within
+  each lane and warns when the distance falls below the recommended minimum
+  of 3. For dual-index sheets the combined I7+I5 sequence is used so that
+  pairs well-separated on I5 are not incorrectly flagged.
+- **`_hamming_distance` helper** — module-level pure function, independently
+  testable, handles sequences of unequal length by comparing up to the shorter
+  sequence length.
+- **`scripts/demo_writer.py`** — smoke-test script demonstrating V1/V2
+  from-scratch creation and round-trip editing.
+- **`scripts/demo_diff.py`** — smoke-test script demonstrating identical,
+  modified, and cross-format diff scenarios.
+- **`.github/copilot-instructions.md`** — Copilot review instructions scoping
+  suggestions to logic bugs, test coverage gaps, and type errors.
+### Changed
+- README updated to document `SampleSheetDiff`, `SampleSheetWriter`,
+  Hamming distance validation, and the full API reference tables.
+---
+## [0.1.5] - 2026-02-23
+### Added
+- **`SampleSheetConverter`** — bidirectional V1 ↔ V2 format conversion.
+  - `to_v2(output_path)` — converts IEM V1 to BCLConvert V2.
+  - `to_v1(output_path)` — converts BCLConvert V2 to IEM V1 (lossy; V2-only
+    fields dropped with a warning).
+  - Auto-detects source format via `SampleSheetFactory`.
+- **`scripts/demo_converter.py`** — smoke-test script for converter scenarios
+  including V1→V2→V1 and V2→V1→V2 round-trips.
+- **`CONTRIBUTING.md`** — local development setup, test instructions, and
+  PR checklist.
+---
+## [0.1.1] – [0.1.4] - 2026-02-22 / 2026-02-23
+### Fixed
+- CI workflow not triggering on tag push — added `tags` trigger to
+  `ci.yml` (was gated on tags but never configured to *run* on them).
+- PyPI README image not rendering — switched from `badge.fury.io` to
+  `shields.io` dynamic badge; bumped versions to force PyPI to re-render
+  the README on each new release.
+- Minor ruff and mypy fixes surfaced during initial CI runs.
+> These were infrastructure-only patch releases with no API or behaviour
+> changes.
+---
+## [0.1.0] - 2026-02-22
+### Added
+- **`SampleSheetV1`** — parser for IEM V1 (bcl2fastq-era) sample sheets.
+  Parses `[Header]`, `[Reads]`, `[Settings]`, `[Manifests]`, and `[Data]`
+  sections. Exposes `samples()`, `index_type()`, `adapters`, `read_lengths`,
+  and all standard header fields.
+- **`SampleSheetV2`** — parser for BCLConvert V2 (NovaSeq X series) sample
+  sheets. Parses `[Header]`, `[Reads]`, `[BCLConvert_Settings]`,
+  `[BCLConvert_Data]`, and optional `[Cloud_Data]` sections. Adds
+  `get_umi_length()` and `get_read_structure()` for `OverrideCycles` decoding.
+- **`SampleSheetFactory`** — auto-detects V1 vs V2 format using a three-step
+  strategy (header key scan → section name scan → V1 fallback) and returns
+  the appropriate parser.
+- **`SampleSheetValidator`** — validates parsed sheets for `EMPTY_SAMPLES`,
+  `INVALID_INDEX_CHARS`, `INDEX_TOO_SHORT`, `INDEX_TOO_LONG`,
+  `DUPLICATE_INDEX`, `MISSING_INDEX2`, `DUPLICATE_SAMPLE_ID`, `NO_ADAPTERS`,
+  and `ADAPTER_MISMATCH`. Returns a structured `ValidationResult`.
+- Initial PyPI release. Requires Python 3.10+, depends only on `loguru`.
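
Review note: the 0.2.0 entries for `INDEX_DISTANCE_TOO_LOW` and `_hamming_distance` describe a pairwise check on combined I7+I5 sequences within a lane, comparing only up to the shorter sequence and warning below a minimum distance of 3. The sketch below illustrates that description under stated assumptions. The function names, the input shape (a mapping of sample ID to an `(i7, i5)` tuple for one lane), and the return format are hypothetical and not taken from the package.

```python
from itertools import combinations

MIN_DISTANCE = 3  # recommended minimum stated in the changelog


def hamming_distance(a: str, b: str) -> int:
    """Count mismatches, comparing only up to the shorter sequence."""
    # zip() stops at the shorter input, which gives the truncation behaviour.
    return sum(x != y for x, y in zip(a, b))


def low_distance_pairs(
    lane_indexes: dict[str, tuple[str, str]],
    minimum: int = MIN_DISTANCE,
) -> list[tuple[str, str, int]]:
    """Return (sample_a, sample_b, distance) for pairs below the minimum.

    lane_indexes maps sample ID -> (i7, i5) for a single lane; i5 may be
    an empty string for single-index sheets.
    """
    combined = {sid: i7 + i5 for sid, (i7, i5) in lane_indexes.items()}
    flagged = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(combined.items(), 2):
        distance = hamming_distance(seq_a, seq_b)
        if distance < minimum:
            flagged.append((id_a, id_b, distance))
    return flagged


lane = {
    "S1": ("ACGTACGT", "TTTTTTTT"),
    "S2": ("ACGTACGA", "TTTTTTTT"),  # differs from S1 by one base overall
    "S3": ("ACGTACGA", "GGGGTTTT"),  # same I7 as S2, but I5 differs by four
}
flagged = low_distance_pairs(lane)  # [("S1", "S2", 1)]
```

S2 and S3 share an identical I7 sequence yet are not flagged, because the combined sequence differs at four positions on I5. This is the case the changelog calls out as not being incorrectly flagged.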

samplesheet_parser-0.2.0/CONTRIBUTING.md ADDED

@@ -0,0 +1,195 @@
+# Contributing to samplesheet-parser
+Thanks for contributing! This guide explains how to set up a local development environment, run tests, and prepare your PR with the evidence reviewers need to merge confidently.
+---
+## Table of contents
+- [Contributing to samplesheet-parser](#contributing-to-samplesheet-parser)
+  - [Table of contents](#table-of-contents)
+  - [Setup](#setup)
+  - [Running the test suite](#running-the-test-suite)
+  - [Running the demo script](#running-the-demo-script)
+  - [What to attach to your PR](#what-to-attach-to-your-pr)
+    - [For any PR](#for-any-pr)
+    - [For PRs touching the converter](#for-prs-touching-the-converter)
+    - [Example PR description](#example-pr-description)
+  - [PR checklist](#pr-checklist)
+  - [Code style](#code-style)
+  - [Adding fixture files](#adding-fixture-files)
+---
+## Setup
+```bash
+# 1. Clone and enter the repo
+git clone https://github.com/chaitanyakasaraneni/samplesheet-parser.git
+cd samplesheet-parser
+# 2. Create a virtual environment (Python 3.10+)
+python -m venv .venv
+source .venv/bin/activate        # Windows: .venv\Scripts\activate
+# 3. Install in editable mode with dev dependencies
+pip install -e ".[dev]"
+```
+---
+## Running the test suite
+```bash
+# Run all tests with coverage
+pytest
+# Run only the converter tests
+pytest tests/test_converter.py -v
+# Run a single test class
+pytest tests/test_converter.py::TestV1ToV2 -v
+# Run with a coverage threshold (CI requires ≥ 85%)
+pytest --cov-fail-under=85
+```
+Coverage and test results are printed to the terminal and written to `coverage.xml` (used by Codecov on CI).
+---
+## Running the demo script
+For PRs that touch the converter, run the demo script to generate real input/output artifacts you can attach to the PR.
+```bash
+python scripts/demo_converter.py
+```
+This script:
+1. Reads the fixture files from `tests/fixtures/`
+2. Runs V1 → V2 and V2 → V1 conversions
+3. Runs both round-trip directions (V1 → V2 → V1 and V2 → V1 → V2)
+4. Validates that sample IDs survive each round-trip
+5. Writes all output files to `tests/fixtures/outputs/`
+A passing run looks like:
+```
+────────────────────────────────────────────────────────────
+  1/4  V1 → V2  (SampleSheet_v1_dual_index.csv)
+────────────────────────────────────────────────────────────
+Input  : tests/fixtures/SampleSheet_v1_dual_index.csv
+Output : tests/fixtures/outputs/SampleSheet_v1_converted_to_v2.csv
+✓ FileFormatVersion : 2
+✓ RunName           : NovaSeqRun_20240115
+✓ Sample count      : 8
+✓ Index type        : dual
+...
+✓ All conversions passed. Attach the files in tests/fixtures/outputs/ to your PR.
+```
+Exit code `0` means all conversions passed. Exit code `1` means something failed — the error is printed to stderr.
+---
+## What to attach to your PR
+### For any PR
+Paste the output of `pytest` into the PR description or a comment. The minimum required is the summary line:
+```
+===== 47 passed, 0 warnings in 3.21s =====
+```
+A screenshot works too. The CI checks will also run automatically on push.
+### For PRs touching the converter
+Run `python scripts/demo_converter.py` and attach **all four output files** from `tests/fixtures/outputs/` as file attachments to the PR description:
+| File | What it shows |
+|---|---|
+| `SampleSheet_v1_converted_to_v2.csv` | V1 → V2 output |
+| `SampleSheet_v2_converted_to_v1.csv` | V2 → V1 output (lossy — note dropped fields in logs) |
+| `SampleSheet_v1_roundtrip.csv` | V1 → V2 → V1 — sample IDs must match the original |
+| `SampleSheet_v2_roundtrip.csv` | V2 → V1 → V2 — sample IDs must match the original |
+To attach files to a GitHub PR description or comment, drag-and-drop them into the text box, or use the paperclip icon. `.csv` files can be attached directly.
+Also paste the full terminal output of the demo script so reviewers can see the round-trip validation results without running it themselves.
+### Example PR description
+````
+## What this PR does
+Adds `SampleSheetConverter` to support V1 ↔ V2 conversions.
+## Test results
+pytest output:
+```
+===== 47 passed in 3.2s =====
+```
+## Converter demo output
+<paste full output of `python scripts/demo_converter.py` here>
+## Attached files
+- SampleSheet_v1_converted_to_v2.csv
+- SampleSheet_v2_converted_to_v1.csv
+- SampleSheet_v1_roundtrip.csv
+- SampleSheet_v2_roundtrip.csv
+````
+---
+## PR checklist
+Before marking a PR ready for review, confirm:
+- [ ] `pytest` passes locally with no failures
+- [ ] Coverage has not decreased (run `pytest` and check the `TOTAL` line)
+- [ ] New behaviour has tests — aim for one test per logical case, not per line
+- [ ] For converter changes: demo script passes and output files are attached
+- [ ] `ruff check .` passes (no lint errors)
+- [ ] `black --check .` passes (code is formatted)
+- [ ] Docstrings updated for any changed public methods
+- [ ] `CHANGELOG.md` entry added under `[Unreleased]` if applicable
+---
+## Code style
+The project uses [Black](https://black.readthedocs.io) for formatting and [Ruff](https://docs.astral.sh/ruff/) for linting.
+```bash
+# Format
+black .
+# Lint
+ruff check .
+# Fix auto-fixable lint issues
+ruff check . --fix
+```
+Both are enforced by CI. The line length is **100 characters** (set in `pyproject.toml`).
+Type annotations are required for all public functions. Run mypy with:
+```bash
+mypy samplesheet_parser/
+```
+---
+## Adding fixture files
+If your PR adds a new sheet format or edge case, add a corresponding fixture file to `tests/fixtures/` alongside the existing ones. Name it descriptively, e.g. `SampleSheet_v1_no_reads.csv`. The fixture directory is tracked in git so reviewers can inspect the inputs your tests use.
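
Review note: the demo script is described as validating that sample IDs survive each round-trip. The sketch below shows one way such a check could be written using only the standard library. It assumes the data sections are named `[Data]` (V1) and `[BCLConvert_Data]` (V2), as the changelog lists, and that both carry a `Sample_ID` column. The helper `sample_ids` and the inline sheet text are hypothetical and do not reproduce `scripts/demo_converter.py`.

```python
import csv


def sample_ids(sheet_text: str, data_section: str) -> list[str]:
    """Return the Sample_ID column from the named data section."""
    header = f"[{data_section}]"
    rows: list[str] = []
    in_section = False
    for line in sheet_text.splitlines():
        # Section headers may carry trailing commas, e.g. "[Data],,,".
        stripped = line.strip().rstrip(",")
        if stripped.startswith("["):
            in_section = stripped == header
            continue
        if in_section and stripped:
            rows.append(line)
    if not rows:
        raise ValueError(f"section {header} not found or empty")
    # The first collected row is the column header line.
    return [row["Sample_ID"] for row in csv.DictReader(rows)]


# Illustrative inputs only; real fixtures live in tests/fixtures/.
v1_text = "[Header]\nIEMFileVersion,4\n\n[Data]\nSample_ID,index\nS1,ACGT\nS2,TGCA\n"
v2_text = (
    "[Header]\nFileFormatVersion,2\n\n"
    "[BCLConvert_Data]\nSample_ID,Index\nS1,ACGT\nS2,TGCA\n"
)

original = sample_ids(v1_text, "Data")
converted = sample_ids(v2_text, "BCLConvert_Data")
if original != converted:
    raise ValueError(f"sample IDs changed: {original} -> {converted}")
```

Comparing ordered lists rather than sets also catches reordered or duplicated rows, which a set comparison would hide.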
