samplesheet-parser 0.1.5__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (49)
  1. samplesheet_parser-0.2.0/.github/workflows/copilot-instructions.md +53 -0
  2. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/.gitignore +5 -0
  3. samplesheet_parser-0.2.0/CHANGELOG.md +126 -0
  4. samplesheet_parser-0.2.0/CONTRIBUTING.md +195 -0
  5. samplesheet_parser-0.2.0/PKG-INFO +459 -0
  6. samplesheet_parser-0.2.0/README.md +409 -0
  7. samplesheet_parser-0.2.0/images/samplesheet_parser_overview.png +0 -0
  8. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/pyproject.toml +1 -1
  9. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/__init__.py +7 -0
  10. samplesheet_parser-0.2.0/samplesheet_parser/converter.py +401 -0
  11. samplesheet_parser-0.2.0/samplesheet_parser/diff.py +429 -0
  12. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/validators.py +110 -0
  13. samplesheet_parser-0.2.0/samplesheet_parser/writer.py +975 -0
  14. samplesheet_parser-0.2.0/scripts/demo_converter.py +175 -0
  15. samplesheet_parser-0.2.0/scripts/demo_diff.py +205 -0
  16. samplesheet_parser-0.2.0/scripts/demo_writer.py +245 -0
  17. samplesheet_parser-0.2.0/tests/fixtures/SampleSheet_v1_dual_index.csv +27 -0
  18. samplesheet_parser-0.2.0/tests/fixtures/SampleSheet_v2_dual_index.csv +30 -0
  19. samplesheet_parser-0.2.0/tests/fixtures/SampleSheet_v2_modified.csv +31 -0
  20. samplesheet_parser-0.2.0/tests/test_converter.py +483 -0
  21. samplesheet_parser-0.2.0/tests/test_diff.py +657 -0
  22. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/test_parsers/test_v1.py +60 -0
  23. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/test_parsers/test_v2.py +87 -0
  24. samplesheet_parser-0.2.0/tests/test_validators/test_hamming.py +672 -0
  25. samplesheet_parser-0.2.0/tests/test_writer.py +698 -0
  26. samplesheet_parser-0.1.5/PKG-INFO +0 -384
  27. samplesheet_parser-0.1.5/README.md +0 -334
  28. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/.github/workflows/ci.yml +0 -0
  29. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/LICENSE +0 -0
  30. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/parse_examples.py +0 -0
  31. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/README.md +0 -0
  32. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v1_dual_index.csv +0 -0
  33. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v1_multi_lane.csv +0 -0
  34. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v1_single_index.csv +0 -0
  35. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v2_nextseq_single_index.csv +0 -0
  36. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v2_novaseq_x_dual_index.csv +0 -0
  37. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v2_with_index_umi.csv +0 -0
  38. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/examples/sample_sheets/v2_with_read_umi.csv +0 -0
  39. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/enums.py +0 -0
  40. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/factory.py +0 -0
  41. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/parsers/__init__.py +0 -0
  42. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/parsers/v1.py +0 -0
  43. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/samplesheet_parser/parsers/v2.py +0 -0
  44. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/__init__.py +0 -0
  45. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/conftest.py +0 -0
  46. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/test_factory.py +0 -0
  47. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/test_parsers/__init__.py +0 -0
  48. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/test_validators/__init__.py +0 -0
  49. {samplesheet_parser-0.1.5 → samplesheet_parser-0.2.0}/tests/test_validators/test_validators.py +0 -0
@@ -0,0 +1,53 @@
+ # Copilot Review Instructions
+
+ ## Project context
+
+ `samplesheet-parser` is a Python library for parsing, validating, converting,
+ and writing Illumina SampleSheet.csv files. The codebase follows strict typing
+ (mypy strict mode), ruff linting, and pytest for tests.
+
+ ---
+
+ ## What to focus on
+
+ - Logic bugs in CSV rendering that would produce malformed output
+ - Correctness of V1 ↔ V2 format conversion
+ - Missing edge cases in tests (multi-lane, empty sections, malformed input)
+ - Violations of the shared V1/V2 interface contract
+ - Actual type errors or missing None checks
+
+ ---
+
+ ## What to skip
+
+ **Do not suggest `_validate_field` or defensive input validation on fields
+ with naturally constrained input domains**, including:
+
+ - `set_software_version(version)` — version strings like `"4.2.7"` do not
+ contain commas by definition
+ - `set_reads(read1, read2, index1, index2)` — integer parameters
+ - `set_override_cycles(override)` — semicolon-delimited cycle strings where
+ commas are not valid syntax anyway
+ - Any integer, enum, or boolean parameter
+
+ `_validate_field` is intentionally applied only to free-text string fields
+ where user input is genuinely unpredictable (sample IDs, project names,
+ index sequences, custom column values).
+
+ **Do not flag unused-looking constants or helpers without checking all call
+ sites**, including private methods prefixed with `_`.
+
+ **Do not suggest adding `__all__` exports for internal helpers** (functions
+ or classes prefixed with `_`).
+
+ **Do not suggest docstring changes** unless the docstring is factually wrong.
+
+ ---
+
+ ## Code style expectations
+
+ - Line length: 88 (ruff default)
+ - Type annotations: required on all public methods
+ - Test naming: `test_<what>_<condition>_<expected>` pattern
+ - No `assert` in production code — raise `ValueError` with a descriptive message
+ - Method chaining: all configuration methods return `self`
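Reviewer note: the file above describes two conventions, input-time rejection of CSV-breaking characters in free-text fields and configuration methods that return `self`. The sketch below illustrates both under stated assumptions. `validate_field` and `TinyWriter` are hypothetical names invented for this example; the package's real `_validate_field` and `SampleSheetWriter` may differ in signature, character set, and error messages.

```python
from __future__ import annotations

# Assumed set of characters that would corrupt an unquoted CSV cell
# (the instructions mention commas; newlines and quotes are an assumption).
_FORBIDDEN = (",", "\n", "\r", '"')


def validate_field(name: str, value: str) -> str:
    """Return value unchanged, or raise ValueError if it is unsafe for CSV."""
    for ch in _FORBIDDEN:
        if ch in value:
            raise ValueError(f"{name} contains forbidden character {ch!r}: {value!r}")
    return value


class TinyWriter:
    """Hypothetical stand-in for a fluent sample sheet builder."""

    def __init__(self) -> None:
        self.reads: dict[str, int] = {}
        self.samples: list[dict[str, str]] = []

    def set_reads(self, read1: int, read2: int) -> TinyWriter:
        # Integer parameters have a constrained domain, so no text validation.
        self.reads = {"Read1Cycles": read1, "Read2Cycles": read2}
        return self  # returning self enables method chaining

    def add_sample(self, sample_id: str, index: str) -> TinyWriter:
        # Free-text fields are validated at input time, before anything is stored.
        self.samples.append(
            {
                "Sample_ID": validate_field("sample_id", sample_id),
                "Index": validate_field("index", index),
            }
        )
        return self


# Usage: chained configuration, then a rejected free-text value.
writer = TinyWriter().set_reads(151, 151).add_sample("S1", "ATTACTCG")
try:
    writer.add_sample("S,2", "TCCGGAGA")
    rejected = False
except ValueError:
    rejected = True
```

Validating when the value is supplied, rather than at write time, means a bad sample ID fails at the call that introduced it and never reaches the stored state.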
@@ -1,4 +1,9 @@
+ # SampleSheet Parser local files to ignore
  BLOGPOST.md
+ tests/fixtures/outputs/
+ demo_output.txt
+
+ # Cache and build artifacts
  __pycache__/
  *.py[cod]
  *.pyo
@@ -0,0 +1,126 @@
+ # Changelog
+
+ All notable changes to `samplesheet-parser` are documented here.
+
+ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+
+ ---
+
+ ## [0.2.0] - 2026-02-25
+
+ ### Added
+
+ - **`SampleSheetWriter`** — programmatic creation and editing of IEM V1 and
+ BCLConvert V2 sample sheets.
+ - Build sheets from scratch with a fluent API: `set_header()`, `set_reads()`,
+ `set_adapter()`, `set_override_cycles()`, `set_software_version()`,
+ `set_setting()`, `add_sample()`.
+ - `from_sheet(sheet, version=)` class method — load any parsed V1/V2 sheet,
+ edit in place, and write back; pass a different `version` to convert format
+ while editing.
+ - `remove_sample(sample_id, lane=)` and `update_sample(sample_id, **fields)`
+ for surgical edits to existing sheets.
+ - `write(path, validate=True)` — runs `SampleSheetValidator` before writing
+ by default; raises `ValueError` with the full error list if validation fails.
+ - `to_string()` — serialise to a string without writing to disk (useful for
+ testing and inspection).
+ - CSV safety: `_validate_field` rejects commas, newlines, and quotes in all
+ free-text inputs (`sample_id`, `index`, `project`, `run_name`, adapter
+ sequences, custom column keys/values, etc.) at input time with a clear
+ error message.
+ - `SampleSheetWriter` is now exported from the top-level package.
+
+ - **`SampleSheetDiff`** — structured comparison of two sample sheets across
+ any combination of V1 and V2 formats.
+ - Compares header, reads, settings, and samples in a single `compare()` call.
+ - Returns a `DiffResult` dataclass with `header_changes`, `samples_added`,
+ `samples_removed`, and `sample_changes`.
+ - V1-only metadata columns (`I7_Index_ID`, `I5_Index_ID`, `Sample_Name`,
+ `Description`) are suppressed during cross-format comparison to avoid
+ format-noise diffs.
+ - `DiffResult.summary()` and `DiffResult.has_changes` for quick inspection.
+
+ - **`INDEX_DISTANCE_TOO_LOW` validation check** — `SampleSheetValidator` now
+ computes the Hamming distance between every pair of index sequences within
+ each lane and warns when the distance falls below the recommended minimum
+ of 3. For dual-index sheets the combined I7+I5 sequence is used so that
+ pairs well-separated on I5 are not incorrectly flagged.
+
+ - **`_hamming_distance` helper** — module-level pure function, independently
+ testable, handles sequences of unequal length by comparing up to the shorter
+ sequence length.
+
+ - **`scripts/demo_writer.py`** — smoke-test script demonstrating V1/V2
+ from-scratch creation and round-trip editing.
+
+ - **`scripts/demo_diff.py`** — smoke-test script demonstrating identical,
+ modified, and cross-format diff scenarios.
+
+ - **`.github/copilot-instructions.md`** — Copilot review instructions scoping
+ suggestions to logic bugs, test coverage gaps, and type errors.
+
+ ### Changed
+
+ - README updated to document `SampleSheetDiff`, `SampleSheetWriter`,
+ Hamming distance validation, and the full API reference tables.
+
+ ---
+
+ ## [0.1.5] - 2026-02-23
+
+ ### Added
+
+ - **`SampleSheetConverter`** — bidirectional V1 ↔ V2 format conversion.
+ - `to_v2(output_path)` — converts IEM V1 to BCLConvert V2.
+ - `to_v1(output_path)` — converts BCLConvert V2 to IEM V1 (lossy; V2-only
+ fields dropped with a warning).
+ - Auto-detects source format via `SampleSheetFactory`.
+
+ - **`scripts/demo_converter.py`** — smoke-test script for converter scenarios
+ including V1→V2→V1 and V2→V1→V2 round-trips.
+
+ - **`CONTRIBUTING.md`** — local development setup, test instructions, and
+ PR checklist.
+
+ ---
+
+ ## [0.1.1] – [0.1.4] - 2026-02-22 / 2026-02-23
+
+ ### Fixed
+
+ - CI workflow not triggering on tag push — added `tags` trigger to
+ `ci.yml` (was gated on tags but never configured to *run* on them).
+ - PyPI README image not rendering — switched from `badge.fury.io` to
+ `shields.io` dynamic badge; bumped versions to force PyPI to re-render
+ the README on each new release.
+ - Minor ruff and mypy fixes surfaced during initial CI runs.
+
+ > These were infrastructure-only patch releases with no API or behaviour
+ > changes.
+
+ ---
+
+ ## [0.1.0] - 2026-02-22
+
+ ### Added
+
+ - **`SampleSheetV1`** — parser for IEM V1 (bcl2fastq-era) sample sheets.
+ Parses `[Header]`, `[Reads]`, `[Settings]`, `[Manifests]`, and `[Data]`
+ sections. Exposes `samples()`, `index_type()`, `adapters`, `read_lengths`,
+ and all standard header fields.
+
+ - **`SampleSheetV2`** — parser for BCLConvert V2 (NovaSeq X series) sample
+ sheets. Parses `[Header]`, `[Reads]`, `[BCLConvert_Settings]`,
+ `[BCLConvert_Data]`, and optional `[Cloud_Data]` sections. Adds
+ `get_umi_length()` and `get_read_structure()` for `OverrideCycles` decoding.
+
+ - **`SampleSheetFactory`** — auto-detects V1 vs V2 format using a three-step
+ strategy (header key scan → section name scan → V1 fallback) and returns
+ the appropriate parser.
+
+ - **`SampleSheetValidator`** — validates parsed sheets for `EMPTY_SAMPLES`,
+ `INVALID_INDEX_CHARS`, `INDEX_TOO_SHORT`, `INDEX_TOO_LONG`,
+ `DUPLICATE_INDEX`, `MISSING_INDEX2`, `DUPLICATE_SAMPLE_ID`, `NO_ADAPTERS`,
+ and `ADAPTER_MISMATCH`. Returns a structured `ValidationResult`.
+
+ - Initial PyPI release. Requires Python 3.10+, depends only on `loguru`.
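Reviewer note: the 0.2.0 entry above describes the `INDEX_DISTANCE_TOO_LOW` check in prose (pairwise Hamming distance per lane, a minimum of 3, combined I7+I5 for dual-index sheets, comparison truncated to the shorter sequence). A minimal sketch of that logic follows. The function names, the dictionary keys (`sample_id`, `lane`, `index`, `index2`), and the return shape are assumptions made for illustration and are not taken from `validators.py`.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count mismatches, comparing only up to the shorter sequence length."""
    # zip() stops at the shorter input, which gives the truncation behaviour.
    return sum(x != y for x, y in zip(a, b))


def low_distance_pairs(samples, min_distance=3):
    """Return (lane, id_a, id_b, distance) for same-lane pairs below min_distance."""
    by_lane = {}
    for s in samples:
        # For dual-index samples, compare the concatenated I7+I5 sequence.
        combined = s["index"] + s.get("index2", "")
        by_lane.setdefault(s.get("lane", "1"), []).append((s["sample_id"], combined))

    flagged = []
    for lane, entries in by_lane.items():
        for (id_a, seq_a), (id_b, seq_b) in combinations(entries, 2):
            distance = hamming_distance(seq_a, seq_b)
            if distance < min_distance:
                flagged.append((lane, id_a, id_b, distance))
    return flagged


# Hypothetical input: S1/S2 differ by 1 on I7 but by 8 on I5 (not flagged);
# S3/S4 differ by 1 on I7 and 1 on I5 (combined distance 2, flagged).
samples = [
    {"sample_id": "S1", "lane": "1", "index": "ATTACTCG", "index2": "AAAAAAAA"},
    {"sample_id": "S2", "lane": "1", "index": "ATTACTCC", "index2": "TTTTTTTT"},
    {"sample_id": "S3", "lane": "2", "index": "ATTACTCG", "index2": "AAAAAAAA"},
    {"sample_id": "S4", "lane": "2", "index": "ATTACTCC", "index2": "AAAAAAAT"},
]
flagged = low_distance_pairs(samples)
```

Samples in different lanes are never compared, so S1 and S3 can share identical indexes without being flagged.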
@@ -0,0 +1,195 @@
+ # Contributing to samplesheet-parser
+
+ Thanks for contributing! This guide explains how to set up a local development environment, run tests, and prepare your PR with the evidence reviewers need to merge confidently.
+
+ ---
+
+ ## Table of contents
+
+ - [Contributing to samplesheet-parser](#contributing-to-samplesheet-parser)
+ - [Table of contents](#table-of-contents)
+ - [Setup](#setup)
+ - [Running the test suite](#running-the-test-suite)
+ - [Running the demo script](#running-the-demo-script)
+ - [What to attach to your PR](#what-to-attach-to-your-pr)
+ - [For any PR](#for-any-pr)
+ - [For PRs touching the converter](#for-prs-touching-the-converter)
+ - [Example PR description](#example-pr-description)
+ - [PR checklist](#pr-checklist)
+ - [Code style](#code-style)
+ - [Adding fixture files](#adding-fixture-files)
+
+ ---
+
+ ## Setup
+
+ ```bash
+ # 1. Clone and enter the repo
+ git clone https://github.com/chaitanyakasaraneni/samplesheet-parser.git
+ cd samplesheet-parser
+
+ # 2. Create a virtual environment (Python 3.10+)
+ python -m venv .venv
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
+
+ # 3. Install in editable mode with dev dependencies
+ pip install -e ".[dev]"
+ ```
+
+ ---
+
+ ## Running the test suite
+
+ ```bash
+ # Run all tests with coverage
+ pytest
+
+ # Run only the converter tests
+ pytest tests/test_converter.py -v
+
+ # Run a single test class
+ pytest tests/test_converter.py::TestV1ToV2 -v
+
+ # Run with a coverage threshold (CI requires ≥ 85%)
+ pytest --cov-fail-under=85
+ ```
+
+ Coverage and test results are printed to the terminal and written to `coverage.xml` (used by Codecov on CI).
+
+ ---
+
+ ## Running the demo script
+
+ For PRs that touch the converter, run the demo script to generate real input/output artifacts you can attach to the PR.
+
+ ```bash
+ python scripts/demo_converter.py
+ ```
+
+ This script:
+ 1. Reads the fixture files from `tests/fixtures/`
+ 2. Runs V1 → V2 and V2 → V1 conversions
+ 3. Runs both round-trip directions (V1 → V2 → V1 and V2 → V1 → V2)
+ 4. Validates that sample IDs survive each round-trip
+ 5. Writes all output files to `tests/fixtures/outputs/`
+
+ A passing run looks like:
+
+ ```
+ ────────────────────────────────────────────────────────────
+ 1/4 V1 → V2 (SampleSheet_v1_dual_index.csv)
+ ────────────────────────────────────────────────────────────
+ Input : tests/fixtures/SampleSheet_v1_dual_index.csv
+ Output : tests/fixtures/outputs/SampleSheet_v1_converted_to_v2.csv
+
+ ✓ FileFormatVersion : 2
+ ✓ RunName : NovaSeqRun_20240115
+ ✓ Sample count : 8
+ ✓ Index type : dual
+
+ ...
+
+ ✓ All conversions passed. Attach the files in tests/fixtures/outputs/ to your PR.
+ ```
+
+ Exit code `0` means all conversions passed. Exit code `1` means something failed — the error is printed to stderr.
+
+ ---
+
+ ## What to attach to your PR
+
+ ### For any PR
+
+ Paste the output of `pytest` into the PR description or a comment. The minimum required is the summary line:
+
+ ```
+ ===== 47 passed, 0 warnings in 3.21s =====
+ ```
+
+ A screenshot works too. The CI checks will also run automatically on push.
+
+ ### For PRs touching the converter
+
+ Run `python scripts/demo_converter.py` and attach **all four output files** from `tests/fixtures/outputs/` as file attachments to the PR description:
+
+ | File | What it shows |
+ |---|---|
+ | `SampleSheet_v1_converted_to_v2.csv` | V1 → V2 output |
+ | `SampleSheet_v2_converted_to_v1.csv` | V2 → V1 output (lossy — note dropped fields in logs) |
+ | `SampleSheet_v1_roundtrip.csv` | V1 → V2 → V1 — sample IDs must match the original |
+ | `SampleSheet_v2_roundtrip.csv` | V2 → V1 → V2 — sample IDs must match the original |
+
+ To attach files to a GitHub PR description or comment, drag-and-drop them into the text box, or use the paperclip icon. `.csv` files can be attached directly.
+
+ Also paste the full terminal output of the demo script so reviewers can see the round-trip validation results without running it themselves.
+
+ ### Example PR description
+
+ ```
+ ## What this PR does
+ Adds `SampleSheetConverter` to support V1 ↔ V2 conversions.
+
+ ## Test results
+
+ pytest output:
+ ```
+ ===== 47 passed in 3.2s =====
+ ```
+
+ ## Converter demo output
+
+ <paste full output of `python scripts/demo_converter.py` here>
+
+ ## Attached files
+
+ - SampleSheet_v1_converted_to_v2.csv
+ - SampleSheet_v2_converted_to_v1.csv
+ - SampleSheet_v1_roundtrip.csv
+ - SampleSheet_v2_roundtrip.csv
+ ```
+
+ ---
+
+ ## PR checklist
+
+ Before marking a PR ready for review, confirm:
+
+ - [ ] `pytest` passes locally with no failures
+ - [ ] Coverage has not decreased (run `pytest` and check the `TOTAL` line)
+ - [ ] New behaviour has tests — aim for one test per logical case, not per line
+ - [ ] For converter changes: demo script passes and output files are attached
+ - [ ] `ruff check .` passes (no lint errors)
+ - [ ] `black --check .` passes (code is formatted)
+ - [ ] Docstrings updated for any changed public methods
+ - [ ] `CHANGELOG.md` entry added under `[Unreleased]` if applicable
+
+ ---
+
+ ## Code style
+
+ The project uses [Black](https://black.readthedocs.io) for formatting and [Ruff](https://docs.astral.sh/ruff/) for linting.
+
+ ```bash
+ # Format
+ black .
+
+ # Lint
+ ruff check .
+
+ # Fix auto-fixable lint issues
+ ruff check . --fix
+ ```
+
+ Both are enforced by CI. The line length is **100 characters** (set in `pyproject.toml`).
+
+ Type annotations are required for all public functions. Run mypy with:
+
+ ```bash
+ mypy samplesheet_parser/
+ ```
+
+ ---
+
+ ## Adding fixture files
+
+ If your PR adds a new sheet format or edge case, add a corresponding fixture file to `tests/fixtures/` alongside the existing ones. Name it descriptively, e.g. `SampleSheet_v1_no_reads.csv`. The fixture directory is tracked in git so reviewers can inspect the inputs your tests use.
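Reviewer note: CONTRIBUTING.md above requires that sample IDs in the round-trip files match the original. The sketch below shows one way to spot-check that by hand using only the standard library. It is not the demo script's implementation: `sample_ids` is a hypothetical helper, the two inline sheets are made-up minimal examples, and the only details taken from the diff are the `[Data]` and `[BCLConvert_Data]` section names.

```python
import csv
import io


def sample_ids(sheet_text: str, data_section: str) -> list[str]:
    """Collect the Sample_ID column from one bracketed section of a sheet."""
    in_section = False
    rows = []
    for line in sheet_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Section headers may carry trailing commas, e.g. "[Data],,,"
            in_section = stripped.split("]")[0] + "]" == data_section
            continue
        if in_section and stripped:
            rows.append(line)
    # First collected row is the column header; the rest are sample rows.
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    return [row["Sample_ID"] for row in reader]


# Made-up minimal sheets standing in for an original and a converted file.
V1_TEXT = """[Header]
IEMFileVersion,5

[Data]
Sample_ID,index,index2
S1,ATTACTCG,TATAGCCT
S2,TCCGGAGA,ATAGAGGC
"""

V2_TEXT = """[Header]
FileFormatVersion,2

[BCLConvert_Data]
Sample_ID,Index,Index2
S1,ATTACTCG,TATAGCCT
S2,TCCGGAGA,ATAGAGGC
"""

ids_v1 = sample_ids(V1_TEXT, "[Data]")
ids_v2 = sample_ids(V2_TEXT, "[BCLConvert_Data]")
ids_match = ids_v1 == ids_v2
```

Comparing ordered lists also catches reordering; compare `set(ids_v1)` with `set(ids_v2)` instead if row order is allowed to change.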