PyPI - rmcontrols - Versions diffs - 0.1.0__tar.gz - Mend

rmcontrols 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

rmcontrols-0.1.0/LICENSE +19 -0
rmcontrols-0.1.0/PKG-INFO +551 -0
rmcontrols-0.1.0/README.md +509 -0
rmcontrols-0.1.0/pyproject.toml +67 -0
rmcontrols-0.1.0/rmcontrols/__init__.py +46 -0
rmcontrols-0.1.0/rmcontrols/_blobs.py +110 -0
rmcontrols-0.1.0/rmcontrols/_build.py +41 -0
rmcontrols-0.1.0/rmcontrols/_cli_extract.py +130 -0
rmcontrols-0.1.0/rmcontrols/_cli_validate.py +287 -0
rmcontrols-0.1.0/rmcontrols/_extract.py +217 -0
rmcontrols-0.1.0/rmcontrols/_features.py +172 -0
rmcontrols-0.1.0/rmcontrols/_hooks.py +99 -0
rmcontrols-0.1.0/rmcontrols/_region.py +40 -0
rmcontrols-0.1.0/rmcontrols/_s3.py +353 -0
rmcontrols-0.1.0/rmcontrols/_segmentation.py +101 -0
rmcontrols-0.1.0/rmcontrols/_types.py +154 -0
rmcontrols-0.1.0/rmcontrols/_validation.py +435 -0
rmcontrols-0.1.0/rmcontrols/cli.py +171 -0
rmcontrols-0.1.0/rmcontrols/detector.py +438 -0
rmcontrols-0.1.0/rmcontrols/py.typed +0 -0
rmcontrols-0.1.0/rmcontrols/viz.py +282 -0
rmcontrols-0.1.0/rmcontrols.egg-info/PKG-INFO +551 -0
rmcontrols-0.1.0/rmcontrols.egg-info/SOURCES.txt +33 -0
rmcontrols-0.1.0/rmcontrols.egg-info/dependency_links.txt +1 -0
rmcontrols-0.1.0/rmcontrols.egg-info/entry_points.txt +5 -0
rmcontrols-0.1.0/rmcontrols.egg-info/requires.txt +22 -0
rmcontrols-0.1.0/rmcontrols.egg-info/top_level.txt +1 -0
rmcontrols-0.1.0/setup.cfg +4 -0
rmcontrols-0.1.0/tests/test_blobs.py +159 -0
rmcontrols-0.1.0/tests/test_build.py +108 -0
rmcontrols-0.1.0/tests/test_cli.py +240 -0
rmcontrols-0.1.0/tests/test_detector.py +222 -0
rmcontrols-0.1.0/tests/test_features.py +191 -0
rmcontrols-0.1.0/tests/test_segmentation.py +115 -0
rmcontrols-0.1.0/tests/test_viz.py +156 -0

rmcontrols-0.1.0/LICENSE ADDED

@@ -0,0 +1,19 @@
+Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
+(CC BY-NC-ND 4.0)
+Copyright (c) 2026 afilt
+You are free to:
+  Share — copy and redistribute the material in any medium or format
+Under the following terms:
+  Attribution   — You must give appropriate credit, provide a link to the
+                  license, and indicate if changes were made.
+  NonCommercial — You may not use the material for commercial purposes.
+  NoDerivatives — If you remix, transform, or build upon the material, you
+                  may not distribute the modified material.
+No additional restrictions — You may not apply legal terms or technological
+measures that legally restrict others from doing anything the license permits.
+Full license text: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode

rmcontrols-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,551 @@
+Metadata-Version: 2.4
+Name: rmcontrols
+Version: 0.1.0
+Summary: Flag control tissues from immunochemistry whole slide images
+Author: afilt
+License-Expression: CC-BY-NC-ND-4.0
+Project-URL: Homepage, https://github.com/afilt/rmcontrols
+Project-URL: Repository, https://github.com/afilt/rmcontrols
+Project-URL: Issues, https://github.com/afilt/rmcontrols/issues
+Keywords: IHC,histology,whole slide image,control tissue,pathology
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Classifier: Topic :: Scientific/Engineering :: Image Processing
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.23
+Requires-Dist: Pillow
+Requires-Dist: scipy
+Requires-Dist: matplotlib
+Requires-Dist: scikit-image
+Requires-Dist: tqdm
+Provides-Extra: wsi
+Requires-Dist: openslide-python; extra == "wsi"
+Provides-Extra: s3
+Requires-Dist: boto3; extra == "s3"
+Provides-Extra: dev
+Requires-Dist: pytest; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Requires-Dist: mypy; extra == "dev"
+Requires-Dist: pytest-cov; extra == "dev"
+Requires-Dist: pre-commit; extra == "dev"
+Requires-Dist: nbstripout; extra == "dev"
+Requires-Dist: ipykernel; extra == "dev"
+Requires-Dist: pip; extra == "dev"
+Dynamic: license-file
+# rmcontrols
+Detect and flag control tissues in immunohistochemistry (IHC) whole-slide image thumbnails.
+---
+## Installation
+```bash
+pip install rmcontrols
+```
+For WSI support (OpenSlide):
+```bash
+pip install "rmcontrols[wsi]"
+```
+For S3 support (boto3):
+```bash
+pip install "rmcontrols[s3]"
+```
+---
+## Check control detection
+Interactively validate the detected split line for a collection of thumbnails
+or whole-slide images directly from the command line.  Results are written to
+a JSON file; the output directory is created automatically.
+### Thumbnails
+```bash
+# Validate all PNGs in assets/ — results go to ./outputs/validate_thumbnails.json
+rmcontrols-validate-thumbnails "assets/*.png" --side left
+# Custom output file
+rmcontrols-validate-thumbnails "assets/*.png" --side left \
+    --output my_results.json
+# Replace an existing output file
+rmcontrols-validate-thumbnails "assets/*.png" --overwrite
+# Use the full 5-panel debug grid instead of the simple split-line view
+rmcontrols-validate-thumbnails "assets/*.png" --full-debug
+```
+### Slides (local or S3)
+Requires the `wsi` extra (OpenSlide): `uv sync --extra wsi`.  For S3 slides,
+also install the `s3` extra (boto3): `uv sync --extra s3`.
+```bash
+# Local .mrxs slides — results go to ./outputs/validate_slides.json
+rmcontrols-validate-slides "slides/*.mrxs" --side left
+# Other formats work the same way (svs, ndpi, scn …)
+rmcontrols-validate-slides "slides/*.svs" --side left \
+    --thumbnail-size 800 --output svs_results.json
+# Single S3 slide (pass the URI directly instead of a glob)
+rmcontrols-validate-slides "s3://my-bucket/slides/case_001.mrxs" \
+    --side left --output outputs/case_001.json
+# Replace an existing output file
+rmcontrols-validate-slides "slides/*.mrxs" --overwrite
+```
+### Output format
+Both commands produce a JSON array where each entry corresponds to one
+thumbnail or slide:
+```json
+[
+  {
+    "path": "assets/thumbnail_1.png",
+    "control_split_x": 142,
+    "thumbnail_width": 512,
+    "pct": "27.7%"
+  },
+  {
+    "path": "assets/thumbnail_2.png",
+    "control_split_x": null,
+    "thumbnail_width": 512,
+    "pct": "N/A"
+  }
+]
+```
+`control_split_x` is `null` when the user enters `0` to mark "no controls".
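A minimal sketch of post-processing such a file with the standard library, assuming the schema shown above. The helper name `summarise_results` is hypothetical and is not part of `rmcontrols`; the sample data is inlined so the snippet runs without any files.

```python
import json


def summarise_results(entries):
    """Split validated entries into those with and without controls."""
    with_controls = [e for e in entries if e["control_split_x"] is not None]
    without_controls = [e for e in entries if e["control_split_x"] is None]
    # Recompute the split position as a fraction of the thumbnail width.
    fractions = {
        e["path"]: e["control_split_x"] / e["thumbnail_width"]
        for e in with_controls
    }
    return fractions, [e["path"] for e in without_controls]


# Inline sample matching the documented output format. In practice the list
# would come from json.load() on the file written by the validate commands.
sample = json.loads("""
[
  {"path": "assets/thumbnail_1.png", "control_split_x": 142,
   "thumbnail_width": 512, "pct": "27.7%"},
  {"path": "assets/thumbnail_2.png", "control_split_x": null,
   "thumbnail_width": 512, "pct": "N/A"}
]
""")

fractions, no_controls = summarise_results(sample)
for path, frac in fractions.items():
    print(f"{path}: split at {frac:.1%} of the width")
print("no controls:", no_controls)
```

JSON `null` is decoded as Python `None`, so entries marked "no controls" have to be filtered out before any arithmetic on `control_split_x`.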
+---
+## Quick start
+### Python API
+```python
+from rmcontrols import detect_controls, visualize
+thumbnail, regions, control_split_x = detect_controls(
+    "assets/thumbnail_1.png",
+    side="left",
+)
+for r in regions:
+    print(r.label, r.bbox)
+img = visualize(thumbnail, regions, control_split_x=control_split_x)
+img.save("result.png")
+```
+### CLI
+```bash
+uv run rmcontrols assets/thumbnail_1.png --side left
+uv run rmcontrols assets/thumbnail_1.png --side right --output results.json
+uv run rmcontrols assets/thumbnail_1.png --visualize annotated.png
+```
+| Option | Default | Description |
+|---|---|---|
+| `--side` | `left` | Side where controls are placed (`left` or `right`) |
+| `--strip-width` | `0.30` | Strip width as fraction of image width (max 0.40) |
+| `--threshold` | `2.0` | Dissimilarity Z-score threshold |
+| `--min-area` | `500` | Minimum blob area in pixels |
+| `--max-aspect-ratio` | `5.0` | Reject blobs with aspect ratio above this |
+| `--split-margin` | `50` | Extra pixels added beyond outermost control edge |
+| `--proximity` | `50` | Proximity rescue radius in pixels |
+| `--visualize` | — | Save annotated side-by-side PNG |
+| `--output`, `-o` | `outputs/<stem>.json` | Write JSON results to file |
+| `--overwrite` | off | Replace the output file if it already exists |
+---
+## Debug visualisation
+```python
+from rmcontrols import detect_controls_debug, visualize_debug
+import matplotlib.pyplot as plt
+thumbnail, regions, control_split_x, debug_info = detect_controls_debug(
+    "assets/thumbnail_1.png", side="left",
+)
+fig = visualize_debug(thumbnail, debug_info)
+plt.show()
+```
+The figure shows five panels:
+1. **Original thumbnail** — with the split line overlaid
+2. **Tissue mask** — binary output of the Otsu step
+3. **Blob roles** — colour-coded bounding boxes (blue=main, green=control,
+   purple=proximity-rescued, orange=rejected)
+4. **Dissimilarity scores** — bar chart of strip-blob scores vs. threshold
+5. **Shape features** — grouped bar chart of geometric features per blob
+---
+## Interactive validation
+```python
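+from rmcontrols import validate_control_split_x
+control_split_x, width = validate_control_split_x(
+    "assets/thumbnail_1.png",
+    side="left",
+    strip_width_frac=0.40,
+    dissimilarity_threshold=0.05,
+)
+# control_split_x is None when "0" (no controls) is entered at the prompt
+if control_split_x is None:
+    print(f"no controls  (width={width}px)")
+else:
+    print(f"split={control_split_x}  ({control_split_x / width * 100:.1f}% of {width}px)")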
+```
+At the prompt:
+| Input | Effect |
+|---|---|
+| *(Enter)* | Accept current value |
+| `<integer>` | Override and redisplay |
+| `0` | No controls → `control_split_x = None` |
+| `debug` | Toggle full 5-panel grid on/off |
+| `break` | Accept and stop (also stops batch loops) |
+### Batch validation
+```python
+from rmcontrols import validate_control_split_x_batch
+from pathlib import Path
+results = validate_control_split_x_batch(
+    sorted(Path("assets/").glob("*.png")),
+    side="left",
+)
+# results: {"path/to/img.png": (control_split_x, width), ...}
+```
+### WSI batch (requires OpenSlide)
+```python
+from pathlib import Path
+from rmcontrols import validate_control_split_x_wsi
+results = validate_control_split_x_wsi(
+    sorted(Path("slides/").glob("*.svs")),
+    side="left",
+    thumbnail_size=1000,
+)
+# results: {"slide.svs": (control_split_x, width), ...}
+```
+---
+## Hooks
+```python
+from rmcontrols import detect_controls, DetectionHooks
+def log_mask(mask, otsu, scale):
+    fg = mask.mean() * 100
+    print(f"  mask: otsu={otsu}, scale={scale:.2f}, fg={fg:.1f}%")
+def log_score(blob, score):
+    print(f"  blob {blob['blob_id']}: score={score:.3f}")
+hooks = DetectionHooks(
+    on_mask_ready=log_mask,
+    on_blob_scored=log_score,
+)
+thumbnail, regions, cx = detect_controls("thumbnail.png", hooks=hooks)
+```
+Available hooks:
+| Hook | Signature | When called |
+|---|---|---|
+| `on_mask_ready` | `(mask, otsu, scale) → None` | After tissue segmentation |
+| `on_blobs_extracted` | `(blobs) → None` | After connected-component extraction |
+| `on_blob_scored` | `(blob, score) → None` | Once per strip blob, after scoring |
+| `on_detection_complete` | `(regions, debug_info) → None` | At pipeline end |
+---
+## Tuning guide
+| Symptom | Parameter | Direction |
+|---|---|---|
+| Controls not detected | `dissimilarity_threshold` | ↓ lower |
+| Main tissue wrongly flagged as control | `dissimilarity_threshold` | ↑ raise |
+| Split line cuts into main tissue | `control_split_x_margin` | ↑ raise |
+| Small dust/artifact blobs detected | `min_tissue_area_px` | ↑ raise |
+| Long thin stain lines detected | `max_aspect_ratio` | ↓ lower |
+| Faint tissue not segmented | `threshold_scale` | ↑ raise (or set explicitly) |
+| Background over-segmented | `threshold_scale` | ↓ lower (or set explicitly) |
+| Two physically-connected blobs split apart | `control_proximity_px` | ↑ raise |
+---
+## Development
+### Setup
+```bash
+# 1. Install uv (if not already installed)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+# 2. Clone and install with dev dependencies
+git clone git@github.com:afilt/rmcontrols.git
+cd rmcontrols
+uv sync --extra dev
+# 3. Install pre-commit hooks
+uv run pre-commit install
+```
+Run the test suite, linter, and type checker:
+```bash
+uv run pytest
+uv run ruff check rmcontrols/
+uv run mypy rmcontrols/
+```
+### Pre-commit hooks
+The repository ships a `.pre-commit-config.yaml` that runs on every
+`git commit`:
+| Hook | What it does |
+|---|---|
+| **nbstripout** | Strips all outputs and metadata from `*.ipynb` before committing |
+| **ruff** | Lints Python files and auto-fixes safe issues (imports, style) |
+| **ruff-format** | Formats Python code |
+| **trailing-whitespace** | Removes trailing spaces |
+| **end-of-file-fixer** | Ensures files end with a newline |
+| **check-yaml / check-toml** | Validates `*.yaml` and `*.toml` syntax |
+| **check-merge-conflict** | Aborts if merge-conflict markers are present |
+| **debug-statements** | Rejects accidental `breakpoint()` / `pdb` calls |
+**First-time setup** (requires a git repository):
+```bash
+git init          # if the repo does not exist yet
+uv run pre-commit install
+```
+**Run all hooks manually** (without committing):
+```bash
+uv run pre-commit run --all-files
+```
+---
+## How it works
+The detector runs an 11-step pipeline on a thumbnail:
+### Step 1 — Grayscale conversion
+The RGB thumbnail is converted to grayscale using BT.601 luminance weights
+(`0.299 R + 0.587 G + 0.114 B`) rather than a simple channel mean, preserving
+perceptual brightness.
+### Step 2 — Adaptive Otsu thresholding
+A scale factor is applied to the standard Otsu threshold to handle slides where
+faintly-stained tissue would otherwise be missed.
+In **auto mode** (`threshold_scale=None`, the default) the scale is chosen by
+sweeping from 1.0 to 2.0 in 0.02 steps and observing the foreground fraction
+(proportion of pixels below the threshold).  The scale just before the
+foreground fraction explodes — the *elbow* of the curve — is selected and
+blended towards 1.0: `scale = 0.3 × 1.0 + 0.7 × elbow_scale`.  This avoids
+both over-segmentation (too much background included) and under-segmentation
+(faint tissue missed).
+You can override this by passing an explicit `threshold_scale` value.
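The auto-mode sweep can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the Otsu threshold is taken as an input, the function names are hypothetical, and the elbow rule used here (the scale just before the largest single-step jump in foreground fraction) is an assumed stand-in, since the text does not spell out how the elbow is located.

```python
def foreground_fraction(pixels, threshold):
    """Proportion of grayscale pixels strictly below the threshold."""
    return sum(p < threshold for p in pixels) / len(pixels)


def auto_threshold_scale(pixels, otsu, lo=1.0, hi=2.0, step=0.02, weight=0.7):
    """Sweep the scale, locate an elbow, and blend it towards 1.0."""
    n_steps = round((hi - lo) / step)
    scales = [lo + i * step for i in range(n_steps + 1)]
    fractions = [foreground_fraction(pixels, otsu * s) for s in scales]
    # ASSUMPTION: the elbow is the scale just before the largest
    # single-step increase in foreground fraction.
    jumps = [b - a for a, b in zip(fractions, fractions[1:])]
    elbow_scale = scales[jumps.index(max(jumps))]
    # Blend as documented: scale = 0.3 * 1.0 + 0.7 * elbow_scale
    scale = (1.0 - weight) * 1.0 + weight * elbow_scale
    return scale, elbow_scale


# Toy image: dark tissue (60), faint tissue (125), bright background (171).
pixels = [60] * 10 + [125] * 10 + [171] * 80
scale, elbow_scale = auto_threshold_scale(pixels, otsu=100)
print(f"elbow={elbow_scale:.2f}  blended scale={scale:.2f}")
```

In this toy case the foreground fraction jumps sharply once the scaled threshold passes the background level, so the elbow sits just before that point and the blended scale still includes the faint tissue.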
+### Step 3 — Morphological closing + hole filling
+`scipy.ndimage.binary_closing` (3 × 3 connectivity, 5 iterations) bridges small
+intra-tissue gaps introduced by lightly-stained regions.  `binary_fill_holes`
+recovers tissue enclosed by stained borders.
+### Step 4 — Border-margin zeroing
+A 5 % margin is zeroed on the **top**, **bottom**, and the side **opposite** to
+the controls.  This removes scan-border artifacts (edge staining, slide labels)
+without discarding controls that extend to the near edge.
+> **Edge case**: if `side="left"`, the left border is kept open; the right 5 %
+> is zeroed.  Vice-versa for `side="right"`.
+### Step 5 — Connected-component extraction
+`scipy.ndimage.label` + `find_objects` extracts connected components in a single
+label-array pass (O(H×W + Σ areas) rather than O(H×W × n_blobs)).  Blobs
+smaller than `min_tissue_area_px` are discarded.
+### Step 6 — Strip / main partition
+Blobs whose centroid x-coordinate falls within the control strip
+(`strip_width_frac × W` pixels from the control side) are designated *strip
+blobs*; the remainder are *main tissue blobs*.
+> **Edge case**: if no strip blobs survive, detection stops and returns an
+> empty result.
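As an illustration of the partition rule, the sketch below assigns blobs by centroid distance from the control side. The blob dictionaries and the `centroid_x` key are hypothetical, and treating the strip boundary as inclusive is an assumption.

```python
def partition_blobs(blobs, image_width, side="left", strip_width_frac=0.30):
    """Split blobs into (strip, main) by centroid x-coordinate."""
    strip_px = strip_width_frac * image_width
    strip, main = [], []
    for blob in blobs:
        cx = blob["centroid_x"]  # hypothetical key
        # Distance of the centroid from the edge where controls are placed.
        dist = cx if side == "left" else image_width - cx
        (strip if dist <= strip_px else main).append(blob)
    return strip, main


blobs = [{"centroid_x": x} for x in (100, 299, 650, 900)]
strip_left, main_left = partition_blobs(blobs, image_width=1000, side="left")
strip_right, main_right = partition_blobs(blobs, image_width=1000, side="right")
print([b["centroid_x"] for b in strip_left], [b["centroid_x"] for b in strip_right])
```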
+### Step 7 — Aspect-ratio filter (line-artifact rejection)
+Blobs with a bounding-box aspect ratio `max(w,h)/min(w,h) > max_aspect_ratio`
+are rejected as line-like scan artifacts.  The filter is applied to **both**
+strip and main blobs so that long thin artifacts do not pollute the reference
+distribution used in the next step.
+> **Edge case**: if all strip blobs are rejected by this filter, detection
+> returns an empty result.
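A short sketch of the filter, assuming bounding boxes are stored as `(x0, y0, x1, y1)` tuples with positive width and height. The tuple layout and the `bbox` key are illustrative assumptions; the default of 5.0 is the `--max-aspect-ratio` value from the CLI options table.

```python
def aspect_ratio(bbox):
    """Bounding-box elongation: max(w, h) / min(w, h), always >= 1."""
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    return max(w, h) / min(w, h)


def reject_line_artifacts(blobs, max_aspect_ratio=5.0):
    """Return (kept, rejected); a blob is rejected when its ratio exceeds the limit."""
    kept, rejected = [], []
    for blob in blobs:
        too_thin = aspect_ratio(blob["bbox"]) > max_aspect_ratio
        (rejected if too_thin else kept).append(blob)
    return kept, rejected


blobs = [
    {"name": "tissue", "bbox": (0, 0, 40, 30)},    # ratio 1.33
    {"name": "scratch", "bbox": (0, 0, 300, 10)},  # ratio 30
]
kept, rejected = reject_line_artifacts(blobs)
print([b["name"] for b in kept], [b["name"] for b in rejected])
```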
+### Step 8 — Morphological dissimilarity scoring
+This is the **core decision step**: each strip blob is compared to the
+main-tissue population and assigned a scalar *dissimilarity score*.  A blob
+whose score is at least `dissimilarity_threshold`
+(`score >= dissimilarity_threshold`) is accepted as a control tissue.
+#### Feature vector
+Eight descriptors are computed for every blob:
+| Feature | Formula | Typical range | What it captures |
+|---|---|---|---|
+| `extent` | area / (bbox_w × bbox_h) | 0 – 1 | How densely the blob fills its bounding box |
+| `aspect_ratio` | max(w,h) / min(w,h) | ≥ 1 | Overall elongation of the bounding box |
+| `solidity` | area / convex-hull area | 0 – 1 | Degree of convexity; deeply notched blobs score low |
+| `convexity` | hull perimeter / blob perimeter | 0 – 1 | Smoothness of the contour |
+| `isoperimetric_ratio` | perim² / (4π × area) | ≥ 1 (circle = 1) | Compactness; complex, ragged blobs score high |
+| `elongation` | λ_max / λ_min of inertia tensor | ≥ 1 | Principal-axis elongation independent of bounding box |
+| `mean_intensity` | mean BT.601 grayscale within blob | 0 – 255 | Average staining darkness |
+| `std_intensity` | std of BT.601 grayscale within blob | 0 – 128 | Staining heterogeneity |
+The perimeter is computed once via binary erosion and reused for both
+`convexity` and `isoperimetric_ratio`, avoiding redundant morphological passes.
+> **Why these features?**  IHC control tissues are typically small, compact,
+> and uniformly stained punch-outs placed at the slide edge.  In contrast, the
+> main tissue fragment is large, irregularly shaped, and may have heterogeneous
+> staining.  Features such as `isoperimetric_ratio`, `solidity`, and
+> `mean_intensity` tend to be the most discriminative because they capture
+> compactness and staining level simultaneously.
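To make three of the table's formulas concrete, here is a small sketch that evaluates them from precomputed measurements. The function name is hypothetical; in the package the area and perimeter come from the blob mask, whereas here they are passed in directly.

```python
import math


def simple_shape_features(area, perimeter, bbox_w, bbox_h):
    """Evaluate extent, aspect_ratio and isoperimetric_ratio from raw measurements."""
    return {
        "extent": area / (bbox_w * bbox_h),
        "aspect_ratio": max(bbox_w, bbox_h) / min(bbox_w, bbox_h),
        "isoperimetric_ratio": perimeter ** 2 / (4 * math.pi * area),
    }


# An ideal circle of radius 10 and a 10 x 10 square, for comparison.
circle = simple_shape_features(math.pi * 100, 2 * math.pi * 10, 20, 20)
square = simple_shape_features(100, 40, 10, 10)
print(circle)
print(square)
```

The circle attains the minimum `isoperimetric_ratio` of 1, while the square gives 4/π, which shows how the ratio grows as a shape departs from a disc.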
+#### Z-score formulation (normal path)
+When the reference population contains **two or more** main-tissue blobs, the
+dissimilarity score is the **maximum absolute Z-score** across all features
+shared by the candidate and every reference blob:
+```
+score = max_k  |f_k(candidate) - mean_k(ref)| / std_k(ref)
+```
+where `k` indexes the feature keys, `mean_k` and `std_k` are computed over all
+main-tissue blobs, and the denominator is clamped to a minimum of 1 × 10⁻⁶ to
+avoid division by zero when all reference blobs agree exactly on a feature.
+Taking the **maximum** (rather than an average or Euclidean norm) means the
+score is driven by the single most deviant feature.  This is intentional:
+a control that is identical to main tissue in seven features but wildly
+different in staining intensity should still be flagged.
+**Worked example**: suppose the reference has three main-tissue blobs with
+`isoperimetric_ratio` values `{1.05, 1.08, 1.07}`:
+```
+mean   = 1.0667
+std    = 0.01247   (population standard deviation assumed)
+candidate isoperimetric_ratio = 1.45
+Z      = |1.45 - 1.0667| / 0.01247  ≈  30.7
+```
+If all other Z-scores are lower, this single feature drives the score to
+about 30.7, which exceeds any sensible threshold, so the blob is accepted
+as a control.
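The max-absolute-Z rule can be sketched with the standard library alone. This is not the package's code: the function name is hypothetical, feature vectors are assumed to be plain dictionaries, and the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import fmean, pstdev


def zscore_dissimilarity(candidate, references, eps=1e-6):
    """Maximum absolute Z-score over features shared by all blobs.

    Requires at least two reference blobs and one shared feature.
    """
    shared = set(candidate)
    for ref in references:
        shared &= set(ref)
    scores = []
    for key in shared:
        values = [ref[key] for ref in references]
        # ASSUMPTION: population std; clamp to eps to avoid division by zero.
        std = max(pstdev(values), eps)
        scores.append(abs(candidate[key] - fmean(values)) / std)
    return max(scores)


references = [
    {"isoperimetric_ratio": 1.05, "solidity": 0.95},
    {"isoperimetric_ratio": 1.08, "solidity": 0.93},
    {"isoperimetric_ratio": 1.07, "solidity": 0.94},
]
candidate = {"isoperimetric_ratio": 1.45, "solidity": 0.94}
score = zscore_dissimilarity(candidate, references)
print(f"score = {score:.1f}")
```

The candidate matches the reference mean on `solidity`, so the score is driven entirely by `isoperimetric_ratio`, which is the behaviour the maximum is meant to give.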
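+In the Z-score regime the threshold is expressed in reference standard
+deviations: `0.05` (the value passed in the interactive-validation example)
+requires a deviation of only about 0.05 σ in a single feature, whereas `2.0`
+(the `--threshold` default listed in the CLI options) requires about 2 σ.
+Blobs that are statistically typical of main tissue score below the threshold
+and are rejected, while genuine controls (which are morphologically distinct
+in at least one dimension) score well above it.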
+#### Single-reference fallback
+When there is exactly **one** main-tissue blob, the sample standard deviation
+is undefined.  In this case the score falls back to the maximum *normalised
+absolute difference*:
+```
+score = max_k  |f_k(candidate) - f_k(ref)| / range_k
+```
+where `range_k` is a hand-tuned plausible range for each feature:
+| Feature | `range_k` |
+|---|---|
+| `extent` | 1.0 |
+| `aspect_ratio` | 9.0 |
+| `solidity` | 1.0 |
+| `convexity` | 1.0 |
+| `isoperimetric_ratio` | 10.0 |
+| `elongation` | 20.0 |
+| `mean_intensity` | 255.0 |
+| `std_intensity` | 128.0 |
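+Dividing by `range_k` makes the fallback score dimensionless, so the same
+`dissimilarity_threshold` can be applied in both regimes.  Fallback scores
+stay at or below 1 whenever the feature differences lie within the tabulated
+ranges, so a threshold tuned for Z-scores (for example 2.0) is rarely
+reached in this regime.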
+#### No-reference edge case
+When there is **no** main tissue at all (e.g. the entire image is background),
+every strip blob is unconditionally accepted as a control and assigned an
+infinite score (`float('inf')`).  In practice this is rare but can occur with
+very sparse or over-segmented images.
+#### Tuning `dissimilarity_threshold`
+| Threshold | Behaviour |
+|---|---|
+| Low (e.g. 0.5) | Accepts blobs that differ only modestly from main tissue; increases sensitivity but may cause false positives |
+| Default (2.0) | ~2 σ deviation required; robust for typical IHC slides |
+| High (e.g. 5.0) | Only accepts blobs that are extreme outliers; reduces false positives but may miss subtle controls |
+Use `detect_controls_debug()` and the `visualize_debug()` panel **"4. Dissimilarity scores"**
+to inspect per-blob scores and choose an appropriate threshold for your dataset.
+A strip blob is accepted as a control if `score >= dissimilarity_threshold`.
+### Step 9 — Proximity rescue
+A strip blob initially rejected by dissimilarity is **reinstated** if its
+x-centroid lies within `control_proximity_px` pixels of an already-accepted
+control.  This handles cases where a single physical control tissue is split
+into two blobs by a thin staining gap and only one part passes the dissimilarity
+test.
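The rescue rule can be sketched as a single pass over the rejected strip blobs. The `centroid_x` key and the function name are hypothetical, and comparing only against the originally accepted controls (not against blobs rescued in the same pass) is an assumption. The default of 50 is the `--proximity` value from the CLI options table.

```python
def proximity_rescue(accepted, rejected, control_proximity_px=50):
    """Return (rescued, still_rejected) based on x-centroid distance."""
    rescued, still_rejected = [], []
    for blob in rejected:
        near_control = any(
            abs(blob["centroid_x"] - ctrl["centroid_x"]) <= control_proximity_px
            for ctrl in accepted
        )
        (rescued if near_control else still_rejected).append(blob)
    return rescued, still_rejected


accepted = [{"centroid_x": 100}]
rejected = [{"centroid_x": 130}, {"centroid_x": 220}]
rescued, still_rejected = proximity_rescue(accepted, rejected)
print(rescued, still_rejected)
```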
+### Step 10 — Spatial constraint
+Accepted controls whose bounding box extends **into** the main-tissue bounding
+box are rejected after all.  This prevents a large tissue blob that straddles
+the strip boundary from being misclassified as a control.
+> **Edge case**: if all candidates are removed by this constraint, detection
+> returns an empty result.
+### Step 11 — control_split_x
+The split coordinate is placed at:
+- `side="left"`:  `min(max_control_right_edge + control_split_x_margin, W)`
+- `side="right"`: `max(min_control_left_edge  − control_split_x_margin, 0)`
+This is the x-coordinate boundary that separates control tissue from main
+tissue in downstream analysis.
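The two formulas above translate directly into a few lines of Python. In this sketch the function name and the `(x0, y0, x1, y1)` bounding-box layout are assumptions, as is returning `None` when no controls remain; the default margin of 50 is the `--split-margin` value from the CLI options table.

```python
def compute_control_split_x(control_bboxes, image_width, side="left", margin=50):
    """Place the split line just beyond the outermost control edge."""
    if not control_bboxes:
        return None  # ASSUMPTION: no controls means no split coordinate
    if side == "left":
        right_edge = max(x1 for _, _, x1, _ in control_bboxes)
        return min(right_edge + margin, image_width)
    left_edge = min(x0 for x0, _, _, _ in control_bboxes)
    return max(left_edge - margin, 0)


left_controls = [(10, 20, 92, 80), (15, 100, 80, 160)]
right_controls = [(450, 20, 500, 80)]
split_left = compute_control_split_x(left_controls, image_width=512, side="left")
split_right = compute_control_split_x(right_controls, image_width=512, side="right")
print(split_left, split_right)
```

The `min` and `max` clamps keep the coordinate inside the image when a control sits closer to the far edge than the margin.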