PyPI - multiscoresplot - Versions diffs - 2.0.0__tar.gz → 2.1.0__tar.gz - Mend

multiscoresplot 2.0.0.tar.gz → 2.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/.gitignore RENAMED

@@ -1,3 +1,4 @@
+TODOs.md
 CLAUDE.md
 RELEASE.md
 .claude

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: multiscoresplot
-Version: 2.0.0
+Version: 2.1.0
 Summary: Multi-dimensional gene set scoring and visualization for single-cell transcriptomics
 Project-URL: Homepage, https://github.com/AndreMacedo88/multiscoresplot
 Project-URL: Documentation, https://AndreMacedo88.github.io/multiscoresplot/
@@ -121,6 +121,14 @@ rgb = msp.reduce_to_rgb(scores, method="nmf")
 rgb = msp.reduce_to_rgb(scores, method="ica")
 ```
+**Or use a callable directly for one-off reductions:**
+```python
+rgb = msp.reduce_to_rgb(scores, method=my_custom_fn, component_prefix="MY")
+```
+> **Note:** In reduction mode, RGB channels are learned statistical axes. Similar colors indicate similar *projected* profiles, not necessarily identical biology — see the [Pipeline Guide](https://AndreMacedo88.github.io/multiscoresplot/pipeline/) for interpretation caveats.
 ### Step 3 — Plot embedding
 Scatter plot of embedding coordinates colored by RGB values, with an integrated color-space legend.
@@ -180,6 +188,17 @@ msp.render_legend(ax, "pca")
 msp.render_legend(ax, "nmf", component_labels=["NMF1", "NMF2", "NMF3"])
 ```
+## One-Step Convenience Function
+For quick exploration, `plot_scores` wraps the entire pipeline in a single call:
+```python
+scores, rgb, ax = msp.plot_scores(adata, gene_sets, basis="X_umap")
+```
+It auto-selects `blend_to_rgb` for ≤ 3 gene sets and `reduce_to_rgb(method="pca")` for more.
+All scoring, color-mapping, and plotting parameters are accepted as keyword arguments.
 ## Extensibility — Custom reducers
 Register your own dimensionality reduction method:
@@ -216,6 +235,7 @@ Full documentation is available at **[AndreMacedo88.github.io/multiscoresplot](h
 | Function / Class                                    | Description                                         |
 | --------------------------------------------------- | --------------------------------------------------- |
+| `plot_scores(adata, gene_sets)`                     | One-step convenience: score → RGB → plot            |
 | `score_gene_sets(adata, gene_sets)`                 | Score gene sets per cell via pyUCell                |
 | `blend_to_rgb(scores)`                              | Multiplicative blend to RGB (2–3 sets) → RGBResult  |
 | `reduce_to_rgb(scores, method="pca")`               | Dimensionality reduction to RGB (2+ sets) → RGBResult |

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/README.md RENAMED

@@ -88,6 +88,14 @@ rgb = msp.reduce_to_rgb(scores, method="nmf")
 rgb = msp.reduce_to_rgb(scores, method="ica")
 ```
+**Or use a callable directly for one-off reductions:**
+```python
+rgb = msp.reduce_to_rgb(scores, method=my_custom_fn, component_prefix="MY")
+```
+> **Note:** In reduction mode, RGB channels are learned statistical axes. Similar colors indicate similar *projected* profiles, not necessarily identical biology — see the [Pipeline Guide](https://AndreMacedo88.github.io/multiscoresplot/pipeline/) for interpretation caveats.
 ### Step 3 — Plot embedding
 Scatter plot of embedding coordinates colored by RGB values, with an integrated color-space legend.
@@ -147,6 +155,17 @@ msp.render_legend(ax, "pca")
 msp.render_legend(ax, "nmf", component_labels=["NMF1", "NMF2", "NMF3"])
 ```
+## One-Step Convenience Function
+For quick exploration, `plot_scores` wraps the entire pipeline in a single call:
+```python
+scores, rgb, ax = msp.plot_scores(adata, gene_sets, basis="X_umap")
+```
+It auto-selects `blend_to_rgb` for ≤ 3 gene sets and `reduce_to_rgb(method="pca")` for more.
+All scoring, color-mapping, and plotting parameters are accepted as keyword arguments.
 ## Extensibility — Custom reducers
 Register your own dimensionality reduction method:
@@ -183,6 +202,7 @@ Full documentation is available at **[AndreMacedo88.github.io/multiscoresplot](h
 | Function / Class                                    | Description                                         |
 | --------------------------------------------------- | --------------------------------------------------- |
+| `plot_scores(adata, gene_sets)`                     | One-step convenience: score → RGB → plot            |
 | `score_gene_sets(adata, gene_sets)`                 | Score gene sets per cell via pyUCell                |
 | `blend_to_rgb(scores)`                              | Multiplicative blend to RGB (2–3 sets) → RGBResult  |
 | `reduce_to_rgb(scores, method="pca")`               | Dimensionality reduction to RGB (2+ sets) → RGBResult |
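
The auto-selection rule that the README hunks above describe for `plot_scores` can be sketched as follows. This is only an illustration of the documented behavior, not the package's implementation: `choose_color_method` is a hypothetical helper, and letting an explicit `method` override the automatic choice is an assumption.

```python
from typing import Optional


def choose_color_method(n_gene_sets: int, method: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented auto-selection rule."""
    if method is not None:
        # Assumption: an explicitly requested method wins over auto-selection.
        return method
    # Documented rule: blend for <= 3 gene sets, PCA reduction for more.
    return "blend" if n_gene_sets <= 3 else "pca"
```

Under this sketch, two or three gene sets map to the direct blend, and anything larger falls back to PCA unless the caller names another method.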

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/docs/api/index.md RENAMED

@@ -8,6 +8,7 @@
 | [`blend_to_rgb`](colorspace.md#multiscoresplot.blend_to_rgb) | Multiplicative blend to RGB (2–3 sets) | Step 2 |
 | [`reduce_to_rgb`](colorspace.md#multiscoresplot.reduce_to_rgb) | Dimensionality reduction to RGB (2+ sets) | Step 2 |
 | [`RGBResult`](colorspace.md#multiscoresplot.RGBResult) | Return type of `blend_to_rgb` / `reduce_to_rgb` with metadata | Step 2 |
+| [`plot_scores`](pipeline.md) | One-step convenience: score → RGB → plot | Steps 1–3 |
 | [`plot_embedding`](plotting.md) | Static matplotlib scatter plot | Step 3 |
 | [`plot_embedding_interactive`](interactive.md) | Interactive Plotly scatter plot | Step 3 |
 | [`render_legend`](legend.md) | Draw color-space legend on axes | Optional |

multiscoresplot-2.1.0/docs/api/pipeline.md ADDED

@@ -0,0 +1,3 @@
+# plot_scores
+::: multiscoresplot.plot_scores

multiscoresplot-2.1.0/docs/changelog.md ADDED

@@ -0,0 +1,54 @@
+# Changelog
+## 2.1.0
+### New features
+- **`blend_to_rgb` / `reduce_to_rgb`**: new `prefix` and `suffix` keyword parameters for custom score column naming conventions (e.g., `prefix="msp-"`, `suffix="_v2"`). Defaults match existing `"score-"` behavior.
+- **`RGBResult`**: new `prefix` and `suffix` fields so downstream functions auto-detect the naming convention.
+- **`plot_embedding_interactive`**: new `prefix` and `suffix` keyword parameters for correct hover auto-extraction with custom column names. Defaults inherit from `RGBResult` when available.
+- **`plot_scores`**: new `prefix` and `suffix` keyword parameters forwarded to all pipeline steps (scoring, color mapping, and interactive plotting).
+- **`plot_scores`**: new one-step convenience function that wraps the full score → RGB → plot pipeline. Auto-selects `blend_to_rgb` for ≤ 3 gene sets and `reduce_to_rgb(method="pca")` for more.
+- **`reduce_to_rgb`**: `method` now accepts a callable with signature `(X, n_components, **kwargs) -> NDArray` for one-off custom reductions. New `component_prefix` parameter overrides legend axis labels.
+- **`score_gene_sets`**: emits `UserWarning` listing missing genes per gene set (genes not found in `adata.var_names` are imputed by pyUCell with worst-case rank).
+- **`score_gene_sets`**: emits `UserWarning` when `adata.X` contains negative values (e.g., after `sc.pp.scale()`), since UCell is designed for non-negative counts.
+- **`score_gene_sets`**: automatically copies read-only `adata.X` arrays to prevent crashes inside pyUCell (works around a pyUCell bug with read-only arrays after `sc.pp.scale()`).
+- **`score_gene_sets`**: new `clip_pct` parameter for per-gene-set percentile clipping (winsorization). Accepts a single float for upper-tail clipping or a `(lo, hi)` tuple for both tails.
+- **`score_gene_sets`**: new `normalize` parameter for per-gene-set min-max rescaling to [0, 1]. Applied after clipping.
+### Documentation
+- Added color interpretation caveats for reduction mode in Pipeline Guide.
+- Added inline callable example in Examples page.
+## 2.0.0
+### Breaking changes
+- `blend_to_rgb` and `reduce_to_rgb` now return `RGBResult` (carries RGB array + metadata). The object supports numpy array protocol, so `np.asarray(result)`, indexing, and comparisons still work.
+- `plot_embedding` and `plot_embedding_interactive`: `basis=` now takes the **full obsm key** (e.g. `"X_umap"`, `"umap_consensus"`). The old short form (`basis="umap"`) still works but emits a `DeprecationWarning`.
+- `plot_embedding`: `legend=True` (default) now **requires** a known method. Previously it silently skipped the legend when `method=None`; now it raises `ValueError`. Pass `legend=False` or provide `method=`, or use an `RGBResult`.
+- `plot_embedding_interactive`: `width`/`height` replaced by `figsize` (inches) + `dpi`. Pixel dimensions = `figsize * dpi`.
+### New features
+- **`RGBResult`**: new dataclass returned by `blend_to_rgb` / `reduce_to_rgb`. Carries `method`, `gene_set_names`, and `colors` metadata that plotting functions auto-detect.
+- **`plot_embedding`**: new params `legend_size`, `legend_resolution`, `dpi`.
+- **`plot_embedding_interactive`**: new param `legend_kwargs`. `figsize`/`dpi` replace `width`/`height`.
+- **Both plotting functions**: consistent `legend_size`, `legend_resolution`, and `legend_kwargs` params.
+- **`hover_columns`** now falls back to `adata.var_names` for gene expression values (sparse matrices supported).
+- **`gene_set_names`** behavior is now consistent between static and interactive plots.
+## 1.0.3
+- Fix badge display in README
+## 1.0.2
+- Fix legend not being plotted in interactive "direct" methods
+- Fix CI badge path in README
+## 1.0.1
+- Initial stable release
+- 5-step pipeline: score, blend, reduce, plot, legend
+- Built-in reducers: PCA, NMF, ICA
+- Pluggable reducer registry
+- Static matplotlib and interactive Plotly plotting
+- Color-space legends for direct and reduction modes
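
The 2.1.0 changelog entry above says `score_gene_sets` now emits a `UserWarning` listing the missing genes per gene set. A check of that kind might look roughly like the sketch below. The helper name `warn_missing_genes`, its return value, and the message wording are all assumptions made for illustration; the package's actual code is not shown in this diff.

```python
import warnings
from typing import Dict, Iterable, List


def warn_missing_genes(
    gene_sets: Dict[str, List[str]], var_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Hypothetical helper: report genes absent from var_names, per gene set."""
    known = set(var_names)
    missing = {
        name: [gene for gene in genes if gene not in known]
        for name, genes in gene_sets.items()
    }
    # Keep only the gene sets that actually have missing genes.
    missing = {name: genes for name, genes in missing.items() if genes}
    for name, genes in missing.items():
        warnings.warn(
            f"Gene set {name!r}: {len(genes)} gene(s) not found: {', '.join(genes)}",
            UserWarning,
            stacklevel=2,
        )
    return missing
```

Returning the mapping as well as warning lets a caller inspect or log the missing genes without parsing the warning text.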

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/docs/examples.md RENAMED

@@ -43,6 +43,33 @@ rgb = msp.reduce_to_rgb(scores, method="umap")
 msp.plot_embedding(adata, rgb, basis="X_umap")
 ```
+## Inline Callable Reducer
+For one-off custom reductions, pass a callable directly to `reduce_to_rgb` instead of
+registering it:
+```python
+import multiscoresplot as msp
+import umap
+def umap_reducer(X, n_components, **kwargs):
+    embedding = umap.UMAP(n_components=n_components, **kwargs).fit_transform(X)
+    for j in range(embedding.shape[1]):
+        col = embedding[:, j]
+        lo, hi = col.min(), col.max()
+        if hi > lo:
+            embedding[:, j] = (col - lo) / (hi - lo)
+    return embedding
+rgb = msp.reduce_to_rgb(scores, method=umap_reducer, component_prefix="UMAP")
+msp.plot_embedding(adata, rgb, basis="X_umap")
+```
+This is equivalent to `register_reducer` + `reduce_to_rgb(method="umap")`, but more
+convenient when you only need the reducer once.
 ## Different Embeddings
 Plot the same RGB coloring on different embeddings to compare:
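
For comparison with the UMAP example in the hunk above, here is a dependency-light sketch of a callable with the same `(X, n_components, **kwargs)` signature, using only NumPy. The name `pca_reducer` is hypothetical. Scaling each component to [0, 1] copies what the UMAP example does; whether `reduce_to_rgb` requires that range is an assumption.

```python
import numpy as np


def pca_reducer(X, n_components, **kwargs):
    """Hypothetical one-off reducer: SVD-based PCA, min-max scaled per component."""
    # **kwargs is accepted to match the documented signature but is unused here.
    X = np.asarray(X, dtype=float)
    # Center each column, then project onto the leading right singular vectors.
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:n_components].T
    # Min-max scale each component to [0, 1], as the UMAP example does.
    lo = embedding.min(axis=0)
    span = embedding.max(axis=0) - lo
    span[span == 0] = 1.0  # constant components map to 0 instead of dividing by zero
    return (embedding - lo) / span
```

Such a function would be passed the same way as `umap_reducer`, for example `msp.reduce_to_rgb(scores, method=pca_reducer, component_prefix="PC")`.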

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/docs/pipeline.md RENAMED

@@ -21,9 +21,51 @@ scores = msp.score_gene_sets(
 ```
 !!! note
-    `score_gene_sets` wraps pyUCell's ranking-based scoring. The `max_rank` parameter
-    controls how many top-ranked genes per cell are considered — set it close to the
-    median number of detected genes per cell for best results.
+`score_gene_sets` wraps pyUCell's ranking-based scoring. The `max_rank` parameter
+controls how many top-ranked genes per cell are considered — set it close to the
+median number of detected genes per cell for best results.
+!!! tip "Custom column naming"
+All pipeline functions accept `prefix` and `suffix` parameters to customise score
+column names. For example, `prefix="msp-"` produces columns like `msp-Dorsal`
+instead of `score-Dorsal`. Pass the same `prefix`/`suffix` to downstream functions
+(`blend_to_rgb`, `reduce_to_rgb`, `plot_embedding_interactive`) — or use `plot_scores`
+which forwards them automatically. `RGBResult` stores these values so interactive
+plots auto-detect them from the result object.
+### Score Post-Processing
+UCell scores are bounded to [0, 1] in theory, but in practice the observed range is
+often narrow (e.g., [0.1, 0.6]) because the theoretical extremes are rare. This
+underutilizes the color space for visualization. Two optional parameters help:
+- **`clip_pct`** — Percentile clipping (winsorization) to reduce the influence of
+  outlier cells. A single float clips the upper tail; a tuple `(lo, hi)` clips both.
+- **`normalize`** — Per-gene-set min-max rescaling so the scores span the full [0, 1]
+  range, maximizing use of the color space.
+Processing order: clip first, then normalize (so normalization stretches the clipped
+range).
+```python
+# Clip upper 1% outliers, then stretch to [0, 1]
+scores = msp.score_gene_sets(
+    adata, gene_sets,
+    clip_pct=99,
+    normalize=True,
+)
+# Clip both tails
+scores = msp.score_gene_sets(adata, gene_sets, clip_pct=(1, 99))
+# Only normalize (no clipping)
+scores = msp.score_gene_sets(adata, gene_sets, normalize=True)
+```
+!!! tip
+For most visualization workflows, `clip_pct=99, normalize=True` is a good default.
+This removes extreme outliers and stretches the remaining scores to fill the full
+color range.
 ## Step 2 — Map Scores to RGB
@@ -57,20 +99,23 @@ For **any number of gene sets**, dimensionality reduction projects the score mat
 rgb = msp.reduce_to_rgb(scores, method="pca")  # default
 rgb = msp.reduce_to_rgb(scores, method="nmf")
 rgb = msp.reduce_to_rgb(scores, method="ica")
+# Or pass a callable directly for one-off custom reductions
+rgb = msp.reduce_to_rgb(scores, method=my_reducer_fn)
 ```
 #### Choosing a Reduction Method
-| Method | Best for | Properties |
-|--------|----------|------------|
-| **PCA** | General use | Linear, orthogonal components, preserves maximum variance. Components can mix positive and negative loadings. |
-| **NMF** | Interpretability | Non-negative components — each RGB channel corresponds to a non-negative combination of gene sets. Often more biologically intuitive. |
-| **ICA** | Independent signals | Maximizes statistical independence between components. Useful when gene programs are expected to be independent. |
+| Method  | Best for            | Properties                                                                                                                            |
+| ------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| **PCA** | General use         | Linear, orthogonal components, preserves maximum variance. Components can mix positive and negative loadings.                         |
+| **NMF** | Interpretability    | Non-negative components — each RGB channel corresponds to a non-negative combination of gene sets. Often more biologically intuitive. |
+| **ICA** | Independent signals | Maximizes statistical independence between components. Useful when gene programs are expected to be independent.                      |
 !!! tip
-    Start with **PCA** for exploration. Switch to **NMF** if you want components that are
-    easier to interpret biologically. Use **ICA** when you have prior reason to believe
-    the gene programs are driven by separate, non-overlapping regulatory mechanisms.
+Start with **PCA** for exploration. Switch to **NMF** if you want components that are
+easier to interpret biologically. Use **ICA** when you have prior reason to believe
+the gene programs are driven by separate, non-overlapping regulatory mechanisms.
 #### PCA — Principal Component Analysis
@@ -79,7 +124,7 @@ RGB channels. It is the best default choice because it captures the most informa
 (the largest differences between cells) in the fewest components.
 However, PCA components can have both positive and negative loadings on gene sets. This
-means a single RGB channel might represent "high in gene set A *and* low in gene set B"
+means a single RGB channel might represent "high in gene set A _and_ low in gene set B"
 at the same time, which can make the color mapping less intuitive to interpret. In
 practice, this is rarely a problem for visualization — the overall color patterns still
 reveal meaningful structure — but it does mean the legend's R/G/B labels are abstract
@@ -91,7 +136,7 @@ to map neatly to a biological concept.
 #### NMF — Non-negative Matrix Factorization
 NMF decomposes the score matrix into non-negative factors. Because both the loadings and
-the coefficients are constrained to be ≥ 0, each RGB channel can only be a *positive*
+the coefficients are constrained to be ≥ 0, each RGB channel can only be a _positive_
 combination of gene sets — it can never represent "high A and low B" in the same
 component. This makes the components more naturally interpretable: each color channel
 tends to capture a distinct group of co-active gene programs.
@@ -110,7 +155,7 @@ programs, making the plot easier to interpret biologically.
 #### ICA — Independent Component Analysis
-ICA looks for components that are statistically *independent* — meaning knowing the value
+ICA looks for components that are statistically _independent_ — meaning knowing the value
 of one component tells you nothing about the others. This is a stronger requirement than
 PCA's orthogonality (uncorrelated), which only rules out linear relationships.
@@ -129,14 +174,35 @@ components by variance like PCA does, so the R/G/B assignment is less predictabl
 regulated biological processes and want the color channels to separate them as cleanly
 as possible.
+!!! warning "Interpreting Colors in Reduction Mode"
+    1. **Similar colors ≠ similar biology.** Reduction methods (PCA/NMF/ICA) project
+       the full score matrix into just 3 dimensions. Only the top 3 axes of variation
+       are captured - all other differences between cells are collapsed. Two cells with
+       very different gene set score profiles can appear the same color if their
+       differences lie along components that were dropped.
+    2. **Same gene set score ≠ same gene expression.** UCell computes a rank-based
+       aggregate score per gene set. Two cells can achieve the same UCell score by
+       highly expressing different subsets of genes within that gene set. The score
+       summarizes overall gene set activity, not the identity of which specific genes
+       drive it.
+    3. **RGB channels are abstract in reduction mode.** Unlike `blend_to_rgb` where
+       each color maps directly to a specific gene set, the R, G, and B channels in
+       reduction mode represent learned linear combinations of all gene set scores.
+       The legend labels (e.g., PC1/PC2/PC3) are statistical axes, not biological
+       programs. For more interpretable components, use `method="nmf"` (non-negative
+       parts-based decomposition) or `blend_to_rgb` for ≤ 3 gene sets.
 ### Blend vs. Reduce — When to Use Which
-| | `blend_to_rgb` | `reduce_to_rgb` |
-|---|---|---|
-| **Gene sets** | 2–3 only | 2 or more |
-| **Color mapping** | Direct: each gene set has its own color | Learned: RGB channels are linear combinations of scores |
-| **Interpretability** | Immediate — colors correspond directly to gene sets | Requires the legend to interpret RGB channels |
-| **Best for** | Focused comparisons of 2–3 programs | Exploratory analysis of many programs simultaneously |
+|                      | `blend_to_rgb`                                      | `reduce_to_rgb`                                         |
+| -------------------- | --------------------------------------------------- | ------------------------------------------------------- |
+| **Gene sets**        | 2–3 only                                            | 2 or more                                               |
+| **Color mapping**    | Direct: each gene set has its own color             | Learned: RGB channels are linear combinations of scores |
+| **Interpretability** | Immediate — colors correspond directly to gene sets | Requires the legend to interpret RGB channels           |
+| **Best for**         | Focused comparisons of 2–3 programs                 | Exploratory analysis of many programs simultaneously    |
 ## Step 3 — Plot Embedding
@@ -149,9 +215,9 @@ functions detect this metadata automatically, so you don't need to repeat `metho
 and `gene_set_names=`.
 !!! note "Full obsm key"
-    The `basis` parameter now takes the **full obsm key** (e.g., `"X_umap"`,
-    `"umap_consensus"`), not a short name. This lets you use any obsm key, not
-    just those prefixed with `X_`.
+The `basis` parameter now takes the **full obsm key** (e.g., `"X_umap"`,
+`"umap_consensus"`), not a short name. This lets you use any obsm key, not
+just those prefixed with `X_`.
 ```python
 # Basic usage — method and gene_set_names auto-detected from RGBResult
@@ -206,11 +272,11 @@ msp.plot_embedding_interactive(
 ```
 !!! tip "Hover over genes"
-    `hover_columns` accepts both `adata.obs` column names **and** gene names from
-    `adata.var_names`. Gene names display the expression value from `adata.X`.
+`hover_columns` accepts both `adata.obs` column names **and** gene names from
+`adata.var_names`. Gene names display the expression value from `adata.X`.
 !!! note
-    Interactive plots require the `plotly` extra: `pip install 'multiscoresplot[interactive]'`
+Interactive plots require the `plotly` extra: `pip install 'multiscoresplot[interactive]'`
 ## Optional — Standalone Legend
@@ -230,3 +296,34 @@ msp.render_legend(ax, "direct", gene_set_names=["A", "B", "C"])
 msp.render_legend(ax, "pca")
 msp.render_legend(ax, "nmf", component_labels=["NMF1", "NMF2", "NMF3"])
 ```
+## One-Step Convenience Function
+For quick exploration, `plot_scores` wraps the entire 3-step pipeline in a single call.
+It auto-selects `blend_to_rgb` for ≤ 3 gene sets and `reduce_to_rgb(method="pca")` for
+more.
+```python
+# All-in-one: score → map to RGB → plot
+scores, rgb, ax = msp.plot_scores(adata, gene_sets, basis="X_umap")
+# With post-processing and custom method
+scores, rgb, ax = msp.plot_scores(
+    adata, gene_sets,
+    clip_pct=99,
+    normalize=True,
+    method="nmf",
+    basis="X_umap",
+    show=False,
+)
+# Force blend even with 2 sets (default already does this, but explicit)
+scores, rgb, ax = msp.plot_scores(
+    adata, gene_sets_2,
+    method="blend",
+    basis="X_umap",
+)
+```
+The return value is a `(scores, rgb, plot_result)` tuple, so you can reuse
+the intermediate outputs for further analysis.
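
The "clip first, then normalize" order from the Score Post-Processing hunk above can be illustrated on a single gene set's scores. This is a minimal sketch, not the library's code: `clip_and_normalize` is a hypothetical function, and it assumes `clip_pct` is given on a 0–100 percentile scale, as the `clip_pct=99` and `clip_pct=(1, 99)` examples suggest.

```python
import numpy as np


def clip_and_normalize(scores, clip_pct=None, normalize=False):
    """Hypothetical post-processing of one gene set's scores (1-D array)."""
    out = np.asarray(scores, dtype=float).copy()
    if clip_pct is not None:
        # A single number clips only the upper tail; a (lo, hi) pair clips both.
        lo_pct, hi_pct = clip_pct if isinstance(clip_pct, tuple) else (0, clip_pct)
        lo, hi = np.percentile(out, [lo_pct, hi_pct])
        out = np.clip(out, lo, hi)
    if normalize:
        # Min-max rescale after clipping so the clipped range spans [0, 1].
        lo, hi = out.min(), out.max()
        out = (out - lo) / (hi - lo) if hi > lo else np.zeros_like(out)
    return out
```

Clipping before rescaling matters: if the order were reversed, a single outlier cell would set the maximum and compress every other score toward zero.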

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/mkdocs.yml RENAMED

@@ -41,6 +41,7 @@ nav:
   - Pipeline Guide: pipeline.md
   - API Reference:
       - Overview: api/index.md
+      - plot_scores: api/pipeline.md
       - score_gene_sets: api/scoring.md
       - blend_to_rgb & reduce_to_rgb: api/colorspace.md
       - plot_embedding: api/plotting.md

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "multiscoresplot"
-version = "2.0.0"
+version = "2.1.0"
 description = "Multi-dimensional gene set scoring and visualization for single-cell transcriptomics"
 readme = "README.md"
 license = "MIT"

{multiscoresplot-2.0.0 → multiscoresplot-2.1.0}/src/multiscoresplot/__init__.py RENAMED

@@ -11,6 +11,7 @@ from multiscoresplot._colorspace import (
 )
 from multiscoresplot._interactive import plot_embedding_interactive
 from multiscoresplot._legend import render_legend
+from multiscoresplot._pipeline import plot_scores
 from multiscoresplot._plotting import plot_embedding
 from multiscoresplot._scoring import score_gene_sets
@@ -20,6 +21,7 @@ __all__ = [
     "get_component_labels",
     "plot_embedding",
     "plot_embedding_interactive",
+    "plot_scores",
     "project_direct",
     "project_pca",
     "reduce_to_rgb",
@@ -27,4 +29,4 @@ __all__ = [
     "render_legend",
     "score_gene_sets",
 ]
-__version__ = "2.0.0"
+__version__ = "2.1.0"
