PyPI - scatrans - Versions diffs - 0.7.0.dev0__tar.gz → 0.8.0.dev0__tar.gz - Mend

scatrans 0.7.0.dev0.tar.gz → 0.8.0.dev0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

{scatrans-0.7.0.dev0 → scatrans-0.8.0.dev0}/.github/workflows/ci.yml RENAMED

@@ -22,6 +22,8 @@ jobs:
     steps:
       - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # Good practice for setuptools_scm (version detection)
       - name: Set up Python ${{ matrix.python-version }}
         uses: actions/setup-python@v5

scatrans-0.8.0.dev0/.github/workflows/publish.yml ADDED

@@ -0,0 +1,74 @@
+name: Publish to PyPI
+on:
+  push:
+    tags:
+      - "v*"
+  release:
+    types: [published]
+  workflow_dispatch:
+    inputs:
+      version:
+        description: "Force a specific version (SETUPTOOLS_SCM_PRETEND_VERSION). Useful for dev releases when not on a tag."
+        required: false
+        default: ""
+jobs:
+  build:
+    name: Build distribution 📦
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # Critical for setuptools_scm to detect tags and produce correct version
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - name: Install build tools
+        run: python -m pip install --upgrade build
+      - name: Build source and wheel distributions
+        run: |
+          if [ -n "${{ github.event.inputs.version }}" ]; then
+            echo "Using forced version: ${{ github.event.inputs.version }}"
+            SETUPTOOLS_SCM_PRETEND_VERSION="${{ github.event.inputs.version }}" python -m build
+          else
+            python -m build
+          fi
+      - name: Upload distribution artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+  publish:
+    name: Publish to PyPI
+    needs: build
+    runs-on: ubuntu-latest
+    # Required for Trusted Publishing (OIDC) - no API token secret needed
+    permissions:
+      id-token: write
+    # Recommended: tie to a protected GitHub Environment (create "pypi" environment in repo settings)
+    # You can add required reviewers or branch restrictions in the environment settings.
+    # environment:
+    #   name: pypi
+    steps:
+      - name: Download all dists
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Publish distribution 📦 to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        # For publishing to TestPyPI instead (for testing the workflow):
+        # with:
+        #   repository-url: https://test.pypi.org/legacy/
+        #   verbose: true
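
The build step in `publish.yml` above branches on whether a forced version was supplied through `workflow_dispatch`. The same decision can be sketched in Python. This is only an illustration: the helper name `build_env` is hypothetical and is not part of the workflow or of scatrans. It prepares the environment that would be handed to `python -m build`, setting `SETUPTOOLS_SCM_PRETEND_VERSION` only when a non-empty version is given.

```python
import os


def build_env(forced_version, base_env=None):
    """Return the environment for the build command.

    Hypothetical helper mirroring the shell `if [ -n "..." ]` test in the
    workflow: an empty or missing version leaves the environment untouched,
    so setuptools_scm falls back to detecting the version from Git tags.
    """
    env = dict(os.environ if base_env is None else base_env)
    forced = (forced_version or "").strip()
    if forced:
        # setuptools_scm reads this variable and skips Git-based detection.
        env["SETUPTOOLS_SCM_PRETEND_VERSION"] = forced
    return env
```

A local dry run could pass the result to `subprocess.run([sys.executable, "-m", "build"], env=build_env("0.8.0.dev0"))`, assuming the `build` package is installed.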

{scatrans-0.7.0.dev0 → scatrans-0.8.0.dev0}/CHANGELOG.md RENAMED

@@ -5,6 +5,31 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.8.0] - 2026-06-14
+### Added (enrichment module — major paper-readiness upgrade)
+- `run_go(ontology="BP"|"CC"|"MF"|"ALL", ...)` — direct wrapper analogous to clusterProfiler `enrichGO`. Supports `adjust_across_all=True` for a single BH correction across all GO terms when using "ALL".
+- `save_enrichment_report(res, prefix=..., save_excel=True, save_csv=True, save_tsv=True, save_metadata=True, save_term_gene_table=True)` — one-call export of main table, term-gene long table (via `expand_enrichment_genes`), and rich `metadata.json` + xlsx sheet. Auto-creates parent directories. List columns (e.g. `Genes_list`) are sanitized to `;` strings for clean export.
+- `expand_enrichment_genes(res)` — expands the `Genes` (semicolon) column into a long-format Term–Gene table (one row per gene). Preserves `Ontology` column when input came from `run_go(..., "ALL")`.
+- Rich provenance in every result `.attrs` (success and empty):
+  - `analysis_info`: package, version, timestamp, module
+  - `gene_set_info`: `requested`/`resolved`, `requested_source` vs `actual_source` ("bundled", "gseapy", "gmt", "dict"), `library_name`, `n_terms`, `n_unique_genes`
+  - `universe_info`: full details of background handling (provided size, restricted, dropped_by_annotation, force_universe, mapping counts)
+  - Empty results now carry `reason` ("gene_list_empty", "universe_empty", "no_term_overlap_after_filters", ...) + the above fields so users can diagnose why nothing came back.
+- New `run_enrichment` / `run_kegg` / `run_go` parameters: `padj_cutoff` (preferred modern name), `include_gene_list` (adds `Genes_list` python-list column), `adjust_across_all`.
+- `list_bundled_gene_sets()` now clearly documents the 2026 organism-specific defaults.
+- Improved low-mapping-rate warning (includes input examples + gene-set examples).
+- `background` is now a documented deprecated alias of `universe`; passing both raises immediately.
+- All empty-result DataFrames preserve consistent columns (including optional `Genes_list` when requested) and full diagnostic attrs.
+### Changed / Improved
+- `_load_gene_sets` now returns `(term_to_genes, term_to_desc, load_info)` so `actual_source` is always recorded accurately (even on gseapy fallback after bundled attempt).
+- `run_kegg` fully synchronized with new parameters (`padj_cutoff`, `include_gene_list`, etc.).
+- `enrich_dotplot` (pl.py) and various tl.py flows updated for new columns/attrs.
+- Version unified to 0.8.0 for this release.
+- README and docstrings extensively updated with manuscript-export examples, `run_go`, provenance details, and `adjust_across_all` guidance.
+- Full test coverage for new paths (per-ontology attrs, within_ontology p.adjust, save+tsv+dir creation, expand with Ontology, dual-cutoff warning, etc.). All tests pass.
 ## [Unreleased]
 ### Added
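
The changelog entry for `expand_enrichment_genes` describes turning the semicolon-separated `Genes` column into a long Term–Gene table with one row per gene. The diff does not show the implementation, so the following is only a sketch of that transformation on plain dictionaries rather than on the package's result table. The function name `expand_term_genes` and the output column name `Gene` are assumptions made for illustration.

```python
def expand_term_genes(rows, gene_col="Genes", sep=";"):
    """Expand each row's separator-joined gene string into one row per gene.

    `rows` is a list of dicts standing in for enrichment result rows.
    All other keys (for example "Ontology" or "Term") are copied unchanged,
    so an ontology label present in the input is preserved in the output.
    """
    long_rows = []
    for row in rows:
        raw = row.get(gene_col) or ""
        for gene in (g.strip() for g in str(raw).split(sep)):
            if not gene:
                continue  # skip empty fragments such as a trailing separator
            new_row = {k: v for k, v in row.items() if k != gene_col}
            new_row["Gene"] = gene
            long_rows.append(new_row)
    return long_rows
```

For example, a row with `Genes` equal to `"A;B"` yields two output rows that share the same term fields and differ only in `Gene`.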

scatrans-0.8.0.dev0/MANIFEST.in ADDED

@@ -0,0 +1,15 @@
+# Control what goes into the source distribution (sdist).
+# The wheel only contains the runtime package (src/scatrans + data).
+# Standard important files
+include LICENSE
+include README.md
+include CHANGELOG.md
+include pyproject.toml
+# Include the GitHub Actions workflows (requested)
+include .github/workflows/ci.yml
+include .github/workflows/publish.yml
+# If more workflows are added in the future, this will catch them:
+include .github/workflows/*.yml
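
One way to confirm that the `include` rules above had the intended effect is to list the files inside a built sdist. The sketch below uses only the standard-library `tarfile` module. The helper names `sdist_relative_names` and `missing_from_sdist` are hypothetical, and it assumes the usual sdist layout in which every file sits under a single top-level directory.

```python
import tarfile


def sdist_relative_names(sdist_path):
    """Return archive member paths with the top-level directory stripped."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        names = tar.getnames()
    # "pkg-1.0/README.md" -> "README.md"; the bare top-level entry is skipped.
    return {name.split("/", 1)[1] for name in names if "/" in name}


def missing_from_sdist(sdist_path, expected):
    """Return the expected relative paths that are absent from the sdist."""
    present = sdist_relative_names(sdist_path)
    return sorted(set(expected) - present)
```

Calling `missing_from_sdist(path, ["LICENSE", ".github/workflows/publish.yml"])` on a built archive returns an empty list when both files were packaged.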

{scatrans-0.7.0.dev0 → scatrans-0.8.0.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: scatrans
-Version: 0.7.0.dev0
+Version: 0.8.0.dev0
 Summary: Single-cell Active Transcription Analysis
 Author: scATrans Developers
 License: MIT
@@ -42,9 +42,9 @@ Dynamic: license-file
 # scATrans
-scATrans computes a composite score that integrates differential expression with a simple reference-based measure of excess unspliced (nascent) RNA between two groups. It is designed for users working with single-cell spliced/unspliced or mature/nascent data who want to rank genes according to this combined signal.
+scATrans computes a composite score from differential expression and reference-based excess unspliced (nascent) RNA between groups. It ranks genes in single-cell spliced/unspliced or mature/nascent data.
-The package supplies a basic analysis path together with several optional extensions. All methods have limitations; results should be interpreted in light of the diagnostics and the experimental design. The tool does not claim to be a gold standard or to recover "truly active" genes in an absolute sense.
+Results must be interpreted using the provided diagnostics. The method has known limitations and does not guarantee recovery of truly active genes.
 ## Installation
@@ -88,109 +88,60 @@ ufrac = scat.qc.unspliced_global(adata)   # logs INFO + WARNING if > 50%
 `active_score` automatically runs this check and records the value in diagnostics.
-### Preserving raw counts + original spliced/unspliced layers (strongly recommended)
-scATrans (especially the Memento backend and velocity/active-transcription calculations) works best when you still have access to the original raw counts and the original spliced/unspliced (or mature/nascent) matrices on as many genes as possible.
-Call this **early** (right after loading + basic QC, before any HVG, normalize or log1p):
+## Quick Start (Minimal Default Flow)
 ```python
+import scanpy as sc
 import scatrans as scat
-# Save raw counts + the original velocity layers for later use
+adata = sc.read_h5ad("your_data.h5ad")
+# Preserve original counts and spliced/unspliced layers before HVG or normalization.
 scat.store_raw_counts(adata, layer="counts", save_raw=False)
-# Now you can safely do the usual Scanpy preprocessing for visualization
 sc.pp.highly_variable_genes(adata, n_top_genes=3000)
-# ... normalize_total, log1p, neighbors, umap, leiden ...
-```
+# ... normalize, log1p, neighbors, UMAP, clustering ...
-What `store_raw_counts` does:
-- Saves the current `.X` (your raw counts at that moment) into `layers["counts"]`.
-- If your adata contains `"spliced"` / `"unspliced"` (or `"mature"` / `"nascent"`) layers, it also saves them under `raw_spliced`, `raw_unspliced` etc. These preserved layers survive later HVG subsetting of the main object.
-- `save_raw=False` is now the default (we do **not** automatically set `adata.raw` unless you explicitly ask for it with `save_raw=True`).
+# Optional: attach bundled gene features for bias correction.
+adata = scat.add_gene_features(adata)
-This way:
-- Your visualization pipeline can use a small HVG + log1p `.X`.
-- Later you can still run `differential_expression(..., use_memento_de=True)` or `active_score` using the full-gene raw counts and the original spliced/unspliced data from the saved layers.
-- When doing enrichment, pass the gene list from the preserved full set as `universe` (see the enrichment section below for details and warnings).
+adata_res, significant, all_results = scat.active_score(
+    adata_input=adata,
+    groupby="condition",
+    target_group="Disease",
+    reference_group="Control",
+)
-See also the "Additional Capability: Standalone Differential Expression" section and the HVG-vs-velocity-layers note below.
+print(all_results.head())
+```
-**Impact of HVG filtering on spliced/unspliced layers (important)**
+Default parameters require no choices for bias correction, effective gamma, or mixed models. Pseudobulk mode and DE method (`de_method`) are configurable options. The built-in `significant` list is strict and often small or empty; use the full ranked table in `all_results`.
-In standard Scanpy operations:
+### Preserving raw counts and layers
-```python
-sc.pp.highly_variable_genes(adata, n_top_genes=3000)
-adata = adata[:, adata.var.highly_variable].copy()
-```
+Call `store_raw_counts` immediately after loading and basic QC, before HVG selection or normalization. It preserves the full raw counts and the original spliced/unspliced (or mature/nascent) layers.
-**This will also affect the spliced/unspliced layers**:
+`store_raw_counts` writes the current `.X` to `layers["counts"]` and copies spliced/unspliced layers to `raw_spliced` / `raw_unspliced`. These survive HVG subsetting. The default `save_raw=False` avoids setting `adata.raw`.
-- AnnData's `.layers` (including the "spliced" and "unspliced" you stored) are automatically subset together with the genes.
-- This is standard AnnData behavior and is usually **desired**, because velocity calculations (gamma estimation, unspliced excess, active_score) require the same gene set as the main expression matrix.
-- If you want to use HVGs only for **visualization/clustering**, but use more genes (the full post-QC gene set or a large collection) for **differential analysis (especially Memento)**, the recommended workflow is:
-  1. Immediately after loading + basic QC, call `scat.store_raw_counts(adata)` (preserves the full/large gene raw counts into the layer + .raw at that time).
-  2. Make a copy for HVG + visualization: `adata_viz = adata.copy(); ... HVG on adata_viz ...`
-  3. For DE, use the **original adata** (or the restored version), at which point it can still retrieve the corresponding raw counts from the layer (the number of genes depends on how many genes the adata had when you called store).
-  4. If you have already performed HVG subset on the main adata, the layer will also only contain raw counts for those HVGs. In that case DE can only be performed on these genes (consistent with the principle of "user performs filtering before store").
+After HVG-based visualization on a copy, restore or use the preserved layers for full-gene DE, active scoring, or enrichment (pass `adata=` to `run_enrichment` or `run_kegg` to use the stored gene list as background).
-In short: HVG subset will reduce the genes retained in spliced/unspliced, keeping it consistent with .X. If you want to use more genes for DE, you should call the DE function before HVG subset (or on a copy that has not been subset).
+HVG subsetting also subsets the saved layers. This keeps velocity calculations consistent with `.X`. To analyze more genes than the HVG set, store before subsetting or operate on the unfiltered object for DE and enrichment steps.
-Optionally, if you have done HVG + log1p for visualization but later want the raw counts back in `.X` (for the genes currently selected), you can use:
+To restore raw counts into `.X` for the current gene set:
 ```python
-# Restore raw counts into .X (non-destructive by default)
 adata_raw = scat.restore_raw_counts(adata, layer="counts", inplace=False)
-# or inplace=True to modify the current adata
 ```
-See also the "Additional Capability: Standalone Differential Expression" section below for the pure-DE (no velocity) use case.
+See the standalone differential expression section for the no-velocity use case.
 ---
-## Core Positioning
-scATrans helps users extract **condition-wise nascent RNA relative excess** signals (a lightweight proxy for differential active transcription) from single-cell velocity-style data.
-- **Basic pipeline (on by default):** DE + unspliced excess after reference gamma correction + optional light bias correction for length/intron number + composite scoring + gene filtering + enrichment + plotting.
-- **Advanced options are opt-in:** They are powerful but add complexity and information overload. New users should start with defaults.
-- **Honest by design:** The default `significant` list is deliberately strict (often empty or very small on real data). The primary deliverable is the full ranked table (`all_results`). Diagnostics are always provided so you can judge whether the signals are trustworthy in your data.
----
-## Quick Start (Minimal Default Flow) — Recommended
-```python
-import scanpy as sc
-import scatrans as scat
-# 1. Load data that contains spliced/unspliced or mature/nascent layers
-adata = sc.read_h5ad("your_data.h5ad")
-# 2. (Optional but recommended) Attach gene features for bias correction
-#    Uses the bundled mouse table by default.
-adata = scat.add_gene_features(adata)
-# 3. Run the analysis with default parameters — no need to worry about
-#    bias_correction, effective_gamma, mixed models, etc.
-adata_res, significant, all_results = scat.active_score(
-    adata_input=adata,
-    groupby="condition",
-    target_group="Disease",
-    reference_group="Control",
-)
-# 4. The most important output for almost everyone is all_results (full ranked table)
-print(all_results.head())
-```
-**Key point:** The default settings run a basic analysis without requiring decisions about `bias_correction`, `effective_gamma`, `use_mixed_model`, or `use_permutation`.
+## Core Workflow
-Pseudobulk analysis (`use_pseudobulk`) and choice of differential expression test (`de_method`, e.g. "wilcoxon") are standard configuration options that can be selected according to the experimental design (see the section on common basic switches).
+The default path performs differential expression, reference-gamma unspliced excess, optional length/intron bias correction, composite scoring, gene filtering, enrichment, and plotting.
-The built-in `significant` list uses a strict conjunction of thresholds and is frequently small or empty. This behavior is expected. The primary output for most users is the full ranked table returned as `all_results`.
+Advanced options are disabled by default. The internal `significant` list applies strict thresholds and is frequently empty or small. Return the complete ranked table in `all_results` and apply custom filters. Diagnostics are stored in `adata_res.uns["scatrans"]["diagnostics"]`.
 ---
@@ -208,16 +159,11 @@ adata_res, significant, all_results = scat.active_score(
 )
 ```
-This performs:
-- Differential expression between the two groups
-- Velocity delta (nascent excess) using a reference-group gamma
-- Light Huber bias correction on gene length + intron number (default)
-- Composite active_score (0–100)
-- Rich diagnostics written to `adata_res.uns["scatrans"]["diagnostics"]`
+This computes differential expression, reference-group gamma excess for the unspliced layer, optional Huber bias correction on gene length and intron number, a composite active score, and stores diagnostics in `adata_res.uns["scatrans"]["diagnostics"]`.
 ### 3.1.1 Common basic switches: pseudobulk and DE test method
-These two are **standard basic options**, not advanced exploration features. You can turn them on freely depending on your data and analysis preferences:
+These are standard options available for most analyses.
 **Pseudobulk mode** (recommended when you have multiple biological replicates per condition):
@@ -258,7 +204,7 @@ The `filter_active_genes` helper has a `preset="pseudobulk"` that applies more l
 ### 3.2 Gene filtering with filter_active_genes (core output tool)
-Because the built-in `significant` list is strict, most users derive their final list from `all_results` using `filter_active_genes`.
+The internal `significant` list is strict. Most users filter the full table returned in `all_results` with `filter_active_genes`.
 ```python
 # Start permissive, then tighten based on your data
@@ -314,14 +260,7 @@ kegg_res = scat.run_kegg(
 ### Default: use the package's bundled gene sets (clearest logic)
-The package now **defaults to the new organism-specific built-in libraries** (4 files added to data/):
-- `Hs_GO_Biological_Process_2026.txt` + `Hs_KEGG_2026.txt` for human
-- `Mm_GO_Biological_Process_2026.txt` + `Mm_KEGG_2026.txt` for mouse
-You only need to specify `organism=` (for KEGG especially). Base names like "GO_Biological_Process", "KEGG", "GO_BP" are automatically resolved to the correct organism + 2026 built-in file.
-If you want a specific historical Enrichr version (e.g. GO_Biological_Process_2023), just write the full name — it will be treated as an Enrichr request.
+The package defaults to organism-specific bundled sets (`Hs_GO_Biological_Process_2026.txt`, `Hs_KEGG_2026.txt`, and the corresponding mouse files). Specify `organism=` for KEGG or base GO names. Historical Enrichr names (e.g., `GO_Biological_Process_2023`) are passed through when supplied explicitly.
 ```python
 # KEGG — just specify organism, gets the correct built-in (Hs/Mm_2026) automatically
@@ -391,6 +330,61 @@ simplified = scat.simplify_enrichment(
 `run_kegg` and `simplify_enrichment` are convenience wrappers around the core `run_enrichment` function.
+### run_go (GO enrichment, clusterProfiler-style)
+```python
+# Biological Process (defaults to the bundled Mm/Hs_GO_Biological_Process_2026)
+go_bp = scat.run_go(
+    gene_list=markers,
+    ontology="BP",          # "BP", "CC", "MF", or "ALL"
+    organism="mouse",
+    adata=adata,            # recommended for correct universe
+    return_all=True,
+)
+# ALL three ontologies + unified multiple-testing correction across them
+go_all = scat.run_go(
+    markers, ontology="ALL", organism="mouse",
+    return_all=True,
+    adjust_across_all=True,   # re-compute BH on all terms together (stricter)
+)
+# go_all.attrs["per_ontology_attrs"] contains full diagnostics for BP/CC/MF separately
+```
+`run_go` automatically resolves to the organism-specific bundled sets when possible (BP is bundled; CC/MF fall back to gseapy/Enrichr if the library is installed).
+### Exporting results for manuscripts / supplementary materials
+The new helpers make it trivial to produce clean, reproducible tables:
+```python
+res = scat.run_kegg(genes, organism="mouse", return_all=True, include_gene_list=True)
+saved = scat.save_enrichment_report(
+    res,
+    prefix="cluster1_kegg",   # or "results/suppl/my_enrich" (directories created automatically)
+    save_excel=True,
+    save_csv=True,
+    save_tsv=True,            # often preferred for gene symbols + Excel locale safety
+    save_metadata=True,
+    save_term_gene_table=True,
+)
+# saved -> {'results_csv': ..., 'results_tsv': ..., 'term_gene_table_csv': ..., 'metadata_json': ..., 'results_xlsx': ...}
+# Long-format term–gene table (one row per gene; perfect for networks, follow-up stats, etc.)
+long_table = scat.expand_enrichment_genes(res)
+# If the input was from run_go(ontology="ALL"), long_table will have an "Ontology" column first.
+```
+`save_enrichment_report` also writes a rich `metadata.json` (and a "metadata" sheet in the xlsx) containing:
+- `analysis_info` (package, version, timestamp)
+- `gene_set_info` (requested/resolved + `requested_source` vs `actual_source`: "bundled", "gseapy", "gmt", "dict")
+- `universe_info` (effective N, dropped genes, restrict behavior, etc.)
+- Full `.attrs` from the enrichment call (including per-ontology details for GO ALL)
+All empty results still carry diagnostic `.attrs` (`reason`, `gene_set_info`, `universe_info`, etc.) so you never lose information when a call returns no terms.
 ### 3.4 Visualization
 ```python
@@ -501,28 +495,11 @@ This function looks for common gene list columns (`Genes`, `Lead_genes`, etc.) a
 ---
-## Result Interpretation and Notes
-### Default `significant` is often empty or very small — this is normal
-The internal significance mask is a strict conjunction:
-- `p_adj < pval_cutoff`
-- `logFC > logfc_cutoff`
-- `velocity_residual > 0`
-- sufficient expression
-- `active_score > 0`
-- (if `use_permutation`) `active_score_fdr < active_fdr_cutoff`
-- (if `use_delta_variance_pval`) `delta_var_pval < cutoff`
-On real data this frequently returns 0–few genes. **Use `all_results`** and apply your own biologically motivated filters.
+## Result Interpretation
-### Always start from `all_results`
+The internal significance mask applies a strict conjunction of thresholds. On real data it often returns zero or few genes. Use the full table in `all_results`, which is sorted by `active_score` descending and retains every gene that passed initial expression filters.
-It is already sorted by `active_score` descending and contains every gene that passed basic expression filters together with all computed values.
-### Diagnostics (always inspect these)
-After every run look at:
+After each run inspect the diagnostics:
 ```python
 meta = adata_res.uns["scatrans"]
@@ -531,25 +508,23 @@ print(meta["diagnostics"]["bias_correction"])
 print(meta.get("permutation_approximation_note"))
 ```
-- **unspliced_global_fraction**: > ~50% often indicates technical problems (nuclear enrichment, gDNA contamination).
-- **bias_correction**: number of genes used for the fit, coefficients, whether median fallback was used.
-- **permutation_approximation_note**: only present when `use_permutation=True`. Records that velocity layers/gamma were fixed for speed.
+Global unspliced fractions above ~50% frequently indicate technical issues. Bias-correction diagnostics report the number of genes used and any fallback behavior. The permutation note records that velocity layers and the reference gamma were fixed for speed.
 ---
-## Optional Advanced Features (Opt-in)
+## Optional Advanced Features
-The following options can be enabled when relevant to the analysis goals:
+The following flags are disabled by default and should be enabled only when required by the experimental design:
-- `use_permutation=True`: compute a permutation-based FDR for the composite score. When enabled, a note describing the approximation (velocity layers and reference gamma are fixed from the original labeling) is stored in the results.
-- `bias_correction="none"`: disable the length/intron correction on the velocity delta. The raw delta is then used directly as `velocity_residual`.
-- `show_effective_gamma=True`: include the per-gene reference-group U/S ratio (used internally for the delta calculation) in the output tables.
-- `use_mixed_model=True`: fit a mixed linear model with sample as random intercept and obtain `delta_variance` (fraction of modeled variance attributed to condition) along with a likelihood-ratio p-value.
-- `prioritize_velocity=True`: convenience flag that increases the relative weight given to the velocity_residual (nascent excess) term while decreasing the weights on the differential expression terms. This option is provided for analyses whose primary goal is to highlight differences in unspliced abundance after reference correction. It is documented under advanced features because it changes the balance of the composite score.
+- `use_permutation=True`
+- `bias_correction="none"`
+- `show_effective_gamma=True`
+- `use_mixed_model=True`
+- `prioritize_velocity=True`
-A helper function `diagnose_design` is available to summarize cell and sample counts, global unspliced fraction, and to surface warnings and suggestions before or between runs of `active_score`.
+`diagnose_design` summarizes cell and sample counts plus global unspliced fraction and returns warnings and a suggested `filter_active_genes` preset. It runs automatically when `sample_col` or `use_pseudobulk=True` is supplied.
-These options are not enabled by default. When used, the corresponding diagnostics should be examined.
+Inspect the corresponding diagnostics after enabling any advanced option.
 ### use_permutation=True
@@ -598,17 +573,11 @@ Recommended only when you have a reasonable number of cells and want noise reduc
 ## Limitations
-The method implements a composite score based on a simplified, reference-group gamma excess calculation for the unspliced layer together with standard differential expression statistics.
+The unspliced excess term is a group-contrast proxy derived from a reference-group gamma calculation. It is not a full stochastic or dynamical model.
-- The unspliced excess term is a group-contrast proxy and is not equivalent to scVelo's full stochastic or dynamical models.
-- The approach is most straightforward to interpret for clear binary group contrasts. Heterogeneity within the target group can reduce the observed signal.
-- When `use_permutation=True`, only the group labels are permuted; the velocity layers and reference gamma are computed once on the original data for computational efficiency. This approximation is recorded in the results metadata.
-- Global unspliced fractions above ~50% are flagged by the package, as they may indicate technical issues affecting the velocity layers.
-- Bias correction performance depends on the number and quality of genes with length and intron annotations.
-- With small numbers of biological replicates, power for the velocity component and for permutation-based FDR is limited. Users should examine the full distributions in `all_results`.
-- `delta_variance` and the associated mixed-model p-values tend to be conservative in the presence of substantial between-sample variation.
+Interpretation is simplest for clear binary contrasts. Within-group heterogeneity reduces observed signal. The permutation approximation (used when `use_permutation=True`) fixes velocity layers and the reference gamma on the original labels; the note is recorded in the results. Global unspliced fractions above ~50% are flagged as potential technical artifacts. Bias-correction quality depends on the number of genes with length and intron annotations. With few biological replicates, power for the velocity term and permutation-based FDR is limited. Mixed-model statistics tend to be conservative when between-sample variation is large.
-Users should examine the diagnostics stored under `adata.uns["scatrans"]["diagnostics"]`, the distributions of scores in the returned tables, and (where possible) the raw spliced/unspliced counts for candidate genes before biological interpretation.
+Always examine diagnostics, score distributions, and (when available) the original spliced/unspliced counts before biological interpretation.
 ---
@@ -651,7 +620,7 @@ Full signatures and all parameters are documented in the function docstrings and
 - `add_gene_features(adata, organism="mouse", ...)` — attach length/intron info
 - `list_available_gene_features()`
 - `diagnose_design(adata, groupby, target_group, reference_group, sample_col=None)` — analyzes cell/sample counts and global unspliced fraction; returns warnings, recommendations, and a suggested `filter_active_genes` preset. Automatically called internally when `sample_col` or `use_pseudobulk=True` is used.
-- `run_enrichment(...)`, `run_kegg(...)`, `simplify_enrichment(...)`, `list_bundled_gene_sets()`
+- `run_enrichment(...)`, `run_kegg(...)`, `run_go(...)`, `simplify_enrichment(...)`, `save_enrichment_report(...)`, `expand_enrichment_genes(...)`, `list_bundled_gene_sets()`
 - `scat.pl.*` plotting functions (comet_plot, volcano_plot, bias_diagnostic_plot, ...)
 - `scat.qc.unspliced_global(adata)`
@@ -691,12 +660,10 @@ All `scat.pl.*` functions support `ax=` / `axes=` (for embedding in multi-panel
   Recommended: log fold change vs. bias-corrected unspliced residual (velocity_residual), sized and colored by active_score.
   - `s=3` (or 1-5): force **fixed** small point size for everything (direct, simple control).
   - `point_scale=0.2` + `min_size=1`: for variable sizing, make tiniest background points truly small.
-  (Size API modeled after flexible controls seen in omicverse.pl.* )
 - `scat.pl.volcano_plot(results_df, top_n=10, label_genes=None, point_scale=1.0, min_size=2, s=None, ...)`
   2D volcano (logFC vs. -log10(p_adj)). Supports `label_genes=[...]` for manual gene labels
-  (combined with top_n) — ggVolcano style flexibility. Classic up/down/ns coloring when
-  not using active_score. See https://github.com/BioSenior/ggVolcano for style inspiration.
+  (combined with top_n). Classic up/down/ns coloring when not using active_score.
   Use `s=2` for uniformly small points, or min_size + point_scale for score/p-value sized tiny backgrounds.
   Especially helpful for pure DE results (no active_score).
@@ -778,9 +745,7 @@ scat.pl.enrich_dotplot(enrich)
 `differential_expression` supports the same flexible backends as `active_score` (scanpy methods, PyDESeq2 pseudobulk, mixed models, and optionally Memento as a method-of-moments estimator). The returned table is directly compatible with `filter_active_genes`, enrichment functions, and all `scat.pl.*` plotting helpers.
-This makes the package useful even if you only need modern DE + enrichment + visualization, while the core `active_score` workflow remains the recommended path when you have velocity information.
-See `examples/memento_de_example.py` for a complete demonstration of both the velocity-focused and pure-DE usage patterns.
+The package therefore supports both velocity-based active transcription analysis and conventional DE + enrichment workflows. See `examples/memento_de_example.py` for a complete demonstration of the pure-DE path.
 **Important: raw counts requirement**
@@ -823,4 +788,4 @@ MIT License.
 ---
-*This README emphasizes the basic, honest, low-ceremony workflow centered on active transcription analysis from velocity data. Advanced capabilities (including standalone DE with Memento support) remain available for users who need them.*
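
The `run_go` README text added in the PKG-INFO hunk above says that `adjust_across_all=True` recomputes the Benjamini-Hochberg (BH) correction on all GO terms together and calls the result stricter. The sketch below illustrates why pooling tends to raise adjusted p-values. It is a plain-Python BH implementation written for this note, not the package's code, and the function name `bh_adjust` and the example p-values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values per ontology.
per_ontology = {"BP": [0.01, 0.02], "CC": [0.03], "MF": [0.04]}

# Correction within each ontology separately.
within = [p for pvals in per_ontology.values() for p in bh_adjust(pvals)]

# One correction across all terms pooled together (same term order).
pooled = bh_adjust([p for pvals in per_ontology.values() for p in pvals])
```

With these inputs every pooled value is at least as large as its within-ontology counterpart, because each term is corrected against four tests rather than one or two.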
