PyPI - pysofra - Versions diffs - 0.1.0a4__tar.gz → 0.1.0a6__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (108)

pysofra-0.1.0a6/CHANGELOG.md ADDED

@@ -0,0 +1,121 @@
+# Changelog
+All notable changes to PySofra will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0a6] — 2026-05-26
+### Fixed
+- **`svyttest` now uses full-design Taylor linearisation** of the
+  regression coefficient `ȳ_B − ȳ_A` instead of summing per-group
+  variances in quadrature. The new formulation accounts for
+  cross-group covariance under the survey design. Pinned against
+  R `survey::svyttest`: identical t-statistic and df, p-value
+  agreement to 7 decimal places on the test fixture. The previous
+  per-group formulation could be wildly anti-conservative when
+  clusters straddled groups.
+- **`svyttest` degrees of freedom** corrected to `n_PSU − n_strata − 1`
+  (the design df minus one for the slope parameter). Previously
+  off by one.
+- **`rao_scott_chisq` normalises weights to `Σw = n` before computing
+  the chi-square statistic**, matching R `survey::svychisq`. The
+  previous formulation produced statistics that scaled linearly with
+  the absolute magnitude of the weights and disagreed with R by
+  ~10–15% on typical survey-weighted contingency tables.
+- **`tbl_one(..., weights=...)` raises on negative or all-zero
+  weights** instead of warning and silently dropping. The earlier
+  behaviour could leave `N = -1` or `N = 0` cells in the rendered
+  table.
+- **`tbl_one(...).add_p()` now emits a UserWarning** when falling
+  back to unweighted ANOVA / Kruskal–Wallis for >2-group
+  continuous variables under weights (design-adjusted multi-group
+  test is not yet implemented).
+- **`tbl_one(...).add_global_p()` warns** when the table already
+  carries a column added by a prior modifier (`add_difference`,
+  `add_significance_stars`); the rebuild path drops such columns
+  and the user should call `add_global_p()` first.
+## [0.1.0a5] — 2026-05-25
+### Fixed
+- **`svyttest` degrees of freedom** now follow the standard survey
+  convention `n_PSU − n_strata` (matching Stata `svy: ttest` and R
+  `survey::svyttest` with `nest=TRUE`), instead of `N − n_strata`. The
+  previous formula over-stated df dramatically under clustering and
+  produced anti-conservative p-values.
+- **AFT models (Weibull / LogNormal / LogLogistic) are now labelled
+  "TR" (Time Ratio)** instead of "HR". The two parameters point in
+  opposite directions (TR > 1 → longer survival; HR > 1 → shorter
+  survival), so the mislabel was potentially misleading.
+- **Lifelines regression CIs honour the user-supplied `conf_level`**.
+  Previously the CIs reflected the model's fit-time `alpha` regardless
+  of `conf_level`, so passing `conf_level=0.90` produced a "90% CI"
+  header with 95% CI numbers. The CI is now re-derived from `coef ±
+  z·se(coef)` at the requested level.
+- **SMDs on a weighted Table 1 are now weighted**. `continuous_smd` and
+  `categorical_smd` accept a `weights=` argument; `tbl_one(..., weights=)`
+  threads it through automatically. Previously the SMD column was
+  always computed on unweighted samples even on a weighted table.
+- **`add_ci`, `add_difference`, and `add_global_p` now honour weights**.
+  The Welch CI on continuous means, the Newcombe CI on proportion
+  differences, and the joint Wald-F test for `add_global_p` all use
+  weighted means / variances / proportions (with Kish's effective
+  sample size for SEs) when the table was built with `weights=`.
+### Added
+- `conf_level` range validation in `tbl_regression`, `tbl_survival`, and
+  `pool` (raises `ValueError` for values outside `(0, 1)`).
+- `with_forest_plot()` on a multi-model regression table now emits a
+  `UserWarning` that only the first model is visualised, so the
+  presence of additional models is no longer silent.
+## [0.1.0a4] — 2026-05-25
+### Added
+- Input validation for duplicate names in `variables=` (now raises
+  `ValueError` instead of silently accepting duplicates).
+- Confidence-level range check in `.add_ci()` and related modifiers
+  (must lie in `(0, 1)`).
+### Changed
+- Renamed several test files for clarity. No public API changes.
+## [0.1.0a3] — 2026-05-24
+### Changed
+- Documentation polish across README, changelog, and inline docstrings.
+  No public API or behavioural changes.
+## [0.1.0a2] — 2026-05-23
+### Fixed
+- Theme styling now survives notebook viewers that strip `<style>` blocks
+  (e.g. GitHub's notebook viewer). Critical theme properties (font, border,
+  padding) are emitted as inline `style` attributes on each table element, so
+  `jama` vs `nejm` vs `clinical` vs `minimal` stay visibly distinct everywhere.
+- README image and link URLs are now absolute so they render on PyPI.
+## [0.1.0a1] — 2026-05-20
+### Added
+- Initial alpha release.
+- Core `SofraTable` object with immutable method chaining.
+- `tbl_one()` — baseline characteristic tables (Table 1) with continuous /
+  categorical summaries, stratification, missing data summaries, overall
+  column, p-values, and standardized mean differences (SMDs).
+- `tbl_summary()` — general descriptive summary tables with grouping and
+  configurable statistics.
+- `tbl_regression()` — regression tables for `statsmodels` linear / logistic
+  / Poisson models, with confidence intervals, exponentiation, and p-values.
+- `tbl_merge()` / `tbl_stack()` — table composition.
+- HTML renderer with rich notebook `_repr_html_` output (dark-mode aware,
+  responsive, sticky headers).
+- Markdown renderer.
+- DOCX renderer via `python-docx` (publication-quality Word tables with
+  captions, footnotes, merged spanning headers).
+- Themes: `clinical`, `compact`, `jama`, `nejm`, `minimal`.
+- Automatic statistical test selection with override hooks.
+- Snapshot tests for HTML output.
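
The 0.1.0a6 entry for `rao_scott_chisq` says the earlier statistic scaled linearly with the absolute magnitude of the weights. The sketch below illustrates only that scaling effect on a plain weighted Pearson chi-square, and how rescaling the weights to `Σw = n` removes it. It is not the Rao–Scott correction itself, and the helper names `weighted_pearson_chisq` and `normalise_weights` are hypothetical, not part of PySofra.

```python
from collections import defaultdict


def weighted_pearson_chisq(rows, cols, weights):
    """Pearson chi-square on a weighted two-way table (no design correction)."""
    cell = defaultdict(float)
    row_tot = defaultdict(float)
    col_tot = defaultdict(float)
    total = 0.0
    for r, c, w in zip(rows, cols, weights):
        cell[(r, c)] += w
        row_tot[r] += w
        col_tot[c] += w
        total += w
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / total
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


def normalise_weights(weights):
    """Rescale weights so that they sum to the number of observations."""
    n = len(weights)
    s = sum(weights)
    return [w * n / s for w in weights]


# Small illustrative data set (made up for this sketch).
rows = ["a", "a", "a", "b", "b", "b", "a", "b"]
cols = ["x", "y", "x", "y", "y", "x", "y", "y"]
w = [1.0, 2.5, 0.5, 3.0, 1.5, 2.0, 1.0, 0.5]
w_big = [1000 * x for x in w]  # same design, weights on a larger scale

raw = weighted_pearson_chisq(rows, cols, w)
scaled = weighted_pearson_chisq(rows, cols, w_big)
norm_a = weighted_pearson_chisq(rows, cols, normalise_weights(w))
norm_b = weighted_pearson_chisq(rows, cols, normalise_weights(w_big))
```

Here `scaled` is 1000 times `raw`, while `norm_a` and `norm_b` agree, so the normalised statistic no longer depends on the units in which the weights were expressed.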

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pysofra
-Version: 0.1.0a4
+Version: 0.1.0a6
 Summary: Statistical reporting and table preparation framework for Python — the missing reporting layer.
 Project-URL: Homepage, https://github.com/jturner-uofl/pysofra
 Project-URL: Documentation, https://github.com/jturner-uofl/pysofra
@@ -70,12 +70,12 @@ Description-Content-Type: text/markdown
 ### The missing statistical reporting layer for Python
-[![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
+[![Coverage](https://img.shields.io/badge/coverage-%E2%89%A599%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
 [![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/downloads/)
 [![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](https://github.com/jturner-uofl/pysofra/blob/main/LICENSE)
 [![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](http://mypy-lang.org/)
-[![Tests: 886](https://img.shields.io/badge/tests-886%20passing-brightgreen.svg)](#status)
+[![Tests: 903](https://img.shields.io/badge/tests-903%20passing-brightgreen.svg)](#status)
 </div>
@@ -111,7 +111,7 @@ Description-Content-Type: text/markdown
 - **One immutable object, seven output formats** — build a `SofraTable` once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
 - **Auto-dispatched statistical tests** — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted *t* — picked by variable kind, overridable per-row
 - **Inline forest plots and KM curves** — embed matplotlib figures directly into the table; the same `SofraTable` renders them across every backend
-- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` at machine precision, with cross-checks against R's `gtsummary`
+- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` reference implementations at machine precision
 - **Method-chainable and immutable** — every modifier returns a new table; no in-place mutation, no global state, fully reproducible
 <div align="center">
@@ -255,13 +255,13 @@ pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypot
 ## Status
-PySofra is in **alpha** (`0.1.0a4`). The public API surface is pinned
+PySofra is in **alpha** (`0.1.0a6`). The public API surface is pinned
 by an explicit
 [API-stability test](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_api_stability.py)
 so that any unintended rename, removal, or signature change surfaces as
 a failed test. Quality bar at this release:
-* **More than 800 tests passing**, **100% line coverage**, mypy strict, ruff clean.
+* **900+ tests passing**, near-100% line coverage, mypy strict, ruff clean.
 * Every numeric output is validated against `scipy`, `lifelines`,
   `statsmodels`, or a hand-computed textbook formula
   ([test_statistical_correctness.py](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_statistical_correctness.py)).

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/README.md RENAMED

@@ -4,12 +4,12 @@
 ### The missing statistical reporting layer for Python
-[![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
+[![Coverage](https://img.shields.io/badge/coverage-%E2%89%A599%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
 [![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/downloads/)
 [![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](https://github.com/jturner-uofl/pysofra/blob/main/LICENSE)
 [![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](http://mypy-lang.org/)
-[![Tests: 886](https://img.shields.io/badge/tests-886%20passing-brightgreen.svg)](#status)
+[![Tests: 903](https://img.shields.io/badge/tests-903%20passing-brightgreen.svg)](#status)
 </div>
@@ -45,7 +45,7 @@
 - **One immutable object, seven output formats** — build a `SofraTable` once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
 - **Auto-dispatched statistical tests** — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted *t* — picked by variable kind, overridable per-row
 - **Inline forest plots and KM curves** — embed matplotlib figures directly into the table; the same `SofraTable` renders them across every backend
-- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` at machine precision, with cross-checks against R's `gtsummary`
+- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` reference implementations at machine precision
 - **Method-chainable and immutable** — every modifier returns a new table; no in-place mutation, no global state, fully reproducible
 <div align="center">
@@ -189,13 +189,13 @@ pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypot
 ## Status
-PySofra is in **alpha** (`0.1.0a4`). The public API surface is pinned
+PySofra is in **alpha** (`0.1.0a6`). The public API surface is pinned
 by an explicit
 [API-stability test](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_api_stability.py)
 so that any unintended rename, removal, or signature change surfaces as
 a failed test. Quality bar at this release:
-* **More than 800 tests passing**, **100% line coverage**, mypy strict, ruff clean.
+* **900+ tests passing**, near-100% line coverage, mypy strict, ruff clean.
 * Every numeric output is validated against `scipy`, `lifelines`,
   `statsmodels`, or a hand-computed textbook formula
   ([test_statistical_correctness.py](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_statistical_correctness.py)).

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "pysofra"
-version = "0.1.0a4"
+version = "0.1.0a6"
 description = "Statistical reporting and table preparation framework for Python — the missing reporting layer."
 readme = "README.md"
 license = { text = "GPL-3.0-or-later" }

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/__init__.py RENAMED

@@ -50,7 +50,7 @@ from .summary.tbl_summary import tbl_summary
 from .summary.tests import available_tests
 from .themes.registry import available_themes, register_theme
-__version__ = "0.1.0a4"
+__version__ = "0.1.0a6"
 __all__ = [
     "CellPart",

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/core/table.py RENAMED

@@ -355,8 +355,34 @@ class SofraTable:
             return add_global_p(self)
         # tbl_one / tbl_summary path: route through the rebuild spec.
+        # The rebuild reconstructs the table from spec.options only;
+        # columns added by post-build modifiers (``add_difference``,
+        # ``add_ci``, ``add_significance_stars``, ...) live in
+        # ``self.rows``/``self.headers`` and are NOT preserved by the
+        # rebuild. Detect a *known* such column by header text and warn
+        # the user so the silent column-drop doesn't mislead them.
+        # The correct chaining order is to call ``add_global_p()``
+        # *before* any column-adding modifier.
         spec = self._spec
         if spec is not None and spec.builder in ("tbl_one", "tbl_summary"):
+            post_build_headers = {"Diff", "[", "[ "}
+            header_texts = (
+                [c.text for c in self.headers[0].cells] if self.headers else []
+            )
+            has_diff_col = any(h.startswith("Diff (") for h in header_texts)
+            has_sig_col = any(h.lower() == "signif." for h in header_texts)
+            del post_build_headers
+            if has_diff_col or has_sig_col:
+                import warnings as _w
+                _w.warn(
+                    "add_global_p() reruns the table builder; any "
+                    "column added by a prior modifier (e.g. add_difference, "
+                    "add_significance_stars) will be dropped. Call "
+                    "add_global_p() BEFORE those modifiers to preserve "
+                    "their columns.",
+                    UserWarning,
+                    stacklevel=2,
+                )
             return self._with_option(
                 global_p=True,
                 global_p_adjust_for=tuple(adjust_for or ()),

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/models/extract.py RENAMED

@@ -152,11 +152,29 @@ def _extract_lifelines(model: Any, conf_level: float) -> ModelSummary:
         )
     estimates = summary["coef"].astype(float)
-    ci_lo = summary[lo_col].astype(float)
-    ci_hi = summary[hi_col].astype(float)
     pvalues = summary["p"].astype(float) if "p" in summary.columns else pd.Series(
         [float("nan")] * len(summary), index=summary.index
     )
+    # Lifelines bakes the CI level into the fit (alpha=0.05 by default),
+    # so the ``coef lower/upper X%`` columns reflect the fit-time alpha,
+    # not the user's requested ``conf_level``. To honour ``conf_level``
+    # without re-fitting the model, re-derive the CI directly from
+    # ``coef`` and ``se(coef)`` using a normal pivot. Falls back to the
+    # lifelines-provided columns only when no SE column is present.
+    se_col = _find_col(summary, ["se(coef)"])
+    if se_col is not None:
+        import numpy as _np
+        from scipy import stats as _sp_stats
+        z = float(_sp_stats.norm.ppf(0.5 + conf_level / 2))
+        se = summary[se_col].astype(float)
+        ci_lo = estimates - z * se
+        ci_hi = estimates + z * se
+        # Hide ``_np`` reference so linters don't flag it as unused.
+        del _np
+    else:
+        ci_lo = summary[lo_col].astype(float)
+        ci_hi = summary[hi_col].astype(float)
     # AFT models (Weibull / log-logistic / log-normal) carry a MultiIndex
     # ``(param, covariate)`` index — e.g. ``('lambda_', 'age')``. Renderers
     # expect string row labels; flatten with ``covariate (param)`` so the
@@ -170,9 +188,13 @@ def _extract_lifelines(model: Any, conf_level: float) -> ModelSummary:
         pvalues.index = pd.Index(flat)
     family = type(model).__name__
-    # Cox / Weibull / log-normal AFT all naturally report exp(coef) = HR.
+    # Cox returns exp(coef) as a Hazard Ratio; the AFT family (Weibull,
+    # LogNormal, LogLogistic) returns exp(coef) as a Time Ratio. Both are
+    # the natural "exponentiate me" output of the fitter, so we set
+    # natural_exp=True; the column header label is chosen downstream by
+    # ``_default_estimate_label`` in regression.py which selects "HR"
+    # for Cox and "TR" for AFT.
     natural_exp = True
-    del conf_level  # honoured by lifelines at fit time
     return ModelSummary(
         estimates=estimates,
         ci_lo=ci_lo,
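
The hunk above re-derives the interval as `coef ± z·se(coef)` with `z = norm.ppf(0.5 + conf_level / 2)`. A minimal standalone sketch of that arithmetic follows. It swaps in the standard library's `statistics.NormalDist().inv_cdf` for `scipy.stats.norm.ppf` so that it runs without third-party packages; `wald_ci` and the numbers passed to it are illustrative assumptions, not PySofra code.

```python
import math
from statistics import NormalDist


def wald_ci(coef, se, conf_level=0.95):
    """Normal-pivot CI on the coefficient scale: coef +/- z * se."""
    if not (0.0 < conf_level < 1.0):
        raise ValueError(f"conf_level must lie in (0, 1); got {conf_level!r}.")
    # Two-sided critical value, e.g. conf_level=0.95 -> upper 97.5% quantile.
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    return coef - z * se, coef + z * se


# Hypothetical coefficient and standard error from a fitted model.
coef, se = 0.40, 0.10

lo95, hi95 = wald_ci(coef, se, 0.95)
lo90, hi90 = wald_ci(coef, se, 0.90)

# Exponentiating the limits gives the interval on the ratio scale (HR or TR).
ratio_ci_90 = (math.exp(lo90), math.exp(hi90))
```

Because the critical value depends only on `conf_level`, the interval can be recomputed at any level from the stored `coef` and `se(coef)` without refitting, which is the point of the change.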

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/models/pool.py RENAMED

@@ -54,6 +54,11 @@ def pool(models: list[Any], *, conf_level: float = 0.95) -> ModelSummary:
     sklearn (sklearn has no SEs so the pool degenerates to a simple
     mean-of-coefficients).
     """
+    if not (0.0 < conf_level < 1.0):
+        raise ValueError(
+            f"conf_level must lie in the open interval (0, 1); "
+            f"got {conf_level!r}."
+        )
     if len(models) < 2:
         raise ValueError(
             "pool requires at least two imputed-dataset fits "

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/models/regression.py RENAMED

@@ -77,6 +77,11 @@ def tbl_regression(
         Source dataframe — needed only when ``design=`` references
         columns that the fitted model didn't already see.
     """
+    if not (0.0 < conf_level < 1.0):
+        raise ValueError(
+            f"conf_level must lie in the open interval (0, 1); "
+            f"got {conf_level!r}."
+        )
     models = list(model) if isinstance(model, (list, tuple)) else [model]
     if not models:
         raise ValueError("tbl_regression requires at least one model.")
@@ -347,7 +352,13 @@ def _default_estimate_label(family_label: str, exponentiate: bool) -> str:
         if "cox" in fl or "phreg" in fl:
             return "HR"
         if "weibull" in fl or "lognormal" in fl or "loglogistic" in fl:
-            return "HR"  # AFT models report exp(coef) as a time ratio; HR is colloquial
+            # AFT family: exp(coef) is a TIME RATIO (also called Acceleration
+            # Factor), not a hazard ratio. TR > 1 means LONGER survival;
+            # HR > 1 means SHORTER survival — the two parameters point in
+            # opposite directions. Mislabelling AFT as "HR" is publication-
+            # critical because a reader will draw the wrong clinical
+            # conclusion.
+            return "TR"
         if "logit" in fl or "binomial" in fl or "probit" in fl or "logistic" in fl:
             return "OR"
         if "poisson" in fl or "negativebinomial" in fl:
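
The comment in the hunk above states that TR and HR point in opposite directions. For the Weibull case this can be checked numerically. The sketch assumes the Weibull AFT parameterisation `S(t) = exp(-(t / λ)^ρ)` with `λ = exp(β·x)` (the form I understand lifelines to use), under which a time ratio `TR = exp(β)` corresponds to a hazard ratio `TR^(-ρ)`. The function name is hypothetical, and the identity does not carry over to the LogNormal or LogLogistic models, which are not proportional-hazards models.

```python
def weibull_hr_from_tr(time_ratio, rho):
    """Hazard ratio implied by a Weibull AFT time ratio, given shape rho.

    Assumes S(t) = exp(-(t / lam) ** rho) with lam = exp(beta * x), so
    multiplying lam by TR multiplies the hazard by TR ** (-rho).
    """
    if time_ratio <= 0 or rho <= 0:
        raise ValueError("time_ratio and rho must both be positive.")
    return time_ratio ** (-rho)


# TR > 1 (longer survival) maps to HR < 1 (lower hazard), and vice versa.
hr_protective = weibull_hr_from_tr(1.5, rho=1.2)
hr_harmful = weibull_hr_from_tr(0.8, rho=1.2)
```

A reader who sees `1.5` under an "HR" header would infer a 50% higher hazard, whereas the same number as a TR implies a lower hazard, which is why the relabelling matters.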

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/models/survival.py RENAMED

@@ -77,6 +77,11 @@ def tbl_survival(
             "tbl_survival requires lifelines. Install with `pip install lifelines`."
         ) from e
+    if not (0.0 < conf_level < 1.0):
+        raise ValueError(
+            f"conf_level must lie in the open interval (0, 1); "
+            f"got {conf_level!r}."
+        )
     data = to_pandas(data)
     for col in (time, event):
         if col not in data.columns:
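
The same open-interval check on `conf_level` now appears in `pool`, `tbl_regression`, and `tbl_survival`. The sketch below shows how that guard behaves at the boundaries; `check_conf_level` is a hypothetical helper written for illustration, not a function in the package.

```python
def check_conf_level(conf_level):
    """Return conf_level unchanged if it lies strictly between 0 and 1."""
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not (0.0 < conf_level < 1.0):
        raise ValueError(
            f"conf_level must lie in the open interval (0, 1); "
            f"got {conf_level!r}."
        )
    return conf_level


def is_rejected(value):
    """True if check_conf_level raises ValueError for this value."""
    try:
        check_conf_level(value)
    except ValueError:
        return True
    return False


# Endpoints, out-of-range values, and NaN are all rejected.
rejected = [is_rejected(v) for v in (0.0, 1.0, -0.1, 95, float("nan"))]
accepted = check_conf_level(0.95)
```

Rejecting `95` catches the common mistake of passing a percentage rather than a proportion.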

{pysofra-0.1.0a4 → pysofra-0.1.0a6}/src/pysofra/plot/forest.py RENAMED

@@ -101,6 +101,29 @@ def _build_forest_figure(
             "`pip install matplotlib`."
         ) from e
+    # Multi-model `tbl_regression` tables emit one estimate / CI / p
+    # column triple per model and a spanning header per model. The
+    # current forest renderer plots a single series, so for multi-model
+    # tables it can only visualise one model. We pick the first model's
+    # columns (matching what gtsummary does by default when given a
+    # multi-model object), and emit a clear ``UserWarning`` so the user
+    # knows the other models were not drawn.
+    n_models = max(1, len(table.spanning_headers))
+    if n_models > 1:
+        import warnings as _w
+        first_label = table.spanning_headers[0].label
+        other_labels = [s.label for s in table.spanning_headers[1:]]
+        _w.warn(
+            f"with_forest_plot on a multi-model regression table plots "
+            f"only the first model ({first_label!r}); the remaining "
+            f"{len(other_labels)} model(s) {other_labels!r} are not "
+            f"visualised. Render one model at a time, or use "
+            f"`with_forest_plot(...)` on each single-model table "
+            f"separately.",
+            UserWarning,
+            stacklevel=2,
+        )
     points: list[tuple[str, float, float, float]] = []
     for r in table.rows:
         label = r.cells[0].text
