PyPI - pysofra - Versions diffs - 0.1.0a4__tar.gz → 0.1.0a7__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (108)

pysofra-0.1.0a7/CHANGELOG.md ADDED

@@ -0,0 +1,157 @@
+# Changelog
+All notable changes to PySofra will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0a7] — 2026-05-26
+### Fixed
+- **`tbl_survival` validates `time` and `event` content**: negative
+  survival times raise `ValueError`; non-`0/1` event codes raise
+  `ValueError`. Previously these were passed silently to lifelines,
+  which would either clamp negative times to zero or treat any
+  nonzero event value as a death — producing a misleading curve
+  without complaint.
+- **`add_global_p()` on weighted `tbl_one`** now uses
+  ``statsmodels.GLM(..., var_weights=w)`` instead of
+  ``freq_weights=w``. For non-integer sampling weights ``freq_weights``
+  scales ``df_resid`` by ``Σw`` (treating the weight as an integer
+  count of repeats), which inflates the effective sample size and
+  produces anti-conservative p-values. ``var_weights`` keeps
+  ``df_resid = n − k`` — the appropriate SRS-weighted Wald-F
+  convention. For full design-based inference (with strata or
+  clusters) use ``ps.SurveyDesign`` end-to-end.
+### Changed
+- **`rao_scott_chisq` docstring** now honestly states a 10–15%
+  typical disagreement with R ``survey::svychisq`` on non-trivial
+  weighted designs (was: an overoptimistic "~5%"). The first-order
+  Kish-DEFF approximation is unchanged; for design-grade chi-square
+  inference call R directly.
+- **Added published-reference citations** to public statistical
+  functions: Welch / Satterthwaite, Wilcoxon (Mann-Whitney 1947),
+  Kruskal-Wallis (1952), Fisher (1922), Pearson chi-square (1900),
+  Wilson score (1927), Rao-Scott (1981/1984), Kish (1965),
+  Benjamini-Hochberg (1995), Benjamini-Yekutieli (2001), Holm
+  (1979), Hommel (1988), Šidák (1967), Binder (1983) Taylor
+  linearisation.
+- **`pool` and `cohen_d` docstrings** now have NumPy-style
+  ``Parameters`` / ``Returns`` / ``References`` sections matching
+  the other public functions.
+## [0.1.0a6] — 2026-05-26
+### Fixed
+- **`svyttest` now uses full-design Taylor linearisation** of the
+  regression coefficient `ȳ_B − ȳ_A` instead of summing per-group
+  variances in quadrature. The new formulation accounts for
+  cross-group covariance under the survey design. Pinned against
+  R `survey::svyttest`: identical t-statistic and df, p-value
+  agreement to 7 decimal places on the test fixture. The previous
+  per-group formulation could be wildly anti-conservative when
+  clusters straddled groups.
+- **`svyttest` degrees of freedom** corrected to `n_PSU − n_strata − 1`
+  (the design df minus one for the slope parameter). Previously
+  off by one.
+- **`rao_scott_chisq` normalises weights to `Σw = n` before computing
+  the chi-square statistic**, matching R `survey::svychisq`. The
+  previous formulation produced statistics that scaled linearly with
+  the absolute magnitude of the weights and disagreed with R by
+  ~10–15% on typical survey-weighted contingency tables.
+- **`tbl_one(..., weights=...)` raises on negative or all-zero
+  weights** instead of warning and silently dropping. The earlier
+  behaviour could leave `N = -1` or `N = 0` cells in the rendered
+  table.
+- **`tbl_one(...).add_p()` now emits a UserWarning** when falling
+  back to unweighted ANOVA / Kruskal–Wallis for >2-group
+  continuous variables under weights (design-adjusted multi-group
+  test is not yet implemented).
+- **`tbl_one(...).add_global_p()` warns** when the table already
+  carries a column added by a prior modifier (`add_difference`,
+  `add_significance_stars`); the rebuild path drops such columns
+  and the user should call `add_global_p()` first.
+## [0.1.0a5] — 2026-05-25
+### Fixed
+- **`svyttest` degrees of freedom** now follow the standard survey
+  convention `n_PSU − n_strata` (matching Stata `svy: ttest` and R
+  `survey::svyttest` with `nest=TRUE`), instead of `N − n_strata`. The
+  previous formula over-stated df dramatically under clustering and
+  produced anti-conservative p-values.
+- **AFT models (Weibull / LogNormal / LogLogistic) are now labelled
+  "TR" (Time Ratio)** instead of "HR". The two parameters point in
+  opposite directions (TR > 1 → longer survival; HR > 1 → shorter
+  survival), so the mislabel was potentially misleading.
+- **Lifelines regression CIs honour the user-supplied `conf_level`**.
+  Previously the CIs reflected the model's fit-time `alpha` regardless
+  of `conf_level`, so passing `conf_level=0.90` produced a "90% CI"
+  header with 95% CI numbers. The CI is now re-derived from `coef ±
+  z·se(coef)` at the requested level.
+- **SMDs on a weighted Table 1 are now weighted**. `continuous_smd` and
+  `categorical_smd` accept a `weights=` argument; `tbl_one(..., weights=)`
+  threads it through automatically. Previously the SMD column was
+  always computed on unweighted samples even on a weighted table.
+- **`add_ci`, `add_difference`, and `add_global_p` now honour weights**.
+  The Welch CI on continuous means, the Newcombe CI on proportion
+  differences, and the joint Wald-F test for `add_global_p` all use
+  weighted means / variances / proportions (with Kish's effective
+  sample size for SEs) when the table was built with `weights=`.
+### Added
+- `conf_level` range validation in `tbl_regression`, `tbl_survival`, and
+  `pool` (raises `ValueError` for values outside `(0, 1)`).
+- `with_forest_plot()` on a multi-model regression table now emits a
+  `UserWarning` that only the first model is visualised, so the
+  presence of additional models is no longer silent.
+## [0.1.0a4] — 2026-05-25
+### Added
+- Input validation for duplicate names in `variables=` (now raises
+  `ValueError` instead of silently accepting duplicates).
+- Confidence-level range check in `.add_ci()` and related modifiers
+  (must lie in `(0, 1)`).
+### Changed
+- Renamed several test files for clarity. No public API changes.
+## [0.1.0a3] — 2026-05-24
+### Changed
+- Documentation polish across README, changelog, and inline docstrings.
+  No public API or behavioural changes.
+## [0.1.0a2] — 2026-05-23
+### Fixed
+- Theme styling now survives notebook viewers that strip `<style>` blocks
+  (e.g. GitHub's notebook viewer). Critical theme properties (font, border,
+  padding) are emitted as inline `style` attributes on each table element, so
+  `jama` vs `nejm` vs `clinical` vs `minimal` stay visibly distinct everywhere.
+- README image and link URLs are now absolute so they render on PyPI.
+## [0.1.0a1] — 2026-05-20
+### Added
+- Initial alpha release.
+- Core `SofraTable` object with immutable method chaining.
+- `tbl_one()` — baseline characteristic tables (Table 1) with continuous /
+  categorical summaries, stratification, missing data summaries, overall
+  column, p-values, and standardized mean differences (SMDs).
+- `tbl_summary()` — general descriptive summary tables with grouping and
+  configurable statistics.
+- `tbl_regression()` — regression tables for `statsmodels` linear / logistic
+  / Poisson models, with confidence intervals, exponentiation, and p-values.
+- `tbl_merge()` / `tbl_stack()` — table composition.
+- HTML renderer with rich notebook `_repr_html_` output (dark-mode aware,
+  responsive, sticky headers).
+- Markdown renderer.
+- DOCX renderer via `python-docx` (publication-quality Word tables with
+  captions, footnotes, merged spanning headers).
+- Themes: `clinical`, `compact`, `jama`, `nejm`, `minimal`.
+- Automatic statistical test selection with override hooks.
+- Snapshot tests for HTML output.
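
Several of the weighted-table entries above rely on two small weight operations: rescaling sampling weights so that Σw = n (the 0.1.0a6 `rao_scott_chisq` fix) and Kish's effective sample size n_eff = (Σw)² / Σw² (used for the SEs in the 0.1.0a5 `add_ci` / `add_difference` / `add_global_p` fix). The sketch below shows both in plain Python. It is an illustration under stated assumptions, not PySofra's implementation: `normalise_weights` and `kish_effective_n` are hypothetical names, and raising on negative or all-zero weights simply mirrors the 0.1.0a6 `tbl_one` behaviour described above.

```python
import math


def normalise_weights(weights: list[float]) -> list[float]:
    """Rescale sampling weights so that they sum to n = len(weights)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = math.fsum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    n = len(weights)
    return [w * n / total for w in weights]


def kish_effective_n(weights: list[float]) -> float:
    """Kish (1965) effective sample size: (sum of w)^2 / (sum of w^2)."""
    return math.fsum(weights) ** 2 / math.fsum(w * w for w in weights)


raw = [10.0, 20.0, 30.0, 40.0]
norm = normalise_weights(raw)  # approximately [0.4, 0.8, 1.2, 1.6]
print(math.fsum(norm))  # sums to n = 4, up to floating-point rounding
# n_eff depends only on the relative sizes of the weights, so the
# rescaling leaves it unchanged (about 3.33 in both cases here).
print(kish_effective_n(raw), kish_effective_n(norm))
```

Because n_eff is scale-invariant, normalising cannot change SEs built on it; the normalisation matters only for statistics that grow with the absolute magnitude of the weights, which is the failure mode the 0.1.0a6 entry describes.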

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pysofra
-Version: 0.1.0a4
+Version: 0.1.0a7
 Summary: Statistical reporting and table preparation framework for Python — the missing reporting layer.
 Project-URL: Homepage, https://github.com/jturner-uofl/pysofra
 Project-URL: Documentation, https://github.com/jturner-uofl/pysofra
@@ -70,12 +70,12 @@ Description-Content-Type: text/markdown
 ### The missing statistical reporting layer for Python
-[![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
+[![Coverage](https://img.shields.io/badge/coverage-%E2%89%A599%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
 [![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/downloads/)
 [![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](https://github.com/jturner-uofl/pysofra/blob/main/LICENSE)
 [![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](http://mypy-lang.org/)
-[![Tests: 886](https://img.shields.io/badge/tests-886%20passing-brightgreen.svg)](#status)
+[![Tests: 906](https://img.shields.io/badge/tests-906%20passing-brightgreen.svg)](#status)
 </div>
@@ -111,7 +111,7 @@ Description-Content-Type: text/markdown
 - **One immutable object, seven output formats** — build a `SofraTable` once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
 - **Auto-dispatched statistical tests** — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted *t* — picked by variable kind, overridable per-row
 - **Inline forest plots and KM curves** — embed matplotlib figures directly into the table; the same `SofraTable` renders them across every backend
-- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` at machine precision, with cross-checks against R's `gtsummary`
+- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` reference implementations at machine precision
 - **Method-chainable and immutable** — every modifier returns a new table; no in-place mutation, no global state, fully reproducible
 <div align="center">
@@ -255,13 +255,13 @@ pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypot
 ## Status
-PySofra is in **alpha** (`0.1.0a4`). The public API surface is pinned
+PySofra is in **alpha** (`0.1.0a7`). The public API surface is pinned
 by an explicit
 [API-stability test](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_api_stability.py)
 so that any unintended rename, removal, or signature change surfaces as
 a failed test. Quality bar at this release:
-* **More than 800 tests passing**, **100% line coverage**, mypy strict, ruff clean.
+* **900+ tests passing**, near-100% line coverage, mypy strict, ruff clean.
 * Every numeric output is validated against `scipy`, `lifelines`,
   `statsmodels`, or a hand-computed textbook formula
   ([test_statistical_correctness.py](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_statistical_correctness.py)).

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/README.md RENAMED

@@ -4,12 +4,12 @@
 ### The missing statistical reporting layer for Python
-[![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
+[![Coverage](https://img.shields.io/badge/coverage-%E2%89%A599%25-brightgreen.svg)](https://github.com/jturner-uofl/pysofra)
 [![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue.svg)](https://www.python.org/downloads/)
 [![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](https://github.com/jturner-uofl/pysofra/blob/main/LICENSE)
 [![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](http://mypy-lang.org/)
-[![Tests: 886](https://img.shields.io/badge/tests-886%20passing-brightgreen.svg)](#status)
+[![Tests: 906](https://img.shields.io/badge/tests-906%20passing-brightgreen.svg)](#status)
 </div>
@@ -45,7 +45,7 @@
 - **One immutable object, seven output formats** — build a `SofraTable` once, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes
 - **Auto-dispatched statistical tests** — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted *t* — picked by variable kind, overridable per-row
 - **Inline forest plots and KM curves** — embed matplotlib figures directly into the table; the same `SofraTable` renders them across every backend
-- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` at machine precision, with cross-checks against R's `gtsummary`
+- **Statistically correct** — every numeric output validated against `scipy` / `statsmodels` / `lifelines` reference implementations at machine precision
 - **Method-chainable and immutable** — every modifier returns a new table; no in-place mutation, no global state, fully reproducible
 <div align="center">
@@ -189,13 +189,13 @@ pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypot
 ## Status
-PySofra is in **alpha** (`0.1.0a4`). The public API surface is pinned
+PySofra is in **alpha** (`0.1.0a7`). The public API surface is pinned
 by an explicit
 [API-stability test](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_api_stability.py)
 so that any unintended rename, removal, or signature change surfaces as
 a failed test. Quality bar at this release:
-* **More than 800 tests passing**, **100% line coverage**, mypy strict, ruff clean.
+* **900+ tests passing**, near-100% line coverage, mypy strict, ruff clean.
 * Every numeric output is validated against `scipy`, `lifelines`,
   `statsmodels`, or a hand-computed textbook formula
   ([test_statistical_correctness.py](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_statistical_correctness.py)).

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "pysofra"
-version = "0.1.0a4"
+version = "0.1.0a7"
 description = "Statistical reporting and table preparation framework for Python — the missing reporting layer."
 readme = "README.md"
 license = { text = "GPL-3.0-or-later" }

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/__init__.py RENAMED

@@ -50,7 +50,7 @@ from .summary.tbl_summary import tbl_summary
 from .summary.tests import available_tests
 from .themes.registry import available_themes, register_theme
-__version__ = "0.1.0a4"
+__version__ = "0.1.0a7"
 __all__ = [
     "CellPart",

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/core/table.py RENAMED

@@ -262,6 +262,23 @@ class SofraTable:
         ``fdr_bh`` (Benjamini–Hochberg, default), ``fdr_by``,
         ``bonferroni``, ``holm``, ``hommel``, ``sidak``. Implicitly
         enables p-values when not already on.
+        References
+        ----------
+        Benjamini, Y., & Hochberg, Y. (1995). Controlling the false
+          discovery rate: a practical and powerful approach to multiple
+          testing. *J. R. Stat. Soc. B*, 57(1), 289–300. (``fdr_bh``)
+        Benjamini, Y., & Yekutieli, D. (2001). The control of the
+          false discovery rate in multiple testing under dependency.
+          *Ann. Stat.*, 29(4), 1165–1188. (``fdr_by``)
+        Holm, S. (1979). A simple sequentially rejective multiple test
+          procedure. *Scand. J. Stat.*, 6(2), 65–70. (``holm``)
+        Hommel, G. (1988). A stagewise rejective multiple test
+          procedure based on a modified Bonferroni test. *Biometrika*,
+          75(2), 383–386. (``hommel``)
+        Šidák, Z. (1967). Rectangular confidence regions for the
+          means of multivariate normal distributions. *J. Am. Stat.
+          Assoc.*, 62(318), 626–633. (``sidak``)
         """
         return self._with_option(p_value=True, q_value=True, q_method=method)
@@ -355,8 +372,34 @@ class SofraTable:
             return add_global_p(self)
         # tbl_one / tbl_summary path: route through the rebuild spec.
+        # The rebuild reconstructs the table from spec.options only;
+        # columns added by post-build modifiers (``add_difference``,
+        # ``add_ci``, ``add_significance_stars``, ...) live in
+        # ``self.rows``/``self.headers`` and are NOT preserved by the
+        # rebuild. Detect a *known* such column by header text and warn
+        # the user so the silent column-drop doesn't mislead them.
+        # The correct chaining order is to call ``add_global_p()``
+        # *before* any column-adding modifier.
         spec = self._spec
         if spec is not None and spec.builder in ("tbl_one", "tbl_summary"):
+            post_build_headers = {"Diff", "[", "[ "}
+            header_texts = (
+                [c.text for c in self.headers[0].cells] if self.headers else []
+            )
+            has_diff_col = any(h.startswith("Diff (") for h in header_texts)
+            has_sig_col = any(h.lower() == "signif." for h in header_texts)
+            del post_build_headers
+            if has_diff_col or has_sig_col:
+                import warnings as _w
+                _w.warn(
+                    "add_global_p() reruns the table builder; any "
+                    "column added by a prior modifier (e.g. add_difference, "
+                    "add_significance_stars) will be dropped. Call "
+                    "add_global_p() BEFORE those modifiers to preserve "
+                    "their columns.",
+                    UserWarning,
+                    stacklevel=2,
+                )
             return self._with_option(
                 global_p=True,
                 global_p_adjust_for=tuple(adjust_for or ()),
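
The `fdr_bh` default listed in the docstring above is the Benjamini-Hochberg step-up procedure. As a rough illustration of what that option computes (a sketch only, not PySofra's code; `bh_adjust` is a hypothetical name), the adjusted p-values can be written in plain Python:

```python
def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg (1995) step-up adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay
    # monotone in the raw p-values and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
# roughly [0.04, 0.0533, 0.0533, 0.2]
```

If statsmodels is installed, `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")` should return the same adjusted values. The method names in the docstring follow statsmodels' naming, although this diff does not show whether PySofra delegates to it.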

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/models/extract.py RENAMED

@@ -152,11 +152,29 @@ def _extract_lifelines(model: Any, conf_level: float) -> ModelSummary:
         )
     estimates = summary["coef"].astype(float)
-    ci_lo = summary[lo_col].astype(float)
-    ci_hi = summary[hi_col].astype(float)
     pvalues = summary["p"].astype(float) if "p" in summary.columns else pd.Series(
         [float("nan")] * len(summary), index=summary.index
     )
+    # Lifelines bakes the CI level into the fit (alpha=0.05 by default),
+    # so the ``coef lower/upper X%`` columns reflect the fit-time alpha,
+    # not the user's requested ``conf_level``. To honour ``conf_level``
+    # without re-fitting the model, re-derive the CI directly from
+    # ``coef`` and ``se(coef)`` using a normal pivot. Falls back to the
+    # lifelines-provided columns only when no SE column is present.
+    se_col = _find_col(summary, ["se(coef)"])
+    if se_col is not None:
+        import numpy as _np
+        from scipy import stats as _sp_stats
+        z = float(_sp_stats.norm.ppf(0.5 + conf_level / 2))
+        se = summary[se_col].astype(float)
+        ci_lo = estimates - z * se
+        ci_hi = estimates + z * se
+        # Hide ``_np`` reference so linters don't flag it as unused.
+        del _np
+    else:
+        ci_lo = summary[lo_col].astype(float)
+        ci_hi = summary[hi_col].astype(float)
     # AFT models (Weibull / log-logistic / log-normal) carry a MultiIndex
     # ``(param, covariate)`` index — e.g. ``('lambda_', 'age')``. Renderers
     # expect string row labels; flatten with ``covariate (param)`` so the
@@ -170,9 +188,13 @@ def _extract_lifelines(model: Any, conf_level: float) -> ModelSummary:
         pvalues.index = pd.Index(flat)
     family = type(model).__name__
-    # Cox / Weibull / log-normal AFT all naturally report exp(coef) = HR.
+    # Cox returns exp(coef) as a Hazard Ratio; the AFT family (Weibull,
+    # LogNormal, LogLogistic) returns exp(coef) as a Time Ratio. Both are
+    # the natural "exponentiate me" output of the fitter, so we set
+    # natural_exp=True; the column header label is chosen downstream by
+    # ``_default_estimate_label`` in regression.py which selects "HR"
+    # for Cox and "TR" for AFT.
     natural_exp = True
-    del conf_level  # honoured by lifelines at fit time
     return ModelSummary(
         estimates=estimates,
         ci_lo=ci_lo,
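
The new `_extract_lifelines` branch rebuilds the interval as `coef ± z·se(coef)` with `z = Φ⁻¹(0.5 + conf_level/2)`, so the requested level no longer depends on the fitter's `alpha`. A standalone sketch of the same normal pivot, using only the standard library (`statistics.NormalDist.inv_cdf` stands in for the `scipy.stats.norm.ppf` call in the diff; `wald_ci` is a hypothetical helper, not part of PySofra):

```python
import math
from statistics import NormalDist


def wald_ci(coef: float, se: float, conf_level: float = 0.95,
            exponentiate: bool = True) -> tuple[float, float]:
    """Normal-pivot CI coef +/- z * se at the requested level."""
    if not 0.0 < conf_level < 1.0:
        raise ValueError("conf_level must lie in the open interval (0, 1)")
    # Two-sided critical value: about 1.960 for 95%, 1.645 for 90%.
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    lo, hi = coef - z * se, coef + z * se
    # Build the interval on the coefficient (log) scale first, then
    # exponentiate the endpoints to the HR / TR scale.
    if exponentiate:
        return math.exp(lo), math.exp(hi)
    return lo, hi


coef, se = math.log(1.5), 0.2
print(wald_ci(coef, se, 0.95))
print(wald_ci(coef, se, 0.90))  # narrower than the 95% interval
```

Exponentiating after the pivot keeps the interval symmetric on the log scale, which is why the exponentiated endpoints are asymmetric around the point estimate.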

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/models/pool.py RENAMED

@@ -45,15 +45,54 @@ from .extract import ModelSummary, extract
 def pool(models: list[Any], *, conf_level: float = 0.95) -> ModelSummary:
     """Pool a list of fitted models via Rubin's rules.
-    Returns a :class:`ModelSummary` whose estimates / CIs / p-values
-    reflect the across-imputation combination. Pass the result directly
-    into :func:`pysofra.tbl_regression`.
-    Each input must be a fitted model recognised by
-    :func:`pysofra.models.extract.extract` — statsmodels, lifelines,
-    sklearn (sklearn has no SEs so the pool degenerates to a simple
-    mean-of-coefficients).
+    Parameters
+    ----------
+    models
+        A list of two or more fitted models, each fit on a separate
+        imputed dataset. Every model must be one of the families
+        recognised by :func:`pysofra.models.extract.extract` —
+        statsmodels (Logit, OLS, GLM, Poisson), lifelines
+        (CoxPHFitter, AFT family), or scikit-learn linear models.
+        All models in the list must share the same coefficient names.
+    conf_level
+        Confidence level for the pooled CIs, in the open interval
+        ``(0, 1)``. Default 0.95.
+    Returns
+    -------
+    ModelSummary
+        A summary whose ``estimates``, ``ci_lo``, ``ci_hi`` and
+        ``pvalues`` reflect Rubin's-rule pooling across the
+        imputed-dataset fits. Pass this directly into
+        :func:`pysofra.tbl_regression` to render a pooled regression
+        table.
+    Notes
+    -----
+    The pooled point estimate is the across-imputation mean of the
+    per-imputation estimates. The total variance ``T = Ū + (1 + 1/m)·B``
+    combines the average within-imputation variance ``Ū`` and the
+    between-imputation variance ``B`` (with the small-sample
+    correction ``1 + 1/m``). Confidence intervals use a *t*
+    distribution with Rubin's original degrees-of-freedom
+    ``df = (m − 1)·(1 + Ū / ((1 + 1/m)·B))²``. The newer
+    Barnard–Rubin (1999) df refinement is not yet implemented; for
+    very small per-imputation df it slightly narrows the CI relative
+    to ``mice::pool``.
+    References
+    ----------
+    Rubin, D. B. (1987). *Multiple Imputation for Nonresponse in
+      Surveys.* Wiley.
+    Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of
+      freedom with multiple imputation. *Biometrika*, 86(4),
+      948–955.
     """
+    if not (0.0 < conf_level < 1.0):
+        raise ValueError(
+            f"conf_level must lie in the open interval (0, 1); "
+            f"got {conf_level!r}."
+        )
     if len(models) < 2:
         raise ValueError(
             "pool requires at least two imputed-dataset fits "

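The Notes section of the new `pool` docstring states Rubin's rules in prose. A minimal single-coefficient sketch of those same formulas follows; `rubin_pool` is a hypothetical name and this is not PySofra's code. Returning an infinite df when the between-imputation variance B is zero (where the df formula divides by zero) is an assumption of this sketch.

```python
import math
from statistics import fmean, variance


def rubin_pool(estimates: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Pool one coefficient across m imputations with Rubin's (1987) rules.

    Returns (pooled estimate, pooled SE, Rubin's original df).
    """
    m = len(estimates)
    if m < 2 or len(ses) != m:
        raise ValueError("need m >= 2 estimates and one SE per estimate")
    q_bar = fmean(estimates)              # pooled point estimate
    u_bar = fmean([s * s for s in ses])   # mean within-imputation variance
    b = variance(estimates)               # between-imputation variance
    t = u_bar + (1 + 1 / m) * b           # total variance T
    if b == 0:
        df = math.inf                     # no spread across imputations
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(t), df


est, se, df = rubin_pool([0.50, 0.55, 0.45], [0.10, 0.12, 0.11])
print(est, se, df)  # about 0.5, 0.1245, 43.245
```

A pooled interval would then be `q̄ ± t_df · sqrt(T)`; the *t* quantile is left out because the standard library has no *t* distribution.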
{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/models/regression.py RENAMED

@@ -77,6 +77,11 @@ def tbl_regression(
         Source dataframe — needed only when ``design=`` references
         columns that the fitted model didn't already see.
     """
+    if not (0.0 < conf_level < 1.0):
+        raise ValueError(
+            f"conf_level must lie in the open interval (0, 1); "
+            f"got {conf_level!r}."
+        )
     models = list(model) if isinstance(model, (list, tuple)) else [model]
     if not models:
         raise ValueError("tbl_regression requires at least one model.")
@@ -347,7 +352,13 @@ def _default_estimate_label(family_label: str, exponentiate: bool) -> str:
         if "cox" in fl or "phreg" in fl:
             return "HR"
         if "weibull" in fl or "lognormal" in fl or "loglogistic" in fl:
-            return "HR"  # AFT models report exp(coef) as a time ratio; HR is colloquial
+            # AFT family: exp(coef) is a TIME RATIO (also called Acceleration
+            # Factor), not a hazard ratio. TR > 1 means LONGER survival;
+            # HR > 1 means SHORTER survival — the two parameters point in
+            # opposite directions. Mislabelling AFT as "HR" is publication-
+            # critical because a reader will draw the wrong clinical
+            # conclusion.
+            return "TR"
         if "logit" in fl or "binomial" in fl or "probit" in fl or "logistic" in fl:
             return "OR"
         if "poisson" in fl or "negativebinomial" in fl:
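
The direction flip described in the new comment can be made concrete for the Weibull model, the one AFT family that is also a proportional-hazards model. The sketch assumes a lifelines-style parameterisation, S(t) = exp(−(t/λ)^ρ) with log λ linear in the covariates. Under that assumption a coefficient β gives TR = exp(β) and HR = exp(−ρβ) = TR^(−ρ). `weibull_hr_from_tr` is a hypothetical helper for illustration only.

```python
import math


def weibull_hr_from_tr(time_ratio: float, rho: float) -> float:
    """HR implied by a Weibull AFT time ratio (shape rho): HR = TR ** -rho."""
    return time_ratio ** (-rho)


beta = math.log(1.5)  # AFT coefficient, so TR = exp(beta) = 1.5
tr = math.exp(beta)
hr = weibull_hr_from_tr(tr, rho=2.0)
print(f"TR = {tr:.3f}, HR = {hr:.3f}")  # TR = 1.500, HR = 0.444
```

In this example the exposure stretches every survival-time quantile by 50% and cuts the hazard to about 0.44 of baseline. A reader shown "HR = 1.5" would draw the opposite conclusion, which is the mislabel the 0.1.0a5 entry removes.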

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/models/survival.py RENAMED

@@ -77,10 +77,38 @@ def tbl_survival(
             "tbl_survival requires lifelines. Install with `pip install lifelines`."
         ) from e
+    if not (0.0 < conf_level < 1.0):
+        raise ValueError(
+            f"conf_level must lie in the open interval (0, 1); "
+            f"got {conf_level!r}."
+        )
     data = to_pandas(data)
     for col in (time, event):
         if col not in data.columns:
             raise KeyError(f"column {col!r} not in data")
+    # Validate time + event content. ``lifelines`` will silently treat
+    # negative survival times as zero and any nonzero event value as a
+    # death, so input mistakes (e.g. a "censor at last follow-up" column
+    # encoded as 0/1/9, or a follow-up time accidentally negated) can
+    # produce a misleading survival curve without complaint. Fail loud
+    # at the boundary instead.
+    time_num = pd.to_numeric(data[time], errors="coerce")
+    if (time_num < 0).any():
+        n_bad = int((time_num < 0).sum())
+        raise ValueError(
+            f"column {time!r} contains {n_bad} negative value(s); "
+            "survival times must be non-negative."
+        )
+    event_num = pd.to_numeric(data[event], errors="coerce").dropna()
+    bad_events = ~event_num.isin([0, 1])
+    if bool(bad_events.any()):
+        bad_vals = sorted(event_num[bad_events].unique().tolist())
+        raise ValueError(
+            f"column {event!r} must contain only 0/1 (or boolean) "
+            f"values; got unexpected values: {bad_vals!r}."
+        )
     if by is not None and by not in data.columns:
         raise KeyError(f"by column {by!r} not in data")

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/plot/forest.py RENAMED

@@ -101,6 +101,29 @@ def _build_forest_figure(
             "`pip install matplotlib`."
         ) from e
+    # Multi-model `tbl_regression` tables emit one estimate / CI / p
+    # column triple per model and a spanning header per model. The
+    # current forest renderer plots a single series, so for multi-model
+    # tables it can only visualise one model. We pick the first model's
+    # columns (matching what gtsummary does by default when given a
+    # multi-model object), and emit a clear ``UserWarning`` so the user
+    # knows the other models were not drawn.
+    n_models = max(1, len(table.spanning_headers))
+    if n_models > 1:
+        import warnings as _w
+        first_label = table.spanning_headers[0].label
+        other_labels = [s.label for s in table.spanning_headers[1:]]
+        _w.warn(
+            f"with_forest_plot on a multi-model regression table plots "
+            f"only the first model ({first_label!r}); the remaining "
+            f"{len(other_labels)} model(s) {other_labels!r} are not "
+            f"visualised. Render one model at a time, or use "
+            f"`with_forest_plot(...)` on each single-model table "
+            f"separately.",
+            UserWarning,
+            stacklevel=2,
+        )
     points: list[tuple[str, float, float, float]] = []
     for r in table.rows:
         label = r.cells[0].text

{pysofra-0.1.0a4 → pysofra-0.1.0a7}/src/pysofra/summary/effect_size.py RENAMED

@@ -24,8 +24,29 @@ import pandas as pd
 def cohen_d(a: pd.Series | np.ndarray, b: pd.Series | np.ndarray) -> float | None:
     """Cohen's d using the pooled standard deviation.
-    ``d = (μ₁ − μ₂) / s_pool``, where the pooled SD weights the two
-    samples by their degrees of freedom.
+    Parameters
+    ----------
+    a, b
+        Two independent samples (``pandas.Series`` or 1-D ``numpy``
+        array). Non-numeric entries are coerced; ``NaN`` rows are
+        dropped per array. Each sample must contain at least two
+        finite values.
+    Returns
+    -------
+    float or None
+        ``d = (μ_a − μ_b) / s_pool``, where the pooled SD weights the
+        two samples by their degrees of freedom:
+        ``s_pool = sqrt(((n_a − 1)·s_a² + (n_b − 1)·s_b²) / (n_a + n_b − 2))``.
+        Returns ``None`` if either sample has fewer than 2 finite
+        observations. Returns ``0.0`` if the pooled SD is zero and
+        the two means are identical; ``inf`` if the pooled SD is zero
+        but the means differ (degenerate constant-sample case).
+    References
+    ----------
+    Cohen, J. (1988). *Statistical Power Analysis for the Behavioral
+      Sciences* (2nd ed.). Lawrence Erlbaum.
     """
     a_arr = pd.to_numeric(pd.Series(a), errors="coerce").dropna().to_numpy(dtype=float)
     b_arr = pd.to_numeric(pd.Series(b), errors="coerce").dropna().to_numpy(dtype=float)
