PyPI - pysofra - Versions diffs - 0.1.0a6__tar.gz → 0.1.0a7__tar.gz - Mend

pysofra 0.1.0a6 tar.gz → 0.1.0a7 tar.gz

Files changed (106)

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/CHANGELOG.md RENAMED

@@ -5,6 +5,42 @@ All notable changes to PySofra will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0a7] — 2026-05-26
+### Fixed
+- **`tbl_survival` validates `time` and `event` content**: negative
+  survival times raise `ValueError`; non-`0/1` event codes raise
+  `ValueError`. Previously these were passed silently to lifelines,
+  which would either clamp negative times to zero or treat any
+  nonzero event value as a death — producing a misleading curve
+  without complaint.
+- **`add_global_p()` on weighted `tbl_one`** now uses
+  ``statsmodels.GLM(..., var_weights=w)`` instead of
+  ``freq_weights=w``. For non-integer sampling weights ``freq_weights``
+  scales ``df_resid`` by ``Σw`` (treating the weight as an integer
+  count of repeats), which inflates the effective sample size and
+  produces anti-conservative p-values. ``var_weights`` keeps
+  ``df_resid = n − k`` — the appropriate SRS-weighted Wald-F
+  convention. For full design-based inference (with strata or
+  clusters) use ``ps.SurveyDesign`` end-to-end.
+### Changed
+- **`rao_scott_chisq` docstring** now honestly states a 10–15%
+  typical disagreement with R ``survey::svychisq`` on non-trivial
+  weighted designs (was: an overoptimistic "~5%"). The first-order
+  Kish-DEFF approximation is unchanged; for design-grade chi-square
+  inference call R directly.
+- **Added published-reference citations** to public statistical
+  functions: Welch / Satterthwaite, Wilcoxon (Mann-Whitney 1947),
+  Kruskal-Wallis (1952), Fisher (1922), Pearson chi-square (1900),
+  Wilson score (1927), Rao-Scott (1981/1984), Kish (1965),
+  Benjamini-Hochberg (1995), Benjamini-Yekutieli (2001), Holm
+  (1979), Hommel (1988), Šidák (1967), Binder (1983) Taylor
+  linearisation.
+- **`pool` and `cohen_d` docstrings** now have NumPy-style
+  ``Parameters`` / ``Returns`` / ``References`` sections matching
+  the other public functions.
 ## [0.1.0a6] — 2026-05-26
 ### Fixed

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pysofra
-Version: 0.1.0a6
+Version: 0.1.0a7
 Summary: Statistical reporting and table preparation framework for Python — the missing reporting layer.
 Project-URL: Homepage, https://github.com/jturner-uofl/pysofra
 Project-URL: Documentation, https://github.com/jturner-uofl/pysofra
@@ -75,7 +75,7 @@ Description-Content-Type: text/markdown
 [![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](https://github.com/jturner-uofl/pysofra/blob/main/LICENSE)
 [![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](http://mypy-lang.org/)
-[![Tests: 903](https://img.shields.io/badge/tests-903%20passing-brightgreen.svg)](#status)
+[![Tests: 906](https://img.shields.io/badge/tests-906%20passing-brightgreen.svg)](#status)
 </div>
@@ -255,7 +255,7 @@ pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypot
 ## Status
-PySofra is in **alpha** (`0.1.0a6`). The public API surface is pinned
+PySofra is in **alpha** (`0.1.0a7`). The public API surface is pinned
 by an explicit
 [API-stability test](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_api_stability.py)
 so that any unintended rename, removal, or signature change surfaces as

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/README.md RENAMED

@@ -9,7 +9,7 @@
 [![License: GPL-3.0+](https://img.shields.io/badge/license-GPL--3.0--or--later-blue.svg)](https://github.com/jturner-uofl/pysofra/blob/main/LICENSE)
 [![Style: ruff](https://img.shields.io/badge/style-ruff-purple.svg)](https://github.com/astral-sh/ruff)
 [![Types: mypy strict](https://img.shields.io/badge/types-mypy%20strict-blue.svg)](http://mypy-lang.org/)
-[![Tests: 903](https://img.shields.io/badge/tests-903%20passing-brightgreen.svg)](#status)
+[![Tests: 906](https://img.shields.io/badge/tests-906%20passing-brightgreen.svg)](#status)
 </div>
@@ -189,7 +189,7 @@ pip install "pysofra[dev]"        # testing + linting (pytest, ruff, mypy, hypot
 ## Status
-PySofra is in **alpha** (`0.1.0a6`). The public API surface is pinned
+PySofra is in **alpha** (`0.1.0a7`). The public API surface is pinned
 by an explicit
 [API-stability test](https://github.com/jturner-uofl/pysofra/blob/main/tests/test_api_stability.py)
 so that any unintended rename, removal, or signature change surfaces as

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "pysofra"
-version = "0.1.0a6"
+version = "0.1.0a7"
 description = "Statistical reporting and table preparation framework for Python — the missing reporting layer."
 readme = "README.md"
 license = { text = "GPL-3.0-or-later" }

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/__init__.py RENAMED

@@ -50,7 +50,7 @@ from .summary.tbl_summary import tbl_summary
 from .summary.tests import available_tests
 from .themes.registry import available_themes, register_theme
-__version__ = "0.1.0a6"
+__version__ = "0.1.0a7"
 __all__ = [
     "CellPart",

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/core/table.py RENAMED

@@ -262,6 +262,23 @@ class SofraTable:
         ``fdr_bh`` (Benjamini–Hochberg, default), ``fdr_by``,
         ``bonferroni``, ``holm``, ``hommel``, ``sidak``. Implicitly
         enables p-values when not already on.
+        References
+        ----------
+        Benjamini, Y., & Hochberg, Y. (1995). Controlling the false
+          discovery rate: a practical and powerful approach to multiple
+          testing. *J. R. Stat. Soc. B*, 57(1), 289–300. (``fdr_bh``)
+        Benjamini, Y., & Yekutieli, D. (2001). The control of the
+          false discovery rate in multiple testing under dependency.
+          *Ann. Stat.*, 29(4), 1165–1188. (``fdr_by``)
+        Holm, S. (1979). A simple sequentially rejective multiple test
+          procedure. *Scand. J. Stat.*, 6(2), 65–70. (``holm``)
+        Hommel, G. (1988). A stagewise rejective multiple test
+          procedure based on a modified Bonferroni test. *Biometrika*,
+          75(2), 383–386. (``hommel``)
+        Šidák, Z. (1967). Rectangular confidence regions for the
+          means of multivariate normal distributions. *J. Am. Stat.
+          Assoc.*, 62(318), 626–633. (``sidak``)
         """
         return self._with_option(p_value=True, q_value=True, q_method=method)
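
The `method` options cited in this docstring are standard p-value adjustments. As a minimal, dependency-free sketch (not PySofra's implementation; the helper names `bh_adjust` and `holm_adjust` are made up for illustration), the Benjamini-Hochberg and Holm procedures can be written as:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values (q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Walk from the smallest p-value up; the multiplier shrinks each step.
    for rank, idx in enumerate(order, start=1):
        candidate = min(1.0, (m - rank + 1) * pvalues[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

For real analyses, `statsmodels.stats.multitest.multipletests` accepts the same method names listed above (`fdr_bh`, `fdr_by`, `bonferroni`, `holm`, `hommel`, `sidak`); whether PySofra delegates to it is not visible in this hunk.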

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/models/pool.py RENAMED

@@ -45,14 +45,48 @@ from .extract import ModelSummary, extract
 def pool(models: list[Any], *, conf_level: float = 0.95) -> ModelSummary:
     """Pool a list of fitted models via Rubin's rules.
-    Returns a :class:`ModelSummary` whose estimates / CIs / p-values
-    reflect the across-imputation combination. Pass the result directly
-    into :func:`pysofra.tbl_regression`.
-    Each input must be a fitted model recognised by
-    :func:`pysofra.models.extract.extract` — statsmodels, lifelines,
-    sklearn (sklearn has no SEs so the pool degenerates to a simple
-    mean-of-coefficients).
+    Parameters
+    ----------
+    models
+        A list of two or more fitted models, each fit on a separate
+        imputed dataset. Every model must be one of the families
+        recognised by :func:`pysofra.models.extract.extract` —
+        statsmodels (Logit, OLS, GLM, Poisson), lifelines
+        (CoxPHFitter, AFT family), or scikit-learn linear models.
+        All models in the list must share the same coefficient names.
+    conf_level
+        Confidence level for the pooled CIs, in the open interval
+        ``(0, 1)``. Default 0.95.
+    Returns
+    -------
+    ModelSummary
+        A summary whose ``estimates``, ``ci_lo``, ``ci_hi`` and
+        ``pvalues`` reflect Rubin's-rule pooling across the
+        imputed-dataset fits. Pass this directly into
+        :func:`pysofra.tbl_regression` to render a pooled regression
+        table.
+    Notes
+    -----
+    The pooled point estimate is the across-imputation mean of the
+    per-imputation estimates. The total variance ``T = Ū + (1 + 1/m)·B``
+    combines the average within-imputation variance ``Ū`` and the
+    between-imputation variance ``B`` (with the small-sample
+    correction ``1 + 1/m``). Confidence intervals use a *t*
+    distribution with Rubin's original degrees-of-freedom
+    ``df = (m − 1)·(1 + Ū / ((1 + 1/m)·B))²``. The newer
+    Barnard–Rubin (1999) df refinement is not yet implemented; for
+    very small per-imputation df it slightly narrows the CI relative
+    to ``mice::pool``.
+    References
+    ----------
+    Rubin, D. B. (1987). *Multiple Imputation for Nonresponse in
+      Surveys.* Wiley.
+    Barnard, J., & Rubin, D. B. (1999). Small-sample degrees of
+      freedom with multiple imputation. *Biometrika*, 86(4),
+      948–955.
     """
     if not (0.0 < conf_level < 1.0):
         raise ValueError(
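
The formulas in the `Notes` section can be checked by hand for a single coefficient. The sketch below is an illustration only, not the body of `pool`: the function name `rubin_pool` and its list-based inputs are assumptions, and it stops at the degrees of freedom because the *t* quantile needed for the CI is not in the standard library.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputations via Rubin's rules.

    estimates: per-imputation point estimates
    variances: per-imputation squared standard errors
    Returns (pooled estimate, total variance T, Rubin's original df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need two or more imputations with matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    if b == 0:
        # No between-imputation spread: the df formula diverges.
        return q_bar, total, math.inf
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df
```

With estimates `[1.0, 2.0, 3.0]` and a common within-imputation variance of `0.5`, this gives a pooled estimate of `2.0`, `T = 0.5 + (4/3)·1.0`, and `df = 3.78125`.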

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/models/survival.py RENAMED

@@ -86,6 +86,29 @@ def tbl_survival(
     for col in (time, event):
         if col not in data.columns:
             raise KeyError(f"column {col!r} not in data")
+    # Validate time + event content. ``lifelines`` will silently treat
+    # negative survival times as zero and any nonzero event value as a
+    # death, so input mistakes (e.g. a "censor at last follow-up" column
+    # encoded as 0/1/9, or a follow-up time accidentally negated) can
+    # produce a misleading survival curve without complaint. Fail loud
+    # at the boundary instead.
+    time_num = pd.to_numeric(data[time], errors="coerce")
+    if (time_num < 0).any():
+        n_bad = int((time_num < 0).sum())
+        raise ValueError(
+            f"column {time!r} contains {n_bad} negative value(s); "
+            "survival times must be non-negative."
+        )
+    event_num = pd.to_numeric(data[event], errors="coerce").dropna()
+    bad_events = ~event_num.isin([0, 1])
+    if bool(bad_events.any()):
+        bad_vals = sorted(event_num[bad_events].unique().tolist())
+        raise ValueError(
+            f"column {event!r} must contain only 0/1 (or boolean) "
+            f"values; got unexpected values: {bad_vals!r}."
+        )
     if by is not None and by not in data.columns:
         raise KeyError(f"by column {by!r} not in data")
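
The two checks added above can be mirrored without pandas to see what they accept and reject. This is a rough, standard-library sketch under stated assumptions: `check_survival_inputs` and `_is_missing` are hypothetical names, and inputs are assumed to be already numeric (the real code coerces with `pd.to_numeric(..., errors="coerce")`, so non-numeric entries become `NaN` and are skipped by these checks).

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing, as a coerced column would."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_survival_inputs(times, events):
    """Raise ValueError on negative times or event codes other than 0/1."""
    n_bad = sum(1 for t in times if not _is_missing(t) and t < 0)
    if n_bad:
        raise ValueError(
            f"time contains {n_bad} negative value(s); "
            "survival times must be non-negative."
        )
    # True/False compare equal to 1/0, so booleans pass this test.
    bad_vals = sorted({e for e in events if not _is_missing(e) and e not in (0, 1)})
    if bad_vals:
        raise ValueError(
            "event must contain only 0/1 (or boolean) values; "
            f"got unexpected values: {bad_vals!r}."
        )
```

A column coded `0/1/9` is rejected with the offending value `9` listed in the message, while missing entries are ignored by both checks.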

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/summary/effect_size.py RENAMED

@@ -24,8 +24,29 @@ import pandas as pd
 def cohen_d(a: pd.Series | np.ndarray, b: pd.Series | np.ndarray) -> float | None:
     """Cohen's d using the pooled standard deviation.
-    ``d = (μ₁ − μ₂) / s_pool``, where the pooled SD weights the two
-    samples by their degrees of freedom.
+    Parameters
+    ----------
+    a, b
+        Two independent samples (``pandas.Series`` or 1-D ``numpy``
+        array). Non-numeric entries are coerced; ``NaN`` rows are
+        dropped per array. Each sample must contain at least two
+        finite values.
+    Returns
+    -------
+    float or None
+        ``d = (μ_a − μ_b) / s_pool``, where the pooled SD weights the
+        two samples by their degrees of freedom:
+        ``s_pool = sqrt(((n_a − 1)·s_a² + (n_b − 1)·s_b²) / (n_a + n_b − 2))``.
+        Returns ``None`` if either sample has fewer than 2 finite
+        observations. Returns ``0.0`` if the pooled SD is zero and
+        the two means are identical; ``inf`` if the pooled SD is zero
+        but the means differ (degenerate constant-sample case).
+    References
+    ----------
+    Cohen, J. (1988). *Statistical Power Analysis for the Behavioral
+      Sciences* (2nd ed.). Lawrence Erlbaum.
     """
     a_arr = pd.to_numeric(pd.Series(a), errors="coerce").dropna().to_numpy(dtype=float)
     b_arr = pd.to_numeric(pd.Series(b), errors="coerce").dropna().to_numpy(dtype=float)
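
The pooled-SD formula and the edge cases described in the new `Returns` section can be reproduced with the standard library. The sketch below is not the package's function: `cohen_d_sketch` is a hypothetical name, inputs are assumed numeric, and returning a positive `inf` regardless of the sign of the mean difference is an assumption, since the docstring only says `inf`.

```python
import math
from statistics import mean, variance


def cohen_d_sketch(a, b):
    """Cohen's d with the df-weighted pooled SD; None if a sample is too small."""
    a = [float(v) for v in a if math.isfinite(float(v))]
    b = [float(v) for v in b if math.isfinite(float(v))]
    if len(a) < 2 or len(b) < 2:
        return None
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    s_pool = math.sqrt(pooled_var)
    diff = mean(a) - mean(b)
    if s_pool == 0:
        # Degenerate constant samples: 0.0 if the means agree, inf otherwise.
        return 0.0 if diff == 0 else math.inf
    return diff / s_pool
```

For `a = [2, 4, 6]` and `b = [1, 3, 5]` both sample variances are 4, so the pooled SD is 2 and `d = 0.5`.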

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/summary/extras.py RENAMED

@@ -731,7 +731,13 @@ def add_ci(
 def _wilson_ci(x: int, n: int, *, z: float) -> tuple[float, float]:
-    """Wilson score CI for a proportion."""
+    """Wilson score CI for a proportion.
+    References
+    ----------
+    Wilson, E. B. (1927). Probable inference, the law of succession,
+      and statistical inference. *J. Am. Stat. Assoc.*, 22(158), 209–212.
+    """
     if n == 0:
         return float("nan"), float("nan")
     p = x / n
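
The hunk ends before the interval itself is computed. For reference, a self-contained sketch of the textbook Wilson (1927) score interval follows; it keeps the `n == 0` guard shown above, but the rest is the standard formula rather than a copy of `_wilson_ci`, and the clamping to `[0, 1]` is an added assumption.

```python
import math


def wilson_ci(x, n, z=1.96):
    """Wilson score interval for x successes out of n trials."""
    if n == 0:
        return float("nan"), float("nan")
    p = x / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For 5 successes in 10 trials at `z = 1.96` the interval is roughly (0.237, 0.763), noticeably narrower at the extremes than the Wald interval (0.190, 0.810).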

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/summary/tbl_one.py RENAMED

@@ -1207,14 +1207,22 @@ def _fit_global_p(
     try:
         with _w.catch_warnings():
             _w.simplefilter("ignore")  # statsmodels convergence chatter
-            # Honour weights by routing through GLM(Binomial), which
-            # accepts ``freq_weights``. sm.Logit doesn't expose that
-            # kwarg, so for the weighted case we use the equivalent
-            # GLM-with-logit-link formulation (same MLE, same f_test API).
+            # Honour weights by routing through GLM(Binomial). We use
+            # ``var_weights`` rather than ``freq_weights``: ``freq_weights``
+            # treats the weight as an integer *count of repeats* and so
+            # scales ``df_resid`` by ``Σw`` — which dramatically inflates
+            # the effective sample size for non-integer sampling weights
+            # (a survey weight calibrated to a 200k population would push
+            # df_resid to 200k instead of n). ``var_weights`` keeps
+            # ``df_resid = n − k``, which is the appropriate convention
+            # for sampling / IPW weights where the weight does not
+            # represent a count. For full design-based inference (with
+            # strata or clusters) use ``ps.SurveyDesign`` end-to-end;
+            # the joint p test here is an SRS-weighted Wald-F.
             if weights_col is not None:
                 w_arr = sub[weights_col].to_numpy(dtype=float)
                 fam = sm.families.Binomial()
-                res = sm.GLM(y, X, family=fam, freq_weights=w_arr).fit(disp=False)
+                res = sm.GLM(y, X, family=fam, var_weights=w_arr).fit(disp=False)
             else:
                 res = sm.Logit(y, X).fit(disp=False, method="newton",
                                           maxiter=100)
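
The size of the effect described in the comment is easy to see with plain arithmetic. The sketch below is illustrative only and makes two assumptions: that under `freq_weights` the residual degrees of freedom are computed from `Σw` in place of `n` (my reading of the comment and of statsmodels' accounting), and that the weights shown are hypothetical sampling weights. `residual_df` is a made-up helper, not a statsmodels or PySofra function.

```python
def residual_df(weights, n_params, *, as_frequency):
    """Residual df when weights are read as counts (Σw) or as variance weights (n)."""
    effective_n = sum(weights) if as_frequency else len(weights)
    return effective_n - n_params


# Hypothetical survey weights: five respondents standing in for ~9,650 people.
weights = [1500.5, 2300.25, 1800.0, 2100.75, 1950.5]
k = 2  # intercept + one covariate

df_freq = residual_df(weights, k, as_frequency=True)   # 9650.0
df_var = residual_df(weights, k, as_frequency=False)   # 3
```

An F reference distribution with 9,650 denominator degrees of freedom is essentially a scaled chi-square, whereas 3 degrees of freedom gives much heavier tails; that gap is what makes the `freq_weights` p-values anti-conservative for sampling or IPW weights.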

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/src/pysofra/summary/tests.py RENAMED

@@ -17,6 +17,29 @@ Two layers:
 Returns a small :class:`TestResult` so callers can render both the p-value
 and the test name for the footnote.
+References
+----------
+Welch, B. L. (1947). The generalization of "Student's" problem when
+  several different population variances are involved. *Biometrika*,
+  34(1/2), 28–35. (Welch's t with Satterthwaite df.)
+Satterthwaite, F. E. (1946). An approximate distribution of estimates
+  of variance components. *Biometrics Bulletin*, 2(6), 110–114.
+Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of
+  two random variables is stochastically larger than the other.
+  *Ann. Math. Statist.*, 18(1), 50–60. (Wilcoxon rank-sum.)
+Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion
+  variance analysis. *J. Am. Stat. Assoc.*, 47(260), 583–621.
+Fisher, R. A. (1922). On the interpretation of χ² from contingency
+  tables, and the calculation of P. *J. Royal Statist. Soc.*,
+  85(1), 87–94.
+Pearson, K. (1900). On the criterion that a given system of
+  deviations from the probable… *Phil. Mag.*, 50(302), 157–175.
+  (Pearson chi-square.)
+Rao, J. N. K., & Scott, A. J. (1981). The analysis of categorical
+  data from complex sample surveys. *J. Am. Stat. Assoc.*, 76,
+  221–230. (Rao–Scott chi-square; see also Rao & Scott 1984.)
+Kish, L. (1965). *Survey Sampling.* Wiley. (Kish design effect.)
 """
 from __future__ import annotations
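
The Kish design effect cited in the module references is a single scalar computed from the weights alone. A minimal sketch follows, assuming the usual definition `deff = n·Σw² / (Σw)²` and one common form of the first-order correction (dividing the Pearson statistic by `deff`). The function names are hypothetical, and the exact formula used inside `rao_scott_chisq` is not shown in this diff.

```python
def kish_deff(weights):
    """Kish design effect from unequal weighting: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def kish_effective_n(weights):
    """Effective sample size n / deff, i.e. (sum(w))^2 / sum(w^2)."""
    return len(weights) / kish_deff(weights)


def first_order_adjusted_chisq(chisq, weights):
    """Deflate a Pearson chi-square by the scalar design effect (assumed form)."""
    return chisq / kish_deff(weights)
```

Equal weights give `deff = 1` and leave the statistic unchanged; weights `[1, 1, 2, 2]` give `deff = 40/36 ≈ 1.11`, so a Pearson statistic of 10 is deflated to 9. Because this scalar ignores strata and clusters, it cannot reproduce the eigenvalue-based generalised design effect used by R `survey::svychisq`.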
@@ -167,6 +190,15 @@ def svyttest(
     full design and gave inflated t-statistics whenever clusters
     straddled groups; the current formulation matches R to first
     order.
+    References
+    ----------
+    Lumley, T. (2010). *Complex Surveys: A Guide to Analysis Using R.*
+      Wiley. Chapter on two-sample tests for survey data.
+    Binder, D. A. (1983). On the variances of asymptotically normal
+      estimators from complex surveys. *Int. Statist. Rev.*, 51(3),
+      279–292. (Taylor linearisation of regression coefficients
+      under complex sampling.)
     """
     df_ = pd.DataFrame({
         "v": pd.to_numeric(values, errors="coerce"),
@@ -319,13 +351,28 @@ def rao_scott_chisq(
     Notes
     -----
     This is a *first-order* Rao–Scott correction using the Kish design
-    effect (a single scalar derived from the weights). For exact parity
-    with R ``survey::svychisq(..., statistic="F")`` — which uses the
-    *generalised* design effect derived from the eigenvalues of the
-    full design covariance matrix — call out to the R ``survey``
-    package directly. Pysofra's first-order approximation typically
-    agrees with R to within ~5% on simple weighted designs and is
-    adequate for descriptive Table 1 use.
+    effect (a single scalar derived from the weights). The fully-correct
+    Rao–Scott statistic uses the *generalised* design effect derived from
+    the eigenvalues of the full design covariance matrix
+    (Rao & Scott, 1981, 1984); R ``survey::svychisq`` implements that
+    version. On non-trivial weighted designs (stratified, clustered, or
+    even simple weighted with non-uniform weights), the Kish
+    approximation here typically disagrees with R ``svychisq`` by
+    **10–15%** in the statistic and a similar amount in the p-value.
+    The approximation is adequate for descriptive Table 1 contexts
+    where the χ² is a guide rather than a publication-grade test
+    statistic; for design-grade chi-square inference, call
+    ``survey::svychisq`` in R directly.
+    References
+    ----------
+    Rao, J. N. K., & Scott, A. J. (1981). The analysis of categorical
+      data from complex sample surveys. *J. Am. Stat. Assoc.*,
+      76(374), 221–230.
+    Rao, J. N. K., & Scott, A. J. (1984). On chi-squared tests for
+      multiway contingency tables with cell proportions estimated
+      from survey data. *Ann. Stat.*, 12(1), 46–60.
+    Kish, L. (1965). *Survey Sampling.* Wiley.
     """
     df = pd.DataFrame({
         "v": values,

{pysofra-0.1.0a6 → pysofra-0.1.0a7}/tests/test_regressions.py RENAMED

@@ -1857,7 +1857,13 @@ class TestWeightedModifiers:
         assert abs(diff_unw[1] - diff_wt[1]) > 1e-6 or \
                abs(diff_unw[2] - diff_wt[2]) > 1e-6
-    def test_add_global_p_weighted_matches_glm_freq_weights(self):
+    def test_add_global_p_weighted_matches_glm_var_weights(self):
+        # Reference uses ``var_weights=`` rather than ``freq_weights=``:
+        # for non-integer sampling weights, ``freq_weights`` artificially
+        # inflates df_resid by ``Σw`` (treating the weight as an integer
+        # count of repeats), making the F-test anti-conservative. The
+        # ``var_weights`` convention keeps df_resid = n - k, which is
+        # the appropriate SRS-weighted Wald-F for sampling/IPW weights.
         sm = pytest.importorskip("statsmodels.api")
         df = self._df()
         t = (
@@ -1865,12 +1871,12 @@ class TestWeightedModifiers:
                        weights="w", missing="never", types={"smoker": "dichotomous"})
             .add_global_p()
         )
-        # Manual reference: fit GLM(Binomial) with freq_weights and
+        # Manual reference: fit GLM(Binomial) with var_weights and
         # f_test on the single age coefficient.
         y = (df["arm"] == "B").astype(int).to_numpy()
         X = sm.add_constant(df[["age"]])
         ref = sm.GLM(y, X, family=sm.families.Binomial(),
-                     freq_weights=df["w"].to_numpy(dtype=float)).fit(disp=False)
+                     var_weights=df["w"].to_numpy(dtype=float)).fit(disp=False)
         expected_p = float(ref.f_test("age = 0").pvalue)
         # Get the table's global p for "age"
         row = next(r for r in t.rows if r.cells[0].text == "age")
@@ -1883,3 +1889,40 @@ class TestWeightedModifiers:
         del gp_cell
         assert last_p is not None
         assert abs(float(last_p) - expected_p) < 1e-6, (last_p, expected_p)
+# ----------------------------------------------------------------------
+# tbl_survival validates time + event content. Previously negative
+# follow-up times and non-0/1 event codes were silently passed
+# through to lifelines (which treats nonzero as a death), producing
+# misleading survival curves without complaint.
+# ----------------------------------------------------------------------
+class TestSurvivalInputValidation:
+    def test_negative_time_raises(self):
+        pytest.importorskip("lifelines")
+        df = pd.DataFrame({
+            "t": [1.0, -2.0, 3.0, 4.0],
+            "e": [0, 1, 1, 0],
+        })
+        with pytest.raises(ValueError, match=r"negative value"):
+            ps.tbl_survival(df, time="t", event="e")
+    def test_non_binary_event_raises(self):
+        pytest.importorskip("lifelines")
+        df = pd.DataFrame({
+            "t": [1.0, 2.0, 3.0, 4.0],
+            "e": [0, 1, 9, 1],   # 9 is not 0/1
+        })
+        with pytest.raises(ValueError, match=r"must contain only 0/1"):
+            ps.tbl_survival(df, time="t", event="e")
+    def test_valid_inputs_pass(self):
+        pytest.importorskip("lifelines")
+        rng = np.random.default_rng(0)
+        df = pd.DataFrame({
+            "t": rng.exponential(10, 50),
+            "e": rng.integers(0, 2, 50),
+        })
+        # Should not raise.
+        t = ps.tbl_survival(df, time="t", event="e")
+        assert len(t.rows) >= 1