PyPI - diff-diff - Versions diffs - 3.1.1__tar.gz → 3.1.3__tar.gz - Mend

diff-diff 3.1.1__tar.gz → 3.1.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (77)

diff_diff-3.1.3/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025-2026 Isaac Gerber
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

{diff_diff-3.1.1 → diff_diff-3.1.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: diff-diff
-Version: 3.1.1
+Version: 3.1.3
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Intended Audience :: Science/Research
 Classifier: Operating System :: OS Independent
@@ -37,6 +37,7 @@ Requires-Dist: plotly>=5.0 ; extra == 'plotly'
 Provides-Extra: dev
 Provides-Extra: docs
 Provides-Extra: plotly
+License-File: LICENSE
 Summary: Difference-in-Differences causal inference with sklearn-like API. Callaway-Sant'Anna, Synthetic DiD, Honest DiD, event studies, parallel trends.
 Keywords: causal-inference,difference-in-differences,econometrics,statistics,treatment-effects,event-study,staggered-adoption,parallel-trends,synthetic-control,panel-data,did,twfe,callaway-santanna,honest-did,sensitivity-analysis
 Author: diff-diff contributors
@@ -46,7 +47,7 @@ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
 Project-URL: Documentation, https://diff-diff.readthedocs.io
 Project-URL: Homepage, https://github.com/igerber/diff-diff
 Project-URL: Issues, https://github.com/igerber/diff-diff/issues
-Project-URL: Practitioner Guide, https://github.com/igerber/diff-diff/blob/main/docs/llms-practitioner.txt
+Project-URL: Practitioner Guide, https://diff-diff.readthedocs.io/en/stable/llms-practitioner.txt
 Project-URL: Repository, https://github.com/igerber/diff-diff
 # diff-diff
@@ -120,11 +121,19 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 ## For AI Agents
-If you are an AI agent or LLM using this library, read [`docs/llms.txt`](docs/llms.txt) for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
+If you are an AI agent or LLM using this library, call `diff_diff.get_llm_guide()` for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
-After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
+```python
+from diff_diff import get_llm_guide
+get_llm_guide()                 # concise API reference
+get_llm_guide("practitioner")   # 8-step workflow (Baker et al. 2025)
+get_llm_guide("full")           # comprehensive documentation
+```
+The guides are bundled in the wheel, so they are accessible from a `pip install` with no network access required.
-Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
+After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
 ## For Data Scientists
@@ -1156,7 +1165,7 @@ results = stacked_did(
 ### Efficient DiD (Chen, Sant'Anna & Xie 2025)
-Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*, producing tighter confidence intervals than standard estimators like Callaway-Sant'Anna when the stronger PT-All assumption holds.
+Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs along the **no-covariate path**, producing tighter confidence intervals than standard estimators when the stronger PT-All assumption holds. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*. A doubly-robust covariate path is also available: it is consistent if either the outcome regression or the sieve propensity ratio is correctly specified, but the linear OLS outcome regression does not generically attain the efficiency bound unless the conditional mean is linear in the covariates.
 ```python
 from diff_diff import EfficientDiD, generate_staggered_data
@@ -1191,8 +1200,13 @@ EfficientDiD(
 )
 ```
-> **Note:** Phase 1 supports the no-covariates path only. Use CallawaySantAnna with
-> `estimation_method='dr'` if you need covariate adjustment.
+> **Note:** EfficientDiD supports covariate adjustment via a doubly-robust path
+> (sieve-based propensity score ratios and a linear OLS outcome regression).
+> The DR property gives consistency if either the OR or the PS is correctly
+> specified, but the OLS working model for the outcome regression does not
+> generically attain the semiparametric efficiency bound. The unqualified
+> efficiency-bound claim applies to the no-covariate path only. See the
+> `covariates` parameter on `fit()` and `docs/methodology/REGISTRY.md`.
 **When to use Efficient DiD vs Callaway-Sant'Anna:**
@@ -1200,15 +1214,15 @@ EfficientDiD(
 |--------|--------------|-------------------|
 | Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation |
 | PT assumption | PT-All (stronger) or PT-Post | Conditional PT |
-| Efficiency | Achieves semiparametric bound | Not efficient |
-| Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) |
+| Efficiency | Achieves semiparametric bound on the no-covariate path; DR covariate path is consistent but does not generically attain the bound under a linear OLS outcome regression | Not efficient |
+| Covariates | Supported (doubly robust, sieve-based PS + linear OLS OR) | Supported (OR, IPW, DR) |
 | When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT |
 ### de Chaisemartin-D'Haultfœuille (dCDH) for Reversible Treatments
 `ChaisemartinDHaultfoeuille` (alias `DCDH`) is the only library estimator that handles **non-absorbing (reversible) treatments** — treatment can switch on AND off over time. This is the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles.
-Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter. Phase 3 will add covariate adjustment.
+Ships `DID_M` (= `DID_1` at horizon `l = 1`), the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter, residualization-style covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), state-set-specific trends (`trends_nonparam`), heterogeneity testing, non-binary treatment, HonestDiD sensitivity integration on placebos, and survey support via Taylor-series linearization.
 ```python
 from diff_diff import ChaisemartinDHaultfoeuille
@@ -1264,7 +1278,7 @@ ChaisemartinDHaultfoeuille(
 | `n_groups_dropped_crossers`, `n_groups_dropped_singleton_baseline` | Filter counts (multi-switch groups dropped before estimation; singleton-baseline groups excluded from variance) |
 | `n_groups_dropped_never_switching` | Backwards-compatibility metadata. Never-switching groups participate in the variance via stable-control roles; this field is no longer a filter count. |
-**Multi-horizon event study** (Phase 2 - pass `L_max` to `fit()`):
+**Multi-horizon event study** (pass `L_max` to `fit()`):
 ```python
 results = est.fit(data, outcome="outcome", group="group",
@@ -1303,13 +1317,13 @@ print(f"Fraction of negative weights: {diagnostic.fraction_negative:.3f}")
 print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}")
 ```
-> **Note:** Placebo SE is `NaN` for both the single-lag `DID_M^pl` and the dynamic placebos `DID^{pl}_l`. The point estimates are meaningful for visual pre-trends inspection; formal placebo inference (influence-function derivation) is deferred to a follow-up. See `REGISTRY.md` for the full contract.
+> **Note:** Placebo SE is `NaN` for the single-period `DID_M^pl` (`L_max=None`) because the per-period aggregation path has no influence-function derivation; the point estimate is meaningful for visual pre-trends inspection. Multi-horizon dynamic placebos `DID^{pl}_l` (`L_max >= 1`) have valid analytical SE via the same cohort-recentered plug-in variance as the positive horizons, with bootstrap SE available when `n_bootstrap > 0`. See `docs/methodology/REGISTRY.md` for the full contract.
 > **Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula to be consistent with the point estimate. Each drop emits an explicit warning.
-> **Note:** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels — see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs.
+> **Note:** The estimator requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels - see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs.
-> **Note:** Survey design (`survey_design`), covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), and HonestDiD integration (`honest_did`) are not yet supported. They raise `NotImplementedError` with phase pointers - see [`ROADMAP.md`](ROADMAP.md) for the Phase 3 rollout.
+> **Note:** Survey design is supported via Taylor-series linearization on `pweight` with strata / PSU / FPC. Replicate-weight variance and PSU-level bootstrap for dCDH are a planned extension. The `aggregate` parameter still raises `NotImplementedError`.
 ### Triple Difference (DDD)
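The header fields changed at the top of this PKG-INFO diff (`Version`, the new `License-File`, and the relocated `Practitioner Guide` URL) follow the RFC 822-style core metadata layout, so they can be checked with the standard library. The sketch below is illustrative only: `PKG_INFO` is an abbreviated stand-in copied from the lines shown in the diff, and `read_core_metadata` is a hypothetical helper, not part of diff-diff.

```python
from email.parser import HeaderParser

# Abbreviated stand-in for the 3.1.3 PKG-INFO header block shown in the diff.
PKG_INFO = """\
Metadata-Version: 2.4
Name: diff-diff
Version: 3.1.3
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: plotly
License-File: LICENSE
Project-URL: Homepage, https://github.com/igerber/diff-diff
Project-URL: Practitioner Guide, https://diff-diff.readthedocs.io/en/stable/llms-practitioner.txt

# diff-diff
"""


def read_core_metadata(text: str) -> dict:
    """Hypothetical helper: pull a few core-metadata fields out of PKG-INFO text."""
    msg = HeaderParser().parsestr(text)
    # Each Project-URL value has the form "<label>, <url>".
    project_urls = dict(
        entry.split(", ", 1) for entry in msg.get_all("Project-URL", [])
    )
    return {
        "version": msg["Version"],
        "license_files": msg.get_all("License-File", []),
        "extras": msg.get_all("Provides-Extra", []),
        "project_urls": project_urls,
    }


meta = read_core_metadata(PKG_INFO)
print(meta["version"], meta["license_files"])
```

For an installed distribution, `importlib.metadata.metadata("diff-diff")` exposes the same fields without parsing the file by hand.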

{diff_diff-3.1.1 → diff_diff-3.1.3}/README.md RENAMED

@@ -69,11 +69,19 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 ## For AI Agents
-If you are an AI agent or LLM using this library, read [`docs/llms.txt`](docs/llms.txt) for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
+If you are an AI agent or LLM using this library, call `diff_diff.get_llm_guide()` for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
-After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
+```python
+from diff_diff import get_llm_guide
+get_llm_guide()                 # concise API reference
+get_llm_guide("practitioner")   # 8-step workflow (Baker et al. 2025)
+get_llm_guide("full")           # comprehensive documentation
+```
+The guides are bundled in the wheel, so they are accessible from a `pip install` with no network access required.
-Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
+After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
 ## For Data Scientists
@@ -1105,7 +1113,7 @@ results = stacked_did(
 ### Efficient DiD (Chen, Sant'Anna & Xie 2025)
-Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*, producing tighter confidence intervals than standard estimators like Callaway-Sant'Anna when the stronger PT-All assumption holds.
+Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs along the **no-covariate path**, producing tighter confidence intervals than standard estimators when the stronger PT-All assumption holds. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*. A doubly-robust covariate path is also available: it is consistent if either the outcome regression or the sieve propensity ratio is correctly specified, but the linear OLS outcome regression does not generically attain the efficiency bound unless the conditional mean is linear in the covariates.
 ```python
 from diff_diff import EfficientDiD, generate_staggered_data
@@ -1140,8 +1148,13 @@ EfficientDiD(
 )
 ```
-> **Note:** Phase 1 supports the no-covariates path only. Use CallawaySantAnna with
-> `estimation_method='dr'` if you need covariate adjustment.
+> **Note:** EfficientDiD supports covariate adjustment via a doubly-robust path
+> (sieve-based propensity score ratios and a linear OLS outcome regression).
+> The DR property gives consistency if either the OR or the PS is correctly
+> specified, but the OLS working model for the outcome regression does not
+> generically attain the semiparametric efficiency bound. The unqualified
+> efficiency-bound claim applies to the no-covariate path only. See the
+> `covariates` parameter on `fit()` and `docs/methodology/REGISTRY.md`.
 **When to use Efficient DiD vs Callaway-Sant'Anna:**
@@ -1149,15 +1162,15 @@ EfficientDiD(
 |--------|--------------|-------------------|
 | Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation |
 | PT assumption | PT-All (stronger) or PT-Post | Conditional PT |
-| Efficiency | Achieves semiparametric bound | Not efficient |
-| Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) |
+| Efficiency | Achieves semiparametric bound on the no-covariate path; DR covariate path is consistent but does not generically attain the bound under a linear OLS outcome regression | Not efficient |
+| Covariates | Supported (doubly robust, sieve-based PS + linear OLS OR) | Supported (OR, IPW, DR) |
 | When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT |
 ### de Chaisemartin-D'Haultfœuille (dCDH) for Reversible Treatments
 `ChaisemartinDHaultfoeuille` (alias `DCDH`) is the only library estimator that handles **non-absorbing (reversible) treatments** — treatment can switch on AND off over time. This is the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles.
-Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter. Phase 3 will add covariate adjustment.
+Ships `DID_M` (= `DID_1` at horizon `l = 1`), the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter, residualization-style covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), state-set-specific trends (`trends_nonparam`), heterogeneity testing, non-binary treatment, HonestDiD sensitivity integration on placebos, and survey support via Taylor-series linearization.
 ```python
 from diff_diff import ChaisemartinDHaultfoeuille
@@ -1213,7 +1226,7 @@ ChaisemartinDHaultfoeuille(
 | `n_groups_dropped_crossers`, `n_groups_dropped_singleton_baseline` | Filter counts (multi-switch groups dropped before estimation; singleton-baseline groups excluded from variance) |
 | `n_groups_dropped_never_switching` | Backwards-compatibility metadata. Never-switching groups participate in the variance via stable-control roles; this field is no longer a filter count. |
-**Multi-horizon event study** (Phase 2 - pass `L_max` to `fit()`):
+**Multi-horizon event study** (pass `L_max` to `fit()`):
 ```python
 results = est.fit(data, outcome="outcome", group="group",
@@ -1252,13 +1265,13 @@ print(f"Fraction of negative weights: {diagnostic.fraction_negative:.3f}")
 print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}")
 ```
-> **Note:** Placebo SE is `NaN` for both the single-lag `DID_M^pl` and the dynamic placebos `DID^{pl}_l`. The point estimates are meaningful for visual pre-trends inspection; formal placebo inference (influence-function derivation) is deferred to a follow-up. See `REGISTRY.md` for the full contract.
+> **Note:** Placebo SE is `NaN` for the single-period `DID_M^pl` (`L_max=None`) because the per-period aggregation path has no influence-function derivation; the point estimate is meaningful for visual pre-trends inspection. Multi-horizon dynamic placebos `DID^{pl}_l` (`L_max >= 1`) have valid analytical SE via the same cohort-recentered plug-in variance as the positive horizons, with bootstrap SE available when `n_bootstrap > 0`. See `docs/methodology/REGISTRY.md` for the full contract.
 > **Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula to be consistent with the point estimate. Each drop emits an explicit warning.
-> **Note:** Phase 1 requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels — see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs.
+> **Note:** The estimator requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels - see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs.
-> **Note:** Survey design (`survey_design`), covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), and HonestDiD integration (`honest_did`) are not yet supported. They raise `NotImplementedError` with phase pointers - see [`ROADMAP.md`](ROADMAP.md) for the Phase 3 rollout.
+> **Note:** Survey design is supported via Taylor-series linearization on `pweight` with strata / PSU / FPC. Replicate-weight variance and PSU-level bootstrap for dCDH are a planned extension. The `aggregate` parameter still raises `NotImplementedError`.
 ### Triple Difference (DDD)

{diff_diff-3.1.1 → diff_diff-3.1.3}/diff_diff/__init__.py RENAMED

@@ -4,12 +4,14 @@ diff-diff: A library for Difference-in-Differences analysis.
 This library provides sklearn-like estimators for causal inference
 using the difference-in-differences methodology.
-For rigorous analysis, follow the 8-step practitioner workflow in
-docs/llms-practitioner.txt (based on Baker et al. 2025). After
-estimation, call ``practitioner_next_steps(results)`` for context-aware
-guidance on remaining diagnostic steps.
+For rigorous analysis, follow the 8-step practitioner workflow based
+on Baker et al. (2025). After estimation, call
+``practitioner_next_steps(results)`` for context-aware guidance on
+remaining diagnostic steps.
-AI agent reference: docs/llms.txt
+AI agents: call ``diff_diff.get_llm_guide()`` for a complete API reference.
+Use ``get_llm_guide("practitioner")`` for the 8-step workflow or
+``get_llm_guide("full")`` for comprehensive documentation.
 """
 # Import backend detection from dedicated module (avoids circular imports)
@@ -200,6 +202,7 @@ from diff_diff.visualization import (
     plot_synth_weights,
 )
 from diff_diff.practitioner import practitioner_next_steps
+from diff_diff._guides_api import get_llm_guide
 from diff_diff.datasets import (
     clear_cache,
     list_datasets,
@@ -228,7 +231,7 @@ EDiD = EfficientDiD
 ETWFE = WooldridgeDiD
 DCDH = ChaisemartinDHaultfoeuille
-__version__ = "3.1.1"
+__version__ = "3.1.3"
 __all__ = [
     # Estimators
     "DifferenceInDifferences",
@@ -402,4 +405,6 @@ __all__ = [
     "clear_cache",
     # Practitioner guidance
     "practitioner_next_steps",
+    # LLM guide accessor
+    "get_llm_guide",
 ]
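Because `get_llm_guide` is exported in 3.1.3 but absent from the 3.1.1 side of this diff, code that has to run against either version may want to guard the import. The sketch below is one way to do that; `load_guide` is a hypothetical wrapper name, and returning `None` as the fallback is an assumption rather than anything the library prescribes.

```python
from __future__ import annotations


def load_guide(variant: str = "concise") -> str | None:
    """Hypothetical wrapper: return the bundled guide text, or None if unavailable."""
    try:
        # Fails with ImportError when diff_diff is not installed, or when the
        # installed version predates the get_llm_guide export.
        from diff_diff import get_llm_guide
    except ImportError:
        return None
    return get_llm_guide(variant)


guide = load_guide()
print("guide available" if guide is not None else "guide not available")
```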

diff_diff-3.1.3/diff_diff/_guides_api.py ADDED

@@ -0,0 +1,48 @@
+"""Runtime accessor for bundled LLM guide files."""
+from __future__ import annotations
+from importlib.resources import files
+_VARIANT_TO_FILE = {
+    "concise": "llms.txt",
+    "full": "llms-full.txt",
+    "practitioner": "llms-practitioner.txt",
+}
+def get_llm_guide(variant: str = "concise") -> str:
+    """Return the contents of a bundled LLM guide.
+    Parameters
+    ----------
+    variant : str, default "concise"
+        Which guide to load. Names are case-sensitive. One of:
+        - ``"concise"`` -- compact API reference (llms.txt)
+        - ``"full"`` -- complete API documentation (llms-full.txt)
+        - ``"practitioner"`` -- 8-step practitioner workflow (llms-practitioner.txt)
+    Returns
+    -------
+    str
+        The full text of the requested guide.
+    Raises
+    ------
+    ValueError
+        If ``variant`` is not one of the known guide names.
+    Examples
+    --------
+    >>> from diff_diff import get_llm_guide
+    >>> concise = get_llm_guide()
+    >>> workflow = get_llm_guide("practitioner")
+    """
+    try:
+        filename = _VARIANT_TO_FILE[variant]
+    except (KeyError, TypeError):
+        valid = ", ".join(repr(k) for k in _VARIANT_TO_FILE)
+        raise ValueError(
+            f"Unknown guide variant {variant!r}. Valid options: {valid}."
+        ) from None
+    return files("diff_diff.guides").joinpath(filename).read_text(encoding="utf-8")
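The lookup in `get_llm_guide` catches both `KeyError` and `TypeError`, so an unknown name and an unhashable argument both surface as `ValueError`. The stand-alone sketch below mirrors that dispatch pattern so it runs without the package installed. It makes two assumptions: the in-memory `_GUIDES` dict stands in for the bundled text files, and `get_guide_sketch` and `describe` are hypothetical names, not library functions.

```python
from __future__ import annotations

# Stand-in guide texts; the real function reads files bundled in the package.
_GUIDES = {
    "concise": "compact API reference",
    "full": "complete API documentation",
    "practitioner": "8-step practitioner workflow",
}


def get_guide_sketch(variant: str = "concise") -> str:
    """Mirror of the variant lookup: map a name to text or raise ValueError."""
    try:
        return _GUIDES[variant]
    except (KeyError, TypeError):
        # KeyError: unknown name. TypeError: unhashable argument such as a list.
        valid = ", ".join(repr(k) for k in _GUIDES)
        raise ValueError(
            f"Unknown guide variant {variant!r}. Valid options: {valid}."
        ) from None


def describe(variant) -> str:
    """Return the guide text, or a short error string for invalid input."""
    try:
        return get_guide_sketch(variant)
    except ValueError as exc:
        return f"error: {exc}"


print(describe("practitioner"))
print(describe("Practitioner"))  # names are case-sensitive, so this is an error
print(describe(["concise"]))     # unhashable input is also reported as ValueError
```

Raising with `from None` suppresses the chained `KeyError` or `TypeError`, so callers see only the `ValueError` listing the valid options.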
