diff-diff 3.0.1__cp314-cp314-win_amd64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62)
  1. diff_diff/__init__.py +382 -0
  2. diff_diff/_backend.py +134 -0
  3. diff_diff/_rust_backend.cp314-win_amd64.pyd +0 -0
  4. diff_diff/bacon.py +1140 -0
  5. diff_diff/bootstrap_utils.py +730 -0
  6. diff_diff/continuous_did.py +1626 -0
  7. diff_diff/continuous_did_bspline.py +190 -0
  8. diff_diff/continuous_did_results.py +374 -0
  9. diff_diff/datasets.py +815 -0
  10. diff_diff/diagnostics.py +882 -0
  11. diff_diff/efficient_did.py +1770 -0
  12. diff_diff/efficient_did_bootstrap.py +359 -0
  13. diff_diff/efficient_did_covariates.py +899 -0
  14. diff_diff/efficient_did_results.py +368 -0
  15. diff_diff/efficient_did_weights.py +617 -0
  16. diff_diff/estimators.py +1501 -0
  17. diff_diff/honest_did.py +2585 -0
  18. diff_diff/imputation.py +2458 -0
  19. diff_diff/imputation_bootstrap.py +418 -0
  20. diff_diff/imputation_results.py +448 -0
  21. diff_diff/linalg.py +2538 -0
  22. diff_diff/power.py +2588 -0
  23. diff_diff/practitioner.py +869 -0
  24. diff_diff/prep.py +1738 -0
  25. diff_diff/prep_dgp.py +1718 -0
  26. diff_diff/pretrends.py +1105 -0
  27. diff_diff/results.py +918 -0
  28. diff_diff/stacked_did.py +1049 -0
  29. diff_diff/stacked_did_results.py +339 -0
  30. diff_diff/staggered.py +3895 -0
  31. diff_diff/staggered_aggregation.py +864 -0
  32. diff_diff/staggered_bootstrap.py +752 -0
  33. diff_diff/staggered_results.py +416 -0
  34. diff_diff/staggered_triple_diff.py +1545 -0
  35. diff_diff/staggered_triple_diff_results.py +416 -0
  36. diff_diff/sun_abraham.py +1685 -0
  37. diff_diff/survey.py +1981 -0
  38. diff_diff/synthetic_did.py +1136 -0
  39. diff_diff/triple_diff.py +2047 -0
  40. diff_diff/trop.py +952 -0
  41. diff_diff/trop_global.py +1270 -0
  42. diff_diff/trop_local.py +1307 -0
  43. diff_diff/trop_results.py +356 -0
  44. diff_diff/twfe.py +542 -0
  45. diff_diff/two_stage.py +1952 -0
  46. diff_diff/two_stage_bootstrap.py +520 -0
  47. diff_diff/two_stage_results.py +400 -0
  48. diff_diff/utils.py +1902 -0
  49. diff_diff/visualization/__init__.py +61 -0
  50. diff_diff/visualization/_common.py +328 -0
  51. diff_diff/visualization/_continuous.py +274 -0
  52. diff_diff/visualization/_diagnostic.py +817 -0
  53. diff_diff/visualization/_event_study.py +1086 -0
  54. diff_diff/visualization/_power.py +661 -0
  55. diff_diff/visualization/_staggered.py +833 -0
  56. diff_diff/visualization/_synthetic.py +197 -0
  57. diff_diff/wooldridge.py +1285 -0
  58. diff_diff/wooldridge_results.py +349 -0
  59. diff_diff-3.0.1.dist-info/METADATA +2997 -0
  60. diff_diff-3.0.1.dist-info/RECORD +62 -0
  61. diff_diff-3.0.1.dist-info/WHEEL +4 -0
  62. diff_diff-3.0.1.dist-info/sboms/diff_diff_rust.cyclonedx.json +5843 -0
@@ -0,0 +1,2997 @@
1
+ Metadata-Version: 2.4
2
+ Name: diff-diff
3
+ Version: 3.0.1
4
+ Classifier: Development Status :: 5 - Production/Stable
5
+ Classifier: Intended Audience :: Science/Research
6
+ Classifier: Operating System :: OS Independent
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: Programming Language :: Python :: 3.9
9
+ Classifier: Programming Language :: Python :: 3.10
10
+ Classifier: Programming Language :: Python :: 3.11
11
+ Classifier: Programming Language :: Python :: 3.12
12
+ Classifier: Programming Language :: Python :: 3.13
13
+ Classifier: Programming Language :: Python :: 3.14
14
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
15
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
16
+ Classifier: Topic :: Scientific/Engineering
17
+ Requires-Dist: numpy>=1.20.0
18
+ Requires-Dist: pandas>=1.3.0
19
+ Requires-Dist: scipy>=1.7.0
20
+ Requires-Dist: pytest>=7.0 ; extra == 'dev'
21
+ Requires-Dist: pytest-xdist>=3.0 ; extra == 'dev'
22
+ Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
23
+ Requires-Dist: black>=23.0 ; extra == 'dev'
24
+ Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
25
+ Requires-Dist: mypy>=1.0 ; extra == 'dev'
26
+ Requires-Dist: maturin>=1.4,<2.0 ; extra == 'dev'
27
+ Requires-Dist: matplotlib>=3.5 ; extra == 'dev'
28
+ Requires-Dist: nbmake>=1.5 ; extra == 'dev'
29
+ Requires-Dist: plotly>=5.0 ; extra == 'dev'
30
+ Requires-Dist: sphinx>=6.0 ; extra == 'docs'
31
+ Requires-Dist: pydata-sphinx-theme>=0.15 ; extra == 'docs'
32
+ Requires-Dist: sphinxext-opengraph>=0.9 ; extra == 'docs'
33
+ Requires-Dist: sphinx-sitemap>=2.5 ; extra == 'docs'
34
+ Requires-Dist: nbsphinx>=0.9 ; extra == 'docs'
35
+ Requires-Dist: matplotlib>=3.5 ; extra == 'docs'
36
+ Requires-Dist: plotly>=5.0 ; extra == 'plotly'
37
+ Provides-Extra: dev
38
+ Provides-Extra: docs
39
+ Provides-Extra: plotly
40
+ Summary: Difference-in-Differences causal inference with sklearn-like API. Callaway-Sant'Anna, Synthetic DiD, Honest DiD, event studies, parallel trends.
41
+ Keywords: causal-inference,difference-in-differences,econometrics,statistics,treatment-effects,event-study,staggered-adoption,parallel-trends,synthetic-control,panel-data,did,twfe,callaway-santanna,honest-did,sensitivity-analysis
42
+ Author: diff-diff contributors
43
+ License-Expression: MIT
44
+ Requires-Python: >=3.9, <3.15
45
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
46
+ Project-URL: Documentation, https://diff-diff.readthedocs.io
47
+ Project-URL: Homepage, https://github.com/igerber/diff-diff
48
+ Project-URL: Issues, https://github.com/igerber/diff-diff/issues
49
+ Project-URL: Practitioner Guide, https://github.com/igerber/diff-diff/blob/main/docs/llms-practitioner.txt
50
+ Project-URL: Repository, https://github.com/igerber/diff-diff
51
+
52
+ # diff-diff
53
+
54
+ [![PyPI version](https://img.shields.io/pypi/v/diff-diff.svg)](https://pypi.org/project/diff-diff/)
55
+ [![Python versions](https://img.shields.io/pypi/pyversions/diff-diff.svg)](https://pypi.org/project/diff-diff/)
56
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
57
+ [![Downloads](https://img.shields.io/pypi/dm/diff-diff.svg)](https://pypi.org/project/diff-diff/)
58
+ [![Documentation](https://readthedocs.org/projects/diff-diff/badge/?version=stable)](https://diff-diff.readthedocs.io/en/stable/)
59
+
60
+ A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.
61
+
62
+ ## Installation
63
+
64
+ ```bash
65
+ pip install diff-diff
66
+ ```
67
+
68
+ Or install from source:
69
+
70
+ ```bash
71
+ git clone https://github.com/igerber/diff-diff.git
72
+ cd diff-diff
73
+ pip install -e .
74
+ ```
75
+
76
+ ## Quick Start
77
+
78
+ ```python
79
+ import pandas as pd
80
+ from diff_diff import DifferenceInDifferences # or: DiD
81
+
82
+ # Create sample data
83
+ data = pd.DataFrame({
84
+ 'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
85
+ 'treated': [1, 1, 1, 1, 0, 0, 0, 0],
86
+ 'post': [0, 0, 1, 1, 0, 0, 1, 1]
87
+ })
88
+
89
+ # Fit the model
90
+ did = DifferenceInDifferences()
91
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
92
+
93
+ # View results
94
+ print(results) # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
95
+ results.print_summary()
96
+ ```
97
+
98
+ Output:
99
+ ```
100
+ ======================================================================
101
+ Difference-in-Differences Estimation Results
102
+ ======================================================================
103
+
104
+ Observations: 8
105
+ Treated units: 4
106
+ Control units: 4
107
+ R-squared: 0.9055
108
+
109
+ ----------------------------------------------------------------------
110
+ Parameter Estimate Std. Err. t-stat P>|t|
111
+ ----------------------------------------------------------------------
112
+ ATT 3.0000 1.7321 1.732 0.1583
113
+ ----------------------------------------------------------------------
114
+
115
+ 95% Confidence Interval: [-1.8089, 7.8089]
116
+
117
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
118
+ ======================================================================
119
+ ```
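The Quick Start numbers can be checked by hand. The sketch below recomputes the ATT, standard error, and R-squared from the eight sample observations using only the standard library. It uses the classical (homoskedastic) OLS variance for the interaction term of a saturated 2x2 regression; that this formula reproduces the printed `SE=1.7321` is an observation about these numbers, not a statement of how the library computes its standard errors.

```python
from math import sqrt

# Quick Start sample data, copied from the README example above
outcome = [10, 11, 15, 18, 9, 10, 12, 13]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
post = [0, 0, 1, 1, 0, 0, 1, 1]

# Group outcomes into the four treatment-by-period cells
cells = {}
for y, d, t in zip(outcome, treated, post):
    cells.setdefault((d, t), []).append(y)

means = {key: sum(ys) / len(ys) for key, ys in cells.items()}

# ATT = (treated post - treated pre) - (control post - control pre)
att = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])

# In the saturated 2x2 regression the fitted value is the cell mean,
# so residuals are deviations from each cell mean
ssr = sum((y - means[key]) ** 2 for key, ys in cells.items() for y in ys)
dof = len(outcome) - 4  # four estimated coefficients
sigma2 = ssr / dof

# Classical (homoskedastic) variance of the interaction coefficient
se = sqrt(sigma2 * sum(1 / len(ys) for ys in cells.values()))

# R-squared from residual and total sums of squares
grand_mean = sum(outcome) / len(outcome)
sst = sum((y - grand_mean) ** 2 for y in outcome)
r_squared = 1 - ssr / sst

print(f"ATT={att:.4f}, SE={se:.4f}, R-squared={r_squared:.4f}")
```

The treated group rises by 6.0 (10.5 to 16.5) and the control group by 3.0 (9.5 to 12.5), which gives the ATT of 3.0 shown in the summary.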
120
+
121
+ ## For AI Agents
122
+
123
+ If you are an AI agent or LLM using this library, read [`docs/llms.txt`](docs/llms.txt) for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
124
+
125
+ After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
126
+
127
+ Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
128
+
129
+ ## Features
130
+
131
+ - **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
132
+ - **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals
133
+ - **Multiple interfaces**: Column names or R-style formulas
134
+ - **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
135
+ - **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
136
+ - **Panel data support**: Two-way fixed effects estimator for panel designs
137
+ - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
138
+ - **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), Efficient DiD (Chen, Sant'Anna & Xie 2025), and Wooldridge ETWFE (2021/2023) estimators for heterogeneous treatment timing
139
+ - **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
140
+ - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
141
+ - **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
142
+ - **Event study plots**: Publication-ready visualization of treatment effects
143
+ - **Parallel trends testing**: Multiple methods including equivalence tests
144
+ - **Goodman-Bacon decomposition**: Diagnose TWFE bias by decomposing into 2x2 comparisons
145
+ - **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
146
+ - **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
147
+ - **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests
148
+ - **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator
149
+ - **Data prep utilities**: Helper functions for common data preparation tasks
150
+ - **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst))
151
+
152
+ ## Estimator Aliases
153
+
154
+ All estimators have short aliases for convenience:
155
+
156
+ | Alias | Full Name | Method |
157
+ |-------|-----------|--------|
158
+ | `DiD` | `DifferenceInDifferences` | Basic 2x2 DiD |
159
+ | `TWFE` | `TwoWayFixedEffects` | Two-way fixed effects |
160
+ | `EventStudy` | `MultiPeriodDiD` | Event study / multi-period |
161
+ | `CS` | `CallawaySantAnna` | Callaway & Sant'Anna (2021) |
162
+ | `SA` | `SunAbraham` | Sun & Abraham (2021) |
163
+ | `BJS` | `ImputationDiD` | Borusyak, Jaravel & Spiess (2024) |
164
+ | `Gardner` | `TwoStageDiD` | Gardner (2022) two-stage |
165
+ | `SDiD` | `SyntheticDiD` | Synthetic DiD |
166
+ | `DDD` | `TripleDifference` | Triple difference |
167
+ | `CDiD` | `ContinuousDiD` | Continuous treatment DiD |
168
+ | `Stacked` | `StackedDiD` | Stacked DiD |
169
+ | `Bacon` | `BaconDecomposition` | Goodman-Bacon decomposition |
170
+ | `EDiD` | `EfficientDiD` | Efficient DiD |
171
+ | `ETWFE` | `WooldridgeDiD` | Wooldridge ETWFE (2021/2023) |
172
+
173
+ `TROP` already uses its short canonical name and needs no alias.
174
+
175
+ ## Tutorials
176
+
177
+ We provide Jupyter notebook tutorials in `docs/tutorials/`:
178
+
179
+ | Notebook | Description |
180
+ |----------|-------------|
181
+ | `01_basic_did.ipynb` | Basic 2x2 DiD, formula interface, covariates, fixed effects, cluster-robust SE, wild bootstrap |
182
+ | `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna and Sun-Abraham, group-time effects, aggregation methods, Bacon decomposition |
183
+ | `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
184
+ | `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
185
+ | `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
186
+ | `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power |
187
+ | `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves |
188
+ | `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling |
189
+ | `09_real_world_examples.ipynb` | Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws) |
190
+ | `10_trop.ipynb` | Triply Robust Panel (TROP) estimation with factor model adjustment |
191
+ | `11_imputation_did.ipynb` | Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison |
192
+ | `12_two_stage_did.ipynb` | Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects |
193
+ | `13_stacked_did.ipynb` | Stacked DiD (Wing et al. 2024), Q-weights, sub-experiment inspection, trimming, clean control definitions |
194
+ | `15_efficient_did.ipynb` | Efficient DiD (Chen et al. 2025), optimal weighting, PT-All vs PT-Post, efficiency gains, bootstrap inference |
195
+ | `16_survey_did.ipynb` | Survey-aware DiD with complex sampling designs (strata, PSU, FPC, weights), replicate weights, subpopulation analysis, DEFF diagnostics |
196
+
197
+ ## Data Preparation
198
+
199
+ diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats.
200
+
201
+ ### Generate Sample Data
202
+
203
+ Create synthetic data with a known treatment effect for testing and learning:
204
+
205
+ ```python
206
+ from diff_diff import generate_did_data, DifferenceInDifferences
207
+
208
+ # Generate panel data with 100 units, 4 periods, and a treatment effect of 5
209
+ data = generate_did_data(
210
+ n_units=100,
211
+ n_periods=4,
212
+ treatment_effect=5.0,
213
+ treatment_fraction=0.5, # 50% of units are treated
214
+ treatment_period=2, # Treatment starts at period 2
215
+ seed=42
216
+ )
217
+
218
+ # Verify the estimator recovers the treatment effect
219
+ did = DifferenceInDifferences()
220
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
221
+ print(f"Estimated ATT: {results.att:.2f} (true: 5.0)")
222
+ ```
223
+
224
+ ### Create Treatment Indicators
225
+
226
+ Convert categorical variables or numeric thresholds to binary treatment indicators:
227
+
228
+ ```python
229
+ from diff_diff import make_treatment_indicator
230
+
231
+ # From categorical variable
232
+ df = make_treatment_indicator(
233
+ data,
234
+ column='state',
235
+ treated_values=['CA', 'NY', 'TX'] # These states are treated
236
+ )
237
+
238
+ # From numeric threshold (e.g., firms above median size)
239
+ df = make_treatment_indicator(
240
+ data,
241
+ column='firm_size',
242
+ threshold=data['firm_size'].median()
243
+ )
244
+
245
+ # Treat units below threshold
246
+ df = make_treatment_indicator(
247
+ data,
248
+ column='income',
249
+ threshold=50000,
250
+ above_threshold=False # Units with income <= 50000 are treated
251
+ )
252
+ ```
253
+
254
+ ### Create Post-Treatment Indicators
255
+
256
+ Convert time/date columns to binary post-treatment indicators:
257
+
258
+ ```python
259
+ from diff_diff import make_post_indicator
260
+
261
+ # From specific post-treatment periods
262
+ df = make_post_indicator(
263
+ data,
264
+ time_column='year',
265
+ post_periods=[2020, 2021, 2022]
266
+ )
267
+
268
+ # From treatment start date
269
+ df = make_post_indicator(
270
+ data,
271
+ time_column='year',
272
+ treatment_start=2020 # All years >= 2020 are post-treatment
273
+ )
274
+
275
+ # Works with datetime columns
276
+ df = make_post_indicator(
277
+ data,
278
+ time_column='date',
279
+ treatment_start='2020-01-01'
280
+ )
281
+ ```
282
+
283
+ ### Reshape Wide to Long Format
284
+
285
+ Convert wide-format data (one row per unit, multiple time columns) to long format:
286
+
287
+ ```python
288
+ import pandas as pd
+ from diff_diff import wide_to_long
289
+
290
+ # Wide format: columns like sales_2019, sales_2020, sales_2021
291
+ wide_df = pd.DataFrame({
292
+ 'firm_id': [1, 2, 3],
293
+ 'industry': ['tech', 'retail', 'tech'],
294
+ 'sales_2019': [100, 150, 200],
295
+ 'sales_2020': [110, 160, 210],
296
+ 'sales_2021': [120, 170, 220]
297
+ })
298
+
299
+ # Convert to long format for DiD
300
+ long_df = wide_to_long(
301
+ wide_df,
302
+ value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
303
+ id_column='firm_id',
304
+ time_name='year',
305
+ value_name='sales',
306
+ time_values=[2019, 2020, 2021]
307
+ )
308
+ # Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry
309
+ ```
310
+
311
+ ### Balance Panel Data
312
+
313
+ Ensure all units have observations for all time periods:
314
+
315
+ ```python
316
+ from diff_diff import balance_panel
317
+
318
+ # Keep only units with complete data (drop incomplete units)
319
+ balanced = balance_panel(
320
+ data,
321
+ unit_column='firm_id',
322
+ time_column='year',
323
+ method='inner'
324
+ )
325
+
326
+ # Include all unit-period combinations (creates NaN for missing)
327
+ balanced = balance_panel(
328
+ data,
329
+ unit_column='firm_id',
330
+ time_column='year',
331
+ method='outer'
332
+ )
333
+
334
+ # Fill missing values
335
+ balanced = balance_panel(
336
+ data,
337
+ unit_column='firm_id',
338
+ time_column='year',
339
+ method='fill',
340
+ fill_value=0 # Or None for forward/backward fill
341
+ )
342
+ ```
343
+
344
+ ### Validate Data
345
+
346
+ Check that your data meets DiD requirements before fitting:
347
+
348
+ ```python
349
+ from diff_diff import validate_did_data
350
+
351
+ # Validate and get informative error messages
352
+ result = validate_did_data(
353
+ data,
354
+ outcome='sales',
355
+ treatment='treated',
356
+ time='post',
357
+ unit='firm_id', # Optional: for panel-specific validation
358
+ raise_on_error=False # Return dict instead of raising
359
+ )
360
+
361
+ if result['valid']:
362
+     print("Data is ready for DiD analysis!")
363
+     print(f"Summary: {result['summary']}")
364
+ else:
365
+     print("Issues found:")
366
+     for error in result['errors']:
367
+         print(f" - {error}")
368
+
369
+ for warning in result['warnings']:
370
+     print(f"Warning: {warning}")
371
+ ```
372
+
373
+ ### Summarize Data by Groups
374
+
375
+ Get summary statistics for each treatment-time cell:
376
+
377
+ ```python
378
+ from diff_diff import summarize_did_data
379
+
380
+ summary = summarize_did_data(
381
+ data,
382
+ outcome='sales',
383
+ treatment='treated',
384
+ time='post'
385
+ )
386
+ print(summary)
387
+ ```
388
+
389
+ Output:
390
+ ```
391
+ n mean std min max
392
+ Control - Pre 250 100.5000 15.2340 65.0000 145.0000
393
+ Control - Post 250 105.2000 16.1230 68.0000 152.0000
394
+ Treated - Pre 250 101.2000 14.8900 67.0000 143.0000
395
+ Treated - Post 250 115.8000 17.5600 72.0000 165.0000
396
+ DiD Estimate - 9.9000 - - -
397
+ ```
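The `DiD Estimate` row follows directly from the four cell means in the table. A minimal sketch of that arithmetic, using the illustrative values printed above:

```python
# Cell means copied from the summary table above
control_pre, control_post = 100.5, 105.2
treated_pre, treated_post = 101.2, 115.8

treated_change = treated_post - treated_pre  # change in the treated group
control_change = control_post - control_pre  # change in the control group

# Difference-in-differences: treated change net of the control change
did_estimate = treated_change - control_change

print(f"Treated change: {treated_change:.4f}")
print(f"Control change: {control_change:.4f}")
print(f"DiD estimate:   {did_estimate:.4f}")
```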
398
+
399
+ ### Create Event Time for Staggered Designs
400
+
401
+ For designs where treatment occurs at different times:
402
+
403
+ ```python
404
+ from diff_diff import create_event_time
405
+
406
+ # Add event-time column relative to treatment timing
407
+ df = create_event_time(
408
+ data,
409
+ time_column='year',
410
+ treatment_time_column='treatment_year'
411
+ )
412
+ # Result: event_time = -2, -1, 0, 1, 2 relative to treatment
413
+ ```
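Event time is calendar time minus the unit's first treatment period. The sketch below illustrates that definition on hypothetical rows in plain Python. It is not the library's implementation: the row layout, the `event_time` helper, and the use of `None` for never-treated units are assumptions made for illustration.

```python
# Hypothetical rows: (unit, year, treatment_year); None marks a never-treated unit
rows = [
    ("A", 2018, 2020), ("A", 2019, 2020), ("A", 2020, 2020),
    ("A", 2021, 2020), ("A", 2022, 2020),
    ("B", 2019, 2021), ("B", 2021, 2021),
    ("C", 2020, None),
]


def event_time(year, treatment_year):
    """Periods elapsed since first treatment; None if the unit is never treated."""
    if treatment_year is None:
        return None
    return year - treatment_year


event_times = [(unit, year, event_time(year, g)) for unit, year, g in rows]
for row in event_times:
    print(row)
```

Unit A, treated in 2020 and observed from 2018 to 2022, gets event times -2 through 2, matching the range in the comment above.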
414
+
415
+ ### Aggregate to Cohort Means
416
+
417
+ Aggregate unit-level data for visualization:
418
+
419
+ ```python
420
+ from diff_diff import aggregate_to_cohorts
421
+
422
+ cohort_data = aggregate_to_cohorts(
423
+ data,
424
+ unit_column='firm_id',
425
+ time_column='year',
426
+ treatment_column='treated',
427
+ outcome='sales'
428
+ )
429
+ # Result: mean outcome by treatment group and period
430
+ ```
431
+
432
+ ### Rank Control Units
433
+
434
+ Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity:
435
+
436
+ ```python
437
+ from diff_diff import rank_control_units, generate_did_data
438
+
439
+ # Generate sample data
440
+ data = generate_did_data(n_units=50, n_periods=6, seed=42)
441
+
442
+ # Rank control units by their similarity to treated units
443
+ ranking = rank_control_units(
444
+ data,
445
+ unit_column='unit',
446
+ time_column='period',
447
+ outcome_column='outcome',
448
+ treatment_column='treated',
449
+ n_top=10 # Return top 10 controls
450
+ )
451
+
452
+ print(ranking[['unit', 'quality_score', 'pre_trend_rmse']])
453
+ ```
454
+
455
+ Output:
456
+ ```
457
+ unit quality_score pre_trend_rmse
458
+ 0 35 1.0000 0.4521
459
+ 1 42 0.9234 0.5123
460
+ 2 28 0.8876 0.5892
461
+ ...
462
+ ```
463
+
464
+ With covariates for matching:
465
+
466
+ ```python
467
+ # Add covariate-based matching
468
+ ranking = rank_control_units(
469
+ data,
470
+ unit_column='unit',
471
+ time_column='period',
472
+ outcome_column='outcome',
473
+ treatment_column='treated',
474
+ covariates=['size', 'age'], # Match on these too
475
+ outcome_weight=0.7, # 70% weight on outcome trends
476
+ covariate_weight=0.3 # 30% weight on covariate similarity
477
+ )
478
+ ```
479
+
480
+ Filter data for SyntheticDiD using top controls:
481
+
482
+ ```python
483
+ from diff_diff import SyntheticDiD
484
+
485
+ # Get top control units
486
+ top_controls = ranking['unit'].tolist()
487
+
488
+ # Filter data to treated + top controls
489
+ filtered_data = data[
490
+ (data['treated'] == 1) | (data['unit'].isin(top_controls))
491
+ ]
492
+
493
+ # Fit SyntheticDiD with selected controls
494
+ sdid = SyntheticDiD()
495
+ results = sdid.fit(
496
+ filtered_data,
497
+ outcome='outcome',
498
+ treatment='treated',
499
+ unit='unit',
500
+ time='period',
501
+ post_periods=[3, 4, 5]
502
+ )
503
+ ```
504
+
505
+ ## Usage
506
+
507
+ ### Basic DiD with Column Names
508
+
509
+ ```python
510
+ from diff_diff import DifferenceInDifferences
511
+
512
+ did = DifferenceInDifferences(robust=True, alpha=0.05)
513
+ results = did.fit(
514
+ data,
515
+ outcome='sales',
516
+ treatment='treated',
517
+ time='post_policy'
518
+ )
519
+
520
+ # Access results
521
+ print(f"ATT: {results.att:.4f}")
522
+ print(f"Standard Error: {results.se:.4f}")
523
+ print(f"P-value: {results.p_value:.4f}")
524
+ print(f"95% CI: {results.conf_int}")
525
+ print(f"Significant: {results.is_significant}")
526
+ ```
527
+
528
+ ### Using Formula Interface
529
+
530
+ ```python
531
+ # R-style formula syntax
532
+ results = did.fit(data, formula='outcome ~ treated * post')
533
+
534
+ # Explicit interaction syntax
535
+ results = did.fit(data, formula='outcome ~ treated + post + treated:post')
536
+
537
+ # With covariates
538
+ results = did.fit(data, formula='outcome ~ treated * post + age + income')
539
+ ```
540
+
541
+ ### Including Covariates
542
+
543
+ ```python
544
+ results = did.fit(
545
+ data,
546
+ outcome='outcome',
547
+ treatment='treated',
548
+ time='post',
549
+ covariates=['age', 'income', 'education']
550
+ )
551
+ ```
552
+
553
+ ### Fixed Effects
554
+
555
+ Use `fixed_effects` for low-dimensional categorical controls (creates dummy variables):
556
+
557
+ ```python
558
+ # State and industry fixed effects
559
+ results = did.fit(
560
+ data,
561
+ outcome='sales',
562
+ treatment='treated',
563
+ time='post',
564
+ fixed_effects=['state', 'industry']
565
+ )
566
+
567
+ # Access fixed effect coefficients
568
+ state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}
569
+ ```
570
+
571
+ Use `absorb` for high-dimensional fixed effects (more efficient, uses within-transformation):
572
+
573
+ ```python
574
+ # Absorb firm-level fixed effects (efficient for many firms)
575
+ results = did.fit(
576
+ data,
577
+ outcome='sales',
578
+ treatment='treated',
579
+ time='post',
580
+ absorb=['firm_id']
581
+ )
582
+ ```
583
+
584
+ Combine covariates with fixed effects:
585
+
586
+ ```python
587
+ results = did.fit(
588
+ data,
589
+ outcome='sales',
590
+ treatment='treated',
591
+ time='post',
592
+ covariates=['size', 'age'], # Linear controls
593
+ fixed_effects=['industry'], # Low-dimensional FE (dummies)
594
+ absorb=['firm_id'] # High-dimensional FE (absorbed)
595
+ )
596
+ ```
597
+
598
+ ### Cluster-Robust Standard Errors
599
+
600
+ ```python
601
+ did = DifferenceInDifferences(cluster='state')
602
+ results = did.fit(
603
+ data,
604
+ outcome='outcome',
605
+ treatment='treated',
606
+ time='post'
607
+ )
608
+ ```
609
+
610
+ ### Wild Cluster Bootstrap
611
+
612
+ With few clusters (roughly fewer than 50), standard cluster-robust SEs tend to be biased downward, so tests reject too often. Wild cluster bootstrap gives more reliable inference, even with as few as 5-10 clusters.
613
+
614
+ ```python
615
+ # Use wild bootstrap for inference
616
+ did = DifferenceInDifferences(
617
+ cluster='state',
618
+ inference='wild_bootstrap',
619
+ n_bootstrap=999,
620
+ bootstrap_weights='rademacher', # or 'webb' for <10 clusters, 'mammen'
621
+ seed=42
622
+ )
623
+ results = did.fit(data, outcome='y', treatment='treated', time='post')
624
+
625
+ # Results include bootstrap-based SE and p-value
626
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
627
+ print(f"P-value: {results.p_value:.4f}")
628
+ print(f"95% CI: {results.conf_int}")
629
+ print(f"Inference method: {results.inference_method}")
630
+ print(f"Number of clusters: {results.n_clusters}")
631
+ ```
632
+
633
+ **Weight types:**
634
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
635
+ - `'webb'` - 6-point distribution, recommended for <10 clusters
636
+ - `'mammen'` - Two-point distribution, alternative to Rademacher
637
+
638
+ Works with `DifferenceInDifferences` and `TwoWayFixedEffects` estimators.
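The three weight types are discrete distributions with mean 0 and variance 1. The sketch below writes out their textbook definitions and draws one weight per cluster. It is independent of the library, and the names `moment` and `draw_cluster_weights` are hypothetical helpers, not part of the `diff_diff` API.

```python
import random
from math import sqrt

# Each distribution is a list of (value, probability) pairs.
RADEMACHER = [(-1.0, 0.5), (1.0, 0.5)]

# Webb six-point distribution: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2), each with prob. 1/6
WEBB = [(s * sqrt(v), 1 / 6) for v in (0.5, 1.0, 1.5) for s in (-1, 1)]

# Mammen two-point distribution
S5 = sqrt(5)
MAMMEN = [
    (-(S5 - 1) / 2, (S5 + 1) / (2 * S5)),
    ((S5 + 1) / 2, (S5 - 1) / (2 * S5)),
]


def moment(dist, k):
    """Exact k-th raw moment of a discrete distribution."""
    return sum(p * v ** k for v, p in dist)


def draw_cluster_weights(dist, n_clusters, rng):
    """Draw one weight per cluster; all observations in a cluster share it."""
    values = [v for v, _ in dist]
    probs = [p for _, p in dist]
    return rng.choices(values, weights=probs, k=n_clusters)


for name, dist in [("rademacher", RADEMACHER), ("webb", WEBB), ("mammen", MAMMEN)]:
    print(f"{name}: mean={moment(dist, 1):.6f}, variance={moment(dist, 2):.6f}")

rng = random.Random(42)
weights = draw_cluster_weights(WEBB, n_clusters=8, rng=rng)
print(weights)
```

With G clusters, Rademacher weights can produce at most 2^G distinct weight vectors while the six-point distribution can produce 6^G, which is the usual motivation for preferring Webb weights when G is very small.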
639
+
640
+ ### Two-Way Fixed Effects (Panel Data)
641
+
642
+ ```python
643
+ from diff_diff import TwoWayFixedEffects
644
+
645
+ twfe = TwoWayFixedEffects()
646
+ results = twfe.fit(
647
+ panel_data,
648
+ outcome='outcome',
649
+ treatment='treated',
650
+ time='year',
651
+ unit='firm_id'
652
+ )
653
+ ```
654
+
655
+ ### Multi-Period DiD (Event Study)
656
+
657
+ For settings with multiple pre- and post-treatment periods. Estimates treatment × period
658
+ interactions for every period except the reference period (pre and post), so the pre-period coefficients can be used to assess parallel trends:
659
+
660
+ ```python
661
+ from diff_diff import MultiPeriodDiD
662
+
663
+ # Fit full event study with pre and post period effects
664
+ did = MultiPeriodDiD()
665
+ results = did.fit(
666
+ panel_data,
667
+ outcome='sales',
668
+ treatment='treated',
669
+ time='period',
670
+ post_periods=[3, 4, 5], # Periods 3-5 are post-treatment
671
+ reference_period=2, # Last pre-period (e=-1 convention)
672
+ unit='unit_id', # Optional: warns if staggered adoption detected
673
+ )
674
+
675
+ # Pre-period effects test parallel trends (should be ≈ 0)
676
+ for period, effect in results.pre_period_effects.items():
677
+     print(f"Pre {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")
678
+
679
+ # Post-period effects estimate dynamic treatment effects
680
+ for period, effect in results.post_period_effects.items():
681
+     print(f"Post {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")
682
+
683
+ # View average treatment effect across post-periods
684
+ print(f"Average ATT: {results.avg_att:.3f}")
685
+ print(f"Average SE: {results.avg_se:.3f}")
686
+
687
+ # Full summary with pre and post period effects
688
+ results.print_summary()
689
+ ```
690
+
691
+ Output:
692
+ ```
693
+ ================================================================================
694
+ Multi-Period Difference-in-Differences Estimation Results
695
+ ================================================================================
696
+
697
+ Observations: 600
698
+ Pre-treatment periods: 3
699
+ Post-treatment periods: 3
700
+
701
+ --------------------------------------------------------------------------------
702
+ Average Treatment Effect
703
+ --------------------------------------------------------------------------------
704
+ Average ATT 5.2000 0.8234 6.315 0.0000
705
+ --------------------------------------------------------------------------------
706
+ 95% Confidence Interval: [3.5862, 6.8138]
707
+
708
+ Period-Specific Effects:
709
+ --------------------------------------------------------------------------------
710
+ Period Effect Std. Err. t-stat P>|t|
711
+ --------------------------------------------------------------------------------
712
+ 3 4.5000 0.9512 4.731 0.0000***
713
+ 4 5.2000 0.8876 5.858 0.0000***
714
+ 5 5.9000 0.9123 6.468 0.0000***
715
+ --------------------------------------------------------------------------------
716
+
717
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
718
+ ================================================================================
719
+ ```
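The summary rows are related by simple arithmetic. The sketch below checks the illustrative numbers above: the average ATT is the mean of the post-period effects, the t-statistic is the estimate divided by its standard error, and the interval is the estimate plus or minus a critical value times the SE. A normal critical value reproduces the printed interval; whether the library uses normal or t critical values is not stated here, so treat that choice as an assumption.

```python
from statistics import NormalDist

# Illustrative numbers copied from the summary above
period_effects = {3: 4.5, 4: 5.2, 5: 5.9}
avg_att, avg_se = 5.2, 0.8234

# The reported average equals the simple mean of the post-period effects
mean_effect = sum(period_effects.values()) / len(period_effects)

# Normal-approximation 95% interval: estimate +/- z * SE
z = NormalDist().inv_cdf(0.975)
ci = (avg_att - z * avg_se, avg_att + z * avg_se)
t_stat = avg_att / avg_se

print(f"Mean of period effects: {mean_effect:.4f}")
print(f"t-stat: {t_stat:.3f}")
print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
```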
720
+
721
+ ### Staggered Difference-in-Differences (Callaway-Sant'Anna)
722
+
723
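+ When treatment is adopted at different times by different units, traditional TWFE estimators can be biased if treatment effects vary across cohorts or over time, because already-treated units end up serving as controls. The Callaway-Sant'Anna estimator avoids this by estimating group-time effects ATT(g,t) against never-treated or not-yet-treated controls only.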
724
+
725
+ ```python
726
+ from diff_diff import CallawaySantAnna
727
+
728
+ # Panel data with staggered treatment
729
+ # 'first_treat' = period when unit was first treated (0 if never treated)
730
+ cs = CallawaySantAnna()
731
+ results = cs.fit(
732
+ panel_data,
733
+ outcome='sales',
734
+ unit='firm_id',
735
+ time='year',
736
+ first_treat='first_treat', # 0 for never-treated, else first treatment year
737
+ aggregate='event_study' # Compute event study effects
738
+ )
739
+
740
+ # View results
741
+ results.print_summary()
742
+
743
+ # Access group-time effects ATT(g,t)
744
+ for (group, time), effect in results.group_time_effects.items():
745
+     print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")
746
+
747
+ # Event study effects (averaged by relative time)
748
+ for rel_time, effect in results.event_study_effects.items():
749
+     print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
750
+
751
+ # Convert to DataFrame
752
+ df = results.to_dataframe(level='event_study')
753
+ ```
754
+
755
+ Output:
756
+ ```
757
+ =====================================================================================
758
+ Callaway-Sant'Anna Staggered Difference-in-Differences Results
759
+ =====================================================================================
760
+
761
+ Total observations: 600
762
+ Treated units: 35
763
+ Control units: 15
764
+ Treatment cohorts: 3
765
+ Time periods: 8
766
+ Control group: never_treated
767
+
768
+ -------------------------------------------------------------------------------------
769
+ Overall Average Treatment Effect on the Treated
770
+ -------------------------------------------------------------------------------------
771
+ Parameter Estimate Std. Err. t-stat P>|t| Sig.
772
+ -------------------------------------------------------------------------------------
773
+ ATT 2.5000 0.3521 7.101 0.0000 ***
774
+ -------------------------------------------------------------------------------------
775
+
776
+ 95% Confidence Interval: [1.8099, 3.1901]
777
+
778
+ -------------------------------------------------------------------------------------
779
+ Event Study (Dynamic) Effects
780
+ -------------------------------------------------------------------------------------
781
+ Rel. Period Estimate Std. Err. t-stat P>|t| Sig.
782
+ -------------------------------------------------------------------------------------
783
+ 0 2.1000 0.4521 4.645 0.0000 ***
784
+ 1 2.5000 0.4123 6.064 0.0000 ***
785
+ 2 2.8000 0.5234 5.349 0.0000 ***
786
+ -------------------------------------------------------------------------------------
787
+
788
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
789
+ =====================================================================================
790
+ ```
791
+
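The inference columns in the sample output above follow from the estimate and its standard error. A minimal sketch, assuming a normal approximation; the helper name `wald_summary` is hypothetical and not part of `diff_diff`, and the library may use a different reference distribution:

```python
from statistics import NormalDist


def wald_summary(estimate, se, alpha=0.05):
    """Return (t_stat, p_value, ci) under a normal approximation.

    Hypothetical helper for illustration; not part of diff_diff.
    """
    z = NormalDist()
    t_stat = estimate / se
    p_value = 2 * (1 - z.cdf(abs(t_stat)))
    crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return t_stat, p_value, (estimate - crit * se, estimate + crit * se)


# Overall ATT and standard error taken from the sample output
t_stat, p_value, ci = wald_summary(2.5, 0.3521)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, CI = [{ci[0]:.4f}, {ci[1]:.4f}]")
```

The same arithmetic applies row by row to the event study table.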
792
+ **When to use Callaway-Sant'Anna vs TWFE:**
793
+
794
+ | Scenario | Use TWFE | Use Callaway-Sant'Anna |
795
+ |----------|----------|------------------------|
796
+ | All units treated at same time | ✓ | ✓ |
797
+ | Staggered adoption, homogeneous effects | ✓ | ✓ |
798
+ | Staggered adoption, heterogeneous effects | ✗ | ✓ |
799
+ | Need event study with staggered timing | ✗ | ✓ |
800
+ | Fewer than ~20 treated units | ✓ | Depends on design |
801
+
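The building block behind ATT(g,t) is a 2x2 comparison for one cohort and one period. A minimal sketch on made-up numbers, assuming never-treated controls, no covariates, no anticipation, and the last pre-treatment period as the baseline. The function `att_gt` and the toy panel are illustrative only, not the library's implementation:

```python
# Toy balanced panel: outcome[unit][period]; first_treat 0 = never treated.
first_treat = {"a": 3, "b": 3, "c": 0, "d": 0}
outcome = {
    "a": {1: 10.0, 2: 11.0, 3: 14.0, 4: 16.0},
    "b": {1: 12.0, 2: 13.0, 3: 16.0, 4: 18.0},
    "c": {1: 8.0, 2: 9.0, 3: 10.0, 4: 11.0},
    "d": {1: 9.0, 2: 10.0, 3: 11.0, 4: 12.0},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def att_gt(g, t):
    """2x2 DiD for cohort g at period t against never-treated units."""
    base = g - 1  # last pre-treatment period (no anticipation assumed)
    treated = [u for u, ft in first_treat.items() if ft == g]
    control = [u for u, ft in first_treat.items() if ft == 0]
    change_treated = mean(outcome[u][t] - outcome[u][base] for u in treated)
    change_control = mean(outcome[u][t] - outcome[u][base] for u in control)
    return change_treated - change_control


effects = {(3, t): att_gt(3, t) for t in (3, 4)}
print(effects)
```

Because each comparison uses only units that are untreated in both periods as controls, no already-treated unit enters the control group.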
802
+ **Parameters:**
803
+
804
+ ```python
805
+ CallawaySantAnna(
806
+ control_group='never_treated', # or 'not_yet_treated'
807
+ anticipation=0, # Periods before treatment with effects
808
+ estimation_method='dr', # 'dr', 'ipw', or 'reg'
809
+ alpha=0.05, # Significance level
810
+ cluster=None, # Column for cluster SEs
811
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
812
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
813
+ seed=None # Random seed
814
+ )
815
+ ```
816
+
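Event study aggregation averages group-time effects that share the same relative time e = t - g. A minimal sketch on made-up numbers, assuming cohorts are weighted by their size; the exact weighting used by the library is not shown here, and `event_study` is a hypothetical helper:

```python
from collections import defaultdict

# Hypothetical group-time effects ATT(g, t) and cohort sizes (made-up numbers).
att = {(3, 3): 2.0, (3, 4): 3.0, (4, 4): 1.0, (4, 5): 2.0}
cohort_size = {3: 20, 4: 10}


def event_study(att, cohort_size):
    """Average ATT(g, t) by relative time e = t - g, weighting by cohort size."""
    num = defaultdict(float)
    den = defaultdict(float)
    for (g, t), effect in att.items():
        e = t - g
        num[e] += cohort_size[g] * effect
        den[e] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


dynamic = event_study(att, cohort_size)
print(dynamic)
```

Here the larger cohort (g = 3) pulls each relative-time average toward its own effect.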
817
+ **Multiplier bootstrap for inference:**
818
+
819
+ With few clusters or when analytical standard errors may be unreliable, use the multiplier bootstrap for valid inference. This implements the approach from Callaway & Sant'Anna (2021).
820
+
821
+ ```python
822
+ # Bootstrap inference with 999 iterations
823
+ cs = CallawaySantAnna(
824
+ n_bootstrap=999,
825
+ bootstrap_weights='rademacher', # or 'mammen', 'webb'
826
+ seed=42
827
+ )
828
+ results = cs.fit(
829
+ data,
830
+ outcome='sales',
831
+ unit='firm_id',
832
+ time='year',
833
+ first_treat='first_treat',
834
+ aggregate='event_study'
835
+ )
836
+
837
+ # Access bootstrap results
838
+ print(f"Overall ATT: {results.overall_att:.3f}")
839
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
840
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
841
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
842
+
843
+ # Event study bootstrap inference
844
+ for rel_time, se in results.bootstrap_results.event_study_ses.items():
845
+ ci = results.bootstrap_results.event_study_cis[rel_time]
846
+ print(f"e={rel_time}: SE={se:.3f}, 95% CI=[{ci[0]:.3f}, {ci[1]:.3f}]")
847
+ ```
848
+
849
+ **Bootstrap weight types:**
850
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
851
+ - `'mammen'` - Two-point distribution matching first 3 moments
852
+ - `'webb'` - Six-point distribution, recommended for very few clusters (<10)
853
+
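The three weight types can be written down as small discrete distributions. The sketch below uses the standard textbook definitions of the Rademacher, Mammen, and Webb weights and checks their moments; whether `diff_diff` draws them in exactly this way is an assumption, and `draw_weights` is a hypothetical helper:

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# Each entry is a list of (value, probability) pairs.
WEIGHT_DISTRIBUTIONS = {
    "rademacher": [(-1.0, 0.5), (1.0, 0.5)],
    "mammen": [
        (-(SQRT5 - 1) / 2, (SQRT5 + 1) / (2 * SQRT5)),
        ((SQRT5 + 1) / 2, (SQRT5 - 1) / (2 * SQRT5)),
    ],
    "webb": [
        (sign * math.sqrt(v), 1 / 6)
        for v in (1.5, 1.0, 0.5)
        for sign in (-1.0, 1.0)
    ],
}


def moment(dist, k):
    """Exact k-th raw moment of a discrete distribution."""
    return sum(p * value**k for value, p in dist)


def draw_weights(kind, n, rng):
    """Draw n multiplier weights of the given kind."""
    values, probs = zip(*WEIGHT_DISTRIBUTIONS[kind])
    return rng.choices(values, weights=probs, k=n)


for kind, dist in WEIGHT_DISTRIBUTIONS.items():
    print(kind, [round(moment(dist, k), 6) for k in (1, 2, 3)])

weights = draw_weights("webb", 5, random.Random(42))
```

All three have mean 0 and variance 1. Mammen weights additionally have a third moment of 1, and Webb weights take six distinct values, which gives more distinct bootstrap draws when clusters are few.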
854
+ **Covariate adjustment for conditional parallel trends:**
855
+
856
+ When parallel trends only holds conditional on covariates, use the `covariates` parameter:
857
+
858
+ ```python
859
+ # Doubly robust estimation with covariates
860
+ cs = CallawaySantAnna(estimation_method='dr') # 'dr', 'ipw', or 'reg'
861
+ results = cs.fit(
862
+ data,
863
+ outcome='sales',
864
+ unit='firm_id',
865
+ time='year',
866
+ first_treat='first_treat',
867
+ covariates=['size', 'age', 'industry'], # Covariates for conditional PT
868
+ aggregate='event_study'
869
+ )
870
+ ```
871
+
872
+ ### Sun-Abraham Interaction-Weighted Estimator
873
+
874
+ The Sun-Abraham (2021) estimator provides an alternative to Callaway-Sant'Anna using an interaction-weighted (IW) regression approach. Running both estimators is a useful robustness check: when they agree, the results are more credible.
875
+
876
+ ```python
877
+ from diff_diff import SunAbraham
878
+
879
+ # Basic usage
880
+ sa = SunAbraham()
881
+ results = sa.fit(
882
+ panel_data,
883
+ outcome='sales',
884
+ unit='firm_id',
885
+ time='year',
886
+ first_treat='first_treat' # 0 for never-treated, else first treatment year
887
+ )
888
+
889
+ # View results
890
+ results.print_summary()
891
+
892
+ # Event study effects (by relative time to treatment)
893
+ for rel_time, effect in results.event_study_effects.items():
894
+ print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
895
+
896
+ # Overall ATT
897
+ print(f"Overall ATT: {results.overall_att:.3f} (SE: {results.overall_se:.3f})")
898
+
899
+ # Cohort weights (how each cohort contributes to each event-time estimate)
900
+ for rel_time, weights in results.cohort_weights.items():
901
+ print(f"e={rel_time}: {weights}")
902
+ ```
903
+
904
+ **Parameters:**
905
+
906
+ ```python
907
+ SunAbraham(
908
+ control_group='never_treated', # or 'not_yet_treated'
909
+ anticipation=0, # Periods before treatment with effects
910
+ alpha=0.05, # Significance level
911
+ cluster=None, # Column for cluster SEs
912
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
913
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
914
+ seed=None # Random seed
915
+ )
916
+ ```
917
+
918
+ **Bootstrap inference:**
919
+
920
+ ```python
921
+ # Bootstrap inference with 999 iterations
922
+ sa = SunAbraham(
923
+ n_bootstrap=999,
924
+ bootstrap_weights='rademacher',
925
+ seed=42
926
+ )
927
+ results = sa.fit(
928
+ data,
929
+ outcome='sales',
930
+ unit='firm_id',
931
+ time='year',
932
+ first_treat='first_treat'
933
+ )
934
+
935
+ # Access bootstrap results
936
+ print(f"Overall ATT: {results.overall_att:.3f}")
937
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
938
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
939
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
940
+ ```
941
+
942
+ **When to use Sun-Abraham vs Callaway-Sant'Anna:**
943
+
944
+ | Aspect | Sun-Abraham | Callaway-Sant'Anna |
945
+ |--------|-------------|-------------------|
946
+ | Approach | Interaction-weighted regression | 2x2 DiD aggregation |
947
+ | Efficiency | More efficient under homogeneous effects | More robust to heterogeneity |
948
+ | Weighting | Weights by cohort share at each relative time | Weights by sample size |
949
+ | Use case | Robustness check, regression-based inference | Primary staggered DiD estimator |
950
+
951
+ **Both estimators should give similar results when:**
952
+ - Treatment effects are relatively homogeneous across cohorts
953
+ - Parallel trends holds
954
+
955
+ **Running both as robustness check:**
956
+
957
+ ```python
958
+ from diff_diff import CallawaySantAnna, SunAbraham
959
+
960
+ # Callaway-Sant'Anna
961
+ cs = CallawaySantAnna()
962
+ cs_results = cs.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
963
+
964
+ # Sun-Abraham
965
+ sa = SunAbraham()
966
+ sa_results = sa.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
967
+
968
+ # Compare
969
+ print(f"Callaway-Sant'Anna ATT: {cs_results.overall_att:.3f}")
970
+ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
971
+
972
+ # If results differ substantially, investigate heterogeneity
973
+ ```
974
+
975
+ ### Borusyak-Jaravel-Spiess Imputation Estimator
976
+
977
+ The Borusyak et al. (2024) imputation estimator is the **efficient** estimator for staggered DiD under parallel trends. Under homogeneous treatment effects, it produces ~50% shorter confidence intervals than Callaway-Sant'Anna and 2-3.5x shorter intervals than Sun-Abraham.
978
+
979
+ ```python
980
+ from diff_diff import ImputationDiD, imputation_did
981
+
982
+ # Basic usage
983
+ est = ImputationDiD()
984
+ results = est.fit(data, outcome='outcome', unit='unit',
985
+ time='period', first_treat='first_treat')
986
+ results.print_summary()
987
+
988
+ # Event study
989
+ results = est.fit(data, outcome='outcome', unit='unit',
990
+ time='period', first_treat='first_treat',
991
+ aggregate='event_study')
992
+
993
+ # Pre-trend test (Equation 9)
994
+ pt = results.pretrend_test(n_leads=3)
995
+ print(f"F-stat: {pt['f_stat']:.3f}, p-value: {pt['p_value']:.4f}")
996
+
997
+ # Convenience function
998
+ results = imputation_did(data, 'outcome', 'unit', 'period', 'first_treat',
999
+ aggregate='all')
1000
+ ```
1001
+
+ **Parameters:**
+
1002
+ ```python
1003
+ ImputationDiD(
1004
+ anticipation=0, # Number of anticipation periods
1005
+ alpha=0.05, # Significance level
1006
+ cluster=None, # Cluster variable (defaults to unit)
1007
+ n_bootstrap=0, # Bootstrap iterations (0=analytical inference)
1008
+ seed=None, # Random seed
1009
+ horizon_max=None, # Max event-study horizon
1010
+ aux_partition="cohort_horizon", # Variance partition: "cohort_horizon", "cohort", "horizon"
1011
+ )
1012
+ ```
1013
+
1014
+ **When to use Imputation DiD vs Callaway-Sant'Anna:**
1015
+
1016
+ | Aspect | Imputation DiD | Callaway-Sant'Anna |
1017
+ |--------|---------------|-------------------|
1018
+ | Efficiency | Most efficient under homogeneous effects | Less efficient but more robust to heterogeneity |
1019
+ | Control group | Always uses all untreated obs | Choice of never-treated or not-yet-treated |
1020
+ | Inference | Conservative variance (Theorem 3) | Multiplier bootstrap |
1021
+ | Pre-trends | Built-in F-test (Equation 9) | Separate testing |
1022
+
1023
+ ### Two-Stage DiD (Gardner 2022)
1024
+
1025
+ Two-Stage DiD addresses TWFE bias in staggered adoption designs by estimating unit and time fixed effects on untreated observations only, then regressing the residualized outcomes on treatment indicators. Point estimates match the Imputation DiD estimator (Borusyak et al. 2024); the key difference is that Two-Stage DiD uses a GMM sandwich variance estimator that accounts for first-stage estimation error, while Imputation DiD uses a conservative variance (Theorem 3).
1026
+
1027
+ ```python
1028
+ from diff_diff import TwoStageDiD
1029
+
1030
+ # Basic usage
1031
+ est = TwoStageDiD()
1032
+ results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')
1033
+ results.print_summary()
1034
+ ```
1035
+
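The two-stage logic described above can be sketched without any library code. This is a simplified illustration on a noise-free toy panel, not the `TwoStageDiD` implementation: it fits the fixed effects by alternating means, reports only a pooled ATT, and computes no standard errors. All names and numbers are made up:

```python
# Noise-free toy panel: Y = unit effect + period effect + 2.0 * treated.
unit_effect = {"u1": 1.0, "u2": 2.0, "u3": 3.0, "u4": 4.0}
period_effect = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}
first_treat = {"u1": 0, "u2": 0, "u3": 3, "u4": 4}  # 0 = never treated
TRUE_EFFECT = 2.0


def is_treated(unit, period):
    return first_treat[unit] != 0 and period >= first_treat[unit]


y = {
    (u, t): unit_effect[u] + period_effect[t] + TRUE_EFFECT * is_treated(u, t)
    for u in unit_effect
    for t in period_effect
}

# Stage 1: fit unit and period fixed effects on untreated observations only,
# alternating between the two sets of means until they settle.
untreated = [key for key in y if not is_treated(*key)]
alpha = {u: 0.0 for u in unit_effect}
beta = {t: 0.0 for t in period_effect}
for _ in range(500):
    for u in alpha:
        cells = [(i, s) for (i, s) in untreated if i == u]
        alpha[u] = sum(y[c] - beta[c[1]] for c in cells) / len(cells)
    for t in beta:
        cells = [(i, s) for (i, s) in untreated if s == t]
        beta[t] = sum(y[c] - alpha[c[0]] for c in cells) / len(cells)

# Stage 2: residualize the treated observations and average the residuals.
residuals = [y[(u, t)] - alpha[u] - beta[t] for (u, t) in y if is_treated(u, t)]
att = sum(residuals) / len(residuals)
print(f"ATT: {att:.4f}")
```

Averaging the treated residuals is what a no-intercept regression of the residualized outcome on a single treatment dummy would return, which is why the imputation and two-stage views give the same point estimate in this toy case.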
1036
+ **Event study:**
1037
+
1038
+ ```python
1039
+ from diff_diff import plot_event_study
+
+ # Event study aggregation with visualization
1040
+ results = est.fit(data, outcome='outcome', unit='unit', time='period',
1041
+ first_treat='first_treat', aggregate='event_study')
1042
+ plot_event_study(results)
1043
+ ```
1044
+
1045
+ **Parameters:**
1046
+
1047
+ ```python
1048
+ TwoStageDiD(
1049
+ anticipation=0, # Periods of anticipation effects
1050
+ alpha=0.05, # Significance level for CIs
1051
+ cluster=None, # Column for cluster-robust SEs (defaults to unit)
1052
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical GMM SEs)
1053
+ seed=None, # Random seed
1054
+ rank_deficient_action='warn', # 'warn', 'error', or 'silent'
1055
+ horizon_max=None, # Max event-study horizon
1056
+ )
1057
+ ```
1058
+
1059
+ **When to use Two-Stage DiD vs Imputation DiD:**
1060
+
1061
+ | Aspect | Two-Stage DiD | Imputation DiD |
1062
+ |--------|--------------|---------------|
1063
+ | Point estimates | Identical | Identical |
1064
+ | Variance | GMM sandwich (accounts for first-stage error) | Conservative (Theorem 3, may overcover) |
1065
+ | Intuition | Residualize then regress | Impute counterfactuals then aggregate |
1066
+ | Reference impl. | R `did2s` package | R `didimputation` package |
1067
+
1068
+ Both estimators are efficient under homogeneous treatment effects and produce shorter confidence intervals than Callaway-Sant'Anna or Sun-Abraham.
1069
+
1070
+ ### Stacked DiD (Wing, Freedman & Hollingsworth 2024)
1071
+
1072
+ Stacked DiD addresses TWFE bias in staggered adoption settings by constructing a "clean" comparison dataset for each treatment cohort and stacking them together. Each cohort's sub-experiment compares units treated at that cohort's timing against units that are not yet treated (or never treated) within a symmetric event-study window. This avoids the "bad comparisons" problem in TWFE while retaining a regression-based framework that practitioners familiar with event studies will find intuitive.
1073
+
1074
+ ```python
1075
+ from diff_diff import StackedDiD, generate_staggered_data
1076
+
1077
+ # Generate sample data
1078
+ data = generate_staggered_data(n_units=200, n_periods=12,
1079
+ cohort_periods=[4, 6, 8], seed=42)
1080
+
1081
+ # Fit stacked DiD with event study
1082
+ est = StackedDiD(kappa_pre=2, kappa_post=2)
1083
+ results = est.fit(data, outcome='outcome', unit='unit',
1084
+ time='period', first_treat='first_treat',
1085
+ aggregate='event_study')
1086
+ results.print_summary()
1087
+
1088
+ # Access stacked data for custom analysis
1089
+ stacked = results.stacked_data
1090
+
1091
+ # Convenience function
1092
+ from diff_diff import stacked_did
1093
+ results = stacked_did(data, 'outcome', 'unit', 'period', 'first_treat',
1094
+ kappa_pre=2, kappa_post=2, aggregate='event_study')
1095
+ ```
1096
+
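To make the stacking step concrete, here is a simplified sketch of how one sub-experiment per cohort could be assembled. It assumes clean controls are units that are never treated or first treated after the event window, and it skips the trimming of cohorts whose window falls outside the observed periods. `build_stack` and its column names are hypothetical, not the structure of `results.stacked_data`:

```python
def build_stack(rows, kappa_pre, kappa_post):
    """Stack one sub-experiment per cohort (simplified illustration).

    rows: list of dicts with keys 'unit', 'period', 'first_treat'
    (first_treat == 0 means never treated).
    """
    cohorts = sorted({r["first_treat"] for r in rows if r["first_treat"] != 0})
    stacked = []
    for a in cohorts:
        lo, hi = a - kappa_pre, a + kappa_post
        for r in rows:
            if not (lo <= r["period"] <= hi):
                continue  # outside this cohort's event window
            ft = r["first_treat"]
            is_treated = ft == a
            # Clean control: never treated, or first treated after the window.
            is_clean_control = ft == 0 or ft > hi
            if is_treated or is_clean_control:
                stacked.append({**r, "sub_exp": a, "treated": int(is_treated),
                                "event_time": r["period"] - a})
    return stacked


rows = [
    {"unit": u, "period": t, "first_treat": ft}
    for u, ft in {"u1": 4, "u2": 6, "u3": 0}.items()
    for t in range(1, 9)
]
stacked = build_stack(rows, kappa_pre=1, kappa_post=1)
print(len(stacked), "stacked rows")
```

In this toy panel, `u2` serves as a clean control in the cohort-4 sub-experiment and as a treated unit in the cohort-6 sub-experiment, while `u1` is dropped from the latter because it is already treated.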
1097
+ **Parameters:**
1098
+
1099
+ ```python
1100
+ StackedDiD(
1101
+ kappa_pre=1, # Pre-treatment event-study periods
1102
+ kappa_post=1, # Post-treatment event-study periods
1103
+ weighting='aggregate', # 'aggregate', 'population', or 'sample_share'
1104
+ clean_control='not_yet_treated', # 'not_yet_treated', 'strict', or 'never_treated'
1105
+ cluster='unit', # 'unit' or 'unit_subexp'
1106
+ alpha=0.05, # Significance level
1107
+ anticipation=0, # Anticipation periods
1108
+ rank_deficient_action='warn', # 'warn', 'error', or 'silent'
1109
+ )
1110
+ ```
1111
+
1112
+ > **Note:** Group aggregation (`aggregate='group'`) is not supported because the pooled
1113
+ > stacked regression cannot produce cohort-specific effects. Use `CallawaySantAnna` or
1114
+ > `ImputationDiD` for cohort-level estimates.
1115
+
1116
+ **When to use Stacked DiD vs Callaway-Sant'Anna:**
1117
+
1118
+ | Aspect | Stacked DiD | Callaway-Sant'Anna |
1119
+ |--------|-------------|-------------------|
1120
+ | Approach | Stack cohort sub-experiments, run pooled TWFE | 2x2 DiD aggregation |
1121
+ | Symmetric windows | Enforced via kappa_pre / kappa_post | Not required |
1122
+ | Control group | Not-yet-treated (default) or never-treated | Never-treated or not-yet-treated |
1123
+ | Covariates | Passed to pooled regression | Doubly robust / IPW |
1124
+ | Intuition | Familiar event-study regression | Nonparametric aggregation |
1125
+
1126
+ **Convenience function:**
1127
+
1128
+ ```python
1129
+ # One-liner estimation
1130
+ results = stacked_did(
1131
+ data,
1132
+ outcome='outcome',
1133
+ unit='unit',
1134
+ time='period',
1135
+ first_treat='first_treat',
1136
+ kappa_pre=3,
1137
+ kappa_post=3,
1138
+ aggregate='event_study'
1139
+ )
1140
+ ```
1141
+
1142
+ ### Efficient DiD (Chen, Sant'Anna & Xie 2025)
1143
+
1144
+ Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*, producing tighter confidence intervals than standard estimators like Callaway-Sant'Anna when the stronger PT-All assumption holds.
1145
+
1146
+ ```python
1147
+ from diff_diff import EfficientDiD, generate_staggered_data
1148
+
1149
+ # Generate sample data
1150
+ data = generate_staggered_data(n_units=300, n_periods=10,
1151
+ cohort_periods=[4, 6, 8], seed=42)
1152
+
1153
+ # Fit with PT-All (overidentified, tighter SEs)
1154
+ edid = EfficientDiD(pt_assumption="all")
1155
+ results = edid.fit(data, outcome='outcome', unit='unit',
1156
+ time='period', first_treat='first_treat',
1157
+ aggregate='all')
1158
+ results.print_summary()
1159
+
1160
+ # PT-Post mode (matches CS for post-treatment effects)
1161
+ edid_post = EfficientDiD(pt_assumption="post")
1162
+ results_post = edid_post.fit(data, outcome='outcome', unit='unit',
1163
+ time='period', first_treat='first_treat')
1164
+ ```
1165
+
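The intuition behind inverse-covariance weighting can be shown with two estimates of the same quantity. The sketch below applies the generic minimum-variance combination w = Ω⁻¹1 / (1'Ω⁻¹1) to made-up numbers. It illustrates the idea only; how `EfficientDiD` builds Omega* from the data is not shown, and `efficient_weights_2x2` is a hypothetical helper:

```python
def efficient_weights_2x2(omega):
    """Minimum-variance weights for two unbiased estimators.

    omega is their 2x2 covariance matrix; returns (weights, combined variance).
    """
    (a, b), (c, d) = omega
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    row_sums = [sum(row) for row in inv]  # Omega^{-1} 1
    total = sum(row_sums)                 # 1' Omega^{-1} 1
    weights = [s / total for s in row_sums]
    return weights, 1.0 / total


# Two unbiased estimates of the same ATT and their covariance (made-up values).
estimates = [2.4, 2.8]
omega = [[0.04, 0.01], [0.01, 0.09]]
weights, combined_var = efficient_weights_2x2(omega)
combined = sum(w * e for w, e in zip(weights, estimates))
print(weights, combined, combined_var)
```

The combined variance is smaller than either input variance, which is the source of the tighter intervals when several valid comparisons are available.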
1166
+ **Parameters:**
1167
+
1168
+ ```python
1169
+ EfficientDiD(
1170
+ pt_assumption='all', # 'all' (overidentified) or 'post' (matches CS post-treatment ATT)
1171
+ alpha=0.05, # Significance level
1172
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical only)
1173
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
1174
+ seed=None, # Random seed
1175
+ anticipation=0, # Anticipation periods
1176
+ )
1177
+ ```
1178
+
1179
+ > **Note:** Phase 1 supports the no-covariates path only. Use `CallawaySantAnna` with
1180
+ > `estimation_method='dr'` if you need covariate adjustment.
1181
+
1182
+ **When to use Efficient DiD vs Callaway-Sant'Anna:**
1183
+
1184
+ | Aspect | Efficient DiD | Callaway-Sant'Anna |
1185
+ |--------|--------------|-------------------|
1186
+ | Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation |
1187
+ | PT assumption | PT-All (stronger) or PT-Post | Conditional PT |
1188
+ | Efficiency | Achieves semiparametric bound | Not efficient |
1189
+ | Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) |
1190
+ | When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT |
1191
+
1192
+ ### Triple Difference (DDD)
1193
+
1194
+ Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
1195
+
1196
+ ```python
1197
+ from diff_diff import TripleDifference, triple_difference
1198
+
1199
+ # Basic usage
1200
+ ddd = TripleDifference(estimation_method='dr') # doubly robust (recommended)
1201
+ results = ddd.fit(
1202
+ data,
1203
+ outcome='wages',
1204
+ group='policy_state', # 1=state enacted policy, 0=control state
1205
+ partition='female', # 1=women (affected by policy), 0=men
1206
+ time='post' # 1=post-policy, 0=pre-policy
1207
+ )
1208
+
1209
+ # View results
1210
+ results.print_summary()
1211
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1212
+
1213
+ # With covariates (properly incorporated, unlike naive DDD)
1214
+ results = ddd.fit(
1215
+ data,
1216
+ outcome='wages',
1217
+ group='policy_state',
1218
+ partition='female',
1219
+ time='post',
1220
+ covariates=['age', 'education', 'experience']
1221
+ )
1222
+ ```
1223
+
1224
+ **Estimation methods:**
1225
+
1226
+ | Method | Description | When to use |
1227
+ |--------|-------------|-------------|
1228
+ | `"dr"` | Doubly robust | Recommended. Consistent if either outcome or propensity model is correct |
1229
+ | `"reg"` | Regression adjustment | Simple outcome regression with full interactions |
1230
+ | `"ipw"` | Inverse probability weighting | When propensity score model is well-specified |
1231
+
1232
+ ```python
1233
+ # Compare estimation methods
1234
+ for method in ['reg', 'ipw', 'dr']:
1235
+ est = TripleDifference(estimation_method=method)
1236
+ res = est.fit(data, outcome='y', group='g', partition='p', time='t')
1237
+ print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})")
1238
+ ```
1239
+
1240
+ **Convenience function:**
1241
+
1242
+ ```python
1243
+ # One-liner estimation
1244
+ results = triple_difference(
1245
+ data,
1246
+ outcome='wages',
1247
+ group='policy_state',
1248
+ partition='female',
1249
+ time='post',
1250
+ covariates=['age', 'education'],
1251
+ estimation_method='dr'
1252
+ )
1253
+ ```
1254
+
1255
+ **Why use DDD instead of DiD?**
1256
+
1257
+ DDD allows for violations of parallel trends that are:
1258
+ - Group-specific (e.g., economic shocks in treatment states)
1259
+ - Partition-specific (e.g., trends affecting women everywhere)
1260
+
1261
+ As long as these biases are additive, DDD differences them out. The key assumption is that the *differential* trend between eligible and ineligible units would be the same across groups.
1262
+
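A small worked example shows how additive biases cancel. The cell means below are made up, with a group-specific shock, a partition-specific trend, and a true effect of 3.0; this sketch uses raw cell means only and does not reflect the covariate-adjusted estimators in `TripleDifference`:

```python
# Cell means indexed by (group, partition, post); all numbers are made up.
TRUE_EFFECT = 3.0


def cell_mean(group, partition, post):
    base = 10.0 + 2.0 * group + 1.0 * partition
    common_trend = 0.5 * post
    group_shock = 1.5 * group * post          # hits treated states only
    partition_trend = 0.8 * partition * post  # affects women everywhere
    effect = TRUE_EFFECT * group * partition * post
    return base + common_trend + group_shock + partition_trend + effect


def change(group, partition):
    """Post-minus-pre change in the cell mean."""
    return cell_mean(group, partition, 1) - cell_mean(group, partition, 0)


# DiD among eligible units only: contaminated by the group-specific shock.
did_eligible = change(1, 1) - change(0, 1)
# DDD: subtract the same DiD computed among ineligible units.
ddd = did_eligible - (change(1, 0) - change(0, 0))
print(f"DiD (eligible only): {did_eligible:.2f}, DDD: {ddd:.2f}")
```

The eligible-only DiD absorbs the state-level shock into its estimate, while the triple difference removes it because the shock also shows up among ineligible units in the same states.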
1263
+ ### Event Study Visualization
1264
+
1265
+ Create publication-ready event study plots:
1266
+
1267
+ ```python
1268
+ import pandas as pd
+ from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna, SunAbraham
1269
+
1270
+ # From MultiPeriodDiD (full event study with pre and post period effects)
1271
+ did = MultiPeriodDiD()
1272
+ results = did.fit(data, outcome='y', treatment='treated',
1273
+ time='period', post_periods=[3, 4, 5], reference_period=2)
1274
+ plot_event_study(results, title="Treatment Effects Over Time")
1275
+
1276
+ # From CallawaySantAnna (with event study aggregation)
1277
+ cs = CallawaySantAnna()
1278
+ results = cs.fit(data, outcome='y', unit='unit', time='period',
1279
+ first_treat='first_treat', aggregate='event_study')
1280
+ plot_event_study(results, title="Staggered DiD Event Study (CS)")
1281
+
1282
+ # From SunAbraham
1283
+ sa = SunAbraham()
1284
+ results = sa.fit(data, outcome='y', unit='unit', time='period',
1285
+ first_treat='first_treat')
1286
+ plot_event_study(results, title="Staggered DiD Event Study (SA)")
1287
+
1288
+ # From a DataFrame
1289
+ df = pd.DataFrame({
1290
+ 'period': [-2, -1, 0, 1, 2],
1291
+ 'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
1292
+ 'se': [0.3, 0.25, 0.0, 0.4, 0.45]
1293
+ })
1294
+ plot_event_study(df, reference_period=0)
1295
+
1296
+ # With customization
1297
+ ax = plot_event_study(
1298
+ results,
1299
+ title="Dynamic Treatment Effects",
1300
+ xlabel="Years Relative to Treatment",
1301
+ ylabel="Effect on Sales ($1000s)",
1302
+ color="#2563eb",
1303
+ marker="o",
1304
+ shade_pre=True, # Shade pre-treatment region
1305
+ show_zero_line=True, # Horizontal line at y=0
1306
+ show_reference_line=True, # Vertical line at reference period
1307
+ figsize=(10, 6),
1308
+ show=False # Don't call plt.show(), return axes
1309
+ )
1310
+ ```
1311
+
1312
+ ### Synthetic Difference-in-Differences
1313
+
1314
+ Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.
1315
+
1316
+ ```python
1317
+ from diff_diff import SyntheticDiD
1318
+
1319
+ # Fit Synthetic DiD model
1320
+ sdid = SyntheticDiD()
1321
+ results = sdid.fit(
1322
+ panel_data,
1323
+ outcome='gdp_growth',
1324
+ treatment='treated',
1325
+ unit='state',
1326
+ time='year',
1327
+ post_periods=[2015, 2016, 2017, 2018]
1328
+ )
1329
+
1330
+ # View results
1331
+ results.print_summary()
1332
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1333
+
1334
+ # Examine unit weights (which control units matter most)
1335
+ weights_df = results.get_unit_weights_df()
1336
+ print(weights_df.head(10))
1337
+
1338
+ # Examine time weights
1339
+ time_weights_df = results.get_time_weights_df()
1340
+ print(time_weights_df)
1341
+ ```
1342
+
1343
+ Output:
1344
+ ```
1345
+ ===========================================================================
1346
+ Synthetic Difference-in-Differences Estimation Results
1347
+ ===========================================================================
1348
+
1349
+ Observations: 500
1350
+ Treated units: 1
1351
+ Control units: 49
1352
+ Pre-treatment periods: 6
1353
+ Post-treatment periods: 4
1354
+ Regularization (lambda): 0.0000
1355
+ Pre-treatment fit (RMSE): 0.1234
1356
+
1357
+ ---------------------------------------------------------------------------
1358
+ Parameter Estimate Std. Err. t-stat P>|t|
1359
+ ---------------------------------------------------------------------------
1360
+ ATT 2.5000 0.4521 5.530 0.0000
1361
+ ---------------------------------------------------------------------------
1362
+
1363
+ 95% Confidence Interval: [1.6139, 3.3861]
1364
+
1365
+ ---------------------------------------------------------------------------
1366
+ Top Unit Weights (Synthetic Control)
1367
+ ---------------------------------------------------------------------------
1368
+ Unit state_12: 0.3521
1369
+ Unit state_5: 0.2156
1370
+ Unit state_23: 0.1834
1371
+ Unit state_8: 0.1245
1372
+ Unit state_31: 0.0892
1373
+ (8 units with weight > 0.001)
1374
+
1375
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
1376
+ ===========================================================================
1377
+ ```
1378
+
1379
+ #### When to Use Synthetic DiD Over Vanilla DiD
1380
+
1381
+ Use Synthetic DiD instead of standard DiD when:
1382
+
1383
+ 1. **Few treated units**: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls.
1384
+
1385
+ ```python
1386
+ # Example: California passed a policy, want to estimate its effect
1387
+ # Standard DiD would compare CA to the average of all other states
1388
+ # Synthetic DiD finds states that together best match CA's pre-treatment trend
1389
+ ```
1390
+
1391
+ 2. **Parallel trends is questionable**: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory.
1392
+
1393
+ ```python
1394
+ # Example: A tech hub city vs rural areas
1395
+ # Rural areas may not be a good comparison on average
1396
+ # Synthetic DiD can weight urban/suburban controls more heavily
1397
+ ```
1398
+
1399
+ 3. **Heterogeneous control units**: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal.
1400
+
1401
+ ```python
1402
+ # Example: Comparing a treated developing country to other countries
1403
+ # Some control countries may be much more similar economically
1404
+ # Synthetic DiD upweights the most comparable controls
1405
+ ```
1406
+
1407
+ 4. **You want transparency**: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison.
1408
+
1409
+ ```python
1410
+ # See exactly which units are driving the counterfactual
1411
+ print(results.get_unit_weights_df())
1412
+ ```
1413
+
1414
+ **Key differences from standard DiD:**
1415
+
1416
+ | Aspect | Standard DiD | Synthetic DiD |
1417
+ |--------|--------------|---------------|
1418
+ | Control weighting | Equal (1/N) | Optimized to match pre-treatment |
1419
+ | Time weighting | Equal across periods | Can emphasize informative periods |
1420
+ | N treated required | Can be many | Works with 1 treated unit |
1421
+ | Parallel trends | Assumed | Partially relaxed via matching |
1422
+ | Interpretability | Simple average | Explicit weights |
1423
+
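The role of the weights in the table above can be seen in a weighted double difference. In this sketch the unit and time weights are supplied by hand rather than optimized, so it shows only how weights enter the estimate, not how `SyntheticDiD` chooses them. `weighted_did` and the data are hypothetical:

```python
def weighted_did(y_treated, y_controls, unit_weights, time_weights, n_pre):
    """Weighted double difference with given unit and time weights.

    y_treated: outcomes of the (average) treated unit, one per period.
    y_controls: dict unit -> outcomes per period.
    unit_weights: dict unit -> weight (non-negative, sums to 1).
    time_weights: weights on the n_pre pre-treatment periods (sum to 1).
    """
    def gap(series):
        post = series[n_pre:]
        pre = sum(w * v for w, v in zip(time_weights, series[:n_pre]))
        return sum(post) / len(post) - pre

    control_gap = sum(unit_weights[u] * gap(y) for u, y in y_controls.items())
    return gap(y_treated) - control_gap


# Treated unit trends upward by 1 per period, then jumps by 2 after period 3.
y_treated = [1.0, 2.0, 3.0, 6.0, 7.0]
y_controls = {"a": [0.0, 1.0, 2.0, 3.0, 4.0], "b": [5.0, 5.0, 5.0, 5.0, 5.0]}
equal_time = [1 / 3, 1 / 3, 1 / 3]

did = weighted_did(y_treated, y_controls, {"a": 0.5, "b": 0.5}, equal_time, 3)
sdid_like = weighted_did(y_treated, y_controls, {"a": 1.0, "b": 0.0}, equal_time, 3)
print(f"Equal weights: {did:.2f}, weight on matching control: {sdid_like:.2f}")
```

With equal weights the flat control `b` distorts the comparison; putting the weight on the control whose pre-treatment trend matches the treated unit recovers the jump of 2.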
1424
+ **Parameters:**
1425
+
1426
+ ```python
1427
+ SyntheticDiD(
1428
+ zeta_omega=None, # Unit weight regularization (None = auto-computed from data)
1429
+ zeta_lambda=None, # Time weight regularization (None = auto-computed from data)
1430
+ alpha=0.05, # Significance level
1431
+ variance_method="placebo", # "placebo" (default, matches R) or "bootstrap"
1432
+ n_bootstrap=200, # Replications for SE estimation
1433
+ seed=None # Random seed for reproducibility
1434
+ )
1435
+ ```
1436
+
1437
+ ### Triply Robust Panel (TROP)
1438
+
1439
+ TROP (Athey, Imbens, Qu & Viviano 2025) extends Synthetic DiD by adding interactive fixed effects (factor model) adjustment. It's particularly useful when there are unobserved time-varying confounders with a factor structure that could bias standard DiD or SDID estimates.
1440
+
1441
+ TROP combines three robustness components:
1442
+ 1. **Nuclear norm regularized factor model**: Estimates interactive fixed effects L_it via soft-thresholding
1443
+ 2. **Exponential distance-based unit weights**: ω_j = exp(-λ_unit × distance(j,i))
1444
+ 3. **Exponential time decay weights**: θ_s = exp(-λ_time × |s-t|)
1445
+
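Soft-thresholding, used in the first component, shrinks the singular values of a matrix toward zero and sets the small ones exactly to zero. A minimal sketch on a made-up spectrum; the singular value decomposition itself is omitted, and how λ_nn maps to the threshold inside `TROP` is an assumption:

```python
def soft_threshold(singular_values, lam):
    """Shrink each singular value toward zero by lam, truncating at zero."""
    return [max(s - lam, 0.0) for s in singular_values]


singular_values = [5.0, 2.0, 0.3, 0.05]  # made-up spectrum
shrunk = soft_threshold(singular_values, lam=0.5)
rank_after = sum(1 for s in shrunk if s > 0)
print(shrunk, rank_after)
```

A larger threshold zeroes out more singular values, which is how the nuclear norm penalty encourages a low-rank factor component.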
1446
+ Tuning parameters are selected via leave-one-out cross-validation (LOOCV).
1447
+
1448
+ ```python
1449
+ from diff_diff import TROP, trop
1450
+
1451
+ # Fit TROP model with automatic tuning via LOOCV
1452
+ trop_est = TROP(
1453
+ lambda_time_grid=[0.0, 0.5, 1.0, 2.0], # Time decay grid
1454
+ lambda_unit_grid=[0.0, 0.5, 1.0, 2.0], # Unit distance grid
1455
+ lambda_nn_grid=[0.0, 0.1, 1.0], # Nuclear norm grid
1456
+ n_bootstrap=200
1457
+ )
1458
+ # Note: TROP infers treatment periods from the treatment indicator column.
1459
+ # The 'treated' column must be an absorbing state (D=1 for all periods
1460
+ # during and after treatment starts for each unit).
1461
+ results = trop_est.fit(
1462
+ panel_data,
1463
+ outcome='gdp_growth',
1464
+ treatment='treated',
1465
+ unit='state',
1466
+ time='year'
1467
+ )
1468
+
1469
+ # View results
1470
+ results.print_summary()
1471
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1472
+ print(f"Effective rank: {results.effective_rank:.2f}")
1473
+
1474
+ # Selected tuning parameters
1475
+ print(f"λ_time: {results.lambda_time:.2f}")
1476
+ print(f"λ_unit: {results.lambda_unit:.2f}")
1477
+ print(f"λ_nn: {results.lambda_nn:.2f}")
1478
+
1479
+ # Examine unit effects
1480
+ unit_effects = results.get_unit_effects_df()
1481
+ print(unit_effects.head(10))
1482
+ ```
1483
+
1484
+ Output:
1485
+ ```
1486
+ ===========================================================================
1487
+ Triply Robust Panel (TROP) Estimation Results
1488
+ Athey, Imbens, Qu & Viviano (2025)
1489
+ ===========================================================================
1490
+
1491
+ Observations: 500
1492
+ Treated units: 1
1493
+ Control units: 49
1494
+ Treated observations: 4
1495
+ Pre-treatment periods: 6
1496
+ Post-treatment periods: 4
1497
+
1498
+ ---------------------------------------------------------------------------
1499
+ Tuning Parameters (selected via LOOCV)
1500
+ ---------------------------------------------------------------------------
1501
+ Lambda (time decay): 1.0000
1502
+ Lambda (unit distance): 0.5000
1503
+ Lambda (nuclear norm): 0.1000
1504
+ Effective rank: 2.35
1505
+ LOOCV score: 0.012345
1506
+ Variance method: bootstrap
1507
+ Bootstrap replications: 200
1508
+
1509
+ ---------------------------------------------------------------------------
1510
+ Parameter Estimate Std. Err. t-stat P>|t|
1511
+ ---------------------------------------------------------------------------
1512
+ ATT 2.5000 0.3892 6.424 0.0000 ***
1513
+ ---------------------------------------------------------------------------
1514
+
1515
+ 95% Confidence Interval: [1.7372, 3.2628]
1516
+
1517
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
1518
+ ===========================================================================
1519
+ ```
1520
+
1521
+ #### When to Use TROP Over Synthetic DiD
1522
+
1523
+ Use TROP when you suspect **factor structure** in the data, that is, unobserved confounders that affect outcomes differently across units and time:
1524
+
1525
+ | Scenario | Use SDID | Use TROP |
1526
+ |----------|----------|----------|
1527
+ | Simple parallel trends | ✓ | ✓ |
1528
+ | Unobserved factors (e.g., economic cycles) | May be biased | ✓ |
1529
+ | Strong unit-time interactions | May be biased | ✓ |
1530
+ | Low-dimensional confounding | ✓ | ✓ |
1531
+
1532
+ **Example scenarios where TROP excels:**
1533
+ - Regional economic shocks that affect states differently based on industry composition
1534
+ - Global trends that impact countries differently based on their economic structure
1535
+ - Common factors in financial data (market risk, interest rates, etc.)
1536
+
1537
+ **How TROP works:**
1538
+
1539
+ 1. **Factor estimation**: Estimates interactive fixed effects L_it using nuclear norm regularization (encourages low-rank structure)
1540
+ 2. **Unit weights**: Exponential distance-based weighting ω_j = exp(-λ_unit × d(j,i)) where d(j,i) is the RMSE of outcome differences
1541
+ 3. **Time weights**: Exponential decay weighting θ_s = exp(-λ_time × |s-t|) based on proximity to treatment
1542
+ 4. **ATT computation**: τ = Y_it - α_i - β_t - L_it for treated observations
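The weighting formulas in steps 2 and 3 can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper names `time_weights`, `unit_distance`, and `unit_weights` are hypothetical, and the distance is assumed to be the RMSE over whatever outcome series are passed in.

```python
import math


def time_weights(periods, t, lambda_time):
    """theta_s = exp(-lambda_time * |s - t|) for each period s."""
    return [math.exp(-lambda_time * abs(s - t)) for s in periods]


def unit_distance(y_j, y_i):
    """RMSE of the outcome differences between two units' series."""
    squared = [(a - b) ** 2 for a, b in zip(y_j, y_i)]
    return math.sqrt(sum(squared) / len(squared))


def unit_weights(outcomes, i, lambda_unit):
    """omega_j = exp(-lambda_unit * d(j, i)) for every unit j."""
    target = outcomes[i]
    return {
        j: math.exp(-lambda_unit * unit_distance(y_j, target))
        for j, y_j in outcomes.items()
    }


# Toy data (hypothetical): three units observed over three periods
outcomes = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0], "C": [2.0, 3.0, 4.0]}
theta = time_weights(periods=[0, 1, 2], t=2, lambda_time=1.0)
omega = unit_weights(outcomes, i="A", lambda_unit=0.5)
```

Under these formulas, setting a lambda to 0 makes every weight equal to 1, so the grid value 0 corresponds to uniform weighting.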
1543
+
1544
+ ```python
1545
+ # Compare TROP vs SDID under factor confounding
1546
+ from diff_diff import SyntheticDiD, TROP
1547
+
1548
+ # Synthetic DiD (may be biased with factors)
1549
+ sdid = SyntheticDiD()
1550
+ sdid_results = sdid.fit(data, outcome='y', treatment='treated',
1551
+ unit='unit', time='time', post_periods=[5,6,7])
1552
+
1553
+ # TROP (accounts for factors)
1554
+ # Note: TROP infers treatment periods from the treatment indicator column
1555
+ # (D=1 for treated observations, D=0 for control)
1556
+ trop_est = TROP() # Uses default grids with LOOCV selection
1557
+ trop_results = trop_est.fit(data, outcome='y', treatment='treated',
1558
+ unit='unit', time='time')
1559
+
1560
+ print(f"SDID estimate: {sdid_results.att:.3f}")
1561
+ print(f"TROP estimate: {trop_results.att:.3f}")
1562
+ print(f"Effective rank: {trop_results.effective_rank:.2f}")
1563
+ ```
1564
+
1565
+ **Tuning parameter grids:**
1566
+
1567
+ ```python
1568
+ # Custom tuning grids (searched via LOOCV)
1569
+ trop_est = TROP(
1570
+ lambda_time_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Time decay
1571
+ lambda_unit_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Unit distance
1572
+ lambda_nn_grid=[0.0, 0.01, 0.1, 1.0, 10.0] # Nuclear norm
1573
+ )
1574
+
1575
+ # Fixed tuning parameters (skip LOOCV search)
1576
+ trop_est = TROP(
1577
+ lambda_time_grid=[1.0], # Single value = fixed
1578
+ lambda_unit_grid=[1.0], # Single value = fixed
1579
+ lambda_nn_grid=[0.1] # Single value = fixed
1580
+ )
1581
+ ```
1582
+
1583
+ **Parameters:**
1584
+
1585
+ ```python
1586
+ TROP(
1587
+ method='local', # Estimation method: 'local' (default) or 'global'
1588
+ lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5])
1589
+ lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5])
1590
+ lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10])
1591
+ max_iter=100, # Max iterations for factor estimation
1592
+ tol=1e-6, # Convergence tolerance
1593
+ alpha=0.05, # Significance level
1594
+ n_bootstrap=200, # Bootstrap replications
1595
+ seed=None # Random seed
1596
+ )
1597
+ ```
1598
+
1599
+ **Estimation methods:**
1600
+ - `'local'` (default): Per-observation model fitting following Algorithm 2 of the paper. Computes observation-specific weights and fits a model for each treated observation, then averages the individual treatment effects. More flexible but computationally intensive.
1601
+ - `'global'`: Global weighted least squares optimization. Fits a single model on control observations with global weights, then computes per-observation treatment effects as residuals. Faster but uses global rather than observation-specific weights.
1602
+
1603
+ **Convenience function:**
1604
+
1605
+ ```python
1606
+ # One-liner estimation with default tuning grids
1607
+ # Note: TROP infers treatment periods from the treatment indicator
1608
+ results = trop(
1609
+ data,
1610
+ outcome='y',
1611
+ treatment='treated',
1612
+ unit='unit',
1613
+ time='time',
1614
+ n_bootstrap=200
1615
+ )
1616
+ ```
1617
+
1618
+ ## Working with Results
1619
+
1620
+ ### Export Results
1621
+
1622
+ ```python
1623
+ # As dictionary
1624
+ results.to_dict()
1625
+ # {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}
1626
+
1627
+ # As DataFrame
1628
+ df = results.to_dataframe()
1629
+ ```
1630
+
1631
+ ### Check Significance
1632
+
1633
+ ```python
1634
+ if results.is_significant:
1635
+ print(f"Effect is significant at {did.alpha} level")
1636
+
1637
+ # Get significance stars
1638
+ print(f"ATT: {results.att}{results.significance_stars}")
1639
+ # ATT: 3.5000*
1640
+ ```
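The star notation follows the legend printed in the summary output (`'***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1`). A minimal sketch of that mapping is shown here; `stars_for` is a hypothetical helper, and the use of strict inequalities at each threshold is an assumption rather than something the README states.

```python
def stars_for(p_value):
    """Map a p-value to the significance code used in the summary legend."""
    # Assumption: thresholds are applied with strict "<" comparisons.
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""


label = f"ATT: {3.5:.4f}{stars_for(0.037)}"
```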
1641
+
1642
+ ### Access Full Regression Output
1643
+
1644
+ ```python
1645
+ # All coefficients
1646
+ results.coefficients
1647
+ # {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}
1648
+
1649
+ # Variance-covariance matrix
1650
+ results.vcov
1651
+
1652
+ # Residuals and fitted values
1653
+ results.residuals
1654
+ results.fitted_values
1655
+
1656
+ # R-squared
1657
+ results.r_squared
1658
+ ```
1659
+
1660
+ ## Checking Assumptions
1661
+
1662
+ ### Parallel Trends
1663
+
1664
+ **Simple slope-based test:**
1665
+
1666
+ ```python
1667
+ from diff_diff.utils import check_parallel_trends
1668
+
1669
+ trends = check_parallel_trends(
1670
+ data,
1671
+ outcome='outcome',
1672
+ time='period',
1673
+ treatment_group='treated'
1674
+ )
1675
+
1676
+ print(f"Treated trend: {trends['treated_trend']:.4f}")
1677
+ print(f"Control trend: {trends['control_trend']:.4f}")
1678
+ print(f"Difference p-value: {trends['p_value']:.4f}")
1679
+ ```
1680
+
1681
+ **Robust distributional test (Wasserstein distance):**
1682
+
1683
+ ```python
1684
+ from diff_diff.utils import check_parallel_trends_robust
1685
+
1686
+ results = check_parallel_trends_robust(
1687
+ data,
1688
+ outcome='outcome',
1689
+ time='period',
1690
+ treatment_group='treated',
1691
+ unit='firm_id', # Unit identifier for panel data
1692
+ pre_periods=[2018, 2019], # Pre-treatment periods
1693
+ n_permutations=1000 # Permutations for p-value
1694
+ )
1695
+
1696
+ print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
1697
+ print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
1698
+ print(f"KS test p-value: {results['ks_p_value']:.4f}")
1699
+ print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")
1700
+ ```
1701
+
1702
+ The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes, not just means. This is more robust to:
1703
+ - Non-normal distributions
1704
+ - Heterogeneous effects across units
1705
+ - Outliers
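To see why a distributional comparison can catch what a comparison of means misses, here is a small sketch of the one-dimensional Wasserstein distance. It assumes two samples of equal size, in which case the distance reduces to the mean absolute difference between sorted values; `wasserstein_1d` is a hypothetical helper and not part of the package.

```python
def wasserstein_1d(x, y):
    """1-D Wasserstein (Earth Mover's) distance for equal-sized samples."""
    if len(x) != len(y):
        raise ValueError("this sketch only handles samples of equal size")
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Two sets of outcome changes with the same mean but different spread
treated_changes = [-1.0, 1.0]
control_changes = [0.0, 0.0]
mean_gap = sum(treated_changes) / 2 - sum(control_changes) / 2
distance = wasserstein_1d(treated_changes, control_changes)
```

The means are identical here, yet the distance is positive, which is the kind of difference a slope-based or mean-based check would not flag.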
1706
+
1707
+ **Equivalence testing (TOST):**
1708
+
1709
+ ```python
1710
+ from diff_diff.utils import equivalence_test_trends
1711
+
1712
+ results = equivalence_test_trends(
1713
+ data,
1714
+ outcome='outcome',
1715
+ time='period',
1716
+ treatment_group='treated',
1717
+ unit='firm_id',
1718
+ equivalence_margin=0.5 # Define "practically equivalent"
1719
+ )
1720
+
1721
+ print(f"Mean difference: {results['mean_difference']:.4f}")
1722
+ print(f"TOST p-value: {results['tost_p_value']:.4f}")
1723
+ print(f"Trends equivalent: {results['equivalent']}")
1724
+ ```
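The logic of the two one-sided tests (TOST) can be sketched as follows. This is a simplified illustration under a normal approximation; the function name `tost_p_value` is hypothetical, and the library may use a different reference distribution or standard error.

```python
from statistics import NormalDist


def tost_p_value(diff, se, margin):
    """TOST p-value for H0: |diff| >= margin (normal approximation)."""
    normal = NormalDist()
    # Test 1: reject "diff <= -margin" when diff is well above -margin
    p_lower = 1.0 - normal.cdf((diff + margin) / se)
    # Test 2: reject "diff >= +margin" when diff is well below +margin
    p_upper = normal.cdf((diff - margin) / se)
    # Equivalence is claimed only if both one-sided tests reject
    return max(p_lower, p_upper)


p_equiv = tost_p_value(diff=0.0, se=0.1, margin=0.5)      # tightly centered on zero
p_boundary = tost_p_value(diff=0.5, se=0.1, margin=0.5)   # sits on the margin
```

Unlike a standard pre-trends test, a small p-value here is evidence *for* similar trends, because the null hypothesis is that the difference exceeds the margin.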
1725
+
1726
+ ### Honest DiD Sensitivity Analysis (Rambachan-Roth)
1727
+
1728
+ Pre-trends tests often have low power, and conditioning on passing them can exacerbate bias. **Honest DiD** (Rambachan & Roth 2023) provides a sensitivity analysis that shows how robust your results are to violations of parallel trends.
1729
+
1730
+ ```python
1731
+ from diff_diff import HonestDiD, MultiPeriodDiD
1732
+
1733
+ # First, fit a full event study (pre + post period effects)
1734
+ did = MultiPeriodDiD()
1735
+ event_results = did.fit(
1736
+ data,
1737
+ outcome='outcome',
1738
+ treatment='treated',
1739
+ time='period',
1740
+ post_periods=[5, 6, 7, 8, 9],
1741
+ reference_period=4, # Last pre-period (e=-1 convention)
1742
+ )
1743
+
1744
+ # Compute honest bounds with relative magnitudes restriction
1745
+ # M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation
1746
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
1747
+ honest_results = honest.fit(event_results)
1748
+
1749
+ print(honest_results.summary())
1750
+ print(f"Original estimate: {honest_results.original_estimate:.4f}")
1751
+ print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
1752
+ print(f"Effect robust to violations: {honest_results.is_significant}")
1753
+ ```
1754
+
1755
+ **Sensitivity analysis over M values:**
1756
+
1757
+ ```python
1758
+ # How do results change as we allow larger violations?
1759
+ sensitivity = honest.sensitivity_analysis(
1760
+ event_results,
1761
+ M_grid=[0, 0.5, 1.0, 1.5, 2.0]
1762
+ )
1763
+
1764
+ print(sensitivity.summary())
1765
+ print(f"Breakdown value: M = {sensitivity.breakdown_M}")
1766
+ # Breakdown = smallest M where the robust CI includes zero
1767
+ ```
1768
+
1769
+ **Breakdown value:**
1770
+
1771
+ The breakdown value tells you how robust your conclusion is:
1772
+
1773
+ ```python
1774
+ breakdown = honest.breakdown_value(event_results)
1775
+ if breakdown >= 1.0:
1776
+ print("Result holds even if post-treatment violations are as bad as pre-treatment")
1777
+ else:
1778
+ print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment")
1779
+ ```
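The idea of a breakdown value can be illustrated with a deliberately simplified toy model. The sketch below assumes the robust interval is the usual interval widened by `M` times the largest pre-treatment violation; this widening rule, and the names `robust_ci` and `breakdown_on_grid`, are assumptions for illustration and are not the Rambachan-Roth procedure or the library's code.

```python
def robust_ci(estimate, se, max_pre_violation, M, z=1.96):
    """Toy robust CI: the standard CI widened by M * max pre-period violation."""
    slack = M * max_pre_violation
    return (estimate - z * se - slack, estimate + z * se + slack)


def breakdown_on_grid(estimate, se, max_pre_violation, M_grid):
    """Smallest M on the grid whose toy robust CI includes zero, else None."""
    for M in sorted(M_grid):
        lower, upper = robust_ci(estimate, se, max_pre_violation, M)
        if lower <= 0.0 <= upper:
            return M
    return None


grid = [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
breakdown = breakdown_on_grid(estimate=2.0, se=0.5, max_pre_violation=0.5, M_grid=grid)
```

In this toy setup the interval first covers zero at `M = 2.5`, so the conclusion would survive violations up to twice the worst pre-period violation on this grid.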
1780
+
1781
+ **Smoothness restriction (alternative approach):**
1782
+
1783
+ ```python
1784
+ # Bounds second differences of trend violations
1785
+ # M=0 imposes linear extrapolation of pre-trends; larger M allows more deviation from linearity
1786
+ honest_smooth = HonestDiD(method='smoothness', M=0.5)
1787
+ smooth_results = honest_smooth.fit(event_results)
1788
+ ```
1789
+
1790
+ **Visualization:**
1791
+
1792
+ ```python
1793
+ from diff_diff import plot_sensitivity, plot_honest_event_study
1794
+
1795
+ # Plot sensitivity analysis
1796
+ plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")
1797
+
1798
+ # Event study with honest confidence intervals
1799
+ plot_honest_event_study(event_results, honest_results)
1800
+ ```
1801
+
1802
+ ### Pre-Trends Power Analysis (Roth 2022)
1803
+
1804
+ A passing pre-trends test doesn't mean parallel trends holds—it may just mean the test has low power. **Pre-Trends Power Analysis** (Roth 2022) answers: "What violations could my pre-trends test have detected?"
1805
+
1806
+ ```python
1807
+ from diff_diff import PreTrendsPower, MultiPeriodDiD
1808
+
1809
+ # First, fit a full event study
1810
+ did = MultiPeriodDiD()
1811
+ event_results = did.fit(
1812
+ data,
1813
+ outcome='outcome',
1814
+ treatment='treated',
1815
+ time='period',
1816
+ post_periods=[5, 6, 7, 8, 9],
1817
+ reference_period=4,
1818
+ )
1819
+
1820
+ # Analyze pre-trends test power
1821
+ pt = PreTrendsPower(alpha=0.05, power=0.80)
1822
+ power_results = pt.fit(event_results)
1823
+
1824
+ print(power_results.summary())
1825
+ print(f"Minimum Detectable Violation (MDV): {power_results.mdv:.4f}")
1826
+ print(f"Power to detect violations of size MDV: {power_results.power:.1%}")
1827
+ ```
1828
+
1829
+ **Key concepts:**
1830
+
1831
+ - **Minimum Detectable Violation (MDV)**: Smallest violation magnitude that would be detected with your target power (e.g., 80%). Passing the pre-trends test does NOT rule out violations up to this size.
1832
+ - **Power**: Probability of detecting a violation of given size if it exists.
1833
+ - **Violation types**: Linear trend, constant violation, last-period only, or custom patterns.
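The relationship between power and the MDV can be sketched for the simplest possible case: a two-sided z-test on a single pre-period coefficient. This is an assumption-laden simplification (an event-study pre-trends test typically involves several coefficients), and the function names are hypothetical.

```python
from statistics import NormalDist

_NORMAL = NormalDist()


def power_two_sided(delta, se, alpha=0.05):
    """Probability that a two-sided z-test rejects when the true violation is delta."""
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / se
    return _NORMAL.cdf(shift - z_crit) + _NORMAL.cdf(-shift - z_crit)


def minimum_detectable_violation(se, alpha=0.05, power=0.80, tol=1e-8):
    """Smallest delta >= 0 detected with the target power (found by bisection)."""
    low, high = 0.0, 10.0 * se
    while high - low > tol:
        mid = (low + high) / 2
        if power_two_sided(mid, se, alpha) < power:
            low = mid
        else:
            high = mid
    return high


mdv = minimum_detectable_violation(se=1.0)
```

In this single-coefficient case the MDV at 80% power is roughly 2.8 standard errors, so a violation smaller than that would usually pass the test undetected.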
1834
+
1835
+ **Power curve visualization:**
1836
+
1837
+ ```python
1838
+ from diff_diff import plot_pretrends_power
1839
+
1840
+ # Generate power curve across violation magnitudes
1841
+ curve = pt.power_curve(event_results)
1842
+
1843
+ # Plot the power curve
1844
+ plot_pretrends_power(curve, title="Pre-Trends Test Power Curve")
1845
+
1846
+ # Or from the curve object directly
1847
+ curve.plot()
1848
+ ```
1849
+
1850
+ **Different violation patterns:**
1851
+
1852
+ ```python
1853
+ # Linear trend violations (default) - most common assumption
1854
+ pt_linear = PreTrendsPower(violation_type='linear')
1855
+
1856
+ # Constant violation in all pre-periods
1857
+ pt_constant = PreTrendsPower(violation_type='constant')
1858
+
1859
+ # Violation only in the last pre-period (sharp break)
1860
+ pt_last = PreTrendsPower(violation_type='last_period')
1861
+
1862
+ # Custom violation pattern (assumes `import numpy as np` has been run)
1863
+ custom_weights = np.array([0.1, 0.3, 0.6]) # Increasing violations
1864
+ pt_custom = PreTrendsPower(violation_type='custom', violation_weights=custom_weights)
1865
+ ```
1866
+
1867
+ **Combining with HonestDiD:**
1868
+
1869
+ Pre-trends power analysis and HonestDiD are complementary:
1870
+ 1. **Pre-trends power** tells you what the test could have detected
1871
+ 2. **HonestDiD** tells you how robust your results are to violations
1872
+
1873
+ ```python
1874
+ from diff_diff import HonestDiD, PreTrendsPower
1875
+
1876
+ # If MDV is large relative to your estimated effect, be cautious
1877
+ pt = PreTrendsPower()
1878
+ power_results = pt.fit(event_results)
1879
+ sensitivity = pt.sensitivity_to_honest_did(event_results)
1880
+ print(sensitivity['interpretation'])
1881
+
1882
+ # Use HonestDiD for robust inference
1883
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
1884
+ honest_results = honest.fit(event_results)
1885
+ ```
1886
+
1887
+ ### Placebo Tests
1888
+
1889
+ Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
1890
+
1891
+ **Fake timing test:**
1892
+
1893
+ ```python
1894
+ from diff_diff import run_placebo_test
1895
+
1896
+ # Test: Is there an effect before treatment actually occurred?
1897
+ # Actual treatment is at period 3 (post_periods=[3, 4, 5])
1898
+ # We test if a "fake" treatment at period 1 shows an effect
1899
+ results = run_placebo_test(
1900
+ data,
1901
+ outcome='outcome',
1902
+ treatment='treated',
1903
+ time='period',
1904
+ test_type='fake_timing',
1905
+ fake_treatment_period=1, # Pretend treatment was in period 1
1906
+ post_periods=[3, 4, 5] # Actual post-treatment periods
1907
+ )
1908
+
1909
+ print(results.summary())
1910
+ # If parallel trends hold, placebo_effect should be ~0 and not significant
1911
+ print(f"Placebo effect: {results.placebo_effect:.3f} (p={results.p_value:.3f})")
1912
+ print(f"Is significant (bad): {results.is_significant}")
1913
+ ```
1914
+
1915
+ **Fake group test:**
1916
+
1917
+ ```python
1918
+ # Test: Is there an effect among never-treated units?
1919
+ # Get some control unit IDs to use as "fake treated"
1920
+ control_units = data[data['treated'] == 0]['firm_id'].unique()[:5]
1921
+
1922
+ results = run_placebo_test(
1923
+ data,
1924
+ outcome='outcome',
1925
+ treatment='treated',
1926
+ time='period',
1927
+ unit='firm_id',
1928
+ test_type='fake_group',
1929
+ fake_treatment_group=list(control_units), # List of control unit IDs
1930
+ post_periods=[3, 4, 5]
1931
+ )
1932
+ ```
1933
+
1934
+ **Permutation test:**
1935
+
1936
+ ```python
1937
+ # Randomly reassign treatment and compute distribution of effects
1938
+ # Note: requires binary post indicator (use 'post' column, not 'period')
1939
+ results = run_placebo_test(
1940
+ data,
1941
+ outcome='outcome',
1942
+ treatment='treated',
1943
+ time='post', # Binary post-treatment indicator
1944
+ unit='firm_id',
1945
+ test_type='permutation',
1946
+ n_permutations=1000,
1947
+ seed=42
1948
+ )
1949
+
1950
+ print(f"Original effect: {results.original_effect:.3f}")
1951
+ print(f"Permutation p-value: {results.p_value:.4f}")
1952
+ # Low p-value indicates the effect is unlikely to be due to chance
1953
+ ```
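For intuition, the permutation logic can be sketched on unit-level pre-to-post changes. This is a minimal illustration, not the package's implementation: `did_effect` and `permutation_p_value` are hypothetical helpers, the data are made up, and the `(count + 1) / (n + 1)` p-value convention is an assumption.

```python
import random


def did_effect(changes, treated):
    """Mean pre-to-post change of treated units minus that of control units."""
    t = [c for c, d in zip(changes, treated) if d]
    u = [c for c, d in zip(changes, treated) if not d]
    return sum(t) / len(t) - sum(u) / len(u)


def permutation_p_value(changes, treated, n_permutations=1000, seed=42):
    """Share of shuffled assignments with an effect at least as extreme."""
    rng = random.Random(seed)
    observed = did_effect(changes, treated)
    labels = list(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # keeps the number of treated units fixed
        if abs(did_effect(changes, labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)


# Toy data: 20 units, the first 5 treated with a true effect of 3
treated = [k < 5 for k in range(20)]
noise = [0.1 * ((k % 3) - 1) for k in range(20)]
changes = [3.0 + e if d else e for e, d in zip(noise, treated)]
observed, p_value = permutation_p_value(changes, treated)
```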
1954
+
1955
+ **Leave-one-out sensitivity:**
1956
+
1957
+ ```python
1958
+ # Test sensitivity to individual treated units
1959
+ # Note: requires binary post indicator (use 'post' column, not 'period')
1960
+ results = run_placebo_test(
1961
+ data,
1962
+ outcome='outcome',
1963
+ treatment='treated',
1964
+ time='post', # Binary post-treatment indicator
1965
+ unit='firm_id',
1966
+ test_type='leave_one_out'
1967
+ )
1968
+
1969
+ # Check if any single unit drives the result
1970
+ print(results.leave_one_out_effects) # Effect when each unit is dropped
1971
+ ```
1972
+
1973
+ **Run all placebo tests:**
1974
+
1975
+ ```python
1976
+ from diff_diff import run_all_placebo_tests
1977
+
1978
+ # Comprehensive diagnostic suite
1979
+ # Note: This function runs fake_timing tests on pre-treatment periods.
1980
+ # The permutation and leave_one_out tests require a binary post indicator,
1981
+ # so they may return errors if the data uses a multi-period time column.
1982
+ all_results = run_all_placebo_tests(
1983
+ data,
1984
+ outcome='outcome',
1985
+ treatment='treated',
1986
+ time='period',
1987
+ unit='firm_id',
1988
+ pre_periods=[0, 1, 2],
1989
+ post_periods=[3, 4, 5],
1990
+ n_permutations=500,
1991
+ seed=42
1992
+ )
1993
+
1994
+ for test_name, result in all_results.items():
1995
+ if hasattr(result, 'p_value'):
1996
+ print(f"{test_name}: p={result.p_value:.3f}, significant={result.is_significant}")
1997
+ elif isinstance(result, dict) and 'error' in result:
1998
+ print(f"{test_name}: Error - {result['error']}")
1999
+ ```
2000
+
2001
+ ## API Reference
2002
+
2003
+ ### DifferenceInDifferences
2004
+
2005
+ ```python
2006
+ DifferenceInDifferences(
2007
+ robust=True, # Use HC1 robust standard errors
2008
+ cluster=None, # Column for cluster-robust SEs
2009
+ alpha=0.05 # Significance level for CIs
2010
+ )
2011
+ ```
2012
+
2013
+ **Methods:**
2014
+
2015
+ | Method | Description |
2016
+ |--------|-------------|
2017
+ | `fit(data, outcome, treatment, time, ...)` | Fit the DiD model |
2018
+ | `summary()` | Get formatted summary string |
2019
+ | `print_summary()` | Print summary to stdout |
2020
+ | `get_params()` | Get estimator parameters (sklearn-compatible) |
2021
+ | `set_params(**params)` | Set estimator parameters (sklearn-compatible) |
2022
+
2023
+ **fit() Parameters:**
2024
+
2025
+ | Parameter | Type | Description |
2026
+ |-----------|------|-------------|
2027
+ | `data` | DataFrame | Input data |
2028
+ | `outcome` | str | Outcome variable column name |
2029
+ | `treatment` | str | Treatment indicator column (0/1) |
2030
+ | `time` | str | Post-treatment indicator column (0/1) |
2031
+ | `formula` | str | R-style formula (alternative to column names) |
2032
+ | `covariates` | list | Linear control variables |
2033
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
2034
+ | `absorb` | list | High-dimensional FE (within-transformation) |
2035
+
2036
+ ### DiDResults
2037
+
2038
+ **Attributes:**
2039
+
2040
+ | Attribute | Description |
2041
+ |-----------|-------------|
2042
+ | `att` | Average Treatment effect on the Treated |
2043
+ | `se` | Standard error of ATT |
2044
+ | `t_stat` | T-statistic |
2045
+ | `p_value` | P-value for H0: ATT = 0 |
2046
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
2047
+ | `n_obs` | Number of observations |
2048
+ | `n_treated` | Number of treated units |
2049
+ | `n_control` | Number of control units |
2050
+ | `r_squared` | R-squared of regression |
2051
+ | `coefficients` | Dictionary of all coefficients |
2052
+ | `is_significant` | Boolean for significance at alpha |
2053
+ | `significance_stars` | String of significance stars |
2054
+
2055
+ **Methods:**
2056
+
2057
+ | Method | Description |
2058
+ |--------|-------------|
2059
+ | `summary(alpha)` | Get formatted summary string |
2060
+ | `print_summary(alpha)` | Print summary to stdout |
2061
+ | `to_dict()` | Convert to dictionary |
2062
+ | `to_dataframe()` | Convert to pandas DataFrame |
2063
+
2064
+ ### MultiPeriodDiD
2065
+
2066
+ ```python
2067
+ MultiPeriodDiD(
2068
+ robust=True, # Use HC1 robust standard errors
2069
+ cluster=None, # Column for cluster-robust SEs
2070
+ alpha=0.05 # Significance level for CIs
2071
+ )
2072
+ ```
2073
+
2074
+ **fit() Parameters:**
2075
+
2076
+ | Parameter | Type | Description |
2077
+ |-----------|------|-------------|
2078
+ | `data` | DataFrame | Input data |
2079
+ | `outcome` | str | Outcome variable column name |
2080
+ | `treatment` | str | Treatment indicator column (0/1) |
2081
+ | `time` | str | Time period column (multiple values) |
2082
+ | `post_periods` | list | List of post-treatment period values |
2083
+ | `covariates` | list | Linear control variables |
2084
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
2085
+ | `absorb` | list | High-dimensional FE (within-transformation) |
2086
+ | `reference_period` | any | Omitted period (default: last pre-period, e=-1 convention) |
2087
+ | `unit` | str | Unit identifier column (for staggered adoption warning) |
2088
+
2089
+ ### MultiPeriodDiDResults
2090
+
2091
+ **Attributes:**
2092
+
2093
+ | Attribute | Description |
2094
+ |-----------|-------------|
2095
+ | `period_effects` | Dict mapping periods to PeriodEffect objects (pre and post, excluding reference) |
2096
+ | `avg_att` | Average ATT across post-treatment periods only |
2097
+ | `avg_se` | Standard error of average ATT |
2098
+ | `avg_t_stat` | T-statistic for average ATT |
2099
+ | `avg_p_value` | P-value for average ATT |
2100
+ | `avg_conf_int` | Confidence interval for average ATT |
2101
+ | `n_obs` | Number of observations |
2102
+ | `pre_periods` | List of pre-treatment periods |
2103
+ | `post_periods` | List of post-treatment periods |
2104
+ | `reference_period` | The omitted reference period (coefficient = 0 by construction) |
2105
+ | `interaction_indices` | Dict mapping period → column index in VCV (for sub-VCV extraction) |
2106
+ | `pre_period_effects` | Property: pre-period effects only (for parallel trends assessment) |
2107
+ | `post_period_effects` | Property: post-period effects only |
2108
+
2109
+ **Methods:**
2110
+
2111
+ | Method | Description |
2112
+ |--------|-------------|
2113
+ | `get_effect(period)` | Get PeriodEffect for specific period |
2114
+ | `summary(alpha)` | Get formatted summary string |
2115
+ | `print_summary(alpha)` | Print summary to stdout |
2116
+ | `to_dict()` | Convert to dictionary |
2117
+ | `to_dataframe()` | Convert to pandas DataFrame |
2118
+
2119
+ ### PeriodEffect
2120
+
2121
+ **Attributes:**
2122
+
2123
+ | Attribute | Description |
2124
+ |-----------|-------------|
2125
+ | `period` | Time period identifier |
2126
+ | `effect` | Treatment effect estimate |
2127
+ | `se` | Standard error |
2128
+ | `t_stat` | T-statistic |
2129
+ | `p_value` | P-value |
2130
+ | `conf_int` | Confidence interval |
2131
+ | `is_significant` | Boolean for significance at 0.05 |
2132
+ | `significance_stars` | String of significance stars |
2133
+
2134
+ ### SyntheticDiD
2135
+
2136
+ ```python
2137
+ SyntheticDiD(
2138
+ zeta_omega=None, # Unit weight regularization (None = auto from data)
2139
+ zeta_lambda=None, # Time weight regularization (None = auto from data)
2140
+ alpha=0.05, # Significance level for CIs
2141
+ variance_method="placebo", # "placebo" (R default) or "bootstrap"
2142
+ n_bootstrap=200, # Replications for SE estimation
2143
+ seed=None # Random seed for reproducibility
2144
+ )
2145
+ ```
2146
+
2147
+ **fit() Parameters:**
2148
+
2149
+ | Parameter | Type | Description |
2150
+ |-----------|------|-------------|
2151
+ | `data` | DataFrame | Panel data |
2152
+ | `outcome` | str | Outcome variable column name |
2153
+ | `treatment` | str | Treatment indicator column (0/1) |
2154
+ | `unit` | str | Unit identifier column |
2155
+ | `time` | str | Time period column |
2156
+ | `post_periods` | list | List of post-treatment period values |
2157
+ | `covariates` | list | Covariates to residualize out |
2158
+
2159
+ ### SyntheticDiDResults
2160
+
2161
+ **Attributes:**
2162
+
2163
+ | Attribute | Description |
2164
+ |-----------|-------------|
2165
+ | `att` | Average Treatment effect on the Treated |
2166
+ | `se` | Standard error (bootstrap or placebo-based) |
2167
+ | `t_stat` | T-statistic |
2168
+ | `p_value` | P-value |
2169
+ | `conf_int` | Confidence interval |
2170
+ | `n_obs` | Number of observations |
2171
+ | `n_treated` | Number of treated units |
2172
+ | `n_control` | Number of control units |
2173
+ | `unit_weights` | Dict mapping control unit IDs to weights |
2174
+ | `time_weights` | Dict mapping pre-treatment periods to weights |
2175
+ | `pre_periods` | List of pre-treatment periods |
2176
+ | `post_periods` | List of post-treatment periods |
2177
+ | `pre_treatment_fit` | RMSE of synthetic vs treated in pre-period |
2178
+ | `placebo_effects` | Array of placebo effect estimates |
2179
+
2180
+ **Methods:**
2181
+
2182
+ | Method | Description |
2183
+ |--------|-------------|
2184
+ | `summary(alpha)` | Get formatted summary string |
2185
+ | `print_summary(alpha)` | Print summary to stdout |
2186
+ | `to_dict()` | Convert to dictionary |
2187
+ | `to_dataframe()` | Convert to pandas DataFrame |
2188
+ | `get_unit_weights_df()` | Get unit weights as DataFrame |
2189
+ | `get_time_weights_df()` | Get time weights as DataFrame |
2190
+
2191
+ ### TROP
2192
+
2193
+ ```python
2194
+ TROP(
2195
+ lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5])
2196
+ lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5])
2197
+ lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10])
2198
+ max_iter=100, # Max iterations for factor estimation
2199
+ tol=1e-6, # Convergence tolerance
2200
+ alpha=0.05, # Significance level for CIs
2201
+ n_bootstrap=200, # Bootstrap replications (minimum 2; TROP requires bootstrap for SEs)
2202
+ seed=None # Random seed
2203
+ )
2204
+ ```
2205
+
2206
+ **fit() Parameters:**
2207
+
2208
+ | Parameter | Type | Description |
2209
+ |-----------|------|-------------|
2210
+ | `data` | DataFrame | Panel data |
2211
+ | `outcome` | str | Outcome variable column name |
2212
+ | `treatment` | str | Treatment indicator column (0/1 absorbing state) |
2213
+ | `unit` | str | Unit identifier column |
2214
+ | `time` | str | Time period column |
2215
+
2216
+ Note: TROP infers treatment periods from the treatment indicator column. The treatment column should be an absorbing state indicator where D=1 for all periods during and after treatment starts.
2217
+
2218
+ ### TROPResults
2219
+
2220
+ **Attributes:**
2221
+
2222
+ | Attribute | Description |
2223
+ |-----------|-------------|
2224
+ | `att` | Average Treatment effect on the Treated |
2225
+ | `se` | Standard error (bootstrap) |
2226
+ | `t_stat` | T-statistic |
2227
+ | `p_value` | P-value |
2228
+ | `conf_int` | Confidence interval |
2229
+ | `n_obs` | Number of observations |
2230
+ | `n_treated` | Number of treated units |
2231
+ | `n_control` | Number of control units |
2232
+ | `n_treated_obs` | Number of treated unit-time observations |
2233
+ | `unit_effects` | Dict mapping unit IDs to fixed effects |
2234
+ | `time_effects` | Dict mapping periods to fixed effects |
2235
+ | `treatment_effects` | Dict mapping (unit, time) to individual effects |
2236
+ | `lambda_time` | Selected time decay parameter |
2237
+ | `lambda_unit` | Selected unit distance parameter |
2238
+ | `lambda_nn` | Selected nuclear norm parameter |
2239
+ | `factor_matrix` | Low-rank factor matrix L (n_periods x n_units) |
2240
+ | `effective_rank` | Effective rank of factor matrix |
2241
+ | `loocv_score` | LOOCV score for selected parameters |
2242
+ | `n_pre_periods` | Number of pre-treatment periods |
2243
+ | `n_post_periods` | Number of post-treatment periods |
2244
+ | `bootstrap_distribution` | Bootstrap distribution (if bootstrap) |
2245
+
2246
+ **Methods:**
2247
+
2248
+ | Method | Description |
2249
+ |--------|-------------|
2250
+ | `summary(alpha)` | Get formatted summary string |
2251
+ | `print_summary(alpha)` | Print summary to stdout |
2252
+ | `to_dict()` | Convert to dictionary |
2253
+ | `to_dataframe()` | Convert to pandas DataFrame |
2254
+ | `get_unit_effects_df()` | Get unit fixed effects as DataFrame |
2255
+ | `get_time_effects_df()` | Get time fixed effects as DataFrame |
2256
+ | `get_treatment_effects_df()` | Get individual treatment effects as DataFrame |
2257
+
2258
+ ### SunAbraham
2259
+
2260
+ ```python
2261
+ SunAbraham(
2262
+ control_group='never_treated', # or 'not_yet_treated'
2263
+ anticipation=0, # Periods of anticipation effects
2264
+ alpha=0.05, # Significance level for CIs
2265
+ cluster=None, # Column for cluster-robust SEs
2266
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
2267
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
2268
+ seed=None # Random seed
2269
+ )
2270
+ ```
2271
+
2272
+ **fit() Parameters:**
2273
+
2274
+ | Parameter | Type | Description |
2275
+ |-----------|------|-------------|
2276
+ | `data` | DataFrame | Panel data |
2277
+ | `outcome` | str | Outcome variable column name |
2278
+ | `unit` | str | Unit identifier column |
2279
+ | `time` | str | Time period column |
2280
+ | `first_treat` | str | Column with first treatment period (0 for never-treated) |
2281
+ | `covariates` | list | Covariate column names |
2282
+
2283
+ ### SunAbrahamResults
2284
+
2285
+ **Attributes:**
2286
+
2287
+ | Attribute | Description |
2288
+ |-----------|-------------|
2289
+ | `event_study_effects` | Dict mapping relative time to effect info |
2290
+ | `overall_att` | Overall average treatment effect |
2291
+ | `overall_se` | Standard error of overall ATT |
2292
+ | `overall_t_stat` | T-statistic for overall ATT |
2293
+ | `overall_p_value` | P-value for overall ATT |
2294
+ | `overall_conf_int` | Confidence interval for overall ATT |
2295
+ | `cohort_weights` | Dict mapping relative time to cohort weights |
2296
+ | `groups` | List of treatment cohorts |
2297
+ | `time_periods` | List of all time periods |
2298
+ | `n_obs` | Total number of observations |
2299
+ | `n_treated_units` | Number of ever-treated units |
2300
+ | `n_control_units` | Number of never-treated units |
2301
+ | `is_significant` | Boolean for significance at alpha |
2302
+ | `significance_stars` | String of significance stars |
2303
+ | `bootstrap_results` | SABootstrapResults (if bootstrap enabled) |
2304
+
2305
+ **Methods:**
2306
+
2307
+ | Method | Description |
2308
+ |--------|-------------|
2309
+ | `summary(alpha)` | Get formatted summary string |
2310
+ | `print_summary(alpha)` | Print summary to stdout |
2311
+ | `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |
2312
+
2313
+ ### ImputationDiD
2314
+
2315
+ ```python
2316
+ ImputationDiD(
2317
+ anticipation=0, # Periods of anticipation effects
2318
+ alpha=0.05, # Significance level for CIs
2319
+ cluster=None, # Column for cluster-robust SEs
2320
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical)
2321
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
2322
+ seed=None, # Random seed
2323
+ rank_deficient_action='warn', # 'warn', 'error', or 'silent'
2324
+ horizon_max=None, # Max event-study horizon
2325
+ aux_partition='cohort_horizon', # Variance partition
2326
+ )
2327
+ ```
2328
+
2329
+ **fit() Parameters:**
2330
+
2331
+ | Parameter | Type | Description |
2332
+ |-----------|------|-------------|
2333
+ | `data` | DataFrame | Panel data |
2334
+ | `outcome` | str | Outcome variable column name |
2335
+ | `unit` | str | Unit identifier column |
2336
+ | `time` | str | Time period column |
2337
+ | `first_treat` | str | First treatment period column (0 for never-treated) |
2338
+ | `covariates` | list | Covariate column names |
2339
+ | `aggregate` | str | Aggregation: None, "event_study", "group", "all" |
2340
+ | `balance_e` | int | Balance event study to this many pre-treatment periods |
2341
+
2342
+ ### ImputationDiDResults
2343
+
2344
+ **Attributes:**
2345
+
2346
+ | Attribute | Description |
2347
+ |-----------|-------------|
2348
+ | `overall_att` | Overall average treatment effect on the treated |
2349
+ | `overall_se` | Standard error (conservative, Theorem 3) |
2350
+ | `overall_t_stat` | T-statistic |
2351
+ | `overall_p_value` | P-value for H0: ATT = 0 |
2352
+ | `overall_conf_int` | Confidence interval |
2353
+ | `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'` or `'all'`) |
2354
+ | `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) |
2355
+ | `treatment_effects` | DataFrame of unit-level imputed treatment effects |
2356
+ | `n_treated_obs` | Number of treated observations |
2357
+ | `n_untreated_obs` | Number of untreated observations |
2358
+
2359
+ **Methods:**
2360
+
2361
+ | Method | Description |
2362
+ |--------|-------------|
2363
+ | `summary(alpha)` | Get formatted summary string |
2364
+ | `print_summary(alpha)` | Print summary to stdout |
2365
+ | `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') |
2366
+ | `pretrend_test(n_leads)` | Run pre-trend F-test (Equation 9) |
2367
+
2368
+ ### TwoStageDiD
2369
+
2370
+ ```python
2371
+ TwoStageDiD(
2372
+ anticipation=0, # Periods of anticipation effects
2373
+ alpha=0.05, # Significance level for CIs
2374
+ cluster=None, # Column for cluster-robust SEs (defaults to unit)
2375
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical GMM SEs)
2376
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
2377
+ seed=None, # Random seed
2378
+ rank_deficient_action='warn', # 'warn', 'error', or 'silent'
2379
+ horizon_max=None, # Max event-study horizon
2380
+ )
2381
+ ```
2382
+
2383
+ **fit() Parameters:**
2384
+
2385
+ | Parameter | Type | Description |
2386
+ |-----------|------|-------------|
2387
+ | `data` | DataFrame | Panel data |
2388
+ | `outcome` | str | Outcome variable column name |
2389
+ | `unit` | str | Unit identifier column |
2390
+ | `time` | str | Time period column |
2391
+ | `first_treat` | str | First treatment period column (0 for never-treated) |
2392
+ | `covariates` | list | Covariate column names |
2393
+ | `aggregate` | str | Aggregation: `None`, `"event_study"`, `"group"`, or `"all"` |
2394
+ | `balance_e` | int | Balance event study to this many pre-treatment periods |
2395
+
2396
+ ### TwoStageDiDResults
2397
+
2398
+ **Attributes:**
2399
+
2400
+ | Attribute | Description |
2401
+ |-----------|-------------|
2402
+ | `overall_att` | Overall average treatment effect on the treated |
2403
+ | `overall_se` | Standard error (GMM sandwich variance) |
2404
+ | `overall_t_stat` | T-statistic |
2405
+ | `overall_p_value` | P-value for H0: ATT = 0 |
2406
+ | `overall_conf_int` | Confidence interval |
2407
+ | `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'` or `'all'`) |
2408
+ | `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) |
2409
+ | `treatment_effects` | DataFrame of unit-level treatment effects |
2410
+ | `n_treated_obs` | Number of treated observations |
2411
+ | `n_untreated_obs` | Number of untreated observations |
2412
+
2413
+ **Methods:**
2414
+
2415
+ | Method | Description |
2416
+ |--------|-------------|
2417
+ | `summary(alpha)` | Get formatted summary string |
2418
+ | `print_summary(alpha)` | Print summary to stdout |
2419
+ | `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') |
2420
+
2421
+ ### StackedDiD
2422
+
2423
+ ```python
2424
+ StackedDiD(
2425
+ kappa_pre=1, # Pre-treatment event-study periods
2426
+ kappa_post=1, # Post-treatment event-study periods
2427
+ weighting='aggregate', # 'aggregate', 'population', or 'sample_share'
2428
+ clean_control='not_yet_treated', # 'not_yet_treated', 'strict', or 'never_treated'
2429
+ cluster='unit', # 'unit' or 'unit_subexp'
2430
+ alpha=0.05, # Significance level
2431
+ anticipation=0, # Anticipation periods
2432
+ rank_deficient_action='warn', # 'warn', 'error', or 'silent'
2433
+ )
2434
+ ```
2435
+
2436
+ **fit() Parameters:**
2437
+
2438
+ | Parameter | Type | Description |
2439
+ |-----------|------|-------------|
2440
+ | `data` | DataFrame | Panel data |
2441
+ | `outcome` | str | Outcome variable column name |
2442
+ | `unit` | str | Unit identifier column |
2443
+ | `time` | str | Time period column |
2444
+ | `first_treat` | str | First treatment period column (0 for never-treated) |
2445
+ | `population` | str, optional | Population column (required if `weighting='population'`) |
2446
+ | `aggregate` | str | Aggregation: None, `"simple"`, or `"event_study"` |
2447
+
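To make `kappa_pre`, `kappa_post`, and the idea of clean controls concrete, the sketch below builds the sub-experiment for each adoption cohort in plain Python. It assumes one common reading of stacked DiD: the event window for cohort `a` runs from `a - kappa_pre` to `a + kappa_post`, and a unit is a clean control if it is never treated (`first_treat == 0`) or first treated after the window ends. `build_sub_experiments` is a hypothetical helper, and the rule it applies may not match what each `clean_control` option does.

```python
def build_sub_experiments(first_treat, kappa_pre=1, kappa_post=1):
    """Map each adoption cohort to its event window and unit sets.

    first_treat: dict of unit -> first treatment period (0 = never treated).
    """
    cohorts = sorted({a for a in first_treat.values() if a != 0})
    stacks = {}
    for a in cohorts:
        window = list(range(a - kappa_pre, a + kappa_post + 1))
        treated = sorted(u for u, g in first_treat.items() if g == a)
        # Assumed rule: controls stay untreated through the end of the window.
        controls = sorted(
            u for u, g in first_treat.items() if g == 0 or g > a + kappa_post
        )
        stacks[a] = {"window": window, "treated": treated, "controls": controls}
    return stacks


first_treat = {"u1": 3, "u2": 3, "u3": 5, "u4": 0}
stacks = build_sub_experiments(first_treat, kappa_pre=1, kappa_post=1)
for cohort, spec in stacks.items():
    print(cohort, spec)
```

In this toy example unit `u3` serves as a control for the period-3 cohort and later becomes the treated unit of its own sub-experiment, which is the reuse of observations that stacking is built around.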
2448
+ ### StackedDiDResults
2449
+
2450
+ **Attributes:**
2451
+
2452
+ | Attribute | Description |
2453
+ |-----------|-------------|
2454
+ | `overall_att` | Overall average treatment effect on the treated |
2455
+ | `overall_se` | Standard error |
2456
+ | `overall_t_stat` | T-statistic |
2457
+ | `overall_p_value` | P-value for H0: ATT = 0 |
2458
+ | `overall_conf_int` | Confidence interval |
2459
+ | `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'`) |
2460
+ | `stacked_data` | The stacked dataset used for estimation |
2461
+ | `n_treated_obs` | Number of treated observations |
2462
+ | `n_untreated_obs` | Number of untreated (clean control) observations |
2463
+ | `n_cohorts` | Number of treatment cohorts |
2464
+ | `kappa_pre` | Pre-treatment window used |
2465
+ | `kappa_post` | Post-treatment window used |
2466
+
2467
+ **Methods:**
2468
+
2469
+ | Method | Description |
2470
+ |--------|-------------|
2471
+ | `summary(alpha)` | Get formatted summary string |
2472
+ | `print_summary(alpha)` | Print summary to stdout |
2473
+ | `to_dataframe(level)` | Convert to DataFrame ('event_study') |
2474
+
2475
+ ### TripleDifference
2476
+
2477
+ ```python
2478
+ TripleDifference(
2479
+ estimation_method='dr', # 'dr' (doubly robust), 'reg', or 'ipw'
2480
+ robust=True, # Use HC1 robust standard errors
2481
+ cluster=None, # Column for cluster-robust SEs
2482
+ alpha=0.05, # Significance level for CIs
2483
+ pscore_trim=0.01 # Propensity score trimming threshold
2484
+ )
2485
+ ```
2486
+
2487
+ **fit() Parameters:**
2488
+
2489
+ | Parameter | Type | Description |
2490
+ |-----------|------|-------------|
2491
+ | `data` | DataFrame | Input data |
2492
+ | `outcome` | str | Outcome variable column name |
2493
+ | `group` | str | Group indicator column (0/1): 1=treated group |
2494
+ | `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible |
2495
+ | `time` | str | Time indicator column (0/1): 1=post-treatment |
2496
+ | `covariates` | list | Covariate column names for adjustment |
2497
+
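The three binary columns define eight cells. Without covariates, the triple-difference estimand is the DiD among eligible units minus the DiD among ineligible units. The sketch below computes that from cell means in plain Python. It only illustrates the estimand (`ddd_from_cell_means` is a hypothetical helper); it is not the regression-adjustment, IPW, or doubly robust estimator selected by `estimation_method`.

```python
from statistics import mean


def ddd_from_cell_means(rows):
    """rows: iterable of (group, partition, time, outcome), indicators 0/1."""
    cells = {}
    for g, p, t, y in rows:
        cells.setdefault((g, p, t), []).append(y)
    m = {key: mean(vals) for key, vals in cells.items()}

    def did(partition):
        # DiD between treated (g=1) and control (g=0) groups in one partition.
        treated_change = m[(1, partition, 1)] - m[(1, partition, 0)]
        control_change = m[(0, partition, 1)] - m[(0, partition, 0)]
        return treated_change - control_change

    return did(1) - did(0)


rows = [
    # (group, partition, time, outcome)
    (1, 1, 0, 10.0), (1, 1, 1, 16.0),  # treated group, eligible: +6
    (1, 0, 0, 8.0), (1, 0, 1, 11.0),   # treated group, ineligible: +3
    (0, 1, 0, 9.0), (0, 1, 1, 11.0),   # control group, eligible: +2
    (0, 0, 0, 7.0), (0, 0, 1, 9.0),    # control group, ineligible: +2
]
print(ddd_from_cell_means(rows))  # (6 - 2) - (3 - 2) = 3.0
```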
2498
+ ### TripleDifferenceResults
2499
+
2500
+ **Attributes:**
2501
+
2502
+ | Attribute | Description |
2503
+ |-----------|-------------|
2504
+ | `att` | Average treatment effect on the treated (ATT) |
2505
+ | `se` | Standard error of ATT |
2506
+ | `t_stat` | T-statistic |
2507
+ | `p_value` | P-value for H0: ATT = 0 |
2508
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
2509
+ | `n_obs` | Total number of observations |
2510
+ | `n_treated_eligible` | Observations in the treated group and eligible partition |
2511
+ | `n_treated_ineligible` | Observations in the treated group and ineligible partition |
2512
+ | `n_control_eligible` | Observations in the control group and eligible partition |
2513
+ | `n_control_ineligible` | Observations in the control group and ineligible partition |
2514
+ | `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
2515
+ | `group_means` | Dict of cell means for diagnostics |
2516
+ | `pscore_stats` | Propensity score statistics (IPW/DR only) |
2517
+ | `is_significant` | Boolean for significance at alpha |
2518
+ | `significance_stars` | String of significance stars |
2519
+
2520
+ **Methods:**
2521
+
2522
+ | Method | Description |
2523
+ |--------|-------------|
2524
+ | `summary(alpha)` | Get formatted summary string |
2525
+ | `print_summary(alpha)` | Print summary to stdout |
2526
+ | `to_dict()` | Convert to dictionary |
2527
+ | `to_dataframe()` | Convert to pandas DataFrame |
2528
+
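The inferential attributes (`t_stat`, `p_value`, `conf_int`) are linked to `att` and `se` by standard Wald arithmetic. A minimal sketch, assuming a normal reference distribution; the library may instead use a t distribution with finite degrees of freedom, in which case its numbers will differ slightly. `wald_inference` is a hypothetical helper.

```python
from statistics import NormalDist


def wald_inference(att, se, alpha=0.05):
    """Return (t_stat, p_value, conf_int) under a normal approximation."""
    z = NormalDist()
    t_stat = att / se
    # Two-sided p-value for H0: ATT = 0.
    p_value = 2.0 * (1.0 - z.cdf(abs(t_stat)))
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    conf_int = (att - crit * se, att + crit * se)
    return t_stat, p_value, conf_int


t_stat, p_value, conf_int = wald_inference(att=2.0, se=0.8)
print(round(t_stat, 2), round(p_value, 4), [round(b, 3) for b in conf_int])
```

The interval excludes zero exactly when the two-sided p-value is below `alpha`, which is the relationship that a flag such as `is_significant` summarizes.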
2529
+ ### HonestDiD
2530
+
2531
+ ```python
2532
+ HonestDiD(
2533
+ method='relative_magnitude', # 'relative_magnitude' or 'smoothness'
2534
+ M=None, # Restriction parameter (default: 1.0 for 'relative_magnitude', 0.0 for 'smoothness')
2535
+ alpha=0.05, # Significance level for CIs
2536
+ l_vec=None # Linear combination vector for target parameter
2537
+ )
2538
+ ```
2539
+
2540
+ **fit() Parameters:**
2541
+
2542
+ | Parameter | Type | Description |
2543
+ |-----------|------|-------------|
2544
+ | `results` | MultiPeriodDiDResults | Results from `MultiPeriodDiD.fit()` |
2545
+ | `M` | float | Restriction parameter (overrides constructor value) |
2546
+
2547
+ **Methods:**
2548
+
2549
+ | Method | Description |
2550
+ |--------|-------------|
2551
+ | `fit(results, M)` | Compute bounds for given event study results |
2552
+ | `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
2553
+ | `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
2554
+
2555
+ ### HonestDiDResults
2556
+
2557
+ **Attributes:**
2558
+
2559
+ | Attribute | Description |
2560
+ |-----------|-------------|
2561
+ | `original_estimate` | Point estimate under parallel trends |
2562
+ | `lb` | Lower bound of identified set |
2563
+ | `ub` | Upper bound of identified set |
2564
+ | `ci_lb` | Lower bound of robust confidence interval |
2565
+ | `ci_ub` | Upper bound of robust confidence interval |
2566
+ | `ci_width` | Width of robust CI |
2567
+ | `M` | Restriction parameter used |
2568
+ | `method` | Restriction method ('relative_magnitude' or 'smoothness') |
2569
+ | `alpha` | Significance level |
2570
+ | `is_significant` | True if robust CI excludes zero |
2571
+
2572
+ **Methods:**
2573
+
2574
+ | Method | Description |
2575
+ |--------|-------------|
2576
+ | `summary()` | Get formatted summary string |
2577
+ | `to_dict()` | Convert to dictionary |
2578
+ | `to_dataframe()` | Convert to pandas DataFrame |
2579
+
2580
+ ### SensitivityResults
2581
+
2582
+ **Attributes:**
2583
+
2584
+ | Attribute | Description |
2585
+ |-----------|-------------|
2586
+ | `M_grid` | Array of M values analyzed |
2587
+ | `results` | List of HonestDiDResults for each M |
2588
+ | `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
2589
+
2590
+ **Methods:**
2591
+
2592
+ | Method | Description |
2593
+ |--------|-------------|
2594
+ | `summary()` | Get formatted summary string |
2595
+ | `plot(ax)` | Plot sensitivity analysis |
2596
+ | `to_dataframe()` | Convert to pandas DataFrame |
2597
+
2598
+ ### PreTrendsPower
2599
+
2600
+ ```python
2601
+ PreTrendsPower(
2602
+ alpha=0.05, # Significance level for pre-trends test
2603
+ power=0.80, # Target power for the minimum detectable violation (MDV)
2604
+ violation_type='linear', # 'linear', 'constant', 'last_period', 'custom'
2605
+ violation_weights=None # Custom weights (required if violation_type='custom')
2606
+ )
2607
+ ```
2608
+
2609
+ **fit() Parameters:**
2610
+
2611
+ | Parameter | Type | Description |
2612
+ |-----------|------|-------------|
2613
+ | `results` | MultiPeriodDiDResults | Results from event study |
2614
+ | `M` | float | Specific violation magnitude to evaluate |
2615
+
2616
+ **Methods:**
2617
+
2618
+ | Method | Description |
2619
+ |--------|-------------|
2620
+ | `fit(results, M)` | Compute power analysis for given event study |
2621
+ | `power_at(results, M)` | Compute power for specific violation magnitude |
2622
+ | `power_curve(results, M_grid, n_points)` | Compute power across range of M values |
2623
+ | `sensitivity_to_honest_did(results)` | Compare with HonestDiD analysis |
2624
+
2625
+ ### PreTrendsPowerResults
2626
+
2627
+ **Attributes:**
2628
+
2629
+ | Attribute | Description |
2630
+ |-----------|-------------|
2631
+ | `power` | Power to detect the specified violation |
2632
+ | `mdv` | Minimum detectable violation at target power |
2633
+ | `violation_magnitude` | Violation magnitude (M) tested |
2634
+ | `violation_type` | Type of violation pattern |
2635
+ | `alpha` | Significance level |
2636
+ | `target_power` | Target power level |
2637
+ | `n_pre_periods` | Number of pre-treatment periods |
2638
+ | `test_statistic` | Expected test statistic under violation |
2639
+ | `critical_value` | Critical value for pre-trends test |
2640
+ | `noncentrality` | Non-centrality parameter |
2641
+ | `is_informative` | Heuristic check if test is informative |
2642
+ | `power_adequate` | Whether power meets target |
2643
+
2644
+ **Methods:**
2645
+
2646
+ | Method | Description |
2647
+ |--------|-------------|
2648
+ | `summary()` | Get formatted summary string |
2649
+ | `print_summary()` | Print summary to stdout |
2650
+ | `to_dict()` | Convert to dictionary |
2651
+ | `to_dataframe()` | Convert to pandas DataFrame |
2652
+
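For intuition about `power`, `mdv`, and `noncentrality`, the sketch below works through the simplest case: a two-sided z-test on a single pre-period coefficient with a known standard error. This is a simplification chosen for illustration (hypothetical helpers, standard library only). A joint test over several pre-periods with a given violation pattern needs the full covariance matrix of the pre-period coefficients.

```python
from statistics import NormalDist

Z = NormalDist()


def power_single_coef(M, se, alpha=0.05):
    """Power of a two-sided z-test to reject when the true violation is M."""
    crit = Z.inv_cdf(1.0 - alpha / 2.0)
    shift = M / se  # noncentrality on the z scale
    return Z.cdf(-crit + shift) + Z.cdf(-crit - shift)


def mdv_single_coef(se, alpha=0.05, power=0.80, tol=1e-10):
    """Smallest violation detected with the target power, found by bisection."""
    lo, hi = 0.0, 10.0 * se
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if power_single_coef(mid, se, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


print(round(power_single_coef(1.0, se=0.5), 3))
print(round(mdv_single_coef(se=0.5), 4))
```

In this one-coefficient case the minimum detectable violation is close to 2.8 standard errors at `alpha=0.05` and 80% power, so a noisy pre-period estimate can miss violations that are large relative to the treatment effect.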
2653
+ ### PreTrendsPowerCurve
2654
+
2655
+ **Attributes:**
2656
+
2657
+ | Attribute | Description |
2658
+ |-----------|-------------|
2659
+ | `M_values` | Array of violation magnitudes |
2660
+ | `powers` | Array of power values |
2661
+ | `mdv` | Minimum detectable violation |
2662
+ | `alpha` | Significance level |
2663
+ | `target_power` | Target power level |
2664
+ | `violation_type` | Type of violation pattern |
2665
+
2666
+ **Methods:**
2667
+
2668
+ | Method | Description |
2669
+ |--------|-------------|
2670
+ | `plot(ax, show_mdv, show_target)` | Plot power curve |
2671
+ | `to_dataframe()` | Convert to DataFrame with M and power columns |
2672
+
2673
+ ### Data Preparation Functions
2674
+
2675
+ #### generate_did_data
2676
+
2677
+ ```python
2678
+ generate_did_data(
2679
+ n_units=100, # Number of units
2680
+ n_periods=4, # Number of time periods
2681
+ treatment_effect=5.0, # True ATT
2682
+ treatment_fraction=0.5, # Fraction treated
2683
+ treatment_period=2, # First post-treatment period
2684
+ unit_fe_sd=2.0, # Unit fixed effect std dev
2685
+ time_trend=0.5, # Linear time trend
2686
+ noise_sd=1.0, # Idiosyncratic noise std dev
2687
+ seed=None # Random seed
2688
+ )
2689
+ ```
2690
+
2691
+ Returns DataFrame with columns: `unit`, `period`, `treated`, `post`, `outcome`, `true_effect`.
2692
+
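As a rough illustration of the kind of data-generating process these parameters describe, here is a standard-library sketch that produces rows with the same column names. The functional form (unit effect + linear trend + effect for treated units in post periods + noise) is an assumption, not the library's actual code; `simulate_did_rows` is a hypothetical name, and periods are assumed to be numbered from 0.

```python
import random


def simulate_did_rows(n_units=100, n_periods=4, treatment_effect=5.0,
                      treatment_fraction=0.5, treatment_period=2,
                      unit_fe_sd=2.0, time_trend=0.5, noise_sd=1.0, seed=None):
    """Return a list of dicts, one per unit-period (illustrative DGP)."""
    rng = random.Random(seed)
    n_treated = int(round(n_units * treatment_fraction))
    rows = []
    for unit in range(n_units):
        treated = int(unit < n_treated)
        unit_fe = rng.gauss(0.0, unit_fe_sd)
        for period in range(n_periods):
            post = int(period >= treatment_period)
            true_effect = treatment_effect * treated * post
            outcome = (unit_fe + time_trend * period + true_effect
                       + rng.gauss(0.0, noise_sd))
            rows.append({"unit": unit, "period": period, "treated": treated,
                         "post": post, "outcome": outcome,
                         "true_effect": true_effect})
    return rows


rows = simulate_did_rows(n_units=10, n_periods=4, seed=42)
print(len(rows), sorted(rows[0]))
```

Setting `unit_fe_sd=0.0` and `noise_sd=0.0` in a sketch like this gives noiseless data in which the difference in mean changes between treated and control units equals `treatment_effect` exactly, which is a convenient check on an estimator.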
2693
+ #### make_treatment_indicator
2694
+
2695
+ ```python
2696
+ make_treatment_indicator(
2697
+ data, # Input DataFrame
2698
+ column, # Column to create treatment from
2699
+ treated_values=None, # Value(s) indicating treatment
2700
+ threshold=None, # Numeric threshold for treatment
2701
+ above_threshold=True, # If True, >= threshold is treated
2702
+ new_column='treated' # Output column name
2703
+ )
2704
+ ```
2705
+
2706
+ #### make_post_indicator
2707
+
2708
+ ```python
2709
+ make_post_indicator(
2710
+ data, # Input DataFrame
2711
+ time_column, # Time/period column
2712
+ post_periods=None, # Specific post-treatment period(s)
2713
+ treatment_start=None, # First post-treatment period
2714
+ new_column='post' # Output column name
2715
+ )
2716
+ ```
2717
+
2718
+ #### wide_to_long
2719
+
2720
+ ```python
2721
+ wide_to_long(
2722
+ data, # Wide-format DataFrame
2723
+ value_columns, # List of time-varying columns
2724
+ id_column, # Unit identifier column
2725
+ time_name='period', # Name for time column
2726
+ value_name='value', # Name for value column
2727
+ time_values=None # Values for time periods
2728
+ )
2729
+ ```
2730
+
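The reshape these parameters describe can be sketched without pandas. The helper below (hypothetical, using plain dictionaries instead of a DataFrame) turns one row per unit into one row per unit-period. It assumes `time_values` defaults to 0, 1, 2, ... in the order of `value_columns`; the library's default time labels may differ.

```python
def wide_to_long_rows(rows, value_columns, id_column,
                      time_name="period", value_name="value", time_values=None):
    """Reshape a list of wide dicts into long dicts (illustrative only)."""
    if time_values is None:
        time_values = list(range(len(value_columns)))  # assumed default
    if len(time_values) != len(value_columns):
        raise ValueError("time_values must match value_columns in length")
    long_rows = []
    for row in rows:
        for col, t in zip(value_columns, time_values):
            long_rows.append({id_column: row[id_column],
                              time_name: t, value_name: row[col]})
    return long_rows


wide = [{"firm": "a", "y2019": 1.0, "y2020": 1.5},
        {"firm": "b", "y2019": 2.0, "y2020": 2.2}]
long_rows = wide_to_long_rows(wide, ["y2019", "y2020"], "firm",
                              value_name="sales", time_values=[2019, 2020])
for r in long_rows:
    print(r)
```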
2731
+ #### balance_panel
2732
+
2733
+ ```python
2734
+ balance_panel(
2735
+ data, # Panel DataFrame
2736
+ unit_column, # Unit identifier column
2737
+ time_column, # Time period column
2738
+ method='inner', # 'inner', 'outer', or 'fill'
2739
+ fill_value=None # Value for filling (if method='fill')
2740
+ )
2741
+ ```
2742
+
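As an illustration of what balancing means, the sketch below implements one plausible reading of `method='inner'`: keep only units observed in every period that appears in the data. That reading is an assumption, as is the hypothetical helper name; `'outer'` and `'fill'` are not sketched here.

```python
def balance_inner(rows, unit_column, time_column):
    """Keep rows for units that are observed in all periods."""
    all_periods = {r[time_column] for r in rows}
    seen = {}
    for r in rows:
        seen.setdefault(r[unit_column], set()).add(r[time_column])
    complete = {u for u, periods in seen.items() if periods == all_periods}
    return [r for r in rows if r[unit_column] in complete]


panel = [
    {"unit": 1, "period": 1, "y": 0.2}, {"unit": 1, "period": 2, "y": 0.4},
    {"unit": 2, "period": 1, "y": 0.1},  # unit 2 is missing period 2
]
balanced = balance_inner(panel, "unit", "period")
print(balanced)
```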
2743
+ #### validate_did_data
2744
+
2745
+ ```python
2746
+ validate_did_data(
2747
+ data, # DataFrame to validate
2748
+ outcome, # Outcome column name
2749
+ treatment, # Treatment column name
2750
+ time, # Time/post column name
2751
+ unit=None, # Unit column (for panel validation)
2752
+ raise_on_error=True # Raise ValueError or return dict
2753
+ )
2754
+ ```
2755
+
2756
+ Returns dict with `valid`, `errors`, `warnings`, and `summary` keys.
2757
+
2758
+ #### summarize_did_data
2759
+
2760
+ ```python
2761
+ summarize_did_data(
2762
+ data, # Input DataFrame
2763
+ outcome, # Outcome column name
2764
+ treatment, # Treatment column name
2765
+ time, # Time/post column name
2766
+ unit=None # Unit column (optional)
2767
+ )
2768
+ ```
2769
+
2770
+ Returns DataFrame with summary statistics by treatment-time cell.
2771
+
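Cell summaries of this kind are enough to recover the basic 2x2 DiD estimate by hand, which is a useful sanity check before fitting a model. A minimal sketch in plain Python (hypothetical helper names; the dictionaries stand in for DataFrame rows):

```python
from statistics import mean


def cell_means(rows, outcome, treatment, time):
    """Mean outcome for each (treatment, time) cell."""
    cells = {}
    for r in rows:
        cells.setdefault((r[treatment], r[time]), []).append(r[outcome])
    return {key: mean(vals) for key, vals in cells.items()}


def did_2x2(means):
    """(treated post - treated pre) - (control post - control pre)."""
    return (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])


rows = [
    {"y": 10.0, "treated": 1, "post": 0}, {"y": 12.0, "treated": 1, "post": 0},
    {"y": 15.0, "treated": 1, "post": 1}, {"y": 17.0, "treated": 1, "post": 1},
    {"y": 8.0, "treated": 0, "post": 0}, {"y": 10.0, "treated": 0, "post": 0},
    {"y": 9.0, "treated": 0, "post": 1}, {"y": 11.0, "treated": 0, "post": 1},
]
means = cell_means(rows, "y", "treated", "post")
print(did_2x2(means))  # (16 - 11) - (10 - 9) = 4.0
```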
2772
+ #### create_event_time
2773
+
2774
+ ```python
2775
+ create_event_time(
2776
+ data, # Panel DataFrame
2777
+ time_column, # Calendar time column
2778
+ treatment_time_column, # Column with treatment timing
2779
+ new_column='event_time' # Output column name
2780
+ )
2781
+ ```
2782
+
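Event time is calendar time minus the unit's treatment time. The sketch below shows that arithmetic on plain dictionaries. It assumes never-treated units are coded with treatment time 0 (the `first_treat` convention used elsewhere in this reference) and receive no event time; how the library marks those rows is not stated here, so `None` is a placeholder choice, and `add_event_time` is a hypothetical helper.

```python
def add_event_time(rows, time_column, treatment_time_column,
                   new_column="event_time"):
    """Return copies of rows with event time = time - treatment time."""
    out = []
    for r in rows:
        g = r[treatment_time_column]
        # Assumption: 0 (or None) marks a never-treated unit.
        event_time = None if not g else r[time_column] - g
        out.append({**r, new_column: event_time})
    return out


rows = [
    {"unit": "a", "year": 2019, "first_treat": 2020},
    {"unit": "a", "year": 2021, "first_treat": 2020},
    {"unit": "b", "year": 2019, "first_treat": 0},
]
for r in add_event_time(rows, "year", "first_treat"):
    print(r)
```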
2783
+ #### aggregate_to_cohorts
2784
+
2785
+ ```python
2786
+ aggregate_to_cohorts(
2787
+ data, # Unit-level panel data
2788
+ unit_column, # Unit identifier column
2789
+ time_column, # Time period column
2790
+ treatment_column, # Treatment indicator column
2791
+ outcome, # Outcome variable column
2792
+ covariates=None # Additional columns to aggregate
2793
+ )
2794
+ ```
2795
+
2796
+ #### rank_control_units
2797
+
2798
+ ```python
2799
+ rank_control_units(
2800
+ data, # Panel data in long format
2801
+ unit_column, # Unit identifier column
2802
+ time_column, # Time period column
2803
+ outcome_column, # Outcome variable column
2804
+ treatment_column=None, # Treatment indicator column (0/1)
2805
+ treated_units=None, # Explicit list of treated unit IDs
2806
+ pre_periods=None, # Pre-treatment periods (default: first half)
2807
+ covariates=None, # Covariate columns for matching
2808
+ outcome_weight=0.7, # Weight for outcome trend similarity (0-1)
2809
+ covariate_weight=0.3, # Weight for covariate distance (0-1)
2810
+ exclude_units=None, # Units to exclude from control pool
2811
+ require_units=None, # Units that must appear in output
2812
+ n_top=None, # Return only top N controls
2813
+ suggest_treatment_candidates=False, # Identify treatment candidates
2814
+ n_treatment_candidates=5, # Number of treatment candidates
2815
+ lambda_reg=0.0 # Regularization for synthetic weights
2816
+ )
2817
+ ```
2818
+
2819
+ Returns DataFrame with columns: `unit`, `quality_score`, `outcome_trend_score`, `covariate_score`, `synthetic_weight`, `pre_trend_rmse`, `is_required`.
2820
+
2821
+ ## Requirements
2822
+
2823
+ - Python 3.9 - 3.14
2824
+ - numpy >= 1.20
2825
+ - pandas >= 1.3
2826
+ - scipy >= 1.7
2827
+
2828
+ ## Development
2829
+
2830
+ ```bash
2831
+ # Install with dev dependencies
2832
+ pip install -e ".[dev]"
2833
+
2834
+ # Run tests
2835
+ pytest
2836
+
2837
+ # Format code
2838
+ black diff_diff tests
2839
+ ruff check diff_diff tests
2840
+ ```
2841
+
2842
+ ## References
2843
+
2844
+ This library implements methods from the following scholarly works:
2845
+
2846
+ ### Difference-in-Differences
2847
+
2848
+ - **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. [https://doi.org/10.2307/1924810](https://doi.org/10.2307/1924810)
2849
+
2850
+ - **Card, D., & Krueger, A. B. (1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. [https://www.jstor.org/stable/2118030](https://www.jstor.org/stable/2118030)
2851
+
2852
+ - **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences.
2853
+
2854
+ ### Two-Way Fixed Effects
2855
+
2856
+ - **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
2857
+
2858
+ - **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. [https://doi.org/10.1017/pan.2020.33](https://doi.org/10.1017/pan.2020.33)
2859
+
2860
+ ### Robust Standard Errors
2861
+
2862
+ - **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. [https://doi.org/10.2307/1912934](https://doi.org/10.2307/1912934)
2863
+
2864
+ - **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." *Journal of Econometrics*, 29(3), 305-325. [https://doi.org/10.1016/0304-4076(85)90158-7](https://doi.org/10.1016/0304-4076(85)90158-7)
2865
+
2866
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." *Journal of Business & Economic Statistics*, 29(2), 238-249. [https://doi.org/10.1198/jbes.2010.07136](https://doi.org/10.1198/jbes.2010.07136)
2867
+
2868
+ ### Wild Cluster Bootstrap
2869
+
2870
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. [https://doi.org/10.1162/rest.90.3.414](https://doi.org/10.1162/rest.90.3.414)
2871
+
2872
+ - **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. [https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf](https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf)
2873
+
2874
+ - **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. [https://doi.org/10.1111/ectj.12107](https://doi.org/10.1111/ectj.12107)
2875
+
2876
+ ### Placebo Tests and DiD Diagnostics
2877
+
2878
+ - **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. [https://doi.org/10.1162/003355304772839588](https://doi.org/10.1162/003355304772839588)
2879
+
2880
+ ### Synthetic Control Method
2881
+
2882
+ - **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. [https://doi.org/10.1257/000282803321455188](https://doi.org/10.1257/000282803321455188)
2883
+
2884
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." *Journal of the American Statistical Association*, 105(490), 493-505. [https://doi.org/10.1198/jasa.2009.ap08746](https://doi.org/10.1198/jasa.2009.ap08746)
2885
+
2886
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. [https://doi.org/10.1111/ajps.12116](https://doi.org/10.1111/ajps.12116)
2887
+
2888
+ ### Synthetic Difference-in-Differences
2889
+
2890
+ - **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)
2891
+
2892
+ ### Triply Robust Panel (TROP)
2893
+
2894
+ - **Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025).** "Triply Robust Panel Estimators." *Working Paper*. [https://arxiv.org/abs/2508.21536](https://arxiv.org/abs/2508.21536)
2895
+
2896
+ This paper introduces the TROP estimator, which combines three robustness components:
2897
+ - **Factor model adjustment**: Low-rank factor structure via SVD removes unobserved confounders
2898
+ - **Unit weights**: Synthetic-control-style weighting for optimal comparison
2899
+ - **Time weights**: SDID-style time weighting for informative pre-periods
2900
+
2901
+ TROP is particularly useful when there are unobserved time-varying confounders with a factor structure that affect different units differently over time.
2902
+
2903
+ ### Triple Difference (DDD)
2904
+
2905
+ - **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)
2906
+
2907
+ This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.
2908
+
2909
+ - **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)
2910
+
2911
+ Classic paper introducing the Triple Difference design for policy evaluation.
2912
+
2913
+ - **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)
2914
+
2915
+ ### Parallel Trends and Pre-Trend Testing
2916
+
2917
+ - **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
2918
+
2919
+ - **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
2920
+
2921
+ ### Honest DiD / Sensitivity Analysis
2922
+
2923
+ The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
2924
+
2925
+ - **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
2926
+
2927
+ This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
2928
+ - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations
2929
+ - **Smoothness (ΔSD)**: Bounds the second differences of trend violations, allowing for linear extrapolation of pre-trends
2930
+ - **Breakdown Analysis**: Finds the smallest violation magnitude that would overturn conclusions
2931
+ - **Robust Confidence Intervals**: Provide valid inference under partial identification
2932
+
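To see what the relative-magnitudes restriction implies, consider the simplest case: one post-treatment coefficient and no sampling uncertainty. If the post-period violation is at most `M` times the largest period-to-period change in the pre-treatment coefficients, the identified set is an interval around the point estimate, and the breakdown value is the `M` at which that interval first touches zero. The sketch below covers only this simplified arithmetic (hypothetical helpers; it ignores sampling error and later post-periods, both of which a full implementation has to handle).

```python
def _max_pre_change(pre_estimates):
    """Largest absolute period-to-period change before treatment.

    pre_estimates: pre-treatment coefficients in time order, excluding the
    reference period, which is taken to be 0 and appended here.
    """
    path = list(pre_estimates) + [0.0]
    return max(abs(b - a) for a, b in zip(path, path[1:]))


def rm_identified_set(post_estimate, pre_estimates, M):
    """Bounds on the first post-period effect under relative magnitudes."""
    half_width = M * _max_pre_change(pre_estimates)
    return post_estimate - half_width, post_estimate + half_width


def rm_breakdown(post_estimate, pre_estimates):
    """Smallest M at which the identified set contains zero.

    Undefined (division by zero) if the pre-period coefficients are all 0.
    """
    return abs(post_estimate) / _max_pre_change(pre_estimates)


print(rm_identified_set(2.0, [0.3, -0.1], M=1.0))
print(rm_breakdown(2.0, [0.3, -0.1]))
```

With sampling uncertainty the robust confidence interval is wider than this identified set, so the breakdown value computed from confidence intervals will be smaller than the one from this noiseless arithmetic.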
2933
+ - **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
2934
+
2935
+ Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
2936
+
2937
+ ### Multi-Period and Staggered Adoption
2938
+
2939
+ - **Borusyak, K., Jaravel, X., & Spiess, J. (2024).** "Revisiting Event-Study Designs: Robust and Efficient Estimation." *Review of Economic Studies*, 91(6), 3253-3285. [https://doi.org/10.1093/restud/rdae007](https://doi.org/10.1093/restud/rdae007)
2940
+
2941
+ This paper introduces the imputation estimator implemented in our `ImputationDiD` class:
2942
+ - **Efficient imputation**: OLS on untreated observations → impute counterfactuals → aggregate
2943
+ - **Conservative variance**: Theorem 3 clustered variance estimator with auxiliary model
2944
+ - **Pre-trend test**: Independent of treatment effect estimation (Proposition 9)
2945
+ - **Efficiency gains**: ~50% shorter CIs than Callaway-Sant'Anna under homogeneous effects
2946
+
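The first bullet can be sketched on a toy panel. The code below fits additive unit and period effects on untreated observations only (by alternating means, which converges to the least-squares fit on this connected panel), imputes the untreated outcome for each treated observation, and averages the differences. It is a plain-Python illustration under those assumptions, with no covariates, weights, or standard errors; `imputation_att` is a hypothetical helper, not the `ImputationDiD` implementation.

```python
def imputation_att(obs, n_iter=200):
    """obs: list of (unit, period, treated, outcome) tuples."""
    untreated = [(u, t, y) for u, t, d, y in obs if d == 0]
    alpha = {u: 0.0 for u, _, _ in untreated}  # unit effects
    beta = {t: 0.0 for _, t, _ in untreated}   # period effects

    # Step 1: least-squares fit of y = alpha_u + beta_t on untreated rows.
    for _ in range(n_iter):
        for u in alpha:
            res = [y - beta[t] for uu, t, y in untreated if uu == u]
            alpha[u] = sum(res) / len(res)
        for t in beta:
            res = [y - alpha[u] for u, tt, y in untreated if tt == t]
            beta[t] = sum(res) / len(res)

    # Step 2: impute Y(0) for treated rows. Step 3: average the gaps.
    effects = [y - (alpha[u] + beta[t]) for u, t, d, y in obs if d == 1]
    return sum(effects) / len(effects)


obs = [
    # Unit A is never treated; unit B is treated from period 3 onward.
    ("A", 1, 0, 1.0), ("A", 2, 0, 2.0), ("A", 3, 0, 3.0), ("A", 4, 0, 4.0),
    ("B", 1, 0, 3.0), ("B", 2, 0, 4.0), ("B", 3, 1, 7.0), ("B", 4, 1, 8.0),
]
att = imputation_att(obs)
print(round(att, 6))
```

The toy outcomes are exactly additive with a treatment effect of 2 for unit B, so the imputed gap recovers that value. Every treated row needs a unit effect and a period effect estimated from untreated rows, which is why the method requires untreated observations for each treated unit and in each treated period.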
2947
+ - **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)
2948
+
2949
+ - **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
2950
+
2951
+ - **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. [https://doi.org/10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016/j.jeconom.2020.09.006)
2952
+
2953
+ - **Gardner, J. (2022).** "Two-stage differences in differences." *arXiv preprint arXiv:2207.05943*. [https://arxiv.org/abs/2207.05943](https://arxiv.org/abs/2207.05943)
2954
+
2955
+ - **Butts, K., & Gardner, J. (2022).** "did2s: Two-Stage Difference-in-Differences." *The R Journal*, 14(1), 162-173. [https://doi.org/10.32614/RJ-2022-048](https://doi.org/10.32614/RJ-2022-048)
2956
+
2957
+ - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. [https://doi.org/10.1257/aer.20181169](https://doi.org/10.1257/aer.20181169)
2958
+
2959
+ - **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)
2960
+
2961
+ - **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. [https://www.nber.org/papers/w32054](https://www.nber.org/papers/w32054)
2962
+
2963
+ ### Power Analysis
2964
+
2965
+ - **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)
2966
+
2967
+ - **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458. [https://doi.org/10.1016/j.jdeveco.2020.102458](https://doi.org/10.1016/j.jdeveco.2020.102458)
2968
+
2969
+ Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.
2970
+
2971
+ - **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. [https://doi.org/10.1080/19439342.2016.1244555](https://doi.org/10.1080/19439342.2016.1244555)
2972
+
2973
+ ### General Causal Inference
2974
+
2975
+ - **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press.
2976
+
2977
+ - **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. [https://mixtape.scunning.com/](https://mixtape.scunning.com/)
2978
+
2979
+ ## Citing diff-diff
2980
+
2981
+ If you use diff-diff in your research, please cite it:
2982
+
2983
+ ```bibtex
2984
+ @software{diff_diff,
2985
+ title = {diff-diff: Difference-in-Differences Causal Inference for Python},
2986
+ author = {{diff-diff contributors}},
2987
+ url = {https://github.com/igerber/diff-diff},
2988
+ license = {MIT},
2989
+ }
2990
+ ```
2991
+
2992
+ See [`CITATION.cff`](CITATION.cff) for the full citation metadata.
2993
+
2994
+ ## License
2995
+
2996
+ MIT License
2997
+