diff-diff 1.3.1__tar.gz

This diff shows the content of publicly available package versions released to a supported registry. It is provided for informational purposes only and reflects the changes between package versions as they appear in the public registries.

Potentially problematic release: this version of diff-diff might be problematic.

Files changed (38)
  1. diff_diff-1.3.1/PKG-INFO +2255 -0
  2. diff_diff-1.3.1/README.md +2220 -0
  3. diff_diff-1.3.1/diff_diff/__init__.py +190 -0
  4. diff_diff-1.3.1/diff_diff/bacon.py +1027 -0
  5. diff_diff-1.3.1/diff_diff/diagnostics.py +927 -0
  6. diff_diff-1.3.1/diff_diff/estimators.py +1014 -0
  7. diff_diff-1.3.1/diff_diff/honest_did.py +1493 -0
  8. diff_diff-1.3.1/diff_diff/power.py +1350 -0
  9. diff_diff-1.3.1/diff_diff/prep.py +1338 -0
  10. diff_diff-1.3.1/diff_diff/pretrends.py +1067 -0
  11. diff_diff-1.3.1/diff_diff/results.py +703 -0
  12. diff_diff-1.3.1/diff_diff/staggered.py +1873 -0
  13. diff_diff-1.3.1/diff_diff/sun_abraham.py +1198 -0
  14. diff_diff-1.3.1/diff_diff/synthetic_did.py +738 -0
  15. diff_diff-1.3.1/diff_diff/triple_diff.py +1300 -0
  16. diff_diff-1.3.1/diff_diff/twfe.py +344 -0
  17. diff_diff-1.3.1/diff_diff/utils.py +1350 -0
  18. diff_diff-1.3.1/diff_diff/visualization.py +1627 -0
  19. diff_diff-1.3.1/diff_diff.egg-info/PKG-INFO +2255 -0
  20. diff_diff-1.3.1/diff_diff.egg-info/SOURCES.txt +36 -0
  21. diff_diff-1.3.1/diff_diff.egg-info/dependency_links.txt +1 -0
  22. diff_diff-1.3.1/diff_diff.egg-info/requires.txt +14 -0
  23. diff_diff-1.3.1/diff_diff.egg-info/top_level.txt +1 -0
  24. diff_diff-1.3.1/pyproject.toml +88 -0
  25. diff_diff-1.3.1/setup.cfg +4 -0
  26. diff_diff-1.3.1/tests/test_bacon.py +679 -0
  27. diff_diff-1.3.1/tests/test_diagnostics.py +674 -0
  28. diff_diff-1.3.1/tests/test_estimators.py +2711 -0
  29. diff_diff-1.3.1/tests/test_honest_did.py +699 -0
  30. diff_diff-1.3.1/tests/test_power.py +691 -0
  31. diff_diff-1.3.1/tests/test_prep.py +794 -0
  32. diff_diff-1.3.1/tests/test_pretrends.py +813 -0
  33. diff_diff-1.3.1/tests/test_staggered.py +1358 -0
  34. diff_diff-1.3.1/tests/test_sun_abraham.py +732 -0
  35. diff_diff-1.3.1/tests/test_triple_diff.py +869 -0
  36. diff_diff-1.3.1/tests/test_utils.py +1270 -0
  37. diff_diff-1.3.1/tests/test_visualization.py +284 -0
  38. diff_diff-1.3.1/tests/test_wild_bootstrap.py +804 -0
@@ -0,0 +1,2255 @@
+ Metadata-Version: 2.4
+ Name: diff-diff
+ Version: 1.3.1
+ Summary: A library for Difference-in-Differences causal inference analysis
+ Author: diff-diff contributors
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/igerber/diff-diff
+ Project-URL: Documentation, https://diff-diff.readthedocs.io
+ Project-URL: Repository, https://github.com/igerber/diff-diff
+ Project-URL: Issues, https://github.com/igerber/diff-diff/issues
+ Keywords: causal-inference,difference-in-differences,econometrics,statistics,treatment-effects
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ Requires-Dist: numpy>=1.20.0
+ Requires-Dist: pandas>=1.3.0
+ Requires-Dist: scipy>=1.7.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
+ Requires-Dist: black>=23.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Requires-Dist: mypy>=1.0; extra == "dev"
+ Provides-Extra: docs
+ Requires-Dist: sphinx>=6.0; extra == "docs"
+ Requires-Dist: sphinx-rtd-theme>=1.0; extra == "docs"
+
+ # diff-diff
+
+ A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.
+
+ ## Installation
+
+ ```bash
+ pip install diff-diff
+ ```
+
+ Or install from source:
+
+ ```bash
+ git clone https://github.com/igerber/diff-diff.git
+ cd diff-diff
+ pip install -e .
+ ```
+
+ ## Quick Start
+
+ ```python
+ import pandas as pd
+ from diff_diff import DifferenceInDifferences
+
+ # Create sample data
+ data = pd.DataFrame({
+     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
+     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
+     'post': [0, 0, 1, 1, 0, 0, 1, 1]
+ })
+
+ # Fit the model
+ did = DifferenceInDifferences()
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
+
+ # View results
+ print(results)  # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
+ results.print_summary()
+ ```
+
+ Output:
+ ```
+ ======================================================================
+ Difference-in-Differences Estimation Results
+ ======================================================================
+
+ Observations: 8
+ Treated units: 4
+ Control units: 4
+ R-squared: 0.9055
+
+ ----------------------------------------------------------------------
+ Parameter      Estimate    Std. Err.     t-stat      P>|t|
+ ----------------------------------------------------------------------
+ ATT              3.0000       1.7321      1.732     0.1583
+ ----------------------------------------------------------------------
+
+ 95% Confidence Interval: [-1.8089, 7.8089]
+
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
+ ======================================================================
+ ```
+
+ ## Features
+
+ - **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
+ - **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals
+ - **Multiple interfaces**: Column names or R-style formulas
+ - **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
+ - **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
+ - **Panel data support**: Two-way fixed effects estimator for panel designs
+ - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
+ - **Staggered adoption**: Callaway-Sant'Anna (2021) and Sun-Abraham (2021) estimators for heterogeneous treatment timing
+ - **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
+ - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
+ - **Event study plots**: Publication-ready visualization of treatment effects
+ - **Parallel trends testing**: Multiple methods including equivalence tests
+ - **Goodman-Bacon decomposition**: Diagnose TWFE bias by decomposing into 2x2 comparisons
+ - **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
+ - **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
+ - **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests
+ - **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator
+ - **Data prep utilities**: Helper functions for common data preparation tasks
+ - **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst))
+
+ ## Tutorials
+
+ We provide Jupyter notebook tutorials in `docs/tutorials/`:
+
+ | Notebook | Description |
+ |----------|-------------|
+ | `01_basic_did.ipynb` | Basic 2x2 DiD, formula interface, covariates, fixed effects, cluster-robust SE, wild bootstrap |
+ | `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna and Sun-Abraham, group-time effects, aggregation methods, Bacon decomposition |
+ | `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
+ | `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
+ | `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
+ | `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power |
+ | `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves |
+ | `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling |
+
+ ## Data Preparation
+
+ diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats.
+
+ ### Generate Sample Data
+
+ Create synthetic data with a known treatment effect for testing and learning:
+
+ ```python
+ from diff_diff import generate_did_data, DifferenceInDifferences
+
+ # Generate panel data with 100 units, 4 periods, and a treatment effect of 5
+ data = generate_did_data(
+     n_units=100,
+     n_periods=4,
+     treatment_effect=5.0,
+     treatment_fraction=0.5,  # 50% of units are treated
+     treatment_period=2,      # Treatment starts at period 2
+     seed=42
+ )
+
+ # Verify the estimator recovers the treatment effect
+ did = DifferenceInDifferences()
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
+ print(f"Estimated ATT: {results.att:.2f} (true: 5.0)")
+ ```
+
+ ### Create Treatment Indicators
+
+ Convert categorical variables or numeric thresholds to binary treatment indicators:
+
+ ```python
+ from diff_diff import make_treatment_indicator
+
+ # From categorical variable
+ df = make_treatment_indicator(
+     data,
+     column='state',
+     treated_values=['CA', 'NY', 'TX']  # These states are treated
+ )
+
+ # From numeric threshold (e.g., firms above median size)
+ df = make_treatment_indicator(
+     data,
+     column='firm_size',
+     threshold=data['firm_size'].median()
+ )
+
+ # Treat units below threshold
+ df = make_treatment_indicator(
+     data,
+     column='income',
+     threshold=50000,
+     above_threshold=False  # Units with income <= 50000 are treated
+ )
+ ```
+
+ ### Create Post-Treatment Indicators
+
+ Convert time/date columns to binary post-treatment indicators:
+
+ ```python
+ from diff_diff import make_post_indicator
+
+ # From specific post-treatment periods
+ df = make_post_indicator(
+     data,
+     time_column='year',
+     post_periods=[2020, 2021, 2022]
+ )
+
+ # From treatment start date
+ df = make_post_indicator(
+     data,
+     time_column='year',
+     treatment_start=2020  # All years >= 2020 are post-treatment
+ )
+
+ # Works with datetime columns
+ df = make_post_indicator(
+     data,
+     time_column='date',
+     treatment_start='2020-01-01'
+ )
+ ```
+
+ ### Reshape Wide to Long Format
+
+ Convert wide-format data (one row per unit, multiple time columns) to long format:
+
+ ```python
+ from diff_diff import wide_to_long
+
+ # Wide format: columns like sales_2019, sales_2020, sales_2021
+ wide_df = pd.DataFrame({
+     'firm_id': [1, 2, 3],
+     'industry': ['tech', 'retail', 'tech'],
+     'sales_2019': [100, 150, 200],
+     'sales_2020': [110, 160, 210],
+     'sales_2021': [120, 170, 220]
+ })
+
+ # Convert to long format for DiD
+ long_df = wide_to_long(
+     wide_df,
+     value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
+     id_column='firm_id',
+     time_name='year',
+     value_name='sales',
+     time_values=[2019, 2020, 2021]
+ )
+ # Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry
+ ```
+
+ ### Balance Panel Data
+
+ Ensure all units have observations for all time periods:
+
+ ```python
+ from diff_diff import balance_panel
+
+ # Keep only units with complete data (drop incomplete units)
+ balanced = balance_panel(
+     data,
+     unit_column='firm_id',
+     time_column='year',
+     method='inner'
+ )
+
+ # Include all unit-period combinations (creates NaN for missing)
+ balanced = balance_panel(
+     data,
+     unit_column='firm_id',
+     time_column='year',
+     method='outer'
+ )
+
+ # Fill missing values
+ balanced = balance_panel(
+     data,
+     unit_column='firm_id',
+     time_column='year',
+     method='fill',
+     fill_value=0  # Or None for forward/backward fill
+ )
+ ```
+
+ ### Validate Data
+
+ Check that your data meets DiD requirements before fitting:
+
+ ```python
+ from diff_diff import validate_did_data
+
+ # Validate and get informative error messages
+ result = validate_did_data(
+     data,
+     outcome='sales',
+     treatment='treated',
+     time='post',
+     unit='firm_id',       # Optional: for panel-specific validation
+     raise_on_error=False  # Return dict instead of raising
+ )
+
+ if result['valid']:
+     print("Data is ready for DiD analysis!")
+     print(f"Summary: {result['summary']}")
+ else:
+     print("Issues found:")
+     for error in result['errors']:
+         print(f"  - {error}")
+
+ for warning in result['warnings']:
+     print(f"Warning: {warning}")
+ ```
+
+ ### Summarize Data by Groups
+
+ Get summary statistics for each treatment-time cell:
+
+ ```python
+ from diff_diff import summarize_did_data
+
+ summary = summarize_did_data(
+     data,
+     outcome='sales',
+     treatment='treated',
+     time='post'
+ )
+ print(summary)
+ ```
+
+ Output:
+ ```
+                     n      mean      std      min       max
+ Control - Pre     250  100.5000  15.2340  65.0000  145.0000
+ Control - Post    250  105.2000  16.1230  68.0000  152.0000
+ Treated - Pre     250  101.2000  14.8900  67.0000  143.0000
+ Treated - Post    250  115.8000  17.5600  72.0000  165.0000
+ DiD Estimate        -    9.9000        -        -         -
+ ```
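+
+ The DiD estimate in the last row is the difference of the cell-mean differences: (115.8 - 101.2) - (105.2 - 100.5) = 14.6 - 4.7 = 9.9.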
+
+ ### Create Event Time for Staggered Designs
+
+ For designs where treatment occurs at different times:
+
+ ```python
+ from diff_diff import create_event_time
+
+ # Add event-time column relative to treatment timing
+ df = create_event_time(
+     data,
+     time_column='year',
+     treatment_time_column='treatment_year'
+ )
+ # Result: event_time = -2, -1, 0, 1, 2 relative to treatment
+ ```
+
+ ### Aggregate to Cohort Means
+
+ Aggregate unit-level data for visualization:
+
+ ```python
+ from diff_diff import aggregate_to_cohorts
+
+ cohort_data = aggregate_to_cohorts(
+     data,
+     unit_column='firm_id',
+     time_column='year',
+     treatment_column='treated',
+     outcome='sales'
+ )
+ # Result: mean outcome by treatment group and period
+ ```
+
+ ### Rank Control Units
+
+ Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity:
+
+ ```python
+ from diff_diff import rank_control_units, generate_did_data
+
+ # Generate sample data
+ data = generate_did_data(n_units=50, n_periods=6, seed=42)
+
+ # Rank control units by their similarity to treated units
+ ranking = rank_control_units(
+     data,
+     unit_column='unit',
+     time_column='period',
+     outcome_column='outcome',
+     treatment_column='treated',
+     n_top=10  # Return top 10 controls
+ )
+
+ print(ranking[['unit', 'quality_score', 'pre_trend_rmse']])
+ ```
+
+ Output:
+ ```
+    unit  quality_score  pre_trend_rmse
+ 0    35         1.0000          0.4521
+ 1    42         0.9234          0.5123
+ 2    28         0.8876          0.5892
+ ...
+ ```
+
+ With covariates for matching:
+
+ ```python
+ # Add covariate-based matching
+ ranking = rank_control_units(
+     data,
+     unit_column='unit',
+     time_column='period',
+     outcome_column='outcome',
+     treatment_column='treated',
+     covariates=['size', 'age'],  # Match on these too
+     outcome_weight=0.7,          # 70% weight on outcome trends
+     covariate_weight=0.3         # 30% weight on covariate similarity
+ )
+ ```
+
+ Filter data for SyntheticDiD using top controls:
+
+ ```python
+ from diff_diff import SyntheticDiD
+
+ # Get top control units
+ top_controls = ranking['unit'].tolist()
+
+ # Filter data to treated + top controls
+ filtered_data = data[
+     (data['treated'] == 1) | (data['unit'].isin(top_controls))
+ ]
+
+ # Fit SyntheticDiD with selected controls
+ sdid = SyntheticDiD()
+ results = sdid.fit(
+     filtered_data,
+     outcome='outcome',
+     treatment='treated',
+     unit='unit',
+     time='period',
+     post_periods=[3, 4, 5]
+ )
+ ```
+
+ ## Usage
+
+ ### Basic DiD with Column Names
+
+ ```python
+ from diff_diff import DifferenceInDifferences
+
+ did = DifferenceInDifferences(robust=True, alpha=0.05)
+ results = did.fit(
+     data,
+     outcome='sales',
+     treatment='treated',
+     time='post_policy'
+ )
+
+ # Access results
+ print(f"ATT: {results.att:.4f}")
+ print(f"Standard Error: {results.se:.4f}")
+ print(f"P-value: {results.p_value:.4f}")
+ print(f"95% CI: {results.conf_int}")
+ print(f"Significant: {results.is_significant}")
+ ```
+
+ ### Using Formula Interface
+
+ ```python
+ # R-style formula syntax
+ results = did.fit(data, formula='outcome ~ treated * post')
+
+ # Explicit interaction syntax
+ results = did.fit(data, formula='outcome ~ treated + post + treated:post')
+
+ # With covariates
+ results = did.fit(data, formula='outcome ~ treated * post + age + income')
+ ```
+
+ ### Including Covariates
+
+ ```python
+ results = did.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='post',
+     covariates=['age', 'income', 'education']
+ )
+ ```
+
+ ### Fixed Effects
+
+ Use `fixed_effects` for low-dimensional categorical controls (creates dummy variables):
+
+ ```python
+ # State and industry fixed effects
+ results = did.fit(
+     data,
+     outcome='sales',
+     treatment='treated',
+     time='post',
+     fixed_effects=['state', 'industry']
+ )
+
+ # Access fixed effect coefficients
+ state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}
+ ```
+
+ Use `absorb` for high-dimensional fixed effects (more efficient, uses within-transformation):
+
+ ```python
+ # Absorb firm-level fixed effects (efficient for many firms)
+ results = did.fit(
+     data,
+     outcome='sales',
+     treatment='treated',
+     time='post',
+     absorb=['firm_id']
+ )
+ ```
+
+ Combine covariates with fixed effects:
+
+ ```python
+ results = did.fit(
+     data,
+     outcome='sales',
+     treatment='treated',
+     time='post',
+     covariates=['size', 'age'],  # Linear controls
+     fixed_effects=['industry'],  # Low-dimensional FE (dummies)
+     absorb=['firm_id']           # High-dimensional FE (absorbed)
+ )
+ ```
+
+ ### Cluster-Robust Standard Errors
+
+ ```python
+ did = DifferenceInDifferences(cluster='state')
+ results = did.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='post'
+ )
+ ```
+
+ ### Wild Cluster Bootstrap
+
+ When you have few clusters (<50), standard cluster-robust SEs are biased. Wild cluster bootstrap provides valid inference even with 5-10 clusters.
+
+ ```python
+ # Use wild bootstrap for inference
+ did = DifferenceInDifferences(
+     cluster='state',
+     inference='wild_bootstrap',
+     n_bootstrap=999,
+     bootstrap_weights='rademacher',  # or 'webb' for <10 clusters, 'mammen'
+     seed=42
+ )
+ results = did.fit(data, outcome='y', treatment='treated', time='post')
+
+ # Results include bootstrap-based SE and p-value
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
+ print(f"P-value: {results.p_value:.4f}")
+ print(f"95% CI: {results.conf_int}")
+ print(f"Inference method: {results.inference_method}")
+ print(f"Number of clusters: {results.n_clusters}")
+ ```
+
+ **Weight types** (illustrated in the sketch below):
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
+ - `'webb'` - 6-point distribution, recommended for <10 clusters
+ - `'mammen'` - Two-point distribution, alternative to Rademacher
+
+ Works with `DifferenceInDifferences` and `TwoWayFixedEffects` estimators.
+
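+ For intuition, here is a minimal numpy sketch (an illustration, not the library's internal code) of how one bootstrap draw of each weight type could be generated; in the wild cluster bootstrap, every observation in a cluster has its residual multiplied by that cluster's draw:
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(42)
+ n_clusters = 8  # one weight per cluster, not per observation
+
+ # Rademacher: +1 or -1, each with probability 1/2
+ rademacher = rng.choice([-1.0, 1.0], size=n_clusters)
+
+ # Webb: six points, +/-sqrt(1/2), +/-1, +/-sqrt(3/2), each with probability 1/6
+ webb_support = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
+                          np.sqrt(0.5), 1.0, np.sqrt(1.5)])
+ webb = rng.choice(webb_support, size=n_clusters)
+
+ # Mammen: two points chosen so the weights have mean 0 and match the
+ # second and third moments (the values involve the golden ratio)
+ phi = (1 + np.sqrt(5)) / 2
+ mammen = rng.choice([1 - phi, phi], size=n_clusters,
+                     p=[phi / np.sqrt(5), 1 - phi / np.sqrt(5)])
+ ```
+
+ With G clusters, Rademacher weights admit only 2^G distinct sign patterns, which is why the six-point Webb distribution is preferred when G is very small.
+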
+ ### Two-Way Fixed Effects (Panel Data)
+
+ ```python
+ from diff_diff import TwoWayFixedEffects
+
+ twfe = TwoWayFixedEffects()
+ results = twfe.fit(
+     panel_data,
+     outcome='outcome',
+     treatment='treated',
+     time='year',
+     unit='firm_id'
+ )
+ ```
+
+ ### Multi-Period DiD (Event Study)
+
+ For settings with multiple pre- and post-treatment periods:
+
+ ```python
+ from diff_diff import MultiPeriodDiD
+
+ # Fit with multiple time periods
+ did = MultiPeriodDiD()
+ results = did.fit(
+     panel_data,
+     outcome='sales',
+     treatment='treated',
+     time='period',
+     post_periods=[3, 4, 5],  # Periods 3-5 are post-treatment
+     reference_period=0       # Reference period for comparison
+ )
+
+ # View period-specific treatment effects
+ for period, effect in results.period_effects.items():
+     print(f"Period {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")
+
+ # View average treatment effect across post-periods
+ print(f"Average ATT: {results.avg_att:.3f}")
+ print(f"Average SE: {results.avg_se:.3f}")
+
+ # Full summary with all period effects
+ results.print_summary()
+ ```
+
+ Output:
+ ```
+ ================================================================================
+ Multi-Period Difference-in-Differences Estimation Results
+ ================================================================================
+
+ Observations: 600
+ Pre-treatment periods: 3
+ Post-treatment periods: 3
+
+ --------------------------------------------------------------------------------
+ Average Treatment Effect
+ --------------------------------------------------------------------------------
+ Average ATT      5.2000     0.8234      6.315     0.0000
+ --------------------------------------------------------------------------------
+ 95% Confidence Interval: [3.5862, 6.8138]
+
+ Period-Specific Effects:
+ --------------------------------------------------------------------------------
+ Period       Effect    Std. Err.     t-stat      P>|t|
+ --------------------------------------------------------------------------------
+ 3            4.5000       0.9512      4.731     0.0000***
+ 4            5.2000       0.8876      5.858     0.0000***
+ 5            5.9000       0.9123      6.468     0.0000***
+ --------------------------------------------------------------------------------
+
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
+ ================================================================================
+ ```
+
+ ### Staggered Difference-in-Differences (Callaway-Sant'Anna)
+
+ When treatment is adopted at different times by different units, traditional TWFE estimators can be biased. The Callaway-Sant'Anna estimator provides unbiased estimates with staggered adoption.
+
+ ```python
+ from diff_diff import CallawaySantAnna
+
+ # Panel data with staggered treatment
+ # 'first_treat' = period when unit was first treated (0 if never treated)
+ cs = CallawaySantAnna()
+ results = cs.fit(
+     panel_data,
+     outcome='sales',
+     unit='firm_id',
+     time='year',
+     first_treat='first_treat',  # 0 for never-treated, else first treatment year
+     aggregate='event_study'     # Compute event study effects
+ )
+
+ # View results
+ results.print_summary()
+
+ # Access group-time effects ATT(g,t)
+ for (group, time), effect in results.group_time_effects.items():
+     print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")
+
+ # Event study effects (averaged by relative time)
+ for rel_time, effect in results.event_study_effects.items():
+     print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
+
+ # Convert to DataFrame
+ df = results.to_dataframe(level='event_study')
+ ```
+
+ Output:
+ ```
+ =====================================================================================
+ Callaway-Sant'Anna Staggered Difference-in-Differences Results
+ =====================================================================================
+
+ Total observations: 600
+ Treated units: 35
+ Control units: 15
+ Treatment cohorts: 3
+ Time periods: 8
+ Control group: never_treated
+
+ -------------------------------------------------------------------------------------
+ Overall Average Treatment Effect on the Treated
+ -------------------------------------------------------------------------------------
+ Parameter     Estimate    Std. Err.     t-stat      P>|t|     Sig.
+ -------------------------------------------------------------------------------------
+ ATT             2.5000       0.3521      7.101     0.0000      ***
+ -------------------------------------------------------------------------------------
+
+ 95% Confidence Interval: [1.8099, 3.1901]
+
+ -------------------------------------------------------------------------------------
+ Event Study (Dynamic) Effects
+ -------------------------------------------------------------------------------------
+ Rel. Period   Estimate    Std. Err.     t-stat      P>|t|     Sig.
+ -------------------------------------------------------------------------------------
+ 0               2.1000       0.4521      4.645     0.0000      ***
+ 1               2.5000       0.4123      6.064     0.0000      ***
+ 2               2.8000       0.5234      5.349     0.0000      ***
+ -------------------------------------------------------------------------------------
+
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
+ =====================================================================================
+ ```
+
+ **When to use Callaway-Sant'Anna vs TWFE:**
+
+ | Scenario | Use TWFE | Use Callaway-Sant'Anna |
+ |----------|----------|------------------------|
+ | All units treated at same time | ✓ | ✓ |
+ | Staggered adoption, homogeneous effects | ✓ | ✓ |
+ | Staggered adoption, heterogeneous effects | ✗ | ✓ |
+ | Need event study with staggered timing | ✗ | ✓ |
+ | Fewer than ~20 treated units | ✓ | Depends on design |
+
+ **Parameters:**
+
+ ```python
+ CallawaySantAnna(
+     control_group='never_treated',  # or 'not_yet_treated'
+     anticipation=0,                 # Periods before treatment with effects
+     estimation_method='dr',         # 'dr', 'ipw', or 'reg'
+     alpha=0.05,                     # Significance level
+     cluster=None,                   # Column for cluster SEs
+     n_bootstrap=0,                  # Bootstrap iterations (0 = analytical SEs)
+     bootstrap_weight_type='rademacher',  # 'rademacher', 'mammen', or 'webb'
+     seed=None                       # Random seed
+ )
+ ```
+
+ **Multiplier bootstrap for inference:**
+
+ With few clusters or when analytical standard errors may be unreliable, use the multiplier bootstrap for valid inference. This implements the approach from Callaway & Sant'Anna (2021).
+
+ ```python
+ # Bootstrap inference with 999 iterations
+ cs = CallawaySantAnna(
+     n_bootstrap=999,
+     bootstrap_weight_type='rademacher',  # or 'mammen', 'webb'
+     seed=42
+ )
+ results = cs.fit(
+     data,
+     outcome='sales',
+     unit='firm_id',
+     time='year',
+     first_treat='first_treat',
+     aggregate='event_study'
+ )
+
+ # Access bootstrap results
+ print(f"Overall ATT: {results.overall_att:.3f}")
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
+
+ # Event study bootstrap inference
+ for rel_time, se in results.bootstrap_results.event_study_ses.items():
+     ci = results.bootstrap_results.event_study_cis[rel_time]
+     print(f"e={rel_time}: SE={se:.3f}, 95% CI=[{ci[0]:.3f}, {ci[1]:.3f}]")
+ ```
+
+ **Bootstrap weight types:**
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
+ - `'mammen'` - Two-point distribution matching first 3 moments
+ - `'webb'` - Six-point distribution, recommended for very few clusters (<10)
+
+ **Covariate adjustment for conditional parallel trends:**
+
+ When parallel trends only holds conditional on covariates, use the `covariates` parameter:
+
+ ```python
+ # Doubly robust estimation with covariates
+ cs = CallawaySantAnna(estimation_method='dr')  # 'dr', 'ipw', or 'reg'
+ results = cs.fit(
+     data,
+     outcome='sales',
+     unit='firm_id',
+     time='year',
+     first_treat='first_treat',
+     covariates=['size', 'age', 'industry'],  # Covariates for conditional PT
+     aggregate='event_study'
+ )
+ ```
+
+ ### Sun-Abraham Interaction-Weighted Estimator
+
+ The Sun-Abraham (2021) estimator provides an alternative to Callaway-Sant'Anna using an interaction-weighted (IW) regression approach. Running both estimators serves as a useful robustness check—when they agree, results are more credible.
+
+ ```python
+ from diff_diff import SunAbraham
+
+ # Basic usage
+ sa = SunAbraham()
+ results = sa.fit(
+     panel_data,
+     outcome='sales',
+     unit='firm_id',
+     time='year',
+     first_treat='first_treat'  # 0 for never-treated, else first treatment year
+ )
+
+ # View results
+ results.print_summary()
+
+ # Event study effects (by relative time to treatment)
+ for rel_time, effect in results.event_study_effects.items():
+     print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
+
+ # Overall ATT
+ print(f"Overall ATT: {results.overall_att:.3f} (SE: {results.overall_se:.3f})")
+
+ # Cohort weights (how each cohort contributes to each event-time estimate)
+ for rel_time, weights in results.cohort_weights.items():
+     print(f"e={rel_time}: {weights}")
+ ```
+
+ **Parameters:**
+
+ ```python
+ SunAbraham(
+     control_group='never_treated',  # or 'not_yet_treated'
+     anticipation=0,                 # Periods before treatment with effects
+     alpha=0.05,                     # Significance level
+     cluster=None,                   # Column for cluster SEs
+     n_bootstrap=0,                  # Bootstrap iterations (0 = analytical SEs)
+     bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
+     seed=None                       # Random seed
+ )
+ ```
+
+ **Bootstrap inference:**
+
+ ```python
+ # Bootstrap inference with 999 iterations
+ sa = SunAbraham(
+     n_bootstrap=999,
+     bootstrap_weights='rademacher',
+     seed=42
+ )
+ results = sa.fit(
+     data,
+     outcome='sales',
+     unit='firm_id',
+     time='year',
+     first_treat='first_treat'
+ )
+
+ # Access bootstrap results
+ print(f"Overall ATT: {results.overall_att:.3f}")
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
+ ```
+
+ **When to use Sun-Abraham vs Callaway-Sant'Anna:**
+
+ | Aspect | Sun-Abraham | Callaway-Sant'Anna |
+ |--------|-------------|-------------------|
+ | Approach | Interaction-weighted regression | 2x2 DiD aggregation |
+ | Efficiency | More efficient under homogeneous effects | More robust to heterogeneity |
+ | Weighting | Weights by cohort share at each relative time | Weights by sample size |
+ | Use case | Robustness check, regression-based inference | Primary staggered DiD estimator |
+
+ **Both estimators should give similar results when:**
+ - Treatment effects are relatively homogeneous across cohorts
+ - Parallel trends holds
+
+ **Running both as robustness check:**
+
+ ```python
+ from diff_diff import CallawaySantAnna, SunAbraham
+
+ # Callaway-Sant'Anna
+ cs = CallawaySantAnna()
+ cs_results = cs.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
+
+ # Sun-Abraham
+ sa = SunAbraham()
+ sa_results = sa.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
+
+ # Compare
+ print(f"Callaway-Sant'Anna ATT: {cs_results.overall_att:.3f}")
+ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
+
+ # If results differ substantially, investigate heterogeneity
+ ```
+
+ ### Triple Difference (DDD)
+
+ Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
+
+ ```python
+ from diff_diff import TripleDifference, triple_difference
+
+ # Basic usage
+ ddd = TripleDifference(estimation_method='dr')  # doubly robust (recommended)
+ results = ddd.fit(
+     data,
+     outcome='wages',
+     group='policy_state',  # 1=state enacted policy, 0=control state
+     partition='female',    # 1=women (affected by policy), 0=men
+     time='post'            # 1=post-policy, 0=pre-policy
+ )
+
+ # View results
+ results.print_summary()
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
+
+ # With covariates (properly incorporated, unlike naive DDD)
+ results = ddd.fit(
+     data,
+     outcome='wages',
+     group='policy_state',
+     partition='female',
+     time='post',
+     covariates=['age', 'education', 'experience']
+ )
+ ```
+
+ **Estimation methods:**
+
+ | Method | Description | When to use |
+ |--------|-------------|-------------|
+ | `"dr"` | Doubly robust | Recommended. Consistent if either outcome or propensity model is correct |
+ | `"reg"` | Regression adjustment | Simple outcome regression with full interactions |
+ | `"ipw"` | Inverse probability weighting | When propensity score model is well-specified |
+
+ ```python
+ # Compare estimation methods
+ for method in ['reg', 'ipw', 'dr']:
+     est = TripleDifference(estimation_method=method)
+     res = est.fit(data, outcome='y', group='g', partition='p', time='t')
+     print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})")
+ ```
+
+ **Convenience function:**
+
+ ```python
+ # One-liner estimation
+ results = triple_difference(
+     data,
+     outcome='wages',
+     group='policy_state',
+     partition='female',
+     time='post',
+     covariates=['age', 'education'],
+     estimation_method='dr'
+ )
+ ```
+
+ **Why use DDD instead of DiD?**
+
+ DDD allows for violations of parallel trends that are:
+ - Group-specific (e.g., economic shocks in treatment states)
+ - Partition-specific (e.g., trends affecting women everywhere)
+
+ As long as these biases are additive, DDD differences them out. The key assumption is that the *differential* trend between eligible and ineligible units would be the same across groups.
+
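+ To make the differencing concrete, here is a hedged sketch of the unadjusted (no-covariate) triple difference computed directly from cell means, reusing the column names and the `data` DataFrame from the example above:
+
+ ```python
+ # Mean outcome in each of the 8 group x partition x time cells
+ means = data.groupby(['policy_state', 'female', 'post'])['wages'].mean()
+
+ def did(g):
+     """2x2 DiD within one group: change for the eligible partition
+     minus change for the ineligible partition."""
+     return ((means.loc[(g, 1, 1)] - means.loc[(g, 1, 0)])
+             - (means.loc[(g, 0, 1)] - means.loc[(g, 0, 0)]))
+
+ # DDD: the eligible-vs-ineligible differential change in policy states,
+ # net of the same differential change in control states
+ ddd = did(1) - did(0)
+ ```
+
+ The estimators above add inference and proper covariate handling on top of this arithmetic.
+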
+ ### Event Study Visualization
+
+ Create publication-ready event study plots:
+
+ ```python
+ import pandas as pd
+ from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna, SunAbraham
+
+ # From MultiPeriodDiD
+ did = MultiPeriodDiD()
+ results = did.fit(data, outcome='y', treatment='treated',
+                   time='period', post_periods=[3, 4, 5])
+ plot_event_study(results, title="Treatment Effects Over Time")
+
+ # From CallawaySantAnna (with event study aggregation)
+ cs = CallawaySantAnna()
+ results = cs.fit(data, outcome='y', unit='unit', time='period',
+                  first_treat='first_treat', aggregate='event_study')
+ plot_event_study(results, title="Staggered DiD Event Study (CS)")
+
+ # From SunAbraham
+ sa = SunAbraham()
+ results = sa.fit(data, outcome='y', unit='unit', time='period',
+                  first_treat='first_treat')
+ plot_event_study(results, title="Staggered DiD Event Study (SA)")
+
+ # From a DataFrame
+ df = pd.DataFrame({
+     'period': [-2, -1, 0, 1, 2],
+     'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
+     'se': [0.3, 0.25, 0.0, 0.4, 0.45]
+ })
+ plot_event_study(df, reference_period=0)
+
+ # With customization
+ ax = plot_event_study(
+     results,
+     title="Dynamic Treatment Effects",
+     xlabel="Years Relative to Treatment",
+     ylabel="Effect on Sales ($1000s)",
+     color="#2563eb",
+     marker="o",
+     shade_pre=True,            # Shade pre-treatment region
+     show_zero_line=True,       # Horizontal line at y=0
+     show_reference_line=True,  # Vertical line at reference period
+     figsize=(10, 6),
+     show=False                 # Don't call plt.show(), return axes
+ )
+ ```
+
+ ### Synthetic Difference-in-Differences
+
+ Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.
+
+ ```python
+ from diff_diff import SyntheticDiD
+
+ # Fit Synthetic DiD model
+ sdid = SyntheticDiD()
+ results = sdid.fit(
+     panel_data,
+     outcome='gdp_growth',
+     treatment='treated',
+     unit='state',
+     time='year',
+     post_periods=[2015, 2016, 2017, 2018]
+ )
+
+ # View results
+ results.print_summary()
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
+
+ # Examine unit weights (which control units matter most)
+ weights_df = results.get_unit_weights_df()
+ print(weights_df.head(10))
+
+ # Examine time weights
+ time_weights_df = results.get_time_weights_df()
+ print(time_weights_df)
+ ```
+
+ Output:
+ ```
+ ===========================================================================
+ Synthetic Difference-in-Differences Estimation Results
+ ===========================================================================
+
+ Observations: 500
+ Treated units: 1
+ Control units: 49
+ Pre-treatment periods: 6
+ Post-treatment periods: 4
+ Regularization (lambda): 0.0000
+ Pre-treatment fit (RMSE): 0.1234
+
+ ---------------------------------------------------------------------------
+ Parameter     Estimate    Std. Err.     t-stat      P>|t|
+ ---------------------------------------------------------------------------
+ ATT             2.5000       0.4521      5.530     0.0000
+ ---------------------------------------------------------------------------
+
+ 95% Confidence Interval: [1.6139, 3.3861]
+
+ ---------------------------------------------------------------------------
+ Top Unit Weights (Synthetic Control)
+ ---------------------------------------------------------------------------
+ Unit state_12: 0.3521
+ Unit state_5: 0.2156
+ Unit state_23: 0.1834
+ Unit state_8: 0.1245
+ Unit state_31: 0.0892
+ (8 units with weight > 0.001)
+
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
+ ===========================================================================
+ ```
+
+ #### When to Use Synthetic DiD Over Vanilla DiD
+
+ Use Synthetic DiD instead of standard DiD when:
+
+ 1. **Few treated units**: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls.
+
+ ```python
+ # Example: California passed a policy, want to estimate its effect
+ # Standard DiD would compare CA to the average of all other states
+ # Synthetic DiD finds states that together best match CA's pre-treatment trend
+ ```
+
+ 2. **Parallel trends is questionable**: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory.
+
+ ```python
+ # Example: A tech hub city vs rural areas
+ # Rural areas may not be a good comparison on average
+ # Synthetic DiD can weight urban/suburban controls more heavily
+ ```
+
+ 3. **Heterogeneous control units**: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal.
+
+ ```python
+ # Example: Comparing a treated developing country to other countries
+ # Some control countries may be much more similar economically
+ # Synthetic DiD upweights the most comparable controls
+ ```
+
+ 4. **You want transparency**: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison.
+
+ ```python
+ # See exactly which units are driving the counterfactual
+ print(results.get_unit_weights_df())
+ ```
+
+ **Key differences from standard DiD:**
+
+ | Aspect | Standard DiD | Synthetic DiD |
+ |--------|--------------|---------------|
+ | Control weighting | Equal (1/N) | Optimized to match pre-treatment |
+ | Time weighting | Equal across periods | Can emphasize informative periods |
+ | Treated units required | Can be many | Works with 1 treated unit |
+ | Parallel trends | Assumed | Partially relaxed via matching |
+ | Interpretability | Simple average | Explicit weights |
+
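+ As the table suggests, the estimator is still a difference-in-differences, just taken with the fitted weights. A minimal numpy sketch of how unit weights (omega) and time weights (lambda) enter the point estimate, using uniform stand-in weights (the real estimator optimizes them to match the treated unit's pre-treatment path):
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(0)
+ n_control, n_pre, n_post = 20, 6, 4
+ Y_control = rng.normal(size=(n_control, n_pre + n_post))  # control outcomes
+ y_treated = rng.normal(size=n_pre + n_post)
+ y_treated[n_pre:] += 2.5                                  # true post-treatment effect
+
+ omega = np.full(n_control, 1 / n_control)  # unit weights over controls, sum to 1
+ lam = np.full(n_pre, 1 / n_pre)            # time weights over pre periods, sum to 1
+
+ pre, post = np.arange(n_pre), np.arange(n_pre, n_pre + n_post)
+ treated_diff = y_treated[post].mean() - lam @ y_treated[pre]
+ control_diff = omega @ Y_control[:, post].mean(axis=1) - omega @ (Y_control[:, pre] @ lam)
+ att = treated_diff - control_diff  # with uniform weights this reduces to ordinary DiD
+ ```
+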
+ **Parameters:**
+
+ ```python
+ SyntheticDiD(
+     lambda_reg=0.0,   # Regularization toward uniform weights (0 = no reg)
+     zeta=1.0,         # Time weight regularization (higher = more uniform)
+     alpha=0.05,       # Significance level
+     n_bootstrap=200,  # Bootstrap iterations for SE (0 = placebo-based)
+     seed=None         # Random seed for reproducibility
+ )
+ ```
+
+ ## Working with Results
+
+ ### Export Results
+
+ ```python
+ # As dictionary
+ results.to_dict()
+ # {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}
+
+ # As DataFrame
+ df = results.to_dataframe()
+ ```
+
+ ### Check Significance
+
+ ```python
+ if results.is_significant:
+     print(f"Effect is significant at {did.alpha} level")
+
+ # Get significance stars
+ print(f"ATT: {results.att}{results.significance_stars}")
+ # ATT: 3.5000*
+ ```
+
+ ### Access Full Regression Output
+
+ ```python
+ # All coefficients
+ results.coefficients
+ # {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}
+
+ # Variance-covariance matrix
+ results.vcov
+
+ # Residuals and fitted values
+ results.residuals
+ results.fitted_values
+
+ # R-squared
+ results.r_squared
+ ```
+
+ ## Checking Assumptions
+
+ ### Parallel Trends
+
+ **Simple slope-based test:**
+
+ ```python
+ from diff_diff.utils import check_parallel_trends
+
+ trends = check_parallel_trends(
+     data,
+     outcome='outcome',
+     time='period',
+     treatment_group='treated'
+ )
+
+ print(f"Treated trend: {trends['treated_trend']:.4f}")
+ print(f"Control trend: {trends['control_trend']:.4f}")
+ print(f"Difference p-value: {trends['p_value']:.4f}")
+ ```
+
+ **Robust distributional test (Wasserstein distance):**
+
+ ```python
+ from diff_diff.utils import check_parallel_trends_robust
+
+ results = check_parallel_trends_robust(
+     data,
+     outcome='outcome',
+     time='period',
+     treatment_group='treated',
+     unit='firm_id',            # Unit identifier for panel data
+     pre_periods=[2018, 2019],  # Pre-treatment periods
+     n_permutations=1000        # Permutations for p-value
+ )
+
+ print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
+ print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
+ print(f"KS test p-value: {results['ks_p_value']:.4f}")
+ print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")
+ ```
+
+ The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes, not just means (see the sketch after this list). This is more robust to:
+ - Non-normal distributions
+ - Heterogeneous effects across units
+ - Outliers
+
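+ For intuition about what the distance measures, it can be computed directly with scipy; this sketch illustrates the concept, not the internals of `check_parallel_trends_robust`:
+
+ ```python
+ import numpy as np
+ from scipy.stats import wasserstein_distance
+
+ rng = np.random.default_rng(0)
+ # Per-unit pre-treatment outcome changes for each group
+ changes_treated = rng.normal(0.0, 1.0, size=200)
+ changes_control = rng.normal(0.0, 1.8, size=200)  # same mean, wider spread
+
+ # A comparison of means would find nothing here, but the Wasserstein
+ # distance registers the difference in spread and shape
+ print(wasserstein_distance(changes_treated, changes_control))
+ ```
+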
+ **Equivalence testing (TOST):**
+
+ ```python
+ from diff_diff.utils import equivalence_test_trends
+
+ results = equivalence_test_trends(
+     data,
+     outcome='outcome',
+     time='period',
+     treatment_group='treated',
+     unit='firm_id',
+     equivalence_margin=0.5  # Define "practically equivalent"
+ )
+
+ print(f"Mean difference: {results['mean_difference']:.4f}")
+ print(f"TOST p-value: {results['tost_p_value']:.4f}")
+ print(f"Trends equivalent: {results['equivalent']}")
+ ```
+
+ ### Honest DiD Sensitivity Analysis (Rambachan-Roth)
+
+ Pre-trends tests have low power and can exacerbate bias. **Honest DiD** (Rambachan & Roth 2023) provides sensitivity analysis showing how robust your results are to violations of parallel trends.
+
+ ```python
+ from diff_diff import HonestDiD, MultiPeriodDiD
+
+ # First, fit a standard event study
+ did = MultiPeriodDiD()
+ event_results = did.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',
+     post_periods=[5, 6, 7, 8, 9]
+ )
+
+ # Compute honest bounds with relative magnitudes restriction
+ # M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
+ honest_results = honest.fit(event_results)
+
+ print(honest_results.summary())
+ print(f"Original estimate: {honest_results.original_estimate:.4f}")
+ print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
+ print(f"Effect robust to violations: {honest_results.is_significant}")
+ ```
+
+ **Sensitivity analysis over M values:**
+
+ ```python
+ # How do results change as we allow larger violations?
+ sensitivity = honest.sensitivity_analysis(
+     event_results,
+     M_grid=[0, 0.5, 1.0, 1.5, 2.0]
+ )
+
+ print(sensitivity.summary())
+ print(f"Breakdown value: M = {sensitivity.breakdown_M}")
+ # Breakdown = smallest M where the robust CI includes zero
+ ```
+
+ **Breakdown value:**
+
+ The breakdown value tells you how robust your conclusion is:
+
+ ```python
+ breakdown = honest.breakdown_value(event_results)
+ if breakdown >= 1.0:
+     print("Result holds even if post-treatment violations are as bad as pre-treatment")
+ else:
+     print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment")
+ ```
+
+ **Smoothness restriction (alternative approach):**
+
+ ```python
+ # Bounds second differences of trend violations
+ # M=0 means linear extrapolation of pre-trends
+ honest_smooth = HonestDiD(method='smoothness', M=0.5)
+ smooth_results = honest_smooth.fit(event_results)
+ ```
+
+ **Visualization:**
+
+ ```python
+ from diff_diff import plot_sensitivity, plot_honest_event_study
+
+ # Plot sensitivity analysis
+ plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")
+
+ # Event study with honest confidence intervals
+ plot_honest_event_study(event_results, honest_results)
+ ```
+
+ ### Pre-Trends Power Analysis (Roth 2022)
+
+ A passing pre-trends test doesn't mean parallel trends holds—it may just mean the test has low power. **Pre-Trends Power Analysis** (Roth 2022) answers: "What violations could my pre-trends test have detected?"
+
+ ```python
+ from diff_diff import PreTrendsPower, MultiPeriodDiD
+
+ # First, fit an event study
+ did = MultiPeriodDiD()
+ event_results = did.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',
+     post_periods=[5, 6, 7, 8, 9]
+ )
+
+ # Analyze pre-trends test power
+ pt = PreTrendsPower(alpha=0.05, power=0.80)
+ power_results = pt.fit(event_results)
+
+ print(power_results.summary())
+ print(f"Minimum Detectable Violation (MDV): {power_results.mdv:.4f}")
+ print(f"Power to detect violations of size MDV: {power_results.power:.1%}")
+ ```
+
+ **Key concepts:**
+
+ - **Minimum Detectable Violation (MDV)**: Smallest violation magnitude that would be detected with your target power (e.g., 80%). Passing the pre-trends test does NOT rule out violations up to this size (see the sketch after this list).
+ - **Power**: Probability of detecting a violation of given size if it exists.
+ - **Violation types**: Linear trend, constant violation, last-period only, or custom patterns.
+
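+ To see where such numbers come from, here is a hedged sketch of the textbook power calculation for a single pre-period coefficient (the library handles the general multi-coefficient case): under a violation of size delta, the t-statistic is approximately normal with mean delta/se.
+
+ ```python
+ from scipy.stats import norm
+
+ alpha, target_power = 0.05, 0.80
+ se = 0.25                    # hypothetical SE of a pre-period coefficient
+ z = norm.ppf(1 - alpha / 2)  # two-sided critical value, about 1.96
+
+ def power(delta):
+     """P(|beta_hat| / se > z) when beta_hat ~ N(delta, se**2)."""
+     return norm.cdf(-z + delta / se) + norm.cdf(-z - delta / se)
+
+ # MDV: smallest violation detected with the target power,
+ # se * (z_{1-alpha/2} + z_{power}), about 2.80 * se for 80% power
+ mdv = se * (z + norm.ppf(target_power))
+ print(f"MDV = {mdv:.3f}, power at MDV = {power(mdv):.1%}")
+ ```
+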
+ **Power curve visualization:**
+
+ ```python
+ from diff_diff import plot_pretrends_power
+
+ # Generate power curve across violation magnitudes
+ curve = pt.power_curve(event_results)
+
+ # Plot the power curve
+ plot_pretrends_power(curve, title="Pre-Trends Test Power Curve")
+
+ # Or from the curve object directly
+ curve.plot()
+ ```
+
+ **Different violation patterns:**
+
+ ```python
+ import numpy as np  # needed for the custom weights below
+
+ # Linear trend violations (default) - most common assumption
+ pt_linear = PreTrendsPower(violation_type='linear')
+
+ # Constant violation in all pre-periods
+ pt_constant = PreTrendsPower(violation_type='constant')
+
+ # Violation only in the last pre-period (sharp break)
+ pt_last = PreTrendsPower(violation_type='last_period')
+
+ # Custom violation pattern
+ custom_weights = np.array([0.1, 0.3, 0.6])  # Increasing violations
+ pt_custom = PreTrendsPower(violation_type='custom', violation_weights=custom_weights)
+ ```
+
+ **Combining with HonestDiD:**
+
+ Pre-trends power analysis and HonestDiD are complementary:
+ 1. **Pre-trends power** tells you what the test could have detected
+ 2. **HonestDiD** tells you how robust your results are to violations
+
+ ```python
+ from diff_diff import HonestDiD, PreTrendsPower
+
+ # If MDV is large relative to your estimated effect, be cautious
+ pt = PreTrendsPower()
+ power_results = pt.fit(event_results)
+ sensitivity = pt.sensitivity_to_honest_did(event_results)
+ print(sensitivity['interpretation'])
+
+ # Use HonestDiD for robust inference
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
+ honest_results = honest.fit(event_results)
+ ```
+
+ ### Placebo Tests
+
+ Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
+
+ **Fake timing test:**
+
+ ```python
+ from diff_diff import run_placebo_test
+
+ # Test: Is there an effect before treatment actually occurred?
+ # Actual treatment is at period 3 (post_periods=[3, 4, 5])
+ # We test if a "fake" treatment at period 1 shows an effect
+ results = run_placebo_test(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',
+     test_type='fake_timing',
+     fake_treatment_period=1,  # Pretend treatment was in period 1
+     post_periods=[3, 4, 5]    # Actual post-treatment periods
+ )
+
+ print(results.summary())
+ # If parallel trends hold, placebo_effect should be ~0 and not significant
+ print(f"Placebo effect: {results.placebo_effect:.3f} (p={results.p_value:.3f})")
+ print(f"Is significant (bad): {results.is_significant}")
+ ```
+
+ **Fake group test:**
+
+ ```python
+ # Test: Is there an effect among never-treated units?
+ # Get some control unit IDs to use as "fake treated"
+ control_units = data[data['treated'] == 0]['firm_id'].unique()[:5]
+
+ results = run_placebo_test(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',
+     unit='firm_id',
+     test_type='fake_group',
+     fake_treatment_group=list(control_units),  # List of control unit IDs
+     post_periods=[3, 4, 5]
+ )
+ ```
+
+ **Permutation test:**
+
+ ```python
+ # Randomly reassign treatment and compute distribution of effects
+ # Note: requires binary post indicator (use 'post' column, not 'period')
+ results = run_placebo_test(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='post',  # Binary post-treatment indicator
+     unit='firm_id',
+     test_type='permutation',
+     n_permutations=1000,
+     seed=42
+ )
+
+ print(f"Original effect: {results.original_effect:.3f}")
+ print(f"Permutation p-value: {results.p_value:.4f}")
+ # Low p-value indicates the effect is unlikely to be due to chance
+ ```
+
+ **Leave-one-out sensitivity:**
+
+ ```python
+ # Test sensitivity to individual treated units
+ # Note: requires binary post indicator (use 'post' column, not 'period')
+ results = run_placebo_test(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='post',  # Binary post-treatment indicator
+     unit='firm_id',
+     test_type='leave_one_out'
+ )
+
+ # Check if any single unit drives the result
+ print(results.leave_one_out_effects)  # Effect when each unit is dropped
+ ```
+
+ **Run all placebo tests:**
+
+ ```python
+ from diff_diff import run_all_placebo_tests
+
+ # Comprehensive diagnostic suite
+ # Note: This function runs fake_timing tests on pre-treatment periods.
+ # The permutation and leave_one_out tests require a binary post indicator,
+ # so they may return errors if the data uses a multi-period time column.
+ all_results = run_all_placebo_tests(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',
+     unit='firm_id',
+     pre_periods=[0, 1, 2],
+     post_periods=[3, 4, 5],
+     n_permutations=500,
+     seed=42
+ )
+
+ for test_name, result in all_results.items():
+     if hasattr(result, 'p_value'):
+         print(f"{test_name}: p={result.p_value:.3f}, significant={result.is_significant}")
+     elif isinstance(result, dict) and 'error' in result:
+         print(f"{test_name}: Error - {result['error']}")
+ ```
+
+ ## API Reference
+
+ ### DifferenceInDifferences
+
+ ```python
+ DifferenceInDifferences(
+     robust=True,   # Use HC1 robust standard errors
+     cluster=None,  # Column for cluster-robust SEs
+     alpha=0.05     # Significance level for CIs
+ )
+ ```
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `fit(data, outcome, treatment, time, ...)` | Fit the DiD model |
+ | `summary()` | Get formatted summary string |
+ | `print_summary()` | Print summary to stdout |
+ | `get_params()` | Get estimator parameters (sklearn-compatible) |
+ | `set_params(**params)` | Set estimator parameters (sklearn-compatible) |
+
1555
+ **fit() Parameters:**
1556
+
1557
+ | Parameter | Type | Description |
1558
+ |-----------|------|-------------|
1559
+ | `data` | DataFrame | Input data |
1560
+ | `outcome` | str | Outcome variable column name |
1561
+ | `treatment` | str | Treatment indicator column (0/1) |
1562
+ | `time` | str | Post-treatment indicator column (0/1) |
1563
+ | `formula` | str | R-style formula (alternative to column names) |
1564
+ | `covariates` | list | Linear control variables |
1565
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
1566
+ | `absorb` | list | High-dimensional FE (within-transformation) |
1567
+
1568
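+ A minimal sketch tying the constructor and `fit()` parameters above together. It assumes the top-level imports used in the earlier examples; the column names come from `generate_did_data` (documented under Data Preparation Functions below).
+
+ ```python
+ from diff_diff import DifferenceInDifferences, generate_did_data
+
+ # Simulated panel with a known ATT of 5.0
+ # (columns: unit, period, treated, post, outcome, true_effect)
+ data = generate_did_data(n_units=200, treatment_effect=5.0, seed=42)
+
+ did = DifferenceInDifferences(robust=True, cluster='unit')
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
+ results.print_summary()
+ ```
+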
+ ### DiDResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `att` | Average Treatment effect on the Treated |
+ | `se` | Standard error of ATT |
+ | `t_stat` | T-statistic |
+ | `p_value` | P-value for H0: ATT = 0 |
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
+ | `n_obs` | Number of observations |
+ | `n_treated` | Number of treated units |
+ | `n_control` | Number of control units |
+ | `r_squared` | R-squared of regression |
+ | `coefficients` | Dictionary of all coefficients |
+ | `is_significant` | Boolean for significance at alpha |
+ | `significance_stars` | String of significance stars |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary(alpha)` | Get formatted summary string |
+ | `print_summary(alpha)` | Print summary to stdout |
+ | `to_dict()` | Convert to dictionary |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+
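+ Continuing the sketch above, the fitted `results` object exposes these attributes directly:
+
+ ```python
+ # Point estimate and inference (attribute names per the table above)
+ print(f"ATT = {results.att:.3f} (SE = {results.se:.3f}){results.significance_stars}")
+ low, high = results.conf_int
+ print(f"CI: [{low:.3f}, {high:.3f}], p = {results.p_value:.4f}")
+
+ # Tabular export for reporting
+ summary_df = results.to_dataframe()
+ ```
+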
+ ### MultiPeriodDiD
+
+ ```python
+ MultiPeriodDiD(
+     robust=True,   # Use HC1 robust standard errors
+     cluster=None,  # Column for cluster-robust SEs
+     alpha=0.05     # Significance level for CIs
+ )
+ ```
+
+ **fit() Parameters:**
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `data` | DataFrame | Input data |
+ | `outcome` | str | Outcome variable column name |
+ | `treatment` | str | Treatment indicator column (0/1) |
+ | `time` | str | Time period column (multiple values) |
+ | `post_periods` | list | List of post-treatment period values |
+ | `covariates` | list | Linear control variables |
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
+ | `absorb` | list | High-dimensional FE (within-transformation) |
+ | `reference_period` | any | Omitted period for time dummies |
+
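+ A minimal sketch reusing the simulated `data` from the `DifferenceInDifferences` example above (with the default `treatment_period=2` and four periods, periods 2 and 3 are post-treatment):
+
+ ```python
+ from diff_diff import MultiPeriodDiD
+
+ mp = MultiPeriodDiD(cluster='unit')
+ mp_results = mp.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',        # multi-valued period column
+     post_periods=[2, 3],
+ )
+ print(mp_results.avg_att)               # average ATT across post periods
+ print(mp_results.get_effect(3).effect)  # period-specific effect
+ ```
+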
+ ### MultiPeriodDiDResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `period_effects` | Dict mapping periods to PeriodEffect objects |
+ | `avg_att` | Average ATT across post-treatment periods |
+ | `avg_se` | Standard error of average ATT |
+ | `avg_t_stat` | T-statistic for average ATT |
+ | `avg_p_value` | P-value for average ATT |
+ | `avg_conf_int` | Confidence interval for average ATT |
+ | `n_obs` | Number of observations |
+ | `pre_periods` | List of pre-treatment periods |
+ | `post_periods` | List of post-treatment periods |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `get_effect(period)` | Get PeriodEffect for specific period |
+ | `summary(alpha)` | Get formatted summary string |
+ | `print_summary(alpha)` | Print summary to stdout |
+ | `to_dict()` | Convert to dictionary |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+
+ ### PeriodEffect
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `period` | Time period identifier |
+ | `effect` | Treatment effect estimate |
+ | `se` | Standard error |
+ | `t_stat` | T-statistic |
+ | `p_value` | P-value |
+ | `conf_int` | Confidence interval |
+ | `is_significant` | Boolean for significance at 0.05 |
+ | `significance_stars` | String of significance stars |
+
+ ### SyntheticDiD
+
+ ```python
+ SyntheticDiD(
+     lambda_reg=0.0,   # L2 regularization for unit weights
+     zeta=1.0,         # Regularization for time weights
+     alpha=0.05,       # Significance level for CIs
+     n_bootstrap=200,  # Bootstrap iterations for SE
+     seed=None         # Random seed for reproducibility
+ )
+ ```
+
+ **fit() Parameters:**
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `data` | DataFrame | Panel data |
+ | `outcome` | str | Outcome variable column name |
+ | `treatment` | str | Treatment indicator column (0/1) |
+ | `unit` | str | Unit identifier column |
+ | `time` | str | Time period column |
+ | `post_periods` | list | List of post-treatment period values |
+ | `covariates` | list | Covariates to residualize out |
+
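+ A minimal sketch on the same simulated panel; SDiD additionally needs the `unit` identifier, and `seed` makes the bootstrap standard error reproducible:
+
+ ```python
+ from diff_diff import SyntheticDiD
+
+ sdid = SyntheticDiD(n_bootstrap=200, seed=42)
+ sdid_results = sdid.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     unit='unit',
+     time='period',
+     post_periods=[2, 3],
+ )
+ print(sdid_results.att, sdid_results.pre_treatment_fit)
+ print(sdid_results.get_unit_weights_df().head())  # donor weights
+ ```
+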
+ ### SyntheticDiDResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `att` | Average Treatment effect on the Treated |
+ | `se` | Standard error (bootstrap or placebo-based) |
+ | `t_stat` | T-statistic |
+ | `p_value` | P-value |
+ | `conf_int` | Confidence interval |
+ | `n_obs` | Number of observations |
+ | `n_treated` | Number of treated units |
+ | `n_control` | Number of control units |
+ | `unit_weights` | Dict mapping control unit IDs to weights |
+ | `time_weights` | Dict mapping pre-treatment periods to weights |
+ | `pre_periods` | List of pre-treatment periods |
+ | `post_periods` | List of post-treatment periods |
+ | `pre_treatment_fit` | RMSE of synthetic vs treated in pre-period |
+ | `placebo_effects` | Array of placebo effect estimates |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary(alpha)` | Get formatted summary string |
+ | `print_summary(alpha)` | Print summary to stdout |
+ | `to_dict()` | Convert to dictionary |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+ | `get_unit_weights_df()` | Get unit weights as DataFrame |
+ | `get_time_weights_df()` | Get time weights as DataFrame |
+
+ ### SunAbraham
+
+ ```python
+ SunAbraham(
+     control_group='never_treated',   # or 'not_yet_treated'
+     anticipation=0,                  # Periods of anticipation effects
+     alpha=0.05,                      # Significance level for CIs
+     cluster=None,                    # Column for cluster-robust SEs
+     n_bootstrap=0,                   # Bootstrap iterations (0 = analytical SEs)
+     bootstrap_weights='rademacher',  # 'rademacher', 'mammen', or 'webb'
+     seed=None                        # Random seed
+ )
+ ```
+
+ **fit() Parameters:**
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `data` | DataFrame | Panel data |
+ | `outcome` | str | Outcome variable column name |
+ | `unit` | str | Unit identifier column |
+ | `time` | str | Time period column |
+ | `first_treat` | str | Column with first treatment period (0 for never-treated) |
+ | `covariates` | list | Covariate column names |
+ | `min_pre_periods` | int | Minimum pre-treatment periods to include |
+ | `min_post_periods` | int | Minimum post-treatment periods to include |
+
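+ A minimal sketch of the interface. Note the `first_treat` convention (0 for never-treated). The simulated panel has a single adoption cohort, which makes this a trivial illustration; the estimator is built for staggered timing across multiple cohorts:
+
+ ```python
+ import numpy as np
+ from diff_diff import SunAbraham
+
+ # First treatment period: 2 for treated units, 0 for never-treated
+ data['first_treat'] = np.where(data['treated'] == 1, 2, 0)
+
+ sa = SunAbraham(control_group='never_treated', cluster='unit')
+ sa_results = sa.fit(
+     data,
+     outcome='outcome',
+     unit='unit',
+     time='period',
+     first_treat='first_treat',
+ )
+ sa_results.print_summary()
+ ```
+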
+ ### SunAbrahamResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `event_study_effects` | Dict mapping relative time to effect info |
+ | `overall_att` | Overall average treatment effect |
+ | `overall_se` | Standard error of overall ATT |
+ | `overall_t_stat` | T-statistic for overall ATT |
+ | `overall_p_value` | P-value for overall ATT |
+ | `overall_conf_int` | Confidence interval for overall ATT |
+ | `cohort_weights` | Dict mapping relative time to cohort weights |
+ | `groups` | List of treatment cohorts |
+ | `time_periods` | List of all time periods |
+ | `n_obs` | Total number of observations |
+ | `n_treated_units` | Number of ever-treated units |
+ | `n_control_units` | Number of never-treated units |
+ | `is_significant` | Boolean for significance at alpha |
+ | `significance_stars` | String of significance stars |
+ | `bootstrap_results` | SABootstrapResults (if bootstrap enabled) |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary(alpha)` | Get formatted summary string |
+ | `print_summary(alpha)` | Print summary to stdout |
+ | `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |
+
+ ### TripleDifference
+
+ ```python
+ TripleDifference(
+     estimation_method='dr',  # 'dr' (doubly robust), 'reg', or 'ipw'
+     robust=True,             # Use HC1 robust standard errors
+     cluster=None,            # Column for cluster-robust SEs
+     alpha=0.05,              # Significance level for CIs
+     pscore_trim=0.01         # Propensity score trimming threshold
+ )
+ ```
+
+ **fit() Parameters:**
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `data` | DataFrame | Input data |
+ | `outcome` | str | Outcome variable column name |
+ | `group` | str | Group indicator column (0/1): 1=treated group |
+ | `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible |
+ | `time` | str | Time indicator column (0/1): 1=post-treatment |
+ | `covariates` | list | Covariate column names for adjustment |
+
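+ A sketch of the DDD interface. To stay self-contained it fabricates an `eligible` partition on the simulated panel (tagging half the units eligible); in a real design this would be a substantive subgroup, such as policy eligibility:
+
+ ```python
+ from diff_diff import TripleDifference
+
+ # Illustrative partition: first half of the units count as eligible
+ units = sorted(data['unit'].unique())
+ eligible_units = set(units[: len(units) // 2])
+ data['eligible'] = data['unit'].isin(eligible_units).astype(int)
+
+ ddd = TripleDifference(estimation_method='dr')  # doubly robust (default)
+ ddd_results = ddd.fit(
+     data,
+     outcome='outcome',
+     group='treated',       # 1 = treated group
+     partition='eligible',  # 1 = eligible subgroup within each group
+     time='post',           # 1 = post-treatment
+ )
+ ddd_results.print_summary()
+ ```
+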
+ ### TripleDifferenceResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `att` | Average Treatment effect on the Treated |
+ | `se` | Standard error of ATT |
+ | `t_stat` | T-statistic |
+ | `p_value` | P-value for H0: ATT = 0 |
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
+ | `n_obs` | Total number of observations |
+ | `n_treated_eligible` | Obs in treated group & eligible partition |
+ | `n_treated_ineligible` | Obs in treated group & ineligible partition |
+ | `n_control_eligible` | Obs in control group & eligible partition |
+ | `n_control_ineligible` | Obs in control group & ineligible partition |
+ | `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
+ | `group_means` | Dict of cell means for diagnostics |
+ | `pscore_stats` | Propensity score statistics (IPW/DR only) |
+ | `is_significant` | Boolean for significance at alpha |
+ | `significance_stars` | String of significance stars |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary(alpha)` | Get formatted summary string |
+ | `print_summary(alpha)` | Print summary to stdout |
+ | `to_dict()` | Convert to dictionary |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+
+ ### HonestDiD
+
+ ```python
+ HonestDiD(
+     method='relative_magnitude',  # 'relative_magnitude' or 'smoothness'
+     M=None,      # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
+     alpha=0.05,  # Significance level for CIs
+     l_vec=None   # Linear combination vector for target parameter
+ )
+ ```
+
+ **fit() Parameters:**
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
+ | `M` | float | Restriction parameter (overrides constructor value) |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `fit(results, M)` | Compute bounds for given event study results |
+ | `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
+ | `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
+
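+ A minimal sketch that feeds in the `mp_results` event study fitted earlier (the method calibrates violation bounds from the pre-treatment estimates):
+
+ ```python
+ from diff_diff import HonestDiD
+
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
+ hd = honest.fit(mp_results)
+ print(hd.summary())
+ print(hd.ci_lb, hd.ci_ub)  # robust CI under the M=1 restriction
+
+ # Smallest M at which the robust CI includes zero
+ sens = honest.sensitivity_analysis(mp_results, M_grid=[0.5, 1.0, 1.5, 2.0])
+ print(sens.breakdown_M)
+ ```
+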
+ ### HonestDiDResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `original_estimate` | Point estimate under parallel trends |
+ | `lb` | Lower bound of identified set |
+ | `ub` | Upper bound of identified set |
+ | `ci_lb` | Lower bound of robust confidence interval |
+ | `ci_ub` | Upper bound of robust confidence interval |
+ | `ci_width` | Width of robust CI |
+ | `M` | Restriction parameter used |
+ | `method` | Restriction method ('relative_magnitude' or 'smoothness') |
+ | `alpha` | Significance level |
+ | `is_significant` | True if robust CI excludes zero |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary()` | Get formatted summary string |
+ | `to_dict()` | Convert to dictionary |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+
+ ### SensitivityResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `M_grid` | Array of M values analyzed |
+ | `results` | List of HonestDiDResults for each M |
+ | `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary()` | Get formatted summary string |
+ | `plot(ax)` | Plot sensitivity analysis |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+
+ ### PreTrendsPower
+
+ ```python
+ PreTrendsPower(
+     alpha=0.05,               # Significance level for pre-trends test
+     power=0.80,               # Target power for MDV calculation
+     violation_type='linear',  # 'linear', 'constant', 'last_period', 'custom'
+     violation_weights=None    # Custom weights (required if violation_type='custom')
+ )
+ ```
+
+ **fit() Parameters:**
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `results` | MultiPeriodDiDResults | Results from event study |
+ | `M` | float | Specific violation magnitude to evaluate |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `fit(results, M)` | Compute power analysis for given event study |
+ | `power_at(results, M)` | Compute power for specific violation magnitude |
+ | `power_curve(results, M_grid, n_points)` | Compute power across range of M values |
+ | `sensitivity_to_honest_did(results)` | Compare with HonestDiD analysis |
+
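+ A minimal sketch, again reusing `mp_results`, asking how large a linear violation the pre-trends test could actually detect at the 80% power target:
+
+ ```python
+ from diff_diff import PreTrendsPower
+
+ ptp = PreTrendsPower(alpha=0.05, power=0.80, violation_type='linear')
+ pp = ptp.fit(mp_results)
+ print(pp.mdv)             # minimum detectable violation at target power
+ print(pp.power_adequate)  # whether power meets the 0.80 target
+
+ curve = ptp.power_curve(mp_results)  # power across a grid of M values
+ curve.plot()                         # requires matplotlib
+ ```
+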
+ ### PreTrendsPowerResults
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `power` | Power to detect the specified violation |
+ | `mdv` | Minimum detectable violation at target power |
+ | `violation_magnitude` | Violation magnitude (M) tested |
+ | `violation_type` | Type of violation pattern |
+ | `alpha` | Significance level |
+ | `target_power` | Target power level |
+ | `n_pre_periods` | Number of pre-treatment periods |
+ | `test_statistic` | Expected test statistic under violation |
+ | `critical_value` | Critical value for pre-trends test |
+ | `noncentrality` | Non-centrality parameter |
+ | `is_informative` | Heuristic check of whether the test is informative |
+ | `power_adequate` | Whether power meets target |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `summary()` | Get formatted summary string |
+ | `print_summary()` | Print summary to stdout |
+ | `to_dict()` | Convert to dictionary |
+ | `to_dataframe()` | Convert to pandas DataFrame |
+
+ ### PreTrendsPowerCurve
+
+ **Attributes:**
+
+ | Attribute | Description |
+ |-----------|-------------|
+ | `M_values` | Array of violation magnitudes |
+ | `powers` | Array of power values |
+ | `mdv` | Minimum detectable violation |
+ | `alpha` | Significance level |
+ | `target_power` | Target power level |
+ | `violation_type` | Type of violation pattern |
+
+ **Methods:**
+
+ | Method | Description |
+ |--------|-------------|
+ | `plot(ax, show_mdv, show_target)` | Plot power curve |
+ | `to_dataframe()` | Convert to DataFrame with M and power columns |
+
+ ### Data Preparation Functions
+
+ #### generate_did_data
+
+ ```python
+ generate_did_data(
+     n_units=100,             # Number of units
+     n_periods=4,             # Number of time periods
+     treatment_effect=5.0,    # True ATT
+     treatment_fraction=0.5,  # Fraction treated
+     treatment_period=2,      # First post-treatment period
+     unit_fe_sd=2.0,          # Unit fixed effect std dev
+     time_trend=0.5,          # Linear time trend
+     noise_sd=1.0,            # Idiosyncratic noise std dev
+     seed=None                # Random seed
+ )
+ ```
+
+ Returns DataFrame with columns: `unit`, `period`, `treated`, `post`, `outcome`, `true_effect`.
+
+ #### make_treatment_indicator
+
+ ```python
+ make_treatment_indicator(
+     data,                  # Input DataFrame
+     column,                # Column to create treatment from
+     treated_values=None,   # Value(s) indicating treatment
+     threshold=None,        # Numeric threshold for treatment
+     above_threshold=True,  # If True, >= threshold is treated
+     new_column='treated'   # Output column name
+ )
+ ```
+
+ #### make_post_indicator
+
+ ```python
+ make_post_indicator(
+     data,                  # Input DataFrame
+     time_column,           # Time/period column
+     post_periods=None,     # Specific post-treatment period(s)
+     treatment_start=None,  # First post-treatment period
+     new_column='post'      # Output column name
+ )
+ ```
+
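+ A short sketch chaining the two helpers; the raw columns (`state`, `year`, `employment`) and values are illustrative:
+
+ ```python
+ import pandas as pd
+ from diff_diff import make_treatment_indicator, make_post_indicator
+
+ raw = pd.DataFrame({
+     'state': ['NJ', 'NJ', 'PA', 'PA'],
+     'year': [1992, 1993, 1992, 1993],
+     'employment': [20.5, 21.0, 23.3, 21.2],
+ })
+
+ df = make_treatment_indicator(raw, column='state', treated_values=['NJ'])
+ df = make_post_indicator(df, time_column='year', treatment_start=1993)
+ # df now has 0/1 'treated' and 'post' columns ready for DifferenceInDifferences
+ ```
+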
+ #### wide_to_long
+
+ ```python
+ wide_to_long(
+     data,                # Wide-format DataFrame
+     value_columns,       # List of time-varying columns
+     id_column,           # Unit identifier column
+     time_name='period',  # Name for time column
+     value_name='value',  # Name for value column
+     time_values=None     # Values for time periods
+ )
+ ```
+
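+ For example, a sketch reshaping an outcome measured in two yearly columns into long format (column names illustrative):
+
+ ```python
+ import pandas as pd
+ from diff_diff import wide_to_long
+
+ wide = pd.DataFrame({
+     'firm_id': [1, 2],
+     'y2019': [10.0, 8.0],
+     'y2020': [12.0, 9.0],
+ })
+
+ long = wide_to_long(
+     wide,
+     value_columns=['y2019', 'y2020'],
+     id_column='firm_id',
+     time_name='year',
+     value_name='outcome',
+     time_values=[2019, 2020],
+ )
+ # long has one row per firm-year with columns firm_id, year, outcome
+ ```
+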
+ #### balance_panel
+
+ ```python
+ balance_panel(
+     data,            # Panel DataFrame
+     unit_column,     # Unit identifier column
+     time_column,     # Time period column
+     method='inner',  # 'inner', 'outer', or 'fill'
+     fill_value=None  # Value for filling (if method='fill')
+ )
+ ```
+
+ #### validate_did_data
+
+ ```python
+ validate_did_data(
+     data,                # DataFrame to validate
+     outcome,             # Outcome column name
+     treatment,           # Treatment column name
+     time,                # Time/post column name
+     unit=None,           # Unit column (for panel validation)
+     raise_on_error=True  # Raise ValueError or return dict
+ )
+ ```
+
+ Returns dict with `valid`, `errors`, `warnings`, and `summary` keys.
+
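+ A quick sketch of the non-raising mode, reusing the simulated `data` from earlier:
+
+ ```python
+ from diff_diff import validate_did_data
+
+ report = validate_did_data(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='post',
+     unit='unit',
+     raise_on_error=False,  # collect problems instead of raising
+ )
+ print(report['valid'])     # True if no errors found
+ print(report['warnings'])  # non-fatal issues worth reviewing
+ ```
+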
+ #### summarize_did_data
+
+ ```python
+ summarize_did_data(
+     data,       # Input DataFrame
+     outcome,    # Outcome column name
+     treatment,  # Treatment column name
+     time,       # Time/post column name
+     unit=None   # Unit column (optional)
+ )
+ ```
+
+ Returns DataFrame with summary statistics by treatment-time cell.
+
+ #### create_event_time
+
+ ```python
+ create_event_time(
+     data,                    # Panel DataFrame
+     time_column,             # Calendar time column
+     treatment_time_column,   # Column with treatment timing
+     new_column='event_time'  # Output column name
+ )
+ ```
+
+ #### aggregate_to_cohorts
+
+ ```python
+ aggregate_to_cohorts(
+     data,              # Unit-level panel data
+     unit_column,       # Unit identifier column
+     time_column,       # Time period column
+     treatment_column,  # Treatment indicator column
+     outcome,           # Outcome variable column
+     covariates=None    # Additional columns to aggregate
+ )
+ ```
+
+ #### rank_control_units
+
+ ```python
+ rank_control_units(
+     data,                   # Panel data in long format
+     unit_column,            # Unit identifier column
+     time_column,            # Time period column
+     outcome_column,         # Outcome variable column
+     treatment_column=None,  # Treatment indicator column (0/1)
+     treated_units=None,     # Explicit list of treated unit IDs
+     pre_periods=None,       # Pre-treatment periods (default: first half)
+     covariates=None,        # Covariate columns for matching
+     outcome_weight=0.7,     # Weight for outcome trend similarity (0-1)
+     covariate_weight=0.3,   # Weight for covariate distance (0-1)
+     exclude_units=None,     # Units to exclude from control pool
+     require_units=None,     # Units that must appear in output
+     n_top=None,             # Return only top N controls
+     suggest_treatment_candidates=False,  # Identify treatment candidates
+     n_treatment_candidates=5,            # Number of treatment candidates
+     lambda_reg=0.0                       # Regularization for synthetic weights
+ )
+ ```
+
+ Returns DataFrame with columns: `unit`, `quality_score`, `outcome_trend_score`, `covariate_score`, `synthetic_weight`, `pre_trend_rmse`, `is_required`.
+
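+ A sketch ranking donor units in the simulated panel (under the default `treatment_period=2`, periods 0 and 1 are pre-treatment):
+
+ ```python
+ from diff_diff import rank_control_units
+
+ rankings = rank_control_units(
+     data,
+     unit_column='unit',
+     time_column='period',
+     outcome_column='outcome',
+     treatment_column='treated',
+     pre_periods=[0, 1],
+     n_top=10,
+ )
+ print(rankings[['unit', 'quality_score', 'synthetic_weight']])
+ ```
+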
+ ## Requirements
+
+ - Python >= 3.9
+ - numpy >= 1.20
+ - pandas >= 1.3
+ - scipy >= 1.7
+
+ ## Development
+
+ ```bash
+ # Install with dev dependencies
+ pip install -e ".[dev]"
+
+ # Run tests
+ pytest
+
+ # Format and lint code
+ black diff_diff tests
+ ruff check diff_diff tests
+ ```
+
+ ## References
+
+ This library implements methods from the following scholarly works:
+
+ ### Difference-in-Differences
+
+ - **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. [https://doi.org/10.2307/1924810](https://doi.org/10.2307/1924810)
+
+ - **Card, D., & Krueger, A. B. (1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. [https://www.jstor.org/stable/2118030](https://www.jstor.org/stable/2118030)
+
+ - **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences.
+
+ ### Two-Way Fixed Effects
+
+ - **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
+
+ - **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. [https://doi.org/10.1017/pan.2020.33](https://doi.org/10.1017/pan.2020.33)
+
+ ### Robust Standard Errors
+
+ - **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. [https://doi.org/10.2307/1912934](https://doi.org/10.2307/1912934)
+
+ - **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." *Journal of Econometrics*, 29(3), 305-325. [https://doi.org/10.1016/0304-4076(85)90158-7](https://doi.org/10.1016/0304-4076(85)90158-7)
+
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." *Journal of Business & Economic Statistics*, 29(2), 238-249. [https://doi.org/10.1198/jbes.2010.07136](https://doi.org/10.1198/jbes.2010.07136)
+
+ ### Wild Cluster Bootstrap
+
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. [https://doi.org/10.1162/rest.90.3.414](https://doi.org/10.1162/rest.90.3.414)
+
+ - **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. [https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf](https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf)
+
+ - **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. [https://doi.org/10.1111/ectj.12107](https://doi.org/10.1111/ectj.12107)
+
+ ### Placebo Tests and DiD Diagnostics
+
+ - **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. [https://doi.org/10.1162/003355304772839588](https://doi.org/10.1162/003355304772839588)
+
+ ### Synthetic Control Method
+
+ - **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. [https://doi.org/10.1257/000282803321455188](https://doi.org/10.1257/000282803321455188)
+
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." *Journal of the American Statistical Association*, 105(490), 493-505. [https://doi.org/10.1198/jasa.2009.ap08746](https://doi.org/10.1198/jasa.2009.ap08746)
+
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. [https://doi.org/10.1111/ajps.12116](https://doi.org/10.1111/ajps.12116)
+
+ ### Synthetic Difference-in-Differences
+
+ - **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)
+
+ ### Triple Difference (DDD)
+
+ - **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)
+
+ This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.
+
+ - **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)
+
+ Classic paper introducing the Triple Difference design for policy evaluation.
+
+ - **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)
+
+ ### Parallel Trends and Pre-Trend Testing
+
+ - **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
+
+ - **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
+
+ ### Honest DiD / Sensitivity Analysis
+
+ The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
+
+ - **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
+
+ This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
+ - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations
+ - **Smoothness (ΔSD)**: Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends
+ - **Breakdown Analysis**: Finding the smallest violation magnitude that would overturn conclusions
+ - **Robust Confidence Intervals**: Valid inference under partial identification
+
+ - **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
+
+ Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
+
+ ### Multi-Period and Staggered Adoption
+
+ - **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)
+
+ - **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
+
+ - **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. [https://doi.org/10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016/j.jeconom.2020.09.006)
+
+ - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. [https://doi.org/10.1257/aer.20181169](https://doi.org/10.1257/aer.20181169)
+
+ - **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)
+
+ ### Power Analysis
+
+ - **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)
+
+ - **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458. [https://doi.org/10.1016/j.jdeveco.2020.102458](https://doi.org/10.1016/j.jdeveco.2020.102458)
+
+ Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.
+
+ - **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. [https://doi.org/10.1080/19439342.2016.1244555](https://doi.org/10.1080/19439342.2016.1244555)
+
+ ### General Causal Inference
+
+ - **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press.
+
+ - **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. [https://mixtape.scunning.com/](https://mixtape.scunning.com/)
+
+ ## License
+
+ MIT License