diff-diff 2.3.2__cp313-cp313-win_amd64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,2646 @@
1
+ Metadata-Version: 2.4
2
+ Name: diff-diff
3
+ Version: 2.3.2
4
+ Classifier: Development Status :: 5 - Production/Stable
5
+ Classifier: Intended Audience :: Science/Research
6
+ Classifier: Operating System :: OS Independent
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: Programming Language :: Python :: 3.9
9
+ Classifier: Programming Language :: Python :: 3.10
10
+ Classifier: Programming Language :: Python :: 3.11
11
+ Classifier: Programming Language :: Python :: 3.12
12
+ Classifier: Programming Language :: Python :: 3.13
13
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
14
+ Requires-Dist: numpy>=1.20.0
15
+ Requires-Dist: pandas>=1.3.0
16
+ Requires-Dist: scipy>=1.7.0
17
+ Requires-Dist: pytest>=7.0 ; extra == 'dev'
18
+ Requires-Dist: pytest-xdist>=3.0 ; extra == 'dev'
19
+ Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
20
+ Requires-Dist: black>=23.0 ; extra == 'dev'
21
+ Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
22
+ Requires-Dist: mypy>=1.0 ; extra == 'dev'
23
+ Requires-Dist: maturin>=1.4,<2.0 ; extra == 'dev'
24
+ Requires-Dist: sphinx>=6.0 ; extra == 'docs'
25
+ Requires-Dist: sphinx-rtd-theme>=1.0 ; extra == 'docs'
26
+ Provides-Extra: dev
27
+ Provides-Extra: docs
28
+ Summary: A library for Difference-in-Differences causal inference analysis
29
+ Keywords: causal-inference,difference-in-differences,econometrics,statistics,treatment-effects
30
+ Author: diff-diff contributors
31
+ License-Expression: MIT
32
+ Requires-Python: >=3.9, <3.14
33
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
34
+ Project-URL: Documentation, https://diff-diff.readthedocs.io
35
+ Project-URL: Homepage, https://github.com/igerber/diff-diff
36
+ Project-URL: Issues, https://github.com/igerber/diff-diff/issues
37
+ Project-URL: Repository, https://github.com/igerber/diff-diff
38
+
39
+ # diff-diff
40
+
41
+ A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.
42
+
43
+ ## Installation
44
+
45
+ ```bash
46
+ pip install diff-diff
47
+ ```
48
+
49
+ Or install from source:
50
+
51
+ ```bash
52
+ git clone https://github.com/igerber/diff-diff.git
53
+ cd diff-diff
54
+ pip install -e .
55
+ ```
56
+
57
+ ## Quick Start
58
+
59
+ ```python
60
+ import pandas as pd
61
+ from diff_diff import DifferenceInDifferences
62
+
63
+ # Create sample data
64
+ data = pd.DataFrame({
65
+ 'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
66
+ 'treated': [1, 1, 1, 1, 0, 0, 0, 0],
67
+ 'post': [0, 0, 1, 1, 0, 0, 1, 1]
68
+ })
69
+
70
+ # Fit the model
71
+ did = DifferenceInDifferences()
72
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
73
+
74
+ # View results
75
+ print(results) # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
76
+ results.print_summary()
77
+ ```
78
+
79
+ Output:
80
+ ```
81
+ ======================================================================
82
+ Difference-in-Differences Estimation Results
83
+ ======================================================================
84
+
85
+ Observations: 8
86
+ Treated units: 4
87
+ Control units: 4
88
+ R-squared: 0.9055
89
+
90
+ ----------------------------------------------------------------------
91
+ Parameter Estimate Std. Err. t-stat P>|t|
92
+ ----------------------------------------------------------------------
93
+ ATT 3.0000 1.7321 1.732 0.1583
94
+ ----------------------------------------------------------------------
95
+
96
+ 95% Confidence Interval: [-1.8089, 7.8089]
97
+
98
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
99
+ ======================================================================
100
+ ```
101
+
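Under the hood, the 2x2 ATT is just a difference of cell means. The Quick Start estimate can be reproduced by hand with pandas (a sketch of the arithmetic, not the library's internal code):

```python
import pandas as pd

data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post':    [0, 0, 1, 1, 0, 0, 1, 1],
})

# Mean outcome in each treatment-by-period cell
means = data.groupby(['treated', 'post'])['outcome'].mean()

# DiD = (treated post - treated pre) - (control post - control pre)
att = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
print(att)  # 3.0, matching the ATT reported above
```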
102
+ ## Features
103
+
104
+ - **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
105
+ - **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals
106
+ - **Multiple interfaces**: Column names or R-style formulas
107
+ - **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
108
+ - **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
109
+ - **Panel data support**: Two-way fixed effects estimator for panel designs
110
+ - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
111
+ - **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), and Borusyak-Jaravel-Spiess (2024) imputation estimators for heterogeneous treatment timing
112
+ - **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
113
+ - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
114
+ - **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
115
+ - **Event study plots**: Publication-ready visualization of treatment effects
116
+ - **Parallel trends testing**: Multiple methods including equivalence tests
117
+ - **Goodman-Bacon decomposition**: Diagnose TWFE bias by decomposing into 2x2 comparisons
118
+ - **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
119
+ - **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
120
+ - **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests
121
+ - **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator
122
+ - **Data prep utilities**: Helper functions for common data preparation tasks
123
+ - **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst))
124
+
125
+ ## Tutorials
126
+
127
+ We provide Jupyter notebook tutorials in `docs/tutorials/`:
128
+
129
+ | Notebook | Description |
130
+ |----------|-------------|
131
+ | `01_basic_did.ipynb` | Basic 2x2 DiD, formula interface, covariates, fixed effects, cluster-robust SE, wild bootstrap |
132
+ | `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna and Sun-Abraham, group-time effects, aggregation methods, Bacon decomposition |
133
+ | `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
134
+ | `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
135
+ | `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
136
+ | `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power |
137
+ | `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves |
138
+ | `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling |
139
+ | `09_real_world_examples.ipynb` | Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws) |
140
+ | `10_trop.ipynb` | Triply Robust Panel (TROP) estimation with factor model adjustment |
141
+
142
+ ## Data Preparation
143
+
144
+ diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats.
145
+
146
+ ### Generate Sample Data
147
+
148
+ Create synthetic data with a known treatment effect for testing and learning:
149
+
150
+ ```python
151
+ from diff_diff import generate_did_data, DifferenceInDifferences
152
+
153
+ # Generate panel data with 100 units, 4 periods, and a treatment effect of 5
154
+ data = generate_did_data(
155
+ n_units=100,
156
+ n_periods=4,
157
+ treatment_effect=5.0,
158
+ treatment_fraction=0.5, # 50% of units are treated
159
+ treatment_period=2, # Treatment starts at period 2
160
+ seed=42
161
+ )
162
+
163
+ # Verify the estimator recovers the treatment effect
164
+ did = DifferenceInDifferences()
165
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
166
+ print(f"Estimated ATT: {results.att:.2f} (true: 5.0)")
167
+ ```
168
+
169
+ ### Create Treatment Indicators
170
+
171
+ Convert categorical variables or numeric thresholds to binary treatment indicators:
172
+
173
+ ```python
174
+ from diff_diff import make_treatment_indicator
175
+
176
+ # From categorical variable
177
+ df = make_treatment_indicator(
178
+ data,
179
+ column='state',
180
+ treated_values=['CA', 'NY', 'TX'] # These states are treated
181
+ )
182
+
183
+ # From numeric threshold (e.g., firms above median size)
184
+ df = make_treatment_indicator(
185
+ data,
186
+ column='firm_size',
187
+ threshold=data['firm_size'].median()
188
+ )
189
+
190
+ # Treat units below threshold
191
+ df = make_treatment_indicator(
192
+ data,
193
+ column='income',
194
+ threshold=50000,
195
+ above_threshold=False # Units with income <= 50000 are treated
196
+ )
197
+ ```
198
+
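The categorical case reduces to a single pandas expression. A minimal sketch of the equivalent operation (an illustration, not the function's implementation):

```python
import pandas as pd

data = pd.DataFrame({'state': ['CA', 'WA', 'NY', 'OR', 'TX']})

# Equivalent of make_treatment_indicator(data, column='state',
#                                        treated_values=['CA', 'NY', 'TX'])
data['treated'] = data['state'].isin(['CA', 'NY', 'TX']).astype(int)
print(data['treated'].tolist())  # [1, 0, 1, 0, 1]
```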
199
+ ### Create Post-Treatment Indicators
200
+
201
+ Convert time/date columns to binary post-treatment indicators:
202
+
203
+ ```python
204
+ from diff_diff import make_post_indicator
205
+
206
+ # From specific post-treatment periods
207
+ df = make_post_indicator(
208
+ data,
209
+ time_column='year',
210
+ post_periods=[2020, 2021, 2022]
211
+ )
212
+
213
+ # From treatment start date
214
+ df = make_post_indicator(
215
+ data,
216
+ time_column='year',
217
+ treatment_start=2020 # All years >= 2020 are post-treatment
218
+ )
219
+
220
+ # Works with datetime columns
221
+ df = make_post_indicator(
222
+ data,
223
+ time_column='date',
224
+ treatment_start='2020-01-01'
225
+ )
226
+ ```
227
+
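The `treatment_start` form reduces to a single comparison. A pandas sketch of the equivalent operation:

```python
import pandas as pd

data = pd.DataFrame({'year': [2018, 2019, 2020, 2021]})

# Equivalent of make_post_indicator(data, time_column='year', treatment_start=2020)
data['post'] = (data['year'] >= 2020).astype(int)
print(data['post'].tolist())  # [0, 0, 1, 1]
```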
228
+ ### Reshape Wide to Long Format
229
+
230
+ Convert wide-format data (one row per unit, multiple time columns) to long format:
231
+
232
+ ```python
233
+ from diff_diff import wide_to_long
234
+
235
+ # Wide format: columns like sales_2019, sales_2020, sales_2021
236
+ wide_df = pd.DataFrame({
237
+ 'firm_id': [1, 2, 3],
238
+ 'industry': ['tech', 'retail', 'tech'],
239
+ 'sales_2019': [100, 150, 200],
240
+ 'sales_2020': [110, 160, 210],
241
+ 'sales_2021': [120, 170, 220]
242
+ })
243
+
244
+ # Convert to long format for DiD
245
+ long_df = wide_to_long(
246
+ wide_df,
247
+ value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
248
+ id_column='firm_id',
249
+ time_name='year',
250
+ value_name='sales',
251
+ time_values=[2019, 2020, 2021]
252
+ )
253
+ # Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry
254
+ ```
255
+
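For reference, the same reshape can be done directly with `pandas.melt`; this sketch reproduces the 9-row result above, recovering the year from the column name:

```python
import pandas as pd

wide_df = pd.DataFrame({
    'firm_id': [1, 2, 3],
    'industry': ['tech', 'retail', 'tech'],
    'sales_2019': [100, 150, 200],
    'sales_2020': [110, 160, 210],
    'sales_2021': [120, 170, 220],
})

long_df = wide_df.melt(
    id_vars=['firm_id', 'industry'],
    value_vars=['sales_2019', 'sales_2020', 'sales_2021'],
    var_name='year', value_name='sales',
)
# Extract the year suffix from e.g. 'sales_2019'
long_df['year'] = long_df['year'].str[-4:].astype(int)
print(len(long_df))  # 9
```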
256
+ ### Balance Panel Data
257
+
258
+ Ensure all units have observations for all time periods:
259
+
260
+ ```python
261
+ from diff_diff import balance_panel
262
+
263
+ # Keep only units with complete data (drop incomplete units)
264
+ balanced = balance_panel(
265
+ data,
266
+ unit_column='firm_id',
267
+ time_column='year',
268
+ method='inner'
269
+ )
270
+
271
+ # Include all unit-period combinations (creates NaN for missing)
272
+ balanced = balance_panel(
273
+ data,
274
+ unit_column='firm_id',
275
+ time_column='year',
276
+ method='outer'
277
+ )
278
+
279
+ # Fill missing values
280
+ balanced = balance_panel(
281
+ data,
282
+ unit_column='firm_id',
283
+ time_column='year',
284
+ method='fill',
285
+ fill_value=0 # Or None for forward/backward fill
286
+ )
287
+ ```
288
+
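The `'outer'` behavior corresponds to reindexing on the full unit x period grid. A pandas sketch of that idea (an illustration, not the function's implementation):

```python
import pandas as pd

data = pd.DataFrame({
    'firm_id': [1, 1, 2],          # firm 2 is missing year 2021
    'year':    [2020, 2021, 2020],
    'sales':   [100.0, 110.0, 150.0],
})

full_index = pd.MultiIndex.from_product(
    [data['firm_id'].unique(), data['year'].unique()],
    names=['firm_id', 'year'],
)
balanced = (
    data.set_index(['firm_id', 'year'])
        .reindex(full_index)       # inserts NaN rows for missing cells
        .reset_index()
)
print(len(balanced))  # 4 rows: 2 firms x 2 years
```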
289
+ ### Validate Data
290
+
291
+ Check that your data meets DiD requirements before fitting:
292
+
293
+ ```python
294
+ from diff_diff import validate_did_data
295
+
296
+ # Validate and get informative error messages
297
+ result = validate_did_data(
298
+ data,
299
+ outcome='sales',
300
+ treatment='treated',
301
+ time='post',
302
+ unit='firm_id', # Optional: for panel-specific validation
303
+ raise_on_error=False # Return dict instead of raising
304
+ )
305
+
306
+ if result['valid']:
307
+ print("Data is ready for DiD analysis!")
308
+ print(f"Summary: {result['summary']}")
309
+ else:
310
+ print("Issues found:")
311
+ for error in result['errors']:
312
+ print(f" - {error}")
313
+
314
+ for warning in result['warnings']:
315
+ print(f"Warning: {warning}")
316
+ ```
317
+
318
+ ### Summarize Data by Groups
319
+
320
+ Get summary statistics for each treatment-time cell:
321
+
322
+ ```python
323
+ from diff_diff import summarize_did_data
324
+
325
+ summary = summarize_did_data(
326
+ data,
327
+ outcome='sales',
328
+ treatment='treated',
329
+ time='post'
330
+ )
331
+ print(summary)
332
+ ```
333
+
334
+ Output:
335
+ ```
336
+ n mean std min max
337
+ Control - Pre 250 100.5000 15.2340 65.0000 145.0000
338
+ Control - Post 250 105.2000 16.1230 68.0000 152.0000
339
+ Treated - Pre 250 101.2000 14.8900 67.0000 143.0000
340
+ Treated - Post 250 115.8000 17.5600 72.0000 165.0000
341
+ DiD Estimate - 9.9000 - - -
342
+ ```
343
+
344
+ ### Create Event Time for Staggered Designs
345
+
346
+ For designs where treatment occurs at different times:
347
+
348
+ ```python
349
+ from diff_diff import create_event_time
350
+
351
+ # Add event-time column relative to treatment timing
352
+ df = create_event_time(
353
+ data,
354
+ time_column='year',
355
+ treatment_time_column='treatment_year'
356
+ )
357
+ # Result: event_time = -2, -1, 0, 1, 2 relative to treatment
358
+ ```
359
+
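Event time is simply calendar time minus the unit's treatment time. A pandas sketch of the same transformation, using the column names above:

```python
import pandas as pd

data = pd.DataFrame({
    'year':           [2018, 2019, 2020, 2021],
    'treatment_year': [2020, 2020, 2020, 2020],
})

# Equivalent of create_event_time(...): negative = pre-treatment, 0 = first treated period
data['event_time'] = data['year'] - data['treatment_year']
print(data['event_time'].tolist())  # [-2, -1, 0, 1]
```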
360
+ ### Aggregate to Cohort Means
361
+
362
+ Aggregate unit-level data for visualization:
363
+
364
+ ```python
365
+ from diff_diff import aggregate_to_cohorts
366
+
367
+ cohort_data = aggregate_to_cohorts(
368
+ data,
369
+ unit_column='firm_id',
370
+ time_column='year',
371
+ treatment_column='treated',
372
+ outcome='sales'
373
+ )
374
+ # Result: mean outcome by treatment group and period
375
+ ```
376
+
377
+ ### Rank Control Units
378
+
379
+ Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity:
380
+
381
+ ```python
382
+ from diff_diff import rank_control_units, generate_did_data
383
+
384
+ # Generate sample data
385
+ data = generate_did_data(n_units=50, n_periods=6, seed=42)
386
+
387
+ # Rank control units by their similarity to treated units
388
+ ranking = rank_control_units(
389
+ data,
390
+ unit_column='unit',
391
+ time_column='period',
392
+ outcome_column='outcome',
393
+ treatment_column='treated',
394
+ n_top=10 # Return top 10 controls
395
+ )
396
+
397
+ print(ranking[['unit', 'quality_score', 'pre_trend_rmse']])
398
+ ```
399
+
400
+ Output:
401
+ ```
402
+ unit quality_score pre_trend_rmse
403
+ 0 35 1.0000 0.4521
404
+ 1 42 0.9234 0.5123
405
+ 2 28 0.8876 0.5892
406
+ ...
407
+ ```
408
+
409
+ With covariates for matching:
410
+
411
+ ```python
412
+ # Add covariate-based matching
413
+ ranking = rank_control_units(
414
+ data,
415
+ unit_column='unit',
416
+ time_column='period',
417
+ outcome_column='outcome',
418
+ treatment_column='treated',
419
+ covariates=['size', 'age'], # Match on these too
420
+ outcome_weight=0.7, # 70% weight on outcome trends
421
+ covariate_weight=0.3 # 30% weight on covariate similarity
422
+ )
423
+ ```
424
+
425
+ Filter data for SyntheticDiD using top controls:
426
+
427
+ ```python
428
+ from diff_diff import SyntheticDiD
429
+
430
+ # Get top control units
431
+ top_controls = ranking['unit'].tolist()
432
+
433
+ # Filter data to treated + top controls
434
+ filtered_data = data[
435
+ (data['treated'] == 1) | (data['unit'].isin(top_controls))
436
+ ]
437
+
438
+ # Fit SyntheticDiD with selected controls
439
+ sdid = SyntheticDiD()
440
+ results = sdid.fit(
441
+ filtered_data,
442
+ outcome='outcome',
443
+ treatment='treated',
444
+ unit='unit',
445
+ time='period',
446
+ post_periods=[3, 4, 5]
447
+ )
448
+ ```
449
+
450
+ ## Usage
451
+
452
+ ### Basic DiD with Column Names
453
+
454
+ ```python
455
+ from diff_diff import DifferenceInDifferences
456
+
457
+ did = DifferenceInDifferences(robust=True, alpha=0.05)
458
+ results = did.fit(
459
+ data,
460
+ outcome='sales',
461
+ treatment='treated',
462
+ time='post_policy'
463
+ )
464
+
465
+ # Access results
466
+ print(f"ATT: {results.att:.4f}")
467
+ print(f"Standard Error: {results.se:.4f}")
468
+ print(f"P-value: {results.p_value:.4f}")
469
+ print(f"95% CI: {results.conf_int}")
470
+ print(f"Significant: {results.is_significant}")
471
+ ```
472
+
473
+ ### Using Formula Interface
474
+
475
+ ```python
476
+ # R-style formula syntax
477
+ results = did.fit(data, formula='outcome ~ treated * post')
478
+
479
+ # Explicit interaction syntax
480
+ results = did.fit(data, formula='outcome ~ treated + post + treated:post')
481
+
482
+ # With covariates
483
+ results = did.fit(data, formula='outcome ~ treated * post + age + income')
484
+ ```
485
+
486
+ ### Including Covariates
487
+
488
+ ```python
489
+ results = did.fit(
490
+ data,
491
+ outcome='outcome',
492
+ treatment='treated',
493
+ time='post',
494
+ covariates=['age', 'income', 'education']
495
+ )
496
+ ```
497
+
498
+ ### Fixed Effects
499
+
500
+ Use `fixed_effects` for low-dimensional categorical controls (creates dummy variables):
501
+
502
+ ```python
503
+ # State and industry fixed effects
504
+ results = did.fit(
505
+ data,
506
+ outcome='sales',
507
+ treatment='treated',
508
+ time='post',
509
+ fixed_effects=['state', 'industry']
510
+ )
511
+
512
+ # Access fixed effect coefficients
513
+ state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}
514
+ ```
515
+
516
+ Use `absorb` for high-dimensional fixed effects (more efficient; demeans via the within transformation):
517
+
518
+ ```python
519
+ # Absorb firm-level fixed effects (efficient for many firms)
520
+ results = did.fit(
521
+ data,
522
+ outcome='sales',
523
+ treatment='treated',
524
+ time='post',
525
+ absorb=['firm_id']
526
+ )
527
+ ```
528
+
529
+ Combine covariates with fixed effects:
530
+
531
+ ```python
532
+ results = did.fit(
533
+ data,
534
+ outcome='sales',
535
+ treatment='treated',
536
+ time='post',
537
+ covariates=['size', 'age'], # Linear controls
538
+ fixed_effects=['industry'], # Low-dimensional FE (dummies)
539
+ absorb=['firm_id'] # High-dimensional FE (absorbed)
540
+ )
541
+ ```
542
+
543
+ ### Cluster-Robust Standard Errors
544
+
545
+ ```python
546
+ did = DifferenceInDifferences(cluster='state')
547
+ results = did.fit(
548
+ data,
549
+ outcome='outcome',
550
+ treatment='treated',
551
+ time='post'
552
+ )
553
+ ```
554
+
555
+ ### Wild Cluster Bootstrap
556
+
557
+ When you have few clusters (<50), standard cluster-robust SEs can be severely biased. The wild cluster bootstrap provides valid inference even with as few as 5-10 clusters.
558
+
559
+ ```python
560
+ # Use wild bootstrap for inference
561
+ did = DifferenceInDifferences(
562
+ cluster='state',
563
+ inference='wild_bootstrap',
564
+ n_bootstrap=999,
565
+ bootstrap_weights='rademacher', # or 'webb' for <10 clusters, 'mammen'
566
+ seed=42
567
+ )
568
+ results = did.fit(data, outcome='y', treatment='treated', time='post')
569
+
570
+ # Results include bootstrap-based SE and p-value
571
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
572
+ print(f"P-value: {results.p_value:.4f}")
573
+ print(f"95% CI: {results.conf_int}")
574
+ print(f"Inference method: {results.inference_method}")
575
+ print(f"Number of clusters: {results.n_clusters}")
576
+ ```
577
+
578
+ **Weight types:**
579
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
580
+ - `'webb'` - 6-point distribution, recommended for <10 clusters
581
+ - `'mammen'` - Two-point distribution, alternative to Rademacher
582
+
583
+ Works with `DifferenceInDifferences` and `TwoWayFixedEffects` estimators.
584
+
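For intuition, the three weight distributions can be drawn with NumPy as follows (a sketch of the distributions themselves, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(42)
n_clusters = 8

# Rademacher: +1 / -1 with equal probability
rademacher = rng.choice([-1.0, 1.0], size=n_clusters)

# Mammen: two-point distribution whose first three moments match
phi = (1 + np.sqrt(5)) / 2
mammen = rng.choice(
    [-(phi - 1), phi],
    size=n_clusters,
    p=[phi / np.sqrt(5), 1 - phi / np.sqrt(5)],
)

# Webb: six points (+/- sqrt(1/2), +/- 1, +/- sqrt(3/2)), better behaved with very few clusters
webb_points = np.sqrt(np.array([0.5, 1.0, 1.5]))
webb = rng.choice(np.concatenate([-webb_points, webb_points]), size=n_clusters)
```

Each bootstrap replication multiplies cluster-level residuals by one such draw per cluster and recomputes the test statistic.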
585
+ ### Two-Way Fixed Effects (Panel Data)
586
+
587
+ ```python
588
+ from diff_diff import TwoWayFixedEffects
589
+
590
+ twfe = TwoWayFixedEffects()
591
+ results = twfe.fit(
592
+ panel_data,
593
+ outcome='outcome',
594
+ treatment='treated',
595
+ time='year',
596
+ unit='firm_id'
597
+ )
598
+ ```
599
+
600
+ ### Multi-Period DiD (Event Study)
601
+
602
+ For settings with multiple pre- and post-treatment periods, this estimator fits treatment × period
604
+ interactions for ALL periods (pre and post), enabling parallel trends assessment:
604
+
605
+ ```python
606
+ from diff_diff import MultiPeriodDiD
607
+
608
+ # Fit full event study with pre and post period effects
609
+ did = MultiPeriodDiD()
610
+ results = did.fit(
611
+ panel_data,
612
+ outcome='sales',
613
+ treatment='treated',
614
+ time='period',
615
+ post_periods=[3, 4, 5], # Periods 3-5 are post-treatment
616
+ reference_period=2, # Last pre-period (e=-1 convention)
617
+ unit='unit_id', # Optional: warns if staggered adoption detected
618
+ )
619
+
620
+ # Pre-period effects test parallel trends (should be ≈ 0)
621
+ for period, effect in results.pre_period_effects.items():
622
+ print(f"Pre {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")
623
+
624
+ # Post-period effects estimate dynamic treatment effects
625
+ for period, effect in results.post_period_effects.items():
626
+ print(f"Post {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")
627
+
628
+ # View average treatment effect across post-periods
629
+ print(f"Average ATT: {results.avg_att:.3f}")
630
+ print(f"Average SE: {results.avg_se:.3f}")
631
+
632
+ # Full summary with pre and post period effects
633
+ results.print_summary()
634
+ ```
635
+
636
+ Output:
637
+ ```
638
+ ================================================================================
639
+ Multi-Period Difference-in-Differences Estimation Results
640
+ ================================================================================
641
+
642
+ Observations: 600
643
+ Pre-treatment periods: 3
644
+ Post-treatment periods: 3
645
+
646
+ --------------------------------------------------------------------------------
647
+ Average Treatment Effect
648
+ --------------------------------------------------------------------------------
649
+ Average ATT 5.2000 0.8234 6.315 0.0000
650
+ --------------------------------------------------------------------------------
651
+ 95% Confidence Interval: [3.5862, 6.8138]
652
+
653
+ Period-Specific Effects:
654
+ --------------------------------------------------------------------------------
655
+ Period Effect Std. Err. t-stat P>|t|
656
+ --------------------------------------------------------------------------------
657
+ 3 4.5000 0.9512 4.731 0.0000***
658
+ 4 5.2000 0.8876 5.858 0.0000***
659
+ 5 5.9000 0.9123 6.468 0.0000***
660
+ --------------------------------------------------------------------------------
661
+
662
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
663
+ ================================================================================
664
+ ```
665
+
666
+ ### Staggered Difference-in-Differences (Callaway-Sant'Anna)
667
+
668
+ When treatment is adopted at different times by different units, traditional TWFE estimators can be biased. The Callaway-Sant'Anna estimator provides unbiased estimates with staggered adoption.
669
+
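The building block is the group-time effect ATT(g, t): a 2x2 DiD comparing cohort g against never-treated units, between the last pre-treatment period g-1 and period t. A hand computation on a tiny example (a sketch of the aggregation logic, not the library's code):

```python
import pandas as pd

# Long panel: cohort g=2 treated from period 2; cohort 0 marks never-treated
df = pd.DataFrame({
    'unit':   [1, 1, 1, 2, 2, 2],
    'period': [1, 2, 3, 1, 2, 3],
    'cohort': [2, 2, 2, 0, 0, 0],
    'y':      [10.0, 14.0, 15.0, 9.0, 10.0, 10.5],
})

def att_gt(df, g, t):
    """ATT(g, t): outcome change for cohort g vs never-treated, from period g-1 to t."""
    treat = df[df['cohort'] == g].set_index('period')['y']
    ctrl = df[df['cohort'] == 0].set_index('period')['y']
    return (treat[t] - treat[g - 1]) - (ctrl[t] - ctrl[g - 1])

print(att_gt(df, g=2, t=2))  # (14 - 10) - (10 - 9) = 3.0
print(att_gt(df, g=2, t=3))  # (15 - 10) - (10.5 - 9) = 3.5
```

Callaway-Sant'Anna estimates every such ATT(g, t) and then aggregates them, by cohort, event time, or overall.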
670
+ ```python
671
+ from diff_diff import CallawaySantAnna
672
+
673
+ # Panel data with staggered treatment
674
+ # 'first_treat' = period when unit was first treated (0 if never treated)
675
+ cs = CallawaySantAnna()
676
+ results = cs.fit(
677
+ panel_data,
678
+ outcome='sales',
679
+ unit='firm_id',
680
+ time='year',
681
+ first_treat='first_treat', # 0 for never-treated, else first treatment year
682
+ aggregate='event_study' # Compute event study effects
683
+ )
684
+
685
+ # View results
686
+ results.print_summary()
687
+
688
+ # Access group-time effects ATT(g,t)
689
+ for (group, time), effect in results.group_time_effects.items():
690
+ print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")
691
+
692
+ # Event study effects (averaged by relative time)
693
+ for rel_time, effect in results.event_study_effects.items():
694
+ print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
695
+
696
+ # Convert to DataFrame
697
+ df = results.to_dataframe(level='event_study')
698
+ ```
699
+
700
+ Output:
701
+ ```
702
+ =====================================================================================
703
+ Callaway-Sant'Anna Staggered Difference-in-Differences Results
704
+ =====================================================================================
705
+
706
+ Total observations: 600
707
+ Treated units: 35
708
+ Control units: 15
709
+ Treatment cohorts: 3
710
+ Time periods: 8
711
+ Control group: never_treated
712
+
713
+ -------------------------------------------------------------------------------------
714
+ Overall Average Treatment Effect on the Treated
715
+ -------------------------------------------------------------------------------------
716
+ Parameter Estimate Std. Err. t-stat P>|t| Sig.
717
+ -------------------------------------------------------------------------------------
718
+ ATT 2.5000 0.3521 7.101 0.0000 ***
719
+ -------------------------------------------------------------------------------------
720
+
721
+ 95% Confidence Interval: [1.8099, 3.1901]
722
+
723
+ -------------------------------------------------------------------------------------
724
+ Event Study (Dynamic) Effects
725
+ -------------------------------------------------------------------------------------
726
+ Rel. Period Estimate Std. Err. t-stat P>|t| Sig.
727
+ -------------------------------------------------------------------------------------
728
+ 0 2.1000 0.4521 4.645 0.0000 ***
729
+ 1 2.5000 0.4123 6.064 0.0000 ***
730
+ 2 2.8000 0.5234 5.349 0.0000 ***
731
+ -------------------------------------------------------------------------------------
732
+
733
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
734
+ =====================================================================================
735
+ ```
736
+
737
+ **When to use Callaway-Sant'Anna vs TWFE:**
738
+
739
+ | Scenario | Use TWFE | Use Callaway-Sant'Anna |
740
+ |----------|----------|------------------------|
741
+ | All units treated at same time | ✓ | ✓ |
742
+ | Staggered adoption, homogeneous effects | ✓ | ✓ |
743
+ | Staggered adoption, heterogeneous effects | ✗ | ✓ |
744
+ | Need event study with staggered timing | ✗ | ✓ |
745
+ | Fewer than ~20 treated units | ✓ | Depends on design |
746
+
747
+ **Parameters:**
748
+
749
+ ```python
750
+ CallawaySantAnna(
751
+ control_group='never_treated', # or 'not_yet_treated'
752
+ anticipation=0, # Periods before treatment with effects
753
+ estimation_method='dr', # 'dr', 'ipw', or 'reg'
754
+ alpha=0.05, # Significance level
755
+ cluster=None, # Column for cluster SEs
756
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
757
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
758
+ seed=None # Random seed
759
+ )
760
+ ```
761
+
762
+ **Multiplier bootstrap for inference:**
763
+
764
+ With few clusters or when analytical standard errors may be unreliable, use the multiplier bootstrap for valid inference. This implements the approach from Callaway & Sant'Anna (2021).
765
+
766
+ ```python
767
+ # Bootstrap inference with 999 iterations
768
+ cs = CallawaySantAnna(
769
+ n_bootstrap=999,
770
+ bootstrap_weights='rademacher', # or 'mammen', 'webb'
771
+ seed=42
772
+ )
773
+ results = cs.fit(
774
+ data,
775
+ outcome='sales',
776
+ unit='firm_id',
777
+ time='year',
778
+ first_treat='first_treat',
779
+ aggregate='event_study'
780
+ )
781
+
782
+ # Access bootstrap results
783
+ print(f"Overall ATT: {results.overall_att:.3f}")
784
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
785
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
786
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
787
+
788
+ # Event study bootstrap inference
789
+ for rel_time, se in results.bootstrap_results.event_study_ses.items():
790
+ ci = results.bootstrap_results.event_study_cis[rel_time]
791
+ print(f"e={rel_time}: SE={se:.3f}, 95% CI=[{ci[0]:.3f}, {ci[1]:.3f}]")
792
+ ```
793
+
794
+ **Bootstrap weight types:**
795
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
796
+ - `'mammen'` - Two-point distribution matching first 3 moments
797
+ - `'webb'` - Six-point distribution, recommended for very few clusters (<10)
798
+
799
+ **Covariate adjustment for conditional parallel trends:**
800
+
801
+ When parallel trends only holds conditional on covariates, use the `covariates` parameter:
802
+
803
+ ```python
804
+ # Doubly robust estimation with covariates
805
+ cs = CallawaySantAnna(estimation_method='dr') # 'dr', 'ipw', or 'reg'
806
+ results = cs.fit(
807
+ data,
808
+ outcome='sales',
809
+ unit='firm_id',
810
+ time='year',
811
+ first_treat='first_treat',
812
+ covariates=['size', 'age', 'industry'], # Covariates for conditional PT
813
+ aggregate='event_study'
814
+ )
815
+ ```
816
+
817
+ ### Sun-Abraham Interaction-Weighted Estimator
818
+
819
+ The Sun-Abraham (2021) estimator provides an alternative to Callaway-Sant'Anna using an interaction-weighted (IW) regression approach. Running both estimators serves as a useful robustness check—when they agree, results are more credible.
820
+
821
+ ```python
822
+ from diff_diff import SunAbraham
823
+
824
+ # Basic usage
825
+ sa = SunAbraham()
826
+ results = sa.fit(
827
+ panel_data,
828
+ outcome='sales',
829
+ unit='firm_id',
830
+ time='year',
831
+ first_treat='first_treat' # 0 for never-treated, else first treatment year
832
+ )
833
+
834
+ # View results
835
+ results.print_summary()
836
+
837
+ # Event study effects (by relative time to treatment)
838
+ for rel_time, effect in results.event_study_effects.items():
839
+ print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
840
+
841
+ # Overall ATT
842
+ print(f"Overall ATT: {results.overall_att:.3f} (SE: {results.overall_se:.3f})")
843
+
844
+ # Cohort weights (how each cohort contributes to each event-time estimate)
845
+ for rel_time, weights in results.cohort_weights.items():
846
+ print(f"e={rel_time}: {weights}")
847
+ ```
848
+
849
+ **Parameters:**
850
+
851
+ ```python
852
+ SunAbraham(
853
+ control_group='never_treated', # or 'not_yet_treated'
854
+ anticipation=0, # Periods before treatment with effects
855
+ alpha=0.05, # Significance level
856
+ cluster=None, # Column for cluster SEs
857
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
858
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
859
+ seed=None # Random seed
860
+ )
861
+ ```
862
+
863
+ **Bootstrap inference:**
864
+
865
+ ```python
866
+ # Bootstrap inference with 999 iterations
867
+ sa = SunAbraham(
868
+ n_bootstrap=999,
869
+ bootstrap_weights='rademacher',
870
+ seed=42
871
+ )
872
+ results = sa.fit(
873
+ data,
874
+ outcome='sales',
875
+ unit='firm_id',
876
+ time='year',
877
+ first_treat='first_treat'
878
+ )
879
+
880
+ # Access bootstrap results
881
+ print(f"Overall ATT: {results.overall_att:.3f}")
882
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
883
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
884
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
885
+ ```
886
+
887
+ **When to use Sun-Abraham vs Callaway-Sant'Anna:**
888
+
889
+ | Aspect | Sun-Abraham | Callaway-Sant'Anna |
890
+ |--------|-------------|-------------------|
891
+ | Approach | Interaction-weighted regression | 2x2 DiD aggregation |
892
+ | Efficiency | More efficient under homogeneous effects | More robust to heterogeneity |
893
+ | Weighting | Weights by cohort share at each relative time | Weights by sample size |
894
+ | Use case | Robustness check, regression-based inference | Primary staggered DiD estimator |
895
+
896
+ **Both estimators should give similar results when:**
897
+ - Treatment effects are relatively homogeneous across cohorts
898
+ - Parallel trends holds
899
+
900
+ **Running both as robustness check:**
901
+
902
+ ```python
903
+ from diff_diff import CallawaySantAnna, SunAbraham
904
+
905
+ # Callaway-Sant'Anna
906
+ cs = CallawaySantAnna()
907
+ cs_results = cs.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
908
+
909
+ # Sun-Abraham
910
+ sa = SunAbraham()
911
+ sa_results = sa.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
912
+
913
+ # Compare
914
+ print(f"Callaway-Sant'Anna ATT: {cs_results.overall_att:.3f}")
915
+ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
916
+
917
+ # If results differ substantially, investigate heterogeneity
918
+ ```
919
+
920
+ ### Borusyak-Jaravel-Spiess Imputation Estimator
921
+
922
+ The Borusyak et al. (2024) imputation estimator is the **efficient** estimator for staggered DiD under parallel trends, producing ~50% shorter confidence intervals than Callaway-Sant'Anna and 2-3.5x shorter than Sun-Abraham under homogeneous treatment effects.
923
+
924
+ ```python
925
+ from diff_diff import ImputationDiD, imputation_did
926
+
927
+ # Basic usage
928
+ est = ImputationDiD()
929
+ results = est.fit(data, outcome='outcome', unit='unit',
930
+ time='period', first_treat='first_treat')
931
+ results.print_summary()
932
+
933
+ # Event study
934
+ results = est.fit(data, outcome='outcome', unit='unit',
935
+ time='period', first_treat='first_treat',
936
+ aggregate='event_study')
937
+
938
+ # Pre-trend test (Equation 9)
939
+ pt = results.pretrend_test(n_leads=3)
940
+ print(f"F-stat: {pt['f_stat']:.3f}, p-value: {pt['p_value']:.4f}")
941
+
942
+ # Convenience function
943
+ results = imputation_did(data, 'outcome', 'unit', 'period', 'first_treat',
944
+ aggregate='all')
945
+ ```
946
+
947
+ ```python
948
+ ImputationDiD(
949
+ anticipation=0, # Number of anticipation periods
950
+ alpha=0.05, # Significance level
951
+ cluster=None, # Cluster variable (defaults to unit)
952
+ n_bootstrap=0, # Bootstrap iterations (0=analytical inference)
953
+ seed=None, # Random seed
954
+ horizon_max=None, # Max event-study horizon
955
+ aux_partition="cohort_horizon", # Variance partition: "cohort_horizon", "cohort", "horizon"
956
+ )
957
+ ```
958
+
959
+ **When to use Imputation DiD vs Callaway-Sant'Anna:**
960
+
961
+ | Aspect | Imputation DiD | Callaway-Sant'Anna |
962
+ |--------|---------------|-------------------|
963
+ | Efficiency | Most efficient under homogeneous effects | Less efficient but more robust to heterogeneity |
964
+ | Control group | Always uses all untreated obs | Choice of never-treated or not-yet-treated |
965
+ | Inference | Conservative variance (Theorem 3) | Multiplier bootstrap |
966
+ | Pre-trends | Built-in F-test (Equation 9) | Separate testing |
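The imputation logic itself is compact: fit unit and time fixed effects on untreated observations only, impute each treated observation's untreated potential outcome, and average the differences. A minimal self-contained sketch of that idea on simulated data (illustrative only — not the package's implementation, which also handles anticipation and conservative variance estimation):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
units, periods = 50, 8
df = pd.DataFrame(
    [(i, t) for i in range(units) for t in range(periods)],
    columns=['unit', 'period'],
)
# Units 0-24 adopt treatment at period 4; the rest are never treated
df['first_treat'] = df['unit'].map(lambda i: 4 if i < 25 else np.inf)
df['treated'] = df['period'] >= df['first_treat']

alpha_u = rng.normal(0, 1, units)      # unit fixed effects
beta_t = np.linspace(0, 1, periods)    # time fixed effects
df['y'] = (alpha_u[df['unit']] + beta_t[df['period']]
           + 2.0 * df['treated'] + rng.normal(0, 0.3, len(df)))

# Step 1: regress y on unit and time dummies using UNTREATED observations only
X = pd.concat([
    pd.get_dummies(df['unit'], prefix='u', dtype=float),
    pd.get_dummies(df['period'], prefix='t', drop_first=True, dtype=float),
], axis=1).values
mask = df['treated'].values
coef, *_ = np.linalg.lstsq(X[~mask], df['y'].values[~mask], rcond=None)

# Step 2: impute Y(0) for treated observations; ATT = mean of Y - imputed Y(0)
att = (df['y'].values[mask] - X[mask] @ coef).mean()
print(f"Imputation ATT: {att:.3f}")  # close to the true effect 2.0
```

Because the fixed effects are estimated without any treated observations, contamination of the counterfactual by treatment effects is avoided by construction.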
967
+
968
+ ### Triple Difference (DDD)
969
+
970
+ Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
971
+
972
+ ```python
973
+ from diff_diff import TripleDifference, triple_difference
974
+
975
+ # Basic usage
976
+ ddd = TripleDifference(estimation_method='dr') # doubly robust (recommended)
977
+ results = ddd.fit(
978
+ data,
979
+ outcome='wages',
980
+ group='policy_state', # 1=state enacted policy, 0=control state
981
+ partition='female', # 1=women (affected by policy), 0=men
982
+ time='post' # 1=post-policy, 0=pre-policy
983
+ )
984
+
985
+ # View results
986
+ results.print_summary()
987
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
988
+
989
+ # With covariates (properly incorporated, unlike naive DDD)
990
+ results = ddd.fit(
991
+ data,
992
+ outcome='wages',
993
+ group='policy_state',
994
+ partition='female',
995
+ time='post',
996
+ covariates=['age', 'education', 'experience']
997
+ )
998
+ ```
999
+
1000
+ **Estimation methods:**
1001
+
1002
+ | Method | Description | When to use |
1003
+ |--------|-------------|-------------|
1004
+ | `"dr"` | Doubly robust | Recommended. Consistent if either outcome or propensity model is correct |
1005
+ | `"reg"` | Regression adjustment | Simple outcome regression with full interactions |
1006
+ | `"ipw"` | Inverse probability weighting | When propensity score model is well-specified |
1007
+
1008
+ ```python
1009
+ # Compare estimation methods
1010
+ for method in ['reg', 'ipw', 'dr']:
1011
+ est = TripleDifference(estimation_method=method)
1012
+ res = est.fit(data, outcome='y', group='g', partition='p', time='t')
1013
+ print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})")
1014
+ ```
1015
+
1016
+ **Convenience function:**
1017
+
1018
+ ```python
1019
+ # One-liner estimation
1020
+ results = triple_difference(
1021
+ data,
1022
+ outcome='wages',
1023
+ group='policy_state',
1024
+ partition='female',
1025
+ time='post',
1026
+ covariates=['age', 'education'],
1027
+ estimation_method='dr'
1028
+ )
1029
+ ```
1030
+
1031
+ **Why use DDD instead of DiD?**
1032
+
1033
+ DDD allows for violations of parallel trends that are:
1034
+ - Group-specific (e.g., economic shocks in treatment states)
1035
+ - Partition-specific (e.g., trends affecting women everywhere)
1036
+
1037
+ As long as these biases are additive, DDD differences them out. The key assumption is that the *differential* trend between eligible and ineligible units would be the same across groups.
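That cancellation can be checked numerically. A minimal sketch computing the unadjusted DDD estimate from the eight cell means, on simulated data that deliberately includes both a group-specific shock and a partition-specific trend (column names mirror the example above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    'policy_state': rng.integers(0, 2, n),
    'female': rng.integers(0, 2, n),
    'post': rng.integers(0, 2, n),
})
df['wages'] = (
    10
    + 1.5 * df['policy_state']                       # group-specific level
    + 0.5 * df['post'] * df['policy_state']          # group-specific shock (cancels)
    + 0.8 * df['post'] * df['female']                # partition-specific trend (cancels)
    + 2.0 * df['policy_state'] * df['female'] * df['post']  # true DDD effect
    + rng.normal(0, 1, n)
)

# Cell means indexed by (group, partition, post)
m = df.groupby(['policy_state', 'female', 'post'])['wages'].mean()

def change(g, p):
    """Pre-to-post change for group g within partition p."""
    return m[g, p, 1] - m[g, p, 0]

# DiD within the eligible partition minus DiD within the ineligible partition
ddd = (change(1, 1) - change(0, 1)) - (change(1, 0) - change(0, 0))
print(f"Unadjusted DDD estimate: {ddd:.3f}")  # recovers ~2.0 despite both biases
```

The group-specific shock drops out of each within-partition DiD; the partition-specific trend drops out when the two DiDs are differenced.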
1038
+
1039
+ ### Event Study Visualization
1040
+
1041
+ Create publication-ready event study plots:
1042
+
1043
+ ```python
1044
+ from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna, SunAbraham
1045
+
1046
+ # From MultiPeriodDiD (full event study with pre and post period effects)
1047
+ did = MultiPeriodDiD()
1048
+ results = did.fit(data, outcome='y', treatment='treated',
1049
+ time='period', post_periods=[3, 4, 5], reference_period=2)
1050
+ plot_event_study(results, title="Treatment Effects Over Time")
1051
+
1052
+ # From CallawaySantAnna (with event study aggregation)
1053
+ cs = CallawaySantAnna()
1054
+ results = cs.fit(data, outcome='y', unit='unit', time='period',
1055
+ first_treat='first_treat', aggregate='event_study')
1056
+ plot_event_study(results, title="Staggered DiD Event Study (CS)")
1057
+
1058
+ # From SunAbraham
1059
+ sa = SunAbraham()
1060
+ results = sa.fit(data, outcome='y', unit='unit', time='period',
1061
+ first_treat='first_treat')
1062
+ plot_event_study(results, title="Staggered DiD Event Study (SA)")
1063
+
1064
+ # From a DataFrame
1065
+ df = pd.DataFrame({
1066
+ 'period': [-2, -1, 0, 1, 2],
1067
+ 'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
1068
+ 'se': [0.3, 0.25, 0.0, 0.4, 0.45]
1069
+ })
1070
+ plot_event_study(df, reference_period=0)
1071
+
1072
+ # With customization
1073
+ ax = plot_event_study(
1074
+ results,
1075
+ title="Dynamic Treatment Effects",
1076
+ xlabel="Years Relative to Treatment",
1077
+ ylabel="Effect on Sales ($1000s)",
1078
+ color="#2563eb",
1079
+ marker="o",
1080
+ shade_pre=True, # Shade pre-treatment region
1081
+ show_zero_line=True, # Horizontal line at y=0
1082
+ show_reference_line=True, # Vertical line at reference period
1083
+ figsize=(10, 6),
1084
+ show=False # Don't call plt.show(), return axes
1085
+ )
1086
+ ```
1087
+
1088
+ ### Synthetic Difference-in-Differences
1089
+
1090
+ Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.
1091
+
1092
+ ```python
1093
+ from diff_diff import SyntheticDiD
1094
+
1095
+ # Fit Synthetic DiD model
1096
+ sdid = SyntheticDiD()
1097
+ results = sdid.fit(
1098
+ panel_data,
1099
+ outcome='gdp_growth',
1100
+ treatment='treated',
1101
+ unit='state',
1102
+ time='year',
1103
+ post_periods=[2015, 2016, 2017, 2018]
1104
+ )
1105
+
1106
+ # View results
1107
+ results.print_summary()
1108
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1109
+
1110
+ # Examine unit weights (which control units matter most)
1111
+ weights_df = results.get_unit_weights_df()
1112
+ print(weights_df.head(10))
1113
+
1114
+ # Examine time weights
1115
+ time_weights_df = results.get_time_weights_df()
1116
+ print(time_weights_df)
1117
+ ```
1118
+
1119
+ Output:
1120
+ ```
1121
+ ===========================================================================
1122
+ Synthetic Difference-in-Differences Estimation Results
1123
+ ===========================================================================
1124
+
1125
+ Observations: 500
1126
+ Treated units: 1
1127
+ Control units: 49
1128
+ Pre-treatment periods: 6
1129
+ Post-treatment periods: 4
1130
+ Regularization (lambda): 0.0000
1131
+ Pre-treatment fit (RMSE): 0.1234
1132
+
1133
+ ---------------------------------------------------------------------------
1134
+ Parameter Estimate Std. Err. t-stat P>|t|
1135
+ ---------------------------------------------------------------------------
1136
+ ATT 2.5000 0.4521 5.530 0.0000 ***
1137
+ ---------------------------------------------------------------------------
1138
+
1139
+ 95% Confidence Interval: [1.6139, 3.3861]
1140
+
1141
+ ---------------------------------------------------------------------------
1142
+ Top Unit Weights (Synthetic Control)
1143
+ ---------------------------------------------------------------------------
1144
+ Unit state_12: 0.3521
1145
+ Unit state_5: 0.2156
1146
+ Unit state_23: 0.1834
1147
+ Unit state_8: 0.1245
1148
+ Unit state_31: 0.0892
1149
+ (8 units with weight > 0.001)
1150
+
1151
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
1152
+ ===========================================================================
1153
+ ```
1154
+
1155
+ #### When to Use Synthetic DiD Over Vanilla DiD
1156
+
1157
+ Use Synthetic DiD instead of standard DiD when:
1158
+
1159
+ 1. **Few treated units**: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls.
1160
+
1161
+ ```python
1162
+ # Example: California passed a policy, want to estimate its effect
1163
+ # Standard DiD would compare CA to the average of all other states
1164
+ # Synthetic DiD finds states that together best match CA's pre-treatment trend
1165
+ ```
1166
+
1167
+ 2. **Parallel trends is questionable**: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory.
1168
+
1169
+ ```python
1170
+ # Example: A tech hub city vs rural areas
1171
+ # Rural areas may not be a good comparison on average
1172
+ # Synthetic DiD can weight urban/suburban controls more heavily
1173
+ ```
1174
+
1175
+ 3. **Heterogeneous control units**: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal.
1176
+
1177
+ ```python
1178
+ # Example: Comparing a treated developing country to other countries
1179
+ # Some control countries may be much more similar economically
1180
+ # Synthetic DiD upweights the most comparable controls
1181
+ ```
1182
+
1183
+ 4. **You want transparency**: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison.
1184
+
1185
+ ```python
1186
+ # See exactly which units are driving the counterfactual
1187
+ print(results.get_unit_weights_df())
1188
+ ```
1189
+
1190
+ **Key differences from standard DiD:**
1191
+
1192
+ | Aspect | Standard DiD | Synthetic DiD |
1193
+ |--------|--------------|---------------|
1194
+ | Control weighting | Equal (1/N) | Optimized to match pre-treatment |
1195
+ | Time weighting | Equal across periods | Can emphasize informative periods |
1196
+ | N treated required | Can be many | Works with 1 treated unit |
1197
+ | Parallel trends | Assumed | Partially relaxed via matching |
1198
+ | Interpretability | Simple average | Explicit weights |
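The "optimized to match pre-treatment" row can be illustrated with a toy weight fit. The sketch below uses plain non-negative least squares normalized to the simplex; the actual SDID objective also includes an intercept and ridge regularization, so treat this purely as an illustration of the weighting idea:

```python
import numpy as np
from scipy.optimize import nnls

# Pre-treatment outcomes: rows = pre-periods, columns = control units
Y_controls = np.array([
    [1.0, 3.0, 0.9],
    [1.2, 3.4, 1.1],
    [1.4, 3.9, 1.3],
    [1.6, 4.2, 1.6],
])
y_treated = np.array([1.0, 1.15, 1.35, 1.6])

# Non-negative least squares, then normalize onto the simplex
w, _ = nnls(Y_controls, y_treated)
w = w / w.sum()
print("unit weights:", w.round(3))

# Pre-treatment fit of the weighted synthetic control
rmse = np.sqrt(((Y_controls @ w - y_treated) ** 2).mean())
print("fit RMSE:", rmse.round(4))
```

Controls whose pre-treatment paths resemble the treated unit receive most of the weight; dissimilar controls are downweighted rather than averaged in equally.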
1199
+
1200
+ **Parameters:**
1201
+
1202
+ ```python
1203
+ SyntheticDiD(
1204
+ zeta_omega=None, # Unit weight regularization (None = auto-computed from data)
1205
+ zeta_lambda=None, # Time weight regularization (None = auto-computed from data)
1206
+ alpha=0.05, # Significance level
1207
+ variance_method="placebo", # "placebo" (default, matches R) or "bootstrap"
1208
+ n_bootstrap=200, # Replications for SE estimation
1209
+ seed=None # Random seed for reproducibility
1210
+ )
1211
+ ```
1212
+
1213
+ ### Triply Robust Panel (TROP)
1214
+
1215
+ TROP (Athey, Imbens, Qu & Viviano 2025) extends Synthetic DiD by adding interactive fixed effects (factor model) adjustment. It's particularly useful when there are unobserved time-varying confounders with a factor structure that could bias standard DiD or SDID estimates.
1216
+
1217
+ TROP combines three robustness components:
1218
+ 1. **Nuclear norm regularized factor model**: Estimates interactive fixed effects L_it via soft-thresholding
1219
+ 2. **Exponential distance-based unit weights**: ω_j = exp(-λ_unit × distance(j,i))
1220
+ 3. **Exponential time decay weights**: θ_s = exp(-λ_time × |s-t|)
1221
+
1222
+ Tuning parameters are selected via leave-one-out cross-validation (LOOCV).
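The two exponential weighting schemes in items 2 and 3 are easy to write down directly. A toy sketch with hypothetical arrays (not the package's internals):

```python
import numpy as np

# Pre-treatment outcomes: rows = units, cols = periods; unit 0 is treated
Y_pre = np.array([
    [1.0, 1.2, 1.4],   # treated unit i
    [1.1, 1.2, 1.5],   # control (similar pre-treatment path)
    [3.0, 3.5, 4.0],   # control (dissimilar)
])

lambda_unit, lambda_time = 1.0, 0.5

# Unit weights: omega_j = exp(-lambda_unit * RMSE distance to the treated unit)
dist = np.sqrt(((Y_pre[1:] - Y_pre[0]) ** 2).mean(axis=1))
omega = np.exp(-lambda_unit * dist)
omega /= omega.sum()

# Time weights for treated period t: theta_s = exp(-lambda_time * |s - t|)
periods = np.array([0, 1, 2])
t = 3  # first post-treatment period
theta = np.exp(-lambda_time * np.abs(periods - t))
theta /= theta.sum()

print("unit weights:", omega.round(3))   # the similar control dominates
print("time weights:", theta.round(3))   # pre-periods nearer treatment count more
```

Setting `lambda_unit=0` or `lambda_time=0` recovers equal weights, which is why `0.0` appears in the default grids.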
1223
+
1224
+ ```python
1225
+ from diff_diff import TROP, trop
1226
+
1227
+ # Fit TROP model with automatic tuning via LOOCV
1228
+ trop_est = TROP(
1229
+ lambda_time_grid=[0.0, 0.5, 1.0, 2.0], # Time decay grid
1230
+ lambda_unit_grid=[0.0, 0.5, 1.0, 2.0], # Unit distance grid
1231
+ lambda_nn_grid=[0.0, 0.1, 1.0], # Nuclear norm grid
1232
+ n_bootstrap=200
1233
+ )
1234
+ # Note: TROP infers treatment periods from the treatment indicator column.
1235
+ # The 'treated' column must be an absorbing state (D=1 for all periods
1236
+ # during and after treatment starts for each unit).
1237
+ results = trop_est.fit(
1238
+ panel_data,
1239
+ outcome='gdp_growth',
1240
+ treatment='treated',
1241
+ unit='state',
1242
+ time='year'
1243
+ )
1244
+
1245
+ # View results
1246
+ results.print_summary()
1247
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1248
+ print(f"Effective rank: {results.effective_rank:.2f}")
1249
+
1250
+ # Selected tuning parameters
1251
+ print(f"λ_time: {results.lambda_time:.2f}")
1252
+ print(f"λ_unit: {results.lambda_unit:.2f}")
1253
+ print(f"λ_nn: {results.lambda_nn:.2f}")
1254
+
1255
+ # Examine unit effects
1256
+ unit_effects = results.get_unit_effects_df()
1257
+ print(unit_effects.head(10))
1258
+ ```
1259
+
1260
+ Output:
1261
+ ```
1262
+ ===========================================================================
1263
+ Triply Robust Panel (TROP) Estimation Results
1264
+ Athey, Imbens, Qu & Viviano (2025)
1265
+ ===========================================================================
1266
+
1267
+ Observations: 500
1268
+ Treated units: 1
1269
+ Control units: 49
1270
+ Treated observations: 4
1271
+ Pre-treatment periods: 6
1272
+ Post-treatment periods: 4
1273
+
1274
+ ---------------------------------------------------------------------------
1275
+ Tuning Parameters (selected via LOOCV)
1276
+ ---------------------------------------------------------------------------
1277
+ Lambda (time decay): 1.0000
1278
+ Lambda (unit distance): 0.5000
1279
+ Lambda (nuclear norm): 0.1000
1280
+ Effective rank: 2.35
1281
+ LOOCV score: 0.012345
1282
+ Variance method: bootstrap
1283
+ Bootstrap replications: 200
1284
+
1285
+ ---------------------------------------------------------------------------
1286
+ Parameter Estimate Std. Err. t-stat P>|t|
1287
+ ---------------------------------------------------------------------------
1288
+ ATT 2.5000 0.3892 6.424 0.0000 ***
1289
+ ---------------------------------------------------------------------------
1290
+
1291
+ 95% Confidence Interval: [1.7372, 3.2628]
1292
+
1293
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
1294
+ ===========================================================================
1295
+ ```
1296
+
1297
+ #### When to Use TROP Over Synthetic DiD
1298
+
1299
+ Use TROP when you suspect **factor structure** in the data—unobserved confounders that affect outcomes differently across units and time:
1300
+
1301
+ | Scenario | Use SDID | Use TROP |
1302
+ |----------|----------|----------|
1303
+ | Simple parallel trends | ✓ | ✓ |
1304
+ | Unobserved factors (e.g., economic cycles) | May be biased | ✓ |
1305
+ | Strong unit-time interactions | May be biased | ✓ |
1306
+ | Low-dimensional confounding | ✓ | ✓ |
1307
+
1308
+ **Example scenarios where TROP excels:**
1309
+ - Regional economic shocks that affect states differently based on industry composition
1310
+ - Global trends that impact countries differently based on their economic structure
1311
+ - Common factors in financial data (market risk, interest rates, etc.)
1312
+
1313
+ **How TROP works:**
1314
+
1315
+ 1. **Factor estimation**: Estimates interactive fixed effects L_it using nuclear norm regularization (encourages low-rank structure)
1316
+ 2. **Unit weights**: Exponential distance-based weighting ω_j = exp(-λ_unit × d(j,i)) where d(j,i) is the RMSE of outcome differences
1317
+ 3. **Time weights**: Exponential decay weighting θ_s = exp(-λ_time × |s-t|) based on proximity to treatment
1318
+ 4. **ATT computation**: τ = Y_it - α_i - β_t - L_it for treated observations
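The nuclear norm step in item 1 reduces to soft-thresholding singular values. A self-contained sketch of that operation on a simulated low-rank-plus-noise matrix (illustrative only, not the package's estimation loop):

```python
import numpy as np

def soft_threshold_svd(M, lam):
    """Prox of the nuclear norm: shrink each singular value toward zero by lam."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return U @ np.diag(s_shrunk) @ Vt, s_shrunk

rng = np.random.default_rng(42)
# Rank-2 signal plus noise: a stand-in for the residual matrix after
# removing unit and time fixed effects
F = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
M = F + 0.1 * rng.normal(size=(30, 20))

L, s = soft_threshold_svd(M, lam=2.0)
effective_rank = int((s > 0).sum())
print(f"Singular values kept: {effective_rank}")  # the 2 signal directions survive
```

Because the threshold zeroes out small singular values, the penalty encourages a low-rank estimate of the interactive fixed effects L_it, with the surviving count behaving like the "effective rank" reported in the results.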
1319
+
1320
+ ```python
1321
+ # Compare TROP vs SDID under factor confounding
1322
+ from diff_diff import SyntheticDiD
1323
+
1324
+ # Synthetic DiD (may be biased with factors)
1325
+ sdid = SyntheticDiD()
1326
+ sdid_results = sdid.fit(data, outcome='y', treatment='treated',
1327
+ unit='unit', time='time', post_periods=[5,6,7])
1328
+
1329
+ # TROP (accounts for factors)
1330
+ # Note: TROP infers treatment periods from the treatment indicator column
1331
+ # (D=1 for treated observations, D=0 for control)
1332
+ trop_est = TROP() # Uses default grids with LOOCV selection
1333
+ trop_results = trop_est.fit(data, outcome='y', treatment='treated',
1334
+ unit='unit', time='time')
1335
+
1336
+ print(f"SDID estimate: {sdid_results.att:.3f}")
1337
+ print(f"TROP estimate: {trop_results.att:.3f}")
1338
+ print(f"Effective rank: {trop_results.effective_rank:.2f}")
1339
+ ```
1340
+
1341
+ **Tuning parameter grids:**
1342
+
1343
+ ```python
1344
+ # Custom tuning grids (searched via LOOCV)
1345
+ trop = TROP(
1346
+ lambda_time_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Time decay
1347
+ lambda_unit_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Unit distance
1348
+ lambda_nn_grid=[0.0, 0.01, 0.1, 1.0, 10.0] # Nuclear norm
1349
+ )
1350
+
1351
+ # Fixed tuning parameters (skip LOOCV search)
1352
+ trop = TROP(
1353
+ lambda_time_grid=[1.0], # Single value = fixed
1354
+ lambda_unit_grid=[1.0], # Single value = fixed
1355
+ lambda_nn_grid=[0.1] # Single value = fixed
1356
+ )
1357
+ ```
1358
+
1359
+ **Parameters:**
1360
+
1361
+ ```python
1362
+ TROP(
1363
+ method='twostep', # Estimation method: 'twostep' (default) or 'joint'
1364
+ lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5])
1365
+ lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5])
1366
+ lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10])
1367
+ max_iter=100, # Max iterations for factor estimation
1368
+ tol=1e-6, # Convergence tolerance
1369
+ alpha=0.05, # Significance level
1370
+ n_bootstrap=200, # Bootstrap replications
1371
+ seed=None # Random seed
1372
+ )
1373
+ ```
1374
+
1375
+ **Estimation methods:**
1376
+ - `'twostep'` (default): Per-observation model fitting following Algorithm 2 of the paper. Computes observation-specific weights and fits a model for each treated observation, then averages the individual treatment effects. More flexible but computationally intensive.
1377
+ - `'joint'`: Joint weighted least squares optimization. Estimates a single scalar treatment effect τ along with fixed effects and optional low-rank factor adjustment. Faster but assumes homogeneous treatment effects.
1378
+
1379
+ **Convenience function:**
1380
+
1381
+ ```python
1382
+ # One-liner estimation with default tuning grids
1383
+ # Note: TROP infers treatment periods from the treatment indicator
1384
+ results = trop(
1385
+ data,
1386
+ outcome='y',
1387
+ treatment='treated',
1388
+ unit='unit',
1389
+ time='time',
1390
+ n_bootstrap=200
1391
+ )
1392
+ ```
1393
+
1394
+ ## Working with Results
1395
+
1396
+ ### Export Results
1397
+
1398
+ ```python
1399
+ # As dictionary
1400
+ results.to_dict()
1401
+ # {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}
1402
+
1403
+ # As DataFrame
1404
+ df = results.to_dataframe()
1405
+ ```
1406
+
1407
+ ### Check Significance
1408
+
1409
+ ```python
1410
+ if results.is_significant:
1411
+ print(f"Effect is significant at {did.alpha} level")
1412
+
1413
+ # Get significance stars
1414
+ print(f"ATT: {results.att}{results.significance_stars}")
1415
+ # ATT: 3.5000*
1416
+ ```
1417
+
1418
+ ### Access Full Regression Output
1419
+
1420
+ ```python
1421
+ # All coefficients
1422
+ results.coefficients
1423
+ # {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}
1424
+
1425
+ # Variance-covariance matrix
1426
+ results.vcov
1427
+
1428
+ # Residuals and fitted values
1429
+ results.residuals
1430
+ results.fitted_values
1431
+
1432
+ # R-squared
1433
+ results.r_squared
1434
+ ```
1435
+
1436
+ ## Checking Assumptions
1437
+
1438
+ ### Parallel Trends
1439
+
1440
+ **Simple slope-based test:**
1441
+
1442
+ ```python
1443
+ from diff_diff.utils import check_parallel_trends
1444
+
1445
+ trends = check_parallel_trends(
1446
+ data,
1447
+ outcome='outcome',
1448
+ time='period',
1449
+ treatment_group='treated'
1450
+ )
1451
+
1452
+ print(f"Treated trend: {trends['treated_trend']:.4f}")
1453
+ print(f"Control trend: {trends['control_trend']:.4f}")
1454
+ print(f"Difference p-value: {trends['p_value']:.4f}")
1455
+ ```
1456
+
1457
+ **Robust distributional test (Wasserstein distance):**
1458
+
1459
+ ```python
1460
+ from diff_diff.utils import check_parallel_trends_robust
1461
+
1462
+ results = check_parallel_trends_robust(
1463
+ data,
1464
+ outcome='outcome',
1465
+ time='period',
1466
+ treatment_group='treated',
1467
+ unit='firm_id', # Unit identifier for panel data
1468
+ pre_periods=[2018, 2019], # Pre-treatment periods
1469
+ n_permutations=1000 # Permutations for p-value
1470
+ )
1471
+
1472
+ print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
1473
+ print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
1474
+ print(f"KS test p-value: {results['ks_p_value']:.4f}")
1475
+ print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")
1476
+ ```
1477
+
1478
+ The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes rather than just their means, which makes the test more robust to:
1479
+ - Non-normal distributions
1480
+ - Heterogeneous effects across units
1481
+ - Outliers
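The mechanics of such a test can be sketched with `scipy.stats.wasserstein_distance` plus a label-permutation loop; this illustrates the approach, not necessarily the exact implementation of `check_parallel_trends_robust`:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
# Pre-treatment outcome *changes* for treated and control units
treated_changes = rng.normal(0.5, 1.0, 80)
control_changes = rng.normal(0.5, 1.0, 120)

obs = wasserstein_distance(treated_changes, control_changes)

# Permutation p-value: shuffle group labels, recompute the distance
pooled = np.concatenate([treated_changes, control_changes])
n_t = len(treated_changes)
n_perm = 1000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    if wasserstein_distance(perm[:n_t], perm[n_t:]) >= obs:
        count += 1
p_value = (count + 1) / (n_perm + 1)
print(f"Wasserstein distance: {obs:.4f}, permutation p-value: {p_value:.3f}")
```

Here both groups are drawn from the same distribution, so a large p-value is expected; a small p-value would flag distributional divergence in pre-treatment changes.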
1482
+
1483
+ **Equivalence testing (TOST):**
1484
+
1485
+ ```python
1486
+ from diff_diff.utils import equivalence_test_trends
1487
+
1488
+ results = equivalence_test_trends(
1489
+ data,
1490
+ outcome='outcome',
1491
+ time='period',
1492
+ treatment_group='treated',
1493
+ unit='firm_id',
1494
+ equivalence_margin=0.5 # Define "practically equivalent"
1495
+ )
1496
+
1497
+ print(f"Mean difference: {results['mean_difference']:.4f}")
1498
+ print(f"TOST p-value: {results['tost_p_value']:.4f}")
1499
+ print(f"Trends equivalent: {results['equivalent']}")
1500
+ ```
1501
+
1502
+ ### Honest DiD Sensitivity Analysis (Rambachan-Roth)
1503
+
1504
+ Pre-trends tests have low power and can exacerbate bias. **Honest DiD** (Rambachan & Roth 2023) provides sensitivity analysis showing how robust your results are to violations of parallel trends.
1505
+
1506
+ ```python
1507
+ from diff_diff import HonestDiD, MultiPeriodDiD
1508
+
1509
+ # First, fit a full event study (pre + post period effects)
1510
+ did = MultiPeriodDiD()
1511
+ event_results = did.fit(
1512
+ data,
1513
+ outcome='outcome',
1514
+ treatment='treated',
1515
+ time='period',
1516
+ post_periods=[5, 6, 7, 8, 9],
1517
+ reference_period=4, # Last pre-period (e=-1 convention)
1518
+ )
1519
+
1520
+ # Compute honest bounds with relative magnitudes restriction
1521
+ # M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation
1522
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
1523
+ honest_results = honest.fit(event_results)
1524
+
1525
+ print(honest_results.summary())
1526
+ print(f"Original estimate: {honest_results.original_estimate:.4f}")
1527
+ print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
1528
+ print(f"Effect robust to violations: {honest_results.is_significant}")
1529
+ ```
1530
+
1531
+ **Sensitivity analysis over M values:**
1532
+
1533
+ ```python
1534
+ # How do results change as we allow larger violations?
1535
+ sensitivity = honest.sensitivity_analysis(
1536
+ event_results,
1537
+ M_grid=[0, 0.5, 1.0, 1.5, 2.0]
1538
+ )
1539
+
1540
+ print(sensitivity.summary())
1541
+ print(f"Breakdown value: M = {sensitivity.breakdown_M}")
1542
+ # Breakdown = smallest M where the robust CI includes zero
1543
+ ```
1544
+
1545
+ **Breakdown value:**
1546
+
1547
+ The breakdown value tells you how robust your conclusion is:
1548
+
1549
+ ```python
1550
+ breakdown = honest.breakdown_value(event_results)
1551
+ if breakdown >= 1.0:
1552
+ print("Result holds even if post-treatment violations are as bad as pre-treatment")
1553
+ else:
1554
+ print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment")
1555
+ ```
1556
+
1557
+ **Smoothness restriction (alternative approach):**
1558
+
1559
+ ```python
1560
+ # Bounds second differences of trend violations
1561
+ # M=0 means linear extrapolation of pre-trends
1562
+ honest_smooth = HonestDiD(method='smoothness', M=0.5)
1563
+ smooth_results = honest_smooth.fit(event_results)
1564
+ ```
1565
+
1566
+ **Visualization:**
1567
+
1568
+ ```python
1569
+ from diff_diff import plot_sensitivity, plot_honest_event_study
1570
+
1571
+ # Plot sensitivity analysis
1572
+ plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")
1573
+
1574
+ # Event study with honest confidence intervals
1575
+ plot_honest_event_study(event_results, honest_results)
1576
+ ```
1577
+
1578
+ ### Pre-Trends Power Analysis (Roth 2022)
1579
+
1580
+ A passing pre-trends test doesn't mean parallel trends holds—it may just mean the test has low power. **Pre-Trends Power Analysis** (Roth 2022) answers: "What violations could my pre-trends test have detected?"
1581
+
1582
+ ```python
1583
+ from diff_diff import PreTrendsPower, MultiPeriodDiD
1584
+
1585
+ # First, fit a full event study
1586
+ did = MultiPeriodDiD()
1587
+ event_results = did.fit(
1588
+ data,
1589
+ outcome='outcome',
1590
+ treatment='treated',
1591
+ time='period',
1592
+ post_periods=[5, 6, 7, 8, 9],
1593
+ reference_period=4,
1594
+ )
1595
+
1596
+ # Analyze pre-trends test power
1597
+ pt = PreTrendsPower(alpha=0.05, power=0.80)
1598
+ power_results = pt.fit(event_results)
1599
+
1600
+ print(power_results.summary())
1601
+ print(f"Minimum Detectable Violation (MDV): {power_results.mdv:.4f}")
1602
+ print(f"Power to detect violations of size MDV: {power_results.power:.1%}")
1603
+ ```
1604
+
1605
+ **Key concepts:**
1606
+
1607
+ - **Minimum Detectable Violation (MDV)**: Smallest violation magnitude that would be detected with your target power (e.g., 80%). Passing the pre-trends test does NOT rule out violations up to this size.
1608
+ - **Power**: Probability of detecting a violation of given size if it exists.
1609
+ - **Violation types**: Linear trend, constant violation, last-period only, or custom patterns.
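For intuition, with a single pre-period coefficient and normal inference, the standard power calculation gives MDV = se × (z_{1−α/2} + z_{power}). A sketch of that arithmetic (not necessarily how `PreTrendsPower` computes it internally):

```python
from scipy.stats import norm

def mdv_single_coef(se, alpha=0.05, power=0.80):
    """Smallest violation detected with the target power by a two-sided z-test."""
    return se * (norm.ppf(1 - alpha / 2) + norm.ppf(power))

def power_at(delta, se, alpha=0.05):
    """Probability the test rejects given a true violation of size delta."""
    z_crit = norm.ppf(1 - alpha / 2)
    z = delta / se
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)

se = 0.25  # hypothetical SE of a pre-trend coefficient
mdv = mdv_single_coef(se)
print(f"MDV: {mdv:.3f}")                         # 0.25 * (1.96 + 0.84) ≈ 0.70
print(f"Power at MDV: {power_at(mdv, se):.3f}")  # ≈ 0.80 by construction
```

So with se = 0.25, violations up to roughly 0.70 would usually escape detection — if that is large relative to your estimated effect, a passing pre-trends test is weak reassurance.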
1610
+
1611
+ **Power curve visualization:**
1612
+
1613
+ ```python
1614
+ from diff_diff import plot_pretrends_power
1615
+
1616
+ # Generate power curve across violation magnitudes
1617
+ curve = pt.power_curve(event_results)
1618
+
1619
+ # Plot the power curve
1620
+ plot_pretrends_power(curve, title="Pre-Trends Test Power Curve")
1621
+
1622
+ # Or from the curve object directly
1623
+ curve.plot()
1624
+ ```
1625
+
1626
+ **Different violation patterns:**
1627
+
1628
+ ```python
1629
+ # Linear trend violations (default) - most common assumption
1630
+ pt_linear = PreTrendsPower(violation_type='linear')
1631
+
1632
+ # Constant violation in all pre-periods
1633
+ pt_constant = PreTrendsPower(violation_type='constant')
1634
+
1635
+ # Violation only in the last pre-period (sharp break)
1636
+ pt_last = PreTrendsPower(violation_type='last_period')
1637
+
1638
+ # Custom violation pattern
1639
+ custom_weights = np.array([0.1, 0.3, 0.6]) # Increasing violations
1640
+ pt_custom = PreTrendsPower(violation_type='custom', violation_weights=custom_weights)
1641
+ ```
1642
+
1643
+ **Combining with HonestDiD:**
1644
+
1645
+ Pre-trends power analysis and HonestDiD are complementary:
1646
+ 1. **Pre-trends power** tells you what the test could have detected
1647
+ 2. **HonestDiD** tells you how robust your results are to violations
1648
+
1649
+ ```python
1650
+ from diff_diff import HonestDiD, PreTrendsPower
1651
+
1652
+ # If MDV is large relative to your estimated effect, be cautious
1653
+ pt = PreTrendsPower()
1654
+ power_results = pt.fit(event_results)
1655
+ sensitivity = pt.sensitivity_to_honest_did(event_results)
1656
+ print(sensitivity['interpretation'])
1657
+
1658
+ # Use HonestDiD for robust inference
1659
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
1660
+ honest_results = honest.fit(event_results)
1661
+ ```
1662
+
1663
+ ### Placebo Tests
1664
+
1665
+ Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
1666
+
1667
+ **Fake timing test:**
1668
+
1669
+ ```python
1670
+ from diff_diff import run_placebo_test
1671
+
1672
+ # Test: Is there an effect before treatment actually occurred?
1673
+ # Actual treatment is at period 3 (post_periods=[3, 4, 5])
1674
+ # We test if a "fake" treatment at period 1 shows an effect
1675
+ results = run_placebo_test(
1676
+ data,
1677
+ outcome='outcome',
1678
+ treatment='treated',
1679
+ time='period',
1680
+ test_type='fake_timing',
1681
+ fake_treatment_period=1, # Pretend treatment was in period 1
1682
+ post_periods=[3, 4, 5] # Actual post-treatment periods
1683
+ )
1684
+
1685
+ print(results.summary())
1686
+ # If parallel trends hold, placebo_effect should be ~0 and not significant
1687
+ print(f"Placebo effect: {results.placebo_effect:.3f} (p={results.p_value:.3f})")
1688
+ print(f"Is significant (bad): {results.is_significant}")
1689
+ ```
1690
+
1691
+ **Fake group test:**
1692
+
1693
+ ```python
1694
+ # Test: Is there an effect among never-treated units?
1695
+ # Get some control unit IDs to use as "fake treated"
1696
+ control_units = data[data['treated'] == 0]['firm_id'].unique()[:5]
1697
+
1698
+ results = run_placebo_test(
1699
+ data,
1700
+ outcome='outcome',
1701
+ treatment='treated',
1702
+ time='period',
1703
+ unit='firm_id',
1704
+ test_type='fake_group',
1705
+ fake_treatment_group=list(control_units), # List of control unit IDs
1706
+ post_periods=[3, 4, 5]
1707
+ )
1708
+ ```
1709
+
1710
+ **Permutation test:**
1711
+
1712
+ ```python
1713
+ # Randomly reassign treatment and compute distribution of effects
1714
+ # Note: requires binary post indicator (use 'post' column, not 'period')
1715
+ results = run_placebo_test(
1716
+ data,
1717
+ outcome='outcome',
1718
+ treatment='treated',
1719
+ time='post', # Binary post-treatment indicator
1720
+ unit='firm_id',
1721
+ test_type='permutation',
1722
+ n_permutations=1000,
1723
+ seed=42
1724
+ )
1725
+
1726
+ print(f"Original effect: {results.original_effect:.3f}")
1727
+ print(f"Permutation p-value: {results.p_value:.4f}")
1728
+ # Low p-value indicates the effect is unlikely to be due to chance
1729
+ ```
1730
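For intuition, the core of a permutation test can be sketched in plain numpy. This is an illustrative hand-rolled version, not the library's implementation, and the per-unit changes below are made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(42)
# Post-minus-pre change in outcome per unit; the first 4 units are "treated"
delta = np.array([4.8, 5.2, 5.0, 4.9, 0.1, -0.2, 0.3, 0.0, -0.1, 0.2])
n_treated = 4
observed = delta[:n_treated].mean() - delta[n_treated:].mean()

# Reassign "treatment" at random and recompute the effect each time
perm_effects = np.empty(1000)
for i in range(1000):
    shuffled = rng.permutation(delta)
    perm_effects[i] = shuffled[:n_treated].mean() - shuffled[n_treated:].mean()

# Two-sided p-value: share of permuted effects at least as extreme as observed
p_value = np.mean(np.abs(perm_effects) >= abs(observed))
```

With only 10 units there are 210 possible treatment assignments, so the permutation distribution is coarse; the library's `n_permutations` parameter controls the same trade-off between runtime and p-value resolution.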
+
1731
+ **Leave-one-out sensitivity:**
1732
+
1733
+ ```python
1734
+ # Test sensitivity to individual treated units
1735
+ # Note: requires a binary post indicator (use the 'post' column, not 'period')
1736
+ results = run_placebo_test(
1737
+ data,
1738
+ outcome='outcome',
1739
+ treatment='treated',
1740
+ time='post', # Binary post-treatment indicator
1741
+ unit='firm_id',
1742
+ test_type='leave_one_out'
1743
+ )
1744
+
1745
+ # Check if any single unit drives the result
1746
+ print(results.leave_one_out_effects) # Effect when each unit is dropped
1747
+ ```
1748
+
1749
+ **Run all placebo tests:**
1750
+
1751
+ ```python
1752
+ from diff_diff import run_all_placebo_tests
1753
+
1754
+ # Comprehensive diagnostic suite
1755
+ # Note: This function runs fake_timing tests on pre-treatment periods.
1756
+ # The permutation and leave_one_out tests require a binary post indicator,
1757
+ # so they may return errors if the data uses a multi-period time column.
1758
+ all_results = run_all_placebo_tests(
1759
+ data,
1760
+ outcome='outcome',
1761
+ treatment='treated',
1762
+ time='period',
1763
+ unit='firm_id',
1764
+ pre_periods=[0, 1, 2],
1765
+ post_periods=[3, 4, 5],
1766
+ n_permutations=500,
1767
+ seed=42
1768
+ )
1769
+
1770
+ for test_name, result in all_results.items():
1771
+ if hasattr(result, 'p_value'):
1772
+ print(f"{test_name}: p={result.p_value:.3f}, significant={result.is_significant}")
1773
+ elif isinstance(result, dict) and 'error' in result:
1774
+ print(f"{test_name}: Error - {result['error']}")
1775
+ ```
1776
+
1777
+ ## API Reference
1778
+
1779
+ ### DifferenceInDifferences
1780
+
1781
+ ```python
1782
+ DifferenceInDifferences(
1783
+ robust=True, # Use HC1 robust standard errors
1784
+ cluster=None, # Column for cluster-robust SEs
1785
+ alpha=0.05 # Significance level for CIs
1786
+ )
1787
+ ```
1788
+
1789
+ **Methods:**
1790
+
1791
+ | Method | Description |
1792
+ |--------|-------------|
1793
+ | `fit(data, outcome, treatment, time, ...)` | Fit the DiD model |
1794
+ | `summary()` | Get formatted summary string |
1795
+ | `print_summary()` | Print summary to stdout |
1796
+ | `get_params()` | Get estimator parameters (sklearn-compatible) |
1797
+ | `set_params(**params)` | Set estimator parameters (sklearn-compatible) |
1798
+
1799
+ **fit() Parameters:**
1800
+
1801
+ | Parameter | Type | Description |
1802
+ |-----------|------|-------------|
1803
+ | `data` | DataFrame | Input data |
1804
+ | `outcome` | str | Outcome variable column name |
1805
+ | `treatment` | str | Treatment indicator column (0/1) |
1806
+ | `time` | str | Post-treatment indicator column (0/1) |
1807
+ | `formula` | str | R-style formula (alternative to column names) |
1808
+ | `covariates` | list | Linear control variables |
1809
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
1810
+ | `absorb` | list | High-dimensional FE (within-transformation) |
1811
+
1812
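For intuition, the canonical 2×2 estimate that `fit()` recovers by regression equals the difference of the two group changes, which can be computed by hand from cell means (toy numbers, not from any real dataset):

```python
import pandas as pd

# One observation per treatment-by-period cell, for illustration
df = pd.DataFrame({
    'treated': [1, 1, 0, 0],
    'post':    [0, 1, 0, 1],
    'outcome': [10.0, 17.0, 8.0, 11.0],
})
means = df.groupby(['treated', 'post'])['outcome'].mean()

# DiD: change among treated minus change among controls
att = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
# (17 - 10) - (11 - 8) = 4.0
```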
+ ### DiDResults
1813
+
1814
+ **Attributes:**
1815
+
1816
+ | Attribute | Description |
1817
+ |-----------|-------------|
1818
+ | `att` | Average treatment effect on the treated (ATT) |
1819
+ | `se` | Standard error of ATT |
1820
+ | `t_stat` | T-statistic |
1821
+ | `p_value` | P-value for H0: ATT = 0 |
1822
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
1823
+ | `n_obs` | Number of observations |
1824
+ | `n_treated` | Number of treated units |
1825
+ | `n_control` | Number of control units |
1826
+ | `r_squared` | R-squared of regression |
1827
+ | `coefficients` | Dictionary of all coefficients |
1828
+ | `is_significant` | Boolean for significance at alpha |
1829
+ | `significance_stars` | String of significance stars |
1830
+
1831
+ **Methods:**
1832
+
1833
+ | Method | Description |
1834
+ |--------|-------------|
1835
+ | `summary(alpha)` | Get formatted summary string |
1836
+ | `print_summary(alpha)` | Print summary to stdout |
1837
+ | `to_dict()` | Convert to dictionary |
1838
+ | `to_dataframe()` | Convert to pandas DataFrame |
1839
+
1840
+ ### MultiPeriodDiD
1841
+
1842
+ ```python
1843
+ MultiPeriodDiD(
1844
+ robust=True, # Use HC1 robust standard errors
1845
+ cluster=None, # Column for cluster-robust SEs
1846
+ alpha=0.05 # Significance level for CIs
1847
+ )
1848
+ ```
1849
+
1850
+ **fit() Parameters:**
1851
+
1852
+ | Parameter | Type | Description |
1853
+ |-----------|------|-------------|
1854
+ | `data` | DataFrame | Input data |
1855
+ | `outcome` | str | Outcome variable column name |
1856
+ | `treatment` | str | Treatment indicator column (0/1) |
1857
+ | `time` | str | Time period column (multiple values) |
1858
+ | `post_periods` | list | List of post-treatment period values |
1859
+ | `covariates` | list | Linear control variables |
1860
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
1861
+ | `absorb` | list | High-dimensional FE (within-transformation) |
1862
+ | `reference_period` | any | Omitted period (default: last pre-period, e=-1 convention) |
1863
+ | `unit` | str | Unit identifier column (for staggered adoption warning) |
1864
+
1865
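In the simplest case without covariates, the period-specific effects reported by `MultiPeriodDiD` reduce to the treated-control gap in each period, re-centered at the reference period. A toy sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({
    'treated': [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    'period':  [0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5],
    'outcome': [5.0, 5.5, 6.0, 4.0, 4.5, 5.0, 9.0, 9.5, 10.0, 5.5, 6.0, 6.5],
})

# Treated-minus-control mean gap in each period
gap = (df[df['treated'] == 1].groupby('period')['outcome'].mean()
       - df[df['treated'] == 0].groupby('period')['outcome'].mean())

# Re-center at the reference period (here 2, the last pre-period),
# so the reference-period effect is zero by construction
effects = gap - gap.loc[2]
```

Flat pre-period effects (here exactly zero) are what the parallel trends assessment looks for.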
+ ### MultiPeriodDiDResults
1866
+
1867
+ **Attributes:**
1868
+
1869
+ | Attribute | Description |
1870
+ |-----------|-------------|
1871
+ | `period_effects` | Dict mapping periods to PeriodEffect objects (pre and post, excluding reference) |
1872
+ | `avg_att` | Average ATT across post-treatment periods only |
1873
+ | `avg_se` | Standard error of average ATT |
1874
+ | `avg_t_stat` | T-statistic for average ATT |
1875
+ | `avg_p_value` | P-value for average ATT |
1876
+ | `avg_conf_int` | Confidence interval for average ATT |
1877
+ | `n_obs` | Number of observations |
1878
+ | `pre_periods` | List of pre-treatment periods |
1879
+ | `post_periods` | List of post-treatment periods |
1880
+ | `reference_period` | The omitted reference period (coefficient = 0 by construction) |
1881
+ | `interaction_indices` | Dict mapping period → column index in the variance-covariance matrix (for sub-matrix extraction) |
1882
+ | `pre_period_effects` | Property: pre-period effects only (for parallel trends assessment) |
1883
+ | `post_period_effects` | Property: post-period effects only |
1884
+
1885
+ **Methods:**
1886
+
1887
+ | Method | Description |
1888
+ |--------|-------------|
1889
+ | `get_effect(period)` | Get PeriodEffect for specific period |
1890
+ | `summary(alpha)` | Get formatted summary string |
1891
+ | `print_summary(alpha)` | Print summary to stdout |
1892
+ | `to_dict()` | Convert to dictionary |
1893
+ | `to_dataframe()` | Convert to pandas DataFrame |
1894
+
1895
+ ### PeriodEffect
1896
+
1897
+ **Attributes:**
1898
+
1899
+ | Attribute | Description |
1900
+ |-----------|-------------|
1901
+ | `period` | Time period identifier |
1902
+ | `effect` | Treatment effect estimate |
1903
+ | `se` | Standard error |
1904
+ | `t_stat` | T-statistic |
1905
+ | `p_value` | P-value |
1906
+ | `conf_int` | Confidence interval |
1907
+ | `is_significant` | Boolean for significance at 0.05 |
1908
+ | `significance_stars` | String of significance stars |
1909
+
1910
+ ### SyntheticDiD
1911
+
1912
+ ```python
1913
+ SyntheticDiD(
1914
+ zeta_omega=None, # Unit weight regularization (None = auto from data)
1915
+ zeta_lambda=None, # Time weight regularization (None = auto from data)
1916
+ alpha=0.05, # Significance level for CIs
1917
+ variance_method="placebo", # "placebo" (R default) or "bootstrap"
1918
+ n_bootstrap=200, # Replications for SE estimation
1919
+ seed=None # Random seed for reproducibility
1920
+ )
1921
+ ```
1922
+
1923
+ **fit() Parameters:**
1924
+
1925
+ | Parameter | Type | Description |
1926
+ |-----------|------|-------------|
1927
+ | `data` | DataFrame | Panel data |
1928
+ | `outcome` | str | Outcome variable column name |
1929
+ | `treatment` | str | Treatment indicator column (0/1) |
1930
+ | `unit` | str | Unit identifier column |
1931
+ | `time` | str | Time period column |
1932
+ | `post_periods` | list | List of post-treatment period values |
1933
+ | `covariates` | list | Covariates to residualize out |
1934
+
1935
+ ### SyntheticDiDResults
1936
+
1937
+ **Attributes:**
1938
+
1939
+ | Attribute | Description |
1940
+ |-----------|-------------|
1941
+ | `att` | Average treatment effect on the treated (ATT) |
1942
+ | `se` | Standard error (bootstrap or placebo-based) |
1943
+ | `t_stat` | T-statistic |
1944
+ | `p_value` | P-value |
1945
+ | `conf_int` | Confidence interval |
1946
+ | `n_obs` | Number of observations |
1947
+ | `n_treated` | Number of treated units |
1948
+ | `n_control` | Number of control units |
1949
+ | `unit_weights` | Dict mapping control unit IDs to weights |
1950
+ | `time_weights` | Dict mapping pre-treatment periods to weights |
1951
+ | `pre_periods` | List of pre-treatment periods |
1952
+ | `post_periods` | List of post-treatment periods |
1953
+ | `pre_treatment_fit` | RMSE of synthetic vs treated in pre-period |
1954
+ | `placebo_effects` | Array of placebo effect estimates |
1955
+
1956
+ **Methods:**
1957
+
1958
+ | Method | Description |
1959
+ |--------|-------------|
1960
+ | `summary(alpha)` | Get formatted summary string |
1961
+ | `print_summary(alpha)` | Print summary to stdout |
1962
+ | `to_dict()` | Convert to dictionary |
1963
+ | `to_dataframe()` | Convert to pandas DataFrame |
1964
+ | `get_unit_weights_df()` | Get unit weights as DataFrame |
1965
+ | `get_time_weights_df()` | Get time weights as DataFrame |
1966
+
1967
+ ### TROP
1968
+
1969
+ ```python
1970
+ TROP(
1971
+ lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5])
1972
+ lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5])
1973
+ lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10])
1974
+ max_iter=100, # Max iterations for factor estimation
1975
+ tol=1e-6, # Convergence tolerance
1976
+ alpha=0.05, # Significance level for CIs
1977
+ n_bootstrap=200, # Bootstrap replications
1978
+ seed=None # Random seed
1979
+ )
1980
+ ```
1981
+
1982
+ **fit() Parameters:**
1983
+
1984
+ | Parameter | Type | Description |
1985
+ |-----------|------|-------------|
1986
+ | `data` | DataFrame | Panel data |
1987
+ | `outcome` | str | Outcome variable column name |
1988
+ | `treatment` | str | Treatment indicator column (0/1 absorbing state) |
1989
+ | `unit` | str | Unit identifier column |
1990
+ | `time` | str | Time period column |
1991
+
1992
+ Note: TROP infers treatment timing from the treatment indicator column, which should be an absorbing-state indicator: D = 1 in every period from treatment onset onward.
1993
+
1994
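If the data instead records each unit's first treatment period, an absorbing-state indicator of this kind is easy to construct in pandas (column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'unit':        [1, 1, 1, 2, 2, 2],
    'period':      [0, 1, 2, 0, 1, 2],
    'first_treat': [1, 1, 1, 0, 0, 0],   # 0 = never treated
})

# D = 1 from the first treatment period onward; never-treated units stay 0
df['treated'] = ((df['first_treat'] > 0)
                 & (df['period'] >= df['first_treat'])).astype(int)
```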
+ ### TROPResults
1995
+
1996
+ **Attributes:**
1997
+
1998
+ | Attribute | Description |
1999
+ |-----------|-------------|
2000
+ | `att` | Average treatment effect on the treated (ATT) |
2001
+ | `se` | Standard error (bootstrap) |
2002
+ | `t_stat` | T-statistic |
2003
+ | `p_value` | P-value |
2004
+ | `conf_int` | Confidence interval |
2005
+ | `n_obs` | Number of observations |
2006
+ | `n_treated` | Number of treated units |
2007
+ | `n_control` | Number of control units |
2008
+ | `n_treated_obs` | Number of treated unit-time observations |
2009
+ | `unit_effects` | Dict mapping unit IDs to fixed effects |
2010
+ | `time_effects` | Dict mapping periods to fixed effects |
2011
+ | `treatment_effects` | Dict mapping (unit, time) to individual effects |
2012
+ | `lambda_time` | Selected time decay parameter |
2013
+ | `lambda_unit` | Selected unit distance parameter |
2014
+ | `lambda_nn` | Selected nuclear norm parameter |
2015
+ | `factor_matrix` | Low-rank factor matrix L (n_periods x n_units) |
2016
+ | `effective_rank` | Effective rank of factor matrix |
2017
+ | `loocv_score` | LOOCV score for selected parameters |
2018
+ | `n_pre_periods` | Number of pre-treatment periods |
2019
+ | `n_post_periods` | Number of post-treatment periods |
2020
+ | `bootstrap_distribution` | Bootstrap distribution (if bootstrap) |
2021
+
2022
+ **Methods:**
2023
+
2024
+ | Method | Description |
2025
+ |--------|-------------|
2026
+ | `summary(alpha)` | Get formatted summary string |
2027
+ | `print_summary(alpha)` | Print summary to stdout |
2028
+ | `to_dict()` | Convert to dictionary |
2029
+ | `to_dataframe()` | Convert to pandas DataFrame |
2030
+ | `get_unit_effects_df()` | Get unit fixed effects as DataFrame |
2031
+ | `get_time_effects_df()` | Get time fixed effects as DataFrame |
2032
+ | `get_treatment_effects_df()` | Get individual treatment effects as DataFrame |
2033
+
2034
+ ### SunAbraham
2035
+
2036
+ ```python
2037
+ SunAbraham(
2038
+ control_group='never_treated', # or 'not_yet_treated'
2039
+ anticipation=0, # Periods of anticipation effects
2040
+ alpha=0.05, # Significance level for CIs
2041
+ cluster=None, # Column for cluster-robust SEs
2042
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
2043
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
2044
+ seed=None # Random seed
2045
+ )
2046
+ ```
2047
+
2048
+ **fit() Parameters:**
2049
+
2050
+ | Parameter | Type | Description |
2051
+ |-----------|------|-------------|
2052
+ | `data` | DataFrame | Panel data |
2053
+ | `outcome` | str | Outcome variable column name |
2054
+ | `unit` | str | Unit identifier column |
2055
+ | `time` | str | Time period column |
2056
+ | `first_treat` | str | Column with first treatment period (0 for never-treated) |
2057
+ | `covariates` | list | Covariate column names |
2058
+ | `min_pre_periods` | int | Minimum pre-treatment periods to include |
2059
+ | `min_post_periods` | int | Minimum post-treatment periods to include |
2060
+
2061
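The aggregation idea behind the interaction-weighted estimator — a cohort-share-weighted average of cohort-specific effects at each relative time — can be sketched with hypothetical numbers:

```python
# Hypothetical cohort-specific effects CATT(g, e) at relative time e = 0
catt_e0 = {2004: 1.8, 2006: 2.4}   # cohort g -> estimated effect
sizes   = {2004: 120, 2006: 80}    # treated units per cohort

# Weight each cohort by its share of treated units at this relative time
total = sum(sizes.values())
weights = {g: n / total for g, n in sizes.items()}

# Event-study coefficient at e = 0: 0.6 * 1.8 + 0.4 * 2.4
att_e0 = sum(weights[g] * catt_e0[g] for g in catt_e0)
```

These shares are what the `cohort_weights` attribute exposes for each relative time.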
+ ### SunAbrahamResults
2062
+
2063
+ **Attributes:**
2064
+
2065
+ | Attribute | Description |
2066
+ |-----------|-------------|
2067
+ | `event_study_effects` | Dict mapping relative time to effect info |
2068
+ | `overall_att` | Overall average treatment effect |
2069
+ | `overall_se` | Standard error of overall ATT |
2070
+ | `overall_t_stat` | T-statistic for overall ATT |
2071
+ | `overall_p_value` | P-value for overall ATT |
2072
+ | `overall_conf_int` | Confidence interval for overall ATT |
2073
+ | `cohort_weights` | Dict mapping relative time to cohort weights |
2074
+ | `groups` | List of treatment cohorts |
2075
+ | `time_periods` | List of all time periods |
2076
+ | `n_obs` | Total number of observations |
2077
+ | `n_treated_units` | Number of ever-treated units |
2078
+ | `n_control_units` | Number of never-treated units |
2079
+ | `is_significant` | Boolean for significance at alpha |
2080
+ | `significance_stars` | String of significance stars |
2081
+ | `bootstrap_results` | SABootstrapResults (if bootstrap enabled) |
2082
+
2083
+ **Methods:**
2084
+
2085
+ | Method | Description |
2086
+ |--------|-------------|
2087
+ | `summary(alpha)` | Get formatted summary string |
2088
+ | `print_summary(alpha)` | Print summary to stdout |
2089
+ | `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |
2090
+
2091
+ ### ImputationDiD
2092
+
2093
+ ```python
2094
+ ImputationDiD(
2095
+ anticipation=0, # Periods of anticipation effects
2096
+ alpha=0.05, # Significance level for CIs
2097
+ cluster=None, # Column for cluster-robust SEs
2098
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical)
2099
+ seed=None, # Random seed
2100
+ rank_deficient_action='warn', # 'warn', 'error', or 'silent'
2101
+ horizon_max=None, # Max event-study horizon
2102
+ aux_partition='cohort_horizon', # Variance partition
2103
+ )
2104
+ ```
2105
+
2106
+ **fit() Parameters:**
2107
+
2108
+ | Parameter | Type | Description |
2109
+ |-----------|------|-------------|
2110
+ | `data` | DataFrame | Panel data |
2111
+ | `outcome` | str | Outcome variable column name |
2112
+ | `unit` | str | Unit identifier column |
2113
+ | `time` | str | Time period column |
2114
+ | `first_treat` | str | First treatment period column (0 for never-treated) |
2115
+ | `covariates` | list | Covariate column names |
2116
+ | `aggregate` | str | Aggregation: None, "event_study", "group", "all" |
2117
+ | `balance_e` | int | Balance event study to this many pre-treatment periods |
2118
+
2119
+ ### ImputationDiDResults
2120
+
2121
+ **Attributes:**
2122
+
2123
+ | Attribute | Description |
2124
+ |-----------|-------------|
2125
+ | `overall_att` | Overall average treatment effect on the treated |
2126
+ | `overall_se` | Standard error (conservative, Theorem 3) |
2127
+ | `overall_t_stat` | T-statistic |
2128
+ | `overall_p_value` | P-value for H0: ATT = 0 |
2129
+ | `overall_conf_int` | Confidence interval |
2130
+ | `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'` or `'all'`) |
2131
+ | `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) |
2132
+ | `treatment_effects` | DataFrame of unit-level imputed treatment effects |
2133
+ | `n_treated_obs` | Number of treated observations |
2134
+ | `n_untreated_obs` | Number of untreated observations |
2135
+
2136
+ **Methods:**
2137
+
2138
+ | Method | Description |
2139
+ |--------|-------------|
2140
+ | `summary(alpha)` | Get formatted summary string |
2141
+ | `print_summary(alpha)` | Print summary to stdout |
2142
+ | `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') |
2143
+ | `pretrend_test(n_leads)` | Run pre-trend F-test (Equation 9) |
2144
+
2145
+ ### TripleDifference
2146
+
2147
+ ```python
2148
+ TripleDifference(
2149
+ estimation_method='dr', # 'dr' (doubly robust), 'reg', or 'ipw'
2150
+ robust=True, # Use HC1 robust standard errors
2151
+ cluster=None, # Column for cluster-robust SEs
2152
+ alpha=0.05, # Significance level for CIs
2153
+ pscore_trim=0.01 # Propensity score trimming threshold
2154
+ )
2155
+ ```
2156
+
2157
+ **fit() Parameters:**
2158
+
2159
+ | Parameter | Type | Description |
2160
+ |-----------|------|-------------|
2161
+ | `data` | DataFrame | Input data |
2162
+ | `outcome` | str | Outcome variable column name |
2163
+ | `group` | str | Group indicator column (0/1): 1=treated group |
2164
+ | `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible |
2165
+ | `time` | str | Time indicator column (0/1): 1=post-treatment |
2166
+ | `covariates` | list | Covariate column names for adjustment |
2167
+
2168
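The triple difference is the treated group's DiD (eligible vs. ineligible) minus the control group's DiD. With toy cell means (illustrative only):

```python
import numpy as np

# Cell means Y[group, partition, time]: 1 = treated group / eligible / post
Y = np.zeros((2, 2, 2))
Y[1, 1, 0], Y[1, 1, 1] = 10.0, 16.0   # treated group, eligible
Y[1, 0, 0], Y[1, 0, 1] = 9.0, 11.0    # treated group, ineligible
Y[0, 1, 0], Y[0, 1, 1] = 8.0, 10.0    # control group, eligible
Y[0, 0, 0], Y[0, 0, 1] = 7.0, 9.0     # control group, ineligible

# DiD within each group: eligible change minus ineligible change
did_treated = (Y[1, 1, 1] - Y[1, 1, 0]) - (Y[1, 0, 1] - Y[1, 0, 0])  # 6 - 2 = 4
did_control = (Y[0, 1, 1] - Y[0, 1, 0]) - (Y[0, 0, 1] - Y[0, 0, 0])  # 2 - 2 = 0
att = did_treated - did_control
```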
+ ### TripleDifferenceResults
2169
+
2170
+ **Attributes:**
2171
+
2172
+ | Attribute | Description |
2173
+ |-----------|-------------|
2174
+ | `att` | Average treatment effect on the treated (ATT) |
2175
+ | `se` | Standard error of ATT |
2176
+ | `t_stat` | T-statistic |
2177
+ | `p_value` | P-value for H0: ATT = 0 |
2178
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
2179
+ | `n_obs` | Total number of observations |
2180
+ | `n_treated_eligible` | Obs in treated group & eligible partition |
2181
+ | `n_treated_ineligible` | Obs in treated group & ineligible partition |
2182
+ | `n_control_eligible` | Obs in control group & eligible partition |
2183
+ | `n_control_ineligible` | Obs in control group & ineligible partition |
2184
+ | `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
2185
+ | `group_means` | Dict of cell means for diagnostics |
2186
+ | `pscore_stats` | Propensity score statistics (IPW/DR only) |
2187
+ | `is_significant` | Boolean for significance at alpha |
2188
+ | `significance_stars` | String of significance stars |
2189
+
2190
+ **Methods:**
2191
+
2192
+ | Method | Description |
2193
+ |--------|-------------|
2194
+ | `summary(alpha)` | Get formatted summary string |
2195
+ | `print_summary(alpha)` | Print summary to stdout |
2196
+ | `to_dict()` | Convert to dictionary |
2197
+ | `to_dataframe()` | Convert to pandas DataFrame |
2198
+
2199
+ ### HonestDiD
2200
+
2201
+ ```python
2202
+ HonestDiD(
2203
+ method='relative_magnitude', # 'relative_magnitude' or 'smoothness'
2204
+ M=None, # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
2205
+ alpha=0.05, # Significance level for CIs
2206
+ l_vec=None # Linear combination vector for target parameter
2207
+ )
2208
+ ```
2209
+
2210
+ **fit() Parameters:**
2211
+
2212
+ | Parameter | Type | Description |
2213
+ |-----------|------|-------------|
2214
+ | `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
2215
+ | `M` | float | Restriction parameter (overrides constructor value) |
2216
+
2217
+ **Methods:**
2218
+
2219
+ | Method | Description |
2220
+ |--------|-------------|
2221
+ | `fit(results, M)` | Compute bounds for given event study results |
2222
+ | `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
2223
+ | `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
2224
+
2225
+ ### HonestDiDResults
2226
+
2227
+ **Attributes:**
2228
+
2229
+ | Attribute | Description |
2230
+ |-----------|-------------|
2231
+ | `original_estimate` | Point estimate under parallel trends |
2232
+ | `lb` | Lower bound of identified set |
2233
+ | `ub` | Upper bound of identified set |
2234
+ | `ci_lb` | Lower bound of robust confidence interval |
2235
+ | `ci_ub` | Upper bound of robust confidence interval |
2236
+ | `ci_width` | Width of robust CI |
2237
+ | `M` | Restriction parameter used |
2238
+ | `method` | Restriction method ('relative_magnitude' or 'smoothness') |
2239
+ | `alpha` | Significance level |
2240
+ | `is_significant` | True if robust CI excludes zero |
2241
+
2242
+ **Methods:**
2243
+
2244
+ | Method | Description |
2245
+ |--------|-------------|
2246
+ | `summary()` | Get formatted summary string |
2247
+ | `to_dict()` | Convert to dictionary |
2248
+ | `to_dataframe()` | Convert to pandas DataFrame |
2249
+
2250
+ ### SensitivityResults
2251
+
2252
+ **Attributes:**
2253
+
2254
+ | Attribute | Description |
2255
+ |-----------|-------------|
2256
+ | `M_grid` | Array of M values analyzed |
2257
+ | `results` | List of HonestDiDResults for each M |
2258
+ | `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
2259
+
2260
+ **Methods:**
2261
+
2262
+ | Method | Description |
2263
+ |--------|-------------|
2264
+ | `summary()` | Get formatted summary string |
2265
+ | `plot(ax)` | Plot sensitivity analysis |
2266
+ | `to_dataframe()` | Convert to pandas DataFrame |
2267
+
2268
+ ### PreTrendsPower
2269
+
2270
+ ```python
2271
+ PreTrendsPower(
2272
+ alpha=0.05, # Significance level for pre-trends test
2273
+ power=0.80, # Target power for MDV calculation
2274
+ violation_type='linear', # 'linear', 'constant', 'last_period', 'custom'
2275
+ violation_weights=None # Custom weights (required if violation_type='custom')
2276
+ )
2277
+ ```
2278
+
2279
+ **fit() Parameters:**
2280
+
2281
+ | Parameter | Type | Description |
2282
+ |-----------|------|-------------|
2283
+ | `results` | MultiPeriodDiDResults | Results from event study |
2284
+ | `M` | float | Specific violation magnitude to evaluate |
2285
+
2286
+ **Methods:**
2287
+
2288
+ | Method | Description |
2289
+ |--------|-------------|
2290
+ | `fit(results, M)` | Compute power analysis for given event study |
2291
+ | `power_at(results, M)` | Compute power for specific violation magnitude |
2292
+ | `power_curve(results, M_grid, n_points)` | Compute power across range of M values |
2293
+ | `sensitivity_to_honest_did(results)` | Compare with HonestDiD analysis |
2294
+
2295
+ ### PreTrendsPowerResults
2296
+
2297
+ **Attributes:**
2298
+
2299
+ | Attribute | Description |
2300
+ |-----------|-------------|
2301
+ | `power` | Power to detect the specified violation |
2302
+ | `mdv` | Minimum detectable violation at target power |
2303
+ | `violation_magnitude` | Violation magnitude (M) tested |
2304
+ | `violation_type` | Type of violation pattern |
2305
+ | `alpha` | Significance level |
2306
+ | `target_power` | Target power level |
2307
+ | `n_pre_periods` | Number of pre-treatment periods |
2308
+ | `test_statistic` | Expected test statistic under violation |
2309
+ | `critical_value` | Critical value for pre-trends test |
2310
+ | `noncentrality` | Non-centrality parameter |
2311
+ | `is_informative` | Heuristic check if test is informative |
2312
+ | `power_adequate` | Whether power meets target |
2313
+
2314
+ **Methods:**
2315
+
2316
+ | Method | Description |
2317
+ |--------|-------------|
2318
+ | `summary()` | Get formatted summary string |
2319
+ | `print_summary()` | Print summary to stdout |
2320
+ | `to_dict()` | Convert to dictionary |
2321
+ | `to_dataframe()` | Convert to pandas DataFrame |
2322
+
2323
+ ### PreTrendsPowerCurve
2324
+
2325
+ **Attributes:**
2326
+
2327
+ | Attribute | Description |
2328
+ |-----------|-------------|
2329
+ | `M_values` | Array of violation magnitudes |
2330
+ | `powers` | Array of power values |
2331
+ | `mdv` | Minimum detectable violation |
2332
+ | `alpha` | Significance level |
2333
+ | `target_power` | Target power level |
2334
+ | `violation_type` | Type of violation pattern |
2335
+
2336
+ **Methods:**
2337
+
2338
+ | Method | Description |
2339
+ |--------|-------------|
2340
+ | `plot(ax, show_mdv, show_target)` | Plot power curve |
2341
+ | `to_dataframe()` | Convert to DataFrame with M and power columns |
2342
+
2343
+ ### Data Preparation Functions
2344
+
2345
+ #### generate_did_data
2346
+
2347
+ ```python
2348
+ generate_did_data(
2349
+ n_units=100, # Number of units
2350
+ n_periods=4, # Number of time periods
2351
+ treatment_effect=5.0, # True ATT
2352
+ treatment_fraction=0.5, # Fraction treated
2353
+ treatment_period=2, # First post-treatment period
2354
+ unit_fe_sd=2.0, # Unit fixed effect std dev
2355
+ time_trend=0.5, # Linear time trend
2356
+ noise_sd=1.0, # Idiosyncratic noise std dev
2357
+ seed=None # Random seed
2358
+ )
2359
+ ```
2360
+
2361
+ Returns DataFrame with columns: `unit`, `period`, `treated`, `post`, `outcome`, `true_effect`.
2362
+
2363
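The documented defaults correspond to a data-generating process along these lines — a sketch for intuition, not the library's exact generator:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_units, n_periods, att = 100, 4, 5.0

unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated = (unit < n_units // 2).astype(int)      # half the units treated
post = (period >= 2).astype(int)                  # treatment starts in period 2

outcome = (rng.normal(0, 2.0, n_units)[unit]      # unit fixed effect
           + 0.5 * period                          # common linear time trend
           + att * treated * post                  # true treatment effect
           + rng.normal(0, 1.0, n_units * n_periods))  # idiosyncratic noise

df = pd.DataFrame({'unit': unit, 'period': period, 'treated': treated,
                   'post': post, 'outcome': outcome})
```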
+ #### make_treatment_indicator
2364
+
2365
+ ```python
2366
+ make_treatment_indicator(
2367
+ data, # Input DataFrame
2368
+ column, # Column to create treatment from
2369
+ treated_values=None, # Value(s) indicating treatment
2370
+ threshold=None, # Numeric threshold for treatment
2371
+ above_threshold=True, # If True, >= threshold is treated
2372
+ new_column='treated' # Output column name
2373
+ )
2374
+ ```
2375
+
2376
+ #### make_post_indicator
2377
+
2378
+ ```python
2379
+ make_post_indicator(
2380
+ data, # Input DataFrame
2381
+ time_column, # Time/period column
2382
+ post_periods=None, # Specific post-treatment period(s)
2383
+ treatment_start=None, # First post-treatment period
2384
+ new_column='post' # Output column name
2385
+ )
2386
+ ```
2387
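With `treatment_start`, the helper is roughly equivalent to a simple threshold comparison in pandas (illustrative `treatment_start=2`):

```python
import pandas as pd

df = pd.DataFrame({'period': [0, 1, 2, 3]})
# Periods at or after treatment_start are post-treatment
df['post'] = (df['period'] >= 2).astype(int)
```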
+
2388
+ #### wide_to_long
2389
+
2390
+ ```python
2391
+ wide_to_long(
2392
+ data, # Wide-format DataFrame
2393
+ value_columns, # List of time-varying columns
2394
+ id_column, # Unit identifier column
2395
+ time_name='period', # Name for time column
2396
+ value_name='value', # Name for value column
2397
+ time_values=None # Values for time periods
2398
+ )
2399
+ ```
2400
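This is essentially a reshaping operation à la `pandas.melt`; a rough hand-rolled equivalent (column names are illustrative):

```python
import pandas as pd

wide = pd.DataFrame({'firm_id': [1, 2],
                     'y_2019': [1.0, 2.0],
                     'y_2020': [1.5, 2.5]})

# One row per (unit, period) instead of one column per period
long = wide.melt(id_vars='firm_id', value_vars=['y_2019', 'y_2020'],
                 var_name='period', value_name='value')
long['period'] = long['period'].str.replace('y_', '', regex=False).astype(int)
```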
+
2401
+ #### balance_panel
2402
+
2403
+ ```python
2404
+ balance_panel(
2405
+ data, # Panel DataFrame
2406
+ unit_column, # Unit identifier column
2407
+ time_column, # Time period column
2408
+ method='inner', # 'inner', 'outer', or 'fill'
2409
+ fill_value=None # Value for filling (if method='fill')
2410
+ )
2411
+ ```
2412
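An 'outer'-style balancing can be sketched with a MultiIndex reindex — illustrative, not the library's implementation:

```python
import pandas as pd

df = pd.DataFrame({'unit': [1, 1, 2], 'period': [0, 1, 0], 'y': [1.0, 2.0, 3.0]})

# Every unit appears in every period; missing cells become NaN
full = pd.MultiIndex.from_product(
    [df['unit'].unique(), df['period'].unique()], names=['unit', 'period'])
balanced = df.set_index(['unit', 'period']).reindex(full).reset_index()
```

The 'inner' method would instead keep only units observed in every period, and 'fill' would replace the NaNs with `fill_value`.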
+
2413
+ #### validate_did_data
2414
+
2415
+ ```python
2416
+ validate_did_data(
2417
+ data, # DataFrame to validate
2418
+ outcome, # Outcome column name
2419
+ treatment, # Treatment column name
2420
+ time, # Time/post column name
2421
+ unit=None, # Unit column (for panel validation)
2422
+ raise_on_error=True # Raise ValueError or return dict
2423
+ )
2424
+ ```
2425
+
2426
+ Returns dict with `valid`, `errors`, `warnings`, and `summary` keys.
2427
+
2428
+ #### summarize_did_data
2429
+
2430
+ ```python
2431
+ summarize_did_data(
2432
+ data, # Input DataFrame
2433
+ outcome, # Outcome column name
2434
+ treatment, # Treatment column name
2435
+ time, # Time/post column name
2436
+ unit=None # Unit column (optional)
2437
+ )
2438
+ ```
2439
+
2440
+ Returns DataFrame with summary statistics by treatment-time cell.
2441
+
2442
+ #### create_event_time
2443
+
2444
+ ```python
2445
+ create_event_time(
2446
+ data, # Panel DataFrame
2447
+ time_column, # Calendar time column
2448
+ treatment_time_column, # Column with treatment timing
2449
+ new_column='event_time' # Output column name
2450
+ )
2451
+ ```
2452
+
2453
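Event time is simply calendar time minus the unit's first treatment period, so treated periods sit at non-negative event times (column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'year': [2003, 2004, 2005], 'first_treat': [2004] * 3})
# Negative = pre-treatment, 0 = onset, positive = post-treatment
df['event_time'] = df['year'] - df['first_treat']
```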
+ #### aggregate_to_cohorts
2454
+
2455
+ ```python
2456
+ aggregate_to_cohorts(
2457
+ data, # Unit-level panel data
2458
+ unit_column, # Unit identifier column
2459
+ time_column, # Time period column
2460
+ treatment_column, # Treatment indicator column
2461
+ outcome, # Outcome variable column
2462
+ covariates=None # Additional columns to aggregate
2463
+ )
2464
+ ```
2465
+
2466
+ #### rank_control_units
2467
+
2468
+ ```python
2469
+ rank_control_units(
2470
+ data, # Panel data in long format
2471
+ unit_column, # Unit identifier column
2472
+ time_column, # Time period column
2473
+ outcome_column, # Outcome variable column
2474
+ treatment_column=None, # Treatment indicator column (0/1)
2475
+ treated_units=None, # Explicit list of treated unit IDs
2476
+ pre_periods=None, # Pre-treatment periods (default: first half)
2477
+ covariates=None, # Covariate columns for matching
2478
+ outcome_weight=0.7, # Weight for outcome trend similarity (0-1)
2479
+ covariate_weight=0.3, # Weight for covariate distance (0-1)
2480
+ exclude_units=None, # Units to exclude from control pool
2481
+ require_units=None, # Units that must appear in output
2482
+ n_top=None, # Return only top N controls
2483
+ suggest_treatment_candidates=False, # Identify treatment candidates
2484
+ n_treatment_candidates=5, # Number of treatment candidates
2485
+ lambda_reg=0.0 # Regularization for synthetic weights
2486
+ )
2487
+ ```
2488
+
2489
+ Returns DataFrame with columns: `unit`, `quality_score`, `outcome_trend_score`, `covariate_score`, `synthetic_weight`, `pre_trend_rmse`, `is_required`.
2490
+
2491
+ ## Requirements
2492
+
2493
+ - Python 3.9 - 3.13
2494
+ - numpy >= 1.20
2495
+ - pandas >= 1.3
2496
+ - scipy >= 1.7
2497
+
2498
+ ## Development
2499
+
2500
+ ```bash
2501
+ # Install with dev dependencies
2502
+ pip install -e ".[dev]"
2503
+
2504
+ # Run tests
2505
+ pytest
2506
+
2507
+ # Format code
2508
+ black diff_diff tests
2509
+ ruff check diff_diff tests
2510
+ ```
2511
+
2512
+ ## References
2513
+
2514
+ This library implements methods from the following scholarly works:
2515
+
2516
+ ### Difference-in-Differences
2517
+
2518
+ - **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. [https://doi.org/10.2307/1924810](https://doi.org/10.2307/1924810)
2519
+
2520
+ - **Card, D., & Krueger, A. B. (1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. [https://www.jstor.org/stable/2118030](https://www.jstor.org/stable/2118030)
2521
+
2522
+ - **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences.
2523
+
2524
+ ### Two-Way Fixed Effects
2525
+
2526
+ - **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
2527
+
2528
+ - **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. [https://doi.org/10.1017/pan.2020.33](https://doi.org/10.1017/pan.2020.33)
2529
+
2530
+ ### Robust Standard Errors
2531
+
2532
+ - **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. [https://doi.org/10.2307/1912934](https://doi.org/10.2307/1912934)
2533
+
2534
+ - **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." *Journal of Econometrics*, 29(3), 305-325. [https://doi.org/10.1016/0304-4076(85)90158-7](https://doi.org/10.1016/0304-4076(85)90158-7)
2535
+
2536
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." *Journal of Business & Economic Statistics*, 29(2), 238-249. [https://doi.org/10.1198/jbes.2010.07136](https://doi.org/10.1198/jbes.2010.07136)
2537
+
2538
+ ### Wild Cluster Bootstrap
2539
+
2540
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. [https://doi.org/10.1162/rest.90.3.414](https://doi.org/10.1162/rest.90.3.414)
2541
+
2542
+ - **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. [https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf](https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf)
2543
+
2544
+ - **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. [https://doi.org/10.1111/ectj.12107](https://doi.org/10.1111/ectj.12107)
2545
+
2546
+ ### Placebo Tests and DiD Diagnostics
2547
+
2548
+ - **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. [https://doi.org/10.1162/003355304772839588](https://doi.org/10.1162/003355304772839588)
2549
+
2550
+ ### Synthetic Control Method
2551
+
2552
+ - **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. [https://doi.org/10.1257/000282803321455188](https://doi.org/10.1257/000282803321455188)
2553
+
2554
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." *Journal of the American Statistical Association*, 105(490), 493-505. [https://doi.org/10.1198/jasa.2009.ap08746](https://doi.org/10.1198/jasa.2009.ap08746)
2555
+
2556
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. [https://doi.org/10.1111/ajps.12116](https://doi.org/10.1111/ajps.12116)
2557
+
2558
+ ### Synthetic Difference-in-Differences
2559
+
2560
+ - **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)
2561
+
2562
+ ### Triply Robust Panel (TROP)
2563
+
2564
+ - **Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025).** "Triply Robust Panel Estimators." *Working Paper*. [https://arxiv.org/abs/2508.21536](https://arxiv.org/abs/2508.21536)
2565
+
2566
+ This paper introduces the TROP estimator, which combines three robustness components:
2567
+ - **Factor model adjustment**: a low-rank factor structure, estimated via SVD, adjusts for unobserved confounders
2568
+ - **Unit weights**: synthetic-control-style weighting for an optimal comparison group
2569
+ - **Time weights**: SDID-style time weighting that emphasizes informative pre-periods
2570
+
2571
+ TROP is particularly useful when unobserved confounders follow a factor structure, affecting different units differently over time.
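The factor-adjustment component can be sketched in a few lines of NumPy — this is a simplified illustration of removing a low-rank structure via truncated SVD on synthetic data, not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Panel of outcomes (units x periods) generated from a rank-1 factor
# structure plus small noise; purely illustrative data.
n_units, n_periods, rank = 20, 12, 1
loadings = rng.normal(size=(n_units, rank))
factors = rng.normal(size=(rank, n_periods))
Y = loadings @ factors + 0.1 * rng.normal(size=(n_units, n_periods))

# Factor-model adjustment: truncate the SVD at the chosen rank and
# subtract the implied low-rank component.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
residuals = Y - low_rank

# After removing the factor structure, the residuals are dominated by
# the small noise term rather than the confounding factors.
```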
2572
+
2573
+ ### Triple Difference (DDD)
2574
+
2575
+ - **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)
2576
+
2577
+ This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.
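To fix ideas, the basic 2x2x2 triple difference without covariates is the DiD for the eligible group minus the DiD for the ineligible group — the very construction the paper shows breaks down once identification requires conditioning on covariates. A toy computation with made-up cell means:

```python
import pandas as pd

# Cell means for a 2x2x2 design: treated vs control state, eligible vs
# ineligible group within state, pre vs post period (illustrative numbers).
cells = pd.DataFrame({
    "treated_state": [1, 1, 1, 1, 0, 0, 0, 0],
    "eligible":      [1, 1, 0, 0, 1, 1, 0, 0],
    "post":          [0, 1, 0, 1, 0, 1, 0, 1],
    "mean_outcome":  [10.0, 14.0, 9.0, 11.0, 8.0, 10.0, 7.0, 9.0],
})

def did(df, eligible):
    # Standard 2x2 DiD within one eligibility group.
    sub = df[df["eligible"] == eligible].set_index(["treated_state", "post"])["mean_outcome"]
    return (sub.loc[(1, 1)] - sub.loc[(1, 0)]) - (sub.loc[(0, 1)] - sub.loc[(0, 0)])

# DDD = DiD(eligible) - DiD(ineligible).
ddd = did(cells, 1) - did(cells, 0)
print(ddd)  # → 2.0
```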
2578
+
2579
+ - **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)
2580
+
2581
+ Classic paper introducing the Triple Difference design for policy evaluation.
2582
+
2583
+ - **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)
2584
+
2585
+ ### Parallel Trends and Pre-Trend Testing
2586
+
2587
+ - **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
2588
+
2589
+ - **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
2590
+
2591
+ ### Honest DiD / Sensitivity Analysis
2592
+
2593
+ The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
2594
+
2595
+ - **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
2596
+
2597
+ This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
2598
+ - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of the observed pre-treatment violations
2599
+ - **Smoothness (ΔSD)**: Bounds the second differences of trend violations, allowing for linear extrapolation of pre-trends
2600
+ - **Breakdown Analysis**: Finds the smallest violation magnitude that would overturn the conclusions
2601
+ - **Robust Confidence Intervals**: Provides valid inference under partial identification
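The relative-magnitudes idea can be sketched with simple arithmetic — the numbers below are illustrative, and the widening rule is a simplified bound (the paper's confidence intervals also account for estimation error in the bound itself):

```python
import numpy as np

# Illustrative inputs: a post-period event-study estimate with its
# standard error, and the largest observed pre-treatment violation.
post_estimate, post_se = 2.0, 0.6
max_pre_violation = 0.5
z = 1.96  # two-sided 95% critical value

def rm_interval(M):
    # Under the relative-magnitudes restriction, post-treatment bias is
    # bounded by M times the largest pre-treatment violation, so widen
    # the usual CI by that amount on each side.
    half = z * post_se + M * max_pre_violation
    return post_estimate - half, post_estimate + half

# Breakdown value: the smallest M at which the interval includes zero,
# i.e. the violation magnitude that would overturn the conclusion.
M_grid = np.linspace(0, 5, 501)
breakdown = next(M for M in M_grid if rm_interval(M)[0] <= 0)
print(round(breakdown, 2))  # → 1.65
```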
2602
+
2603
+ - **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
2604
+
2605
+ Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
2606
+
2607
+ ### Multi-Period and Staggered Adoption
2608
+
2609
+ - **Borusyak, K., Jaravel, X., & Spiess, J. (2024).** "Revisiting Event-Study Designs: Robust and Efficient Estimation." *Review of Economic Studies*, 91(6), 3253-3285. [https://doi.org/10.1093/restud/rdae007](https://doi.org/10.1093/restud/rdae007)
2610
+
2611
+ This paper introduces the imputation estimator implemented in our `ImputationDiD` class:
2612
+ - **Efficient imputation**: OLS on untreated observations → impute counterfactuals → aggregate
2613
+ - **Conservative variance**: Theorem 3 clustered variance estimator with auxiliary model
2614
+ - **Pre-trend test**: Independent of treatment effect estimation (Proposition 9)
2615
+ - **Efficiency gains**: ~50% shorter CIs than Callaway-Sant'Anna under homogeneous effects
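The impute-then-aggregate steps above can be sketched on a noiseless toy panel — this illustrates the logic (fit two-way fixed effects on untreated observations, impute counterfactuals, average the treated gaps), not the `ImputationDiD` API or its variance estimator:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Toy staggered panel: unit and time fixed effects plus a constant
# treatment effect of 2.0; units 4-7 are treated from period 3 on.
units, periods = range(8), range(6)
df = pd.DataFrame([(i, t) for i in units for t in periods], columns=["unit", "time"])
unit_fe = {i: rng.normal() for i in units}
time_fe = {t: rng.normal() for t in periods}
df["treated"] = (df["unit"] >= 4) & (df["time"] >= 3)
df["y"] = df["unit"].map(unit_fe) + df["time"].map(time_fe) + 2.0 * df["treated"]

# Step 1: fit the two-way fixed-effects model on untreated observations only.
X = pd.get_dummies(df[["unit", "time"]].astype(str), drop_first=True).astype(float)
X.insert(0, "const", 1.0)
untreated = ~df["treated"].to_numpy()
beta, *_ = np.linalg.lstsq(X.to_numpy()[untreated], df["y"].to_numpy()[untreated], rcond=None)

# Step 2: impute the untreated counterfactual for every observation.
y_hat = X.to_numpy() @ beta

# Step 3: aggregate the treated observations' gaps into the ATT.
treated = df["treated"].to_numpy()
att = (df["y"].to_numpy()[treated] - y_hat[treated]).mean()
print(round(att, 6))  # → 2.0 (exact recovery, since the toy data has no noise)
```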
2616
+
2617
+ - **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)
2618
+
2619
+ - **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
2620
+
2621
+ - **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. [https://doi.org/10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016/j.jeconom.2020.09.006)
2622
+
2623
+ - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. [https://doi.org/10.1257/aer.20181169](https://doi.org/10.1257/aer.20181169)
2624
+
2625
+ - **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)
2626
+
2627
+ ### Power Analysis
2628
+
2629
+ - **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)
2630
+
2631
+ - **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458. [https://doi.org/10.1016/j.jdeveco.2020.102458](https://doi.org/10.1016/j.jdeveco.2020.102458)
2632
+
2633
+ Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.
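Bloom's minimum detectable effect reduces to simple arithmetic once the standard error of the treatment-effect estimate is known; the inputs below are illustrative:

```python
from scipy.stats import norm

# MDE in Bloom's (1995) sense: the smallest true effect the design
# detects with the desired power. Illustrative inputs.
alpha, power = 0.05, 0.80
se = 0.5  # standard error of the treatment-effect estimate

# MDE = (z_{1-alpha/2} + z_{power}) * SE; the multiplier is about 2.8
# for a two-sided 5% test at 80% power.
mde = (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se
print(round(mde, 3))  # → 1.401
```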
2634
+
2635
+ - **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. [https://doi.org/10.1080/19439342.2016.1244555](https://doi.org/10.1080/19439342.2016.1244555)
2636
+
2637
+ ### General Causal Inference
2638
+
2639
+ - **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press.
2640
+
2641
+ - **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. [https://mixtape.scunning.com/](https://mixtape.scunning.com/)
2642
+
2643
+ ## License
2644
+
2645
+ MIT License
2646
+