diff-diff 2.1.0__cp39-cp39-macosx_11_0_arm64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,2511 @@
1
+ Metadata-Version: 2.4
2
+ Name: diff-diff
3
+ Version: 2.1.0
4
+ Classifier: Development Status :: 5 - Production/Stable
5
+ Classifier: Intended Audience :: Science/Research
6
+ Classifier: Operating System :: OS Independent
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: Programming Language :: Python :: 3.9
9
+ Classifier: Programming Language :: Python :: 3.10
10
+ Classifier: Programming Language :: Python :: 3.11
11
+ Classifier: Programming Language :: Python :: 3.12
12
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
13
+ Requires-Dist: numpy>=1.20.0
14
+ Requires-Dist: pandas>=1.3.0
15
+ Requires-Dist: scipy>=1.7.0
16
+ Requires-Dist: pytest>=7.0 ; extra == 'dev'
17
+ Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
18
+ Requires-Dist: black>=23.0 ; extra == 'dev'
19
+ Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
20
+ Requires-Dist: mypy>=1.0 ; extra == 'dev'
21
+ Requires-Dist: sphinx>=6.0 ; extra == 'docs'
22
+ Requires-Dist: sphinx-rtd-theme>=1.0 ; extra == 'docs'
23
+ Provides-Extra: dev
24
+ Provides-Extra: docs
25
+ Summary: A library for Difference-in-Differences causal inference analysis
26
+ Keywords: causal-inference,difference-in-differences,econometrics,statistics,treatment-effects
27
+ Author: diff-diff contributors
28
+ License-Expression: MIT
29
+ Requires-Python: >=3.9
30
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
31
+ Project-URL: Documentation, https://diff-diff.readthedocs.io
32
+ Project-URL: Homepage, https://github.com/igerber/diff-diff
33
+ Project-URL: Issues, https://github.com/igerber/diff-diff/issues
34
+ Project-URL: Repository, https://github.com/igerber/diff-diff
35
+
36
+ # diff-diff
37
+
38
+ A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.
39
+
40
+ ## Installation
41
+
42
+ ```bash
43
+ pip install diff-diff
44
+ ```
45
+
46
+ Or install from source:
47
+
48
+ ```bash
49
+ git clone https://github.com/igerber/diff-diff.git
50
+ cd diff-diff
51
+ pip install -e .
52
+ ```
53
+
54
+ ## Quick Start
55
+
56
+ ```python
57
+ import pandas as pd
58
+ from diff_diff import DifferenceInDifferences
59
+
60
+ # Create sample data
61
+ data = pd.DataFrame({
62
+ 'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
63
+ 'treated': [1, 1, 1, 1, 0, 0, 0, 0],
64
+ 'post': [0, 0, 1, 1, 0, 0, 1, 1]
65
+ })
66
+
67
+ # Fit the model
68
+ did = DifferenceInDifferences()
69
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
70
+
71
+ # View results
72
+ print(results) # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583)
73
+ results.print_summary()
74
+ ```
75
+
76
+ Output:
77
+ ```
78
+ ======================================================================
79
+ Difference-in-Differences Estimation Results
80
+ ======================================================================
81
+
82
+ Observations: 8
83
+ Treated units: 4
84
+ Control units: 4
85
+ R-squared: 0.9055
86
+
87
+ ----------------------------------------------------------------------
88
+ Parameter Estimate Std. Err. t-stat P>|t|
89
+ ----------------------------------------------------------------------
90
+ ATT 3.0000 1.7321 1.732 0.1583
91
+ ----------------------------------------------------------------------
92
+
93
+ 95% Confidence Interval: [-1.8089, 7.8089]
94
+
95
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
96
+ ======================================================================
97
+ ```
98
+
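+ The 2x2 ATT is just a difference of group-mean differences. As a sanity check, you can reproduce the estimate above by hand with plain pandas (same sample data as the Quick Start):
+
+ ```python
+ import pandas as pd
+
+ data = pd.DataFrame({
+     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
+     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
+     'post': [0, 0, 1, 1, 0, 0, 1, 1]
+ })
+
+ # Mean outcome in each treatment-by-period cell
+ cells = data.groupby(['treated', 'post'])['outcome'].mean()
+
+ # (treated post - treated pre) - (control post - control pre)
+ att = (cells[1, 1] - cells[1, 0]) - (cells[0, 1] - cells[0, 0])
+ print(att)  # 3.0, matching the ATT above
+ ```
+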
99
+ ## Features
100
+
101
+ - **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
102
+ - **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals
103
+ - **Multiple interfaces**: Column names or R-style formulas
104
+ - **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
105
+ - **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
106
+ - **Panel data support**: Two-way fixed effects estimator for panel designs
107
+ - **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
108
+ - **Staggered adoption**: Callaway-Sant'Anna (2021) and Sun-Abraham (2021) estimators for heterogeneous treatment timing
109
+ - **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
110
+ - **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
111
+ - **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
112
+ - **Event study plots**: Publication-ready visualization of treatment effects
113
+ - **Parallel trends testing**: Multiple methods including equivalence tests
114
+ - **Goodman-Bacon decomposition**: Diagnose TWFE bias by decomposing into 2x2 comparisons
115
+ - **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests
116
+ - **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations
117
+ - **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests
118
+ - **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator
119
+ - **Data prep utilities**: Helper functions for common data preparation tasks
120
+ - **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst))
121
+
122
+ ## Tutorials
123
+
124
+ We provide Jupyter notebook tutorials in `docs/tutorials/`:
125
+
126
+ | Notebook | Description |
127
+ |----------|-------------|
128
+ | `01_basic_did.ipynb` | Basic 2x2 DiD, formula interface, covariates, fixed effects, cluster-robust SE, wild bootstrap |
129
+ | `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna and Sun-Abraham, group-time effects, aggregation methods, Bacon decomposition |
130
+ | `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization |
131
+ | `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics |
132
+ | `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization |
133
+ | `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power |
134
+ | `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves |
135
+ | `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling |
136
+ | `09_real_world_examples.ipynb` | Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws) |
137
+ | `10_trop.ipynb` | Triply Robust Panel (TROP) estimation with factor model adjustment |
138
+
139
+ ## Data Preparation
140
+
141
+ diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats.
142
+
143
+ ### Generate Sample Data
144
+
145
+ Create synthetic data with a known treatment effect for testing and learning:
146
+
147
+ ```python
148
+ from diff_diff import generate_did_data, DifferenceInDifferences
149
+
150
+ # Generate panel data with 100 units, 4 periods, and a treatment effect of 5
151
+ data = generate_did_data(
152
+ n_units=100,
153
+ n_periods=4,
154
+ treatment_effect=5.0,
155
+ treatment_fraction=0.5, # 50% of units are treated
156
+ treatment_period=2, # Treatment starts at period 2
157
+ seed=42
158
+ )
159
+
160
+ # Verify the estimator recovers the treatment effect
161
+ did = DifferenceInDifferences()
162
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
163
+ print(f"Estimated ATT: {results.att:.2f} (true: 5.0)")
164
+ ```
165
+
166
+ ### Create Treatment Indicators
167
+
168
+ Convert categorical variables or numeric thresholds to binary treatment indicators:
169
+
170
+ ```python
171
+ from diff_diff import make_treatment_indicator
172
+
173
+ # From categorical variable
174
+ df = make_treatment_indicator(
175
+ data,
176
+ column='state',
177
+ treated_values=['CA', 'NY', 'TX'] # These states are treated
178
+ )
179
+
180
+ # From numeric threshold (e.g., firms above median size)
181
+ df = make_treatment_indicator(
182
+ data,
183
+ column='firm_size',
184
+ threshold=data['firm_size'].median()
185
+ )
186
+
187
+ # Treat units below threshold
188
+ df = make_treatment_indicator(
189
+ data,
190
+ column='income',
191
+ threshold=50000,
192
+ above_threshold=False # Units with income <= 50000 are treated
193
+ )
194
+ ```
195
+
196
+ ### Create Post-Treatment Indicators
197
+
198
+ Convert time/date columns to binary post-treatment indicators:
199
+
200
+ ```python
201
+ from diff_diff import make_post_indicator
202
+
203
+ # From specific post-treatment periods
204
+ df = make_post_indicator(
205
+ data,
206
+ time_column='year',
207
+ post_periods=[2020, 2021, 2022]
208
+ )
209
+
210
+ # From treatment start date
211
+ df = make_post_indicator(
212
+ data,
213
+ time_column='year',
214
+ treatment_start=2020 # All years >= 2020 are post-treatment
215
+ )
216
+
217
+ # Works with datetime columns
218
+ df = make_post_indicator(
219
+ data,
220
+ time_column='date',
221
+ treatment_start='2020-01-01'
222
+ )
223
+ ```
224
+
225
+ ### Reshape Wide to Long Format
226
+
227
+ Convert wide-format data (one row per unit, multiple time columns) to long format:
228
+
229
+ ```python
230
+ import pandas as pd
+ from diff_diff import wide_to_long
231
+
232
+ # Wide format: columns like sales_2019, sales_2020, sales_2021
233
+ wide_df = pd.DataFrame({
234
+ 'firm_id': [1, 2, 3],
235
+ 'industry': ['tech', 'retail', 'tech'],
236
+ 'sales_2019': [100, 150, 200],
237
+ 'sales_2020': [110, 160, 210],
238
+ 'sales_2021': [120, 170, 220]
239
+ })
240
+
241
+ # Convert to long format for DiD
242
+ long_df = wide_to_long(
243
+ wide_df,
244
+ value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
245
+ id_column='firm_id',
246
+ time_name='year',
247
+ value_name='sales',
248
+ time_values=[2019, 2020, 2021]
249
+ )
250
+ # Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry
251
+ ```
252
+
253
+ ### Balance Panel Data
254
+
255
+ Ensure all units have observations for all time periods:
256
+
257
+ ```python
258
+ from diff_diff import balance_panel
259
+
260
+ # Keep only units with complete data (drop incomplete units)
261
+ balanced = balance_panel(
262
+ data,
263
+ unit_column='firm_id',
264
+ time_column='year',
265
+ method='inner'
266
+ )
267
+
268
+ # Include all unit-period combinations (creates NaN for missing)
269
+ balanced = balance_panel(
270
+ data,
271
+ unit_column='firm_id',
272
+ time_column='year',
273
+ method='outer'
274
+ )
275
+
276
+ # Fill missing values
277
+ balanced = balance_panel(
278
+ data,
279
+ unit_column='firm_id',
280
+ time_column='year',
281
+ method='fill',
282
+ fill_value=0 # Or None for forward/backward fill
283
+ )
284
+ ```
285
+
286
+ ### Validate Data
287
+
288
+ Check that your data meets DiD requirements before fitting:
289
+
290
+ ```python
291
+ from diff_diff import validate_did_data
292
+
293
+ # Validate and get informative error messages
294
+ result = validate_did_data(
295
+ data,
296
+ outcome='sales',
297
+ treatment='treated',
298
+ time='post',
299
+ unit='firm_id', # Optional: for panel-specific validation
300
+ raise_on_error=False # Return dict instead of raising
301
+ )
302
+
303
+ if result['valid']:
304
+ print("Data is ready for DiD analysis!")
305
+ print(f"Summary: {result['summary']}")
306
+ else:
307
+ print("Issues found:")
308
+ for error in result['errors']:
309
+ print(f" - {error}")
310
+
311
+ for warning in result['warnings']:
312
+ print(f"Warning: {warning}")
313
+ ```
314
+
315
+ ### Summarize Data by Groups
316
+
317
+ Get summary statistics for each treatment-time cell:
318
+
319
+ ```python
320
+ from diff_diff import summarize_did_data
321
+
322
+ summary = summarize_did_data(
323
+ data,
324
+ outcome='sales',
325
+ treatment='treated',
326
+ time='post'
327
+ )
328
+ print(summary)
329
+ ```
330
+
331
+ Output:
332
+ ```
333
+ n mean std min max
334
+ Control - Pre 250 100.5000 15.2340 65.0000 145.0000
335
+ Control - Post 250 105.2000 16.1230 68.0000 152.0000
336
+ Treated - Pre 250 101.2000 14.8900 67.0000 143.0000
337
+ Treated - Post 250 115.8000 17.5600 72.0000 165.0000
338
+ DiD Estimate - 9.9000 - - -
339
+ ```
340
+
341
+ ### Create Event Time for Staggered Designs
342
+
343
+ For designs where treatment occurs at different times:
344
+
345
+ ```python
346
+ from diff_diff import create_event_time
347
+
348
+ # Add event-time column relative to treatment timing
349
+ df = create_event_time(
350
+ data,
351
+ time_column='year',
352
+ treatment_time_column='treatment_year'
353
+ )
354
+ # Result: event_time = -2, -1, 0, 1, 2 relative to treatment
355
+ ```
356
+
357
+ ### Aggregate to Cohort Means
358
+
359
+ Aggregate unit-level data for visualization:
360
+
361
+ ```python
362
+ from diff_diff import aggregate_to_cohorts
363
+
364
+ cohort_data = aggregate_to_cohorts(
365
+ data,
366
+ unit_column='firm_id',
367
+ time_column='year',
368
+ treatment_column='treated',
369
+ outcome='sales'
370
+ )
371
+ # Result: mean outcome by treatment group and period
372
+ ```
373
+
374
+ ### Rank Control Units
375
+
376
+ Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity:
377
+
378
+ ```python
379
+ from diff_diff import rank_control_units, generate_did_data
380
+
381
+ # Generate sample data
382
+ data = generate_did_data(n_units=50, n_periods=6, seed=42)
383
+
384
+ # Rank control units by their similarity to treated units
385
+ ranking = rank_control_units(
386
+ data,
387
+ unit_column='unit',
388
+ time_column='period',
389
+ outcome_column='outcome',
390
+ treatment_column='treated',
391
+ n_top=10 # Return top 10 controls
392
+ )
393
+
394
+ print(ranking[['unit', 'quality_score', 'pre_trend_rmse']])
395
+ ```
396
+
397
+ Output:
398
+ ```
399
+ unit quality_score pre_trend_rmse
400
+ 0 35 1.0000 0.4521
401
+ 1 42 0.9234 0.5123
402
+ 2 28 0.8876 0.5892
403
+ ...
404
+ ```
405
+
406
+ With covariates for matching:
407
+
408
+ ```python
409
+ # Add covariate-based matching
410
+ ranking = rank_control_units(
411
+ data,
412
+ unit_column='unit',
413
+ time_column='period',
414
+ outcome_column='outcome',
415
+ treatment_column='treated',
416
+ covariates=['size', 'age'], # Match on these too
417
+ outcome_weight=0.7, # 70% weight on outcome trends
418
+ covariate_weight=0.3 # 30% weight on covariate similarity
419
+ )
420
+ ```
421
+
422
+ Filter data for SyntheticDiD using top controls:
423
+
424
+ ```python
425
+ from diff_diff import SyntheticDiD
426
+
427
+ # Get top control units
428
+ top_controls = ranking['unit'].tolist()
429
+
430
+ # Filter data to treated + top controls
431
+ filtered_data = data[
432
+ (data['treated'] == 1) | (data['unit'].isin(top_controls))
433
+ ]
434
+
435
+ # Fit SyntheticDiD with selected controls
436
+ sdid = SyntheticDiD()
437
+ results = sdid.fit(
438
+ filtered_data,
439
+ outcome='outcome',
440
+ treatment='treated',
441
+ unit='unit',
442
+ time='period',
443
+ post_periods=[3, 4, 5]
444
+ )
445
+ ```
446
+
447
+ ## Usage
448
+
449
+ ### Basic DiD with Column Names
450
+
451
+ ```python
452
+ from diff_diff import DifferenceInDifferences
453
+
454
+ did = DifferenceInDifferences(robust=True, alpha=0.05)
455
+ results = did.fit(
456
+ data,
457
+ outcome='sales',
458
+ treatment='treated',
459
+ time='post_policy'
460
+ )
461
+
462
+ # Access results
463
+ print(f"ATT: {results.att:.4f}")
464
+ print(f"Standard Error: {results.se:.4f}")
465
+ print(f"P-value: {results.p_value:.4f}")
466
+ print(f"95% CI: {results.conf_int}")
467
+ print(f"Significant: {results.is_significant}")
468
+ ```
469
+
470
+ ### Using Formula Interface
471
+
472
+ ```python
473
+ # R-style formula syntax
474
+ results = did.fit(data, formula='outcome ~ treated * post')
475
+
476
+ # Explicit interaction syntax
477
+ results = did.fit(data, formula='outcome ~ treated + post + treated:post')
478
+
479
+ # With covariates
480
+ results = did.fit(data, formula='outcome ~ treated * post + age + income')
481
+ ```
482
+
483
+ ### Including Covariates
484
+
485
+ ```python
486
+ results = did.fit(
487
+ data,
488
+ outcome='outcome',
489
+ treatment='treated',
490
+ time='post',
491
+ covariates=['age', 'income', 'education']
492
+ )
493
+ ```
494
+
495
+ ### Fixed Effects
496
+
497
+ Use `fixed_effects` for low-dimensional categorical controls (creates dummy variables):
498
+
499
+ ```python
500
+ # State and industry fixed effects
501
+ results = did.fit(
502
+ data,
503
+ outcome='sales',
504
+ treatment='treated',
505
+ time='post',
506
+ fixed_effects=['state', 'industry']
507
+ )
508
+
509
+ # Access fixed effect coefficients
510
+ state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}
511
+ ```
512
+
513
+ Use `absorb` for high-dimensional fixed effects (more efficient, uses within-transformation):
514
+
515
+ ```python
516
+ # Absorb firm-level fixed effects (efficient for many firms)
517
+ results = did.fit(
518
+ data,
519
+ outcome='sales',
520
+ treatment='treated',
521
+ time='post',
522
+ absorb=['firm_id']
523
+ )
524
+ ```
525
+
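+ The within-transformation behind `absorb` is just demeaning by group: subtracting each firm's mean removes the firm fixed effect without estimating one dummy per firm. A conceptual sketch (hypothetical column names; not the library's actual code path):
+
+ ```python
+ # Demean the outcome and regressors within each firm; the firm fixed
+ # effects drop out, and OLS on the demeaned data recovers the same
+ # slope coefficients as including a dummy for every firm.
+ demeaned = data.copy()
+ for col in ['sales', 'did_interaction']:  # hypothetical columns
+     demeaned[col] = data[col] - data.groupby('firm_id')[col].transform('mean')
+ ```
+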
526
+ Combine covariates with fixed effects:
527
+
528
+ ```python
529
+ results = did.fit(
530
+ data,
531
+ outcome='sales',
532
+ treatment='treated',
533
+ time='post',
534
+ covariates=['size', 'age'], # Linear controls
535
+ fixed_effects=['industry'], # Low-dimensional FE (dummies)
536
+ absorb=['firm_id'] # High-dimensional FE (absorbed)
537
+ )
538
+ ```
539
+
540
+ ### Cluster-Robust Standard Errors
541
+
542
+ ```python
543
+ did = DifferenceInDifferences(cluster='state')
544
+ results = did.fit(
545
+ data,
546
+ outcome='outcome',
547
+ treatment='treated',
548
+ time='post'
549
+ )
550
+ ```
551
+
552
+ ### Wild Cluster Bootstrap
553
+
554
+ With few clusters (fewer than ~50), standard cluster-robust SEs are downward-biased and tests over-reject. The wild cluster bootstrap provides valid inference even with 5-10 clusters.
555
+
556
+ ```python
557
+ # Use wild bootstrap for inference
558
+ did = DifferenceInDifferences(
559
+ cluster='state',
560
+ inference='wild_bootstrap',
561
+ n_bootstrap=999,
562
+ bootstrap_weights='rademacher', # or 'webb' for <10 clusters, 'mammen'
563
+ seed=42
564
+ )
565
+ results = did.fit(data, outcome='y', treatment='treated', time='post')
566
+
567
+ # Results include bootstrap-based SE and p-value
568
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
569
+ print(f"P-value: {results.p_value:.4f}")
570
+ print(f"95% CI: {results.conf_int}")
571
+ print(f"Inference method: {results.inference_method}")
572
+ print(f"Number of clusters: {results.n_clusters}")
573
+ ```
574
+
575
+ **Weight types:**
576
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
577
+ - `'webb'` - 6-point distribution, recommended for <10 clusters
578
+ - `'mammen'` - Two-point distribution, alternative to Rademacher
579
+
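+ For intuition, the three distributions are easy to draw by hand. In each bootstrap replication, every cluster's residuals are multiplied by one draw from the chosen distribution. A minimal numpy sketch (illustrative only; not the library's internal code):
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(42)
+ G = 8  # number of clusters
+
+ # Rademacher: +1 or -1, each with probability 1/2
+ rademacher = rng.choice([-1.0, 1.0], size=G)
+
+ # Mammen: two-point distribution with mean 0 and variance 1
+ phi = (1 + np.sqrt(5)) / 2
+ p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))  # P(v = 1 - phi), ~0.724
+ mammen = np.where(rng.random(G) < p, 1 - phi, phi)
+
+ # Webb: six equally likely points; with very few clusters it yields
+ # many more distinct bootstrap samples than +/-1 weights
+ webb_support = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
+                          np.sqrt(0.5), 1.0, np.sqrt(1.5)])
+ webb = rng.choice(webb_support, size=G)
+ ```
+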
580
+ Works with `DifferenceInDifferences` and `TwoWayFixedEffects` estimators.
581
+
582
+ ### Two-Way Fixed Effects (Panel Data)
583
+
584
+ ```python
585
+ from diff_diff import TwoWayFixedEffects
586
+
587
+ twfe = TwoWayFixedEffects()
588
+ results = twfe.fit(
589
+ panel_data,
590
+ outcome='outcome',
591
+ treatment='treated',
592
+ time='year',
593
+ unit='firm_id'
594
+ )
595
+ ```
596
+
597
+ ### Multi-Period DiD (Event Study)
598
+
599
+ For settings with multiple pre- and post-treatment periods:
600
+
601
+ ```python
602
+ from diff_diff import MultiPeriodDiD
603
+
604
+ # Fit with multiple time periods
605
+ did = MultiPeriodDiD()
606
+ results = did.fit(
607
+ panel_data,
608
+ outcome='sales',
609
+ treatment='treated',
610
+ time='period',
611
+ post_periods=[3, 4, 5], # Periods 3-5 are post-treatment
612
+ reference_period=0 # Reference period for comparison
613
+ )
614
+
615
+ # View period-specific treatment effects
616
+ for period, effect in results.period_effects.items():
617
+ print(f"Period {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")
618
+
619
+ # View average treatment effect across post-periods
620
+ print(f"Average ATT: {results.avg_att:.3f}")
621
+ print(f"Average SE: {results.avg_se:.3f}")
622
+
623
+ # Full summary with all period effects
624
+ results.print_summary()
625
+ ```
626
+
627
+ Output:
628
+ ```
629
+ ================================================================================
630
+ Multi-Period Difference-in-Differences Estimation Results
631
+ ================================================================================
632
+
633
+ Observations: 600
634
+ Pre-treatment periods: 3
635
+ Post-treatment periods: 3
636
+
637
+ --------------------------------------------------------------------------------
638
+ Average Treatment Effect
639
+ --------------------------------------------------------------------------------
640
+ Average ATT 5.2000 0.8234 6.315 0.0000
641
+ --------------------------------------------------------------------------------
642
+ 95% Confidence Interval: [3.5862, 6.8138]
643
+
644
+ Period-Specific Effects:
645
+ --------------------------------------------------------------------------------
646
+ Period Effect Std. Err. t-stat P>|t|
647
+ --------------------------------------------------------------------------------
648
+ 3 4.5000 0.9512 4.731 0.0000***
649
+ 4 5.2000 0.8876 5.858 0.0000***
650
+ 5 5.9000 0.9123 6.468 0.0000***
651
+ --------------------------------------------------------------------------------
652
+
653
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
654
+ ================================================================================
655
+ ```
656
+
657
+ ### Staggered Difference-in-Differences (Callaway-Sant'Anna)
658
+
659
+ When treatment is adopted at different times by different units, traditional TWFE estimators can be biased because already-treated units implicitly serve as controls for later-treated ones. The Callaway-Sant'Anna estimator remains valid under staggered adoption, even when treatment effects are heterogeneous across cohorts.
660
+
661
+ ```python
662
+ from diff_diff import CallawaySantAnna
663
+
664
+ # Panel data with staggered treatment
665
+ # 'first_treat' = period when unit was first treated (0 if never treated)
666
+ cs = CallawaySantAnna()
667
+ results = cs.fit(
668
+ panel_data,
669
+ outcome='sales',
670
+ unit='firm_id',
671
+ time='year',
672
+ first_treat='first_treat', # 0 for never-treated, else first treatment year
673
+ aggregate='event_study' # Compute event study effects
674
+ )
675
+
676
+ # View results
677
+ results.print_summary()
678
+
679
+ # Access group-time effects ATT(g,t)
680
+ for (group, time), effect in results.group_time_effects.items():
681
+ print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")
682
+
683
+ # Event study effects (averaged by relative time)
684
+ for rel_time, effect in results.event_study_effects.items():
685
+ print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
686
+
687
+ # Convert to DataFrame
688
+ df = results.to_dataframe(level='event_study')
689
+ ```
690
+
691
+ Output:
692
+ ```
693
+ =====================================================================================
694
+ Callaway-Sant'Anna Staggered Difference-in-Differences Results
695
+ =====================================================================================
696
+
697
+ Total observations: 600
698
+ Treated units: 35
699
+ Control units: 15
700
+ Treatment cohorts: 3
701
+ Time periods: 8
702
+ Control group: never_treated
703
+
704
+ -------------------------------------------------------------------------------------
705
+ Overall Average Treatment Effect on the Treated
706
+ -------------------------------------------------------------------------------------
707
+ Parameter Estimate Std. Err. t-stat P>|t| Sig.
708
+ -------------------------------------------------------------------------------------
709
+ ATT 2.5000 0.3521 7.101 0.0000 ***
710
+ -------------------------------------------------------------------------------------
711
+
712
+ 95% Confidence Interval: [1.8099, 3.1901]
713
+
714
+ -------------------------------------------------------------------------------------
715
+ Event Study (Dynamic) Effects
716
+ -------------------------------------------------------------------------------------
717
+ Rel. Period Estimate Std. Err. t-stat P>|t| Sig.
718
+ -------------------------------------------------------------------------------------
719
+ 0 2.1000 0.4521 4.645 0.0000 ***
720
+ 1 2.5000 0.4123 6.064 0.0000 ***
721
+ 2 2.8000 0.5234 5.349 0.0000 ***
722
+ -------------------------------------------------------------------------------------
723
+
724
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
725
+ =====================================================================================
726
+ ```
727
+
728
+ **When to use Callaway-Sant'Anna vs TWFE:**
729
+
730
+ | Scenario | Use TWFE | Use Callaway-Sant'Anna |
731
+ |----------|----------|------------------------|
732
+ | All units treated at same time | ✓ | ✓ |
733
+ | Staggered adoption, homogeneous effects | ✓ | ✓ |
734
+ | Staggered adoption, heterogeneous effects | ✗ | ✓ |
735
+ | Need event study with staggered timing | ✗ | ✓ |
736
+ | Fewer than ~20 treated units | ✓ | Depends on design |
737
+
738
+ **Parameters:**
739
+
740
+ ```python
741
+ CallawaySantAnna(
742
+ control_group='never_treated', # or 'not_yet_treated'
743
+ anticipation=0, # Periods before treatment with effects
744
+ estimation_method='dr', # 'dr', 'ipw', or 'reg'
745
+ alpha=0.05, # Significance level
746
+ cluster=None, # Column for cluster SEs
747
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
748
+ bootstrap_weight_type='rademacher', # 'rademacher', 'mammen', or 'webb'
749
+ seed=None # Random seed
750
+ )
751
+ ```
752
+
753
+ **Multiplier bootstrap for inference:**
754
+
755
+ With few clusters or when analytical standard errors may be unreliable, use the multiplier bootstrap for valid inference. This implements the approach from Callaway & Sant'Anna (2021).
756
+
757
+ ```python
758
+ # Bootstrap inference with 999 iterations
759
+ cs = CallawaySantAnna(
760
+ n_bootstrap=999,
761
+ bootstrap_weight_type='rademacher', # or 'mammen', 'webb'
762
+ seed=42
763
+ )
764
+ results = cs.fit(
765
+ data,
766
+ outcome='sales',
767
+ unit='firm_id',
768
+ time='year',
769
+ first_treat='first_treat',
770
+ aggregate='event_study'
771
+ )
772
+
773
+ # Access bootstrap results
774
+ print(f"Overall ATT: {results.overall_att:.3f}")
775
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
776
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
777
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
778
+
779
+ # Event study bootstrap inference
780
+ for rel_time, se in results.bootstrap_results.event_study_ses.items():
781
+ ci = results.bootstrap_results.event_study_cis[rel_time]
782
+ print(f"e={rel_time}: SE={se:.3f}, 95% CI=[{ci[0]:.3f}, {ci[1]:.3f}]")
783
+ ```
784
+
785
+ **Bootstrap weight types:**
786
+ - `'rademacher'` - Default, ±1 with p=0.5, good for most cases
787
+ - `'mammen'` - Two-point distribution matching first 3 moments
788
+ - `'webb'` - Six-point distribution, recommended for very few clusters (<10)
789
+
790
+ **Covariate adjustment for conditional parallel trends:**
791
+
792
+ When parallel trends only holds conditional on covariates, use the `covariates` parameter:
793
+
794
+ ```python
795
+ # Doubly robust estimation with covariates
796
+ cs = CallawaySantAnna(estimation_method='dr') # 'dr', 'ipw', or 'reg'
797
+ results = cs.fit(
798
+ data,
799
+ outcome='sales',
800
+ unit='firm_id',
801
+ time='year',
802
+ first_treat='first_treat',
803
+ covariates=['size', 'age', 'industry'], # Covariates for conditional PT
804
+ aggregate='event_study'
805
+ )
806
+ ```
807
+
808
+ ### Sun-Abraham Interaction-Weighted Estimator
809
+
810
+ The Sun-Abraham (2021) estimator provides an alternative to Callaway-Sant'Anna using an interaction-weighted (IW) regression approach. Running both estimators serves as a useful robustness check—when they agree, results are more credible.
811
+
812
+ ```python
813
+ from diff_diff import SunAbraham
814
+
815
+ # Basic usage
816
+ sa = SunAbraham()
817
+ results = sa.fit(
818
+ panel_data,
819
+ outcome='sales',
820
+ unit='firm_id',
821
+ time='year',
822
+ first_treat='first_treat' # 0 for never-treated, else first treatment year
823
+ )
824
+
825
+ # View results
826
+ results.print_summary()
827
+
828
+ # Event study effects (by relative time to treatment)
829
+ for rel_time, effect in results.event_study_effects.items():
830
+ print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")
831
+
832
+ # Overall ATT
833
+ print(f"Overall ATT: {results.overall_att:.3f} (SE: {results.overall_se:.3f})")
834
+
835
+ # Cohort weights (how each cohort contributes to each event-time estimate)
836
+ for rel_time, weights in results.cohort_weights.items():
837
+ print(f"e={rel_time}: {weights}")
838
+ ```
839
+
840
+ **Parameters:**
841
+
842
+ ```python
843
+ SunAbraham(
844
+ control_group='never_treated', # or 'not_yet_treated'
845
+ anticipation=0, # Periods before treatment with effects
846
+ alpha=0.05, # Significance level
847
+ cluster=None, # Column for cluster SEs
848
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
849
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
850
+ seed=None # Random seed
851
+ )
852
+ ```
853
+
854
+ **Bootstrap inference:**
855
+
856
+ ```python
857
+ # Bootstrap inference with 999 iterations
858
+ sa = SunAbraham(
859
+ n_bootstrap=999,
860
+ bootstrap_weights='rademacher',
861
+ seed=42
862
+ )
863
+ results = sa.fit(
864
+ data,
865
+ outcome='sales',
866
+ unit='firm_id',
867
+ time='year',
868
+ first_treat='first_treat'
869
+ )
870
+
871
+ # Access bootstrap results
872
+ print(f"Overall ATT: {results.overall_att:.3f}")
873
+ print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}")
874
+ print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}")
875
+ print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}")
876
+ ```
877
+
878
+ **When to use Sun-Abraham vs Callaway-Sant'Anna:**
879
+
880
+ | Aspect | Sun-Abraham | Callaway-Sant'Anna |
881
+ |--------|-------------|-------------------|
882
+ | Approach | Interaction-weighted regression | 2x2 DiD aggregation |
883
+ | Efficiency | More efficient under homogeneous effects | More robust to heterogeneity |
884
+ | Weighting | Weights by cohort share at each relative time | Weights by sample size |
885
+ | Use case | Robustness check, regression-based inference | Primary staggered DiD estimator |
886
+
887
+ **Both estimators should give similar results when:**
888
+ - Treatment effects are relatively homogeneous across cohorts
889
+ - Parallel trends holds
890
+
891
+ **Running both as robustness check:**
892
+
893
+ ```python
894
+ from diff_diff import CallawaySantAnna, SunAbraham
895
+
896
+ # Callaway-Sant'Anna
897
+ cs = CallawaySantAnna()
898
+ cs_results = cs.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
899
+
900
+ # Sun-Abraham
901
+ sa = SunAbraham()
902
+ sa_results = sa.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
903
+
904
+ # Compare
905
+ print(f"Callaway-Sant'Anna ATT: {cs_results.overall_att:.3f}")
906
+ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
907
+
908
+ # If results differ substantially, investigate heterogeneity
909
+ ```
910
+
911
+ ### Triple Difference (DDD)
912
+
913
+ Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
914
+
915
+ ```python
916
+ from diff_diff import TripleDifference, triple_difference
917
+
918
+ # Basic usage
919
+ ddd = TripleDifference(estimation_method='dr') # doubly robust (recommended)
920
+ results = ddd.fit(
921
+ data,
922
+ outcome='wages',
923
+ group='policy_state', # 1=state enacted policy, 0=control state
924
+ partition='female', # 1=women (affected by policy), 0=men
925
+ time='post' # 1=post-policy, 0=pre-policy
926
+ )
927
+
928
+ # View results
929
+ results.print_summary()
930
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
931
+
932
+ # With covariates (properly incorporated, unlike naive DDD)
933
+ results = ddd.fit(
934
+ data,
935
+ outcome='wages',
936
+ group='policy_state',
937
+ partition='female',
938
+ time='post',
939
+ covariates=['age', 'education', 'experience']
940
+ )
941
+ ```
942
+
943
+ **Estimation methods:**
944
+
945
+ | Method | Description | When to use |
946
+ |--------|-------------|-------------|
947
+ | `"dr"` | Doubly robust | Recommended. Consistent if either outcome or propensity model is correct |
948
+ | `"reg"` | Regression adjustment | Simple outcome regression with full interactions |
949
+ | `"ipw"` | Inverse probability weighting | When propensity score model is well-specified |
950
+
951
+ ```python
952
+ # Compare estimation methods
953
+ for method in ['reg', 'ipw', 'dr']:
954
+ est = TripleDifference(estimation_method=method)
955
+ res = est.fit(data, outcome='y', group='g', partition='p', time='t')
956
+ print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})")
957
+ ```
958
+
959
+ **Convenience function:**
960
+
961
+ ```python
962
+ # One-liner estimation
963
+ results = triple_difference(
964
+ data,
965
+ outcome='wages',
966
+ group='policy_state',
967
+ partition='female',
968
+ time='post',
969
+ covariates=['age', 'education'],
970
+ estimation_method='dr'
971
+ )
972
+ ```
973
+
974
+ **Why use DDD instead of DiD?**
975
+
976
+ DDD allows for violations of parallel trends that are:
977
+ - Group-specific (e.g., economic shocks in treatment states)
978
+ - Partition-specific (e.g., trends affecting women everywhere)
979
+
980
+ As long as these biases are additive, DDD differences them out. The key assumption is that the *differential* trend between eligible and ineligible units would be the same across groups.
981
+
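+ Without covariates, the DDD estimate is simply a 2x2 DiD among eligible units minus a 2x2 DiD among ineligible units. A hand-computation sketch (column names follow the example above; this reproduces the unadjusted estimate, not the covariate-adjusted estimators):
+
+ ```python
+ # Mean wages in each group x partition x period cell
+ cells = data.groupby(['policy_state', 'female', 'post'])['wages'].mean()
+
+ def change(group, partition):
+     """Post-minus-pre change in mean wages for one group/partition cell."""
+     return cells[group, partition, 1] - cells[group, partition, 0]
+
+ # DiD among women minus DiD among men, policy states vs. control states
+ ddd = (change(1, 1) - change(1, 0)) - (change(0, 1) - change(0, 0))
+ ```
+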
982
+ ### Event Study Visualization
983
+
984
+ Create publication-ready event study plots:
985
+
986
+ ```python
987
+ import pandas as pd
+ from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna, SunAbraham
988
+
989
+ # From MultiPeriodDiD
990
+ did = MultiPeriodDiD()
991
+ results = did.fit(data, outcome='y', treatment='treated',
992
+ time='period', post_periods=[3, 4, 5])
993
+ plot_event_study(results, title="Treatment Effects Over Time")
994
+
995
+ # From CallawaySantAnna (with event study aggregation)
996
+ cs = CallawaySantAnna()
997
+ results = cs.fit(data, outcome='y', unit='unit', time='period',
998
+ first_treat='first_treat', aggregate='event_study')
999
+ plot_event_study(results, title="Staggered DiD Event Study (CS)")
1000
+
1001
+ # From SunAbraham
1002
+ sa = SunAbraham()
1003
+ results = sa.fit(data, outcome='y', unit='unit', time='period',
1004
+ first_treat='first_treat')
1005
+ plot_event_study(results, title="Staggered DiD Event Study (SA)")
1006
+
1007
+ # From a DataFrame
1008
+ df = pd.DataFrame({
1009
+ 'period': [-2, -1, 0, 1, 2],
1010
+ 'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
1011
+ 'se': [0.3, 0.25, 0.0, 0.4, 0.45]
1012
+ })
1013
+ plot_event_study(df, reference_period=0)
1014
+
1015
+ # With customization
1016
+ ax = plot_event_study(
1017
+ results,
1018
+ title="Dynamic Treatment Effects",
1019
+ xlabel="Years Relative to Treatment",
1020
+ ylabel="Effect on Sales ($1000s)",
1021
+ color="#2563eb",
1022
+ marker="o",
1023
+ shade_pre=True, # Shade pre-treatment region
1024
+ show_zero_line=True, # Horizontal line at y=0
1025
+ show_reference_line=True, # Vertical line at reference period
1026
+ figsize=(10, 6),
1027
+ show=False # Don't call plt.show(), return axes
1028
+ )
1029
+ ```
1030
+
1031
+ ### Synthetic Difference-in-Differences
1032
+
1033
+ Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.
1034
+
1035
+ ```python
1036
+ from diff_diff import SyntheticDiD
1037
+
1038
+ # Fit Synthetic DiD model
1039
+ sdid = SyntheticDiD()
1040
+ results = sdid.fit(
1041
+ panel_data,
1042
+ outcome='gdp_growth',
1043
+ treatment='treated',
1044
+ unit='state',
1045
+ time='year',
1046
+ post_periods=[2015, 2016, 2017, 2018]
1047
+ )
1048
+
1049
+ # View results
1050
+ results.print_summary()
1051
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1052
+
1053
+ # Examine unit weights (which control units matter most)
1054
+ weights_df = results.get_unit_weights_df()
1055
+ print(weights_df.head(10))
1056
+
1057
+ # Examine time weights
1058
+ time_weights_df = results.get_time_weights_df()
1059
+ print(time_weights_df)
1060
+ ```
1061
+
1062
+ Output:
1063
+ ```
1064
+ ===========================================================================
1065
+ Synthetic Difference-in-Differences Estimation Results
1066
+ ===========================================================================
1067
+
1068
+ Observations: 500
1069
+ Treated units: 1
1070
+ Control units: 49
1071
+ Pre-treatment periods: 6
1072
+ Post-treatment periods: 4
1073
+ Regularization (lambda): 0.0000
1074
+ Pre-treatment fit (RMSE): 0.1234
1075
+
1076
+ ---------------------------------------------------------------------------
1077
+ Parameter Estimate Std. Err. t-stat P>|t|
1078
+ ---------------------------------------------------------------------------
1079
+ ATT 2.5000 0.4521 5.530 0.0000
1080
+ ---------------------------------------------------------------------------
1081
+
1082
+ 95% Confidence Interval: [1.6139, 3.3861]
1083
+
1084
+ ---------------------------------------------------------------------------
1085
+ Top Unit Weights (Synthetic Control)
1086
+ ---------------------------------------------------------------------------
1087
+ Unit state_12: 0.3521
1088
+ Unit state_5: 0.2156
1089
+ Unit state_23: 0.1834
1090
+ Unit state_8: 0.1245
1091
+ Unit state_31: 0.0892
1092
+ (8 units with weight > 0.001)
1093
+
1094
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
1095
+ ===========================================================================
1096
+ ```
1097
+
1098
+ #### When to Use Synthetic DiD Over Vanilla DiD
1099
+
1100
+ Use Synthetic DiD instead of standard DiD when:
1101
+
1102
+ 1. **Few treated units**: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls.
1103
+
1104
+ ```python
1105
+ # Example: California passed a policy, want to estimate its effect
1106
+ # Standard DiD would compare CA to the average of all other states
1107
+ # Synthetic DiD finds states that together best match CA's pre-treatment trend
1108
+ ```
1109
+
1110
+ 2. **Parallel trends is questionable**: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory.
1111
+
1112
+ ```python
1113
+ # Example: A tech hub city vs rural areas
1114
+ # Rural areas may not be a good comparison on average
1115
+ # Synthetic DiD can weight urban/suburban controls more heavily
1116
+ ```
1117
+
1118
+ 3. **Heterogeneous control units**: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal.
1119
+
1120
+ ```python
1121
+ # Example: Comparing a treated developing country to other countries
1122
+ # Some control countries may be much more similar economically
1123
+ # Synthetic DiD upweights the most comparable controls
1124
+ ```
1125
+
1126
+ 4. **You want transparency**: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison.
1127
+
1128
+ ```python
1129
+ # See exactly which units are driving the counterfactual
1130
+ print(results.get_unit_weights_df())
1131
+ ```
1132
+
1133
+ **Key differences from standard DiD:**
1134
+
1135
+ | Aspect | Standard DiD | Synthetic DiD |
1136
+ |--------|--------------|---------------|
1137
+ | Control weighting | Equal (1/N) | Optimized to match pre-treatment |
1138
+ | Time weighting | Equal across periods | Can emphasize informative periods |
1139
+ | N treated required | Can be many | Works with 1 treated unit |
1140
+ | Parallel trends | Assumed | Partially relaxed via matching |
1141
+ | Interpretability | Simple average | Explicit weights |
1142
+
1143
+ **Parameters:**
1144
+
1145
+ ```python
1146
+ SyntheticDiD(
1147
+ lambda_reg=0.0, # Regularization toward uniform weights (0 = no reg)
1148
+ zeta=1.0, # Time weight regularization (higher = more uniform)
1149
+ alpha=0.05, # Significance level
1150
+ n_bootstrap=200, # Bootstrap iterations for SE (0 = placebo-based)
1151
+ seed=None # Random seed for reproducibility
1152
+ )
1153
+ ```
1154
+
1155
+ ### Triply Robust Panel (TROP)
1156
+
1157
+ TROP (Athey, Imbens, Qu & Viviano 2025) extends Synthetic DiD by adding interactive fixed effects (factor model) adjustment. It's particularly useful when there are unobserved time-varying confounders with a factor structure that could bias standard DiD or SDID estimates.
1158
+
1159
+ TROP combines three robustness components:
1160
+ 1. **Nuclear norm regularized factor model**: Estimates interactive fixed effects L_it via soft-thresholding
1161
+ 2. **Exponential distance-based unit weights**: ω_j = exp(-λ_unit × distance(j,i))
1162
+ 3. **Exponential time decay weights**: θ_s = exp(-λ_time × |s-t|)
1163
+
1164
+ Tuning parameters are selected via leave-one-out cross-validation (LOOCV).
1165
+
1166
+ ```python
1167
+ from diff_diff import TROP, trop
1168
+
1169
+ # Fit TROP model with automatic tuning via LOOCV
1170
+ trop_est = TROP(
1171
+ lambda_time_grid=[0.0, 0.5, 1.0, 2.0], # Time decay grid
1172
+ lambda_unit_grid=[0.0, 0.5, 1.0, 2.0], # Unit distance grid
1173
+ lambda_nn_grid=[0.0, 0.1, 1.0], # Nuclear norm grid
1174
+ n_bootstrap=200
1175
+ )
1176
+ results = trop_est.fit(
1177
+ panel_data,
1178
+ outcome='gdp_growth',
1179
+ treatment='treated',
1180
+ unit='state',
1181
+ time='year',
1182
+ post_periods=[2015, 2016, 2017, 2018]
1183
+ )
1184
+
1185
+ # View results
1186
+ results.print_summary()
1187
+ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
1188
+ print(f"Effective rank: {results.effective_rank:.2f}")
1189
+
1190
+ # Selected tuning parameters
1191
+ print(f"λ_time: {results.lambda_time:.2f}")
1192
+ print(f"λ_unit: {results.lambda_unit:.2f}")
1193
+ print(f"λ_nn: {results.lambda_nn:.2f}")
1194
+
1195
+ # Examine unit effects
1196
+ unit_effects = results.get_unit_effects_df()
1197
+ print(unit_effects.head(10))
1198
+ ```
1199
+
1200
+ Output:
1201
+ ```
1202
+ ===========================================================================
1203
+ Triply Robust Panel (TROP) Estimation Results
1204
+ Athey, Imbens, Qu & Viviano (2025)
1205
+ ===========================================================================
1206
+
1207
+ Observations: 500
1208
+ Treated units: 1
1209
+ Control units: 49
1210
+ Treated observations: 4
1211
+ Pre-treatment periods: 6
1212
+ Post-treatment periods: 4
1213
+
1214
+ ---------------------------------------------------------------------------
1215
+ Tuning Parameters (selected via LOOCV)
1216
+ ---------------------------------------------------------------------------
1217
+ Lambda (time decay): 1.0000
1218
+ Lambda (unit distance): 0.5000
1219
+ Lambda (nuclear norm): 0.1000
1220
+ Effective rank: 2.35
1221
+ LOOCV score: 0.012345
1222
+ Variance method: bootstrap
1223
+ Bootstrap replications: 200
1224
+
1225
+ ---------------------------------------------------------------------------
1226
+ Parameter Estimate Std. Err. t-stat P>|t|
1227
+ ---------------------------------------------------------------------------
1228
+ ATT 2.5000 0.3892 6.424 0.0000 ***
1229
+ ---------------------------------------------------------------------------
1230
+
1231
+ 95% Confidence Interval: [1.7372, 3.2628]
1232
+
1233
+ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
1234
+ ===========================================================================
1235
+ ```
1236
+
1237
+ #### When to Use TROP Over Synthetic DiD
1238
+
1239
+ Use TROP when you suspect **factor structure** in the data—unobserved confounders that affect outcomes differently across units and time:
1240
+
1241
+ | Scenario | Use SDID | Use TROP |
1242
+ |----------|----------|----------|
1243
+ | Simple parallel trends | ✓ | ✓ |
1244
+ | Unobserved factors (e.g., economic cycles) | May be biased | ✓ |
1245
+ | Strong unit-time interactions | May be biased | ✓ |
1246
+ | Low-dimensional confounding | ✓ | ✓ |
1247
+
1248
+ **Example scenarios where TROP excels:**
1249
+ - Regional economic shocks that affect states differently based on industry composition
1250
+ - Global trends that impact countries differently based on their economic structure
1251
+ - Common factors in financial data (market risk, interest rates, etc.)
1252
+
1253
+ **How TROP works:**
1254
+
1255
+ 1. **Factor estimation**: Estimates interactive fixed effects L_it using nuclear norm regularization (encourages low-rank structure)
1256
+ 2. **Unit weights**: Exponential distance-based weighting ω_j = exp(-λ_unit × d(j,i)) where d(j,i) is the RMSE of outcome differences
1257
+ 3. **Time weights**: Exponential decay weighting θ_s = exp(-λ_time × |s-t|) based on proximity to treatment
1258
+ 4. **ATT computation**: τ = Y_it - α_i - β_t - L_it for treated observations
1259
+
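+ Steps 2 and 3 are simple enough to write down directly. A self-contained numpy sketch of the two weighting schemes (illustration only, with made-up data; not the library's implementation):
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(0)
+ Y_pre_controls = rng.normal(size=(49, 6))  # 49 control units x 6 pre-periods
+ y_pre_treated = rng.normal(size=6)         # treated unit's pre-period outcomes
+ lambda_unit, lambda_time = 0.5, 1.0
+
+ # Step 2: unit weights from the RMSE distance between each control's
+ # pre-treatment outcomes and the treated unit's
+ d = np.sqrt(((Y_pre_controls - y_pre_treated) ** 2).mean(axis=1))
+ omega = np.exp(-lambda_unit * d)
+ omega /= omega.sum()
+
+ # Step 3: time weights decaying with distance from the treatment period
+ pre_periods = np.arange(-6, 0)  # event time of the six pre-periods
+ theta = np.exp(-lambda_time * np.abs(pre_periods))
+ theta /= theta.sum()
+ ```
+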
1260
+ ```python
1261
+ # Compare TROP vs SDID under factor confounding
1262
+ from diff_diff import SyntheticDiD
1263
+
1264
+ # Synthetic DiD (may be biased with factors)
1265
+ sdid = SyntheticDiD()
1266
+ sdid_results = sdid.fit(data, outcome='y', treatment='treated',
1267
+ unit='unit', time='time', post_periods=[5,6,7])
1268
+
1269
+ # TROP (accounts for factors)
1270
+ trop_est = TROP() # Uses default grids with LOOCV selection
1271
+ trop_results = trop_est.fit(data, outcome='y', treatment='treated',
1272
+ unit='unit', time='time', post_periods=[5,6,7])
1273
+
1274
+ print(f"SDID estimate: {sdid_results.att:.3f}")
1275
+ print(f"TROP estimate: {trop_results.att:.3f}")
1276
+ print(f"Effective rank: {trop_results.effective_rank:.2f}")
1277
+ ```
1278
+
1279
+ **Tuning parameter grids:**
1280
+
1281
+ ```python
1282
+ # Custom tuning grids (searched via LOOCV)
1283
+ trop_est = TROP(
1284
+ lambda_time_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Time decay
1285
+ lambda_unit_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Unit distance
1286
+ lambda_nn_grid=[0.0, 0.01, 0.1, 1.0, 10.0] # Nuclear norm
1287
+ )
1288
+
1289
+ # Fixed tuning parameters (skip LOOCV search)
1290
+ trop_est = TROP(
1291
+ lambda_time_grid=[1.0], # Single value = fixed
1292
+ lambda_unit_grid=[1.0], # Single value = fixed
1293
+ lambda_nn_grid=[0.1] # Single value = fixed
1294
+ )
1295
+ ```
1296
+
1297
+ **Parameters:**
1298
+
1299
+ ```python
1300
+ TROP(
1301
+ lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5])
1302
+ lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5])
1303
+ lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10])
1304
+ max_iter=100, # Max iterations for factor estimation
1305
+ tol=1e-6, # Convergence tolerance
1306
+ alpha=0.05, # Significance level
1307
+ variance_method='bootstrap', # 'bootstrap' or 'jackknife'
1308
+ n_bootstrap=200, # Bootstrap replications
1309
+ seed=None # Random seed
1310
+ )
1311
+ ```
1312
+
1313
+ **Convenience function:**
1314
+
1315
+ ```python
1316
+ # One-liner estimation with default tuning grids
1317
+ results = trop(
1318
+ data,
1319
+ outcome='y',
1320
+ treatment='treated',
1321
+ unit='unit',
1322
+ time='time',
1323
+ post_periods=[5, 6, 7],
1324
+ n_bootstrap=200
1325
+ )
1326
+ ```
1327
+
1328
+ ## Working with Results
1329
+
1330
+ ### Export Results
1331
+
1332
+ ```python
1333
+ # As dictionary
1334
+ results.to_dict()
1335
+ # {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}
1336
+
1337
+ # As DataFrame
1338
+ df = results.to_dataframe()
1339
+ ```
1340
+
1341
+ ### Check Significance
1342
+
1343
+ ```python
1344
+ if results.is_significant:
1345
+ print(f"Effect is significant at {did.alpha} level")
1346
+
1347
+ # Get significance stars
1348
+ print(f"ATT: {results.att}{results.significance_stars}")
1349
+ # ATT: 3.5000*
1350
+ ```
1351
+
1352
+ ### Access Full Regression Output
1353
+
1354
+ ```python
1355
+ # All coefficients
1356
+ results.coefficients
1357
+ # {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}
1358
+
1359
+ # Variance-covariance matrix
1360
+ results.vcov
1361
+
1362
+ # Residuals and fitted values
1363
+ results.residuals
1364
+ results.fitted_values
1365
+
1366
+ # R-squared
1367
+ results.r_squared
1368
+ ```
1369
+
1370
+ ## Checking Assumptions
1371
+
1372
+ ### Parallel Trends
1373
+
1374
+ **Simple slope-based test:**
1375
+
1376
+ ```python
1377
+ from diff_diff.utils import check_parallel_trends
1378
+
1379
+ trends = check_parallel_trends(
1380
+ data,
1381
+ outcome='outcome',
1382
+ time='period',
1383
+ treatment_group='treated'
1384
+ )
1385
+
1386
+ print(f"Treated trend: {trends['treated_trend']:.4f}")
1387
+ print(f"Control trend: {trends['control_trend']:.4f}")
1388
+ print(f"Difference p-value: {trends['p_value']:.4f}")
1389
+ ```
1390
+
1391
+ **Robust distributional test (Wasserstein distance):**
1392
+
1393
+ ```python
1394
+ from diff_diff.utils import check_parallel_trends_robust
1395
+
1396
+ results = check_parallel_trends_robust(
1397
+ data,
1398
+ outcome='outcome',
1399
+ time='period',
1400
+ treatment_group='treated',
1401
+ unit='firm_id', # Unit identifier for panel data
1402
+ pre_periods=[2018, 2019], # Pre-treatment periods
1403
+ n_permutations=1000 # Permutations for p-value
1404
+ )
1405
+
1406
+ print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
1407
+ print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
1408
+ print(f"KS test p-value: {results['ks_p_value']:.4f}")
1409
+ print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")
1410
+ ```
1411
+
1412
+ The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes rather than just the means, which makes the test more robust to:
1413
+ - Non-normal distributions
1414
+ - Heterogeneous effects across units
1415
+ - Outliers
1416
+
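+ The core comparison can be reproduced with scipy's `wasserstein_distance` (a sketch of the idea, reusing the column names from the example above; the library adds the permutation-based p-value on top):
+
+ ```python
+ from scipy.stats import wasserstein_distance
+
+ # One pre-treatment outcome change per unit
+ pre = data[data['period'].isin([2018, 2019])].sort_values('period')
+ changes = (pre.groupby(['firm_id', 'treated'])['outcome']
+               .agg(lambda y: y.iloc[-1] - y.iloc[0])
+               .reset_index())
+
+ treated_changes = changes.loc[changes['treated'] == 1, 'outcome']
+ control_changes = changes.loc[changes['treated'] == 0, 'outcome']
+
+ # Distance between the two distributions of pre-period changes
+ dist = wasserstein_distance(treated_changes, control_changes)
+ ```
+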
1417
+ **Equivalence testing (TOST):**
1418
+
1419
+ ```python
1420
+ from diff_diff.utils import equivalence_test_trends
1421
+
1422
+ results = equivalence_test_trends(
1423
+ data,
1424
+ outcome='outcome',
1425
+ time='period',
1426
+ treatment_group='treated',
1427
+ unit='firm_id',
1428
+ equivalence_margin=0.5 # Define "practically equivalent"
1429
+ )
1430
+
1431
+ print(f"Mean difference: {results['mean_difference']:.4f}")
1432
+ print(f"TOST p-value: {results['tost_p_value']:.4f}")
1433
+ print(f"Trends equivalent: {results['equivalent']}")
1434
+ ```
1435
+
1436
+ ### Honest DiD Sensitivity Analysis (Rambachan-Roth)
1437
+
1438
+ Pre-trends tests often have low power, and conditioning analyses on passing them can exacerbate bias. **Honest DiD** (Rambachan & Roth 2023) provides a sensitivity analysis showing how robust your results are to violations of parallel trends.
1439
+
1440
+ ```python
1441
+ from diff_diff import HonestDiD, MultiPeriodDiD
1442
+
1443
+ # First, fit a standard event study
1444
+ did = MultiPeriodDiD()
1445
+ event_results = did.fit(
1446
+ data,
1447
+ outcome='outcome',
1448
+ treatment='treated',
1449
+ time='period',
1450
+ post_periods=[5, 6, 7, 8, 9]
1451
+ )
1452
+
1453
+ # Compute honest bounds with relative magnitudes restriction
1454
+ # M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation
1455
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
1456
+ honest_results = honest.fit(event_results)
1457
+
1458
+ print(honest_results.summary())
1459
+ print(f"Original estimate: {honest_results.original_estimate:.4f}")
1460
+ print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
1461
+ print(f"Effect robust to violations: {honest_results.is_significant}")
1462
+ ```
1463
+
1464
+ **Sensitivity analysis over M values:**
1465
+
1466
+ ```python
1467
+ # How do results change as we allow larger violations?
1468
+ sensitivity = honest.sensitivity_analysis(
1469
+ event_results,
1470
+ M_grid=[0, 0.5, 1.0, 1.5, 2.0]
1471
+ )
1472
+
1473
+ print(sensitivity.summary())
1474
+ print(f"Breakdown value: M = {sensitivity.breakdown_M}")
1475
+ # Breakdown = smallest M where the robust CI includes zero
1476
+ ```
1477
+
1478
+ **Breakdown value:**
1479
+
1480
+ The breakdown value tells you how robust your conclusion is:
1481
+
1482
+ ```python
1483
+ breakdown = honest.breakdown_value(event_results)
1484
+ if breakdown >= 1.0:
1485
+ print("Result holds even if post-treatment violations are as bad as pre-treatment")
1486
+ else:
1487
+ print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment")
1488
+ ```
1489
+
1490
+ **Smoothness restriction (alternative approach):**
1491
+
1492
+ ```python
1493
+ # Bounds second differences of trend violations
1494
+ # M=0 means linear extrapolation of pre-trends
1495
+ honest_smooth = HonestDiD(method='smoothness', M=0.5)
1496
+ smooth_results = honest_smooth.fit(event_results)
1497
+ ```
1498
+
1499
+ **Visualization:**
1500
+
1501
+ ```python
1502
+ from diff_diff import plot_sensitivity, plot_honest_event_study
1503
+
1504
+ # Plot sensitivity analysis
1505
+ plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations")
1506
+
1507
+ # Event study with honest confidence intervals
1508
+ plot_honest_event_study(event_results, honest_results)
1509
+ ```
1510
+
1511
+ ### Pre-Trends Power Analysis (Roth 2022)
1512
+
1513
+ A passing pre-trends test doesn't mean parallel trends holds—it may just mean the test has low power. **Pre-Trends Power Analysis** (Roth 2022) answers: "What violations could my pre-trends test have detected?"
1514
+
1515
+ ```python
1516
+ from diff_diff import PreTrendsPower, MultiPeriodDiD
1517
+
1518
+ # First, fit an event study
1519
+ did = MultiPeriodDiD()
1520
+ event_results = did.fit(
1521
+ data,
1522
+ outcome='outcome',
1523
+ treatment='treated',
1524
+ time='period',
1525
+ post_periods=[5, 6, 7, 8, 9]
1526
+ )
1527
+
1528
+ # Analyze pre-trends test power
1529
+ pt = PreTrendsPower(alpha=0.05, power=0.80)
1530
+ power_results = pt.fit(event_results)
1531
+
1532
+ print(power_results.summary())
1533
+ print(f"Minimum Detectable Violation (MDV): {power_results.mdv:.4f}")
1534
+ print(f"Power to detect violations of size MDV: {power_results.power:.1%}")
1535
+ ```
1536
+
1537
+ **Key concepts:**
1538
+
1539
+ - **Minimum Detectable Violation (MDV)**: Smallest violation magnitude that would be detected with your target power (e.g., 80%). Passing the pre-trends test does NOT rule out violations up to this size.
1540
+ - **Power**: Probability of detecting a violation of given size if it exists.
1541
+ - **Violation types**: Linear trend, constant violation, last-period only, or custom patterns.
1542
+
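+ For a single pre-trend coefficient, the familiar power formula conveys the flavor of the MDV computation (a back-of-the-envelope sketch; Roth's actual procedure accounts for the joint test across all pre-periods):
+
+ ```python
+ from scipy.stats import norm
+
+ alpha, power = 0.05, 0.80
+ se_pretrend = 0.25  # hypothetical SE of a pre-trend coefficient
+
+ # Smallest violation detectable with 80% power at the 5% level
+ z_crit = norm.ppf(1 - alpha / 2)
+ z_power = norm.ppf(power)
+ mdv = (z_crit + z_power) * se_pretrend  # ~2.80 * SE
+ ```
+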
1543
+ **Power curve visualization:**
1544
+
1545
+ ```python
1546
+ from diff_diff import plot_pretrends_power
1547
+
1548
+ # Generate power curve across violation magnitudes
1549
+ curve = pt.power_curve(event_results)
1550
+
1551
+ # Plot the power curve
1552
+ plot_pretrends_power(curve, title="Pre-Trends Test Power Curve")
1553
+
1554
+ # Or from the curve object directly
1555
+ curve.plot()
1556
+ ```
1557
+
1558
+ **Different violation patterns:**
1559
+
1560
+ ```python
1561
+ import numpy as np
+
+ # Linear trend violations (default) - most common assumption
1562
+ pt_linear = PreTrendsPower(violation_type='linear')
1563
+
1564
+ # Constant violation in all pre-periods
1565
+ pt_constant = PreTrendsPower(violation_type='constant')
1566
+
1567
+ # Violation only in the last pre-period (sharp break)
1568
+ pt_last = PreTrendsPower(violation_type='last_period')
1569
+
1570
+ # Custom violation pattern
1571
+ custom_weights = np.array([0.1, 0.3, 0.6]) # Increasing violations
1572
+ pt_custom = PreTrendsPower(violation_type='custom', violation_weights=custom_weights)
1573
+ ```
1574
+
1575
+ **Combining with HonestDiD:**
1576
+
1577
+ Pre-trends power analysis and HonestDiD are complementary:
1578
+ 1. **Pre-trends power** tells you what the test could have detected
1579
+ 2. **HonestDiD** tells you how robust your results are to violations
1580
+
1581
+ ```python
1582
+ from diff_diff import HonestDiD, PreTrendsPower
1583
+
1584
+ # If MDV is large relative to your estimated effect, be cautious
1585
+ pt = PreTrendsPower()
1586
+ power_results = pt.fit(event_results)
1587
+ sensitivity = pt.sensitivity_to_honest_did(event_results)
1588
+ print(sensitivity['interpretation'])
1589
+
1590
+ # Use HonestDiD for robust inference
1591
+ honest = HonestDiD(method='relative_magnitude', M=1.0)
1592
+ honest_results = honest.fit(event_results)
1593
+ ```
1594
+
1595
+ ### Placebo Tests
1596
+
1597
+ Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
1598
+
1599
+ **Fake timing test:**
1600
+
1601
+ ```python
1602
+ from diff_diff import run_placebo_test
1603
+
1604
+ # Test: Is there an effect before treatment actually occurred?
1605
+ # Actual treatment is at period 3 (post_periods=[3, 4, 5])
1606
+ # We test if a "fake" treatment at period 1 shows an effect
1607
+ results = run_placebo_test(
1608
+ data,
1609
+ outcome='outcome',
1610
+ treatment='treated',
1611
+ time='period',
1612
+ test_type='fake_timing',
1613
+ fake_treatment_period=1, # Pretend treatment was in period 1
1614
+ post_periods=[3, 4, 5] # Actual post-treatment periods
1615
+ )
1616
+
1617
+ print(results.summary())
1618
+ # If parallel trends hold, placebo_effect should be ~0 and not significant
1619
+ print(f"Placebo effect: {results.placebo_effect:.3f} (p={results.p_value:.3f})")
1620
+ print(f"Is significant (bad): {results.is_significant}")
1621
+ ```
1622
+
1623
+ **Fake group test:**
1624
+
1625
+ ```python
1626
+ # Test: Is there an effect among never-treated units?
1627
+ # Get some control unit IDs to use as "fake treated"
1628
+ control_units = data[data['treated'] == 0]['firm_id'].unique()[:5]
1629
+
1630
+ results = run_placebo_test(
1631
+ data,
1632
+ outcome='outcome',
1633
+ treatment='treated',
1634
+ time='period',
1635
+ unit='firm_id',
1636
+ test_type='fake_group',
1637
+ fake_treatment_group=list(control_units), # List of control unit IDs
1638
+ post_periods=[3, 4, 5]
1639
+ )
1640
+ ```
1641
+
1642
+ **Permutation test:**
1643
+
1644
+ ```python
1645
+ # Randomly reassign treatment and compute distribution of effects
1646
+ # Note: requires a binary post indicator (use 'post' column, not 'period')
1647
+ results = run_placebo_test(
1648
+ data,
1649
+ outcome='outcome',
1650
+ treatment='treated',
1651
+ time='post', # Binary post-treatment indicator
1652
+ unit='firm_id',
1653
+ test_type='permutation',
1654
+ n_permutations=1000,
1655
+ seed=42
1656
+ )
1657
+
1658
+ print(f"Original effect: {results.original_effect:.3f}")
1659
+ print(f"Permutation p-value: {results.p_value:.4f}")
1660
+ # Low p-value indicates the effect is unlikely to be due to chance
1661
+ ```
1662
+
1663
+ **Leave-one-out sensitivity:**
1664
+
1665
+ ```python
1666
+ # Test sensitivity to individual treated units
1667
+ # Note: requires a binary post indicator (use 'post' column, not 'period')
1668
+ results = run_placebo_test(
1669
+ data,
1670
+ outcome='outcome',
1671
+ treatment='treated',
1672
+ time='post', # Binary post-treatment indicator
1673
+ unit='firm_id',
1674
+ test_type='leave_one_out'
1675
+ )
1676
+
1677
+ # Check if any single unit drives the result
1678
+ print(results.leave_one_out_effects) # Effect when each unit is dropped
1679
+ ```
1680
+
1681
+ **Run all placebo tests:**
1682
+
1683
+ ```python
1684
+ from diff_diff import run_all_placebo_tests
1685
+
1686
+ # Comprehensive diagnostic suite
1687
+ # Note: This function runs fake_timing tests on pre-treatment periods.
1688
+ # The permutation and leave_one_out tests require a binary post indicator,
1689
+ # so they may return errors if the data uses a multi-period time column.
1690
+ all_results = run_all_placebo_tests(
1691
+ data,
1692
+ outcome='outcome',
1693
+ treatment='treated',
1694
+ time='period',
1695
+ unit='firm_id',
1696
+ pre_periods=[0, 1, 2],
1697
+ post_periods=[3, 4, 5],
1698
+ n_permutations=500,
1699
+ seed=42
1700
+ )
1701
+
1702
+ for test_name, result in all_results.items():
1703
+ if hasattr(result, 'p_value'):
1704
+ print(f"{test_name}: p={result.p_value:.3f}, significant={result.is_significant}")
1705
+ elif isinstance(result, dict) and 'error' in result:
1706
+ print(f"{test_name}: Error - {result['error']}")
1707
+ ```
1708
+
1709
+ ## API Reference
1710
+
1711
+ ### DifferenceInDifferences
1712
+
1713
+ ```python
1714
+ DifferenceInDifferences(
1715
+ robust=True, # Use HC1 robust standard errors
1716
+ cluster=None, # Column for cluster-robust SEs
1717
+ alpha=0.05 # Significance level for CIs
1718
+ )
1719
+ ```
1720
+
1721
+ **Methods:**
1722
+
1723
+ | Method | Description |
1724
+ |--------|-------------|
1725
+ | `fit(data, outcome, treatment, time, ...)` | Fit the DiD model |
1726
+ | `summary()` | Get formatted summary string |
1727
+ | `print_summary()` | Print summary to stdout |
1728
+ | `get_params()` | Get estimator parameters (sklearn-compatible) |
1729
+ | `set_params(**params)` | Set estimator parameters (sklearn-compatible) |
1730
+
1731
+ **fit() Parameters:**
1732
+
1733
+ | Parameter | Type | Description |
1734
+ |-----------|------|-------------|
1735
+ | `data` | DataFrame | Input data |
1736
+ | `outcome` | str | Outcome variable column name |
1737
+ | `treatment` | str | Treatment indicator column (0/1) |
1738
+ | `time` | str | Post-treatment indicator column (0/1) |
1739
+ | `formula` | str | R-style formula (alternative to column names) |
1740
+ | `covariates` | list | Linear control variables |
1741
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
1742
+ | `absorb` | list | High-dimensional FE (within-transformation) |
1743
+
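+ A minimal sketch of a fit with covariates and absorbed fixed effects, assuming a long-format DataFrame `data` with hypothetical columns `wage`, `treated`, `post`, `age`, and `state`:
+
+ ```python
+ from diff_diff import DifferenceInDifferences
+
+ # Hypothetical columns: wage (outcome), treated (0/1), post (0/1),
+ # age (linear control), state (high-dimensional fixed effect)
+ did = DifferenceInDifferences(robust=True, cluster='state')
+ results = did.fit(
+     data,
+     outcome='wage',
+     treatment='treated',
+     time='post',
+     covariates=['age'],
+     absorb=['state']
+ )
+ results.print_summary()
+ ```
+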
1744
+ ### DiDResults
1745
+
1746
+ **Attributes:**
1747
+
1748
+ | Attribute | Description |
1749
+ |-----------|-------------|
1750
+ | `att` | Average Treatment effect on the Treated |
1751
+ | `se` | Standard error of ATT |
1752
+ | `t_stat` | T-statistic |
1753
+ | `p_value` | P-value for H0: ATT = 0 |
1754
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
1755
+ | `n_obs` | Number of observations |
1756
+ | `n_treated` | Number of treated units |
1757
+ | `n_control` | Number of control units |
1758
+ | `r_squared` | R-squared of regression |
1759
+ | `coefficients` | Dictionary of all coefficients |
1760
+ | `is_significant` | Boolean for significance at alpha |
1761
+ | `significance_stars` | String of significance stars |
1762
+
1763
+ **Methods:**
1764
+
1765
+ | Method | Description |
1766
+ |--------|-------------|
1767
+ | `summary(alpha)` | Get formatted summary string |
1768
+ | `print_summary(alpha)` | Print summary to stdout |
1769
+ | `to_dict()` | Convert to dictionary |
1770
+ | `to_dataframe()` | Convert to pandas DataFrame |
1771
+
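+ The results object can also be inspected programmatically; a small sketch using the attributes and methods above:
+
+ ```python
+ # Assumes `results` comes from DifferenceInDifferences.fit()
+ lower, upper = results.conf_int
+ print(f"ATT = {results.att:.3f}{results.significance_stars} "
+       f"(CI: [{lower:.3f}, {upper:.3f}])")
+
+ # Tabular export for reports
+ df = results.to_dataframe()
+ ```
+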
1772
+ ### MultiPeriodDiD
1773
+
1774
+ ```python
1775
+ MultiPeriodDiD(
1776
+ robust=True, # Use HC1 robust standard errors
1777
+ cluster=None, # Column for cluster-robust SEs
1778
+ alpha=0.05 # Significance level for CIs
1779
+ )
1780
+ ```
1781
+
1782
+ **fit() Parameters:**
1783
+
1784
+ | Parameter | Type | Description |
1785
+ |-----------|------|-------------|
1786
+ | `data` | DataFrame | Input data |
1787
+ | `outcome` | str | Outcome variable column name |
1788
+ | `treatment` | str | Treatment indicator column (0/1) |
1789
+ | `time` | str | Time period column (multiple values) |
1790
+ | `post_periods` | list | List of post-treatment period values |
1791
+ | `covariates` | list | Linear control variables |
1792
+ | `fixed_effects` | list | Categorical FE columns (creates dummies) |
1793
+ | `absorb` | list | High-dimensional FE (within-transformation) |
1794
+ | `reference_period` | any | Omitted period for time dummies |
1795
+
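+ A sketch of an event-study fit with an explicit reference period, assuming integer periods 0-5 with treatment starting at period 3:
+
+ ```python
+ from diff_diff import MultiPeriodDiD
+
+ did = MultiPeriodDiD(cluster='firm_id')
+ event_results = did.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     time='period',
+     post_periods=[3, 4, 5],
+     reference_period=2  # omitted period for the time dummies
+ )
+ print(event_results.avg_att, event_results.avg_conf_int)
+ ```
+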
1796
+ ### MultiPeriodDiDResults
1797
+
1798
+ **Attributes:**
1799
+
1800
+ | Attribute | Description |
1801
+ |-----------|-------------|
1802
+ | `period_effects` | Dict mapping periods to PeriodEffect objects |
1803
+ | `avg_att` | Average ATT across post-treatment periods |
1804
+ | `avg_se` | Standard error of average ATT |
1805
+ | `avg_t_stat` | T-statistic for average ATT |
1806
+ | `avg_p_value` | P-value for average ATT |
1807
+ | `avg_conf_int` | Confidence interval for average ATT |
1808
+ | `n_obs` | Number of observations |
1809
+ | `pre_periods` | List of pre-treatment periods |
1810
+ | `post_periods` | List of post-treatment periods |
1811
+
1812
+ **Methods:**
1813
+
1814
+ | Method | Description |
1815
+ |--------|-------------|
1816
+ | `get_effect(period)` | Get PeriodEffect for specific period |
1817
+ | `summary(alpha)` | Get formatted summary string |
1818
+ | `print_summary(alpha)` | Print summary to stdout |
1819
+ | `to_dict()` | Convert to dictionary |
1820
+ | `to_dataframe()` | Convert to pandas DataFrame |
1821
+
1822
+ ### PeriodEffect
1823
+
1824
+ **Attributes:**
1825
+
1826
+ | Attribute | Description |
1827
+ |-----------|-------------|
1828
+ | `period` | Time period identifier |
1829
+ | `effect` | Treatment effect estimate |
1830
+ | `se` | Standard error |
1831
+ | `t_stat` | T-statistic |
1832
+ | `p_value` | P-value |
1833
+ | `conf_int` | Confidence interval |
1834
+ | `is_significant` | Boolean for significance at 0.05 |
1835
+ | `significance_stars` | String of significance stars |
1836
+
1837
+ ### SyntheticDiD
1838
+
1839
+ ```python
1840
+ SyntheticDiD(
1841
+ lambda_reg=0.0, # L2 regularization for unit weights
1842
+ zeta=1.0, # Regularization for time weights
1843
+ alpha=0.05, # Significance level for CIs
1844
+ n_bootstrap=200, # Bootstrap iterations for SE
1845
+ seed=None # Random seed for reproducibility
1846
+ )
1847
+ ```
1848
+
1849
+ **fit() Parameters:**
1850
+
1851
+ | Parameter | Type | Description |
1852
+ |-----------|------|-------------|
1853
+ | `data` | DataFrame | Panel data |
1854
+ | `outcome` | str | Outcome variable column name |
1855
+ | `treatment` | str | Treatment indicator column (0/1) |
1856
+ | `unit` | str | Unit identifier column |
1857
+ | `time` | str | Time period column |
1858
+ | `post_periods` | list | List of post-treatment period values |
1859
+ | `covariates` | list | Covariates to residualize out |
1860
+
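+ A minimal sketch of a synthetic DiD fit on panel data, assuming the unit and period columns named below:
+
+ ```python
+ from diff_diff import SyntheticDiD
+
+ sdid = SyntheticDiD(n_bootstrap=200, seed=42)
+ results = sdid.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     unit='firm_id',
+     time='period',
+     post_periods=[3, 4, 5]
+ )
+ print(results.summary())
+ print(results.get_unit_weights_df())  # control unit weights
+ ```
+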
1861
+ ### SyntheticDiDResults
1862
+
1863
+ **Attributes:**
1864
+
1865
+ | Attribute | Description |
1866
+ |-----------|-------------|
1867
+ | `att` | Average Treatment effect on the Treated |
1868
+ | `se` | Standard error (bootstrap or placebo-based) |
1869
+ | `t_stat` | T-statistic |
1870
+ | `p_value` | P-value |
1871
+ | `conf_int` | Confidence interval |
1872
+ | `n_obs` | Number of observations |
1873
+ | `n_treated` | Number of treated units |
1874
+ | `n_control` | Number of control units |
1875
+ | `unit_weights` | Dict mapping control unit IDs to weights |
1876
+ | `time_weights` | Dict mapping pre-treatment periods to weights |
1877
+ | `pre_periods` | List of pre-treatment periods |
1878
+ | `post_periods` | List of post-treatment periods |
1879
+ | `pre_treatment_fit` | RMSE of synthetic vs treated in pre-period |
1880
+ | `placebo_effects` | Array of placebo effect estimates |
1881
+
1882
+ **Methods:**
1883
+
1884
+ | Method | Description |
1885
+ |--------|-------------|
1886
+ | `summary(alpha)` | Get formatted summary string |
1887
+ | `print_summary(alpha)` | Print summary to stdout |
1888
+ | `to_dict()` | Convert to dictionary |
1889
+ | `to_dataframe()` | Convert to pandas DataFrame |
1890
+ | `get_unit_weights_df()` | Get unit weights as DataFrame |
1891
+ | `get_time_weights_df()` | Get time weights as DataFrame |
1892
+
1893
+ ### TROP
1894
+
1895
+ ```python
1896
+ TROP(
1897
+ lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5])
1898
+ lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5])
1899
+ lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10])
1900
+ max_iter=100, # Max iterations for factor estimation
1901
+ tol=1e-6, # Convergence tolerance
1902
+ alpha=0.05, # Significance level for CIs
1903
+ variance_method='bootstrap', # 'bootstrap' or 'jackknife'
1904
+ n_bootstrap=200, # Bootstrap/jackknife iterations
1905
+ seed=None # Random seed
1906
+ )
1907
+ ```
1908
+
1909
+ **fit() Parameters:**
1910
+
1911
+ | Parameter | Type | Description |
1912
+ |-----------|------|-------------|
1913
+ | `data` | DataFrame | Panel data |
1914
+ | `outcome` | str | Outcome variable column name |
1915
+ | `treatment` | str | Treatment indicator column (0/1) |
1916
+ | `unit` | str | Unit identifier column |
1917
+ | `time` | str | Time period column |
1918
+ | `post_periods` | list | List of post-treatment period values |
1919
+
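+ A sketch of a TROP fit with the default regularization grids, assuming the same panel layout as above:
+
+ ```python
+ from diff_diff import TROP
+
+ trop = TROP(variance_method='bootstrap', n_bootstrap=200, seed=42)
+ results = trop.fit(
+     data,
+     outcome='outcome',
+     treatment='treated',
+     unit='firm_id',
+     time='period',
+     post_periods=[3, 4, 5]
+ )
+ print(results.summary())
+ # Regularization parameters selected by LOOCV
+ print(results.lambda_time, results.lambda_unit, results.lambda_nn)
+ ```
+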
1920
+ ### TROPResults
1921
+
1922
+ **Attributes:**
1923
+
1924
+ | Attribute | Description |
1925
+ |-----------|-------------|
1926
+ | `att` | Average Treatment effect on the Treated |
1927
+ | `se` | Standard error (bootstrap or jackknife) |
1928
+ | `t_stat` | T-statistic |
1929
+ | `p_value` | P-value |
1930
+ | `conf_int` | Confidence interval |
1931
+ | `n_obs` | Number of observations |
1932
+ | `n_treated` | Number of treated units |
1933
+ | `n_control` | Number of control units |
1934
+ | `n_treated_obs` | Number of treated unit-time observations |
1935
+ | `unit_effects` | Dict mapping unit IDs to fixed effects |
1936
+ | `time_effects` | Dict mapping periods to fixed effects |
1937
+ | `treatment_effects` | Dict mapping (unit, time) to individual effects |
1938
+ | `lambda_time` | Selected time decay parameter |
1939
+ | `lambda_unit` | Selected unit distance parameter |
1940
+ | `lambda_nn` | Selected nuclear norm parameter |
1941
+ | `factor_matrix` | Low-rank factor matrix L (n_periods x n_units) |
1942
+ | `effective_rank` | Effective rank of factor matrix |
1943
+ | `loocv_score` | LOOCV score for selected parameters |
1944
+ | `pre_periods` | List of pre-treatment periods |
1945
+ | `post_periods` | List of post-treatment periods |
1946
+ | `variance_method` | Variance estimation method |
1947
+ | `bootstrap_distribution` | Bootstrap distribution (if bootstrap) |
1948
+
1949
+ **Methods:**
1950
+
1951
+ | Method | Description |
1952
+ |--------|-------------|
1953
+ | `summary(alpha)` | Get formatted summary string |
1954
+ | `print_summary(alpha)` | Print summary to stdout |
1955
+ | `to_dict()` | Convert to dictionary |
1956
+ | `to_dataframe()` | Convert to pandas DataFrame |
1957
+ | `get_unit_effects_df()` | Get unit fixed effects as DataFrame |
1958
+ | `get_time_effects_df()` | Get time fixed effects as DataFrame |
1959
+ | `get_treatment_effects_df()` | Get individual treatment effects as DataFrame |
1960
+
1961
+ ### SunAbraham
1962
+
1963
+ ```python
1964
+ SunAbraham(
1965
+ control_group='never_treated', # or 'not_yet_treated'
1966
+ anticipation=0, # Periods of anticipation effects
1967
+ alpha=0.05, # Significance level for CIs
1968
+ cluster=None, # Column for cluster-robust SEs
1969
+ n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs)
1970
+ bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb'
1971
+ seed=None # Random seed
1972
+ )
1973
+ ```
1974
+
1975
+ **fit() Parameters:**
1976
+
1977
+ | Parameter | Type | Description |
1978
+ |-----------|------|-------------|
1979
+ | `data` | DataFrame | Panel data |
1980
+ | `outcome` | str | Outcome variable column name |
1981
+ | `unit` | str | Unit identifier column |
1982
+ | `time` | str | Time period column |
1983
+ | `first_treat` | str | Column with first treatment period (0 for never-treated) |
1984
+ | `covariates` | list | Covariate column names |
1985
+ | `min_pre_periods` | int | Minimum pre-treatment periods to include |
1986
+ | `min_post_periods` | int | Minimum post-treatment periods to include |
1987
+
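+ A sketch for staggered adoption, assuming a `first_treat` column holding each unit's first treatment period (0 for never-treated units):
+
+ ```python
+ from diff_diff import SunAbraham
+
+ sa = SunAbraham(control_group='never_treated', cluster='firm_id')
+ results = sa.fit(
+     data,
+     outcome='outcome',
+     unit='firm_id',
+     time='period',
+     first_treat='first_treat'
+ )
+ print(results.summary())
+ print(results.to_dataframe(level='event_study'))
+ ```
+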
1988
+ ### SunAbrahamResults
1989
+
1990
+ **Attributes:**
1991
+
1992
+ | Attribute | Description |
1993
+ |-----------|-------------|
1994
+ | `event_study_effects` | Dict mapping relative time to effect info |
1995
+ | `overall_att` | Overall average treatment effect |
1996
+ | `overall_se` | Standard error of overall ATT |
1997
+ | `overall_t_stat` | T-statistic for overall ATT |
1998
+ | `overall_p_value` | P-value for overall ATT |
1999
+ | `overall_conf_int` | Confidence interval for overall ATT |
2000
+ | `cohort_weights` | Dict mapping relative time to cohort weights |
2001
+ | `groups` | List of treatment cohorts |
2002
+ | `time_periods` | List of all time periods |
2003
+ | `n_obs` | Total number of observations |
2004
+ | `n_treated_units` | Number of ever-treated units |
2005
+ | `n_control_units` | Number of never-treated units |
2006
+ | `is_significant` | Boolean for significance at alpha |
2007
+ | `significance_stars` | String of significance stars |
2008
+ | `bootstrap_results` | SABootstrapResults (if bootstrap enabled) |
2009
+
2010
+ **Methods:**
2011
+
2012
+ | Method | Description |
2013
+ |--------|-------------|
2014
+ | `summary(alpha)` | Get formatted summary string |
2015
+ | `print_summary(alpha)` | Print summary to stdout |
2016
+ | `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |
2017
+
2018
+ ### TripleDifference
2019
+
2020
+ ```python
2021
+ TripleDifference(
2022
+ estimation_method='dr', # 'dr' (doubly robust), 'reg', or 'ipw'
2023
+ robust=True, # Use HC1 robust standard errors
2024
+ cluster=None, # Column for cluster-robust SEs
2025
+ alpha=0.05, # Significance level for CIs
2026
+ pscore_trim=0.01 # Propensity score trimming threshold
2027
+ )
2028
+ ```
2029
+
2030
+ **fit() Parameters:**
2031
+
2032
+ | Parameter | Type | Description |
2033
+ |-----------|------|-------------|
2034
+ | `data` | DataFrame | Input data |
2035
+ | `outcome` | str | Outcome variable column name |
2036
+ | `group` | str | Group indicator column (0/1): 1=treated group |
2037
+ | `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible |
2038
+ | `time` | str | Time indicator column (0/1): 1=post-treatment |
2039
+ | `covariates` | list | Covariate column names for adjustment |
2040
+
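+ A sketch of a doubly robust DDD fit; the group, partition, and covariate column names here are hypothetical:
+
+ ```python
+ from diff_diff import TripleDifference
+
+ ddd = TripleDifference(estimation_method='dr')
+ results = ddd.fit(
+     data,
+     outcome='outcome',
+     group='treated_state',   # 1 = treated group
+     partition='eligible',    # 1 = eligible partition
+     time='post',             # 1 = post-treatment
+     covariates=['age', 'income']
+ )
+ print(results.summary())
+ ```
+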
2041
+ ### TripleDifferenceResults
2042
+
2043
+ **Attributes:**
2044
+
2045
+ | Attribute | Description |
2046
+ |-----------|-------------|
2047
+ | `att` | Average Treatment effect on the Treated |
2048
+ | `se` | Standard error of ATT |
2049
+ | `t_stat` | T-statistic |
2050
+ | `p_value` | P-value for H0: ATT = 0 |
2051
+ | `conf_int` | Tuple of (lower, upper) confidence bounds |
2052
+ | `n_obs` | Total number of observations |
2053
+ | `n_treated_eligible` | Obs in treated group & eligible partition |
2054
+ | `n_treated_ineligible` | Obs in treated group & ineligible partition |
2055
+ | `n_control_eligible` | Obs in control group & eligible partition |
2056
+ | `n_control_ineligible` | Obs in control group & ineligible partition |
2057
+ | `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
2058
+ | `group_means` | Dict of cell means for diagnostics |
2059
+ | `pscore_stats` | Propensity score statistics (IPW/DR only) |
2060
+ | `is_significant` | Boolean for significance at alpha |
2061
+ | `significance_stars` | String of significance stars |
2062
+
2063
+ **Methods:**
2064
+
2065
+ | Method | Description |
2066
+ |--------|-------------|
2067
+ | `summary(alpha)` | Get formatted summary string |
2068
+ | `print_summary(alpha)` | Print summary to stdout |
2069
+ | `to_dict()` | Convert to dictionary |
2070
+ | `to_dataframe()` | Convert to pandas DataFrame |
2071
+
2072
+ ### HonestDiD
2073
+
2074
+ ```python
2075
+ HonestDiD(
2076
+ method='relative_magnitude', # 'relative_magnitude' or 'smoothness'
2077
+ M=None, # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
2078
+ alpha=0.05, # Significance level for CIs
2079
+ l_vec=None # Linear combination vector for target parameter
2080
+ )
2081
+ ```
2082
+
2083
+ **fit() Parameters:**
2084
+
2085
+ | Parameter | Type | Description |
2086
+ |-----------|------|-------------|
2087
+ | `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
2088
+ | `M` | float | Restriction parameter (overrides constructor value) |
2089
+
2090
+ **Methods:**
2091
+
2092
+ | Method | Description |
2093
+ |--------|-------------|
2094
+ | `fit(results, M)` | Compute bounds for given event study results |
2095
+ | `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
2096
+ | `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
2097
+
2098
+ ### HonestDiDResults
2099
+
2100
+ **Attributes:**
2101
+
2102
+ | Attribute | Description |
2103
+ |-----------|-------------|
2104
+ | `original_estimate` | Point estimate under parallel trends |
2105
+ | `lb` | Lower bound of identified set |
2106
+ | `ub` | Upper bound of identified set |
2107
+ | `ci_lb` | Lower bound of robust confidence interval |
2108
+ | `ci_ub` | Upper bound of robust confidence interval |
2109
+ | `ci_width` | Width of robust CI |
2110
+ | `M` | Restriction parameter used |
2111
+ | `method` | Restriction method ('relative_magnitude' or 'smoothness') |
2112
+ | `alpha` | Significance level |
2113
+ | `is_significant` | True if robust CI excludes zero |
2114
+
2115
+ **Methods:**
2116
+
2117
+ | Method | Description |
2118
+ |--------|-------------|
2119
+ | `summary()` | Get formatted summary string |
2120
+ | `to_dict()` | Convert to dictionary |
2121
+ | `to_dataframe()` | Convert to pandas DataFrame |
2122
+
2123
+ ### SensitivityResults
2124
+
2125
+ **Attributes:**
2126
+
2127
+ | Attribute | Description |
2128
+ |-----------|-------------|
2129
+ | `M_grid` | Array of M values analyzed |
2130
+ | `results` | List of HonestDiDResults for each M |
2131
+ | `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
2132
+
2133
+ **Methods:**
2134
+
2135
+ | Method | Description |
2136
+ |--------|-------------|
2137
+ | `summary()` | Get formatted summary string |
2138
+ | `plot(ax)` | Plot sensitivity analysis |
2139
+ | `to_dataframe()` | Convert to pandas DataFrame |
2140
+
2141
+ ### PreTrendsPower
2142
+
2143
+ ```python
2144
+ PreTrendsPower(
2145
+ alpha=0.05, # Significance level for pre-trends test
2146
+ power=0.80, # Target power for MDV calculation
2147
+ violation_type='linear', # 'linear', 'constant', 'last_period', 'custom'
2148
+ violation_weights=None # Custom weights (required if violation_type='custom')
2149
+ )
2150
+ ```
2151
+
2152
+ **fit() Parameters:**
2153
+
2154
+ | Parameter | Type | Description |
2155
+ |-----------|------|-------------|
2156
+ | `results` | MultiPeriodDiDResults | Results from event study |
2157
+ | `M` | float | Specific violation magnitude to evaluate |
2158
+
2159
+ **Methods:**
2160
+
2161
+ | Method | Description |
2162
+ |--------|-------------|
2163
+ | `fit(results, M)` | Compute power analysis for given event study |
2164
+ | `power_at(results, M)` | Compute power for specific violation magnitude |
2165
+ | `power_curve(results, M_grid, n_points)` | Compute power across range of M values |
2166
+ | `sensitivity_to_honest_did(results)` | Compare with HonestDiD analysis |
2167
+
2168
+ ### PreTrendsPowerResults
2169
+
2170
+ **Attributes:**
2171
+
2172
+ | Attribute | Description |
2173
+ |-----------|-------------|
2174
+ | `power` | Power to detect the specified violation |
2175
+ | `mdv` | Minimum detectable violation at target power |
2176
+ | `violation_magnitude` | Violation magnitude (M) tested |
2177
+ | `violation_type` | Type of violation pattern |
2178
+ | `alpha` | Significance level |
2179
+ | `target_power` | Target power level |
2180
+ | `n_pre_periods` | Number of pre-treatment periods |
2181
+ | `test_statistic` | Expected test statistic under violation |
2182
+ | `critical_value` | Critical value for pre-trends test |
2183
+ | `noncentrality` | Non-centrality parameter |
2184
+ | `is_informative` | Heuristic check if test is informative |
2185
+ | `power_adequate` | Whether power meets target |
2186
+
2187
+ **Methods:**
2188
+
2189
+ | Method | Description |
2190
+ |--------|-------------|
2191
+ | `summary()` | Get formatted summary string |
2192
+ | `print_summary()` | Print summary to stdout |
2193
+ | `to_dict()` | Convert to dictionary |
2194
+ | `to_dataframe()` | Convert to pandas DataFrame |
2195
+
2196
+ ### PreTrendsPowerCurve
2197
+
2198
+ **Attributes:**
2199
+
2200
+ | Attribute | Description |
2201
+ |-----------|-------------|
2202
+ | `M_values` | Array of violation magnitudes |
2203
+ | `powers` | Array of power values |
2204
+ | `mdv` | Minimum detectable violation |
2205
+ | `alpha` | Significance level |
2206
+ | `target_power` | Target power level |
2207
+ | `violation_type` | Type of violation pattern |
2208
+
2209
+ **Methods:**
2210
+
2211
+ | Method | Description |
2212
+ |--------|-------------|
2213
+ | `plot(ax, show_mdv, show_target)` | Plot power curve |
2214
+ | `to_dataframe()` | Convert to DataFrame with M and power columns |
2215
+
2216
+ ### Data Preparation Functions
2217
+
2218
+ #### generate_did_data
2219
+
2220
+ ```python
2221
+ generate_did_data(
2222
+ n_units=100, # Number of units
2223
+ n_periods=4, # Number of time periods
2224
+ treatment_effect=5.0, # True ATT
2225
+ treatment_fraction=0.5, # Fraction treated
2226
+ treatment_period=2, # First post-treatment period
2227
+ unit_fe_sd=2.0, # Unit fixed effect std dev
2228
+ time_trend=0.5, # Linear time trend
2229
+ noise_sd=1.0, # Idiosyncratic noise std dev
2230
+ seed=None # Random seed
2231
+ )
2232
+ ```
2233
+
2234
+ Returns DataFrame with columns: `unit`, `period`, `treated`, `post`, `outcome`, `true_effect`.
2235
+
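+ A quick sketch pairing the generator with the basic estimator to check that the true ATT is recovered:
+
+ ```python
+ from diff_diff import DifferenceInDifferences, generate_did_data
+
+ data = generate_did_data(n_units=200, n_periods=4,
+                          treatment_effect=5.0, seed=42)
+ did = DifferenceInDifferences()
+ results = did.fit(data, outcome='outcome', treatment='treated', time='post')
+ print(results.att)  # should be close to the true effect of 5.0
+ ```
+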
2236
+ #### make_treatment_indicator
2237
+
2238
+ ```python
2239
+ make_treatment_indicator(
2240
+ data, # Input DataFrame
2241
+ column, # Column to create treatment from
2242
+ treated_values=None, # Value(s) indicating treatment
2243
+ threshold=None, # Numeric threshold for treatment
2244
+ above_threshold=True, # If True, >= threshold is treated
2245
+ new_column='treated' # Output column name
2246
+ )
2247
+ ```
2248
+
2249
+ #### make_post_indicator
2250
+
2251
+ ```python
2252
+ make_post_indicator(
2253
+ data, # Input DataFrame
2254
+ time_column, # Time/period column
2255
+ post_periods=None, # Specific post-treatment period(s)
2256
+ treatment_start=None, # First post-treatment period
2257
+ new_column='post' # Output column name
2258
+ )
2259
+ ```
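+ A sketch combining the two helpers to build the 0/1 columns from raw data; the `state` and `year` columns are assumptions:
+
+ ```python
+ from diff_diff import make_treatment_indicator, make_post_indicator
+
+ # Units in New Jersey are treated; treatment starts in 1992
+ data = make_treatment_indicator(data, column='state',
+                                 treated_values=['NJ'], new_column='treated')
+ data = make_post_indicator(data, time_column='year',
+                            treatment_start=1992, new_column='post')
+ ```
+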
2260
+
2261
+ #### wide_to_long
2262
+
2263
+ ```python
2264
+ wide_to_long(
2265
+ data, # Wide-format DataFrame
2266
+ value_columns, # List of time-varying columns
2267
+ id_column, # Unit identifier column
2268
+ time_name='period', # Name for time column
2269
+ value_name='value', # Name for value column
2270
+ time_values=None # Values for time periods
2271
+ )
2272
+ ```
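+ For example, a sketch reshaping hypothetical columns `y2019`-`y2021` into long format:
+
+ ```python
+ from diff_diff import wide_to_long
+
+ long_df = wide_to_long(
+     wide_df,  # hypothetical wide-format DataFrame, one row per firm
+     value_columns=['y2019', 'y2020', 'y2021'],
+     id_column='firm_id',
+     time_name='year',
+     value_name='outcome',
+     time_values=[2019, 2020, 2021]
+ )
+ ```
+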
2273
+
2274
+ #### balance_panel
2275
+
2276
+ ```python
2277
+ balance_panel(
2278
+ data, # Panel DataFrame
2279
+ unit_column, # Unit identifier column
2280
+ time_column, # Time period column
2281
+ method='inner', # 'inner', 'outer', or 'fill'
2282
+ fill_value=None # Value for filling (if method='fill')
2283
+ )
2284
+ ```
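+ For instance, a sketch keeping only the balanced portion of the panel:
+
+ ```python
+ from diff_diff import balance_panel
+
+ balanced = balance_panel(data, unit_column='firm_id',
+                          time_column='period', method='inner')
+ ```
+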
2285
+
2286
+ #### validate_did_data
2287
+
2288
+ ```python
2289
+ validate_did_data(
2290
+ data, # DataFrame to validate
2291
+ outcome, # Outcome column name
2292
+ treatment, # Treatment column name
2293
+ time, # Time/post column name
2294
+ unit=None, # Unit column (for panel validation)
2295
+ raise_on_error=True # Raise ValueError or return dict
2296
+ )
2297
+ ```
2298
+
2299
+ Returns dict with `valid`, `errors`, `warnings`, and `summary` keys.
2300
+
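+ A sketch of non-raising validation that inspects the returned dict:
+
+ ```python
+ from diff_diff import validate_did_data
+
+ report = validate_did_data(data, outcome='outcome', treatment='treated',
+                            time='post', unit='firm_id', raise_on_error=False)
+ if not report['valid']:
+     for err in report['errors']:
+         print(f"ERROR: {err}")
+ ```
+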
2301
+ #### summarize_did_data
2302
+
2303
+ ```python
2304
+ summarize_did_data(
2305
+ data, # Input DataFrame
2306
+ outcome, # Outcome column name
2307
+ treatment, # Treatment column name
2308
+ time, # Time/post column name
2309
+ unit=None # Unit column (optional)
2310
+ )
2311
+ ```
2312
+
2313
+ Returns DataFrame with summary statistics by treatment-time cell.
2314
+
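+ For example:
+
+ ```python
+ from diff_diff import summarize_did_data
+
+ summary = summarize_did_data(data, outcome='outcome',
+                              treatment='treated', time='post')
+ print(summary)  # summary statistics by treatment-time cell
+ ```
+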
2315
+ #### create_event_time
2316
+
2317
+ ```python
2318
+ create_event_time(
2319
+ data, # Panel DataFrame
2320
+ time_column, # Calendar time column
2321
+ treatment_time_column, # Column with treatment timing
2322
+ new_column='event_time' # Output column name
2323
+ )
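+ A sketch assuming a `first_treat` column with each unit's treatment period:
+
+ ```python
+ from diff_diff import create_event_time
+
+ data = create_event_time(data, time_column='period',
+                          treatment_time_column='first_treat')
+ # event_time < 0: pre-treatment; event_time == 0: treatment onset
+ ```
+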
2324
+ ```
2325
+
2326
+ #### aggregate_to_cohorts
2327
+
2328
+ ```python
2329
+ aggregate_to_cohorts(
2330
+ data, # Unit-level panel data
2331
+ unit_column, # Unit identifier column
2332
+ time_column, # Time period column
2333
+ treatment_column, # Treatment indicator column
2334
+ outcome, # Outcome variable column
2335
+ covariates=None # Additional columns to aggregate
2336
+ )
2337
+ ```
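+ A sketch aggregating unit-level data to treatment cohorts:
+
+ ```python
+ from diff_diff import aggregate_to_cohorts
+
+ cohort_data = aggregate_to_cohorts(
+     data,
+     unit_column='firm_id',
+     time_column='period',
+     treatment_column='treated',
+     outcome='outcome'
+ )
+ ```
+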
2338
+
2339
+ #### rank_control_units
2340
+
2341
+ ```python
2342
+ rank_control_units(
2343
+ data, # Panel data in long format
2344
+ unit_column, # Unit identifier column
2345
+ time_column, # Time period column
2346
+ outcome_column, # Outcome variable column
2347
+ treatment_column=None, # Treatment indicator column (0/1)
2348
+ treated_units=None, # Explicit list of treated unit IDs
2349
+ pre_periods=None, # Pre-treatment periods (default: first half)
2350
+ covariates=None, # Covariate columns for matching
2351
+ outcome_weight=0.7, # Weight for outcome trend similarity (0-1)
2352
+ covariate_weight=0.3, # Weight for covariate distance (0-1)
2353
+ exclude_units=None, # Units to exclude from control pool
2354
+ require_units=None, # Units that must appear in output
2355
+ n_top=None, # Return only top N controls
2356
+ suggest_treatment_candidates=False, # Identify treatment candidates
2357
+ n_treatment_candidates=5, # Number of treatment candidates
2358
+ lambda_reg=0.0 # Regularization for synthetic weights
2359
+ )
2360
+ ```
2361
+
2362
+ Returns DataFrame with columns: `unit`, `quality_score`, `outcome_trend_score`, `covariate_score`, `synthetic_weight`, `pre_trend_rmse`, `is_required`.
2363
+
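+ A sketch ranking the control pool before a synthetic-control style analysis:
+
+ ```python
+ from diff_diff import rank_control_units
+
+ ranking = rank_control_units(
+     data,
+     unit_column='firm_id',
+     time_column='period',
+     outcome_column='outcome',
+     treatment_column='treated',
+     n_top=10
+ )
+ print(ranking[['unit', 'quality_score', 'synthetic_weight']])
+ ```
+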
2364
+ ## Requirements
2365
+
2366
+ - Python >= 3.9
2367
+ - numpy >= 1.20
2368
+ - pandas >= 1.3
2369
+ - scipy >= 1.7
2370
+
2371
+ ## Development
2372
+
2373
+ ```bash
2374
+ # Install with dev dependencies
2375
+ pip install -e ".[dev]"
2376
+
2377
+ # Run tests
2378
+ pytest
2379
+
2380
+ # Format code
2381
+ black diff_diff tests
2382
+ ruff check diff_diff tests
2383
+ ```
2384
+
2385
+ ## References
2386
+
2387
+ This library implements methods from the following scholarly works:
2388
+
2389
+ ### Difference-in-Differences
2390
+
2391
+ - **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. [https://doi.org/10.2307/1924810](https://doi.org/10.2307/1924810)
2392
+
2393
+ - **Card, D., & Krueger, A. B. (1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. [https://www.jstor.org/stable/2118030](https://www.jstor.org/stable/2118030)
2394
+
2395
+ - **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences.
2396
+
2397
+ ### Two-Way Fixed Effects
2398
+
2399
+ - **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
2400
+
2401
+ - **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. [https://doi.org/10.1017/pan.2020.33](https://doi.org/10.1017/pan.2020.33)
2402
+
2403
+ ### Robust Standard Errors
2404
+
2405
+ - **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. [https://doi.org/10.2307/1912934](https://doi.org/10.2307/1912934)
2406
+
2407
+ - **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." *Journal of Econometrics*, 29(3), 305-325. [https://doi.org/10.1016/0304-4076(85)90158-7](https://doi.org/10.1016/0304-4076(85)90158-7)
2408
+
2409
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." *Journal of Business & Economic Statistics*, 29(2), 238-249. [https://doi.org/10.1198/jbes.2010.07136](https://doi.org/10.1198/jbes.2010.07136)
2410
+
2411
+ ### Wild Cluster Bootstrap
2412
+
2413
+ - **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. [https://doi.org/10.1162/rest.90.3.414](https://doi.org/10.1162/rest.90.3.414)
2414
+
2415
+ - **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. [https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf](https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf)
2416
+
2417
+ - **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. [https://doi.org/10.1111/ectj.12107](https://doi.org/10.1111/ectj.12107)
2418
+
2419
+ ### Placebo Tests and DiD Diagnostics
2420
+
2421
+ - **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. [https://doi.org/10.1162/003355304772839588](https://doi.org/10.1162/003355304772839588)
2422
+
2423
+ ### Synthetic Control Method
2424
+
2425
+ - **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. [https://doi.org/10.1257/000282803321455188](https://doi.org/10.1257/000282803321455188)
2426
+
2427
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." *Journal of the American Statistical Association*, 105(490), 493-505. [https://doi.org/10.1198/jasa.2009.ap08746](https://doi.org/10.1198/jasa.2009.ap08746)
2428
+
2429
+ - **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. [https://doi.org/10.1111/ajps.12116](https://doi.org/10.1111/ajps.12116)
2430
+
2431
+ ### Synthetic Difference-in-Differences
2432
+
2433
+ - **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)
2434
+
2435
+ ### Triply Robust Panel (TROP)
2436
+
2437
+ - **Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025).** "Triply Robust Panel Estimators." *Working Paper*. [https://arxiv.org/abs/2508.21536](https://arxiv.org/abs/2508.21536)
2438
+
2439
+ This paper introduces the TROP estimator, which combines three robustness components:
2440
+ - **Factor model adjustment**: Low-rank factor structure via SVD removes unobserved confounders
2441
+ - **Unit weights**: Synthetic control style weighting for optimal comparison
2442
+ - **Time weights**: SDID style time weighting for informative pre-periods
2443
+
2444
+ TROP is particularly useful when unobserved confounders with a factor structure affect different units differently over time.
2445
+
2446
+ ### Triple Difference (DDD)
2447
+
2448
+ - **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)
2449
+
2450
+ This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.
2451
+
2452
+ - **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)
2453
+
2454
+ Classic paper introducing the Triple Difference design for policy evaluation.
2455
+
2456
+ - **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)
2457
+
2458
+ ### Parallel Trends and Pre-Trend Testing
2459
+
2460
+ - **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
2461
+
2462
+ - **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
2463
+
2464
+ ### Honest DiD / Sensitivity Analysis
2465
+
2466
+ The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
2467
+
2468
+ - **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. [https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
2469
+
2470
+ This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
2471
+ - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations
2472
+ - **Smoothness (ΔSD)**: Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends
2473
+ - **Breakdown Analysis**: Finding the smallest violation magnitude that would overturn conclusions
2474
+ - **Robust Confidence Intervals**: Valid inference under partial identification
2475
+
2476
+ - **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
2477
+
2478
+ Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
2479
+
2480
+ ### Multi-Period and Staggered Adoption
2481
+
2482
+ - **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)
2483
+
2484
+ - **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
2485
+
2486
+ - **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. [https://doi.org/10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016/j.jeconom.2020.09.006)
2487
+
2488
+ - **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. [https://doi.org/10.1257/aer.20181169](https://doi.org/10.1257/aer.20181169)
2489
+
2490
+ - **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)
2491
+
2492
+ ### Power Analysis
2493
+
2494
+ - **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)
2495
+
2496
+ - **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458. [https://doi.org/10.1016/j.jdeveco.2020.102458](https://doi.org/10.1016/j.jdeveco.2020.102458)
2497
+
2498
+ Essential reference for power analysis in panel DiD designs. Discusses how serial correlation affects power and provides formulas for panel data settings.
2499
+
2500
+ - **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. [https://doi.org/10.1080/19439342.2016.1244555](https://doi.org/10.1080/19439342.2016.1244555)
2501
+
2502
+ ### General Causal Inference
2503
+
2504
+ - **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press.
2505
+
2506
+ - **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. [https://mixtape.scunning.com/](https://mixtape.scunning.com/)
2507
+
2508
+ ## License
2509
+
2510
+ MIT License
2511
+