diffindiff 2.2.5__tar.gz → 2.2.7__tar.gz
- {diffindiff-2.2.5 → diffindiff-2.2.7}/PKG-INFO +10 -9
- {diffindiff-2.2.5 → diffindiff-2.2.7}/README.md +8 -7
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/config.py +8 -5
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didanalysis.py +14 -9
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didanalysis_helper.py +11 -8
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/diddata.py +199 -43
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didtools.py +76 -17
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/tests_diffindiff.py +3 -3
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/PKG-INFO +10 -9
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/requires.txt +0 -1
- {diffindiff-2.2.5 → diffindiff-2.2.7}/setup.py +3 -4
- {diffindiff-2.2.5 → diffindiff-2.2.7}/MANIFEST.in +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/__init__.py +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/__init__.py +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/data/Corona_Hesse.xlsx +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/data/counties_DE.csv +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/data/curfew_DE.csv +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/SOURCES.txt +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/dependency_links.txt +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/top_level.txt +0 -0
- {diffindiff-2.2.5 → diffindiff-2.2.7}/setup.cfg +0 -0
{diffindiff-2.2.5 → diffindiff-2.2.7}/PKG-INFO
@@ -1,12 +1,12 @@
 Metadata-Version: 2.1
 Name: diffindiff
-Version: 2.2.
-Summary: diffindiff: Python library for convenient Difference-in-Differences
+Version: 2.2.7
+Summary: diffindiff: Python library for convenient Difference-in-Differences analyses
 Author: Thomas Wieland
 Author-email: geowieland@googlemail.com
 Description-Content-Type: text/markdown
 
-# diffindiff:
+# diffindiff: Python library for convenient Difference-in-Differences analyses
 
 This Python library is designed for performing Difference-in-Differences (DiD) analyses in a convenient way. It allows users to construct datasets, define treatment and control groups, and set treatment periods. DiD model analyses may be conducted with both datasets created by built-in functions and ready-to-use external datasets. Both simultaneous and staggered adoption are supported. The library allows for various extensions, such as two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimations. Additionally, it includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments. Furthermore, several functions for rigorous treatment setting and data diagnostics are incorporated.
 
@@ -20,14 +20,14 @@ Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geo
 
 - 📦 PyPI: [diffindiff](https://pypi.org/project/diffindiff/)
 - 💻 GitHub Repository: [diffindiff_official](https://github.com/geowieland/diffindiff_official)
-- 📄 DOI (Zenodo): [10.5281/zenodo.
+- 📄 DOI (Zenodo): [10.5281/zenodo.18656820](https://doi.org/10.5281/zenodo.18656820)
 
 
 ## Citation
 
 If you use this software, please cite:
 
-Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.
+Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.7) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820
 
 
 ## Installation
@@ -167,8 +167,9 @@ See the /tests directory for usage examples of most of the included functions.
 - Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*.
 
 
-## What's new (v2.2.
+## What's new (v2.2.7)
+- Functions
+- diddata.DiffData.define_treatment() for constructing a new treatment from a column in the dataframe
 - Bugfixes:
--
--
-- Update README
+- didtools.treatment_times() and didtools.is_multiple_treatment_period() now also identify continuous treatments correctly
+- Fixed problematic type conversion in didtools.fit_metrics()
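The define_treatment() entry above describes building a treatment from a column that already exists in the data. As a rough illustration of that idea, the sketch below derives a treatment group, a control group, and an overall treatment period from such a column using plain pandas. It does not call the diffindiff API; the toy panel and the column names `unit`, `date`, and `lockdown` are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical toy panel; the column names are illustrative only
panel = pd.DataFrame({
    "unit": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"] * 3,
    "lockdown": [0, 1, 1, 0, 0, 1, 0, 0, 0],
})

# Observations with a positive value in the treatment column count as treated
treated_rows = panel[panel["lockdown"] > 0]

# Units with at least one treated observation form the treatment group,
# all remaining units form the control group
treatment_group = sorted(treated_rows["unit"].unique())
control_group = sorted(set(panel["unit"]) - set(treatment_group))

# Overall treatment period: earliest and latest treated date
treated_dates = pd.to_datetime(treated_rows["date"], format="%Y-%m-%d")
treatment_period = [
    treated_dates.min().strftime("%Y-%m-%d"),
    treated_dates.max().strftime("%Y-%m-%d"),
]
```

Here unit C is never treated and ends up in the control group, while the treatment period spans the first to the last date on which any unit is treated.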
{diffindiff-2.2.5 → diffindiff-2.2.7}/README.md
@@ -1,4 +1,4 @@
-# diffindiff:
+# diffindiff: Python library for convenient Difference-in-Differences analyses
 
 This Python library is designed for performing Difference-in-Differences (DiD) analyses in a convenient way. It allows users to construct datasets, define treatment and control groups, and set treatment periods. DiD model analyses may be conducted with both datasets created by built-in functions and ready-to-use external datasets. Both simultaneous and staggered adoption are supported. The library allows for various extensions, such as two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimations. Additionally, it includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments. Furthermore, several functions for rigorous treatment setting and data diagnostics are incorporated.
 
@@ -12,14 +12,14 @@ Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geo
 
 - 📦 PyPI: [diffindiff](https://pypi.org/project/diffindiff/)
 - 💻 GitHub Repository: [diffindiff_official](https://github.com/geowieland/diffindiff_official)
-- 📄 DOI (Zenodo): [10.5281/zenodo.
+- 📄 DOI (Zenodo): [10.5281/zenodo.18656820](https://doi.org/10.5281/zenodo.18656820)
 
 
 ## Citation
 
 If you use this software, please cite:
 
-Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.
+Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.7) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820
 
 
 ## Installation
@@ -159,8 +159,9 @@ See the /tests directory for usage examples of most of the included functions.
 - Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*.
 
 
-## What's new (v2.2.
+## What's new (v2.2.7)
+- Functions
+- diddata.DiffData.define_treatment() for constructing a new treatment from a column in the dataframe
 - Bugfixes:
--
--
-- Update README
+- didtools.treatment_times() and didtools.is_multiple_treatment_period() now also identify continuous treatments correctly
+- Fixed problematic type conversion in didtools.fit_metrics()
{diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/config.py
@@ -4,22 +4,25 @@
 # Author: Thomas Wieland
 # ORCID: 0000-0001-5168-9846
 # mail: geowieland@googlemail.com
-# Version: 1.0.
-# Last update:
-# Copyright (c) 2025 Thomas Wieland
+# Version: 1.0.6
+# Last update: 2026-02-26 18:04
+# Copyright (c) 2025-2026 Thomas Wieland
 #-----------------------------------------------------------------------
 
 # Basic config:
 
-
+PACKAGE_NAME = "diffindiff"
+PACKAGE_VERSION = "2.2.7"
 
-VERBOSE =
+VERBOSE = True
 
 ROUND_STATISTIC = 3
 ROUND_PERCENT = 2
 
 AUTO_SWITCH_TO_PREPOST = True
 
+ACCEPT_CONTINUOUS_TREATMENTS = True
+
 # Description texts:
 
 DID_DESCRIPTION = "Difference-in-Differences Analysis"
{diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didanalysis.py
@@ -4,15 +4,14 @@
 # Author: Thomas Wieland
 # ORCID: 0000-0001-5168-9846
 # mail: geowieland@googlemail.com
-# Version: 2.2.
-# Last update:
-# Copyright (c)
+# Version: 2.2.4
+# Last update: 2026-02-26 18:04
+# Copyright (c) 2024-2026 Thomas Wieland
 #-----------------------------------------------------------------------
 
 
 import pandas as pd
 import numpy as np
-from math import isnan
 import matplotlib.pyplot as plt
 from matplotlib.dates import DateFormatter
 import diffindiff.didtools as tools
|
|
@@ -930,7 +929,7 @@ class DiffModel:
|
|
|
930
929
|
if "TG" in plot_intervals_groups and "CG" in plot_intervals_groups:
|
|
931
930
|
lines_labels_required = lines_labels_required+2
|
|
932
931
|
assert len(lines_col) == lines_col_required, f"Parameter 'lines_col' must be a list with {lines_col_required} entries"
|
|
933
|
-
assert len(lines_style) == lines_style_required, f"Parameter 'lines_style' must be a list with {
|
|
932
|
+
assert len(lines_style) == lines_style_required, f"Parameter 'lines_style' must be a list with {lines_style_required} entries"
|
|
934
933
|
assert len(lines_labels) == lines_labels_required, f"Parameter 'lines_labels' must be a list with {lines_labels_required} entries"
|
|
935
934
|
|
|
936
935
|
model_data = self.data[2]
|
|
@@ -1357,8 +1356,8 @@ def did_analysis(
|
|
|
1357
1356
|
missing_replace_by_zero: bool = False,
|
|
1358
1357
|
fit_by = "ols_fit",
|
|
1359
1358
|
verbose: bool = config.VERBOSE
|
|
1360
|
-
):
|
|
1361
|
-
|
|
1359
|
+
):
|
|
1360
|
+
|
|
1362
1361
|
tools.check_columns(
|
|
1363
1362
|
df = data,
|
|
1364
1363
|
columns = [
|
|
@@ -1385,6 +1384,12 @@ def did_analysis(
|
|
|
1385
1384
|
verbose = verbose
|
|
1386
1385
|
)
|
|
1387
1386
|
|
|
1387
|
+
tools.is_numeric(
|
|
1388
|
+
df = data,
|
|
1389
|
+
columns = treatment_col,
|
|
1390
|
+
verbose = verbose
|
|
1391
|
+
)
|
|
1392
|
+
|
|
1388
1393
|
cols_relevant = [
|
|
1389
1394
|
unit_col,
|
|
1390
1395
|
time_col,
|
|
@@ -1808,7 +1813,7 @@ def did_analysis(
|
|
|
1808
1813
|
}
|
|
1809
1814
|
|
|
1810
1815
|
if bonferroni:
|
|
1811
|
-
confint_alpha = confint_alpha/no_treatments
|
|
1816
|
+
confint_alpha = confint_alpha/no_treatments
|
|
1812
1817
|
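The `bonferroni` branch above divides the significance level by the number of treatments. A minimal sketch of that adjustment is given below; the function name `bonferroni_alpha` and the example numbers are assumptions for illustration and are not part of the package.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# Five treatments tested at an overall level of 5 %
adjusted_alpha = bonferroni_alpha(0.05, 5)

# Each confidence interval is then built at this stricter level
confidence_level = 1 - adjusted_alpha
```

With five treatments and an overall level of 0.05, each interval is computed at roughly the 99 % level instead of the 95 % level.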
|
|
1813
1818
|
if fit_by == "ml":
|
|
1814
1819
|
fit_result = helper.ml_fit(
|
|
@@ -1825,7 +1830,7 @@ def did_analysis(
|
|
|
1825
1830
|
cluster_SE_by = cluster_SE_by,
|
|
1826
1831
|
verbose = verbose
|
|
1827
1832
|
)
|
|
1828
|
-
|
|
1833
|
+
|
|
1829
1834
|
model_results = helper.extract_model_results(
|
|
1830
1835
|
fit_result = fit_result,
|
|
1831
1836
|
TG_col = TG_col,
{diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didanalysis_helper.py
@@ -4,9 +4,9 @@
 # Author: Thomas Wieland
 # ORCID: 0000-0001-5168-9846
 # mail: geowieland@googlemail.com
-# Version: 1.0.
-# Last update: 2025-
-# Copyright (c) 2025 Thomas Wieland
+# Version: 1.0.7
+# Last update: 2025-02-26 18:02
+# Copyright (c) 2025-2026 Thomas Wieland
 #-----------------------------------------------------------------------
 
 import pandas as pd
|
|
@@ -203,7 +203,7 @@ def create_spillover(
|
|
|
203
203
|
time_col = time_col,
|
|
204
204
|
treatment_col = treatment,
|
|
205
205
|
create_TT_col = TT_col,
|
|
206
|
-
verbose =
|
|
206
|
+
verbose = False
|
|
207
207
|
)[0]
|
|
208
208
|
|
|
209
209
|
sp_unit_col = f"{config.SPILLOVER_UNIT_PREFIX}{config.DELIMITER}{treatment}"
|
|
@@ -396,7 +396,11 @@ def treatment_diagnostics(
|
|
|
396
396
|
)
|
|
397
397
|
|
|
398
398
|
if verbose:
|
|
399
|
-
|
|
399
|
+
|
|
400
|
+
if no_treatments > 1:
|
|
401
|
+
print(f"There are {no_treatments} treatments (simultaneous: {no_treatments-staggered_count}, staggered: {staggered_count}) with {untreated[0]} treated and {untreated[1]} untreated units.")
|
|
402
|
+
else:
|
|
403
|
+
print(f"There is {no_treatments} treatment (staggered: {staggered_count}) with {untreated[0]} treated and {untreated[1]} untreated units.")
|
|
400
404
|
|
|
401
405
|
return [
|
|
402
406
|
treatment_diagnostics_results,
|
|
@@ -918,10 +922,9 @@ def create_timestamp(function):
|
|
|
918
922
|
now = datetime.now()
|
|
919
923
|
|
|
920
924
|
timestamp_dict = {
|
|
921
|
-
"package_version": f"
|
|
925
|
+
"package_version": f"{config.PACKAGE_NAME} {config.PACKAGE_VERSION}",
|
|
922
926
|
"function": function,
|
|
923
927
|
"datetime": now.strftime("%Y-%m-%d %H-%M-%S")
|
|
924
928
|
}
|
|
925
929
|
|
|
926
|
-
return timestamp_dict
|
|
927
|
-
|
|
930
|
+
return timestamp_dict
{diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/diddata.py
@@ -4,9 +4,9 @@
 # Author: Thomas Wieland
 # ORCID: 0000-0001-5168-9846
 # mail: geowieland@googlemail.com
-# Version: 2.1.
-# Last update:
-# Copyright (c)
+# Version: 2.1.8
+# Last update: 2026-02-26 18:30
+# Copyright (c) 2024-2026 Thomas Wieland
 #-----------------------------------------------------------------------
 
 
|
|
@@ -76,29 +76,34 @@ class DiffGroups:
|
|
|
76
76
|
verbose: bool = config.VERBOSE
|
|
77
77
|
):
|
|
78
78
|
|
|
79
|
-
groups_config = self.data[1]
|
|
79
|
+
groups_config = self.data[1]
|
|
80
80
|
|
|
81
81
|
if groups_config["DDD"]:
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
82
|
+
|
|
83
|
+
print("DiffGroups object already includes a benefit group. No segmentation added.")
|
|
84
|
+
|
|
85
|
+
groups = self
|
|
86
|
+
|
|
87
|
+
else:
|
|
86
88
|
|
|
87
|
-
|
|
89
|
+
if verbose:
|
|
90
|
+
print(f"Adding benefit group with {len(group_benefit)} units to groups data", end = " ... ")
|
|
91
|
+
|
|
92
|
+
groups_data = self.data[0]
|
|
93
|
+
|
|
94
|
+
groups_data[config.BG_COL] = 0
|
|
95
|
+
groups_data.loc[groups_data[config.UNIT_COL].astype(str).isin(group_benefit), config.BG_COL] = 1
|
|
96
|
+
|
|
97
|
+
groups_config["DDD"] = True
|
|
88
98
|
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
99
|
+
groups = DiffGroups(
|
|
100
|
+
groups_data,
|
|
101
|
+
groups_config,
|
|
102
|
+
timestamp = helper.create_timestamp(function="add_segmentation")
|
|
103
|
+
)
|
|
93
104
|
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
groups_config,
|
|
97
|
-
timestamp = helper.create_timestamp(function="add_segmentation")
|
|
98
|
-
)
|
|
99
|
-
|
|
100
|
-
if verbose:
|
|
101
|
-
print("OK")
|
|
105
|
+
if verbose:
|
|
106
|
+
print("OK")
|
|
102
107
|
|
|
103
108
|
return groups
|
|
104
109
|
|
|
@@ -255,9 +260,16 @@ def create_treatment(
|
|
|
255
260
|
after_treatment_period: bool = False,
|
|
256
261
|
verbose = config.VERBOSE
|
|
257
262
|
):
|
|
263
|
+
|
|
264
|
+
check_dates = tools.check_date_format(
|
|
265
|
+
dates = study_period+treatment_period,
|
|
266
|
+
date_format = date_format
|
|
267
|
+
)
|
|
268
|
+
if check_dates[0]:
|
|
269
|
+
raise ValueError(f"Study and/or treatment period include invalid dates: {', '.join(check_dates[1])}.")
|
|
258
270
|
|
|
259
271
|
TT_col = config.TT_COL
|
|
260
|
-
|
|
272
|
+
|
|
261
273
|
if treatment_name is not None:
|
|
262
274
|
|
|
263
275
|
if not isinstance(treatment_name, str):
|
|
@@ -474,7 +486,7 @@ class DiffData:
|
|
|
474
486
|
variables: list = None,
|
|
475
487
|
unit_col: str = None,
|
|
476
488
|
time_col: str = None,
|
|
477
|
-
verbose: bool =
|
|
489
|
+
verbose: bool = False
|
|
478
490
|
):
|
|
479
491
|
|
|
480
492
|
if unit_col is None and time_col is None:
|
|
@@ -567,6 +579,7 @@ class DiffData:
|
|
|
567
579
|
|
|
568
580
|
self.data[0] = did_modeldata
|
|
569
581
|
self.data[5] = variables
|
|
582
|
+
self.data[7][len(self.data[7])] = helper.create_timestamp(function="add_covariates")
|
|
570
583
|
|
|
571
584
|
if verbose:
|
|
572
585
|
print("OK")
|
|
@@ -610,7 +623,6 @@ class DiffData:
|
|
|
610
623
|
groups_data_old = did_groups_old.get_data()
|
|
611
624
|
|
|
612
625
|
did_modeldata_old = self.get_did_modeldata_df()
|
|
613
|
-
unit_id_col, time_col = self.get_unit_time_cols()
|
|
614
626
|
outcome_col_original = self.data[3]
|
|
615
627
|
unit_time_col_original = self.get_unit_time_cols()
|
|
616
628
|
covariates = self.get_covariates()
|
|
@@ -716,21 +728,157 @@ class DiffData:
|
|
|
716
728
|
timestamp = helper.create_timestamp(function="add_treatment")
|
|
717
729
|
)
|
|
718
730
|
|
|
719
|
-
|
|
720
|
-
|
|
721
|
-
|
|
722
|
-
|
|
723
|
-
|
|
724
|
-
|
|
725
|
-
|
|
726
|
-
|
|
727
|
-
|
|
731
|
+
if verbose:
|
|
732
|
+
print("OK")
|
|
733
|
+
|
|
734
|
+
self.data[0] = did_modeldata_new
|
|
735
|
+
self.data[1] = groups_new
|
|
736
|
+
self.data[2] = treatment_new
|
|
737
|
+
self.data[3] = outcome_col_original
|
|
738
|
+
self.data[4] = unit_time_col_original
|
|
739
|
+
self.data[5] = covariates
|
|
740
|
+
self.data[6] = treatment_cols_new
|
|
741
|
+
self.data[7][len(self.data[7])] = helper.create_timestamp(function="add_treatment")
|
|
742
|
+
|
|
743
|
+
return self
|
|
744
|
+
|
|
745
|
+
def define_treatment(
|
|
746
|
+
self,
|
|
747
|
+
treatment_name,
|
|
748
|
+
after_treatment_period: bool = False,
|
|
749
|
+
after_treatment_name = None,
|
|
750
|
+
verbose: bool = config.VERBOSE
|
|
751
|
+
):
|
|
752
|
+
|
|
753
|
+
if not treatment_name:
|
|
754
|
+
raise ValueError("When adding a treatment from the data, you need to specify a treatment column with parameter treament_name = [your_treatment].")
|
|
755
|
+
|
|
756
|
+
if treatment_name not in self.get_did_modeldata_df().columns:
|
|
757
|
+
raise KeyError(f"Column '{treatment_name}' not in data frame")
|
|
758
|
+
|
|
759
|
+
did_treatment_old = self.get_did_treatment()
|
|
760
|
+
treatment_config_old = did_treatment_old.get_config()
|
|
761
|
+
treatment_meta_old = did_treatment_old.get_metadata()
|
|
762
|
+
no_treatments_old = treatment_meta_old["no_treatments"]
|
|
763
|
+
|
|
764
|
+
did_groups_old = self.get_did_groups()
|
|
765
|
+
groups_config_old = did_groups_old.get_config()
|
|
766
|
+
groups_data_old = did_groups_old.get_data()
|
|
767
|
+
|
|
768
|
+
did_modeldata_old = self.get_did_modeldata_df()
|
|
769
|
+
outcome_col_original = self.data[3]
|
|
770
|
+
unit_time_col_original = self.get_unit_time_cols()
|
|
771
|
+
covariates = self.get_covariates()
|
|
772
|
+
|
|
773
|
+
treatment_cols = self.get_treatment_cols()
|
|
774
|
+
treatment_cols_new = treatment_cols
|
|
775
|
+
|
|
776
|
+
no_treatments = no_treatments_old+1
|
|
777
|
+
key_counter = no_treatments-1
|
|
778
|
+
|
|
779
|
+
tt = tools.treatment_times(
|
|
780
|
+
data = did_modeldata_old,
|
|
781
|
+
unit_col=config.UNIT_COL,
|
|
782
|
+
time_col=config.TIME_COL,
|
|
783
|
+
treatment_col=treatment_name,
|
|
784
|
+
verbose=verbose
|
|
785
|
+
)
|
|
786
|
+
|
|
787
|
+
tt_date = [datetime.strptime(t, treatment_meta_old["date_format"]) for t in tt[1]]
|
|
788
|
+
treatment_period_start = min(tt_date)
|
|
789
|
+
treatment_period_end = max(tt_date)
|
|
790
|
+
treatment_period_start = treatment_period_start.strftime("%Y-%m-%d")
|
|
791
|
+
treatment_period_end = treatment_period_end.strftime("%Y-%m-%d")
|
|
792
|
+
|
|
793
|
+
is_notreatment_result = tools.is_notreatment(
|
|
794
|
+
data = did_modeldata_old,
|
|
795
|
+
unit_col=config.UNIT_COL,
|
|
796
|
+
treatment_col=treatment_name,
|
|
797
|
+
verbose = verbose
|
|
728
798
|
)
|
|
799
|
+
|
|
800
|
+
treatment_group = is_notreatment_result[1]
|
|
801
|
+
control_group = is_notreatment_result[2]
|
|
802
|
+
|
|
803
|
+
if verbose:
|
|
804
|
+
print(f"Constructing treatment from column '{treatment_name}'", end = " ... ")
|
|
805
|
+
|
|
806
|
+
new_groups = create_groups(
|
|
807
|
+
treatment_group = treatment_group,
|
|
808
|
+
control_group = control_group,
|
|
809
|
+
treatment_name = treatment_name,
|
|
810
|
+
verbose=False
|
|
811
|
+
)
|
|
812
|
+
new_groups_data_df = new_groups.get_data()[0]
|
|
813
|
+
new_groups_config = new_groups.get_config()
|
|
814
|
+
TG_col = new_groups_config[0]["TG_col"]
|
|
815
|
+
|
|
816
|
+
new_treatment = create_treatment(
|
|
817
|
+
study_period = [treatment_meta_old["study_period_start"], treatment_meta_old["study_period_end"]],
|
|
818
|
+
treatment_period = [treatment_period_start, treatment_period_end],
|
|
819
|
+
freq = treatment_meta_old["frequency"],
|
|
820
|
+
date_format = treatment_meta_old["date_format"],
|
|
821
|
+
treatment_name = treatment_name,
|
|
822
|
+
pre_post = treatment_meta_old["pre_post"],
|
|
823
|
+
after_treatment_period = after_treatment_period,
|
|
824
|
+
verbose=False
|
|
825
|
+
)
|
|
826
|
+
|
|
827
|
+
new_treatment_data_df = new_treatment.get_data()
|
|
828
|
+
|
|
829
|
+
new_treatment_config = new_treatment.get_config()
|
|
830
|
+
TT_col = new_treatment_config[0]["TT_col"]
|
|
831
|
+
ATT_col = new_treatment_config[0]["ATT_col"]
|
|
832
|
+
|
|
833
|
+
treatment_cols_new[key_counter] = {
|
|
834
|
+
"TT_col": TT_col,
|
|
835
|
+
"ATT_col": ATT_col,
|
|
836
|
+
"treatment_name": treatment_name,
|
|
837
|
+
"after_treatment_name": after_treatment_name
|
|
838
|
+
}
|
|
839
|
+
|
|
840
|
+
groups_config_new = groups_config_old
|
|
841
|
+
groups_config_new[key_counter] = new_groups_config[0]
|
|
842
|
+
groups_data_new = groups_data_old
|
|
843
|
+
groups_data_old.append(new_groups_data_df)
|
|
844
|
+
groups_new = DiffGroups(
|
|
845
|
+
groups_data_new,
|
|
846
|
+
groups_config_new,
|
|
847
|
+
timestamp = helper.create_timestamp(function="define_treatment")
|
|
848
|
+
)
|
|
849
|
+
|
|
850
|
+
treatment_meta_new = treatment_meta_old
|
|
851
|
+
treatment_meta_new["no_treatments"] = no_treatments
|
|
852
|
+
treatment_config_new = treatment_config_old
|
|
853
|
+
treatment_config_new[key_counter] = new_treatment_config[0]
|
|
854
|
+
|
|
855
|
+
treatment_new = DiffTreatment(
|
|
856
|
+
new_treatment_data_df,
|
|
857
|
+
treatment_config_new,
|
|
858
|
+
treatment_meta_new,
|
|
859
|
+
timestamp = helper.create_timestamp(function="define_treatment")
|
|
860
|
+
)
|
|
729
861
|
|
|
730
862
|
if verbose:
|
|
731
863
|
print("OK")
|
|
732
|
-
|
|
733
|
-
|
|
864
|
+
|
|
865
|
+
if treatment_name in covariates:
|
|
866
|
+
|
|
867
|
+
if verbose:
|
|
868
|
+
print(f"NOTE: Column '{treatment_name}' was defined as covariate before and is now removed from covariates list.")
|
|
869
|
+
|
|
870
|
+
covariates.remove(treatment_name)
|
|
871
|
+
|
|
872
|
+
self.data[0] = did_modeldata_old
|
|
873
|
+
self.data[1] = groups_new
|
|
874
|
+
self.data[2] = treatment_new
|
|
875
|
+
self.data[3] = outcome_col_original
|
|
876
|
+
self.data[4] = unit_time_col_original
|
|
877
|
+
self.data[5] = covariates
|
|
878
|
+
self.data[6] = treatment_cols_new
|
|
879
|
+
self.data[7][len(self.data[7])] = helper.create_timestamp(function="define_treatment")
|
|
880
|
+
|
|
881
|
+
return self
|
|
734
882
|
|
|
735
883
|
def add_segmentation(
|
|
736
884
|
self,
|
|
@@ -967,8 +1115,8 @@ class DiffData:
|
|
|
967
1115
|
if value["after_treatment_name"] is not None:
|
|
968
1116
|
after_treatment_col[key] = value["after_treatment_name"]
|
|
969
1117
|
if value["ATT_col"] is not None:
|
|
970
|
-
ATT_col[key] = value["ATT_col"]
|
|
971
|
-
|
|
1118
|
+
ATT_col[key] = value["ATT_col"]
|
|
1119
|
+
|
|
972
1120
|
did_results = didanalysis.did_analysis(
|
|
973
1121
|
data = did_modeldata,
|
|
974
1122
|
TG_col = TG_col,
|
|
@@ -1016,9 +1164,15 @@ def merge_data(
|
|
|
1016
1164
|
keep_columns: bool = False,
|
|
1017
1165
|
verbose: bool = config.VERBOSE
|
|
1018
1166
|
):
|
|
1019
|
-
|
|
1020
|
-
|
|
1021
|
-
|
|
1167
|
+
|
|
1168
|
+
tools.check_columns(
|
|
1169
|
+
df = outcome_data,
|
|
1170
|
+
columns = [
|
|
1171
|
+
unit_id_col,
|
|
1172
|
+
time_col,
|
|
1173
|
+
outcome_col
|
|
1174
|
+
]
|
|
1175
|
+
)
|
|
1022
1176
|
|
|
1023
1177
|
groups_data_df = diff_groups.get_data()
|
|
1024
1178
|
groups_data_df = groups_data_df[0]
|
|
@@ -1075,6 +1229,9 @@ def merge_data(
|
|
|
1075
1229
|
verbose=verbose
|
|
1076
1230
|
)
|
|
1077
1231
|
|
|
1232
|
+
if verbose:
|
|
1233
|
+
print("Merging groups and treatment data", end = " ... ")
|
|
1234
|
+
|
|
1078
1235
|
if keep_columns:
|
|
1079
1236
|
outcome_data_short = outcome_data
|
|
1080
1237
|
else:
|
|
@@ -1108,7 +1265,8 @@ def merge_data(
|
|
|
1108
1265
|
}
|
|
1109
1266
|
}
|
|
1110
1267
|
|
|
1111
|
-
timestamp =
|
|
1268
|
+
timestamp = {}
|
|
1269
|
+
timestamp[0] = helper.create_timestamp(function="merge_data")
|
|
1112
1270
|
|
|
1113
1271
|
did_data_all = DiffData(
|
|
1114
1272
|
did_modeldata,
|
|
@@ -1175,8 +1333,6 @@ def create_data(
|
|
|
1175
1333
|
verbose = verbose
|
|
1176
1334
|
)
|
|
1177
1335
|
|
|
1178
|
-
did_data_all.timestamp = helper.create_timestamp(function="create_data")
|
|
1179
|
-
|
|
1180
1336
|
return did_data_all
|
|
1181
1337
|
|
|
1182
1338
|
def create_counterfactual(
{diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didtools.py
@@ -4,15 +4,16 @@
 # Author: Thomas Wieland
 # ORCID: 0000-0001-5168-9846
 # mail: geowieland@googlemail.com
-# Version: 2.1.
-# Last update:
-# Copyright (c) 2025 Thomas Wieland
+# Version: 2.1.6
+# Last update: 2026-02-26 18:33
+# Copyright (c) 2025-2026 Thomas Wieland
 #-----------------------------------------------------------------------
 
 
 import pandas as pd
 import numpy as np
 import re
+from datetime import datetime
 from collections.abc import Iterable
 from statsmodels.formula.api import ols
 from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, GradientBoostingRegressor
@@ -23,7 +24,6 @@ from xgboost import XGBRegressor
 from lightgbm import LGBMRegressor
 from sklearn.linear_model import LinearRegression
 from sklearn.model_selection import train_test_split
-from huff.goodness_of_fit import modelfit, modelfit_cat, modelfit_plot
 import diffindiff.config as config
 
 
|
|
@@ -46,6 +46,30 @@ def check_columns(
|
|
|
46
46
|
if missing_columns:
|
|
47
47
|
raise KeyError(f"Data do not contain column(s): {', '.join(missing_columns)}")
|
|
48
48
|
|
|
49
|
+
def is_numeric(
|
|
50
|
+
df: pd.DataFrame,
|
|
51
|
+
columns: list,
|
|
52
|
+
verbose: bool = config.VERBOSE
|
|
53
|
+
):
|
|
54
|
+
|
|
55
|
+
if len(columns) > 0:
|
|
56
|
+
|
|
57
|
+
if verbose:
|
|
58
|
+
print(f"Checking if column(s) {', '.join(columns)} are numeric", end=" ... ")
|
|
59
|
+
|
|
60
|
+
non_numeric_columns = []
|
|
61
|
+
|
|
62
|
+
for col in columns:
|
|
63
|
+
|
|
64
|
+
if not pd.api.types.is_numeric_dtype(df[col]):
|
|
65
|
+
non_numeric_columns.append(col)
|
|
66
|
+
|
|
67
|
+
if verbose:
|
|
68
|
+
print("OK")
|
|
69
|
+
|
|
70
|
+
if non_numeric_columns:
|
|
71
|
+
raise KeyError(f"Data contain non-numeric column(s): {', '.join(non_numeric_columns)}")
|
|
72
|
+
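The added `is_numeric()` check relies on `pd.api.types.is_numeric_dtype`. The sketch below shows that pandas call in isolation. The helper `non_numeric_columns` and the example frame are hypothetical, and unlike the function in the diff this helper returns the offending names instead of raising.

```python
import pandas as pd

def non_numeric_columns(df: pd.DataFrame, columns: list) -> list:
    """Return the names in `columns` whose dtype pandas does not treat as numeric."""
    return [col for col in columns if not pd.api.types.is_numeric_dtype(df[col])]

# Hypothetical example data: two numeric columns and one text column
df = pd.DataFrame({
    "treated": [0, 1, 1],
    "dose": [0.0, 0.5, 1.0],
    "label": ["no", "yes", "yes"],
})

bad = non_numeric_columns(df, ["treated", "dose", "label"])
```

In this sketch a single column name should be wrapped in a list, because iterating over a bare string yields its characters rather than the column name.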
|
|
49
73
|
def panel_index(
|
|
50
74
|
data: pd.DataFrame,
|
|
51
75
|
unit_col: str,
|
|
@@ -527,8 +551,11 @@ def is_multiple_treatment_period(
|
|
|
527
551
|
unit_treatment = data_sub[treatment_col]
|
|
528
552
|
|
|
529
553
|
groups = (unit_treatment != unit_treatment.shift()).cumsum()
|
|
530
|
-
|
|
531
|
-
|
|
554
|
+
|
|
555
|
+
if config.ACCEPT_CONTINUOUS_TREATMENTS:
|
|
556
|
+
periods_count = (unit_treatment > 0).groupby(groups).any().sum()
|
|
557
|
+
else:
|
|
558
|
+
periods_count = (unit_treatment == 1).groupby(groups).any().sum()
|
|
532
559
|
|
|
533
560
|
unit_treatment_periods[unit] = int(periods_count)
|
|
534
561
|
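The hunk above counts treatment periods by grouping consecutive identical values and checking which runs are treated. The following sketch reproduces that pattern outside the library. The function name `count_treatment_periods` and the `accept_continuous` argument are assumptions that stand in for the config switch shown in the diff.

```python
import pandas as pd

def count_treatment_periods(treatment: pd.Series, accept_continuous: bool = True) -> int:
    """Count runs of consecutive identical values that are flagged as treated."""
    # A new run starts whenever the value differs from the previous observation
    runs = (treatment != treatment.shift()).cumsum()
    # Continuous mode: any positive value is treated; binary mode: only exactly 1
    treated = treatment > 0 if accept_continuous else treatment == 1
    return int(treated.groupby(runs).any().sum())

binary = pd.Series([0, 0, 1, 1, 0, 1])        # dummy-coded treatment
dose = pd.Series([0.0, 0.5, 0.5, 0.0, 0.3])   # treatment intensity

binary_periods = count_treatment_periods(binary, accept_continuous=False)
dose_periods_strict = count_treatment_periods(dose, accept_continuous=False)
dose_periods = count_treatment_periods(dose, accept_continuous=True)
```

With the strict `== 1` comparison the intensity series shows no treatment period at all, while the `> 0` comparison finds two. Because runs are defined by identical consecutive values, a change in intensity between two adjacent treated observations starts a new run in this sketch.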
|
|
@@ -636,25 +663,31 @@ def treatment_times(
|
|
|
636
663
|
verbose=verbose
|
|
637
664
|
)
|
|
638
665
|
|
|
639
|
-
is_multiple_treatment_period(
|
|
666
|
+
is_multiple_treatment_period_result = is_multiple_treatment_period(
|
|
640
667
|
data = data,
|
|
641
668
|
unit_col = unit_col,
|
|
642
669
|
treatment_col = treatment_col,
|
|
643
670
|
verbose = verbose
|
|
644
|
-
)
|
|
671
|
+
)
|
|
645
672
|
|
|
646
673
|
if verbose:
|
|
647
674
|
print(f"Identifying treatment times for treatment '{treatment_col}'", end = " ... ")
|
|
648
675
|
|
|
649
|
-
|
|
650
|
-
|
|
676
|
+
if config.ACCEPT_CONTINUOUS_TREATMENTS:
|
|
677
|
+
tt = list(unique(data.loc[data[treatment_col] > 0, time_col]))
|
|
678
|
+
else:
|
|
679
|
+
tt = list(unique(data.loc[data[treatment_col] == 1, time_col]))
|
|
680
|
+
|
|
651
681
|
units = unique(data[unit_col])
|
|
652
682
|
|
|
653
683
|
units_tt = pd.DataFrame(columns = [unit_col, "treatment_min", "treatment_max"])
|
|
654
684
|
|
|
655
685
|
for unit in units:
|
|
656
686
|
|
|
657
|
-
|
|
687
|
+
if config.ACCEPT_CONTINUOUS_TREATMENTS:
|
|
688
|
+
data_unit_tt = data[(data[unit_col] == unit) & (data[treatment_col] > 0)]
|
|
689
|
+
else:
|
|
690
|
+
data_unit_tt = data[(data[unit_col] == unit) & (data[treatment_col] == 1)]
|
|
658
691
|
|
|
659
692
|
if data_unit_tt.empty:
|
|
660
693
|
continue
|
|
@@ -678,7 +711,7 @@ def treatment_times(
|
|
|
678
711
|
|
|
679
712
|
if verbose:
|
|
680
713
|
print("OK")
|
|
681
|
-
|
|
714
|
+
|
|
682
715
|
return [
|
|
683
716
|
units_tt,
|
|
684
717
|
tt
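`treatment_times()` returns a per-unit table with `treatment_min` and `treatment_max` plus a list of treated time points. A compact pandas sketch of the same kind of result is shown below. It is not the library's implementation, and the toy panel with its column names `unit`, `time`, and `dose` is assumed for the example.

```python
import pandas as pd

# Hypothetical panel with a continuous treatment intensity
panel = pd.DataFrame({
    "unit": ["A", "A", "A", "B", "B", "B"],
    "time": ["2021-01", "2021-02", "2021-03"] * 2,
    "dose": [0.0, 0.4, 0.8, 0.0, 0.0, 0.0],
})

treated = panel[panel["dose"] > 0]

# First and last treated time per unit; never-treated units drop out
units_tt = (
    treated.groupby("unit")["time"]
    .agg(treatment_min="min", treatment_max="max")
    .reset_index()
)

# All distinct time points at which any unit is treated
tt = sorted(treated["time"].unique())
```

Unit B never receives a positive dose, so it does not appear in `units_tt`, which mirrors the `continue` for empty per-unit subsets in the diff.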
|
|
@@ -796,9 +829,9 @@ def fit_metrics(
|
|
|
796
829
|
|
|
797
830
|
assert observed_no == expected_no, "Error while calculating fit metrics: Observed and expected differ in length"
|
|
798
831
|
|
|
799
|
-
if not pd.api.types.is_numeric_dtype(observed):
|
|
832
|
+
if not pd.api.types.is_numeric_dtype(observed) or not np.issubdtype(observed.dtype, np.number):
|
|
800
833
|
raise ValueError("Error while calculating fit metrics: Observed column is not numeric")
|
|
801
|
-
if not pd.api.types.is_numeric_dtype(expected):
|
|
834
|
+
if not pd.api.types.is_numeric_dtype(expected) or not np.issubdtype(expected.dtype, np.number):
|
|
802
835
|
raise ValueError("Error while calculating fit metrics: Expected column is not numeric")
|
|
803
836
|
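The two conditions above combine a pandas dtype check with a NumPy one. One case where the two are known to differ is boolean data, which pandas classifies as numeric and NumPy does not. The sketch below shows that difference; whether this is the case the change targets is not stated in the diff, and the helper name `is_strictly_numeric` is an assumption.

```python
import numpy as np
import pandas as pd

flags = pd.Series([True, False, True])
values = pd.Series([1.5, 2.0, 3.25])

def is_strictly_numeric(series: pd.Series) -> bool:
    """True only if both pandas and NumPy classify the dtype as numeric."""
    return bool(
        pd.api.types.is_numeric_dtype(series)
        and np.issubdtype(series.dtype, np.number)
    )

# pandas alone accepts boolean data as numeric
pandas_says_numeric = pd.api.types.is_numeric_dtype(flags)
```

In this example `flags` passes the pandas check but fails the combined one, while `values` passes both.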
|
|
804
837
|
if outcome_col is not None:
|
|
@@ -810,8 +843,8 @@ def fit_metrics(
|
|
|
810
843
|
|
|
811
844
|
if remove_nan:
|
|
812
845
|
|
|
813
|
-
observed =
|
|
814
|
-
expected =
|
|
846
|
+
observed = np.array(observed)
|
|
847
|
+
expected = np.array(expected)
|
|
815
848
|
|
|
816
849
|
obs_exp = pd.DataFrame(
|
|
817
850
|
{
|
|
@@ -947,4 +980,30 @@ def bool_to_YN(val):
|
|
|
947
980
|
if isinstance(val, bool):
|
|
948
981
|
return "YES" if val else "NO"
|
|
949
982
|
else:
|
|
950
|
-
return val
|
|
983
|
+
return val
|
|
984
|
+
|
|
985
|
+
def check_date_format(
|
|
986
|
+
dates: list = None,
|
|
987
|
+
date_format: str = "%Y-%m-%d"
|
|
988
|
+
):
|
|
989
|
+
|
|
990
|
+
if dates is None:
|
|
991
|
+
dates = []
|
|
992
|
+
|
|
993
|
+
invalid_dates_included = False
|
|
994
|
+
invalid_dates = []
|
|
995
|
+
|
|
996
|
+
for date in dates:
|
|
997
|
+
try:
|
|
998
|
+
datetime.strptime(date, date_format)
|
|
999
|
+
except (ValueError, TypeError):
|
|
1000
|
+
invalid_dates.append(date)
|
|
1001
|
+
|
|
1002
|
+
if len(invalid_dates) > 0:
|
|
1003
|
+
invalid_dates_included = True
|
|
1004
|
+
invalid_dates = [str(d) for d in invalid_dates]
|
|
1005
|
+
|
|
1006
|
+
return [
|
|
1007
|
+
invalid_dates_included,
|
|
1008
|
+
invalid_dates
|
|
1009
|
+
]
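`check_date_format()` validates date strings with `datetime.strptime`, and `create_treatment()` passes it the concatenated study and treatment periods. A standalone sketch of that validation pattern follows. The helper name `find_invalid_dates` and the example dates are hypothetical.

```python
from datetime import datetime

def find_invalid_dates(dates, date_format="%Y-%m-%d"):
    """Return the entries that cannot be parsed with `date_format`, as strings."""
    invalid = []
    for date in dates:
        try:
            datetime.strptime(date, date_format)
        except (ValueError, TypeError):
            # ValueError: wrong format or impossible date; TypeError: not a string
            invalid.append(str(date))
    return invalid

study_period = ["2020-01-01", "2020-12-31"]
treatment_period = ["2020-03-15", "2020-02-30"]  # 30 February does not exist

# List concatenation checks all four dates in one pass
invalid = find_invalid_dates(study_period + treatment_period)
```

Catching `TypeError` alongside `ValueError` means that non-string entries such as `None` are reported as invalid instead of aborting the check.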
{diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/tests_diffindiff.py
@@ -4,9 +4,9 @@
 # Author: Thomas Wieland
 # ORCID: 0000-0001-5168-9846
 # mail: geowieland@googlemail.com
-# Version: 2.0.
-# Last update:
-# Copyright (c) 2025 Thomas Wieland
+# Version: 2.0.11
+# Last update: 2026-02-20 17:44
+# Copyright (c) 2025-2026 Thomas Wieland
 #-----------------------------------------------------------------------
 
 
|
|
@@ -1,12 +1,12 @@
|
|
|
1
1
|
Metadata-Version: 2.1
|
|
2
2
|
Name: diffindiff
|
|
3
|
-
Version: 2.2.
|
|
4
|
-
Summary: diffindiff: Python library for convenient Difference-in-Differences
|
|
3
|
+
Version: 2.2.7
|
|
4
|
+
Summary: diffindiff: Python library for convenient Difference-in-Differences analyses
|
|
5
5
|
Author: Thomas Wieland
|
|
6
6
|
Author-email: geowieland@googlemail.com
|
|
7
7
|
Description-Content-Type: text/markdown
|
|
8
8
|
|
|
9
|
-
# diffindiff:
|
|
9
|
+
# diffindiff: Python library for convenient Difference-in-Differences analyses
|
|
10
10
|
|
|
11
11
|
This Python library is designed for performing Difference-in-Differences (DiD) analyses in a convenient way. It allows users to construct datasets, define treatment and control groups, and set treatment periods. DiD model analyses may be conducted with both datasets created by built-in functions and ready-to-use external datasets. Both simultaneous and staggered adoption are supported. The library allows for various extensions, such as two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimations. Additionally, it includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments. Furthermore, several functions for rigorous treatment setting and data diagnostics are incorporated.
|
|
12
12
|
|
|
@@ -20,14 +20,14 @@ Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geo
|
|
|
20
20
|
|
|
21
21
|
- 📦 PyPI: [diffindiff](https://pypi.org/project/diffindiff/)
|
|
22
22
|
- 💻 GitHub Repository: [diffindiff_official](https://github.com/geowieland/diffindiff_official)
|
|
23
|
-
- 📄 DOI (Zenodo): [10.5281/zenodo.
|
|
23
|
+
- 📄 DOI (Zenodo): [10.5281/zenodo.18656820](https://doi.org/10.5281/zenodo.18656820)
|
|
24
24
|
|
|
25
25
|
|
|
26
26
|
## Citation
|
|
27
27
|
|
|
28
28
|
If you use this software, please cite:
|
|
29
29
|
|
|
30
|
-
Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.
|
|
30
|
+
Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.7) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820
|
|
31
31
|
|
|
32
32
|
|
|
33
33
|
## Installation
|
|
@@ -167,8 +167,9 @@ See the /tests directory for usage examples of most of the included functions.
|
|
|
167
167
|
- Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*.
|
|
168
168
|
|
|
169
169
|
|
|
170
|
-
## What's new (v2.2.
|
|
170
|
+
## What's new (v2.2.7)
|
|
171
|
+
- Functions
|
|
172
|
+
- diddata.DiffData.define_treatment() for constructing a new treatment from a column in the dataframe
|
|
171
173
|
- Bugfixes:
|
|
172
|
-
-
|
|
173
|
-
-
|
|
174
|
-
- Update README
|
|
174
|
+
- didtools.treatment_times() and didtools.is_multiple_treatment_period() now also identify continuous treatments correctly
|
|
175
|
+
- Fixed problematic type conversion in didtools.fit_metrics()
|
|
@@ -7,8 +7,8 @@ def read_README():
|
|
|
7
7
|
|
|
8
8
|
setup(
|
|
9
9
|
name='diffindiff',
|
|
10
|
-
version='2.2.
|
|
11
|
-
description='diffindiff: Python library for convenient Difference-in-Differences
|
|
10
|
+
version='2.2.7',
|
|
11
|
+
description='diffindiff: Python library for convenient Difference-in-Differences analyses',
|
|
12
12
|
packages=find_packages(include=["diffindiff", "diffindiff.tests"]),
|
|
13
13
|
include_package_data=True,
|
|
14
14
|
long_description=read_README(),
|
|
@@ -30,8 +30,7 @@ setup(
|
|
|
30
30
|
'xgboost',
|
|
31
31
|
'lightgbm',
|
|
32
32
|
'patsy',
|
|
33
|
-
'openpyxl'
|
|
34
|
-
'huff>=1.6.6'
|
|
33
|
+
'openpyxl'
|
|
35
34
|
],
|
|
36
35
|
test_suite='tests',
|
|
37
36
|
)
|
|
File without changes