diffindiff 2.2.5__tar.gz → 2.2.7__tar.gz

Files changed (21)
  1. {diffindiff-2.2.5 → diffindiff-2.2.7}/PKG-INFO +10 -9
  2. {diffindiff-2.2.5 → diffindiff-2.2.7}/README.md +8 -7
  3. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/config.py +8 -5
  4. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didanalysis.py +14 -9
  5. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didanalysis_helper.py +11 -8
  6. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/diddata.py +199 -43
  7. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/didtools.py +76 -17
  8. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/tests_diffindiff.py +3 -3
  9. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/PKG-INFO +10 -9
  10. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/requires.txt +0 -1
  11. {diffindiff-2.2.5 → diffindiff-2.2.7}/setup.py +3 -4
  12. {diffindiff-2.2.5 → diffindiff-2.2.7}/MANIFEST.in +0 -0
  13. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/__init__.py +0 -0
  14. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/__init__.py +0 -0
  15. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/data/Corona_Hesse.xlsx +0 -0
  16. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/data/counties_DE.csv +0 -0
  17. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff/tests/data/curfew_DE.csv +0 -0
  18. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/SOURCES.txt +0 -0
  19. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/dependency_links.txt +0 -0
  20. {diffindiff-2.2.5 → diffindiff-2.2.7}/diffindiff.egg-info/top_level.txt +0 -0
  21. {diffindiff-2.2.5 → diffindiff-2.2.7}/setup.cfg +0 -0
@@ -1,12 +1,12 @@
  Metadata-Version: 2.1
  Name: diffindiff
- Version: 2.2.5
- Summary: diffindiff: Python library for convenient Difference-in-Differences Analyses
+ Version: 2.2.7
+ Summary: diffindiff: Python library for convenient Difference-in-Differences analyses
  Author: Thomas Wieland
  Author-email: geowieland@googlemail.com
  Description-Content-Type: text/markdown

- # diffindiff: A Python library for convenient difference-in-differences analyses
+ # diffindiff: Python library for convenient Difference-in-Differences analyses

  This Python library is designed for performing Difference-in-Differences (DiD) analyses in a convenient way. It allows users to construct datasets, define treatment and control groups, and set treatment periods. DiD model analyses may be conducted with both datasets created by built-in functions and ready-to-use external datasets. Both simultaneous and staggered adoption are supported. The library allows for various extensions, such as two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimations. Additionally, it includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments. Furthermore, several functions for rigorous treatment setting and data diagnostics are incorporated.

@@ -20,14 +20,14 @@ Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geo

  - 📦 PyPI: [diffindiff](https://pypi.org/project/diffindiff/)
  - 💻 GitHub Repository: [diffindiff_official](https://github.com/geowieland/diffindiff_official)
- - 📄 DOI (Zenodo): [10.5281/zenodo.18639559](https://doi.org/10.5281/zenodo.18639559)
+ - 📄 DOI (Zenodo): [10.5281/zenodo.18656820](https://doi.org/10.5281/zenodo.18656820)


  ## Citation

  If you use this software, please cite:

- Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.4) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820
+ Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.7) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820


  ## Installation
@@ -167,8 +167,9 @@ See the /tests directory for usage examples of most of the included functions.
  - Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*.


- ## What's new (v2.2.5)
+ ## What's new (v2.2.7)
+ - Functions
+ - diddata.DiffData.define_treatment() for constructing a new treatment from a column in the dataframe
  - Bugfixes:
- - Incorrect import
- - Other:
- - Update README
+ - didtools.treatment_times() and didtools.is_multiple_treatment_period() now also identify continuous treatments correctly
+ - Fixed problematic type conversion in didtools.fit_metrics()
@@ -1,4 +1,4 @@
- # diffindiff: A Python library for convenient difference-in-differences analyses
+ # diffindiff: Python library for convenient Difference-in-Differences analyses

  This Python library is designed for performing Difference-in-Differences (DiD) analyses in a convenient way. It allows users to construct datasets, define treatment and control groups, and set treatment periods. DiD model analyses may be conducted with both datasets created by built-in functions and ready-to-use external datasets. Both simultaneous and staggered adoption are supported. The library allows for various extensions, such as two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimations. Additionally, it includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments. Furthermore, several functions for rigorous treatment setting and data diagnostics are incorporated.

@@ -12,14 +12,14 @@ Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geo

  - 📦 PyPI: [diffindiff](https://pypi.org/project/diffindiff/)
  - 💻 GitHub Repository: [diffindiff_official](https://github.com/geowieland/diffindiff_official)
- - 📄 DOI (Zenodo): [10.5281/zenodo.18639559](https://doi.org/10.5281/zenodo.18639559)
+ - 📄 DOI (Zenodo): [10.5281/zenodo.18656820](https://doi.org/10.5281/zenodo.18656820)


  ## Citation

  If you use this software, please cite:

- Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.4) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820
+ Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.7) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820


  ## Installation
@@ -159,8 +159,9 @@ See the /tests directory for usage examples of most of the included functions.
  - Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*.


- ## What's new (v2.2.5)
+ ## What's new (v2.2.7)
+ - Functions
+ - diddata.DiffData.define_treatment() for constructing a new treatment from a column in the dataframe
  - Bugfixes:
- - Incorrect import
- - Other:
- - Update README
+ - didtools.treatment_times() and didtools.is_multiple_treatment_period() now also identify continuous treatments correctly
+ - Fixed problematic type conversion in didtools.fit_metrics()
@@ -4,22 +4,25 @@
  # Author: Thomas Wieland
  # ORCID: 0000-0001-5168-9846
  # mail: geowieland@googlemail.com
- # Version: 1.0.4
- # Last update: 2025-12-06 11:52
- # Copyright (c) 2025 Thomas Wieland
+ # Version: 1.0.6
+ # Last update: 2026-02-26 18:04
+ # Copyright (c) 2025-2026 Thomas Wieland
  #-----------------------------------------------------------------------

  # Basic config:

- PACKAGE_VERSION = "2.2.2"
+ PACKAGE_NAME = "diffindiff"
+ PACKAGE_VERSION = "2.2.7"

- VERBOSE = False
+ VERBOSE = True

  ROUND_STATISTIC = 3
  ROUND_PERCENT = 2

  AUTO_SWITCH_TO_PREPOST = True

+ ACCEPT_CONTINUOUS_TREATMENTS = True
+
  # Description texts:

  DID_DESCRIPTION = "Difference-in-Differences Analysis"
@@ -4,15 +4,14 @@
  # Author: Thomas Wieland
  # ORCID: 0000-0001-5168-9846
  # mail: geowieland@googlemail.com
- # Version: 2.2.2
- # Last update: 2025-12-07 10:27
- # Copyright (c) 2025 Thomas Wieland
+ # Version: 2.2.4
+ # Last update: 2026-02-26 18:04
+ # Copyright (c) 2024-2026 Thomas Wieland
  #-----------------------------------------------------------------------


  import pandas as pd
  import numpy as np
- from math import isnan
  import matplotlib.pyplot as plt
  from matplotlib.dates import DateFormatter
  import diffindiff.didtools as tools
@@ -930,7 +929,7 @@ class DiffModel:
  if "TG" in plot_intervals_groups and "CG" in plot_intervals_groups:
  lines_labels_required = lines_labels_required+2
  assert len(lines_col) == lines_col_required, f"Parameter 'lines_col' must be a list with {lines_col_required} entries"
- assert len(lines_style) == lines_style_required, f"Parameter 'lines_style' must be a list with {lines_col_required} entries"
+ assert len(lines_style) == lines_style_required, f"Parameter 'lines_style' must be a list with {lines_style_required} entries"
  assert len(lines_labels) == lines_labels_required, f"Parameter 'lines_labels' must be a list with {lines_labels_required} entries"

  model_data = self.data[2]
@@ -1357,8 +1356,8 @@ def did_analysis(
  missing_replace_by_zero: bool = False,
  fit_by = "ols_fit",
  verbose: bool = config.VERBOSE
- ):
-
+ ):
+
  tools.check_columns(
  df = data,
  columns = [
@@ -1385,6 +1384,12 @@ def did_analysis(
  verbose = verbose
  )

+ tools.is_numeric(
+ df = data,
+ columns = treatment_col,
+ verbose = verbose
+ )
+
  cols_relevant = [
  unit_col,
  time_col,
@@ -1808,7 +1813,7 @@ def did_analysis(
  }

  if bonferroni:
- confint_alpha = confint_alpha/no_treatments
+ confint_alpha = confint_alpha/no_treatments

  if fit_by == "ml":
  fit_result = helper.ml_fit(
@@ -1825,7 +1830,7 @@ def did_analysis(
  cluster_SE_by = cluster_SE_by,
  verbose = verbose
  )
-
+
  model_results = helper.extract_model_results(
  fit_result = fit_result,
  TG_col = TG_col,
@@ -4,9 +4,9 @@
  # Author: Thomas Wieland
  # ORCID: 0000-0001-5168-9846
  # mail: geowieland@googlemail.com
- # Version: 1.0.5
- # Last update: 2025-12-07 10:27
- # Copyright (c) 2025 Thomas Wieland
+ # Version: 1.0.7
+ # Last update: 2025-02-26 18:02
+ # Copyright (c) 2025-2026 Thomas Wieland
  #-----------------------------------------------------------------------

  import pandas as pd
@@ -203,7 +203,7 @@ def create_spillover(
  time_col = time_col,
  treatment_col = treatment,
  create_TT_col = TT_col,
- verbose = verbose
+ verbose = False
  )[0]

  sp_unit_col = f"{config.SPILLOVER_UNIT_PREFIX}{config.DELIMITER}{treatment}"
@@ -396,7 +396,11 @@ def treatment_diagnostics(
  )

  if verbose:
- print(f"There are {no_treatments} treatments (simultaneous: {no_treatments-staggered_count}, staggered: {staggered_count}) with {untreated[0]} treated and {untreated[1]} untreated units.")
+
+ if no_treatments > 1:
+ print(f"There are {no_treatments} treatments (simultaneous: {no_treatments-staggered_count}, staggered: {staggered_count}) with {untreated[0]} treated and {untreated[1]} untreated units.")
+ else:
+ print(f"There is {no_treatments} treatment (staggered: {staggered_count}) with {untreated[0]} treated and {untreated[1]} untreated units.")

  return [
  treatment_diagnostics_results,
@@ -918,10 +922,9 @@ def create_timestamp(function):
  now = datetime.now()

  timestamp_dict = {
- "package_version": f"diffindiff {config.PACKAGE_VERSION}",
+ "package_version": f"{config.PACKAGE_NAME} {config.PACKAGE_VERSION}",
  "function": function,
  "datetime": now.strftime("%Y-%m-%d %H-%M-%S")
  }

- return timestamp_dict
-
+ return timestamp_dict
@@ -4,9 +4,9 @@
  # Author: Thomas Wieland
  # ORCID: 0000-0001-5168-9846
  # mail: geowieland@googlemail.com
- # Version: 2.1.5
- # Last update: 2025-12-07 10:27
- # Copyright (c) 2025 Thomas Wieland
+ # Version: 2.1.8
+ # Last update: 2026-02-26 18:30
+ # Copyright (c) 2024-2026 Thomas Wieland
  #-----------------------------------------------------------------------


@@ -76,29 +76,34 @@ class DiffGroups:
  verbose: bool = config.VERBOSE
  ):

- groups_config = self.data[1]
+ groups_config = self.data[1]

  if groups_config["DDD"]:
- raise ValueError("DiffGroups object already includes a benefit group")
-
- if verbose:
- print(f"Adding benefit group with {len(group_benefit)} units to groups data", end = " ... ")
+
+ print("DiffGroups object already includes a benefit group. No segmentation added.")
+
+ groups = self
+
+ else:

- groups_data = self.data[0]
+ if verbose:
+ print(f"Adding benefit group with {len(group_benefit)} units to groups data", end = " ... ")
+
+ groups_data = self.data[0]
+
+ groups_data[config.BG_COL] = 0
+ groups_data.loc[groups_data[config.UNIT_COL].astype(str).isin(group_benefit), config.BG_COL] = 1
+
+ groups_config["DDD"] = True

- groups_data[config.BG_COL] = 0
- groups_data.loc[groups_data[config.UNIT_COL].astype(str).isin(group_benefit), config.BG_COL] = 1
-
- groups_config["DDD"] = True
+ groups = DiffGroups(
+ groups_data,
+ groups_config,
+ timestamp = helper.create_timestamp(function="add_segmentation")
+ )

- groups = DiffGroups(
- groups_data,
- groups_config,
- timestamp = helper.create_timestamp(function="add_segmentation")
- )
-
- if verbose:
- print("OK")
+ if verbose:
+ print("OK")

  return groups

@@ -255,9 +260,16 @@ def create_treatment(
  after_treatment_period: bool = False,
  verbose = config.VERBOSE
  ):
+
+ check_dates = tools.check_date_format(
+ dates = study_period+treatment_period,
+ date_format = date_format
+ )
+ if check_dates[0]:
+ raise ValueError(f"Study and/or treatment period include invalid dates: {', '.join(check_dates[1])}.")

  TT_col = config.TT_COL
-
+
  if treatment_name is not None:

  if not isinstance(treatment_name, str):
@@ -474,7 +486,7 @@ class DiffData:
  variables: list = None,
  unit_col: str = None,
  time_col: str = None,
- verbose: bool = config.VERBOSE
+ verbose: bool = False
  ):

  if unit_col is None and time_col is None:
@@ -567,6 +579,7 @@ class DiffData:

  self.data[0] = did_modeldata
  self.data[5] = variables
+ self.data[7][len(self.data[7])] = helper.create_timestamp(function="add_covariates")

  if verbose:
  print("OK")
@@ -610,7 +623,6 @@ class DiffData:
  groups_data_old = did_groups_old.get_data()

  did_modeldata_old = self.get_did_modeldata_df()
- unit_id_col, time_col = self.get_unit_time_cols()
  outcome_col_original = self.data[3]
  unit_time_col_original = self.get_unit_time_cols()
  covariates = self.get_covariates()
@@ -716,21 +728,157 @@ class DiffData:
  timestamp = helper.create_timestamp(function="add_treatment")
  )

- did_data_new = DiffData(
- did_modeldata = did_modeldata_new,
- diff_groups = groups_new,
- diff_treatment = treatment_new,
- outcome_col_original = outcome_col_original,
- unit_time_col_original = unit_time_col_original,
- covariates = covariates,
- treatment_cols = treatment_cols_new,
- timestamp = helper.create_timestamp(function="add_segmentation")
+ if verbose:
+ print("OK")
+
+ self.data[0] = did_modeldata_new
+ self.data[1] = groups_new
+ self.data[2] = treatment_new
+ self.data[3] = outcome_col_original
+ self.data[4] = unit_time_col_original
+ self.data[5] = covariates
+ self.data[6] = treatment_cols_new
+ self.data[7][len(self.data[7])] = helper.create_timestamp(function="add_treatment")
+
+ return self
+
+ def define_treatment(
+ self,
+ treatment_name,
+ after_treatment_period: bool = False,
+ after_treatment_name = None,
+ verbose: bool = config.VERBOSE
+ ):
+
+ if not treatment_name:
+ raise ValueError("When adding a treatment from the data, you need to specify a treatment column with parameter treament_name = [your_treatment].")
+
+ if treatment_name not in self.get_did_modeldata_df().columns:
+ raise KeyError(f"Column '{treatment_name}' not in data frame")
+
+ did_treatment_old = self.get_did_treatment()
+ treatment_config_old = did_treatment_old.get_config()
+ treatment_meta_old = did_treatment_old.get_metadata()
+ no_treatments_old = treatment_meta_old["no_treatments"]
+
+ did_groups_old = self.get_did_groups()
+ groups_config_old = did_groups_old.get_config()
+ groups_data_old = did_groups_old.get_data()
+
+ did_modeldata_old = self.get_did_modeldata_df()
+ outcome_col_original = self.data[3]
+ unit_time_col_original = self.get_unit_time_cols()
+ covariates = self.get_covariates()
+
+ treatment_cols = self.get_treatment_cols()
+ treatment_cols_new = treatment_cols
+
+ no_treatments = no_treatments_old+1
+ key_counter = no_treatments-1
+
+ tt = tools.treatment_times(
+ data = did_modeldata_old,
+ unit_col=config.UNIT_COL,
+ time_col=config.TIME_COL,
+ treatment_col=treatment_name,
+ verbose=verbose
+ )
+
+ tt_date = [datetime.strptime(t, treatment_meta_old["date_format"]) for t in tt[1]]
+ treatment_period_start = min(tt_date)
+ treatment_period_end = max(tt_date)
+ treatment_period_start = treatment_period_start.strftime("%Y-%m-%d")
+ treatment_period_end = treatment_period_end.strftime("%Y-%m-%d")
+
+ is_notreatment_result = tools.is_notreatment(
+ data = did_modeldata_old,
+ unit_col=config.UNIT_COL,
+ treatment_col=treatment_name,
+ verbose = verbose
  )
+
+ treatment_group = is_notreatment_result[1]
+ control_group = is_notreatment_result[2]
+
+ if verbose:
+ print(f"Constructing treatment from column '{treatment_name}'", end = " ... ")
+
+ new_groups = create_groups(
+ treatment_group = treatment_group,
+ control_group = control_group,
+ treatment_name = treatment_name,
+ verbose=False
+ )
+ new_groups_data_df = new_groups.get_data()[0]
+ new_groups_config = new_groups.get_config()
+ TG_col = new_groups_config[0]["TG_col"]
+
+ new_treatment = create_treatment(
+ study_period = [treatment_meta_old["study_period_start"], treatment_meta_old["study_period_end"]],
+ treatment_period = [treatment_period_start, treatment_period_end],
+ freq = treatment_meta_old["frequency"],
+ date_format = treatment_meta_old["date_format"],
+ treatment_name = treatment_name,
+ pre_post = treatment_meta_old["pre_post"],
+ after_treatment_period = after_treatment_period,
+ verbose=False
+ )
+
+ new_treatment_data_df = new_treatment.get_data()
+
+ new_treatment_config = new_treatment.get_config()
+ TT_col = new_treatment_config[0]["TT_col"]
+ ATT_col = new_treatment_config[0]["ATT_col"]
+
+ treatment_cols_new[key_counter] = {
+ "TT_col": TT_col,
+ "ATT_col": ATT_col,
+ "treatment_name": treatment_name,
+ "after_treatment_name": after_treatment_name
+ }
+
+ groups_config_new = groups_config_old
+ groups_config_new[key_counter] = new_groups_config[0]
+ groups_data_new = groups_data_old
+ groups_data_old.append(new_groups_data_df)
+ groups_new = DiffGroups(
+ groups_data_new,
+ groups_config_new,
+ timestamp = helper.create_timestamp(function="define_treatment")
+ )
+
+ treatment_meta_new = treatment_meta_old
+ treatment_meta_new["no_treatments"] = no_treatments
+ treatment_config_new = treatment_config_old
+ treatment_config_new[key_counter] = new_treatment_config[0]
+
+ treatment_new = DiffTreatment(
+ new_treatment_data_df,
+ treatment_config_new,
+ treatment_meta_new,
+ timestamp = helper.create_timestamp(function="define_treatment")
+ )

  if verbose:
  print("OK")
-
- return did_data_new
+
+ if treatment_name in covariates:
+
+ if verbose:
+ print(f"NOTE: Column '{treatment_name}' was defined as covariate before and is now removed from covariates list.")
+
+ covariates.remove(treatment_name)
+
+ self.data[0] = did_modeldata_old
+ self.data[1] = groups_new
+ self.data[2] = treatment_new
+ self.data[3] = outcome_col_original
+ self.data[4] = unit_time_col_original
+ self.data[5] = covariates
+ self.data[6] = treatment_cols_new
+ self.data[7][len(self.data[7])] = helper.create_timestamp(function="define_treatment")
+
+ return self

  def add_segmentation(
  self,
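For orientation, here is a minimal sketch of what the new `define_treatment()` method derives from an existing treatment column: the treated and untreated units, and the first and last treated dates. It is written in plain Python rather than pandas so that it runs without dependencies. The helper name `summarize_treatment_column` and the tuple layout of `rows` are assumptions made for illustration and are not part of diffindiff.

```python
from datetime import datetime


def summarize_treatment_column(rows, date_format="%Y-%m-%d"):
    """Hypothetical helper: rows is an iterable of (unit, date_string, treatment_value)."""
    all_units = set()
    treated_units = set()
    treated_dates = []

    for unit, date_string, value in rows:
        all_units.add(unit)
        if value > 0:  # any positive value counts as treated
            treated_units.add(unit)
            treated_dates.append(datetime.strptime(date_string, date_format))

    if not treated_dates:
        raise ValueError("Treatment column contains no treated observations")

    return {
        "treatment_group": sorted(treated_units),
        "control_group": sorted(all_units - treated_units),
        "treatment_period": [
            min(treated_dates).strftime(date_format),
            max(treated_dates).strftime(date_format),
        ],
    }


rows = [
    ("A", "2020-03-01", 0), ("A", "2020-03-02", 1), ("A", "2020-03-03", 1),
    ("B", "2020-03-01", 0), ("B", "2020-03-02", 0), ("B", "2020-03-03", 0),
]
summary = summarize_treatment_column(rows)
```

In the method itself, these pieces come from `tools.treatment_times()` and `tools.is_notreatment()` and are then passed on to `create_groups()` and `create_treatment()`.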
@@ -967,8 +1115,8 @@ class DiffData:
  if value["after_treatment_name"] is not None:
  after_treatment_col[key] = value["after_treatment_name"]
  if value["ATT_col"] is not None:
- ATT_col[key] = value["ATT_col"]
-
+ ATT_col[key] = value["ATT_col"]
+
  did_results = didanalysis.did_analysis(
  data = did_modeldata,
  TG_col = TG_col,
@@ -1016,9 +1164,15 @@ def merge_data(
  keep_columns: bool = False,
  verbose: bool = config.VERBOSE
  ):
-
- if verbose:
- print("Merging groups and treatment data", end = " ... ")
+
+ tools.check_columns(
+ df = outcome_data,
+ columns = [
+ unit_id_col,
+ time_col,
+ outcome_col
+ ]
+ )

  groups_data_df = diff_groups.get_data()
  groups_data_df = groups_data_df[0]
@@ -1075,6 +1229,9 @@ def merge_data(
  verbose=verbose
  )

+ if verbose:
+ print("Merging groups and treatment data", end = " ... ")
+
  if keep_columns:
  outcome_data_short = outcome_data
  else:
@@ -1108,7 +1265,8 @@ def merge_data(
  }
  }

- timestamp = helper.create_timestamp(function="merge_data")
+ timestamp = {}
+ timestamp[0] = helper.create_timestamp(function="merge_data")

  did_data_all = DiffData(
  did_modeldata,
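Several hunks in this file replace the single timestamp with a dictionary that is appended to on each operation (`self.data[7][len(self.data[7])] = ...`). A minimal standalone sketch of that pattern follows. The module-level constants stand in for `config.PACKAGE_NAME` and `config.PACKAGE_VERSION`, and `history` is a hypothetical stand-in for `self.data[7]`.

```python
from datetime import datetime

# Stand-ins for config.PACKAGE_NAME and config.PACKAGE_VERSION (assumption for this sketch).
PACKAGE_NAME = "diffindiff"
PACKAGE_VERSION = "2.2.7"


def create_timestamp(function):
    """Build one log entry in the shape shown in the helper module's diff."""
    now = datetime.now()
    return {
        "package_version": f"{PACKAGE_NAME} {PACKAGE_VERSION}",
        "function": function,
        "datetime": now.strftime("%Y-%m-%d %H-%M-%S"),
    }


# merge_data() starts the log at key 0; later methods append at key len(history).
history = {}
history[0] = create_timestamp("merge_data")
history[len(history)] = create_timestamp("add_covariates")
history[len(history)] = create_timestamp("define_treatment")
```

Keying by `len(history)` yields consecutive integer keys only as long as entries are never removed from the dictionary.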
@@ -1175,8 +1333,6 @@ def create_data(
  verbose = verbose
  )

- did_data_all.timestamp = helper.create_timestamp(function="create_data")
-
  return did_data_all

  def create_counterfactual(
@@ -4,15 +4,16 @@
  # Author: Thomas Wieland
  # ORCID: 0000-0001-5168-9846
  # mail: geowieland@googlemail.com
- # Version: 2.1.4
- # Last update: 2025-12-07 10:27
- # Copyright (c) 2025 Thomas Wieland
+ # Version: 2.1.6
+ # Last update: 2026-02-26 18:33
+ # Copyright (c) 2025-2026 Thomas Wieland
  #-----------------------------------------------------------------------


  import pandas as pd
  import numpy as np
  import re
+ from datetime import datetime
  from collections.abc import Iterable
  from statsmodels.formula.api import ols
  from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, GradientBoostingRegressor
@@ -23,7 +24,6 @@ from xgboost import XGBRegressor
  from lightgbm import LGBMRegressor
  from sklearn.linear_model import LinearRegression
  from sklearn.model_selection import train_test_split
- from huff.goodness_of_fit import modelfit, modelfit_cat, modelfit_plot
  import diffindiff.config as config


@@ -46,6 +46,30 @@ def check_columns(
  if missing_columns:
  raise KeyError(f"Data do not contain column(s): {', '.join(missing_columns)}")

+ def is_numeric(
+ df: pd.DataFrame,
+ columns: list,
+ verbose: bool = config.VERBOSE
+ ):
+
+ if len(columns) > 0:
+
+ if verbose:
+ print(f"Checking if column(s) {', '.join(columns)} are numeric", end=" ... ")
+
+ non_numeric_columns = []
+
+ for col in columns:
+
+ if not pd.api.types.is_numeric_dtype(df[col]):
+ non_numeric_columns.append(col)
+
+ if verbose:
+ print("OK")
+
+ if non_numeric_columns:
+ raise KeyError(f"Data contain non-numeric column(s): {', '.join(non_numeric_columns)}")
+
  def panel_index(
  data: pd.DataFrame,
  unit_col: str,
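`is_numeric()` iterates over `columns` and joins them with `', '.join(columns)`, so it expects a list of column names. The diff does not show whether callers always pass a list. If a single name were passed as a plain string, both operations would work character by character. A small normalizing helper, sketched here under that assumption (the name `as_column_list` is hypothetical and not part of diffindiff), would guard against it:

```python
def as_column_list(columns):
    """Return column names as a list, accepting None, a single string, or an iterable."""
    if columns is None:
        return []
    if isinstance(columns, str):
        # A bare string is one column name, not a sequence of characters.
        return [columns]
    return list(columns)


# Without normalization, joining a string puts separators between its characters.
joined_raw = ", ".join("treat")
joined_safe = ", ".join(as_column_list("treat"))
```

Here `joined_raw` spells the name out letter by letter, while `joined_safe` keeps it as one column name.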
@@ -527,8 +551,11 @@ def is_multiple_treatment_period(
  unit_treatment = data_sub[treatment_col]

  groups = (unit_treatment != unit_treatment.shift()).cumsum()
-
- periods_count = (unit_treatment == 1).groupby(groups).any().sum()
+
+ if config.ACCEPT_CONTINUOUS_TREATMENTS:
+ periods_count = (unit_treatment > 0).groupby(groups).any().sum()
+ else:
+ periods_count = (unit_treatment == 1).groupby(groups).any().sum()

  unit_treatment_periods[unit] = int(periods_count)

@@ -636,25 +663,31 @@ def treatment_times(
  verbose=verbose
  )

- is_multiple_treatment_period(
+ is_multiple_treatment_period_result = is_multiple_treatment_period(
  data = data,
  unit_col = unit_col,
  treatment_col = treatment_col,
  verbose = verbose
- )[0]
+ )

  if verbose:
  print(f"Identifying treatment times for treatment '{treatment_col}'", end = " ... ")

- tt = list(unique(data.loc[data[treatment_col] == 1, time_col]))
-
+ if config.ACCEPT_CONTINUOUS_TREATMENTS:
+ tt = list(unique(data.loc[data[treatment_col] > 0, time_col]))
+ else:
+ tt = list(unique(data.loc[data[treatment_col] == 1, time_col]))
+
  units = unique(data[unit_col])

  units_tt = pd.DataFrame(columns = [unit_col, "treatment_min", "treatment_max"])

  for unit in units:

- data_unit_tt = data[(data[unit_col] == unit) & (data[treatment_col] == 1)]
+ if config.ACCEPT_CONTINUOUS_TREATMENTS:
+ data_unit_tt = data[(data[unit_col] == unit) & (data[treatment_col] > 0)]
+ else:
+ data_unit_tt = data[(data[unit_col] == unit) & (data[treatment_col] == 1)]

  if data_unit_tt.empty:
  continue
@@ -678,7 +711,7 @@ def treatment_times(

  if verbose:
  print("OK")
-
+
  return [
  units_tt,
  tt
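The continuous-treatment branch treats any value greater than 0 as treated. One point worth checking: `is_multiple_treatment_period()` groups on changes in the raw value (`unit_treatment != unit_treatment.shift()`), so a dose that varies within one uninterrupted spell produces several runs. The dependency-free sketch below contrasts that counting rule with counting runs of the treated flag. Both function names are hypothetical, and the second is an alternative shown for comparison, not the package's behaviour.

```python
from itertools import groupby


def count_runs_by_value(values):
    """Count runs of identical consecutive values whose value is positive."""
    return sum(1 for value, _run in groupby(values) if value > 0)


def count_treated_spells(values):
    """Count maximal runs of consecutive positive values, whatever the dose."""
    return sum(1 for treated, _run in groupby(values, key=lambda v: v > 0) if treated)


binary = [0, 0, 1, 1, 0, 1]        # two separate treated spells
dose = [0, 0, 0.5, 0.7, 0.7, 0]    # one spell with a changing dose

binary_counts = (count_runs_by_value(binary), count_treated_spells(binary))
dose_counts = (count_runs_by_value(dose), count_treated_spells(dose))
```

For a 0/1 treatment the two rules agree. For the varying dose, counting by value reports two runs while counting by the treated flag reports one spell.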
@@ -796,9 +829,9 @@ def fit_metrics(

  assert observed_no == expected_no, "Error while calculating fit metrics: Observed and expected differ in length"

- if not pd.api.types.is_numeric_dtype(observed):
+ if not pd.api.types.is_numeric_dtype(observed) or not np.issubdtype(observed.dtype, np.number):
  raise ValueError("Error while calculating fit metrics: Observed column is not numeric")
- if not pd.api.types.is_numeric_dtype(expected):
+ if not pd.api.types.is_numeric_dtype(expected) or not np.issubdtype(expected.dtype, np.number):
  raise ValueError("Error while calculating fit metrics: Expected column is not numeric")

  if outcome_col is not None:
@@ -810,8 +843,8 @@ def fit_metrics(

  if remove_nan:

- observed = observed.reset_index(drop=True)
- expected = expected.reset_index(drop=True)
+ observed = np.array(observed)
+ expected = np.array(expected)

  obs_exp = pd.DataFrame(
  {
@@ -947,4 +980,30 @@ def bool_to_YN(val):
  if isinstance(val, bool):
  return "YES" if val else "NO"
  else:
- return val
+ return val
+
+ def check_date_format(
+ dates: list = None,
+ date_format: str = "%Y-%m-%d"
+ ):
+
+ if dates is None:
+ dates = []
+
+ invalid_dates_included = False
+ invalid_dates = []
+
+ for date in dates:
+ try:
+ datetime.strptime(date, date_format)
+ except (ValueError, TypeError):
+ invalid_dates.append(date)
+
+ if len(invalid_dates) > 0:
+ invalid_dates_included = True
+ invalid_dates = [str(d) for d in invalid_dates]
+
+ return [
+ invalid_dates_included,
+ invalid_dates
+ ]
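The new helper can be exercised on its own. The sketch below restates `check_date_format()` from this hunk with indentation filled in (the diff view flattens it, so the nesting shown here is an assumption) and runs it on a concatenated study and treatment period, as `create_treatment()` does. Note the return convention: the first element is `True` when invalid dates were found.

```python
from datetime import datetime


def check_date_format(dates=None, date_format="%Y-%m-%d"):
    """Return [invalid_dates_included, invalid_dates] for a list of date strings."""
    if dates is None:
        dates = []

    invalid_dates = []
    for date in dates:
        try:
            datetime.strptime(date, date_format)
        except (ValueError, TypeError):
            # ValueError: wrong format or impossible date; TypeError: not a string
            invalid_dates.append(date)

    invalid_dates_included = len(invalid_dates) > 0
    invalid_dates = [str(d) for d in invalid_dates]

    return [invalid_dates_included, invalid_dates]


study_period = ["2020-01-01", "2020-12-31"]
treatment_period = ["2020-03-15", "2020-02-30"]  # 30 February does not exist

result = check_date_format(study_period + treatment_period)
```

Because invalid entries are converted to strings before being returned, the caller can join them directly into an error message.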
@@ -4,9 +4,9 @@
  # Author: Thomas Wieland
  # ORCID: 0000-0001-5168-9846
  # mail: geowieland@googlemail.com
- # Version: 2.0.10
- # Last update: 2025-12-05 17:23
- # Copyright (c) 2025 Thomas Wieland
+ # Version: 2.0.11
+ # Last update: 2026-02-20 17:44
+ # Copyright (c) 2025-2026 Thomas Wieland
  #-----------------------------------------------------------------------


@@ -1,12 +1,12 @@
  Metadata-Version: 2.1
  Name: diffindiff
- Version: 2.2.5
- Summary: diffindiff: Python library for convenient Difference-in-Differences Analyses
+ Version: 2.2.7
+ Summary: diffindiff: Python library for convenient Difference-in-Differences analyses
  Author: Thomas Wieland
  Author-email: geowieland@googlemail.com
  Description-Content-Type: text/markdown

- # diffindiff: A Python library for convenient difference-in-differences analyses
+ # diffindiff: Python library for convenient Difference-in-Differences analyses

  This Python library is designed for performing Difference-in-Differences (DiD) analyses in a convenient way. It allows users to construct datasets, define treatment and control groups, and set treatment periods. DiD model analyses may be conducted with both datasets created by built-in functions and ready-to-use external datasets. Both simultaneous and staggered adoption are supported. The library allows for various extensions, such as two-way fixed effects models, group- or individual-specific effects, post-treatment periods, and triple-difference estimations. Additionally, it includes functions for visualizing results, such as plotting DiD coefficients with confidence intervals and illustrating the temporal evolution of staggered treatments. Furthermore, several functions for rigorous treatment setting and data diagnostics are incorporated.

@@ -20,14 +20,14 @@ Thomas Wieland [ORCID](https://orcid.org/0000-0001-5168-9846) [EMail](mailto:geo

  - 📦 PyPI: [diffindiff](https://pypi.org/project/diffindiff/)
  - 💻 GitHub Repository: [diffindiff_official](https://github.com/geowieland/diffindiff_official)
- - 📄 DOI (Zenodo): [10.5281/zenodo.18639559](https://doi.org/10.5281/zenodo.18639559)
+ - 📄 DOI (Zenodo): [10.5281/zenodo.18656820](https://doi.org/10.5281/zenodo.18656820)


  ## Citation

  If you use this software, please cite:

- Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.4) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820
+ Wieland, T. (2026). diffindiff: A Python library for convenient difference-in-differences analyses (Version 2.2.7) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18656820


  ## Installation
@@ -167,8 +167,9 @@ See the /tests directory for usage examples of most of the included functions.
  - Wooldridge JM (2012) *Introductory Econometrics. A Modern Approach*.


- ## What's new (v2.2.5)
+ ## What's new (v2.2.7)
+ - Functions
+ - diddata.DiffData.define_treatment() for constructing a new treatment from a column in the dataframe
  - Bugfixes:
- - Incorrect import
- - Other:
- - Update README
+ - didtools.treatment_times() and didtools.is_multiple_treatment_period() now also identify continuous treatments correctly
+ - Fixed problematic type conversion in didtools.fit_metrics()
@@ -9,4 +9,3 @@ xgboost
  lightgbm
  patsy
  openpyxl
- huff>=1.6.6
@@ -7,8 +7,8 @@ def read_README():

  setup(
  name='diffindiff',
- version='2.2.5',
- description='diffindiff: Python library for convenient Difference-in-Differences Analyses',
+ version='2.2.7',
+ description='diffindiff: Python library for convenient Difference-in-Differences analyses',
  packages=find_packages(include=["diffindiff", "diffindiff.tests"]),
  include_package_data=True,
  long_description=read_README(),
@@ -30,8 +30,7 @@ setup(
  'xgboost',
  'lightgbm',
  'patsy',
- 'openpyxl',
- 'huff>=1.6.6'
+ 'openpyxl'
  ],
  test_suite='tests',
  )
File without changes
File without changes