npm - @wentorai/research-plugins - Versions diffs - 1.0.0 → 1.2.0 - Mend

@wentorai/research-plugins 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (415)

package/skills/analysis/econometrics/panel-data-analyst/SKILL.md ADDED

@@ -0,0 +1,259 @@
+---
+name: panel-data-analyst
+description: "Expert panel data regression analysis with fixed effects and GMM"
+metadata:
+  openclaw:
+    emoji: "grid"
+    category: "analysis"
+    subcategory: "econometrics"
+    keywords: ["panel data", "fixed effects", "random effects", "GMM", "dynamic panel", "Hausman test"]
+    source: "https://www.stata.com/manuals/xt.pdf"
+---
+# Panel Data Analyst
+Perform expert-level panel data regression analysis including fixed effects, random effects, dynamic panel models (Arellano-Bond/Blundell-Bond GMM), and advanced diagnostic tests. This skill covers the full workflow from panel setup through model selection, estimation, and publication-ready reporting.
+## Overview
+Panel data -- repeated observations on the same cross-sectional units over time -- is the workhorse of modern empirical economics, finance, political science, and management research. Panel methods exploit both cross-sectional and temporal variation, enabling researchers to control for unobserved heterogeneity that would bias ordinary cross-sectional estimates.
+The choice between fixed effects, random effects, and dynamic panel estimators depends on the data structure, the nature of unobserved heterogeneity, and the identifying assumptions the researcher is willing to make. This skill provides a systematic decision framework and implementation in both Stata and R, with emphasis on the diagnostic tests that justify model selection.
+Beyond basic FE/RE models, this skill covers the advanced techniques increasingly required by journal reviewers: instrumental variables within panel frameworks, Driscoll-Kraay standard errors for cross-sectional dependence, correlated random effects (Mundlak/Chamberlain), and system GMM for dynamic panels with endogenous regressors.
+## Panel Data Setup
+### Declaring Panel Structure
+```stata
+* Stata panel setup
+xtset firm_id year
+xtset  // Verify panel structure
+* Check panel balance
+xtdescribe
+* Shows: min/max/avg observations per panel, gaps
+* Summary statistics by panel dimension
+xtsum revenue profit employees rnd_spending
+* Reports overall, between, and within variation
+```
+### Panel Diagnostics
+```stata
+* Check for gaps in panel
+xtset firm_id year
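+* l.year is always year - 1 or missing, so compare with the previous observed row instead
+bysort firm_id (year): gen gap = year - year[_n-1] if _n > 1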
+tab gap  // Should be all 1's for balanced annual panels
+* Create balanced subsample
+bysort firm_id: gen T_i = _N
+tab T_i
+quietly summarize T_i
+keep if T_i == r(max)  // Keep only units with the maximum number of observed periods
+* Attrition analysis
+gen in_panel = 1
+xtset firm_id year
+tsfill, full
+replace in_panel = 0 if missing(in_panel)
+reg in_panel l.revenue l.profit l.size, cluster(firm_id)
+```
+## Fixed Effects vs. Random Effects
+### Fixed Effects Estimation
+```stata
+* Within estimator (entity fixed effects)
+xtreg profit revenue rnd_spending employees i.year, fe robust
+estimates store fe_model
+* Entity and time fixed effects
+reghdfe profit revenue rnd_spending employees, ///
+    absorb(firm_id year) cluster(firm_id)
+estimates store twoway_fe
+* First-differences (alternative to within estimator)
+reg d.profit d.revenue d.rnd_spending d.employees i.year, ///
+    cluster(firm_id)
+estimates store fd_model
+```
+### Random Effects Estimation
+```stata
+* GLS random effects
+xtreg profit revenue rnd_spending employees i.year, re robust
+estimates store re_model
+```
+### Hausman Test for Model Selection
+```stata
+* Classic Hausman test
+xtreg profit revenue rnd_spending employees, fe
+estimates store fe_haus
+xtreg profit revenue rnd_spending employees, re
+estimates store re_haus
+hausman fe_haus re_haus
+* Robust Hausman test (preferred with heteroskedasticity)
+* Mundlak (1978) approach: add group means to RE model
+foreach var of varlist revenue rnd_spending employees {
+    bysort firm_id: egen m_`var' = mean(`var')
+}
+xtreg profit revenue rnd_spending employees ///
+    m_revenue m_rnd_spending m_employees i.year, re cluster(firm_id)
+test m_revenue m_rnd_spending m_employees
+* Rejection => FE preferred; failure to reject => RE acceptable
+```
+## Dynamic Panel Models
+### Arellano-Bond GMM (Difference GMM)
+```stata
+* When the lagged dependent variable is a regressor:
+* y_it = alpha * y_{i,t-1} + X_it * beta + mu_i + epsilon_it
+* Difference GMM (Arellano & Bond 1991)
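+* xtabond adds the lagged dependent variable itself through lags(1),
+* so l.profit must not be listed among the regressors
+xtabond profit revenue rnd_spending employees, ///
+    lags(1) twostep vce(robust) artests(2)
+* Diagnostics
+estat abond
+* AR(1) should be significant, AR(2) should NOT be significant
+* Overidentification: estat sargan (not available after vce(robust)); xtabond2 reports Hansen J (p > 0.10 desired)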
+```
+### System GMM (Blundell-Bond)
+```stata
+* System GMM (Blundell & Bond 1998)
+* More efficient than difference GMM, especially with persistent series
+xtabond2 profit l.profit revenue rnd_spending employees i.year, ///
+    gmm(l.profit, lag(2 4) collapse) ///
+    gmm(revenue rnd_spending, lag(2 3) collapse) ///
+    iv(employees i.year) ///
+    twostep robust orthogonal small
+estimates store gmm_model
+* Key diagnostics to report:
+* 1. Number of instruments (should not exceed number of groups)
+* 2. Hansen J test p-value (> 0.10 at minimum; Roodman (2009) views values as low as 0.25 with concern, and values near 1.00 signal too many instruments)
+* 3. AR(2) test p-value (> 0.10 for valid instruments)
+* 4. Difference-in-Hansen test for subset of instruments
+```
+### GMM Diagnostic Checklist
+| Test | Null Hypothesis | Desired Result | Stata Command |
+|------|----------------|----------------|---------------|
+| AR(1) | No first-order autocorrelation | Reject (p < 0.05) | Reported automatically |
+| AR(2) | No second-order autocorrelation | Fail to reject (p > 0.10) | Reported automatically |
+| Hansen J | Instruments are valid | Fail to reject (p > 0.10) | Reported automatically |
+| Diff-in-Hansen | Level instruments valid | Fail to reject (p > 0.10) | Reported automatically |
+| Instrument count | -- | N_instruments < N_groups | Check output |
+## Standard Error Options
+### Choosing the Right Standard Errors
+```stata
+* Entity-clustered (default choice for firm panels)
+xtreg profit revenue rnd_spending, fe cluster(firm_id)
+* Two-way clustering (firm and year)
+reghdfe profit revenue rnd_spending, ///
+    absorb(firm_id) cluster(firm_id year)
+* Driscoll-Kraay standard errors (cross-sectional dependence)
+xtscc profit revenue rnd_spending i.year, fe lag(3)
+* Panel-corrected SEs with AR(1) errors (Prais-Winsten via xtpcse; this is not Newey-West)
+xtpcse profit revenue rnd_spending i.firm_id, correlation(ar1)
+```
+### Diagnostic Tests for Standard Error Selection
+```stata
+* Test for heteroskedasticity in FE model
+xtreg profit revenue rnd_spending, fe
+xttest3  // Modified Wald test (rejects => use robust/cluster SE)
+* Test for serial correlation
+xtserial profit revenue rnd_spending
+* Wooldridge test (rejects => use cluster SE or Newey-West)
+* Test for cross-sectional dependence
+xtreg profit revenue rnd_spending, fe
+xtcsd, pesaran abs
+* Pesaran CD test (rejects => consider Driscoll-Kraay SE)
+```
+## Advanced Specifications
+### Interaction Effects in Panel Models
+```stata
+* Continuous x continuous interaction with FE
+xtreg profit c.rnd_spending##c.market_share i.year, fe cluster(firm_id)
+* Visualize marginal effect
+margins, dydx(rnd_spending) at(market_share=(0(0.1)1))
+marginsplot, title("Marginal Effect of R&D by Market Share")
+```
+### Instrumental Variables in Panel Data
+```stata
+* IV with fixed effects (xtivreg)
+xtivreg profit (rnd_spending = tax_credit regulatory_change) ///
+    employees size i.year, fe first
+* First-stage F-statistic check
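+* Report the Kleibergen-Paap rk Wald F for weak instruments
+* (available from the user-written xtivreg2 with robust or cluster SEs; xtivreg does not report it)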
+```
+### Correlated Random Effects (Mundlak)
+```stata
+* Mundlak (1978) approach: include within-group means
+foreach var of varlist revenue rnd_spending employees {
+    bysort firm_id: egen bar_`var' = mean(`var')
+}
+xtreg profit revenue rnd_spending employees ///
+    bar_revenue bar_rnd_spending bar_employees ///
+    i.year, re cluster(firm_id)
+* Coefficients on time-varying vars match the FE estimates (exactly so in balanced panels)
+* Coefficients on bar_ vars capture between-unit effects
+```
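+The equivalence noted in the comments above can be checked numerically. The following is a minimal Python sketch on simulated data; the variable names and the data-generating process are illustrative assumptions and are not part of the Stata workflow. In a balanced panel, OLS on `x` plus its unit mean returns the same slope as the within estimator.
+```python
+import numpy as np
+rng = np.random.default_rng(0)
+n_units, n_periods = 50, 6                        # balanced panel (assumed)
+unit = np.repeat(np.arange(n_units), n_periods)
+mu = rng.normal(size=n_units)                     # unobserved unit effect
+x = 0.8 * mu[unit] + rng.normal(size=unit.size)   # regressor correlated with mu
+y = 2.0 * x + mu[unit] + rng.normal(size=unit.size)
+def unit_mean(v):
+    """Mean of v within each unit, broadcast back to the observations."""
+    return (np.bincount(unit, weights=v) / n_periods)[unit]
+# Within (FE) estimator: OLS on unit-demeaned data
+x_w, y_w = x - unit_mean(x), y - unit_mean(y)
+beta_fe = (x_w @ y_w) / (x_w @ x_w)
+# Mundlak regression: y on a constant, x, and the unit mean of x
+X = np.column_stack([np.ones_like(x), x, unit_mean(x)])
+coef = np.linalg.lstsq(X, y, rcond=None)[0]
+beta_cre = coef[1]
+print(f"FE slope: {beta_fe:.6f}  Mundlak slope: {beta_cre:.6f}")
+```
+The two slopes agree up to floating-point error because the within deviation of `x` is orthogonal to anything that is constant within a unit, including the unit mean. With unbalanced panels or additional time dummies the match may only be approximate.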
+## Publication Tables
+```stata
+* Comparison table: FE vs RE vs GMM
+esttab fe_model re_model gmm_model using "tables/panel_comparison.tex", ///
+    b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
+    label title("Panel Regression Results") ///
+    mtitles("Fixed Effects" "Random Effects" "System GMM") ///
+    stats(N N_g r2_w ar2p hansenp, ///
+        labels("Observations" "Firms" "Within R-squared" ///
+               "AR(2) p-value" "Hansen p-value") ///
+        fmt(0 0 3 3 3)) ///
+    addnotes("Clustered standard errors in parentheses." ///
+             "All models include year fixed effects.") ///
+    replace
+```
+## References
+- Wooldridge, J.M. (2010), Econometric Analysis of Cross Section and Panel Data, 2nd ed., MIT Press
+- Arellano & Bond (1991), "Some Tests of Specification for Panel Data," RES 58(2)
+- Blundell & Bond (1998), "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models," JoE 87(1)
+- Roodman (2009), "How to Do xtabond2: An Introduction to Difference and System GMM in Stata," SJ 9(1)
+- Cameron & Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press

package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md ADDED

@@ -0,0 +1,267 @@
+---
+name: panel-data-regression-workflow
+description: "Reproducible panel data regression workflow in Python and Stata"
+metadata:
+  openclaw:
+    emoji: "📊"
+    category: "analysis"
+    subcategory: "econometrics"
+    keywords: ["panel data", "fixed effects", "regression workflow", "python econometrics", "stata", "reproducible research"]
+    source: "https://skillsmp.com/skills/panel-data-regression-analyst"
+---
+# Panel Data Regression Workflow
+## Overview
+Panel data (longitudinal data) tracks multiple entities over time, enabling researchers to control for unobserved heterogeneity. This guide provides a complete, reproducible workflow for panel data regression — from data preparation through estimation to reporting — in both Python and Stata. It covers fixed effects, random effects, model selection, and diagnostics.
+## Step 1: Data Structure and Setup
+### Panel Data Format
+Panel data should be in **long format** with one row per entity-time observation:
+| entity_id | year | outcome | treatment | control_1 | control_2 |
+|-----------|------|---------|-----------|-----------|-----------|
+| firm_001 | 2018 | 45.2 | 0 | 12.3 | 0.8 |
+| firm_001 | 2019 | 48.7 | 0 | 13.1 | 0.9 |
+| firm_001 | 2020 | 52.1 | 1 | 14.0 | 0.7 |
+| firm_002 | 2018 | 31.0 | 0 | 8.5 | 1.2 |
+| ... | ... | ... | ... | ... | ... |
+### Python Setup
+```python
+import pandas as pd
+import numpy as np
+from linearmodels.panel import PanelOLS, RandomEffects, BetweenOLS, compare
+import statsmodels.api as sm
+# Load and set panel structure
+df = pd.read_csv("panel_data.csv")
+df = df.set_index(["entity_id", "year"])
+# Check balance
+balance = df.groupby("entity_id").size()
+print(f"Balanced: {balance.nunique() == 1}")
+print(f"Entities: {df.index.get_level_values(0).nunique()}")
+print(f"Periods: {df.index.get_level_values(1).nunique()}")
+print(f"Observations: {len(df)}")
+```
+### Stata Setup
+```stata
+* Declare panel structure
+xtset entity_id year
+* Check balance
+xtdescribe
+xtsum outcome treatment control_1 control_2
+```
+## Step 2: Exploratory Panel Analysis
+### Within and Between Variation
+```python
+# Decompose variation
+entity_means = df.groupby("entity_id")["outcome"].transform("mean")
+time_means = df.groupby("year")["outcome"].transform("mean")
+grand_mean = df["outcome"].mean()
+df["within_var"] = df["outcome"] - entity_means
+df["between_var"] = entity_means - grand_mean
+print(f"Total variance:   {df['outcome'].var():.4f}")
+print(f"Within variance:  {df['within_var'].var():.4f}")
+print(f"Between variance: {df['between_var'].var():.4f}")
+```
+```stata
+* Stata: within/between decomposition
+xtsum outcome treatment control_1 control_2
+* Reports Overall, Between, and Within standard deviations
+```
+### Visual Diagnostics
+```python
+import matplotlib.pyplot as plt
+# Entity-specific time trends (spaghetti plot)
+fig, ax = plt.subplots(figsize=(10, 6))
+for entity, group in df.groupby("entity_id"):
+    ax.plot(group.index.get_level_values("year"), group["outcome"],
+            alpha=0.3, color="steelblue")
+ax.set_xlabel("Year")
+ax.set_ylabel("Outcome")
+ax.set_title("Entity-Level Outcome Trajectories")
+plt.tight_layout()
+plt.savefig("panel_trajectories.png", dpi=150)
+```
+## Step 3: Estimation
+### Fixed Effects (Within Estimator)
+Controls for all time-invariant unobserved entity characteristics:
+```python
+# Python: entity and time (two-way) fixed effects
+model_fe = PanelOLS(
+    df["outcome"],
+    df[["treatment", "control_1", "control_2"]],
+    entity_effects=True,
+    time_effects=True,  # two-way FE
+    check_rank=True
+)
+result_fe = model_fe.fit(cov_type="clustered", cluster_entity=True)
+print(result_fe.summary)
+```
+```stata
+* Stata: Entity + time fixed effects with clustered SEs
+xtreg outcome treatment control_1 control_2 i.year, fe cluster(entity_id)
+* Or using reghdfe (absorbs high-dimensional FE efficiently)
+reghdfe outcome treatment control_1 control_2, absorb(entity_id year) cluster(entity_id)
+```
+### Random Effects (GLS)
+Assumes unobserved effects are uncorrelated with regressors:
+```python
+# Python: Random effects
+# RE does not absorb the intercept, so add a constant explicitly
+model_re = RandomEffects(
+    df["outcome"],
+    sm.add_constant(df[["treatment", "control_1", "control_2"]])
+)
+result_re = model_re.fit(cov_type="clustered", cluster_entity=True)
+print(result_re.summary)
+```
+```stata
+* Stata: Random effects
+xtreg outcome treatment control_1 control_2, re cluster(entity_id)
+```
+## Step 4: Model Selection
+### Hausman Test (FE vs RE)
+```python
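+# Python: manual Hausman test
+from scipy import stats
+# The classic Hausman test assumes RE is efficient, so refit both models
+# with the default (unadjusted) covariance rather than clustered SEs.
+# For a like-for-like test, both models should share the same specification
+# (model_fe above also includes time effects).
+fe_h = model_fe.fit()
+re_h = model_re.fit()
+b_fe = fe_h.params
+b_re = re_h.params
+common = b_fe.index.intersection(b_re.index)
+diff = b_fe[common] - b_re[common]
+cov_diff = fe_h.cov.loc[common, common] - re_h.cov.loc[common, common]
+hausman_stat = float(diff @ np.linalg.inv(cov_diff) @ diff)
+p_value = stats.chi2.sf(hausman_stat, df=len(common))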
+print(f"Hausman statistic: {hausman_stat:.4f}")
+print(f"p-value: {p_value:.4f}")
+print(f"Decision: {'Fixed Effects' if p_value < 0.05 else 'Random Effects'}")
+```
+```stata
+* Stata: Hausman test
+quietly xtreg outcome treatment control_1 control_2, fe
+estimates store fe
+quietly xtreg outcome treatment control_1 control_2, re
+estimates store re
+hausman fe re
+```
+**Interpretation**: p < 0.05 → FE preferred (RE assumption violated). In practice, most applied researchers default to FE for causal inference.
+### Decision Framework
+```
+1. Is the key variable time-varying?
+   No → Cannot use FE (within estimator eliminates it)
+        Use RE, Correlated RE, or Between estimator
+   Yes → Continue
+2. Hausman test significant?
+   Yes → Use Fixed Effects
+   No → RE is more efficient, but FE is still consistent
+        (many researchers use FE regardless for robustness)
+3. Time effects needed?
+   Check: testparm i.year (Stata) or joint F-test
+   Significant → Include time FE (two-way)
+4. Clustering level?
+   Cluster at the entity level (or higher if treatment varies at group level)
+```
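+The framework above can be written down as a small helper. This is only a sketch: the function name `choose_panel_estimator`, its arguments, and the 0.05 threshold are hypothetical choices made for illustration, and the p-values are assumed to come from the Hausman test and a joint test of the year dummies.
+```python
+def choose_panel_estimator(key_var_time_varying, hausman_p, time_effects_p, alpha=0.05):
+    """Return a recommendation following the four-step framework (illustrative only)."""
+    if not key_var_time_varying:
+        # Step 1: the within transformation would eliminate the key variable
+        return "RE, correlated RE, or between estimator; cluster by entity"
+    # Step 2: Hausman test decides between FE and RE
+    estimator = "fixed effects" if hausman_p < alpha else "random effects (FE remains consistent)"
+    # Step 3: add time effects when the year dummies are jointly significant
+    if time_effects_p < alpha:
+        estimator += " + time effects"
+    # Step 4: cluster at the entity level by default
+    return estimator + "; cluster by entity"
+recommendation = choose_panel_estimator(True, hausman_p=0.01, time_effects_p=0.03)
+print(recommendation)
+```
+A helper like this documents the decision path in a replication script; it does not replace judgment about the identifying assumptions.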
+## Step 5: Diagnostics
+```python
+# Serial correlation test (Wooldridge)
+# H₀: No first-order autocorrelation
+from linearmodels.panel import PanelOLS
+# Estimate first-differenced model and test residual autocorrelation
+# Heteroscedasticity (Modified Wald test)
+# If using clustered SEs, heteroscedasticity is already addressed
+# Cross-sectional dependence (Pesaran CD test)
+# Important for macro panels (country-level data)
+```
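+The serial correlation check described in the comments above can be sketched with `statsmodels`. This is a minimal illustration in the spirit of the Wooldridge test, run on simulated data (the frame `sim` and its columns are assumptions, not the workflow's dataset): first-difference the model, then regress the differenced residuals on their own lag. Under no serial correlation in the level errors, that slope equals -0.5.
+```python
+import numpy as np
+import pandas as pd
+import statsmodels.api as sm
+# Simulated balanced panel with serially uncorrelated errors (illustrative)
+rng = np.random.default_rng(1)
+n, t = 200, 8
+sim = pd.DataFrame({
+    "entity_id": np.repeat(np.arange(n), t),
+    "year": np.tile(np.arange(t), n),
+})
+sim["x"] = rng.normal(size=n * t)
+sim["y"] = 1.5 * sim["x"] + np.repeat(rng.normal(size=n), t) + rng.normal(size=n * t)
+# Step 1: first-difference within entity and estimate by OLS without a constant
+sim = sim.sort_values(["entity_id", "year"])
+d = sim.groupby("entity_id")[["y", "x"]].diff().dropna()
+fd = sm.OLS(d["y"], d[["x"]]).fit()
+# Step 2: regress the FD residuals on their own lag within entity
+res = pd.DataFrame({"e": fd.resid, "entity_id": sim.loc[d.index, "entity_id"]})
+res["e_lag"] = res.groupby("entity_id")["e"].shift(1)
+res = res.dropna()
+aux = sm.OLS(res["e"], res[["e_lag"]]).fit(
+    cov_type="cluster", cov_kwds={"groups": res["entity_id"]}
+)
+# Step 3: test H0 that the slope equals -0.5 (no serial correlation in levels)
+slope = aux.params["e_lag"]
+p_val = np.asarray(aux.t_test("e_lag = -0.5").pvalue).item()
+print(f"slope = {slope:.3f}, p-value = {p_val:.3f}")
+```
+A small p-value would point to serial correlation and favor entity-clustered standard errors, which the estimation steps above already use.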
+```stata
+* Stata: Wooldridge test for serial correlation
+xtserial outcome treatment control_1 control_2
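+* Modified Wald test for heteroscedasticity in FE
+* (xttest3 must follow an xtreg, fe estimation)
+quietly xtreg outcome treatment control_1 control_2, fe
+xttest3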
+* Pesaran CD test for cross-sectional dependence
+xtcd outcome treatment control_1 control_2
+```
+## Step 6: Reporting
+### Publication Table
+```python
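+# Python: compare multiple specifications
+from linearmodels.panel import PooledOLS, compare
+exog = df[["treatment", "control_1", "control_2"]]
+# Pooled OLS baseline and entity-only FE (not estimated in the earlier steps)
+result_ols = PooledOLS(df["outcome"], sm.add_constant(exog)).fit(
+    cov_type="clustered", cluster_entity=True)
+result_fe_entity = PanelOLS(df["outcome"], exog, entity_effects=True).fit(
+    cov_type="clustered", cluster_entity=True)
+comparison = compare({
+    "OLS": result_ols,
+    "FE": result_fe_entity,
+    "FE + Time": result_fe,  # two-way FE from Step 3
+    "RE": result_re
+})
+print(comparison.summary)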
+```
+```stata
+* Stata: publication-quality table
+eststo clear
+eststo: reg outcome treatment control_1 control_2, cluster(entity_id)
+eststo: xtreg outcome treatment control_1 control_2, fe cluster(entity_id)
+eststo: reghdfe outcome treatment control_1 control_2, absorb(entity_id year) cluster(entity_id)
+eststo: xtreg outcome treatment control_1 control_2, re cluster(entity_id)
+esttab, se star(* 0.10 ** 0.05 *** 0.01) ///
+    title("Panel Regression Results") label ///
+    mtitles("OLS" "FE" "Two-way FE" "RE") ///
+    scalars("r2 R-squared" "N Observations")
+```
+## References
+- Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
+- Cameron, A. C., & Trivedi, P. K. (2005). *Microeconometrics*. Cambridge University Press.
+- [linearmodels Python Package](https://bashtage.github.io/linearmodels/)
+- [reghdfe Stata Package](http://scorreia.com/software/reghdfe/)

package/skills/analysis/econometrics/python-causality-guide/SKILL.md ADDED

@@ -0,0 +1,134 @@
+---
+name: python-causality-guide
+description: "Learn causal inference with Python using the Brave and True handbook"
+metadata:
+  openclaw:
+    emoji: "📊"
+    category: "analysis"
+    subcategory: "econometrics"
+    keywords: ["causal-inference", "python", "econometrics", "statistics", "treatment-effects", "observational-studies"]
+    source: "https://github.com/matheusfacure/python-causality-handbook"
+---
+# Causal Inference for the Brave and True
+## Overview
+Causal Inference for the Brave and True is an open-source, Python-based textbook by Matheus Facure that teaches causal inference methods through practical implementations. The book bridges the gap between theoretical econometrics textbooks and hands-on data science practice, presenting each method with runnable Python code, real-world datasets, and intuitive explanations that demystify the mathematics behind causal reasoning.
+The handbook covers the full spectrum of causal inference techniques used in modern empirical research, from foundational concepts like potential outcomes and directed acyclic graphs (DAGs) through advanced methods including instrumental variables, regression discontinuity, difference-in-differences, and synthetic control. Each chapter builds on the previous one, constructing a coherent framework for thinking about causation from observational data.
+With over 3,000 GitHub stars, this resource has become a standard reference for graduate students, applied researchers, and data scientists seeking to add causal reasoning to their analytical toolkit. The emphasis on Python implementation makes it directly applicable to modern research workflows.
+## Installation and Setup
+The handbook runs as Jupyter notebooks. Set up the environment:
+```bash
+git clone https://github.com/matheusfacure/python-causality-handbook.git
+cd python-causality-handbook
+# Create a virtual environment
+python -m venv causal-env
+source causal-env/bin/activate
+# Install dependencies
+pip install numpy pandas matplotlib seaborn scikit-learn statsmodels
+pip install linearmodels causalinference
+pip install jupyter
+```
+Launch the notebook server:
+```bash
+jupyter notebook
+```
+The chapters are organized as numbered Jupyter notebooks, starting from foundational concepts and progressing to advanced methods. Each notebook is self-contained with all data loading and analysis code included.
+## Core Methods Covered
+**Potential Outcomes Framework**: The book begins by establishing the Neyman-Rubin potential outcomes model, defining treatment effects and the fundamental problem of causal inference:
+```python
+import pandas as pd
+import numpy as np
+from scipy.stats import ttest_ind
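+# Estimate ATE from a randomized experiment
+# (assumes `data` is a DataFrame with a binary `treatment` column and a numeric `outcome` column)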
+treated = data[data["treatment"] == 1]["outcome"]
+control = data[data["treatment"] == 0]["outcome"]
+ate = treated.mean() - control.mean()
+t_stat, p_value = ttest_ind(treated, control)
+print(f"ATE: {ate:.3f}, p-value: {p_value:.4f}")
+```
+**Regression and Matching**: OLS regression for causal estimation, understanding omitted variable bias, propensity score methods, and matching estimators:
+```python
+import statsmodels.formula.api as smf
+# OLS with controls
+model = smf.ols("outcome ~ treatment + age + income + education", data=data)
+results = model.fit(cov_type="HC1")
+print(results.summary().tables[1])
+```
+**Instrumental Variables**: Two-stage least squares and the local average treatment effect, with practical guidance on instrument validity and weak instrument diagnostics:
+```python
+from linearmodels.iv import IV2SLS
+# Two-stage least squares
+iv_formula = "outcome ~ 1 + [treatment ~ instrument]"
+iv_model = IV2SLS.from_formula(iv_formula, data=data)
+iv_results = iv_model.fit(cov_type="robust")
+print(iv_results.summary)
+```
+**Difference-in-Differences**: Parallel trends assumption, two-way fixed effects, event study designs, and staggered treatment adoption:
+```python
+# Difference-in-Differences with two-way fixed effects
+did_model = smf.ols(
+    "outcome ~ treated_post + C(unit_id) + C(time_period)",
+    data=panel_data
+)
+did_results = did_model.fit(cov_type="cluster", cov_kwds={"groups": panel_data["unit_id"]})
+```
+**Regression Discontinuity**: Sharp and fuzzy RD designs, bandwidth selection, and local polynomial estimation for identifying causal effects at policy thresholds.
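+A sharp RD estimate can be sketched with a kernel-weighted local linear regression. The example below is an illustration on simulated data, not code from the handbook; the cutoff at 0, the hand-picked bandwidth of 0.3, and all variable names are assumptions.
+```python
+import numpy as np
+rng = np.random.default_rng(2)
+n = 2000
+running = rng.uniform(-1, 1, size=n)            # running variable, cutoff at 0
+treated = (running >= 0).astype(float)          # sharp design
+outcome = 1.0 + 0.5 * running + 2.0 * treated + rng.normal(scale=0.5, size=n)
+bandwidth = 0.3                                 # chosen by hand here (assumption)
+keep = np.abs(running) <= bandwidth
+weights = 1 - np.abs(running[keep]) / bandwidth # triangular kernel
+# Local linear regression with separate slopes on each side of the cutoff
+X = np.column_stack([
+    np.ones(keep.sum()),
+    treated[keep],
+    running[keep],
+    running[keep] * treated[keep],
+])
+sw = np.sqrt(weights)                           # weighted least squares via rescaling
+coef = np.linalg.lstsq(X * sw[:, None], outcome[keep] * sw, rcond=None)[0]
+rd_effect = coef[1]                             # jump in the outcome at the cutoff
+print(f"Estimated jump at the cutoff: {rd_effect:.3f}")
+```
+In applied work the bandwidth would come from a data-driven selector and the estimate would be reported across several bandwidths as a robustness check.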
+**Synthetic Control**: Constructing counterfactual units from donor pools for comparative case studies, with inference via placebo tests.
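+The weighting step behind synthetic control can be sketched as a constrained least-squares problem. The example below is a minimal illustration on simulated data, not the handbook's implementation; the donor pool, the true weights, and the effect size of 3.0 are all assumptions. Weights are restricted to be non-negative and to sum to one, and are chosen to match the treated unit in the pre-treatment periods.
+```python
+import numpy as np
+from scipy.optimize import minimize
+rng = np.random.default_rng(3)
+n_donors, n_pre, n_post = 5, 20, 5
+donors = rng.normal(size=(n_pre + n_post, n_donors)).cumsum(axis=0)
+true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
+treated_unit = donors @ true_w
+treated_unit[n_pre:] += 3.0                     # treatment effect after period n_pre
+def pre_period_loss(w):
+    """Squared pre-treatment gap between the treated unit and the weighted donors."""
+    gap = treated_unit[:n_pre] - donors[:n_pre] @ w
+    return gap @ gap
+fit = minimize(
+    pre_period_loss,
+    x0=np.full(n_donors, 1 / n_donors),
+    bounds=[(0, 1)] * n_donors,
+    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
+    method="SLSQP",
+)
+synthetic = donors @ fit.x                      # counterfactual path for the treated unit
+effect = (treated_unit[n_pre:] - synthetic[n_pre:]).mean()
+print(f"Weights: {fit.x.round(2)}, estimated effect: {effect:.2f}")
+```
+Placebo inference would repeat the same fit with each donor treated as if it were the treated unit and compare the resulting gaps.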
+## Research Workflow Integration
+**Graduate Coursework**: The handbook maps directly to applied econometrics and causal inference course syllabi. Students can follow along with lectures by running the corresponding notebooks, experimenting with parameter changes, and observing how different assumptions affect estimates.
+**Method Selection Guide**: Use the decision framework presented across chapters to choose the appropriate method for your research question:
+- Randomized experiment available: simple comparison of means or regression adjustment
+- Selection on observables: matching, propensity scores, or regression
+- Unobserved confounders with instrument: instrumental variables
+- Policy threshold: regression discontinuity
+- Before/after with control group: difference-in-differences
+- Single treated unit over time: synthetic control
+**Replication and Extension**: Each chapter uses real or realistic datasets. Researchers can adapt the code to their own data by replacing data loading steps while preserving the analytical pipeline.
+**Teaching Tool**: Instructors can assign chapters as interactive homework, asking students to modify assumptions, change specifications, or apply methods to new datasets. The notebook format makes it straightforward to create assignments with embedded solutions.
+## Best Practices Highlighted in the Handbook
+1. **Always graph your data first**: Visual inspection reveals patterns that inform modeling choices and expose violations of identifying assumptions.
+2. **Understand your identification strategy**: Before running any estimator, articulate clearly what variation identifies the causal effect and what assumptions are required.
+3. **Cluster standard errors appropriately**: When treatment is assigned at group level, cluster standard errors at that level to avoid overstating statistical significance.
+4. **Run robustness checks**: Vary specifications, bandwidths, control variables, and functional forms to assess sensitivity of conclusions.
+5. **Report effect sizes alongside p-values**: Statistical significance without practical significance is not informative for policy or scientific understanding.
+## References
+- Python Causality Handbook: https://github.com/matheusfacure/python-causality-handbook
+- Online version: https://matheusfacure.github.io/python-causality-handbook/
+- Angrist and Pischke, Mostly Harmless Econometrics (companion reference)
+- Cunningham, Causal Inference: The Mixtape (complementary resource)