@wentorai/research-plugins 1.2.2 → 1.3.0

Files changed (141)
  1. package/README.md +16 -8
  2. package/openclaw.plugin.json +10 -3
  3. package/package.json +2 -5
  4. package/skills/analysis/dataviz/SKILL.md +25 -0
  5. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
  6. package/skills/analysis/econometrics/SKILL.md +23 -0
  7. package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
  8. package/skills/analysis/statistics/SKILL.md +21 -0
  9. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
  10. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
  11. package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
  12. package/skills/analysis/wrangling/SKILL.md +21 -0
  13. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
  14. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
  15. package/skills/domains/ai-ml/SKILL.md +37 -0
  16. package/skills/domains/biomedical/SKILL.md +28 -0
  17. package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
  18. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
  19. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
  20. package/skills/domains/business/SKILL.md +17 -0
  21. package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
  22. package/skills/domains/chemistry/SKILL.md +19 -0
  23. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
  24. package/skills/domains/cs/SKILL.md +21 -0
  25. package/skills/domains/ecology/SKILL.md +16 -0
  26. package/skills/domains/economics/SKILL.md +20 -0
  27. package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
  28. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
  29. package/skills/domains/education/SKILL.md +19 -0
  30. package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
  31. package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
  32. package/skills/domains/finance/SKILL.md +19 -0
  33. package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
  34. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
  35. package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
  36. package/skills/domains/geoscience/SKILL.md +17 -0
  37. package/skills/domains/humanities/SKILL.md +16 -0
  38. package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
  39. package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
  40. package/skills/domains/law/SKILL.md +19 -0
  41. package/skills/domains/math/SKILL.md +17 -0
  42. package/skills/domains/pharma/SKILL.md +17 -0
  43. package/skills/domains/physics/SKILL.md +16 -0
  44. package/skills/domains/social-science/SKILL.md +17 -0
  45. package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
  46. package/skills/literature/discovery/SKILL.md +20 -0
  47. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
  48. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
  49. package/skills/literature/fulltext/SKILL.md +26 -0
  50. package/skills/literature/metadata/SKILL.md +35 -0
  51. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
  52. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
  53. package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
  54. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
  55. package/skills/literature/search/SKILL.md +43 -0
  56. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
  57. package/skills/research/automation/SKILL.md +21 -0
  58. package/skills/research/deep-research/SKILL.md +24 -0
  59. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
  60. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
  61. package/skills/research/funding/SKILL.md +20 -0
  62. package/skills/research/methodology/SKILL.md +24 -0
  63. package/skills/research/paper-review/SKILL.md +19 -0
  64. package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
  65. package/skills/tools/code-exec/SKILL.md +18 -0
  66. package/skills/tools/diagram/SKILL.md +20 -0
  67. package/skills/tools/document/SKILL.md +21 -0
  68. package/skills/tools/knowledge-graph/SKILL.md +21 -0
  69. package/skills/tools/ocr-translate/SKILL.md +18 -0
  70. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
  71. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
  72. package/skills/tools/scraping/SKILL.md +17 -0
  73. package/skills/writing/citation/SKILL.md +33 -0
  74. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
  75. package/skills/writing/composition/SKILL.md +22 -0
  76. package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
  77. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
  78. package/skills/writing/latex/SKILL.md +22 -0
  79. package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
  80. package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
  81. package/skills/writing/polish/SKILL.md +20 -0
  82. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
  83. package/skills/writing/templates/SKILL.md +22 -0
  84. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
  85. package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
  86. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
  87. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
  88. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
  89. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
  90. package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
  91. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
  92. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
  93. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
  94. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
  95. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
  96. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
  97. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
  98. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
  99. package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
  100. package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
  101. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
  102. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
  103. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
  104. package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
  105. package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
  106. package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
  107. package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
  108. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
  109. package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
  110. package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
  111. package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
  112. package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
  113. package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
  114. package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
  115. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
  116. package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
  117. package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
  118. package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
  119. package/skills/research/deep-research/research-cog/SKILL.md +0 -153
  120. package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
  121. package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
  122. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
  123. package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
  124. package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
  125. package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
  126. package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
  127. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
  128. package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
  129. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
  130. package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
  131. package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
  132. package/skills/tools/document/weknora-guide/SKILL.md +0 -216
  133. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
  134. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
  135. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
  136. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
  137. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
  138. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
  139. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
  140. package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
  141. package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205
@@ -1,192 +0,0 @@
- ---
- name: empirical-paper-analysis
- description: "Systematic framework for analyzing empirical law and economics papers"
- metadata:
-   openclaw:
-     emoji: "⚖️"
-     category: "analysis"
-     subcategory: "econometrics"
-     keywords: ["empirical analysis", "law and economics", "identification strategy", "causal inference", "robustness checks", "research methodology"]
-     source: "https://clawhub.ai/zhouziyue233/empirical-paper-analysis-skill"
- ---
-
- # Empirical Paper Analysis Framework
-
- ## Overview
-
- This framework provides a systematic approach to reading, evaluating, and critiquing empirical research papers in law and economics and related social science fields. It covers identification strategy assessment, data evaluation, robustness check analysis, and constructive critique formulation. Use this when reviewing papers for seminars, referee reports, or your own literature reviews.
-
- ## The 6-Step Analysis Framework
-
- ### Step 1: Identify the Research Question
-
- Extract the core question and decompose it:
-
- ```
- Template:
- - Research Question: [What causal/descriptive claim does the paper make?]
- - Unit of analysis: [individual / firm / state / country-year]
- - Outcome variable (Y): [What is being explained?]
- - Key explanatory variable (X): [What is the treatment or variable of interest?]
- - Claimed relationship: [X → Y via what mechanism?]
- ```
-
- **Red flags**:
- - Vague or shifting research question across sections
- - Mismatch between stated question and actual regression specification
- - Question that is purely correlational framed as causal
-
- ### Step 2: Evaluate the Identification Strategy
-
- The identification strategy is how the paper argues for causal interpretation. Map it to a known framework:
-
- | Strategy | Key Assumption | What to Check |
- |----------|---------------|---------------|
- | **OLS** | No omitted variable bias (E[u\|X]=0) | Control variable completeness, R² sensitivity |
- | **IV / 2SLS** | Exclusion restriction (instrument affects Y only through X) | First stage F-stat (>10), instrument validity argument |
- | **Difference-in-Differences** | Parallel trends (absent treatment, treated and control would trend similarly) | Pre-treatment parallel trends test, event study plot |
- | **Regression Discontinuity** | No manipulation at cutoff, continuity of potential outcomes | McCrary density test, covariate balance at cutoff |
- | **Matching / PSM** | Selection on observables (no unobservable confounders) | Balance tables, common support, sensitivity to caliper |
- | **Synthetic Control** | Pre-treatment fit quality, no spillovers | RMSPE ratio, placebo tests on donor pool |
-
- **Questions to ask**:
- - Is the identification assumption stated explicitly?
- - Is there a **falsification test** (placebo treatment, placebo outcome)?
- - Could there be **reverse causality**?
- - Are there **spillover effects** that violate SUTVA?
-
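To make the "First stage F-stat (>10)" check in the IV / 2SLS row concrete, here is a minimal sketch on simulated data. It assumes a single instrument and homoskedastic errors; the function name `first_stage_f` and all variables are hypothetical, not part of the deleted skill.

```python
import numpy as np

def first_stage_f(x, z):
    """F statistic for H0: the instrument has no effect in x = a + b*z + v."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    rss_unrestricted = resid @ resid
    rss_restricted = np.sum((x - x.mean()) ** 2)  # intercept-only model
    # One restriction, n - 2 residual degrees of freedom
    return float((rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 2)))

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)            # instrument (simulated)
v = rng.normal(size=n)            # first-stage noise
x_strong = 0.5 * z + v            # instrument moves x noticeably
x_weak = 0.02 * z + v             # instrument barely moves x

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"strong instrument F = {f_strong:.1f}, weak instrument F = {f_weak:.1f}")
```

With clustered or heteroskedastic errors, a robust first-stage statistic would be the more appropriate check; this sketch only illustrates the rule of thumb.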
- ### Step 3: Assess the Data
-
- ```
- Data Evaluation Checklist:
- □ Source: Is the data publicly available or proprietary?
- □ Sample period: Does it match the question? Any structural breaks?
- □ Sample size: Sufficient for the method? Power analysis?
- □ Attrition: Is there selective dropout? Attrition tables?
- □ Measurement: Are key variables measured directly or proxied?
- □ External validity: Is the sample representative of the population of interest?
- ```
-
- **Common data issues in law and economics**:
- - Court case data: selection into litigation (Priest-Klein hypothesis)
- - Regulatory data: endogenous timing of policy changes
- - Survey data: response bias, recall bias
- - Administrative data: measurement captures legal definitions, not economic concepts
-
- ### Step 4: Evaluate the Empirical Specification
-
- Examine the main regression equation:
-
- ```
- Y_it = α + β·X_it + γ·Controls_it + θ_i + λ_t + ε_it
-
- Where:
-   Y_it = outcome for unit i at time t
-   X_it = treatment / variable of interest
-   Controls_it = control variables
-   θ_i = unit fixed effects
-   λ_t = time fixed effects
-   ε_it = error term
-   β = coefficient of interest
- ```
-
- **Check**:
- - Is `β` the causal parameter of interest, or just an association?
- - Are fixed effects appropriate? (Individual FE removes time-invariant confounders)
- - What is the **clustering level** for standard errors? (Should match treatment assignment level)
- - Are control variables themselves **bad controls** (post-treatment variables that are affected by X)?
-
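As an illustration of why the unit and time fixed effects in the Step 4 equation matter, the sketch below simulates a balanced panel in which the regressor is correlated with the unit effect. It then compares pooled OLS with the two-way within estimator. All names and parameter values are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods, beta_true = 40, 8, 2.0

theta = rng.normal(size=(n_units, 1))        # unit effects θ_i
lam = rng.normal(size=(1, n_periods))        # time effects λ_t
# X is correlated with θ_i, so leaving θ_i out biases pooled OLS
x = 0.8 * theta + rng.normal(size=(n_units, n_periods))
y = beta_true * x + 3.0 * theta + lam + 0.1 * rng.normal(size=(n_units, n_periods))

def two_way_demean(a):
    """Subtract unit and period means, add back the grand mean (balanced panel only)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

x_w, y_w = two_way_demean(x), two_way_demean(y)
beta_fe = float((x_w * y_w).sum() / (x_w ** 2).sum())

xc, yc = x - x.mean(), y - y.mean()
beta_pooled = float((xc * yc).sum() / (xc ** 2).sum())

print(f"true β = {beta_true}, two-way FE = {beta_fe:.3f}, pooled OLS = {beta_pooled:.3f}")
```

The double-demeaning shortcut is exact only for balanced panels; unbalanced data would need an iterative or dummy-variable approach.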
- ### Step 5: Scrutinize Robustness Checks
-
- A well-executed paper should include several:
-
- | Robustness Check | Purpose | What to Look For |
- |-----------------|---------|-----------------|
- | **Alternative specifications** | Drop/add controls | Does β sign/magnitude change? |
- | **Alternative samples** | Trim outliers, restrict subgroups | Is result driven by a small subset? |
- | **Placebo tests** | Fake treatment date, fake outcome | Should find null results |
- | **Alternative clustering** | State vs. county vs. firm | Does significance survive? |
- | **Bounding exercises** | Oster (2019) bounds, Altonji ratio | How large would selection on unobservables need to be? |
- | **Leave-one-out** | Drop each unit/period | Is result driven by a single observation? |
- | **Event study** | Dynamic treatment effects plot | Are pre-treatment coefficients zero? |
-
- **Warning signs**:
- - Only showing robustness checks that "work" (selective reporting)
- - No sensitivity analysis on key assumptions
- - Robustness table hidden in appendix with different significance levels
-
- ### Step 6: Formulate Constructive Critique
-
- Structure your critique as:
-
- ```markdown
- ## Summary
- [2-3 sentences on what the paper does and finds]
-
- ## Strengths
- - [Identification strategy strength]
- - [Data quality strength]
- - [Policy relevance]
-
- ## Main Concerns
-
- ### Concern 1: [Identification]
- - Issue: [What specific assumption is violated or untested?]
- - Evidence: [What in the paper supports your concern?]
- - Suggestion: [What analysis would address this?]
-
- ### Concern 2: [Data/Measurement]
- - Issue: ...
- - Evidence: ...
- - Suggestion: ...
-
- ### Concern 3: [Specification]
- - Issue: ...
- - Evidence: ...
- - Suggestion: ...
-
- ## Minor Comments
- - [Table formatting, typos, unclear notation]
- ```
-
- ## Quick Reference: Common Mistakes
-
- | Mistake | Why It's Wrong | Fix |
- |---------|---------------|-----|
- | Clustering at wrong level | Understated SEs, inflated t-stats | Cluster at treatment assignment level |
- | Bad controls | Including post-treatment variables biases β | Only control for pre-treatment variables |
- | Cherry-picked specification | Overfitting to significance | Pre-register or show full specification curve |
- | Ignoring multiple testing | Family-wise error rate inflation | Bonferroni or Benjamini-Hochberg correction |
- | Log of zero | Undefined, ad hoc fixes (log(Y+1)) introduce bias | IHS transform or Poisson pseudo-MLE |
- | Winner's curse | Published effect sizes are biased upward | Check if effect is plausible given prior literature |
-
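The "Ignoring multiple testing" row names two corrections. A small sketch of how they differ on the same set of p-values follows; the p-values are made up for illustration and `benjamini_hochberg` is a hypothetical helper, not a library function.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * rank / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])          # largest rank meeting its threshold
        reject[order[: k + 1]] = True              # reject everything up to that rank
    return reject

# Illustrative p-values from eight outcome regressions (invented)
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
bh_reject = benjamini_hochberg(p_vals)
bonferroni_reject = np.asarray(p_vals) <= 0.05 / len(p_vals)

print("Benjamini-Hochberg rejections:", int(bh_reject.sum()))
print("Bonferroni rejections:", int(bonferroni_reject.sum()))
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate, which is why it can reject more hypotheses on the same inputs.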
- ## Example: Analyzing a DiD Paper
-
- ```
- Paper claim: "Adopting e-filing reduces case processing time by 15%"
-
- Step 1: RQ = Does e-filing (X) cause faster case processing (Y)?
- Step 2: DiD with staggered adoption across courts
-   - Check: parallel trends plot for early vs. late adopters
-   - Check: recent DiD literature (de Chaisemartin & D'Haultfoeuille 2020)
-     warns that TWFE with staggered treatment can be biased
- Step 3: Data from court administrative records (2005-2020)
-   - Check: is adoption timing truly exogenous? (Courts with backlogs
-     might adopt earlier → selection bias)
- Step 4: log(processing_days)_it = β·efiling_it + court_FE + year_FE + ε_it
-   - Concern: no controls for court budgets, judge turnover
- Step 5: Robustness: event study plot, drop large courts, alternative
-   measure of processing time → β stable around -0.15
- Step 6: Credible but could be strengthened with:
-   - Callaway & Sant'Anna (2021) estimator for staggered DiD
-   - Instrument for adoption timing
-   - Heterogeneity by court size and case type
- ```
-
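The example regresses log(processing_days) on the treatment, so β is measured in log points, not percent. A short arithmetic check shows how a coefficient of -0.15 maps to a percentage change; the helper name is hypothetical.

```python
import math

def log_points_to_percent(beta):
    """Exact percentage change in Y implied by a coefficient on log(Y)."""
    return 100.0 * (math.exp(beta) - 1.0)

pct = log_points_to_percent(-0.15)
print(f"β = -0.15 implies a {pct:.1f}% change")  # prints: β = -0.15 implies a -13.9% change
```

The approximation "β × 100 ≈ percent change" is close for small coefficients, but a reviewer may want the headline claim and the reported coefficient to use the same scale.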
- ## References
-
- - Angrist, J. D., & Pischke, J. S. (2009). *Mostly Harmless Econometrics*. Princeton University Press.
- - Oster, E. (2019). "Unobservable Selection and Coefficient Stability." *Journal of Business & Economic Statistics*.
- - de Chaisemartin, C., & D'Haultfoeuille, X. (2020). "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *AER*.
- - Callaway, B., & Sant'Anna, P. H. (2021). "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*.
- - [Empirical Legal Studies Resources](https://www.law.northwestern.edu/research-faculty/clbe/events/empiricallegalstudies/)
@@ -1,267 +0,0 @@
- ---
- name: panel-data-regression-workflow
- description: "Reproducible panel data regression workflow in Python and Stata"
- metadata:
-   openclaw:
-     emoji: "📊"
-     category: "analysis"
-     subcategory: "econometrics"
-     keywords: ["panel data", "fixed effects", "regression workflow", "python econometrics", "stata", "reproducible research"]
-     source: "https://skillsmp.com/skills/panel-data-regression-analyst"
- ---
-
- # Panel Data Regression Workflow
-
- ## Overview
-
- Panel data (longitudinal data) tracks multiple entities over time, enabling researchers to control for unobserved heterogeneity. This guide provides a complete, reproducible workflow for panel data regression, from data preparation through estimation to reporting, in both Python and Stata. It covers fixed effects, random effects, model selection, and diagnostics.
-
- ## Step 1: Data Structure and Setup
-
- ### Panel Data Format
-
- Panel data should be in **long format** with one row per entity-time observation:
-
- | entity_id | year | outcome | treatment | control_1 | control_2 |
- |-----------|------|---------|-----------|-----------|-----------|
- | firm_001 | 2018 | 45.2 | 0 | 12.3 | 0.8 |
- | firm_001 | 2019 | 48.7 | 0 | 13.1 | 0.9 |
- | firm_001 | 2020 | 52.1 | 1 | 14.0 | 0.7 |
- | firm_002 | 2018 | 31.0 | 0 | 8.5 | 1.2 |
- | ... | ... | ... | ... | ... | ... |
-
- ### Python Setup
-
- ```python
- import pandas as pd
- import numpy as np
- from linearmodels.panel import PanelOLS, RandomEffects, BetweenOLS, compare
- import statsmodels.api as sm
-
- # Load and set panel structure
- df = pd.read_csv("panel_data.csv")
- df = df.set_index(["entity_id", "year"])
-
- # Check balance
- balance = df.groupby("entity_id").size()
- print(f"Balanced: {balance.nunique() == 1}")
- print(f"Entities: {df.index.get_level_values(0).nunique()}")
- print(f"Periods: {df.index.get_level_values(1).nunique()}")
- print(f"Observations: {len(df)}")
- ```
-
- ### Stata Setup
-
- ```stata
- * Declare panel structure
- xtset entity_id year
-
- * Check balance
- xtdescribe
- xtsum outcome treatment control_1 control_2
- ```
-
- ## Step 2: Exploratory Panel Analysis
-
- ### Within and Between Variation
-
- ```python
- # Decompose variation
- entity_means = df.groupby("entity_id")["outcome"].transform("mean")
- time_means = df.groupby("year")["outcome"].transform("mean")
- grand_mean = df["outcome"].mean()
-
- df["within_var"] = df["outcome"] - entity_means
- df["between_var"] = entity_means - grand_mean
-
- print(f"Total variance: {df['outcome'].var():.4f}")
- print(f"Within variance: {df['within_var'].var():.4f}")
- print(f"Between variance: {df['between_var'].var():.4f}")
- ```
-
- ```stata
- * Stata: within/between decomposition
- xtsum outcome treatment control_1 control_2
- * Reports Overall, Between, and Within standard deviations
- ```
-
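To see what the within/between split measures, the following sketch works through a tiny unbalanced panel with invented values. It uses sums of squares, not `.var()`, because the sums of squares add up exactly to the total. The frame `df_demo` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small unbalanced panel with invented values
df_demo = pd.DataFrame({
    "entity_id": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "year": [2018, 2019, 2020, 2018, 2019, 2018, 2019, 2020],
    "outcome": [1.0, 2.0, 3.0, 10.0, 12.0, 5.0, 5.0, 8.0],
}).set_index(["entity_id", "year"])

entity_mean = df_demo.groupby(level="entity_id")["outcome"].transform("mean")
grand_mean = df_demo["outcome"].mean()

ss_total = float(((df_demo["outcome"] - grand_mean) ** 2).sum())
ss_within = float(((df_demo["outcome"] - entity_mean) ** 2).sum())
ss_between = float(((entity_mean - grand_mean) ** 2).sum())

print(f"total = {ss_total}, within = {ss_within}, between = {ss_between}")
print(f"share of variation that is within-entity: {ss_within / ss_total:.1%}")
```

A small within share signals that a fixed effects estimator will rely on little variation, which usually shows up as wide standard errors.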
- ### Visual Diagnostics
-
- ```python
- import matplotlib.pyplot as plt
-
- # Entity-specific time trends (spaghetti plot)
- fig, ax = plt.subplots(figsize=(10, 6))
- for entity, group in df.groupby("entity_id"):
-     ax.plot(group.index.get_level_values("year"), group["outcome"],
-             alpha=0.3, color="steelblue")
- ax.set_xlabel("Year")
- ax.set_ylabel("Outcome")
- ax.set_title("Entity-Level Outcome Trajectories")
- plt.tight_layout()
- plt.savefig("panel_trajectories.png", dpi=150)
- ```
-
- ## Step 3: Estimation
-
- ### Fixed Effects (Within Estimator)
-
- Controls for all time-invariant unobserved entity characteristics:
-
- ```python
- # Python: entity and time (two-way) fixed effects
- model_fe = PanelOLS(
-     df["outcome"],
-     df[["treatment", "control_1", "control_2"]],
-     entity_effects=True,
-     time_effects=True,  # two-way FE
-     check_rank=True
- )
- result_fe = model_fe.fit(cov_type="clustered", cluster_entity=True)
- print(result_fe.summary)
- ```
-
- ```stata
- * Stata: Entity + time fixed effects with clustered SEs
- xtreg outcome treatment control_1 control_2 i.year, fe cluster(entity_id)
-
- * Or using reghdfe (absorbs high-dimensional FE efficiently)
- reghdfe outcome treatment control_1 control_2, absorb(entity_id year) cluster(entity_id)
- ```
-
- ### Random Effects (GLS)
-
- Assumes unobserved effects are uncorrelated with regressors:
-
- ```python
- # Python: Random effects (linearmodels does not add an intercept automatically)
- model_re = RandomEffects(
-     df["outcome"],
-     sm.add_constant(df[["treatment", "control_1", "control_2"]])
- )
- result_re = model_re.fit(cov_type="clustered", cluster_entity=True)
- print(result_re.summary)
- ```
-
- ```stata
- * Stata: Random effects
- xtreg outcome treatment control_1 control_2, re cluster(entity_id)
- ```
-
- ## Step 4: Model Selection
-
- ### Hausman Test (FE vs RE)
-
- ```python
- # Python: manual Hausman test
- # The classical test assumes RE is efficient, so both models are re-fit with the
- # default (unadjusted) covariance and the same regressors; RE also gets a constant.
- from scipy import stats
-
- exog = df[["treatment", "control_1", "control_2"]]
- fe_h = PanelOLS(df["outcome"], exog, entity_effects=True).fit()
- re_h = RandomEffects(df["outcome"], sm.add_constant(exog)).fit()
- common = fe_h.params.index.intersection(re_h.params.index)
-
- diff = (fe_h.params[common] - re_h.params[common]).to_numpy()
- cov_diff = (fe_h.cov.loc[common, common] - re_h.cov.loc[common, common]).to_numpy()
-
- hausman_stat = float(diff @ np.linalg.inv(cov_diff) @ diff)
- p_value = stats.chi2.sf(hausman_stat, df=len(common))
- print(f"Hausman statistic: {hausman_stat:.4f}")
- print(f"p-value: {p_value:.4f}")
- print(f"Decision: {'Fixed Effects' if p_value < 0.05 else 'Random Effects'}")
- ```
-
- ```stata
- * Stata: Hausman test
- quietly xtreg outcome treatment control_1 control_2, fe
- estimates store fe
- quietly xtreg outcome treatment control_1 control_2, re
- estimates store re
- hausman fe re
- ```
-
- **Interpretation**: p < 0.05 → FE preferred (RE assumption violated). In practice, most applied researchers default to FE for causal inference.
-
- ### Decision Framework
-
- ```
- 1. Is the key variable time-varying?
-    No → Cannot use FE (within estimator eliminates it)
-         Use RE, Correlated RE, or Between estimator
-    Yes → Continue
-
- 2. Hausman test significant?
-    Yes → Use Fixed Effects
-    No → RE is more efficient, but FE is still consistent
-         (many researchers use FE regardless for robustness)
-
- 3. Time effects needed?
-    Check: testparm i.year (Stata) or joint F-test
-    Significant → Include time FE (two-way)
-
- 4. Clustering level?
-    Cluster at the entity level (or higher if treatment varies at group level)
- ```
-
- ## Step 5: Diagnostics
-
- ```python
- # Serial correlation test (Wooldridge)
- # H₀: No first-order autocorrelation
- from linearmodels.panel import PanelOLS
- # Estimate first-differenced model and test residual autocorrelation
-
- # Heteroscedasticity (Modified Wald test)
- # If using clustered SEs, heteroscedasticity is already addressed
-
- # Cross-sectional dependence (Pesaran CD test)
- # Important for macro panels (country-level data)
- ```
-
- ```stata
- * Stata: Wooldridge test for serial correlation
- xtserial outcome treatment control_1 control_2
-
- * Modified Wald test for heteroscedasticity in FE
- xttest3
-
- * Pesaran CD test for cross-sectional dependence
- xtcd outcome treatment control_1 control_2
- ```
-
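The Python diagnostics above are described only in comments. As one possible illustration, the sketch below computes the Pesaran CD statistic for a balanced panel of residuals, assuming each unit's residuals have mean zero. The function name and the simulated residual matrices are hypothetical.

```python
import numpy as np

def pesaran_cd(resid):
    """Pesaran CD statistic for a balanced (N units x T periods) residual matrix."""
    n_units, n_periods = resid.shape
    corr = np.corrcoef(resid)                         # N x N correlations across units
    upper = corr[np.triu_indices(n_units, k=1)]       # each pair (i < j) once
    scale = np.sqrt(2.0 * n_periods / (n_units * (n_units - 1)))
    return float(scale * upper.sum())

rng = np.random.default_rng(2)
n_units, n_periods = 10, 50

independent = rng.normal(size=(n_units, n_periods))
common_shock = rng.normal(size=(1, n_periods))        # hits every unit in each period
dependent = 2.0 * common_shock + rng.normal(size=(n_units, n_periods))

cd_independent = pesaran_cd(independent)
cd_dependent = pesaran_cd(dependent)
print(f"CD (independent units) = {cd_independent:.2f}")
print(f"CD (common shock)      = {cd_dependent:.2f}")
```

Under the null of no cross-sectional dependence the statistic is approximately standard normal, so values far outside ±1.96 point to dependence. Unbalanced panels need pair-specific sample sizes, which this sketch does not handle.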
- ## Step 6: Reporting
-
- ### Publication Table
-
- ```python
- # Python: compare multiple specifications
- from linearmodels.panel import PanelOLS, PooledOLS, compare
-
- y = df["outcome"]
- X = df[["treatment", "control_1", "control_2"]]
- fit_kw = dict(cov_type="clustered", cluster_entity=True)
-
- # Pooled OLS and entity-only FE were not estimated in Step 3, so fit them here
- result_ols = PooledOLS(y, sm.add_constant(X)).fit(**fit_kw)
- result_fe_entity = PanelOLS(y, X, entity_effects=True).fit(**fit_kw)
-
- comparison = compare({
-     "OLS": result_ols,
-     "FE": result_fe_entity,
-     "FE + Time": result_fe,  # two-way FE fit from Step 3
-     "RE": result_re
- })
- print(comparison.summary)
- ```
-
- ```stata
- * Stata: publication-quality table
- eststo clear
- eststo: reg outcome treatment control_1 control_2, cluster(entity_id)
- eststo: xtreg outcome treatment control_1 control_2, fe cluster(entity_id)
- eststo: reghdfe outcome treatment control_1 control_2, absorb(entity_id year) cluster(entity_id)
- eststo: xtreg outcome treatment control_1 control_2, re cluster(entity_id)
-
- esttab, se star(* 0.10 ** 0.05 *** 0.01) ///
-     title("Panel Regression Results") label ///
-     mtitles("OLS" "FE" "Two-way FE" "RE") ///
-     scalars("r2 R-squared" "N Observations")
- ```
-
- ## References
-
- - Wooldridge, J. M. (2010). *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
- - Cameron, A. C., & Trivedi, P. K. (2005). *Microeconometrics*. Cambridge University Press.
- - [linearmodels Python Package](https://bashtage.github.io/linearmodels/)
- - [reghdfe Stata Package](http://scorreia.com/software/reghdfe/)
@@ -1,117 +0,0 @@
- ---
- name: stata-regression
- description: "Run regression analyses in Stata with publication-ready output"
- metadata:
-   openclaw:
-     emoji: "📊"
-     category: "analysis"
-     subcategory: "econometrics"
-     keywords: ["Stata regression", "Stata data cleaning", "Stata commands", "panel data", "fixed effects", "robustness checks"]
-     source: "https://github.com/awesome-econ-ai/academic-skills"
- ---
-
- # Stata Regression
-
- ## Purpose
-
- This skill produces reproducible regression analysis workflows in Stata, including model diagnostics and publication-ready tables using `esttab` or `outreg2`.
-
- ## When to Use
-
- - Estimating linear or nonlinear regression models in Stata
- - Producing tables for academic papers and reports
- - Running robustness checks and alternative specifications
-
- ## Instructions
-
- Follow these steps to complete the task:
-
- ### Step 1: Understand the Context
-
- Before generating any code, ask the user:
- - What is the dependent variable and key regressors?
- - What controls and fixed effects are required?
- - How should standard errors be clustered?
- - What output format is needed (LaTeX, Word, or CSV)?
-
- ### Step 2: Generate the Output
-
- Based on the context, generate Stata code that:
-
- - **Loads and checks the data** - Handle missing values and verify variable types
- - **Runs the requested specification** - Use `regress`, `reghdfe`, or `xtreg` as appropriate
- - **Adds robust or clustered standard errors** - Match the study design
- - **Exports tables** - Use `esttab` or `outreg2` with clear labels
-
- ### Step 3: Verify and Explain
-
- After generating output:
- - Explain what each model estimates
- - Highlight assumptions and diagnostics
- - Suggest robustness checks or alternative models
-
- ## Example Prompts
-
- - "Run OLS with firm and year fixed effects, clustering by firm"
- - "Estimate a logit model and export results to LaTeX"
- - "Create a regression table with three specifications"
-
- ## Example Output
-
- ```stata
- * ============================================
- * Regression Analysis with Stata
- * ============================================
-
- * Load data
- use "data.dta", clear
-
- * Summary stats
- summarize y x1 x2 x3
-
- * Main regression with clustered SEs
- regress y x1 x2 x3, vce(cluster firm_id)
- eststo model1
-
- * Alternative specification with fixed effects
- reghdfe y x1 x2 x3, absorb(firm_id year) vce(cluster firm_id)
- eststo model2
-
- * Export table
- esttab model1 model2 using "results/regression_table.tex", replace se label
- ```
-
- ## Requirements
-
- ### Software
-
- - Stata 17+
-
- ### Packages
-
- - `estout` (for `esttab`)
- - `reghdfe` (optional, for high-dimensional fixed effects)
-
- Install with:
- ```stata
- ssc install estout
- ssc install reghdfe
- ```
-
- ## Best Practices
-
- - **Match standard errors to the design** (cluster where treatment varies)
- - **Report all model variants** used in the analysis
- - **Document variable definitions** and transformations
-
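To show why matching standard errors to the design matters, here is a Python sketch (not Stata) that compares classical and cluster-robust standard errors on simulated firm-year data. The regressor and part of the error are constant within firm. The function name, the simulated data, and the finite-sample factor G/(G-1) × (N-1)/(N-K) are assumptions of this sketch.

```python
import numpy as np

def ols_with_cluster_se(X, y, clusters):
    """OLS coefficients with classical and cluster-robust (sandwich) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical SEs: sigma^2 * (X'X)^-1
    se_classical = np.sqrt(np.diag(xtx_inv) * (resid @ resid) / (n - k))

    # Cluster-robust "meat": sum over clusters of (X_g' u_g)(X_g' u_g)'
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    n_g = len(groups)
    correction = (n_g / (n_g - 1)) * ((n - 1) / (n - k))   # finite-sample factor (assumed)
    se_cluster = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv) * correction)
    return beta, se_classical, se_cluster

rng = np.random.default_rng(3)
n_firms, n_years = 50, 20
firm_id = np.repeat(np.arange(n_firms), n_years)
x = np.repeat(rng.normal(size=n_firms), n_years)             # varies only across firms
u = np.repeat(rng.normal(size=n_firms), n_years) + rng.normal(size=n_firms * n_years)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones_like(x), x])

beta, se_ols, se_cl = ols_with_cluster_se(X, y, firm_id)
print(f"slope = {beta[1]:.3f}, classical SE = {se_ols[1]:.3f}, clustered SE = {se_cl[1]:.3f}")
```

When the regressor and the error are both correlated within firm, the classical standard error treats the 1,000 firm-years as independent and comes out too small; the clustered version reflects that there are only 50 independent firms.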
- ## Common Pitfalls
-
- - Not clustering standard errors at the correct level
- - Omitting fixed effects when required by the design
- - Exporting tables without clear labels and notes
-
- ## References
-
- - [Stata Regression Reference Manual](https://www.stata.com/manuals/rregress.pdf)
- - [reghdfe documentation](https://github.com/sergiocorreia/reghdfe)
- - [estout documentation](https://repec.sowi.unibe.ch/stata/estout/)