stata-cli 0.3.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. {stata_cli-0.3.0 → stata_cli-0.4.0}/PKG-INFO +16 -1
  2. {stata_cli-0.3.0 → stata_cli-0.4.0}/README.md +15 -0
  3. {stata_cli-0.3.0 → stata_cli-0.4.0}/pyproject.toml +4 -1
  4. stata_cli-0.4.0/src/stata_cli/__init__.py +1 -0
  5. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/main.py +39 -0
  6. stata_cli-0.4.0/src/stata_cli/skill_registry.py +171 -0
  7. stata_cli-0.4.0/src/stata_cli/skills/overview.md +196 -0
  8. stata_cli-0.4.0/src/stata_cli/skills/packages/asdoc.md +357 -0
  9. stata_cli-0.4.0/src/stata_cli/skills/packages/binsreg.md +358 -0
  10. stata_cli-0.4.0/src/stata_cli/skills/packages/coefplot.md +397 -0
  11. stata_cli-0.4.0/src/stata_cli/skills/packages/data-manipulation.md +407 -0
  12. stata_cli-0.4.0/src/stata_cli/skills/packages/diagnostics.md +621 -0
  13. stata_cli-0.4.0/src/stata_cli/skills/packages/did.md +583 -0
  14. stata_cli-0.4.0/src/stata_cli/skills/packages/estout.md +676 -0
  15. stata_cli-0.4.0/src/stata_cli/skills/packages/event-study.md +1032 -0
  16. stata_cli-0.4.0/src/stata_cli/skills/packages/graph-schemes.md +633 -0
  17. stata_cli-0.4.0/src/stata_cli/skills/packages/ivreg2.md +387 -0
  18. stata_cli-0.4.0/src/stata_cli/skills/packages/nprobust.md +447 -0
  19. stata_cli-0.4.0/src/stata_cli/skills/packages/outreg2.md +424 -0
  20. stata_cli-0.4.0/src/stata_cli/skills/packages/package-management.md +319 -0
  21. stata_cli-0.4.0/src/stata_cli/skills/packages/psmatch2.md +658 -0
  22. stata_cli-0.4.0/src/stata_cli/skills/packages/rdrobust.md +498 -0
  23. stata_cli-0.4.0/src/stata_cli/skills/packages/reghdfe.md +372 -0
  24. stata_cli-0.4.0/src/stata_cli/skills/packages/synth.md +873 -0
  25. stata_cli-0.4.0/src/stata_cli/skills/packages/tabout.md +533 -0
  26. stata_cli-0.4.0/src/stata_cli/skills/packages/winsor.md +284 -0
  27. stata_cli-0.4.0/src/stata_cli/skills/packages/xtabond2.md +544 -0
  28. stata_cli-0.4.0/src/stata_cli/skills/references/advanced-programming.md +506 -0
  29. stata_cli-0.4.0/src/stata_cli/skills/references/basics-getting-started.md +237 -0
  30. stata_cli-0.4.0/src/stata_cli/skills/references/bootstrap-simulation.md +327 -0
  31. stata_cli-0.4.0/src/stata_cli/skills/references/data-import-export.md +282 -0
  32. stata_cli-0.4.0/src/stata_cli/skills/references/data-management.md +426 -0
  33. stata_cli-0.4.0/src/stata_cli/skills/references/date-time-functions.md +282 -0
  34. stata_cli-0.4.0/src/stata_cli/skills/references/descriptive-statistics.md +268 -0
  35. stata_cli-0.4.0/src/stata_cli/skills/references/difference-in-differences.md +750 -0
  36. stata_cli-0.4.0/src/stata_cli/skills/references/external-tools-integration.md +966 -0
  37. stata_cli-0.4.0/src/stata_cli/skills/references/gmm-estimation.md +367 -0
  38. stata_cli-0.4.0/src/stata_cli/skills/references/graphics.md +344 -0
  39. stata_cli-0.4.0/src/stata_cli/skills/references/limited-dependent-variables.md +289 -0
  40. stata_cli-0.4.0/src/stata_cli/skills/references/linear-regression.md +398 -0
  41. stata_cli-0.4.0/src/stata_cli/skills/references/machine-learning.md +511 -0
  42. stata_cli-0.4.0/src/stata_cli/skills/references/mata-data-access.md +370 -0
  43. stata_cli-0.4.0/src/stata_cli/skills/references/mata-introduction.md +313 -0
  44. stata_cli-0.4.0/src/stata_cli/skills/references/mata-matrix-operations.md +305 -0
  45. stata_cli-0.4.0/src/stata_cli/skills/references/mata-programming.md +400 -0
  46. stata_cli-0.4.0/src/stata_cli/skills/references/matching-methods.md +742 -0
  47. stata_cli-0.4.0/src/stata_cli/skills/references/mathematical-functions.md +269 -0
  48. stata_cli-0.4.0/src/stata_cli/skills/references/maximum-likelihood.md +749 -0
  49. stata_cli-0.4.0/src/stata_cli/skills/references/missing-data-handling.md +712 -0
  50. stata_cli-0.4.0/src/stata_cli/skills/references/nonparametric-methods.md +478 -0
  51. stata_cli-0.4.0/src/stata_cli/skills/references/panel-data.md +294 -0
  52. stata_cli-0.4.0/src/stata_cli/skills/references/programming-basics.md +440 -0
  53. stata_cli-0.4.0/src/stata_cli/skills/references/regression-discontinuity.md +486 -0
  54. stata_cli-0.4.0/src/stata_cli/skills/references/sample-selection.md +670 -0
  55. stata_cli-0.4.0/src/stata_cli/skills/references/sem-factor-analysis.md +576 -0
  56. stata_cli-0.4.0/src/stata_cli/skills/references/spatial-analysis.md +766 -0
  57. stata_cli-0.4.0/src/stata_cli/skills/references/string-functions.md +318 -0
  58. stata_cli-0.4.0/src/stata_cli/skills/references/survey-data-analysis.md +595 -0
  59. stata_cli-0.4.0/src/stata_cli/skills/references/survival-analysis.md +466 -0
  60. stata_cli-0.4.0/src/stata_cli/skills/references/tables-reporting.md +973 -0
  61. stata_cli-0.4.0/src/stata_cli/skills/references/time-series.md +345 -0
  62. stata_cli-0.4.0/src/stata_cli/skills/references/treatment-effects.md +804 -0
  63. stata_cli-0.4.0/src/stata_cli/skills/references/variables-operators.md +206 -0
  64. stata_cli-0.4.0/src/stata_cli/skills/references/workflow-best-practices.md +1176 -0
  65. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli.egg-info/PKG-INFO +16 -1
  66. stata_cli-0.4.0/src/stata_cli.egg-info/SOURCES.txt +76 -0
  67. stata_cli-0.3.0/src/stata_cli/__init__.py +0 -1
  68. stata_cli-0.3.0/src/stata_cli.egg-info/SOURCES.txt +0 -17
  69. {stata_cli-0.3.0 → stata_cli-0.4.0}/setup.cfg +0 -0
  70. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/__main__.py +0 -0
  71. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/daemon.py +0 -0
  72. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/engine.py +0 -0
  73. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/graph_artifacts.py +0 -0
  74. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/output_filter.py +0 -0
  75. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/smcl_parser.py +0 -0
  76. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/utils.py +0 -0
  77. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli.egg-info/dependency_links.txt +0 -0
  78. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli.egg-info/entry_points.txt +0 -0
  79. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli.egg-info/requires.txt +0 -0
  80. {stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli.egg-info/top_level.txt +0 -0

{stata_cli-0.3.0 → stata_cli-0.4.0}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: stata-cli
- Version: 0.3.0
+ Version: 0.4.0
  Summary: Command-line interface for running Stata commands via PyStata
  License: MIT
  Keywords: stata,cli,statistics,data-science
@@ -13,6 +13,8 @@ Requires-Dist: pandas; extra == "data"

  # stata-cli

+ > **Stata CLI Is All Reg Monkeys Need**
+
  ![stata-cli banner](assets/banner.png)

  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -52,6 +54,7 @@ A command-line interface for [Stata](https://www.stata.com/) via PyStata — bui
  | **Daemon Mode** | Persistent background process for sub-second execution via Unix socket |
  | **Output Control** | Compact mode, JSON output, token limit management, log file output |
  | **Interruption** | Send break signal to stop long-running commands |
+ | **Skill Library** | Built-in Stata reference with 57 topics: syntax, econometrics, causal inference, packages |

  ## Installation & Quick Start

@@ -259,6 +262,18 @@ stata-cli frame

  Shows all Stata frames and the current working frame.

+ ### `skill` — Stata Reference Library
+
+ ```bash
+ stata-cli skill # overview: gotchas, patterns, topic routing table
+ stata-cli skill --list # list all 57 topics with descriptions
+ stata-cli skill regression # linear regression reference
+ stata-cli skill did # difference-in-differences guide
+ stata-cli skill reghdfe # reghdfe package guide
+ ```
+
+ Built-in reference library covering data management, econometrics, causal inference, graphics, Mata programming, and 20+ community packages. Aliases supported (e.g. `did` for `difference-in-differences`, `panel` for `panel-data`).
+
  ## Daemon Mode

  The daemon keeps PyStata alive in the background — reduces execution time from **~2-3s to ~85ms** (35x speedup).

{stata_cli-0.3.0 → stata_cli-0.4.0}/README.md
@@ -1,5 +1,7 @@
  # stata-cli

+ > **Stata CLI Is All Reg Monkeys Need**
+
  ![stata-cli banner](assets/banner.png)

  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -39,6 +41,7 @@ A command-line interface for [Stata](https://www.stata.com/) via PyStata — bui
  | **Daemon Mode** | Persistent background process for sub-second execution via Unix socket |
  | **Output Control** | Compact mode, JSON output, token limit management, log file output |
  | **Interruption** | Send break signal to stop long-running commands |
+ | **Skill Library** | Built-in Stata reference with 57 topics: syntax, econometrics, causal inference, packages |

  ## Installation & Quick Start

@@ -246,6 +249,18 @@ stata-cli frame

  Shows all Stata frames and the current working frame.

+ ### `skill` — Stata Reference Library
+
+ ```bash
+ stata-cli skill # overview: gotchas, patterns, topic routing table
+ stata-cli skill --list # list all 57 topics with descriptions
+ stata-cli skill regression # linear regression reference
+ stata-cli skill did # difference-in-differences guide
+ stata-cli skill reghdfe # reghdfe package guide
+ ```
+
+ Built-in reference library covering data management, econometrics, causal inference, graphics, Mata programming, and 20+ community packages. Aliases supported (e.g. `did` for `difference-in-differences`, `panel` for `panel-data`).
+
  ## Daemon Mode

  The daemon keeps PyStata alive in the background — reduces execution time from **~2-3s to ~85ms** (35x speedup).

{stata_cli-0.3.0 → stata_cli-0.4.0}/pyproject.toml
@@ -1,6 +1,6 @@
  [project]
  name = "stata-cli"
- version = "0.3.0"
+ version = "0.4.0"
  description = "Command-line interface for running Stata commands via PyStata"
  readme = "README.md"
  requires-python = ">=3.9"
@@ -23,3 +23,6 @@ build-backend = "setuptools.build_meta"

  [tool.setuptools.packages.find]
  where = ["src"]
+
+ [tool.setuptools.package-data]
+ stata_cli = ["skills/**/*.md"]
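The `[tool.setuptools.package-data]` entry is what ships the Markdown skill files inside the built package. The snippet below sketches how a pattern like `skills/**/*.md` expands using Python's `glob.glob(..., recursive=True)`. It assumes setuptools gives package-data patterns the same recursive semantics; recursive `**` support is believed to date from setuptools 62.3, but verify that version before relying on it. The demo tree and the `expand_package_data` helper are hypothetical. The point is that `**` also matches zero directories, so `skills/overview.md` is picked up alongside the files in `packages/` and `references/`.

```python
import glob
import os
import tempfile
from typing import List


def expand_package_data(package_dir: str, pattern: str) -> List[str]:
    """Expand a package-data style glob relative to package_dir (hypothetical helper).

    Assumes recursive glob semantics: "**" matches zero or more directory levels.
    """
    matches = glob.glob(os.path.join(package_dir, pattern), recursive=True)
    # Report paths relative to the package root, with forward slashes.
    return sorted(os.path.relpath(m, package_dir).replace(os.sep, "/") for m in matches)


def build_demo_tree(root: str) -> None:
    """Create a small stand-in for src/stata_cli/skills (hypothetical files)."""
    files = [
        "skills/overview.md",
        "skills/packages/reghdfe.md",
        "skills/references/panel-data.md",
        "skills/notes.txt",  # not a .md file, so the pattern should skip it
    ]
    for rel in files:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("# placeholder\n")


with tempfile.TemporaryDirectory() as pkg:
    build_demo_tree(pkg)
    packaged = expand_package_data(pkg, "skills/**/*.md")
    print(packaged)
```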

stata_cli-0.4.0/src/stata_cli/__init__.py
@@ -0,0 +1 @@
+ __version__ = "0.4.0"

{stata_cli-0.3.0 → stata_cli-0.4.0}/src/stata_cli/main.py
@@ -406,6 +406,45 @@ def frame_cmd(ctx):
      click.echo(json.dumps(resp, ensure_ascii=False, indent=2))


+ # ── Skill command ────────────────────────────────────────────────────────
+
+ @cli.command("skill")
+ @click.argument("topic", required=False, default=None)
+ @click.option("--list", "list_topics", is_flag=True, default=False, help="List all available topics.")
+ @click.pass_context
+ def skill_cmd(ctx, topic, list_topics):
+     """Built-in Stata reference library.
+
+     Without arguments, shows the overview (gotchas, common patterns, routing table).
+     With a topic name, shows the detailed reference for that topic.
+
+     \b
+     Examples:
+     stata-cli skill # overview
+     stata-cli skill --list # list all topics
+     stata-cli skill regression # linear regression reference
+     stata-cli skill did # difference-in-differences
+     stata-cli skill reghdfe # reghdfe package guide
+     """
+     from .skill_registry import get_overview, get_topic, list_topics as _list_topics
+
+     if list_topics:
+         click.echo(_list_topics())
+         return
+
+     if topic is None:
+         click.echo(get_overview())
+         return
+
+     content = get_topic(topic)
+     if content is None:
+         click.echo(f"Unknown topic: {topic}", err=True)
+         click.echo("Run 'stata-cli skill --list' to see available topics.", err=True)
+         _exit(EXIT_USAGE_ERROR)
+     else:
+         click.echo(content)
+
+
  # ── Daemon subcommands ───────────────────────────────────────────────────

  @cli.group()
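`skill_cmd` has three exits. `--list` takes precedence even when a topic is also given. No topic prints the overview. An unknown topic writes two lines to stderr and exits with `EXIT_USAGE_ERROR`. The following sketch shows that precedence without Stata or click installed. Instead of echoing, it returns `(stdout, stderr, exit_code)`. The fake registry, the `dispatch_skill` name, and the value `2` for the usage-error code are hypothetical stand-ins; the real constant is defined elsewhere in `main.py` and does not appear in this diff.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for EXIT_USAGE_ERROR; the real value is not shown in this diff.
EXIT_USAGE_ERROR = 2


def dispatch_skill(
    topic: Optional[str],
    list_topics: bool,
    get_overview: Callable[[], str],
    get_topic: Callable[[str], Optional[str]],
    render_list: Callable[[], str],
) -> Tuple[str, str, int]:
    """Mirror skill_cmd's control flow, returning (stdout, stderr, exit_code)."""
    if list_topics:  # --list wins, even if a topic was also passed
        return render_list(), "", 0
    if topic is None:  # bare `stata-cli skill` shows the overview
        return get_overview(), "", 0
    content = get_topic(topic)
    if content is None:  # unknown topic: message on stderr, usage-error exit code
        err = (
            f"Unknown topic: {topic}\n"
            "Run 'stata-cli skill --list' to see available topics."
        )
        return "", err, EXIT_USAGE_ERROR
    return content, "", 0


# Tiny fake registry used only for this illustration.
FAKE_TOPICS: Dict[str, str] = {"reghdfe": "# reghdfe guide"}


def overview() -> str:
    return "overview"


def listing() -> str:
    return "reghdfe"


result_list = dispatch_skill("reghdfe", True, overview, FAKE_TOPICS.get, listing)
result_overview = dispatch_skill(None, False, overview, FAKE_TOPICS.get, listing)
result_topic = dispatch_skill("reghdfe", False, overview, FAKE_TOPICS.get, listing)
result_unknown = dispatch_skill("nope", False, overview, FAKE_TOPICS.get, listing)
```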

stata_cli-0.4.0/src/stata_cli/skill_registry.py
@@ -0,0 +1,171 @@
+ """stata-cli skill — built-in Stata reference library."""
+ from __future__ import annotations
+
+ import os
+ from pathlib import Path
+ from typing import Dict, List, Optional, Tuple
+
+ SKILLS_DIR = Path(__file__).parent / "skills"
+
+ TOPIC_CATALOG: List[Tuple[str, str, str]] = [
+     # (topic_name, category, description)
+     # Data Operations
+     ("basics", "Data Operations", "Getting started, use, save, describe, browse, sysuse"),
+     ("data-import-export", "Data Operations", "import delimited/excel, export, ODBC, web data"),
+     ("data-management", "Data Operations", "generate, replace, merge, reshape, collapse, egen, encode/decode"),
+     ("variables-operators", "Data Operations", "Variable types, missing values, operators, if/in qualifiers"),
+     ("string-functions", "Data Operations", "substr(), regexm(), split, strtrim(), Unicode"),
+     ("date-time-functions", "Data Operations", "date(), clock(), %td/%tc formats, mdy(), business calendars"),
+     ("mathematical-functions", "Data Operations", "round(), log(), exp(), cond(), distributions, random numbers"),
+     # Statistics & Econometrics
+     ("descriptive-statistics", "Statistics & Econometrics", "summarize, tabulate, correlate, tabstat, codebook"),
+     ("linear-regression", "Statistics & Econometrics", "regress, vce(robust), vce(cluster), margins, predict, ivregress"),
+     ("panel-data", "Statistics & Econometrics", "xtset, xtreg fe/re, Hausman test, dynamic panels"),
+     ("time-series", "Statistics & Econometrics", "tsset, ARIMA, VAR, dfuller, pperron, irf, forecasting"),
+     ("limited-dependent-variables", "Statistics & Econometrics", "logit, probit, tobit, poisson, nbreg, mlogit, ologit"),
+     ("survey-data-analysis", "Statistics & Econometrics", "svyset, svy:, subpop(), complex survey design"),
+     ("bootstrap-simulation", "Statistics & Econometrics", "bootstrap, simulate, permute, Monte Carlo"),
+     ("missing-data-handling", "Statistics & Econometrics", "mi impute, mi estimate, FIML, misstable"),
+     ("maximum-likelihood", "Statistics & Econometrics", "ml model, custom likelihood functions, ml init"),
+     ("gmm-estimation", "Statistics & Econometrics", "gmm, moment conditions, estat overid, J-test"),
+     # Causal Inference
+     ("treatment-effects", "Causal Inference", "teffects ra/ipw/ipwra/aipw, ATE/ATT/ATET"),
+     ("difference-in-differences", "Causal Inference", "DiD, parallel trends, event studies, staggered adoption"),
+     ("regression-discontinuity", "Causal Inference", "Sharp/fuzzy RD, bandwidth selection, rdplot"),
+     ("matching-methods", "Causal Inference", "PSM, nearest neighbor, kernel matching, teffects nnmatch"),
+     ("sample-selection", "Causal Inference", "heckman, heckprobit, exclusion restrictions"),
+     # Advanced Methods
+     ("survival-analysis", "Advanced Methods", "stset, stcox, streg, Kaplan-Meier, parametric models"),
+     ("sem-factor-analysis", "Advanced Methods", "sem, gsem, CFA, path analysis, alpha, reliability"),
+     ("nonparametric-methods", "Advanced Methods", "kdensity, rank tests, qreg, npregress"),
+     ("spatial-analysis", "Advanced Methods", "spmatrix, spregress, spatial weights, Moran's I"),
+     ("machine-learning", "Advanced Methods", "lasso, elasticnet, cvlasso, cross-validation"),
+     # Graphics
+     ("graphics", "Graphics", "twoway, scatter, line, bar, histogram, graph combine, graph export"),
+     # Programming
+     ("programming-basics", "Programming", "local, global, foreach, forvalues, program define, syntax"),
+     ("advanced-programming", "Programming", "syntax, mata, classes, tempfile/tempvar"),
+     ("mata-introduction", "Programming", "Mata basics, when to use Mata vs ado, data types"),
+     ("mata-programming", "Programming", "Mata functions, flow control, structures, pointers"),
+     ("mata-matrix-operations", "Programming", "Matrix creation, decompositions, solvers, st_matrix()"),
+     ("mata-data-access", "Programming", "st_data(), st_view(), st_store(), performance tips"),
+     # Output & Workflow
+     ("tables-reporting", "Output & Workflow", "putexcel, putdocx, putpdf, LaTeX, collect"),
+     ("workflow-best-practices", "Output & Workflow", "Project structure, master do-files, version control"),
+     ("external-tools-integration", "Output & Workflow", "Python via python:, R via rsource, shell, Git"),
+     # Community Packages
+     ("reghdfe", "Community Packages", "High-dimensional fixed effects OLS"),
+     ("estout", "Community Packages", "Publication-quality regression tables (esttab/estout)"),
+     ("outreg2", "Community Packages", "Alternative regression table exporter (Word/Excel/TeX)"),
+     ("asdoc", "Community Packages", "One-command Word document creation for any Stata output"),
+     ("tabout", "Community Packages", "Cross-tabulations and summary tables to file"),
+     ("coefplot", "Community Packages", "Coefficient plots from stored estimates"),
+     ("graph-schemes", "Community Packages", "grstyle, schemepack, plotplain — better graph themes"),
+     ("did", "Community Packages", "Modern DiD: csdid, did_multiplegt, did_imputation"),
+     ("event-study", "Community Packages", "eventstudyinteract, eventdd — event study estimators"),
+     ("rdrobust", "Community Packages", "Robust RD estimation with optimal bandwidth"),
+     ("psmatch2", "Community Packages", "Propensity score matching (nearest neighbor, kernel)"),
+     ("synth", "Community Packages", "Synthetic control method (synth, synth_runner)"),
+     ("ivreg2", "Community Packages", "Enhanced IV/2SLS with additional diagnostics"),
+     ("xtabond2", "Community Packages", "Dynamic panel GMM (Arellano-Bond/Blundell-Bond)"),
+     ("binsreg", "Community Packages", "Binned scatter plots with CI"),
+     ("nprobust", "Community Packages", "Nonparametric kernel estimation and inference"),
+     ("diagnostics", "Community Packages", "bacondecomp, xttest3, collinearity, heteroskedasticity"),
+     ("winsor", "Community Packages", "Winsorizing and trimming: winsor2, winsor"),
+     ("data-manipulation", "Community Packages", "gtools (fast collapse/egen), rangestat, egenmore"),
+     ("package-management", "Community Packages", "ssc install, net install, ado update"),
+ ]
+
+ # Build lookup: topic name -> (category, description)
+ _TOPIC_MAP: Dict[str, Tuple[str, str]] = {t[0]: (t[1], t[2]) for t in TOPIC_CATALOG}
+
+ # Short aliases for convenience
+ _ALIASES: Dict[str, str] = {
+     "basics": "basics-getting-started",
+     "regression": "linear-regression",
+     "panel": "panel-data",
+     "rd": "regression-discontinuity",
+     "diff-in-diff": "difference-in-differences",
+     "matching": "matching-methods",
+     "ts": "time-series",
+     "logit": "limited-dependent-variables",
+     "probit": "limited-dependent-variables",
+     "tobit": "limited-dependent-variables",
+     "survival": "survival-analysis",
+     "sem": "sem-factor-analysis",
+     "ml": "maximum-likelihood",
+     "gmm": "gmm-estimation",
+     "survey": "survey-data-analysis",
+     "bootstrap": "bootstrap-simulation",
+     "mi": "missing-data-handling",
+     "mata": "mata-introduction",
+     "strings": "string-functions",
+     "dates": "date-time-functions",
+     "math": "mathematical-functions",
+     "tables": "tables-reporting",
+     "workflow": "workflow-best-practices",
+     "external": "external-tools-integration",
+     "nonparametric": "nonparametric-methods",
+     "spatial": "spatial-analysis",
+     "lasso": "machine-learning",
+     "missing": "missing-data-handling",
+     "heckman": "sample-selection",
+     "selection": "sample-selection",
+     "iv": "ivreg2",
+     "gtools": "data-manipulation",
+ }
+
+
+ def _resolve_topic(name: str) -> Optional[str]:
+     """Resolve a topic name (with alias support) to its file stem."""
+     name = name.lower().strip()
+     if name in _ALIASES:
+         name = _ALIASES[name]
+     # Try references/ then packages/
+     for subdir in ("references", "packages"):
+         path = SKILLS_DIR / subdir / f"{name}.md"
+         if path.exists():
+             return str(path)
+     # Try partial match
+     for subdir in ("references", "packages"):
+         d = SKILLS_DIR / subdir
+         if d.is_dir():
+             for f in d.iterdir():
+                 if f.suffix == ".md" and name in f.stem:
+                     return str(f)
+     return None
+
+
+ def get_overview() -> str:
+     """Return the skill overview content."""
+     overview_path = SKILLS_DIR / "overview.md"
+     if overview_path.exists():
+         return overview_path.read_text(encoding="utf-8")
+     return "Skill overview not found."
+
+
+ def get_topic(name: str) -> Optional[str]:
+     """Return content of a specific topic, or None if not found."""
+     path = _resolve_topic(name)
+     if path:
+         return Path(path).read_text(encoding="utf-8")
+     return None
+
+
+ def list_topics() -> str:
+     """Return a formatted topic listing grouped by category."""
+     lines = []
+     current_cat = ""
+     for topic_name, category, description in TOPIC_CATALOG:
+         if category != current_cat:
+             if current_cat:
+                 lines.append("")
+             lines.append(category)
+             current_cat = category
+         lines.append(f" {topic_name:<30s} {description}")
+     return "\n".join(lines)
+
+
+ def get_all_topic_names() -> List[str]:
+     """Return list of all valid topic names."""
+     return [t[0] for t in TOPIC_CATALOG]
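`_resolve_topic` looks up a name in four steps, in order:

1. The alias table.
2. An exact `references/<name>.md`.
3. An exact `packages/<name>.md`.
4. The first `.md` file whose stem contains the name as a substring.

Despite its docstring, it returns the full path as a string. One consequence follows from the file list above. `did` is not an alias and there is no `references/did.md`, so `stata-cli skill did` resolves to `packages/did.md` (the csdid/did_multiplegt guide). The alias that reaches `references/difference-in-differences.md` is `diff-in-diff`.

The sketch below replays that order against an in-memory listing instead of the filesystem. The listing is a hand-picked subset of the real tree, and `resolve_topic`, `FILES`, and `ALIASES` are illustrative names. One caution about the real code: its substring step walks `Path.iterdir()`, which yields entries in arbitrary order. An ambiguous name such as `data` can therefore resolve to different files on different machines.

```python
from typing import Dict, List, Optional

# Subset of the alias table from skill_registry.py.
ALIASES: Dict[str, str] = {
    "basics": "basics-getting-started",
    "diff-in-diff": "difference-in-differences",
    "iv": "ivreg2",
    "panel": "panel-data",
}

# In-memory stand-in for skills/<subdir>/*.md (file stems only, hand-picked subset).
FILES: Dict[str, List[str]] = {
    "references": [
        "basics-getting-started",
        "difference-in-differences",
        "limited-dependent-variables",
        "panel-data",
    ],
    "packages": ["did", "ivreg2", "reghdfe"],
}


def resolve_topic(name: str) -> Optional[str]:
    """Return 'subdir/stem.md' using the same lookup order as _resolve_topic."""
    name = name.lower().strip()
    name = ALIASES.get(name, name)  # 1. aliases
    for subdir in ("references", "packages"):  # 2-3. exact match, references first
        if name in FILES[subdir]:
            return f"{subdir}/{name}.md"
    for subdir in ("references", "packages"):  # 4. first substring match
        for stem in FILES[subdir]:
            if name in stem:
                return f"{subdir}/{stem}.md"
    return None
```

For example, `resolve_topic("limited-dependent")` only succeeds through step 4, the substring fallback.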

stata_cli-0.4.0/src/stata_cli/skills/overview.md
@@ -0,0 +1,196 @@
+ # stata-cli skill
+
+ Stata reference library built into stata-cli. Covers syntax, data management,
+ econometrics, causal inference, graphics, Mata programming, and 20+ community
+ packages. Use `stata-cli skill <topic>` to read a specific reference.
+
+ ## Critical Gotchas
+
+ ### Missing Values Sort to +Infinity
+ Stata's `.` (and `.a`-`.z`) are **greater than all numbers**.
+ ```stata
+ * WRONG — includes observations where income is missing!
+ gen high_income = (income > 50000)
+
+ * RIGHT
+ gen high_income = (income > 50000) if !missing(income)
+ ```
+
+ ### `=` vs `==`
+ `=` is assignment; `==` is comparison.
+ ```stata
+ * WRONG — syntax error
+ gen employed = 1 if status = 1
+
+ * RIGHT
+ gen employed = 1 if status == 1
+ ```
+
+ ### Local Macro Syntax
+ Locals use `` `name' `` (backtick + single-quote). Globals use `$name`.
+ ```stata
+ local controls "age education income"
+ regress wage `controls' // correct
+ regress wage `controls // WRONG — missing closing quote
+ ```
+
+ ### `by` Requires Prior Sort (Use `bysort`)
+ ```stata
+ bysort id: gen first = (_n == 1) // RIGHT — bysort sorts automatically
+ ```
+
+ ### Factor Variable Notation
+ Use `i.` for categorical, `c.` for continuous.
+ ```stata
+ * WRONG — treats race as continuous
+ regress wage race education
+
+ * RIGHT — creates dummies
+ regress wage i.race education
+ ```
+
+ ### `merge` Always Check `_merge`
+ ```stata
+ merge 1:1 id using other.dta
+ tab _merge
+ drop _merge
+ ```
+
+ ### Stored Results: `r()` vs `e()` vs `s()`
+ - `r()` — r-class (summarize, tabulate)
+ - `e()` — e-class (regress, logit)
+ - `s()` — s-class (parsing)
+
+ A new estimation command **overwrites** previous `e()` results. Use `estimates store`.
+
+ ## Common Patterns
+
+ ### Regression Table Workflow
+ ```stata
+ eststo clear
+ eststo: regress y x1 x2, vce(robust)
+ eststo: regress y x1 x2 x3, vce(robust)
+ esttab using "results.tex", replace se star(* 0.10 ** 0.05 *** 0.01) label booktabs
+ ```
+
+ ### Panel Data Setup
+ ```stata
+ xtset panelid timevar
+ reghdfe y x1 x2, absorb(panelid timevar) vce(cluster panelid)
+ ```
+
+ ### Difference-in-Differences
+ ```stata
+ * Classic 2x2 DiD
+ gen post = (year >= treatment_year)
+ gen treat_post = treated * post
+ regress y treated post treat_post, vce(cluster id)
+
+ * Modern staggered DiD (Callaway & Sant'Anna)
+ csdid y x1 x2, ivar(id) time(year) gvar(first_treat) agg(event)
+ ```
+
+ ### Data Cleaning Pipeline
+ ```stata
+ import delimited "raw_data.csv", clear varnames(1)
+ rename *, lower
+ destring income, replace force
+ replace income = . if income < 0
+ label variable income "Annual household income (USD)"
+ compress
+ save "clean_data.dta", replace
+ ```
+
+ ## Topic Routing Table
+
+ Use `stata-cli skill <topic>` to read a specific reference.
+ Use `stata-cli skill --list` to see all topics with descriptions.
+
+ ### Data Operations
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `basics` | use, save, describe, browse, sysuse |
+ | `data-import-export` | import delimited/excel, export, ODBC |
+ | `data-management` | generate, replace, merge, reshape, collapse, egen |
+ | `variables-operators` | Variable types, missing values, if/in qualifiers |
+ | `string-functions` | substr(), regexm(), split, Unicode |
+ | `date-time-functions` | date(), clock(), %td/%tc formats |
+ | `mathematical-functions` | round(), log(), cond(), distributions |
+
+ ### Statistics & Econometrics
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `descriptive-statistics` | summarize, tabulate, correlate, tabstat |
+ | `linear-regression` | regress, vce(robust), margins, predict |
+ | `panel-data` | xtset, xtreg fe/re, Hausman test |
+ | `time-series` | tsset, ARIMA, VAR, unit root tests |
+ | `limited-dependent` | logit, probit, tobit, poisson, mlogit |
+ | `survey-data` | svyset, svy:, subpop(), complex design |
+ | `bootstrap-simulation` | bootstrap, simulate, Monte Carlo |
+ | `missing-data` | mi impute, mi estimate, FIML |
+ | `maximum-likelihood` | ml model, custom likelihood |
+ | `gmm-estimation` | gmm, moment conditions, J-test |
+
+ ### Causal Inference
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `treatment-effects` | teffects ra/ipw/aipw, ATE/ATT |
+ | `difference-in-differences` | DiD, event study, staggered adoption |
+ | `regression-discontinuity` | Sharp/fuzzy RD, bandwidth selection |
+ | `matching-methods` | PSM, nearest neighbor, kernel matching |
+ | `sample-selection` | heckman, exclusion restrictions |
+
+ ### Advanced Methods
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `survival-analysis` | stset, stcox, streg, Kaplan-Meier |
+ | `sem-factor-analysis` | sem, gsem, CFA, path analysis |
+ | `nonparametric-methods` | kdensity, qreg, npregress |
+ | `spatial-analysis` | spmatrix, spregress, Moran's I |
+ | `machine-learning` | lasso, elasticnet, cross-validation |
+
+ ### Graphics
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `graphics` | twoway, scatter, histogram, graph export |
+
+ ### Programming
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `programming-basics` | local, global, foreach, program define |
+ | `advanced-programming` | syntax, mata, tempfile/tempvar |
+ | `mata-introduction` | Mata basics, when to use Mata |
+ | `mata-programming` | Mata functions, structures, pointers |
+ | `mata-matrix-operations` | Matrix decompositions, st_matrix() |
+ | `mata-data-access` | st_data(), st_view(), st_store() |
+
+ ### Output & Workflow
+ | Topic | Key Commands |
+ |-------|-------------|
+ | `tables-reporting` | putexcel, putdocx, LaTeX, collect |
+ | `workflow-best-practices` | Project structure, version control |
+ | `external-tools` | Python via python:, R, shell commands |
+
+ ### Community Packages
+ | Topic | What It Does |
+ |-------|-------------|
+ | `reghdfe` | High-dimensional fixed effects OLS |
+ | `estout` | Publication-quality regression tables (esttab) |
+ | `outreg2` | Alternative table exporter (Word/Excel/TeX) |
+ | `asdoc` | One-command Word document creation |
+ | `coefplot` | Coefficient plots from stored estimates |
+ | `did` | Modern DiD estimators (csdid, did_multiplegt) |
+ | `event-study` | eventstudyinteract, eventdd |
+ | `rdrobust` | Robust RD estimation + optimal bandwidth |
+ | `psmatch2` | Propensity score matching |
+ | `synth` | Synthetic control method |
+ | `ivreg2` | Enhanced IV/2SLS with diagnostics |
+ | `xtabond2` | Dynamic panel GMM (Arellano-Bond) |
+ | `binsreg` | Binned scatter plots with CI |
+ | `data-manipulation` | gtools (fast collapse/egen), rangestat |
+ | `diagnostics` | bacondecomp, xttest3, heteroskedasticity |
+ | `graph-schemes` | grstyle, schemepack, plotplain |
+ | `nprobust` | Nonparametric kernel estimation |
+ | `winsor` | Winsorizing and trimming (winsor2) |
+ | `tabout` | Cross-tabulations to file |
+ | `package-management` | ssc install, net install, ado update |
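Four rows in the overview's routing table use shortened names: `limited-dependent`, `survey-data`, `missing-data`, and `external-tools`. None of these appears in `TOPIC_CATALOG` or in `_ALIASES`. They still open the right page only because `_resolve_topic` falls back to substring matching. The sketch below is a small consistency check a maintainer could run to catch such rows. It extracts the backticked first cell of each Markdown table row and reports any name that is not an exact known name. The regex, the `unlisted_topics` function, and the sample table are illustrative, not part of stata-cli. In practice the known set might be built from `get_all_topic_names()` plus the keys of `_ALIASES`.

```python
import re
from typing import Iterable, List, Set

# Matches a Markdown table row whose first cell is a backticked topic name.
ROW_RE = re.compile(r"^\|\s*`([^`]+)`\s*\|")


def unlisted_topics(markdown: str, known: Iterable[str]) -> List[str]:
    """Return routing-table topics that are not exact catalog or alias names."""
    known_set: Set[str] = set(known)
    found: List[str] = []
    for line in markdown.splitlines():
        match = ROW_RE.match(line.strip())
        if match and match.group(1) not in known_set:
            found.append(match.group(1))
    return found


SAMPLE = """\
| Topic | Key Commands |
|-------|-------------|
| `descriptive-statistics` | summarize, tabulate, correlate, tabstat |
| `limited-dependent` | logit, probit, tobit, poisson, mlogit |
| `survey-data` | svyset, svy:, subpop(), complex design |
"""

# Hand-picked subset of catalog names plus one alias, for illustration only.
KNOWN = ["descriptive-statistics", "limited-dependent-variables", "survey-data-analysis", "panel"]
flagged = unlisted_topics(SAMPLE, KNOWN)
```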