stata-cli 0.3.0__tar.gz → 0.4.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {stata_cli-0.3.0 → stata_cli-0.4.1}/PKG-INFO +16 -1
- {stata_cli-0.3.0 → stata_cli-0.4.1}/README.md +15 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/pyproject.toml +4 -1
- stata_cli-0.4.1/src/stata_cli/__init__.py +1 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/main.py +39 -0
- stata_cli-0.4.1/src/stata_cli/skill_registry.py +171 -0
- stata_cli-0.4.1/src/stata_cli/skills/overview.md +196 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/asdoc.md +357 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/binsreg.md +358 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/coefplot.md +397 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/data-manipulation.md +407 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/diagnostics.md +621 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/did.md +583 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/estout.md +676 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/event-study.md +1032 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/graph-schemes.md +633 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/ivreg2.md +387 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/nprobust.md +447 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/outreg2.md +424 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/package-management.md +319 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/psmatch2.md +658 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/rdrobust.md +498 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/reghdfe.md +372 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/synth.md +873 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/tabout.md +533 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/winsor.md +284 -0
- stata_cli-0.4.1/src/stata_cli/skills/packages/xtabond2.md +544 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/advanced-programming.md +506 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/basics-getting-started.md +237 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/bootstrap-simulation.md +327 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/data-import-export.md +282 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/data-management.md +426 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/date-time-functions.md +282 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/descriptive-statistics.md +268 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/difference-in-differences.md +750 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/external-tools-integration.md +966 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/gmm-estimation.md +367 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/graphics.md +344 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/limited-dependent-variables.md +289 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/linear-regression.md +398 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/machine-learning.md +511 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/mata-data-access.md +370 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/mata-introduction.md +313 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/mata-matrix-operations.md +305 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/mata-programming.md +400 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/matching-methods.md +742 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/mathematical-functions.md +269 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/maximum-likelihood.md +749 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/missing-data-handling.md +712 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/nonparametric-methods.md +478 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/panel-data.md +294 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/programming-basics.md +440 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/regression-discontinuity.md +486 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/sample-selection.md +670 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/sem-factor-analysis.md +576 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/spatial-analysis.md +766 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/string-functions.md +318 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/survey-data-analysis.md +595 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/survival-analysis.md +466 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/tables-reporting.md +973 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/time-series.md +345 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/treatment-effects.md +804 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/variables-operators.md +206 -0
- stata_cli-0.4.1/src/stata_cli/skills/references/workflow-best-practices.md +1176 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli.egg-info/PKG-INFO +16 -1
- stata_cli-0.4.1/src/stata_cli.egg-info/SOURCES.txt +76 -0
- stata_cli-0.3.0/src/stata_cli/__init__.py +0 -1
- stata_cli-0.3.0/src/stata_cli.egg-info/SOURCES.txt +0 -17
- {stata_cli-0.3.0 → stata_cli-0.4.1}/setup.cfg +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/__main__.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/daemon.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/engine.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/graph_artifacts.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/output_filter.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/smcl_parser.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli/utils.py +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli.egg-info/dependency_links.txt +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli.egg-info/entry_points.txt +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli.egg-info/requires.txt +0 -0
- {stata_cli-0.3.0 → stata_cli-0.4.1}/src/stata_cli.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: stata-cli
-Version: 0.
+Version: 0.4.1
 Summary: Command-line interface for running Stata commands via PyStata
 License: MIT
 Keywords: stata,cli,statistics,data-science
@@ -13,6 +13,8 @@ Requires-Dist: pandas; extra == "data"
 
 # stata-cli
 
+> **Stata CLI Is All Reg Monkeys Need**
+
 
 
 [](https://opensource.org/licenses/MIT)
@@ -52,6 +54,7 @@ A command-line interface for [Stata](https://www.stata.com/) via PyStata — bui
 | **Daemon Mode** | Persistent background process for sub-second execution via Unix socket |
 | **Output Control** | Compact mode, JSON output, token limit management, log file output |
 | **Interruption** | Send break signal to stop long-running commands |
+| **Skill Library** | Built-in Stata reference with 57 topics: syntax, econometrics, causal inference, packages |
 
 ## Installation & Quick Start
 
@@ -259,6 +262,18 @@ stata-cli frame
 
 Shows all Stata frames and the current working frame.
 
+### `skill` — Stata Reference Library
+
+```bash
+stata-cli skill # overview: gotchas, patterns, topic routing table
+stata-cli skill --list # list all 57 topics with descriptions
+stata-cli skill regression # linear regression reference
+stata-cli skill did # modern DiD packages (csdid, did_multiplegt)
+stata-cli skill reghdfe # reghdfe package guide
+```
+
+Built-in reference library covering data management, econometrics, causal inference, graphics, Mata programming, and 20+ community packages. Aliases supported (e.g. `did` for `difference-in-differences`, `panel` for `panel-data`).
+
 ## Daemon Mode
 
 The daemon keeps PyStata alive in the background — reduces execution time from **~2-3s to ~85ms** (35x speedup).
@@ -1,5 +1,7 @@
 # stata-cli
 
+> **Stata CLI Is All Reg Monkeys Need**
+
 
 
 [](https://opensource.org/licenses/MIT)
@@ -39,6 +41,7 @@ A command-line interface for [Stata](https://www.stata.com/) via PyStata — bui
 | **Daemon Mode** | Persistent background process for sub-second execution via Unix socket |
 | **Output Control** | Compact mode, JSON output, token limit management, log file output |
 | **Interruption** | Send break signal to stop long-running commands |
+| **Skill Library** | Built-in Stata reference with 57 topics: syntax, econometrics, causal inference, packages |
 
 ## Installation & Quick Start
 
@@ -246,6 +249,18 @@ stata-cli frame
 
 Shows all Stata frames and the current working frame.
 
+### `skill` — Stata Reference Library
+
+```bash
+stata-cli skill # overview: gotchas, patterns, topic routing table
+stata-cli skill --list # list all 57 topics with descriptions
+stata-cli skill regression # linear regression reference
+stata-cli skill did # modern DiD packages (csdid, did_multiplegt)
+stata-cli skill reghdfe # reghdfe package guide
+```
+
+Built-in reference library covering data management, econometrics, causal inference, graphics, Mata programming, and 20+ community packages. Aliases supported (e.g. `did` for `difference-in-differences`, `panel` for `panel-data`).
+
 ## Daemon Mode
 
 The daemon keeps PyStata alive in the background — reduces execution time from **~2-3s to ~85ms** (35x speedup).
@@ -1,6 +1,6 @@
 [project]
 name = "stata-cli"
-version = "0.
+version = "0.4.1"
 description = "Command-line interface for running Stata commands via PyStata"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -23,3 +23,6 @@ build-backend = "setuptools.build_meta"
 
 [tool.setuptools.packages.find]
 where = ["src"]
+
+[tool.setuptools.package-data]
+stata_cli = ["skills/**/*.md"]
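The added `package-data` entry is what lets the Markdown reference files travel with the installed package. As a rough sketch of what the `skills/**/*.md` pattern would cover, the snippet below builds a throwaway directory tree and expands the same pattern with Python's `glob` in recursive mode. It assumes the installed setuptools expands `**` the same way; `matched_files`, `build_demo_tree`, and `notes.txt` are hypothetical names used only for illustration.

```python
import glob
import os
import tempfile


def matched_files(root: str, pattern: str) -> list:
    """Expand `pattern` under `root`; return sorted POSIX-style relative paths."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


def build_demo_tree(root: str) -> None:
    """Create a tiny stand-in for src/stata_cli/ (file names from the file list)."""
    demo_files = (
        "skills/overview.md",
        "skills/packages/reghdfe.md",
        "skills/references/panel-data.md",
        "skills/notes.txt",  # hypothetical file that the pattern should skip
    )
    for rel in demo_files:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    # In recursive mode `**` also matches zero directories, so the
    # top-level overview.md is picked up alongside the nested files.
    FOUND = matched_files(tmp, "skills/**/*.md")

print(FOUND)
```

Under these assumptions the pattern covers `overview.md` as well as everything under `packages/` and `references/`, and leaves non-Markdown files out.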
@@ -0,0 +1 @@
+__version__ = "0.4.1"
@@ -406,6 +406,45 @@ def frame_cmd(ctx):
     click.echo(json.dumps(resp, ensure_ascii=False, indent=2))
 
 
+# ── Skill command ────────────────────────────────────────────────────────
+
+@cli.command("skill")
+@click.argument("topic", required=False, default=None)
+@click.option("--list", "list_topics", is_flag=True, default=False, help="List all available topics.")
+@click.pass_context
+def skill_cmd(ctx, topic, list_topics):
+    """Stata reference library — 57 topics, use 'skill --list' to browse.
+
+    Without arguments, shows the overview (gotchas, common patterns, routing table).
+    With a topic name, shows the detailed reference for that topic.
+
+    \b
+    Examples:
+    stata-cli skill # overview
+    stata-cli skill --list # list all topics
+    stata-cli skill regression # linear regression reference
+    stata-cli skill did # difference-in-differences
+    stata-cli skill reghdfe # reghdfe package guide
+    """
+    from .skill_registry import get_overview, get_topic, list_topics as _list_topics
+
+    if list_topics:
+        click.echo(_list_topics())
+        return
+
+    if topic is None:
+        click.echo(get_overview())
+        return
+
+    content = get_topic(topic)
+    if content is None:
+        click.echo(f"Unknown topic: {topic}", err=True)
+        click.echo("Run 'stata-cli skill --list' to see available topics.", err=True)
+        _exit(EXIT_USAGE_ERROR)
+    else:
+        click.echo(content)
+
+
 # ── Daemon subcommands ───────────────────────────────────────────────────
 
 @cli.group()
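The branch order in the new `skill` command is easy to miss in diff form: `--list` is checked first, then the no-argument overview, and only then the topic lookup. The sketch below mirrors that order with the standard library's `argparse` in place of click, so it runs without third-party packages. `run_skill` and `DEMO_TOPICS` are hypothetical, and the value of `EXIT_USAGE_ERROR` is an assumption because the diff does not show where that constant is defined.

```python
import argparse
from typing import Dict, List, Optional, Tuple

EXIT_USAGE_ERROR = 2  # assumed value; the real constant is not shown in the diff

# Hypothetical stand-in content; the real command reads Markdown files.
DEMO_TOPICS: Dict[str, str] = {"reghdfe": "reghdfe guide"}


def run_skill(argv: List[str], topics: Optional[Dict[str, str]] = None) -> Tuple[int, str]:
    """Return (exit_code, text), following the branch order of the new command."""
    topics = DEMO_TOPICS if topics is None else topics
    parser = argparse.ArgumentParser(prog="stata-cli skill")
    parser.add_argument("topic", nargs="?", default=None)
    parser.add_argument("--list", dest="list_topics", action="store_true")
    args = parser.parse_args(argv)

    if args.list_topics:
        # --list wins even when a topic is also given
        return 0, "\n".join(sorted(topics))
    if args.topic is None:
        # no arguments: show the overview
        return 0, "overview"
    content = topics.get(args.topic)
    if content is None:
        # unknown topic: report it and signal a usage error
        return EXIT_USAGE_ERROR, f"Unknown topic: {args.topic}"
    return 0, content


print(run_skill(["--list", "reghdfe"]))
print(run_skill([]))
print(run_skill(["nope"]))
```

One consequence of this ordering is that a topic passed together with `--list` is silently ignored rather than rejected.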
@@ -0,0 +1,171 @@
+"""stata-cli skill — built-in Stata reference library."""
+from __future__ import annotations
+
+import os
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+
+SKILLS_DIR = Path(__file__).parent / "skills"
+
+TOPIC_CATALOG: List[Tuple[str, str, str]] = [
+    # (topic_name, category, description)
+    # Data Operations
+    ("basics", "Data Operations", "Getting started, use, save, describe, browse, sysuse"),
+    ("data-import-export", "Data Operations", "import delimited/excel, export, ODBC, web data"),
+    ("data-management", "Data Operations", "generate, replace, merge, reshape, collapse, egen, encode/decode"),
+    ("variables-operators", "Data Operations", "Variable types, missing values, operators, if/in qualifiers"),
+    ("string-functions", "Data Operations", "substr(), regexm(), split, strtrim(), Unicode"),
+    ("date-time-functions", "Data Operations", "date(), clock(), %td/%tc formats, mdy(), business calendars"),
+    ("mathematical-functions", "Data Operations", "round(), log(), exp(), cond(), distributions, random numbers"),
+    # Statistics & Econometrics
+    ("descriptive-statistics", "Statistics & Econometrics", "summarize, tabulate, correlate, tabstat, codebook"),
+    ("linear-regression", "Statistics & Econometrics", "regress, vce(robust), vce(cluster), margins, predict, ivregress"),
+    ("panel-data", "Statistics & Econometrics", "xtset, xtreg fe/re, Hausman test, dynamic panels"),
+    ("time-series", "Statistics & Econometrics", "tsset, ARIMA, VAR, dfuller, pperron, irf, forecasting"),
+    ("limited-dependent-variables", "Statistics & Econometrics", "logit, probit, tobit, poisson, nbreg, mlogit, ologit"),
+    ("survey-data-analysis", "Statistics & Econometrics", "svyset, svy:, subpop(), complex survey design"),
+    ("bootstrap-simulation", "Statistics & Econometrics", "bootstrap, simulate, permute, Monte Carlo"),
+    ("missing-data-handling", "Statistics & Econometrics", "mi impute, mi estimate, FIML, misstable"),
+    ("maximum-likelihood", "Statistics & Econometrics", "ml model, custom likelihood functions, ml init"),
+    ("gmm-estimation", "Statistics & Econometrics", "gmm, moment conditions, estat overid, J-test"),
+    # Causal Inference
+    ("treatment-effects", "Causal Inference", "teffects ra/ipw/ipwra/aipw, ATE/ATT/ATET"),
+    ("difference-in-differences", "Causal Inference", "DiD, parallel trends, event studies, staggered adoption"),
+    ("regression-discontinuity", "Causal Inference", "Sharp/fuzzy RD, bandwidth selection, rdplot"),
+    ("matching-methods", "Causal Inference", "PSM, nearest neighbor, kernel matching, teffects nnmatch"),
+    ("sample-selection", "Causal Inference", "heckman, heckprobit, exclusion restrictions"),
+    # Advanced Methods
+    ("survival-analysis", "Advanced Methods", "stset, stcox, streg, Kaplan-Meier, parametric models"),
+    ("sem-factor-analysis", "Advanced Methods", "sem, gsem, CFA, path analysis, alpha, reliability"),
+    ("nonparametric-methods", "Advanced Methods", "kdensity, rank tests, qreg, npregress"),
+    ("spatial-analysis", "Advanced Methods", "spmatrix, spregress, spatial weights, Moran's I"),
+    ("machine-learning", "Advanced Methods", "lasso, elasticnet, cvlasso, cross-validation"),
+    # Graphics
+    ("graphics", "Graphics", "twoway, scatter, line, bar, histogram, graph combine, graph export"),
+    # Programming
+    ("programming-basics", "Programming", "local, global, foreach, forvalues, program define, syntax"),
+    ("advanced-programming", "Programming", "syntax, mata, classes, tempfile/tempvar"),
+    ("mata-introduction", "Programming", "Mata basics, when to use Mata vs ado, data types"),
+    ("mata-programming", "Programming", "Mata functions, flow control, structures, pointers"),
+    ("mata-matrix-operations", "Programming", "Matrix creation, decompositions, solvers, st_matrix()"),
+    ("mata-data-access", "Programming", "st_data(), st_view(), st_store(), performance tips"),
+    # Output & Workflow
+    ("tables-reporting", "Output & Workflow", "putexcel, putdocx, putpdf, LaTeX, collect"),
+    ("workflow-best-practices", "Output & Workflow", "Project structure, master do-files, version control"),
+    ("external-tools-integration", "Output & Workflow", "Python via python:, R via rsource, shell, Git"),
+    # Community Packages
+    ("reghdfe", "Community Packages", "High-dimensional fixed effects OLS"),
+    ("estout", "Community Packages", "Publication-quality regression tables (esttab/estout)"),
+    ("outreg2", "Community Packages", "Alternative regression table exporter (Word/Excel/TeX)"),
+    ("asdoc", "Community Packages", "One-command Word document creation for any Stata output"),
+    ("tabout", "Community Packages", "Cross-tabulations and summary tables to file"),
+    ("coefplot", "Community Packages", "Coefficient plots from stored estimates"),
+    ("graph-schemes", "Community Packages", "grstyle, schemepack, plotplain — better graph themes"),
+    ("did", "Community Packages", "Modern DiD: csdid, did_multiplegt, did_imputation"),
+    ("event-study", "Community Packages", "eventstudyinteract, eventdd — event study estimators"),
+    ("rdrobust", "Community Packages", "Robust RD estimation with optimal bandwidth"),
+    ("psmatch2", "Community Packages", "Propensity score matching (nearest neighbor, kernel)"),
+    ("synth", "Community Packages", "Synthetic control method (synth, synth_runner)"),
+    ("ivreg2", "Community Packages", "Enhanced IV/2SLS with additional diagnostics"),
+    ("xtabond2", "Community Packages", "Dynamic panel GMM (Arellano-Bond/Blundell-Bond)"),
+    ("binsreg", "Community Packages", "Binned scatter plots with CI"),
+    ("nprobust", "Community Packages", "Nonparametric kernel estimation and inference"),
+    ("diagnostics", "Community Packages", "bacondecomp, xttest3, collinearity, heteroskedasticity"),
+    ("winsor", "Community Packages", "Winsorizing and trimming: winsor2, winsor"),
+    ("data-manipulation", "Community Packages", "gtools (fast collapse/egen), rangestat, egenmore"),
+    ("package-management", "Community Packages", "ssc install, net install, ado update"),
+]
+
+# Build lookup: topic name -> (category, description)
+_TOPIC_MAP: Dict[str, Tuple[str, str]] = {t[0]: (t[1], t[2]) for t in TOPIC_CATALOG}
+
+# Short aliases for convenience
+_ALIASES: Dict[str, str] = {
+    "basics": "basics-getting-started",
+    "regression": "linear-regression",
+    "panel": "panel-data",
+    "rd": "regression-discontinuity",
+    "diff-in-diff": "difference-in-differences",
+    "matching": "matching-methods",
+    "ts": "time-series",
+    "logit": "limited-dependent-variables",
+    "probit": "limited-dependent-variables",
+    "tobit": "limited-dependent-variables",
+    "survival": "survival-analysis",
+    "sem": "sem-factor-analysis",
+    "ml": "maximum-likelihood",
+    "gmm": "gmm-estimation",
+    "survey": "survey-data-analysis",
+    "bootstrap": "bootstrap-simulation",
+    "mi": "missing-data-handling",
+    "mata": "mata-introduction",
+    "strings": "string-functions",
+    "dates": "date-time-functions",
+    "math": "mathematical-functions",
+    "tables": "tables-reporting",
+    "workflow": "workflow-best-practices",
+    "external": "external-tools-integration",
+    "nonparametric": "nonparametric-methods",
+    "spatial": "spatial-analysis",
+    "lasso": "machine-learning",
+    "missing": "missing-data-handling",
+    "heckman": "sample-selection",
+    "selection": "sample-selection",
+    "iv": "ivreg2",
+    "gtools": "data-manipulation",
+}
+
+
+def _resolve_topic(name: str) -> Optional[str]:
+    """Resolve a topic name (with alias support) to its file stem."""
+    name = name.lower().strip()
+    if name in _ALIASES:
+        name = _ALIASES[name]
+    # Try references/ then packages/
+    for subdir in ("references", "packages"):
+        path = SKILLS_DIR / subdir / f"{name}.md"
+        if path.exists():
+            return str(path)
+    # Try partial match
+    for subdir in ("references", "packages"):
+        d = SKILLS_DIR / subdir
+        if d.is_dir():
+            for f in d.iterdir():
+                if f.suffix == ".md" and name in f.stem:
+                    return str(f)
+    return None
+
+
+def get_overview() -> str:
+    """Return the skill overview content."""
+    overview_path = SKILLS_DIR / "overview.md"
+    if overview_path.exists():
+        return overview_path.read_text(encoding="utf-8")
+    return "Skill overview not found."
+
+
+def get_topic(name: str) -> Optional[str]:
+    """Return content of a specific topic, or None if not found."""
+    path = _resolve_topic(name)
+    if path:
+        return Path(path).read_text(encoding="utf-8")
+    return None
+
+
+def list_topics() -> str:
+    """Return a formatted topic listing grouped by category."""
+    lines = []
+    current_cat = ""
+    for topic_name, category, description in TOPIC_CATALOG:
+        if category != current_cat:
+            if current_cat:
+                lines.append("")
+            lines.append(category)
+            current_cat = category
+        lines.append(f" {topic_name:<30s} {description}")
+    return "\n".join(lines)
+
+
+def get_all_topic_names() -> List[str]:
+    """Return list of all valid topic names."""
+    return [t[0] for t in TOPIC_CATALOG]
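The first two stages of the resolver in this hunk (alias lookup, then an exact file match in `references/` before `packages/`) can be exercised against a temporary directory. The sketch below is a simplified re-implementation, not the packaged function: `resolve`, `ALIASES`, and the placeholder files are illustrative, and the substring fallback is deliberately left out. It suggests that `did` would land on the package guide through an exact match, while `diff-in-diff` is the alias listed for `difference-in-differences`.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Subset of the alias table from the hunk above.
ALIASES: Dict[str, str] = {
    "diff-in-diff": "difference-in-differences",
    "iv": "ivreg2",
}


def resolve(skills_dir: Path, name: str, aliases: Dict[str, str]) -> Optional[str]:
    """Alias lookup, then an exact file match in references/ before packages/."""
    name = name.lower().strip()
    name = aliases.get(name, name)
    for subdir in ("references", "packages"):
        path = skills_dir / subdir / f"{name}.md"
        if path.exists():
            return f"{subdir}/{path.name}"
    return None  # the substring fallback is omitted in this sketch


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Placeholder files named after entries in the file list.
    for rel in (
        "references/difference-in-differences.md",
        "packages/did.md",
        "packages/ivreg2.md",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n", encoding="utf-8")
    queries = ("did", "diff-in-diff", " IV ", "unknown")
    RESULTS = {q: resolve(root, q, ALIASES) for q in queries}

print(RESULTS)
```

Input is lower-cased and stripped before the alias lookup, which is why `" IV "` still reaches `packages/ivreg2.md` in this sketch.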
@@ -0,0 +1,196 @@
+# stata-cli skill
+
+Stata reference library built into stata-cli. Covers syntax, data management,
+econometrics, causal inference, graphics, Mata programming, and 20+ community
+packages. Use `stata-cli skill <topic>` to read a specific reference.
+
+## Critical Gotchas
+
+### Missing Values Sort to +Infinity
+Stata's `.` (and `.a`-`.z`) are **greater than all numbers**.
+```stata
+* WRONG — includes observations where income is missing!
+gen high_income = (income > 50000)
+
+* RIGHT
+gen high_income = (income > 50000) if !missing(income)
+```
+
+### `=` vs `==`
+`=` is assignment; `==` is comparison.
+```stata
+* WRONG — syntax error
+gen employed = 1 if status = 1
+
+* RIGHT
+gen employed = 1 if status == 1
+```
+
+### Local Macro Syntax
+Locals use `` `name' `` (backtick + single-quote). Globals use `$name`.
+```stata
+local controls "age education income"
+regress wage `controls' // correct
+regress wage `controls // WRONG — missing closing quote
+```
+
+### `by` Requires Prior Sort (Use `bysort`)
+```stata
+bysort id: gen first = (_n == 1) // RIGHT — bysort sorts automatically
+```
+
+### Factor Variable Notation
+Use `i.` for categorical, `c.` for continuous.
+```stata
+* WRONG — treats race as continuous
+regress wage race education
+
+* RIGHT — creates dummies
+regress wage i.race education
+```
+
+### `merge` Always Check `_merge`
+```stata
+merge 1:1 id using other.dta
+tab _merge
+drop _merge
+```
+
+### Stored Results: `r()` vs `e()` vs `s()`
+- `r()` — r-class (summarize, tabulate)
+- `e()` — e-class (regress, logit)
+- `s()` — s-class (parsing)
+
+A new estimation command **overwrites** previous `e()` results. Use `estimates store`.
+
+## Common Patterns
+
+### Regression Table Workflow
+```stata
+eststo clear
+eststo: regress y x1 x2, vce(robust)
+eststo: regress y x1 x2 x3, vce(robust)
+esttab using "results.tex", replace se star(* 0.10 ** 0.05 *** 0.01) label booktabs
+```
+
+### Panel Data Setup
+```stata
+xtset panelid timevar
+reghdfe y x1 x2, absorb(panelid timevar) vce(cluster panelid)
+```
+
+### Difference-in-Differences
+```stata
+* Classic 2x2 DiD
+gen post = (year >= treatment_year)
+gen treat_post = treated * post
+regress y treated post treat_post, vce(cluster id)
+
+* Modern staggered DiD (Callaway & Sant'Anna)
+csdid y x1 x2, ivar(id) time(year) gvar(first_treat) agg(event)
+```
+
+### Data Cleaning Pipeline
+```stata
+import delimited "raw_data.csv", clear varnames(1)
+rename *, lower
+destring income, replace force
+replace income = . if income < 0
+label variable income "Annual household income (USD)"
+compress
+save "clean_data.dta", replace
+```
+
+## Topic Routing Table
+
+Use `stata-cli skill <topic>` to read a specific reference.
+Use `stata-cli skill --list` to see all topics with descriptions.
+
+### Data Operations
+| Topic | Key Commands |
+|-------|-------------|
+| `basics` | use, save, describe, browse, sysuse |
+| `data-import-export` | import delimited/excel, export, ODBC |
+| `data-management` | generate, replace, merge, reshape, collapse, egen |
+| `variables-operators` | Variable types, missing values, if/in qualifiers |
+| `string-functions` | substr(), regexm(), split, Unicode |
+| `date-time-functions` | date(), clock(), %td/%tc formats |
+| `mathematical-functions` | round(), log(), cond(), distributions |
+
+### Statistics & Econometrics
+| Topic | Key Commands |
+|-------|-------------|
+| `descriptive-statistics` | summarize, tabulate, correlate, tabstat |
+| `linear-regression` | regress, vce(robust), margins, predict |
+| `panel-data` | xtset, xtreg fe/re, Hausman test |
+| `time-series` | tsset, ARIMA, VAR, unit root tests |
+| `limited-dependent` | logit, probit, tobit, poisson, mlogit |
+| `survey-data` | svyset, svy:, subpop(), complex design |
+| `bootstrap-simulation` | bootstrap, simulate, Monte Carlo |
+| `missing-data` | mi impute, mi estimate, FIML |
+| `maximum-likelihood` | ml model, custom likelihood |
+| `gmm-estimation` | gmm, moment conditions, J-test |
+
+### Causal Inference
+| Topic | Key Commands |
+|-------|-------------|
+| `treatment-effects` | teffects ra/ipw/aipw, ATE/ATT |
+| `difference-in-differences` | DiD, event study, staggered adoption |
+| `regression-discontinuity` | Sharp/fuzzy RD, bandwidth selection |
+| `matching-methods` | PSM, nearest neighbor, kernel matching |
+| `sample-selection` | heckman, exclusion restrictions |
+
+### Advanced Methods
+| Topic | Key Commands |
+|-------|-------------|
+| `survival-analysis` | stset, stcox, streg, Kaplan-Meier |
+| `sem-factor-analysis` | sem, gsem, CFA, path analysis |
+| `nonparametric-methods` | kdensity, qreg, npregress |
+| `spatial-analysis` | spmatrix, spregress, Moran's I |
+| `machine-learning` | lasso, elasticnet, cross-validation |
+
+### Graphics
+| Topic | Key Commands |
+|-------|-------------|
+| `graphics` | twoway, scatter, histogram, graph export |
+
+### Programming
+| Topic | Key Commands |
+|-------|-------------|
+| `programming-basics` | local, global, foreach, program define |
+| `advanced-programming` | syntax, mata, tempfile/tempvar |
+| `mata-introduction` | Mata basics, when to use Mata |
+| `mata-programming` | Mata functions, structures, pointers |
+| `mata-matrix-operations` | Matrix decompositions, st_matrix() |
+| `mata-data-access` | st_data(), st_view(), st_store() |
+
+### Output & Workflow
+| Topic | Key Commands |
+|-------|-------------|
+| `tables-reporting` | putexcel, putdocx, LaTeX, collect |
+| `workflow-best-practices` | Project structure, version control |
+| `external-tools` | Python via python:, R, shell commands |
+
+### Community Packages
+| Topic | What It Does |
+|-------|-------------|
+| `reghdfe` | High-dimensional fixed effects OLS |
+| `estout` | Publication-quality regression tables (esttab) |
+| `outreg2` | Alternative table exporter (Word/Excel/TeX) |
+| `asdoc` | One-command Word document creation |
+| `coefplot` | Coefficient plots from stored estimates |
+| `did` | Modern DiD estimators (csdid, did_multiplegt) |
+| `event-study` | eventstudyinteract, eventdd |
+| `rdrobust` | Robust RD estimation + optimal bandwidth |
+| `psmatch2` | Propensity score matching |
+| `synth` | Synthetic control method |
+| `ivreg2` | Enhanced IV/2SLS with diagnostics |
+| `xtabond2` | Dynamic panel GMM (Arellano-Bond) |
+| `binsreg` | Binned scatter plots with CI |
+| `data-manipulation` | gtools (fast collapse/egen), rangestat |
+| `diagnostics` | bacondecomp, xttest3, heteroskedasticity |
+| `graph-schemes` | grstyle, schemepack, plotplain |
+| `nprobust` | Nonparametric kernel estimation |
+| `winsor` | Winsorizing and trimming (winsor2) |
+| `tabout` | Cross-tabulations to file |
+| `package-management` | ssc install, net install, ado update |
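Four names in the routing table above (`limited-dependent`, `survey-data`, `missing-data`, `external-tools`) are shorter than the corresponding file stems in the file list, so they appear to rely on the resolver's substring fallback rather than on an exact match or alias. The sketch below checks that idea against the reference file stems copied from the file list. `partial_matches` is a hypothetical helper that returns every stem containing the query, whereas the packaged resolver returns the first hit in directory-iteration order, which Python does not guarantee to be sorted.

```python
from typing import Dict, List

# Stems of the files under skills/references/, copied from the file list.
REFERENCE_STEMS: List[str] = [
    "advanced-programming", "basics-getting-started", "bootstrap-simulation",
    "data-import-export", "data-management", "date-time-functions",
    "descriptive-statistics", "difference-in-differences",
    "external-tools-integration", "gmm-estimation", "graphics",
    "limited-dependent-variables", "linear-regression", "machine-learning",
    "mata-data-access", "mata-introduction", "mata-matrix-operations",
    "mata-programming", "matching-methods", "mathematical-functions",
    "maximum-likelihood", "missing-data-handling", "nonparametric-methods",
    "panel-data", "programming-basics", "regression-discontinuity",
    "sample-selection", "sem-factor-analysis", "spatial-analysis",
    "string-functions", "survey-data-analysis", "survival-analysis",
    "tables-reporting", "time-series", "treatment-effects",
    "variables-operators", "workflow-best-practices",
]


def partial_matches(name: str, stems: List[str]) -> List[str]:
    """Return every stem that contains `name` as a substring, sorted."""
    return sorted(s for s in stems if name in s)


SHORT_NAMES = ["limited-dependent", "survey-data", "missing-data", "external-tools"]
UNIQUE: Dict[str, List[str]] = {
    n: partial_matches(n, REFERENCE_STEMS) for n in SHORT_NAMES
}
AMBIGUOUS = partial_matches("data", REFERENCE_STEMS)

print(UNIQUE)
print(AMBIGUOUS)
```

Each of the four short names matches exactly one reference file, so the routing table entries should resolve predictably. A broader query such as `data` matches several files, and which one a first-hit resolver returns would depend on directory ordering.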