PyPI - stata-code - Versions diffs - 0.7.2__tar.gz → 0.8.0__tar.gz - Mend

stata-code 0.7.2__tar.gz → 0.8.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (71)

{stata_code-0.7.2 → stata_code-0.8.0}/CHANGELOG.md RENAMED

@@ -4,7 +4,33 @@ All notable changes to `stata-code` are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); the project adheres
 to semver-major.minor for the result schema (see `SCHEMA.md` §6).
-## Unreleased
+## 0.8.0 — 2026-06-20
+### Added
+- **Economist workflow coordination and roadmap.** Added
+  `AGENT_COORDINATION.md` for concurrent-agent lanes and
+  `docs/industry-leader-roadmap.md` for the one-month product plan: workflow
+  intelligence, parity audits, data-MCP handoff, editor/artifact polish, and
+  distribution diagnostics.
+- **Cross-stack and data-MCP workflow references.** The `stata-code` skill now
+  includes `references/parity-audit.md` and
+  `references/data-mcp-handoff.md`, plus cookbook examples for cross-stack
+  parity audits and external-data-MCP handoff into Stata.
+- **Modern empirical-economics package notes.** Added package references for
+  `csdid`, `drdid`, `did_imputation`, `eventstudyinteract`,
+  `did_multiplegt_dyn`, `rdrobust`, `ivreg2`, `ivreghdfe`, `boottest`, and
+  `outreg2`, and wired them into the skill routing table.
+- **MCP prompt discoverability for economist workflows.** Added
+  `plan_cross_stack_parity_audit`, `data_mcp_to_stata_handoff`,
+  `did_event_study`, `iv_2sls`, `rdd`, `publication_table`, and
+  `cross_validate_did` prompts so clients can discover the new protocols and
+  turnkey empirical recipes directly through MCP.
+- **Read-only installation diagnostics.** Added the top-level `stata-code`
+  console script with `doctor` / `verify` commands. The diagnostic reports
+  package/Python version, MCP and kernel extras, `pystata` discovery, console
+  scripts on `PATH`, client/VS Code hints, and an optional live Stata
+  version/edition probe without mutating user configuration.
 ## 0.7.2 — 2026-06-20

{stata_code-0.7.2 → stata_code-0.8.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: stata-code
-Version: 0.7.2
+Version: 0.8.0
 Summary: Agent-native Stata bridge — one core, multiple frontends (MCP, Jupyter, VSCode)
 Project-URL: Homepage, https://github.com/brycewang-stanford/stata-code
 Project-URL: Repository, https://github.com/brycewang-stanford/stata-code
@@ -67,6 +67,22 @@ Description-Content-Type: text/markdown
 `stata-code` lets you drive Stata from modern environments: an LLM agent (Claude Code, Cursor, Claude Desktop), a Jupyter notebook, or a VS Code editor session. All frontends share one Python core and return a stable, structured, **agent-friendly** result schema.
+**For empirical economists.** Drive Stata in plain language: run **DiD, IV, RDD, and publication-ready `esttab` tables in one conversation** — then cross-check each estimate across Stata and Python so you only trust results that *agree* (the Cunningham cross-package robustness check).
+**Try it in 60 seconds** with [Claude Code](https://github.com/anthropics/claude-code) — no global install needed:
+```bash
+claude mcp add stata-code --scope user -- uvx --from "stata-code[mcp]" stata-code-mcp
+```
+Then just ask:
+> *"Using `data/cfps_panel.dta`, run a two-way fixed-effects regression of monthly wage on the treatment (controls: `age age2 edu industry`), then test heterogeneous effects with Callaway-Sant'Anna, and export an `esttab` table."*
+`stata-code` writes the do-file, runs it, returns the table, and interprets the result — and can re-estimate the same ATT with [StatsPAI](https://github.com/brycewang-stanford/StatsPAI) to confirm the two stacks agree. These workflows ship as one-call MCP prompts (`did_event_study`, `iv_2sls`, `rdd`, `publication_table`, `cross_validate_did`) backed by an on-demand [recipe library](skills/stata-code/references/recipes/).
+**Why `stata-code`:** MIT-licensed · ships as an MCP server, a bundled agent skill, a Jupyter kernel, **and** a VS Code extension · one structured, token-economy result schema (typed errors, native `r()` / `e()`) · cross-stack validation with StatsPAI for the Cunningham check.
 ```text
                     ┌────────────────────────────────────────┐
                     │     stata-code core (Python)           │
@@ -84,12 +100,18 @@ Description-Content-Type: text/markdown
               └─────────────┘  └────────────┘  └─────────────────┘
 ```
-**Status: v0.7 (May 2026)** — the core, MCP server, Jupyter kernel, and VS Code extension work end-to-end against Stata 18 MP. The test suite covers schema, runner, MCP, kernel, notebook, run-index, subprocess-pool, and VS Code modules; CI also checks linting, type safety, schema generation, package metadata, and VSIX packaging. License: **MIT**.
+**Status: v0.8 (June 2026)** — the core, MCP server, Jupyter kernel, and VS Code extension work end-to-end against Stata 18 MP. The test suite covers schema, runner, MCP, kernel, notebook, run-index, subprocess-pool, and VS Code modules; CI also checks linting, type safety, schema generation, package metadata, and VSIX packaging. License: **MIT**.
-Two workflows the current release explicitly supports for end users:
+Three workflows the current tree explicitly supports for end users and agents:
 - **Run Stata code from a Jupyter notebook.** `pip install "stata-code[kernel]"` + `stata-code-kernel install --user` registers a **Stata** kernel that the Jupyter Notebook UI, JupyterLab, and the VS Code Jupyter extension all pick up by name. Cells render Stata logs, graphs, and warnings inline (the kernel logo bundled since v0.5 makes it appear in VS Code's kernel picker too). See [As a Jupyter Kernel](#as-a-jupyter-kernel).
 - **Optional agent "fix and rerun" loop.** `stata_run` returns typed `error.kind/line/context` plus `suggestions` on every failure. By default Claude Code only reports diagnostics — but if you explicitly say "fix this and rerun until it passes", the agent uses the same fields to edit your `.do` file and re-call `stata_run` until the run is green. The repair loop is **opt-in**: failed runs are diagnostics first, not automatic rewrite permission. See [Error Recovery in Agent Workflows](#error-recovery-in-agent-workflows).
+- **Economist workflow guides.** The bundled skill and cookbook now cover
+  modern DiD, IV/weak-IV, RDD, table export, data-MCP handoff, and
+  cross-stack parity audits. `stata-code` runs and audits the Stata leg; R,
+  Python, and official data MCPs remain separate tools with explicit handoff
+  files and source metadata. See [`skills/stata-code/references/`](skills/stata-code/references/)
+  and [`examples/`](examples/).
 ---
@@ -138,6 +160,19 @@ pip install -e ".[mcp,kernel]"
 Note: `pystata` is **not** on PyPI; it ships with Stata. `stata-code` auto-discovers it on macOS at `/Applications/Stata/utilities/pystata` and at equivalent Linux / Windows paths. If your install is elsewhere, add it to `PYTHONPATH` before importing.
+Verify the local setup with the read-only doctor:
+```bash
+stata-code doctor
+stata-code doctor --json          # machine-readable output
+stata-code doctor --no-stata-probe # skip live Stata initialization
+```
+The doctor reports the package/Python version, MCP and Jupyter extras, `pystata`
+discovery, console scripts on `PATH`, client/VS Code configuration hints, and a
+best-effort Stata version/edition probe. It never edits shell, Stata, Claude, or
+VS Code config.
 ---
 ## Quick Start
@@ -315,8 +350,11 @@ resources:
 MCP prompts are available for common agent workflows:
 `run_do_file_and_report`, `debug_stata_error`,
-`fix_and_rerun_until_passes`, `replication_audit`, and
-`summarize_estimation_results`.
+`fix_and_rerun_until_passes`, `replication_audit`,
+`plan_cross_stack_parity_audit`, `data_mcp_to_stata_handoff`,
+`summarize_estimation_results`, `run_notebook_cell_and_report`,
+`fix_and_rerun_notebook_cell`, `did_event_study`, `iv_2sls`, `rdd`,
+`publication_table`, and `cross_validate_did`.
 ### As a Jupyter Kernel
@@ -358,6 +396,12 @@ Or open the **Extensions** sidebar in VS Code and search `stata-code`. The exten
 On first activation the extension probes for `stata-code-mcp` on `PATH` (and in any workspace `.venv` / `venv`). If nothing resolves, it shows a one-time install hint with the exact `pip install "stata-code[mcp]"` command — choose **Don't show again** to silence it for the installed extension version.
+If the extension or an MCP client cannot find the server, run
+`stata-code doctor --no-stata-probe` in the same Python environment. It reports
+whether `stata-code-mcp` is on `PATH` and suggests absolute-path or
+`python -m stata_code.mcp` fallbacks for GUI clients whose `PATH` differs from
+your shell.
 #### Cell and section conventions
 The extension recognizes two complementary structural markers inside `.do` files. Either can be mixed in the same file; they do not conflict.
@@ -447,7 +491,7 @@ stata_code/
 ## Roadmap
-### Done (through v0.7 — May 2026)
+### Done (current tree)
 - v1.0 result schema ([SCHEMA.md](SCHEMA.md))
 - `pystata`-based runner with native-typed `r()`, `e()`, and matrices
@@ -463,6 +507,12 @@ stata_code/
 - Subprocess-backed hard timeout and cancellation for the public Python API and MCP server: `timeout_ms`, `cancel(session_id)`, and MCP `cancel_session`
 - Per-cell repair loop on `.ipynb` via `notebook_outline` / `notebook_get_cell` / `notebook_edit_cell` with optimistic-concurrency `expected_source` guards and `origin_cell_id` echo on `RunResult`
 - Persistent run bundles + `list_runs` query over `manifest.json` files (filter by cell / origin / session / since / ok; page with limit / offset)
+- Read-only `stata-code doctor` / `verify` diagnostics for package version,
+  extras, `pystata` discovery, console scripts, client hints, and optional live
+  Stata version probing
+- Economist workflow layer: skill references and examples for modern DiD,
+  IV/weak-IV, RDD, table export, data-MCP handoff, and cross-stack parity
+  audits
 - JSON Schema artifact auto-generated from `schema.py`: [`schema/run_result.schema.json`](schema/run_result.schema.json)
 - VS Code extension published to the Marketplace as [`brycewang-stanford.stata-code-vscode`](https://marketplace.visualstudio.com/items?itemName=brycewang-stanford.stata-code-vscode): syntax highlighting, section outline/navigation, code-lens cell and section runners, sidebar (sessions / last result / run history / logs / graphs), status bar, completions, conservative variable rename, diagnostics, MCP child-process spawn
 - Clean-room license policy ([LICENSE-POLICY.md](LICENSE-POLICY.md))
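The opt-in "fix and rerun" behavior described in the long description above can be sketched as a bounded loop that only rewrites code when the user has asked for it. This is a minimal illustration, not `stata-code`'s implementation: `run` and `repair` are hypothetical stand-ins for a `stata_run` call and the agent's edit step, the result is assumed to be a dict carrying `ok` and `error`, and `max_attempts` is an illustrative cap.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: `run` plays the role of a stata_run call and returns a
# RunResult-like dict; `repair` plays the agent's edit step and returns new code
# (or None when it has no actionable fix).
RunFn = Callable[[str], dict]
RepairFn = Callable[[str, dict], Optional[str]]


def fix_and_rerun(code: str, run: RunFn, repair: RepairFn,
                  allow_repair: bool = False,
                  max_attempts: int = 3) -> Tuple[dict, int]:
    """Run once; loop on failures only if the user explicitly opted in."""
    result = run(code)
    attempts = 1
    while not result.get("ok") and allow_repair and attempts < max_attempts:
        # The agent branches on structured fields such as error.kind / line,
        # not on log prose.
        new_code = repair(code, result)
        if new_code is None or new_code == code:
            break  # nothing new to try: stop and report diagnostics
        code = new_code
        result = run(code)
        attempts += 1
    return result, attempts
```

With `allow_repair=False` (the default) a failed run comes back after a single attempt as diagnostics only; the attempt cap keeps an explicit "rerun until it passes" request from looping forever when no edit helps.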

{stata_code-0.7.2 → stata_code-0.8.0}/README.md RENAMED

@@ -28,6 +28,22 @@
 `stata-code` lets you drive Stata from modern environments: an LLM agent (Claude Code, Cursor, Claude Desktop), a Jupyter notebook, or a VS Code editor session. All frontends share one Python core and return a stable, structured, **agent-friendly** result schema.
+**For empirical economists.** Drive Stata in plain language: run **DiD, IV, RDD, and publication-ready `esttab` tables in one conversation** — then cross-check each estimate across Stata and Python so you only trust results that *agree* (the Cunningham cross-package robustness check).
+**Try it in 60 seconds** with [Claude Code](https://github.com/anthropics/claude-code) — no global install needed:
+```bash
+claude mcp add stata-code --scope user -- uvx --from "stata-code[mcp]" stata-code-mcp
+```
+Then just ask:
+> *"Using `data/cfps_panel.dta`, run a two-way fixed-effects regression of monthly wage on the treatment (controls: `age age2 edu industry`), then test heterogeneous effects with Callaway-Sant'Anna, and export an `esttab` table."*
+`stata-code` writes the do-file, runs it, returns the table, and interprets the result — and can re-estimate the same ATT with [StatsPAI](https://github.com/brycewang-stanford/StatsPAI) to confirm the two stacks agree. These workflows ship as one-call MCP prompts (`did_event_study`, `iv_2sls`, `rdd`, `publication_table`, `cross_validate_did`) backed by an on-demand [recipe library](skills/stata-code/references/recipes/).
+**Why `stata-code`:** MIT-licensed · ships as an MCP server, a bundled agent skill, a Jupyter kernel, **and** a VS Code extension · one structured, token-economy result schema (typed errors, native `r()` / `e()`) · cross-stack validation with StatsPAI for the Cunningham check.
 ```text
                     ┌────────────────────────────────────────┐
                     │     stata-code core (Python)           │
@@ -45,12 +61,18 @@
               └─────────────┘  └────────────┘  └─────────────────┘
 ```
-**Status: v0.7 (May 2026)** — the core, MCP server, Jupyter kernel, and VS Code extension work end-to-end against Stata 18 MP. The test suite covers schema, runner, MCP, kernel, notebook, run-index, subprocess-pool, and VS Code modules; CI also checks linting, type safety, schema generation, package metadata, and VSIX packaging. License: **MIT**.
+**Status: v0.8 (June 2026)** — the core, MCP server, Jupyter kernel, and VS Code extension work end-to-end against Stata 18 MP. The test suite covers schema, runner, MCP, kernel, notebook, run-index, subprocess-pool, and VS Code modules; CI also checks linting, type safety, schema generation, package metadata, and VSIX packaging. License: **MIT**.
-Two workflows the current release explicitly supports for end users:
+Three workflows the current tree explicitly supports for end users and agents:
 - **Run Stata code from a Jupyter notebook.** `pip install "stata-code[kernel]"` + `stata-code-kernel install --user` registers a **Stata** kernel that the Jupyter Notebook UI, JupyterLab, and the VS Code Jupyter extension all pick up by name. Cells render Stata logs, graphs, and warnings inline (the kernel logo bundled since v0.5 makes it appear in VS Code's kernel picker too). See [As a Jupyter Kernel](#as-a-jupyter-kernel).
 - **Optional agent "fix and rerun" loop.** `stata_run` returns typed `error.kind/line/context` plus `suggestions` on every failure. By default Claude Code only reports diagnostics — but if you explicitly say "fix this and rerun until it passes", the agent uses the same fields to edit your `.do` file and re-call `stata_run` until the run is green. The repair loop is **opt-in**: failed runs are diagnostics first, not automatic rewrite permission. See [Error Recovery in Agent Workflows](#error-recovery-in-agent-workflows).
+- **Economist workflow guides.** The bundled skill and cookbook now cover
+  modern DiD, IV/weak-IV, RDD, table export, data-MCP handoff, and
+  cross-stack parity audits. `stata-code` runs and audits the Stata leg; R,
+  Python, and official data MCPs remain separate tools with explicit handoff
+  files and source metadata. See [`skills/stata-code/references/`](skills/stata-code/references/)
+  and [`examples/`](examples/).
 ---
@@ -99,6 +121,19 @@ pip install -e ".[mcp,kernel]"
 Note: `pystata` is **not** on PyPI; it ships with Stata. `stata-code` auto-discovers it on macOS at `/Applications/Stata/utilities/pystata` and at equivalent Linux / Windows paths. If your install is elsewhere, add it to `PYTHONPATH` before importing.
+Verify the local setup with the read-only doctor:
+```bash
+stata-code doctor
+stata-code doctor --json          # machine-readable output
+stata-code doctor --no-stata-probe # skip live Stata initialization
+```
+The doctor reports the package/Python version, MCP and Jupyter extras, `pystata`
+discovery, console scripts on `PATH`, client/VS Code configuration hints, and a
+best-effort Stata version/edition probe. It never edits shell, Stata, Claude, or
+VS Code config.
 ---
 ## Quick Start
@@ -276,8 +311,11 @@ resources:
 MCP prompts are available for common agent workflows:
 `run_do_file_and_report`, `debug_stata_error`,
-`fix_and_rerun_until_passes`, `replication_audit`, and
-`summarize_estimation_results`.
+`fix_and_rerun_until_passes`, `replication_audit`,
+`plan_cross_stack_parity_audit`, `data_mcp_to_stata_handoff`,
+`summarize_estimation_results`, `run_notebook_cell_and_report`,
+`fix_and_rerun_notebook_cell`, `did_event_study`, `iv_2sls`, `rdd`,
+`publication_table`, and `cross_validate_did`.
 ### As a Jupyter Kernel
@@ -319,6 +357,12 @@ Or open the **Extensions** sidebar in VS Code and search `stata-code`. The exten
 On first activation the extension probes for `stata-code-mcp` on `PATH` (and in any workspace `.venv` / `venv`). If nothing resolves, it shows a one-time install hint with the exact `pip install "stata-code[mcp]"` command — choose **Don't show again** to silence it for the installed extension version.
+If the extension or an MCP client cannot find the server, run
+`stata-code doctor --no-stata-probe` in the same Python environment. It reports
+whether `stata-code-mcp` is on `PATH` and suggests absolute-path or
+`python -m stata_code.mcp` fallbacks for GUI clients whose `PATH` differs from
+your shell.
 #### Cell and section conventions
 The extension recognizes two complementary structural markers inside `.do` files. Either can be mixed in the same file; they do not conflict.
@@ -408,7 +452,7 @@ stata_code/
 ## Roadmap
-### Done (through v0.7 — May 2026)
+### Done (current tree)
 - v1.0 result schema ([SCHEMA.md](SCHEMA.md))
 - `pystata`-based runner with native-typed `r()`, `e()`, and matrices
@@ -424,6 +468,12 @@ stata_code/
 - Subprocess-backed hard timeout and cancellation for the public Python API and MCP server: `timeout_ms`, `cancel(session_id)`, and MCP `cancel_session`
 - Per-cell repair loop on `.ipynb` via `notebook_outline` / `notebook_get_cell` / `notebook_edit_cell` with optimistic-concurrency `expected_source` guards and `origin_cell_id` echo on `RunResult`
 - Persistent run bundles + `list_runs` query over `manifest.json` files (filter by cell / origin / session / since / ok; page with limit / offset)
+- Read-only `stata-code doctor` / `verify` diagnostics for package version,
+  extras, `pystata` discovery, console scripts, client hints, and optional live
+  Stata version probing
+- Economist workflow layer: skill references and examples for modern DiD,
+  IV/weak-IV, RDD, table export, data-MCP handoff, and cross-stack parity
+  audits
 - JSON Schema artifact auto-generated from `schema.py`: [`schema/run_result.schema.json`](schema/run_result.schema.json)
 - VS Code extension published to the Marketplace as [`brycewang-stanford.stata-code-vscode`](https://marketplace.visualstudio.com/items?itemName=brycewang-stanford.stata-code-vscode): syntax highlighting, section outline/navigation, code-lens cell and section runners, sidebar (sessions / last result / run history / logs / graphs), status bar, completions, conservative variable rename, diagnostics, MCP child-process spawn
 - Clean-room license policy ([LICENSE-POLICY.md](LICENSE-POLICY.md))
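For GUI clients whose `PATH` differs from the shell, the fallback the README describes (prefer `stata-code-mcp` on `PATH`, otherwise `python -m stata_code.mcp`) can be approximated with the standard library. This is a rough sketch that assumes nothing about the real output format of `stata-code doctor`; `resolve_mcp_command` and `quick_check` are hypothetical names, and like the doctor it only reads the environment and never edits configuration.

```python
import importlib.util
import json
import shutil
import sys
from typing import Dict, List, Optional


def resolve_mcp_command(python: Optional[str] = None) -> List[str]:
    """Return an argv list that should start the MCP server."""
    exe = shutil.which("stata-code-mcp")
    if exe:
        # Absolute path: works even when a GUI client launches with a minimal PATH.
        return [exe]
    # Fallback: run the server module with a specific interpreter.
    return [python or sys.executable, "-m", "stata_code.mcp"]


def quick_check() -> Dict[str, object]:
    """Read-only snapshot of the environment; nothing is written anywhere."""
    return {
        "python": sys.version.split()[0],
        "stata-code-mcp on PATH": shutil.which("stata-code-mcp") is not None,
        "stata-code-kernel on PATH": shutil.which("stata-code-kernel") is not None,
        # find_spec returns None when a top-level module cannot be found.
        "pystata importable": importlib.util.find_spec("pystata") is not None,
        "ipykernel importable": importlib.util.find_spec("ipykernel") is not None,
        "mcp_command": resolve_mcp_command(),
    }


if __name__ == "__main__":
    print(json.dumps(quick_check(), indent=2))
```

Run it with the same interpreter the MCP client will use; the `mcp_command` entry is the argv to paste into a client config when the bare command name does not resolve.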

stata_code-0.8.0/docs/industry-leader-roadmap.md ADDED

@@ -0,0 +1,99 @@
+# stata-code Industry Leadership Roadmap
+This roadmap translates the June 2026 empirical-research MCP landscape into
+work that fits `stata-code`'s architecture. The project should win by being the
+most reliable agent-native Stata execution and audit layer for empirical
+economists, not by becoming a grab-bag data platform or a second R/Python
+runtime.
+## North Star
+`stata-code` should be the default way an AI agent runs, inspects, repairs, and
+audits Stata work:
+- one execution core across Python, MCP, Jupyter, and VS Code;
+- stable `RunResult` schema with typed errors and native `r()` / `e()` values;
+- token-efficient logs, graphs, matrices, and run bundles;
+- economist-facing workflows for DiD, IV, RDD, tables, data handoff, and
+  cross-package verification.
+## Product Pillars
+1. **Reliable execution contract.** Keep `SCHEMA.md` load-bearing. Agents
+   branch on `ok`, `error.kind`, `results.e`, refs, and run manifests instead
+   of parsing log prose.
+2. **Econometrics workflow intelligence.** Ship concise skill references and
+   prompts that know the Stata commands economists actually use: `csdid`,
+   `did_imputation`, `eventstudyinteract`, `rdrobust`, `ivreg2`,
+   `ivreghdfe`, `boottest`, `esttab`, `collect`, and related packages.
+3. **Cross-stack parity audits.** Treat R/Python/Stata disagreement as a
+   first-class research risk. `stata-code` should run the Stata leg and define the
+   comparison protocol without pretending to own the R or Python runtimes.
+4. **Data-MCP handoff.** External MCP servers can discover and fetch official
+   data. `stata-code` should document and validate the handoff into Stata:
+   source metadata, stable raw files, key checks, and reproducible imports.
+5. **Editor and artifact ergonomics.** VS Code should make sessions, graphs,
+   logs, tables, data previews, and run bundles easy to inspect without hiding
+   the underlying structured result.
+6. **Distribution confidence.** Install and runtime checks should be easy to
+   verify without mutating user config. Prefer `doctor`/`verify` diagnostics
+   before any automatic config writer.
+## Scope Boundaries
+`stata-code` should not directly bundle data-provider APIs, R sessions, Python
+causal libraries, or paid services. Those are separate tools. The durable
+boundary is: external data/model tools produce files or results; `stata-code`
+executes and audits the Stata side with traceable artifacts.
+## One-Month Execution Plan
+### Week 1: Workflow Layer
+- Add cross-agent coordination and this roadmap.
+- Expand the skill reference library for modern DiD, IV/weak-IV, RDD,
+  table-export, data-MCP handoff, and parity audits.
+- Add examples that show how agents should use the workflows without claiming
+  unsupported automation.
+- Add MCP prompts for parity audit planning, data-MCP-to-Stata handoff, and
+  turnkey method templates for DiD/event study, IV/2SLS, RDD, and publication
+  tables.
+- Validate with skill packaging tests, MCP prompt tests, and markdown hygiene.
+### Week 2: Diagnostics and Setup Confidence
+- Ship a read-only `stata-code doctor` / `verify` command that reports Python,
+  `stata-code`, MCP extras, `pystata` discovery, Stata version/edition, PATH
+  resolution, and common client config hints.
+- Keep config writing out of scope until backups and dry-run behavior exist.
+- Add tests for missing `pystata`, missing MCP extra, path mismatch, and JSON
+  output.
+### Week 3: VS Code and Artifacts
+- Improve dataset preview from first-100-row text output toward a paged/filterable
+  view or a clearly documented intermediate step.
+- Surface table/export artifacts from run bundles more explicitly.
+- Add tests around formatter and tree-provider behavior before broad UI work.
+### Week 4: Release Quality
+- Sweep README.md, README.zh.md, vscode/README.md, CHANGELOG.md, examples,
+  and skill docs for drift.
+- Run release-relevant checks: version guard, schema export, skill zip build,
+  MCP tests, core tests that do not require Stata, and VS Code compile/tests if
+  touched.
+- Prepare release notes that separate shipped features from roadmap items.
+## Success Criteria
+- Agents can find a documented path for the top empirical workflows without
+  loading the whole reference library.
+- Parity audits preserve sample definitions, package versions, estimator
+  defaults, failure/refusal behavior, and numeric tolerances.
+- Data pulled by external MCP servers enters Stata through a reproducible raw
+  file plus metadata handoff, not through unstated browser-copy steps.
+- User-facing docs explain that `stata-code` runs Stata and coordinates with
+  other MCP tools; they do not imply that it directly runs R/Python or hosts
+  official data APIs.
+- All changed surfaces have targeted validation evidence before handoff.

stata_code-0.8.0/examples/06-cross-stack-parity-audit.md ADDED

@@ -0,0 +1,101 @@
+# 06 — Cross-stack parity audit
+> **Goal:** show how an agent should use `stata-code` for the Stata leg of a
+> Stata/R/Python robustness audit without pretending that one tool owns every
+> runtime.
+This example is intentionally protocol-first. The exact R/Python calls depend
+on which external MCP servers or local runtimes the user has installed. The
+Stata leg is concrete and traceable through `stata_run`.
+## Step 1: freeze the common sample
+**Agent calls:**
+```json
+{
+  "tool": "stata_run",
+  "arguments": {
+    "code": "use data/panel.dta, clear\negen unit_id = group(firm_id), label\negen time_id = group(year), label\ngen byte audit_sample = !missing(y, first_treat, unit_id, time_id, x1, x2)\nkeep if audit_sample\nisid unit_id time_id\ncompress\ndatasignature set, reset\nsave data/derived/parity_sample.dta, replace\nexport delimited using data/derived/parity_sample.csv, replace",
+    "origin_path": "/abs/project/analysis/00_freeze_parity_sample.do",
+    "origin_kind": "file",
+    "persist_log_files": true
+  }
+}
+```
+**Agent reads:**
+- `ok`, `rc`, and any typed error.
+- `dataset.n_obs` and `dataset.n_vars`.
+- `log.files.directory` for the run bundle.
+- generated files copied into `outputs/` when persistence is enabled.
+The CSV is the handoff file for R/Python tools. The DTA is the Stata source for
+the Stata estimators. Do not let every package define its own missing-value
+sample.
+## Step 2: run the Stata estimator
+**Agent calls:**
+```json
+{
+  "tool": "stata_run",
+  "arguments": {
+    "code": "use data/derived/parity_sample.dta, clear\ncsdid y x1 x2, ivar(unit_id) time(year) gvar(first_treat) method(dripw)\nestat simple\nestat event\ncsdid_plot",
+    "session_id": "stata_csdid",
+    "origin_path": "/abs/project/analysis/01_stata_csdid.do",
+    "origin_kind": "file",
+    "persist_log_files": true
+  }
+}
+```
+**Agent reads:**
+- `results.e.scalars` for `N` and available fit/ATT scalars.
+- `results.e.matrices` for coefficient and VCE payloads.
+- `graphs[0].ref` for the event-study plot.
+- `warnings` and `log.error_window` for dropped cohorts or estimator refusal.
+If `csdid` is missing, the repair loop may call:
+```json
+{"tool": "install_package", "arguments": {"name": "csdid"}}
+```
+and, if needed:
+```json
+{"tool": "install_package", "arguments": {"name": "drdid"}}
+```
+## Step 3: run external legs with their own tools
+The agent should hand `data/derived/parity_sample.csv` plus the written parity
+contract to the R/Python tools that are actually available. `stata-code` should
+not claim those estimates. It should record their package versions, options,
+sample `N`, warnings/refusals, and output files in the comparison table.
+## Step 4: compare only like with like
+| Stack | Package | Target | N | Estimate | SE | Warning/refusal |
+| --- | --- | --- | ---: | ---: | ---: | --- |
+| Stata | `csdid` | overall ATT from `estat simple` | from `results.e` | from `e(b)`/scalar | from `e(V)` | from `warnings` |
+| R | external | same target | external | external | external | external |
+| Python | external | same target | external | external | external | external |
+Do not compare an overall ATT to an event-time coefficient. Do not hide package
+refusals. If sample `N` differs, stop and fix the sample before interpreting
+coefficient differences.
+## Step 5: report conservatively
+Use language like:
+- "The Stata `csdid` leg ran on the frozen sample and produced ..."
+- "The R/Python legs were run by external tools; stata-code only coordinated the
+  handoff and Stata audit trail."
+- "The estimates agree within the predeclared tolerance" or "they diverge, with
+  the likely source being sample/default/failure differences."
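Step 4's rules (same target, same `N`, surface refusals, then compare within a predeclared tolerance) can be expressed as a small check. This is a minimal sketch under stated assumptions: `ParityLeg` and `compare_legs` are hypothetical, the first leg is treated as the reference, and `se_from_vce` assumes the VCE arrives as a square nested list, which may not match how `results.e.matrices` actually serializes `e(V)`.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ParityLeg:
    stack: str                  # "Stata", "R", or "Python"
    package: str                # e.g. "csdid"
    target: str                 # e.g. "overall ATT"
    n: int                      # estimation sample size reported by that stack
    estimate: Optional[float]   # None when the package refused or failed
    se: Optional[float] = None
    warnings: List[str] = field(default_factory=list)


def se_from_vce(vce: Sequence[Sequence[float]], i: int) -> float:
    """Standard error of coefficient i: square root of the i-th VCE diagonal."""
    return math.sqrt(vce[i][i])


def compare_legs(legs: List[ParityLeg], rel_tol: float = 1e-3,
                 abs_tol: float = 1e-6) -> dict:
    # 1. Never compare different targets (e.g. overall ATT vs. event-time).
    targets = {leg.target for leg in legs}
    if len(targets) != 1:
        return {"status": "not_comparable", "details": sorted(targets)}
    # 2. A sample-size mismatch stops the audit before any coefficient talk.
    sizes = {leg.stack: leg.n for leg in legs}
    if len(set(sizes.values())) != 1:
        return {"status": "sample_mismatch", "details": sizes}
    # 3. Refusals are reported, not silently dropped.
    refused = [leg.stack for leg in legs if leg.estimate is None]
    if refused:
        return {"status": "refused", "details": refused}
    # 4. Only now compare point estimates against the reference leg.
    ref = legs[0]
    gaps = {leg.stack: abs(leg.estimate - ref.estimate) for leg in legs[1:]}
    agree = all(
        math.isclose(leg.estimate, ref.estimate, rel_tol=rel_tol, abs_tol=abs_tol)
        for leg in legs[1:]
    )
    return {"status": "agree" if agree else "diverge", "details": gaps}
```

The checks run in the order the example prescribes, so a sample mismatch or a hidden refusal is never masked by a coincidentally close coefficient. The tolerances here are placeholders; the audit should fix them before any leg runs.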

stata_code-0.8.0/examples/07-data-mcp-handoff.md ADDED

@@ -0,0 +1,77 @@
+# 07 — Data-MCP handoff into Stata
+> **Goal:** show how an agent should turn data fetched by an external data MCP
+> into a reproducible Stata analysis.
+Use this pattern for OpenEcon, World Bank Data360, FRED, OECD, IMF, Eurostat, or
+project-specific database MCPs. The data MCP discovers and fetches data;
+`stata-code` imports, validates, analyzes, and records the Stata run.
+## Step 1: persist the external data pull
+The external data MCP should save:
+```text
+data/raw/oecd_youth_unemployment_pisa_2010_2023.csv
+data/raw/oecd_youth_unemployment_pisa_2010_2023.source.json
+```
+The metadata file should include provider, endpoint or source URL, indicator
+IDs, countries, date range, fetch timestamp, units, and any non-Stata
+transformations.
+## Step 2: import and validate through Stata
+**Agent calls:**
+```json
+{
+  "tool": "stata_run",
+  "arguments": {
+    "code": "version 18\nclear all\nset more off\nlocal raw \"data/raw/oecd_youth_unemployment_pisa_2010_2023.csv\"\nlocal out \"data/derived/oecd_youth_unemployment_pisa_2010_2023.dta\"\ncapture confirm file \"`raw'\"\nif _rc {\n    display as error \"Missing raw data file: `raw'\"\n    exit 601\n}\nimport delimited using \"`raw'\", varnames(1) clear bindquote(strict) encoding(UTF-8)\ncompress\nisid country year\nassert inrange(year, 2010, 2023)\nassert !missing(country, year)\nnotes _dta: Source metadata stored next to raw CSV.\ndatasignature set, reset\nsave \"`out'\", replace",
+    "origin_path": "/abs/project/analysis/00_import_oecd.do",
+    "origin_kind": "file",
+    "persist_log_files": true
+  }
+}
+```
+**Agent reads:**
+- `ok` and `error.kind` if import/validation failed.
+- `dataset.n_obs`, `dataset.n_vars`, and variables.
+- run-bundle paths for logs and generated `.dta`.
+## Step 3: analyze the derived DTA
+**Agent calls:**
+```json
+{
+  "tool": "stata_run",
+  "arguments": {
+    "code": "use data/derived/oecd_youth_unemployment_pisa_2010_2023.dta, clear\nsummarize youth_unemployment pisa_math\npwcorr youth_unemployment pisa_math, sig obs\ntwoway scatter pisa_math youth_unemployment, xtitle(\"Youth unemployment rate\") ytitle(\"PISA math score\") title(\"OECD countries, 2010-2023\")",
+    "origin_path": "/abs/project/analysis/01_scatter_corr.do",
+    "origin_kind": "file",
+    "persist_log_files": true
+  }
+}
+```
+**Agent reads:**
+- `results.r.scalars` for correlation/post-command scalars when available.
+- `graphs[0].ref` for the scatter plot.
+- `log.ref` only if the structured fields do not contain the needed detail.
+## Step 4: report provenance
+The final answer should cite the metadata file and Stata run bundle, not the
+LLM's memory. A good handoff report includes:
+- source provider and indicator IDs;
+- raw and derived file paths;
+- import validation checks;
+- Stata commands run;
+- graph/table/log artifact paths;
+- any missingness, unit, or key warnings.
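The metadata sidecar from Step 1 can be written so that an incomplete record fails loudly before Stata ever imports the CSV. This sketch assumes a flat JSON layout with hypothetical key names mirroring the fields the example lists; the extra `raw_sha256` checksum is an addition not mentioned in the example, included so a later run can detect that the raw file changed.

```python
import hashlib
import json
from pathlib import Path
from typing import Mapping, Union

# Hypothetical key names for the fields Step 1 asks the data MCP to record.
REQUIRED_KEYS = (
    "provider", "source_url", "indicator_ids", "countries",
    "date_range", "fetched_at", "units", "transformations",
)


def write_source_metadata(csv_path: Union[str, Path],
                          meta: Mapping[str, object]) -> Path:
    """Validate metadata and write <stem>.source.json next to the raw CSV."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"raw file not found: {csv_path}")
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise ValueError(f"source metadata is missing: {missing}")
    record = dict(meta)
    record["raw_file"] = csv_path.name
    # Checksum ties the metadata to this exact byte content of the raw file.
    record["raw_sha256"] = hashlib.sha256(csv_path.read_bytes()).hexdigest()
    sidecar = csv_path.with_name(csv_path.stem + ".source.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True),
                       encoding="utf-8")
    return sidecar
```

For `oecd_youth_unemployment_pisa_2010_2023.csv` this yields the `oecd_youth_unemployment_pisa_2010_2023.source.json` name used in Step 1, so the Stata import do-file and the provenance report can both point at a fixed pair of paths.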

{stata_code-0.7.2 → stata_code-0.8.0}/examples/README.md RENAMED

@@ -11,5 +11,7 @@ The recurring theme is **token economy**: by default the server returns a 20-lin
 | 03 | [Graphs](./03-graphs.md) | `graph://` refs vs. inline base64; `include_graphs="inline"` opt-in |
 | 04 | [Multi-session](./04-multi-session.md) | Two parallel analyses via `session_id` (Stata frames under the hood) |
 | 05 | [Large matrices](./05-large-matrix.md) | `matrix://` refs when a result exceeds the 10,000-cell inline cap |
+| 06 | [Cross-stack parity audit](./06-cross-stack-parity-audit.md) | Freeze one sample, run the Stata leg, and compare against external R/Python legs without hiding package disagreement |
+| 07 | [Data-MCP handoff](./07-data-mcp-handoff.md) | Persist external data-MCP pulls, import/validate them in Stata, and keep provenance in run bundles |
 Every Stata command shown is real syntax. Tool names, argument names, and response field names match `stata_code/mcp/server.py` and `SCHEMA.md` v1.0. Where a JSON response is abbreviated for readability, an inline comment marks what was cut.

{stata_code-0.7.2 → stata_code-0.8.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "stata-code"
-version = "0.7.2"
+version = "0.8.0"
 description = "Agent-native Stata bridge — one core, multiple frontends (MCP, Jupyter, VSCode)"
 readme = "README.md"
 license = "MIT"
@@ -47,6 +47,7 @@ kernel = ["ipykernel>=6.0"]
 all = ["stata-code[mcp,kernel]"]
 [project.scripts]
+stata-code = "stata_code.cli:run_main"
 stata-code-kernel = "stata_code.kernel.kernel:run_main"
 stata-code-mcp = "stata_code.mcp.server:run_main"
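The new `[project.scripts]` entry means installing the package creates a `stata-code` executable that imports `stata_code.cli` and calls `run_main()`, using its return value as the exit status. The argparse layout below is only a guess at how such an entry point could expose the documented `doctor` / `verify` commands and the `--json` / `--no-stata-probe` flags; the real module may be organized differently, and the report dict is a placeholder.

```python
import argparse
import json
import sys
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="stata-code")
    sub = parser.add_subparsers(dest="command", required=True)
    # Both commands share the same read-only options in this sketch.
    for name in ("doctor", "verify"):
        p = sub.add_parser(name, help="read-only installation diagnostics")
        p.add_argument("--json", action="store_true",
                       help="machine-readable output")
        p.add_argument("--no-stata-probe", action="store_true",
                       help="skip live Stata initialization")
    return parser


def run_main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder report; a real doctor would gather version/PATH/pystata facts.
    report = {"command": args.command, "stata_probe": not args.no_stata_probe}
    if args.json:
        print(json.dumps(report))
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
    return 0


if __name__ == "__main__":
    sys.exit(run_main())
```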
