stata-code 0.3.0__py3-none-any.whl

@@ -0,0 +1,389 @@
1
+ Metadata-Version: 2.4
2
+ Name: stata-code
3
+ Version: 0.3.0
4
+ Summary: Agent-native Stata bridge — one core, multiple frontends (MCP, Jupyter, VSCode)
5
+ Project-URL: Homepage, https://github.com/brycewang-stanford/stata-code
6
+ Project-URL: Repository, https://github.com/brycewang-stanford/stata-code
7
+ Project-URL: Issues, https://github.com/brycewang-stanford/stata-code/issues
8
+ Project-URL: Changelog, https://github.com/brycewang-stanford/stata-code/blob/main/CHANGELOG.md
9
+ Author-email: Bryce Wang <brycewang@stanford.edu>
10
+ License-Expression: MIT
11
+ License-File: LICENSE
12
+ License-File: LICENSE-POLICY.md
13
+ Keywords: causal-inference,jupyter,mcp,pystata,stata,vscode
14
+ Classifier: Development Status :: 3 - Alpha
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Requires-Python: >=3.10
24
+ Requires-Dist: pydantic>=2.0
25
+ Provides-Extra: all
26
+ Requires-Dist: ipykernel>=6.0; extra == 'all'
27
+ Requires-Dist: mcp>=1.0; extra == 'all'
28
+ Provides-Extra: dev
29
+ Requires-Dist: mypy>=1.8; extra == 'dev'
30
+ Requires-Dist: pytest-cov>=4.0; extra == 'dev'
31
+ Requires-Dist: pytest>=8.0; extra == 'dev'
32
+ Requires-Dist: ruff>=0.4.0; extra == 'dev'
33
+ Provides-Extra: kernel
34
+ Requires-Dist: ipykernel>=6.0; extra == 'kernel'
35
+ Provides-Extra: mcp
36
+ Requires-Dist: mcp>=1.0; extra == 'mcp'
37
+ Description-Content-Type: text/markdown
38
+
39
+ # stata-code
40
+
41
+ > Agent-native Stata bridge - **one Python core, multiple frontends**.
43
+
44
+ `stata-code` lets you drive Stata from modern environments: an LLM agent (Claude Code, Cursor, Claude Desktop), a Jupyter notebook, or a planned VS Code editor session. All frontends share one Python core and return a stable, structured, **agent-friendly** result schema.
47
+
48
+ ```text
+      ┌────────────────────────────────────────┐
+      │        stata-code core (Python)        │
+      │                                        │
+      │  • pystata adapter (Stata 17+)         │
+      │  • v1.0 unified result schema          │
+      │  • token-economy defaults              │
+      │  • multi-session via Stata frames      │
+      │  • typed errors + suggestions          │
+      └────────────────────────────────────────┘
+          ↑              ↑             ↑
+ ┌────────┴────┐  ┌──────┴─────┐  ┌────┴────────────┐
+ │  Jupyter    │  │    MCP     │  │  VS Code glue   │
+ │  kernel     │  │   server   │  │  (planned)      │
+ └─────────────┘  └────────────┘  └─────────────────┘
+ ```
64
+
65
+ **Status: v0.2 (May 2026)** - the core, MCP server, and Jupyter kernel work end-to-end against Stata 18 MP. Current test suite: 144 passing tests (88 no-Stata unit tests + 56 real-Stata integration tests). License: **MIT**.
68
+
69
+ ---
70
+
71
+ ## Why this exists
+
+ The Stata AI / agent tooling landscape is fragmented; see [References-tools.md](References-tools.md):
76
+
77
+ - Existing MCP servers ([SepineTam/stata-mcp](https://github.com/sepinetam/stata-mcp), [tmonk/mcp-stata](https://github.com/tmonk/mcp-stata)) are **AGPL-3.0**, which is not a fit for closed-source or commercial integration.
+ - The popular VS Code AI extension ([hanlulong/stata-mcp](https://github.com/hanlulong/stata-mcp)) is MIT, but it bundles the MCP server inside the extension, making standalone reuse awkward.
+ - Each tool wraps `pystata` with its own result shape, so agents have to special-case each integration.
+ - Many existing tools were designed for humans first and then bolted onto MCP; they often dump 200-line logs and base64 graph blobs into every reply, burning tokens by default.
88
+
89
+ `stata-code` is designed to fill that gap:
92
+
93
+ 1. **MIT-licensed**, with no copyleft contagion.
+ 2. One shared result schema for every frontend: [SCHEMA.md](SCHEMA.md).
+ 3. Agent-native by default: typed errors, structured `r()` / `e()`, log refs, graph refs, and suggestion seeds.
+ 4. One core, multiple frontends: Jupyter kernel, MCP server, and planned VS Code glue.
104
+
105
+ For the project's clean-room policy around AGPL/GPL Stata projects, see [LICENSE-POLICY.md](LICENSE-POLICY.md).
108
+
109
+ ---
110
+
111
+ ## Install
+
+ Requirements: **Stata 17+** (which ships `pystata`) and **Python 3.10+**.
116
+
117
+ ```bash
118
+ # from PyPI
119
+ pip install stata-code
120
+
121
+ # with the MCP server and Jupyter kernel extras
122
+ pip install "stata-code[mcp,kernel]"
123
+
124
+ # or from source (editable install for development)
125
+ git clone https://github.com/brycewang-stanford/stata-code.git
126
+ cd stata-code
127
+ pip install -e ".[mcp,kernel]"
128
+ ```
129
+
130
+ > **Naming note.** The PyPI distribution is `stata-code` (hyphen), but
131
+ > the Python import is `stata_code` (underscore — Python identifiers
132
+ > can't contain hyphens). Same convention as `scikit-learn` →
133
+ > `import sklearn`. So: `pip install stata-code`,
134
+ > `from stata_code import run`.
135
+
136
+ Note: `pystata` is **not** on PyPI; it ships with Stata. `stata-code` auto-discovers it on macOS at `/Applications/Stata/utilities/pystata` and at the equivalent Linux / Windows paths. If your Stata install is elsewhere, add `pystata` to `PYTHONPATH` before importing.
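For a non-default Stata location, the `PYTHONPATH` step can also be done at runtime. The sketch below is only an illustration and not part of `stata-code`: `add_pystata_to_path` is a hypothetical helper, and it assumes the directory you pass is either the `pystata` package folder itself or the folder that contains it.

```python
import sys
from pathlib import Path


def add_pystata_to_path(location: str) -> str:
    """Make `import pystata` resolvable by extending sys.path.

    Hypothetical helper (not part of stata-code). `location` may be the
    `pystata` package directory itself or the directory that contains it.
    Returns the directory that ends up on sys.path.
    """
    path = Path(location).expanduser()
    # Python needs the *parent* of a package directory on sys.path.
    parent = path.parent if path.name == "pystata" else path
    entry = str(parent)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return entry


if __name__ == "__main__":
    # Example path taken from the macOS default mentioned in the note.
    print(add_pystata_to_path("/Applications/Stata/utilities/pystata"))
```

Call it before the first `from stata_code import run` so the import machinery can see the package.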
139
+
140
+ ---
141
+
142
+ ## Quick Start
+
+ See [`examples/`](examples/) for end-to-end cookbook entries: basic regression, DiD, graphs, multi-session, and large matrices.
+
+ ### As a Python Library
149
+
150
+ ```python
+ from stata_code import run
+
+ r = run("sysuse auto, clear")
+ r = run("regress mpg weight")
+
+ if r.ok:
+     print(r.results.e.scalars["r2"])       # 0.6515 (native float)
+     print(r.results.e.macros["cmd"])       # "regress"
+     b = r.results.e.matrices["b"]
+     print(dict(zip(b.cols, b.values[0])))  # {"weight": -0.006, "_cons": 39.44}
+ else:
+     print(r.error.kind, r.error.message)   # ErrorKind.VARNAME_NOT_FOUND, "..."
+     for s in r.error.suggestions:
+         print("hint:", s.action)           # "Did you mean `mpg`?"
+ ```
166
+
167
+ ### As an MCP Server
+
+ After install, `stata-code-mcp` is on your `PATH`. Add this to Claude Code (`~/.claude/mcp.json` or the Claude Code settings UI), Cursor, Claude Desktop, or another MCP-compatible client:
172
+
173
+ ```json
+ {
+   "mcpServers": {
+     "stata": {
+       "command": "stata-code-mcp"
+     }
+   }
+ }
+ ```
182
+
183
+ Or run it as a module:
186
+
187
+ ```bash
188
+ python -m stata_code.mcp
189
+ ```
190
+
191
+ The MCP server registers 8 tools:
+
+ | Tool | Purpose |
+ | --- | --- |
+ | `stata_run` | Execute Stata code and return a v1.0 RunResult JSON |
+ | `stata_info` | Report Stata edition, version, and capabilities |
+ | `get_log` | Fetch the full log behind a `log://` ref |
+ | `get_graph` | Fetch graph bytes (`ImageContent`) behind a `graph://` ref |
+ | `get_matrix` | Fetch the matrix payload `{rows, cols, values}` behind a `matrix://` ref |
+ | `list_sessions` | Enumerate live sessions |
+ | `cancel_session` | Cooperatively cancel the next `stata_run` for a session |
+ | `reset_session` | Drop a session's data |
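MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request by hand to show the message shape. The argument name `code` is an assumption made for illustration; the actual input schema is whatever the server advertises through `tools/list`.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "code" is an assumed argument name, used here only for illustration.
request = build_tool_call(1, "stata_run", {"code": "sysuse auto, clear"})
print(request)
```

In practice an MCP client library constructs these messages for you; the point is only that each tool in the table is addressed by its name.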
205
+
206
+ ### As a Jupyter Kernel
207
+
208
+ ```bash
209
+ stata-code-kernel install --user
210
+ ```
211
+
212
+ Or install it as a module:
215
+
216
+ ```bash
217
+ python -m stata_code.kernel install --user
218
+ ```
219
+
220
+ Then open a notebook and select the **Stata** kernel. Stata commands run in cells; logs, graphs, and warnings render inline.
223
+
224
+ ---
225
+
226
+ ## Token-Economy Defaults
+
+ A typical `stata_run` response is about **10x smaller** than servers that dump logs and images directly. Three design choices drive this:
+
+ 1. **Logs return `head` + `tail` + `ref`** by default, 20 lines each for `head` and `tail`. Full logs are fetched on demand via `get_log(ref)`. A Stata regression log can be about 6,000 tokens; `stata-code` returns about 600 by default.
+ 2. **Graphs return refs, not inline base64**. A 30 KB PNG can become about 50,000 base64 tokens; returning a ref avoids that unless the agent actually needs the bytes.
+ 3. **Errors are typed**. Agents can check `err.kind == "varname_not_found"` instead of regex-parsing English logs.
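The `head` + `tail` + `ref` idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `LogSummary`, `summarize_log`, the in-memory `_FULL_LOGS` dict, and the `log://<hex>` ref format are all hypothetical names chosen for the example.

```python
import uuid
from dataclasses import dataclass


@dataclass
class LogSummary:
    head: list[str]
    tail: list[str]
    ref: str
    total_lines: int


# In-memory stand-in for a ref store (hypothetical; the real store is not shown here).
_FULL_LOGS: dict[str, str] = {}


def summarize_log(log_text: str, keep: int = 20) -> LogSummary:
    """Keep the first and last `keep` lines and park the full log behind a ref."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    lines = log_text.splitlines()
    ref = f"log://{uuid.uuid4().hex}"  # assumed ref format
    _FULL_LOGS[ref] = log_text
    if len(lines) <= 2 * keep:
        # Short logs fit entirely in `head`; nothing is left for `tail`.
        return LogSummary(head=lines, tail=[], ref=ref, total_lines=len(lines))
    return LogSummary(
        head=lines[:keep], tail=lines[-keep:], ref=ref, total_lines=len(lines)
    )


def get_log(ref: str) -> str:
    """Fetch the full log on demand."""
    return _FULL_LOGS[ref]
```

The agent sees at most `2 * keep` lines per run and pays for the rest only when it asks for the ref.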
240
+
241
+ For example, a misspelled variable returns a structured error:
244
+
245
+ ```json
+ {
+   "ok": false,
+   "rc": 111,
+   "error": {
+     "kind": "varname_not_found",
+     "varname": "mpgg",
+     "line": 3,
+     "context": {
+       "before": ["use auto"],
+       "failing": "summarize mpgg",
+       "after": []
+     },
+     "suggestions": [
+       {"action": "Did you mean `mpg`?", "command": "describe"}
+     ]
+   }
+ }
+ ```
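An agent can branch on `error.kind` in a payload like the one above instead of parsing log text. The sketch below is one possible consumer; `plan_recovery` and its policy of taking the first suggestion are hypothetical and not part of `stata-code`.

```python
import json
from typing import Optional

# Trimmed copy of the structured error shown above.
PAYLOAD = """
{
  "ok": false,
  "rc": 111,
  "error": {
    "kind": "varname_not_found",
    "varname": "mpgg",
    "suggestions": [{"action": "Did you mean `mpg`?", "command": "describe"}]
  }
}
"""


def plan_recovery(result: dict) -> Optional[str]:
    """Return the next Stata command to try, or None if nothing is needed or known."""
    if result.get("ok"):
        return None
    error = result["error"]
    # Branch on the typed kind rather than on English log text.
    if error["kind"] == "varname_not_found":
        suggestions = error.get("suggestions", [])
        if suggestions:
            return suggestions[0]["command"]
    return None


print(plan_recovery(json.loads(PAYLOAD)))
```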
264
+
265
+ The full schema is in [SCHEMA.md](SCHEMA.md).
268
+
269
+ ---
270
+
271
+ ## Architecture
272
+
273
+ ```text
+ stata_code/
+ ├── core/
+ │   ├── _runtime.py    # process-singleton pystata wrapper
+ │   ├── _refs.py       # LRU ref store for log/graph/matrix payloads
+ │   ├── schema.py      # Pydantic v2 models for the v1.0 result schema
+ │   ├── errors.py      # rc → ErrorKind mapping + suggestion seeds
+ │   └── runner.py      # the one execute(); collects everything via sfi
+ ├── mcp/
+ │   └── server.py      # MCP server (8 tools)
+ └── kernel/
+     └── kernel.py      # Jupyter kernel
+ ```
286
+
287
+ `runner.py` is the only place that touches Stata. The Jupyter kernel and MCP server both import from it and only translate results into their own transports.
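The tree describes `_refs.py` as an LRU ref store for log, graph, and matrix payloads. The sketch below shows one common way to build such a store with `collections.OrderedDict`. It illustrates the technique only: `LRURefStore`, its method names, and the default capacity are assumptions, not the module's actual API.

```python
from collections import OrderedDict
from typing import Optional


class LRURefStore:
    """Bounded ref -> payload map that evicts the least recently used entry."""

    def __init__(self, max_items: int = 64) -> None:
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._max_items = max_items
        self._items: OrderedDict[str, bytes] = OrderedDict()

    def put(self, ref: str, payload: bytes) -> None:
        self._items[ref] = payload
        self._items.move_to_end(ref)  # newest entry is most recently used
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # drop the least recently used entry

    def get(self, ref: str) -> Optional[bytes]:
        payload = self._items.get(ref)
        if payload is not None:
            self._items.move_to_end(ref)  # a read refreshes recency
        return payload
```

A bounded store keeps memory flat over a long session; the trade-off is that an old ref can expire, so a caller has to handle a missing payload.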
290
+
291
+ ---
292
+
293
+ ## Comparison
+
+ | | stata-code | SepineTam/stata-mcp | hanlulong/stata-mcp | nbstata |
+ | --- | --- | --- | --- | --- |
+ | License | **MIT** | AGPL-3.0 | MIT | GPL-3.0 |
+ | Standalone MCP | ✓ | ✓ | bundled with VS Code | - |
+ | Jupyter kernel | ✓ | - | - | ✓ |
+ | Unified result schema | ✓ ([SCHEMA.md](SCHEMA.md)) | per-tool | per-tool | per-tool |
+ | Token-economy defaults | ✓ (log refs, graph refs) | - | - | - |
+ | Typed errors + suggestions | ✓ (32 kinds) | - | - | - |
+ | Multi-session | ✓ (Stata frames) | partial | - | - |
+ | Mature ecosystem | early | ✓ (statamcp.com, cookbook) | ✓ (11k installs) | ✓ |
305
+
306
+ `stata-code` is the younger, MIT-licensed, agent-native alternative in this problem space. Among the AGPL options, SepineTam's `stata-mcp` is currently more mature; `stata-code` is aimed at cases where copyleft contagion is unacceptable and agents need structured results.
309
+
310
+ ---
311
+
312
+ ## Roadmap
+
+ ### Done (v0.2 - May 2026)
+
+ - v1.0 result schema ([SCHEMA.md](SCHEMA.md))
+ - `pystata`-backed runner with native-typed `r()`, `e()`, and matrices
+ - Multi-session via Stata frames
+ - Per-line error attribution: line number, context, commands_executed
+ - Graph capture: `png` / `svg` / `pdf` with ref store
+ - Log truncation with ref store
+ - Warning extraction: 5 categories + generic notes
+ - 32-kind error taxonomy with canonical suggestions
+ - MCP server: 8 tools
+ - Jupyter kernel: rewired to the v1.0 pipeline
+ - Matrix size cap + `get_matrix(ref)` for large matrices (>10k cells)
+ - Cooperative cancellation: `cancel(session_id)` / MCP `cancel_session`
+ - JSON Schema artifact auto-generated from `schema.py`: [`schema/run_result.schema.json`](schema/run_result.schema.json)
+ - VS Code extension scaffold ([`vscode/`](vscode/)): `Run Selection`, graph webview, MCP child-process spawn
+ - Clean-room license policy ([LICENSE-POLICY.md](LICENSE-POLICY.md))
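Cooperative cancellation, as listed above, means the runner checks a flag before starting work rather than interrupting a command mid-flight. The sketch below shows that pattern under stated assumptions: `CancelRegistry` and `run_if_not_cancelled` are hypothetical names, and clearing the flag after a single skipped run is a guess at the semantics, not confirmed behavior.

```python
import threading


class CancelRegistry:
    """Tracks sessions whose next run should be skipped (cooperative cancel)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: set[str] = set()

    def cancel(self, session_id: str) -> None:
        with self._lock:
            self._pending.add(session_id)

    def consume(self, session_id: str) -> bool:
        """Return True once if a cancel was requested, then clear the flag."""
        with self._lock:
            if session_id in self._pending:
                self._pending.discard(session_id)
                return True
            return False


def run_if_not_cancelled(registry: CancelRegistry, session_id: str, code: str) -> str:
    # The flag is checked before any work starts; a command that is already
    # executing is never interrupted by this mechanism.
    if registry.consume(session_id):
        return "cancelled"
    return f"would run: {code}"
```

A hard timeout or mid-command interrupt is a separate mechanism that this flag-based pattern does not provide.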
331
+
332
+ ### Next Up
333
+
334
+ - **v0.3** - Console fallback for Stata 11-16, re-implemented against the v1.0 schema
335
+ - **v0.3** - Hard timeout / mid-Stata interrupt; design and tradeoffs in [`docs/design/hard_timeout.md`](docs/design/hard_timeout.md)
336
+ - **v0.4** - VS Code Marketplace publishing; the scaffold and graph webview already work in dev host
337
+ - **v1.0** - Stable schema, PyPI / VS Code Marketplace publishing
338
+
339
+ See [SCHEMA.md §7](SCHEMA.md) for explicitly out-of-scope items.
342
+
343
+ ---
344
+
345
+ ## Testing
346
+
347
+ ```bash
348
+ pip install -e ".[dev,mcp,kernel]"
349
+ pytest # full suite (144 tests)
350
+ pytest -m "not stata_required" # CI subset; no Stata needed
351
+ pytest -m "stata_required" -v # Stata-only integration tests
352
+ ```
353
+
354
+ The `stata_required` marker tags the real-Stata integration tests. CI uses `pytest -m "not stata_required"` so it does not collect them. Locally without Stata, those tests skip cleanly with the `"pystata / Stata 17+ not available"` message.
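One way to get a clean skip when Stata is absent is to probe for `pystata` without importing it. The sketch below uses only the standard library; `pystata_available` is a hypothetical helper, and how the project's own test configuration implements the skip is not shown in this README.

```python
import importlib.util

SKIP_REASON = "pystata / Stata 17+ not available"


def pystata_available() -> bool:
    """True if the `pystata` package can be located on the current sys.path."""
    return importlib.util.find_spec("pystata") is not None


# With pytest installed, a marker could be built from this check, e.g.:
#   requires_stata = pytest.mark.skipif(not pystata_available(), reason=SKIP_REASON)
print("run integration tests:", pystata_available())
```

`find_spec` only locates the package, so the check stays cheap and does not start Stata.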
357
+
358
+ ---
359
+
360
+ ## Contributing
+
+ - Read [LICENSE-POLICY.md](LICENSE-POLICY.md) before opening a PR.
+ - Add a one-line acknowledgement to your first PR description; the template is in the policy file.
+ - Tests are required for any new schema field or runner behavior.
370
+
371
+ ---
372
+
373
+ ## License
+
+ The code is licensed under [MIT](./LICENSE). [LICENSE-POLICY.md](LICENSE-POLICY.md) explains how this project relates to other Stata projects.
378
+
379
+ ## Trademark Notice
+
+ Stata is a registered trademark of StataCorp LLC. This project is independent and not affiliated with or endorsed by StataCorp.
384
+
385
+ ## Acknowledgements
+
+ The Stata tooling landscape that this project builds on and learns from is surveyed in [References-tools.md](References-tools.md). All listed projects retain their own licenses and authorship; please consult each repository before reuse.
@@ -0,0 +1,20 @@
1
+ stata_code/__init__.py,sha256=ewOqfKF7JfN7dobC4XDoijd1kcMNhNX8-L7jMSlXnKk,2072
2
+ stata_code/core/__init__.py,sha256=ycnmNHt7rkNgym7PIO-dP2r9JnfTUN3Sd5D4DEs_qmE,1420
3
+ stata_code/core/_pool.py,sha256=8FlKxXF7sbfQrH2ST8vi-7ij7qeqMGtpUGosrNdk8U8,30632
4
+ stata_code/core/_refs.py,sha256=8eEbYfA3QEVZTx6O7fjndLu5uO5xrP8mV3MJs9eBCYY,2691
5
+ stata_code/core/_runtime.py,sha256=c4paNhAqFNtiixccD_Xt5L7vqafA7z10PVpZSEBad_c,5734
6
+ stata_code/core/errors.py,sha256=dAR_ZeEX9WXVtO5aDC0TK3TRXuEGeG4AiBB2ecAdVB4,15970
7
+ stata_code/core/runner.py,sha256=FfhRKXId-fEbGZFTxOLfPSlkjzazg2sR6b3mmO1cu20,38428
8
+ stata_code/core/schema.py,sha256=SDqE33S74rnlN94qxwe_qKpcWw_DseSm2jIWSPNtfQI,10749
9
+ stata_code/kernel/__init__.py,sha256=YmDgfF2dpCjIxkrmbInYxDBSNwSO60LoU_pmLikXlY4,178
10
+ stata_code/kernel/__main__.py,sha256=LN7NI2cXzcUdSG8pjT0OECuRC_Bt8K70P2R_QN-h6UY,185
11
+ stata_code/kernel/kernel.py,sha256=lKsOgIJsujDCH68QtOWdo4AHoQUsLbID-euxlZVtZBo,13240
12
+ stata_code/mcp/__init__.py,sha256=OX8fbkpo0MMA7F7B1yCZyKxVkKnr6KSUR6DN-8TMtfQ,68
13
+ stata_code/mcp/__main__.py,sha256=F2-yxmtXuMuLH9Z1-QjDtLMypdpvAlTQEUKW3qWA_go,172
14
+ stata_code/mcp/server.py,sha256=MbMkVHljP_ZMVEmWwCplcgPI1rb81FVCHpBnDZ-yxbc,14434
15
+ stata_code-0.3.0.dist-info/METADATA,sha256=oWdPXSJC8zVc5qbJLs68GxnzFqHINm-oQENXDqHwNgo,19036
16
+ stata_code-0.3.0.dist-info/WHEEL,sha256=QccIxa26bgl1E6uMy58deGWi-0aeIkkangHcxk2kWfw,87
17
+ stata_code-0.3.0.dist-info/entry_points.txt,sha256=nb-7yrLNWetJQYFkQdpNVtFIEXC5mIX6r0bxbKPIiDE,120
18
+ stata_code-0.3.0.dist-info/licenses/LICENSE,sha256=6kG2eA5OSA5BHqSs_dUWiGqsOkWpnK6Zy5XxjM9ZlqQ,1065
19
+ stata_code-0.3.0.dist-info/licenses/LICENSE-POLICY.md,sha256=1XpLv9v1JJQ5Cf4EIO9bcj4cj6LMSXQiz-Sugjc61s4,6565
20
+ stata_code-0.3.0.dist-info/RECORD,,
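Each RECORD row above has the form `path,hash,size`, where the hash is a SHA-256 digest encoded as URL-safe base64 with the `=` padding removed. The sketch below recomputes that encoding and parses a row; `record_hash` and `parse_record_line` are illustrative helper names, not part of any packaging library.

```python
import base64
import csv
import hashlib
from typing import Optional


def record_hash(data: bytes) -> str:
    """Compute a RECORD-style digest: sha256, urlsafe base64, no '=' padding."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str) -> tuple[str, str, Optional[int]]:
    """Split one RECORD row into (path, hash, size); size is None when empty."""
    path, hash_field, size_field = next(csv.reader([line]))
    size = int(size_field) if size_field else None
    return path, hash_field, size


row = "stata_code/mcp/__init__.py,sha256=OX8fbkpo0MMA7F7B1yCZyKxVkKnr6KSUR6DN-8TMtfQ,68"
print(parse_record_line(row))
```

To verify an installed file, compare `record_hash(file_bytes)` with the row's hash field and `len(file_bytes)` with its size. The RECORD file's own row leaves both fields empty, which is why `size` can be `None`.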
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.29.0
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,3 @@
1
+ [console_scripts]
2
+ stata-code-kernel = stata_code.kernel.kernel:run_main
3
+ stata-code-mcp = stata_code.mcp.server:run_main
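Each `console_scripts` row maps a command name to a `module:function` target that the installer wraps in an executable. The sketch below parses the entries above with the standard library to make that mapping explicit; `parse_console_scripts` is a hypothetical helper written for this example.

```python
import configparser

ENTRY_POINTS = """\
[console_scripts]
stata-code-kernel = stata_code.kernel.kernel:run_main
stata-code-mcp = stata_code.mcp.server:run_main
"""


def parse_console_scripts(text: str) -> dict[str, tuple[str, str]]:
    """Map each script name to its (module, attribute) target."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(text)
    scripts: dict[str, tuple[str, str]] = {}
    for name, target in parser["console_scripts"].items():
        module, _, attr = target.partition(":")
        scripts[name] = (module.strip(), attr.strip())
    return scripts


for name, (module, attr) in parse_console_scripts(ENTRY_POINTS).items():
    print(f"{name} -> from {module} import {attr}")
```

This also explains why `python -m stata_code.mcp` and `stata-code-mcp` can start the same server: both routes end up in the same package.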
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 brycew6m
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,125 @@
1
+ # License Policy
2
+
3
+ `stata_code` is released under the **MIT License**. To keep the codebase legally clean and freely usable downstream (including by commercial and closed-source projects), this repository follows a strict **protocol-first, clean-room** development policy. This document is the binding policy; contributors must read it before opening a pull request.
4
+
5
+ ---
6
+
7
+ ## 1. Project license
8
+
9
+ - **License:** MIT (see `LICENSE`).
10
+ - **Goal:** Anyone — including commercial and closed-source projects — can integrate, fork, or redistribute `stata_code` without copyleft obligations.
11
+
12
+ This goal is incompatible with deriving from AGPL-3.0 / GPL-3.0 source code. The rules below exist to prevent that.
13
+
14
+ ---
15
+
16
+ ## 2. The three categories of references
17
+
18
+ Every external project relevant to `stata_code` falls into one of three buckets:
19
+
20
+ ### 2.1 Open standards & vendor docs (always allowed)
21
+
22
+ These define **public protocols and APIs**. Reading them, citing them, and implementing against them does not contaminate our code.
23
+
24
+ - **Anthropic MCP specification** — protocol shape, message formats, tool registration semantics.
25
+ - **Jupyter kernel protocol** — `kernel_info`, `execute_request`, message routing.
26
+ - **Language Server Protocol (LSP)** — for any future LSP work.
27
+ - **StataCorp pystata documentation** — official Python API surface.
28
+ - **StataCorp Stata documentation** (`help`, manuals) — `r()`, `e()`, `_rc`, system values.
29
+ - **Stata `.dta` file format documentation** — published by StataCorp.
30
+ - **Anthropic / OpenAI tool-use docs** — function-calling shapes.
31
+
32
+ ### 2.2 Permissively-licensed projects (allowed with attribution if reused)
33
+
34
+ MIT, BSD, Apache 2.0, ISC. Reading source is allowed; copying must follow the license terms (preserve copyright notice, etc.). Even when allowed, we **prefer independent implementation** to keep authorship clean.
35
+
36
+ - `kylebarron/stata-enhanced` — MIT (TextMate grammar; we do not reuse it).
37
+ - `kylebarron/stata-exec` — MIT (Atom; not reused).
38
+ - `kylebarron/language-stata` — MIT (Atom grammar; not reused).
39
+ - `hanlulong/stata-mcp` — MIT (we do not consult its source; see §4).
40
+ - `lbraglia/RStata` — design reference only.
41
+ - `euglevi/stata-language-server` — MIT.
42
+
43
+ ### 2.3 Copyleft projects (source code forbidden)
44
+
45
+ **Source code of these projects must not be read by anyone contributing to `stata_code`.** Their READMEs, public issues, demos, screenshots, and documentation describing user-facing behavior are fine — copyright protects expression, not ideas. But the source itself contaminates.
46
+
47
+ - `SepineTam/stata-mcp` — AGPL-3.0
48
+ - `tmonk/mcp-stata` — AGPL-3.0
49
+ - `tmonk/stata-workbench` — AGPL-3.0
50
+ - `kylebarron/stata_kernel` — GPL-3.0
51
+ - `hugetim/nbstata` — GPL-3.0
52
+
53
+ If new copyleft Stata projects appear, add them here in the same PR that first references them.
54
+
55
+ ---
56
+
57
+ ## 3. The clean-room rule
58
+
59
+ When designing or implementing any feature that overlaps with a copyleft project's behavior:
60
+
61
+ 1. **Do not open the copyleft project's source files.** Not in a browser, not in `git clone`, not in an IDE.
62
+ 2. **You may** read its README, feature list, screenshots, public issues, blog posts, and conference talks describing what it does.
63
+ 3. **You may** read the underlying public protocol or API spec (MCP, pystata, etc.) and implement against that.
64
+ 4. **You may** look at the inputs and outputs (call its tools, observe responses) — black-box behavioral observation is fine.
65
+ 5. **Design from first principles.** Our schema (`SCHEMA.md`) was designed from agent-token-economy principles and the public pystata API. It was not derived by simplifying or rearranging anyone else's schema.
66
+
67
+ If you find yourself thinking *"how does project X handle Y?"*, the answer is: read its docs and observe its behavior. Do not open its source.
68
+
69
+ ---
70
+
71
+ ## 4. If you accidentally read forbidden source
72
+
73
+ It happens. Honesty is the only safe response.
74
+
75
+ 1. **Stop reading immediately.** Close the file.
76
+ 2. **Disclose in the PR or issue.** Note what you read and approximately how much.
77
+ 3. **Wait at least 30 days** before contributing code in the affected area. If the area is small (one function), a fresh contributor implements it. If broad, that contributor sits out the area indefinitely.
78
+ 4. **Do not** quote, paraphrase, or rewrite from memory.
79
+
80
+ This is the same posture used by clean-room reverse-engineering teams. It is conservative on purpose.
81
+
82
+ ---
83
+
84
+ ## 5. Adding a new reference
85
+
86
+ When introducing any new external project to documentation, code, or discussion:
87
+
88
+ 1. Add it to one of the three lists in §2 of this file in the same PR.
89
+ 2. State its license explicitly (check `LICENSE` file, not `package.json`/`README` — those drift).
90
+ 3. If copyleft, the PR must not include any code; only the bucket-3 listing.
91
+
92
+ Reviewers should reject PRs that mention an external project without classifying it.
93
+
94
+ ---
95
+
96
+ ## 6. Dependencies vs. derivation
97
+
98
+ Note the difference:
99
+
100
+ - **Depending** on an MIT/BSD/Apache library at runtime is fine and does not contaminate.
101
+ - **Depending** on a GPL/AGPL library at runtime *does* contaminate the distributed package; we don't do that for any package we ship under MIT.
102
+ - **Depending** on a GPL/AGPL library only in a separate, GPL-licensed sub-package (e.g., `stata-code-jupyter-glue`) is acceptable as long as the MIT core does not import it. Any such split must be called out at the top of the README and in `pyproject.toml`.
103
+
104
+ ---
105
+
106
+ ## 7. Why this matters
107
+
108
+ Stata is a small ecosystem with active and vigilant maintainers, several of whom have publicly enforced their AGPL terms. A clean license posture:
109
+
110
+ - Keeps `stata_code` usable by any downstream — universities, central banks, commercial vendors.
111
+ - Prevents "rip-off" accusations that have already been leveled at fork-style projects in the space.
112
+ - Makes future fundraising, hiring, and acquisitions trivial on the IP side.
113
+ - Protects contributors personally — clean-room compliance is auditable.
114
+
115
+ The cost of this policy is small (some independent design work). The cost of getting it wrong is irreversible: a contaminated codebase cannot be "scrubbed" of AGPL after the fact; only rewritten from scratch by uncontaminated authors.
116
+
117
+ ---
118
+
119
+ ## 8. Acknowledgement on first contribution
120
+
121
+ Every first-time contributor to `stata_code` adds the following line to their first PR description:
122
+
123
+ > I have read `LICENSE-POLICY.md` and confirm I have not consulted source code from the copyleft projects listed therein for the purposes of this contribution.
124
+
125
+ Maintainers may decline contributions without this acknowledgement.