PyPI - stata-code - Versions diffs - 0.7.1__tar.gz → 0.7.2__tar.gz - Mend

stata-code 0.7.1__tar.gz → 0.7.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (61)

{stata_code-0.7.1 → stata_code-0.7.2}/CHANGELOG.md RENAMED

@@ -6,6 +6,33 @@ to semver-major.minor for the result schema (see `SCHEMA.md` §6).
 ## Unreleased
+## 0.7.2 — 2026-06-20
+### Added
+- **Three convenience MCP tools** raise the tool surface from 15 to 18:
+  - `install_package(name, source?, url?, replace?, session_id?)` — installs a
+    community package via `ssc install` / `net install` without the agent
+    having to remember the syntax, then verifies it resolves with `which`.
+    Package names and URLs are validated to keep them out of the generated
+    command line; failures surface the typed `error` block (e.g. `network`).
+  - `search_log(ref, pattern, is_regex?, ignore_case?, context?, max_matches?)`
+    — greps within a truncated `log://` payload and returns only the matching
+    lines (with optional context), so a long log can be inspected without
+    pulling the whole transcript back through `get_log`.
+  - `inspect_data(varlist?, detail?, session_id?)` — runs `describe` +
+    `codebook` and returns the structured `dataset` block plus the codebook
+    log: a one-call "what's in this dataset" the agent doesn't have to spell out.
+- **On-demand Stata reference library** under `skills/stata-code/references/`
+  (~4,200 lines): topic files for core syntax, data management, econometrics,
+  causal inference, panel/time series, graphics, and table export; load-bearing
+  `error-codes.md` (the full `rc → kind → fix` table + self-repair loop, aligned
+  with the typed-error taxonomy) and `defensive-coding.md`; and per-package notes
+  for `reghdfe`, `coefplot`, `estout`, and `gtools`. `SKILL.md` gained a routing
+  table (read 1–3 files on demand) and a live-vs-offline execution-mode section.
+- **`scripts/build_skill_zip.py`** packages the skill into a deterministic
+  `build/stata-code-skill.zip` for upload as Claude.ai project knowledge.
 ## 0.7.1 — 2026-06-19
 ### Fixed
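The `install_package` entry in the 0.7.2 changelog says package names and URLs are validated to keep them out of the generated command line, but the diff does not show the actual rules. The sketch below is a hypothetical illustration of that idea, not the package's code. It assumes Stata-style names (a letter or underscore, then letters, digits, or underscores, at most 32 characters) and accepts only `http(s)` URLs free of whitespace, quotes, and Stata option punctuation. `build_install_command` and `build_verify_command` are invented names.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Hypothetical rule: a letter or underscore, then up to 31 letters/digits/underscores.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")
# Characters that could break out of the generated Stata command line.
_UNSAFE_URL_CHARS = set(" \t\r\n\"'`;,()")


def build_install_command(name: str, source: str = "ssc",
                          url: str | None = None, replace: bool = False) -> str:
    """Return a single-line Stata install command, or raise ValueError."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid package name: {name!r}")
    opts = ["replace"] if replace else []
    if source == "ssc":
        if url is not None:
            raise ValueError("url is only valid with source='net'")
        cmd = f"ssc install {name}"
    elif source == "net":
        if not url:
            raise ValueError("source='net' requires a url")
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"url must be http(s): {url!r}")
        if any(ch in _UNSAFE_URL_CHARS for ch in url):
            raise ValueError(f"url contains unsafe characters: {url!r}")
        cmd = f"net install {name}"
        opts.insert(0, f'from("{url}")')
    else:
        raise ValueError(f"unknown source: {source!r}")
    if opts:
        cmd += ", " + " ".join(opts)
    return cmd


def build_verify_command(name: str) -> str:
    # `which` fails if no built-in or ado-file with this name is on the adopath.
    return f"which {name}"


print(build_install_command("reghdfe", replace=True))
```

Rejecting unusual input outright, rather than trying to escape it, keeps every generated line in one of two fixed shapes (`ssc install ...` or `net install ..., from("...")`), and bad input surfaces as a `ValueError` before Stata runs. A package name is not always the name of a command it installs, so a real `which` check may need to target a command the package provides.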

{stata_code-0.7.1 → stata_code-0.7.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: stata-code
-Version: 0.7.1
+Version: 0.7.2
 Summary: Agent-native Stata bridge — one core, multiple frontends (MCP, Jupyter, VSCode)
 Project-URL: Homepage, https://github.com/brycewang-stanford/stata-code
 Project-URL: Repository, https://github.com/brycewang-stanford/stata-code
@@ -188,7 +188,7 @@ claude mcp add stata-code --scope local -- stata-code-mcp
 claude mcp add stata-code --scope project -- stata-code-mcp
 ```
-Then launch `claude` and type `/mcp` to confirm `stata-code` shows up with its 15 tools (`stata_run`, `stata_info`, `get_log`, `get_graph`, `get_matrix`, `list_sessions`, `cancel_session`, `reset_session`, `notebook_outline`, `notebook_get_cell`, `notebook_locate`, `notebook_edit_cell`, `notebook_insert_cell`, `notebook_delete_cell`, `list_runs`).
+Then launch `claude` and type `/mcp` to confirm `stata-code` shows up with its 18 tools (`stata_run`, `stata_info`, `get_log`, `search_log`, `get_graph`, `get_matrix`, `inspect_data`, `install_package`, `list_sessions`, `cancel_session`, `reset_session`, `notebook_outline`, `notebook_get_cell`, `notebook_locate`, `notebook_edit_cell`, `notebook_insert_cell`, `notebook_delete_cell`, `list_runs`).
 #### Error Recovery in Agent Workflows
@@ -276,15 +276,18 @@ If an OpenAI-backed client reports `API Error: 400 Invalid schema for function
 upgrade to `stata-code>=0.6.5`, then restart the MCP client. Older server
 processes keep advertising the stale schema until they are restarted.
-The MCP server registers 15 tools:
+The MCP server registers 18 tools:
 | Tool | Purpose |
 | --- | --- |
 | `stata_run` | Execute Stata code and return a v1.0 RunResult JSON |
 | `stata_info` | Report Stata edition, version, and capabilities |
 | `get_log` | Fetch the full log behind a `log://` ref |
+| `search_log` | Search matching lines inside a stored `log://` payload |
 | `get_graph` | Fetch graph bytes behind a `graph://` ref (`ImageContent`) |
 | `get_matrix` | Fetch matrix payloads behind a `matrix://` ref |
+| `inspect_data` | Run `describe` + `codebook` and return compact dataset metadata |
+| `install_package` | Install an SSC or explicit `net install` package and verify it resolves |
 | `list_sessions` | Enumerate live sessions |
 | `cancel_session` | Cancel a session; the subprocess-backed path terminates in-flight runs and short-circuits pending ones |
 | `reset_session` | Drop a session's data |
@@ -416,7 +419,7 @@ stata_code/
 │   ├── runner.py      # in-process execute(); collects everything via sfi
 │   └── _pool.py       # subprocess workers for public API / MCP hard timeouts
 ├── mcp/
-│   └── server.py      # MCP server (15 tools)
+│   └── server.py      # MCP server (18 tools)
 └── kernel/
     └── kernel.py      # Jupyter kernel
 ```
@@ -454,7 +457,7 @@ stata_code/
 - Log truncation with ref store
 - Warning extraction: 5 categories + generic notes
 - 32-kind error taxonomy with canonical suggestions
-- MCP server: 15 tools, including notebook navigation / search / atomic edits and the run-bundle index (`list_runs`)
+- MCP server: 18 tools, including notebook navigation / search / atomic edits, the run-bundle index (`list_runs`), log grep (`search_log`), dataset inspection (`inspect_data`), and package installation (`install_package`)
 - Jupyter kernel: rewired to the v1.0 pipeline, kernel logos bundled
 - Matrix size cap + `get_matrix(ref)` for large matrices (>10k cells)
 - Subprocess-backed hard timeout and cancellation for the public Python API and MCP server: `timeout_ms`, `cancel(session_id)`, and MCP `cancel_session`
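MCP clients such as Claude Code issue tool calls for you, but it can help to see what a request for one of the new tools looks like on the wire. The following is a minimal sketch, assuming the standard MCP JSON-RPC 2.0 `tools/call` method. The argument names follow the `search_log` signature in the changelog. The `log://abc123` ref is a placeholder for one returned by a truncated `stata_run`, and `tools_call` is a hypothetical helper.

```python
import json
from typing import Any


def tools_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Grep a truncated log for Stata return-code lines such as "r(111);".
# "log://abc123" is a placeholder; use the ref from your stata_run result.
request = tools_call(1, "search_log", {
    "ref": "log://abc123",
    "pattern": r"^r\(\d+\);",
    "is_regex": True,
    "context": 2,
    "max_matches": 20,
})
print(request)
```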

{stata_code-0.7.1 → stata_code-0.7.2}/README.md RENAMED

@@ -149,7 +149,7 @@ claude mcp add stata-code --scope local -- stata-code-mcp
 claude mcp add stata-code --scope project -- stata-code-mcp
 ```
-Then launch `claude` and type `/mcp` to confirm `stata-code` shows up with its 15 tools (`stata_run`, `stata_info`, `get_log`, `get_graph`, `get_matrix`, `list_sessions`, `cancel_session`, `reset_session`, `notebook_outline`, `notebook_get_cell`, `notebook_locate`, `notebook_edit_cell`, `notebook_insert_cell`, `notebook_delete_cell`, `list_runs`).
+Then launch `claude` and type `/mcp` to confirm `stata-code` shows up with its 18 tools (`stata_run`, `stata_info`, `get_log`, `search_log`, `get_graph`, `get_matrix`, `inspect_data`, `install_package`, `list_sessions`, `cancel_session`, `reset_session`, `notebook_outline`, `notebook_get_cell`, `notebook_locate`, `notebook_edit_cell`, `notebook_insert_cell`, `notebook_delete_cell`, `list_runs`).
 #### Error Recovery in Agent Workflows
@@ -237,15 +237,18 @@ If an OpenAI-backed client reports `API Error: 400 Invalid schema for function
 upgrade to `stata-code>=0.6.5`, then restart the MCP client. Older server
 processes keep advertising the stale schema until they are restarted.
-The MCP server registers 15 tools:
+The MCP server registers 18 tools:
 | Tool | Purpose |
 | --- | --- |
 | `stata_run` | Execute Stata code and return a v1.0 RunResult JSON |
 | `stata_info` | Report Stata edition, version, and capabilities |
 | `get_log` | Fetch the full log behind a `log://` ref |
+| `search_log` | Search matching lines inside a stored `log://` payload |
 | `get_graph` | Fetch graph bytes behind a `graph://` ref (`ImageContent`) |
 | `get_matrix` | Fetch matrix payloads behind a `matrix://` ref |
+| `inspect_data` | Run `describe` + `codebook` and return compact dataset metadata |
+| `install_package` | Install an SSC or explicit `net install` package and verify it resolves |
 | `list_sessions` | Enumerate live sessions |
 | `cancel_session` | Cancel a session; the subprocess-backed path terminates in-flight runs and short-circuits pending ones |
 | `reset_session` | Drop a session's data |
@@ -377,7 +380,7 @@ stata_code/
 │   ├── runner.py      # in-process execute(); collects everything via sfi
 │   └── _pool.py       # subprocess workers for public API / MCP hard timeouts
 ├── mcp/
-│   └── server.py      # MCP server (15 tools)
+│   └── server.py      # MCP server (18 tools)
 └── kernel/
     └── kernel.py      # Jupyter kernel
 ```
@@ -415,7 +418,7 @@ stata_code/
 - Log truncation with ref store
 - Warning extraction: 5 categories + generic notes
 - 32-kind error taxonomy with canonical suggestions
-- MCP server: 15 tools, including notebook navigation / search / atomic edits and the run-bundle index (`list_runs`)
+- MCP server: 18 tools, including notebook navigation / search / atomic edits, the run-bundle index (`list_runs`), log grep (`search_log`), dataset inspection (`inspect_data`), and package installation (`install_package`)
 - Jupyter kernel: rewired to the v1.0 pipeline, kernel logos bundled
 - Matrix size cap + `get_matrix(ref)` for large matrices (>10k cells)
 - Subprocess-backed hard timeout and cancellation for the public Python API and MCP server: `timeout_ms`, `cancel(session_id)`, and MCP `cancel_session`

{stata_code-0.7.1 → stata_code-0.7.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "stata-code"
-version = "0.7.1"
+version = "0.7.2"
 description = "Agent-native Stata bridge — one core, multiple frontends (MCP, Jupyter, VSCode)"
 readme = "README.md"
 license = "MIT"
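The version bump matters to agents because `search_log`, `inspect_data`, and `install_package` only exist from 0.7.2 on. The next block is a small sketch for checking the installed distribution with the standard library's `importlib.metadata`. `has_new_tools` and `parse_release` are hypothetical helpers. The parser deliberately reads only the leading numeric release segment, so a pre-release such as `0.7.2rc1` is treated as `0.7.2`; the `packaging` library gives full PEP 440 ordering.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# search_log, inspect_data and install_package first ship in 0.7.2.
MIN_VERSION = (0, 7, 2)


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '0.7.2' -> (0, 7, 2)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def has_new_tools(dist: str = "stata-code") -> bool:
    """True if `dist` is installed at MIN_VERSION or later."""
    try:
        installed = version(dist)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= MIN_VERSION


print(has_new_tools())
```

After upgrading, restart the MCP client: as the README notes, older server processes keep advertising the stale tool list until they are restarted.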

stata_code-0.7.2/scripts/build_skill_zip.py ADDED

@@ -0,0 +1,105 @@
+"""Package the ``stata-code`` skill into a single uploadable ``.zip``.
+The skill (``skills/stata-code/SKILL.md`` + the ``references/`` library) is
+consumed two ways:
+* In-repo / Claude Code — read straight from ``skills/stata-code/``.
+* Claude.ai project knowledge — uploaded as a ``.zip``. This script builds
+  that archive.
+The archive contains a single top-level ``stata-code/`` folder so it extracts
+cleanly::
+    stata-code/SKILL.md
+    stata-code/references/econometrics.md
+    stata-code/references/packages/reghdfe.md
+    ...
+Run::
+    python scripts/build_skill_zip.py                 # -> build/stata-code-skill.zip
+    python scripts/build_skill_zip.py -o /tmp/out.zip  # custom destination
+The build is deterministic (sorted entries, fixed timestamps) so re-running it
+on unchanged inputs produces a byte-identical archive.
+"""
+from __future__ import annotations
+import argparse
+import sys
+import zipfile
+from pathlib import Path
+REPO_ROOT = Path(__file__).resolve().parent.parent
+SKILL_DIR = REPO_ROOT / "skills" / "stata-code"
+DEFAULT_OUTPUT = REPO_ROOT / "build" / "stata-code-skill.zip"
+ARCHIVE_PREFIX = "stata-code"
+# Fixed timestamp for reproducible archives (zip epoch starts at 1980).
+_FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)
+def collect_files(skill_dir: Path = SKILL_DIR) -> list[Path]:
+    """Return every shippable skill file, sorted, relative-stable.
+    Excludes editor/OS cruft so the archive is clean.
+    """
+    if not skill_dir.is_dir():
+        raise FileNotFoundError(f"skill directory not found: {skill_dir}")
+    skip = {".DS_Store"}
+    files = [
+        p
+        for p in skill_dir.rglob("*")
+        if p.is_file() and p.name not in skip and "__pycache__" not in p.parts
+    ]
+    return sorted(files)
+def build_zip(
+    dest: Path = DEFAULT_OUTPUT,
+    skill_dir: Path = SKILL_DIR,
+) -> list[str]:
+    """Write the skill archive to ``dest``; return the arcnames included."""
+    files = collect_files(skill_dir)
+    if not files:
+        raise FileNotFoundError(f"no skill files under {skill_dir}")
+    dest.parent.mkdir(parents=True, exist_ok=True)
+    arcnames: list[str] = []
+    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
+        for path in files:
+            rel = path.relative_to(skill_dir).as_posix()
+            arcname = f"{ARCHIVE_PREFIX}/{rel}"
+            info = zipfile.ZipInfo(arcname, date_time=_FIXED_DATE_TIME)
+            info.compress_type = zipfile.ZIP_DEFLATED
+            info.external_attr = 0o644 << 16  # regular file, rw-r--r--
+            zf.writestr(info, path.read_bytes())
+            arcnames.append(arcname)
+    return arcnames
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "-o",
+        "--output",
+        type=Path,
+        default=DEFAULT_OUTPUT,
+        help=f"Destination .zip (default: {DEFAULT_OUTPUT.relative_to(REPO_ROOT)}).",
+    )
+    args = parser.parse_args()
+    try:
+        arcnames = build_zip(args.output)
+    except FileNotFoundError as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        return 1
+    size = args.output.stat().st_size
+    print(f"wrote: {args.output}  ({len(arcnames)} files, {size:,} bytes)")
+    return 0
+if __name__ == "__main__":
+    sys.exit(main())
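The script's determinism comes from three choices visible above: entries are written in sorted order, every `ZipInfo` carries the fixed 1980-01-01 timestamp, and permissions are pinned via `external_attr`. The stand-alone sketch below reproduces that recipe in memory with made-up file contents, so the byte-identical claim can be checked directly. `deterministic_zip` is a hypothetical name, not part of the script.

```python
import io
import zipfile

# Earliest timestamp the zip format can represent; same value the script uses.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Build a zip in memory with sorted entries, fixed timestamps and modes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname in sorted(files):
            info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # regular file, rw-r--r--
            zf.writestr(info, files[arcname])
    return buf.getvalue()


# Made-up contents; insertion order differs between the two builds.
files = {
    "stata-code/references/econometrics.md": b"# Econometrics\n",
    "stata-code/SKILL.md": b"# stata-code skill\n",
}
first = deterministic_zip(files)
second = deterministic_zip(dict(reversed(list(files.items()))))
print(first == second)  # True
```

By contrast, `ZipFile.write(path)` stamps each entry with the file's on-disk modification time, so a fresh checkout would produce a different archive even when the contents are identical.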

{stata_code-0.7.1 → stata_code-0.7.2}/stata_code/__init__.py RENAMED

@@ -174,7 +174,7 @@ def is_available() -> bool:
     return True
-__version__ = "0.7.1"
+__version__ = "0.7.2"
 __all__ = [
     # Primary entry points

{stata_code-0.7.1 → stata_code-0.7.2}/stata_code/core/runner.py RENAMED

@@ -218,6 +218,107 @@ def get_log(ref: str) -> dict[str, Any]:
     }
+def search_log(
+    ref: str,
+    pattern: str,
+    *,
+    is_regex: bool = False,
+    ignore_case: bool = True,
+    context: int = 0,
+    max_matches: int = 50,
+) -> dict[str, Any]:
+    """Auxiliary tool: grep within a stored ``log://`` payload.
+    Pairs with the token-economy default of returning long logs by
+    reference: instead of pulling the whole log back with
+    :func:`get_log`, the agent can find just the lines it cares about.
+    Parameters
+    ----------
+    ref : str
+        A ``log://<request_id>`` ref produced by a truncated ``stata_run``.
+    pattern : str
+        Substring (default) or regular expression (``is_regex=True``) to
+        match against each line.
+    is_regex : bool
+        Treat ``pattern`` as a Python regular expression. A malformed
+        regex raises :class:`ValueError` (surfaced as ``invalid_request``).
+    ignore_case : bool
+        Case-insensitive matching (default ``True``).
+    context : int
+        Lines of surrounding context to include on each side of a match
+        (capped at 10). ``before`` / ``after`` are omitted when 0.
+    max_matches : int
+        Stop after this many matches; ``truncated`` reports whether more
+        existed (capped at 1000).
+    Returns
+    -------
+    dict
+        ``{ref, pattern, is_regex, lines_total, match_count, truncated,
+        matches: [{line_no, text, before?, after?}]}``. ``line_no`` is
+        1-based. Raises :class:`RefNotFound` for an unknown ref.
+    """
+    payload = _refs.get(ref)
+    if (
+        not isinstance(payload, dict)
+        or not isinstance(payload.get("text"), str)
+        or "lines_total" not in payload
+    ):
+        raise RefNotFound(ref, kind="unknown_log_ref")
+    if not pattern:
+        raise ValueError("pattern must be a non-empty string")
+    context = max(0, min(int(context), 10))
+    max_matches = max(1, min(int(max_matches), 1000))
+    flags = re.IGNORECASE if ignore_case else 0
+    if is_regex:
+        try:
+            matcher = re.compile(pattern, flags)
+        except re.error as exc:
+            raise ValueError(f"invalid regex: {exc}") from exc
+        def _hit(line: str) -> bool:
+            return matcher.search(line) is not None
+    else:
+        needle = pattern.lower() if ignore_case else pattern
+        def _hit(line: str) -> bool:
+            hay = line.lower() if ignore_case else line
+            return needle in hay
+    text: str = payload["text"]
+    lines = text.split("\n")
+    matches: list[dict[str, Any]] = []
+    truncated = False
+    for idx, line in enumerate(lines):
+        if not _hit(line):
+            continue
+        if len(matches) >= max_matches:
+            truncated = True
+            break
+        entry: dict[str, Any] = {"line_no": idx + 1, "text": line}
+        if context:
+            before = lines[max(0, idx - context):idx]
+            after = lines[idx + 1:idx + 1 + context]
+            if before:
+                entry["before"] = before
+            if after:
+                entry["after"] = after
+        matches.append(entry)
+    return {
+        "ref": ref,
+        "pattern": pattern,
+        "is_regex": is_regex,
+        "lines_total": payload["lines_total"],
+        "match_count": len(matches),
+        "truncated": truncated,
+        "matches": matches,
+    }
 def cancel(session_id: str = "main") -> bool:
     """Request cancellation of the next ``execute()`` call for ``session_id``.
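In `search_log` above, each match slices its own `before` / `after` context, so matches that sit close together repeat context lines. A hypothetical client-side helper, not part of `stata-code`, can fold a result back into `grep -n`-style text. It prints matches as `N:text` and context as `N-text`, shows overlapping windows once, and puts `--` between non-contiguous groups. The sample result is made up but follows the documented return shape.

```python
from __future__ import annotations

from typing import Any


def render_matches(result: dict[str, Any]) -> str:
    """Render a search_log-style result as grep -n style text."""
    rows: dict[int, tuple[str, str]] = {}  # line_no -> (separator, text)
    for m in result["matches"]:
        n = m["line_no"]
        before = m.get("before", [])
        # `before` holds the lines immediately preceding line n, in order.
        for i, text in enumerate(before):
            rows.setdefault(n - len(before) + i, ("-", text))
        rows[n] = (":", m["text"])  # a match overrides any context entry
        for i, text in enumerate(m.get("after", []), start=1):
            rows.setdefault(n + i, ("-", text))
    out: list[str] = []
    prev: int | None = None
    for n in sorted(rows):
        if prev is not None and n != prev + 1:
            out.append("--")  # gap between non-contiguous groups
        sep, text = rows[n]
        out.append(f"{n}{sep}{text}")
        prev = n
    if result.get("truncated"):
        out.append(f"(more than {result['match_count']} matches; "
                   "narrow the pattern or raise max_matches)")
    return "\n".join(out)


# Made-up result for a case-insensitive search for "foo" with context=1.
sample = {
    "ref": "log://abc123", "pattern": "foo", "is_regex": False,
    "lines_total": 40, "match_count": 2, "truncated": False,
    "matches": [
        {"line_no": 3, "text": ". regress price mpg foo",
         "before": ["(1978 automobile data)"], "after": ["variable foo not found"]},
        {"line_no": 4, "text": "variable foo not found",
         "before": [". regress price mpg foo"], "after": ["r(111);"]},
    ],
}
print(render_matches(sample))
```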
