PyPI - agentpack-cli - Versions diffs - 0.3.22__tar.gz → 0.3.23__tar.gz - Mend

agentpack-cli 0.3.22__tar.gz → 0.3.23__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (154)

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentpack-cli
-Version: 0.3.22
+Version: 0.3.23
 Summary: Local context engine for AI coding agents that ranks relevant repo files and builds compact task-focused context packs for Claude Code, Codex, Cursor, Windsurf, MCP, and CI workflows.
 Project-URL: Homepage, https://github.com/vishal2612200/agentpack
 Project-URL: Documentation, https://vishal2612200.github.io/agentpack/
@@ -77,19 +77,20 @@ pipx run --spec agentpack-cli agentpack route --task "fix auth token expiry"
 ![AgentPack route demo](docs/assets/agentpack-route-demo.svg)
-> **Status: alpha (v0.3.22).** Works, tested, and used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Current benchmarks are useful regression checks, not broad proof that AgentPack improves coding-agent success. API may change before 1.0.
+> **Status: alpha (v0.3.23).** Works, tested, and used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Current benchmarks are useful regression checks, not broad proof that AgentPack improves coding-agent success. API may change before 1.0.
 >
 > **Platform note:** macOS, Linux, and Windows are supported. Windows support targets PowerShell plus Git for Windows. `cmd.exe` and bare Git setups are not a supported path yet.
 >
 > **Name note:** PyPI package is `agentpack-cli`, npm package is `@vishal2612200/agentpack`, and the command is `agentpack`. This project is unrelated to AgentPack dataset papers or other repos with the same name.
-## What's New in 0.3.22
+## What's New in 0.3.23
-`0.3.22` is a benchmark recall release. It promotes maintenance-context
-recovery to the current expanded public-suite baseline: **66.0% recall / 51.1%
-token precision** across 108 scored public cases.
-`0.3.21` established the prior honest baseline at **57.0% recall / 50.6% token precision**. The new result clears the 65% recall target while keeping token
-precision above the 51% release floor; remaining risk is config/build recall and NestJS token precision. Result: [`benchmarks/results/2026-06-13-public.md`](benchmarks/results/2026-06-13-public.md).
+`0.3.23` is a guarded-loop hardening release. It keeps AgentPack's core promise
+focused on local context and workflow evidence while making `agentpack work --run`
+an optional proof harness around existing agents. The loop now has known runner
+adapters, smoke checks, phase/diff/risk/acceptance artifacts, rollback patches,
+metrics, and stricter finish gates. The prior expanded public-suite baseline
+remains **66.0% recall / 51.1% token precision** across 108 scored public cases.
 ## Core Workflow
@@ -245,7 +246,7 @@ agentpack status
 agentpack finish --since main
 ```
-Use `agentpack quickstart --task "..." --write` when you want AgentPack to print the next commands and write the task file for you. Use `agentpack start "..." --pack-only` when you want only a fresh pack and not the guard path.
+Use `agentpack quickstart --task "..." --write` when you want AgentPack to print the next commands and write the task file for you. Use `agentpack start "..." --pack-only` when you want only a fresh pack and not the guard path. Optional guardrail: `agentpack work --run` is a proof harness around existing agents, not AgentPack's default workflow or an autonomous coding agent.
 ### Learn from AI-assisted work

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/README.md RENAMED

@@ -30,19 +30,20 @@ pipx run --spec agentpack-cli agentpack route --task "fix auth token expiry"
 ![AgentPack route demo](docs/assets/agentpack-route-demo.svg)
-> **Status: alpha (v0.3.22).** Works, tested, and used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Current benchmarks are useful regression checks, not broad proof that AgentPack improves coding-agent success. API may change before 1.0.
+> **Status: alpha (v0.3.23).** Works, tested, and used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Current benchmarks are useful regression checks, not broad proof that AgentPack improves coding-agent success. API may change before 1.0.
 >
 > **Platform note:** macOS, Linux, and Windows are supported. Windows support targets PowerShell plus Git for Windows. `cmd.exe` and bare Git setups are not a supported path yet.
 >
 > **Name note:** PyPI package is `agentpack-cli`, npm package is `@vishal2612200/agentpack`, and the command is `agentpack`. This project is unrelated to AgentPack dataset papers or other repos with the same name.
-## What's New in 0.3.22
+## What's New in 0.3.23
-`0.3.22` is a benchmark recall release. It promotes maintenance-context
-recovery to the current expanded public-suite baseline: **66.0% recall / 51.1%
-token precision** across 108 scored public cases.
-`0.3.21` established the prior honest baseline at **57.0% recall / 50.6% token precision**. The new result clears the 65% recall target while keeping token
-precision above the 51% release floor; remaining risk is config/build recall and NestJS token precision. Result: [`benchmarks/results/2026-06-13-public.md`](benchmarks/results/2026-06-13-public.md).
+`0.3.23` is a guarded-loop hardening release. It keeps AgentPack's core promise
+focused on local context and workflow evidence while making `agentpack work --run`
+an optional proof harness around existing agents. The loop now has known runner
+adapters, smoke checks, phase/diff/risk/acceptance artifacts, rollback patches,
+metrics, and stricter finish gates. The prior expanded public-suite baseline
+remains **66.0% recall / 51.1% token precision** across 108 scored public cases.
 ## Core Workflow
@@ -198,7 +199,7 @@ agentpack status
 agentpack finish --since main
 ```
-Use `agentpack quickstart --task "..." --write` when you want AgentPack to print the next commands and write the task file for you. Use `agentpack start "..." --pack-only` when you want only a fresh pack and not the guard path.
+Use `agentpack quickstart --task "..." --write` when you want AgentPack to print the next commands and write the task file for you. Use `agentpack start "..." --pack-only` when you want only a fresh pack and not the guard path. Optional guardrail: `agentpack work --run` is a proof harness around existing agents, not AgentPack's default workflow or an autonomous coding agent.
 ### Learn from AI-assisted work

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "agentpack-cli"
-version = "0.3.22"
+version = "0.3.23"
 description = "Local context engine for AI coding agents that ranks relevant repo files and builds compact task-focused context packs for Claude Code, Codex, Cursor, Windsurf, MCP, and CI workflows."
 readme = "README.md"
 requires-python = ">=3.10"

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/src/agentpack/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """AgentPack — task-aware context packing for AI coding agents."""
-__version__ = "0.3.22"
+__version__ = "0.3.23"
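This release bumps the same version string in `PKG-INFO`, `README.md`, `pyproject.toml`, and `src/agentpack/__init__.py`. A small pre-tag check can catch a missed bump. The sketch below is hypothetical (it is not part of AgentPack) and only reads the two machine-readable sources, `pyproject.toml` and `__init__.py`; the helper name and regex are assumptions.

```python
import re

# Hypothetical release check, not part of AgentPack. It matches lines of the
# form `version = "..."` (pyproject.toml) or `__version__ = "..."` (__init__.py).
_VERSION_LINE = re.compile(r'^(?:__version__|version)\s*=\s*"([^"]+)"', re.MULTILINE)


def version_mismatches(sources: dict[str, str]) -> dict[str, str]:
    """Return {file name: declared version} when the files disagree, else {}."""
    found: dict[str, str] = {}
    for name, text in sources.items():
        match = _VERSION_LINE.search(text)  # first matching line wins
        if match:
            found[name] = match.group(1)
    return found if len(set(found.values())) > 1 else {}
```

In a release script, `sources` could be filled from `Path("pyproject.toml").read_text()` and `Path("src/agentpack/__init__.py").read_text()`; a non-empty result means one of the bumps was missed.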

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/src/agentpack/commands/ci_cmd.py RENAMED

@@ -64,6 +64,16 @@ jobs:
       - run: python -m pip install -e ".[dev]"
       - run: python -m agentpack.cli dev-check
+  loop-smoke:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - run: python -m pip install -e ".[dev]"
+      - run: python -m agentpack.cli loop-smoke --json
   release-gate:
     runs-on: ubuntu-latest
     if: github.event_name == 'push'
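The new `loop-smoke` job relies on the command's exit status: `loop_smoke` raises `typer.Exit(1)` whenever the summary status is not `ready_to_finish`. If a CI step also wants a readable one-line summary for logs, a minimal sketch could parse the `--json` payload. It assumes only the keys visible in `loop_smoke` in `workflow_cmd.py` (`passed`, `runner`, `summary`, and the optional `failure_class`) and treats `summary` as unknown beyond its `status` entry; the helper name is hypothetical.

```python
import json


def describe_smoke_payload(raw: str) -> str:
    """Turn `agentpack loop-smoke --json` output into a one-line CI summary.

    Assumes the payload keys emitted by loop_smoke in 0.3.23; anything in
    "summary" other than "status" is ignored.
    """
    payload = json.loads(raw)
    status = payload.get("summary", {}).get("status", "unknown")
    verdict = "PASS" if payload.get("passed") else "FAIL"
    line = f"{verdict} loop smoke status={status} runner={payload.get('runner', '')!r}"
    failure = payload.get("failure_class")
    if failure:  # absent, null, or empty failure classes are omitted
        line += f" failure_class={failure}"
    return line
```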

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/src/agentpack/commands/workflow_cmd.py RENAMED

@@ -2,6 +2,7 @@ from __future__ import annotations
 import json
 import subprocess
+import tempfile
 from pathlib import Path
 from typing import Any
@@ -17,6 +18,7 @@ from agentpack.core.loop_protocol import (
     initialize_loop,
     load_loop_state,
     mark_done,
+    resolve_runner_adapter,
     run_loop,
 )
 from agentpack.core.thread_context import resolve_thread_option
@@ -35,11 +37,13 @@ def register(app: typer.Typer) -> None:
         pack_only: bool = typer.Option(False, "--pack-only", help="Run pack directly instead of guard."),
         no_init: bool = typer.Option(False, "--no-init", help="Do not initialize the repo when .agentpack/config.toml is missing."),
         no_next: bool = typer.Option(False, "--no-next", help="Do not print next-step diagnostics after context refresh."),
-        run_loop_requested: bool = typer.Option(False, "--run", help="Run the configured Ralph Loop after preparing context."),
-        dry_run: bool = typer.Option(False, "--dry-run", help="Plan Ralph Loop execution without running the configured runner."),
-        runner: str = typer.Option("", "--runner", help="Generic shell command for the Ralph Loop runner."),
+        run_loop_requested: bool = typer.Option(False, "--run", help="Run the optional guarded loop after preparing context."),
+        dry_run: bool = typer.Option(False, "--dry-run", help="Plan guarded-loop execution without running the configured runner."),
+        runner: str = typer.Option("", "--runner", help="Generic shell command for the optional guarded-loop runner."),
+        runner_adapter: str = typer.Option("", "--runner-adapter", help="Resolve runner command from a known adapter: claude, codex, cursor."),
         max_iterations: int = typer.Option(0, "--max-iterations", help="Override [loop].max_iterations for this run."),
-        verify: list[str] = typer.Option([], "--verify", help="Verification command for Ralph Loop. Repeatable."),
+        verify: list[str] = typer.Option([], "--verify", help="Verification command for the guarded loop. Repeatable."),
+        acceptance: list[str] = typer.Option([], "--acceptance", help="Semantic acceptance check for runner contract. Repeatable."),
         json_output: bool = typer.Option(False, "--json", help="Emit JSON."),
     ) -> None:
         """Initialize if needed, write a task, refresh context, and show next steps."""
@@ -73,10 +77,13 @@ def register(app: typer.Typer) -> None:
                 root,
                 task_text,
                 cfg.loop,
-                runner_override=runner,
+                runner_override=runner or _resolve_runner_adapter(runner_adapter, root),
                 max_iterations_override=max_iterations,
                 verification_overrides=list(verify) if verify else None,
+                acceptance_overrides=list(acceptance) if acceptance else None,
             )
+            if runner_adapter:
+                state.runner_adapter = runner_adapter
             if dry_run:
                 loop_plan = dry_run_plan(root, state).model_dump(mode="json")
                 _finish(stages, json_output, loop_plan=loop_plan)
@@ -105,6 +112,7 @@ def register(app: typer.Typer) -> None:
         skip_benchmark_capture: bool = typer.Option(False, "--skip-benchmark-capture", help="Skip benchmark case capture."),
         archive_thread: bool = typer.Option(False, "--archive-thread", help="Archive the thread after marking state done."),
         allow_empty_capture: bool = typer.Option(False, "--allow-empty-capture", help="Allow benchmark capture with no expected files."),
+        allow_high_risk: bool = typer.Option(False, "--allow-high-risk", help="Allow finish after inspecting a high-risk Ralph Loop diff."),
         json_output: bool = typer.Option(False, "--json", help="Emit JSON."),
     ) -> None:
         """Run finish checks, capture benchmark evidence, and mark work done."""
@@ -115,7 +123,7 @@ def register(app: typer.Typer) -> None:
         finish_task = task or _read_task(root, thread) or (loop_state.task if loop_state else "")
         loop_applies = loop_state is not None and cfg.loop.enabled and (not finish_task or finish_task == loop_state.task)
         if loop_applies:
-            blockers = _loop_finish_blockers(root, cfg.loop, loop_state, thread)
+            blockers = _loop_finish_blockers(root, cfg.loop, loop_state, thread, allow_empty_diff=allow_empty_capture, allow_high_risk=allow_high_risk)
             if blockers:
                 _finish_blocked(blockers, json_output)
                 raise typer.Exit(1)
@@ -146,6 +154,106 @@ def register(app: typer.Typer) -> None:
             stages.append(_run("threads-archive", cli_module_argv("threads", "archive", thread_id, "--summary", summary), root))
         _finish(stages, json_output)
+    @app.command("loop-smoke")
+    def loop_smoke(
+        runner: str = typer.Option("", "--runner", help="Runner command to test against a tiny fixture repo."),
+        runner_adapter: str = typer.Option("", "--runner-adapter", help="Resolve runner command from a known adapter."),
+        json_output: bool = typer.Option(False, "--json", help="Emit JSON."),
+    ) -> None:
+        """Run an optional guarded-loop smoke test in a temporary fixture repo."""
+        with tempfile.TemporaryDirectory(prefix="agentpack-loop-smoke-") as raw:
+            root = Path(raw)
+            _seed_loop_smoke_repo(root)
+            resolved_runner = runner or _resolve_runner_adapter(runner_adapter, root) or _deterministic_smoke_runner(root)
+            state = initialize_loop(
+                root,
+                "make the smoke test pass by changing app.py value to 2",
+                load_config(root).loop,
+                runner_override=resolved_runner,
+                verification_overrides=["python -m pytest -q"],
+                acceptance_overrides=["smoke test passes"],
+                max_iterations_override=2,
+            )
+            summary = run_loop(root, state, refresh=lambda: LoopCommandResult(command="smoke-refresh", returncode=0, output_excerpt="ok"))
+            payload = {
+                "passed": summary.status == "ready_to_finish",
+                "summary": summary.model_dump(mode="json"),
+                "runner": resolved_runner,
+            }
+            latest = load_loop_state(root)
+            if latest is not None and latest.last_runner is not None:
+                payload["runner_output_excerpt"] = latest.last_runner.output_excerpt
+            if latest is not None:
+                payload["failure_class"] = latest.failure_class
+            if json_output:
+                typer.echo(json.dumps(payload, indent=2, sort_keys=True))
+            else:
+                marker = "[green]✓[/]" if payload["passed"] else "[red]✗[/]"
+                console.print(f"{marker} loop smoke {summary.status}")
+                console.print(f"runner: [bold]{resolved_runner}[/]")
+            if summary.status != "ready_to_finish":
+                raise typer.Exit(1)
+    @app.command("loop-rollback")
+    def loop_rollback(
+        iteration: int = typer.Option(0, "--iteration", help="Rollback patch iteration (0 = latest recorded)."),
+        json_output: bool = typer.Option(False, "--json", help="Emit JSON."),
+    ) -> None:
+        """Restore the worktree to the latest recorded guarded-loop rollback patch."""
+        root = _root()
+        state = load_loop_state(root)
+        patch = _loop_rollback_patch(root, state, iteration)
+        payload = {"patch": str(patch.relative_to(root)) if patch else "", "applied": False}
+        current = subprocess.run(["git", "diff", "--binary"], cwd=root, capture_output=True, text=True)
+        if current.stdout.strip():
+            reverse = subprocess.run(["git", "apply", "-R", "-"], input=current.stdout, cwd=root, capture_output=True, text=True)
+            if reverse.returncode != 0:
+                payload["reason"] = reverse.stderr.strip() or "failed to reverse current diff"
+                if json_output:
+                    typer.echo(json.dumps(payload, indent=2, sort_keys=True))
+                    return
+                console.print(f"[red]Rollback failed:[/] {payload['reason']}")
+                raise typer.Exit(1)
+            payload["applied"] = True
+        elif not patch:
+            payload["reason"] = "no rollback patch found and worktree has no tracked diff"
+            if json_output:
+                typer.echo(json.dumps(payload, indent=2, sort_keys=True))
+                return
+            console.print("[yellow]No rollback patch found and worktree has no tracked diff.[/]")
+            return
+        if not patch:
+            if json_output:
+                typer.echo(json.dumps(payload, indent=2, sort_keys=True))
+                return
+            console.print("[green]✓[/] Reversed current tracked diff")
+            return
+        apply_result = subprocess.run(["git", "apply", str(patch)], cwd=root, capture_output=True, text=True)
+        payload["applied"] = apply_result.returncode == 0
+        if apply_result.returncode != 0:
+            payload["reason"] = apply_result.stderr.strip() or "failed to apply rollback patch"
+            if json_output:
+                typer.echo(json.dumps(payload, indent=2, sort_keys=True))
+                return
+            console.print(f"[red]Rollback failed:[/] {payload['reason']}")
+            raise typer.Exit(1)
+        if json_output:
+            typer.echo(json.dumps(payload, indent=2, sort_keys=True))
+            return
+        console.print(f"[green]✓[/] Applied rollback patch {payload['patch']}")
+    @app.command("loop-metrics")
+    def loop_metrics(json_output: bool = typer.Option(False, "--json", help="Emit JSON.")) -> None:
+        """Summarize guarded-loop outcomes over time."""
+        root = _root()
+        rows = _read_loop_metrics(root)
+        summary = _summarize_loop_metrics(rows)
+        if json_output:
+            typer.echo(json.dumps(summary, indent=2, sort_keys=True))
+            return
+        console.print(f"Runs: [bold]{summary['runs']}[/]")
+        console.print(f"Ready: [green]{summary['ready_to_finish']}[/]  Blocked: [yellow]{summary['blocked']}[/]  Done: [green]{summary['done']}[/]")
 def _run(name: str, command: list[str], root: Path) -> dict[str, Any]:
     result = subprocess.run(command, cwd=root, capture_output=True, text=True)
@@ -190,8 +298,19 @@ def _finish(
         raise typer.Exit(1)
-def _loop_finish_blockers(root: Path, loop_cfg, loop_state, thread: str) -> list[dict[str, Any]]:
-    blockers = [blocker.model_dump(mode="json") for blocker in finish_blockers(root, loop_cfg, loop_state)]
+def _loop_finish_blockers(
+    root: Path,
+    loop_cfg,
+    loop_state,
+    thread: str,
+    *,
+    allow_empty_diff: bool = False,
+    allow_high_risk: bool = False,
+) -> list[dict[str, Any]]:
+    blockers = [
+        blocker.model_dump(mode="json")
+        for blocker in finish_blockers(root, loop_cfg, loop_state, allow_empty_diff=allow_empty_diff, allow_high_risk=allow_high_risk)
+    ]
     fresh, reason = _context_is_fresh(root, thread_id=resolve_thread_option(thread))
     if not fresh:
         blockers.append(
@@ -204,6 +323,94 @@ def _loop_finish_blockers(root: Path, loop_cfg, loop_state, thread: str) -> list
     return blockers
+def _resolve_runner_adapter(adapter: str, root: Path) -> str:
+    if not adapter:
+        return ""
+    command = resolve_runner_adapter(adapter, root)
+    if not command:
+        console.print(f"[red]Runner adapter unavailable:[/] {adapter}")
+        raise typer.Exit(1)
+    return command
+def _seed_loop_smoke_repo(root: Path) -> None:
+    subprocess.run(["git", "init"], cwd=root, check=True, capture_output=True, text=True)
+    (root / ".agentpack").mkdir()
+    (root / ".agentpack" / "config.toml").write_text("[context]\n[loop]\nrequire_clean_tree = false\n", encoding="utf-8")
+    (root / ".agentpack" / "task.md").write_text("make smoke test pass\n", encoding="utf-8")
+    (root / "app.py").write_text("VALUE = 1\n", encoding="utf-8")
+    (root / "test_app.py").write_text("from app import VALUE\n\n\ndef test_value():\n    assert VALUE == 2\n", encoding="utf-8")
+    subprocess.run(["git", "add", "app.py", "test_app.py"], cwd=root, check=True, capture_output=True, text=True)
+    subprocess.run(
+        ["git", "-c", "user.email=test@example.com", "-c", "user.name=Test User", "commit", "-m", "init"],
+        cwd=root,
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+def _deterministic_smoke_runner(root: Path) -> str:
+    script = root / "smoke_runner.py"
+    script.write_text(
+        "from pathlib import Path\n"
+        "Path('app.py').write_text('VALUE = 2\\n', encoding='utf-8')\n"
+        "print('{\"status\":\"changed\",\"summary\":\"updated app.py\",\"files_changed\":[\"app.py\"],\"acceptance\":{\"smoke test passes\":\"pass\"}}')\n",
+        encoding="utf-8",
+    )
+    return "python smoke_runner.py"
+def _loop_rollback_patch(root: Path, state, iteration: int) -> Path | None:
+    if iteration > 0:
+        candidate = root / ".agentpack" / "loop_rollback" / f"iteration-{iteration}-before.patch"
+        return candidate if candidate.exists() else None
+    if state is not None and state.rollback_patch:
+        candidate = root / state.rollback_patch
+        if candidate.exists():
+            return candidate
+    patches = sorted((root / ".agentpack" / "loop_rollback").glob("iteration-*-before.patch"))
+    return patches[-1] if patches else None
+def _read_loop_metrics(root: Path) -> list[dict[str, Any]]:
+    path = root / ".agentpack" / "loop_metrics.jsonl"
+    if not path.exists():
+        return []
+    rows: list[dict[str, Any]] = []
+    for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
+        try:
+            value = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(value, dict):
+            rows.append(value)
+    return rows[-500:]
+def _summarize_loop_metrics(rows: list[dict[str, Any]]) -> dict[str, Any]:
+    outcomes: dict[str, int] = {}
+    failure_classes: dict[str, int] = {}
+    total_iterations = 0
+    for row in rows:
+        outcome = str(row.get("outcome") or "unknown")
+        outcomes[outcome] = outcomes.get(outcome, 0) + 1
+        failure_class = str(row.get("failure_class") or "")
+        if failure_class:
+            failure_classes[failure_class] = failure_classes.get(failure_class, 0) + 1
+        total_iterations += int(row.get("iterations") or 0)
+    runs = len(rows)
+    return {
+        "runs": runs,
+        "ready_to_finish": outcomes.get("ready_to_finish", 0),
+        "blocked": outcomes.get("blocked", 0),
+        "done": outcomes.get("done", 0),
+        "outcomes": outcomes,
+        "failure_classes": failure_classes,
+        "avg_iterations": round(total_iterations / runs, 2) if runs else 0,
+    }
 def _finish_blocked(blockers: list[dict[str, Any]], json_output: bool) -> None:
     if json_output:
         typer.echo(json.dumps({"passed": False, "stages": [], "loop_blockers": blockers}, indent=2, sort_keys=True))
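`loop-metrics` reads `.agentpack/loop_metrics.jsonl`, skips lines that are not JSON objects, keeps the last 500 rows, and tallies outcomes, failure classes, and average iterations. A standalone sketch of the same summarization, handy for scripting outside the CLI, assuming the row keys (`outcome`, `failure_class`, `iterations`) read by `_summarize_loop_metrics`; the function name and `window` parameter are hypothetical.

```python
import json
from collections import Counter
from typing import Any


def summarize_metrics_jsonl(text: str, window: int = 500) -> dict[str, Any]:
    """Sketch mirroring _read_loop_metrics + _summarize_loop_metrics."""
    rows: list[dict[str, Any]] = []
    for line in text.splitlines():
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, torn, or non-JSON lines
        if isinstance(value, dict):
            rows.append(value)
    rows = rows[-window:]  # only the most recent runs count
    outcomes = Counter(str(row.get("outcome") or "unknown") for row in rows)
    failures = Counter(str(row["failure_class"]) for row in rows if row.get("failure_class"))
    total_iterations = sum(int(row.get("iterations") or 0) for row in rows)
    runs = len(rows)
    return {
        "runs": runs,
        "ready_to_finish": outcomes["ready_to_finish"],  # Counter returns 0 for missing keys
        "blocked": outcomes["blocked"],
        "done": outcomes["done"],
        "outcomes": dict(outcomes),
        "failure_classes": dict(failures),
        "avg_iterations": round(total_iterations / runs, 2) if runs else 0,
    }
```

Given the same file contents, the tallies should match `agentpack loop-metrics --json`, since each step follows the helpers above.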

{agentpack_cli-0.3.22 → agentpack_cli-0.3.23}/src/agentpack/core/config.py RENAMED

@@ -76,8 +76,11 @@ class LearningConfig(BaseModel):
 class LoopConfig(BaseModel):
     enabled: bool = True
     runner: str = ""
+    runner_adapter: str = ""
+    runner_prompt_output: str = ".agentpack/loop_runner_prompt.md"
     max_iterations: int = 10
     verification_commands: list[str] = Field(default_factory=list)
+    acceptance_checks: list[str] = Field(default_factory=list)
     require_verification: bool = True
     require_progress_update: bool = True
     require_clean_tree: bool = True
@@ -86,6 +89,8 @@ class LoopConfig(BaseModel):
     runner_timeout_seconds: int = 600
     verification_timeout_seconds: int = 600
     max_repeated_failures: int = 3
+    risk_sensitive_globs: list[str] = Field(default_factory=list)
+    risk_high_file_count: int = 20
 class RuntimeConfig(BaseModel):
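`LoopConfig` gains `risk_sensitive_globs` (empty by default) and `risk_high_file_count` (default 20), and `finish` gains `--allow-high-risk`. The rule that turns these settings into a risk level is not visible in this diff. Under the assumption that either one sensitive path or a large file count marks a diff as high risk, a sketch might look like this; the function name, the `>=` comparison, and the use of `fnmatch` are all assumptions.

```python
from fnmatch import fnmatch


def classify_diff_risk(
    changed_files: list[str],
    sensitive_globs: list[str],
    high_file_count: int = 20,
) -> str:
    """Hypothetical risk rule built on LoopConfig's two new risk fields.

    Assumption: a file count at or above the threshold, or any path matching a
    sensitive glob, makes the diff "high"; AgentPack's real rule may differ.
    """
    if len(changed_files) >= high_file_count:
        return "high"
    for path in changed_files:
        if any(fnmatch(path, pattern) for pattern in sensitive_globs):
            return "high"
    return "low"
```

Under this reading, a `"high"` result is the case the new `finish --allow-high-risk` flag exists to override after inspection.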

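The new `acceptance_checks` setting and the repeatable `--acceptance` option name checks a runner is expected to report on. The deterministic smoke runner in `workflow_cmd.py` prints its contract as a single JSON line whose `acceptance` map goes from check name to `"pass"`. The parsing AgentPack actually uses is not part of this diff; the following is a hypothetical sketch of one way to list configured checks the runner did not pass, assuming the last JSON object printed is the contract.

```python
import json
from typing import Any


def unmet_acceptance(runner_output: str, acceptance_checks: list[str]) -> list[str]:
    """Return configured acceptance checks the runner did not report as "pass".

    Assumes the runner prints its contract as one JSON object per line (as the
    deterministic smoke runner does) and that the last such line wins.
    """
    contract: dict[str, Any] = {}
    for line in runner_output.splitlines():
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue  # ordinary log output from the runner
        if isinstance(value, dict):
            contract = value
    reported = contract.get("acceptance")
    if not isinstance(reported, dict):
        reported = {}
    return [check for check in acceptance_checks if reported.get(check) != "pass"]
```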