PyPI - cli-agent-runner - Versions diffs - 0.1.37__tar.gz → 0.1.39__tar.gz - Mend

cli-agent-runner 0.1.37tar.gz → 0.1.39tar.gz


Files changed (227)

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/CHANGELOG.md RENAMED

@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+### Fixed
+- Grace-kill (`max_grace_after_result_s`) is no longer defeated by long-lived helper subprocesses (e.g. claude's persistent Bash-tool shell-snapshot). `[runtime] grace_kill_ignore_patterns` lists regexes for cmdlines to exclude from the liveness count; the claude preset ships a matching default.
+### Added
+- `[runtime] grace_kill_ignore_patterns: list[str]` — regex patterns; matching child cmdlines are excluded from the grace-kill liveness check.
+- `round_grace_extended` event payload gains `ignored_children` — cmdlines filtered by `grace_kill_ignore_patterns`.
+### Changed
+- Docs: `commands.md` documents `monitor --mode/--port` and `init --preset`; the Chinese verb list and `[monitor]` default values now point to the generated tables instead of restating them; runbook upgrade examples use a version placeholder.
+### Internal
+- New invariant `test_doc_claims_match_ssot` gates documented counts (detectors / defenses / verbs) and config value-sets (`dirty_action` / `context_injection_mode` / transient classification) against their code SSOT — count/enum doc drift now fails CI at the introducing commit.
+- Removed the unused `alert-kinds` docgen renderer; de-duplicated redundant defense-count and alert-kind guards to one canonical tripwire each.
+## [0.1.38] - 2026-05-24
+### Fixed
+- Grace-kill (`max_grace_after_result_s`) no longer reaps a round that emitted `type=result` while a backgrounded child process (e.g. a long build) is still running. It now reaps only when the agent's process group has no live worker processes left (a genuine hang); otherwise it waits for the round to finish or for the `round_timeout_s` ceiling.
+- Corrected `round_grace_kill`'s description: the kill is gated on the process group being idle (no live workers), not on log silence.
+### Added
+- New event `round_grace_extended` — emitted once when grace elapsed after `type=result` but a live worker process kept the round busy; carries the worker cmdlines.
+- `round_grace_kill` now carries `live_children` (cmdlines observed at kill time; empty for a genuine idle hang).
 ## [0.1.37] - 2026-05-22
 ### Fixed

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cli-agent-runner
-Version: 0.1.37
+Version: 0.1.39
 Summary: Restart-on-exit supervisor for autonomous CLI agents
 Project-URL: Homepage, https://github.com/wan9yu/cli-agent-runner
 Project-URL: Documentation, https://github.com/wan9yu/cli-agent-runner#readme

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/_docgen.py RENAMED

@@ -110,11 +110,6 @@ def render_defenses_table() -> str:
     return "\n".join(lines)
-def render_alert_kinds_list() -> str:
-    """Flat bullet list of all known alert kinds, alphabetised."""
-    return "\n".join(f"- `{k}`" for k in sorted(KNOWN_ALERT_KINDS))
 def render_detector_list() -> str:
     """Bullet list of detectors; auto-stop kinds flagged inline."""
     lines: list[str] = []
@@ -155,7 +150,6 @@ def render_verb_table() -> str:
 RENDERERS: dict[str, Callable[[], str]] = {
     "defenses-table": render_defenses_table,
-    "alert-kinds": render_alert_kinds_list,
     "detector-list": render_detector_list,
     "event-kinds": render_event_kinds_list,
     "config-schema": render_config_schema_table,

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/_emit.py RENAMED

@@ -19,6 +19,7 @@ __all__ = [
     "emit_fresh_eyes_round_triggered",
     "emit_max_rounds_reached",
     "emit_rate_limit_stop",
+    "emit_round_grace_extended",
     "emit_round_grace_kill",
     "emit_round_progress",
     "emit_round_substrate_after",
@@ -233,16 +234,51 @@ def emit_round_grace_kill(
     *,
     round_num: int,
     grace_s: int,
+    live_children: list[str] | None = None,
 ) -> None:
-    """Emit when subprocess killed because grace-after-result timer expired.
-    Subprocess emitted type=result in JSONL log then sat silent for longer
-    than max_grace_after_result_s seconds. Distinguishes from round_timeout_kill
-    (wall-clock exceeded without result event).
+    """Emit when the subprocess was killed because the grace-after-result timer
+    expired AND the agent's process group had no live worker processes left
+    (a genuine hang). Distinct from round_grace_extended (grace elapsed but a
+    worker was still running) and round_timeout_kill (wall-clock exceeded).
     """
     from agent_runner.events import ROUND_GRACE_KILL, emit
-    emit(log_dir, ROUND_GRACE_KILL, round_num=round_num, grace_s=grace_s)
+    emit(
+        log_dir,
+        ROUND_GRACE_KILL,
+        round_num=round_num,
+        grace_s=grace_s,
+        live_children=live_children or [],
+    )
+def emit_round_grace_extended(
+    log_dir: Path,
+    *,
+    round_num: int,
+    grace_s: int,
+    live_children: list[str],
+    ignored_children: list[str] | None = None,
+) -> None:
+    """Emit when the grace-after-result timer expired but the agent still had
+    live worker processes (e.g. a backgrounded build), so the round was NOT
+    killed; it continues until it finishes or hits round_timeout_s.
+    ignored_children: cmdlines that matched a grace_kill_ignore_patterns entry
+        and were excluded from the liveness count — useful for verifying
+        patterns are firing and for noticing when an upstream CLI changes
+        its helper path.
+    """
+    from agent_runner.events import ROUND_GRACE_EXTENDED, emit
+    emit(
+        log_dir,
+        ROUND_GRACE_EXTENDED,
+        round_num=round_num,
+        grace_s=grace_s,
+        live_children=live_children,
+        ignored_children=ignored_children or [],
+    )
 def emit_anomaly_repetitive_tool(

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/_version.py RENAMED

@@ -18,7 +18,7 @@ version_tuple: tuple[int | str, ...]
 commit_id: str | None
 __commit_id__: str | None
-__version__ = version = '0.1.37'
-__version_tuple__ = version_tuple = (0, 1, 37)
+__version__ = version = '0.1.39'
+__version_tuple__ = version_tuple = (0, 1, 39)
 __commit_id__ = commit_id = None

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/agent_runtime.py RENAMED

@@ -11,13 +11,16 @@ Defenses encoded here:
 from __future__ import annotations
 import os
+import re
 import signal
 import subprocess  # noqa: TID251 — sanctioned subprocess caller
 import time
 from collections.abc import Callable
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from pathlib import Path
+import psutil
 REAP_GRACE_S = 5
@@ -28,6 +31,7 @@ class RunResult:
     timed_out: bool
     pid: int
     killed_for_grace: bool = False
+    grace_kill_children: list[str] = field(default_factory=list)
 def _build_argv(command: list[str], prompt_arg_template: list[str], prompt: str) -> list[str]:
@@ -54,6 +58,45 @@ def _kill_pgroup(proc: subprocess.Popen) -> None:
         pass
+def _live_children(
+    proc: subprocess.Popen,
+    *,
+    ignore_patterns: list[re.Pattern[str]] | None = None,
+    max_n: int = 5,
+    max_len: int = 120,
+) -> tuple[list[str], list[str]]:
+    """Cmdlines of live (non-zombie) descendants of ``proc``, split into
+    ``(live, ignored)``: ``live`` is what counts toward the grace-kill
+    liveness check; ``ignored`` matched an ``ignore_patterns`` entry and is
+    excluded (e.g. claude's persistent shell-snapshot helper). Both lists
+    are bounded by ``max_n``/``max_len`` to keep events small. ``ignore_patterns
+    is None`` → no filtering, ``ignored`` is empty, ``live`` matches 0.1.38.
+    """
+    try:
+        parent = psutil.Process(proc.pid)
+    except (psutil.NoSuchProcess, psutil.AccessDenied):
+        return [], []
+    live: list[str] = []
+    ignored: list[str] = []
+    for child in parent.children(recursive=True):
+        try:
+            if child.status() == psutil.STATUS_ZOMBIE:
+                continue
+            line = " ".join(child.cmdline()) or child.name()
+        except (psutil.NoSuchProcess, psutil.AccessDenied):
+            continue
+        short = line[:max_len]
+        if ignore_patterns and any(p.search(line) for p in ignore_patterns):
+            if len(ignored) < max_n:
+                ignored.append(short)
+        else:
+            if len(live) < max_n:
+                live.append(short)
+        if len(live) >= max_n and len(ignored) >= max_n:
+            break
+    return live, ignored
 # Exact compact bytes — matches claude CLI's no-whitespace JSONL output.
 # A future CLI variant emitting `{"type": "result", ...}` (with space) would
 # bypass this scan; revisit if that happens.
@@ -71,19 +114,28 @@ def run(
     max_grace_after_result_s: int = 0,
     progress_callback: Callable[[dict], None] | None = None,
     progress_interval_s: int = 0,
+    on_grace_extended: Callable[[list[str], list[str]], None] | None = None,
+    grace_kill_ignore_patterns: list[re.Pattern[str]] | None = None,
 ) -> RunResult:
     """Spawn the agent subprocess and wait for exit or timeout.
     Wall-clock timeout (R1128). On timeout: SIGTERM pgroup → REAP_GRACE_S → SIGKILL.
     max_grace_after_result_s: when > 0, start a countdown after the first
-    type=result event is detected in the log; kill if subprocess is still
-    running after this many seconds (HUNG defense). 0 = disabled.
+    type=result event is detected in the log. After it elapses, reap the
+    process group only if the agent has no live worker processes left (a
+    genuine hang). If a worker is still running (e.g. a backgrounded build),
+    do not reap — invoke ``on_grace_extended`` once and keep waiting until the
+    round finishes or hits the wall-clock ``timeout_s`` ceiling. 0 = disabled.
     progress_callback: when not None and progress_interval_s > 0, called every
     progress_interval_s seconds with a dict of log stats (log_size_kb,
     last_write_age_s, wall_age_s). Keeps agent_runtime event-free; callers
     build the callback to emit events.
+    grace_kill_ignore_patterns: pre-compiled regex patterns; child cmdlines
+    matching any pattern (re.search) are excluded from the liveness count
+    (persistent helpers that aren't real workers). None = no filtering.
     """
     argv = _build_argv(command, prompt_arg_template, prompt)
     env = {**os.environ, **env_extra}
@@ -100,6 +152,7 @@ def run(
         start_new_session=True,
     )
     result_seen_at: float | None = None
+    grace_extended_emitted = False
     try:
         while True:
             ret = proc.poll()
@@ -114,10 +167,9 @@ def run(
                 return RunResult(
                     exit_code=exit_code, duration_s=duration, timed_out=True, pid=proc.pid
                 )
-            # Grace kill: result emitted but subprocess still running
+            # Grace kill: result emitted but subprocess still running.
             if max_grace_after_result_s > 0:
                 if result_seen_at is None:
-                    # Cheap check: byte-scan log for marker substring
                     try:
                         with log_path.open("rb") as f:
                             if _RESULT_MARKER in f.read():
@@ -125,16 +177,26 @@ def run(
                     except OSError:
                         pass  # log not flushed yet; check next tick
                 if result_seen_at is not None and now - result_seen_at > max_grace_after_result_s:
-                    _kill_pgroup(proc)
-                    duration = time.time() - start
-                    exit_code = proc.returncode if proc.returncode is not None else -1
-                    return RunResult(
-                        exit_code=exit_code,
-                        duration_s=duration,
-                        timed_out=True,
-                        pid=proc.pid,
-                        killed_for_grace=True,
-                    )
+                    live, ignored = _live_children(proc, ignore_patterns=grace_kill_ignore_patterns)
+                    if live:
+                        # Busy: a backgrounded worker is still running. Don't
+                        # reap — defer to the wall-clock ceiling. Signal once.
+                        if not grace_extended_emitted:
+                            if on_grace_extended is not None:
+                                on_grace_extended(live, ignored)
+                            grace_extended_emitted = True
+                    else:
+                        _kill_pgroup(proc)
+                        duration = time.time() - start
+                        exit_code = proc.returncode if proc.returncode is not None else -1
+                        return RunResult(
+                            exit_code=exit_code,
+                            duration_s=duration,
+                            timed_out=True,
+                            pid=proc.pid,
+                            killed_for_grace=True,
+                            grace_kill_children=[],
+                        )
             # Progress heartbeat: call back if interval elapsed
             if progress_callback is not None and progress_interval_s > 0:
                 if now - last_progress_at >= progress_interval_s:

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/api.py RENAMED

@@ -733,6 +733,7 @@ from agent_runner._emit import (  # noqa: E402,F401 — intentional bottom re-ex
     emit_fresh_eyes_round_triggered,
     emit_max_rounds_reached,
     emit_rate_limit_stop,
+    emit_round_grace_extended,
     emit_round_grace_kill,
     emit_round_progress,
     emit_round_substrate_after,

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/config.py RENAMED

@@ -2,6 +2,7 @@
 from __future__ import annotations
+import re
 import tomllib
 from dataclasses import dataclass, field
 from pathlib import Path
@@ -40,6 +41,12 @@ class RuntimeConfig:
     fresh_eyes_every_n: int | None = None  # None = disabled
     dry_run: bool = False
     max_grace_after_result_s: int = 0  # 0 = disabled
+    grace_kill_ignore_patterns: list[str] = field(default_factory=list)
+    """Regex patterns (re.search) tested against each child process's joined
+    cmdline. Matching children are excluded from the grace-kill liveness
+    check — for persistent helper subprocesses (e.g. claude's shell-snapshot
+    bash) that would otherwise defeat max_grace_after_result_s. Empty list
+    = no filtering (0.1.38 behavior preserved)."""
 @dataclass(frozen=True)
@@ -221,6 +228,25 @@ def _validate_remote_failure_tolerance(value: Any) -> int:
     return v
+def _validate_regex_list(value: Any, *, field: str) -> list[str]:
+    """Validate a list of regex pattern strings (each must compile). Returns the
+    raw strings unchanged; callers compile when they need ``re.Pattern`` objects."""
+    if not isinstance(value, list):
+        raise ValueError(f"{field}: expected a list of regex strings, got {type(value).__name__}")
+    out: list[str] = []
+    for p in value:
+        if not isinstance(p, str):
+            raise ValueError(
+                f"{field}: each pattern must be a string, got {type(p).__name__}: {p!r}"
+            )
+        try:
+            re.compile(p)
+        except re.error as e:
+            raise ValueError(f"{field}: invalid regex {p!r}: {e}") from e
+        out.append(p)
+    return out
 _PHASE_OVERRIDE_ALLOWED_FIELDS = frozenset(
     {
         "round_timeout_s",
@@ -392,6 +418,10 @@ def load_config(toml_path: Path) -> Config:
             runtime_d.get("max_grace_after_result_s", 0),
             field="runtime.max_grace_after_result_s",
         ),
+        grace_kill_ignore_patterns=_validate_regex_list(
+            runtime_d.get("grace_kill_ignore_patterns", []),
+            field="runtime.grace_kill_ignore_patterns",
+        ),
     )
     prompt_d = raw.get("prompt", {})
     mode = prompt_d.get("context_injection_mode", "prepend")

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/events.py RENAMED

@@ -49,6 +49,7 @@ ORPHAN_STASHED = "orphan_stashed"
 PACKAGE_UPGRADED = "package_upgraded"
 PROMPT_OVERWRITTEN = "prompt_overwritten"
 ROUND_END = "round_end"
+ROUND_GRACE_EXTENDED = "round_grace_extended"
 ROUND_GRACE_KILL = "round_grace_kill"
 ROUND_PROGRESS = "round_progress"
 ROUND_START = "round_start"

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/presets/claude.toml RENAMED

@@ -16,6 +16,7 @@ work_dir = "."
 log_dir = "~/.agent-runner/{project}/logs"
 round_timeout_s = 1800
 restart_delay_s = 3
+grace_kill_ignore_patterns = ['\.claude/shell-snapshots/snapshot-bash-']
 [prompt]
 file = "./prompts/main.md"

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/agent_runner/runner.py RENAMED

@@ -10,6 +10,7 @@ import hashlib
 import json
 import os
 import random
+import re
 import sys
 import time
 import traceback as tb_mod
@@ -466,6 +467,17 @@ def _run_one_round_inner(cfg: Config, *, phase_override: str | None = None) -> R
             **stats,
         )
+    grace_kill_ignore_patterns = [re.compile(p) for p in cfg.runtime.grace_kill_ignore_patterns]
+    def _grace_extended_emit(live: list[str], ignored: list[str]) -> None:
+        api.emit_round_grace_extended(
+            log_dir,
+            round_num=round_num,
+            grace_s=cfg.runtime.max_grace_after_result_s,
+            live_children=live,
+            ignored_children=ignored,
+        )
     result = agent_runtime.run(
         command=cfg.agent.command,
         prompt_arg_template=cfg.agent.prompt_arg_template,
@@ -476,6 +488,8 @@ def _run_one_round_inner(cfg: Config, *, phase_override: str | None = None) -> R
         max_grace_after_result_s=cfg.runtime.max_grace_after_result_s,
         progress_callback=_progress_emit,
         progress_interval_s=cfg.monitor.round_progress_interval_s,
+        on_grace_extended=_grace_extended_emit,
+        grace_kill_ignore_patterns=grace_kill_ignore_patterns,
     )
     events.emit(
         log_dir,
@@ -549,6 +563,7 @@ def _run_one_round_inner(cfg: Config, *, phase_override: str | None = None) -> R
             log_dir,
             round_num=round_num,
             grace_s=cfg.runtime.max_grace_after_result_s,
+            live_children=result.grace_kill_children,
         )
     elif result.timed_out:
         events.emit(

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/docs/architecture.md RENAMED

@@ -168,6 +168,7 @@ hook (vs ALL pre-round hooks), use `[plugins] disable = ["that_entry_point_name"
 - `package_upgraded`
 - `prompt_overwritten`
 - `round_end`
+- `round_grace_extended`
 - `round_grace_kill`
 - `round_progress`
 - `round_start`

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/docs/commands.md RENAMED

@@ -34,8 +34,15 @@ are shared between `peek`, `watch`, and `monitor`.
 Scaffold a new project: writes `agent-runner.toml`, `prompts/main.md`, and
 appends `logs/` to `.gitignore`. By default also creates a git commit.
+Flags:
+- `--preset {claude,aider,gemini}` — agent CLI preset to scaffold (default: `claude`)
+- `--force` — overwrite an existing `agent-runner.toml`
+- `--no-commit` — skip the initial git commit
 ```bash
-agent-runner init                      # default: commit
+agent-runner init                      # default: claude preset, commit
+agent-runner init --preset aider       # aider preset
 agent-runner init --no-commit          # skip the commit
 agent-runner init --force              # overwrite an existing toml
 ```
@@ -133,7 +140,7 @@ agent-runner events --kind transient_error_backoff_capped --tail
 `peek` in a clear-and-refresh loop. Default 2s interval. Stop with Ctrl-C.
-### `agent-runner monitor [--host SSH-ALIAS] [--interval N] [--json]`
+### `agent-runner monitor [--host SSH-ALIAS] [--interval N] [--mode MODE] [--port PORT] [--json]`
 Anomaly-detection daemon. Runs the 12 detectors against the live state on every
 poll. Without `--host`, watches local logs at default 30s interval. With
@@ -143,15 +150,25 @@ When OAuth-fail or disk-critical detectors fire, monitor automatically issues a
 graceful stop (locally via `api.stop`; remotely via `ssh <host> 'agent-runner stop'`).
 Override with `[monitor]` config block (see configuration.md).
+Flags:
+- `--mode {anomaly,narrate,events,http}` — output mode (default: `anomaly`). `narrate`
+  streams a human-readable narrative; `events` streams raw event JSON; `http` serves
+  a local progress page.
+- `--port PORT` — HTTP port for `--mode http` (default: `8765`, local-only).
+- `--host SSH-ALIAS` — watch a remote agent-runner via ssh (anomaly mode only).
 ```bash
-agent-runner monitor                       # local
+agent-runner monitor                       # local anomaly mode
 agent-runner monitor --host pi             # remote
+agent-runner monitor --mode narrate        # streaming narrative
+agent-runner monitor --mode http --port 9000  # HTTP progress page on port 9000
 agent-runner monitor --json | jq -c        # pipe alerts to a downstream consumer
 ```
 ## 中文摘要
-16 个动词：`init / install / uninstall / start / stop / kill / cancel / restart / status / round / serve / upgrade / peek / watch / events / monitor`。
+16 个动词，完整列表见上方动词表（自动生成）。
 观察类（peek/watch/monitor）三视角对称，全部共用 `--round / --log / --events / --select / --json` 下钻参数。

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/docs/configuration.md RENAMED

@@ -47,6 +47,7 @@ running with newly-set `dirty_action = "auto_commit"` is undefined).
 | `fresh_eyes_every_n` | `int | None` | None |
 | `dry_run` | `bool` | False |
 | `max_grace_after_result_s` | `int` | 0 |
+| `grace_kill_ignore_patterns` | `list[str]` | [] |
 ### `[prompt]`
@@ -200,6 +201,10 @@ Unconfigured phases (and configs without `[phases]`) keep using the global
 ## `[monitor]` (optional, defaults shown)
+> Authoritative field-level defaults are in the generated schema table above
+> (`[monitor]` section). The snippet below shows only the fields most commonly
+> customised, with operational notes.
 ```toml
 [monitor]
 auto_stop_on = ["oauth_fail", "disk_critical"]

{cli_agent_runner-0.1.37 → cli_agent_runner-0.1.39}/docs/long-running-agents.md RENAMED

@@ -140,7 +140,7 @@ token breakdown + cost (where the underlying CLI exposes it).
 ```
 Use as input to a cost-tracking detector or external billing reconciler.
-See `docs/migrations/0.1.28.md` for the current 12-field payload schema
+See `docs/migrations/0.1.28.md` for the current payload schema
 (includes `cache_creation_tokens`, `tool_call_count`, `phase`, `success`)
 plus a consumer dispatcher sketch. Aggregation (rollups, budget warnings)
 is the consumer's responsibility — agent-runner emits raw per-round

cli_agent_runner-0.1.39/docs/migrations/0.1.38.md ADDED

@@ -0,0 +1,53 @@
+# Migrating to 0.1.38
+## TL;DR
+```bash
+pip install --upgrade cli-agent-runner==0.1.38
+```
+No action or config change. If you use `max_grace_after_result_s`, grace-kill
+now distinguishes a hung agent from a still-busy one.
+## What changed
+Previously, grace-kill reaped the whole process group `max_grace_after_result_s`
+seconds after the agent emitted `type=result`, with no awareness of child
+processes. A round that backgrounded a long build and emitted `type=result`
+("waiting for build…") could have the still-running build reaped.
+Now, at grace expiry, agent-runner checks for live worker processes in the
+agent's process group:
+- **No live workers** → genuine hang → reaped (`round_grace_kill`, as before).
+- **A live worker** (e.g. a build) → not reaped; agent-runner emits
+  `round_grace_extended` once and waits until the round finishes or hits the
+  `round_timeout_s` wall-clock ceiling.
+The check re-runs each poll tick, so once a backgrounded worker exits, a still-
+stuck agent is reaped promptly rather than waiting out `round_timeout_s`.
+## Three distinct outcomes
+- `round_grace_extended` — grace elapsed but a worker is still running (busy).
+- `round_grace_kill` — grace elapsed and the process group is idle (hang).
+- `round_timeout_kill` — wall-clock `round_timeout_s` exceeded (hard ceiling).
+## Recommended agent contract
+Treat the safety net as a net, not a crutch: emit `type=result` only when the
+turn is truly done. Run long builds/tests foreground and commit before ending
+the turn, rather than backgrounding work past `type=result`.
+## Known limitation
+The "live worker" check treats *any* live non-zombie descendant as busy. If
+your agent keeps persistent helper subprocesses alive past `type=result` (e.g.
+MCP servers), grace-kill will defer hangs to the `round_timeout_s` ceiling. This
+never false-kills; the `round_grace_extended` events make it visible if it
+happens.
+## What did NOT change
+- `max_grace_after_result_s` config, default, and `0 = disabled`.
+- `round_timeout_s` wall-clock kill (still the hard ceiling and backstop).

cli_agent_runner-0.1.39/docs/migrations/0.1.39.md ADDED

@@ -0,0 +1,74 @@
+# Migrating to 0.1.39
+## TL;DR
+```bash
+pip install --upgrade cli-agent-runner==0.1.39
+```
+**Claude users running 0.1.38**: add one line to `[runtime]` to unblock
+grace-kill against claude's persistent shell-snapshot helper (see below).
+New `agent-runner init --preset=claude` scaffolds get this automatically.
+**Everyone else**: no action.
+## Persistent-helper exclusion (the live fix)
+0.1.38's grace-kill liveness check was correctly conservative — it refused to
+reap a round with live worker children — but claude's `-p` mode keeps a
+persistent Bash-tool shell-snapshot subprocess alive for the whole session.
+That subprocess is not doing work; it's idle infrastructure. 0.1.38 saw it as
+a live worker and deferred every post-result hang to `round_timeout_s` instead
+of reaping at `max_grace_after_result_s`. This is the "persistent-helper
+caveat" 0.1.38's migration doc flagged.
+0.1.39 adds `[runtime] grace_kill_ignore_patterns` — a list of regex patterns;
+child cmdlines matching any pattern (via `re.search`) are excluded from the
+liveness count. `presets/claude.toml` ships a default pattern matching
+claude's shell-snapshot.
+### Existing claude operators — one line
+Add to your `[runtime]` block:
+```toml
+[runtime]
+grace_kill_ignore_patterns = ['\.claude/shell-snapshots/snapshot-bash-']
+```
+Or run `agent-runner init --preset=claude` in a scratch directory and diff
+the generated `agent-runner.toml` against yours.
+After the change, post-result hangs are reaped at `max_grace_after_result_s`.
+Without it, they continue to defer to `round_timeout_s` (the 0.1.38 behavior).
+### Verifying the pattern is firing
+The `round_grace_extended` event payload gains `ignored_children` listing
+cmdlines that matched a pattern. Use it to:
+- confirm the shell-snapshot is being filtered (`ignored_children` non-empty)
+- catch the day claude renames its helper (`live_children` shows a new
+  unfiltered persistent process)
+### Other presets
+`aider.toml` and `gemini.toml` ship no default patterns. Add operator-specific
+patterns to your own `agent-runner.toml` if needed.
+## SSOT consistency hardening (also in 0.1.39)
+A new invariant `test_doc_claims_match_ssot` gates documented counts
+(detectors / defenses / verbs) and config value-sets against code SSOT.
+`commands.md` documents `monitor --mode/--port` and `init --preset`.
+Redundant count guards collapsed to one canonical tripwire each. The unused
+`alert-kinds` docgen renderer was removed. No action required.
+## What did NOT change
+- The 0.1.38 grace-kill liveness semantics (still process-group-based;
+  patterns are an exclusion filter on top).
+- `round_grace_kill` (still fires only when the post-filter live set is empty).
+- `round_timeout_s` (still the hard ceiling).
+- `max_grace_after_result_s` (knob unchanged).
+- For non-claude deployments: zero behavior change.
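
The verification step in the 0.1.39 notes (checking `ignored_children` and `live_children` on `round_grace_extended`) could be scripted along these lines. This is a sketch under assumptions: events are taken to be already-parsed dicts, the `kind` key name is a guess at how the event type is stored, and the sample payloads are invented for illustration. Only the `live_children` and `ignored_children` field names come from the diff.

```python
def audit_grace_events(events: list[dict]) -> dict[str, list[str]]:
    """Collect filtered and unfiltered child cmdlines from grace-extended events."""
    report: dict[str, list[str]] = {"filtered": [], "unfiltered": []}
    for ev in events:
        # "kind" is an assumed key name for the event type.
        if ev.get("kind") != "round_grace_extended":
            continue
        report["filtered"].extend(ev.get("ignored_children", []))
        report["unfiltered"].extend(ev.get("live_children", []))
    return report


# Invented sample payloads, shaped after the fields named in the diff.
sample = [
    {"kind": "round_start", "round_num": 7},
    {
        "kind": "round_grace_extended",
        "round_num": 7,
        "grace_s": 30,
        "live_children": ["make -j8 all"],
        "ignored_children": ["/bin/bash -c source ~/.claude/shell-snapshots/snapshot-bash-1.sh"],
    },
]
report = audit_grace_events(sample)
```

A non-empty `filtered` list suggests the pattern is firing; a long-lived entry that keeps appearing under `unfiltered` is the signal that a helper was renamed and the pattern needs updating.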
