coding-cli-runtime 0.2.0tar.gz → 0.3.0tar.gz

Files changed (42)

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/CHANGELOG.md RENAMED

@@ -6,6 +6,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).
 ## [Unreleased]
+## [0.3.0] - 2026-04-09
+### Added
+- **Headless launch core helpers** — per-provider arg renderers derived from
+  `ProviderContract.headless`: `build_claude_headless_core()`,
+  `build_codex_headless_core()`, `build_copilot_headless_core()`,
+  `build_gemini_headless_core()`. All consumers (app-generation, feather,
+  codex_cli, provider_contracts builder) now delegate to these.
+- `scan_session_dir()` — generic directory-scanning primitive for session log
+  discovery with `extract_fn` callback (internal, not in public `__all__`).
+- Session log discovery section in README.
+- API summary table in README.
+- 27 new Stage 2 tests for headless cores, builder delegation, and
+  `scan_session_dir`.
+### Changed
+- `build_codex_exec_spec()` now delegates to `build_codex_headless_core()`.
+  `full_auto` and `skip_git_repo_check` params preserved and passed through.
+- `_build_non_interactive_run()` now delegates to per-provider headless core
+  helpers instead of assembling flags inline.
+- Feather `report_data.py` and `report_sections.py` use headless core helpers
+  with fallback for environments without `coding_cli_runtime`.
+- Feather `generate_report.py` Codex session discovery replaced with
+  `find_codex_session()` from `coding_cli_runtime`.
+- App-generation `claude_impl.py`, `copilot_impl.py`, `gemini_impl.py`
+  `build_command()` functions delegate to headless core helpers.
+- Dead headless opt-out flags removed from Copilot (`--allow-all`, `--ask-user`,
+  `--use-custom-instructions`) and Gemini (`--auto-approve`) CLI specs —
+  these were never used in batch runs and are now handled by the headless core.
+- README rewritten: user-action feature list, `run_interactive_session` example,
+  `uv add` install, API summary, Contributing link, session log discovery.
 ## [0.2.0] - 2026-04-08
 ### Added

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coding-cli-runtime
-Version: 0.2.0
+Version: 0.3.0
 Summary: Reusable CLI runtime primitives for provider-backed automation workflows
 Author-email: LLM Eval maintainers <llm-eval-maintainers@users.noreply.github.com>
 License-Expression: MIT
@@ -40,17 +40,21 @@ code doesn't need provider-specific subprocess handling.
 **What it does (and why not just `subprocess.run`):**
-- Unified request/result types across all four CLIs
-- Timeout enforcement with graceful process termination
-- Provider-aware failure classification (retryable vs fatal)
-- Built-in model catalog with defaults, reasoning levels, and capabilities
-- Interactive session management for long-running generation tasks
-- Zero runtime dependencies
+- Run any provider CLI with unified request/result types and timeout enforcement
+- Query the model catalog (with user-override and live-cache fallback)
+- Classify failures as retryable vs fatal per provider
+- Look up provider auth, config dirs, and headless launch flags
+- Build non-interactive launch commands without hardcoding provider flags
+- Find session logs after a run (Codex, Claude)
+- Run long-lived sessions with process-group cleanup and transcript mirroring
+- No Python package dependencies — only requires the provider CLIs themselves
 ## Installation
 ```bash
 pip install coding-cli-runtime
+# or
+uv add coding-cli-runtime
 ```
 Requires Python 3.10+.
@@ -65,7 +69,7 @@ from pathlib import Path
 from coding_cli_runtime import CliRunRequest, run_cli_command
 request = CliRunRequest(
-    cmd_parts=("codex", "--model", "o4-mini", "--quiet", "exec", "fix the tests"),
+    cmd_parts=("codex", "--model", "gpt-5.4", "--quiet", "exec", "fix the tests"),
     cwd=Path("/tmp/my-project"),
     timeout_seconds=120,
 )
@@ -180,6 +184,38 @@ can drill into whichever aspect they need. This is reference metadata,
 not a command-construction control plane — consumers keep their own
 command assembly and adopt contract fields selectively.
+### Build headless launch commands
+```python
+from coding_cli_runtime import build_claude_headless_core, build_codex_headless_core
+# Claude: binary + --print + --permission-mode + --dangerously-skip-permissions + --model
+cmd = build_claude_headless_core("claude-sonnet-4-6")
+cmd.extend(["--output-format", "text", "--disallowedTools", "Bash,Task"])
+# Codex: binary + exec + --full-auto + --sandbox + --skip-git-repo-check + --model
+cmd = build_codex_headless_core("gpt-5.4", sandbox_mode="read-only")
+cmd.extend(["-C", str(workdir)])
+```
+Headless core helpers emit the standard flags for non-interactive runs.
+Consumers append app-specific tails (tool restrictions, output paths, etc.).
+### Find session logs after a run
+```python
+import time
+from coding_cli_runtime import find_codex_session, find_claude_session
+# Find the most recent Codex session log for a given working directory
+session = find_codex_session("/path/to/project", since_ts=time.time() - 300)
+if session:
+    print(f"Session log: {session}")  # ~/.codex/sessions/.../conversation.jsonl
+```
+Works for Codex and Claude. Scans provider config directories for session
+files matching the working directory and time window.
 ## Key types
 | Type | Purpose |
@@ -191,11 +227,50 @@ command assembly and adopt contract fields selectively.
 | `ProviderContract` | Structured provider CLI metadata (auth, paths, headless launch) |
 | `FailureClassification` | Classified error with retryable flag and category |
-`run_interactive_session()` manages long-running CLI processes with
-timeout enforcement, process-group cleanup, transcript mirroring, and
-automatic retries. Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are
-required — observability labels like `job_name` and `phase_tag` default to
-sensible values so external callers don't need to invent them.
+### Run long-lived CLI sessions
+For CLI runs that take minutes (e.g., full app generation), use
+`run_interactive_session()` instead of `run_cli_command()`. It adds:
+- Process-group cleanup (kills orphaned child processes on timeout)
+- Transcript mirroring (streams CLI output to a file while the process runs)
+- Automatic retries on transient failures
+```python
+from coding_cli_runtime import run_interactive_session
+result = await run_interactive_session(
+    cmd_parts=("claude", "--print", "--model", "claude-sonnet-4-6"),
+    cwd=workdir,
+    stdin_text=prompt,
+    logger=logger,
+    timeout_seconds=600,
+)
+```
+Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are required.
+Observability labels (`job_name`, `phase_tag`) default to sensible values.
+## API summary
+The full public API is listed in [`__init__.py`](src/coding_cli_runtime/__init__.py).
+Key function groups:
+| Group | Functions |
+|-------|-----------|
+| Execution | `run_cli_command`, `run_cli_command_sync`, `run_interactive_session` |
+| Provider metadata | `get_provider_contract`, `get_provider_spec`, `list_provider_specs` |
+| Contract helpers | `build_env_overlay`, `resolve_config_paths`, `render_prompt`, `resolve_auth` |
+| Headless launch | `build_claude_headless_core`, `build_codex_headless_core`, `build_copilot_headless_core`, `build_gemini_headless_core` |
+| Codex batch | `build_codex_exec_spec` |
+| Failure handling | `classify_provider_failure` |
+| Session logs | `find_codex_session`, `find_claude_session` |
+| Schema | `load_schema`, `validate_payload` |
+| Utilities | `redact_text`, `build_model_id`, `normalize_path_str` |
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and quality checks.
 ## Prerequisites

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/README.md RENAMED

@@ -14,17 +14,21 @@ code doesn't need provider-specific subprocess handling.
 **What it does (and why not just `subprocess.run`):**
-- Unified request/result types across all four CLIs
-- Timeout enforcement with graceful process termination
-- Provider-aware failure classification (retryable vs fatal)
-- Built-in model catalog with defaults, reasoning levels, and capabilities
-- Interactive session management for long-running generation tasks
-- Zero runtime dependencies
+- Run any provider CLI with unified request/result types and timeout enforcement
+- Query the model catalog (with user-override and live-cache fallback)
+- Classify failures as retryable vs fatal per provider
+- Look up provider auth, config dirs, and headless launch flags
+- Build non-interactive launch commands without hardcoding provider flags
+- Find session logs after a run (Codex, Claude)
+- Run long-lived sessions with process-group cleanup and transcript mirroring
+- No Python package dependencies — only requires the provider CLIs themselves
 ## Installation
 ```bash
 pip install coding-cli-runtime
+# or
+uv add coding-cli-runtime
 ```
 Requires Python 3.10+.
@@ -39,7 +43,7 @@ from pathlib import Path
 from coding_cli_runtime import CliRunRequest, run_cli_command
 request = CliRunRequest(
-    cmd_parts=("codex", "--model", "o4-mini", "--quiet", "exec", "fix the tests"),
+    cmd_parts=("codex", "--model", "gpt-5.4", "--quiet", "exec", "fix the tests"),
     cwd=Path("/tmp/my-project"),
     timeout_seconds=120,
 )
@@ -154,6 +158,38 @@ can drill into whichever aspect they need. This is reference metadata,
 not a command-construction control plane — consumers keep their own
 command assembly and adopt contract fields selectively.
+### Build headless launch commands
+```python
+from coding_cli_runtime import build_claude_headless_core, build_codex_headless_core
+# Claude: binary + --print + --permission-mode + --dangerously-skip-permissions + --model
+cmd = build_claude_headless_core("claude-sonnet-4-6")
+cmd.extend(["--output-format", "text", "--disallowedTools", "Bash,Task"])
+# Codex: binary + exec + --full-auto + --sandbox + --skip-git-repo-check + --model
+cmd = build_codex_headless_core("gpt-5.4", sandbox_mode="read-only")
+cmd.extend(["-C", str(workdir)])
+```
+Headless core helpers emit the standard flags for non-interactive runs.
+Consumers append app-specific tails (tool restrictions, output paths, etc.).
+### Find session logs after a run
+```python
+import time
+from coding_cli_runtime import find_codex_session, find_claude_session
+# Find the most recent Codex session log for a given working directory
+session = find_codex_session("/path/to/project", since_ts=time.time() - 300)
+if session:
+    print(f"Session log: {session}")  # ~/.codex/sessions/.../conversation.jsonl
+```
+Works for Codex and Claude. Scans provider config directories for session
+files matching the working directory and time window.
 ## Key types
 | Type | Purpose |
@@ -165,11 +201,50 @@ command assembly and adopt contract fields selectively.
 | `ProviderContract` | Structured provider CLI metadata (auth, paths, headless launch) |
 | `FailureClassification` | Classified error with retryable flag and category |
-`run_interactive_session()` manages long-running CLI processes with
-timeout enforcement, process-group cleanup, transcript mirroring, and
-automatic retries. Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are
-required — observability labels like `job_name` and `phase_tag` default to
-sensible values so external callers don't need to invent them.
+### Run long-lived CLI sessions
+For CLI runs that take minutes (e.g., full app generation), use
+`run_interactive_session()` instead of `run_cli_command()`. It adds:
+- Process-group cleanup (kills orphaned child processes on timeout)
+- Transcript mirroring (streams CLI output to a file while the process runs)
+- Automatic retries on transient failures
+```python
+from coding_cli_runtime import run_interactive_session
+result = await run_interactive_session(
+    cmd_parts=("claude", "--print", "--model", "claude-sonnet-4-6"),
+    cwd=workdir,
+    stdin_text=prompt,
+    logger=logger,
+    timeout_seconds=600,
+)
+```
+Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are required.
+Observability labels (`job_name`, `phase_tag`) default to sensible values.
+## API summary
+The full public API is listed in [`__init__.py`](src/coding_cli_runtime/__init__.py).
+Key function groups:
+| Group | Functions |
+|-------|-----------|
+| Execution | `run_cli_command`, `run_cli_command_sync`, `run_interactive_session` |
+| Provider metadata | `get_provider_contract`, `get_provider_spec`, `list_provider_specs` |
+| Contract helpers | `build_env_overlay`, `resolve_config_paths`, `render_prompt`, `resolve_auth` |
+| Headless launch | `build_claude_headless_core`, `build_codex_headless_core`, `build_copilot_headless_core`, `build_gemini_headless_core` |
+| Codex batch | `build_codex_exec_spec` |
+| Failure handling | `classify_provider_failure` |
+| Session logs | `find_codex_session`, `find_claude_session` |
+| Schema | `load_schema`, `validate_payload` |
+| Utilities | `redact_text`, `build_model_id`, `normalize_path_str` |
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and quality checks.
 ## Prerequisites

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "coding-cli-runtime"
-version = "0.2.0"
+version = "0.3.0"
 description = "Reusable CLI runtime primitives for provider-backed automation workflows"
 readme = {file = "README.md", content-type = "text/markdown"}
 license = "MIT"
@@ -94,7 +94,7 @@ disallow_untyped_defs = false
 warn_return_any = false
 [tool.bumpversion]
-current_version = "0.2.0"
+current_version = "0.3.0"
 parse = "(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)"
 serialize = ["{major}.{minor}.{patch}"]
 commit = true

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/src/coding_cli_runtime/__init__.py RENAMED

@@ -2,7 +2,7 @@
 from __future__ import annotations
-__version__ = "0.2.0"
+__version__ = "0.3.0"
 from .auth import AuthResolution, resolve_auth
 from .codex_cli import CodexExecSpec, build_codex_exec_spec
@@ -15,6 +15,12 @@ from .contracts import (
     ErrorCode,
 )
 from .failure_classification import FailureClassification, classify_provider_failure
+from .headless import (
+    build_claude_headless_core,
+    build_codex_headless_core,
+    build_copilot_headless_core,
+    build_gemini_headless_core,
+)
 from .provider_contracts import (
     ApprovalContract,
     AuthContract,
@@ -97,7 +103,12 @@ __all__ = [
     "SessionRetryDecision",
     "SessionExecutionTimeoutError",
     "TranscriptMirrorStrategy",
+    "build_claude_headless_core",
+    "build_codex_exec_spec",
+    "build_codex_headless_core",
+    "build_copilot_headless_core",
     "build_env_overlay",
+    "build_gemini_headless_core",
     "get_claude_default_model",
     "get_claude_effort_levels",
     "get_claude_model_candidates",
@@ -112,7 +123,6 @@ __all__ = [
     "get_provider_spec",
     "list_provider_specs",
     "build_model_id",
-    "build_codex_exec_spec",
     "classify_provider_failure",
     "load_schema",
     "render_prompt",

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/src/coding_cli_runtime/codex_cli.py RENAMED

@@ -60,18 +60,20 @@ def build_codex_exec_spec(
         model_controls=model_controls,
     )
     reasoning_config_value = json.dumps(effective_reasoning)
-    cmd_parts: list[str] = [str(codex_bin), "exec"]
+    from .headless import build_codex_headless_core
+    cmd_parts: list[str] = build_codex_headless_core(
+        model,
+        binary=str(codex_bin),
+        sandbox_mode=sandbox if sandbox else None,
+        full_auto=full_auto,
+        skip_git_repo_check=skip_git_repo_check,
+    )
     if json_output:
         cmd_parts.append("--json")
-    if full_auto:
-        cmd_parts.append("--full-auto")
-    cmd_parts.extend(["--sandbox", sandbox])
-    if skip_git_repo_check:
-        cmd_parts.append("--skip-git-repo-check")
     cmd_parts.extend(
         [
-            "--model",
-            model,
             "--config",
             f"model_reasoning_effort={reasoning_config_value}",
             "-C",
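
Reviewer note: one visible effect of this hunk is that `--json` is now appended after the headless core rather than right after `exec`, so it lands behind `--model`. The sketch below replays both assemblies side by side. It is a minimal stand-in, not the package code: `old_order` and `new_order` are hypothetical names, and `new_order` assumes the contract yields the same Codex flags that the removed lines hardcoded.

```python
def old_order(model, sandbox, *, json_output=True, full_auto=True,
              skip_git_repo_check=True):
    """Replay of the removed 0.2.0 assembly (flags appended inline)."""
    cmd = ["codex", "exec"]
    if json_output:
        cmd.append("--json")
    if full_auto:
        cmd.append("--full-auto")
    cmd.extend(["--sandbox", sandbox])
    if skip_git_repo_check:
        cmd.append("--skip-git-repo-check")
    cmd.extend(["--model", model])
    return cmd


def new_order(model, sandbox, *, json_output=True, full_auto=True,
              skip_git_repo_check=True):
    """Replay of the 0.3.0 assembly: headless core first, then --json.

    Assumption: the core emits the same flags the old code hardcoded.
    """
    core = ["codex", "exec"]
    if full_auto:
        core.append("--full-auto")
    core.extend(["--sandbox", sandbox])
    if skip_git_repo_check:
        core.append("--skip-git-repo-check")
    core.extend(["--model", model])
    if json_output:
        core.append("--json")
    return core


before = old_order("gpt-5.4", "read-only")
after = new_order("gpt-5.4", "read-only")
print(before)
print(after)
```

Under these assumptions the two lists contain the same flags in a different order. Any test or log parser that compares `cmd_parts` positionally would notice the change; the diff itself does not show whether the Codex CLI is sensitive to it.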

coding_cli_runtime-0.3.0/src/coding_cli_runtime/headless.py ADDED

@@ -0,0 +1,124 @@
+"""Per-provider headless launch core helpers.
+Each helper emits the standard headless launch args for its provider,
+derived from ``ProviderContract.headless``. Consumers append their own
+app-specific tails (tool restrictions, output paths, prompt, etc.).
+These helpers are the canonical source for headless launch flag assembly
+within ``coding_cli_runtime``. In-repo consumers (feather, codex_cli,
+provider_contracts builder) delegate to them. App-generation provider
+wrappers may still assemble flags directly when their command construction
+is interleaved with consumer-specific logic (reasoning config, output
+format, artifact paths).
+"""
+from __future__ import annotations
+from .provider_contracts import get_provider_contract
+def build_claude_headless_core(
+    model: str,
+    *,
+    binary: str | None = None,
+    permission_mode: str | None = None,
+    skip_permissions: bool = True,
+) -> list[str]:
+    """Build Claude headless launch core args.
+    Returns args up to and including ``--model``. Does NOT include prompt,
+    output format, tool restrictions, or other app-specific flags.
+    """
+    contract = get_provider_contract("claude")
+    h = contract.headless
+    cmd: list[str] = [binary or contract.binary]
+    cmd.extend(h.activation_args)
+    if h.approval.permission_mode_flag:
+        mode = permission_mode or h.approval.default_permission_mode
+        if mode:
+            cmd.extend([h.approval.permission_mode_flag, mode])
+    if skip_permissions and h.approval.flag:
+        cmd.append(h.approval.flag)
+    cmd.extend(["--model", model])
+    return cmd
+def build_codex_headless_core(
+    model: str,
+    *,
+    binary: str | None = None,
+    sandbox_mode: str | None = None,
+    full_auto: bool = True,
+    skip_git_repo_check: bool = True,
+) -> list[str]:
+    """Build Codex headless launch core args.
+    Returns args including ``exec``, ``--full-auto``, ``--sandbox``,
+    ``--skip-git-repo-check``, and ``--model``. Does NOT include
+    ``-C``, ``-o``, ``--output-schema``, or reasoning config.
+    Args:
+        full_auto: Include ``--full-auto`` (default True).
+        skip_git_repo_check: Include ``--skip-git-repo-check`` (default True).
+    """
+    contract = get_provider_contract("codex")
+    h = contract.headless
+    cmd: list[str] = [binary or contract.binary]
+    cmd.extend(h.activation_args)
+    if full_auto and h.noninteractive_mode_flag:
+        cmd.append(h.noninteractive_mode_flag)
+    if h.sandbox is not None:
+        mode = sandbox_mode or h.sandbox.writable_mode
+        cmd.extend([h.sandbox.flag, mode])
+    if skip_git_repo_check and h.requires_git_repo and h.skip_git_repo_flag:
+        cmd.append(h.skip_git_repo_flag)
+    cmd.extend(["--model", model])
+    return cmd
+def build_copilot_headless_core(
+    model: str,
+    *,
+    binary: str | None = None,
+    stream: str | None = None,
+) -> list[str]:
+    """Build Copilot headless launch core args.
+    Returns args including activation (``--no-ask-user``,
+    ``--no-custom-instructions``), ``--allow-all``, ``--stream``,
+    and ``--model``. Does NOT include ``-p``, ``--share``, or
+    force-implementation.
+    """
+    contract = get_provider_contract("copilot")
+    h = contract.headless
+    cmd: list[str] = [binary or contract.binary]
+    cmd.extend(h.activation_args)
+    if h.approval.flag:
+        cmd.append(h.approval.flag)
+    cmd.extend(["--model", model])
+    if h.stream_flag:
+        stream_value = stream or h.default_stream_mode
+        if stream_value:
+            cmd.extend([h.stream_flag, stream_value])
+    return cmd
+def build_gemini_headless_core(
+    model: str,
+    *,
+    binary: str | None = None,
+) -> list[str]:
+    """Build Gemini headless launch core args.
+    Returns args including approval flag (``--yolo``) and ``--model``.
+    Does NOT include ``--prompt ""`` activation (that's part of prompt
+    transport, handled by ``render_prompt()``).
+    """
+    contract = get_provider_contract("gemini")
+    h = contract.headless
+    cmd: list[str] = [binary or contract.binary]
+    cmd.extend(h.activation_args)
+    if h.approval.flag:
+        cmd.append(h.approval.flag)
+    cmd.extend(["--model", model])
+    return cmd
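
Reviewer note: the helpers in this file all follow the same pattern: read flags from `ProviderContract.headless`, emit the core args, and leave the tail to the caller. The following is a minimal sketch of that pattern for Codex, runnable without the package installed. `SandboxStub`, `CodexHeadlessStub`, and `build_codex_core_sketch` are hypothetical stand-ins, and the stub values (in particular `workspace-write` as the default sandbox mode) are assumptions, not values shown in the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SandboxStub:
    # Hypothetical stand-in for ProviderContract.headless.sandbox.
    flag: str = "--sandbox"
    writable_mode: str = "workspace-write"  # assumed default value


@dataclass(frozen=True)
class CodexHeadlessStub:
    # Hypothetical stand-in for ProviderContract.headless (Codex).
    activation_args: tuple = ("exec",)
    noninteractive_mode_flag: str = "--full-auto"
    sandbox: Optional[SandboxStub] = SandboxStub()
    requires_git_repo: bool = True
    skip_git_repo_flag: str = "--skip-git-repo-check"


def build_codex_core_sketch(model, *, h=CodexHeadlessStub(), binary="codex",
                            sandbox_mode=None, full_auto=True,
                            skip_git_repo_check=True):
    """Mirror the branch order of build_codex_headless_core() above."""
    cmd = [binary, *h.activation_args]
    if full_auto and h.noninteractive_mode_flag:
        cmd.append(h.noninteractive_mode_flag)
    if h.sandbox is not None:
        cmd.extend([h.sandbox.flag, sandbox_mode or h.sandbox.writable_mode])
    if skip_git_repo_check and h.requires_git_repo and h.skip_git_repo_flag:
        cmd.append(h.skip_git_repo_flag)
    cmd.extend(["--model", model])
    return cmd


# Core args first, then the consumer appends its own tail.
core = build_codex_core_sketch("gpt-5.4", sandbox_mode="read-only")
full = core + ["-C", "/tmp/my-project"]
bare = build_codex_core_sketch("gpt-5.4", full_auto=False,
                               skip_git_repo_check=False)
print(full)
print(bare)
```

Because every flag comes from the contract object, a provider CLI renaming a flag would only require a contract update, which appears to be the motivation for routing all consumers through these helpers.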

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/src/coding_cli_runtime/provider_contracts.py RENAMED

@@ -406,47 +406,53 @@ def _build_non_interactive_run(
     stream: str | None = None,
     extra_flags: tuple[str, ...] = (),
 ) -> NonInteractiveRunSpec:
-    """Build a non-interactive CLI run spec. Internal convenience."""
-    contract = get_provider_contract(provider_id)
-    h = contract.headless
-    bin_name = binary or contract.binary
-    cmd: list[str] = [bin_name]
-    # Headless activation (e.g. "--print" for Claude, "exec" for Codex)
-    cmd.extend(h.activation_args)
-    # Non-interactive mode flag (e.g. "--full-auto" for Codex)
-    if h.noninteractive_mode_flag:
-        cmd.append(h.noninteractive_mode_flag)
-    # Sandbox (Codex)
-    if h.sandbox is not None:
-        mode = codex_sandbox_mode or h.sandbox.writable_mode
-        cmd.extend([h.sandbox.flag, mode])
-    # Git repo bypass
-    if h.requires_git_repo and h.skip_git_repo_flag:
-        cmd.append(h.skip_git_repo_flag)
+    """Build a non-interactive CLI run spec. Internal convenience.
-    # Approval
-    if h.approval.flag:
-        cmd.append(h.approval.flag)
-    # Permission mode (Claude)
-    if h.approval.permission_mode_flag:
-        mode_value = permission_mode or h.approval.default_permission_mode
-        if mode_value:
-            cmd.extend([h.approval.permission_mode_flag, mode_value])
+    Delegates headless core arg assembly to ``headless.build_*_headless_core()``
+    helpers, which derive flags from ``ProviderContract.headless``.
+    """
+    from .headless import (
+        build_claude_headless_core,
+        build_codex_headless_core,
+        build_copilot_headless_core,
+        build_gemini_headless_core,
+    )
-    # Model
-    cmd.extend(["--model", model])
+    contract = get_provider_contract(provider_id)
+    h = contract.headless
+    key = provider_id.strip().lower()
-    # Stream (Copilot)
-    if h.stream_flag:
-        stream_value = stream or h.default_stream_mode
-        if stream_value:
-            cmd.extend([h.stream_flag, stream_value])
+    # Headless core (binary + activation + approval + model + stream)
+    if key == "claude":
+        cmd = build_claude_headless_core(model, binary=binary, permission_mode=permission_mode)
+    elif key == "codex":
+        cmd = build_codex_headless_core(model, binary=binary, sandbox_mode=codex_sandbox_mode)
+    elif key == "copilot":
+        cmd = build_copilot_headless_core(model, binary=binary, stream=stream)
+    elif key == "gemini":
+        cmd = build_gemini_headless_core(model, binary=binary)
+    else:
+        # Fallback for unknown providers — generic assembly
+        bin_name = binary or contract.binary
+        cmd = [bin_name, *h.activation_args]
+        if h.noninteractive_mode_flag:
+            cmd.append(h.noninteractive_mode_flag)
+        if h.sandbox is not None:
+            mode = codex_sandbox_mode or h.sandbox.writable_mode
+            cmd.extend([h.sandbox.flag, mode])
+        if h.requires_git_repo and h.skip_git_repo_flag:
+            cmd.append(h.skip_git_repo_flag)
+        if h.approval.flag:
+            cmd.append(h.approval.flag)
+        if h.approval.permission_mode_flag:
+            mode_value = permission_mode or h.approval.default_permission_mode
+            if mode_value:
+                cmd.extend([h.approval.permission_mode_flag, mode_value])
+        cmd.extend(["--model", model])
+        if h.stream_flag:
+            stream_value = stream or h.default_stream_mode
+            if stream_value:
+                cmd.extend([h.stream_flag, stream_value])
     # Prompt
     payload = render_prompt(h.prompt, prompt)

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/src/coding_cli_runtime/session_logs.py RENAMED

@@ -5,7 +5,11 @@ from __future__ import annotations
 import json
 import os
 import re
+from collections.abc import Callable
 from pathlib import Path
+from typing import TypeVar
+_T = TypeVar("_T")
 def normalize_path_str(path_str: str) -> str:
@@ -15,6 +19,58 @@ def normalize_path_str(path_str: str) -> str:
         return os.path.normpath(path_str)
+# ---------------------------------------------------------------------------
+# Generic session-directory scanning primitive
+# ---------------------------------------------------------------------------
+def scan_session_dir(
+    directory: Path,
+    *,
+    glob_pattern: str = "*.jsonl",
+    since_ts: float,
+    mtime_buffer: float = 15.0,
+    extract_fn: Callable[[Path], _T | None],
+    max_candidates: int = 200,
+) -> list[tuple[float, Path, _T]]:
+    """Scan a directory for session files, filter by mtime, extract metadata.
+    Returns a list of ``(mtime, path, extracted)`` tuples sorted by mtime
+    descending. Provider-specific ranking/selection stays with the caller.
+    Args:
+        directory: Directory to scan.
+        glob_pattern: Glob pattern for session files (default: ``*.jsonl``).
+        since_ts: Only include files with mtime >= ``since_ts - mtime_buffer``.
+        mtime_buffer: Seconds of slack before ``since_ts`` (default: 15).
+        extract_fn: Called on each candidate path. Return ``None`` to skip.
+        max_candidates: Max number of candidates to process after mtime filter.
+    """
+    if not directory.exists():
+        return []
+    candidates: list[tuple[float, Path]] = []
+    try:
+        for path in directory.rglob(glob_pattern):
+            try:
+                mtime = path.stat().st_mtime
+            except OSError:
+                continue
+            if mtime >= since_ts - mtime_buffer:
+                candidates.append((mtime, path))
+    except (OSError, RuntimeError):
+        return []
+    candidates.sort(key=lambda item: item[0], reverse=True)
+    results: list[tuple[float, Path, _T]] = []
+    for mtime, path in candidates[:max_candidates]:
+        extracted = extract_fn(path)
+        if extracted is not None:
+            results.append((mtime, path, extracted))
+    return results
 def codex_session_roots() -> list[Path]:
     base = Path.home() / ".codex"
     return [base / "sessions", base / "archived_sessions"]
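
Reviewer note: `scan_session_dir()` is internal (not in `__all__`), so the sketch below carries its own condensed copy of the function from this hunk to stay runnable on its own. It shows how an `extract_fn` callback filters candidates. `read_cwd` and the `"cwd"` field in the first JSONL line are hypothetical; the diff does not show what the real Codex or Claude extractors read.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional, TypeVar

_T = TypeVar("_T")


def scan_session_dir(directory: Path, *, glob_pattern: str = "*.jsonl",
                     since_ts: float, mtime_buffer: float = 15.0,
                     extract_fn: Callable[[Path], Optional[_T]],
                     max_candidates: int = 200):
    """Condensed copy of the function added in this hunk."""
    if not directory.exists():
        return []
    candidates = []
    try:
        for path in directory.rglob(glob_pattern):
            try:
                mtime = path.stat().st_mtime
            except OSError:
                continue
            if mtime >= since_ts - mtime_buffer:
                candidates.append((mtime, path))
    except (OSError, RuntimeError):
        return []
    candidates.sort(key=lambda item: item[0], reverse=True)
    results = []
    for mtime, path in candidates[:max_candidates]:
        extracted = extract_fn(path)
        if extracted is not None:
            results.append((mtime, path, extracted))
    return results


def read_cwd(path: Path) -> Optional[str]:
    # Hypothetical extractor: assumes line 1 is a JSON object with "cwd".
    try:
        first = path.read_text(encoding="utf-8").splitlines()[0]
        return json.loads(first).get("cwd")
    except (OSError, IndexError, ValueError, AttributeError):
        return None  # returning None tells the scanner to skip this file


now = time.time()
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fresh = root / "2026" / "fresh.jsonl"  # nested: rglob recurses
    fresh.parent.mkdir(parents=True)
    fresh.write_text(json.dumps({"cwd": "/path/to/project"}) + "\n",
                     encoding="utf-8")
    stale = root / "stale.jsonl"  # too old: dropped by the mtime filter
    stale.write_text(json.dumps({"cwd": "/old"}) + "\n", encoding="utf-8")
    os.utime(stale, (now - 3600, now - 3600))
    broken = root / "broken.jsonl"  # recent but unparsable: extractor skips it
    broken.write_text("not json\n", encoding="utf-8")

    hits = scan_session_dir(root, since_ts=now - 300, extract_fn=read_cwd)
    found = [(p.name, cwd) for _, p, cwd in hits]

print(found)
```

Note that `max_candidates` is applied after the mtime filter but before extraction, so files rejected by `extract_fn` still count against the limit.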

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/src/coding_cli_runtime.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: coding-cli-runtime
-Version: 0.2.0
+Version: 0.3.0
 Summary: Reusable CLI runtime primitives for provider-backed automation workflows
 Author-email: LLM Eval maintainers <llm-eval-maintainers@users.noreply.github.com>
 License-Expression: MIT
@@ -40,17 +40,21 @@ code doesn't need provider-specific subprocess handling.
 **What it does (and why not just `subprocess.run`):**
-- Unified request/result types across all four CLIs
-- Timeout enforcement with graceful process termination
-- Provider-aware failure classification (retryable vs fatal)
-- Built-in model catalog with defaults, reasoning levels, and capabilities
-- Interactive session management for long-running generation tasks
-- Zero runtime dependencies
+- Run any provider CLI with unified request/result types and timeout enforcement
+- Query the model catalog (with user-override and live-cache fallback)
+- Classify failures as retryable vs fatal per provider
+- Look up provider auth, config dirs, and headless launch flags
+- Build non-interactive launch commands without hardcoding provider flags
+- Find session logs after a run (Codex, Claude)
+- Run long-lived sessions with process-group cleanup and transcript mirroring
+- No Python package dependencies — only requires the provider CLIs themselves
 ## Installation
 ```bash
 pip install coding-cli-runtime
+# or
+uv add coding-cli-runtime
 ```
 Requires Python 3.10+.
@@ -65,7 +69,7 @@ from pathlib import Path
 from coding_cli_runtime import CliRunRequest, run_cli_command
 request = CliRunRequest(
-    cmd_parts=("codex", "--model", "o4-mini", "--quiet", "exec", "fix the tests"),
+    cmd_parts=("codex", "--model", "gpt-5.4", "--quiet", "exec", "fix the tests"),
     cwd=Path("/tmp/my-project"),
     timeout_seconds=120,
 )
@@ -180,6 +184,38 @@ can drill into whichever aspect they need. This is reference metadata,
 not a command-construction control plane — consumers keep their own
 command assembly and adopt contract fields selectively.
+### Build headless launch commands
+```python
+from coding_cli_runtime import build_claude_headless_core, build_codex_headless_core
+# Claude: binary + --print + --permission-mode + --dangerously-skip-permissions + --model
+cmd = build_claude_headless_core("claude-sonnet-4-6")
+cmd.extend(["--output-format", "text", "--disallowedTools", "Bash,Task"])
+# Codex: binary + exec + --full-auto + --sandbox + --skip-git-repo-check + --model
+cmd = build_codex_headless_core("gpt-5.4", sandbox_mode="read-only")
+cmd.extend(["-C", str(workdir)])
+```
+Headless core helpers emit the standard flags for non-interactive runs.
+Consumers append app-specific tails (tool restrictions, output paths, etc.).
+### Find session logs after a run
+```python
+import time
+from coding_cli_runtime import find_codex_session, find_claude_session
+# Find the most recent Codex session log for a given working directory
+session = find_codex_session("/path/to/project", since_ts=time.time() - 300)
+if session:
+    print(f"Session log: {session}")  # ~/.codex/sessions/.../conversation.jsonl
+```
+Works for Codex and Claude. Scans provider config directories for session
+files matching the working directory and time window.
 ## Key types
 | Type | Purpose |
@@ -191,11 +227,50 @@ command assembly and adopt contract fields selectively.
 | `ProviderContract` | Structured provider CLI metadata (auth, paths, headless launch) |
 | `FailureClassification` | Classified error with retryable flag and category |
-`run_interactive_session()` manages long-running CLI processes with
-timeout enforcement, process-group cleanup, transcript mirroring, and
-automatic retries. Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are
-required — observability labels like `job_name` and `phase_tag` default to
-sensible values so external callers don't need to invent them.
+### Run long-lived CLI sessions
+For CLI runs that take minutes (e.g., full app generation), use
+`run_interactive_session()` instead of `run_cli_command()`. It adds:
+- Process-group cleanup (kills orphaned child processes on timeout)
+- Transcript mirroring (streams CLI output to a file while the process runs)
+- Automatic retries on transient failures
+```python
+from coding_cli_runtime import run_interactive_session
+result = await run_interactive_session(
+    cmd_parts=("claude", "--print", "--model", "claude-sonnet-4-6"),
+    cwd=workdir,
+    stdin_text=prompt,
+    logger=logger,
+    timeout_seconds=600,
+)
+```
+Only `cmd_parts`, `cwd`, `stdin_text`, and `logger` are required.
+Observability labels (`job_name`, `phase_tag`) default to sensible values.
+## API summary
+The full public API is listed in [`__init__.py`](src/coding_cli_runtime/__init__.py).
+Key function groups:
+| Group | Functions |
+|-------|-----------|
+| Execution | `run_cli_command`, `run_cli_command_sync`, `run_interactive_session` |
+| Provider metadata | `get_provider_contract`, `get_provider_spec`, `list_provider_specs` |
+| Contract helpers | `build_env_overlay`, `resolve_config_paths`, `render_prompt`, `resolve_auth` |
+| Headless launch | `build_claude_headless_core`, `build_codex_headless_core`, `build_copilot_headless_core`, `build_gemini_headless_core` |
+| Codex batch | `build_codex_exec_spec` |
+| Failure handling | `classify_provider_failure` |
+| Session logs | `find_codex_session`, `find_claude_session` |
+| Schema | `load_schema`, `validate_payload` |
+| Utilities | `redact_text`, `build_model_id`, `normalize_path_str` |
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and quality checks.
 ## Prerequisites
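
Note on the README hunk above: it lists process-group cleanup on timeout as one of the things `run_interactive_session()` adds over `run_cli_command()`. The diff does not show how the package implements this, so the following is only a minimal standard-library sketch of the general technique on POSIX systems. The helper name `run_with_group_timeout` and its return shape are hypothetical and not part of `coding_cli_runtime`.

```python
import os
import signal
import subprocess
import sys


def run_with_group_timeout(cmd_parts, timeout_seconds):
    """Run a command in its own process group and kill the group on timeout.

    Hypothetical illustration only (POSIX); not the coding_cli_runtime API.
    Returns a (returncode, timed_out) tuple.
    """
    # start_new_session=True makes the child the leader of a new session and
    # process group, so its process-group id equals its pid.
    proc = subprocess.Popen(
        cmd_parts,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    try:
        proc.wait(timeout=timeout_seconds)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        # Signal the whole group so that children spawned by the CLI are
        # terminated as well, instead of being left behind as orphans.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()
        return proc.returncode, True


if __name__ == "__main__":
    sleeper = (sys.executable, "-c", "import time; time.sleep(30)")
    print(run_with_group_timeout(sleeper, timeout_seconds=0.5))
```

Killing only the direct child would leave any grandchildren running, which is why the sketch signals the group rather than calling `proc.kill()`.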

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/src/coding_cli_runtime.egg-info/SOURCES.txt RENAMED

@@ -11,6 +11,7 @@ src/coding_cli_runtime/contracts.py
 src/coding_cli_runtime/copilot_reasoning_baseline.json
 src/coding_cli_runtime/copilot_reasoning_logs.py
 src/coding_cli_runtime/failure_classification.py
+src/coding_cli_runtime/headless.py
 src/coding_cli_runtime/json_io.py
 src/coding_cli_runtime/provider_contracts.py
 src/coding_cli_runtime/provider_controls.py
@@ -35,4 +36,5 @@ tests/test_packaging.py
 tests/test_playground_probe_smoke.py
 tests/test_provider_catalog_resolution.py
 tests/test_provider_contracts.py
-tests/test_runtime_parity.py
+tests/test_runtime_parity.py
+tests/test_stage2_tier1.py

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/tests/test_coverage_gaps.py RENAMED

@@ -163,11 +163,19 @@ class TestCodexCli:
         spec = self._build(json_output=False)
         assert "--json" not in spec.cmd_parts
-    def test_no_full_auto(self) -> None:
+    def test_full_auto_always_included(self) -> None:
+        spec = self._build()
+        assert "--full-auto" in spec.cmd_parts
+    def test_skip_git_repo_check_always_included(self) -> None:
+        spec = self._build()
+        assert "--skip-git-repo-check" in spec.cmd_parts
+    def test_full_auto_false_omits_flag(self) -> None:
         spec = self._build(full_auto=False)
         assert "--full-auto" not in spec.cmd_parts
-    def test_no_skip_git_repo_check(self) -> None:
+    def test_skip_git_repo_check_false_omits_flag(self) -> None:
         spec = self._build(skip_git_repo_check=False)
         assert "--skip-git-repo-check" not in spec.cmd_parts
@@ -475,7 +483,7 @@ class TestProviderControls:
         assert "gpt-5.4" in result
     def test_build_model_id_empty_controls(self) -> None:
-        assert build_model_id("claude-4-6", applied_controls={}) == "claude-4-6"
+        assert build_model_id("claude-sonnet-4-6", applied_controls={}) == "claude-sonnet-4-6"
 # ── auth ──────────────────────────────────────────────────────────────
@@ -589,10 +597,10 @@ class TestProviderControlsDeep:
     def test_build_model_id_multiple_controls(self) -> None:
         result = build_model_id(
-            "claude-4-6",
+            "claude-sonnet-4-6",
             applied_controls={"effort": "medium", "thinking_tokens": 8192},
         )
-        assert "claude-4-6:" in result
+        assert "claude-sonnet-4-6:" in result
         assert "effort=medium" in result
         assert "thinking_tokens=8192" in result
@@ -630,7 +638,7 @@ class TestSchemaValidationDeep:
     def test_type_check_boolean_not_integer(self) -> None:
         payload = {
             "provider": "claude",
-            "model": "claude-4-6",
+            "model": "claude-sonnet-4-6",
             "run_id": "test",
             "status": "completed",
             "error_code": "none",
@@ -642,7 +650,7 @@ class TestSchemaValidationDeep:
     def test_null_type_accepted(self) -> None:
         payload = {
             "provider": "claude",
-            "model": "claude-4-6",
+            "model": "claude-sonnet-4-6",
             "run_id": "test",
             "status": "completed",
             "error_code": "none",
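
Note on the `build_model_id` tests in this file: they only pin down three properties, namely that empty controls return the model name unchanged, that a non-empty result contains the model name followed by `:`, and that each control appears as `key=value`. A minimal sketch consistent with those assertions might look like the following. The function name `sketch_model_id`, the comma separator, and the sorted key order are assumptions made for illustration; the diff does not show the real implementation.

```python
def sketch_model_id(model, applied_controls):
    """Render a model id with its applied controls appended.

    Hypothetical stand-in for build_model_id; the separator and the key
    ordering are assumptions, not taken from the package.
    """
    if not applied_controls:
        # Matches the empty-controls test: the model name is returned as-is.
        return model
    rendered = ",".join(
        f"{key}={value}" for key, value in sorted(applied_controls.items())
    )
    return f"{model}:{rendered}"


if __name__ == "__main__":
    print(sketch_model_id("claude-sonnet-4-6", {}))
    print(
        sketch_model_id(
            "claude-sonnet-4-6",
            {"effort": "medium", "thinking_tokens": 8192},
        )
    )
```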

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/tests/test_packaging.py RENAMED

@@ -34,11 +34,12 @@ def test_builds_wheel_and_sdist(tmp_path) -> None:
     wheel_path = wheel_paths[0]
     sdist_path = sdist_paths[0]
     # Read version from pyproject.toml so this test doesn't break on bumps
-    import tomllib
+    import re
     pyproject = package_root / "pyproject.toml"
-    with open(pyproject, "rb") as f:
-        version = tomllib.load(f)["project"]["version"]
+    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject.read_text(), re.MULTILINE)
+    assert match, "Could not find version in pyproject.toml"
+    version = match.group(1)
     assert wheel_path.name.startswith(f"coding_cli_runtime-{version}-")
     assert sdist_path.name == f"coding_cli_runtime-{version}.tar.gz"
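
Note on the hunk above: the test now reads the version with a regular expression instead of `tomllib`, which is only in the standard library from Python 3.11 onward (presumably the motivation, although the diff does not say so). The sketch below runs the same pattern against a made-up `pyproject.toml` fragment; the sample text and the `read_version` helper are illustrative assumptions, not content from the package.

```python
import re

# Made-up sample for illustration; the real pyproject.toml is not shown in the diff.
SAMPLE_PYPROJECT = '''\
[project]
name = "coding-cli-runtime"
version = "0.3.0"
description = "example"
'''


def read_version(pyproject_text):
    """Return the first top-of-line `version = "..."` value in the text."""
    # re.MULTILINE lets ^ match at the start of every line, not just the file.
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    if match is None:
        raise ValueError("Could not find version in pyproject.toml")
    return match.group(1)


if __name__ == "__main__":
    print(read_version(SAMPLE_PYPROJECT))  # prints 0.3.0
```

One trade-off worth knowing: the pattern is not table-aware, so it returns the first line that starts with `version`, even if that line sits in a table other than `[project]`.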

{coding_cli_runtime-0.2.0 → coding_cli_runtime-0.3.0}/tests/test_provider_contracts.py RENAMED

@@ -229,11 +229,11 @@ def test_render_prompt_flag_delivery_missing_flag_raises() -> None:
 def test_builder_claude_command_shape() -> None:
-    spec = _build_non_interactive_run("claude", model="claude-4-6", prompt="do stuff")
+    spec = _build_non_interactive_run("claude", model="claude-sonnet-4-6", prompt="do stuff")
     assert spec.cmd_parts[0] == "claude"
     assert "--print" in spec.cmd_parts
     assert "--model" in spec.cmd_parts
-    assert "claude-4-6" in spec.cmd_parts
+    assert "claude-sonnet-4-6" in spec.cmd_parts
     assert "--dangerously-skip-permissions" in spec.cmd_parts
     assert "--permission-mode" in spec.cmd_parts
     assert "bypassPermissions" in spec.cmd_parts
@@ -293,7 +293,7 @@ def test_builder_copilot_stream_off() -> None:
 def test_builder_claude_permission_mode_override() -> None:
     spec = _build_non_interactive_run(
-        "claude", model="claude-4-6", prompt="x", permission_mode="acceptEdits"
+        "claude", model="claude-sonnet-4-6", prompt="x", permission_mode="acceptEdits"
     )
     idx = list(spec.cmd_parts).index("--permission-mode")
     assert spec.cmd_parts[idx + 1] == "acceptEdits"
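
Note on the Claude command shape asserted above: together with the `build_claude_headless_core` tests added in this release, the assertions describe a builder that returns the headless flags without any prompt. A minimal sketch that satisfies those assertions could look like this. The name `sketch_claude_headless_core`, the keyword defaults, and the flag order are assumptions; only the flags themselves and the `binary`, `permission_mode`, and `skip_permissions` parameter names come from the tests in this diff.

```python
def sketch_claude_headless_core(
    model,
    *,
    binary="claude",
    permission_mode="bypassPermissions",
    skip_permissions=True,
):
    """Return the headless argument list for the Claude CLI, without a prompt.

    Hypothetical re-creation from the test assertions; flag order is assumed.
    """
    cmd = [binary, "--print", "--permission-mode", permission_mode]
    if skip_permissions:
        cmd.append("--dangerously-skip-permissions")
    cmd += ["--model", model]
    return cmd


if __name__ == "__main__":
    core = sketch_claude_headless_core("claude-sonnet-4-6")
    print(core)
    # A consumer would append its own app-specific tail to this list and
    # deliver the prompt separately (the builder tests pass it via stdin).
```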

coding_cli_runtime-0.3.0/tests/test_stage2_tier1.py ADDED

@@ -0,0 +1,241 @@
+"""Tests for Stage 2 Tier 1 extractions: headless cores, scan_session_dir."""
+from __future__ import annotations
+import json
+from pathlib import Path
+from coding_cli_runtime.headless import (
+    build_claude_headless_core,
+    build_codex_headless_core,
+    build_copilot_headless_core,
+    build_gemini_headless_core,
+)
+from coding_cli_runtime.session_logs import scan_session_dir
+# ── headless launch cores ─────────────────────────────────────────────
+class TestClaudeHeadlessCore:
+    def test_default(self) -> None:
+        cmd = build_claude_headless_core("claude-sonnet-4-6")
+        assert cmd[0] == "claude"
+        assert "--print" in cmd
+        assert "--permission-mode" in cmd
+        assert "bypassPermissions" in cmd
+        assert "--dangerously-skip-permissions" in cmd
+        assert "--model" in cmd
+        assert "claude-sonnet-4-6" in cmd
+    def test_custom_binary(self) -> None:
+        cmd = build_claude_headless_core("m", binary="/custom/claude")
+        assert cmd[0] == "/custom/claude"
+    def test_permission_mode_override(self) -> None:
+        cmd = build_claude_headless_core("m", permission_mode="acceptEdits")
+        idx = cmd.index("--permission-mode")
+        assert cmd[idx + 1] == "acceptEdits"
+    def test_skip_permissions_false(self) -> None:
+        cmd = build_claude_headless_core("m", skip_permissions=False)
+        assert "--dangerously-skip-permissions" not in cmd
+        assert "--permission-mode" in cmd
+    def test_no_prompt_in_output(self) -> None:
+        cmd = build_claude_headless_core("m")
+        assert "-p" not in cmd
+class TestCodexHeadlessCore:
+    def test_default(self) -> None:
+        cmd = build_codex_headless_core("gpt-5.4")
+        assert cmd[0] == "codex"
+        assert "exec" in cmd
+        assert "--full-auto" in cmd
+        assert "--sandbox" in cmd
+        assert "danger-full-access" in cmd
+        assert "--skip-git-repo-check" in cmd
+        assert "--model" in cmd
+        assert "gpt-5.4" in cmd
+    def test_read_only_sandbox(self) -> None:
+        cmd = build_codex_headless_core("m", sandbox_mode="read-only")
+        idx = cmd.index("--sandbox")
+        assert cmd[idx + 1] == "read-only"
+        assert "--full-auto" in cmd
+    def test_custom_binary(self) -> None:
+        cmd = build_codex_headless_core("m", binary="/custom/codex")
+        assert cmd[0] == "/custom/codex"
+    def test_no_output_path_flags(self) -> None:
+        cmd = build_codex_headless_core("m")
+        assert "-C" not in cmd
+        assert "-o" not in cmd
+        assert "--output-schema" not in cmd
+class TestCopilotHeadlessCore:
+    def test_default(self) -> None:
+        cmd = build_copilot_headless_core("gpt-5.4")
+        assert cmd[0] == "copilot"
+        assert "--no-ask-user" in cmd
+        assert "--no-custom-instructions" in cmd
+        assert "--allow-all" in cmd
+        assert "--stream" in cmd
+        assert "on" in cmd
+        assert "--model" in cmd
+    def test_stream_off(self) -> None:
+        cmd = build_copilot_headless_core("m", stream="off")
+        idx = cmd.index("--stream")
+        assert cmd[idx + 1] == "off"
+    def test_no_prompt_flag(self) -> None:
+        cmd = build_copilot_headless_core("m")
+        assert "-p" not in cmd
+    def test_custom_binary(self) -> None:
+        cmd = build_copilot_headless_core("m", binary="/custom/copilot")
+        assert cmd[0] == "/custom/copilot"
+class TestGeminiHeadlessCore:
+    def test_default(self) -> None:
+        cmd = build_gemini_headless_core("gemini-3-pro-preview")
+        assert cmd[0] == "gemini"
+        assert "--yolo" in cmd
+        assert "--model" in cmd
+        assert "gemini-3-pro-preview" in cmd
+    def test_no_prompt_activation(self) -> None:
+        # --prompt "" is prompt transport, not headless core
+        cmd = build_gemini_headless_core("m")
+        assert "--prompt" not in cmd
+    def test_custom_binary(self) -> None:
+        cmd = build_gemini_headless_core("m", binary="/custom/gemini")
+        assert cmd[0] == "/custom/gemini"
+class TestBuilderDelegation:
+    """Verify _build_non_interactive_run still produces correct output after delegation."""
+    def test_claude_via_builder(self) -> None:
+        from coding_cli_runtime.provider_contracts import _build_non_interactive_run
+        spec = _build_non_interactive_run("claude", model="claude-sonnet-4-6", prompt="test")
+        assert "--print" in spec.cmd_parts
+        assert "--dangerously-skip-permissions" in spec.cmd_parts
+        assert spec.stdin_text == "test"
+    def test_codex_via_builder(self) -> None:
+        from coding_cli_runtime.provider_contracts import _build_non_interactive_run
+        spec = _build_non_interactive_run("codex", model="gpt-5.4", prompt="fix")
+        assert "exec" in spec.cmd_parts
+        assert "--full-auto" in spec.cmd_parts
+    def test_copilot_via_builder(self) -> None:
+        from coding_cli_runtime.provider_contracts import _build_non_interactive_run
+        spec = _build_non_interactive_run("copilot", model="m", prompt="task")
+        assert "--no-ask-user" in spec.cmd_parts
+        assert "--allow-all" in spec.cmd_parts
+    def test_gemini_via_builder(self) -> None:
+        from coding_cli_runtime.provider_contracts import _build_non_interactive_run
+        spec = _build_non_interactive_run("gemini", model="m", prompt="build")
+        assert "--yolo" in spec.cmd_parts
+# ── scan_session_dir ──────────────────────────────────────────────────
+class TestScanSessionDir:
+    def _write_jsonl(self, path: Path, records: list[dict]) -> None:
+        path.parent.mkdir(parents=True, exist_ok=True)
+        path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
+    def test_empty_dir(self, tmp_path: Path) -> None:
+        results = scan_session_dir(
+            tmp_path,
+            since_ts=0.0,
+            extract_fn=lambda p: p.name,
+        )
+        assert results == []
+    def test_nonexistent_dir(self, tmp_path: Path) -> None:
+        results = scan_session_dir(
+            tmp_path / "nonexistent",
+            since_ts=0.0,
+            extract_fn=lambda p: p.name,
+        )
+        assert results == []
+    def test_finds_matching_files(self, tmp_path: Path) -> None:
+        self._write_jsonl(tmp_path / "session1.jsonl", [{"type": "start"}])
+        self._write_jsonl(tmp_path / "session2.jsonl", [{"type": "start"}])
+        results = scan_session_dir(
+            tmp_path,
+            since_ts=0.0,
+            extract_fn=lambda p: p.name,
+        )
+        assert len(results) == 2
+        names = {r[2] for r in results}
+        assert names == {"session1.jsonl", "session2.jsonl"}
+    def test_extract_fn_filters(self, tmp_path: Path) -> None:
+        self._write_jsonl(tmp_path / "good.jsonl", [{"type": "start"}])
+        self._write_jsonl(tmp_path / "bad.jsonl", [{"type": "start"}])
+        results = scan_session_dir(
+            tmp_path,
+            since_ts=0.0,
+            extract_fn=lambda p: p.name if "good" in p.name else None,
+        )
+        assert len(results) == 1
+        assert results[0][2] == "good.jsonl"
+    def test_sorted_by_mtime_descending(self, tmp_path: Path) -> None:
+        import time
+        p1 = tmp_path / "old.jsonl"
+        self._write_jsonl(p1, [{"x": 1}])
+        time.sleep(0.05)
+        p2 = tmp_path / "new.jsonl"
+        self._write_jsonl(p2, [{"x": 2}])
+        results = scan_session_dir(
+            tmp_path,
+            since_ts=0.0,
+            extract_fn=lambda p: p.name,
+        )
+        assert results[0][2] == "new.jsonl"
+    def test_custom_glob_pattern(self, tmp_path: Path) -> None:
+        (tmp_path / "session.jsonl").write_text("{}\n")
+        (tmp_path / "session.json").write_text("{}\n")
+        results = scan_session_dir(
+            tmp_path,
+            glob_pattern="*.json",
+            since_ts=0.0,
+            extract_fn=lambda p: p.name,
+        )
+        assert len(results) == 1
+        assert results[0][2] == "session.json"
+    def test_max_candidates_limits(self, tmp_path: Path) -> None:
+        for i in range(5):
+            self._write_jsonl(tmp_path / f"s{i}.jsonl", [{"i": i}])
+        results = scan_session_dir(
+            tmp_path,
+            since_ts=0.0,
+            extract_fn=lambda p: p.name,
+            max_candidates=2,
+        )
+        assert len(results) == 2
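
Note on the `scan_session_dir` tests above: they describe a scanner that returns an empty list for empty or missing directories, orders hits by modification time (newest first), drops files for which `extract_fn` returns `None`, honors a `glob_pattern`, and caps the result with `max_candidates`. The sketch below reproduces that behavior with `pathlib` only. Several details are assumptions rather than facts from the diff: the name `sketch_scan_session_dir`, the default pattern `*.jsonl`, the `(mtime, path, extracted)` tuple layout (the tests only show that index 2 holds the extracted value), and applying the cap before extraction.

```python
import os
import tempfile
from pathlib import Path


def sketch_scan_session_dir(
    base_dir,
    *,
    since_ts,
    extract_fn,
    glob_pattern="*.jsonl",  # assumed default
    max_candidates=None,
):
    """Scan a directory for recent session files, newest first.

    Hypothetical re-creation from the test assertions, not the package code.
    Returns a list of (mtime, path, extracted) tuples.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return []
    candidates = []
    for path in base.glob(glob_pattern):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if mtime >= since_ts:
            candidates.append((mtime, path))
    # Newest files first, so the most recent session is checked before older ones.
    candidates.sort(key=lambda item: item[0], reverse=True)
    if max_candidates is not None:
        candidates = candidates[:max_candidates]
    results = []
    for mtime, path in candidates:
        extracted = extract_fn(path)
        if extracted is not None:  # None means "not a match", so skip the file
            results.append((mtime, path, extracted))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for index, name in enumerate(["old.jsonl", "new.jsonl"]):
            target = Path(tmp) / name
            target.write_text("{}\n", encoding="utf-8")
            os.utime(target, (1000 + index, 1000 + index))  # force distinct mtimes
        hits = sketch_scan_session_dir(tmp, since_ts=0.0, extract_fn=lambda p: p.name)
        print([hit[2] for hit in hits])
```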