PyPI - pycastle - Versions diffs - 0.1.3.10.dev0__tar.gz → 0.2.0.1.dev0__tar.gz - Mend

pycastle 0.1.3.10.dev0tar.gz → 0.2.0.1.dev0tar.gz


Files changed (179)

{pycastle-0.1.3.10.dev0 → pycastle-0.2.0.1.dev0}/.gitignore RENAMED

@@ -1,6 +1,7 @@
 *.idea
 *.claude
 pycastle/
+*.pyc
 # Byte-compiled / optimized / DLL files
 __pycache__/

{pycastle-0.1.3.10.dev0 → pycastle-0.2.0.1.dev0}/CONTEXT.md RENAMED

@@ -19,8 +19,7 @@
 | --- | --- | --- |
 | **config.py** | Python file in the pycastle directory defining behavioral configuration; overrides the defaults module field by field at runtime | settings.py, settings |
 | **defaults module** | `src/pycastle/defaults/config.py` bundled in the package; contains only pure default values, no logic; never touched by users or the config loader directly | defaults config, fallback config |
-| **config loader** | The `loader.py` module inside the `config/` package; reads the defaults, executes the consuming project's config.py via importlib, and applies any programmatic overrides; contains no subprocess calls and no default values of its own | — |
-| **config validator** | The `validator.py` module inside the `config/` package; owns `validate_config(cfg, claude_service) -> Config`; resolves model shorthands to full model IDs and validates effort levels; raises `ConfigValidationError` on any invalid entry; returns a new immutable `Config` via `dataclasses.replace` | — |
+| **config loader** | The `loader.py` module inside the `config/` package; reads the defaults, executes the consuming project's config.py via importlib, applies any programmatic overrides, validates effort strings, and returns an immutable `Config`; pure — no subprocess calls, no service dependencies, no default values of its own | — |
 | **.env** | File in the pycastle directory holding secrets and credentials only — never committed to git | environment file, config |
 | **GH_TOKEN** | GitHub personal access token stored in .env, used for GitHub API calls and label management | github token, gh pat |
 | **CLAUDE_CODE_OAUTH_TOKEN** | Long-lived OAuth token for Claude Code authentication, stored in .env | claude token, oauth token |
@@ -31,12 +30,12 @@
 | **field-by-field override** | The config loader strategy: for each non-underscore name in the consuming project's config.py, `setattr` replaces the corresponding name in the config loader module; absent names fall back to the defaults module | full replacement, merge override |
 | **STAGE_OVERRIDES** | Config dict with one entry per orchestration phase (`plan`, `implement`, `review`, `merge`), each holding a model shorthand and an effort level | stage config, model config |
 | **stage override** | The per-phase `model` + `effort` entry inside STAGE_OVERRIDES for one orchestration phase | phase config, agent config |
-| **model shorthand** | A short family alias (`haiku`, `sonnet`, `opus`) that pycastle resolves to the latest full model ID at startup | model alias, model name |
-| **full model ID** | The versioned Claude model identifier (e.g. `claude-sonnet-4-6`) resolved from a model shorthand via `claude list-models` | model ID, model version |
-| **effort level** | One of three Claude effort values (`low`, `normal`, `high`) that controls cost and reasoning depth | effort, effort flag |
+| **model shorthand** | A short family alias (`haiku`, `sonnet`, `opus`) accepted by the Claude CLI natively; stored as-is in `Config` and passed through to `claude --model` at stage execution time; not resolved at config load time (see ADR 0002) | model alias, model name |
+| **full model ID** | The versioned Claude model identifier (e.g. `claude-sonnet-4-6`); may be stored directly in a stage override instead of a shorthand; passed through to `claude --model` unchanged | model ID, model version |
+| **effort level** | One of five Claude effort values (`low`, `medium`, `high`, `xhigh`, `max`) that controls cost and reasoning depth; validated at config load time against this fixed set | effort, effort flag |
 | **CLI default** | The behavior when no `--model` or `--effort` flag is injected — triggered by an empty string in STAGE_OVERRIDES | default model, unset |
-| **validate_config** | Public function in the config validator module; takes a `Config` and a `ClaudeService`, resolves model shorthands to full model IDs, validates all stage overrides, and returns a new immutable `Config`; raises `ConfigValidationError` on any invalid entry | config validation, startup check |
-| **ConfigValidationError** | Error raised by validate_config when a model shorthand or effort level is unrecognised; includes the invalid value, closest valid suggestion, and full list of valid options | validation error, config error |
+| **ConfigValidationError** | Error raised by the config loader when an effort level is unrecognised; includes the invalid value, closest valid suggestion, and full list of valid options | validation error, config error |
+| **auto_push** | Boolean config entry (default `True`) that controls whether `merge_phase` pushes local main to the remote after any merges produce commits; set to `False` to disable automatic pushing | push_after_merge, AUTO_PUSH |
 ## GitHub Integration
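
Reviewer note on the hunk above: the new **effort level** and **ConfigValidationError** entries describe load-time validation against a fixed set of five values, with the error carrying the invalid value, a closest suggestion, and the valid options. A minimal sketch of that behaviour follows. The function name `validate_effort`, the error's attribute names, and the use of `difflib` for the suggestion are assumptions for illustration, not the package's actual code.

```python
from __future__ import annotations

import difflib

# The five effort values listed in the glossary entry.
VALID_EFFORTS = ("low", "medium", "high", "xhigh", "max")


class ConfigValidationError(ValueError):
    """Illustrative stand-in; the real class may expose different fields."""

    def __init__(self, value: str, suggestion: str | None, valid: tuple[str, ...]):
        self.value = value
        self.suggestion = suggestion
        self.valid = valid
        hint = f" Did you mean {suggestion!r}?" if suggestion else ""
        super().__init__(
            f"Invalid effort level {value!r}.{hint} Valid options: {', '.join(valid)}"
        )


def validate_effort(value: str) -> str:
    """Return the value unchanged if it is valid or empty; raise otherwise."""
    # An empty string means "CLI default": no --effort flag is injected.
    if value == "" or value in VALID_EFFORTS:
        return value
    # Hypothetical suggestion strategy: closest match by difflib similarity.
    matches = difflib.get_close_matches(value, VALID_EFFORTS, n=1)
    raise ConfigValidationError(value, matches[0] if matches else None, VALID_EFFORTS)
```

Under this sketch, the old value `normal` (removed from the set in this release) raises rather than being silently accepted, which is the migration hazard to watch for in existing `STAGE_OVERRIDES`.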
@@ -72,8 +71,9 @@
 | **programmatic merge path** | Fast-path logic in the merge phase that runs `git merge --no-edit` directly via subprocess without spawning the Merger; used when all branches merge cleanly | fast path, direct merge |
 | **clean merge** | A `git merge --no-edit` that exits zero and requires no conflict resolution | conflict-free merge, successful merge |
 | **conflicting branch** | A branch whose `git merge --no-edit` exits non-zero; `git merge --abort` is run immediately and the branch is collected for the Merger | failed merge branch |
-| **RALPH** | The required commit message prefix for all Implementer commits (e.g. `RALPH: fix auth bug`) | — |
-| **RALPH: Review -** | The required commit message prefix for all Reviewer commits (e.g. `RALPH: Review - improve error handling`); distinguished from Implementer commits by the `Review -` infix; each agent produces exactly one commit per branch | — |
+| **RALPH: Implement -** | The commit message prefix injected by `run_issue()` in code for all Implementer commits (e.g. `RALPH: Implement - fix auth bug`); prepended to the message the Implementer outputs inside `<commit_message>` tags; used by the implement skip to detect whether implement work is complete | — |
+| **RALPH: Review -** | The commit message prefix injected by `run_issue()` in code for all Reviewer commits (e.g. `RALPH: Review - improve error handling`); prepended to the message the Reviewer outputs inside `<commit_message>` tags; used by the review skip to detect whether review work is complete; each agent produces exactly one commit per branch | — |
+| **`<commit_message>` tag** | XML tag emitted by Implementers and Reviewers instead of `<promise>COMPLETE</promise>`; contains the agent's plain description of changes (no prefix); the orchestrator prepends the appropriate RALPH prefix, stages all worktree changes with `git add -A`, and commits; absence of this tag is a failed run — worktree is preserved and the agent restarts to continue | — |
 | **in-flight issue** | An open issue that has an existing `pycastle/issue-<n>` branch or worktree from a previous interrupted iteration; signals that implement or review work is already partially or fully complete | mid-flight issue, resumed issue |
 | **merge-time preflight skip** | The behavior when the Merger's Pre-flight phase returns failures: `merge_phase` logs a diagnostic, skips the Merger, and returns normally with conflict issues still pending; the next iteration's pre-planning preflight detects the broken baseline and recovers via the preflight-fix path | merge preflight abort |
 | **planning skip** | The behavior in `run_iteration` when at least one open issue is in-flight: the Planner is not invoked and only the in-flight issues are used as the working set for the current iteration; issues with neither a branch nor a worktree are deferred | plan bypass |
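
Reviewer note on the hunk above: the `<commit_message>` entries say the agent emits a plain description and the orchestrator prepends the RALPH prefix before committing. The sketch below shows only the extract-and-prefix step, not the `git add -A` or commit. The helper name `build_commit_message`, the regex, and the choice to treat an empty tag as missing are assumptions for illustration.

```python
import re

# Prefixes quoted from the glossary entries in this hunk.
IMPLEMENT_PREFIX = "RALPH: Implement - "
REVIEW_PREFIX = "RALPH: Review - "

_COMMIT_TAG = re.compile(r"<commit_message>(.*?)</commit_message>", re.DOTALL)


class CommitMessageParseError(Exception):
    """Stand-in for the protocol error named in the glossary."""


def build_commit_message(agent_result: str, prefix: str) -> str:
    """Extract the agent's plain message and prepend the given prefix."""
    match = _COMMIT_TAG.search(agent_result)
    # Assumption: an empty tag is treated the same as a missing tag.
    if match is None or not match.group(1).strip():
        raise CommitMessageParseError("no <commit_message> tag in agent output")
    return prefix + match.group(1).strip()
```

For example, `build_commit_message("...<commit_message>fix auth bug</commit_message>", IMPLEMENT_PREFIX)` yields `RALPH: Implement - fix auth bug`, matching the example in the table.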
@@ -91,7 +91,7 @@
 | **merge-sandbox worktree** | A temporary named-branch worktree (`pycastle/merge-sandbox`) created by `merge_phase` from HEAD after clean merges complete; the Merger runs inside it to resolve conflicting branches; always removed in a `try/finally` by `merge_phase` regardless of state; on success `merge_phase` fast-forwards `main` from the branch before cleanup; located at `.pycastle/.worktrees/merge-sandbox` | merger worktree, conflict worktree |
 | **branch** | A git branch name assigned to an issue inside the plan; follows the pattern `pycastle/issue-<n>-<slug>` | feature branch, issue branch |
 | **orphan worktree** | A worktree directory under `.pycastle/.worktrees/` no longer registered in git, typically left by a crashed agent run | stale worktree, leftover worktree |
-| **orphan sweep** | Startup operation that cross-references `.pycastle/.worktrees/` against `git worktree list --porcelain` and deletes unregistered directories | worktree cleanup, stale cleanup |
+| **orphan sweep** | Startup operation that cross-references `.pycastle/.worktrees/` against `git worktree list --porcelain`, deletes unregistered directories, and removes the `.worktrees` parent directory if no active children remain | worktree cleanup, stale cleanup |
 | **collision detection** | Mechanism that prevents two parallel agents from simultaneously creating worktrees for the same branch, implemented as a per-branch async lock | — |
 ## Prompts
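
Reviewer note on the hunk above: the **orphan sweep** entry cross-references a worktrees directory against `git worktree list --porcelain`. The sketch below covers only the detection step and takes the porcelain text as a string, so no git call is made. The function names are hypothetical. The only format detail relied on is that each porcelain record begins with a `worktree <path>` line.

```python
from pathlib import Path


def registered_worktree_paths(porcelain: str) -> set[Path]:
    """Collect paths from the `worktree <path>` lines of porcelain output."""
    return {
        Path(line[len("worktree "):])
        for line in porcelain.splitlines()
        if line.startswith("worktree ")
    }


def find_orphans(worktrees_dir: Path, porcelain: str) -> list[Path]:
    """Return child directories of worktrees_dir that git has not registered."""
    if not worktrees_dir.is_dir():
        return []
    # Resolve both sides so symlinked temp or home directories compare equal.
    registered = {p.resolve() for p in registered_worktree_paths(porcelain)}
    return sorted(
        child
        for child in worktrees_dir.iterdir()
        if child.is_dir() and child.resolve() not in registered
    )
```

A caller would then delete each returned directory and, per the updated entry, remove the `.worktrees` parent once it has no children left.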
@@ -102,7 +102,8 @@
 | **prompts directory** | The `prompts/` subdirectory inside the pycastle directory holding all prompt files | templates dir |
 | **placeholder** | A `{{VARIABLE}}` token inside a prompt, substituted at render time | template variable, slot |
 | **shell expression** | A `` !`command` `` token inside a prompt, replaced by the command's stdout output at preprocess time | shell expansion |
-| **prompt pipeline** | The two-stage process of rendering placeholders then preprocessing shell expressions | templating, rendering |
+| **prompt pipeline** | The single module (`prompt_pipeline`) owning all prompt concerns: loading coding-standard files from the prompts directory (`load_standards`), rendering `{{placeholders}}` against an args dict, and preprocessing `` !`shell` `` expressions; exposes `prepare_prompt`, `load_standards`, and `PromptRenderError` | templating, rendering |
+| **`load_standards`** | Function in `prompt_pipeline` that reads the five coding-standard files from the `coding-standards/` subdirectory of the prompts directory and returns a `dict[str, str]` keyed by placeholder name (`TESTING_STANDARDS`, `MOCKING_STANDARDS`, `INTERFACES_STANDARDS`, `DEEP_MODULES_STANDARDS`, `REFACTORING_STANDARDS`); missing files return an empty string | — |
 | **CODING_STANDARDS.md** | A reference document placed in the prompts directory and treated as a prompt for discovery and scaffolding purposes | standards file |
 | **EXPLORATION section** | The section of the implement prompt that instructs the Implementer to read files before coding; scoped to files mentioned in the issue body — not a full repository survey | explore section, discovery section |
 | **FEEDBACK LOOPS section** | The section of the implement prompt that instructs the Implementer to run IMPLEMENT_CHECKS commands before committing | feedback section, pre-commit checks |
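
Reviewer note on the hunk above: the **prompt pipeline** entry describes rendering `{{placeholders}}` against an args dict and exposes `PromptRenderError`. The sketch below shows the placeholder step only. The function name `render_placeholders`, the `\w+` placeholder grammar, and raising on a missing key are assumptions; the real `prepare_prompt` may behave differently and also handles shell expressions.

```python
from __future__ import annotations

import re

# Assumed grammar: {{NAME}} where NAME is letters, digits, or underscores.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


class PromptRenderError(Exception):
    """Stand-in for the error exposed by the prompt pipeline."""


def render_placeholders(template: str, args: dict[str, str]) -> str:
    """Replace each {{NAME}} token with args[NAME]."""

    def substitute(match: re.Match[str]) -> str:
        name = match.group(1)
        if name not in args:
            # Assumption: a placeholder with no value is an error, not left as-is.
            raise PromptRenderError(f"no value supplied for placeholder {name}")
        return args[name]

    return _PLACEHOLDER.sub(substitute, template)
```

Because `load_standards` is described as returning an empty string for missing files, its keys would always be present in the args dict and would never trigger the error path in this sketch.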
@@ -115,12 +116,14 @@
 | Term | Definition | Aliases to avoid |
 | --- | --- | --- |
-| **agent output protocol** | The contract between prompts and the orchestrator: the set of XML tags agents emit to signal structured output (`<plan>`, `<issue>`, `<promise>`), plus the module that owns the complete NDJSON stream → typed output pipeline | output format, agent tags, agent signals |
+| **agent output protocol** | The contract between prompts and the orchestrator: the set of XML tags agents emit to signal structured output (`<plan>`, `<issue>`, `<commit_message>`, `<promise>`), plus the module that owns the complete NDJSON stream → typed output pipeline | output format, agent tags, agent signals |
 | **`<plan>` tag** | XML tag emitted by the Planner containing a JSON payload listing unblocked issues for the current iteration; extracted by the agent output protocol module | plan output, plan block |
 | **`<issue>` tag** | XML tag emitted by the preflight-issue agent containing the GitHub issue number it filed; extracted by the agent output protocol module | issue output, issue number tag |
-| **`<promise>COMPLETE</promise>`** | XML tag emitted by Implementers, Reviewers, and the Merger to declare that their work phase is complete; detected by the agent output protocol module | done signal, completion tag |
-| **`AgentOutputProtocolError`** | Base exception raised by the agent output protocol module when a required tag is missing or malformed; subclassed by `PlanParseError`, `IssueParseError`, and `PromiseParseError` | parse error, protocol error |
-| **`process_stream()`** | Single entry point in the agent output protocol module; accepts an iterable of decoded NDJSON lines, an `on_turn` callback, an `AgentRole`, and `usage_limit_patterns`; drives the per-line loop, emits complete assistant turns via the callback, raises `UsageLimitError` immediately on detection, unwraps the result envelope, and returns a typed `AgentOutput`; the container runner is the only caller — phases never call it directly | protocol entry point, stream processor |
+| **`<promise>COMPLETE</promise>`** | XML tag emitted by the Merger and the preflight-issue agent to declare that their work phase is complete; Implementers and Reviewers use `<commit_message>` instead | done signal, completion tag |
+| **`AgentOutputProtocolError`** | Base exception raised by the agent output protocol module when a required tag is missing or malformed; subclassed by `PlanParseError`, `IssueParseError`, `PromiseParseError`, and `CommitMessageParseError` | parse error, protocol error |
+| **`CommitMessageParseError`** | Subclass of `AgentOutputProtocolError` raised when an Implementer or Reviewer completes without emitting a `<commit_message>` tag; treated as a failed run — worktree is preserved and the agent is restarted | — |
+| **`CommitMessageOutput`** | Typed output returned by `process_stream` for IMPLEMENTER and REVIEWER roles; carries the agent's plain `message: str`; the orchestrator prepends the RALPH prefix before committing | — |
+| **`process_stream()`** | Single entry point in the agent output protocol module; accepts an iterable of decoded NDJSON lines, an `on_turn` callback, and an `AgentRole`; drives the per-line loop, emits complete assistant turns via the callback, raises `UsageLimitError` immediately on detection of a 429 error response, unwraps the result envelope, and returns a typed `AgentOutput`; the container runner is the only caller — phases never call it directly | protocol entry point, stream processor |
 | **`on_turn` callback** | A `Callable[[str], None]` passed to `process_stream` by the container runner; invoked once per complete assistant turn during the Work phase; constructed by the container runner as a lambda over `StatusDisplay.print` so the agent output protocol module has no dependency on `StatusDisplay` | turn callback, display hook |
 | **Claude streaming envelope** | The NDJSON format Claude Code uses for structured output; lines are JSON objects and the agent's final result is carried in the `{"type": "result", "result": "..."}` line; unwrapped internally by `process_stream` before tag extraction | streaming format, NDJSON output |
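
Reviewer note on the hunk above: the **Claude streaming envelope** entry says the final result is carried in a `{"type": "result", "result": "..."}` NDJSON line that `process_stream` unwraps before tag extraction. The sketch below isolates that unwrapping step. The helper name `unwrap_result` and the policy of skipping lines that are not valid JSON are assumptions; the real function also handles turns, usage limits, and typed outputs.

```python
import json
from typing import Iterable, Optional


def unwrap_result(lines: Iterable[str]) -> Optional[str]:
    """Return the `result` field of the last result envelope, or None."""
    result: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: non-JSON noise in the stream is ignored.
            continue
        if isinstance(event, dict) and event.get("type") == "result":
            result = event.get("result")
    return result
```

The returned string is what a tag extractor for `<plan>`, `<issue>`, `<commit_message>`, or `<promise>` would then search.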
@@ -128,7 +131,7 @@
 | Term | Definition | Aliases to avoid |
 | --- | --- | --- |
-| **agent lifecycle phase** | One of four named stages (Setup, Pre-flight, Prepare, Work) within a single agent container run | step, stage |
+| **agent lifecycle phase** | One of three named stages (Setup, Pre-flight, Work) within a single agent container run; the Prepare phase was retired as a distinct stage — prompt rendering is now an internal step of the Work phase | step, stage |
 | **Setup phase** | First agent lifecycle phase: worktree creation, gitdir overlay creation, parent git dir mount wiring, container start, git identity propagation, and consuming project dependency installation (`pip install -e '.[dev]'` or `pip install -r requirements.txt`); any tool referenced in PREFLIGHT_CHECKS must be declared in the consuming project's dependency file — the image does not provide dev tools as a fallback | container setup, init phase |
 | **Pre-flight phase** | Second agent lifecycle phase: runs quality checks sequentially inside the container and returns a list of failure tuples to the orchestrator; does not spawn agents internally | preflight, pre-flight check phase |
 | **quality check** | One command run during the Pre-flight phase, as defined in PREFLIGHT_CHECKS; each runs independently so all failures are collected in a single pass | quality gate, check |
@@ -137,8 +140,7 @@
 | **pre-existing failure** | A pre-flight failure that existed before the current agent's task began; root cause of scope creep | baseline failure |
 | **scope creep** | The behavior where an agent modifies files outside its assigned task scope, typically caused by inheriting pre-existing failures | overreach |
 | **skip_preflight** | Flag on `run_agent()` that bypasses the Pre-flight phase; always True for the preflight-issue agent; defaults to False for all other agents | — |
-| **Prepare phase** | Third agent lifecycle phase: prompt rendering and prompt injection into the container | hook phase, pre-work |
-| **Work phase** | Fourth agent lifecycle phase: Claude Code invocation and streaming output collection | execution phase, run phase |
+| **Work phase** | Third agent lifecycle phase: prompt rendering and injection into the container, followed by Claude Code invocation and streaming output collection; prompt preparation is an internal step of `ContainerRunner.work()` — not a separate phase or method call | execution phase, run phase |
 | **git identity propagation** | Setup phase operation that reads the host `git user.name` and `git user.email` and configures them inside the container | git config injection, user setup |
 | **idle timeout** | Maximum wall-clock seconds an agent may produce no output before being killed and raising AgentTimeoutError; default 300 s | inactivity timeout, silence timeout |
 | **worktree timeout** | Maximum wall-clock seconds a git worktree operation may take before raising WorktreeTimeoutError; default 30 s | git timeout |
@@ -149,7 +151,9 @@
 | Term | Definition | Aliases to avoid |
 | --- | --- | --- |
 | **Dockerfile** | File in the pycastle directory defining the Docker image for agent containers — ships without baked-in credentials and without baked-in dev tools; system utilities (git, gh), Claude Code CLI, and the Python runtime are the only baked-in contents; all dev tools (e.g. ruff, mypy, pytest) must be declared in the consuming project's dependency file and are installed at runtime during the Setup phase | image definition |
-| **container runner** | Package module that manages Docker container lifecycle, injects runtime secrets, and drives the four agent lifecycle phases (Setup, Pre-flight, Prepare, Work) via instance methods; holds `status_display` at construction time so phase methods can update terminal state without caller involvement; during the Work phase owns Docker byte chunking, byte-to-line splitting, log writing, and idle timeout detection, then delegates the line stream to `process_stream` | docker wrapper |
+| **DockerSession** | Module in `docker_session.py` that owns Docker container lifecycle and low-level I/O; constructed from a pre-computed volume spec, filtered container environment, image name, config, and an optional `auto_overlay` path to delete on exit; exposes `exec_simple(command, timeout) → str` and `write_file(content, container_path)`; used by `ContainerRunner` as its Docker substrate — no agent-protocol concepts live here | docker client, container manager |
+| **`build_volume_spec`** | Pure-ish function in `docker_session.py` that computes the complete Docker volume specification for a container run from host paths; owns the necessary file I/O: reads the `.git` file to locate the parent git dir, creates the gitdir overlay on Windows when needed; returns `(volumes_dict, auto_overlay)` where `auto_overlay` is a host path `DockerSession.__exit__` must delete, or `None` if no overlay was created | volume builder, mount spec |
+| **container runner** | Package module that drives the three agent lifecycle phases (Setup, Pre-flight, Work) inside a `DockerSession`; constructed with a name, a `DockerSession` instance, model, effort, `status_display`, and config; delegates all Docker I/O to the session; during the Work phase renders the prompt, writes it to the container, then drives `WorkStream` for byte chunking, log writing, idle timeout detection, and delegates the line stream to `process_stream` | docker wrapper |
 | **host repo** | The git repository on the developer's machine that is mounted into each agent container | project repo, local repo |
 | **volume mount** | A Docker bind mount attaching a host filesystem path to a container-internal path, with an explicit read/write mode | bind mount, volume |
 | **RO mount** | A volume mount with `mode: "ro"` — the container cannot write to it; used for the host repo | read-only mount |
@@ -162,12 +166,13 @@
 | **new-branch path** | The `git worktree add -b <branch> <path> <safe-SHA>` form used when the branch does not yet exist; always branched from the pinned safe SHA rather than HEAD | — |
 | **existing-branch path** | The `git worktree add <path> <branch>` form used when the branch already exists | — |
 | **worktree contents check** | Guard step run after `git worktree add` that verifies `pyproject.toml` or `requirements.txt` is present; fails with the worktree path and directory listing if absent | checkout guard, file check |
-| **`detached_worktree`** | Async context manager in `worktree.py` that creates a detached checkout at a given SHA, yields the path, and guarantees removal in `__aexit__` regardless of outcome; used by `planning_phase` and `preflight_phase` for their sandbox worktrees | managed_worktree |
-| **`branch_worktree`** | Async context manager in `worktree.py` that creates a named-branch worktree at a given SHA, yields the path, and on exit removes the worktree and optionally deletes the branch; used by `merge_phase` for the merge-sandbox worktree | managed_worktree |
+| **`detached_worktree`** | Async context manager in `worktree.py` that creates a detached checkout at a given SHA, yields the path, and guarantees removal in `__aexit__` regardless of outcome; also removes the `.worktrees` parent directory if no other worktrees remain after cleanup; used by `planning_phase` and `preflight_phase` for their sandbox worktrees | managed_worktree |
+| **`branch_worktree`** | Async context manager in `worktree.py` that creates a named-branch worktree at a given SHA, yields the path, and on exit removes the worktree, optionally deletes the branch, and removes the `.worktrees` parent directory if no other worktrees remain; used by `merge_phase` for the merge-sandbox worktree | managed_worktree |
 | **`_agent_worktree`** | Async context manager in `implement.py` that owns the full Implementer and Reviewer worktree lifecycle; accepts a branch name, SHA, `CancellationToken`, and `Deps`; on entry creates the worktree and gitdir overlay; on exit conditionally removes the worktree based on `token.wants_worktree_preserved` and working-tree cleanliness, and always removes the gitdir overlay; used by `run_issue` twice per issue — once for the Implementer (new-branch path) and once for the Reviewer (existing-branch path); defined in `implement.py` not `worktree.py` because its cleanup policy depends on agent-lifecycle state (`CancellationToken`) rather than being unconditional | managed_worktree |
 | **`worktree_name_for_branch`** | Function in `worktree.py` that derives a short directory name from a branch string: extracts `issue-N` from `pycastle/issue-N-slug` or falls back to a sanitised slug; single authoritative definition replacing duplicated regex in `agent_runner` and `merge_phase` | — |
 | **`worktree_path`** | Function in `worktree.py` that constructs the host filesystem path for a named worktree at `<repo_root>/<pycastle_dir>/.worktrees/<name>`; single authoritative path expression replacing duplication across all phase modules | — |
 | **runtime injection** | The act of reading `~/.claude.json` from the host and writing it to `/home/agent/.claude.json` inside a container before the agent runs | baking in, build-time config |
+| **WorkStream** | Class in `stream_session.py` that converts a raw Docker byte stream into an `AgentOutput`; constructed with a byte-chunk iterator, a log path, an idle timeout, and an `on_chunk: Callable[[], None]` callback; its `run(role, on_turn) → AgentOutput` method drives a feeder thread, writes each byte chunk to the log file (flushed immediately), calls `on_chunk()` per chunk, detects idle timeouts (raising `AgentTimeoutError`), splits bytes into complete UTF-8 lines, and delegates to `process_stream`; the only caller is `ContainerRunner.run_streaming`, which passes `status_display.reset_idle_timer` as the `on_chunk` callback | stream session, work session |
 | **StreamParser** | Retired — its assistant-turn assembly logic is now a private implementation detail of `process_stream` in the agent output protocol module; `stream_parser.py` no longer exists as a public module | stream processor, message parser |
 | **agent message** | The text content emitted by an agent during a single assistant turn; excludes tool-use and tool-result blocks; during the Work phase, printed to the console prefixed with the agent name and followed by a blank line; not shown in the status panel | assistant message, agent output |
 | **PycastleError** | Base exception class for all pycastle domain errors | — |
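
Reviewer note on the hunk above: the unchanged `worktree_name_for_branch` entry says the function extracts `issue-N` from `pycastle/issue-N-slug` or falls back to a sanitised slug. A minimal sketch of that contract follows. The regex and the exact sanitisation rule (runs of characters outside letters, digits, dot, underscore, and hyphen collapsed to `-`) are guesses for illustration and may not match the package.

```python
import re

# Matches `pycastle/issue-<n>` followed by a hyphenated slug or end of string.
_ISSUE_BRANCH = re.compile(r"^pycastle/(issue-\d+)(?:-|$)")


def worktree_name_for_branch(branch: str) -> str:
    """Derive a short directory name from a branch string."""
    match = _ISSUE_BRANCH.match(branch)
    if match:
        return match.group(1)
    # Hypothetical fallback: collapse unsafe characters into single hyphens.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    return slug or "worktree"
```

With this sketch, `pycastle/issue-42-fix-auth` maps to `issue-42`, and a non-issue branch such as `feature/login page` maps to `feature-login-page`.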
@@ -191,14 +196,19 @@
 | **ClaudeService** | Service that encapsulates the `claude list-models` subprocess call with process-lifetime caching | Claude wrapper, model provider |
 | **DockerService** | Service that encapsulates the `docker build` subprocess call with support for build args | Docker wrapper, build provider |
 | **GithubService** | Service that encapsulates `gh` CLI calls for GitHub issue operations: closing issues, querying parent issues, listing open sub-issues, and reading issue labels | GitHub wrapper, gh provider |
+| **`Deps`** | Concrete dataclass constructed once per iteration in the orchestrator and passed to `run_iteration`; bundles the full set of iteration-layer dependencies: `repo_root`, `git_svc`, `github_svc`, `agent_runner`, `cfg`, `logger`, and `status_display`; satisfies every per-phase dependency protocol via structural typing so the orchestrator passes it unmodified; `env` is intentionally absent — it is consumed at `AgentRunner` construction time before `Deps` is built and is not threaded through the iteration layer | iteration context, deps container |
+| **per-phase dependency protocol** | A private `Protocol` class declared in each phase module listing only the fields that phase actually accesses; `Deps` satisfies every protocol via structural typing; tests construct minimal inline dataclasses with only the required fields instead of building a full `Deps`; follows the `_WorktreeDeps` pattern established in `worktree.py`; individual protocols: `_PreflightDeps` (in `preflight.py`), `_PlanningDeps` (in `planning.py`), `_ImplementDeps` (in `implement.py`), `_MergeDeps` (in `merge.py`), `_UtilDeps` (in `_utils.py`) | deps narrowing, phase context |
+| **`_WorktreeDeps`** | Private protocol in `worktree.py` listing only the fields that worktree utilities need (`repo_root`, `cfg`, `git_svc`); the original instance of the per-phase dependency protocol pattern; satisfied by `Deps` structurally | — |
 | **Logger** | Injectable abstraction that owns all structured log output for one iteration; exposes named channels (`log_error`, `log_agent_output`) each writing to a dedicated file under `logs/`; injected via `Deps` so tests never touch the filesystem | log writer, output handler |
 | **RecordingLogger** | Test double for `Logger` that records every call in memory; tests assert on recorded calls rather than capturing stderr or reading log files | mock logger, spy logger |
-| **StatusDisplay** | Injectable abstraction that owns the live terminal status panel and all formatted terminal output; exposes `register(caller, startup_message="started", work_body="")`, `update_phase`, `reset_idle_timer`, `remove(caller, shutdown_message="finished", shutdown_style="success")`, and `print(caller, message, style=None)` methods; backed by a `rich` `Live` display in production and a `PlainStatusDisplay` in tests; injected via `Deps` as a separate concern from `Logger`; defined in `status_display` module | terminal display, status bar |
-| **caller** | The identity string passed as the first argument to `StatusDisplay.register`, `remove`, and `print`; rendered as a `[Caller]` prefix on every terminal output line; empty string `""` is the anonymous caller — no brackets are printed and the message is output as-is; a blank line is inserted before any output call (`register`, `remove`, or `print`) when the caller differs from the previous one, or unconditionally when the caller is `""` (anonymous outputs always stand alone); canonical callers — phase rows: `"Preflight"`, `"Plan"`, `"Implement"`, `"Merge"`; agents: `"Preflight Agent"`, `"Plan Agent"`, `"Implement Agent #N"`, `"Review Agent #N"`, `"Merge Agent"` | source, label |
-| **work_body** | The caller-constructed string passed as the third argument to `register`; displayed in the body column of the status row during the Work phase; empty string for callers that do not reach Work | — |
-| **PlainStatusDisplay** | Plain-terminal adapter for `StatusDisplay` defined in `status_display` module; panel methods (`update_phase`, `reset_idle_timer`) are no-ops; `register` and `remove` print their startup/shutdown messages; `print(caller, message, style=None)` formats output as `[Caller] message` with no ANSI colour codes, no bold, and style ignored; used in tests so assertions can match the full formatted line | NullStatusDisplay |
-| **status row** | One headerless line in the `StatusDisplay` live panel; created by `register` and removed by `remove`; two kinds: **agent rows** (one per active agent — `"Preflight Agent"`, `"Plan Agent"`, `"Implement Agent #N"`, `"Review Agent #N"`, `"Merge Agent"`) and **phase rows** (one per active phase — `"Preflight"`, `"Plan"`, `"Implement"`, `"Merge"`); phase rows and agent rows within the same phase coexist; format: `elapsed \| Name \| idle \| body`; elapsed is dim and right-justified; name is bold with any numeric part styled bold cyan; idle is dim; body shows the current lifecycle phase name for all non-Work states, or the `work_body` string during Work; elapsed counts up from `register` and never resets; idle resets on each Docker stream chunk; the live panel is preceded by one blank line to visually separate it from scrollback; ordered by orchestration phase (plan → implement → review → merge) then by issue number | agent status row, status entry, agent row |
-| **IterationOutcome** | Sealed return type of `run_iteration()`; one of four variants: `Continue` (iteration completed, keep looping), `Done` (no issues found, stop cleanly), `AbortedHITL` (HITL verdict — carries `issue_number`; orchestrator exits non-zero), `AbortedUsageLimit` (token ceiling hit — worktrees preserved; orchestrator sleeps until 2 minutes past the next local-time full hour, then continues the loop to retry the current issue from scratch; repeats indefinitely on consecutive hits) | iteration result, loop result |
+| **StatusDisplay** | Injectable abstraction that owns the live terminal status panel and all formatted terminal output; exposes `register(caller, kind, startup_message="started", work_body="", initial_phase="Setup")`, `update_phase`, `reset_idle_timer`, `remove(caller, shutdown_message="finished", shutdown_style="success")`, and `print(caller, message, style=None)` methods; `kind` is a required `Literal["phase", "agent"]` discriminator stored per-caller and consulted by the blank-line rule (no default — every call site must classify its caller explicitly); `shutdown_message` and `message` may contain `\n` — each line is emitted separately with the `[Caller]` prefix and the same style applied to every line; `shutdown_style` accepts `"success"` (green), `"error"` (red), or `"warning"` (yellow); backed by a `rich` `Live` display in production and a `PlainStatusDisplay` in tests; injected via `Deps` as a separate concern from `Logger`; defined in `status_display` module | terminal display, status bar |
+| **caller** | The identity string passed as the first argument to `StatusDisplay.register`, `remove`, and `print`; rendered as a `[Caller]` prefix on every terminal output line; empty string `""` is the anonymous caller — no brackets are printed and the message is output as-is; canonical callers — phase rows (`kind="phase"`): `"Preflight"`, `"Plan"`, `"Implement"`, `"Merge"`; agents (`kind="agent"`): `"Preflight Agent"`, `"Plan Agent"`, `"Implement Agent #N"`, `"Review Agent #N"`, `"Merge Agent"` | source, label |
+| **blank-line rule** | The rule the display applies before any output call (`register`, `remove`, or `print`) to decide whether to emit a separating blank line; a blank line is inserted iff (a) the caller is anonymous (`""`, always isolated), or (b) the caller differs from the previous caller AND the (previous-kind, current-kind) pair is *not* `("phase","agent")` or `("agent","phase")`; effect: a phase row and the agent rows it spawns render as one uninterrupted block, while phase→different-phase and agent→different-agent transitions keep their blank line; the very first output call also gets a leading blank line; `update_phase` and `reset_idle_timer` never touch the rule's state; `print` from a registered-but-unknown-kind caller falls through to "blank line yes" as a safe default | separator rule |
+| **work_body** | The caller-constructed string passed as the `work_body` argument to `register`; applies to agent rows only; displayed in the body column during the Work phase; empty string for agent rows that do not reach Work; unused by phase rows (which use `initial_phase` for their fixed body label) | — |
+| **PlainStatusDisplay** | Plain-terminal adapter for `StatusDisplay` defined in `status_display` module; panel methods (`update_phase`, `reset_idle_timer`) are no-ops; `register` and `remove` print their startup/shutdown messages; `print(caller, message, style=None)` formats output as `[Caller] message` with no ANSI colour codes, no bold, and style ignored; multi-line messages are split and each line prefixed with `[Caller]`; used in tests so assertions can match the full formatted line | NullStatusDisplay |
+| **phase_row** | Async context manager in `iteration/` that owns the `StatusDisplay` register/remove lifecycle for a single phase row; accepts `startup_message: str = "started"` forwarded to `register`; on entry calls `register(caller, kind="phase", startup_message=startup_message, initial_phase=initial_phase)` — the single source of truth for `kind="phase"`; yields a `PhaseRow` whose `close(shutdown_message, shutdown_style="success")` method calls `remove()` and marks the row as closed; if `close()` is never called before exit (exception path), automatically calls `remove(caller, "failed", shutdown_style="error")`; the canonical way to manage phase row lifecycle — replaces hand-rolled active-flag patterns; all four phase rows (Preflight, Plan, Implement, Merge) register through this wrapper | — |
+| **status row** | One headerless line in the `StatusDisplay` live panel; created by `register` and removed by `remove`; two kinds: **agent rows** (one per active agent — `"Preflight Agent"`, `"Plan Agent"`, `"Implement Agent #N"`, `"Review Agent #N"`, `"Merge Agent"`) and **phase rows** (one per active phase — `"Preflight"`, `"Plan"`, `"Implement"`, `"Merge"`); phase rows and agent rows within the same phase coexist; format: `elapsed \| Name \| idle \| body`; elapsed is dim and right-justified; name is bold with any numeric part styled bold cyan; idle is dim; body column: for **agent rows**, shows the current agent lifecycle phase name for all non-Work states, or `work_body` during Work; for **phase rows**, shows a body derived from the phase: `"Planning"` for Plan, `"Merging"` for Merge, `"Running"` for Preflight; for the **Implement phase row** specifically, the body is dynamic — `"Running: started Agents for X/Y issues"` where Y is the total issue count for the phase and X increments each time an agent acquires the concurrency semaphore (monotonic; either an Implement Agent or Review Agent counts); elapsed counts up from `register` and never resets; idle resets on each Docker stream chunk; the live panel is preceded by one blank line to visually separate it from scrollback; ordered by orchestration phase (plan → implement → review → merge) then by issue number | agent status row, status entry, agent row |
+| **IterationOutcome** | Sealed return type of `run_iteration()`; one of four variants: `Continue` (iteration completed, keep looping), `Done` (no issues found, stop cleanly), `AbortedHITL` (HITL verdict — carries `issue_number`; orchestrator exits non-zero), `AbortedUsageLimit` (token ceiling hit — carries `reset_time: datetime | None`; worktrees preserved; orchestrator sleeps until `reset_time + 2 min` when parsed from the Claude message, or until 2 minutes past the next local-time full hour when the reset time cannot be parsed; status message appends `"(estimated)"` on the fallback path; continues the loop to retry the current issue from scratch; repeats indefinitely on consecutive hits) | iteration result, loop result |
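The blank-line rule defined in the table above reduces to a small state machine. The following is a minimal sketch of that rule only, not the package's implementation: `BlankLineRule` and `needs_blank_line` are hypothetical names, and kind registration is separated from the output decision here to keep the example short (in the described design `register` is itself an output call).

```python
from typing import Dict, Literal, Optional

Kind = Literal["phase", "agent"]


class BlankLineRule:
    """Hypothetical helper tracking the state the blank-line rule needs."""

    def __init__(self) -> None:
        self._kinds: Dict[str, Kind] = {}
        # None until the first output call has been seen.
        self._previous: Optional[str] = None

    def register(self, caller: str, kind: Kind) -> None:
        # Only records the kind; deciding on a separator is a separate step.
        self._kinds[caller] = kind

    def needs_blank_line(self, caller: str) -> bool:
        previous, self._previous = self._previous, caller
        # The very first output call and the anonymous caller are always isolated.
        if previous is None or caller == "":
            return True
        # Consecutive output from the same caller stays in one block.
        if caller == previous:
            return False
        # A phase row and an agent row (in either order) form one block.
        # Unknown kinds yield None here, so they fall through to "blank line yes".
        pair = (self._kinds.get(previous), self._kinds.get(caller))
        return pair not in {("phase", "agent"), ("agent", "phase")}


rule = BlankLineRule()
rule.register("Plan", "phase")
rule.register("Plan Agent", "agent")
rule.register("Merge", "phase")
decisions = [rule.needs_blank_line(c) for c in ("Plan", "Plan Agent", "Plan", "Merge", "")]
```

With these registrations, only the first call, the phase-to-different-phase transition, and the anonymous caller ask for a separator.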
 ## Test Anti-Patterns (Red Flags)
@@ -213,7 +223,7 @@
 ## Relationships
 - **STAGE_OVERRIDES** has exactly four entries, one per orchestration phase (`plan`, `implement`, `review`, `merge`); each entry has independent `model` and `effort` fields — an empty string for either means CLI default (no flag injected).
-- **validate_config** is called internally by `load_config()` before it returns; queries `claude list-models` once per process (cached by `ClaudeService`); returns a new immutable `Config` via `dataclasses.replace` with all non-empty `model` entries resolved to full model IDs.
+- **`load_config()`** is a pure function — no subprocess calls; it validates effort strings against the fixed set and raises `ConfigValidationError` on any invalid entry; model strings (shorthands or full IDs) are stored as-is in `Config` and resolved by the Claude CLI at stage execution time (see ADR 0002).
 - The **Planner** produces one plan per iteration listing only unblocked AFK issues; blockers and HITL issues are excluded via the dependency graph.
 - Each AFK issue in a plan is processed by exactly one **Implementer** followed by one **Reviewer**.
 - The **merge phase** attempts the programmatic merge path for every branch sequentially; the **Merger** is spawned at most once per iteration and only when conflicting branches exist.
@@ -222,7 +232,7 @@
 - The **HITL verdict** is read by the orchestrator from the GitHub issue label after the **preflight-issue agent** completes; `ready-for-agent` triggers the **preflight-fix path**, `ready-for-human` aborts with a non-zero exit code.
 - On the **preflight-fix path**, the Planner is skipped; one Implementer is spawned for the preflight issue, followed by one Reviewer, then a merge; a new iteration then begins.
 - The **Planner** and all **Implementer** worktrees are created from the pinned **safe SHA**, never from HEAD directly; this guarantees every agent sees the same verified-clean committed state regardless of external commits that land on main after preflight passes.
-- The **planning skip** is checked before every Planner invocation; it takes priority over normal planning when any open issue is **in-flight**. The **implement skip** and **review skip** are checked inside `run_issue` before any worktree is created; they are mutually exclusive with normal agent spawning for that phase. Both skips are triggered by commit prefix detection (`RALPH: Review -` → review skip; `RALPH:` without `Review -` → implement skip only).
+- The **planning skip** is checked before every Planner invocation; it takes priority over normal planning when any open issue is **in-flight**. The **implement skip** and **review skip** are checked inside `run_issue` before any worktree is created; they are mutually exclusive with normal agent spawning for that phase. Both skips are triggered by commit prefix detection (`RALPH: Review -` → review skip; `RALPH: Implement -` → implement skip only).
 - A **merge-time preflight skip** leaves conflict issues open; they become **in-flight issues** on the next iteration, triggering the **planning skip** and then the **implement skip** or **review skip** as appropriate once the baseline is fixed.
 - In **sequential mode** (`max_parallel = 1`), the iteration processes issues one by one: after each issue's merge the safe SHA is re-pinned to the new HEAD, and the next Implementer starts from that SHA; a failed issue is skipped (remains `ready-for-agent`) and the queue continues; the Merger remains available as a fallback for unexpected conflicts; no additional pre-flight checks run between issues.
 - The **Pre-flight phase** (agent lifecycle) runs quality checks inside the container and returns a list of failure tuples to the orchestrator; it never spawns agents internally.
@@ -231,10 +241,13 @@
 - Host mounts per container: host repo → RO at `/home/agent/repo`; worktree → RW at `/home/agent/workspace`; `<host-repo>/.git` → RW at `/.pycastle-parent-git`; on Windows, gitdir overlay → RO over `/home/agent/workspace/.git`.
 - A **Service** defines a Custom exception hierarchy so callers never handle raw subprocess exceptions; tests inject Default implementations from a test fixture and override per-test for error paths.
 - **StatusDisplay** is a separate injectable in `Deps` alongside `Logger`; `Logger` owns file I/O, `StatusDisplay` owns the live terminal UI — they never overlap.
-- Rich markup (e.g. `[red]...[/red]`) must never be embedded in a `StatusDisplay.print` message string; colouring is expressed exclusively via the `style` parameter (`"error"`, `"success"`).
-- A **status row** is created by `StatusDisplay.register` and removed by `StatusDisplay.remove`; phase rows are registered at the start of each orchestration phase and removed at its end; agent rows are registered at container Setup and removed when the agent finishes or errors; the `rich` `Live` display is started on the first `register` call and stopped after the last `remove` call.
+- **`Deps`** does not carry `env`; credentials are extracted from the environment in `main.py`, passed directly to `AgentRunner` at construction time, and are not accessible to any iteration-layer phase. Phase functions never reference `env` directly.
+- Each phase module declares its own **per-phase dependency protocol** listing only its actual field accesses; `Deps` satisfies all of them structurally so the orchestrator passes it unchanged; tests construct minimal inline dataclasses with only the required fields. `_WorktreeDeps` in `worktree.py` is the established precedent for this pattern.
+- Rich markup (e.g. `[red]...[/red]`) must never be embedded in a `StatusDisplay.print` message string; colouring is expressed exclusively via the `style` parameter (`"error"`, `"success"`, `"warning"`).
+- A **status row** is created by `StatusDisplay.register` and removed by `StatusDisplay.remove`; phase rows are managed via the **`phase_row`** context manager (which sets `kind="phase"`) — registered on entry and removed (with the phase outcome as the shutdown message) via `PhaseRow.close()`; agent rows are registered directly at container Setup with `kind="agent"` and removed when the agent finishes or errors; the `rich` `Live` display is started on the first `register` call and stopped after the last `remove` call. The **blank-line rule** consults the registered kind to suppress the separator between a phase row and the agent rows it spawns (in either direction).
 - All orchestrator-level terminal output (e.g. "Planning complete…") is routed through `StatusDisplay.print()` so `rich` can coordinate it with the live panel; bare `print()` calls are not used while a `StatusDisplay` is active.
-- During the Work phase the container runner owns byte chunking, byte-to-line splitting, log writing, and idle timeout detection; it passes the decoded NDJSON line stream and an **`on_turn` callback** to **`process_stream`**, which assembles assistant turns (invoking the callback for each), detects usage limit lines and raises `UsageLimitError` immediately, unwraps the result envelope, and returns a typed `AgentOutput`; phases receive `AgentOutput` directly from `AgentRunner.run()` — no phase calls `parse()` or `assert_complete()`. Setup, Pre-flight, and Prepare phases produce no console output — their activity is reflected only in the body column of the agent status row.
+- During the Work phase the container runner renders and injects the prompt, then owns byte chunking, byte-to-line splitting, log writing, and idle timeout detection via `WorkStream`; it passes the decoded NDJSON line stream and an **`on_turn` callback** to **`process_stream`**, which assembles assistant turns (invoking the callback for each), detects 429 error responses via `_check_usage_limit` and raises `UsageLimitError(reset_time)` immediately (where `reset_time: datetime | None` is parsed from the Claude message and converted to local time), unwraps the result envelope, and returns a typed `AgentOutput`; phases receive `AgentOutput` directly from `AgentRunner.run()` — no phase calls `parse()` or `assert_complete()`. Setup and Pre-flight phases produce no console output — their activity is reflected only in the body column of the agent status row.
+- **`AgentRunner`** constructs a `DockerSession` (calling `build_volume_spec` to resolve volume paths) and a `ContainerRunner` (passing the session), then orchestrates the three lifecycle phases; it is the only caller of `build_volume_spec` and the owner of `CLAUDE_ACCOUNT_JSON` injection into the session.
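The `phase_row` lifecycle described above (register on entry, `close()` on success, automatic `"failed"` removal otherwise) can be sketched as an async context manager. This is an illustration under assumptions: the parameter order of `phase_row`, the `RecordingDisplay` stand-in, and the `closed` attribute are hypothetical; only the `register` and `remove` keyword names come from the documented `StatusDisplay` interface.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecordingDisplay:
    """Hypothetical stand-in for StatusDisplay that records calls."""

    calls: List[Tuple[str, ...]] = field(default_factory=list)

    def register(self, caller, kind, startup_message="started",
                 work_body="", initial_phase="Setup"):
        self.calls.append(("register", caller, kind, startup_message, initial_phase))

    def remove(self, caller, shutdown_message="finished", shutdown_style="success"):
        self.calls.append(("remove", caller, shutdown_message, shutdown_style))


class PhaseRow:
    def __init__(self, display, caller):
        self._display = display
        self._caller = caller
        self.closed = False

    def close(self, shutdown_message, shutdown_style="success"):
        self._display.remove(self._caller, shutdown_message, shutdown_style)
        self.closed = True


@asynccontextmanager
async def phase_row(display, caller, initial_phase, startup_message="started"):
    # The wrapper is the only place that passes kind="phase".
    display.register(caller, kind="phase", startup_message=startup_message,
                     initial_phase=initial_phase)
    row = PhaseRow(display, caller)
    try:
        yield row
    finally:
        # If close() was never reached, report the row as failed.
        if not row.closed:
            display.remove(caller, "failed", shutdown_style="error")


async def demo():
    display = RecordingDisplay()
    async with phase_row(display, "Plan", initial_phase="Planning") as row:
        row.close("Planning complete")
    try:
        async with phase_row(display, "Merge", initial_phase="Merging"):
            raise RuntimeError("boom")
    except RuntimeError:
        pass
    return display.calls


calls = asyncio.run(demo())
```

The `finally` branch covers any exit without `close()`, which includes the exception path named in the glossary entry.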
 ## Example dialogue

{pycastle-0.1.3.10.dev0 → pycastle-0.2.0.1.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pycastle
-Version: 0.1.3.10.dev0
+Version: 0.2.0.1.dev0
 Summary: Python orchestrator for autonomous Claude Code agents in Docker
 License: MIT License

pycastle-0.2.0.1.dev0/docs/adr/0002-cli-native-model-shorthand-resolution.md ADDED

@@ -0,0 +1,34 @@
+# ADR 0002: CLI-native model shorthand resolution over load-time API call
+**Status:** Accepted
+**Date:** 2026-05-03
+## Context
+`Config.plan_override.model` (and the equivalent fields for implement, review, and merge stages) accepts either a full Claude model ID (`claude-sonnet-4-6`) or a shorthand (`sonnet`). Before this decision, `load_config` resolved shorthands to full model IDs at config load time by calling `claude_service.list_models()` — a subprocess call to the Claude CLI — and selecting the latest matching model.
+Two approaches were considered:
+**Option A — Resolve at load time (previous behaviour):**
+`load_config` instantiates `ClaudeService`, calls `list_models()`, and replaces any shorthand in the config with the resolved full model ID before returning. The returned `Config` always contains fully-resolved model IDs.
+**Option B — Pass through to the CLI:**
+`load_config` is pure (file I/O only). The model string is passed as-is to the Claude CLI at stage execution time. The CLI resolves shorthands natively.
+## Decision
+**Option B.** Model shorthand resolution is delegated to the Claude CLI at stage execution time.
+## Reasons
+- **Hidden interface cost.** Option A makes `load_config` appear to be a pure file-loading operation but introduces a subprocess call as a hidden side effect. Callers — including tests — must know to mock `ClaudeService` to avoid hitting the CLI.
+- **Verified CLI support.** The Claude CLI accepts shorthands directly (`claude --model sonnet` works). There is no need to pre-resolve them.
+- **Locality of validation.** Invalid model strings surface as CLI errors at the point of use, where the context (which stage, which run) is most relevant.
+- **Testability.** A pure `load_config` can be tested with plain `Config` comparisons and no mocks.
+## Consequences
+- `Config.plan_override.model` (and equivalent fields) may hold a shorthand or a full model ID — callers cannot distinguish between them by type alone.
+- Invalid model strings are not caught at startup. A bad model value surfaces as a CLI error when the relevant stage first runs, not when config is loaded.
+- `validator.py` and its `_fetch_models` / `_resolve_shorthand` machinery are removed. Effort validation (a pure set-membership check) moves inline into `load_config`.
+- `load_config` no longer accepts or instantiates a `claude_service` argument.
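The consequence of Option B for validation can be illustrated with a short sketch: effort strings are checked by set membership, while model strings pass through untouched. The allowed effort values below are an assumption (the diff does not list the fixed set), and `validate_overrides` is a hypothetical helper rather than the package's `load_config`.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed set of effort levels; "" means "use the CLI default".
ALLOWED_EFFORTS = frozenset({"", "low", "medium", "high"})


class ConfigValidationError(ValueError):
    """Raised when a config entry is invalid."""


@dataclass(frozen=True)
class StageOverride:
    model: str = ""
    effort: str = ""


def validate_overrides(overrides: Dict[str, StageOverride]) -> Dict[str, StageOverride]:
    """Pure check: no subprocess calls, no model resolution."""
    for stage, override in overrides.items():
        if override.effort not in ALLOWED_EFFORTS:
            raise ConfigValidationError(
                f"{stage}: invalid effort {override.effort!r}"
            )
    # Model strings (shorthand or full ID) are returned exactly as given.
    return overrides


checked = validate_overrides({"plan": StageOverride(model="sonnet", effort="high")})

try:
    validate_overrides({"merge": StageOverride(effort="extreme")})
    rejected = False
except ConfigValidationError:
    rejected = True
```

Because the function touches nothing outside its arguments, it can be tested with plain comparisons and no mocks, which is the testability point the ADR makes.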

{pycastle-0.1.3.10.dev0 → pycastle-0.2.0.1.dev0}/src/pycastle/agent_output_protocol.py RENAMED

@@ -3,10 +3,15 @@ import enum
 import json
 import re
 from collections.abc import Callable, Iterable
-from typing import TypeAlias
+from datetime import datetime, timedelta, timezone
+from typing import Literal, TypeAlias
 from .errors import UsageLimitError
+_RESET_TIME_RE = re.compile(
+    r"resets\s+(\d{1,2}:\d{2}(?:am|pm))\s+\(UTC\)", re.IGNORECASE
+)
 class AgentRole(enum.Enum):
     PLANNER = "planner"
@@ -32,7 +37,14 @@ class CompletionOutput:
     pass
-AgentOutput: TypeAlias = PlannerOutput | IssueOutput | CompletionOutput
+@dataclasses.dataclass(frozen=True)
+class CommitMessageOutput:
+    message: str
+AgentOutput: TypeAlias = (
+    PlannerOutput | IssueOutput | CompletionOutput | CommitMessageOutput
+)
 class AgentOutputProtocolError(Exception):
@@ -51,6 +63,10 @@ class PromiseParseError(AgentOutputProtocolError):
     pass
+class CommitMessageParseError(AgentOutputProtocolError):
+    pass
 def _extract_planner_output(text: str) -> PlannerOutput:
     match = re.search(r"<plan>([\s\S]*?)</plan>", text)
     if not match:
@@ -99,23 +115,30 @@ def _extract_issue_output(text: str) -> IssueOutput:
     return IssueOutput(labels=labels, number=number)
-def _is_usage_limit_line(line: str, patterns: tuple[str, ...]) -> bool:
+def _check_usage_limit(line: str) -> datetime | None | Literal[False]:
     try:
         obj = json.loads(line)
-        if isinstance(obj, dict):
-            if obj.get("type") == "result" and obj.get("is_error"):
-                if obj.get("api_error_status") == 429:
-                    return True
-                result_text = obj.get("result")
-                if isinstance(result_text, str) and any(
-                    p.lower() in result_text.lower() for p in patterns
-                ):
-                    return True
-            return False
     except json.JSONDecodeError:
-        pass
-    line_lower = line.lower()
-    return any(p.lower() in line_lower for p in patterns)
+        return False
+    if not isinstance(obj, dict) or obj.get("api_error_status") != 429:
+        return False
+    result_text = obj.get("result")
+    if not isinstance(result_text, str):
+        return None
+    match = _RESET_TIME_RE.search(result_text)
+    if not match:
+        return None
+    try:
+        parsed = datetime.strptime(match.group(1).lower(), "%I:%M%p").time()
+    except ValueError:
+        return None
+    today_utc = datetime.now(timezone.utc).date()
+    utc_dt = datetime.combine(today_utc, parsed, tzinfo=timezone.utc)
+    local_dt = utc_dt.astimezone().replace(tzinfo=None)
+    now_local = datetime.now()
+    if local_dt < now_local - timedelta(minutes=2):
+        local_dt += timedelta(days=1)
+    return local_dt
 def _extract_turn(line: str) -> str | None:
@@ -135,22 +158,29 @@ def _extract_turn(line: str) -> str | None:
     return "\n\n".join(parts) if parts else None
+_COMMIT_MESSAGE_RE = re.compile(r"<commit_message>([\s\S]*?)</commit_message>")
 def process_stream(
     lines: Iterable[str],
     on_turn: Callable[[str], None],
     role: AgentRole,
-    usage_limit_patterns: tuple[str, ...],
 ) -> AgentOutput:
     collected: list[str] = []
     result_text: str | None = None
     for line in lines:
         collected.append(line)
-        if _is_usage_limit_line(line, usage_limit_patterns):
-            raise UsageLimitError(line)
+        usage_limit = _check_usage_limit(line)
+        if usage_limit is not False:
+            raise UsageLimitError(reset_time=usage_limit)
         turn = _extract_turn(line)
         if turn is not None:
             on_turn(turn)
-            if role in (AgentRole.IMPLEMENTER, AgentRole.REVIEWER, AgentRole.MERGER):
+            if role in (AgentRole.IMPLEMENTER, AgentRole.REVIEWER):
+                match = _COMMIT_MESSAGE_RE.search(turn)
+                if match:
+                    return CommitMessageOutput(message=match.group(1).strip())
+            elif role == AgentRole.MERGER:
                 if re.search(r"<promise>COMPLETE</promise>", turn):
                     return CompletionOutput()
             elif role == AgentRole.PLANNER:
@@ -184,6 +214,13 @@ def process_stream(
             return _extract_planner_output(text)
         except PlanParseError as exc:
             raise PlanParseError(f"{exc}{tail}") from exc.__cause__
+    if role in (AgentRole.IMPLEMENTER, AgentRole.REVIEWER):
+        match = _COMMIT_MESSAGE_RE.search(text)
+        if not match:
+            raise CommitMessageParseError(
+                f"Agent produced no <commit_message> tag.{tail}"
+            )
+        return CommitMessageOutput(message=match.group(1).strip())
     if not re.search(r"<promise>COMPLETE</promise>", text):
         raise PromiseParseError(
             f"Agent produced no <promise>COMPLETE</promise> tag.{tail}"
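The reset-time handling introduced in `_check_usage_limit` can be exercised in isolation if the clock is passed in. The sketch below makes two simplifications that are assumptions of this example, not the package's code: it stays in timezone-aware UTC instead of converting to naive local time, and `parse_reset_time` and `wake_time` are hypothetical names. The sample message text is made up; only the `resets H:MMam/pm (UTC)` shape comes from the regex in the diff. The `wake_time` fallback follows the CONTEXT.md description of `AbortedUsageLimit`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

RESET_TIME_RE = re.compile(
    r"resets\s+(\d{1,2}:\d{2}(?:am|pm))\s+\(UTC\)", re.IGNORECASE
)


def parse_reset_time(result_text: str, now_utc: datetime) -> Optional[datetime]:
    """Return the next UTC instant named in result_text, or None if unparseable."""
    match = RESET_TIME_RE.search(result_text)
    if match is None:
        return None
    try:
        clock = datetime.strptime(match.group(1).lower(), "%I:%M%p").time()
    except ValueError:
        return None
    candidate = datetime.combine(now_utc.date(), clock, tzinfo=timezone.utc)
    # A time more than two minutes in the past is taken to mean tomorrow.
    if candidate < now_utc - timedelta(minutes=2):
        candidate += timedelta(days=1)
    return candidate


def wake_time(reset_time: Optional[datetime], now: datetime) -> Tuple[datetime, bool]:
    """Return (when to resume, whether that time is only an estimate)."""
    if reset_time is not None:
        return reset_time + timedelta(minutes=2), False
    # Fallback: two minutes past the next full hour in the frame of `now`.
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return next_hour + timedelta(minutes=2), True


now = datetime(2026, 5, 3, 14, 37, tzinfo=timezone.utc)
later_today = parse_reset_time("Limit reached; resets 3:00pm (UTC)", now)
tomorrow = parse_reset_time("Limit reached; resets 1:00pm (UTC)", now)
unparsed = parse_reset_time("Limit reached", now)
```

To match the fallback described in CONTEXT.md, `wake_time` would be called with a local-time `now`, since full hours in UTC and in local time coincide only for whole-hour offsets.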

{pycastle-0.1.3.10.dev0 → pycastle-0.2.0.1.dev0}/src/pycastle/agent_runner.py RENAMED

@@ -6,6 +6,7 @@ from .agent_output_protocol import AgentOutput, AgentRole
 from .agent_result import CancellationToken, PreflightFailure
 from .config import Config
 from .container_runner import ContainerRunner
+from .docker_session import DockerSession, build_volume_spec
 from .errors import AgentTimeoutError, UsageLimitError
 from .services import GitService
 from .status_display import PlainStatusDisplay
@@ -55,6 +56,25 @@ class AgentRunner:
         self._git_service = git_service
         self._docker_client = docker_client
+    def _build_session(self, mount_path: Path) -> DockerSession:
+        volumes, auto_overlay = build_volume_spec(mount_path)
+        container_env = {
+            k: v for k, v in self._env.items() if k != "CLAUDE_ACCOUNT_JSON"
+        }
+        return DockerSession(
+            volumes=volumes,
+            container_env=container_env,
+            image_name=self._cfg.docker_image_name,
+            cfg=self._cfg,
+            docker_client=self._docker_client,
+            auto_overlay=auto_overlay,
+        )
+    def _inject_claude_credentials(self, session: DockerSession) -> None:
+        claude_json = self._env.get("CLAUDE_ACCOUNT_JSON")
+        if claude_json:
+            session.write_file(claude_json, "/home/agent/.claude.json")
     async def run(self, request: RunRequest) -> AgentOutput | PreflightFailure:
         name = request.name
         prompt_file = request.prompt_file
@@ -72,23 +92,23 @@ class AgentRunner:
         _token = token if token is not None else CancellationToken()
         if _token.is_cancelled:
-            raise UsageLimitError("Agent cancelled due to usage limit")
+            raise UsageLimitError(reset_time=None)
+        session = self._build_session(mount_path)
         runner = ContainerRunner(
             name,
-            mount_path,
-            self._env,
+            session,
             model=model,
             effort=effort,
-            docker_client=self._docker_client,
             status_display=status_display,
             cfg=self._cfg,
         )
+        status_display.register(name, "agent", work_body=work_body)
         try:
             git_name = self._git_service.get_user_name()
             git_email = self._git_service.get_user_email()
             await runner.setup(git_name, git_email, work_body)
-            await runner.prepare(prompt_file, prompt_args or {})
+            self._inject_claude_credentials(session)
             if not skip_preflight:
                 failures = await runner.preflight(list(self._cfg.preflight_checks))
                 if failures:
@@ -96,8 +116,9 @@ class AgentRunner:
             retries_left = self._cfg.timeout_retries
             while True:
                 try:
-                    output = await runner.work(request.role)
-                    return output
+                    return await runner.work(
+                        request.role, prompt_file, prompt_args or {}
+                    )
                 except AgentTimeoutError:
                     if retries_left <= 0:
                         raise
@@ -114,7 +135,7 @@ class AgentRunner:
         finally:
             status_display.remove(name)
             try:
-                runner.__exit__(None, None, None)
+                session.__exit__(None, None, None)
             except Exception:
                 pass
@@ -132,20 +153,21 @@ class AgentRunner:
         git_name = self._git_service.get_user_name()
         git_email = self._git_service.get_user_email()
+        session = self._build_session(mount_path)
         runner = ContainerRunner(
             name,
-            mount_path,
-            self._env,
-            docker_client=self._docker_client,
+            session,
             status_display=status_display,
             cfg=self._cfg,
         )
+        status_display.register(name, "agent", work_body=work_body)
         try:
             await runner.setup(git_name, git_email, work_body)
+            self._inject_claude_credentials(session)
             return await runner.preflight(list(self._cfg.preflight_checks))
         finally:
             status_display.remove(name)
             try:
-                runner.__exit__(None, None, None)
+                session.__exit__(None, None, None)
             except Exception:
                 pass
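The credential handling added to `AgentRunner` splits the environment in two: everything except `CLAUDE_ACCOUNT_JSON` becomes container environment, and the account JSON is written into the session as a file. A minimal sketch of that split follows; `FakeSession` and `split_env` are hypothetical stand-ins, and only the key name, the `write_file(content, path)` argument order, and the target path come from the diff.

```python
from typing import Dict, List, Optional, Tuple


class FakeSession:
    """Hypothetical stand-in for DockerSession that records written files."""

    def __init__(self, container_env: Dict[str, str]) -> None:
        self.container_env = container_env
        self.files: List[Tuple[str, str]] = []

    def write_file(self, content: str, path: str) -> None:
        self.files.append((path, content))


def split_env(env: Dict[str, str]) -> Tuple[Dict[str, str], Optional[str]]:
    """Separate the credential from the variables passed to the container."""
    container_env = {k: v for k, v in env.items() if k != "CLAUDE_ACCOUNT_JSON"}
    return container_env, env.get("CLAUDE_ACCOUNT_JSON") or None


env = {"GH_TOKEN": "example-token", "CLAUDE_ACCOUNT_JSON": '{"account": "demo"}'}
container_env, claude_json = split_env(env)
session = FakeSession(container_env)
if claude_json:
    # Injected as a file rather than exposed as an environment variable.
    session.write_file(claude_json, "/home/agent/.claude.json")
```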

{pycastle-0.1.3.10.dev0 → pycastle-0.2.0.1.dev0}/src/pycastle/config/__init__.py RENAMED

@@ -2,6 +2,5 @@ from __future__ import annotations
 from pycastle._types import StageOverride
 from pycastle.config.loader import Config, load_config
-from pycastle.config.validator import validate_config
-__all__ = ["Config", "StageOverride", "load_config", "validate_config"]
+__all__ = ["Config", "StageOverride", "load_config"]
