PyPI - spec-runner - Versions diffs - 2.3.0__tar.gz → 2.3.1__tar.gz - Mend

spec-runner 2.3.0tar.gz → 2.3.1tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (101) hide show

{spec_runner-2.3.0/src/spec_runner.egg-info → spec_runner-2.3.1}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: spec-runner
-Version: 2.3.0
+Version: 2.3.1
 Summary: Task automation from markdown specs via Claude CLI
 Author: Andrei
 License-Expression: MIT
@@ -134,7 +134,8 @@ Tasks are defined in `spec/tasks.md`:
 # Execution
 spec-runner run                            # Execute next ready task
 spec-runner run --task=TASK-001            # Execute specific task
-spec-runner run --all                      # Execute all ready tasks
+spec-runner run --all                      # Execute all ready tasks (resets failed→pending by default)
+spec-runner run --all --no-reset-failed    # Keep failed tasks sticky (skip the default reset)
 spec-runner run --all --hitl-review        # Interactive HITL approval gate
 spec-runner run --force                    # Skip lock check (stale lock)
 spec-runner run --tui                      # Execute with live TUI dashboard
@@ -375,6 +376,11 @@ paths:
 | llama-cli | Yes | `{cmd} -m {model} -p {prompt} --no-display-prompt` |
 | Custom | Use template | `{cmd} --prompt {prompt}` |
+> **Full pi-driven loop:** `pi` can run the entire dev → review → test cycle (with native
+> skills, per-stage tool control and a read-only review gate) using only config and a small
+> script — no core code. See [docs/pi-workflow.md](docs/pi-workflow.md) and the runnable
+> [examples/pi-loop/](examples/pi-loop/).
 ## Project Structure
 ```

{spec_runner-2.3.0 → spec_runner-2.3.1}/README.md RENAMED Viewed

@@ -99,7 +99,8 @@ Tasks are defined in `spec/tasks.md`:
 # Execution
 spec-runner run                            # Execute next ready task
 spec-runner run --task=TASK-001            # Execute specific task
-spec-runner run --all                      # Execute all ready tasks
+spec-runner run --all                      # Execute all ready tasks (resets failed→pending by default)
+spec-runner run --all --no-reset-failed    # Keep failed tasks sticky (skip the default reset)
 spec-runner run --all --hitl-review        # Interactive HITL approval gate
 spec-runner run --force                    # Skip lock check (stale lock)
 spec-runner run --tui                      # Execute with live TUI dashboard
@@ -340,6 +341,11 @@ paths:
 | llama-cli | Yes | `{cmd} -m {model} -p {prompt} --no-display-prompt` |
 | Custom | Use template | `{cmd} --prompt {prompt}` |
+> **Full pi-driven loop:** `pi` can run the entire dev → review → test cycle (with native
+> skills, per-stage tool control and a read-only review gate) using only config and a small
+> script — no core code. See [docs/pi-workflow.md](docs/pi-workflow.md) and the runnable
+> [examples/pi-loop/](examples/pi-loop/).
 ## Project Structure
 ```

{spec_runner-2.3.0 → spec_runner-2.3.1}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "spec-runner"
-version = "2.3.0"
+version = "2.3.1"
 description = "Task automation from markdown specs via Claude CLI"
 readme = "README.md"
 requires-python = ">=3.10"

{spec_runner-2.3.0 → spec_runner-2.3.1}/src/spec_runner/skills/spec-generator-skill/SKILL.md RENAMED Viewed

@@ -311,6 +311,10 @@ File templates are located in `templates/`:
 - `executor.py` — auto-execution via Claude CLI
 - `executor.config.yaml` — executor configuration
 - `Makefile.template` — Make targets for the project
+- `pi/` — full **dev → review → test loop on the `pi` coding agent** with no core code:
+  `pi/skills/{pi-implementer,pi-reviewer,pi-tester}/SKILL.md` (pi-native role skills) and
+  `pi/spec-runner.pi.config.yaml` (per-stage command templates: full tools for develop, a
+  read-only review gate). See `docs/pi-workflow.md` and `examples/pi-loop/` in the repo.
 ## Examples

spec_runner-2.3.1/src/spec_runner/skills/spec-generator-skill/templates/pi/skills/pi-implementer/SKILL.md ADDED Viewed

@@ -0,0 +1,37 @@
+---
+name: pi-implementer
+description: Implement a spec-runner task end-to-end with the pi coding agent — write the code AND its tests, run the tests via bash until green, then emit the spec-runner completion marker. Use during the development stage of the dev→review→test loop.
+---
+# Pi Implementer
+You are the **implementer** in a spec-runner dev→review→test loop. spec-runner hands you a
+single task (with its checklist, requirements and design refs). Your job is to make it real,
+verify it yourself, and report a machine-readable result.
+## Workflow
+1. **Understand the task.** Read the task description, checklist, and any referenced
+   `[REQ-XXX]` / `[DESIGN-XXX]`. Use `read`/`grep`/`find` to study the existing code and
+   match its conventions (naming, structure, style). Do not invent new patterns when one
+   already exists.
+2. **Implement.** Use `edit`/`write` to make the smallest change that satisfies the task.
+   Keep functions small and focused; add docstrings to public APIs; follow the project's
+   line-length and style rules.
+3. **Write tests in the same pass.** Every new behavior gets a test. Cover the happy path,
+   edge cases, and error paths. Put tests where the project keeps them (mirror existing
+   test layout).
+4. **Run the tests yourself with `bash`.** Run the project's test command (e.g.
+   `uv run pytest -q`). If anything fails, fix it and re-run. Do not stop while tests are
+   red. If you also have a linter/formatter, run it and fix what it flags.
+5. **Report.** End your final message with exactly one marker on its own line:
+   - `TASK_COMPLETE` — the change is implemented, tested, and the suite is green.
+   - `TASK_FAILED` — you could not complete the task (explain why on the line above).
+## Rules
+- Only touch code related to this task. No drive-by refactors.
+- Never leave the suite red and claim success — spec-runner re-runs the tests as a gate and
+  will fail the task anyway.
+- Prefer reusing existing helpers over writing new ones.
+- Put the marker on the **last line**, nothing after it.

spec_runner-2.3.1/src/spec_runner/skills/spec-generator-skill/templates/pi/skills/pi-reviewer/SKILL.md ADDED Viewed

@@ -0,0 +1,32 @@
+---
+name: pi-reviewer
+description: Read-only code review for a spec-runner task with the pi coding agent — inspect the diff and changed files, report findings, and emit a pass/fail verdict WITHOUT editing any code. Use during the review-gate stage of the dev→review→test loop.
+---
+# Pi Reviewer (read-only gate)
+You are the **reviewer** in a spec-runner dev→review→test loop. You run with a read-only tool
+set (`read`, `grep`, `find`) — you **cannot and must not** modify code. Your job is to judge
+the change and return a verdict spec-runner can act on.
+## Workflow
+1. **Read the change.** spec-runner gives you the task, the changed files, and the diff. Use
+   `read`/`grep`/`find` to inspect the touched code and its surroundings in context.
+2. **Review against this checklist:**
+   - **Correctness** — logic errors, off-by-one, null/None handling, wrong types.
+   - **Security** — injection, unsafe input handling, secrets in code.
+   - **Error handling** — missing or swallowed errors, unguarded edge cases.
+   - **Test coverage** — does the new behavior have tests? Are edge/error paths covered?
+   - **Task fit** — are all checklist items actually implemented?
+3. **Report findings** concisely: file + line + what's wrong + why it matters. If clean, say so.
+4. **Verdict.** End your final message with exactly one marker on its own line:
+   - `REVIEW_PASSED` — no blocking issues; the change is good to merge.
+   - `REVIEW_FAILED` — there are issues that need attention (list them above).
+## Rules
+- **Do not edit, write, or run code.** You are a gate, not a fixer. If something is wrong,
+  describe it and fail the review — the implementer will fix it on retry.
+- Be specific and actionable. "Looks fine" is not a review; "no issues found in X, Y, Z" is.
+- Put the verdict marker on the **last line**, nothing after it.

spec_runner-2.3.1/src/spec_runner/skills/spec-generator-skill/templates/pi/skills/pi-tester/SKILL.md ADDED Viewed

@@ -0,0 +1,28 @@
+---
+name: pi-tester
+description: Strengthen a module's test suite with the pi coding agent — add missing edge-case, error-path and regression tests, then run them to confirm they pass. Use for a standalone test-hardening step (script or plugin) outside the main task gate.
+---
+# Pi Tester
+You harden tests for a target module. This is a focused **testing** pass, not feature work —
+you add and improve tests, you do not change production behavior.
+## Workflow
+1. **Find the target.** You're given a module/area (often via `$1` / `$SR_TASK_ID`). Use
+   `read`/`grep`/`find` to locate it and its existing tests.
+2. **Find the gaps.** Look for untested branches, edge cases (empty/boundary inputs), error
+   paths, and recently changed behavior with no regression test.
+3. **Add tests** with `edit`/`write`, matching the project's test framework and layout. Keep
+   tests small, named clearly, and independent.
+4. **Run them with `bash`** (e.g. `uv run pytest -q`). Keep iterating until they pass. If a
+   new test reveals a real bug, report it clearly rather than weakening the test to pass.
+5. **Report** what you added and the final test result.
+## Rules
+- Do not change production code to make a test pass — if a test fails for a real reason, flag
+  it. (If invoked read-only, just report the gaps and proposed tests.)
+- Prefer extending existing test files over creating parallel ones.
+- Keep additions minimal and high-signal; no redundant or trivial assertions.

spec_runner-2.3.1/src/spec_runner/skills/spec-generator-skill/templates/pi/spec-runner.pi.config.yaml ADDED Viewed

@@ -0,0 +1,51 @@
+# spec-runner config for a full dev→review→test loop driven by the `pi` coding agent.
+#
+# Copy to your project root as `spec-runner.config.yaml` and adjust the model. The pi-native
+# skills referenced below ship alongside this file under `pi/skills/` — copy them to your
+# project as `.pi/skills/` (or point `--skill` at wherever you keep them).
+#
+# See docs/pi-workflow.md for the full explanation of how each stage maps to pi flags.
+executor:
+  # --- which agent backend ---
+  claude_command: "pi"
+  # IMPORTANT: set an explicit model your pi install is authenticated for — run
+  # `pi --list-models` to see them (ids look like "openai-codex/gpt-5.4", "google/gemini-...").
+  # With the templates below a blank model yields a bare `--model ` which shlex drops and pi
+  # errors. Either set a model here or remove `--model {model}` from the templates and rely on
+  # pi's own default (provider / PI_* env / ~/.pi/agent/settings.json).
+  claude_model: "openai-codex/gpt-5.4"
+  # --- DEVELOP stage: implementer with the full tool set + role skill ---
+  command_template: "{cmd} -p --model {model} --tools read,bash,edit,write,grep,find,ls --skill .pi/skills/pi-implementer {prompt}"
+  # --- REVIEW stage: read-only gate (no edit/write/bash) + reviewer skill ---
+  review_command: "pi"
+  review_model: "openai-codex/gpt-5.4"
+  review_command_template: "{cmd} -p --model {model} --tools read,grep,find --skill .pi/skills/pi-reviewer {prompt}"
+  # --- TEST stage: spec-runner runs the suite as the gate ---
+  run_tests_on_done: true
+  run_review: true
+  commands:
+    test: "uv run pytest -q"
+    lint: "uv run ruff check ."
+    lint_fix: "uv run ruff check . --fix"
+  # Roles are ALSO carried as personas. pi only puts skill *descriptions* in the system prompt
+  # and loads the full SKILL.md on demand — in headless `-p` runs that's not guaranteed, so the
+  # persona system_prompt (always injected into the prompt text by spec-runner) is the reliable
+  # belt-and-suspenders layer. The `.pi/skills/` skills add the reusable pi-native workflow.
+  personas:
+    implementer:
+      system_prompt: >-
+        You are the implementer. Implement the task and its tests, run the tests via bash
+        until green, then end with TASK_COMPLETE (or TASK_FAILED on the last line).
+    reviewer:
+      system_prompt: >-
+        You are a read-only reviewer. Inspect the diff, report findings, and DO NOT edit code.
+        End with REVIEW_PASSED or REVIEW_FAILED on the last line.
+    qa:
+      system_prompt: >-
+        You harden tests: add edge-case, error-path and regression tests, then run them.
+        Do not weaken production behavior to make tests pass.

spec_runner-2.3.1/src/spec_runner/skills/spec-generator-skill/templates/prompts/review.pi.md ADDED Viewed

@@ -0,0 +1,38 @@
+# Code Review (read-only gate)
+Task: ${TASK_ID} — ${TASK_NAME}
+Files changed:
+${CHANGED_FILES}
+Diff:
+${GIT_DIFF}
+## Instructions
+You are a **read-only** reviewer. Inspect the change — do NOT edit, write, or run code.
+Review for:
+1. Bugs and logic errors
+2. Security issues
+3. Missing error handling
+4. Test coverage (does the new behavior have tests? are edge/error paths covered?)
+5. Task fit (are all checklist items implemented?)
+Report findings concisely (file + line + what's wrong + why). If clean, say what you checked.
+## Required Response Format
+You MUST end your response with exactly one of these status codes on a new line:
+```
+REVIEW_PASSED
+```
+Use this if the code looks good and has no blocking issues.
+```
+REVIEW_FAILED
+```
+Use this if there are issues that need attention. List them above; the implementer will fix
+them on retry — you do not fix anything yourself.
+Do not add any text after the status code.

{spec_runner-2.3.0 → spec_runner-2.3.1/src/spec_runner.egg-info}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: spec-runner
-Version: 2.3.0
+Version: 2.3.1
 Summary: Task automation from markdown specs via Claude CLI
 Author: Andrei
 License-Expression: MIT
@@ -134,7 +134,8 @@ Tasks are defined in `spec/tasks.md`:
 # Execution
 spec-runner run                            # Execute next ready task
 spec-runner run --task=TASK-001            # Execute specific task
-spec-runner run --all                      # Execute all ready tasks
+spec-runner run --all                      # Execute all ready tasks (resets failed→pending by default)
+spec-runner run --all --no-reset-failed    # Keep failed tasks sticky (skip the default reset)
 spec-runner run --all --hitl-review        # Interactive HITL approval gate
 spec-runner run --force                    # Skip lock check (stale lock)
 spec-runner run --tui                      # Execute with live TUI dashboard
@@ -375,6 +376,11 @@ paths:
 | llama-cli | Yes | `{cmd} -m {model} -p {prompt} --no-display-prompt` |
 | Custom | Use template | `{cmd} --prompt {prompt}` |
+> **Full pi-driven loop:** `pi` can run the entire dev → review → test cycle (with native
+> skills, per-stage tool control and a read-only review gate) using only config and a small
+> script — no core code. See [docs/pi-workflow.md](docs/pi-workflow.md) and the runnable
+> [examples/pi-loop/](examples/pi-loop/).
 ## Project Structure
 ```

{spec_runner-2.3.0 → spec_runner-2.3.1}/src/spec_runner.egg-info/SOURCES.txt RENAMED Viewed

@@ -51,6 +51,10 @@ src/spec_runner/skills/spec-generator-skill/templates/requirements.template.md
 src/spec_runner/skills/spec-generator-skill/templates/task.py
 src/spec_runner/skills/spec-generator-skill/templates/tasks.template.md
 src/spec_runner/skills/spec-generator-skill/templates/workflow.template.md
+src/spec_runner/skills/spec-generator-skill/templates/pi/spec-runner.pi.config.yaml
+src/spec_runner/skills/spec-generator-skill/templates/pi/skills/pi-implementer/SKILL.md
+src/spec_runner/skills/spec-generator-skill/templates/pi/skills/pi-reviewer/SKILL.md
+src/spec_runner/skills/spec-generator-skill/templates/pi/skills/pi-tester/SKILL.md
 src/spec_runner/skills/spec-generator-skill/templates/prompts/review.claude.md
 src/spec_runner/skills/spec-generator-skill/templates/prompts/review.codex.md
 src/spec_runner/skills/spec-generator-skill/templates/prompts/review.llama.md

spec_runner-2.3.0/src/spec_runner/skills/spec-generator-skill/templates/prompts/review.pi.md DELETED Viewed

@@ -1,38 +0,0 @@
-# Code Review
-Task: ${TASK_ID} — ${TASK_NAME}
-Files changed:
-${CHANGED_FILES}
-Diff:
-${GIT_DIFF}
-## Instructions
-Review the code for:
-1. Bugs and errors
-2. Security issues
-3. Missing error handling
-4. Test coverage
-## Required Response Format
-You MUST end your response with exactly one of these status codes on a new line:
-```
-REVIEW_PASSED
-```
-Use this if the code looks good and has no issues.
-```
-REVIEW_FIXED
-```
-Use this if you found and fixed issues.
-```
-REVIEW_FAILED
-```
-Use this if there are issues that need manual attention.
-Do not add any text after the status code.