npm - agent-bober - Versions diffs - 0.4.2 → 0.5.0 - Mend

agent-bober 0.4.2 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/README.md +60 -4
package/agents/bober-evaluator.md +84 -8
package/agents/bober-generator.md +102 -0
package/agents/bober-planner.md +24 -0
package/dist/cli/commands/init.js +1 -0
package/dist/cli/commands/init.js.map +1 -1
package/dist/evaluators/builtin/playwright.d.ts +11 -0
package/dist/evaluators/builtin/playwright.d.ts.map +1 -1
package/dist/evaluators/builtin/playwright.js +259 -12
package/dist/evaluators/builtin/playwright.js.map +1 -1
package/package.json +1 -1
package/skills/bober.eval/SKILL.md +145 -148
package/skills/bober.playwright/SKILL.md +429 -0
package/skills/bober.playwright/references/playwright-patterns.md +377 -0
package/skills/bober.run/SKILL.md +433 -119
package/skills/bober.sprint/SKILL.md +147 -57
package/templates/presets/nextjs/bober.config.json +2 -1

package/skills/bober.sprint/SKILL.md CHANGED Viewed

@@ -1,6 +1,6 @@
 ---
 name: bober.sprint
-description: Execute the next pending sprint — negotiate contracts, run the Generator, evaluate output, and iterate until passing or exhausting retries.
+description: Execute the next pending sprint — spawn generator and evaluator subagents, orchestrate the retry loop until passing or exhausting retries.
 argument-hint: "[sprint-number]"
 handoffs:
   - label: "Evaluate Sprint"
@@ -11,9 +11,11 @@ handoffs:
     prompt: "Execute the next sprint"
 ---
-# bober.sprint — Sprint Execution Skill
+# bober.sprint — Sprint Execution Orchestrator
-You are running the **bober.sprint** skill. Your job is to execute a single sprint from an existing plan through the full Generator-Evaluator loop: negotiate the contract, generate the implementation, evaluate the output, and iterate until the sprint passes or retries are exhausted.
+You are the **orchestrator** for a single sprint cycle. You do NOT implement code or evaluate it yourself. You spawn subagents using the **Agent tool** for both the generator (implementation) and evaluator (verification) roles, and you manage the retry loop between them.
+Each subagent runs in its own isolated context window. It receives ONLY the information you explicitly pass in its prompt. After a subagent completes, you read the files it created on disk to get the full results.
 ## Prerequisites
@@ -22,7 +24,9 @@ Before starting, verify these exist:
 - At least one PlanSpec in `.bober/specs/`
 - At least one SprintContract in `.bober/contracts/`
-If any are missing, tell the user to run `/bober:plan` first.
+If any are missing, tell the user to run `/bober-plan` first.
+Also read `.bober/principles.md` if it exists. You will include the principles text in every subagent prompt.
 ## Step 1: Identify the Target Sprint
@@ -71,9 +75,9 @@ When a contract status is `proposed`, it has not yet been reviewed for executabi
    {"event":"sprint-started","contractId":"...","specId":"...","timestamp":"..."}
    ```
-## Step 3: Create Context Handoff
+## Step 3: Build the Context Handoff
-Create a ContextHandoff document for the Generator. This document is the ONLY context the Generator receives -- it must be self-contained.
+Build a ContextHandoff document for the Generator. This document is the ONLY context the Generator subagent receives — it must be self-contained.
 **ContextHandoff structure:**
 ```json
@@ -103,6 +107,7 @@ Create a ContextHandoff document for the Generator. This document is the ONLY co
     "commands": { "<commands section from bober.config.json>" },
     "generator": { "<generator section from bober.config.json>" }
   },
+  "principles": "<full text of .bober/principles.md or null>",
   "evaluatorFeedback": null
 }
 ```
@@ -129,9 +134,7 @@ Save the handoff to `.bober/handoffs/<handoffId>.json`.
 }
 ```
-## Step 4: Spawn the Generator
-Invoke the `bober-generator` subagent with the handoff document.
+## Step 4: Spawn the Generator Subagent
 **Before spawning:**
 1. Ensure the correct git branch exists and is checked out:
@@ -140,48 +143,132 @@ Invoke the `bober-generator` subagent with the handoff document.
    ```
 2. If this is a retry, the Generator should be on the same branch with the previous attempt's code still present.
-**Spawn the Generator:**
-Use the `bober-generator` agent definition. Pass it the handoff file path.
+**Use the Agent tool to spawn the generator:**
-**After the Generator completes:**
-1. Read the Generator's completion report
-2. Verify the Generator committed its changes (check `git log`)
-3. Proceed to evaluation
+```
+Agent tool call:
+  description: "Sprint <N>: <sprint title>"
+  prompt: <the full prompt below>
+```
-## Step 5: Spawn the Evaluator
+**Generator prompt:**
-Create an Evaluator handoff document:
+```
+You are the Bober Generator subagent. You have been spawned by the orchestrator to implement a sprint.
+## Context Handoff
+<paste the FULL handoff JSON>
+## Instructions
+1. Read the SprintContract at .bober/contracts/<contractId>.json
+2. Read the PlanSpec at .bober/specs/<specId>.json for broader context
+3. Read bober.config.json for commands configuration
+4. Read .bober/principles.md if it exists — adhere to all principles strictly
+5. Read the files listed in the contract's estimatedFiles
+6. Implement the sprint according to the contract's success criteria
+7. Self-verify: run build, typecheck, lint, and test commands
+8. Commit your changes (format: "bober(<sprint-N>): <description>")
+9. Work on the feature branch, never on main/master
+<IF iteration > 1>
+## IMPORTANT — This is a RETRY (iteration <N>)
+The previous attempt failed evaluation. Here is the evaluator's feedback:
+<paste evaluator feedback JSON>
+Focus on fixing the specific failures listed above.
+</IF>
+## Your Response
+When done, respond with EXACTLY this JSON structure (no other text):
+{
+  "contractId": "<contract ID>",
+  "status": "complete | partial | blocked",
+  "criteriaResults": [
+    {"criterionId": "sc-X-Y", "met": true/false, "evidence": "<evidence>"}
+  ],
+  "filesChanged": [
+    {"path": "<path>", "action": "created | modified | deleted", "description": "<what>"}
+  ],
+  "testsAdded": ["<test files>"],
+  "commits": ["<hash> - <message>"],
+  "blockers": ["<issues>"],
+  "notes": "<context for evaluator>"
+}
+```
-```json
+**After the Generator subagent returns:**
+1. Parse the generator's response to extract the completion report.
+2. Verify commits were made: `git log --oneline -5`
+3. Save the generator report to `.bober/handoffs/gen-report-<contractId>-<iteration>.json`
+4. If the generator subagent crashed or returned an error, mark the sprint as `needs-rework` with note "Generator subagent failed".
+## Step 5: Spawn the Evaluator Subagent
+**Use the Agent tool to spawn the evaluator:**
+```
+Agent tool call:
+  description: "Evaluate sprint <N>: <sprint title>"
+  prompt: <the full prompt below>
+```
+**Evaluator prompt:**
+```
+You are the Bober Evaluator subagent. You have been spawned by the orchestrator to evaluate a sprint.
+## Sprint Contract
+<paste the full SprintContract JSON>
+## Generator's Completion Report
+<paste the generator's completion report JSON>
+## Project Configuration
+<paste relevant sections: commands, evaluator config>
+## Project Principles
+<paste full text of .bober/principles.md or "No principles file found.">
+## Context
+- Contract ID: <contractId>
+- Spec ID: <specId>
+- Sprint: <N> of <total>
+- Iteration: <N>
+- Branch: <current branch>
+## Instructions
+1. Read the SprintContract at .bober/contracts/<contractId>.json
+2. Read bober.config.json for configured eval strategies and commands
+3. Run each configured evaluation strategy using the commands from config
+4. Verify EVERY success criterion one by one
+5. Check for regressions
+6. Check adherence to project principles
+7. Produce a structured EvalResult
+IMPORTANT: You do NOT have Write or Edit tools. Output the EvalResult JSON in your response.
+## Your Response
+Respond with EXACTLY this JSON structure (no other text):
 {
-  "handoffId": "handoff-<contractId>-eval-<iteration>",
-  "type": "to-evaluator",
+  "evalId": "eval-<contractId>-<iteration>",
   "contractId": "<contract ID>",
   "specId": "<spec ID>",
   "timestamp": "<ISO-8601>",
-  "iteration": 1,
-  "context": {
-    "generatorReport": { "<Generator's completion report>" },
-    "changedFiles": ["<files the generator reports changing>"],
-    "branch": "<current branch>"
-  },
-  "contract": { "<full SprintContract object>" },
-  "config": {
-    "commands": { "<commands section from bober.config.json>" },
-    "evaluator": { "<evaluator section from bober.config.json>" }
-  }
+  "iteration": <N>,
+  "overallResult": "pass | fail",
+  "score": { "criteriaTotal": N, "criteriaPassed": N, "criteriaFailed": N, "criteriaSkipped": N, "requiredPassed": N, "requiredFailed": N, "requiredTotal": N },
+  "strategyResults": [ {"strategy": "<type>", "required": true/false, "result": "pass|fail|skipped", "output": "<output>", "details": "<details>"} ],
+  "criteriaResults": [ {"criterionId": "sc-X-Y", "description": "<desc>", "required": true/false, "result": "pass|fail|skipped", "evidence": "<evidence>", "feedback": "<if failed>"} ],
+  "regressions": [],
+  "generatorFeedback": [],
+  "summary": "<2-3 sentence summary>"
 }
 ```
-Save the handoff to `.bober/handoffs/<handoffId>.json`.
-**Spawn the Evaluator:**
-Use the `bober-evaluator` agent definition. Pass it the handoff file path.
-**After the Evaluator completes:**
-1. Read the EvalResult
-2. Save the EvalResult to `.bober/eval-results/` if the evaluator could not (it lacks Write tools)
-3. Determine pass/fail
+**After the Evaluator subagent returns:**
+1. Parse the evaluator's response to extract the EvalResult.
+2. Save the EvalResult to `.bober/eval-results/eval-<contractId>-<iteration>.json` (the evaluator cannot write files).
+3. Determine pass/fail from the `overallResult` field.
 ## Step 6: Process Evaluation Result
@@ -203,7 +290,7 @@ Use the `bober-evaluator` agent definition. Pass it the handoff file path.
 4. **Report success to the user:**
    ```
-   Sprint <N> PASSED on iteration <M>.
+   === Sprint <N> PASSED on iteration <M> ===
    Completed: <sprint title>
    Key results:
@@ -211,7 +298,7 @@ Use the `bober-evaluator` agent definition. Pass it the handoff file path.
    - <criterion 2>: PASS
    ...
-   Next sprint: <next sprint title> (run /bober.sprint to continue)
+   Next sprint: <next sprint title> (run /bober-sprint to continue)
    ```
 ### If the sprint FAILS and retries remain:
@@ -227,14 +314,15 @@ Check `evaluator.maxIterations` from `bober.config.json` (default: 3). If the cu
 3. **Report the retry to the user:**
    ```
-   Sprint <N> iteration <M> FAILED. <X> of <Y> criteria not met.
+   === Sprint <N> iteration <M> FAILED ===
+   <X> of <Y> criteria not met.
    Retrying (iteration <M+1> of <maxIterations>)...
    Failed criteria:
    - <criterion>: <brief reason>
    ```
-4. **Go to Step 4** (spawn Generator again with feedback)
+4. **Go to Step 4** (spawn a FRESH Generator subagent with feedback)
 ### If the sprint FAILS and no retries remain:
@@ -253,7 +341,7 @@ Check `evaluator.maxIterations` from `bober.config.json` (default: 3). If the cu
 4. **Report failure to the user with full context:**
    ```
-   Sprint <N> FAILED after <maxIterations> iterations.
+   === Sprint <N> FAILED after <maxIterations> iterations ===
    Contract: <contract title>
    Failed criteria:
@@ -265,8 +353,8 @@ Check `evaluator.maxIterations` from `bober.config.json` (default: 3). If the cu
    Recommended actions:
    - Review the failed criteria and evaluator feedback
    - Consider simplifying the sprint scope
-   - Run /bober.sprint <N> to retry from scratch
-   - Run /bober.plan to revise the plan
+   - Run /bober-sprint <N> to retry from scratch
+   - Run /bober-plan to revise the plan
    ```
 ## Step 7: Context Reset
@@ -274,20 +362,22 @@ Check `evaluator.maxIterations` from `bober.config.json` (default: 3). If the cu
 After a sprint completes (pass or fail), manage context:
 Read `pipeline.contextReset` from config:
-- `always`: Context is fully reset between sprints. The next sprint starts fresh with only the handoff document.
-- `on-threshold`: Context resets only if the conversation is getting long. Not applicable in single-sprint skill execution.
-- `never`: Context carries forward. Not recommended.
-## Next Steps
-After completing this phase, suggest the following next steps to the user:
-- `/bober-eval` — Evaluate the current sprint output independently
-- `/bober-sprint` — Execute the next sprint in the plan
+- `always`: Each subagent already gets a fresh context. This is the default behavior.
+- `on-threshold`: Same as `always` with subagent architecture.
+- `never`: Include richer context summaries in the handoff for the next sprint.
 ## Error Handling
+- **Subagent crash/timeout:** If the Agent tool call fails, log the error. Do not let it crash the orchestration. Mark the sprint appropriately and report to the user.
+- **Subagent returns malformed response:** Read files on disk as the source of truth. The subagent may have saved files correctly even if its text response was garbled.
 - **Generator fails to produce any output:** Mark sprint as `needs-rework` with note "Generator produced no output"
 - **Evaluator cannot run strategies:** Report which strategies failed to execute and why. If a required strategy cannot run, mark sprint as `needs-rework` with a configuration issue note.
 - **Git conflicts:** Report the conflict to the user. Do not auto-resolve.
-- **Build broken before sprint started:** Verify the build passes BEFORE starting the Generator. If the build is already broken, report this and do not proceed.
+- **Build broken before sprint started:** Verify the build passes BEFORE spawning the Generator. If the build is already broken, report this and do not proceed.
 - **Missing dependencies:** If `npm install` or equivalent has not been run, run it before starting.
+## Next Steps
+After completing this phase, suggest the following next steps to the user:
+- `/bober-eval` — Evaluate the current sprint output independently
+- `/bober-sprint` — Execute the next sprint in the plan

package/templates/presets/nextjs/bober.config.json CHANGED Viewed

@@ -6,7 +6,8 @@
     { "type": "typecheck", "required": true },
     { "type": "lint", "required": true },
     { "type": "build", "required": true },
-    { "type": "unit-test", "required": true }
+    { "type": "unit-test", "required": true },
+    { "type": "playwright", "required": false }
   ], "maxIterations": 3 },
   "sprint": { "maxSprints": 10, "requireContracts": true, "sprintSize": "medium" },
   "pipeline": { "maxIterations": 20, "requireApproval": false, "contextReset": "always" },