npm - @protolabsai/proto - Versions diffs - 0.21.0 → 0.23.0 - Mend

@protolabsai/proto 0.21.0 → 0.23.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +66 -1
package/bundled/harness-reference/SKILL.md +146 -0
package/bundled/qc-helper/docs/features/sub-agents.md +52 -6
package/bundled/sprint-contract/SKILL.md +62 -0
package/bundled/subagent-driven-development/SKILL.md +10 -0
package/cli.js +4942 -3868
package/package.json +2 -2

package/README.md CHANGED

@@ -262,9 +262,74 @@ A `MEMORY.md` index is auto-generated and loaded into the system prompt at the s
 After each conversation turn, a background extraction agent reviews recent messages and auto-creates memories for notable facts. This runs fire-and-forget with restricted tools (read/write/glob in the memory directory only).
+## Agent Harness
+proto includes a harness system that enforces quality gates, limits scope, and recovers from failures automatically.
+### Sprint Contract (Scope Lock)
+Prevents agents from modifying files outside an agreed scope. Before coding begins, negotiate a contract that defines exactly which files will be created or modified. Once the contract is confirmed, the scope lock is armed: any write outside the scope is rejected with a recovery message.
+**Workflow:**
+```bash
+proto
+/sprint-contract
+> Task: Refactor auth module
+> Files: src/auth.ts, src/utils.ts
+> Confirm
+```
+**Behavior:**
+- Write to `src/auth.ts` → ALLOWED
+- Write to `tests/foo.test.ts` → BLOCKED with scope violation message
+Contracts persist at `.proto/sprint-contract.json` and auto-restore on session resume.
+### Behavior Verification Gate
+Post-run smoke tests that verify changes actually work. After a subagent completes, the gate runs your defined scenarios (shell commands) in parallel. Failures inject a remediation message back to the agent for self-correction.
+**Setup** — create `.proto/verify-scenarios.json`:
+```json
+[
+  { "name": "tests pass", "command": "npm test -- --run", "timeoutMs": 60000 },
+  { "name": "build works", "command": "npm run build", "timeoutMs": 30000 },
+  { "name": "no TypeScript errors", "command": "npm run typecheck" }
+]
+```
+**Behavior:**
+1. Agent completes task, reports GOAL
+2. Gate fires, runs all scenarios in parallel
+3. If any fail → remediation message injected, agent self-corrects
+4. Gate fires again until all pass
+### Multi-Sample Retry
+When a subagent fails (ERROR, MAX_TURNS, or TIMEOUT), proto retries up to 2 more times with escalating temperatures (0.7 → 1.0 → 1.3). Each retry gets a `[RETRY CONTEXT]` block summarizing previous failures. The highest-scoring result is returned.
+This reduces false negatives from single-run failures and gives the model multiple chances with different sampling strategies.
+### Repo Map
+PageRank-based file importance ranking. Analyzes the project's TypeScript/JS import graph to surface the most central files. Useful for understanding codebase structure or finding related files.
+**Usage:**
+```bash
+proto -p "Use the repo_map tool to find the most important files in this codebase"
+proto -p "Use repo_map with seedFiles=['src/auth.ts'] to find related files"
+```
+Results are cached at `.proto/repo-map-cache.json` and auto-invalidate on file changes.
 ## Skills
-proto ships with 16 bundled skills for agentic workflows:
+proto ships with 21 bundled skills for agentic workflows:
 - **brainstorming** — Structured ideation
 - **dispatching-parallel-agents** — Fan-out/fan-in subagent patterns

package/bundled/harness-reference/SKILL.md ADDED

@@ -0,0 +1,146 @@
+---
+name: harness-reference
+description: Reference guide for all agent harness safety features — doom loop detection, scope lock, git checkpoints, observation masking, sprint contract, reminders, repo map, behavior verification, and multi-sample retry
+---
+# Agent Harness Reference
+The proto harness is a set of safety and reliability features that wrap every agent execution. They fire automatically — you don't need to invoke them manually. This skill documents each feature so you can understand what's protecting you and how to configure it.
+## Features
+### Doom Loop Detection
+**What it does:** Detects when the agent is repeating the same tool call pattern in a sliding 20-call window. If the same fingerprint (tool + args hash) appears 3+ times, the harness injects a recovery message and records a Langfuse span.
+**You don't need to do anything.** The harness detects this automatically.
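The sliding-window check described above can be sketched roughly as follows. This is a minimal illustration, not proto's code: the `ToolCall` shape, the `DoomLoopDetector` class, and the use of a JSON string in place of the real "tool + args hash" are all assumptions. Only the window size (20) and the repeat threshold (3) come from the text.

```typescript
// Hypothetical sketch of sliding-window loop detection; not proto's actual code.
type ToolCall = { tool: string; args: unknown };

const WINDOW_SIZE = 20; // sliding window length, from the description above
const REPEAT_THRESHOLD = 3; // repeats that trigger an intervention

// Assumption: a JSON string stands in for the real "tool + args hash".
// Note that JSON.stringify is sensitive to object key order.
function fingerprint(call: ToolCall): string {
  return `${call.tool}:${JSON.stringify(call.args)}`;
}

class DoomLoopDetector {
  private recent: string[] = [];

  // Records a call and returns true when its fingerprint has appeared
  // REPEAT_THRESHOLD or more times within the last WINDOW_SIZE calls.
  record(call: ToolCall): boolean {
    const fp = fingerprint(call);
    this.recent.push(fp);
    if (this.recent.length > WINDOW_SIZE) this.recent.shift();
    const repeats = this.recent.filter((seen) => seen === fp).length;
    return repeats >= REPEAT_THRESHOLD;
  }
}
```

A caller would invoke `record` before each tool call and inject a recovery message whenever it returns `true`.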
+### Scope Lock (Sprint Contract)
+**What it does:** Before coding, the `sprint-contract` skill negotiates an explicit contract — the set of files that may change. Once activated, any write outside that set is blocked with a structured error.
+**To activate:** Use the `sprint-contract` skill at the start of an implementation task. It writes `.proto/sprint-contract.json` and arms the in-memory scope lock. The lock is restored on session restart.
+**To check status:** If a write is blocked, the error message tells you the violating path and the permitted set.
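As a rough sketch of the check a scope lock performs, the function below compares a write target against the permitted set and reports the violating path together with that set. The `ScopeContract` and `WriteDecision` types and the `checkWrite` name are hypothetical. The two field names are borrowed from the contract format in the `sprint-contract` skill, and exact string comparison assumes paths are already normalized.

```typescript
// Hypothetical shapes; only filesToCreate / filesToModify come from the contract format.
type ScopeContract = { filesToCreate: string[]; filesToModify: string[] };

type WriteDecision =
  | { allowed: true }
  | { allowed: false; violatingPath: string; permitted: string[] };

// Assumption: paths are compared as exact strings (already absolute and normalized).
function checkWrite(contract: ScopeContract, targetPath: string): WriteDecision {
  const permitted = contract.filesToCreate.concat(contract.filesToModify);
  if (permitted.indexOf(targetPath) !== -1) return { allowed: true };
  return { allowed: false, violatingPath: targetPath, permitted };
}
```

Returning the permitted set alongside the violating path mirrors the error message described above, so the agent can see both what it tried and what it may touch.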
+### Git Checkpoints
+**What it does:** Before every file-mutating tool call (`write_file`, `edit`, `replace`), the harness creates a shadow-repo commit. This lets you diff or roll back to any pre-edit state.
+**To roll back:** Use `git log` to find the checkpoint commit and `git checkout <hash> -- <file>` to restore.
+### Observation Masking
+**What it does:** When the context window gets large, the harness applies a rolling verbatim window — tool-call/result pairs older than the window are summarized as `[OBSERVATION_MASK: N pairs omitted]`. This keeps recent context intact while reducing token usage.
+**You don't need to do anything.** Fires automatically during LLM compaction.
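One way to picture the rolling verbatim window: keep the newest pairs untouched and collapse everything older into a single marker. The `ToolPair` shape and the `maskObservations` function are illustrative assumptions. Only the marker text is taken from the description above, and the real window size lives in a code constant that this diff does not show.

```typescript
// Hypothetical message shape; proto's real transcript format is not shown in this diff.
type ToolPair = { call: string; result: string };

// Keeps the newest `windowSize` pairs verbatim and collapses everything older
// into a single placeholder line.
function maskObservations(pairs: ToolPair[], windowSize: number): (ToolPair | string)[] {
  const omitted = Math.max(0, pairs.length - windowSize);
  if (omitted === 0) return pairs.slice(); // nothing is old enough to mask
  const masked: (ToolPair | string)[] = [`[OBSERVATION_MASK: ${omitted} pairs omitted]`];
  return masked.concat(pairs.slice(omitted));
}
```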
+### Harness Reminders
+**What it does:** The harness injects periodic reminders into context based on three triggers:
+- Every 50 tool calls: warns about high tool usage
+- After 3 consecutive test failures: suggests pausing to diagnose
+- After 8 turns without any file write: suggests the agent may be over-analyzing
+**You don't need to do anything.** The harness injects these automatically.
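The three triggers can be sketched as a pure function over a few counters. The `HarnessCounters` type, the `dueReminders` name, and the reminder wording are placeholders invented for illustration; the thresholds are the ones listed above. Whether the real harness fires at exactly the threshold or on every turn past it is not stated, so the `>=` comparisons are an assumption.

```typescript
// Hypothetical counters; the thresholds (50 / 3 / 8) are the ones listed above.
type HarnessCounters = {
  toolCalls: number; // total tool calls so far
  consecutiveTestFailures: number; // reset to 0 when a test run passes
  turnsSinceFileWrite: number; // reset to 0 after any file write
};

// Returns the reminders that are due for the current counter values.
// The message texts are placeholders, not proto's wording.
function dueReminders(c: HarnessCounters): string[] {
  const reminders: string[] = [];
  if (c.toolCalls > 0 && c.toolCalls % 50 === 0) {
    reminders.push("High tool usage: check that the current approach is converging.");
  }
  if (c.consecutiveTestFailures >= 3) {
    reminders.push("Repeated test failures: pause and diagnose before the next change.");
  }
  if (c.turnsSinceFileWrite >= 8) {
    reminders.push("No file writes for a while: you may be over-analyzing.");
  }
  return reminders;
}
```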
+### Repo Map (`repo_map` tool)
+**What it does:** Analyzes the import graph of the codebase and runs PageRank to surface the most-connected (and most-relevant) files. Call it at the start of any exploration or implementation task for fast orientation.
+**To use:**
+```
+repo_map {}                          # globally most-connected files
+repo_map { seedFiles: ["/abs/path"] } # personalized from known-relevant files
+```
+Results are cached at `.proto/repo-map-cache.json` and invalidated on file changes.
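For intuition, here is a small personalized PageRank over an import graph. It is a sketch under stated assumptions, not the `repo_map` implementation: the `ImportGraph` shape, the `rankFiles` function, the damping factor of 0.85, the iteration count, and the choice to treat an import as a vote for the imported file are all assumptions made for the example.

```typescript
// Illustrative sketch of personalized PageRank; not proto's implementation.
// ImportGraph maps each file to the files it imports (a hypothetical shape).
type ImportGraph = Record<string, string[]>;

function rankFiles(
  graph: ImportGraph,
  seedFiles: string[] = [],
  damping = 0.85,
  iterations = 50,
): [string, number][] {
  const files = Object.keys(graph);
  const known = (f: string) => files.indexOf(f) !== -1;
  // Keep only seeds that exist in the graph, without duplicates.
  const seeds = seedFiles.filter((f, i) => known(f) && seedFiles.indexOf(f) === i);

  // Teleport distribution: uniform by default, concentrated on the seeds if given.
  const teleport: Record<string, number> = {};
  let rank: Record<string, number> = {};
  for (const f of files) {
    if (seeds.length === 0) teleport[f] = 1 / files.length;
    else teleport[f] = seeds.indexOf(f) !== -1 ? 1 / seeds.length : 0;
    rank[f] = teleport[f];
  }

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    for (const f of files) next[f] = (1 - damping) * teleport[f];
    for (const f of files) {
      const targets = graph[f].filter(known); // ignore imports outside the graph
      if (targets.length === 0) {
        // Dangling file: hand its rank back along the teleport distribution.
        for (const g of files) next[g] += damping * rank[f] * teleport[g];
      } else {
        for (const t of targets) next[t] += (damping * rank[f]) / targets.length;
      }
    }
    rank = next;
  }

  // Highest-ranked files first.
  return files
    .map((f): [string, number] => [f, rank[f]])
    .sort((a, b) => b[1] - a[1]);
}
```

With no seeds the ranking is global; passing seeds biases the random walk toward files reachable from them, which is the usual way to personalize PageRank.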
+### Behavior Verification Gate
+**What it does:** After every subagent task that completes successfully, the harness runs user-configured "verification scenarios" — shell commands that check your feature actually works. Failures are injected back to the agent for self-correction.
+**To configure:** Create `.proto/verify-scenarios.json`:
+```json
+[
+  {
+    "name": "Unit tests pass",
+    "command": "npm test -- --run",
+    "timeoutMs": 60000
+  },
+  {
+    "name": "Build succeeds",
+    "command": "npm run build",
+    "timeoutMs": 30000
+  },
+  {
+    "name": "API health check",
+    "command": "curl -sf http://localhost:3000/health",
+    "expectedPattern": "ok",
+    "timeoutMs": 5000
+  }
+]
+```
+See `.proto/verify-scenarios.example.json` for a full reference.
+### Multi-Sample Retry (`multi_sample: true`)
+**What it does:** When a subagent fails (doom loop, error, or max turns exceeded), the harness automatically retries up to 2 more times with escalating temperatures (0.7 → 1.0 → 1.3) and injects the failure context into each retry prompt. Every attempt is scored, and the highest-scoring result is returned.
+**Scoring:**
+- GOAL + behavior gate pass → 3 (perfect)
+- GOAL + no gate configured → 3
+- GOAL + gate fail → 2 (completed but not verified)
+- MAX_TURNS / TIMEOUT → 1 (partial)
+- ERROR → 0 (failure)
+**To enable:** Set `multi_sample: true` on the Agent tool call:
+```
+Agent {
+  subagent_type: "general-purpose",
+  prompt: "implement the auth service",
+  multi_sample: true
+}
+```
+Use for complex tasks with a history of failure, not for simple searches.
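The scoring table and best-of-N selection can be sketched as below. The `Attempt` type and the function names are hypothetical. The scores and the temperature schedule come from this skill, and the tie rule (the earlier, lower-temperature attempt wins) is the one stated in the sub-agents documentation.

```typescript
// Hypothetical shapes for illustration; not proto's internal types.
type TerminateReason = "GOAL" | "MAX_TURNS" | "TIMEOUT" | "ERROR";
type Attempt = {
  temperature: number;
  reason: TerminateReason;
  gate: "pass" | "fail" | "none"; // "none" means no scenarios are configured
};

// First attempt plus up to two retries, per the schedule above.
const RETRY_TEMPERATURES = [0.7, 1.0, 1.3];

function scoreAttempt(a: Attempt): number {
  if (a.reason === "GOAL") return a.gate === "fail" ? 2 : 3;
  if (a.reason === "MAX_TURNS" || a.reason === "TIMEOUT") return 1;
  return 0; // ERROR
}

// Attempts are expected in the order they ran. The strict ">" keeps the
// earlier (lower-temperature) attempt when scores tie.
function pickBest(attempts: Attempt[]): Attempt | undefined {
  let best: Attempt | undefined;
  for (const a of attempts) {
    if (best === undefined || scoreAttempt(a) > scoreAttempt(best)) best = a;
  }
  return best;
}
```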
+### Sprint Contract Service
+**What it does:** Manages the full sprint contract lifecycle — parse, activate scope lock, persist to disk, load on resume. See the `sprint-contract` skill for usage.
+**Files involved:**
+- `.proto/sprint-contract.json` — persisted contract (restored on session start)
+- `SprintContractService` — programmatic API
+## Langfuse Fine-Tuning Data
+All harness interventions emit OTel spans routed to Langfuse via OTLP → Tempo. To build fine-tuning datasets:
+1. In Langfuse > Traces, filter by span name = `harness.intervention`
+2. Use `harness.intervention.type` attribute to segment by type:
+   - `doom_loop` — recovery from loops
+   - `scope_violation` — scope lock enforcement
+   - `verification_failed` — post-edit and behavior gate failures
+   - `reminder.*` — context reminders
+3. Export matching traces → dataset items
+4. Annotate `harness.outcome` = `"recovered"` | `"not_recovered"`
+5. Train on (input_context, intervention_message) pairs where outcome = recovered
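Step 5 amounts to a filter and a projection over the exported records. The sketch below assumes a hypothetical `InterventionRecord` export shape; only the two attribute names come from the steps above. It does not use the Langfuse SDK.

```typescript
// Hypothetical export shape; only the two attribute names come from the text.
type InterventionRecord = {
  inputContext: string;
  interventionMessage: string;
  attributes: {
    "harness.intervention.type": string;
    "harness.outcome"?: "recovered" | "not_recovered"; // missing until annotated
  };
};

type TrainingPair = { input: string; target: string };

// Keeps only interventions annotated as recovered and projects them to pairs.
function toTrainingPairs(records: InterventionRecord[]): TrainingPair[] {
  return records
    .filter((r) => r.attributes["harness.outcome"] === "recovered")
    .map((r) => ({ input: r.inputContext, target: r.interventionMessage }));
}
```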
+## Configuration Summary
+| Feature                 | Config location                                  | Default                              |
+| ----------------------- | ------------------------------------------------ | ------------------------------------ |
+| Doom loop threshold     | Code constant (`DOOM_REPEAT_THRESHOLD = 3`)      | Always on                            |
+| Scope lock              | `.proto/sprint-contract.json`                    | Off until sprint-contract skill runs |
+| Behavior gate scenarios | `.proto/verify-scenarios.json`                   | No scenarios (off)                   |
+| Multi-sample retry      | `multi_sample: true` on Agent call               | Off (opt-in)                         |
+| Observation mask window | Code constant (`INCREMENTAL_PROTECTED_TAIL`)     | Always on                            |
+| Harness reminders       | Code constants (50 calls / 3 failures / 8 turns) | Always on                            |

package/bundled/qc-helper/docs/features/sub-agents.md CHANGED

@@ -125,12 +125,58 @@ When multiple agents share the same name, higher-priority location wins.
 Four agents are always available:
-| Agent             | Purpose                                              | Tools              |
-| ----------------- | ---------------------------------------------------- | ------------------ |
-| `general-purpose` | Complex multi-step tasks, code search                | All (except Agent) |
-| `Explore`         | Fast codebase search and analysis                    | Read-only          |
-| `verify`          | Review changes for correctness before finalizing     | Read-only          |
-| `coordinator`     | Orchestrate multi-agent work with task decomposition | All + Agent        |
+| Agent             | Purpose                                              | Tools               |
+| ----------------- | ---------------------------------------------------- | ------------------- |
+| `general-purpose` | Complex multi-step tasks, code search                | All (except Agent)  |
+| `Explore`         | Fast codebase search and analysis                    | Read-only + RepoMap |
+| `verify`          | Review changes for correctness before finalizing     | Read-only           |
+| `coordinator`     | Orchestrate multi-agent work with task decomposition | All + Agent         |
+The `Explore` and `Plan` agents use the `repo_map` tool automatically at the start of tasks on large codebases to orient themselves via import-graph PageRank before diving in. You can also call `repo_map` explicitly from any agent. See [Agent Harness — Repo map](../../developers/harness#repo-map) for details.
+## Multi-sample retry
+For high-stakes tasks where a single failed attempt is costly, set `multi_sample: true` on the Agent tool call. The harness will automatically retry up to 2 more times with escalating temperatures (0.7 → 1.0 → 1.3) if the first attempt fails, and return the best result.
+```json
+{
+  "subagent_type": "general-purpose",
+  "description": "Implement the auth service",
+  "prompt": "...",
+  "multi_sample": true
+}
+```
+Each retry includes a `[RETRY CONTEXT]` block summarizing what went wrong in the previous attempt. Attempts are scored (GOAL + verification pass = 3, GOAL = 3, partial = 1, error = 0) and the highest-scoring result is returned. When scores tie, the earlier (lower-temperature) attempt wins.
+Use multi-sample for complex implementation tasks, not for searches or read-only queries.
+See [Agent Harness — Multi-sample retry](../../developers/harness#multi-sample-retry) for the full scoring and temperature reference.
+## Behavior verification gate
+You can configure post-task verification scenarios that run automatically after a subagent completes successfully. If any scenario fails, the output is fed back to the agent so it can self-correct.
+Create `.proto/verify-scenarios.json` in your project root:
+```json
+[
+  {
+    "name": "Unit tests pass",
+    "command": "npm test -- --run",
+    "timeoutMs": 60000
+  },
+  {
+    "name": "Build succeeds",
+    "command": "npm run build",
+    "timeoutMs": 30000
+  }
+]
+```
+Scenarios run in parallel. Each has a `name`, a shell `command`, an optional `expectedPattern` (regex the stdout must match), and an optional `timeoutMs`. Exit code 0 is a pass when no pattern is specified.
+See [Agent Harness — Behavior verification gate](../../developers/harness#behavior-verification-gate) for the complete field reference.
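The pass/fail rule for a single scenario can be sketched as a pure function over the command's captured result. `CommandResult`, `scenarioPassed`, and `failedScenarios` are hypothetical names. The sketch also makes two assumptions the documentation does not settle: a timeout always fails, and when `expectedPattern` is set the stdout match alone decides the outcome.

```typescript
// Field names follow the scenario format above.
type VerifyScenario = {
  name: string;
  command: string;
  expectedPattern?: string; // regex that stdout must match
  timeoutMs?: number;
};

// Hypothetical capture of one finished command; the harness's real type is not shown.
type CommandResult = { exitCode: number | null; stdout: string; timedOut: boolean };

function scenarioPassed(scenario: VerifyScenario, result: CommandResult): boolean {
  if (result.timedOut) return false; // assumption: a timeout always counts as a failure
  if (scenario.expectedPattern !== undefined) {
    // Assumption: when a pattern is given, the stdout match alone decides.
    return new RegExp(scenario.expectedPattern).test(result.stdout);
  }
  return result.exitCode === 0;
}

// Names of the scenarios that failed, e.g. for building a remediation message.
function failedScenarios(
  runs: { scenario: VerifyScenario; result: CommandResult }[],
): string[] {
  return runs
    .filter((r) => !scenarioPassed(r.scenario, r.result))
    .map((r) => r.scenario.name);
}
```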
 ## Background execution

package/bundled/sprint-contract/SKILL.md ADDED

@@ -0,0 +1,62 @@
+---
+name: sprint-contract
+description: Negotiate a sprint contract before coding — locks down exactly which files will be touched, what will change, and the acceptance criteria. Activates the scope lock to prevent scope creep.
+---
+# Sprint Contract
+Produce an explicit, machine-readable sprint contract before writing any code.
+The contract defines the permitted file set (scope lock), acceptance criteria,
+and a sequenced implementation plan.
+**Announce at start:** "I'm using the sprint-contract skill to negotiate the contract before coding."
+## Process
+1. **Read the task** — understand exactly what is being asked
+2. **Explore** — use `fff__grep` and `fff__find_files` to locate relevant files; read key files to understand current state
+3. **Identify the change surface** — determine the minimum set of files that must change
+4. **Produce the contract** — output a JSON contract (see format below)
+5. **Activate scope lock** — write the contract to `.proto/sprint-contract.json` so the harness can enforce the file set
+## Contract Format
+Output a JSON block with this exact structure:
+```json
+{
+  "task": "one-sentence description of what will be built",
+  "filesToCreate": ["/absolute/path/to/new/file.ts"],
+  "filesToModify": ["/absolute/path/to/existing/file.ts"],
+  "functionsToChange": {
+    "/absolute/path/to/file.ts": ["functionName", "ClassName.methodName"]
+  },
+  "acceptanceCriteria": [
+    "The X test passes",
+    "Feature Y is accessible via Z",
+    "No existing tests are broken"
+  ],
+  "implementationSequence": [
+    "1. Add type definitions to types.ts",
+    "2. Implement service in service.ts",
+    "3. Wire into existing call site in client.ts",
+    "4. Add tests"
+  ],
+  "risks": ["Changing X may affect Y — verify after"]
+}
+```
+## Rules
+- **Minimize scope**: only include files that genuinely need to change
+- **Absolute paths**: all file paths must be absolute
+- **No speculation**: only include files you have verified exist (via read or search)
+- **Testable criteria**: each acceptance criterion must be objectively verifiable
+- **Sequenced implementation**: order steps to minimize breakage (types → impl → tests)
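Some of these rules can be checked mechanically before the contract is written to disk. The validator below is a hedged sketch: `SprintContract` mirrors the contract format above, while `contractProblems` and its messages are hypothetical. It assumes POSIX-style absolute paths (a leading `/`) and does not check whether files exist, so the "no speculation" rule still needs a read or search.

```typescript
// Mirrors the contract format above.
type SprintContract = {
  task: string;
  filesToCreate: string[];
  filesToModify: string[];
  functionsToChange: Record<string, string[]>;
  acceptanceCriteria: string[];
  implementationSequence: string[];
  risks: string[];
};

// Returns human-readable problems; an empty list means the structural rules hold.
function contractProblems(contract: SprintContract): string[] {
  const problems: string[] = [];
  const scope = contract.filesToCreate.concat(contract.filesToModify);
  if (scope.length === 0) problems.push("contract lists no files");
  const paths = scope.concat(Object.keys(contract.functionsToChange));
  for (const p of paths) {
    // Assumption: POSIX-style paths, so "absolute" means a leading slash.
    if (p.charAt(0) !== "/") problems.push(`not an absolute path: ${p}`);
  }
  if (contract.acceptanceCriteria.length === 0) problems.push("no acceptance criteria");
  return problems;
}
```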
+## After the Contract
+Write the JSON to `.proto/sprint-contract.json` in the project root.
+Then report: "Sprint contract negotiated. Scope lock activated for N files."
+The harness will automatically prevent edits to files outside the contract's file set.

package/bundled/subagent-driven-development/SKILL.md CHANGED

@@ -120,6 +120,16 @@ Implementer subagents report one of four statuses. Handle each appropriately:
 **Never** ignore an escalation or force the same model to retry without changes. If the implementer said it's stuck, something needs to change.
+## Harness Features
+The harness provides automatic safety nets you can leverage when dispatching implementers:
+**Multi-sample retry** (`multi_sample: true`): For complex or high-risk tasks, set this on the Agent tool call. If the implementer fails (doom loop, error, max turns), the harness automatically retries up to 2 more times with escalating temperatures (0.7 → 1.0 → 1.3) and injects the failure context into each retry prompt. Returns the best result. Use for tasks that have previously failed or that touch many files.
+**Behavior verification gate**: If `.proto/verify-scenarios.json` exists in the project, the harness runs those scenarios after every successful implementer completion. Failures are injected back to the model for self-correction. Add scenarios for smoke tests, build checks, and HTTP health checks.
+**Sprint contract scope lock**: If the implementing agent was given a sprint contract (via the `sprint-contract` skill), the scope lock prevents it from writing files outside the agreed set. Any violation is blocked and reported.
 ## Prompt Templates
 - `./implementer-prompt.md` - Dispatch implementer subagent