@sienklogic/plan-build-run 2.22.2 → 2.24.0

Files changed (90)
  1. package/CHANGELOG.md +42 -0
  2. package/dashboard/package.json +3 -2
  3. package/dashboard/src/middleware/errorHandler.js +12 -2
  4. package/dashboard/src/repositories/planning.repository.js +24 -12
  5. package/dashboard/src/routes/pages.routes.js +182 -4
  6. package/dashboard/src/server.js +4 -0
  7. package/dashboard/src/services/audit.service.js +42 -0
  8. package/dashboard/src/services/dashboard.service.js +1 -12
  9. package/dashboard/src/services/local-llm-metrics.service.js +81 -0
  10. package/dashboard/src/services/quick.service.js +62 -0
  11. package/dashboard/src/services/roadmap.service.js +1 -11
  12. package/dashboard/src/utils/strip-bom.js +8 -0
  13. package/dashboard/src/views/audit-detail.ejs +5 -0
  14. package/dashboard/src/views/audits.ejs +5 -0
  15. package/dashboard/src/views/partials/analytics-content.ejs +61 -0
  16. package/dashboard/src/views/partials/audit-detail-content.ejs +12 -0
  17. package/dashboard/src/views/partials/audits-content.ejs +34 -0
  18. package/dashboard/src/views/partials/quick-content.ejs +40 -0
  19. package/dashboard/src/views/partials/quick-detail-content.ejs +29 -0
  20. package/dashboard/src/views/partials/sidebar.ejs +16 -0
  21. package/dashboard/src/views/partials/todos-content.ejs +13 -3
  22. package/dashboard/src/views/quick-detail.ejs +5 -0
  23. package/dashboard/src/views/quick.ejs +5 -0
  24. package/package.json +1 -1
  25. package/plugins/copilot-pbr/agents/debugger.agent.md +15 -0
  26. package/plugins/copilot-pbr/agents/integration-checker.agent.md +9 -2
  27. package/plugins/copilot-pbr/agents/planner.agent.md +19 -0
  28. package/plugins/copilot-pbr/agents/researcher.agent.md +20 -0
  29. package/plugins/copilot-pbr/agents/synthesizer.agent.md +12 -0
  30. package/plugins/copilot-pbr/agents/verifier.agent.md +22 -2
  31. package/plugins/copilot-pbr/plugin.json +1 -1
  32. package/plugins/copilot-pbr/references/config-reference.md +89 -0
  33. package/plugins/copilot-pbr/references/plan-format.md +22 -0
  34. package/plugins/copilot-pbr/skills/health/SKILL.md +8 -1
  35. package/plugins/copilot-pbr/skills/help/SKILL.md +4 -4
  36. package/plugins/copilot-pbr/skills/milestone/SKILL.md +12 -12
  37. package/plugins/copilot-pbr/skills/status/SKILL.md +37 -1
  38. package/plugins/copilot-pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
  39. package/plugins/copilot-pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
  40. package/plugins/cursor-pbr/.cursor-plugin/plugin.json +1 -1
  41. package/plugins/cursor-pbr/agents/debugger.md +15 -0
  42. package/plugins/cursor-pbr/agents/integration-checker.md +9 -2
  43. package/plugins/cursor-pbr/agents/planner.md +19 -0
  44. package/plugins/cursor-pbr/agents/researcher.md +20 -0
  45. package/plugins/cursor-pbr/agents/synthesizer.md +12 -0
  46. package/plugins/cursor-pbr/agents/verifier.md +22 -2
  47. package/plugins/cursor-pbr/references/config-reference.md +89 -0
  48. package/plugins/cursor-pbr/references/plan-format.md +22 -0
  49. package/plugins/cursor-pbr/skills/health/SKILL.md +8 -1
  50. package/plugins/cursor-pbr/skills/help/SKILL.md +4 -4
  51. package/plugins/cursor-pbr/skills/milestone/SKILL.md +12 -12
  52. package/plugins/cursor-pbr/skills/status/SKILL.md +37 -1
  53. package/plugins/cursor-pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
  54. package/plugins/cursor-pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
  55. package/plugins/pbr/.claude-plugin/plugin.json +1 -1
  56. package/plugins/pbr/agents/debugger.md +15 -0
  57. package/plugins/pbr/agents/integration-checker.md +9 -2
  58. package/plugins/pbr/agents/planner.md +19 -0
  59. package/plugins/pbr/agents/researcher.md +20 -0
  60. package/plugins/pbr/agents/synthesizer.md +12 -0
  61. package/plugins/pbr/agents/verifier.md +22 -2
  62. package/plugins/pbr/references/config-reference.md +89 -0
  63. package/plugins/pbr/references/plan-format.md +22 -0
  64. package/plugins/pbr/scripts/check-config-change.js +33 -0
  65. package/plugins/pbr/scripts/check-plan-format.js +52 -4
  66. package/plugins/pbr/scripts/check-subagent-output.js +43 -3
  67. package/plugins/pbr/scripts/config-schema.json +48 -0
  68. package/plugins/pbr/scripts/local-llm/client.js +214 -0
  69. package/plugins/pbr/scripts/local-llm/health.js +217 -0
  70. package/plugins/pbr/scripts/local-llm/metrics.js +252 -0
  71. package/plugins/pbr/scripts/local-llm/operations/classify-artifact.js +76 -0
  72. package/plugins/pbr/scripts/local-llm/operations/classify-error.js +75 -0
  73. package/plugins/pbr/scripts/local-llm/operations/score-source.js +72 -0
  74. package/plugins/pbr/scripts/local-llm/operations/summarize-context.js +62 -0
  75. package/plugins/pbr/scripts/local-llm/operations/validate-task.js +59 -0
  76. package/plugins/pbr/scripts/local-llm/router.js +101 -0
  77. package/plugins/pbr/scripts/local-llm/shadow.js +60 -0
  78. package/plugins/pbr/scripts/local-llm/threshold-tuner.js +118 -0
  79. package/plugins/pbr/scripts/pbr-tools.js +120 -3
  80. package/plugins/pbr/scripts/post-write-dispatch.js +2 -2
  81. package/plugins/pbr/scripts/progress-tracker.js +29 -3
  82. package/plugins/pbr/scripts/session-cleanup.js +36 -1
  83. package/plugins/pbr/scripts/validate-task.js +30 -1
  84. package/plugins/pbr/skills/health/SKILL.md +8 -1
  85. package/plugins/pbr/skills/help/SKILL.md +4 -4
  86. package/plugins/pbr/skills/milestone/SKILL.md +12 -12
  87. package/plugins/pbr/skills/status/SKILL.md +38 -2
  88. package/plugins/pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
  89. package/plugins/pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
  90. package/dashboard/src/views/coming-soon.ejs +0 -11
@@ -173,8 +173,8 @@ Start a new milestone cycle with new phases.
173
173
 
174
174
 
175
175
  ╔══════════════════════════════════════════════════════════════╗
176
- ║ ▶ NEXT UP ║
177
- ╚══════════════════════════════════════════════════════════════╝
176
+ ║ ▶ NEXT UP ║
177
+ ╚══════════════════════════════════════════════════════════════╝
178
178
 
179
179
  **Phase {N}: {name}** — start with discussion or planning
180
180
 
@@ -442,8 +442,8 @@ Archive a completed milestone and prepare for the next one.
442
442
 
443
443
 
444
444
  ╔══════════════════════════════════════════════════════════════╗
445
- ║ ▶ NEXT UP ║
446
- ╚══════════════════════════════════════════════════════════════╝
445
+ ║ ▶ NEXT UP ║
446
+ ╚══════════════════════════════════════════════════════════════╝
447
447
 
448
448
  **Start the next milestone** — plan new features
449
449
 
@@ -531,8 +531,8 @@ Verify milestone completion with cross-phase integration checks.
531
531
 
532
532
 
533
533
  ╔══════════════════════════════════════════════════════════════╗
534
- ║ ▶ NEXT UP ║
535
- ╚══════════════════════════════════════════════════════════════╝
534
+ ║ ▶ NEXT UP ║
535
+ ╚══════════════════════════════════════════════════════════════╝
536
536
 
537
537
  **Complete the milestone** — archive and tag
538
538
 
@@ -562,8 +562,8 @@ Verify milestone completion with cross-phase integration checks.
562
562
 
563
563
 
564
564
  ╔══════════════════════════════════════════════════════════════╗
565
- ║ ▶ NEXT UP ║
566
- ╚══════════════════════════════════════════════════════════════╝
565
+ ║ ▶ NEXT UP ║
566
+ ╚══════════════════════════════════════════════════════════════╝
567
567
 
568
568
  **Close the gaps** — create fix phases
569
569
 
@@ -591,8 +591,8 @@ Verify milestone completion with cross-phase integration checks.
591
591
 
592
592
 
593
593
  ╔══════════════════════════════════════════════════════════════╗
594
- ║ ▶ NEXT UP ║
595
- ╚══════════════════════════════════════════════════════════════╝
594
+ ║ ▶ NEXT UP ║
595
+ ╚══════════════════════════════════════════════════════════════╝
596
596
 
597
597
  **Address tech debt or proceed**
598
598
 
@@ -695,8 +695,8 @@ Create phases to close gaps found during an audit.
695
695
 
696
696
 
697
697
  ╔══════════════════════════════════════════════════════════════╗
698
- ║ ▶ NEXT UP ║
699
- ╚══════════════════════════════════════════════════════════════╝
698
+ ║ ▶ NEXT UP ║
699
+ ╚══════════════════════════════════════════════════════════════╝
700
700
 
701
701
  **Plan the first gap-closure phase**
702
702
 
@@ -68,6 +68,31 @@ Read the following files (skip any that don't exist):
68
68
  5. **`.planning/REQUIREMENTS.md`** — Requirements (if exists)
69
69
  - Extract: requirement completion status if tracked
70
70
 
71
+ ### Step 1b: Read Local LLM Stats (advisory — skip on any error)
72
+
73
+ After loading config.json, check `local_llm.enabled`. If `true`:
74
+
75
+ ```bash
76
+ node ${PLUGIN_ROOT}/scripts/pbr-tools.js llm status
77
+ node ${PLUGIN_ROOT}/scripts/pbr-tools.js llm metrics
78
+ ```
79
+
80
+ Parse both JSON responses. Capture:
81
+
82
+ - `status.model` — model name
83
+ - `metrics.total_calls` — lifetime total calls
84
+ - `metrics.tokens_saved` — lifetime frontier tokens saved
85
+ - `metrics.cost_saved_usd` — lifetime cost estimate
86
+ - `metrics.avg_latency_ms` — lifetime average latency
87
+
88
+ Also run session-scoped metrics if `.planning/.session-start` exists:
89
+
90
+ ```bash
91
+ node ${PLUGIN_ROOT}/scripts/pbr-tools.js llm metrics --session <content-of-.session-start>
92
+ ```
93
+
94
+ If `local_llm.enabled` is `false` or commands fail, skip this step silently.
95
+
71
96
  ### Step 2: Scan Phase Directories
72
97
 
73
98
  For each phase listed in ROADMAP.md:
@@ -191,8 +216,18 @@ Todos: {count} pending. Run `/pbr:todo list` to see them.
191
216
 
192
217
  {If notes exist:}
193
218
  Notes: {count} quick capture(s). `/pbr:note list` to review.
219
+
220
+ {If local_llm.enabled AND total_calls > 0:}
221
+ Local LLM: enabled ({model}, avg {avg_ms}ms)
222
+ This session: {session_calls} calls, ~{session_tokens} frontier tokens saved
223
+ Lifetime: {total_calls} calls, ~{tokens_saved} tokens saved (~{cost_str} at $3/M)
224
+
225
+ {If local_llm.enabled AND total_calls == 0:}
226
+ Local LLM: enabled ({model}) — no calls yet this session
194
227
  ```
195
228
 
229
+ The Local LLM block is **advisory only** — it never affects the routing decision or Next Up suggestion.
230
+
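As a rough illustration of the advisory Local LLM block in the status template above, the following sketch formats it from the parsed `llm status` and `llm metrics` responses. The field names on `status` and `metrics` come from Step 1b. The function name, the shape of the `session` object, and the cost formatting are assumptions made for this example, not the skill's actual implementation.

```javascript
// Sketch only: builds the advisory "Local LLM" lines for the status output.
// `session` is assumed to have the same fields as the lifetime metrics.
function formatLocalLlmBlock(enabled, status, metrics, session) {
  // Advisory step: produce nothing when disabled or when a response is missing.
  if (!enabled || !status || !metrics) return '';

  if (metrics.total_calls === 0) {
    return `Local LLM: enabled (${status.model}) - no calls yet this session`;
  }

  const avgMs = Math.round(metrics.avg_latency_ms);
  const lines = [`Local LLM: enabled (${status.model}, avg ${avgMs}ms)`];

  // Session metrics exist only when .planning/.session-start was found.
  if (session) {
    lines.push(`  This session: ${session.total_calls} calls, ~${session.tokens_saved} frontier tokens saved`);
  }

  const cost = '$' + metrics.cost_saved_usd.toFixed(2); // assumed display format
  lines.push(`  Lifetime: ${metrics.total_calls} calls, ~${metrics.tokens_saved} tokens saved (~${cost} at $3/M)`);
  return lines.join('\n');
}
```

Returning an empty string on any missing input mirrors the "skip this step silently" rule, so the rest of the status output is unaffected.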
196
231
  ### Progress Bar
197
232
 
198
233
  Generate a 20-character progress bar:
@@ -342,9 +377,10 @@ This skill should be fast. It's a status check, not an analysis.
342
377
  - Cache nothing (always read fresh state)
343
378
 
344
379
  **DO NOT:**
380
+
345
381
  - Read full SUMMARY.md contents (frontmatter is enough)
346
382
  - Read plan file contents (just check existence)
347
- - Run any Bash commands
383
+ - Run Bash commands except for Step 1b (2-3 `pbr-tools` calls only when `local_llm.enabled: true`, skipped entirely otherwise)
348
384
  - Modify any files
349
385
  - Invoke any agents
350
386
 
@@ -112,7 +112,22 @@ Phase 03 (Core) ──provides──→ Phase 04 (Frontend)
112
112
  ### Flow 2: {Flow Name} - {STATUS}
113
113
  ...
114
114
 
115
- ## 5. Integration Issues Summary
115
+ ## 5. Data-Flow Propagation
116
+
117
+ ### Cross-Boundary Data Flows
118
+
119
+ | Data Field | Source | Intermediate Steps | Destination | Status |
120
+ |------------|--------|-------------------|-------------|--------|
121
+ | {field name} | {origin, e.g., hook stdin `data.session_id`} | {module1:L12 → module2:L45} | {dest, e.g., metrics.jsonl `session_id`} | PROPAGATED |
122
+ | {field name} | {origin} | {module1:L12 → module2:L45} | {dest} | DATA_DROPPED |
123
+
124
+ ### Data-Flow Issues
125
+
126
+ | Field | Dropped At | Available In Scope | Passed Instead | Fix |
127
+ |-------|-----------|-------------------|----------------|-----|
128
+ | {field} | {file:line} | `data.session_id` | `undefined` | Pass `data.session_id` |
129
+
130
+ ## 6. Integration Issues Summary
116
131
 
117
132
  ### Critical Issues (system cannot function)
118
133
 
@@ -131,7 +146,7 @@ Phase 03 (Core) ──provides──→ Phase 04 (Frontend)
131
146
  1. **{Issue}**: {description}
132
147
  - Fix: {recommended action}
133
148
 
134
- ## 6. Integration Score
149
+ ## 7. Integration Score
135
150
 
136
151
  | Category | Items Checked | Passed | Failed | Score |
137
152
  |----------|--------------|--------|--------|-------|
@@ -139,6 +154,7 @@ Phase 03 (Core) ──provides──→ Phase 04 (Frontend)
139
154
  | API coverage | {n} | {n} | {n} | {%} |
140
155
  | Auth protection | {n} | {n} | {n} | {%} |
141
156
  | E2E flows | {n} | {n} | {n} | {%} |
157
+ | Data-flow propagation | {n} | {n} | {n} | {%} |
142
158
  | **Overall** | {n} | {n} | {n} | **{%}** |
143
159
 
144
160
  ## Recommendations
@@ -54,8 +54,9 @@ anti_patterns:
54
54
 
55
55
  | # | Link Description | Source | Target | Status | Evidence |
56
56
  |---|-----------------|--------|--------|--------|----------|
57
- | 1 | {what connects to what} | `{source_file}` | `{target_file}` | WIRED | Import at L12, called at L45 |
57
+ | 1 | {what connects to what} | `{source_file}` | `{target_file}` | WIRED | Import at L12, called at L45, args correct |
58
58
  | 2 | {what connects to what} | `{source_file}` | `{target_file}` | BROKEN | Imported but never called |
59
+ | 3 | {what connects to what} | `{source_file}` | `{target_file}` | ARGS_WRONG | Called at L45 but passes undefined for sessionId (data.session_id in scope) |
59
60
 
60
61
  ## Gaps Found
61
62
 
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "pbr",
3
3
  "displayName": "Plan-Build-Run",
4
- "version": "2.22.2",
4
+ "version": "2.24.0",
5
5
  "description": "Plan-Build-Run — Structured development workflow for Cursor. Solves context rot through disciplined subagent delegation, structured planning, atomic execution, and goal-backward verification.",
6
6
  "author": {
7
7
  "name": "SienkLogic",
@@ -137,6 +137,21 @@ Then emit a `DECISION` checkpoint asking the user to approve, modify, or reject
137
137
 
138
138
  **Commit format**: `fix({scope}): {description}` with body: `Root cause: ...` and `Debug session: .planning/debug/{slug}.md`
139
139
 
140
+ ## Local LLM Error Classification (Optional)
141
+
142
+ When you receive an error message or stack trace, you MAY use the local LLM to classify it before starting hypothesis generation. This is advisory — skip it if unavailable.
143
+
144
+ ```bash
145
+ # Write the error to a temp file, then classify:
146
+ echo "Error text here" > /tmp/debug-error.txt
147
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm classify-error /tmp/debug-error.txt debugger 2>/dev/null
148
+ # Returns: {"category":"missing_output","confidence":0.91,"latency_ms":1840,"fallback_used":false}
149
+ ```
150
+
151
+ Categories: `connection_refused`, `timeout`, `missing_output`, `wrong_output_format`, `permission_error`, `unknown`.
152
+
153
+ If classification succeeds, use the returned category to bias your initial hypothesis ranking. If it returns null or fails, proceed with manual hypothesis generation as normal.
154
+
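One way to handle the `classify-error` output defensively is sketched below. It takes the raw stdout text and returns a category only when the JSON parses and the category is one of those listed above. The helper name and the choice to treat unlisted categories as `null` are assumptions for illustration.

```javascript
// Categories listed in the debugger instructions above.
const KNOWN_CATEGORIES = [
  'connection_refused',
  'timeout',
  'missing_output',
  'wrong_output_format',
  'permission_error',
  'unknown',
];

// Returns the category string, or null when the output is unusable,
// in which case the caller proceeds with manual hypothesis generation.
function parseErrorCategory(stdout) {
  let parsed;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return null; // empty or non-JSON output
  }
  if (parsed === null || typeof parsed !== 'object') return null;
  return KNOWN_CATEGORIES.includes(parsed.category) ? parsed.category : null;
}
```

Because every failure path collapses to `null`, the classification stays advisory: the caller needs only one check before falling back to the normal workflow.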
140
155
  ## Common Bug Patterns
141
156
 
142
157
  Reference: `references/common-bug-patterns.md` — covers off-by-one, null/undefined, async/timing, state management, import/module, environment, and data shape patterns.
@@ -34,6 +34,7 @@ You MUST perform all applicable categories (skip only if zero items exist for th
34
34
  3. **Auth Protection** — Every non-public route must have auth middleware. Frontend route guards must match backend protection.
35
35
  4. **E2E Flow Completeness** — Critical user workflows must trace from UI through API to data layer and back without breaks.
36
36
  5. **Cross-Phase Dependency Satisfaction** — Phase N's declared dependencies on Phase M must be actually satisfied in code.
37
+ 6. **Data-Flow Propagation** — Values originating at one boundary (hook stdin fields, API request params, env vars) must propagate correctly through the call chain to their destination (log entries, database records, API responses). A connected pipeline with missing data is a broken integration.
37
38
 
38
39
  > **First-phase edge case**: If no completed phases exist yet, focus on verifying the current phase's internal consistency — exports match imports within the phase, API contracts are self-consistent. Cross-phase checks are not applicable and should be skipped.
39
40
 
@@ -46,14 +47,19 @@ Read `references/agent-contracts.md` to validate agent-to-agent handoffs. Verify
46
47
  - **Write access for output artifact only** — you have Write access for your output artifact only. You CANNOT fix source code — you REPORT issues.
47
48
  - **Cross-phase scope** — unlike verifier (single phase), you check across phases.
48
49
 
49
- ## 6-Step Verification Process
50
+ ## 7-Step Verification Process
50
51
 
51
52
  1. **Build Export/Import Map**: Read each completed phase's SUMMARY.md frontmatter (`requires`, `provides`, `affects`). Grep actual exports/imports in source. Cross-reference declared vs actual — flag mismatches.
52
53
  2. **Verify Export Usage**: For each `provides` item: locate actual export (missing = `MISSING_EXPORT` ERROR), find consumers (none = `ORPHANED` WARNING), verify usage not just import (`IMPORTED_UNUSED` WARNING), check signature compatibility (`MISMATCHED` ERROR). Status `CONSUMED` = OK.
53
54
  3. **Verify API Coverage**: Discover routes, find frontend callers, match by method+path+body/params. Produce coverage table. See `references/integration-patterns.md` for framework-specific patterns.
54
55
  4. **Verify Auth Protection**: Identify auth mechanism, list all routes, classify (public vs protected), check frontend guards. Flag UNPROTECTED routes.
55
56
  5. **Verify E2E Flows**: Trace critical workflows step-by-step — verify each step exists and connects to the next (import/call/redirect). Record evidence (file:line). Flow status: COMPLETE | BROKEN | PARTIAL | UNTRACEABLE. See `references/integration-patterns.md` for flow templates.
56
- 6. **Compile Integration Report**: Produce final report with all findings by category.
57
+ 6. **Verify Data-Flow Propagation**: For each cross-boundary data field identified in plans or SUMMARY.md, trace the value from source through intermediate functions to destination. Verify the value is actually passed (not `undefined`/`null`/hardcoded) at each step.
58
+ - **Source examples**: hook stdin (`data.session_id`), API request params, environment variables, config fields
59
+ - **Destination examples**: log entries, database records, API responses, metric files
60
+ - **Method**: Grep each intermediate call site and inspect arguments. Flag `DATA_DROPPED` when a value available in scope is replaced by `undefined` or a placeholder.
61
+ - **Status**: `PROPAGATED` (value flows correctly) | `DATA_DROPPED` (value lost at some step) | `UNTRACEABLE` (cannot determine flow)
62
+ 7. **Compile Integration Report**: Produce final report with all findings by category.
57
63
 
58
64
  ## Output Format
59
65
 
@@ -118,3 +124,4 @@ See `references/integration-patterns.md` for grep/search patterns by framework.
118
124
  - "File exists" is not "component is integrated"
119
125
  - Auth middleware existing somewhere does not mean routes are protected
120
126
  - Always check error handling paths, not just happy paths
127
+ - Structural connectivity is not data-flow correctness — a connected pipeline can still drop data at any step
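The data-flow propagation statuses from step 6 can be illustrated with a small classifier. The trace record shape (`site`, `inScope`, `passed`) is hypothetical, and the sketch assumes values are forwarded unchanged, so any substitution counts as a drop.

```javascript
// Hypothetical trace: one record per call site along the chain.
//   site    - "file:line" of the call
//   inScope - the value available at that call site
//   passed  - the value the call actually forwards
function classifyFlow(steps) {
  if (!Array.isArray(steps) || steps.length === 0) {
    return { status: 'UNTRACEABLE', droppedAt: null };
  }
  for (const step of steps) {
    const hasRealValue = step.inScope !== undefined && step.inScope !== null;
    // A real value was available but something else (undefined, null,
    // or a placeholder) was passed on.
    if (hasRealValue && step.passed !== step.inScope) {
      return { status: 'DATA_DROPPED', droppedAt: step.site };
    }
  }
  return { status: 'PROPAGATED', droppedAt: null };
}
```

The `droppedAt` field corresponds to the "Dropped At" column of the Data-Flow Issues table in the report template.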
@@ -65,6 +65,23 @@ Each must-have maps to one or more tasks. Every task exists to make a must-have
65
65
 
66
66
  ---
67
67
 
68
+ ## Data Contracts for Cross-Boundary Parameters
69
+
70
+ When a function signature includes parameters that flow across module boundaries — session IDs from hook stdin, config objects from disk, auth tokens from environment — the plan **MUST** specify the **source** for each argument, not just the type.
71
+
72
+ For every cross-boundary call in a task's `<action>`, document:
73
+
74
+ | Parameter | Source | Context | Fallback |
75
+ |-----------|--------|---------|----------|
76
+ | `sessionId` | `data.session_id` (hook stdin) | Hook scripts only | `undefined` (CLI context) |
77
+ | `config` | `configLoad(planningDir)` | All callers | `resolveConfig(undefined)` |
78
+
79
+ **When to apply:** Any function call where the caller and callee live in different modules AND at least one argument originates from an external boundary (stdin, env, disk, network). Internal helper calls within the same module do not need contracts.
80
+
81
+ **Why this matters:** Without explicit source mapping, executors will use the type-correct but value-wrong default (e.g., `undefined` instead of `data.session_id`). The plan is the single source of truth for how data flows — if the plan says `undefined`, the executor will faithfully implement `undefined`.
82
+
83
+ ---
84
+
68
85
  ## Plan Structure
69
86
 
70
87
  Read `references/plan-format.md` for the complete plan file specification including:
@@ -164,6 +181,7 @@ When CONTEXT.md or RESEARCH-SUMMARY.md contains `[NEEDS DECISION]` flags from th
164
181
  - [ ] Dependencies are acyclic, no file conflicts within same wave
165
182
  - [ ] Locked decisions honored, no deferred ideas included
166
183
  - [ ] Verify commands are actually executable
184
+ - [ ] Cross-boundary parameters have documented sources (data contracts)
167
185
 
168
186
  ---
169
187
 
@@ -237,3 +255,4 @@ One-line task descriptions in `<name>`. File paths in `<files>`, not explanation
237
255
  9. DO NOT plan for features outside the current phase goal
238
256
  10. DO NOT assume research is done — check discovery level
239
257
  11. DO NOT leave done conditions vague — they must be observable
258
+ 12. DO NOT specify literal `undefined` for parameters that have a known source in the calling context — use data contracts to map sources
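To make the data-contract idea concrete, the table of sources and fallbacks could be expressed as data, as in this sketch. The `contract` object, the `ctx` fields (`hookData`, `loadedConfig`), and `resolveArgs` are all hypothetical names. The empty object stands in for whatever `resolveConfig(undefined)` would return.

```javascript
// Hypothetical encoding of the data-contract table: each parameter names
// where its value comes from and what to use when that source is absent.
const contract = {
  sessionId: {
    source: (ctx) => (ctx.hookData ? ctx.hookData.session_id : undefined),
    fallback: undefined, // CLI context: no hook stdin, so no session ID
  },
  config: {
    source: (ctx) => ctx.loadedConfig,
    fallback: {}, // stand-in for resolveConfig(undefined)
  },
};

// Resolves every parameter from its documented source, using the
// fallback only when the source yields nothing.
function resolveArgs(rules, ctx) {
  const args = {};
  for (const [name, rule] of Object.entries(rules)) {
    const value = rule.source(ctx);
    args[name] = value !== undefined ? value : rule.fallback;
  }
  return args;
}
```

Writing the source next to each parameter makes an `undefined` argument an explicit, documented fallback rather than an accidental default.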
@@ -53,6 +53,26 @@ All claims must be attributed to a source level. Higher levels override lower le
53
53
 
54
54
  **Offline Fallback**: If web tools are unavailable (air-gapped environment, MCP not configured), rely on local sources: codebase analysis via Glob/Grep, existing documentation, and README files. Assign these S3-S4 confidence levels. Do not attempt WebFetch or WebSearch — note in the output header that external sources were unavailable.
55
55
 
56
+ ## Local LLM Source Scoring (Optional)
57
+
58
+ If local LLM offload is configured, you MAY use it to score source credibility instead of manually assigning S-levels. This is advisory — never wait on it or fail if it returns null.
59
+
60
+ Check availability first:
61
+
62
+ ```bash
63
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm status 2>/dev/null
64
+ ```
65
+
66
+ If `enabled: true`, score a source excerpt:
67
+
68
+ ```bash
69
+ echo "Source URL and content excerpt" > /tmp/source-excerpt.txt
70
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm score-source "https://example.com/docs" /tmp/source-excerpt.txt 2>/dev/null
71
+ # Returns: {"level":"S2","confidence":0.87,"reason":"Official library documentation page"}
72
+ ```
73
+
74
+ Use the returned `level` to set your source tag. If the call fails or returns `null`, assign the level manually per the hierarchy table above.
75
+
56
76
  ---
57
77
 
58
78
  ## Confidence Levels
@@ -97,6 +97,18 @@ conflicts: N
97
97
  - **Research gaps**: Add `[RESEARCH GAP]` flag, add to Open Questions with high impact, never fabricate
98
98
  - **Duplicates**: Consolidate into one entry, note multi-source agreement, reference all documents
99
99
 
100
+ ## Local LLM Context Summarization (Optional)
101
+
102
+ When input research documents are large (>2000 words combined), you MAY use the local LLM to pre-summarize each document before synthesis. This reduces your own context consumption. Advisory only — if unavailable, read documents normally.
103
+
104
+ ```bash
105
+ # Pre-summarize a large research document to ~150 words:
106
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm summarize /path/to/RESEARCH.md 150 2>/dev/null
107
+ # Returns: {"summary":"...plain text summary under 150 words...","latency_ms":2100,"fallback_used":false}
108
+ ```
109
+
110
+ Use the returned `summary` string as your working copy of that document's findings. Still read the original for any specific version numbers, code examples, or direct quotes needed in the output.
111
+
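The ">2000 words combined" condition can be approximated with a simple word count, as sketched here. Splitting on whitespace is an assumption about how words are counted, and both function names are illustrative.

```javascript
// Counts whitespace-separated words; an approximation only.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// True when the combined documents exceed the threshold, meaning
// pre-summarization via the local LLM may be worth attempting.
function shouldPreSummarize(documents, thresholdWords = 2000) {
  const total = documents.reduce((sum, doc) => sum + countWords(doc), 0);
  return total > thresholdWords;
}
```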
100
112
  ## Anti-Patterns
101
113
 
102
114
  ### Universal Anti-Patterns
@@ -94,10 +94,29 @@ Verify the artifact is imported AND used by other parts of the system (functions
94
94
  | Yes | Yes | No | UNWIRED |
95
95
  | Yes | Yes | Yes | PASSED |
96
96
 
97
+ > **Note:** WIRED status (Level 3) requires correct arguments, not just correct function names. A call that passes `undefined` for a parameter available in scope is `ARGS_WRONG`, not `WIRED`.
98
+
97
99
  ### Step 6: Verify Key Links (Always)
98
100
 
99
101
  For each key_link: identify source and target components, verify the import path resolves, verify the imported symbol is actually called/used, and verify call signatures match. Watch for: wrong import paths, imported-but-never-called symbols, defined-but-never-applied middleware, registered-but-never-triggered event handlers.
100
102
 
103
+ ### Step 6b: Argument-Level Spot Checks (Always)
104
+
105
+ Beyond verifying that calls exist, spot-check that **arguments passed to cross-boundary calls carry the correct values**. A call with the right function but wrong arguments is effectively UNWIRED.
106
+
107
+ **Focus on:** IDs (session, user, request), config objects, auth tokens, and context data that originate from external boundaries (stdin, env, disk).
108
+
109
+ **Method:**
110
+ 1. For each key_link verified in Step 6, grep the call site and inspect the arguments
111
+ 2. Compare each argument against the data source available in the calling scope
112
+ 3. Flag any argument that passes `undefined`, `null`, or a hardcoded placeholder when the calling scope has the real value available (e.g., `data.session_id` is in scope but `undefined` is passed)
113
+
114
+ **Classification:**
115
+ - `WIRED` requires both correct function AND correct arguments
116
+ - `ARGS_WRONG` = correct function called but one or more arguments are incorrect/missing — this is a key link gap
117
+
118
+ **Example:** A hook script receives `data` from stdin containing `session_id`. If it calls `logMetric(planningDir, { session_id: undefined })` instead of `logMetric(planningDir, { session_id: data.session_id })`, that is an `ARGS_WRONG` gap even though the call itself exists.
119
+
101
120
  ### Step 7: Check Requirements Coverage (Always)
102
121
 
103
122
  Cross-reference all must-haves against verification results in a table:
@@ -106,8 +125,8 @@ Cross-reference all must-haves against verification results in a table:
106
125
  | # | Must-Have | Type | L1 (Exists) | L2 (Substantive) | L3 (Wired) | Status |
107
126
  |---|----------|------|-------------|-------------------|------------|--------|
108
127
  | 1 | {description} | truth | - | - | - | VERIFIED/FAILED |
109
- | 2 | {description} | artifact | YES/NO | YES/STUB/PARTIAL | WIRED/ORPHANED | PASS/FAIL |
110
- | 3 | {description} | key_link | - | - | YES/NO | PASS/FAIL |
128
+ | 2 | {description} | artifact | YES/NO | YES/STUB/PARTIAL | WIRED/ORPHANED/ARGS_WRONG | PASS/FAIL |
129
+ | 3 | {description} | key_link | - | - | YES/NO/ARGS_WRONG | PASS/FAIL |
111
130
  ```
112
131
 
113
132
  ### Step 8: Scan for Anti-Patterns (Full Verification Only)
@@ -225,3 +244,4 @@ Read `references/stub-patterns.md` for stub detection patterns by technology. Re
225
244
  9. DO NOT give PASSED status if ANY must-have fails at ANY level
226
245
  10. DO NOT count deferred items as gaps — they are intentionally not implemented
227
246
  11. DO NOT be lenient — your job is to find problems, not to be encouraging
247
+ 12. DO NOT mark a call as WIRED if it passes hardcoded `undefined`/`null` for parameters that have a known source in scope — check arguments, not just function names
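The `ARGS_WRONG` case from Step 6b can be reproduced in a few lines. In this sketch `logMetric` is a stand-in that only records its input, the session ID value is made up, and `classifyArgument` is a hypothetical helper rather than part of the verifier.

```javascript
// Stand-in for the real logMetric: it only records what it was given.
const recorded = [];
function logMetric(planningDir, entry) {
  recorded.push({ planningDir, ...entry });
}

// Hook payload as it might arrive on stdin (value is invented).
const data = { session_id: 'sess-123' };

// The call exists, but the value available in scope is not passed.
logMetric('.planning', { session_id: undefined });
// Same call with the real value forwarded.
logMetric('.planning', { session_id: data.session_id });

// Spot check in the spirit of Step 6b: a lost value is only a gap
// when the calling scope actually had a real value to pass.
function classifyArgument(passed, availableInScope) {
  const lost = passed === undefined || passed === null;
  return lost && availableInScope != null ? 'ARGS_WRONG' : 'WIRED';
}
```

Both calls would satisfy a check that only looks for the function name, which is why the argument-level comparison is needed.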
@@ -440,3 +440,92 @@ Run validation with: `node plugins/pbr/scripts/pbr-tools.js config validate`
440
440
  | `tdd_mode: true` + `depth: quick` | quick depth skips verification, which conflicts with TDD's verify-first approach |
441
441
  | `git.mode: disabled` + `atomic_commits: true` | atomic_commits has no effect when git is disabled |
442
442
  | `git.branching: phase` + `git.mode: disabled` | Branching settings are ignored when git is disabled |
443
+
444
+ ---
445
+
446
+ ## local_llm
447
+
448
+ Offloads selected PBR inference tasks to a locally running Ollama instance, reducing frontier model usage and latency for fast classification calls. The key `enabled` defaults to `false`, so users without Ollama see no change — all LLM calls continue routing to Claude as normal. When enabled, PBR uses a `local_first` routing strategy: fast tasks (artifact classification, task validation) go to the local model; complex tasks (planning, execution) stay on the frontier model.
449
+
450
+ ### Quick setup
451
+
452
+ 1. Install Ollama:
453
+ - **Linux/macOS**: `curl -fsSL https://ollama.com/install.sh | sh`
454
+ - **Windows**: Download from [ollama.com/download](https://ollama.com/download) and run the installer
455
+ 2. Pull the recommended model: `ollama pull qwen2.5-coder:7b`
456
+ 3. Add to `.planning/config.json`:
457
+
458
+ ```json
459
+ "local_llm": {
460
+ "enabled": true,
461
+ "model": "qwen2.5-coder:7b"
462
+ }
463
+ ```
464
+
465
+ 4. Verify connectivity: `node /path/to/plugins/pbr/scripts/pbr-tools.js llm health`
466
+
467
+ ### Field reference
468
+
469
+ | Property | Type | Default | Description |
470
+ |----------|------|---------|-------------|
471
+ | `local_llm.enabled` | boolean | `false` | Enable local LLM offloading; `false` = all calls use frontier |
472
+ | `local_llm.provider` | string | `"ollama"` | Backend provider; only `"ollama"` is supported |
473
+ | `local_llm.endpoint` | string | `"http://localhost:11434"` | Ollama API base URL |
474
+ | `local_llm.model` | string | `"qwen2.5-coder:7b"` | Model tag to use for local inference |
475
+ | `local_llm.timeout_ms` | integer | `3000` | Per-request timeout in milliseconds; >= 500 |
476
+ | `local_llm.max_retries` | integer | `1` | Number of retry attempts on failure before falling back |
477
+ | `local_llm.fallback` | string | `"frontier"` | What to use when local LLM fails: `"frontier"` or `"skip"` |
478
+ | `local_llm.routing_strategy` | string | `"local_first"` | `"local_first"` sends fast tasks local; `"always_local"` routes everything |
479
+
480
+ ### features sub-table
481
+
482
+ Controls which PBR tasks are eligible for local LLM offloading.
483
+
484
+ | Property | Default | Description |
485
+ |----------|---------|-------------|
486
+ | `artifact_classification` | `true` | Classify artifact types (PLAN, SUMMARY, VERIFICATION) locally |
487
+ | `task_validation` | `true` | Validate task scope and completeness locally |
488
+ | `context_summarization` | `false` | Summarize context windows locally (higher token demand) |
489
+ | `source_scoring` | `false` | Score source files by relevance locally |
490
+
+ ### advanced sub-table
+
+ | Property | Default | Description |
+ |----------|---------|-------------|
+ | `confidence_threshold` | `0.9` | Minimum confidence (0–1) for local output to be accepted; below this, falls back to frontier |
+ | `shadow_mode` | `false` | Run the local LLM in parallel with frontier but discard local results; useful for tuning confidence thresholds without affecting output |
+ | `max_input_tokens` | `2000` | Truncate inputs longer than this before sending to the local model |
+ | `keep_alive` | `"30m"` | How long Ollama keeps the model loaded between requests (Ollama format: `"5m"`, `"1h"`) |
+ | `num_ctx` | `4096` | Context window size passed to Ollama; **must be 4096 on Windows** (see Windows gotchas) |
+ | `disable_after_failures` | `3` | Automatically disable local LLM for the session after this many consecutive failures |
+
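How these settings might interact can be sketched as follows. This is a minimal illustration, not PBR's actual routing code: `createSession`, `decideRoute`, and the `{ ok, confidence }` result shape are hypothetical, and the sketch assumes that only errors (not low-confidence answers) count toward `disable_after_failures`. Only the three config keys and their defaults come from the table above.

```javascript
// Hypothetical sketch; names and result shape are assumptions, not PBR's API.
// Defaults mirror the advanced sub-table.
const advanced = {
  confidence_threshold: 0.9,
  shadow_mode: false,
  disable_after_failures: 3,
};

// Per-session state: consecutive local failures and a disabled flag.
function createSession() {
  return { consecutiveFailures: 0, disabled: false };
}

// Returns "local" if the local result may be used, otherwise "fallback".
// `local` is assumed to look like { ok: boolean, confidence: number }.
function decideRoute(cfg, session, local) {
  if (session.disabled) return "fallback";

  if (!local.ok) {
    session.consecutiveFailures += 1;
    if (session.consecutiveFailures >= cfg.disable_after_failures) {
      session.disabled = true; // stay disabled for the rest of the session
    }
    return "fallback";
  }

  session.consecutiveFailures = 0; // a success resets the failure streak
  if (cfg.shadow_mode) return "fallback"; // local result is discarded
  return local.confidence >= cfg.confidence_threshold ? "local" : "fallback";
}
```

What "fallback" means in practice would depend on `local_llm.fallback` (`"frontier"` or `"skip"`).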
+ ### Hardware requirements
+
+ | Tier | Hardware | Notes |
+ |------|----------|-------|
+ | Recommended | RTX 3060+ with 8 GB VRAM | Full GPU acceleration; qwen2.5-coder:7b loads entirely in VRAM |
+ | Functional | GTX 1660+ with 6 GB VRAM | GPU acceleration with slight layer offload to RAM |
+ | Marginal | CPU only, 32 GB RAM | Works, but adds 5-20 s of latency per call; disable context-heavy features |
+
+ For GPU acceleration, ensure the NVIDIA driver version is 520 or later and CUDA 11.8 or later is installed. AMD GPU support is available via ROCm on Linux only.
+
+ ### Windows gotchas
+
+ - **Smart App Control**: May block `ollama_llama_server.exe` on first run. Allow it via Security settings or disable Smart App Control.
+ - **Windows Defender**: Add an exclusion for `%LOCALAPPDATA%\Programs\Ollama\ollama_llama_server.exe` to prevent Defender from scanning inference calls in real time.
+ - **`num_ctx` must be 4096**: Higher values cause GPU memory fragmentation on Windows and result in OOM errors mid-session. Always set `advanced.num_ctx: 4096` in your config.
+ - **Firewall**: Ollama listens on `localhost:11434` by default. If you see connection refused errors, check that Windows Firewall is not blocking loopback connections.
+
+ ### Viewing metrics
+
+ After enabling local LLM, PBR logs per-call metrics to `.planning/logs/local-llm-metrics.jsonl`. Use the built-in subcommands to inspect them:
+
+ ```bash
+ # Show session summary (calls routed, latency, token savings)
+ node plugins/pbr/scripts/pbr-tools.js llm metrics
+
+ # Suggest routing threshold adjustments based on recent accuracy
+ node plugins/pbr/scripts/pbr-tools.js llm adjust-thresholds
+ ```
+
+ Metrics include: routing decision, model used, latency ms, confidence score, whether the frontier fallback was triggered, and estimated tokens saved.
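
For ad-hoc inspection outside the built-in subcommands, a JSONL log like this can be summarized in a few lines. The sketch below is illustrative only: the field names `latency_ms`, `fallback_used`, and `tokens_saved` are assumptions, since the actual schema of `local-llm-metrics.jsonl` is not shown here, and `summarizeMetrics` is a hypothetical helper.

```javascript
// Hypothetical sketch: field names are assumed, not taken from PBR's real log schema.
function summarizeMetrics(jsonlText) {
  // One JSON object per non-empty line; trim() also drops a trailing "\r".
  const entries = jsonlText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));

  const total = entries.length;
  const fallbacks = entries.filter((e) => e.fallback_used === true).length;
  const latencySum = entries.reduce((sum, e) => sum + (e.latency_ms || 0), 0);
  const tokensSaved = entries.reduce((sum, e) => sum + (e.tokens_saved || 0), 0);

  return {
    total,
    fallbacks,
    avgLatencyMs: total === 0 ? 0 : latencySum / total,
    tokensSaved,
  };
}

// Two made-up log lines for illustration.
const sample = [
  '{"latency_ms": 120, "fallback_used": false, "tokens_saved": 300}',
  '{"latency_ms": 480, "fallback_used": true, "tokens_saved": 0}',
].join("\n");

console.log(summarizeMetrics(sample));
```

In real use, the text would come from reading the log file rather than an inline sample.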
@@ -71,6 +71,28 @@ requirement_ids:
  | `consumes` | NO | array | What this plan needs from prior plans. Format: `"Thing (from plan XX-YY)"` |
  | `requirement_ids` | NO | array | Requirement IDs from REQUIREMENTS.md or ROADMAP.md goal IDs that this plan addresses. Enables bidirectional traceability between plans and requirements/goals. |
  | `dependency_fingerprints` | NO | object | Hashes of dependency phase SUMMARY.md files at plan-creation time. Used to detect stale plans. |
+ | `data_contracts` | NO | array | Cross-boundary parameter mappings for calls where arguments originate from external boundaries. Format: `"param: source (context) [fallback]"` |
+
+ ### Data Contracts
+
+ When a task's `<action>` includes calls across module boundaries where arguments come from external sources (hook stdin, env vars, API params, config files), document the parameter-to-source mapping in `data_contracts` frontmatter and in the `<action>` step itself.
+
+ Example frontmatter:
+
+ ```yaml
+ data_contracts:
+ - "sessionId: data.session_id (hook stdin) [undefined in CLI context]"
+ - "config: configLoad(planningDir) (disk) [resolveConfig(undefined)]"
+ ```
+
+ Example in `<action>`:
+
+ ```
+ 3. Call classifyArtifact(llmConfig, planningDir, content, fileType, data.session_id)
+ Data contract: sessionId ← data.session_id from hook stdin (undefined in CLI context)
+ ```
+
+ **When to apply:** Any call where caller and callee are in different modules AND at least one argument originates from an external boundary. Internal helper calls within the same module do not need contracts.
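
The `"param: source (context) [fallback]"` format is regular enough to check mechanically. The following is a rough sketch under stated assumptions, not part of PBR: `parseDataContract` is a hypothetical helper, and it assumes the `[fallback]` part is optional and that the context itself contains no parentheses.

```javascript
// Hypothetical helper for the "param: source (context) [fallback]" format.
// Assumptions: [fallback] is optional; the (context) part has no nested parentheses.
const CONTRACT_RE = /^([^:]+):\s*(.+?)\s*\(([^()]*)\)\s*(?:\[(.*)\])?\s*$/;

function parseDataContract(entry) {
  const match = CONTRACT_RE.exec(entry);
  if (match === null) return null; // not in the expected format
  return {
    param: match[1].trim(),
    source: match[2],
    context: match[3],
    fallback: match[4] === undefined ? null : match[4],
  };
}

// The two entries from the example frontmatter.
const contracts = [
  "sessionId: data.session_id (hook stdin) [undefined in CLI context]",
  "config: configLoad(planningDir) (disk) [resolveConfig(undefined)]",
].map(parseDataContract);

console.log(contracts);
```

The lazy `(.+?)` lets the source itself contain a call such as `configLoad(planningDir)`, because the pattern only settles on the last parenthesized group before the optional bracket as the context.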
 
  ---
 
@@ -127,7 +127,7 @@ Read `.planning/config.json` and check for fields referenced by skills:
  - PASS: All expected fields present with correct types
  - WARN (missing fields): Report each missing field and which skill uses it — "Run `/pbr:config` to set all options."
 
- ### Check 10: Orphaned Crash Recovery Files
+ ### Check 10: Orphaned Crash Recovery & Lock Files
 
  The executor creates `.PROGRESS-{plan_id}` files as crash recovery breadcrumbs during builds and deletes them after `SUMMARY.md` is written. Similarly, `.checkpoint-manifest.json` files track checkpoint state during execution. If the executor crashes mid-build, these files remain and could confuse future runs.
 
@@ -147,6 +147,13 @@ Glob for `.planning/phases/**/.PROGRESS-*` and `.planning/phases/**/.checkpoint-
  ```
  Fix suggestion: "Checkpoint manifests are leftover from interrupted builds. Safe to delete if no `/pbr:build` is currently running. Remove with `rm <path>`."
 
+ Also check for `.planning/.active-skill`:
+
+ - If the file does not exist: no action needed (PASS for this sub-check)
+ - If the file exists, check its age by comparing the file modification time to the current time:
+   - If older than 1 hour: WARN with fix suggestion: "Stale .active-skill lock file detected (set {age} ago). No PBR skill appears to be running. Safe to delete with `rm .planning/.active-skill`."
+   - If younger than 1 hour: INFO: "Active skill lock exists ({content}). A PBR skill may be running."
+
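The age check described above could be expressed roughly like this. It is a sketch, not the skill's actual logic: `checkActiveSkillLock` is a hypothetical function, the age is reported in whole minutes for simplicity, and a lock that is exactly one hour old is treated as not yet stale (the text does not specify that boundary).

```javascript
// Hypothetical sketch of the .active-skill sub-check; not PBR's real code.
const ONE_HOUR_MS = 60 * 60 * 1000;

// mtimeMs: lock file modification time (ms since epoch), or null if the file is absent.
// nowMs: current time (ms since epoch). content: text stored in the lock file.
function checkActiveSkillLock(mtimeMs, nowMs, content) {
  if (mtimeMs === null) {
    return { level: "PASS", message: "" };
  }
  const ageMs = nowMs - mtimeMs;
  if (ageMs > ONE_HOUR_MS) {
    const minutes = Math.round(ageMs / 60000);
    return {
      level: "WARN",
      message:
        `Stale .active-skill lock file detected (set ${minutes} minutes ago). ` +
        "No PBR skill appears to be running. " +
        "Safe to delete with `rm .planning/.active-skill`.",
    };
  }
  return {
    level: "INFO",
    message: `Active skill lock exists (${content}). A PBR skill may be running.`,
  };
}
```

In Node.js, the modification time could be obtained from `fs.statSync(path).mtimeMs`, with `null` passed when the file does not exist.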
  ---
 
  ## Auto-Fix for Common Corruption Patterns
@@ -210,10 +210,10 @@ The `features.team_discussions` config flag (and `/pbr:build --team`) enables **
  ║ ▶ NEXT UP ║
  ╚══════════════════════════════════════════════════════════════╝
 
- `/pbr:begin` — start a new project
- `/pbr:status` — check current project status
- `/pbr:config` — configure workflow settings
- `/pbr:help <command>` — detailed help for a specific command
+ - `/pbr:begin` — start a new project
+ - `/pbr:status` — check current project status
+ - `/pbr:config` — configure workflow settings
+ - `/pbr:help <command>` — detailed help for a specific command
 
  ```
 
@@ -174,8 +174,8 @@ Start a new milestone cycle with new phases.
 
 
  ╔══════════════════════════════════════════════════════════════╗
- ║ ▶ NEXT UP ║
- ╚══════════════════════════════════════════════════════════════╝
+ ║ ▶ NEXT UP ║
+ ╚══════════════════════════════════════════════════════════════╝
 
  **Phase {N}: {name}** — start with discussion or planning
 
@@ -443,8 +443,8 @@ Archive a completed milestone and prepare for the next one.
 
 
  ╔══════════════════════════════════════════════════════════════╗
- ║ ▶ NEXT UP ║
- ╚══════════════════════════════════════════════════════════════╝
+ ║ ▶ NEXT UP ║
+ ╚══════════════════════════════════════════════════════════════╝
 
  **Start the next milestone** — plan new features
 
@@ -532,8 +532,8 @@ Verify milestone completion with cross-phase integration checks.
 
 
  ╔══════════════════════════════════════════════════════════════╗
- ║ ▶ NEXT UP ║
- ╚══════════════════════════════════════════════════════════════╝
+ ║ ▶ NEXT UP ║
+ ╚══════════════════════════════════════════════════════════════╝
 
  **Complete the milestone** — archive and tag
 
@@ -563,8 +563,8 @@ Verify milestone completion with cross-phase integration checks.
 
 
  ╔══════════════════════════════════════════════════════════════╗
- ║ ▶ NEXT UP ║
- ╚══════════════════════════════════════════════════════════════╝
+ ║ ▶ NEXT UP ║
+ ╚══════════════════════════════════════════════════════════════╝
 
  **Close the gaps** — create fix phases
 
@@ -592,8 +592,8 @@ Verify milestone completion with cross-phase integration checks.
 
 
  ╔══════════════════════════════════════════════════════════════╗
- ║ ▶ NEXT UP ║
- ╚══════════════════════════════════════════════════════════════╝
+ ║ ▶ NEXT UP ║
+ ╚══════════════════════════════════════════════════════════════╝
 
  **Address tech debt or proceed**
 
@@ -696,8 +696,8 @@ Create phases to close gaps found during an audit.
 
 
  ╔══════════════════════════════════════════════════════════════╗
- ║ ▶ NEXT UP ║
- ╚══════════════════════════════════════════════════════════════╝
+ ║ ▶ NEXT UP ║
+ ╚══════════════════════════════════════════════════════════════╝
 
  **Plan the first gap-closure phase**