@sienklogic/plan-build-run 2.22.2 → 2.24.0

Files changed (90)
  1. package/CHANGELOG.md +42 -0
  2. package/dashboard/package.json +3 -2
  3. package/dashboard/src/middleware/errorHandler.js +12 -2
  4. package/dashboard/src/repositories/planning.repository.js +24 -12
  5. package/dashboard/src/routes/pages.routes.js +182 -4
  6. package/dashboard/src/server.js +4 -0
  7. package/dashboard/src/services/audit.service.js +42 -0
  8. package/dashboard/src/services/dashboard.service.js +1 -12
  9. package/dashboard/src/services/local-llm-metrics.service.js +81 -0
  10. package/dashboard/src/services/quick.service.js +62 -0
  11. package/dashboard/src/services/roadmap.service.js +1 -11
  12. package/dashboard/src/utils/strip-bom.js +8 -0
  13. package/dashboard/src/views/audit-detail.ejs +5 -0
  14. package/dashboard/src/views/audits.ejs +5 -0
  15. package/dashboard/src/views/partials/analytics-content.ejs +61 -0
  16. package/dashboard/src/views/partials/audit-detail-content.ejs +12 -0
  17. package/dashboard/src/views/partials/audits-content.ejs +34 -0
  18. package/dashboard/src/views/partials/quick-content.ejs +40 -0
  19. package/dashboard/src/views/partials/quick-detail-content.ejs +29 -0
  20. package/dashboard/src/views/partials/sidebar.ejs +16 -0
  21. package/dashboard/src/views/partials/todos-content.ejs +13 -3
  22. package/dashboard/src/views/quick-detail.ejs +5 -0
  23. package/dashboard/src/views/quick.ejs +5 -0
  24. package/package.json +1 -1
  25. package/plugins/copilot-pbr/agents/debugger.agent.md +15 -0
  26. package/plugins/copilot-pbr/agents/integration-checker.agent.md +9 -2
  27. package/plugins/copilot-pbr/agents/planner.agent.md +19 -0
  28. package/plugins/copilot-pbr/agents/researcher.agent.md +20 -0
  29. package/plugins/copilot-pbr/agents/synthesizer.agent.md +12 -0
  30. package/plugins/copilot-pbr/agents/verifier.agent.md +22 -2
  31. package/plugins/copilot-pbr/plugin.json +1 -1
  32. package/plugins/copilot-pbr/references/config-reference.md +89 -0
  33. package/plugins/copilot-pbr/references/plan-format.md +22 -0
  34. package/plugins/copilot-pbr/skills/health/SKILL.md +8 -1
  35. package/plugins/copilot-pbr/skills/help/SKILL.md +4 -4
  36. package/plugins/copilot-pbr/skills/milestone/SKILL.md +12 -12
  37. package/plugins/copilot-pbr/skills/status/SKILL.md +37 -1
  38. package/plugins/copilot-pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
  39. package/plugins/copilot-pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
  40. package/plugins/cursor-pbr/.cursor-plugin/plugin.json +1 -1
  41. package/plugins/cursor-pbr/agents/debugger.md +15 -0
  42. package/plugins/cursor-pbr/agents/integration-checker.md +9 -2
  43. package/plugins/cursor-pbr/agents/planner.md +19 -0
  44. package/plugins/cursor-pbr/agents/researcher.md +20 -0
  45. package/plugins/cursor-pbr/agents/synthesizer.md +12 -0
  46. package/plugins/cursor-pbr/agents/verifier.md +22 -2
  47. package/plugins/cursor-pbr/references/config-reference.md +89 -0
  48. package/plugins/cursor-pbr/references/plan-format.md +22 -0
  49. package/plugins/cursor-pbr/skills/health/SKILL.md +8 -1
  50. package/plugins/cursor-pbr/skills/help/SKILL.md +4 -4
  51. package/plugins/cursor-pbr/skills/milestone/SKILL.md +12 -12
  52. package/plugins/cursor-pbr/skills/status/SKILL.md +37 -1
  53. package/plugins/cursor-pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
  54. package/plugins/cursor-pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
  55. package/plugins/pbr/.claude-plugin/plugin.json +1 -1
  56. package/plugins/pbr/agents/debugger.md +15 -0
  57. package/plugins/pbr/agents/integration-checker.md +9 -2
  58. package/plugins/pbr/agents/planner.md +19 -0
  59. package/plugins/pbr/agents/researcher.md +20 -0
  60. package/plugins/pbr/agents/synthesizer.md +12 -0
  61. package/plugins/pbr/agents/verifier.md +22 -2
  62. package/plugins/pbr/references/config-reference.md +89 -0
  63. package/plugins/pbr/references/plan-format.md +22 -0
  64. package/plugins/pbr/scripts/check-config-change.js +33 -0
  65. package/plugins/pbr/scripts/check-plan-format.js +52 -4
  66. package/plugins/pbr/scripts/check-subagent-output.js +43 -3
  67. package/plugins/pbr/scripts/config-schema.json +48 -0
  68. package/plugins/pbr/scripts/local-llm/client.js +214 -0
  69. package/plugins/pbr/scripts/local-llm/health.js +217 -0
  70. package/plugins/pbr/scripts/local-llm/metrics.js +252 -0
  71. package/plugins/pbr/scripts/local-llm/operations/classify-artifact.js +76 -0
  72. package/plugins/pbr/scripts/local-llm/operations/classify-error.js +75 -0
  73. package/plugins/pbr/scripts/local-llm/operations/score-source.js +72 -0
  74. package/plugins/pbr/scripts/local-llm/operations/summarize-context.js +62 -0
  75. package/plugins/pbr/scripts/local-llm/operations/validate-task.js +59 -0
  76. package/plugins/pbr/scripts/local-llm/router.js +101 -0
  77. package/plugins/pbr/scripts/local-llm/shadow.js +60 -0
  78. package/plugins/pbr/scripts/local-llm/threshold-tuner.js +118 -0
  79. package/plugins/pbr/scripts/pbr-tools.js +120 -3
  80. package/plugins/pbr/scripts/post-write-dispatch.js +2 -2
  81. package/plugins/pbr/scripts/progress-tracker.js +29 -3
  82. package/plugins/pbr/scripts/session-cleanup.js +36 -1
  83. package/plugins/pbr/scripts/validate-task.js +30 -1
  84. package/plugins/pbr/skills/health/SKILL.md +8 -1
  85. package/plugins/pbr/skills/help/SKILL.md +4 -4
  86. package/plugins/pbr/skills/milestone/SKILL.md +12 -12
  87. package/plugins/pbr/skills/status/SKILL.md +38 -2
  88. package/plugins/pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
  89. package/plugins/pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
  90. package/dashboard/src/views/coming-soon.ejs +0 -11
package/dashboard/src/views/partials/analytics-content.ejs CHANGED
@@ -88,3 +88,64 @@
88
88
  <% } else { %>
89
89
  <p>No phase data available.</p>
90
90
  <% } %>
91
+
92
+ <% if (typeof llmMetrics !== 'undefined' && llmMetrics) { %>
93
+ <article>
94
+ <header>Local LLM Offload</header>
95
+ <div class="grid">
96
+ <article>
97
+ <header>Total Calls</header>
98
+ <strong class="stat-value"><%= llmMetrics.summary.total_calls %></strong>
99
+ <span class="stat-unit">calls</span>
100
+ </article>
101
+ <article>
102
+ <header>Tokens Saved</header>
103
+ <strong class="stat-value"><%= llmMetrics.summary.tokens_saved.toLocaleString() %></strong>
104
+ <span class="stat-unit">frontier tokens</span>
105
+ </article>
106
+ <article>
107
+ <header>Est. Cost Saved</header>
108
+ <strong class="stat-value">$<%= llmMetrics.summary.cost_saved_usd.toFixed(4) %></strong>
109
+ <span class="stat-unit">at $3/M tokens</span>
110
+ </article>
111
+ <article>
112
+ <header>Fallback Rate</header>
113
+ <strong class="stat-value"><%= llmMetrics.summary.fallback_rate_pct %>%</strong>
114
+ <span class="stat-unit"><%= llmMetrics.summary.fallback_count %> fallbacks</span>
115
+ </article>
116
+ <article>
117
+ <header>Avg Latency</header>
118
+ <strong class="stat-value"><%= llmMetrics.summary.avg_latency_ms %></strong>
119
+ <span class="stat-unit">ms/call</span>
120
+ </article>
121
+ </div>
122
+ <% if (llmMetrics.byOperation && llmMetrics.byOperation.length > 0) { %>
123
+ <div class="overflow-auto" style="margin-top: var(--space-md);">
124
+ <table>
125
+ <thead>
126
+ <tr>
127
+ <th>Operation</th>
128
+ <th>Calls</th>
129
+ <th>Fallbacks</th>
130
+ <th>Tokens Saved</th>
131
+ </tr>
132
+ </thead>
133
+ <tbody>
134
+ <% llmMetrics.byOperation.forEach(op => { %>
135
+ <tr>
136
+ <td><%= op.operation %></td>
137
+ <td><%= op.calls %></td>
138
+ <td><%= op.fallbacks %></td>
139
+ <td><%= op.tokens_saved.toLocaleString() %></td>
140
+ </tr>
141
+ <% }) %>
142
+ </tbody>
143
+ </table>
144
+ </div>
145
+ <% } %>
146
+ <footer style="color: var(--pico-muted-color); font-size: 0.85em;">
147
+ Baseline estimate: each local call replaced ~<%= llmMetrics.baseline.estimated_frontier_tokens_without_local.toLocaleString() %> frontier tokens total.
148
+ Advisory only — no data collected when local LLM is disabled.
149
+ </footer>
150
+ </article>
151
+ <% } %>
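The template hunk above reads a fixed set of fields from `llmMetrics`. The following is a minimal sketch of how an object of that shape could be assembled from per-call log entries. The function name `summarizeMetrics`, the per-entry field names, and the choice to reuse the saved-token total as the baseline figure are all assumptions made for illustration; only the output field names and the $3 per million tokens rate come from the template.

```javascript
// Hypothetical aggregator: entry field names (operation, latency_ms,
// fallback_used, tokens_saved) are assumptions, not the package's schema.
const COST_PER_MILLION_TOKENS_USD = 3; // matches the "at $3/M tokens" label

function summarizeMetrics(entries) {
  const byOp = new Map();
  let tokensSaved = 0;
  let fallbackCount = 0;
  let latencyTotal = 0;

  for (const e of entries) {
    tokensSaved += e.tokens_saved || 0;
    latencyTotal += e.latency_ms || 0;
    if (e.fallback_used) fallbackCount += 1;

    // Accumulate one row per operation for the table in the template.
    const row = byOp.get(e.operation) ||
      { operation: e.operation, calls: 0, fallbacks: 0, tokens_saved: 0 };
    row.calls += 1;
    if (e.fallback_used) row.fallbacks += 1;
    row.tokens_saved += e.tokens_saved || 0;
    byOp.set(e.operation, row);
  }

  const total = entries.length;
  return {
    summary: {
      total_calls: total,
      tokens_saved: tokensSaved,
      cost_saved_usd: (tokensSaved / 1e6) * COST_PER_MILLION_TOKENS_USD,
      fallback_count: fallbackCount,
      fallback_rate_pct: total ? Math.round((fallbackCount / total) * 100) : 0,
      avg_latency_ms: total ? Math.round(latencyTotal / total) : 0,
    },
    byOperation: [...byOp.values()],
    // Assumption: the baseline is taken to be the saved-token total.
    baseline: { estimated_frontier_tokens_without_local: tokensSaved },
  };
}
```

Keeping `tokens_saved` and `cost_saved_usd` numeric matters here because the template calls `.toLocaleString()` and `.toFixed(4)` on them.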
package/dashboard/src/views/partials/audit-detail-content.ejs ADDED
@@ -0,0 +1,12 @@
1
+ <%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
2
+ <h1><%= title %></h1>
3
+
4
+ <p><a href="/audits">&larr; Back to Audit Reports</a></p>
5
+
6
+ <% if (typeof date !== 'undefined' && date) { %>
7
+ <p><small>Date: <%= date %></small></p>
8
+ <% } %>
9
+
10
+ <article class="markdown-body">
11
+ <%- html %>
12
+ </article>
package/dashboard/src/views/partials/audits-content.ejs ADDED
@@ -0,0 +1,34 @@
1
+ <%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
2
+ <h1>Audit Reports</h1>
3
+
4
+ <% if (typeof reports !== 'undefined' && reports.length > 0) { %>
5
+ <article>
6
+ <div class="table-wrap">
7
+ <table>
8
+ <thead>
9
+ <tr>
10
+ <th scope="col">Date</th>
11
+ <th scope="col">Report</th>
12
+ </tr>
13
+ </thead>
14
+ <tbody>
15
+ <% reports.forEach(function(report) { %>
16
+ <tr>
17
+ <td><%= report.date || '—' %></td>
18
+ <td>
19
+ <a href="/audits/<%= report.filename %>"
20
+ hx-get="/audits/<%= report.filename %>"
21
+ hx-target="#main-content"
22
+ hx-push-url="true">
23
+ <%= report.title %>
24
+ </a>
25
+ </td>
26
+ </tr>
27
+ <% }); %>
28
+ </tbody>
29
+ </table>
30
+ </div>
31
+ </article>
32
+ <% } else { %>
33
+ <%- include('empty-state', { icon: '🔍', title: 'No audit reports found', action: 'Run /pbr:audit to generate a session audit report.' }) %>
34
+ <% } %>
package/dashboard/src/views/partials/quick-content.ejs ADDED
@@ -0,0 +1,40 @@
1
+ <%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
2
+ <h1>Quick Tasks</h1>
3
+
4
+ <% if (typeof tasks !== 'undefined' && tasks.length > 0) { %>
5
+ <article>
6
+ <div class="table-wrap">
7
+ <table>
8
+ <thead>
9
+ <tr>
10
+ <th scope="col">ID</th>
11
+ <th scope="col">Title</th>
12
+ <th scope="col">Status</th>
13
+ </tr>
14
+ </thead>
15
+ <tbody>
16
+ <% tasks.forEach(function(task) { %>
17
+ <tr>
18
+ <td><%= task.id %></td>
19
+ <td>
20
+ <a href="/quick/<%= task.id %>"
21
+ hx-get="/quick/<%= task.id %>"
22
+ hx-target="#main-content"
23
+ hx-push-url="true">
24
+ <%= task.title %>
25
+ </a>
26
+ </td>
27
+ <td>
28
+ <span class="status-badge" data-status="<%= task.status %>">
29
+ <%= task.status %>
30
+ </span>
31
+ </td>
32
+ </tr>
33
+ <% }); %>
34
+ </tbody>
35
+ </table>
36
+ </div>
37
+ </article>
38
+ <% } else { %>
39
+ <%- include('empty-state', { icon: '⚡', title: 'No quick tasks found', action: '' }) %>
40
+ <% } %>
package/dashboard/src/views/partials/quick-detail-content.ejs ADDED
@@ -0,0 +1,29 @@
1
+ <%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
2
+ <h1><%= title %></h1>
3
+
4
+ <p><a href="/quick">&larr; Back to Quick Tasks</a></p>
5
+
6
+ <article>
7
+ <header>
8
+ <strong>Quick Task <%= id %></strong>
9
+ &nbsp;
10
+ <span class="status-badge" data-status="<%= status %>">
11
+ <%= status %>
12
+ </span>
13
+ </header>
14
+
15
+ <% if (planHtml) { %>
16
+ <section>
17
+ <h2>Plan</h2>
18
+ <%- planHtml %>
19
+ </section>
20
+ <% } %>
21
+
22
+ <% if (summaryHtml) { %>
23
+ <hr>
24
+ <section>
25
+ <h2>Summary</h2>
26
+ <%- summaryHtml %>
27
+ </section>
28
+ <% } %>
29
+ </article>
package/dashboard/src/views/partials/sidebar.ejs CHANGED
@@ -79,6 +79,22 @@
79
79
  Notes
80
80
  </a>
81
81
  </li>
82
+ <li>
83
+ <a href="/quick"
84
+ hx-get="/quick"
85
+ hx-target="#main-content"
86
+ hx-push-url="true"<%= typeof activePage !== 'undefined' && activePage === 'quick' ? ' aria-current="page"' : '' %>>
87
+ Quick Tasks
88
+ </a>
89
+ </li>
90
+ <li>
91
+ <a href="/audits"
92
+ hx-get="/audits"
93
+ hx-target="#main-content"
94
+ hx-push-url="true"<%= typeof activePage !== 'undefined' && activePage === 'audits' ? ' aria-current="page"' : '' %>>
95
+ Audit Reports
96
+ </a>
97
+ </li>
82
98
  </ul>
83
99
  </details>
84
100
 
package/dashboard/src/views/partials/todos-content.ejs CHANGED
@@ -1,10 +1,17 @@
1
1
  <%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
2
2
  <h1>Todos</h1>
3
3
 
4
- <p><a href="/todos/new" role="button"
4
+ <p>
5
+ <a href="/todos/new" role="button"
5
6
  hx-get="/todos/new"
6
7
  hx-target="#main-content"
7
- hx-push-url="true">Create Todo</a></p>
8
+ hx-push-url="true">Create Todo</a>
9
+ &nbsp;
10
+ <a href="/todos/done"
11
+ hx-get="/todos/done"
12
+ hx-target="#main-content"
13
+ hx-push-url="true">View Completed Todos</a>
14
+ </p>
8
15
 
9
16
  <% const f = typeof filters !== 'undefined' ? filters : { priority: '', status: '', q: '' }; %>
10
17
  <article>
@@ -70,7 +77,10 @@
70
77
  <tr>
71
78
  <td><%= todo.id %></td>
72
79
  <td>
73
- <a href="/todos/<%= todo.id %>">
80
+ <a href="/todos/<%= todo.id %>"
81
+ hx-get="/todos/<%= todo.id %>"
82
+ hx-target="#main-content"
83
+ hx-push-url="true">
74
84
  <%= todo.title %>
75
85
  </a>
76
86
  </td>
package/dashboard/src/views/quick-detail.ejs ADDED
@@ -0,0 +1,5 @@
+ <%- include('partials/layout-top', { title: title, activePage: 'quick' }) %>
+
+ <%- include('partials/quick-detail-content') %>
+
+ <%- include('partials/layout-bottom') %>
package/dashboard/src/views/quick.ejs ADDED
@@ -0,0 +1,5 @@
+ <%- include('partials/layout-top', { title: 'Quick Tasks', activePage: 'quick' }) %>
+
+ <%- include('partials/quick-content') %>
+
+ <%- include('partials/layout-bottom') %>
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@sienklogic/plan-build-run",
- "version": "2.22.2",
+ "version": "2.24.0",
  "description": "Plan it, Build it, Run it — structured development workflow for Claude Code",
  "keywords": [
  "claude-code",
package/plugins/copilot-pbr/agents/debugger.agent.md CHANGED
@@ -138,6 +138,21 @@ Then emit a `DECISION` checkpoint asking the user to approve, modify, or reject
138
138
 
139
139
  **Commit format**: `fix({scope}): {description}` with body: `Root cause: ...` and `Debug session: .planning/debug/{slug}.md`
140
140
 
141
+ ## Local LLM Error Classification (Optional)
142
+
143
+ When you receive an error message or stack trace, you MAY use the local LLM to classify it before starting hypothesis generation. This is advisory — skip it if unavailable.
144
+
145
+ ```bash
146
+ # Write the error to a temp file, then classify:
147
+ echo "Error text here" > /tmp/debug-error.txt
148
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm classify-error /tmp/debug-error.txt debugger 2>/dev/null
149
+ # Returns: {"category":"missing_output","confidence":0.91,"latency_ms":1840,"fallback_used":false}
150
+ ```
151
+
152
+ Categories: `connection_refused`, `timeout`, `missing_output`, `wrong_output_format`, `permission_error`, `unknown`.
153
+
154
+ If classification succeeds, use the returned category to bias your initial hypothesis ranking. If it returns null or fails, proceed with manual hypothesis generation as normal.
155
+
141
156
  ## Common Bug Patterns
142
157
 
143
158
  Reference: `references/common-bug-patterns.md` — covers off-by-one, null/undefined, async/timing, state management, import/module, environment, and data shape patterns.
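One way a caller might consume the optional classification step added in the hunk above is sketched here. The helper names `parseClassification` and `rankHypotheses`, and the `category` tag on each hypothesis, are hypothetical; only the category list and the JSON shape of the result come from the diff.

```javascript
// Categories listed in the debugger agent instructions.
const KNOWN_CATEGORIES = new Set([
  'connection_refused', 'timeout', 'missing_output',
  'wrong_output_format', 'permission_error', 'unknown',
]);

// Returns the parsed result, or null if the CLI output is empty,
// malformed, the literal `null`, or carries an unrecognized category.
function parseClassification(stdout) {
  try {
    const result = JSON.parse(stdout);
    if (!result || !KNOWN_CATEGORIES.has(result.category)) return null;
    return result;
  } catch {
    return null;
  }
}

// Moves hypotheses matching the classified category to the front.
// With no usable classification the original order is kept, which
// corresponds to "proceed with manual hypothesis generation".
function rankHypotheses(hypotheses, classification) {
  if (!classification || classification.category === 'unknown') {
    return hypotheses.slice();
  }
  const matches = (h) => Number(h.category === classification.category);
  return hypotheses.slice().sort((a, b) => matches(b) - matches(a));
}
```

The classification only reorders candidates; it never removes one, so a wrong category costs little.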
package/plugins/copilot-pbr/agents/integration-checker.agent.md CHANGED
@@ -35,6 +35,7 @@ You MUST perform all applicable categories (skip only if zero items exist for th
35
35
  3. **Auth Protection** — Every non-public route must have auth middleware. Frontend route guards must match backend protection.
36
36
  4. **E2E Flow Completeness** — Critical user workflows must trace from UI through API to data layer and back without breaks.
37
37
  5. **Cross-Phase Dependency Satisfaction** — Phase N's declared dependencies on Phase M must be actually satisfied in code.
38
+ 6. **Data-Flow Propagation** — Values originating at one boundary (hook stdin fields, API request params, env vars) must propagate correctly through the call chain to their destination (log entries, database records, API responses). A connected pipeline with missing data is a broken integration.
38
39
 
39
40
  > **First-phase edge case**: If no completed phases exist yet, focus on verifying the current phase's internal consistency — exports match imports within the phase, API contracts are self-consistent. Cross-phase checks are not applicable and should be skipped.
40
41
 
@@ -47,14 +48,19 @@ Read `references/agent-contracts.md` to validate agent-to-agent handoffs. Verify
47
48
  - **Write access for output artifact only** — you have Write access for your output artifact only. You CANNOT fix source code — you REPORT issues.
48
49
  - **Cross-phase scope** — unlike verifier (single phase), you check across phases.
49
50
 
50
- ## 6-Step Verification Process
51
+ ## 7-Step Verification Process
51
52
 
52
53
  1. **Build Export/Import Map**: Read each completed phase's SUMMARY.md frontmatter (`requires`, `provides`, `affects`). Grep actual exports/imports in source. Cross-reference declared vs actual — flag mismatches.
53
54
  2. **Verify Export Usage**: For each `provides` item: locate actual export (missing = `MISSING_EXPORT` ERROR), find consumers (none = `ORPHANED` WARNING), verify usage not just import (`IMPORTED_UNUSED` WARNING), check signature compatibility (`MISMATCHED` ERROR). Status `CONSUMED` = OK.
54
55
  3. **Verify API Coverage**: Discover routes, find frontend callers, match by method+path+body/params. Produce coverage table. See `references/integration-patterns.md` for framework-specific patterns.
55
56
  4. **Verify Auth Protection**: Identify auth mechanism, list all routes, classify (public vs protected), check frontend guards. Flag UNPROTECTED routes.
56
57
  5. **Verify E2E Flows**: Trace critical workflows step-by-step — verify each step exists and connects to the next (import/call/redirect). Record evidence (file:line). Flow status: COMPLETE | BROKEN | PARTIAL | UNTRACEABLE. See `references/integration-patterns.md` for flow templates.
57
- 6. **Compile Integration Report**: Produce final report with all findings by category.
58
+ 6. **Verify Data-Flow Propagation**: For each cross-boundary data field identified in plans or SUMMARY.md, trace the value from source through intermediate functions to destination. Verify the value is actually passed (not `undefined`/`null`/hardcoded) at each step.
59
+ - **Source examples**: hook stdin (`data.session_id`), API request params, environment variables, config fields
60
+ - **Destination examples**: log entries, database records, API responses, metric files
61
+ - **Method**: Grep each intermediate call site and inspect arguments. Flag `DATA_DROPPED` when a value available in scope is replaced by `undefined` or a placeholder.
62
+ - **Status**: `PROPAGATED` (value flows correctly) | `DATA_DROPPED` (value lost at some step) | `UNTRACEABLE` (cannot determine flow)
63
+ 7. **Compile Integration Report**: Produce final report with all findings by category.
58
64
 
59
65
  ## Output Format
60
66
 
@@ -119,3 +125,4 @@ See `references/integration-patterns.md` for grep/search patterns by framework.
119
125
  - "File exists" is not "component is integrated"
120
126
  - Auth middleware existing somewhere does not mean routes are protected
121
127
  - Always check error handling paths, not just happy paths
128
+ - Structural connectivity is not data-flow correctness — a connected pipeline can still drop data at any step
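The three data-flow statuses introduced in this file can be illustrated with a small sketch. It assumes the checker has already recorded the value observed at each step of a call chain; the function `classifyFlow` and the `{ step, value }` record shape are hypothetical and not part of the package.

```javascript
// Hypothetical helper: `hops` lists the value observed at each step
// between the source boundary and the destination.
function classifyFlow(sourceValue, hops) {
  // Without a known source value or any recorded step there is
  // nothing to compare against.
  if (sourceValue === undefined || hops.length === 0) {
    return { status: 'UNTRACEABLE' };
  }
  for (const hop of hops) {
    // A hop carrying undefined, null, or a placeholder has lost the value.
    if (hop.value !== sourceValue) {
      return { status: 'DATA_DROPPED', at: hop.step };
    }
  }
  return { status: 'PROPAGATED' };
}
```

Reporting the first step where the value diverges gives the file and call site to cite as evidence.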
package/plugins/copilot-pbr/agents/planner.agent.md CHANGED
@@ -66,6 +66,23 @@ Each must-have maps to one or more tasks. Every task exists to make a must-have
66
66
 
67
67
  ---
68
68
 
69
+ ## Data Contracts for Cross-Boundary Parameters
70
+
71
+ When a function signature includes parameters that flow across module boundaries — session IDs from hook stdin, config objects from disk, auth tokens from environment — the plan **MUST** specify the **source** for each argument, not just the type.
72
+
73
+ For every cross-boundary call in a task's `<action>`, document:
74
+
75
+ | Parameter | Source | Context | Fallback |
76
+ |-----------|--------|---------|----------|
77
+ | `sessionId` | `data.session_id` (hook stdin) | Hook scripts only | `undefined` (CLI context) |
78
+ | `config` | `configLoad(planningDir)` | All callers | `resolveConfig(undefined)` |
79
+
80
+ **When to apply:** Any function call where the caller and callee live in different modules AND at least one argument originates from an external boundary (stdin, env, disk, network). Internal helper calls within the same module do not need contracts.
81
+
82
+ **Why this matters:** Without explicit source mapping, executors will use the type-correct but value-wrong default (e.g., `undefined` instead of `data.session_id`). The plan is the single source of truth for how data flows — if the plan says `undefined`, the executor will faithfully implement `undefined`.
83
+
84
+ ---
85
+
69
86
  ## Plan Structure
70
87
 
71
88
  Read `references/plan-format.md` for the complete plan file specification including:
@@ -165,6 +182,7 @@ When CONTEXT.md or RESEARCH-SUMMARY.md contains `[NEEDS DECISION]` flags from th
165
182
  - [ ] Dependencies are acyclic, no file conflicts within same wave
166
183
  - [ ] Locked decisions honored, no deferred ideas included
167
184
  - [ ] Verify commands are actually executable
185
+ - [ ] Cross-boundary parameters have documented sources (data contracts)
168
186
 
169
187
  ---
170
188
 
@@ -238,3 +256,4 @@ One-line task descriptions in `<name>`. File paths in `<files>`, not explanation
238
256
  9. DO NOT plan for features outside the current phase goal
239
257
  10. DO NOT assume research is done — check discovery level
240
258
  11. DO NOT leave done conditions vague — they must be observable
259
+ 12. DO NOT specify literal `undefined` for parameters that have a known source in the calling context — use data contracts to map sources
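The two rows of the data-contract table in this file could translate into code roughly as follows. This is a sketch under assumptions: `configLoad` and `resolveConfig` are named in the table, but their real signatures are not shown in the diff, so they are injected as parameters here, and both resolver function names are hypothetical.

```javascript
// Contract row 1: sessionId comes from data.session_id (hook stdin);
// the fallback is undefined when running in a CLI context.
function resolveSessionId(data) {
  return data && typeof data.session_id === 'string'
    ? data.session_id
    : undefined;
}

// Contract row 2: config comes from configLoad(planningDir); the
// fallback is resolveConfig(undefined). Both functions are passed in
// because their actual signatures are an assumption.
function resolveConfigArg(planningDir, configLoad, resolveConfig) {
  const loaded = configLoad(planningDir);
  return loaded != null ? loaded : resolveConfig(undefined);
}
```

Writing the source and fallback next to each other makes the `undefined` case an explicit decision instead of an accidental default.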
package/plugins/copilot-pbr/agents/researcher.agent.md CHANGED
@@ -54,6 +54,26 @@ All claims must be attributed to a source level. Higher levels override lower le
54
54
 
55
55
  **Offline Fallback**: If web tools are unavailable (air-gapped environment, MCP not configured), rely on local sources: codebase analysis via Glob/Grep, existing documentation, and README files. Assign these S3-S4 confidence levels. Do not attempt WebFetch or WebSearch — note in the output header that external sources were unavailable.
56
56
 
57
+ ## Local LLM Source Scoring (Optional)
58
+
59
+ If local LLM offload is configured, you MAY use it to score source credibility instead of manually assigning S-levels. This is advisory — never wait on it or fail if it returns null.
60
+
61
+ Check availability first:
62
+
63
+ ```bash
64
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm status 2>/dev/null
65
+ ```
66
+
67
+ If `enabled: true`, score a source excerpt:
68
+
69
+ ```bash
70
+ echo "Source URL and content excerpt" > /tmp/source-excerpt.txt
71
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm score-source "https://example.com/docs" /tmp/source-excerpt.txt 2>/dev/null
72
+ # Returns: {"level":"S2","confidence":0.87,"reason":"Official library documentation page"}
73
+ ```
74
+
75
+ Use the returned `level` to set your source tag. If the call fails or returns `null`, assign the level manually per the hierarchy table above.
76
+
57
77
  ---
58
78
 
59
79
  ## Confidence Levels
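The fallback rule for source scoring in the hunk above (use the returned `level`, otherwise assign one manually) can be sketched as below. The function name `pickSourceLevel` is hypothetical, and the `S` plus single digit pattern is an assumption inferred from the `S2` and `S3-S4` examples in the diff.

```javascript
// Returns the level reported by the local LLM when it is well formed,
// otherwise the manually assigned level.
function pickSourceLevel(stdout, manualLevel) {
  try {
    const result = JSON.parse(stdout);
    // Assumption: levels look like "S1", "S2", ... as in the diff's examples.
    if (result && typeof result.level === 'string' && /^S\d$/.test(result.level)) {
      return result.level;
    }
  } catch {
    // Empty or malformed output: fall through to the manual level.
  }
  return manualLevel;
}
```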
package/plugins/copilot-pbr/agents/synthesizer.agent.md CHANGED
@@ -98,6 +98,18 @@ conflicts: N
98
98
  - **Research gaps**: Add `[RESEARCH GAP]` flag, add to Open Questions with high impact, never fabricate
99
99
  - **Duplicates**: Consolidate into one entry, note multi-source agreement, reference all documents
100
100
 
101
+ ## Local LLM Context Summarization (Optional)
102
+
103
+ When input research documents are large (>2000 words combined), you MAY use the local LLM to pre-summarize each document before synthesis. This reduces your own context consumption. Advisory only — if unavailable, read documents normally.
104
+
105
+ ```bash
106
+ # Pre-summarize a large research document to ~150 words:
107
+ node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm summarize /path/to/RESEARCH.md 150 2>/dev/null
108
+ # Returns: {"summary":"...plain text summary under 150 words...","latency_ms":2100,"fallback_used":false}
109
+ ```
110
+
111
+ Use the returned `summary` string as your working copy of that document's findings. Still read the original for any specific version numbers, code examples, or direct quotes needed in the output.
112
+
101
113
  ## Anti-Patterns
102
114
 
103
115
  ### Universal Anti-Patterns
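The size check that gates pre-summarization in the hunk above (more than 2000 words combined) might look like this. The helper names are hypothetical, and counting words by splitting on whitespace is an assumption about what "words" means here.

```javascript
const WORD_THRESHOLD = 2000; // ">2000 words combined" from the hunk above

// Counts whitespace-separated tokens; an empty document counts as zero.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed ? trimmed.split(/\s+/).length : 0;
}

// True when the combined input is large enough to justify the optional
// local pre-summarization step.
function shouldPreSummarize(documents) {
  const total = documents.reduce((sum, doc) => sum + countWords(doc), 0);
  return total > WORD_THRESHOLD;
}
```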
package/plugins/copilot-pbr/agents/verifier.agent.md CHANGED
@@ -95,10 +95,29 @@ Verify the artifact is imported AND used by other parts of the system (functions
95
95
  | Yes | Yes | No | UNWIRED |
96
96
  | Yes | Yes | Yes | PASSED |
97
97
 
98
+ > **Note:** WIRED status (Level 3) requires correct arguments, not just correct function names. A call that passes `undefined` for a parameter available in scope is `ARGS_WRONG`, not `WIRED`.
99
+
98
100
  ### Step 6: Verify Key Links (Always)
99
101
 
100
102
  For each key_link: identify source and target components, verify the import path resolves, verify the imported symbol is actually called/used, and verify call signatures match. Watch for: wrong import paths, imported-but-never-called symbols, defined-but-never-applied middleware, registered-but-never-triggered event handlers.
101
103
 
104
+ ### Step 6b: Argument-Level Spot Checks (Always)
105
+
106
+ Beyond verifying that calls exist, spot-check that **arguments passed to cross-boundary calls carry the correct values**. A call with the right function but wrong arguments is effectively UNWIRED.
107
+
108
+ **Focus on:** IDs (session, user, request), config objects, auth tokens, and context data that originate from external boundaries (stdin, env, disk).
109
+
110
+ **Method:**
111
+ 1. For each key_link verified in Step 6, grep the call site and inspect the arguments
112
+ 2. Compare each argument against the data source available in the calling scope
113
+ 3. Flag any argument that passes `undefined`, `null`, or a hardcoded placeholder when the calling scope has the real value available (e.g., `data.session_id` is in scope but `undefined` is passed)
114
+
115
+ **Classification:**
116
+ - `WIRED` requires both correct function AND correct arguments
117
+ - `ARGS_WRONG` = correct function called but one or more arguments are incorrect/missing — this is a key link gap
118
+
119
+ **Example:** A hook script receives `data` from stdin containing `session_id`. If it calls `logMetric(planningDir, { session_id: undefined })` instead of `logMetric(planningDir, { session_id: data.session_id })`, that is an `ARGS_WRONG` gap even though the call itself exists.
120
+
102
121
  ### Step 7: Check Requirements Coverage (Always)
103
122
 
104
123
  Cross-reference all must-haves against verification results in a table:
@@ -107,8 +126,8 @@ Cross-reference all must-haves against verification results in a table:
107
126
  | # | Must-Have | Type | L1 (Exists) | L2 (Substantive) | L3 (Wired) | Status |
108
127
  |---|----------|------|-------------|-------------------|------------|--------|
109
128
  | 1 | {description} | truth | - | - | - | VERIFIED/FAILED |
110
- | 2 | {description} | artifact | YES/NO | YES/STUB/PARTIAL | WIRED/ORPHANED | PASS/FAIL |
111
- | 3 | {description} | key_link | - | - | YES/NO | PASS/FAIL |
129
+ | 2 | {description} | artifact | YES/NO | YES/STUB/PARTIAL | WIRED/ORPHANED/ARGS_WRONG | PASS/FAIL |
130
+ | 3 | {description} | key_link | - | - | YES/NO/ARGS_WRONG | PASS/FAIL |
112
131
  ```
113
132
 
114
133
  ### Step 8: Scan for Anti-Patterns (Full Verification Only)
@@ -226,3 +245,4 @@ Read `references/stub-patterns.md` for stub detection patterns by technology. Re
226
245
  9. DO NOT give PASSED status if ANY must-have fails at ANY level
227
246
  10. DO NOT count deferred items as gaps — they are intentionally not implemented
228
247
  11. DO NOT be lenient — your job is to find problems, not to be encouraging
248
+ 12. DO NOT mark a call as WIRED if it passes hardcoded `undefined`/`null` for parameters that have a known source in scope — check arguments, not just function names
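The `logMetric` example from Step 6b can be made concrete with a small sketch. The `logMetric` below is a stand-in that only records its arguments, and `classifyArg` is a hypothetical spot check; neither is the package's actual code.

```javascript
// Stand-in for the real logMetric: it only records what it was given.
const logged = [];
function logMetric(planningDir, entry) {
  logged.push({ planningDir, ...entry });
}

// A hook handler that receives `data` parsed from stdin.
function handleHook(data, planningDir) {
  // ARGS_WRONG would be: logMetric(planningDir, { session_id: undefined });
  logMetric(planningDir, { session_id: data.session_id }); // WIRED
}

// Hypothetical spot check: an argument is wrong when the calling scope
// had a real value available but undefined or null was passed instead.
function classifyArg(passed, availableInScope) {
  const missing = passed === undefined || passed === null;
  return availableInScope !== undefined && missing ? 'ARGS_WRONG' : 'WIRED';
}
```

Both variants of the call would pass a check that only looks for the function name, which is why the step compares arguments as well.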
package/plugins/copilot-pbr/plugin.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "pbr",
  "displayName": "Plan-Build-Run",
- "version": "2.22.2",
+ "version": "2.24.0",
  "description": "Plan-Build-Run — Structured development workflow for GitHub Copilot CLI. Solves context rot through disciplined agent delegation, structured planning, atomic execution, and goal-backward verification.",
  "author": {
  "name": "SienkLogic",
package/plugins/copilot-pbr/references/config-reference.md CHANGED
@@ -440,3 +440,92 @@ Run validation with: `node plugins/pbr/scripts/pbr-tools.js config validate`
440
440
  | `tdd_mode: true` + `depth: quick` | quick depth skips verification, which conflicts with TDD's verify-first approach |
441
441
  | `git.mode: disabled` + `atomic_commits: true` | atomic_commits has no effect when git is disabled |
442
442
  | `git.branching: phase` + `git.mode: disabled` | Branching settings are ignored when git is disabled |
443
+
444
+ ---
445
+
446
+ ## local_llm
447
+
448
+ Offloads selected PBR inference tasks to a locally running Ollama instance, reducing frontier model usage and latency for fast classification calls. The key `enabled` defaults to `false`, so users without Ollama see no change — all LLM calls continue routing to Claude as normal. When enabled, PBR uses a `local_first` routing strategy: fast tasks (artifact classification, task validation) go to the local model; complex tasks (planning, execution) stay on the frontier model.
449
+
450
+ ### Quick setup
451
+
452
+ 1. Install Ollama:
453
+ - **Linux/macOS**: `curl -fsSL https://ollama.com/install.sh | sh`
454
+ - **Windows**: Download from [ollama.com/download](https://ollama.com/download) and run the installer
455
+ 2. Pull the recommended model: `ollama pull qwen2.5-coder:7b`
456
+ 3. Add to `.planning/config.json`:
457
+
458
+ ```json
459
+ "local_llm": {
460
+ "enabled": true,
461
+ "model": "qwen2.5-coder:7b"
462
+ }
463
+ ```
464
+
465
+ 4. Verify connectivity: `node /path/to/plugins/pbr/scripts/pbr-tools.js llm health`
466
+
467
+ ### Field reference
468
+
469
+ | Property | Type | Default | Description |
470
+ |----------|------|---------|-------------|
471
+ | `local_llm.enabled` | boolean | `false` | Enable local LLM offloading; `false` = all calls use frontier |
472
+ | `local_llm.provider` | string | `"ollama"` | Backend provider; only `"ollama"` is supported |
473
+ | `local_llm.endpoint` | string | `"http://localhost:11434"` | Ollama API base URL |
474
+ | `local_llm.model` | string | `"qwen2.5-coder:7b"` | Model tag to use for local inference |
475
+ | `local_llm.timeout_ms` | integer | `3000` | Per-request timeout in milliseconds; >= 500 |
476
+ | `local_llm.max_retries` | integer | `1` | Number of retry attempts on failure before falling back |
477
+ | `local_llm.fallback` | string | `"frontier"` | What to use when local LLM fails: `"frontier"` or `"skip"` |
478
+ | `local_llm.routing_strategy` | string | `"local_first"` | `"local_first"` sends fast tasks local; `"always_local"` routes everything |
479
+
480
+ ### features sub-table
481
+
482
+ Controls which PBR tasks are eligible for local LLM offloading.
483
+
484
+ | Property | Default | Description |
485
+ |----------|---------|-------------|
486
+ | `artifact_classification` | `true` | Classify artifact types (PLAN, SUMMARY, VERIFICATION) locally |
487
+ | `task_validation` | `true` | Validate task scope and completeness locally |
488
+ | `context_summarization` | `false` | Summarize context windows locally (higher token demand) |
489
+ | `source_scoring` | `false` | Score source files by relevance locally |
490
+
491
+ ### advanced sub-table
492
+
493
+ | Property | Default | Description |
494
+ |----------|---------|-------------|
495
+ | `confidence_threshold` | `0.9` | Minimum confidence (0–1) for local output to be accepted; below this, falls back to frontier |
496
+ | `shadow_mode` | `false` | Run local LLM in parallel with frontier but discard local results — useful for tuning confidence thresholds without affecting output |
497
+ | `max_input_tokens` | `2000` | Truncate inputs longer than this before sending to local model |
498
+ | `keep_alive` | `"30m"` | How long Ollama keeps the model loaded between requests (Ollama format: `"5m"`, `"1h"`) |
499
+ | `num_ctx` | `4096` | Context window size passed to Ollama; **must be 4096 on Windows** (see Windows gotchas) |
+ | `disable_after_failures` | `3` | Automatically disable local LLM for the session after this many consecutive failures |
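The interaction between `confidence_threshold` and `disable_after_failures` can be sketched as a small router. This is an illustration under stated assumptions, not the package's implementation: `createRouter`, the `{ ok, confidence }` result shape, and the choice that a low-confidence result does not count as a failure are all hypothetical.

```javascript
// Values copied from the advanced sub-table defaults.
const ADVANCED_DEFAULTS = { confidence_threshold: 0.9, disable_after_failures: 3 };

// Hypothetical router: decides per call whether the local result is used.
function createRouter(opts = ADVANCED_DEFAULTS, fallback = "frontier") {
  let consecutiveFailures = 0;
  let disabled = false;
  return {
    // `result` is an assumed shape: { ok: boolean, confidence: number }.
    decide(result) {
      if (disabled) return fallback;
      if (!result.ok) {
        consecutiveFailures += 1;
        // Disable local routing for the rest of the session.
        if (consecutiveFailures >= opts.disable_after_failures) disabled = true;
        return fallback;
      }
      consecutiveFailures = 0; // only consecutive failures count
      return result.confidence >= opts.confidence_threshold ? "local" : fallback;
    },
    isDisabled: () => disabled,
  };
}
```

With the defaults, three failed calls in a row switch the router to the fallback for every later call, while a single success in between resets the counter.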
+
+ ### Hardware requirements
+
+ | Tier | Hardware | Notes |
+ |------|----------|-------|
+ | Recommended | RTX 3060+ with 8 GB VRAM | Full GPU acceleration; qwen2.5-coder:7b loads entirely in VRAM |
+ | Functional | GTX 1660+ with 6 GB VRAM | GPU acceleration with slight layer offload to RAM |
+ | Marginal | CPU only, 32 GB RAM | Works but adds 5-20s latency per call; disable context-heavy features |
+
+ For GPU acceleration, ensure NVIDIA drivers are 520+ and CUDA 11.8+ is installed. AMD GPU support is available via ROCm on Linux only.
+
+ ### Windows gotchas
+
+ - **Smart App Control**: May block `ollama_llama_server.exe` on first run. Allow it via Security settings or disable Smart App Control.
+ - **Windows Defender**: Add an exclusion for `%LOCALAPPDATA%\Programs\Ollama\ollama_llama_server.exe` to prevent Defender from scanning inference calls in real time.
+ - **`num_ctx` must be 4096**: Higher values cause GPU memory fragmentation on Windows and result in OOM errors mid-session. Always set `advanced.num_ctx: 4096` in your config.
+ - **Firewall**: Ollama listens on `localhost:11434` by default. If you see connection refused errors, check that Windows Firewall is not blocking loopback connections.
+
+ ### Viewing metrics
+
+ After enabling local LLM, PBR logs per-call metrics to `.planning/logs/local-llm-metrics.jsonl`. Use the built-in subcommands to inspect them:
+
+ ```bash
+ # Show session summary (calls routed, latency, token savings)
+ node plugins/pbr/scripts/pbr-tools.js llm metrics
+
+ # Suggest routing threshold adjustments based on recent accuracy
+ node plugins/pbr/scripts/pbr-tools.js llm adjust-thresholds
+ ```
+
+ Metrics include: routing decision, model used, latency ms, confidence score, whether the frontier fallback was triggered, and estimated tokens saved.
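For readers who want to inspect the JSONL log directly, here is a rough sketch of a summary over its entries. The field names `latency_ms`, `fallback_used`, and `tokens_saved` are guesses based on the metric list above; the actual keys written by PBR are not shown in this diff and may differ.

```javascript
// Summarize JSONL metrics text (one JSON object per line).
// Field names are hypothetical; adjust them to match the real log.
function summarizeMetrics(jsonlText) {
  const entries = jsonlText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  const calls = entries.length;
  const fallbacks = entries.filter((e) => e.fallback_used).length;
  const totalLatency = entries.reduce((sum, e) => sum + (e.latency_ms || 0), 0);
  const tokensSaved = entries.reduce((sum, e) => sum + (e.tokens_saved || 0), 0);
  return {
    calls,
    fallbacks,
    avgLatencyMs: calls === 0 ? 0 : totalLatency / calls,
    tokensSaved,
  };
}
```

The text would typically come from reading `.planning/logs/local-llm-metrics.jsonl` with `fs.readFileSync(path, "utf8")`.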
@@ -71,6 +71,28 @@ requirement_ids:
  | `consumes` | NO | array | What this plan needs from prior plans. Format: `"Thing (from plan XX-YY)"` |
  | `requirement_ids` | NO | array | Requirement IDs from REQUIREMENTS.md or ROADMAP.md goal IDs that this plan addresses. Enables bidirectional traceability between plans and requirements/goals. |
  | `dependency_fingerprints` | NO | object | Hashes of dependency phase SUMMARY.md files at plan-creation time. Used to detect stale plans. |
+ | `data_contracts` | NO | array | Cross-boundary parameter mappings for calls where arguments originate from external boundaries. Format: `"param: source (context) [fallback]"` |
+
+ ### Data Contracts
+
+ When a task's `<action>` includes calls across module boundaries where arguments come from external sources (hook stdin, env vars, API params, config files), document the parameter-to-source mapping in `data_contracts` frontmatter and in the `<action>` step itself.
+
+ Example frontmatter:
+
+ ```yaml
+ data_contracts:
+ - "sessionId: data.session_id (hook stdin) [undefined in CLI context]"
+ - "config: configLoad(planningDir) (disk) [resolveConfig(undefined)]"
+ ```
+
+ Example in `<action>`:
+
+ ```
+ 3. Call classifyArtifact(llmConfig, planningDir, content, fileType, data.session_id)
+ Data contract: sessionId ← data.session_id from hook stdin (undefined in CLI context)
+ ```
+
+ **When to apply:** Any call where caller and callee are in different modules AND at least one argument originates from an external boundary. Internal helper calls within the same module do not need contracts.
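The `"param: source (context) [fallback]"` format lends itself to mechanical checking. The following is a minimal sketch of a parser for it; `parseDataContract` is a hypothetical helper, and treating the bracketed fallback as optional is an assumption, since the diff only shows entries that include one.

```javascript
// Parse one data_contracts entry of the form
//   "param: source (context) [fallback]"
// Returns null when the string does not match the format.
function parseDataContract(entry) {
  // The source may itself contain parentheses, e.g. configLoad(planningDir),
  // so the context is taken to be the parenthesized group preceded by whitespace.
  const pattern = /^([^:]+):\s*(.+?)\s+\(([^()]*)\)\s*(?:\[(.*)\])?$/;
  const match = pattern.exec(entry.trim());
  if (!match) return null;
  return {
    param: match[1].trim(),
    source: match[2].trim(),
    context: match[3].trim(),
    fallback: match[4] !== undefined ? match[4].trim() : null,
  };
}
```

Applied to the two frontmatter examples above, this yields `sessionId` sourced from `data.session_id` in the `hook stdin` context, and `config` sourced from `configLoad(planningDir)` in the `disk` context.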
 
  ---
 
@@ -127,7 +127,7 @@ Read `.planning/config.json` and check for fields referenced by skills:
  - PASS: All expected fields present with correct types
  - WARN (missing fields): Report each missing field and which skill uses it — "Run `/pbr:config` to set all options."
 
- ### Check 10: Orphaned Crash Recovery Files
+ ### Check 10: Orphaned Crash Recovery & Lock Files
 
  The executor creates `.PROGRESS-{plan_id}` files as crash recovery breadcrumbs during builds and deletes them after `SUMMARY.md` is written. Similarly, `.checkpoint-manifest.json` files track checkpoint state during execution. If the executor crashes mid-build, these files remain and could confuse future runs.
 
@@ -147,6 +147,13 @@ Glob for `.planning/phases/**/.PROGRESS-*` and `.planning/phases/**/.checkpoint-
  ```
  Fix suggestion: "Checkpoint manifests are leftover from interrupted builds. Safe to delete if no `/pbr:build` is currently running. Remove with `rm <path>`."
 
+ Also check for `.planning/.active-skill`:
+
+ - If the file does not exist: no action needed (PASS for this sub-check)
+ - If the file exists, check its age by comparing the file modification time to the current time:
+   - If older than 1 hour: WARN with fix suggestion: "Stale .active-skill lock file detected (set {age} ago). No PBR skill appears to be running. Safe to delete with `rm .planning/.active-skill`."
+   - If younger than 1 hour: INFO: "Active skill lock exists ({content}). A PBR skill may be running."
+
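The `.active-skill` sub-check added in this hunk can be sketched in a few lines of Node.js. This is an illustration only: `classifyLock` and `checkActiveSkill` are hypothetical names, and a lock that is exactly one hour old is treated as INFO here because the text does not say which side of the boundary it falls on.

```javascript
const fs = require("node:fs");

const ONE_HOUR_MS = 60 * 60 * 1000;

// Pure age rule, kept separate so it can be tested without touching disk.
// mtimeMs === null means the lock file does not exist.
function classifyLock(mtimeMs, nowMs) {
  if (mtimeMs === null) return "PASS";
  return nowMs - mtimeMs > ONE_HOUR_MS ? "WARN" : "INFO";
}

// Read the lock file's modification time and classify it.
function checkActiveSkill(lockPath = ".planning/.active-skill", nowMs = Date.now()) {
  let mtimeMs = null;
  try {
    mtimeMs = fs.statSync(lockPath).mtimeMs;
  } catch (err) {
    // A missing file is the PASS case; any other error is unexpected.
    if (err.code !== "ENOENT") throw err;
  }
  return classifyLock(mtimeMs, nowMs);
}
```

Separating the age rule from the filesystem call keeps the one-hour threshold easy to verify in isolation.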
  ---
 
  ## Auto-Fix for Common Corruption Patterns
@@ -210,10 +210,10 @@ The `features.team_discussions` config flag (and `/pbr:build --team`) enables **
  ║ ▶ NEXT UP ║
  ╚══════════════════════════════════════════════════════════════╝
 
- `/pbr:begin` — start a new project
- `/pbr:status` — check current project status
- `/pbr:config` — configure workflow settings
- `/pbr:help <command>` — detailed help for a specific command
+ - `/pbr:begin` — start a new project
+ - `/pbr:status` — check current project status
+ - `/pbr:config` — configure workflow settings
+ - `/pbr:help <command>` — detailed help for a specific command
 
  ```