@sienklogic/plan-build-run 2.22.2 → 2.24.0
- package/CHANGELOG.md +42 -0
- package/dashboard/package.json +3 -2
- package/dashboard/src/middleware/errorHandler.js +12 -2
- package/dashboard/src/repositories/planning.repository.js +24 -12
- package/dashboard/src/routes/pages.routes.js +182 -4
- package/dashboard/src/server.js +4 -0
- package/dashboard/src/services/audit.service.js +42 -0
- package/dashboard/src/services/dashboard.service.js +1 -12
- package/dashboard/src/services/local-llm-metrics.service.js +81 -0
- package/dashboard/src/services/quick.service.js +62 -0
- package/dashboard/src/services/roadmap.service.js +1 -11
- package/dashboard/src/utils/strip-bom.js +8 -0
- package/dashboard/src/views/audit-detail.ejs +5 -0
- package/dashboard/src/views/audits.ejs +5 -0
- package/dashboard/src/views/partials/analytics-content.ejs +61 -0
- package/dashboard/src/views/partials/audit-detail-content.ejs +12 -0
- package/dashboard/src/views/partials/audits-content.ejs +34 -0
- package/dashboard/src/views/partials/quick-content.ejs +40 -0
- package/dashboard/src/views/partials/quick-detail-content.ejs +29 -0
- package/dashboard/src/views/partials/sidebar.ejs +16 -0
- package/dashboard/src/views/partials/todos-content.ejs +13 -3
- package/dashboard/src/views/quick-detail.ejs +5 -0
- package/dashboard/src/views/quick.ejs +5 -0
- package/package.json +1 -1
- package/plugins/copilot-pbr/agents/debugger.agent.md +15 -0
- package/plugins/copilot-pbr/agents/integration-checker.agent.md +9 -2
- package/plugins/copilot-pbr/agents/planner.agent.md +19 -0
- package/plugins/copilot-pbr/agents/researcher.agent.md +20 -0
- package/plugins/copilot-pbr/agents/synthesizer.agent.md +12 -0
- package/plugins/copilot-pbr/agents/verifier.agent.md +22 -2
- package/plugins/copilot-pbr/plugin.json +1 -1
- package/plugins/copilot-pbr/references/config-reference.md +89 -0
- package/plugins/copilot-pbr/references/plan-format.md +22 -0
- package/plugins/copilot-pbr/skills/health/SKILL.md +8 -1
- package/plugins/copilot-pbr/skills/help/SKILL.md +4 -4
- package/plugins/copilot-pbr/skills/milestone/SKILL.md +12 -12
- package/plugins/copilot-pbr/skills/status/SKILL.md +37 -1
- package/plugins/copilot-pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
- package/plugins/copilot-pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
- package/plugins/cursor-pbr/.cursor-plugin/plugin.json +1 -1
- package/plugins/cursor-pbr/agents/debugger.md +15 -0
- package/plugins/cursor-pbr/agents/integration-checker.md +9 -2
- package/plugins/cursor-pbr/agents/planner.md +19 -0
- package/plugins/cursor-pbr/agents/researcher.md +20 -0
- package/plugins/cursor-pbr/agents/synthesizer.md +12 -0
- package/plugins/cursor-pbr/agents/verifier.md +22 -2
- package/plugins/cursor-pbr/references/config-reference.md +89 -0
- package/plugins/cursor-pbr/references/plan-format.md +22 -0
- package/plugins/cursor-pbr/skills/health/SKILL.md +8 -1
- package/plugins/cursor-pbr/skills/help/SKILL.md +4 -4
- package/plugins/cursor-pbr/skills/milestone/SKILL.md +12 -12
- package/plugins/cursor-pbr/skills/status/SKILL.md +37 -1
- package/plugins/cursor-pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
- package/plugins/cursor-pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
- package/plugins/pbr/.claude-plugin/plugin.json +1 -1
- package/plugins/pbr/agents/debugger.md +15 -0
- package/plugins/pbr/agents/integration-checker.md +9 -2
- package/plugins/pbr/agents/planner.md +19 -0
- package/plugins/pbr/agents/researcher.md +20 -0
- package/plugins/pbr/agents/synthesizer.md +12 -0
- package/plugins/pbr/agents/verifier.md +22 -2
- package/plugins/pbr/references/config-reference.md +89 -0
- package/plugins/pbr/references/plan-format.md +22 -0
- package/plugins/pbr/scripts/check-config-change.js +33 -0
- package/plugins/pbr/scripts/check-plan-format.js +52 -4
- package/plugins/pbr/scripts/check-subagent-output.js +43 -3
- package/plugins/pbr/scripts/config-schema.json +48 -0
- package/plugins/pbr/scripts/local-llm/client.js +214 -0
- package/plugins/pbr/scripts/local-llm/health.js +217 -0
- package/plugins/pbr/scripts/local-llm/metrics.js +252 -0
- package/plugins/pbr/scripts/local-llm/operations/classify-artifact.js +76 -0
- package/plugins/pbr/scripts/local-llm/operations/classify-error.js +75 -0
- package/plugins/pbr/scripts/local-llm/operations/score-source.js +72 -0
- package/plugins/pbr/scripts/local-llm/operations/summarize-context.js +62 -0
- package/plugins/pbr/scripts/local-llm/operations/validate-task.js +59 -0
- package/plugins/pbr/scripts/local-llm/router.js +101 -0
- package/plugins/pbr/scripts/local-llm/shadow.js +60 -0
- package/plugins/pbr/scripts/local-llm/threshold-tuner.js +118 -0
- package/plugins/pbr/scripts/pbr-tools.js +120 -3
- package/plugins/pbr/scripts/post-write-dispatch.js +2 -2
- package/plugins/pbr/scripts/progress-tracker.js +29 -3
- package/plugins/pbr/scripts/session-cleanup.js +36 -1
- package/plugins/pbr/scripts/validate-task.js +30 -1
- package/plugins/pbr/skills/health/SKILL.md +8 -1
- package/plugins/pbr/skills/help/SKILL.md +4 -4
- package/plugins/pbr/skills/milestone/SKILL.md +12 -12
- package/plugins/pbr/skills/status/SKILL.md +38 -2
- package/plugins/pbr/templates/INTEGRATION-REPORT.md.tmpl +18 -2
- package/plugins/pbr/templates/VERIFICATION-DETAIL.md.tmpl +2 -1
- package/dashboard/src/views/coming-soon.ejs +0 -11
|
@@ -88,3 +88,64 @@
|
|
|
88
88
|
<% } else { %>
|
|
89
89
|
<p>No phase data available.</p>
|
|
90
90
|
<% } %>
|
|
91
|
+
|
|
92
|
+
<% if (typeof llmMetrics !== 'undefined' && llmMetrics) { %>
|
|
93
|
+
<article>
|
|
94
|
+
<header>Local LLM Offload</header>
|
|
95
|
+
<div class="grid">
|
|
96
|
+
<article>
|
|
97
|
+
<header>Total Calls</header>
|
|
98
|
+
<strong class="stat-value"><%= llmMetrics.summary.total_calls %></strong>
|
|
99
|
+
<span class="stat-unit">calls</span>
|
|
100
|
+
</article>
|
|
101
|
+
<article>
|
|
102
|
+
<header>Tokens Saved</header>
|
|
103
|
+
<strong class="stat-value"><%= llmMetrics.summary.tokens_saved.toLocaleString() %></strong>
|
|
104
|
+
<span class="stat-unit">frontier tokens</span>
|
|
105
|
+
</article>
|
|
106
|
+
<article>
|
|
107
|
+
<header>Est. Cost Saved</header>
|
|
108
|
+
<strong class="stat-value">$<%= llmMetrics.summary.cost_saved_usd.toFixed(4) %></strong>
|
|
109
|
+
<span class="stat-unit">at $3/M tokens</span>
|
|
110
|
+
</article>
|
|
111
|
+
<article>
|
|
112
|
+
<header>Fallback Rate</header>
|
|
113
|
+
<strong class="stat-value"><%= llmMetrics.summary.fallback_rate_pct %>%</strong>
|
|
114
|
+
<span class="stat-unit"><%= llmMetrics.summary.fallback_count %> fallbacks</span>
|
|
115
|
+
</article>
|
|
116
|
+
<article>
|
|
117
|
+
<header>Avg Latency</header>
|
|
118
|
+
<strong class="stat-value"><%= llmMetrics.summary.avg_latency_ms %></strong>
|
|
119
|
+
<span class="stat-unit">ms/call</span>
|
|
120
|
+
</article>
|
|
121
|
+
</div>
|
|
122
|
+
<% if (llmMetrics.byOperation && llmMetrics.byOperation.length > 0) { %>
|
|
123
|
+
<div class="overflow-auto" style="margin-top: var(--space-md);">
|
|
124
|
+
<table>
|
|
125
|
+
<thead>
|
|
126
|
+
<tr>
|
|
127
|
+
<th>Operation</th>
|
|
128
|
+
<th>Calls</th>
|
|
129
|
+
<th>Fallbacks</th>
|
|
130
|
+
<th>Tokens Saved</th>
|
|
131
|
+
</tr>
|
|
132
|
+
</thead>
|
|
133
|
+
<tbody>
|
|
134
|
+
<% llmMetrics.byOperation.forEach(op => { %>
|
|
135
|
+
<tr>
|
|
136
|
+
<td><%= op.operation %></td>
|
|
137
|
+
<td><%= op.calls %></td>
|
|
138
|
+
<td><%= op.fallbacks %></td>
|
|
139
|
+
<td><%= op.tokens_saved.toLocaleString() %></td>
|
|
140
|
+
</tr>
|
|
141
|
+
<% }) %>
|
|
142
|
+
</tbody>
|
|
143
|
+
</table>
|
|
144
|
+
</div>
|
|
145
|
+
<% } %>
|
|
146
|
+
<footer style="color: var(--pico-muted-color); font-size: 0.85em;">
|
|
147
|
+
Baseline estimate: each local call replaced ~<%= llmMetrics.baseline.estimated_frontier_tokens_without_local.toLocaleString() %> frontier tokens total.
|
|
148
|
+
Advisory only — no data collected when local LLM is disabled.
|
|
149
|
+
</footer>
|
|
150
|
+
</article>
|
|
151
|
+
<% } %>
|
|
@@ -0,0 +1,12 @@
|
|
|
1
|
+
<%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
|
|
2
|
+
<h1><%= title %></h1>
|
|
3
|
+
|
|
4
|
+
<p><a href="/audits">← Back to Audit Reports</a></p>
|
|
5
|
+
|
|
6
|
+
<% if (typeof date !== 'undefined' && date) { %>
|
|
7
|
+
<p><small>Date: <%= date %></small></p>
|
|
8
|
+
<% } %>
|
|
9
|
+
|
|
10
|
+
<article class="markdown-body">
|
|
11
|
+
<%- html %>
|
|
12
|
+
</article>
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
<%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
|
|
2
|
+
<h1>Audit Reports</h1>
|
|
3
|
+
|
|
4
|
+
<% if (typeof reports !== 'undefined' && reports.length > 0) { %>
|
|
5
|
+
<article>
|
|
6
|
+
<div class="table-wrap">
|
|
7
|
+
<table>
|
|
8
|
+
<thead>
|
|
9
|
+
<tr>
|
|
10
|
+
<th scope="col">Date</th>
|
|
11
|
+
<th scope="col">Report</th>
|
|
12
|
+
</tr>
|
|
13
|
+
</thead>
|
|
14
|
+
<tbody>
|
|
15
|
+
<% reports.forEach(function(report) { %>
|
|
16
|
+
<tr>
|
|
17
|
+
<td><%= report.date || '—' %></td>
|
|
18
|
+
<td>
|
|
19
|
+
<a href="/audits/<%= report.filename %>"
|
|
20
|
+
hx-get="/audits/<%= report.filename %>"
|
|
21
|
+
hx-target="#main-content"
|
|
22
|
+
hx-push-url="true">
|
|
23
|
+
<%= report.title %>
|
|
24
|
+
</a>
|
|
25
|
+
</td>
|
|
26
|
+
</tr>
|
|
27
|
+
<% }); %>
|
|
28
|
+
</tbody>
|
|
29
|
+
</table>
|
|
30
|
+
</div>
|
|
31
|
+
</article>
|
|
32
|
+
<% } else { %>
|
|
33
|
+
<%- include('empty-state', { icon: '🔍', title: 'No audit reports found', action: 'Run /pbr:audit to generate a session audit report.' }) %>
|
|
34
|
+
<% } %>
|
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
<%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
|
|
2
|
+
<h1>Quick Tasks</h1>
|
|
3
|
+
|
|
4
|
+
<% if (typeof tasks !== 'undefined' && tasks.length > 0) { %>
|
|
5
|
+
<article>
|
|
6
|
+
<div class="table-wrap">
|
|
7
|
+
<table>
|
|
8
|
+
<thead>
|
|
9
|
+
<tr>
|
|
10
|
+
<th scope="col">ID</th>
|
|
11
|
+
<th scope="col">Title</th>
|
|
12
|
+
<th scope="col">Status</th>
|
|
13
|
+
</tr>
|
|
14
|
+
</thead>
|
|
15
|
+
<tbody>
|
|
16
|
+
<% tasks.forEach(function(task) { %>
|
|
17
|
+
<tr>
|
|
18
|
+
<td><%= task.id %></td>
|
|
19
|
+
<td>
|
|
20
|
+
<a href="/quick/<%= task.id %>"
|
|
21
|
+
hx-get="/quick/<%= task.id %>"
|
|
22
|
+
hx-target="#main-content"
|
|
23
|
+
hx-push-url="true">
|
|
24
|
+
<%= task.title %>
|
|
25
|
+
</a>
|
|
26
|
+
</td>
|
|
27
|
+
<td>
|
|
28
|
+
<span class="status-badge" data-status="<%= task.status %>">
|
|
29
|
+
<%= task.status %>
|
|
30
|
+
</span>
|
|
31
|
+
</td>
|
|
32
|
+
</tr>
|
|
33
|
+
<% }); %>
|
|
34
|
+
</tbody>
|
|
35
|
+
</table>
|
|
36
|
+
</div>
|
|
37
|
+
</article>
|
|
38
|
+
<% } else { %>
|
|
39
|
+
<%- include('empty-state', { icon: '⚡', title: 'No quick tasks found', action: '' }) %>
|
|
40
|
+
<% } %>
|
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
<%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
|
|
2
|
+
<h1><%= title %></h1>
|
|
3
|
+
|
|
4
|
+
<p><a href="/quick">← Back to Quick Tasks</a></p>
|
|
5
|
+
|
|
6
|
+
<article>
|
|
7
|
+
<header>
|
|
8
|
+
<strong>Quick Task <%= id %></strong>
|
|
9
|
+
|
|
10
|
+
<span class="status-badge" data-status="<%= status %>">
|
|
11
|
+
<%= status %>
|
|
12
|
+
</span>
|
|
13
|
+
</header>
|
|
14
|
+
|
|
15
|
+
<% if (planHtml) { %>
|
|
16
|
+
<section>
|
|
17
|
+
<h2>Plan</h2>
|
|
18
|
+
<%- planHtml %>
|
|
19
|
+
</section>
|
|
20
|
+
<% } %>
|
|
21
|
+
|
|
22
|
+
<% if (summaryHtml) { %>
|
|
23
|
+
<hr>
|
|
24
|
+
<section>
|
|
25
|
+
<h2>Summary</h2>
|
|
26
|
+
<%- summaryHtml %>
|
|
27
|
+
</section>
|
|
28
|
+
<% } %>
|
|
29
|
+
</article>
|
|
@@ -79,6 +79,22 @@
|
|
|
79
79
|
Notes
|
|
80
80
|
</a>
|
|
81
81
|
</li>
|
|
82
|
+
<li>
|
|
83
|
+
<a href="/quick"
|
|
84
|
+
hx-get="/quick"
|
|
85
|
+
hx-target="#main-content"
|
|
86
|
+
hx-push-url="true"<%= typeof activePage !== 'undefined' && activePage === 'quick' ? ' aria-current="page"' : '' %>>
|
|
87
|
+
Quick Tasks
|
|
88
|
+
</a>
|
|
89
|
+
</li>
|
|
90
|
+
<li>
|
|
91
|
+
<a href="/audits"
|
|
92
|
+
hx-get="/audits"
|
|
93
|
+
hx-target="#main-content"
|
|
94
|
+
hx-push-url="true"<%= typeof activePage !== 'undefined' && activePage === 'audits' ? ' aria-current="page"' : '' %>>
|
|
95
|
+
Audit Reports
|
|
96
|
+
</a>
|
|
97
|
+
</li>
|
|
82
98
|
</ul>
|
|
83
99
|
</details>
|
|
84
100
|
|
|
@@ -1,10 +1,17 @@
|
|
|
1
1
|
<%- include('breadcrumbs', { breadcrumbs: typeof breadcrumbs !== 'undefined' ? breadcrumbs : [] }) %>
|
|
2
2
|
<h1>Todos</h1>
|
|
3
3
|
|
|
4
|
-
<p
|
|
4
|
+
<p>
|
|
5
|
+
<a href="/todos/new" role="button"
|
|
5
6
|
hx-get="/todos/new"
|
|
6
7
|
hx-target="#main-content"
|
|
7
|
-
hx-push-url="true">Create Todo</a
|
|
8
|
+
hx-push-url="true">Create Todo</a>
|
|
9
|
+
|
|
10
|
+
<a href="/todos/done"
|
|
11
|
+
hx-get="/todos/done"
|
|
12
|
+
hx-target="#main-content"
|
|
13
|
+
hx-push-url="true">View Completed Todos</a>
|
|
14
|
+
</p>
|
|
8
15
|
|
|
9
16
|
<% const f = typeof filters !== 'undefined' ? filters : { priority: '', status: '', q: '' }; %>
|
|
10
17
|
<article>
|
|
@@ -70,7 +77,10 @@
|
|
|
70
77
|
<tr>
|
|
71
78
|
<td><%= todo.id %></td>
|
|
72
79
|
<td>
|
|
73
|
-
<a href="/todos/<%= todo.id %>"
|
|
80
|
+
<a href="/todos/<%= todo.id %>"
|
|
81
|
+
hx-get="/todos/<%= todo.id %>"
|
|
82
|
+
hx-target="#main-content"
|
|
83
|
+
hx-push-url="true">
|
|
74
84
|
<%= todo.title %>
|
|
75
85
|
</a>
|
|
76
86
|
</td>
|
|
@@ -138,6 +138,21 @@ Then emit a `DECISION` checkpoint asking the user to approve, modify, or reject
|
|
|
138
138
|
|
|
139
139
|
**Commit format**: `fix({scope}): {description}` with body: `Root cause: ...` and `Debug session: .planning/debug/{slug}.md`
|
|
140
140
|
|
|
141
|
+
## Local LLM Error Classification (Optional)
|
|
142
|
+
|
|
143
|
+
When you receive an error message or stack trace, you MAY use the local LLM to classify it before starting hypothesis generation. This is advisory — skip it if unavailable.
|
|
144
|
+
|
|
145
|
+
```bash
|
|
146
|
+
# Write the error to a temp file, then classify:
|
|
147
|
+
echo "Error text here" > /tmp/debug-error.txt
|
|
148
|
+
node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm classify-error /tmp/debug-error.txt debugger 2>/dev/null
|
|
149
|
+
# Returns: {"category":"missing_output","confidence":0.91,"latency_ms":1840,"fallback_used":false}
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
Categories: `connection_refused`, `timeout`, `missing_output`, `wrong_output_format`, `permission_error`, `unknown`.
|
|
153
|
+
|
|
154
|
+
If classification succeeds, use the returned category to bias your initial hypothesis ranking. If it returns null or fails, proceed with manual hypothesis generation as normal.
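A minimal sketch of how a caller might consume the classifier output, assuming stdout carries the JSON shape shown in the `# Returns:` comment. The helper name `parseClassification` is hypothetical and is not part of `pbr-tools.js`; it only illustrates the "null or failure means proceed manually" rule.

```javascript
// Hypothetical helper (not part of pbr-tools.js): interpret the stdout of
// `pbr-tools.js llm classify-error`. Any failure yields null so the caller
// can proceed with manual hypothesis generation.
const KNOWN_CATEGORIES = new Set([
  'connection_refused', 'timeout', 'missing_output',
  'wrong_output_format', 'permission_error', 'unknown',
]);

function parseClassification(stdout) {
  if (typeof stdout !== 'string' || stdout.trim() === '') return null;
  let parsed;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return null; // not JSON: treat the classifier as unavailable
  }
  if (parsed === null || typeof parsed !== 'object') return null;
  if (!KNOWN_CATEGORIES.has(parsed.category)) return null;
  return { category: parsed.category, confidence: parsed.confidence };
}
```

A `null` return maps directly onto the advisory rule: the debugger skips the bias step and ranks hypotheses as usual.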
|
|
155
|
+
|
|
141
156
|
## Common Bug Patterns
|
|
142
157
|
|
|
143
158
|
Reference: `references/common-bug-patterns.md` — covers off-by-one, null/undefined, async/timing, state management, import/module, environment, and data shape patterns.
|
|
@@ -35,6 +35,7 @@ You MUST perform all applicable categories (skip only if zero items exist for th
|
|
|
35
35
|
3. **Auth Protection** — Every non-public route must have auth middleware. Frontend route guards must match backend protection.
|
|
36
36
|
4. **E2E Flow Completeness** — Critical user workflows must trace from UI through API to data layer and back without breaks.
|
|
37
37
|
5. **Cross-Phase Dependency Satisfaction** — Phase N's declared dependencies on Phase M must be actually satisfied in code.
|
|
38
|
+
6. **Data-Flow Propagation** — Values originating at one boundary (hook stdin fields, API request params, env vars) must propagate correctly through the call chain to their destination (log entries, database records, API responses). A connected pipeline with missing data is a broken integration.
|
|
38
39
|
|
|
39
40
|
> **First-phase edge case**: If no completed phases exist yet, focus on verifying the current phase's internal consistency — exports match imports within the phase, API contracts are self-consistent. Cross-phase checks are not applicable and should be skipped.
|
|
40
41
|
|
|
@@ -47,14 +48,19 @@ Read `references/agent-contracts.md` to validate agent-to-agent handoffs. Verify
|
|
|
47
48
|
- **Write access for output artifact only** — you have Write access for your output artifact only. You CANNOT fix source code — you REPORT issues.
|
|
48
49
|
- **Cross-phase scope** — unlike verifier (single phase), you check across phases.
|
|
49
50
|
|
|
50
|
-
##
|
|
51
|
+
## 7-Step Verification Process
|
|
51
52
|
|
|
52
53
|
1. **Build Export/Import Map**: Read each completed phase's SUMMARY.md frontmatter (`requires`, `provides`, `affects`). Grep actual exports/imports in source. Cross-reference declared vs actual — flag mismatches.
|
|
53
54
|
2. **Verify Export Usage**: For each `provides` item: locate actual export (missing = `MISSING_EXPORT` ERROR), find consumers (none = `ORPHANED` WARNING), verify usage not just import (`IMPORTED_UNUSED` WARNING), check signature compatibility (`MISMATCHED` ERROR). Status `CONSUMED` = OK.
|
|
54
55
|
3. **Verify API Coverage**: Discover routes, find frontend callers, match by method+path+body/params. Produce coverage table. See `references/integration-patterns.md` for framework-specific patterns.
|
|
55
56
|
4. **Verify Auth Protection**: Identify auth mechanism, list all routes, classify (public vs protected), check frontend guards. Flag UNPROTECTED routes.
|
|
56
57
|
5. **Verify E2E Flows**: Trace critical workflows step-by-step — verify each step exists and connects to the next (import/call/redirect). Record evidence (file:line). Flow status: COMPLETE | BROKEN | PARTIAL | UNTRACEABLE. See `references/integration-patterns.md` for flow templates.
|
|
57
|
-
6. **
|
|
58
|
+
6. **Verify Data-Flow Propagation**: For each cross-boundary data field identified in plans or SUMMARY.md, trace the value from source through intermediate functions to destination. Verify the value is actually passed (not `undefined`/`null`/hardcoded) at each step.
|
|
59
|
+
- **Source examples**: hook stdin (`data.session_id`), API request params, environment variables, config fields
|
|
60
|
+
- **Destination examples**: log entries, database records, API responses, metric files
|
|
61
|
+
- **Method**: Grep each intermediate call site and inspect arguments. Flag `DATA_DROPPED` when a value available in scope is replaced by `undefined` or a placeholder.
|
|
62
|
+
- **Status**: `PROPAGATED` (value flows correctly) | `DATA_DROPPED` (value lost at some step) | `UNTRACEABLE` (cannot determine flow)
|
|
63
|
+
7. **Compile Integration Report**: Produce final report with all findings by category.
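The status rule in step 6 can be sketched as a small function. This is an illustration under stated assumptions, not the integration checker's implementation: the `steps` array, its `name`/`observed`/`value` fields, and the function name `classifyPropagation` are all hypothetical, and the sketch does not try to detect hardcoded placeholders.

```javascript
// Hypothetical sketch: classify one traced field. `steps` is an ordered list
// of hops from source to destination; each hop records whether the call site
// could be inspected (`observed`) and what value was actually passed (`value`).
function classifyPropagation(steps) {
  if (!Array.isArray(steps) || steps.length === 0) {
    return { status: 'UNTRACEABLE' };
  }
  for (const step of steps) {
    if (!step.observed) return { status: 'UNTRACEABLE', at: step.name };
    if (step.value === undefined || step.value === null) {
      return { status: 'DATA_DROPPED', at: step.name };
    }
  }
  return { status: 'PROPAGATED' };
}
```

Recording the hop name in `at` gives the report a concrete location to cite alongside the status.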
|
|
58
64
|
|
|
59
65
|
## Output Format
|
|
60
66
|
|
|
@@ -119,3 +125,4 @@ See `references/integration-patterns.md` for grep/search patterns by framework.
|
|
|
119
125
|
- "File exists" is not "component is integrated"
|
|
120
126
|
- Auth middleware existing somewhere does not mean routes are protected
|
|
121
127
|
- Always check error handling paths, not just happy paths
|
|
128
|
+
- Structural connectivity is not data-flow correctness — a connected pipeline can still drop data at any step
|
|
@@ -66,6 +66,23 @@ Each must-have maps to one or more tasks. Every task exists to make a must-have
|
|
|
66
66
|
|
|
67
67
|
---
|
|
68
68
|
|
|
69
|
+
## Data Contracts for Cross-Boundary Parameters
|
|
70
|
+
|
|
71
|
+
When a function signature includes parameters that flow across module boundaries — session IDs from hook stdin, config objects from disk, auth tokens from environment — the plan **MUST** specify the **source** for each argument, not just the type.
|
|
72
|
+
|
|
73
|
+
For every cross-boundary call in a task's `<action>`, document:
|
|
74
|
+
|
|
75
|
+
| Parameter | Source | Context | Fallback |
|
|
76
|
+
|-----------|--------|---------|----------|
|
|
77
|
+
| `sessionId` | `data.session_id` (hook stdin) | Hook scripts only | `undefined` (CLI context) |
|
|
78
|
+
| `config` | `configLoad(planningDir)` | All callers | `resolveConfig(undefined)` |
|
|
79
|
+
|
|
80
|
+
**When to apply:** Any function call where the caller and callee live in different modules AND at least one argument originates from an external boundary (stdin, env, disk, network). Internal helper calls within the same module do not need contracts.
|
|
81
|
+
|
|
82
|
+
**Why this matters:** Without explicit source mapping, executors will use the type-correct but value-wrong default (e.g., `undefined` instead of `data.session_id`). The plan is the single source of truth for how data flows — if the plan says `undefined`, the executor will faithfully implement `undefined`.
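As an illustration of the `sessionId` row in the table, the sketch below resolves the argument from its documented source and uses the documented fallback only when that source is absent. The names `resolveSessionId` and `buildLogEntry` are hypothetical stand-ins, not functions from the package.

```javascript
// Hypothetical sketch of the `sessionId` contract row:
// source = data.session_id (hook stdin), fallback = undefined (CLI context).
function resolveSessionId(hookStdin) {
  if (hookStdin && typeof hookStdin.session_id === 'string') {
    return hookStdin.session_id;
  }
  return undefined; // documented fallback for callers with no hook stdin
}

// Hypothetical callee standing in for any function in another module.
function buildLogEntry(sessionId, message) {
  return { session_id: sessionId, message };
}

// Hook context: the value comes from the documented source.
const fromHook = buildLogEntry(resolveSessionId({ session_id: 'abc-123' }), 'start');
// CLI context: no stdin payload, so the documented fallback applies.
const fromCli = buildLogEntry(resolveSessionId(undefined), 'start');
```

Keeping the source lookup in one place makes it harder for a caller to pass a literal `undefined` when the real value is in scope.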
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
69
86
|
## Plan Structure
|
|
70
87
|
|
|
71
88
|
Read `references/plan-format.md` for the complete plan file specification including:
|
|
@@ -165,6 +182,7 @@ When CONTEXT.md or RESEARCH-SUMMARY.md contains `[NEEDS DECISION]` flags from th
|
|
|
165
182
|
- [ ] Dependencies are acyclic, no file conflicts within same wave
|
|
166
183
|
- [ ] Locked decisions honored, no deferred ideas included
|
|
167
184
|
- [ ] Verify commands are actually executable
|
|
185
|
+
- [ ] Cross-boundary parameters have documented sources (data contracts)
|
|
168
186
|
|
|
169
187
|
---
|
|
170
188
|
|
|
@@ -238,3 +256,4 @@ One-line task descriptions in `<name>`. File paths in `<files>`, not explanation
|
|
|
238
256
|
9. DO NOT plan for features outside the current phase goal
|
|
239
257
|
10. DO NOT assume research is done — check discovery level
|
|
240
258
|
11. DO NOT leave done conditions vague — they must be observable
|
|
259
|
+
12. DO NOT specify literal `undefined` for parameters that have a known source in the calling context — use data contracts to map sources
|
|
@@ -54,6 +54,26 @@ All claims must be attributed to a source level. Higher levels override lower le
|
|
|
54
54
|
|
|
55
55
|
**Offline Fallback**: If web tools are unavailable (air-gapped environment, MCP not configured), rely on local sources: codebase analysis via Glob/Grep, existing documentation, and README files. Assign these S3-S4 confidence levels. Do not attempt WebFetch or WebSearch — note in the output header that external sources were unavailable.
|
|
56
56
|
|
|
57
|
+
## Local LLM Source Scoring (Optional)
|
|
58
|
+
|
|
59
|
+
If local LLM offload is configured, you MAY use it to score source credibility instead of manually assigning S-levels. This is advisory — never wait on it or fail if it returns null.
|
|
60
|
+
|
|
61
|
+
Check availability first:
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm status 2>/dev/null
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
If `enabled: true`, score a source excerpt:
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
echo "Source URL and content excerpt" > /tmp/source-excerpt.txt
|
|
71
|
+
node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm score-source "https://example.com/docs" /tmp/source-excerpt.txt 2>/dev/null
|
|
72
|
+
# Returns: {"level":"S2","confidence":0.87,"reason":"Official library documentation page"}
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
Use the returned `level` to set your source tag. If the call fails or returns `null`, assign the level manually per the hierarchy table above.
|
|
76
|
+
|
|
57
77
|
---
|
|
58
78
|
|
|
59
79
|
## Confidence Levels
|
|
@@ -98,6 +98,18 @@ conflicts: N
|
|
|
98
98
|
- **Research gaps**: Add `[RESEARCH GAP]` flag, add to Open Questions with high impact, never fabricate
|
|
99
99
|
- **Duplicates**: Consolidate into one entry, note multi-source agreement, reference all documents
|
|
100
100
|
|
|
101
|
+
## Local LLM Context Summarization (Optional)
|
|
102
|
+
|
|
103
|
+
When input research documents are large (>2000 words combined), you MAY use the local LLM to pre-summarize each document before synthesis. This reduces your own context consumption. Advisory only — if unavailable, read documents normally.
|
|
104
|
+
|
|
105
|
+
```bash
|
|
106
|
+
# Pre-summarize a large research document to ~150 words:
|
|
107
|
+
node "${PLUGIN_ROOT}/scripts/pbr-tools.js" llm summarize /path/to/RESEARCH.md 150 2>/dev/null
|
|
108
|
+
# Returns: {"summary":"...plain text summary under 150 words...","latency_ms":2100,"fallback_used":false}
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
Use the returned `summary` string as your working copy of that document's findings. Still read the original for any specific version numbers, code examples, or direct quotes needed in the output.
|
|
112
|
+
|
|
101
113
|
## Anti-Patterns
|
|
102
114
|
|
|
103
115
|
### Universal Anti-Patterns
|
|
@@ -95,10 +95,29 @@ Verify the artifact is imported AND used by other parts of the system (functions
|
|
|
95
95
|
| Yes | Yes | No | UNWIRED |
|
|
96
96
|
| Yes | Yes | Yes | PASSED |
|
|
97
97
|
|
|
98
|
+
> **Note:** WIRED status (Level 3) requires correct arguments, not just correct function names. A call that passes `undefined` for a parameter available in scope is `ARGS_WRONG`, not `WIRED`.
|
|
99
|
+
|
|
98
100
|
### Step 6: Verify Key Links (Always)
|
|
99
101
|
|
|
100
102
|
For each key_link: identify source and target components, verify the import path resolves, verify the imported symbol is actually called/used, and verify call signatures match. Watch for: wrong import paths, imported-but-never-called symbols, defined-but-never-applied middleware, registered-but-never-triggered event handlers.
|
|
101
103
|
|
|
104
|
+
### Step 6b: Argument-Level Spot Checks (Always)
|
|
105
|
+
|
|
106
|
+
Beyond verifying that calls exist, spot-check that **arguments passed to cross-boundary calls carry the correct values**. A call with the right function but wrong arguments is effectively UNWIRED.
|
|
107
|
+
|
|
108
|
+
**Focus on:** IDs (session, user, request), config objects, auth tokens, and context data that originate from external boundaries (stdin, env, disk).
|
|
109
|
+
|
|
110
|
+
**Method:**
|
|
111
|
+
1. For each key_link verified in Step 6, grep the call site and inspect the arguments
|
|
112
|
+
2. Compare each argument against the data source available in the calling scope
|
|
113
|
+
3. Flag any argument that passes `undefined`, `null`, or a hardcoded placeholder when the calling scope has the real value available (e.g., `data.session_id` is in scope but `undefined` is passed)
|
|
114
|
+
|
|
115
|
+
**Classification:**
|
|
116
|
+
- `WIRED` requires both correct function AND correct arguments
|
|
117
|
+
- `ARGS_WRONG` = correct function called but one or more arguments are incorrect/missing — this is a key link gap
|
|
118
|
+
|
|
119
|
+
**Example:** A hook script receives `data` from stdin containing `session_id`. If it calls `logMetric(planningDir, { session_id: undefined })` instead of `logMetric(planningDir, { session_id: data.session_id })`, that is an `ARGS_WRONG` gap even though the call itself exists.
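The classification rule can be sketched as a comparison between what a call site passes and what the calling scope has available. This is a hedged illustration: `classifyCall` and its two plain-object parameters are hypothetical, and a real check would work from grep output rather than live values.

```javascript
// Hypothetical sketch: compare the arguments seen at a call site with the
// values known to be available in the calling scope.
function classifyCall(passedArgs, availableInScope) {
  const wrong = Object.keys(availableInScope).filter((key) => {
    const available = availableInScope[key];
    const passed = passedArgs[key];
    const hasRealValue = available !== undefined && available !== null;
    // Flag only when a real value exists in scope but was not passed on.
    return hasRealValue && (passed === undefined || passed === null);
  });
  return wrong.length > 0
    ? { status: 'ARGS_WRONG', fields: wrong }
    : { status: 'WIRED' };
}
```

Passing `undefined` is flagged only when the scope holds a real value, which matches the CLI case where no session ID exists to pass.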
|
|
120
|
+
|
|
102
121
|
### Step 7: Check Requirements Coverage (Always)
|
|
103
122
|
|
|
104
123
|
Cross-reference all must-haves against verification results in a table:
|
|
@@ -107,8 +126,8 @@ Cross-reference all must-haves against verification results in a table:
|
|
|
107
126
|
| # | Must-Have | Type | L1 (Exists) | L2 (Substantive) | L3 (Wired) | Status |
|
|
108
127
|
|---|----------|------|-------------|-------------------|------------|--------|
|
|
109
128
|
| 1 | {description} | truth | - | - | - | VERIFIED/FAILED |
|
|
110
|
-
| 2 | {description} | artifact | YES/NO | YES/STUB/PARTIAL | WIRED/ORPHANED | PASS/FAIL |
|
|
111
|
-
| 3 | {description} | key_link | - | - | YES/NO | PASS/FAIL |
|
|
129
|
+
| 2 | {description} | artifact | YES/NO | YES/STUB/PARTIAL | WIRED/ORPHANED/ARGS_WRONG | PASS/FAIL |
|
|
130
|
+
| 3 | {description} | key_link | - | - | YES/NO/ARGS_WRONG | PASS/FAIL |
|
|
112
131
|
```
|
|
113
132
|
|
|
114
133
|
### Step 8: Scan for Anti-Patterns (Full Verification Only)
|
|
@@ -226,3 +245,4 @@ Read `references/stub-patterns.md` for stub detection patterns by technology. Re
|
|
|
226
245
|
9. DO NOT give PASSED status if ANY must-have fails at ANY level
|
|
227
246
|
10. DO NOT count deferred items as gaps — they are intentionally not implemented
|
|
228
247
|
11. DO NOT be lenient — your job is to find problems, not to be encouraging
|
|
248
|
+
12. DO NOT mark a call as WIRED if it passes hardcoded `undefined`/`null` for parameters that have a known source in scope — check arguments, not just function names
|
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "pbr",
|
|
3
3
|
"displayName": "Plan-Build-Run",
|
|
4
|
-
"version": "2.
|
|
4
|
+
"version": "2.24.0",
|
|
5
5
|
"description": "Plan-Build-Run — Structured development workflow for GitHub Copilot CLI. Solves context rot through disciplined agent delegation, structured planning, atomic execution, and goal-backward verification.",
|
|
6
6
|
"author": {
|
|
7
7
|
"name": "SienkLogic",
|
|
@@ -440,3 +440,92 @@ Run validation with: `node plugins/pbr/scripts/pbr-tools.js config validate`
|
|
|
440
440
|
| `tdd_mode: true` + `depth: quick` | quick depth skips verification, which conflicts with TDD's verify-first approach |
|
|
441
441
|
| `git.mode: disabled` + `atomic_commits: true` | atomic_commits has no effect when git is disabled |
|
|
442
442
|
| `git.branching: phase` + `git.mode: disabled` | Branching settings are ignored when git is disabled |
|
|
443
|
+
|
|
444
|
+
---
|
|
445
|
+
|
|
446
|
+
## local_llm
|
|
447
|
+
|
|
448
|
+
Offloads selected PBR inference tasks to a locally running Ollama instance, reducing frontier model usage and latency for fast classification calls. The key `enabled` defaults to `false`, so users without Ollama see no change — all LLM calls continue routing to Claude as normal. When enabled, PBR uses a `local_first` routing strategy: fast tasks (artifact classification, task validation) go to the local model; complex tasks (planning, execution) stay on the frontier model.
|
|
449
|
+
|
|
450
|
+
### Quick setup
|
|
451
|
+
|
|
452
|
+
1. Install Ollama:
|
|
453
|
+
- **Linux/macOS**: `curl -fsSL https://ollama.com/install.sh | sh`
|
|
454
|
+
- **Windows**: Download from [ollama.com/download](https://ollama.com/download) and run the installer
|
|
455
|
+
2. Pull the recommended model: `ollama pull qwen2.5-coder:7b`
|
|
456
|
+
3. Add to `.planning/config.json`:
|
|
457
|
+
|
|
458
|
+
```json
|
|
459
|
+
"local_llm": {
|
|
460
|
+
"enabled": true,
|
|
461
|
+
"model": "qwen2.5-coder:7b"
|
|
462
|
+
}
|
|
463
|
+
```
|
|
464
|
+
|
|
465
|
+
4. Verify connectivity: `node /path/to/plugins/pbr/scripts/pbr-tools.js llm health`
|
|
466
|
+
|
|
467
|
+
### Field reference
|
|
468
|
+
|
|
469
|
+
| Property | Type | Default | Description |
|
|
470
|
+
|----------|------|---------|-------------|
|
|
471
|
+
| `local_llm.enabled` | boolean | `false` | Enable local LLM offloading; `false` = all calls use frontier |
|
|
472
|
+
| `local_llm.provider` | string | `"ollama"` | Backend provider; only `"ollama"` is supported |
|
|
473
|
+
| `local_llm.endpoint` | string | `"http://localhost:11434"` | Ollama API base URL |
|
|
474
|
+
| `local_llm.model` | string | `"qwen2.5-coder:7b"` | Model tag to use for local inference |
|
|
475
|
+
| `local_llm.timeout_ms` | integer | `3000` | Per-request timeout in milliseconds; >= 500 |
|
|
476
|
+
| `local_llm.max_retries` | integer | `1` | Number of retry attempts on failure before falling back |
|
|
477
|
+
| `local_llm.fallback` | string | `"frontier"` | What to use when local LLM fails: `"frontier"` or `"skip"` |
|
|
478
|
+
| `local_llm.routing_strategy` | string | `"local_first"` | `"local_first"` sends fast tasks to the local model; `"always_local"` routes every task to it |
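
The defaults and constraints in the table above can be sketched as a small merge-and-validate step. This is an illustration only, not PBR's actual config loader: `resolveLocalLlmConfig` and `LOCAL_LLM_DEFAULTS` are hypothetical names, and the checks cover only the constraints the table states.

```javascript
// Defaults copied from the field reference table.
const LOCAL_LLM_DEFAULTS = Object.freeze({
  enabled: false,
  provider: 'ollama',
  endpoint: 'http://localhost:11434',
  model: 'qwen2.5-coder:7b',
  timeout_ms: 3000,
  max_retries: 1,
  fallback: 'frontier',
  routing_strategy: 'local_first',
});

// Hypothetical helper (not part of PBR): merges user settings over the
// defaults and collects validation problems instead of throwing.
function resolveLocalLlmConfig(userConfig = {}) {
  const config = { ...LOCAL_LLM_DEFAULTS, ...userConfig };
  const errors = [];

  if (typeof config.enabled !== 'boolean') {
    errors.push('enabled must be a boolean');
  }
  if (config.provider !== 'ollama') {
    errors.push('provider must be "ollama"');
  }
  if (!Number.isInteger(config.timeout_ms) || config.timeout_ms < 500) {
    errors.push('timeout_ms must be an integer >= 500');
  }
  if (!['frontier', 'skip'].includes(config.fallback)) {
    errors.push('fallback must be "frontier" or "skip"');
  }
  if (!['local_first', 'always_local'].includes(config.routing_strategy)) {
    errors.push('routing_strategy must be "local_first" or "always_local"');
  }

  return { config, errors };
}
```

Called with the quick-setup snippet (`{ enabled: true, model: 'qwen2.5-coder:7b' }`), the helper returns the remaining fields at their documented defaults and an empty `errors` array.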
|
|
479
|
+
|
|
480
|
+
### features sub-table
|
|
481
|
+
|
|
482
|
+
Controls which PBR tasks are eligible for local LLM offloading.
|
|
483
|
+
|
|
484
|
+
| Property | Default | Description |
|
|
485
|
+
|----------|---------|-------------|
|
|
486
|
+
| `artifact_classification` | `true` | Classify artifact types (PLAN, SUMMARY, VERIFICATION) locally |
|
|
487
|
+
| `task_validation` | `true` | Validate task scope and completeness locally |
|
|
488
|
+
| `context_summarization` | `false` | Summarize context windows locally (higher token demand) |
|
|
489
|
+
| `source_scoring` | `false` | Score source files by relevance locally |
|
|
490
|
+
|
|
491
|
+
### advanced sub-table
|
|
492
|
+
|
|
493
|
+
| Property | Default | Description |
|
|
494
|
+
|----------|---------|-------------|
|
|
495
|
+
| `confidence_threshold` | `0.9` | Minimum confidence (0–1) for local output to be accepted; below this, falls back to frontier |
|
|
496
|
+
| `shadow_mode` | `false` | Run local LLM in parallel with frontier but discard local results — useful for tuning confidence thresholds without affecting output |
|
|
497
|
+
| `max_input_tokens` | `2000` | Truncate inputs longer than this before sending to local model |
|
|
498
|
+
| `keep_alive` | `"30m"` | How long Ollama keeps the model loaded between requests (Ollama format: `"5m"`, `"1h"`) |
|
|
499
|
+
| `num_ctx` | `4096` | Context window size passed to Ollama; **must be 4096 on Windows** (see Windows gotchas) |
|
|
500
|
+
| `disable_after_failures` | `3` | Automatically disable local LLM for the session after this many consecutive failures |
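
How `confidence_threshold` and `disable_after_failures` might interact can be sketched as follows. This is a minimal illustration, not PBR's implementation: `createLocalLlmGate` is a hypothetical name, and it assumes that a low-confidence result does not count as a failure and that any successful call resets the consecutive-failure counter.

```javascript
// Hypothetical per-session gate (not part of PBR). Defaults mirror the
// advanced sub-table: confidence_threshold 0.9, disable_after_failures 3.
function createLocalLlmGate({ confidence_threshold = 0.9, disable_after_failures = 3 } = {}) {
  let consecutiveFailures = 0;
  let disabled = false;

  return {
    isDisabled: () => disabled,

    // result: { ok: boolean, confidence?: number }
    // Returns 'accept' to use the local output, 'fallback' otherwise.
    record(result) {
      if (disabled) return 'fallback';

      if (!result.ok) {
        consecutiveFailures += 1;
        if (consecutiveFailures >= disable_after_failures) disabled = true;
        return 'fallback';
      }

      // Assumption: a successful call resets the failure streak,
      // even if its confidence is below the threshold.
      consecutiveFailures = 0;
      return result.confidence >= confidence_threshold ? 'accept' : 'fallback';
    },
  };
}
```

What "fallback" means in practice depends on `local_llm.fallback` (`"frontier"` or `"skip"`); the sketch only decides whether the local output is used.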
|
|
501
|
+
|
|
502
|
+
### Hardware requirements
|
|
503
|
+
|
|
504
|
+
| Tier | Hardware | Notes |
|
|
505
|
+
|------|----------|-------|
|
|
506
|
+
| Recommended | RTX 3060+ with 8 GB VRAM | Full GPU acceleration; qwen2.5-coder:7b loads entirely in VRAM |
|
|
507
|
+
| Functional | GTX 1660+ with 6 GB VRAM | GPU acceleration with slight layer offload to RAM |
|
|
508
|
+
| Marginal | CPU only, 32 GB RAM | Works but adds 5-20s latency per call; disable context-heavy features |
|
|
509
|
+
|
|
510
|
+
For GPU acceleration, ensure NVIDIA drivers are 520+ and CUDA 11.8+ is installed. AMD GPU support is available via ROCm on Linux only.
|
|
511
|
+
|
|
512
|
+
### Windows gotchas
|
|
513
|
+
|
|
514
|
+
- **Smart App Control**: May block `ollama_llama_server.exe` on first run. Allow it via Security settings or disable Smart App Control.
|
|
515
|
+
- **Windows Defender**: Add an exclusion for `%LOCALAPPDATA%\Programs\Ollama\ollama_llama_server.exe` to prevent Defender from scanning inference calls in real time.
|
|
516
|
+
- **`num_ctx` must be 4096**: Higher values cause GPU memory fragmentation on Windows and result in OOM errors mid-session. Always set `advanced.num_ctx: 4096` in your config.
|
|
517
|
+
- **Firewall**: Ollama listens on `localhost:11434` by default. If you see connection refused errors, check that Windows Firewall is not blocking loopback connections.
|
|
518
|
+
|
|
519
|
+
### Viewing metrics
|
|
520
|
+
|
|
521
|
+
After enabling local LLM, PBR logs per-call metrics to `.planning/logs/local-llm-metrics.jsonl`. Use the built-in subcommands to inspect them:
|
|
522
|
+
|
|
523
|
+
```bash
|
|
524
|
+
# Show session summary (calls routed, latency, token savings)
|
|
525
|
+
node plugins/pbr/scripts/pbr-tools.js llm metrics
|
|
526
|
+
|
|
527
|
+
# Suggest routing threshold adjustments based on recent accuracy
|
|
528
|
+
node plugins/pbr/scripts/pbr-tools.js llm adjust-thresholds
|
|
529
|
+
```
|
|
530
|
+
|
|
531
|
+
Metrics include: routing decision, model used, latency in milliseconds, confidence score, whether the frontier fallback was triggered, and estimated tokens saved.
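
For ad-hoc analysis outside the built-in subcommands, a JSONL log can be summarized in a few lines. The sketch below is illustrative only: the field names `latency_ms`, `fallback_used`, and `tokens_saved` are assumptions, so check them against a real line of `local-llm-metrics.jsonl` before relying on it.

```javascript
// Hypothetical summarizer (not part of PBR). Field names are assumed,
// not taken from the actual log schema.
function summarizeMetrics(jsonlText) {
  // JSONL: one JSON object per non-empty line.
  const records = jsonlText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));

  const calls = records.length;
  const fallbacks = records.filter((r) => r.fallback_used === true).length;
  const totalLatency = records.reduce((sum, r) => sum + (r.latency_ms || 0), 0);
  const tokensSaved = records.reduce((sum, r) => sum + (r.tokens_saved || 0), 0);

  return {
    calls,
    fallbacks,
    avgLatencyMs: calls === 0 ? 0 : totalLatency / calls,
    tokensSaved,
  };
}
```

To run it against the real log, read the file with `fs.readFileSync('.planning/logs/local-llm-metrics.jsonl', 'utf8')` and pass the text to `summarizeMetrics`.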
|
|
@@ -71,6 +71,28 @@ requirement_ids:
|
|
|
71
71
|
| `consumes` | NO | array | What this plan needs from prior plans. Format: `"Thing (from plan XX-YY)"` |
|
|
72
72
|
| `requirement_ids` | NO | array | Requirement IDs from REQUIREMENTS.md or ROADMAP.md goal IDs that this plan addresses. Enables bidirectional traceability between plans and requirements/goals. |
|
|
73
73
|
| `dependency_fingerprints` | NO | object | Hashes of dependency phase SUMMARY.md files at plan-creation time. Used to detect stale plans. |
|
|
74
|
+
| `data_contracts` | NO | array | Cross-boundary parameter mappings for calls where arguments originate from external boundaries. Format: `"param: source (context) [fallback]"` |
|
|
75
|
+
|
|
76
|
+
### Data Contracts
|
|
77
|
+
|
|
78
|
+
When a task's `<action>` includes calls across module boundaries where arguments come from external sources (hook stdin, env vars, API params, config files), document the parameter-to-source mapping in `data_contracts` frontmatter and in the `<action>` step itself.
|
|
79
|
+
|
|
80
|
+
Example frontmatter:
|
|
81
|
+
|
|
82
|
+
```yaml
|
|
83
|
+
data_contracts:
|
|
84
|
+
- "sessionId: data.session_id (hook stdin) [undefined in CLI context]"
|
|
85
|
+
- "config: configLoad(planningDir) (disk) [resolveConfig(undefined)]"
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Example in `<action>`:
|
|
89
|
+
|
|
90
|
+
```
|
|
91
|
+
3. Call classifyArtifact(llmConfig, planningDir, content, fileType, data.session_id)
|
|
92
|
+
Data contract: sessionId ← data.session_id from hook stdin (undefined in CLI context)
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
**When to apply:** Any call where caller and callee are in different modules AND at least one argument originates from an external boundary. Internal helper calls within the same module do not need contracts.
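
The `"param: source (context) [fallback]"` format is regular enough to check mechanically. The parser below is a sketch under that assumption; `parseDataContract` is a hypothetical helper, not something PBR ships, and it expects the context to contain no nested parentheses.

```javascript
// Hypothetical parser (not part of PBR) for one data_contracts entry.
// Returns null when the entry does not follow the documented format.
function parseDataContract(entry) {
  // param: source (context) [fallback]
  // The source may itself contain parentheses, e.g. configLoad(planningDir),
  // so the context is matched as the parenthesized group that is
  // preceded by whitespace and followed by the bracketed fallback.
  const match = /^([^:]+):\s*(.+?)\s+\(([^()]*)\)\s+\[(.*)\]$/.exec(entry.trim());
  if (!match) return null;

  const [, param, source, context, fallback] = match;
  return { param: param.trim(), source, context, fallback };
}
```

Applied to the second frontmatter example, this yields `param: "config"`, `source: "configLoad(planningDir)"`, `context: "disk"`, and `fallback: "resolveConfig(undefined)"`.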
|
|
74
96
|
|
|
75
97
|
---
|
|
76
98
|
|
|
@@ -127,7 +127,7 @@ Read `.planning/config.json` and check for fields referenced by skills:
|
|
|
127
127
|
- PASS: All expected fields present with correct types
|
|
128
128
|
- WARN (missing fields): Report each missing field and which skill uses it — "Run `/pbr:config` to set all options."
|
|
129
129
|
|
|
130
|
-
### Check 10: Orphaned Crash Recovery Files
|
|
130
|
+
### Check 10: Orphaned Crash Recovery & Lock Files
|
|
131
131
|
|
|
132
132
|
The executor creates `.PROGRESS-{plan_id}` files as crash recovery breadcrumbs during builds and deletes them after `SUMMARY.md` is written. Similarly, `.checkpoint-manifest.json` files track checkpoint state during execution. If the executor crashes mid-build, these files remain and could confuse future runs.
|
|
133
133
|
|
|
@@ -147,6 +147,13 @@ Glob for `.planning/phases/**/.PROGRESS-*` and `.planning/phases/**/.checkpoint-
|
|
|
147
147
|
```
|
|
148
148
|
Fix suggestion: "Checkpoint manifests are leftover from interrupted builds. Safe to delete if no `/pbr:build` is currently running. Remove with `rm <path>`."
|
|
149
149
|
|
|
150
|
+
Also check for `.planning/.active-skill`:
|
|
151
|
+
|
|
152
|
+
- If the file does not exist: no action needed (PASS for this sub-check)
|
|
153
|
+
- If the file exists, check its age by comparing the file modification time to the current time:
|
|
154
|
+
- If older than 1 hour: WARN with fix suggestion: "Stale .active-skill lock file detected (set {age} ago). No PBR skill appears to be running. Safe to delete with `rm .planning/.active-skill`."
|
|
155
|
+
- If younger than 1 hour: INFO: "Active skill lock exists ({content}). A PBR skill may be running."
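
The `.active-skill` sub-check above can be sketched with Node's `fs` module. This is one possible reading of the rules, not the shipped health check: `checkActiveSkillLock` is a hypothetical name, and a file exactly one hour old is treated as INFO because the rules do not say which side that boundary falls on.

```javascript
const fs = require('node:fs');

const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper (not part of PBR). nowMs is injectable so the
// age comparison can be exercised without waiting.
function checkActiveSkillLock(lockPath, nowMs = Date.now()) {
  let stat;
  try {
    stat = fs.statSync(lockPath);
  } catch (err) {
    // Missing file: nothing to report for this sub-check.
    if (err.code === 'ENOENT') return { level: 'PASS' };
    throw err;
  }

  // Age is measured from the file's modification time.
  const ageMs = nowMs - stat.mtimeMs;
  if (ageMs > ONE_HOUR_MS) {
    return { level: 'WARN', ageMs };
  }

  // Fresh lock: report its content so the user can see which skill holds it.
  const content = fs.readFileSync(lockPath, 'utf8').trim();
  return { level: 'INFO', ageMs, content };
}
```

A caller would pass `.planning/.active-skill` as `lockPath` and format `ageMs` and `content` into the WARN and INFO messages quoted above.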
|
|
156
|
+
|
|
150
157
|
---
|
|
151
158
|
|
|
152
159
|
## Auto-Fix for Common Corruption Patterns
|
|
@@ -210,10 +210,10 @@ The `features.team_discussions` config flag (and `/pbr:build --team`) enables **
|
|
|
210
210
|
║ ▶ NEXT UP ║
|
|
211
211
|
╚══════════════════════════════════════════════════════════════╝
|
|
212
212
|
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
213
|
+
- `/pbr:begin` — start a new project
|
|
214
|
+
- `/pbr:status` — check current project status
|
|
215
|
+
- `/pbr:config` — configure workflow settings
|
|
216
|
+
- `/pbr:help <command>` — detailed help for a specific command
|
|
217
217
|
|
|
218
218
|
```
|
|
219
219
|
|