nubos-pilot 0.8.1 → 0.8.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -41,7 +41,7 @@ Additional context the orchestrator may inline in the prompt:
 
  ## Review Dimensions
 
- Each dimension maps to one or more canonical finding categories from `docs/agent-frontmatter-schema.md`. The 10 canonical codes are:
+ Each dimension maps to one or more canonical finding categories from `docs/agent-frontmatter-schema.md`. The 11 canonical codes are:
 
  - `missing-success-criterion` — a ROADMAP SC-X is not mapped to any task.
  - `non-atomic-task` — a task bundles multiple distinct deliverables that should be split.
@@ -53,6 +53,7 @@ Each dimension maps to one or more canonical finding categories from `docs/agent
  - `bare-askuser-call` — workflow MD emits `AskUserQuestion` directly instead of `node np-tools.cjs askuser --json '{…}'` (D-04).
  - `hook-field-present` — agent frontmatter contains `hooks:` (D-10).
  - `forbidden-agent-field` — agent frontmatter contains `model:` or `model_profile:` (D-10).
+ - `unverified-assumption` — a slice plan's `<reality_check>` block is missing, empty, or contains an `<assumption>` without a non-empty `verified_by` attribute, OR a `<files_read>` path does not exist in the repo (Reality-Check rule, see Dimension 12).
 
  Run each dimension below; for every failure, emit one finding using the matching canonical code.
 
@@ -117,6 +118,24 @@ Run each dimension below; for every failure, emit one finding using the matching
  - Extract actionable directives (forbidden patterns, required conventions, mandated tools).
  - Any plan action that violates them → map to the closest canonical code; if nothing fits, emit `unknown-category`.
 
+ ### Dimension 12: Reality-Check Completeness (Slice-Level, MANDATORY)
+
+ This dimension exists because plans that look structurally fine still fail at execute-time when the planner encoded an unverified assumption (wrong package version, stale interface signature, prescribed command that does not exist in this env). The planner is required to produce a `<reality_check>` block per slice; you enforce that it actually did, and that the evidence is real.
+
+ For every `S<NNN>-PLAN.md`:
+
+ 1. **Block presence** — confirm a `<reality_check>` block exists and appears ABOVE `<tasks>`. Missing or empty block → `unverified-assumption`, severity `critical`, target `S<NNN>-PLAN.md §reality_check`.
+ 2. **Sub-blocks present** — confirm `<files_read>`, `<commands_run>`, `<assumptions>`, and `<unknowns>` sub-blocks all exist. Missing sub-block → `unverified-assumption`, severity `critical`.
+ 3. **`<files_read>` integrity** — for each `path:line` (or `path:line-line`) entry, use `Glob` or `Read` to confirm the file exists in the repo. A path that does not resolve → `unverified-assumption`, severity `critical`, target the offending entry. (You do NOT need to confirm the line content — that is the planner's professional honesty, audited by the iter-2 PLAN-REVIEW trail.)
+ 4. **`<assumption>` `verified_by` integrity** — every `<assumption>` MUST carry a `verified_by` attribute. The attribute value MUST be either:
+ - a `path:line` string that appears verbatim in the slice's `<files_read>` block, OR
+ - a `cmd:<command>` string whose `<command>` substring appears verbatim in the slice's `<commands_run>` block.
+ Missing `verified_by`, empty `verified_by`, or `verified_by` pointing at evidence not present in the same `<reality_check>` → `unverified-assumption`, severity `critical`, target the offending `<assumption>`.
+ 5. **`<unknowns>` discipline** — if `<unknowns>` is non-empty, confirm the slice has a Wave-0 reconnaissance task (the first `<task>` in the slice, intra-slice parallel-safe) whose `<name>` or `<action>` references the unknown by phrase. No matching Wave-0 task → `unverified-assumption`, severity `critical`, target the unknown.
+ 6. **No silent waivers** — phrases like "TBD", "to be confirmed", "assume defaults", "should work", "presumably", "likely" inside `<reality_check>` are equivalent to a missing `verified_by` and emit `unverified-assumption`.
+
+ This dimension is the empirical complement to Dimensions 1-11 (which are structural). Together they make the 2-iteration loop sufficient: structural defects caught by 1-11, empirical defects caught by 12.
+
  ## Verdict Format
 
  Emit exactly one fenced YAML block. No commentary before or after. The loop in Plan 05-10 parses only `status` and `findings[].category`.
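Steps 4 and 6 of Dimension 12 are mechanical enough to sketch. The JavaScript below is a minimal, hypothetical illustration, not code from nubos-pilot or `np-tools.cjs`: `checkRealityCheck` and the finding shape are invented names, the `<reality_check>` block is assumed to be already extracted as a string, `<files_read>` entries are assumed to be `- path:line (note)` bullets as in the planner's layout example, and regexes stand in for a real XML parser.

```javascript
// Hypothetical sketch of Dimension 12, steps 4 (verified_by integrity) and 6 (no silent waivers).
// Assumes the <reality_check> block is a plain string; regexes stand in for a real XML parser.
const WAIVER_PHRASES = ['TBD', 'to be confirmed', 'assume defaults', 'should work', 'presumably', 'likely'];

// Return the inner text of the first <tag>...</tag> pair, or '' if absent.
function innerText(text, tag) {
  const m = text.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return m ? m[1] : '';
}

// "- composer.lock:1245 (laravel/framework version)" -> "composer.lock:1245"
function filesReadEntries(realityCheck) {
  return innerText(realityCheck, 'files_read')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('- '))
    .map((line) => line.slice(2).trim().split(/\s+/)[0]);
}

function checkRealityCheck(realityCheck) {
  const findings = [];
  const files = filesReadEntries(realityCheck);
  const commands = innerText(realityCheck, 'commands_run');
  const finding = (target) => ({ category: 'unverified-assumption', severity: 'critical', target });

  // Step 4: every <assumption> needs a verified_by that points at evidence in this same block.
  const assumptionRe = /<assumption(?:\s+verified_by="([^"]*)")?\s*>([\s\S]*?)<\/assumption>/g;
  for (const [, verifiedBy = '', claim] of realityCheck.matchAll(assumptionRe)) {
    const ref = verifiedBy.trim();
    let ok = false;
    if (ref.startsWith('cmd:')) {
      const cmd = ref.slice('cmd:'.length).trim();
      ok = cmd !== '' && commands.includes(cmd); // command substring must appear in <commands_run>
    } else if (ref !== '') {
      ok = files.includes(ref); // must match a path:line entry from <files_read> exactly
    }
    if (!ok) findings.push(finding(claim.trim()));
  }

  // Step 6: hedging phrases anywhere in the block count as a missing verified_by.
  for (const phrase of WAIVER_PHRASES) {
    if (new RegExp(`\\b${phrase}\\b`, 'i').test(realityCheck)) {
      findings.push(finding(`waiver phrase "${phrase}"`));
    }
  }
  return findings;
}
```

Applied to the sample `<reality_check>` in the planner's Slice Plan Layout (with the `+ ` diff prefixes removed), this sketch should return no findings; removing the `verified_by` attribute from one `<assumption>` yields a single `critical` `unverified-assumption` finding targeting that claim. Steps 1-3 and 5 need file-system access or task parsing and are left out.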
@@ -156,7 +175,7 @@ Fields:
 
  | Severity | Meaning | Examples |
  |----------|---------|----------|
- | critical | Plan will not deliver the phase goal as written. MUST be fixed before execution. | `missing-success-criterion`, `cyclic-dependency`, `broken-dependency`, `forbidden-agent-field`, `hook-field-present`, `bare-askuser-call`. |
+ | critical | Plan will not deliver the phase goal as written. MUST be fixed before execution. | `missing-success-criterion`, `cyclic-dependency`, `broken-dependency`, `forbidden-agent-field`, `hook-field-present`, `bare-askuser-call`, `unverified-assumption`. |
  | major | Plan will technically deliver but with defects the verifier will catch post-execution. SHOULD be fixed. | `non-atomic-task`, `missing-coverage-annotation`, `fake-promotion-trigger` when the mis-classification affects wave ordering. |
  | minor | Plan quality issue that does not block execution. INFO-level for the planner's revision. | `unbounded-scope` with obvious bounded intent, minor wording that hints at scope creep. |
 
@@ -86,6 +86,41 @@ If the prompt contains a `<files_to_read>` block, you MUST use the `Read` tool t
  - Return structured results to orchestrator
  </role>
 
+ <reality_check_protocol>
+ ## CRITICAL: Reality-Check Before Planning (MANDATORY)
+
+ Plans fail at execute-time when they encode assumptions the planner never verified against the actual repo. To stop the replan-after-execute loop, BEFORE writing any `S<NNN>-PLAN.md` you MUST empirically verify every load-bearing assumption and record the evidence inside the slice plan.
+
+ This is not optional. Plan-checker rejects any slice plan whose `<reality_check>` block is absent, empty, or contains `<assumption>` entries without a `verified_by` attribute (canonical category: `unverified-assumption`, severity `critical`).
+
+ ### What MUST be reality-checked per slice
+
+ For every slice you plan, BEFORE writing its `S<NNN>-PLAN.md`:
+
+ 1. **Versions** — every library / framework / runtime version your plan will pin or rely on. Read the actual manifest the project loads (`composer.lock`, `package-lock.json`, `Gemfile.lock`, `go.mod`, `pyproject.toml`/`uv.lock`, `Pipfile.lock`, `Cargo.lock`, etc.) at the precise line. Never derive a version from training data, RESEARCH.md narrative, or a web search alone — confirm it in the lockfile.
+ 2. **Interfaces** — every function / class / method / hook your plan tells the executor to call or modify. Open the file with `Read` and quote the actual signature in the slice plan. Do not trust memory; signatures change between versions.
+ 3. **Commands** — every shell command your plan prescribes (test runner, build, migration, package install, container exec). Run a non-mutating probe (`--version`, `--help`, `which <cmd>`, `<cmd> list`) to confirm the command exists and behaves as expected. Never prescribe a command you have not seen succeed in this environment.
+ 4. **Conventions** — every project convention your plan relies on (naming, dir layout, test framework choice, ORM patterns, auth stack). Confirm by reading at least one existing example in the repo. If `./CLAUDE.md` exists, it is authoritative — quote the relevant line.
+
+ ### Reality-check the riskiest assumption first
+
+ If the slice introduces a new dependency, a major-version bump, or touches a stack you have not verified in this run, verify THAT first. A failed reality-check there changes the whole plan; finding it after writing tasks wastes the iteration.
+
+ ### When you cannot verify
+
+ If an assumption cannot be empirically resolved from the repo or environment (the library is not yet installed, an external service is unreachable, the lockfile lacks a transitive resolution):
+
+ - Add it to `<unknowns>` inside `<reality_check>` with the concrete reason.
+ - Either resolve it via a Wave-0 reconnaissance task in this same slice (named after the unknown), OR exit the planning run and request `/np:research-phase` for this slice.
+ - You may NOT silently encode an unverified assumption. The downstream cost is one wasted execute-phase plus one wasted plan-revision iteration — orders of magnitude higher than one extra `Read` or `Bash` call now.
+
+ ### What this is NOT
+
+ - Not a security review (that's `np-security-reviewer`).
+ - Not a research substitute (that's `np-researcher` — research finds what *should* be used; reality-check confirms what *is* installed/available).
+ - Not exhaustive code reading. Read what the slice's tasks will touch — no more, no less.
+ </reality_check_protocol>
+
  <context_fidelity>
  ## CRITICAL: User Decision Fidelity
 
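Rule 3 (Commands) of the protocol above asks for a non-mutating probe before a command is prescribed. The following is a minimal Node.js sketch of such a probe, offered as an illustration only: `probeCommand` is a hypothetical helper, not part of `np-tools.cjs`. It runs the command directly (no shell) with `--version` by default and formats the first line of output in the `` `cmd` → "literal output substring" `` shape that `<commands_run>` uses. A failed probe is a candidate for `<unknowns>`, not for a prescribed command.

```javascript
// Hypothetical non-mutating command probe for the reality-check protocol (not part of np-tools.cjs).
const { spawnSync } = require('node:child_process');

function probeCommand(cmd, args = ['--version']) {
  // No shell: the binary is executed directly, so nothing is interpolated or expanded.
  const res = spawnSync(cmd, args, { encoding: 'utf8', timeout: 10_000 });
  if (res.error || res.status !== 0) {
    // ENOENT means the binary is not on PATH; ETIMEDOUT means the probe hung past the timeout.
    const reason = res.error
      ? res.error.code
      : res.signal
        ? `signal ${res.signal}`
        : `exit status ${res.status}`;
    return { ok: false, reason };
  }
  // Some tools print their version to stderr, so fall back to it when stdout is empty.
  const output = res.stdout.trim() !== '' ? res.stdout : res.stderr;
  const firstLine = output.split('\n')[0].trim();
  const shown = [cmd, ...args].join(' ');
  return { ok: true, entry: `- \`${shown}\` → "${firstLine}"` };
}
```

The first-line heuristic suits `--version` style probes; for multi-line output such as `composer show <package>`, the planner would still need to quote the specific line it relied on.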
@@ -227,19 +262,56 @@ If the executor has to stop and read three more files to figure out what you mea
 
  Before emitting a `PLAN.md`, run through this list once:
 
- 1. **Frontmatter:** `phase`, `plan`, `type`, `wave`, `depends_on`, `files_modified`, `autonomous`, `requirements`, `must_haves` present and non-empty where required.
- 2. **Objective:** Single `<objective>` block, names the PLAN-XX requirement it closes, states output explicitly.
- 3. **Context:** `@path/to/file` references exist in the repo (do a quick `ls` / `Read` round-trip if unsure).
- 4. **Tasks:** 1-3 tasks, each with `<files>`, `<action>`, `<verify><automated>…</automated></verify>`, `<done>`.
- 5. **Dependencies:** `depends_on` references plan IDs that exist in the current ROADMAP wave graph.
- 6. **Verification:** Every `<verify>` has an `<automated>` command. If no test exists yet, the task itself creates it (TDD) or a Wave-0 task does.
- 7. **Success criteria:** Measurable, not prose-only. "Executes without throwing" > "works correctly".
- 8. **No forbidden patterns:** No bare `AskUserQuestion` calls (use `node np-tools.cjs askuser --json '{...}'`); no legacy helper-CLI references (all helper calls use `np-tools.cjs`); no `hooks:` / `model:` / `model_profile:` fields in agent frontmatter.
+ 1. **Reality-Check Block:** `<reality_check>` is present, non-empty, and every `<assumption>` carries a non-empty `verified_by` attribute pointing to a `<files_read>` or `<commands_run>` entry. `<unknowns>` is either empty OR each entry maps to a Wave-0 reconnaissance task in this slice. (Failing this is a guaranteed plan-checker reject.)
+ 2. **Frontmatter:** `phase`, `plan`, `type`, `wave`, `depends_on`, `files_modified`, `autonomous`, `requirements`, `must_haves` present and non-empty where required.
+ 3. **Objective:** Single `<objective>` block, names the PLAN-XX requirement it closes, states output explicitly.
+ 4. **Context:** `@path/to/file` references exist in the repo (do a quick `ls` / `Read` round-trip if unsure).
+ 5. **Tasks:** 1-3 tasks, each with `<files>`, `<action>`, `<verify><automated>…</automated></verify>`, `<done>`.
+ 6. **Dependencies:** `depends_on` references plan IDs that exist in the current ROADMAP wave graph.
+ 7. **Verification:** Every `<verify>` has an `<automated>` command. If no test exists yet, the task itself creates it (TDD) or a Wave-0 task does.
+ 8. **Success criteria:** Measurable, not prose-only. "Executes without throwing" > "works correctly".
+ 9. **No forbidden patterns:** No bare `AskUserQuestion` calls (use `node np-tools.cjs askuser --json '{...}'`); no legacy helper-CLI references (all helper calls use `np-tools.cjs`); no `hooks:` / `model:` / `model_profile:` fields in agent frontmatter.
 
  If any check fails, fix before returning. Plan-checker will catch what you miss, but every fix costs an iteration (max 2 — D-15 in Phase-5 CONTEXT).
  </answer_validation>
 
  <task_format>
+ ## Slice Plan Layout (MANDATORY)
+
+ Every `S<NNN>-PLAN.md` MUST open with a `<reality_check>` block ABOVE `<tasks>`. The block records the empirical evidence behind the slice's assumptions. Plan-checker fails any slice plan that omits it, leaves it empty, or whose `<assumption>` entries lack a `verified_by` attribute (`unverified-assumption`, critical).
+
+ Required shape:
+
+ ```
+ <reality_check>
+ <files_read>
+ - composer.lock:1245 (laravel/framework version)
+ - app/Models/User.php:18 (HasRoles trait already mixed in)
+ </files_read>
+ <commands_run>
+ - `php artisan about` → "Laravel Version: 11.31.0"
+ - `composer show spatie/laravel-permission` → "versions : * 6.10.1"
+ </commands_run>
+ <assumptions>
+ <assumption verified_by="composer.lock:1245">Laravel 11.31 is the installed major.minor — plan targets 11.x APIs.</assumption>
+ <assumption verified_by="app/Models/User.php:18">HasRoles trait already present — plan does NOT re-add it.</assumption>
+ <assumption verified_by="cmd:composer show spatie/laravel-permission">spatie/laravel-permission 6.10 is installed — no install task needed.</assumption>
+ </assumptions>
+ <unknowns>
+ <!-- Empty when every assumption is verified. Otherwise list each unresolved item with a reason and a Wave-0 task ID that resolves it. -->
+ </unknowns>
+ </reality_check>
+ ```
+
+ Rules:
+
+ - **`<files_read>`**: every entry is `path:line` or `path:line-line` (a range). Plan-checker re-reads each path and confirms the file exists. Paste the precise line — do not paraphrase.
+ - **`<commands_run>`**: every entry is `` `cmd` → "literal output substring" ``. The substring is what the planner observed. Plan-checker does NOT re-run commands; honesty is enforced by the iter-2 audit trail.
+ - **`<assumptions>`**: every `<assumption>` MUST carry a non-empty `verified_by` attribute whose value is either a `path:line` entry from `<files_read>` or `cmd:<command>`, where `<command>` is already listed in `<commands_run>`. An assumption without `verified_by` is the same as no reality-check.
+ - **`<unknowns>`**: empty in the happy path. If non-empty, the slice MUST contain a Wave-0 reconnaissance task (the first task in the slice) that resolves the unknown before downstream tasks run.
+
+ Reality-check is a planner responsibility, not an executor responsibility. Anything the executor would discover in the first 60 seconds of work belongs in `<reality_check>`.
+
  ## Task XML Format (MANDATORY)
 
  Inside each `S<NNN>-PLAN.md`, every `<task>` tag MUST have these four attributes on the opening tag:
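The Slice Plan Layout rule (a non-empty `<reality_check>` above `<tasks>`, containing all four sub-blocks) can be self-checked before a plan is returned. The sketch below is hypothetical: `checkSliceLayout` is an illustrative name, not a nubos-pilot function, and plain string searches stand in for XML parsing, so it assumes each tag appears at most once in the plan.

```javascript
// Hypothetical pre-flight check of the Slice Plan Layout rule (not part of nubos-pilot).
const SUB_BLOCKS = ['files_read', 'commands_run', 'assumptions', 'unknowns'];

function checkSliceLayout(planText) {
  const open = planText.indexOf('<reality_check>');
  const close = planText.indexOf('</reality_check>');
  if (open === -1 || close === -1 || close < open) {
    return ['missing <reality_check> block'];
  }
  const problems = [];
  // The block must come before <tasks>, per "ABOVE `<tasks>`".
  const tasks = planText.indexOf('<tasks>');
  if (tasks !== -1 && tasks < open) {
    problems.push('<reality_check> must appear above <tasks>');
  }
  const body = planText.slice(open + '<reality_check>'.length, close);
  if (body.trim() === '') {
    problems.push('empty <reality_check> block');
  }
  // Presence only: an empty <unknowns></unknowns> pair is valid in the happy path.
  for (const tag of SUB_BLOCKS) {
    if (!body.includes(`<${tag}>`) || !body.includes(`</${tag}>`)) {
      problems.push(`missing <${tag}> sub-block`);
    }
  }
  return problems;
}
```

This covers only block placement and sub-block presence; whether each `verified_by` points at real evidence is a separate check.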
@@ -69,6 +69,7 @@ Canonical identifiers for findings that `agents/np-plan-checker.md` emits. Start
  - `bare-askuser-call` — workflow MD emits `AskUserQuestion` directly instead of `node np-tools.cjs askuser --json '{…}'` (D-04).
  - `hook-field-present` — agent frontmatter contains `hooks:` (D-10).
  - `forbidden-agent-field` — agent frontmatter contains `model:` or `model_profile:` (D-10).
+ - `unverified-assumption` — a slice plan's `<reality_check>` block is missing, empty, or contains an `<assumption>` without a non-empty `verified_by` attribute, OR a `<files_read>` path does not exist in the repo. Enforces the empirical pre-flight performed by `agents/np-planner.md` so version / interface / command assumptions cannot reach the executor unverified.
 
  Each finding returned by plan-checker carries one of these codes plus an anchor `{file, line}` pair so the planner's revise-mode can address them without re-deriving context.
 
@@ -28,6 +28,7 @@ const CATEGORIES = [
  'bare-askuser-call',
  'hook-field-present',
  'forbidden-agent-field',
+ 'unverified-assumption',
  ];
 
  const REQUIRED_H2 = [
@@ -51,7 +52,7 @@ test('PC-1: loadAgent(plan-checker) returns tier=opus and name=plan-checker', ()
  }
  });
 
- test('PC-2: body contains all 10 canonical finding-category identifiers', () => {
+ test('PC-2: body contains all 11 canonical finding-category identifiers', () => {
  for (const c of CATEGORIES) {
  assert.ok(BODY.includes(c), 'missing canonical category: ' + c);
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "nubos-pilot",
- "version": "0.8.1",
+ "version": "0.8.3",
  "description": "AI-driven planning and execution tool for code projects",
  "homepage": "https://github.com/Nubos-AI/nubos-pilot",
  "repository": {