nubos-pilot 1.0.6 → 1.1.0

@@ -56,6 +56,11 @@ The orchestrator provides these in your prompt context. Read every path it hands
56
56
  - `test` (assertion failed)
57
57
  - `runtime` (uncaught exception inside test or script)
58
58
  - `infra` (missing tool, network, env var) → STOP and emit `## INFRA BLOCKER` block; do not edit source.
59
+ 1a. **MANDATORY knowledge lookup (Rule 9 — non-optional, runs before any Edit).** Pick the failing symbol or error class from Step 1 and run:
60
+ ```bash
61
+ node .nubos-pilot/bin/np-tools.cjs knowledge-search "<failing-symbol-or-error-class>" --limit 5
62
+ ```
63
+ If a hit lives in `.nubos-pilot/codebase/<module>.md`, `Read` that doc before patching. Skipping this step stamps `rule-9-violation` in the Layer-C audit log and the loop routes back to the researcher swarm next round — it is **not** an opt-out.
59
64
  2. **Locate the failure surface** strictly inside `files_modified`. If the failure points outside that set, emit `## SCOPE EXPANSION REQUEST` and stop — do NOT edit out-of-scope files.
60
65
  3. **Propose the smallest patch** that addresses the root cause:
61
66
  - For `compile` / `lint`: edit the offending file directly.
@@ -65,15 +70,15 @@ The orchestrator provides these in your prompt context. Read every path it hands
65
70
  5. **Loop ≤ 3 attempts.** If verify still fails after the third attempt, STOP and write `T<NNNN>-FIX-NOTES.md` describing what was tried, what didn't work, and the suspected root cause. Hand back to executor.
66
71
  6. **On success:** do NOT commit yourself. Hand control back to `np-executor` so the D-03 atomic commit path runs.
67
72
 
68
- ## Knowledge Lookup
73
+ ## Mandatory Knowledge Lookup (Rule 9)
69
74
 
70
- Before guessing at unfamiliar symbols, consult the local index:
75
+ **This is non-optional, not advisory.** Workflow Step 1a runs the lookup before any Edit. Skipping it stamps `rule-9-violation` in the audit log and forces a re-route to the researcher swarm.
71
76
 
72
77
  ```bash
73
78
  node .nubos-pilot/bin/np-tools.cjs knowledge-search "<failing-symbol>" --limit 5
74
79
  ```
75
80
 
76
- If a hit lives in `codebase/<module>.md`, read that doc before patching. Cross-task context belongs in `RULES.md` and `M<NNN>-CONTEXT.md`.
81
+ If a hit lives in `.nubos-pilot/codebase/<module>.md`, `Read` that doc before patching. Cross-task context belongs in `RULES.md` and `M<NNN>-CONTEXT.md`.
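The "read the doc if a hit lives in `.nubos-pilot/codebase/<module>.md`" rule can be sketched as a small path filter. This is only an illustration: the JSON shape that `knowledge-search` prints is not shown in this diff, so the hypothetical helper `codebaseDocsToRead` takes a plain list of hit paths rather than the real result object.

```javascript
// Hypothetical helper: the real knowledge-search output shape is not visible
// in this diff, so the input here is just an array of path strings.
const CODEBASE_DOC = /^\.nubos-pilot\/codebase\/[^/]+\.md$/;

function codebaseDocsToRead(hitPaths) {
  // Normalise Windows separators and a leading "./" before comparing.
  const normalised = hitPaths.map((p) => p.replace(/\\/g, '/').replace(/^\.\//, ''));
  // Drop duplicates, then keep only module docs directly under codebase/.
  return [...new Set(normalised)].filter((p) => CODEBASE_DOC.test(p));
}

console.log(codebaseDocsToRead([
  './.nubos-pilot/codebase/auth.md',
  'src/auth.js',
]));
```

Every path this returns would be a doc to `Read` before the first Edit. Anything else in the hit list is context, not a mandatory read.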
77
82
 
78
83
  ## Handoff Protocol
79
84
 
@@ -65,6 +65,14 @@ If any of the three module files cannot be read, emit `category: critic-error` w
65
65
 
66
66
  ## Output Schema — Verdict-Only Contract (ADR-0010 §L5, 2026-05-05)
67
67
 
68
+ > **ACTION CONTRACT — execute in this exact order:**
69
+ >
70
+ > 1. **Read** the three audit modules (`agents/np-critic-style.md`, `agents/np-critic-tests.md`, `agents/np-critic-acceptance.md`) — see Audit Surface table above. Skipping any → `category: critic-error` + route to `stuck`.
71
+ > 2. **`Write`** the full findings JSON to `<report_path>` (the literal path the orchestrator passes in your spawn prompt). Schema = Step 1 below. This artefact stays on disk; the orchestrator reads it via `--critic-outputs-path`, NOT from your final message.
72
+ > 3. **Emit** ONLY the ~150-byte verdict envelope as your final response — no prose, no markdown fence, no inline findings. Schema = Step 2 below.
73
+ >
74
+ > Inlining the full findings JSON as your final message instead of (3) is the canonical bypass — it replays multi-kB into the orchestrator's context every round and silently undoes ADR-0010 §L5. Don't do it.
75
+
68
76
  You emit your audit in **two artefacts**: the full findings JSON gets `Write`-n to a path the orchestrator hands you, and your spawn's final response is a tiny envelope. This keeps the parent context lean — verbatim multi-kB findings reports were the dominant Nubosloop token sink before this revision.
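A minimal sketch of the two-artefact pattern, under stated assumptions: the envelope fields `verdict` and `findings_count` are hypothetical placeholders (the real schema lives in "Step 2" of the agent doc, which this diff does not show), and `emitAudit` is an illustrative name, not part of the package.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical envelope fields; the real schema is defined elsewhere in the doc.
function emitAudit(reportPath, findings) {
  // Artefact 1: the full findings JSON stays on disk for the orchestrator to read.
  fs.writeFileSync(reportPath, JSON.stringify(findings, null, 2));
  // Artefact 2: only this small envelope is returned as the final response.
  return JSON.stringify({
    verdict: findings.length === 0 ? 'pass' : 'fail',
    findings_count: findings.length,
  });
}

const exampleReport = path.join(os.tmpdir(), 'critic-report-example.json');
const exampleEnvelope = emitAudit(exampleReport, [{ note: 'example finding' }]);
console.log(Buffer.byteLength(exampleEnvelope), 'bytes in the envelope');
```

The point of the split is that the size of the final message no longer grows with the number of findings. Only the file on disk does.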
69
77
 
70
78
  ### Step 1 — write the full report to disk
@@ -111,6 +111,10 @@ into the `task(…)` commit. If `workflow.commit_docs=true`, the
111
111
  Unanswered `expects_reply=true` requests block commit-phase via Layer-B (ADR-0015).
112
112
  3. **Transition to in-progress:** `node np-tools.cjs checkpoint transition <task-id> in-progress`.
113
113
  4. **Edit files** — only the paths listed in the task's `files_modified` frontmatter. Use `Read` + `Edit` / `Write`. No scope expansion.
114
+ 4a. **Boundary check before every Edit/Write.** If the path you are about to touch is NOT in `files_modified`:
115
+ - DO NOT edit it. Not even "just an import line", not even a test fixture, not even a sibling module that "obviously needs the same change".
116
+ - Emit a `## SCOPE EXPANSION REQUEST` block naming the out-of-scope path and the symbol/reason that would have made you touch it.
117
+ - STOP and hand back to the orchestrator. The plan declares scope; if the scope is wrong, that is a **planner-bug**, not an executor-fix. The plan-checker route exists for exactly this case.
114
118
  5. **Transition to verifying:** `node np-tools.cjs checkpoint transition <task-id> verifying`.
115
119
  6. **Run the task-level verification command** from the task frontmatter's `verify`. If it fails, fix within the same `files_modified` scope. If it still fails after 2 attempts, STOP and report.
116
120
  7. **Transition to pre-commit:** `node np-tools.cjs checkpoint transition <task-id> pre-commit`.
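The boundary check in step 4a could look roughly like the sketch below. Only the `## SCOPE EXPANSION REQUEST` heading comes from the doc. The `path:` and `reason:` lines and the function name `scopeCheck` are assumptions made for illustration.

```javascript
const path = require('node:path');

// Returns null when the edit is in scope, otherwise the text of a request block.
// The body layout below the heading is hypothetical.
function scopeCheck(targetPath, filesModified, reason) {
  // Compare on normalised POSIX-style paths so "./src/a.js" equals "src/a.js".
  const norm = (p) => path.posix.normalize(p.replace(/\\/g, '/'));
  if (filesModified.map(norm).includes(norm(targetPath))) return null;
  return [
    '## SCOPE EXPANSION REQUEST',
    `path: ${norm(targetPath)}`,
    `reason: ${reason}`,
  ].join('\n');
}

console.log(scopeCheck('src/b.js', ['src/a.js'], 'import of helper() would need updating'));
```

A non-null result means stop and hand back to the orchestrator. The sketch deliberately offers no override flag, mirroring the "not even an import line" wording.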
@@ -32,7 +32,7 @@ Refusal of any rule is a hard-stop. Surface the violation to the orchestrator ve
32
32
  <required_reading>
33
33
  Before auditing, load:
34
34
 
35
- 1. `templates/VALIDATION.md` — the output skeleton (placeholders: `{N}`, `{milestone-slug}`, `{date}`)
35
+ 1. `templates/VALIDATION.md` — the output skeleton (placeholders use `{{name}}` syntax throughout, e.g. `{{phase_number}}`, `{{phase_slug}}`, `{{created_date}}`, `{{test_framework}}`, `{{quick_run_command}}`, etc.)
36
36
  2. `.nubos-pilot/REQUIREMENTS.md` — filter to the milestone's requirement IDs
37
37
  3. Every `<milestone_dir>/slices/S<NNN>/S<NNN>-PLAN.md` — slice plans with `<task>` blocks
38
38
  4. Every `<milestone_dir>/slices/S<NNN>/S<NNN>-SUMMARY.md` — per-wave outcome
@@ -111,16 +111,21 @@ For UNDER_SAMPLED and UNCOVERED: record the specific missing assertion(s) and re
111
111
  **ALWAYS use the Write tool to create files** — never use `Bash(cat << 'EOF')` or heredoc commands for file creation.
112
112
 
113
113
  1. Read `templates/VALIDATION.md` to obtain the skeleton
114
- 2. Substitute placeholders: `{N}` phase number, `{phase-slug}` phase slug, `{date}` today's ISO date
115
- 3. Append per-requirement scoring sections
114
+ 2. Substitute every `{{placeholder}}` from the template using the values supplied in your input block. Authoritative mapping:
115
+ - `{{phase_number}}` → integer phase/milestone number (no `M` prefix)
116
+ - `{{phase_slug}}` → kebab-case milestone slug
117
+ - `{{created_date}}` → today's ISO date (YYYY-MM-DD)
118
+ - `{{test_framework}}`, `{{test_config_path}}`, `{{quick_run_command}}`, `{{full_suite_command}}`, `{{full_suite_seconds}}`, `{{max_feedback_seconds}}` → derived from the project's actual test setup (read `package.json` / `composer.json` / equivalent + existing test runner config)
119
+ - Table rows (`{{task_full_id}}`, `{{slice_number}}`, `{{wave_number}}`, `{{requirement_number}}`, `{{threat_ref}}`, `{{secure_behavior_or_na}}`, `{{automated_command}}`, `{{manual_*}}`, etc.) → emit one row per requirement / per task you scored
120
+ 3. Append per-requirement scoring sections (Covered / Under-Sampled / Uncovered) after the templated body
116
121
  4. Write the composed file to `validation_path`
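The substitution in step 2 can be sketched as a single regex pass that fails loudly when a `{{placeholder}}` has no value, which is one way to honour the "no placeholders left" requirement. `fillTemplate` is a hypothetical helper, not something the package is known to ship.

```javascript
// Replace every {{name}} token and collect names that have no supplied value.
function fillTemplate(template, values) {
  const missing = new Set();
  const output = template.replace(/\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g, (token, name) => {
    if (Object.prototype.hasOwnProperty.call(values, name)) return String(values[name]);
    missing.add(name);
    return token; // leave the token in place so the gap is easy to spot
  });
  if (missing.size > 0) {
    throw new Error(`unresolved placeholders: ${[...missing].join(', ')}`);
  }
  return output;
}

console.log(fillTemplate('phase: {{phase_number}}\nslug: {{phase_slug}}', {
  phase_number: 3,
  phase_slug: 'auth-hardening',
}));
```

Throwing instead of silently emitting the raw token keeps a half-filled `VALIDATION.md` from reaching `validation_path`.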
117
122
 
118
- Final VALIDATION.md frontmatter (overriding template defaults with audit results):
123
+ Final VALIDATION.md frontmatter (overriding template defaults with audit results — concrete values, no placeholders left):
119
124
 
120
125
  ```yaml
121
126
  ---
122
- phase: {N}
123
- slug: {phase-slug}
127
+ phase: <integer phase number>
128
+ slug: <kebab-case phase slug>
124
129
  audited_at: YYYY-MM-DDTHH:MM:SSZ
125
130
  requirements_total: N
126
131
  covered: N
@@ -376,7 +376,16 @@ Reality-check is a planner responsibility, not an executor responsibility. Anyth
376
376
 
377
377
  Inside each `S<NNN>-PLAN.md`, every `<task>` tag MUST have these four attributes on the opening tag:
378
378
 
379
- - `id="M<NNN>-S<NNN>-T<NNNN>"` — full-id, e.g. `id="M001-S001-T0001"`. Milestone 3 digits, slice 3 digits, task **4 digits**. **Task numbering restarts at `T0001` inside every slice.** The first task of `S002` is `M<NNN>-S002-T0001`, the first task of `S003` is `M<NNN>-S003-T0001`. Tasks within a slice run `T0001, T0002, T0003, …` without gaps. Never continue the counter across slices (`S001-T0001, S002-T0002` is wrong — it must be `S001-T0001, S002-T0001`).
379
+ - `id="M<NNN>-S<NNN>-T<NNNN>"` — full-id, e.g. `id="M001-S001-T0001"`. Milestone 3 digits, slice 3 digits, task **4 digits**. **Task numbering restarts at `T0001` inside every slice.** Tasks within a slice run `T0001, T0002, T0003, …` without gaps.
380
+
381
+ > ⚠️ **COMMON MISTAKE — the slice counter resets, do NOT continue across slices.**
382
+ >
383
+ > | Pattern | Result |
384
+ > |---|---|
385
+ > | ❌ WRONG | `S001-PLAN.md`: T0001, T0002 → `S002-PLAN.md`: **T0003**, T0004 |
386
+ > | ✅ RIGHT | `S001-PLAN.md`: T0001, T0002 → `S002-PLAN.md`: **T0001**, T0002 |
387
+ >
388
+ > The slice number in the task ID is the authoritative wave; the T-number is per-slice. `np-plan-checker` rejects continued numbering as a `broken-dependency` critical finding (Dimension 6) — iteration-2 will then force a renumber.
380
389
  - `depends_on="<id>[,<id>...]"` — comma-separated predecessor task full-ids, or empty string `""`. Must only reference tasks in **earlier slices** (cross-slice forward deps) or be empty (intra-slice tasks are implicitly parallel, never serial).
381
390
  - `wave="<N>"` — integer equal to the slice number. For S001 use `wave="1"`, for S002 use `wave="2"`, etc.
382
391
  - `tier="<haiku|sonnet|opus>"` — executor tier, picks the model via resolve-model.
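The numbering rule for the `id` attribute can be checked mechanically. The sketch below is an assumption-laden illustration, not the logic `np-plan-checker` actually runs: it takes task ids in document order and reports any id whose T-number is not the next one expected for its slice.

```javascript
const TASK_ID = /^M(\d{3})-S(\d{3})-T(\d{4})$/;

// Returns a list of human-readable problems; an empty list means the ids are fine.
function checkTaskNumbering(ids) {
  const problems = [];
  const nextExpected = new Map(); // "M001-S002" -> next task number in that slice
  for (const id of ids) {
    const m = TASK_ID.exec(id);
    if (!m) {
      problems.push(`${id}: not in M<NNN>-S<NNN>-T<NNNN> form`);
      continue;
    }
    const slice = `M${m[1]}-S${m[2]}`;
    const expected = nextExpected.get(slice) ?? 1; // every slice starts at T0001
    if (Number(m[3]) !== expected) {
      problems.push(`${id}: expected T${String(expected).padStart(4, '0')} in ${slice}`);
    }
    nextExpected.set(slice, expected + 1);
  }
  return problems;
}

console.log(checkTaskNumbering(['M001-S001-T0001', 'M001-S001-T0002', 'M001-S002-T0003']));
```

Because the counter is keyed by slice, continuing `T0003` into `S002` is flagged, while `S002` restarting at `T0001` passes.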
@@ -99,24 +99,27 @@ Do NOT use handoffs as a replacement for RESEARCH.md content — they are for si
99
99
 
100
100
  ## Tool Availability Detection
101
101
 
102
- On startup, before doing any research work, probe the web + MCP surface:
103
-
104
- 1. **WebFetch probe** — attempt one HEAD request to a known safe URL (e.g. `about:blank` or `https://example.com/`), 5-second timeout. If the tool is missing or the call raises a tool-not-available error, mark `webfetch_available = false`.
105
- 2. **Context7 probe** — call `mcp__context7__list-libraries` (or the lightest available Context7 method) with empty/minimal args, 5-second timeout. If the MCP tool is missing or raises tool-not-available, mark `context7_available = false`.
106
-
107
- Pseudocode:
108
-
109
- ```text
110
- webfetch_available = try_call(WebFetch, HEAD about:blank, timeout=5s) succeeds
111
- context7_available = try_call(mcp__context7__list-libraries, {}, timeout=5s) succeeds
112
-
113
- if webfetch_available OR context7_available:
114
- proceed with full web + MCP research (normal path)
115
- else:
116
- enter Offline-Confirm Protocol (D-21)
117
- ```
118
-
119
- Actual transport detection is the Phase 7/8 runtime-adapter's concern. This agent only needs to know *whether* the capability is callable. Timeouts are 5s per probe; total startup budget ≤ 10s.
102
+ > **ACTION CONTRACT — runs ONCE at startup, before any research work. Total budget 10s.**
103
+ >
104
+ > Execute EXACTLY these two probes, in order:
105
+ >
106
+ > 1. **WebFetch probe** — call the `WebFetch` tool once with URL `https://example.com/` and a trivial extraction prompt (e.g. `"return the page title"`). Wait ≤ 5s.
107
+ > - Success → set `webfetch_available = true`.
108
+ > - Tool returns `tool-not-available` / `unknown tool` / similar → set `webfetch_available = false`.
109
+ > - Timeout or transport error → set `webfetch_available = false`.
110
+ >
111
+ > 2. **Context7 probe** — call `mcp__plugin_compound-engineering_context7__resolve-library-id` (or the lightest Context7 method available in this runtime) with a minimal query (`{libraryName: "react"}`). Wait ≤ 5s.
112
+ > - Success or empty-result response → set `context7_available = true`.
113
+ > - Tool returns `tool-not-available` / MCP server missing → set `context7_available = false`.
114
+ > - Timeout or transport error → set `context7_available = false`.
115
+ >
116
+ > 3. **Branch:**
117
+ > - `webfetch_available OR context7_available == true` → proceed with full web + MCP research path.
118
+ > - Both `false` → enter Offline-Confirm Protocol (D-21, below).
119
+ >
120
+ > DO NOT skip either probe. DO NOT assume availability from the tool list — tools listed by the harness may still raise `tool-not-available` at call time. The probe IS the contract.
121
+
122
+ Actual transport detection is the Phase 7/8 runtime-adapter's concern. This agent only needs to know *whether* the capability is callable at runtime.
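The probe contract above maps onto a timeout-guarded call plus a simple branch. In this sketch `callTool`, `callWebFetch` and `callContext7` are hypothetical stand-ins for however the runtime invokes the real tools. Nothing here is the package's actual adapter code.

```javascript
// Resolve to true only if the probe call settles successfully within the budget.
function probe(callTool, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs); // timeout -> unavailable
    Promise.resolve()
      .then(callTool)
      .then(() => resolve(true), () => resolve(false)) // any error -> unavailable
      .finally(() => clearTimeout(timer));
  });
}

// Step 3 of the contract: either capability is enough for the online path.
function chooseResearchPath({ webfetchAvailable, context7Available }) {
  return webfetchAvailable || context7Available ? 'online' : 'offline-confirm';
}

async function detectTools(callWebFetch, callContext7) {
  // Probes run one after the other, so the worst case is two full timeouts.
  const webfetchAvailable = await probe(callWebFetch);
  const context7Available = await probe(callContext7);
  return chooseResearchPath({ webfetchAvailable, context7Available });
}

detectTools(
  () => Promise.resolve('page title'),
  () => { throw new Error('tool-not-available'); },
).then((mode) => console.log(mode));
```

Treating thrown errors, rejections and timeouts identically matches the contract's rule that only a successful call counts as available.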
120
123
 
121
124
  ## Offline-Confirm Protocol (D-21)
122
125
 
@@ -4,7 +4,7 @@ const { NubosPilotError } = require('../../lib/core.cjs');
4
4
  const knowledgeAdapter = require('../../lib/knowledge-adapter.cjs');
5
5
  const args = require('./_args.cjs');
6
6
 
7
- function run(argv, ctx) {
7
+ async function run(argv, ctx) {
8
8
  const context = ctx || {};
9
9
  const cwd = context.cwd || process.cwd();
10
10
  const stdout = context.stdout || process.stdout;
@@ -26,7 +26,7 @@ function run(argv, ctx) {
26
26
  if (limit !== undefined) opts.limit = Number(limit);
27
27
 
28
28
  const adapter = knowledgeAdapter.getAdapter(cwd);
29
- const result = adapter.match(query, opts);
29
+ const result = await adapter.match(query, opts);
30
30
  stdout.write(JSON.stringify({
31
31
  adapter: adapter.name,
32
32
  query,