npm - muonroi-cli - Versions diffs - 1.4.1 → 1.5.0 - Mend

muonroi-cli 1.4.1 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (172)

package/LICENSE +21 -21
package/README.md +122 -122
package/dist/packages/agent-harness-core/src/predicate.d.ts +1 -1
package/dist/src/agent-harness/__tests__/mock-model.spec.js +48 -1
package/dist/src/agent-harness/mock-model.d.ts +11 -0
package/dist/src/agent-harness/mock-model.js +21 -0
package/dist/src/cli/cost-forensics.js +12 -12
package/dist/src/council/__tests__/clarification-prompt.test.js +51 -0
package/dist/src/council/__tests__/clarifier-ready-gate.test.js +32 -0
package/dist/src/council/__tests__/decisions-lock.test.js +17 -1
package/dist/src/council/__tests__/oauth-reachable.test.d.ts +1 -0
package/dist/src/council/__tests__/oauth-reachable.test.js +31 -0
package/dist/src/council/__tests__/parse-outcome-fallback.test.js +11 -0
package/dist/src/council/clarifier.js +9 -1
package/dist/src/council/debate.js +5 -1
package/dist/src/council/decisions-lock.js +3 -3
package/dist/src/council/index.js +12 -5
package/dist/src/council/leader.d.ts +0 -17
package/dist/src/council/leader.js +22 -15
package/dist/src/council/planner.js +1 -1
package/dist/src/council/prompts.js +63 -57
package/dist/src/council/types.d.ts +7 -0
package/dist/src/ee/__tests__/ee-onboarding.test.d.ts +1 -0
package/dist/src/ee/__tests__/ee-onboarding.test.js +32 -0
package/dist/src/ee/auth.d.ts +9 -0
package/dist/src/ee/auth.js +19 -0
package/dist/src/ee/ee-onboarding.d.ts +5 -0
package/dist/src/ee/ee-onboarding.js +76 -0
package/dist/src/generated/version.d.ts +1 -1
package/dist/src/generated/version.js +1 -1
package/dist/src/headless/output.js +6 -4
package/dist/src/headless/output.test.js +4 -3
package/dist/src/index.js +20 -1
package/dist/src/mcp/__tests__/auto-setup.test.js +74 -0
package/dist/src/mcp/__tests__/client-pool.spec.d.ts +1 -0
package/dist/src/mcp/__tests__/client-pool.spec.js +98 -0
package/dist/src/mcp/__tests__/parallel-build.spec.d.ts +1 -0
package/dist/src/mcp/__tests__/parallel-build.spec.js +67 -0
package/dist/src/mcp/__tests__/smart-filter.test.js +56 -0
package/dist/src/mcp/auto-setup.js +56 -2
package/dist/src/mcp/client-pool.d.ts +46 -0
package/dist/src/mcp/client-pool.js +212 -0
package/dist/src/mcp/oauth-callback.js +2 -2
package/dist/src/mcp/parse-headers.test.js +14 -14
package/dist/src/mcp/runtime.d.ts +28 -0
package/dist/src/mcp/runtime.js +117 -51
package/dist/src/mcp/self-verify-runner.d.ts +14 -0
package/dist/src/mcp/self-verify-runner.js +38 -0
package/dist/src/mcp/setup-guide-text.d.ts +9 -0
package/dist/src/mcp/setup-guide-text.js +84 -0
package/dist/src/mcp/smart-filter.js +49 -0
package/dist/src/mcp/smoke.test.js +43 -43
package/dist/src/mcp/tools-server.d.ts +7 -0
package/dist/src/mcp/tools-server.js +19 -22
package/dist/src/models/catalog.json +349 -349
package/dist/src/ops/__tests__/doctor-ee-health.test.js +21 -0
package/dist/src/ops/doctor.d.ts +3 -2
package/dist/src/ops/doctor.js +47 -11
package/dist/src/ops/doctor.test.js +4 -3
package/dist/src/orchestrator/__tests__/mcp-capability-block.test.d.ts +1 -0
package/dist/src/orchestrator/__tests__/mcp-capability-block.test.js +39 -0
package/dist/src/orchestrator/__tests__/project-stack.test.d.ts +1 -0
package/dist/src/orchestrator/__tests__/project-stack.test.js +65 -0
package/dist/src/orchestrator/batch-turn-runner.js +7 -11
package/dist/src/orchestrator/message-processor.js +57 -27
package/dist/src/orchestrator/orchestrator.js +26 -0
package/dist/src/orchestrator/prompts.d.ts +51 -0
package/dist/src/orchestrator/prompts.js +257 -134
package/dist/src/orchestrator/scope-ceiling.js +6 -1
package/dist/src/orchestrator/stream-runner.js +20 -15
package/dist/src/orchestrator/text-tool-call-detector.test.js +13 -13
package/dist/src/pil/__tests__/clarity-gate.test.js +24 -215
package/dist/src/pil/__tests__/config.test.js +1 -17
package/dist/src/pil/__tests__/discovery.test.js +144 -11
package/dist/src/pil/__tests__/layer1-intent-trace.test.js +7 -2
package/dist/src/pil/__tests__/layer1-intent.test.js +3 -0
package/dist/src/pil/__tests__/layer16-clarity.test.js +32 -116
package/dist/src/pil/__tests__/layer4-gsd.test.js +37 -0
package/dist/src/pil/__tests__/layer6-output.test.js +137 -18
package/dist/src/pil/__tests__/llm-classify.test.js +49 -2
package/dist/src/pil/agent-operating-contract.d.ts +1 -1
package/dist/src/pil/agent-operating-contract.js +2 -0
package/dist/src/pil/agent-operating-contract.test.js +7 -2
package/dist/src/pil/cheap-model-playbook.js +35 -35
package/dist/src/pil/cheap-model-workbooks.js +16 -13
package/dist/src/pil/clarity-gate.d.ts +21 -19
package/dist/src/pil/clarity-gate.js +26 -153
package/dist/src/pil/config.d.ts +9 -1
package/dist/src/pil/config.js +15 -4
package/dist/src/pil/discovery.js +211 -136
package/dist/src/pil/layer1-intent.d.ts +12 -0
package/dist/src/pil/layer1-intent.js +283 -38
package/dist/src/pil/layer1-intent.test.js +210 -4
package/dist/src/pil/layer16-clarity.d.ts +25 -11
package/dist/src/pil/layer16-clarity.js +19 -306
package/dist/src/pil/layer4-gsd.js +18 -6
package/dist/src/pil/layer6-output.d.ts +2 -0
package/dist/src/pil/layer6-output.js +137 -22
package/dist/src/pil/llm-classify.d.ts +26 -0
package/dist/src/pil/llm-classify.js +34 -5
package/dist/src/pil/native-capabilities-workbook.d.ts +1 -1
package/dist/src/pil/native-capabilities-workbook.js +82 -76
package/dist/src/pil/schema.d.ts +8 -0
package/dist/src/pil/schema.js +12 -1
package/dist/src/pil/task-tier-map.js +4 -0
package/dist/src/pil/types.d.ts +11 -1
package/dist/src/product-loop/done-gate.js +3 -3
package/dist/src/product-loop/loop-driver.js +18 -18
package/dist/src/product-loop/progress-snapshot.js +4 -4
package/dist/src/providers/auth/gemini-oauth.js +6 -15
package/dist/src/providers/auth/grok-oauth.js +6 -15
package/dist/src/providers/auth/openai-oauth.js +6 -15
package/dist/src/providers/mcp-vision-bridge.js +48 -48
package/dist/src/reporter/index.js +1 -1
package/dist/src/scaffold/bb-ecosystem-apply.js +47 -47
package/dist/src/scaffold/bb-quality-gate.js +5 -5
package/dist/src/scaffold/continuation-prompt.js +60 -60
package/dist/src/scaffold/init-new.js +453 -453
package/dist/src/self-qa/__tests__/scenario-planner.test.js +3 -3
package/dist/src/self-qa/agentic-loop.js +24 -19
package/dist/src/self-qa/spec-emitter.js +26 -23
package/dist/src/storage/__tests__/migrations.test.js +2 -2
package/dist/src/storage/interaction-log.js +5 -5
package/dist/src/storage/migrations.js +122 -122
package/dist/src/storage/sessions.js +42 -42
package/dist/src/storage/transcript.js +91 -84
package/dist/src/storage/usage.js +14 -14
package/dist/src/storage/workspaces.js +12 -12
package/dist/src/tools/__tests__/native-tools.test.d.ts +1 -0
package/dist/src/tools/__tests__/native-tools.test.js +53 -0
package/dist/src/tools/git-safety.d.ts +61 -0
package/dist/src/tools/git-safety.js +141 -0
package/dist/src/tools/git-safety.test.d.ts +1 -0
package/dist/src/tools/git-safety.test.js +111 -0
package/dist/src/tools/native-tools.d.ts +31 -0
package/dist/src/tools/native-tools.js +273 -0
package/dist/src/tools/registry-git-safety.test.d.ts +7 -0
package/dist/src/tools/registry-git-safety.test.js +92 -0
package/dist/src/tools/registry.js +39 -4
package/dist/src/ui/__tests__/markdown-render.test.d.ts +1 -0
package/dist/src/ui/__tests__/markdown-render.test.js +48 -0
package/dist/src/ui/app.js +0 -0
package/dist/src/ui/components/message-view.js +4 -1
package/dist/src/ui/components/structured-response-view.js +7 -3
package/dist/src/ui/components/tool-group.js +7 -1
package/dist/src/ui/markdown-render.d.ts +41 -0
package/dist/src/ui/markdown-render.js +223 -0
package/dist/src/ui/markdown.d.ts +10 -0
package/dist/src/ui/markdown.js +12 -35
package/dist/src/ui/slash/council-inspect.js +4 -4
package/dist/src/ui/slash/export.js +4 -4
package/dist/src/ui/utils/text.d.ts +8 -0
package/dist/src/ui/utils/text.js +16 -0
package/dist/src/ui/utils/text.test.d.ts +1 -0
package/dist/src/ui/utils/text.test.js +23 -0
package/dist/src/usage/ledger.js +48 -15
package/dist/src/utils/__tests__/footprint-gitignore.test.d.ts +1 -0
package/dist/src/utils/__tests__/footprint-gitignore.test.js +50 -0
package/dist/src/utils/clipboard-image.js +23 -23
package/dist/src/utils/open-url.d.ts +56 -0
package/dist/src/utils/open-url.js +58 -0
package/dist/src/utils/open-url.test.d.ts +1 -0
package/dist/src/utils/open-url.test.js +86 -0
package/dist/src/utils/settings.d.ts +12 -0
package/dist/src/utils/settings.js +48 -0
package/dist/src/utils/side-question.js +2 -2
package/dist/src/utils/skills.js +3 -3
package/dist/src/verify/__tests__/coverage-parsers.test.js +30 -30
package/dist/src/verify/environment.js +2 -1
package/package.json +1 -1
package/dist/src/pil/layer16-clarity.test.js +0 -31
/package/dist/src/{pil/layer16-clarity.test.d.ts → council/__tests__/clarification-prompt.test.d.ts} +0 -0

package/dist/src/pil/__tests__/layer16-clarity.test.js CHANGED

@@ -1,5 +1,9 @@
 import { describe, expect, it } from "vitest";
-import { buildInterviewQuestion, detectClarityGaps, resolveGapsNonInteractive } from "../layer16-clarity.js";
+import { buildInterviewQuestion, resolveGapsNonInteractive } from "../layer16-clarity.js";
+// Phase 2 (2026-06-16): detectClarityGaps + its keyword option-builders were
+// removed (the model now generates every clarification). The surviving helpers
+// — buildInterviewQuestion (render) and resolveGapsNonInteractive (headless
+// default-answer resolution) — are exercised here with model-shaped gaps.
 const EMPTY_PROJECT = {
     language: "typescript",
     framework: null,
@@ -10,122 +14,10 @@ const EMPTY_PROJECT = {
         { path: "src/billing/", name: "billing", entryFiles: [], exportedSymbols: [] },
     ],
     eePatterns: [],
-    relevantModules: [],
+    relevantModules: [{ path: "src/auth/", relevance: "named in prompt", exists: true }],
     scannedAt: Date.now(),
     cwd: "/proj",
 };
-describe("detectClarityGaps()", () => {
-    it("detects outcome gap for vague non-debug prompt", () => {
-        // PIL-L6 fix — debug now joins the autofill set, so vague debug prompts
-        // ("fix auth") no longer trigger an outcome question. Use a generate
-        // prompt instead to still cover the gap-detection path.
-        const gaps = detectClarityGaps("build something", "generate", 0.7, EMPTY_PROJECT);
-        const outcomeGap = gaps.find((g) => g.dimension === "outcome");
-        expect(outcomeGap).toBeDefined();
-    });
-    it("does NOT detect outcome gap for vague debug prompt (autofilled)", () => {
-        const gaps = detectClarityGaps("fix auth", "debug", 0.7, EMPTY_PROJECT);
-        const outcomeGap = gaps.find((g) => g.dimension === "outcome");
-        expect(outcomeGap).toBeUndefined();
-    });
-    it("does NOT detect an outcome gap for a vague general prompt (B2 intent-swallow guard)", () => {
-        // B2 — a `general` prompt's only outcome options are tautological
-        // ("Task completed" / "Issue resolved"). Asking them lets the default
-        // answer overwrite the user's real request, so the intent collapses to
-        // "general: Task completed" and the original prompt is lost. Skip the
-        // askcard so the outcome falls back to the raw request downstream.
-        const gaps = detectClarityGaps("the project feels messy", "general", 0.7, EMPTY_PROJECT);
-        const outcomeGap = gaps.find((g) => g.dimension === "outcome");
-        expect(outcomeGap).toBeUndefined();
-    });
-    it("detects scope gap when no file reference", () => {
-        const gaps = detectClarityGaps("fix auth", "debug", 0.7, EMPTY_PROJECT);
-        const scopeGap = gaps.find((g) => g.dimension === "scope");
-        expect(scopeGap).toBeDefined();
-    });
-    it("returns no gaps for specific prompt", () => {
-        const gaps = detectClarityGaps("fix TypeError in src/auth/login.ts:42", "debug", 0.9, EMPTY_PROJECT);
-        expect(gaps).toHaveLength(0);
-    });
-    it("scope options include matching bounded contexts", () => {
-        const gaps = detectClarityGaps("fix auth", "debug", 0.7, EMPTY_PROJECT);
-        const scopeGap = gaps.find((g) => g.dimension === "scope");
-        expect(scopeGap?.options.some((o) => o.includes("auth"))).toBe(true);
-    });
-    it("does NOT detect a scope gap for a general prompt with no codebase signal (B2-symmetric scope guard)", () => {
-        // Live drive (session 8a87aa060c6a): the pure non-codebase prompt "Reply
-        // with exactly one word: PONG" fired the scope askcard "Which part of the
-        // codebase should this target?" because countFileReferences /
-        // hasExplicitScope / hasOperationalScope were all empty — the detector
-        // assumes every prompt is a codebase task. A general/unclassified prompt
-        // has no codebase dimension to scope, so the question is nonsensical (and
-        // its acceptance card is downstream noise). Skip it, symmetric to the B2
-        // outcome guard; scope falls back to project-root downstream.
-        const gaps = detectClarityGaps("Reply with exactly one word: PONG", "general", 0.6, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeUndefined();
-        // The only candidate gap was scope → general prompt now yields zero gaps,
-        // so discovery never marks interviewed=true and shows no acceptance card.
-        expect(gaps).toHaveLength(0);
-    });
-    it("STILL detects a scope gap for a classified (non-general) task with no file reference", () => {
-        // Guard must stay narrow: a real code task that simply omitted a path still
-        // benefits from the scope-narrowing askcard. Only general/null is skipped.
-        const gaps = detectClarityGaps("implement the search feature", "generate", 0.7, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeDefined();
-    });
-    it("does NOT detect a scope gap for an image-analysis prompt (image is the scope)", () => {
-        // Live drive (PR#34 probe): "Take a screenshot of the homepage and analyze
-        // the diagram.png image to describe its layout" fired the codebase-scope
-        // askcard "Which part of the codebase should this target?" — nonsensical for
-        // an image-analysis task. The image (screenshot / diagram.png) IS the scope,
-        // symmetric to how operational (CI/build) prompts are scoped to the pipeline.
-        const gaps = detectClarityGaps("Take a screenshot of the homepage and analyze the diagram.png image to describe its layout", "analyze", 0.7, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeUndefined();
-        // analyze autofills outcome, so with scope suppressed there are zero gaps →
-        // no interview, no acceptance card.
-        expect(gaps).toHaveLength(0);
-    });
-    it("STILL detects a scope gap for a code task that mentions an ambiguous non-image word", () => {
-        // Narrowness guard: image-scope suppression must not swallow real codebase
-        // tasks. "add a logo to the header" carries no concrete image signal (no
-        // file extension / screenshot / photo), so the scope askcard stays.
-        const gaps = detectClarityGaps("add a logo to the header", "generate", 0.7, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeDefined();
-    });
-    it("does NOT detect a scope gap for a web-search / external-info prompt", () => {
-        // Live drive (tavily probe, session d7a45a2dba30): "search the web for the
-        // latest vitest release notes" classified taskType=analyze fired the
-        // codebase-scope askcard and recorded a wrong scope ("src/mcp"). A
-        // web-search task is scoped to the web, not the codebase — symmetric to the
-        // image-scope and operational-scope guards.
-        const gaps = detectClarityGaps("search the web for the latest vitest release notes", "analyze", 0.7, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeUndefined();
-        expect(gaps).toHaveLength(0);
-    });
-    it("does NOT detect a scope gap for a self-contained computation prompt (data is inline)", () => {
-        // Live drive (deepseek-vs-grok A/B, session 17fc23f0): "Compute f([3,1,2])
-        // where f sorts the list ascending then returns the sum of the first two
-        // elements." classified taskType=analyze (regex:read matched the bare word
-        // "list", conf 0.80 → skipped the brain) fired BOTH the pil-interview scope
-        // askcard ("Which part of the codebase should this target?" → auto "Entire
-        // project") AND the pil-acceptance card. The operand [3,1,2] is supplied
-        // inline — the task has no codebase dimension to scope. Symmetric to the
-        // image / web / operational scope guards.
-        const gaps = detectClarityGaps("Compute f([3,1,2]) where f sorts the list ascending then returns the sum of the first two elements.", "analyze", 0.8, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeUndefined();
-        // analyze autofills outcome, so with scope suppressed there are zero gaps →
-        // no interview, no acceptance card.
-        expect(gaps).toHaveLength(0);
-    });
-    it("STILL detects a scope gap for a code task that embeds a literal but no compute framing", () => {
-        // Narrowness guard: the inline-literal suppression must not swallow real
-        // codebase tasks. "set the default retry delays to [100, 200, 400] in the
-        // config" carries a literal but is scoped to the codebase (no compute verb),
-        // so the scope askcard stays.
-        const gaps = detectClarityGaps("set the default retry delays to [100, 200, 400] in the config", "generate", 0.7, EMPTY_PROJECT);
-        expect(gaps.find((g) => g.dimension === "scope")).toBeDefined();
-    });
-});
 describe("buildInterviewQuestion()", () => {
     it("builds a CouncilQuestionData with pil-interview phase", () => {
         const gap = {
@@ -141,11 +33,35 @@ describe("buildInterviewQuestion()", () => {
         expect(q.options).toBeDefined();
         expect(q.options.some((o) => o.kind === "freetext")).toBe(true);
     });
+    it("surfaces the model's reason (gap.description) as the askcard context", () => {
+        const gap = {
+            dimension: "outcome",
+            description: "answering this changes whether we add OAuth or just API keys",
+            suggestedQuestion: "Which auth method?",
+            options: ["OAuth", "API keys"],
+            defaultIndex: 0,
+        };
+        const q = buildInterviewQuestion(gap, "q-2");
+        expect(q.context).toBe("answering this changes whether we add OAuth or just API keys");
+    });
 });
 describe("resolveGapsNonInteractive()", () => {
-    it("fills gaps with best-effort from project context", () => {
-        const gaps = detectClarityGaps("fix auth", "debug", 0.7, EMPTY_PROJECT);
+    it("fills gaps with best-effort defaults from the model options + project context", () => {
+        const gaps = [
+            {
+                dimension: "outcome",
+                description: "Model-generated clarification #1",
+                suggestedQuestion: "What outcome do you expect?",
+                options: ["Error resolved", "Other (type free answer)"],
+                defaultIndex: 0,
+            },
+        ];
         const resolved = resolveGapsNonInteractive(gaps, EMPTY_PROJECT, "fix auth");
+        expect(resolved.outcome).toBe("Error resolved");
+        expect(resolved.scope.length).toBeGreaterThan(0);
+    });
+    it("falls back to the raw-derived outcome when there is no outcome gap", () => {
+        const resolved = resolveGapsNonInteractive([], EMPTY_PROJECT, "fix the login bug");
         expect(resolved.outcome).toBeTruthy();
         expect(resolved.scope.length).toBeGreaterThan(0);
     });
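
The updated tests above pin down the headless behaviour: when an outcome gap is present, the resolved outcome is the option at `defaultIndex`; when there is none, the outcome is derived from the raw prompt; and the scope is never empty. A minimal sketch of that contract follows. It is not the package's `resolveGapsNonInteractive` implementation. The name `resolveGapsSketch`, the array-shaped scope, and the fallback to `project.cwd` are assumptions made for illustration.

```javascript
// Hypothetical sketch of headless default-answer resolution.
// NOT the real resolveGapsNonInteractive; names and fallbacks are assumed.
function resolveGapsSketch(gaps, project, raw) {
    const outcomeGap = gaps.find((g) => g.dimension === "outcome");
    // Prefer the model's default option for the outcome gap, if any.
    const picked = outcomeGap?.options?.[outcomeGap.defaultIndex ?? 0];
    // With no outcome gap, fall back to the user's raw request.
    const outcome = picked ?? raw.trim();
    // Assumed scope rule: modules that exist, else the project root.
    const modules = (project.relevantModules ?? [])
        .filter((m) => m.exists)
        .map((m) => m.path);
    const scope = modules.length > 0 ? modules : [project.cwd];
    return { outcome, scope };
}

// Usage with a model-shaped gap like the one in the test above.
const sketchProject = {
    relevantModules: [{ path: "src/auth/", relevance: "named in prompt", exists: true }],
    cwd: "/proj",
};
const sketchGaps = [
    {
        dimension: "outcome",
        suggestedQuestion: "What outcome do you expect?",
        options: ["Error resolved", "Other (type free answer)"],
        defaultIndex: 0,
    },
];
console.log(resolveGapsSketch(sketchGaps, sketchProject, "fix auth"));
console.log(resolveGapsSketch([], sketchProject, "fix the login bug"));
```

Falling back to the raw request rather than a canned label keeps the user's intent intact, which matches the concern recorded in the removed B2 test comment.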

package/dist/src/pil/__tests__/layer4-gsd.test.js CHANGED

@@ -75,6 +75,43 @@ describe("layer4Gsd (gsd-native)", () => {
         const result = await layer4Gsd(makeCtx({ raw: "review the pull request" }));
         expect(["review", "discuss", "execute"]).toContain(result.gsdPhase);
     });
+    it("routes a question-shaped analyze/debug prompt to the QUESTION directive (no 'state a plan')", async () => {
+        // De-robotizing: a plain question must not get the STANDARD "state a 2-3 line
+        // plan" scaffold even when L1 classifies it analyze/debug (not "general").
+        const q = "why does the build fail intermittently?";
+        const result = await layer4Gsd(makeCtx({ raw: q, enriched: q, taskType: "debug", intentKind: "task" }));
+        expect(result.enriched).toContain("QUESTION / explanatory");
+        expect(result.enriched).not.toContain("State a 2-3 line plan");
+    });
+    it("treats a genuine general question (general + task) as informational", async () => {
+        const q = "what does the enrichment layer do?";
+        const result = await layer4Gsd(makeCtx({ raw: q, enriched: q, taskType: "general", intentKind: "task" }));
+        expect(result.enriched).toContain("QUESTION / explanatory");
+    });
+    it("does NOT treat an implementation request as informational even if phrased as a question", async () => {
+        // isImplementationIntent guards the question clause: "can you refactor … and
+        // wire up …" is a real edit task → STANDARD scaffold, not the QUESTION directive.
+        const q = "can you refactor the dropdown and wire up the keyboard handlers?";
+        const result = await layer4Gsd(makeCtx({ raw: q, enriched: q, taskType: "refactor", intentKind: "task" }));
+        expect(result.enriched).not.toContain("QUESTION / explanatory");
+    });
+    it("Phase 2b: deliverableKind='answer' is informational even for an imperative (no '?') prompt", async () => {
+        // The raw text is a plain imperative — the legacy regex (isQuestionLike /
+        // isMetaAnalysisPrompt) would NOT mark it informational. The model's
+        // deliverableKind='answer' must override that and route to the QUESTION
+        // directive — proving L4 consumes the model signal, not the regex.
+        const raw = "go over the auth module and tell me what it does";
+        const result = await layer4Gsd(makeCtx({ raw, enriched: raw, taskType: "analyze", intentKind: "task", deliverableKind: "answer" }));
+        expect(result.enriched).toContain("QUESTION / explanatory");
+    });
+    it("Phase 2b: deliverableKind='code' is NOT informational even for a question-shaped prompt", async () => {
+        // The raw text reads as a question — the legacy regex would mark it
+        // informational. The model's deliverableKind='code' must override that so
+        // the STANDARD implement scaffold is used (the deliverable is file edits).
+        const raw = "why not just refactor the dropdown and wire the keyboard handlers?";
+        const result = await layer4Gsd(makeCtx({ raw, enriched: raw, taskType: "refactor", intentKind: "task", deliverableKind: "code" }));
+        expect(result.enriched).not.toContain("QUESTION / explanatory");
+    });
     it("uses ctx.gsdPhase from L1 (unified path) without calling routeTask", async () => {
         const { routeTask } = await import("../../ee/bridge.js");
         vi.mocked(routeTask).mockClear();
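
The Phase 2b tests above describe a precedence order: the model's `deliverableKind` decides first, and the question/implementation heuristics apply only when it is absent. A rough sketch of that ordering follows. The helpers `looksLikeQuestion` and `looksLikeImplementation` are hypothetical stand-ins for the package's `isQuestionLike` and `isImplementationIntent`, whose real patterns are not shown in this diff, and `isInformationalSketch` is an illustrative name only.

```javascript
// Hypothetical stand-ins; the real regexes in layer6-output.js are not shown here.
const looksLikeQuestion = (raw) =>
    /\?\s*$/.test(raw) || /^(why|what|how|when|where|who)\b/i.test(raw);
const looksLikeImplementation = (raw) =>
    /\b(refactor|implement|wire up|fix|edit)\b/i.test(raw);

// Assumed precedence: model signal first, text heuristics second.
function isInformationalSketch(ctx) {
    if (ctx.deliverableKind === "answer") return true; // model says: explain
    if (ctx.deliverableKind === "code") return false;  // model says: edit files
    // No model signal: a question counts as informational unless it is
    // really an implementation request phrased as a question.
    return looksLikeQuestion(ctx.raw) && !looksLikeImplementation(ctx.raw);
}

// The prompts from the tests above, run through the sketch.
const samples = [
    { raw: "why does the build fail intermittently?" },
    { raw: "can you refactor the dropdown and wire up the keyboard handlers?" },
    { raw: "go over the auth module and tell me what it does", deliverableKind: "answer" },
    { raw: "why not just refactor the dropdown and wire the keyboard handlers?", deliverableKind: "code" },
];
for (const s of samples) {
    console.log(isInformationalSketch(s), "<-", s.raw);
}
```

Checking the model signal before any regex is what lets an imperative prompt route to the QUESTION directive and a question-shaped prompt stay on the STANDARD scaffold, as the two Phase 2b tests require.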

package/dist/src/pil/__tests__/layer6-output.test.js CHANGED

@@ -1,5 +1,5 @@
 import { beforeEach, describe, expect, it, vi } from "vitest";
-import { applyPilSuffix, getResponseToolSet, layer6Output } from "../layer6-output.js";
+import { applyPilSuffix, getResponseToolSet, isImplementationIntent, isQuestionLike, layer6Output, } from "../layer6-output.js";
 // Mock bridge for PIL-03 classifyViaBrain tests
 vi.mock("../../ee/bridge.js", () => ({
     classifyViaBrain: vi.fn().mockResolvedValue(null),
@@ -55,6 +55,38 @@ describe("applyPilSuffix — per-task-type suffixes", () => {
             expect(result).toMatch(/Tôi sẽ/); // bilingual
         }
     });
+    it("de-robotized: NO_PREAMBLE bans only openers, not end-of-turn summary or inter-tool narration", () => {
+        // The summary + inter-tool bans were removed because they stripped natural
+        // connective tissue (the "máy móc" feel). Inter-tool spam is still removed
+        // structurally by stripInterToolNarration() in reasoning.ts. This guards
+        // against the bans silently creeping back into the system prompt.
+        const result = applyPilSuffix("S", makeCtx("debug", "concise"));
+        expect(result).toMatch(/FORBIDDEN OPENERS/);
+        expect(result).not.toMatch(/FORBIDDEN END-OF-TURN SUMMARY/);
+        expect(result).not.toMatch(/FORBIDDEN INTER-TOOL NARRATION/);
+    });
+    it("de-robotized: debug suffix is guidance, not a rigid arrow skeleton", () => {
+        // "Format = Hypothesis → Root cause → Fix → Verify" produced stilted,
+        // label-prefixed answers. It must read as guidance now.
+        const result = applyPilSuffix("S", makeCtx("debug", "concise"));
+        expect(result).toContain("OUTPUT RULES (debug)");
+        expect(result).not.toMatch(/Format = Hypothesis/);
+    });
+    it("E: appends the anti-bookkeeping note on the natural path for non-question turns", () => {
+        // The contract's REPORTING rule leaks as a provenance footer ("evidence only
+        // from this turn") on imperative answer turns; the natural path now guards it.
+        const result = applyPilSuffix("S", makeCtx("analyze", "concise"));
+        expect(result).toMatch(/WRITE FOR THE READER/);
+        expect(result).toMatch(/provenance/i);
+    });
+    it("E: skips the anti-bookkeeping note for question turns (L4 QUESTION directive covers them)", () => {
+        const ctx = { ...makeCtx("analyze", "concise"), raw: "why does the enrichment layer fail?" };
+        expect(applyPilSuffix("S", ctx)).not.toMatch(/WRITE FOR THE READER/);
+    });
+    it("E: response-tools path does not add the natural-path bookkeeping note", () => {
+        const result = applyPilSuffix("S", makeCtx("analyze", "balanced"), true);
+        expect(result).not.toMatch(/WRITE FOR THE READER/);
+    });
     it("PIL-04: response-tools path skips budget+preamble (tool already enforces structure)", () => {
         const result = applyPilSuffix("S", makeCtx("analyze", "balanced"), true);
         expect(result).toContain("respond_analyze");
@@ -74,29 +106,33 @@ describe("applyPilSuffix — per-task-type suffixes", () => {
         expect(result).toMatch(/Do NOT append an evidence-provenance footer/);
     });
 });
-describe("getResponseToolSet — PIL-04 Tier 1.1 gating", () => {
-    it("returns response tool for analyze (list-shaped, JSON wins)", () => {
-        const tools = getResponseToolSet(makeCtx("analyze", null));
+describe("getResponseToolSet — narrow gating (de-robotizing)", () => {
+    // Override raw on a typed ctx so the report/question discriminator is exercised.
+    const ctxRaw = (raw, t) => ({ ...makeCtx(t, null), raw });
+    it("returns response tool for analyze on an explicit report/list request", () => {
+        const tools = getResponseToolSet(ctxRaw("audit the orchestrator and list all cost-leak findings", "analyze"));
         expect(Object.keys(tools)).toContain("respond_analyze");
     });
-    it("returns response tool for plan (list-shaped, JSON wins)", () => {
-        const tools = getResponseToolSet(makeCtx("plan", null));
+    it("returns response tool for plan on an explicit plan request", () => {
+        const tools = getResponseToolSet(ctxRaw("plan the migration to the new auth flow step by step", "plan"));
         expect(Object.keys(tools)).toContain("respond_plan");
     });
+    it("returns response tool for debug only on an explicit report request", () => {
+        const tools = getResponseToolSet(ctxRaw("audit the failing suite and list each root cause", "debug"));
+        expect(Object.keys(tools)).toContain("respond_debug");
+    });
     it("returns empty toolset for generate (code-heavy, markdown wins)", () => {
         expect(getResponseToolSet(makeCtx("generate", null))).toEqual({});
     });
     it("returns empty toolset for refactor (diff-heavy, markdown wins)", () => {
         expect(getResponseToolSet(makeCtx("refactor", null))).toEqual({});
     });
-    it("returns response tool for debug (bounded schema, structural enforcement wins)", () => {
-        const tools = getResponseToolSet(makeCtx("debug", null));
-        expect(Object.keys(tools)).toContain("respond_debug");
-    });
     it("returns empty toolset for documentation (prose-heavy)", () => {
         expect(getResponseToolSet(makeCtx("documentation", null))).toEqual({});
     });
-    it("returns response tool for general when no providerId is passed (back-compat)", () => {
+    it("returns response tool for general regardless of report signal (renders as plain markdown)", () => {
+        // general is exempt from the report/question gate: GeneralSchema is pure text
+        // and its renderer shows plain markdown, so respond_general is never robotic.
         const tools = getResponseToolSet(makeCtx("general", null));
         expect(Object.keys(tools)).toContain("respond_general");
     });
@@ -113,24 +149,78 @@ describe("getResponseToolSet — PIL-04 Tier 1.1 gating", () => {
     it("returns empty toolset when taskType is null", () => {
         expect(getResponseToolSet(makeCtx(null, null))).toEqual({});
     });
+    it("gates the response tool for chitchat turns", () => {
+        const ctx = { ...makeCtx("general", null), intentKind: "chitchat" };
+        expect(getResponseToolSet(ctx)).toEqual({});
+    });
+    it("DROPS respond_<task> for question-style debug/analyze/plan (natural markdown path)", () => {
+        // The de-robotizing change: a plain QUESTION must not be forced into the
+        // rigid respond_* schema + labeled renderer. It falls through to the softened
+        // markdown OUTPUT RULES so the answer reads as natural prose.
+        expect(getResponseToolSet(ctxRaw("why does the build fail intermittently?", "debug"))).toEqual({});
+        expect(getResponseToolSet(ctxRaw("analyze how the enrichment function works", "analyze"))).toEqual({});
+        expect(getResponseToolSet(ctxRaw("what is the cleanest way to structure this module?", "plan"))).toEqual({});
+    });
+    it("KEEPS respond_<task> for explicit report / list / plan requests (EN + VI)", () => {
+        const keep = (raw, t) => Object.keys(getResponseToolSet(ctxRaw(raw, t)));
+        expect(keep("list all cost leaks in the orchestrator", "analyze")).toContain("respond_analyze");
+        expect(keep("review the module and report each finding by severity", "analyze")).toContain("respond_analyze");
+        expect(keep("lập kế hoạch migration sang auth flow mới", "plan")).toContain("respond_plan");
+    });
+    it("DROPS respond_<task> for a QUESTION that merely mentions plan/list (narrow-gate fix)", () => {
+        // Live bug (grok interview): a question that QUOTED the phrase "state a 2-3
+        // line plan" matched the bare word 'plan' in STRUCTURED_REPORT_RE and forced
+        // respond_plan, cramming an introspective answer into a rigid plan schema. A
+        // question-shaped prompt must stay on the natural markdown path even when it
+        // contains plan/list words.
+        expect(getResponseToolSet(ctxRaw("what rules constrain you, e.g. the 'state a 2-3 line plan' directive?", "plan"))).toEqual({});
+        expect(getResponseToolSet(ctxRaw("can you list the main points?", "analyze"))).toEqual({});
+        expect(getResponseToolSet(ctxRaw("how would you plan the rollout?", "plan"))).toEqual({});
+        // Imperative delivery requests are NOT question-shaped → still structured.
+        expect(Object.keys(getResponseToolSet(ctxRaw("plan the rollout step by step", "plan")))).toContain("respond_plan");
+    });
     it("drops respond_<task> on an IMPLEMENTATION-intent prompt (no premature terminal answer)", () => {
         // Live (grok session 19fa8895c41c): an "Improve … implement these fixes"
         // prompt classified `debug` got respond_debug; the model called it mid-task
         // as a plan and the turn ended before the edits completed. Implementation
         // turns must fall through to markdown OUTPUT RULES, not a terminal tool.
+        // Implementation intent takes precedence over a report signal.
         const impl = (raw, t) => ({ ...makeCtx(t, null), raw });
         expect(getResponseToolSet(impl("Improve the story-list screen. Implement these prioritized fixes: …", "debug"))).toEqual({});
         expect(getResponseToolSet(impl("Edit ONLY these two files and fix the empty span", "debug"))).toEqual({});
         expect(getResponseToolSet(impl("refactor the genre dropdown and wire up keyboard handlers", "analyze"))).toEqual({});
         expect(getResponseToolSet(impl("triển khai các cải tiến đã đề xuất", "plan"))).toEqual({});
     });
-    it("KEEPS respond_<task> for pure analysis/plan prompts (narrowness guard)", () => {
-        // The deliverable here IS a structured report — must not be suppressed.
-        const ana = (raw, t) => ({ ...makeCtx(t, null), raw });
-        expect(Object.keys(getResponseToolSet(ana("analyze the orchestrator for cost leaks", "analyze")))).toContain("respond_analyze");
-        expect(Object.keys(getResponseToolSet(ana("why does the build fail intermittently?", "debug")))).toContain("respond_debug");
-        expect(Object.keys(getResponseToolSet(ana("plan the migration to the new auth flow", "plan")))).toContain("respond_plan");
-        expect(Object.keys(getResponseToolSet(ana("review the auth module and explain the design", "analyze")))).toContain("respond_analyze");
+});
+describe("getResponseToolSet — Phase 2b deliverableKind consume (model overrides regex)", () => {
+    const ctxD = (raw, t, deliverableKind) => ({
+        ...makeCtx(t, null),
+        raw,
+        deliverableKind,
+    });
+    it("deliverableKind='code' DROPS respond_* even when the prompt reads as a report/list", () => {
+        // Legacy regex (prefersStructuredReport) would KEEP the tool on "list all …".
+        // The model said the deliverable is code → drop it (edits, not a report).
+        expect(getResponseToolSet(ctxD("list all cost leaks in the orchestrator", "analyze", "code"))).toEqual({});
+    });
+    it("deliverableKind='report' KEEPS respond_* even when the prompt is question-shaped", () => {
+        // Legacy regex (isQuestionLike) would DROP the tool on "why does …?". The
+        // model said the deliverable is a structured report → keep it.
+        const tools = getResponseToolSet(ctxD("why does the suite fail — break it down by cause", "analyze", "report"));
+        expect(Object.keys(tools)).toContain("respond_analyze");
+    });
+    it("deliverableKind='answer' DROPS respond_* for non-general even on a report-shaped request", () => {
+        expect(getResponseToolSet(ctxD("plan the migration step by step", "plan", "answer"))).toEqual({});
+    });
+    it("deliverableKind='answer' KEEPS respond_general (general is exempt — renders as plain markdown)", () => {
+        const tools = getResponseToolSet(ctxD("what does the enrichment layer do?", "general", "answer"));
+        expect(Object.keys(tools)).toContain("respond_general");
+    });
+    it("falls back to the legacy regex when deliverableKind is absent (null)", () => {
+        // No model signal → legacy path: question-shaped analyze drops the tool.
+        expect(getResponseToolSet({ ...makeCtx("analyze", null), raw: "why does the build fail?" })).toEqual({});
+        // …and an explicit report request keeps it.
+        expect(Object.keys(getResponseToolSet({ ...makeCtx("analyze", null), raw: "list all cost leaks" }))).toContain("respond_analyze");
     });
 });
 describe("applyPilSuffix — outputStyle variants", () => {
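The tests in this hunk imply a precedence order for the response-tool gate: a model-supplied `deliverableKind` wins, and the legacy regex checks apply only when it is absent. The following is a minimal sketch of that order, not the package's actual `getResponseToolSet`. The function name `pickResponseTools`, the three regexes, and the tool descriptor shape are hypothetical, and whether the model signal also outranks implementation intent is an assumption the diff does not confirm.

```javascript
// Hypothetical sketch of the gate order implied by the tests above.
// The regexes are illustrative stand-ins, not the package's real patterns.
const IMPLEMENT_RE = /\b(implement|fix|edit|refactor|wire up)\b/i;
const QUESTION_RE = /\?\s*$|^(why|what|how|can|could|does|is|are)\b/i;
const REPORT_RE = /\b(list|report|plan|step by step)\b/i;

function pickResponseTools(ctx) {
  const name = `respond_${ctx.taskType}`;
  const keep = { [name]: { description: `structured ${ctx.taskType} answer` } };
  const drop = {};

  // 1. Model signal (assumed to take precedence over every regex).
  if (ctx.deliverableKind === "code") return drop;
  if (ctx.deliverableKind === "report") return keep;
  if (ctx.deliverableKind === "answer") {
    // "general" is exempt: its tool renders as plain markdown.
    return ctx.taskType === "general" ? keep : drop;
  }

  // 2. Legacy fallback when deliverableKind is null or undefined.
  if (IMPLEMENT_RE.test(ctx.raw)) return drop; // implementation beats report
  if (QUESTION_RE.test(ctx.raw)) return drop;  // questions stay on markdown
  return REPORT_RE.test(ctx.raw) ? keep : drop; // assumed default: drop
}

// Usage: the same prompt flips once the model says the deliverable is code.
const prompt = "list all cost leaks in the orchestrator";
console.log(Object.keys(pickResponseTools({ taskType: "analyze", raw: prompt, deliverableKind: null })));
console.log(Object.keys(pickResponseTools({ taskType: "analyze", raw: prompt, deliverableKind: "code" })));
```

Checking the question shape before the report words is what keeps a prompt such as "can you list the main points?" off the structured path in this sketch.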
@@ -294,4 +384,33 @@ describe("layer6Output", () => {
         expect(result.enriched).toBe(ctx.enriched);
     });
 });
+describe("isQuestionLike — Vietnamese yes/no question frames (regression: session f6f7881a5fae)", () => {
+    it("detects the live miss: 'check ... dùng được mcp ... không nhé'", () => {
+        // The exact prompt that was mis-routed to the implement/verify scaffold.
+        expect(isQuestionLike("bạn check xem dùng được mcp muonroi-docs không nhé")).toBe(true);
+        // It is NOT an implementation intent, so layer4-gsd's informational gate fires.
+        expect(isImplementationIntent("bạn check xem dùng được mcp muonroi-docs không nhé")).toBe(false);
+    });
+    it("detects common VI yes/no tails", () => {
+        expect(isQuestionLike("dùng được không")).toBe(true);
+        expect(isQuestionLike("cái này chạy được không vậy")).toBe(true);
+        expect(isQuestionLike("đúng không")).toBe(true);
+        expect(isQuestionLike("phải không nhỉ")).toBe(true);
+        expect(isQuestionLike("test đã pass chưa")).toBe(true);
+        expect(isQuestionLike("xong chưa ạ")).toBe(true);
+        expect(isQuestionLike("có chạy được không?")).toBe(true);
+    });
+    it("does NOT treat a mid-sentence negation as a question", () => {
+        // "không là hỏng" = "or it breaks" — 'không' is not the clause-final particle.
+        expect(isQuestionLike("đừng commit file .env không là lộ key")).toBe(false);
+        // Plain imperative with a 'nhé' softener (no 'không'/'chưa' tail) stays a task.
+        expect(isQuestionLike("sửa giúp tôi cái này nhé")).toBe(false);
+        expect(isQuestionLike("triển khai tính năng login")).toBe(false);
+    });
+    it("still detects the pre-existing EN/VI question shapes", () => {
+        expect(isQuestionLike("why does the build fail?")).toBe(true);
+        expect(isQuestionLike("tại sao build lỗi")).toBe(true);
+        expect(isQuestionLike("explain the pipeline")).toBe(true);
+    });
+});
 //# sourceMappingURL=layer6-output.test.js.map
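The Vietnamese cases above hinge on whether `không` or `chưa` sits at the end of the clause, optionally followed by a softener such as `nhé`. A minimal sketch of that one check follows. It is not the package's `isQuestionLike`: the helper name `hasViYesNoTail` and the softener list are assumptions drawn only from the test strings, and the English and `tại sao` shapes are deliberately left out.

```javascript
// Hypothetical sketch: detect a clause-final Vietnamese yes/no particle.
// Pattern and input are both normalized to NFC so composed and decomposed
// diacritics compare equal.
const VI_TAIL_SOURCE = "(?:^|\\s)(?:không|chưa)(?:\\s+(?:nhé|nhỉ|vậy|ạ))?\\s*\\??$";
const VI_TAIL_RE = new RegExp(VI_TAIL_SOURCE.normalize("NFC"), "u");

function hasViYesNoTail(text) {
  // Trim first so the `$` anchor sees the last real character.
  const normalized = text.normalize("NFC").trim().toLowerCase();
  return VI_TAIL_RE.test(normalized);
}

// Usage: the particle must close the clause, so a mid-sentence "không" fails.
console.log(hasViYesNoTail("bạn check xem dùng được mcp muonroi-docs không nhé"));
console.log(hasViYesNoTail("đừng commit file .env không là lộ key"));
```

Anchoring on the end of the string is the design choice that separates the question particle from the mid-sentence conjunction use of `không`.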

package/dist/src/pil/__tests__/llm-classify.test.js CHANGED

@@ -23,6 +23,32 @@ describe("createLlmClassifier (PIL Layer 1 Pass 4)", () => {
         expect(result?.outputStyle).toBe("concise");
         expect(result?.confidence).toBeGreaterThan(0.5);
     });
+    it("parses the three-word reply and marks chitchat from the intent word", async () => {
+        const handle = installMockModel({ fixture: { stream: textOnlyStream("general,concise,chat") } });
+        cleanup = handle.uninstall;
+        const factory = (() => handle.model);
+        const classify = createLlmClassifier(factory, "deepseek-v4-flash");
+        const result = await classify("cảm ơn bạn nhé");
+        expect(result?.taskType).toBe("general");
+        expect(result?.intentKind).toBe("chitchat");
+    });
+    it("treats a general QUESTION as task, not chitchat (keep-tools)", async () => {
+        const handle = installMockModel({ fixture: { stream: textOnlyStream("general,concise,task") } });
+        cleanup = handle.uninstall;
+        const factory = (() => handle.model);
+        const classify = createLlmClassifier(factory, "deepseek-v4-flash");
+        const result = await classify("bạn thử call tool setup_guide xem được không");
+        expect(result?.intentKind).toBe("task");
+    });
+    it("defaults intentKind to task when the model omits the third word (backward compatible)", async () => {
+        const handle = installMockModel({ fixture: { stream: textOnlyStream("debug,concise") } });
+        cleanup = handle.uninstall;
+        const factory = (() => handle.model);
+        const classify = createLlmClassifier(factory, "deepseek-v4-flash");
+        const result = await classify("fix the failing build");
+        expect(result?.taskType).toBe("debug");
+        expect(result?.intentKind).toBe("task");
+    });
     it("returns null when the reply cannot be parsed", async () => {
         const handle = installMockModel({ fixture: { stream: textOnlyStream("¯\\_(ツ)_/¯") } });
         cleanup = handle.uninstall;
@@ -100,14 +126,35 @@ describe("createLlmClassifier (PIL Layer 1 Pass 4)", () => {
         expect(result?.taskType).toBe("debug");
         expect(result?.outputStyle).toBe("concise");
     });
-    it("keeps the tiny 16-token budget for non-reasoning models", async () => {
+    it("keeps a tiny output budget for non-reasoning models (24 — four comma words)", async () => {
         const handle = installMockModel({ fixture: { stream: textOnlyStream("generate,concise") } });
         cleanup = handle.uninstall;
         const factory = (() => handle.model);
         const classify = createLlmClassifier(factory, "Qwen/Qwen3-8B"); // reasoning:false
         await classify("add a new endpoint");
         const call = handle.calls[0];
-        expect(call.maxOutputTokens).toBe(16);
+        expect(call.maxOutputTokens).toBe(24);
+    });
+    it("parses the fourth word as the output deliverable (Phase 2b)", async () => {
+        const handle = installMockModel({ fixture: { stream: textOnlyStream("debug,concise,task,code") } });
+        cleanup = handle.uninstall;
+        const factory = (() => handle.model);
+        const classify = createLlmClassifier(factory, "deepseek-v4-flash");
+        const result = await classify("fix the crash in src/auth/login.ts");
+        expect(result?.taskType).toBe("debug");
+        expect(result?.deliverableKind).toBe("code");
+    });
+    it("recovers the deliverable position-independently and defaults to null when absent", async () => {
+        const reportHandle = installMockModel({ fixture: { stream: textOnlyStream("analyze,concise,task,report") } });
+        cleanup = reportHandle.uninstall;
+        const reportClassify = createLlmClassifier((() => reportHandle.model), "deepseek-v4-flash");
+        expect((await reportClassify("list every env var the CLI reads"))?.deliverableKind).toBe("report");
+        reportHandle.uninstall();
+        // Model omits the 4th word → deliverableKind null (consumers fall back to regex).
+        const bareHandle = installMockModel({ fixture: { stream: textOnlyStream("debug,concise") } });
+        cleanup = bareHandle.uninstall;
+        const bareClassify = createLlmClassifier((() => bareHandle.model), "deepseek-v4-flash");
+        expect((await bareClassify("fix it"))?.deliverableKind).toBeNull();
     });
 });
 //# sourceMappingURL=llm-classify.test.js.map
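The classifier tests describe a comma-separated reply of up to four words (task type, output style, intent, deliverable) in which missing words fall back to defaults. The sketch below shows one way such a reply could be parsed position-independently. The real parser in `llm-classify.js` is not shown in this diff, so `parseClassifierReply` and the vocabulary lists are assumptions built only from values that appear in the tests.

```javascript
// Hypothetical vocabularies; only words seen in the tests are grounded.
const TASK_TYPES = ["general", "debug", "analyze", "plan", "generate"];
const OUTPUT_STYLES = ["concise"];
const INTENT_WORDS = new Map([["chat", "chitchat"], ["task", "task"]]);
const DELIVERABLES = ["code", "report", "answer"];

function parseClassifierReply(reply) {
  const words = reply
    .toLowerCase()
    .split(",")
    .map((w) => w.trim())
    .filter(Boolean);

  // Each slot is matched by vocabulary, not by position in the reply.
  const taskType = words.find((w) => TASK_TYPES.includes(w));
  const outputStyle = words.find((w) => OUTPUT_STYLES.includes(w));
  if (!taskType || !outputStyle) return null; // unparseable reply

  const intentWord = words.find((w) => INTENT_WORDS.has(w));
  const deliverable = words.find((w) => DELIVERABLES.includes(w));
  return {
    taskType,
    outputStyle,
    // Older two-word replies default to "task" (backward compatible).
    intentKind: intentWord ? INTENT_WORDS.get(intentWord) : "task",
    // null tells consumers to fall back to their regex heuristics.
    deliverableKind: deliverable ?? null,
  };
}

// Usage: a full four-word reply and a legacy two-word reply.
console.log(parseClassifierReply("debug,concise,task,code"));
console.log(parseClassifierReply("debug,concise"));
```

Matching by vocabulary instead of index only works while the four word sets stay disjoint, which holds for the values assumed here.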

package/dist/src/pil/agent-operating-contract.d.ts CHANGED

@@ -37,7 +37,7 @@
  * one imperative line targeting that phase's most damaging failure mode. Kept
  * tight (primacy matters more than detail; tokens are the cost).
  */
-export declare const AGENT_OPERATING_CONTRACT = "[AGENT OPERATING CONTRACT \u2014 read first; applies to every step]\n\n1. BEFORE ACTING: do only what was asked. Never assume scope or facts \u2014 if ambiguous, ask or use defaults; never invent requirements.\n2. READING: base statements on what you read/ran THIS turn. Do not infer contents of files you did not open.\n3. EXECUTING: smallest correct change; never widen scope or mask failures (no `|| true`, skipped tests, or swallowed catch).\n4. WHEN UNSURE: verify and cross-check BEFORE concluding. Bugs need a reproduction; reading code is not proof.\n5. REPORTING: answer ONLY what was asked. Every fact or file:line MUST come from this turn; else label \"unverified\"; do not guess. Synthesize evidence gracefully \u2014 do NOT dump massive verbatim tool outputs into the final answer. Cite concise file:line references. Never claim a build/test ran, or describe edits, you did not actually do this turn; if a check can't run, fix it or say so \u2014 don't imply success.\n\n6. LANGUAGE: Reply in user's detected language for final output. Internal reasoning, tools, and code remain in English.\n\n7. ANTI-M\u00D9 / COMPACTION: After seeing \"[pre-compaction warning at step...\" or \"[context compacted at step...\", decide if you need full prior tool results. Emit PRESERVE_FULL_CONTEXT for full veto this turn, or the lighter KEEP_TOOL_IDS: id1,id2 (ids from prior stub \"(id=...)\") to protect only high-value results (read_file/grep on src/PLAN/error etc are auto-protected). Use the ee_query tool with \"tool-artifact id=XXX\" for on-demand full re-hydrate of elided ones. Self-check \"task finished?\" / \"compacted yet?\". Use EE checkpoints.\n\n[END CONTRACT \u2014 instructions follow]";
+export declare const AGENT_OPERATING_CONTRACT = "[AGENT OPERATING CONTRACT \u2014 read first; applies to every step]\n\n1. BEFORE ACTING: do only what was asked. Never assume scope or facts \u2014 if ambiguous, ask or use defaults; never invent requirements.\n2. READING: base statements on what you read/ran THIS turn. Do not infer contents of files you did not open.\n3. EXECUTING: smallest correct change; never widen scope or mask failures (no `|| true`, skipped tests, or swallowed catch).\n4. WHEN UNSURE: verify and cross-check BEFORE concluding. Bugs need a reproduction; reading code is not proof.\n5. REPORTING: answer ONLY what was asked. Every fact or file:line MUST come from this turn; else label \"unverified\"; do not guess. Synthesize evidence gracefully \u2014 do NOT dump massive verbatim tool outputs into the final answer. Cite concise file:line references. Never claim a build/test ran, or describe edits, you did not actually do this turn; if a check can't run, fix it or say so \u2014 don't imply success.\n\n6. LANGUAGE: Reply in user's detected language for final output. Internal reasoning, tools, and code remain in English.\n\n7. ANTI-M\u00D9 / COMPACTION: After seeing \"[pre-compaction warning at step...\" or \"[context compacted at step...\", decide if you need full prior tool results. Emit PRESERVE_FULL_CONTEXT for full veto this turn, or the lighter KEEP_TOOL_IDS: id1,id2 (ids from prior stub \"(id=...)\") to protect only high-value results (read_file/grep on src/PLAN/error etc are auto-protected). Use the ee_query tool with \"tool-artifact id=XXX\" for on-demand full re-hydrate of elided ones. Self-check \"task finished?\" / \"compacted yet?\". Use EE checkpoints.\n\n8. GIT SAFETY: never push on red \u2014 run the check, await its result in a SEPARATE step, confirm 0 failures, then push. Never `git add -A`/`commit -a`; stage explicitly so secrets (.env, .muonroi-cli/, keys) aren't committed. Never `--no-verify`.\n\n[END CONTRACT \u2014 instructions follow]";
 export interface ContractSectionOptions {
     /** Chitchat turns carry no tools and make no factual claims — skip the contract. */
     chitchat?: boolean;

package/dist/src/pil/agent-operating-contract.js CHANGED

@@ -49,6 +49,8 @@ export const AGENT_OPERATING_CONTRACT = `[AGENT OPERATING CONTRACT — read firs
 7. ANTI-MÙ / COMPACTION: After seeing "[pre-compaction warning at step..." or "[context compacted at step...", decide if you need full prior tool results. Emit PRESERVE_FULL_CONTEXT for full veto this turn, or the lighter KEEP_TOOL_IDS: id1,id2 (ids from prior stub "(id=...)") to protect only high-value results (read_file/grep on src/PLAN/error etc are auto-protected). Use the ee_query tool with "tool-artifact id=XXX" for on-demand full re-hydrate of elided ones. Self-check "task finished?" / "compacted yet?". Use EE checkpoints.
+8. GIT SAFETY: never push on red — run the check, await its result in a SEPARATE step, confirm 0 failures, then push. Never \`git add -A\`/\`commit -a\`; stage explicitly so secrets (.env, .muonroi-cli/, keys) aren't committed. Never \`--no-verify\`.
 [END CONTRACT — instructions follow]`;
 /**
  * Build the contract block for insertion at the front of the system prompt.

package/dist/src/pil/agent-operating-contract.test.js CHANGED

@@ -42,8 +42,13 @@ describe("AGENT_OPERATING_CONTRACT", () => {
         expect(AGENT_OPERATING_CONTRACT).toMatch(/AGENT OPERATING CONTRACT/i);
         expect(AGENT_OPERATING_CONTRACT).toMatch(/END CONTRACT/i);
     });
-    it("stays compact (under 1800 chars) to preserve attention budget on every turn (anti-mù section added)", () => {
-        expect(AGENT_OPERATING_CONTRACT.length).toBeLessThan(1800);
+    it("carries the git-safety rule (never push on red; no broad git add of secrets)", () => {
+        expect(AGENT_OPERATING_CONTRACT).toMatch(/GIT SAFETY/i);
+        expect(AGENT_OPERATING_CONTRACT).toMatch(/push on red|never push/i);
+        expect(AGENT_OPERATING_CONTRACT).toMatch(/git add -A|stage explicitly/i);
+    });
+    it("stays compact (under 1900 chars) to preserve attention budget on every turn (git-safety rule added)", () => {
+        expect(AGENT_OPERATING_CONTRACT.length).toBeLessThan(1900);
     });
 });
 describe("buildContractSection", () => {

package/dist/src/pil/cheap-model-playbook.js CHANGED

@@ -25,41 +25,41 @@
  * Wrapped with the `[CRITICAL TOOL-USE RULES ...]` marker so the model knows
  * to treat these as overrides to anything that follows.
  */
-export const CHEAP_MODEL_PLAYBOOK = `[CRITICAL TOOL-USE RULES — read before invoking any tool; these override defaults that follow]
-1. Bash output is AUTOMATICALLY cached. Every \`bash\` call returns a \`run_id\`
-   (e.g. \`bash-1\`) you can re-query via \`bash_output_get(run_id, mode=tail|head|grep|lines)\`.
-   - When you want only the last N lines: do NOT pipe \`| tail -N\`. Run the
-     bare command, then call \`bash_output_get(run_id, mode=tail, lines=N)\`.
-   - Same for \`| head\`, \`| grep PATTERN\`, \`> file\`. Pipes/redirects HIDE
-     the full output from the cache; \`bash_output_get\` reads from the cache
-     without re-running.
-   - This applies to EVERY bash call, not just retries.
-   - To VIEW a file use \`read_file\` (start_line/end_line) — never sed/cat a
-     file. \`bash_output_get\` is for COMMAND output, not files.
-2. Before reading more than 3 files to understand a topic, delegate to
-   \`task(agent="explore")\`. The sub-agent returns a compressed summary;
-   you save reading tokens.
-3. Use the \`grep\` tool (ripgrep) for content search — NOT \`bash\` with
-   \`grep\` / \`find\` piped.
-4. When a tool returns \`ERROR: ...\`, do NOT retry the identical call.
-   Pick a different tool, change inputs meaningfully, or stop and report.
-5. Fix the ROOT CAUSE, never mask a failure to make it "pass"
-   (\`continue-on-error\`, swallowed try/catch, skipped/deleted test, \`|| true\`).
-   If a step fails from a missing secret/config, make it CONDITIONAL (skip when
-   absent) so it still runs when present — do NOT blanket-ignore it.
-6. For a build / CI / test failure, read the ACTUAL failure log or stack trace
-   BEFORE hypothesizing — fix the real error, not a guess from source alone.
-7. ANTI-MÙ / COMPACTION (for long sessions): On pre-warn or "[context compacted at step...", emit PRESERVE_FULL_CONTEXT (full veto) or lighter KEEP_TOOL_IDS: id1,id2 (from stub id=) to protect specific high-value results. read_file/grep/lsp/bash on src/PLAN/error are auto-kept (idea 1). Use ee.query tool with "tool-artifact id=XXX" for on-demand full. Self-check "task finished?" / "compacted yet?". Use EE checkpoints.
-[END CRITICAL TOOL-USE RULES — your regular instructions begin below]
+export const CHEAP_MODEL_PLAYBOOK = `[CRITICAL TOOL-USE RULES — read before invoking any tool; these override defaults that follow]
+1. Bash output is AUTOMATICALLY cached. Every \`bash\` call returns a \`run_id\`
+   (e.g. \`bash-1\`) you can re-query via \`bash_output_get(run_id, mode=tail|head|grep|lines)\`.
+   - When you want only the last N lines: do NOT pipe \`| tail -N\`. Run the
+     bare command, then call \`bash_output_get(run_id, mode=tail, lines=N)\`.
+   - Same for \`| head\`, \`| grep PATTERN\`, \`> file\`. Pipes/redirects HIDE
+     the full output from the cache; \`bash_output_get\` reads from the cache
+     without re-running.
+   - This applies to EVERY bash call, not just retries.
+   - To VIEW a file use \`read_file\` (start_line/end_line) — never sed/cat a
+     file. \`bash_output_get\` is for COMMAND output, not files.
+2. Before reading more than 3 files to understand a topic, delegate to
+   \`task(agent="explore")\`. The sub-agent returns a compressed summary;
+   you save reading tokens.
+3. Use the \`grep\` tool (ripgrep) for content search — NOT \`bash\` with
+   \`grep\` / \`find\` piped.
+4. When a tool returns \`ERROR: ...\`, do NOT retry the identical call.
+   Pick a different tool, change inputs meaningfully, or stop and report.
+5. Fix the ROOT CAUSE, never mask a failure to make it "pass"
+   (\`continue-on-error\`, swallowed try/catch, skipped/deleted test, \`|| true\`).
+   If a step fails from a missing secret/config, make it CONDITIONAL (skip when
+   absent) so it still runs when present — do NOT blanket-ignore it.
+6. For a build / CI / test failure, read the ACTUAL failure log or stack trace
+   BEFORE hypothesizing — fix the real error, not a guess from source alone.
+7. ANTI-MÙ / COMPACTION (for long sessions): On pre-warn or "[context compacted at step...", emit PRESERVE_FULL_CONTEXT (full veto) or lighter KEEP_TOOL_IDS: id1,id2 (from stub id=) to protect specific high-value results. read_file/grep/lsp/bash on src/PLAN/error are auto-kept (idea 1). Use ee.query tool with "tool-artifact id=XXX" for on-demand full. Self-check "task finished?" / "compacted yet?". Use EE checkpoints.
+[END CRITICAL TOOL-USE RULES — your regular instructions begin below]
 `;
 /**
  * Predicate gating playbook injection.