npm - @ngockhoale/ukit - Versions diffs - 1.5.2 → 1.5.5 - Mend

@ngockhoale/ukit 1.5.2 → 1.5.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33) hide show

package/CHANGELOG.md +57 -5
package/README.md +2 -2
package/manifests/platform.full.yaml +2 -2
package/package.json +1 -1
package/src/cli/commands/doctor.js +14 -2
package/src/cli/commands/install.js +2 -2
package/src/cli/commands/uninstall.js +1 -1
package/src/index/taskRouting.js +117 -1
package/templates/.claude/agents/bug-debugger.md +48 -19
package/templates/.claude/agents/code-reviewer.md +86 -0
package/templates/.claude/agents/feature-implementer.md +59 -18
package/templates/.claude/hooks/skill-router.sh +1 -1
package/templates/.claude/hooks/verification-guard.sh +1 -1
package/templates/.claude/skills/next-step/SKILL.md +1 -1
package/templates/.claude/ukit/index/post-edit-verify.mjs +3 -2
package/templates/.claude/ukit/index/route-task.mjs +8 -4
package/templates/.claude/ukit/runtime/output-compression.mjs +37 -1
package/templates/AGENTS.md +7 -0
package/templates/CLAUDE.md +8 -1
package/templates/docs/AI_HANDOFF/ACTIVE.md +9 -0
package/templates/docs/AI_HANDOFF/HISTORY.md +4 -0
package/templates/docs/AI_HANDOFF/INDEX.md +13 -0
package/templates/docs/AI_HANDOFF/PLAN.md +75 -0
package/templates/docs/AI_HANDOFF/RULES.md +127 -0
package/templates/docs/AI_HANDOFF/archive/.gitkeep +0 -0
package/templates/docs/AI_HANDOFF/tasks/.gitkeep +0 -0
package/templates/docs/AI_HANDOFF/tasks/_TEMPLATE.md +72 -0
package/templates/docs/INSTALL.md +2 -2
package/templates/docs/PROJECT.md +1 -1
package/templates/docs/UKIT_USAGE_GUIDE.md +1 -1
package/templates/docs/WORKLOG.md +11 -0
package/templates/ukit/storage/config.json +93 -1
package/templates/docs/AI_HANDOFF.md +0 -118

package/CHANGELOG.md CHANGED Viewed

@@ -2,23 +2,75 @@
 All notable changes to UKit are documented here.
+## 1.5.5 - 2026-05-30
+### Added
+- **Handoff Quality Gate** — opt-in, tool-agnostic, file-based, 4-phase pipeline (Idea+Plan → Create Tasks → Implement+Test → Review+Test) for tasks routed through `docs/AI_HANDOFF/`. Daily ad-hoc prompts are NOT affected and continue with the existing lightweight workflow.
+- New agent `templates/.claude/agents/code-reviewer.md`: independent reviewer with a model-isolation check. Refuses to review when `EXECUTOR_MODEL` equals reviewer's own model, when either is missing/unknown, or when re-run of Verification Commands fails. Emits a structured `## Reviewer Verdict` block.
+- New config block `handoff.*` in `templates/ukit/storage/config.json` with Vietnamese `_help` entries: `handoff.plan.requireTestPlan`, `handoff.executor.testFirstRequired`, `handoff.reviewer.model` (default `unic-smart`), `handoff.reviewer.blockOnCritical`, etc. The `*.modelHint` fields are explicitly documented as *labels for humans, not enforced selectors* — model enforcement happens via executor self-report + reviewer refusal.
+- New `## 4. Test Plan (REQUIRED — TDD-style)` section in `templates/docs/AI_HANDOFF/PLAN.md`.
+- New `templates/docs/AI_HANDOFF/tasks/_TEMPLATE.md`: every task file MUST embed its own `## Test Cases` (table with happy + ≥1 edge case + regression for bug fixes) + `## Test Files` + `## Verification Commands` + `## Acceptance Criteria`. Tasks without these are `needs_breakdown`, not `ready`.
+- `## Discussion` section pattern on task files: structured AI-to-AI comment thread (date · role · tool/model) so planner/executor/reviewer can talk back to each other through files — no out-of-band channels.
+- New task status states in `INDEX.md`: `pending_review`, `changes_requested`, `critical_block`, `approved`, `approved_minor`, plus Owner/Reviewer columns.
+### Changed
+- `templates/.claude/agents/feature-implementer.md`: now operates in two explicit modes. **Daily mode** (DEFAULT): unchanged lightweight flow — tests only when touched code has coverage, no reviewer trigger. **Handoff mode** (when task lives under `docs/AI_HANDOFF/tasks/`): test-first → green → reviewer; cannot claim `STATUS: DONE` without fresh PASS in turn. Report format now includes `EXECUTOR_TOOL`, `EXECUTOR_MODEL`, `EXECUTOR_SUBAGENT` self-report.
+- `templates/.claude/agents/bug-debugger.md`: same two-mode split. Handoff mode requires regression-test-first (RED before fix, GREEN after); Daily mode unchanged.
+- `templates/docs/AI_HANDOFF/RULES.md`: rewritten around the 4-phase model. Added `## Hard rule — All work stays in docs/AI_HANDOFF/` (no out-of-band AI communication). Added Phase 2 TDD-embedded requirement. Auto-compact 80-line rule scoped to state files (ACTIVE/INDEX/tasks); PLAN.md and RULES.md exempt.
+- `templates/CLAUDE.md` + `templates/AGENTS.md`: added scoped `## Handoff Quality Gate` section with explicit OPT-IN scope language so adapter targets (Claude Code, Kilo Code, Codex, OpenCode, future tools) only apply Quality Gate to handoff work, not daily prompts.
+### Fixed
+- Fixed output-history deduplication for `promptCache: false` — the second call with semantically equivalent (but noisy) output now returns the cached first summary instead of recomputing. Root cause: `normalizeOutputSummaryForDedupe` wasn't stripping `FAIL`/`PASS`/`Test Files`/`Tests`/`Duration`/`Start at` lines from the dedup key, so slight token-budget differences between two similar payloads produced different keys and broke cache hits. Added `findOutputHistoryEntry` lookup in `main()` to serve cached summaries without prompt-cache.
+### Why
+User had two real concerns: (1) cheap-model executors (e.g. Kilo Code) miss small things — fixed by mandatory test-first + independent reviewer with different model; (2) UKit cannot dictate which model any external tool uses — fixed by making the contract self-report-based (executor writes `EXECUTOR_MODEL`, reviewer compares and refuses on match/unknown). The Quality Gate is opt-in via the `docs/AI_HANDOFF/` folder so daily ad-hoc work is unaffected.
+## 1.5.4 - 2026-05-28
+### Fixed
+- Fixed stale `docs/AI_HANDOFF.md` references in next-step skill, INSTALL.md, UKIT_USAGE_GUIDE.md, PROJECT.md (template + local).
+- Fixed mirror `route-task.mjs` not showing `handoff=` in summary — wrong intent order (docsSpecific before handoff) + missing handoffFile field + missing display segment.
+- Added `WORKLOG_ARCHIVE.md` to `.gitignore`.
+### Changed
+- Compacted `docs/MEMORY.md` from 118KB/298 lines to 7.8KB/78 lines. Old entries archived to `docs/MEMORY_ARCHIVE.md`.
+- Cleared handoff task queue after all cycle 002 tasks resolved.
+## 1.5.3 - 2026-05-28
+### Changed
+- Restructured `docs/AI_HANDOFF.md` into `docs/AI_HANDOFF/` folder: ACTIVE.md (snapshot), RULES.md (flow + token budget), PLAN.md (brainstorm), INDEX.md (task index), tasks/ (per-task files), archive/ (completed cycles, max 3).
+- Each task is now an isolated file in `tasks/TASK-xxx.md` — AI reads only the task it implements, not the entire backlog.
+- Added token budget rule: combined handoff reads must stay under 200 lines per request.
+- Added auto-compact rule: if any handoff file exceeds 80 lines, trigger `clear handoff`.
+- Updated template mirror `templates/docs/AI_HANDOFF/` to match new folder structure.
+- Updated `taskRouting.js` handoffFile target from `docs/AI_HANDOFF.md` to `docs/AI_HANDOFF/ACTIVE.md`.
+- Updated `install.js`, `doctor.js`, `uninstall.js` to check for `docs/AI_HANDOFF/ACTIVE.md`.
 ## 1.5.2 - 2026-05-28
 ### Added
 - Added first-class `ukit:handoff` prompt detection and routing for explicit handoff/brainstorm-to-task flows.
-- Added `intentMode: handoff` to route summaries so hooks and helpers can prioritize `docs/AI_HANDOFF.md` automatically.
-- Added a reusable generic `docs/AI_HANDOFF.md` handoff template for installed projects.
+- Added `intentMode: handoff` to route summaries so hooks and helpers can prioritize handoff file automatically.
+- Added a reusable generic handoff template for installed projects.
 ### Changed
-- Handoff prompts now route through `docs-quality` skill instead of generic `update-status`, making `docs/AI_HANDOFF.md` the primary coordination artifact.
-- Updated installed `next-step` skill guidance to read `docs/AI_HANDOFF.md` first for explicit handoff prompts.
+- Handoff prompts now route through `docs-quality` skill instead of generic `update-status`.
+- Updated installed `next-step` skill guidance to read handoff file first for explicit handoff prompts.
 - Updated `route-task` mirror and hook behavior in both source and installed artifact so `ukit install` users get consistent handoff routing.
 ### Fixed
-- Handoff authoring is now advisory (exit 0) in Safe Patch, so large `docs/AI_HANDOFF.md` batches no longer block the handoff workflow.
+- Handoff authoring is now advisory (exit 0) in Safe Patch, so large handoff batches no longer block the handoff workflow.
 - Hard runtime/shared-risk file broad rewrites still block by default unless `advisoryOnly=true` or `UKIT_SAFE_PATCH_ADVISORY=1` is set.
 ### Tests

package/README.md CHANGED Viewed

@@ -32,7 +32,7 @@ ukit install
 4. Fill in the generated docs baseline:
    - `docs/PROJECT.md`
    - `docs/MEMORY.md`
-   - `docs/AI_HANDOFF.md`
+   - `docs/AI_HANDOFF/`
    - `docs/WORKLOG.md`
 5. Open your AI tool and work in natural language.
@@ -90,7 +90,7 @@ UKit v1.3.1 keeps the same shared runtime contract while adding Safe Patch Proto
 - install globally with `npm install -g @ngockhoale/ukit`
 - keep using the exact same human workflow inside projects: `ukit install`
 - preserve the same `ukit` binary, hooks, and install-first orchestration while standardizing the runtime root as hidden `.ukit/`
-- install `docs/AI_HANDOFF.md` as the default cross-AI handoff file for plan → task breakdown → implementation continuity, with explicit sections so one AI can plan, another can refine tasks, and another can implement them reliably
+- install `docs/AI_HANDOFF/` as the cross-AI handoff folder with per-task isolation: ACTIVE.md (snapshot), INDEX.md (task index), tasks/ (one file per task) so each AI reads only the task it needs, with token budget rules in RULES.md
 - auto-route open-ended “what next?” / “continue” prompts to the `next-step` skill with a visible freshness cue when status may be stale
 - auto-route explicit handoff/wrap-up requests to the `update-status` skill while skipping trivial/no-state-change tasks
 - keep concrete debug/implementation/review prompts primary, so project status never replaces source/index-first task work

package/manifests/platform.full.yaml CHANGED Viewed

@@ -145,8 +145,8 @@ items:
   - id: docs-ai-handoff
     type: config
-    sourceTemplate: docs/AI_HANDOFF.md
-    targetPath: docs/AI_HANDOFF.md
+    sourceTemplate: docs/AI_HANDOFF
+    targetPath: docs/AI_HANDOFF
     requires:
       - docs-project
       - docs-memory

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@ngockhoale/ukit",
-  "version": "1.5.2",
+  "version": "1.5.5",
   "description": "Install/update an index-first AI workspace for Claude Code, Antigravity, OpenAI Codex, and OpenCode.",
   "license": "MIT",
   "type": "module",

package/src/cli/commands/doctor.js CHANGED Viewed

@@ -1,4 +1,5 @@
 import path from 'node:path';
+import fs from 'node:fs/promises';
 import { pathExists, readJsonIfExists } from '../../core/fileOps.js';
 import { buildPathConfig } from '../../core/paths.js';
 import { buildRuntimePaths } from '../../core/runtimePaths.js';
@@ -45,6 +46,15 @@ export async function runDoctor({ packageRoot, projectRoot, argv = [] }) {
     : [];
   const codexAdapterTracked = trackedPaths.some((entry) => entry.startsWith('.codex/'));
+  const WORKLOG_MAX_LINES = 600;
+  let worklogLineCount = 0;
+  try {
+    const content = await fs.readFile(path.join(projectRoot, 'docs', 'WORKLOG.md'), 'utf8');
+    worklogLineCount = content.split('\n').length;
+  } catch {
+    // file may not exist
+  }
   const checks = {
     manifestLoaded: Boolean(manifest?.name),
     templatesDirExists: await pathExists(pathConfig.templatesRoot),
@@ -62,8 +72,9 @@ export async function runDoctor({ packageRoot, projectRoot, argv = [] }) {
     sessionMemoryDirExists: await pathExists(runtimePaths.sessionsDir),
     docsProjectExists: await pathExists(path.join(projectRoot, 'docs', 'PROJECT.md')),
     docsMemoryExists: await pathExists(path.join(projectRoot, 'docs', 'MEMORY.md')),
-    docsAiHandoffExists: await pathExists(path.join(projectRoot, 'docs', 'AI_HANDOFF.md')),
+    docsAiHandoffExists: await pathExists(path.join(projectRoot, 'docs', 'AI_HANDOFF', 'ACTIVE.md')),
     docsWorklogExists: await pathExists(path.join(projectRoot, 'docs', 'WORKLOG.md')),
+    docsWorklogUnderBudget: worklogLineCount <= WORKLOG_MAX_LINES,
     allProvidersConfigured: providers.allSupported,
     ...(codexAdapterTracked
       ? {
@@ -104,8 +115,9 @@ export async function runDoctor({ packageRoot, projectRoot, argv = [] }) {
   console.log(`[UKit]   ${ok(checks.sessionMemoryDirExists)}  .ukit/storage/memory/sessions/`);
   console.log(`[UKit]   ${ok(checks.docsProjectExists)}  docs/PROJECT.md`);
   console.log(`[UKit]   ${ok(checks.docsMemoryExists)}  docs/MEMORY.md`);
-  console.log(`[UKit]   ${ok(checks.docsAiHandoffExists)}  docs/AI_HANDOFF.md`);
+  console.log(`[UKit]   ${ok(checks.docsAiHandoffExists)}  docs/AI_HANDOFF/`);
   console.log(`[UKit]   ${ok(checks.docsWorklogExists)}  docs/WORKLOG.md`);
+  console.log(`[UKit]   ${ok(checks.docsWorklogUnderBudget)}  docs/WORKLOG.md under budget (${worklogLineCount}/${WORKLOG_MAX_LINES} lines)`);
   if (codexAdapterTracked) {
     console.log(`[UKit]   ${ok(checks.codexReadmeExists)}  .codex/README.md`);
     console.log(`[UKit]   ${ok(checks.codexSettingsExists)}  .codex/settings.json`);

package/src/cli/commands/install.js CHANGED Viewed

@@ -239,7 +239,7 @@ export async function runInstall({ packageRoot, projectRoot, packageVersion, arg
   const docsLabels = [
     'docs/PROJECT.md',
     'docs/MEMORY.md',
-    'docs/AI_HANDOFF.md',
+    'docs/AI_HANDOFF/ACTIVE.md',
     'docs/WORKLOG.md',
   ];
@@ -254,7 +254,7 @@ export async function runInstall({ packageRoot, projectRoot, packageVersion, arg
   if (missingDocs.length > 0) {
     console.log(`[UKit] Missing docs — fill these in before first use: ${missingDocs.join(', ')}`);
   } else {
-    console.log('[UKit] Docs baseline ready: docs/PROJECT.md, docs/MEMORY.md, docs/AI_HANDOFF.md, docs/WORKLOG.md');
+    console.log('[UKit] Docs baseline ready: docs/PROJECT.md, docs/MEMORY.md, docs/AI_HANDOFF/, docs/WORKLOG.md');
     console.log('[UKit] Fill them once with real project context for the best results.');
   }

package/src/cli/commands/uninstall.js CHANGED Viewed

@@ -47,5 +47,5 @@ export async function runUninstall({ projectRoot, argv = [] }) {
   }
   console.log(`[UKit] Uninstall complete. Removed ${result.removed}/${result.attempted} managed paths.`);
-  console.log('[UKit] Note: docs/PROJECT.md, docs/MEMORY.md, docs/AI_HANDOFF.md, docs/WORKLOG.md contain user content and were preserved. Delete manually if needed.');
+  console.log('[UKit] Note: docs/PROJECT.md, docs/MEMORY.md, docs/AI_HANDOFF/, docs/WORKLOG.md contain user content and were preserved. Delete manually if needed.');
 }

package/src/index/taskRouting.js CHANGED Viewed

@@ -139,6 +139,10 @@ export async function deriveTaskRoute({
     contextRecommendation,
     verificationRecommendation,
   });
+  const handoffBudget = intentMode === 'handoff'
+    ? await checkHandoffBudget(absoluteRoot)
+    : null;
+  const worklogBudget = await checkWorklogBudget(absoluteRoot);
   const routeSummary = buildRouteSummary({
     activeSkills,
     routingContext: {
@@ -157,6 +161,8 @@ export async function deriveTaskRoute({
     contextRecommendation,
     verificationRecommendation,
     nextAction,
+    handoffBudget,
+    worklogBudget,
   });
   const approachSelector = routeSummary?.approachSelector ?? null;
@@ -180,6 +186,8 @@ export async function deriveTaskRoute({
     verificationRecommendation,
     nextAction,
     routeSummary,
+    handoffBudget,
+    worklogBudget,
     ...(degradedWarnings.length > 0 ? { degradedWarnings } : {}),
   };
 }
@@ -190,6 +198,8 @@ export function buildRouteSummary({
   contextRecommendation = null,
   verificationRecommendation = null,
   nextAction = null,
+  handoffBudget = null,
+  worklogBudget = null,
 } = {}) {
   const autonomyLevel = routingContext.autonomyLevel ?? 'balanced';
   const delegationRecommendation = deriveDelegationRecommendation({
@@ -243,7 +253,7 @@ export function buildRouteSummary({
       ),
   );
   const nextActionCommand = compactHelperLane ? null : nextAction?.command ?? null;
-  const handoffFile = routingContext.intentMode === 'handoff' ? 'docs/AI_HANDOFF.md' : null;
+  const handoffFile = routingContext.intentMode === 'handoff' ? 'docs/AI_HANDOFF/ACTIVE.md' : null;
   const summaryLine = [
     routingContext.taskType ? `task=${routingContext.taskType}` : null,
     handoffFile ? `handoff=${handoffFile}` : null,
@@ -253,6 +263,8 @@ export function buildRouteSummary({
     editGuardHint ? `editGuard=${editGuardHint}` : null,
     delegationRecommendation?.hint ? `delegate=${delegationRecommendation.hint}` : null,
     policyMode ? `policy=${policyMode}` : null,
+    handoffBudget?.warning ? `budget=${handoffBudget.warning}` : null,
+    worklogBudget?.warning ? `budget=${worklogBudget.warning}` : null,
   ].filter(Boolean).join(' | ');
   return {
@@ -270,6 +282,8 @@ export function buildRouteSummary({
     continuationState,
     intentMode: routingContext.intentMode ?? null,
     handoffFile,
+    handoffBudget,
+    worklogBudget,
     delegateHint: delegationRecommendation?.hint ?? null,
     nextActionType: nextAction?.type ?? null,
     nextActionCommand,
@@ -1480,6 +1494,108 @@ function unique(values) {
   return [...new Set(values.filter(Boolean))];
 }
+const HANDOFF_BUDGET_MAX_LINES = 200;
+const HANDOFF_FILE_MAX_LINES = 80;
+const WORKLOG_BUDGET_MAX_LINES = 600;
+const WORKLOG_BUDGET_MAX_ENTRIES = 30;
+export async function checkHandoffBudget(rootDir) {
+  const handoffDir = path.join(rootDir, 'docs', 'AI_HANDOFF');
+  const checkFiles = ['ACTIVE.md', 'INDEX.md'];
+  let totalLines = 0;
+  const fileDetails = [];
+  for (const fileName of checkFiles) {
+    const filePath = path.join(handoffDir, fileName);
+    try {
+      const content = await fs.readFile(filePath, 'utf8');
+      const lines = content.split('\n').length;
+      totalLines += lines;
+      fileDetails.push({ file: fileName, lines });
+    } catch {
+      // File may not exist yet — not an error
+    }
+  }
+  // Also count task files
+  const tasksDir = path.join(handoffDir, 'tasks');
+  try {
+    const taskFiles = await fs.readdir(tasksDir);
+    for (const taskFile of taskFiles) {
+      if (!taskFile.endsWith('.md')) continue;
+      const content = await fs.readFile(path.join(tasksDir, taskFile), 'utf8');
+      const lines = content.split('\n').length;
+      totalLines += lines;
+      fileDetails.push({ file: `tasks/${taskFile}`, lines });
+    }
+  } catch {
+    // tasks dir may not exist
+  }
+  const oversizedFiles = fileDetails.filter((f) => f.lines > HANDOFF_FILE_MAX_LINES);
+  const overBudget = totalLines > HANDOFF_BUDGET_MAX_LINES;
+  let warning = null;
+  if (overBudget) {
+    warning = 'over-budget';
+  } else if (oversizedFiles.length > 0) {
+    warning = 'file-oversized';
+  }
+  return {
+    totalLines,
+    maxLines: HANDOFF_BUDGET_MAX_LINES,
+    fileDetails,
+    oversizedFiles: oversizedFiles.map((f) => f.file),
+    overBudget,
+    warning,
+    action: warning ? 'clear-handoff' : null,
+  };
+}
+export async function checkWorklogBudget(rootDir) {
+  const worklogPath = path.join(rootDir, 'docs', 'WORKLOG.md');
+  try {
+    const content = await fs.readFile(worklogPath, 'utf8');
+    const lines = content.split('\n').length;
+    const entryMatches = content.match(/^## \d{4}-\d{2}-\d{2}/gm) ?? [];
+    const entryCount = entryMatches.length;
+    const overLineBudget = lines > WORKLOG_BUDGET_MAX_LINES;
+    const overEntryBudget = entryCount > WORKLOG_BUDGET_MAX_ENTRIES;
+    const overBudget = overLineBudget || overEntryBudget;
+    let warning = null;
+    if (overLineBudget && overEntryBudget) {
+      warning = 'worklog-lines-and-entries-over';
+    } else if (overLineBudget) {
+      warning = 'worklog-lines-over';
+    } else if (overEntryBudget) {
+      warning = 'worklog-entries-over';
+    }
+    return {
+      lines,
+      maxLines: WORKLOG_BUDGET_MAX_LINES,
+      entryCount,
+      maxEntries: WORKLOG_BUDGET_MAX_ENTRIES,
+      overBudget,
+      warning,
+      action: warning ? 'compact-worklog' : null,
+    };
+  } catch {
+    return {
+      lines: 0,
+      maxLines: WORKLOG_BUDGET_MAX_LINES,
+      entryCount: 0,
+      maxEntries: WORKLOG_BUDGET_MAX_ENTRIES,
+      overBudget: false,
+      warning: null,
+      action: null,
+    };
+  }
+}
 const DELEGATABLE_IMPLEMENTATION_SKILL_IDS = new Set([
   'delivery',
   'frontend',

package/templates/.claude/agents/bug-debugger.md CHANGED Viewed

@@ -8,50 +8,79 @@ tools: ["Read", "Grep", "Glob", "Bash", "Edit", "TodoWrite"]
 Systematic debugging — understand before fixing.
+**Two modes:**
+- **Daily/ad-hoc** (DEFAULT): bug not coming from `docs/AI_HANDOFF/` → reproduce → fix → verify (no mandatory regression test if no pre-existing coverage; original lightweight flow).
+- **Handoff mode**: bug task lives in `docs/AI_HANDOFF/tasks/TASK-xxx.md` → activate Quality Gate: regression-test-first → green → reviewer.
 ## Workflow
 ### 1. Reproduce (required)
-- Run the failing command/action
-- Capture exact error message and stack trace
-- If not reproducible → document conditions and ask user
+- Run the failing command/action.
+- Capture exact error message and stack trace.
+- If not reproducible → document conditions and ask user.
 ### 2. Trace Root Cause
-- Read error location and surrounding code
-- Trace data flow: input → processing → failure point
-- Identify: is this a logic error, state error, or integration error?
+- Read error location and surrounding code.
+- Trace data flow: input → processing → failure point.
+- Identify: logic / state / integration error?
+### 3. Regression Test First (RED) — Handoff mode
+- Write a regression test that reproduces the bug as a failing test.
+- Run it: must FAIL with the original error/signature.
+- If you truly cannot write a regression test (pure UI glitch, env-only issue), document why and attach a manual repro script.
+- **Daily mode**: write a regression test only if the file already has tests; otherwise rely on the original repro command for verification.
-### 3. Fix
+### 4. Fix (GREEN)
-- Apply smallest reliable fix at the root cause
-- Do NOT patch symptoms — fix the cause
+- Apply smallest reliable fix at the root cause.
+- Do NOT patch symptoms — fix the cause.
+- Re-run the regression test: must PASS.
-### 4. Verify
+### 5. Verify
-- Re-run the original failing command → must pass
-- Run related tests: `yarn test [relevant-file]`
-- Check no regression in adjacent functionality
+- Re-run the original failing command → must pass.
+- Run related tests: `yarn test [relevant-file]`.
+- If shared code touched, run wider suite.
+- Check no regression in adjacent functionality.
-### 5. Report
+### 6. Report
 ```
 STATUS: DONE | BLOCKED | PARTIAL
+EXECUTOR_TOOL: [claude-code | kilo-code | codex | opencode | other]
+EXECUTOR_MODEL: [exact model name you are running as. "unknown" if you cannot tell.]
+EXECUTOR_SUBAGENT: [subagent name within your host, if any, else "-"]
 SUMMARY: [1-2 sentences — root cause and fix]
 ROOT_CAUSE: [what caused the bug]
+REGRESSION_TEST:
+  file: [path]
+  red_before: [exact error captured]
+  green_after: [pass output line]
 FILES_CHANGED:
   - [file path]: [what changed]
-VERIFIED: [original failing command now passes + test output]
+VERIFICATION:
+  command: [exact command]
+  result: [N pass / M fail / exit code]
 ISSUES: [any remaining risks or edge cases, or "none"]
-NEXT: [follow-up needed, or "nothing — bug fixed"]
+HANDOFF_TO_REVIEWER: yes | no — reason
+NEXT: [follow-up needed, or "ready for review"]
 ```
+### 7. Trigger Reviewer — Handoff mode ONLY
+Daily mode: skip. Handoff mode: set task status `pending_review` in `INDEX.md`; a reviewer session (model from `handoff.reviewer.model`, MUST differ from this debugger's model) will pick it up.
 ## Rules
-- Don't patch blindly — confirm root cause with evidence
+- **Iron law (Handoff mode):** no `DONE` without (a) regression test passing and (b) original failing command passing, both in this turn.
+- **Daily mode:** original — original failing command must pass; regression test optional unless prior coverage exists.
+- Don't patch blindly — confirm root cause with evidence.
 - For bug triage, use graduated doc budget:
   - obvious/simple bug: `docs/MEMORY.md` only
   - non-trivial bug: `docs/MEMORY.md` + `docs/PROJECT.md` + `docs/CODE_MAP.md`
   - read `docs/WORKLOG.md` only recent relevant entries
-- Keep fix scope minimal — no drive-by refactors
-- If root cause is unclear after 5 minutes of tracing → ask user for more context
+- Keep fix scope minimal — no drive-by refactors.
+- If root cause is unclear after 5 minutes of tracing → ask user for more context.

package/templates/.claude/agents/code-reviewer.md ADDED Viewed

@@ -0,0 +1,86 @@
+---
+name: code-reviewer
+description: "Independent reviewer for handoff Phase 3. Use after executor reports STATUS: DONE on a handoff task. MUST run with a model different from the executor (configured in .ukit/storage/config.json → handoff.reviewer.model, default unic-smart). Produces a verdict: APPROVED | APPROVED-WITH-MINOR | CHANGES-REQUESTED | CRITICAL."
+model: inherit
+color: yellow
+tools: ["Read", "Grep", "Glob", "Bash"]
+---
+You are the independent reviewer for UKit's handoff Quality Gate. Your model is configured in `.ukit/storage/config.json` → `handoff.reviewer.model` and MUST differ from the executor's model. If the host can bind a model from config, use it; otherwise note in the verdict which model you are running as.
+**Do not invent issues. Do not rubber-stamp.** Every finding must point at a specific file + line + concrete failure mode.
+## Inputs you expect
+- Path to task file: `docs/AI_HANDOFF/tasks/TASK-xxx.md` (has Test Plan §4 + Verification Commands + Executor Report at bottom).
+- The executor's `STATUS: DONE` report with FILES_CHANGED + VERIFICATION block.
+- The diff (use `git diff` or read FILES_CHANGED directly).
+If any input is missing, return `CHANGES-REQUESTED` with reason "incomplete handoff package".
+## Review order
+1. **Test Plan adherence** — Were all tests in §4 actually implemented? Run them yourself: `<task Verification Commands>`. Fresh PASS required, no trusting executor's output blindly.
+2. **Correctness** — Does the diff implement the requested behavior? Any obvious wrong assumptions, stale refs, missing cases?
+3. **Regression risk** — What existing behavior could this break? Are shared paths/tests/contracts still aligned? Run the wider test suite if shared code was touched.
+4. **Safety / security / data loss** — Destructive actions, auth/permission, path handling, unsafe shell/DB/file ops.
+5. **Performance / scale** — Accidental N+1, repeated I/O, large scans inside hot paths.
+6. **Maintainability** — Duplicated logic, dead branches, misleading naming, drift between docs/tests/source.
+## Severity ladder
+- **CRITICAL** — security hole, data loss risk, broken core behavior, test was faked (no real assertion), or verification command does NOT actually pass when you re-run it. Blocks handoff cứng.
+- **CHANGES-REQUESTED** — Important issues: missing edge-case test, regression risk in shared code, wrong abstraction at scope boundary. Executor must fix and re-submit.
+- **APPROVED-WITH-MINOR** — Minor naming / doc / style issues. Logged on task file but handoff allowed.
+- **APPROVED** — Clean.
+## Output (append to task file as `## Reviewer Verdict`)
+```
+## Reviewer Verdict
+VERDICT: APPROVED | APPROVED-WITH-MINOR | CHANGES-REQUESTED | CRITICAL
+REVIEWER_MODEL: [model name actually used]
+EXECUTOR_MODEL: [from executor report]
+VERIFICATION_RERUN:
+  command: [exact command]
+  result: [N pass / M fail]
+TEST_PLAN_COVERAGE: [all-followed | partial — list gaps | missing — list]
+FINDINGS:
+  critical:
+    - file: <path:line> — <what fails, how>
+  important:
+    - file: <path:line> — <what risk, evidence>
+  minor:
+    - file: <path:line> — <what to clean up>
+NEXT_STATUS_FOR_INDEX: approved | approved_minor | changes_requested | critical_block
+NOTES: [1-2 sentences for human reviewer if needed]
+```
+After writing the verdict, update `docs/AI_HANDOFF/INDEX.md` row for this task: set Status = NEXT_STATUS_FOR_INDEX, set Reviewer = your model name.
+## Model isolation check (FIRST thing you do)
+UKit cannot force any tool to use a specific model. The contract is enforced HERE, by you, via self-report comparison.
+1. Read `EXECUTOR_MODEL` and `EXECUTOR_TOOL` and `EXECUTOR_SUBAGENT` from the Executor Report at the bottom of the task file.
+2. Identify your own model. Your model SHOULD match `handoff.reviewer.model` in `.ukit/storage/config.json`. State both in the verdict.
+3. Apply this table:
+| Executor model        | Your model            | Action                                                                                     |
+|-----------------------|-----------------------|--------------------------------------------------------------------------------------------|
+| named, != yours       | named                 | proceed with review                                                                        |
+| named, == yours       | named                 | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "reviewer model must differ from executor"   |
+| "unknown"             | named                 | proceed but mark `NOTES: executor model unverified - human, please confirm before merge`   |
+| named                 | "unknown"             | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "reviewer cannot verify own model"           |
+| missing field         | any                   | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "executor did not self-report model - re-run with v1.5.5+ contract" |
+Same model is the most common silent failure. Do not skip this check.
+## Rules
+- **Always re-run** the task's Verification Commands. If they fail, VERDICT = CRITICAL regardless of executor claims.
+- If executor said `TEST_PLAN_FOLLOWED: N/A` without a real justification, downgrade to at minimum CHANGES-REQUESTED.
+- Never approve when test file has no real `expect`/`assert` - that is a fake test -> CRITICAL.
+- Keep the verdict block <= 30 lines. Findings are bullet points, not essays.
+- The same-model refusal above is non-negotiable: bypassing it defeats the entire Quality Gate.