@ngockhoale/ukit 1.5.4 → 1.5.6

package/CHANGELOG.md CHANGED
@@ -2,6 +2,33 @@
2
2
 
3
3
  All notable changes to UKit are documented here.
4
4
 
5
+ ## 1.5.5 - 2026-05-30
6
+
7
+ ### Added
8
+
9
+ - **Handoff Quality Gate** — opt-in, tool-agnostic, file-based, 4-phase pipeline (Idea+Plan → Create Tasks → Implement+Test → Review+Test) for tasks routed through `docs/AI_HANDOFF/`. Daily ad-hoc prompts are NOT affected and continue with the existing lightweight workflow.
10
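+ - New agent `templates/.claude/agents/code-reviewer.md`: independent reviewer with a model-isolation check. Refuses to review when `EXECUTOR_MODEL` equals the reviewer's own model, when the executor's model field is missing, or when the reviewer cannot identify its own model. An executor that self-reports `unknown` is still reviewed, but the verdict is flagged for human confirmation. If a re-run of the Verification Commands fails, the verdict is `CRITICAL`. Emits a structured `## Reviewer Verdict` block.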
11
+ - New config block `handoff.*` in `templates/ukit/storage/config.json` with Vietnamese `_help` entries: `handoff.plan.requireTestPlan`, `handoff.executor.testFirstRequired`, `handoff.reviewer.model` (default `unic-smart`), `handoff.reviewer.blockOnCritical`, etc. The `*.modelHint` fields are explicitly documented as *labels for humans, not enforced selectors* — model enforcement happens via executor self-report + reviewer refusal.
12
+ - New `## 4. Test Plan (REQUIRED — TDD-style)` section in `templates/docs/AI_HANDOFF/PLAN.md`.
13
+ - New `templates/docs/AI_HANDOFF/tasks/_TEMPLATE.md`: every task file MUST embed its own `## Test Cases` (table with happy + ≥1 edge case + regression for bug fixes) + `## Test Files` + `## Verification Commands` + `## Acceptance Criteria`. Tasks without these are `needs_breakdown`, not `ready`.
14
+ - `## Discussion` section pattern on task files: structured AI-to-AI comment thread (date · role · tool/model) so planner/executor/reviewer can talk back to each other through files — no out-of-band channels.
15
+ - New task status states in `INDEX.md`: `pending_review`, `changes_requested`, `critical_block`, `approved`, `approved_minor`, plus Owner/Reviewer columns.
16
+
17
+ ### Changed
18
+
19
+ - `templates/.claude/agents/feature-implementer.md`: now operates in two explicit modes. **Daily mode** (DEFAULT): unchanged lightweight flow — tests only when touched code has coverage, no reviewer trigger. **Handoff mode** (when task lives under `docs/AI_HANDOFF/tasks/`): test-first → green → reviewer; cannot claim `STATUS: DONE` without fresh PASS in turn. Report format now includes `EXECUTOR_TOOL`, `EXECUTOR_MODEL`, `EXECUTOR_SUBAGENT` self-report.
20
+ - `templates/.claude/agents/bug-debugger.md`: same two-mode split. Handoff mode requires regression-test-first (RED before fix, GREEN after); Daily mode unchanged.
21
+ - `templates/docs/AI_HANDOFF/RULES.md`: rewritten around the 4-phase model. Added `## Hard rule — All work stays in docs/AI_HANDOFF/` (no out-of-band AI communication). Added Phase 2 TDD-embedded requirement. Auto-compact 80-line rule scoped to state files (ACTIVE/INDEX/tasks); PLAN.md and RULES.md exempt.
22
+ - `templates/CLAUDE.md` + `templates/AGENTS.md`: added scoped `## Handoff Quality Gate` section with explicit OPT-IN scope language so adapter targets (Claude Code, Kilo Code, Codex, OpenCode, future tools) only apply Quality Gate to handoff work, not daily prompts.
23
+
24
+ ### Fixed
25
+
26
+ - Fixed output-history deduplication for `promptCache: false` — the second call with semantically equivalent (but noisy) output now returns the cached first summary instead of recomputing. Root cause: `normalizeOutputSummaryForDedupe` wasn't stripping `FAIL`/`PASS`/`Test Files`/`Tests`/`Duration`/`Start at` lines from the dedup key, so slight token-budget differences between two similar payloads produced different keys and broke cache hits. Added `findOutputHistoryEntry` lookup in `main()` to serve cached summaries without prompt-cache.
27
+
28
+ ### Why
29
+
30
+ User had two real concerns: (1) cheap-model executors (e.g. Kilo Code) miss small things — fixed by mandatory test-first + independent reviewer with different model; (2) UKit cannot dictate which model any external tool uses — fixed by making the contract self-report-based (executor writes `EXECUTOR_MODEL`, reviewer compares and refuses on match/unknown). The Quality Gate is opt-in via the `docs/AI_HANDOFF/` folder so daily ad-hoc work is unaffected.
31
+
5
32
  ## 1.5.4 - 2026-05-28
6
33
 
7
34
  ### Fixed
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ngockhoale/ukit",
-   "version": "1.5.4",
+   "version": "1.5.6",
    "description": "Install/update an index-first AI workspace for Claude Code, Antigravity, OpenAI Codex, and OpenCode.",
    "license": "MIT",
    "type": "module",
@@ -18,6 +18,7 @@ export async function runDiff({ packageRoot, projectRoot, packageVersion, argv =
    const toolsArg = parseToolsArg(argv);
    const selectedOptionalTools = resolveOptionalToolKeys(toolsArg);
    const selectedAdapterItemIds = toSelectedAdapterItemIds(selectedOptionalTools);
+   const withCodegraph = argv.includes('--with-codegraph');

    const pathConfig = buildPathConfig({ packageRoot, projectRoot });

@@ -26,6 +27,7 @@ export async function runDiff({ packageRoot, projectRoot, packageVersion, argv =
      pathConfig,
      dryRun: true,
      selectedAdapterItemIds,
+     withCodegraph,
    });

    console.log('[UKit] Diff preview:');
@@ -198,6 +198,7 @@ export async function pruneDeselectedAdapters({
  export async function runInstall({ packageRoot, projectRoot, packageVersion, argv = [] }) {
    const toolsArg = parseToolsArg(argv);
    const selectedOptionalTools = resolveOptionalToolKeys(toolsArg);
+   const withCodegraph = argv.includes('--with-codegraph');

    const pruneResult = await pruneDeselectedAdapters({
      projectRoot,
@@ -213,6 +214,7 @@ export async function runInstall({ packageRoot, projectRoot, packageVersion, arg
      dryRun: false,
      selectedAdapterItemIds,
      retainedManagedRelativePaths: pruneResult?.retainedManagedPaths ?? [],
+     withCodegraph,
    });

    const { create, update, unchanged, skip } = result.summary;
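The flag handling in the two hunks above can be sketched in isolation. This is a minimal illustration, not the package's code: `parseInstallFlags` and `fakeInstallPipeline` are hypothetical names, and only the `argv.includes('--with-codegraph')` check and the `withCodegraph = false` default are taken from the diff.

```javascript
// Hypothetical helper: collects boolean flags from argv the same way the diff does.
function parseInstallFlags(argv = []) {
  return {
    // Exact-token match, as in the diff: only the literal '--with-codegraph' enables it.
    withCodegraph: argv.includes('--with-codegraph'),
  };
}

// Hypothetical stand-in for the install pipeline; it only echoes what it receives.
function fakeInstallPipeline({ dryRun, withCodegraph = false }) {
  return { dryRun, withCodegraph };
}

const installRun = fakeInstallPipeline({
  dryRun: false,
  ...parseInstallFlags(['install', '--tools=claude', '--with-codegraph']),
});
const diffRun = fakeInstallPipeline({
  dryRun: true,
  ...parseInstallFlags(['diff']),
});

console.log(installRun.withCodegraph, diffRun.withCodegraph);
```

Because `Array.prototype.includes` compares whole elements, a variant such as `--with-codegraph=1` would not enable the flag in this sketch.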
package/src/cli/index.js CHANGED
@@ -92,6 +92,10 @@ export async function runCli({ argv, packageRoot, projectRoot, packageVersion })
    console.log(' Values: claude, antigravity, codex, opencode');
    console.log(' Example: ukit install --tools=claude,codex,opencode');
    console.log(' Use --tools=none to disable all optional adapters');
+   console.log(' --with-codegraph Enable CodeGraph MCP integration in CLAUDE.md');
+   console.log(' Adds routing rules that prefer codegraph_* tools for');
+   console.log(' symbol search, context building, and impact analysis.');
+   console.log(' Omit this flag to remove the section on reinstall.');
    console.log('');
    console.log('Maintainer / debug examples:');
    console.log(' ukit index build');
@@ -217,6 +217,7 @@ export async function runInstallPipeline({
    dryRun,
    selectedAdapterItemIds,
    retainedManagedRelativePaths = [],
+   withCodegraph = false,
  }) {
    // Check for an existing install before making any changes.
    // cleanupLegacyPaths() is only safe to run on reinstalls: on a fresh install
@@ -239,6 +240,7 @@ export async function runInstallPipeline({
      stackContext,
      packageVersion,
      providerContext,
+     withCodegraph,
    });

    const plan = await buildInstallPlan({
@@ -1,4 +1,17 @@
- export function buildTemplateVariables({ projectContext, stackContext, packageVersion, providerContext }) {
+ const CODEGRAPH_SECTION = `
+ ## CodeGraph Integration (active)
+
+ CodeGraph MCP is enabled for this project. When \`codegraph_*\` tools appear in your tool list:
+ - **Symbol search** → \`codegraph_search\` (replaces \`node .claude/ukit/index/query-index.mjs\`)
+ - **Context building** → \`codegraph_context\` (replaces \`resolve-context.mjs\`)
+ - **Impact analysis** → \`codegraph_impact\` (replaces \`impact-context.mjs\`)
+ - **Call graph** → \`codegraph_trace\`, \`codegraph_callers\`, \`codegraph_callees\` (new capability, no UKit equivalent)
+
+ UKit routing, memory, and skill activation remain primary. CodeGraph is the index query layer only.
+ To disable: rerun \`ukit install\` without \`--with-codegraph\`.
+ `;
+
+ export function buildTemplateVariables({ projectContext, stackContext, packageVersion, providerContext, withCodegraph = false }) {
    return {
      project: {
        name: projectContext.project.name,
@@ -35,5 +48,6 @@ export function buildTemplateVariables({ projectContext, stackContext, packageVe
      user: {
        profile: 'default',
      },
+     codegraphSection: withCodegraph ? CODEGRAPH_SECTION : '',
    };
  }
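To see how an optional section variable behaves at render time, here is a minimal sketch. The renderer is an assumption: UKit's real template engine is not shown in this diff, so `renderTemplate`, `buildVariables`, and the sample template string are hypothetical. Only the `withCodegraph ? SECTION : ''` pattern and the `{{codegraphSection}}` placeholder come from the source.

```javascript
// Hypothetical renderer: replaces {{name}} with the matching variable, or '' if absent.
function renderTemplate(template, variables) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) => String(variables[name] ?? ''));
}

// Shortened stand-in for CODEGRAPH_SECTION.
const SECTION = '\n## CodeGraph Integration (active)\n';

// Same conditional as the diff: the section text when enabled, an empty string otherwise.
function buildVariables({ withCodegraph = false } = {}) {
  return { codegraphSection: withCodegraph ? SECTION : '' };
}

// Hypothetical template tail ending in the placeholder.
const template = '- last bullet of the template\n{{codegraphSection}}';

const enabled = renderTemplate(template, buildVariables({ withCodegraph: true }));
const disabled = renderTemplate(template, buildVariables());

console.log(JSON.stringify(enabled));
console.log(JSON.stringify(disabled));
```

Emitting an empty string when the flag is off means a reinstall without `--with-codegraph` renders the template without the section, which matches the CLI help text about removing it on reinstall.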
@@ -8,50 +8,79 @@ tools: ["Read", "Grep", "Glob", "Bash", "Edit", "TodoWrite"]
8
8
 
9
9
  Systematic debugging — understand before fixing.
10
10
 
11
+ **Two modes:**
12
+ - **Daily/ad-hoc** (DEFAULT): bug not coming from `docs/AI_HANDOFF/` → reproduce → fix → verify (no mandatory regression test if no pre-existing coverage; original lightweight flow).
13
+ - **Handoff mode**: bug task lives in `docs/AI_HANDOFF/tasks/TASK-xxx.md` → activate Quality Gate: regression-test-first → green → reviewer.
14
+
11
15
  ## Workflow
12
16
 
13
17
  ### 1. Reproduce (required)
14
18
 
15
- - Run the failing command/action
16
- - Capture exact error message and stack trace
17
- - If not reproducible → document conditions and ask user
19
+ - Run the failing command/action.
20
+ - Capture exact error message and stack trace.
21
+ - If not reproducible → document conditions and ask user.
18
22
 
19
23
  ### 2. Trace Root Cause
20
24
 
21
- - Read error location and surrounding code
22
- - Trace data flow: input → processing → failure point
23
- - Identify: is this a logic error, state error, or integration error?
25
+ - Read error location and surrounding code.
26
+ - Trace data flow: input → processing → failure point.
27
+ - Identify: logic / state / integration error?
28
+
29
+ ### 3. Regression Test First (RED) — Handoff mode
30
+
31
+ - Write a regression test that reproduces the bug as a failing test.
32
+ - Run it: must FAIL with the original error/signature.
33
+ - If you truly cannot write a regression test (pure UI glitch, env-only issue), document why and attach a manual repro script.
34
+ - **Daily mode**: write a regression test only if the file already has tests; otherwise rely on the original repro command for verification.
24
35
 
25
- ### 3. Fix
36
+ ### 4. Fix (GREEN)
26
37
 
27
- - Apply smallest reliable fix at the root cause
28
- - Do NOT patch symptoms — fix the cause
38
+ - Apply smallest reliable fix at the root cause.
39
+ - Do NOT patch symptoms — fix the cause.
40
+ - Re-run the regression test: must PASS.
29
41
 
30
- ### 4. Verify
42
+ ### 5. Verify
31
43
 
32
- - Re-run the original failing command → must pass
33
- - Run related tests: `yarn test [relevant-file]`
34
- - Check no regression in adjacent functionality
44
+ - Re-run the original failing command → must pass.
45
+ - Run related tests: `yarn test [relevant-file]`.
46
+ - If shared code touched, run wider suite.
47
+ - Check no regression in adjacent functionality.
35
48
 
36
- ### 5. Report
49
+ ### 6. Report
37
50
 
38
51
  ```
39
52
  STATUS: DONE | BLOCKED | PARTIAL
53
+ EXECUTOR_TOOL: [claude-code | kilo-code | codex | opencode | other]
54
+ EXECUTOR_MODEL: [exact model name you are running as. "unknown" if you cannot tell.]
55
+ EXECUTOR_SUBAGENT: [subagent name within your host, if any, else "-"]
40
56
  SUMMARY: [1-2 sentences — root cause and fix]
41
57
  ROOT_CAUSE: [what caused the bug]
58
+ REGRESSION_TEST:
59
+ file: [path]
60
+ red_before: [exact error captured]
61
+ green_after: [pass output line]
42
62
  FILES_CHANGED:
43
63
  - [file path]: [what changed]
44
- VERIFIED: [original failing command now passes + test output]
64
+ VERIFICATION:
65
+ command: [exact command]
66
+ result: [N pass / M fail / exit code]
45
67
  ISSUES: [any remaining risks or edge cases, or "none"]
46
- NEXT: [follow-up needed, or "nothing, bug fixed"]
68
+ HANDOFF_TO_REVIEWER: yes | no (reason)
69
+ NEXT: [follow-up needed, or "ready for review"]
47
70
  ```
48
71
 
72
+ ### 7. Trigger Reviewer — Handoff mode ONLY
73
+
74
+ Daily mode: skip. Handoff mode: set task status `pending_review` in `INDEX.md`; a reviewer session (model from `handoff.reviewer.model`, MUST differ from this debugger's model) will pick it up.
75
+
49
76
  ## Rules
50
77
 
51
- - Don't patch blindly; confirm root cause with evidence
78
+ - **Iron law (Handoff mode):** no `DONE` without (a) regression test passing and (b) original failing command passing, both in this turn.
79
+ - **Daily mode:** original rule: the original failing command must pass; a regression test is optional unless prior coverage exists.
80
+ - Don't patch blindly — confirm root cause with evidence.
52
81
  - For bug triage, use graduated doc budget:
53
82
  - obvious/simple bug: `docs/MEMORY.md` only
54
83
  - non-trivial bug: `docs/MEMORY.md` + `docs/PROJECT.md` + `docs/CODE_MAP.md`
55
84
  - read `docs/WORKLOG.md` only recent relevant entries
56
- - Keep fix scope minimal — no drive-by refactors
57
- - If root cause is unclear after 5 minutes of tracing → ask user for more context
85
+ - Keep fix scope minimal — no drive-by refactors.
86
+ - If root cause is unclear after 5 minutes of tracing → ask user for more context.
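The two "DONE" rules for the bug-debugger can be expressed as a small predicate. This is a sketch under stated assumptions: the report shape (`mode`, `originalCommandPassed`, `regressionTestPassed`, `hasPriorCoverage`) is hypothetical, and the Daily-mode branch reads "optional unless prior coverage exists" as "required when prior coverage exists".

```javascript
// Hypothetical check: may the debugger report STATUS: DONE?
// All flags are expected to reflect fresh runs from the current turn.
function canClaimDone({
  mode,
  originalCommandPassed,
  regressionTestPassed = false,
  hasPriorCoverage = false,
}) {
  // Both modes: the original failing command must now pass.
  if (!originalCommandPassed) return false;

  // Handoff mode: a passing regression test is mandatory as well.
  if (mode === 'handoff') return regressionTestPassed;

  // Daily mode: a regression test is only required when the file already had tests.
  return hasPriorCoverage ? regressionTestPassed : true;
}

console.log(canClaimDone({ mode: 'handoff', originalCommandPassed: true }));
console.log(canClaimDone({ mode: 'daily', originalCommandPassed: true }));
```

A caller that gets `false` back would report `PARTIAL` or `BLOCKED` instead of `DONE`.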
@@ -0,0 +1,86 @@
1
+ ---
2
+ name: code-reviewer
3
+ description: "Independent reviewer for handoff Phase 4 (Review+Test). Use after executor reports STATUS: DONE on a handoff task. MUST run with a model different from the executor (configured in .ukit/storage/config.json → handoff.reviewer.model, default unic-smart). Produces a verdict: APPROVED | APPROVED-WITH-MINOR | CHANGES-REQUESTED | CRITICAL."
4
+ model: inherit
5
+ color: yellow
6
+ tools: ["Read", "Grep", "Glob", "Bash"]
7
+ ---
8
+
9
+ You are the independent reviewer for UKit's handoff Quality Gate. Your model is configured in `.ukit/storage/config.json` → `handoff.reviewer.model` and MUST differ from the executor's model. If the host can bind a model from config, use it; otherwise note in the verdict which model you are running as.
10
+
11
+ **Do not invent issues. Do not rubber-stamp.** Every finding must point at a specific file + line + concrete failure mode.
12
+
13
+ ## Inputs you expect
14
+
15
+ - Path to task file: `docs/AI_HANDOFF/tasks/TASK-xxx.md` (has Test Plan §4 + Verification Commands + Executor Report at bottom).
16
+ - The executor's `STATUS: DONE` report with FILES_CHANGED + VERIFICATION block.
17
+ - The diff (use `git diff` or read FILES_CHANGED directly).
18
+
19
+ If any input is missing, return `CHANGES-REQUESTED` with reason "incomplete handoff package".
20
+
21
+ ## Review order
22
+
23
+ 1. **Test Plan adherence** — Were all tests in §4 actually implemented? Run them yourself: `<task Verification Commands>`. Fresh PASS required, no trusting executor's output blindly.
24
+ 2. **Correctness** — Does the diff implement the requested behavior? Any obvious wrong assumptions, stale refs, missing cases?
25
+ 3. **Regression risk** — What existing behavior could this break? Are shared paths/tests/contracts still aligned? Run the wider test suite if shared code was touched.
26
+ 4. **Safety / security / data loss** — Destructive actions, auth/permission, path handling, unsafe shell/DB/file ops.
27
+ 5. **Performance / scale** — Accidental N+1, repeated I/O, large scans inside hot paths.
28
+ 6. **Maintainability** — Duplicated logic, dead branches, misleading naming, drift between docs/tests/source.
29
+
30
+ ## Severity ladder
31
+
32
+ - **CRITICAL** — security hole, data loss risk, broken core behavior, test was faked (no real assertion), or verification command does NOT actually pass when you re-run it. Hard-blocks the handoff.
33
+ - **CHANGES-REQUESTED** — Important issues: missing edge-case test, regression risk in shared code, wrong abstraction at scope boundary. Executor must fix and re-submit.
34
+ - **APPROVED-WITH-MINOR** — Minor naming / doc / style issues. Logged on task file but handoff allowed.
35
+ - **APPROVED** — Clean.
36
+
37
+ ## Output (append to task file as `## Reviewer Verdict`)
38
+
39
+ ```
40
+ ## Reviewer Verdict
41
+
42
+ VERDICT: APPROVED | APPROVED-WITH-MINOR | CHANGES-REQUESTED | CRITICAL
43
+ REVIEWER_MODEL: [model name actually used]
44
+ EXECUTOR_MODEL: [from executor report]
45
+ VERIFICATION_RERUN:
46
+ command: [exact command]
47
+ result: [N pass / M fail]
48
+ TEST_PLAN_COVERAGE: [all-followed | partial — list gaps | missing — list]
49
+ FINDINGS:
50
+ critical:
51
+ - file: <path:line> — <what fails, how>
52
+ important:
53
+ - file: <path:line> — <what risk, evidence>
54
+ minor:
55
+ - file: <path:line> — <what to clean up>
56
+ NEXT_STATUS_FOR_INDEX: approved | approved_minor | changes_requested | critical_block
57
+ NOTES: [1-2 sentences for human reviewer if needed]
58
+ ```
59
+
60
+ After writing the verdict, update `docs/AI_HANDOFF/INDEX.md` row for this task: set Status = NEXT_STATUS_FOR_INDEX, set Reviewer = your model name.
61
+
62
+ ## Model isolation check (FIRST thing you do)
63
+
64
+ UKit cannot force any tool to use a specific model. The contract is enforced HERE, by you, via self-report comparison.
65
+
66
+ 1. Read `EXECUTOR_MODEL` and `EXECUTOR_TOOL` and `EXECUTOR_SUBAGENT` from the Executor Report at the bottom of the task file.
67
+ 2. Identify your own model. Your model SHOULD match `handoff.reviewer.model` in `.ukit/storage/config.json`. State both in the verdict.
68
+ 3. Apply this table:
69
+
70
+ | Executor model | Your model | Action |
71
+ |-----------------------|-----------------------|--------------------------------------------------------------------------------------------|
72
+ | named, != yours | named | proceed with review |
73
+ | named, == yours | named | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "reviewer model must differ from executor" |
74
+ | "unknown" | named | proceed but mark `NOTES: executor model unverified - human, please confirm before merge` |
75
+ | named | "unknown" | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "reviewer cannot verify own model" |
76
+ | missing field | any | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "executor did not self-report model - re-run with v1.5.5+ contract" |
77
+
78
+ Same model is the most common silent failure. Do not skip this check.
79
+
80
+ ## Rules
81
+
82
+ - **Always re-run** the task's Verification Commands. If they fail, VERDICT = CRITICAL regardless of executor claims.
83
+ - If executor said `TEST_PLAN_FOLLOWED: N/A` without a real justification, downgrade to at minimum CHANGES-REQUESTED.
84
+ - Never approve when test file has no real `expect`/`assert` - that is a fake test -> CRITICAL.
85
+ - Keep the verdict block <= 30 lines. Findings are bullet points, not essays.
86
+ - The same-model refusal above is non-negotiable: bypassing it defeats the entire Quality Gate.
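The model isolation table in this agent file maps directly onto a decision function. The sketch below is illustrative only: `checkModelIsolation` is a hypothetical name, the case-insensitive, whitespace-trimmed comparison is an assumption, and treating an empty reviewer model like `"unknown"` is also an assumption. The refusal reasons and notes are copied from the table.

```javascript
// Assumption: model names are compared after trimming and lowercasing.
function normalizeModel(value) {
  return typeof value === 'string' ? value.trim().toLowerCase() : '';
}

// Hypothetical decision helper implementing the table row by row.
function checkModelIsolation(executorModel, reviewerModel) {
  const executor = normalizeModel(executorModel);
  const reviewer = normalizeModel(reviewerModel);
  const refuse = (reason) => ({ action: 'refuse', verdict: 'CHANGES-REQUESTED', reason });

  // Row "missing field | any": the executor never self-reported.
  if (!executor) {
    return refuse('executor did not self-report model - re-run with v1.5.5+ contract');
  }
  // Row "named | unknown": the reviewer cannot identify itself.
  if (!reviewer || reviewer === 'unknown') {
    return refuse('reviewer cannot verify own model');
  }
  // Row "unknown | named": review proceeds, but a human must confirm.
  if (executor === 'unknown') {
    return {
      action: 'proceed',
      note: 'executor model unverified - human, please confirm before merge',
    };
  }
  // Row "named, == yours": same model on both sides.
  if (executor === reviewer) {
    return refuse('reviewer model must differ from executor');
  }
  // Row "named, != yours": normal case.
  return { action: 'proceed', note: null };
}

console.log(checkModelIsolation('unic-code', 'unic-smart'));
console.log(checkModelIsolation('unic-smart', 'unic-smart'));
```

The table does not cover the case where both sides report `"unknown"`; this sketch refuses there because the reviewer check runs before the executor check.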
@@ -8,48 +8,89 @@ tools: ["Read", "Edit", "Write", "Grep", "Glob", "Bash", "TodoWrite"]
8
8
 
9
9
  Implement requested behavior with minimal scope drift.
10
10
 
11
+ **Two modes — auto-detect at start:**
12
+ - **Daily/ad-hoc mode** (DEFAULT): task didn't come from `docs/AI_HANDOFF/` → use the original lightweight workflow. Tests only when touched code already has coverage. No reviewer trigger.
13
+ - **Handoff mode**: task file is `docs/AI_HANDOFF/tasks/TASK-xxx.md` OR user explicitly invokes handoff (e.g. "execute task TASK-001") → activate full Quality Gate: test-first → green → reviewer.
14
+
15
+ If unsure, ask the user. Don't apply Handoff mode rules to a quick one-off fix.
16
+
11
17
  ## Workflow
12
18
 
13
19
  ### 1. Understand (< 30 seconds)
14
20
 
15
- - Infer intent directly from user request (build/test/docs flow)
21
+ - Infer intent directly from user request (build/test/docs flow).
16
22
  - Apply graduated doc budget:
17
23
  - trivial: no docs
18
24
  - simple: `docs/MEMORY.md` only
19
25
  - non-trivial: `docs/MEMORY.md` + `docs/PROJECT.md` + `docs/CODE_MAP.md`
20
- - Identify target files and existing patterns
21
- - If confidence is low or risk is high, ask one short clarifying question before deeper analysis
26
+ - Identify target files and existing patterns.
27
+ - If task came from handoff, read `tasks/TASK-xxx.md` and locate its **Test Plan** + **Verification Commands**.
28
+ - If confidence is low or risk is high, ask one short clarifying question before deeper analysis.
22
29
 
23
30
  ### 2. Plan Approach (< 1 minute)
24
31
 
25
- - List files to create/modify (max diff)
26
- - Implement directly
32
+ - List files to create/modify (max diff).
33
+ - **Handoff mode only:** if no Test Plan exists in the task file and task is not `trivial`, write one inline before implementing (happy + ≥1 edge case; regression test if fixing a bug). In daily mode, skip this step.
34
+
35
+ ### 3. Test First (RED) — Handoff mode
36
+
37
+ - Write the test(s) from §2 / from task Test Plan.
38
+ - Run them: must FAIL for the expected reason. Capture output.
39
+ - If test passes immediately → test is wrong or behavior already exists. Fix the test or stop and report.
40
+ - **Daily mode**: skip this step unless touched code already has tests (then follow original rule).
27
41
 
28
- ### 3. Implement
29
42
 
30
- - Smallest correct change set
31
- - Reuse existing code before creating new
32
- - No unrelated changes or speculative refactors
33
- - Follow project conventions (check `.claude/skills/` for patterns)
43
+ ### 4. Implement (GREEN)
34
44
 
35
- ### 4. Verify
45
+ - Smallest correct change set to make the test pass.
46
+ - Reuse existing code before creating new.
47
+ - No unrelated changes or speculative refactors.
48
+ - Follow project conventions (check `.claude/skills/` for patterns).
36
49
 
37
- - Run existing tests if touched behavior has coverage: `yarn test` or project-specific
38
- - For SQL changes: verify with `EXPLAIN ANALYZE` on non-trivial queries
39
- - Check no lint errors introduced
50
+ ### 5. Verify (REQUIRED before DONE in Handoff mode; targeted in Daily mode)
40
51
 
41
- ### 5. Report
52
+ - **Handoff mode**: run the task's Verification Commands fresh in this turn. Capture full output. If ANY test fails → status is `PARTIAL` or `BLOCKED`, never `DONE`.
53
+ - **Daily mode**: run existing tests if touched behavior has coverage; lint clean; targeted verification only.
54
+ - For SQL changes: verify with `EXPLAIN ANALYZE` on non-trivial queries.
55
+ - Check no lint errors introduced.
56
+
57
+ ### 6. Report
42
58
 
43
59
  ```
44
60
  STATUS: DONE | BLOCKED | PARTIAL
61
+ EXECUTOR_TOOL: [claude-code | kilo-code | codex | opencode | other]
62
+ EXECUTOR_MODEL: [exact model name you are running as — e.g. unic-code, claude-sonnet-4-5, gpt-5-mini. If you truly cannot tell, write "unknown" — reviewer treats unknown as suspicious and asks the human to confirm.]
63
+ EXECUTOR_SUBAGENT: [name of the subagent you are, if your host has multiple — e.g. "Kilo:code", "Claude:feature-implementer". Otherwise "-".]
45
64
  SUMMARY: [1-2 sentences of what was implemented]
65
+ TEST_PLAN_FOLLOWED: [task §4 / inline / N/A — reason]
46
66
  FILES_CHANGED:
47
67
  - [file path]: [what changed]
48
- VERIFIED: [test output / manual check result]
68
+ TESTS_ADDED:
69
+ - [test file]: [test names]
70
+ VERIFICATION:
71
+ command: [exact command run]
72
+ result: [N pass / M fail / exit code]
73
+ output_excerpt: |
74
+ [last 5-10 lines of test output]
49
75
  ISSUES: [any problems or edge cases, or "none"]
50
- NEXT: [follow-up needed, or "nothing, task complete"]
76
+ HANDOFF_TO_REVIEWER: yes | no (reason)
77
+ NEXT: [follow-up needed, or "ready for review"]
51
78
  ```
52
79
 
80
+ > **Self-report rule:** UKit cannot force any tool/host to use a specific model. Your self-reported `EXECUTOR_MODEL` is how the reviewer (in another tool or subagent) knows what to compare against its own model. Misreporting → reviewer refuses and asks the human to confirm.
81
+
82
+ ### 7. Trigger Reviewer — Handoff mode ONLY
83
+
84
+ - Daily mode: skip this step entirely. Just report and stop.
85
+ - Handoff mode + `STATUS: DONE` + `handoff.reviewer.enabled=true`:
86
+ - Set task status to `pending_review` in `docs/AI_HANDOFF/INDEX.md`.
87
+ - The next AI session (any tool, model from `handoff.reviewer.model`, MUST differ from executor) will pick `pending_review` task and run review.
88
+ - Do NOT dispatch reviewer in-process unless your host explicitly supports it AND can guarantee a different model — file-based handoff is the default.
89
+
53
90
  ## Rules
54
91
 
55
- - Add/update tests only when touched behavior already has coverage
92
+ - **Iron law (Handoff mode):** no `DONE` without fresh PASS output in the current turn.
93
+ - **Daily mode:** original rule — add tests only when touched behavior already has coverage.
94
+ - If Handoff Test Plan says `N/A`, document why in the report and ensure manual verification ran.
95
+ - Never silently skip reviewer phase in Handoff mode; if disabled, say so explicitly in NEXT.
96
+ - Detection rule: if the task came from `docs/AI_HANDOFF/tasks/`, you are in Handoff mode. Otherwise Daily mode.
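The detection rule at the end of this agent file can be sketched as a path and request check. The function name `detectMode`, its input shape, and the regular expression for the "execute task TASK-xxx" phrasing are assumptions for illustration; the agent itself applies the rule in prose, and the file tells it to ask the user when unsure.

```javascript
const HANDOFF_TASK_DIR = 'docs/AI_HANDOFF/tasks/';

// Hypothetical detector: returns 'handoff' or 'daily'.
function detectMode({ taskPath = '', userRequest = '' } = {}) {
  // Normalize Windows separators and a leading "./" before matching.
  const normalized = String(taskPath).replace(/\\/g, '/').replace(/^\.\//, '');
  const inTaskDir = normalized.startsWith(HANDOFF_TASK_DIR)
    || normalized.includes('/' + HANDOFF_TASK_DIR);

  // Task file lives under docs/AI_HANDOFF/tasks/.
  if (inTaskDir && normalized.endsWith('.md')) return 'handoff';

  // Assumed phrasing for an explicit invocation, e.g. "execute task TASK-001".
  if (/\bexecute task TASK-\w+/i.test(String(userRequest))) return 'handoff';

  // Everything else keeps the lightweight flow.
  return 'daily';
}

console.log(detectMode({ taskPath: 'docs/AI_HANDOFF/tasks/TASK-001.md' }));
console.log(detectMode({ userRequest: 'fix the typo in README' }));
```

Defaulting to `'daily'` keeps the Quality Gate opt-in, as the changelog describes.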
@@ -1082,7 +1082,11 @@ function normalizeOutputSummaryForDedupe(summary) {
    return String(summary ?? '')
      .split(/\r?\n/)
      .map((line) => String(line ?? '').trim())
-     .filter((line) => line && !/^- Full output:\s+/i.test(line))
+     .filter((line) => line
+       && !/^- Full output:\s+/i.test(line)
+       && !/^-\s*FAIL(?:ED)?\b/i.test(line)
+       && !/^-\s*PASS\b/i.test(line)
+       && !/^-\s*(?:Test Files|Tests|Duration|Start at)\b/i.test(line))
      .join('\n');
  }
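The effect of the widened filter is easiest to see on two summaries that differ only in noisy lines. The function body below is copied from the new side of the hunk; the sample summaries, including the log paths and test names, are invented for illustration and do not come from UKit output.

```javascript
// Copied from the "+" side of the hunk above.
function normalizeOutputSummaryForDedupe(summary) {
  return String(summary ?? '')
    .split(/\r?\n/)
    .map((line) => String(line ?? '').trim())
    .filter((line) => line
      && !/^- Full output:\s+/i.test(line)
      && !/^-\s*FAIL(?:ED)?\b/i.test(line)
      && !/^-\s*PASS\b/i.test(line)
      && !/^-\s*(?:Test Files|Tests|Duration|Start at)\b/i.test(line))
    .join('\n');
}

// Hypothetical summaries of the same command run twice.
const firstRun = [
  'Command: yarn test',
  '- FAIL src/a.test.js > handles empty input',
  '- Tests 1 failed | 9 passed',
  '- Duration 1.21s',
  '- Full output: logs/run-1.log',
].join('\n');

const secondRun = [
  'Command: yarn test',
  '- FAIL src/a.test.js > handles empty input (retry)',
  '- Tests 1 failed | 9 passed',
  '- Duration 1.37s',
  '- Full output: logs/run-2.log',
].join('\n');

const sameKey = normalizeOutputSummaryForDedupe(firstRun) === normalizeOutputSummaryForDedupe(secondRun);
console.log(sameKey);
```

With the old filter, which only dropped the `- Full output:` line, the differing duration and `FAIL` lines would have kept the two normalized strings apart.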
 
@@ -1131,6 +1135,16 @@ async function appendOutputHistory(projectRoot, entry) {
    return nextDocument;
  }

+ async function findOutputHistoryEntry(projectRoot, command, summary) {
+   const runtimePaths = buildRuntimePaths(projectRoot);
+   const current = normalizeOutputHistoryDocument(await readJson(runtimePaths.outputHistoryPath, { entries: [] }));
+   const lookupKey = [
+     String(command ?? '').trim(),
+     normalizeOutputSummaryForDedupe(summary),
+   ].join('\n').trim().toLowerCase();
+   return current.entries.find((candidate) => buildOutputHistoryDedupeKey(candidate) === lookupKey) ?? null;
+ }
+
  function shouldCompress(config) {
    return Boolean(config?.tokenPipeline?.outputCompression);
  }
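The lookup added above can be sketched against an in-memory list instead of the history file. Several parts are assumptions: `buildDedupeKey` stands in for `buildOutputHistoryDedupeKey`, whose body is not shown in this diff, and it is assumed to build its key the same way `lookupKey` is built. `normalizeSummary` is a deliberately simplified normalizer that only drops `- Duration` lines.

```javascript
// Simplified, hypothetical normalizer: trims lines and drops "- Duration ..." noise.
function normalizeSummary(summary) {
  return String(summary ?? '')
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && !/^-\s*Duration\b/i.test(line))
    .join('\n');
}

// Assumed shape of the dedupe key: command + normalized summary, lowercased.
function buildDedupeKey(entry) {
  return [
    String(entry.command ?? '').trim(),
    normalizeSummary(entry.summary),
  ].join('\n').trim().toLowerCase();
}

// In-memory counterpart of findOutputHistoryEntry.
function findHistoryEntry(entries, command, summary) {
  const lookupKey = buildDedupeKey({ command, summary });
  return entries.find((candidate) => buildDedupeKey(candidate) === lookupKey) ?? null;
}

const historyEntries = [
  { command: 'yarn test', summary: 'ok\n- Duration 1s', savedTokens: 120 },
];

const hit = findHistoryEntry(historyEntries, 'yarn test ', 'OK\n- Duration 2s');
const miss = findHistoryEntry(historyEntries, 'yarn lint', 'ok');

console.log(hit ? hit.summary : null);
console.log(miss);
```

On a hit, the caller can write the stored `summary` to stdout and skip recomputation, which is what the `main()` hunk does with `historyMatch`.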
@@ -1244,6 +1258,28 @@ async function main() {
      exitCode,
      projectRoot,
    });
+
+   const historyMatch = await findOutputHistoryEntry(projectRoot, result.command, result.summary);
+   if (historyMatch?.summary) {
+     await appendOutputHistory(projectRoot, {
+       timestamp: Date.now(),
+       command: historyMatch.command,
+       profile: historyMatch.profile,
+       summary: historyMatch.summary,
+       tokensBefore: historyMatch.tokensBefore,
+       tokensAfter: historyMatch.tokensAfter,
+       savedTokens: historyMatch.savedTokens,
+       exitCode,
+       rawSaved: historyMatch.rawSaved,
+       rawPath: historyMatch.rawPath,
+       rawBytes: historyMatch.rawBytes,
+       recoveryReason: historyMatch.recoveryReason,
+       truncated: historyMatch.truncated,
+     });
+     process.stdout.write(String(historyMatch.summary));
+     return;
+   }
+
    const recoveryReason = buildRawOutputRecoveryReason({
      exitCode,
      tokensBefore: result.tokensBefore,
@@ -83,6 +83,13 @@ For clearly non-code specialist lanes (docs-only, status, task queue), skip the
83
83
  - Threshold-based compact pressure is internal orchestration; do not expose it to users.
84
84
  - For Codex Desktop long sessions, UKit can use soft auto-compact handoffs. Default `compact.codexContext.compactTarget=150` means about 150 compact handoff lines (120-150 preferred, hard max 170), not 150 tokens.
85
85
 
86
+ ## Handoff Quality Gate — OPT-IN
87
+
88
+ Activates ONLY when work goes through `docs/AI_HANDOFF/` (user says "execute task TASK-xxx" or target is `docs/AI_HANDOFF/tasks/*.md`). Daily prompts → unchanged lightweight flow, no test-first/reviewer overhead.
89
+
90
+ In Handoff mode: read `docs/AI_HANDOFF/RULES.md` for the 4-phase spec (Idea+Plan → Create Tasks → Implement+Test → Review+Test), state machine, comment thread, and self-reported model contract. Config: `.ukit/storage/config.json` → `handoff.*`.
91
+
92
+
86
93
  ## Context + Verification Budget
87
94
 
88
95
  - **Trivial**: no docs, no index query unless the file target is unclear.
@@ -89,6 +89,13 @@ For clearly non-code specialist lanes (docs-only, status, task queue), skip the
89
89
  - Preserve UTF-8 BOM/no-BOM and LF/CRLF for existing multilingual/user-authored files.
90
90
  - Use `node .claude/ukit/index/safe-patch.mjs` internally when normal Edit/Write may normalize bytes or when anchor-based matching is needed.
91
91
 
92
+ ## Handoff Quality Gate — OPT-IN
93
+
94
+ Activates ONLY when a task goes through `docs/AI_HANDOFF/` (user says "execute task TASK-xxx" or the target is `docs/AI_HANDOFF/tasks/*.md`). Daily prompts → NOT touched; the old flow stays as is.
+
+ In Handoff mode: read `docs/AI_HANDOFF/RULES.md` for the 4 phases (Idea+Plan → Create Tasks → Implement+Test → Review+Test) + state machine + comment thread + self-report model. Config: `.ukit/storage/config.json` → `handoff.*`.
97
+
98
+
92
99
  ## Context + Verification Budget
93
100
 
94
101
  - **Trivial**: no docs.
@@ -158,3 +165,4 @@ DuraOne skill chỉ active khi pack `duraone` được cài hoặc `.claude/skil
158
165
  - `.claude/skills/duraone/references/sql.md`
159
166
  - `.claude/skills/duraone/references/workflow.md`
160
167
  - When not active: use generic coding standards + project-specific patterns from the index.
168
+ {{codegraphSection}}
@@ -1,6 +1,13 @@
1
1
  # Handoff Task Index
2
2
 
3
- | ID | Title | Priority | Size | Status | File |
4
- |----|-------|----------|------|--------|------|
3
+ <!--
4
+ Status values (see RULES.md §Status state machine):
+ ready | in_progress | pending_review | changes_requested | critical_block | approved | approved_minor | blocked | done
+
+ Owner = tool currently holding the task: claude-code | kilo-code | codex | opencode | -
8
+ -->
9
+
10
+ | ID | Title | Priority | Size | Status | Owner | Reviewer | File |
11
+ |----|-------|----------|------|--------|-------|----------|------|
5
12
 
6
13
  Updated:
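The reviewer's verdict has to be turned into one of the status values listed in this index. A minimal sketch follows; the one-to-one mapping is inferred from the parallel ordering of the `VERDICT` and `NEXT_STATUS_FOR_INDEX` lists in the reviewer agent and is not spelled out in the source, and `nextStatusForIndex` and `applyVerdictToRow` are hypothetical helpers.

```javascript
// Status values copied from the comment in INDEX.md.
const INDEX_STATUSES = [
  'ready', 'in_progress', 'pending_review', 'changes_requested',
  'critical_block', 'approved', 'approved_minor', 'blocked', 'done',
];

// Inferred mapping (assumption): verdict -> status written to the index.
const VERDICT_TO_STATUS = {
  'APPROVED': 'approved',
  'APPROVED-WITH-MINOR': 'approved_minor',
  'CHANGES-REQUESTED': 'changes_requested',
  'CRITICAL': 'critical_block',
};

function nextStatusForIndex(verdict) {
  const status = VERDICT_TO_STATUS[verdict];
  // Reject anything that does not resolve to a known index status.
  if (typeof status !== 'string' || !INDEX_STATUSES.includes(status)) {
    throw new Error(`Unknown verdict: ${verdict}`);
  }
  return status;
}

// Hypothetical row shape mirroring the table columns.
function applyVerdictToRow(row, verdict, reviewerModel) {
  return { ...row, status: nextStatusForIndex(verdict), reviewer: reviewerModel };
}

const updatedRow = applyVerdictToRow(
  { id: 'TASK-001', status: 'pending_review', owner: 'kilo-code', reviewer: '-' },
  'APPROVED-WITH-MINOR',
  'unic-smart',
);
console.log(updatedRow.status, updatedRow.reviewer);
```

Throwing on an unrecognized verdict keeps a malformed verdict block from silently writing an invalid status into the index.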
@@ -2,4 +2,74 @@
2
2
 
3
3
  Status: `empty`
4
4
 
5
- <!-- AI writes ideas + analysis here. After splitting into tasks, clear the section below. -->
+ <!--
+ This file is used ONLY for the Handoff Quality Gate flow.
+ Daily prompt / quick fix → no need to touch this file; UKit still runs the old flow as usual.
+
+ Activates only when the user explicitly pushes work through handoff (e.g. "put this into handoff", "collect idea X").
+
+ QUALITY GATE: every task split out must come with a Test Plan (see the section below).
+ No Test Plan → the task is not allowed to move to status `ready`.
+ -->
14
+
15
+ ## 1. Intent / Goal
16
+
17
+ <!-- 1-2 sentences describing what the user wants to achieve. Do not paste the raw prompt verbatim. -->
18
+
19
+ ## 2. Scope
20
+
21
+ - In scope:
22
+ - Out of scope:
23
+ - Risk surface (shared files/modules at risk):
24
+
25
+ ## 3. Approach
26
+
27
+ <!-- Brief approach. Reuse existing code before creating new code. -->
28
+
29
+ ## 4. Test Plan (REQUIRED — TDD-style)
30
+
31
+ List the tests to be written BEFORE the code. Each test must have:
32
+
33
+ | # | Type | Test name | File | Expected | Pre-state |
34
+ |---|------|----------|------|--------|-----------|
35
+ | 1 | unit / integration / regression / e2e | `<test name describing the behavior>` | `<path/to/file.test.js>` | `<specific expected output>` | `<input/fixture>` |
36
+
37
+ Minimum requirements:
38
+
39
+ - **Happy path**: the main behavior works correctly.
40
+ - **Edge case**: at least 1 (null/empty/boundary/concurrent…).
41
+ - **Regression** (for bug fixes): the test fails before the fix and passes after it.
42
+
43
+ If the task cannot be tested (config-only, doc-only, throw-away prototype): write `Test plan: N/A — reason: <…>` and attach a manual verification approach.
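The minimum-coverage rule above can be sketched as a small check. This is only an illustration: the function name `checkTestPlan` and the `kind` labels (`happy`, `edge`, `regression`) are assumptions, since the template's type column uses values such as unit/integration/regression/e2e. The defaults mirror `handoff.plan.minTestsHappyPath` and `handoff.plan.minTestsEdgeCase` from the config.

```javascript
// Hypothetical sketch: the `kind` labels are an assumption, not a UKit schema.
function checkTestPlan(tests, { isBugFix = false, minHappy = 1, minEdge = 1 } = {}) {
  const count = (kind) => tests.filter((t) => t.kind === kind).length;
  const problems = [];
  if (count("happy") < minHappy) problems.push("missing happy-path test");
  if (count("edge") < minEdge) problems.push("missing edge-case test");
  // A regression test is only demanded when the task fixes a bug.
  if (isBugFix && count("regression") < 1) problems.push("missing regression test");
  return problems; // an empty array means the minimum is met
}
```

A planner could run such a check before moving a task to `ready`, and treat a non-empty result as grounds for `needs_breakdown`.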
44
+
45
+ ## 5. Verification Commands
46
+
47
+ The exact commands the executor will run:
48
+
49
+ ```bash
50
+ # for example:
51
+ # yarn test path/to/file.test.js
52
+ # yarn test --run
53
+ # node scripts/smoke.mjs
54
+ ```
55
+
56
+ ## 6. Acceptance Criteria
57
+
58
+ - [ ] All tests in the Test Plan PASS (with output included in the report).
59
+ - [ ] No regressions in related suites.
60
+ - [ ] Reviewer (separate model) reports `APPROVED` or `APPROVED-WITH-MINOR`.
61
+ - [ ] Docs/CHANGELOG updated if user-facing.
62
+
63
+ ## 7. Task Split (Phase 2 — TDD-embedded, MANDATORY)
64
+
65
+ Once the human approves the plan, the AI creates each `tasks/TASK-xxx.md` following the structure in `tasks/_TEMPLATE.md`.
66
+
67
+ **Every TASK file MUST have:**
68
+ - `## Test Cases`: test table (type, name, expected, fixture) for the task's slice. Minimum: happy path + 1 edge case + regression for bug fixes.
69
+ - `## Test Files`: concrete paths of the test files to create/modify.
70
+ - `## Verification Commands`: commands that both the executor and the reviewer run fresh.
71
+ - `## Acceptance Criteria`.
72
+
73
+ A task without concrete Test Cases + Test Files → mark it `needs_breakdown`; it does not get status `ready`.
74
+
75
+ Update `INDEX.md`: add a row for each new task with `Status: ready`, `Owner: -`.
@@ -7,7 +7,7 @@
7
7
  - Do NOT read `RULES.md` every request — only when you need flow clarification.
8
8
  - Do NOT read multiple task files in one request.
9
9
  - If ACTIVE.md + INDEX.md + task file would exceed budget, read only the task file.
10
- - Auto-compact: if any single handoff file exceeds 80 lines, trigger `clear handoff` immediately.
10
+ - Auto-compact: if any **state file** (`ACTIVE.md`, `INDEX.md`, or any single `tasks/TASK-xxx.md`) exceeds 80 lines, trigger `clear handoff` / split task. `PLAN.md` and `RULES.md` are reference/spec — exempt.
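The revised auto-compact rule can be sketched as a predicate over a path relative to `docs/AI_HANDOFF/` and the file's text. The name `needsCompaction` is hypothetical; the 80-line threshold and the `PLAN.md`/`RULES.md` exemption come from the rule itself, while ignoring one trailing newline when counting lines is an assumption of this sketch.

```javascript
// Hypothetical sketch of the auto-compact rule; not UKit's implementation.
const MAX_STATE_FILE_LINES = 80;
const EXEMPT = new Set(["PLAN.md", "RULES.md"]); // reference/spec files

function needsCompaction(relativePath, content) {
  const name = relativePath.split("/").pop();
  if (EXEMPT.has(name)) return false;
  const isStateFile =
    relativePath === "ACTIVE.md" ||
    relativePath === "INDEX.md" ||
    /^tasks\/TASK-[^/]+\.md$/.test(relativePath);
  if (!isStateFile) return false;
  // Assumption: a single trailing newline does not count as an extra line.
  const lineCount = content.replace(/\n$/, "").split("\n").length;
  return lineCount > MAX_STATE_FILE_LINES;
}
```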
11
11
 
12
12
  ## How Human Submits Ideas
13
13
 
@@ -15,12 +15,90 @@
15
15
  - If request is already a concrete task (clear file/logic/output, small enough to do now), bypass handoff and execute directly.
16
16
  - If request is broad/ambiguous/multi-step, use handoff.
17
17
 
18
- ## Handoff Flow
18
+ ## Hard rule — All work stays in `docs/AI_HANDOFF/`
19
19
 
20
- 1. Human submits ideas → AI writes to `PLAN.md`.
21
- 2. Human approves plan → AI splits into `tasks/TASK-xxx.md` + updates `INDEX.md`.
22
- 3. AI implementer reads `INDEX.md` → picks task → reads `tasks/TASK-xxx.md` → implements.
23
- 4. After implementation → update `INDEX.md` status, archive cycle if done.
20
+ All communication between AIs in a handoff goes ONLY through files under `docs/AI_HANDOFF/`:
21
+
22
+ - `PLAN.md`: brainstorm + overall Test Plan (Phase 1).
23
+ - `INDEX.md`: task table + status (read/written by every phase).
24
+ - `tasks/TASK-xxx.md`: the home of each task, holding Goal + Test Cases + Verification + Executor Report + Reviewer Verdict + **Discussion thread**.
25
+ - `ACTIVE.md`: snapshot of the current cycle.
26
+ - `archive/`: past cycles.
27
+
28
+ **Forbidden**: an AI sending questions/comments through another tool's chat, through commit messages, or through files outside this directory. Reason: cross-tool/cross-subagent work can only be synchronized through files. Any AI that does not read this folder is not part of the handoff.
29
+
30
+ ### Discussion thread (AI-to-AI comments)
31
+
32
+ To ask a follow-up question, push back, or make a suggestion to another phase, the AI writes into the task file's `## Discussion` section (template in `tasks/_TEMPLATE.md`). Format:
33
+
34
+ ```
35
+ ### <YYYY-MM-DD> · <role: planner|executor|reviewer> · <tool/model>
36
+ <content; address @planner / @executor / @reviewer if there is a recipient>
37
+ ```
38
+
39
+ The next phase MUST read Discussion before continuing; treat it as an inbox.
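To make the comment header format concrete, here is a minimal sketch that formats and parses a Discussion header line. The helper names are hypothetical, and the example value `claude-code/unic-smart` for the `<tool/model>` slot is only an assumed rendering of that placeholder.

```javascript
// Hypothetical helpers for the Discussion header: "### <date> · <role> · <tool/model>".
const HEADER = /^### (\d{4}-\d{2}-\d{2}) · (planner|executor|reviewer) · (.+)$/;

function formatCommentHeader(date, role, toolModel) {
  return `### ${date} · ${role} · ${toolModel}`;
}

// Returns { date, role, toolModel } or null when the line is not a comment header.
function parseCommentHeader(line) {
  const match = HEADER.exec(line.trim());
  return match ? { date: match[1], role: match[2], toolModel: match[3] } : null;
}
```

A phase that treats Discussion as an inbox could scan the task file for lines that parse successfully and read the text following the most recent one.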
40
+
41
+ ## Handoff Flow (tool-agnostic, file-based state machine)
42
+
43
+ UKit handoff works through **file state**. You choose the tool for each phase yourself (Claude Code / Kilo Code / Codex / OpenCode / any future tool); all are fine. UKit only cares about the **role of the model**, not the tool.
44
+
45
+ 3 model roles across the 4 phases (Phases 1-2 usually share the Plan role):
46
+
47
+ - **Plan**: the strongest model you have (reasoning model). It can run in any tool that supports planning well.
48
+ - **Execute**: a cheap-but-still-smart model (code model). It can be Kilo's code subagent, OpenCode's build agent, or Claude Code's feature-implementer.
49
+ - **Review**: **a DIFFERENT MODEL from the executor** (a reasoning model is usually better). It can be a different tool, or the same tool with a subagent on a different model (e.g. Kilo has separate code and review subagents).
50
+
51
+ Both deployment models are valid:
52
+ - **Cross-tool**: e.g. Claude (plan) → Kilo (execute) → Claude (review). Bridged through files.
53
+ - **Same-tool different-subagent**: e.g. Kilo:plan → Kilo:code → Kilo:review, as long as the 3 subagents use different MODELS for their respective roles.
54
+
55
+ Every tool/subagent reads the same `INDEX.md` + `tasks/TASK-xxx.md` → picks a task by `status` → updates the status when done.
56
+
57
+ > **Important: UKit does not enforce the model.** `handoff.executor.cheapSmartModelHint` and `handoff.reviewer.model` in `.ukit/storage/config.json` are only **labels** recording what you WANT to use. Which model each tool uses is whatever you select in that tool's settings. UKit enforces the contract by requiring the executor to SELF-REPORT `EXECUTOR_MODEL` in the Executor Report; the reviewer compares it with its own model and refuses if they match. So if in Kilo you set both the code subagent and the review subagent to the same model, the reviewer refuses on its own rather than silently passing.
58
+
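The self-report contract described in the note above amounts to a small comparison on the reviewer side. The sketch below is an assumption-laden illustration, not the `code-reviewer` agent's actual logic: the function name `reviewRefusalReason` is hypothetical, and treating model names as case-insensitive is a choice made here for the example.

```javascript
// Hypothetical sketch of the reviewer's model-isolation check.
// Returns a refusal reason, or null when the review may proceed.
function reviewRefusalReason(executorModel, reviewerModel) {
  // Assumption: names are compared case-insensitively after trimming.
  const norm = (value) => (typeof value === "string" ? value.trim().toLowerCase() : "");
  const executor = norm(executorModel);
  const reviewer = norm(reviewerModel);
  if (!executor || executor === "unknown") return "EXECUTOR_MODEL is missing or unknown";
  if (!reviewer || reviewer === "unknown") return "reviewer model is missing or unknown";
  if (executor === reviewer) return "reviewer model equals EXECUTOR_MODEL";
  return null;
}
```

With the config's example labels, `reviewRefusalReason("unic-code", "unic-smart")` lets the review proceed, while two identical names produce a refusal.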
59
+ ### Status state machine
60
+
61
+ ```
62
+ brainstorm ──[plan approved]──▶ ready ──[executor pick]──▶ in_progress
63
+ ├─[PASS]──▶ pending_review
64
+ └─[FAIL]──▶ blocked
65
+ pending_review ──[reviewer]──▶ approved | approved_minor ──▶ done
66
+ ├▶ changes_requested ──[fix]──▶ in_progress
67
+ └▶ critical_block ──[fix]──▶ in_progress
68
+ ```
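Read as a transition table, the diagram above could be encoded as follows. This is a sketch under the assumption that the two branch lines after `in_progress` belong to `in_progress` and that the diagram shows no exit from `blocked`; `canTransition` is a hypothetical helper, not a UKit API.

```javascript
// Hypothetical encoding of the status diagram as an adjacency map.
const TRANSITIONS = new Map([
  ["brainstorm", ["ready"]],               // plan approved
  ["ready", ["in_progress"]],              // executor picks the task
  ["in_progress", ["pending_review", "blocked"]], // PASS / FAIL
  ["pending_review", ["approved", "approved_minor", "changes_requested", "critical_block"]],
  ["approved", ["done"]],
  ["approved_minor", ["done"]],
  ["changes_requested", ["in_progress"]],  // fix, then back to the executor
  ["critical_block", ["in_progress"]],     // fix, then back to the executor
  ["blocked", []],                         // no exit drawn in the diagram
  ["done", []],
]);

function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}
```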
69
+
70
+ ### 4 Phases
71
+
72
+ **Phase 1 — Idea + Plan** (smart/reasoning model)
73
+ - Human submits ideas (natural language).
74
+ - AI writes into `PLAN.md`: §1 Intent, §2 Scope, §3 Approach, **§4 Test Plan (required, TDD-style)**, §5 Verification Commands, §6 Acceptance Criteria.
75
+ - Output: a complete PLAN.md, awaiting human approval.
76
+
77
+ **Phase 2 — Create Tasks (TDD-embedded, MANDATORY)** (smart/reasoning model, usually the same as Phase 1)
78
+ - Human approves the plan → AI splits `PLAN.md §7` into multiple `tasks/TASK-xxx.md` files.
79
+ - **Every TASK file MUST have its own Test Plan**, not merely a pointer back to PLAN.md. Specifically:
80
+ - `§ Test Cases`: test table (type, test name, expected) for this task's part: happy path + ≥1 edge case + regression (for bug fixes).
81
+ - `§ Test Files`: concrete paths of the test files to create/modify (e.g. `tests/auth/login.test.js`).
82
+ - `§ Verification Commands`: commands the executor will run to confirm PASS.
83
+ - `§ Acceptance Criteria`: checklist.
84
+ - If the split yields a task without concrete Test Cases + Test Files → that task is not yet `ready`; mark it `needs_breakdown`.
85
+ - Update `INDEX.md`: add a row for each task with status `ready`.
86
+ - This is the **human-approval cutoff point**: once this phase is done, the executor may pick tasks.
87
+ - Goal: the executor (cheap-smart model) reads the task file and knows IMMEDIATELY which tests to write first, with NO guesswork.
88
+
89
+ **Phase 3 — Implement + Test** (cheap-smart/code model)
90
+ - User: "execute next task" / "do TASK-001" / "implement task 1".
91
+ - Executor reads `INDEX.md` → picks a `ready` task → sets it to `in_progress` → **writes tests first → RED → implement → GREEN** → runs the Verification Commands fresh within the turn → appends `## Executor Report` (including `EXECUTOR_TOOL`/`EXECUTOR_MODEL`/`EXECUTOR_SUBAGENT` + verification output) to the end of the task file → sets status to `pending_review`.
92
+ - Must NOT claim DONE without a fresh PASS.
93
+
94
+ **Phase 4 — Review + Test** (reviewer model, DIFFERENT from the executor model)
95
+ - User: "review pending tasks" / "review TASK-001".
96
+ - Reviewer reads INDEX → picks a `pending_review` task → reads the task file + diff → **compares its model with `EXECUTOR_MODEL`, refusing if identical/unknown** → **re-runs the Verification Commands fresh** (does not trust the executor) → applies the `code-review` skill → appends `## Reviewer Verdict` to the task file (verdict + findings + reviewer model used) → sets the status:
97
+ - `approved` / `approved_minor` → `done` is allowed.
98
+ - `changes_requested` → the executor must fix the Important issues → repeat Phases 3-4.
99
+ - `critical_block` → the executor MUST fix → repeat Phases 3-4.
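The verdict handling in Phase 4 can be pictured as a lookup from the reviewer's verdict label to the task status. The labels `APPROVED`, `APPROVED-WITH-MINOR`, and `CHANGES-REQUESTED` appear elsewhere in this diff; using `CRITICAL` as the fourth label, and the helper names themselves, are assumptions of this sketch.

```javascript
// Hypothetical mapping from a reviewer verdict label to the task status.
const VERDICT_TO_STATUS = new Map([
  ["APPROVED", "approved"],
  ["APPROVED-WITH-MINOR", "approved_minor"],
  ["CHANGES-REQUESTED", "changes_requested"],
  ["CRITICAL", "critical_block"], // label assumed from the config help text
]);

function statusForVerdict(verdict) {
  const status = VERDICT_TO_STATUS.get(String(verdict).trim().toUpperCase());
  if (!status) throw new Error(`Unknown reviewer verdict: ${verdict}`);
  return status;
}

// Where a reviewed task goes next: forward to done, or back to the executor.
function nextStatusAfterReview(status) {
  return status === "approved" || status === "approved_minor" ? "done" : "in_progress";
}
```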
100
+
101
+ If `handoff.reviewer.enabled=false`, Phase 4 is skipped but the reason must be logged in the task; skipping Phase 4 removes the last safety net.
24
102
 
25
103
  ## Task Gate
26
104
 
@@ -28,11 +106,13 @@ A task is `ready` only when it has:
28
106
  - Clear target files
29
107
  - Clear action
30
108
  - Dependencies stated
31
- - Verification command
109
+ - **Test Plan** (PLAN.md §4): happy path + ≥1 edge case (+ regression test for bug fixes); or `N/A` with a reason
110
+ - Verification command (the command the executor will run)
32
111
  - Acceptance criteria
33
112
 
34
113
  Missing any → `needs_breakdown`, `blocked`, or `needs_human`.
35
114
 
115
+
36
116
  ## Clear Handoff
37
117
 
38
118
  1. Archive current cycle → `archive/cycle-NNN.md`.
@@ -0,0 +1,72 @@
1
+ # TASK-XXX — <short title>
2
+
3
+ <!--
4
+ Template for each task. The planner copies this file when splitting PLAN.md into separate tasks.
5
+ This file MUST keep the structure: Goal + Test Cases + Test Files + Verification + Acceptance.
6
+ Every AI (planner / executor / reviewer) reads and writes THIS file. No communication outside the file.
7
+ -->
8
+
9
+ - Status: `ready` <!-- ready | in_progress | pending_review | changes_requested | critical_block | approved | approved_minor | blocked | done -->
10
+ - Owner: `-` <!-- tool currently holding the task -->
11
+ - Reviewer: `-` <!-- model name used by the reviewer, set in Phase 4 -->
12
+ - Parent plan: `docs/AI_HANDOFF/PLAN.md` §<section>
13
+
14
+ ## Goal
15
+
16
+ <!-- 1-2 sentences describing what this slice does. -->
17
+
18
+ ## Target Files
19
+
20
+ - `<path/to/source.js>` — <what changes>
21
+
22
+ ## Test Cases (REQUIRED — TDD)
23
+
24
+ | # | Type | Test name | Expected | Pre-state / Fixture |
25
+ |---|------|----------|----------|---------------------|
26
+ | 1 | unit | `<describe behavior>` | `<concrete expected>` | `<input>` |
27
+ | 2 | edge | `<null/empty/boundary>` | `<expected>` | `<input>` |
28
+ | 3 | regression (for bug fixes) | `<reproduces bug>` | RED before fix, GREEN after | `<repro input>` |
29
+
30
+ ## Test Files
31
+
32
+ - `<tests/path/to/file.test.js>`: contains the tests above.
33
+
34
+ ## Verification Commands
35
+
36
+ ```bash
37
+ yarn test tests/path/to/file.test.js
38
+ ```
39
+
40
+ ## Acceptance Criteria
41
+
42
+ - [ ] All tests in §Test Cases PASS.
43
+ - [ ] No regressions in related suites.
44
+ - [ ] Reviewer verdict APPROVED or APPROVED-WITH-MINOR.
45
+ - [ ] Docs/CHANGELOG updated if user-facing.
46
+
47
+ ## Dependencies
48
+
49
+ - (none) <!-- or TASK-xxx must be done first -->
50
+
51
+ ---
52
+
53
+ ## Discussion
54
+
55
+ <!--
56
+ AIs talk to each other HERE, not through other tools.
57
+ Format of each comment:
58
+
59
+ ### <date> · <role: planner|executor|reviewer> · <tool/model>
60
+ <content: questions, notes, suggestions, push-back to the previous phase>
61
+
62
+ Replies go one level deeper (####). State "→ @planner" / "→ @executor" / "→ @reviewer" explicitly if there is a specific recipient.
63
+ -->
64
+
65
+ (no comments yet)
66
+
67
+ ---
68
+
69
+ <!--
70
+ The Phase 3 executor appends `## Executor Report` BELOW this separator.
71
+ The Phase 4 reviewer appends `## Reviewer Verdict` BELOW the Executor Report.
72
+ -->
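Since the template makes four sections mandatory, a readiness gate could be approximated by checking the task file's level-2 headings. The sketch below is illustrative only: `missingSections` and `gateStatus` are hypothetical names, and the check does not skip headings that happen to sit inside HTML comments or code fences.

```javascript
// Hypothetical readiness gate based on the template's mandatory sections.
const REQUIRED_SECTIONS = ["Test Cases", "Test Files", "Verification Commands", "Acceptance Criteria"];

function missingSections(taskMarkdown) {
  const headings = taskMarkdown
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
  // A heading matches exactly or with a suffix, e.g. "Test Cases (REQUIRED — TDD)".
  return REQUIRED_SECTIONS.filter(
    (name) => !headings.some((heading) => heading === name || heading.startsWith(name + " "))
  );
}

function gateStatus(taskMarkdown) {
  return missingSections(taskMarkdown).length === 0 ? "ready" : "needs_breakdown";
}
```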
@@ -1,5 +1,5 @@
1
1
  {
2
- "version": "1.5.4",
2
+ "version": "1.5.6",
3
3
  "agent": "claude-code",
4
4
  "autonomy": {
5
5
  "level": "balanced",
@@ -156,6 +156,45 @@
156
156
  "deltaMaxHunks": 3,
157
157
  "deltaMaxDiffCells": 2000000
158
158
  },
159
+ "handoff": {
160
+ "enabled": true,
161
+ "crossTool": true,
162
+ "plan": {
163
+ "requireTestPlan": true,
164
+ "minTestsHappyPath": 1,
165
+ "minTestsEdgeCase": 1,
166
+ "regressionTestRequiredForBugfix": true,
167
+ "smartModelHint": "claude-opus-4-6"
168
+ },
169
+ "executor": {
170
+ "testFirstRequired": true,
171
+ "mustRunVerificationInTurn": true,
172
+ "blockDoneWithoutFreshPass": true,
173
+ "cheapSmartModelHint": "unic-code",
174
+ "appendReportToTaskFile": true
175
+ },
176
+ "reviewer": {
177
+ "enabled": true,
178
+ "model": "unic-smart",
179
+ "agent": "code-reviewer",
180
+ "mustDifferFromExecutor": true,
181
+ "blockOnCritical": true,
182
+ "blockOnChangesRequested": true,
183
+ "rerunVerificationCommands": true,
184
+ "appendVerdictToTaskFile": true
185
+ },
186
+ "statusFlow": [
187
+ "ready",
188
+ "in_progress",
189
+ "pending_review",
190
+ "changes_requested",
191
+ "critical_block",
192
+ "approved",
193
+ "approved_minor",
194
+ "blocked",
195
+ "done"
196
+ ]
197
+ },
159
198
  "subagents": {
160
199
  "enabled": true,
161
200
  "smallTaskModel": "unic-lite",
@@ -259,6 +298,30 @@
259
298
  "field": "compact.codexContext.autoCompact",
260
299
  "mac_dinh": true,
261
300
  "y_nghia": "Only set false when debugging UKit's compact behavior."
301
+ },
302
+ "doi_reviewer_model": {
303
+ "field": "handoff.reviewer.model",
304
+ "mac_dinh": "unic-smart",
305
+ "y_nghia": "Model used by the reviewer agent in Phase 4. MUST differ from the executor model so it catches errors the executor missed. Can be claude-opus-4-6, unic-smart, or any strong reasoning model.",
306
+ "vi_du": "If the executor is unic-code (Kilo Code), set reviewer.model=unic-smart or claude-opus-4-6. If the executor is claude-sonnet, set the reviewer to claude-opus."
307
+ },
308
+ "tat_reviewer_phase": {
309
+ "field": "handoff.reviewer.enabled",
310
+ "mac_dinh": true,
311
+ "y_nghia": "Disable the Phase 4 review to run a quick prototype. NOT recommended for production code: skipping review removes the last safety net.",
312
+ "khi_nao_tat": "Only disable for throw-away prototypes or when you review manually yourself."
313
+ },
314
+ "block_handoff_khi_critical": {
315
+ "field": "handoff.reviewer.blockOnCritical",
316
+ "mac_dinh": true,
317
+ "y_nghia": "If the reviewer reports CRITICAL → the task may not be marked done; the executor must fix it and it is reviewed again. Set false only if you want a soft warning.",
318
+ "khuyen_nghi": "Keep true. This is the barrier against shipping serious bugs."
319
+ },
320
+ "bat_test_plan_bat_buoc": {
321
+ "field": "handoff.plan.requireTestPlan",
322
+ "mac_dinh": true,
323
+ "y_nghia": "The planner must complete PLAN.md §4 (Test Plan) before a task moves to ready. Disabling it lets UKit skip TDD, which makes it easier for the executor to miss things.",
324
+ "khuyen_nghi": "Keep true."
262
325
  }
263
326
  },
264
327
  "version": "Runtime config version shipped with the UKit package.",
@@ -374,6 +437,35 @@
374
437
  "deltaMaxChangedLines": "Default changed-line budget for later delta checks.",
375
438
  "deltaMaxHunks": "Default changed-hunk budget for later delta checks.",
376
439
  "deltaMaxDiffCells": "LCS cell limit when computing the delta; above this threshold UKit uses a linear fallback to avoid slowness on very large files."
440
+ },
441
+ "handoff": {
442
+ "enabled": "Enables the Quality Gate for handoff: the plan has a Test Plan, the executor works test-first, the reviewer uses a different model. Disabled = back to the old flow (minor bugs slip through more easily).",
443
+ "crossTool": "true means the handoff is passed through files (PLAN/INDEX/tasks) rather than through an in-process subagent, which allows planning in Claude Code, executing in Kilo Code, and reviewing in Claude Code with a different model.",
444
+ "plan": {
445
+ "requireTestPlan": "Requires PLAN.md §4 to contain a Test Plan before a task moves to ready.",
446
+ "minTestsHappyPath": "Minimum number of tests for the happy path.",
447
+ "minTestsEdgeCase": "Minimum number of tests for edge cases (null/empty/boundary/concurrent…).",
448
+ "regressionTestRequiredForBugfix": "A bug fix must have a regression test that fails before the fix.",
449
+ "smartModelHint": "Suggested strongest model for the plan phase (e.g. claude-opus-4-6). UKit does not enforce it; it only records the hint in the task."
450
+ },
451
+ "executor": {
452
+ "testFirstRequired": "The executor must write tests before implementing (RED → GREEN).",
453
+ "mustRunVerificationInTurn": "Verification Commands must be run in the same turn that reports DONE; stale output is not trusted.",
454
+ "blockDoneWithoutFreshPass": "STATUS:DONE is not allowed without a fresh PASS in this turn.",
455
+ "cheapSmartModelHint": "Suggested cheap-but-still-smart model for the executor (e.g. unic-code). UKit only records the hint.",
456
+ "appendReportToTaskFile": "The executor appends `## Executor Report` to the end of the task file so the reviewer (a different tool) can read it."
457
+ },
458
+ "reviewer": {
459
+ "enabled": "Enables the independent Phase 4 review. Disabled = removes the last safety net; not recommended.",
460
+ "model": "Reviewer model. MUST differ from the executor model. Defaults to unic-smart; claude-opus-4-6 or any strong reasoning model also works.",
461
+ "agent": "Name of the reviewer agent (see .claude/agents/code-reviewer.md).",
462
+ "mustDifferFromExecutor": "If true, the reviewer refuses on its own when it detects the same model as the executor.",
463
+ "blockOnCritical": "CRITICAL → hard-blocks the handoff. Set false only if you want a soft warning.",
464
+ "blockOnChangesRequested": "CHANGES-REQUESTED → blocks the handoff until fixed. Setting false allows done with Important issues outstanding.",
465
+ "rerunVerificationCommands": "The reviewer must re-run the Verification Commands itself and not trust the executor's output.",
466
+ "appendVerdictToTaskFile": "The reviewer appends `## Reviewer Verdict` to the task file to close the handoff loop."
467
+ },
468
+ "statusFlow": "List of valid statuses in INDEX.md. Renaming entries here changes the handoff state machine."
377
469
  }
378
470
  }
379
471
  }
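One way to read the `handoff.reviewer.blockOnCritical` and `handoff.reviewer.blockOnChangesRequested` flags documented above is as switches on whether a reviewed task may still be marked `done`. The sketch below encodes that reading; `mayMarkDone` is a hypothetical helper, and interpreting `blockOnCritical=false` ("soft warning") as permission to proceed is an assumption rather than documented behavior.

```javascript
// Hypothetical reading of the reviewer block flags; not UKit's implementation.
function mayMarkDone(status, reviewerConfig) {
  if (status === "approved" || status === "approved_minor") return true;
  // Assumption: a flag set to false downgrades the block to a warning.
  if (status === "critical_block") return reviewerConfig.blockOnCritical === false;
  if (status === "changes_requested") return reviewerConfig.blockOnChangesRequested === false;
  return false; // every other status has not passed review yet
}
```

With the shipped defaults (both flags `true`), only `approved` and `approved_minor` tasks can reach `done`.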