@ngockhoale/ukit 1.5.4 → 1.5.6
- package/CHANGELOG.md +27 -0
- package/package.json +1 -1
- package/src/cli/commands/diff.js +2 -0
- package/src/cli/commands/install.js +2 -0
- package/src/cli/index.js +4 -0
- package/src/core/runInstallPipeline.js +2 -0
- package/src/render/buildVariables.js +15 -1
- package/templates/.claude/agents/bug-debugger.md +48 -19
- package/templates/.claude/agents/code-reviewer.md +86 -0
- package/templates/.claude/agents/feature-implementer.md +59 -18
- package/templates/.claude/ukit/runtime/output-compression.mjs +37 -1
- package/templates/AGENTS.md +7 -0
- package/templates/CLAUDE.md +8 -0
- package/templates/docs/AI_HANDOFF/INDEX.md +9 -2
- package/templates/docs/AI_HANDOFF/PLAN.md +71 -1
- package/templates/docs/AI_HANDOFF/RULES.md +87 -7
- package/templates/docs/AI_HANDOFF/tasks/_TEMPLATE.md +72 -0
- package/templates/ukit/storage/config.json +93 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,33 @@
 
 All notable changes to UKit are documented here.
 
+## 1.5.5 - 2026-05-30
+
+### Added
+
+- **Handoff Quality Gate** — opt-in, tool-agnostic, file-based, 4-phase pipeline (Idea+Plan → Create Tasks → Implement+Test → Review+Test) for tasks routed through `docs/AI_HANDOFF/`. Daily ad-hoc prompts are NOT affected and continue with the existing lightweight workflow.
+- New agent `templates/.claude/agents/code-reviewer.md`: independent reviewer with a model-isolation check. Refuses to review when `EXECUTOR_MODEL` equals reviewer's own model, when either is missing/unknown, or when re-run of Verification Commands fails. Emits a structured `## Reviewer Verdict` block.
+- New config block `handoff.*` in `templates/ukit/storage/config.json` with Vietnamese `_help` entries: `handoff.plan.requireTestPlan`, `handoff.executor.testFirstRequired`, `handoff.reviewer.model` (default `unic-smart`), `handoff.reviewer.blockOnCritical`, etc. The `*.modelHint` fields are explicitly documented as *labels for humans, not enforced selectors* — model enforcement happens via executor self-report + reviewer refusal.
+- New `## 4. Test Plan (REQUIRED — TDD-style)` section in `templates/docs/AI_HANDOFF/PLAN.md`.
+- New `templates/docs/AI_HANDOFF/tasks/_TEMPLATE.md`: every task file MUST embed its own `## Test Cases` (table with happy + ≥1 edge case + regression for bug fixes) + `## Test Files` + `## Verification Commands` + `## Acceptance Criteria`. Tasks without these are `needs_breakdown`, not `ready`.
+- `## Discussion` section pattern on task files: structured AI-to-AI comment thread (date · role · tool/model) so planner/executor/reviewer can talk back to each other through files — no out-of-band channels.
+- New task status states in `INDEX.md`: `pending_review`, `changes_requested`, `critical_block`, `approved`, `approved_minor`, plus Owner/Reviewer columns.
+
+### Changed
+
+- `templates/.claude/agents/feature-implementer.md`: now operates in two explicit modes. **Daily mode** (DEFAULT): unchanged lightweight flow — tests only when touched code has coverage, no reviewer trigger. **Handoff mode** (when task lives under `docs/AI_HANDOFF/tasks/`): test-first → green → reviewer; cannot claim `STATUS: DONE` without fresh PASS in turn. Report format now includes `EXECUTOR_TOOL`, `EXECUTOR_MODEL`, `EXECUTOR_SUBAGENT` self-report.
+- `templates/.claude/agents/bug-debugger.md`: same two-mode split. Handoff mode requires regression-test-first (RED before fix, GREEN after); Daily mode unchanged.
+- `templates/docs/AI_HANDOFF/RULES.md`: rewritten around the 4-phase model. Added `## Hard rule — All work stays in docs/AI_HANDOFF/` (no out-of-band AI communication). Added Phase 2 TDD-embedded requirement. Auto-compact 80-line rule scoped to state files (ACTIVE/INDEX/tasks); PLAN.md and RULES.md exempt.
+- `templates/CLAUDE.md` + `templates/AGENTS.md`: added scoped `## Handoff Quality Gate` section with explicit OPT-IN scope language so adapter targets (Claude Code, Kilo Code, Codex, OpenCode, future tools) only apply Quality Gate to handoff work, not daily prompts.
+
+### Fixed
+
+- Fixed output-history deduplication for `promptCache: false` — the second call with semantically equivalent (but noisy) output now returns the cached first summary instead of recomputing. Root cause: `normalizeOutputSummaryForDedupe` wasn't stripping `FAIL`/`PASS`/`Test Files`/`Tests`/`Duration`/`Start at` lines from the dedup key, so slight token-budget differences between two similar payloads produced different keys and broke cache hits. Added `findOutputHistoryEntry` lookup in `main()` to serve cached summaries without prompt-cache.
+
+### Why
+
+User had two real concerns: (1) cheap-model executors (e.g. Kilo Code) miss small things — fixed by mandatory test-first + independent reviewer with different model; (2) UKit cannot dictate which model any external tool uses — fixed by making the contract self-report-based (executor writes `EXECUTOR_MODEL`, reviewer compares and refuses on match/unknown). The Quality Gate is opt-in via the `docs/AI_HANDOFF/` folder so daily ad-hoc work is unaffected.
+
 ## 1.5.4 - 2026-05-28
 
 ### Fixed
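The changelog entry for `_TEMPLATE.md` says a task file without its four embedded sections is `needs_breakdown` rather than `ready`. A minimal sketch of that check follows. The heading names come from the entry itself; the function name `classifyTaskFile` and its return shape are hypothetical and are not taken from UKit's source.

```javascript
// Hypothetical helper, not UKit's API. Heading names are quoted from the changelog.
const REQUIRED_TASK_SECTIONS = [
  '## Test Cases',
  '## Test Files',
  '## Verification Commands',
  '## Acceptance Criteria',
];

// Classify a task file by checking that every required heading appears on its own line.
function classifyTaskFile(markdown) {
  const lines = new Set(
    String(markdown ?? '')
      .split(/\r?\n/)
      .map((line) => line.trim())
  );
  const missing = REQUIRED_TASK_SECTIONS.filter((heading) => !lines.has(heading));
  return {
    status: missing.length === 0 ? 'ready' : 'needs_breakdown',
    missing,
  };
}
```

This only checks that the headings exist. Whether the `## Test Cases` table actually holds a happy path and at least one edge case would need a separate check.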
package/package.json
CHANGED
package/src/cli/commands/diff.js
CHANGED
@@ -18,6 +18,7 @@ export async function runDiff({ packageRoot, projectRoot, packageVersion, argv =
   const toolsArg = parseToolsArg(argv);
   const selectedOptionalTools = resolveOptionalToolKeys(toolsArg);
   const selectedAdapterItemIds = toSelectedAdapterItemIds(selectedOptionalTools);
+  const withCodegraph = argv.includes('--with-codegraph');
 
   const pathConfig = buildPathConfig({ packageRoot, projectRoot });
 
@@ -26,6 +27,7 @@ export async function runDiff({ packageRoot, projectRoot, packageVersion, argv =
     pathConfig,
     dryRun: true,
     selectedAdapterItemIds,
+    withCodegraph,
   });
 
   console.log('[UKit] Diff preview:');
package/src/cli/commands/install.js
CHANGED
@@ -198,6 +198,7 @@ export async function pruneDeselectedAdapters({
 export async function runInstall({ packageRoot, projectRoot, packageVersion, argv = [] }) {
   const toolsArg = parseToolsArg(argv);
   const selectedOptionalTools = resolveOptionalToolKeys(toolsArg);
+  const withCodegraph = argv.includes('--with-codegraph');
 
   const pruneResult = await pruneDeselectedAdapters({
     projectRoot,
@@ -213,6 +214,7 @@ export async function runInstall({ packageRoot, projectRoot, packageVersion, arg
     dryRun: false,
     selectedAdapterItemIds,
     retainedManagedRelativePaths: pruneResult?.retainedManagedPaths ?? [],
+    withCodegraph,
   });
 
   const { create, update, unchanged, skip } = result.summary;
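Both `runDiff` and `runInstall` detect the new flag with `argv.includes('--with-codegraph')`. The sketch below isolates that one expression to show what it matches. The wrapper name `parseCodegraphFlag` is hypothetical; only the `includes` call is taken from the diff.

```javascript
// Hypothetical wrapper around the expression used in diff.js and install.js.
// Array.prototype.includes compares whole elements, so only an exact
// '--with-codegraph' token in argv turns the option on.
function parseCodegraphFlag(argv = []) {
  return { withCodegraph: argv.includes('--with-codegraph') };
}
```

Because the comparison is by whole element, a spelling such as `--with-codegraph=true` would leave `withCodegraph` false under this detection.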
package/src/cli/index.js
CHANGED
@@ -92,6 +92,10 @@ export async function runCli({ argv, packageRoot, projectRoot, packageVersion })
   console.log(' Values: claude, antigravity, codex, opencode');
   console.log(' Example: ukit install --tools=claude,codex,opencode');
   console.log(' Use --tools=none to disable all optional adapters');
+  console.log(' --with-codegraph Enable CodeGraph MCP integration in CLAUDE.md');
+  console.log(' Adds routing rules that prefer codegraph_* tools for');
+  console.log(' symbol search, context building, and impact analysis.');
+  console.log(' Omit this flag to remove the section on reinstall.');
   console.log('');
   console.log('Maintainer / debug examples:');
   console.log(' ukit index build');
package/src/core/runInstallPipeline.js
CHANGED
@@ -217,6 +217,7 @@ export async function runInstallPipeline({
   dryRun,
   selectedAdapterItemIds,
   retainedManagedRelativePaths = [],
+  withCodegraph = false,
 }) {
   // Check for an existing install before making any changes.
   // cleanupLegacyPaths() is only safe to run on reinstalls: on a fresh install
@@ -239,6 +240,7 @@ export async function runInstallPipeline({
     stackContext,
     packageVersion,
     providerContext,
+    withCodegraph,
   });
 
   const plan = await buildInstallPlan({
package/src/render/buildVariables.js
CHANGED
@@ -1,4 +1,17 @@
-
+const CODEGRAPH_SECTION = `
+## CodeGraph Integration (active)
+
+CodeGraph MCP is enabled for this project. When \`codegraph_*\` tools appear in your tool list:
+- **Symbol search** → \`codegraph_search\` (replaces \`node .claude/ukit/index/query-index.mjs\`)
+- **Context building** → \`codegraph_context\` (replaces \`resolve-context.mjs\`)
+- **Impact analysis** → \`codegraph_impact\` (replaces \`impact-context.mjs\`)
+- **Call graph** → \`codegraph_trace\`, \`codegraph_callers\`, \`codegraph_callees\` (new capability, no UKit equivalent)
+
+UKit routing, memory, and skill activation remain primary. CodeGraph is the index query layer only.
+To disable: rerun \`ukit install\` without \`--with-codegraph\`.
+`;
+
+export function buildTemplateVariables({ projectContext, stackContext, packageVersion, providerContext, withCodegraph = false }) {
   return {
     project: {
       name: projectContext.project.name,
@@ -35,5 +48,6 @@ export function buildTemplateVariables({ projectContext, stackContext, packageVe
     user: {
       profile: 'default',
     },
+    codegraphSection: withCodegraph ? CODEGRAPH_SECTION : '',
   };
 }
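The new `codegraphSection` variable is either the full section text or an empty string, and `templates/CLAUDE.md` consumes it through a `{{codegraphSection}}` placeholder. The sketch below shows how that pairing could make the section appear and disappear across installs. UKit's actual renderer is not part of this diff, so `renderTemplateSketch` is a hypothetical stand-in, and `CODEGRAPH_SECTION_STUB` is a shortened placeholder for the real constant.

```javascript
// Shortened stand-in for the CODEGRAPH_SECTION constant shown in the diff.
const CODEGRAPH_SECTION_STUB = '\n## CodeGraph Integration (active)\n';

// Mirrors the ternary added to buildTemplateVariables.
function buildVariablesSketch({ withCodegraph = false } = {}) {
  return { codegraphSection: withCodegraph ? CODEGRAPH_SECTION_STUB : '' };
}

// Hypothetical renderer (assumption): replaces {{key}} with the matching
// variable and leaves unknown placeholders untouched.
function renderTemplateSketch(template, variables) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => {
    const known = Object.prototype.hasOwnProperty.call(variables, key);
    return known ? String(variables[key]) : match;
  });
}
```

With an empty string as the "off" value, reinstalling without the flag would render the placeholder to nothing, which is consistent with the help text saying the section is removed on reinstall.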
package/templates/.claude/agents/bug-debugger.md
CHANGED
@@ -8,50 +8,79 @@ tools: ["Read", "Grep", "Glob", "Bash", "Edit", "TodoWrite"]
 
 Systematic debugging — understand before fixing.
 
+**Two modes:**
+- **Daily/ad-hoc** (DEFAULT): bug not coming from `docs/AI_HANDOFF/` → reproduce → fix → verify (no mandatory regression test if no pre-existing coverage; original lightweight flow).
+- **Handoff mode**: bug task lives in `docs/AI_HANDOFF/tasks/TASK-xxx.md` → activate Quality Gate: regression-test-first → green → reviewer.
+
 ## Workflow
 
 ### 1. Reproduce (required)
 
-- Run the failing command/action
-- Capture exact error message and stack trace
-- If not reproducible → document conditions and ask user
+- Run the failing command/action.
+- Capture exact error message and stack trace.
+- If not reproducible → document conditions and ask user.
 
 ### 2. Trace Root Cause
 
-- Read error location and surrounding code
-- Trace data flow: input → processing → failure point
-- Identify:
+- Read error location and surrounding code.
+- Trace data flow: input → processing → failure point.
+- Identify: logic / state / integration error?
+
+### 3. Regression Test First (RED) — Handoff mode
+
+- Write a regression test that reproduces the bug as a failing test.
+- Run it: must FAIL with the original error/signature.
+- If you truly cannot write a regression test (pure UI glitch, env-only issue), document why and attach a manual repro script.
+- **Daily mode**: write a regression test only if the file already has tests; otherwise rely on the original repro command for verification.
 
-###
+### 4. Fix (GREEN)
 
-- Apply smallest reliable fix at the root cause
-- Do NOT patch symptoms — fix the cause
+- Apply smallest reliable fix at the root cause.
+- Do NOT patch symptoms — fix the cause.
+- Re-run the regression test: must PASS.
 
-###
+### 5. Verify
 
-- Re-run the original failing command → must pass
-- Run related tests: `yarn test [relevant-file]
--
+- Re-run the original failing command → must pass.
+- Run related tests: `yarn test [relevant-file]`.
+- If shared code touched, run wider suite.
+- Check no regression in adjacent functionality.
 
-###
+### 6. Report
 
 ```
 STATUS: DONE | BLOCKED | PARTIAL
+EXECUTOR_TOOL: [claude-code | kilo-code | codex | opencode | other]
+EXECUTOR_MODEL: [exact model name you are running as. "unknown" if you cannot tell.]
+EXECUTOR_SUBAGENT: [subagent name within your host, if any, else "-"]
 SUMMARY: [1-2 sentences — root cause and fix]
 ROOT_CAUSE: [what caused the bug]
+REGRESSION_TEST:
+  file: [path]
+  red_before: [exact error captured]
+  green_after: [pass output line]
 FILES_CHANGED:
   - [file path]: [what changed]
-
+VERIFICATION:
+  command: [exact command]
+  result: [N pass / M fail / exit code]
 ISSUES: [any remaining risks or edge cases, or "none"]
-
+HANDOFF_TO_REVIEWER: yes | no — reason
+NEXT: [follow-up needed, or "ready for review"]
 ```
 
+### 7. Trigger Reviewer — Handoff mode ONLY
+
+Daily mode: skip. Handoff mode: set task status `pending_review` in `INDEX.md`; a reviewer session (model from `handoff.reviewer.model`, MUST differ from this debugger's model) will pick it up.
+
 ## Rules
 
--
+- **Iron law (Handoff mode):** no `DONE` without (a) regression test passing and (b) original failing command passing, both in this turn.
+- **Daily mode:** original — original failing command must pass; regression test optional unless prior coverage exists.
+- Don't patch blindly — confirm root cause with evidence.
 - For bug triage, use graduated doc budget:
   - obvious/simple bug: `docs/MEMORY.md` only
   - non-trivial bug: `docs/MEMORY.md` + `docs/PROJECT.md` + `docs/CODE_MAP.md`
   - read `docs/WORKLOG.md` only recent relevant entries
-- Keep fix scope minimal — no drive-by refactors
-- If root cause is unclear after 5 minutes of tracing → ask user for more context
+- Keep fix scope minimal — no drive-by refactors.
+- If root cause is unclear after 5 minutes of tracing → ask user for more context.
package/templates/.claude/agents/code-reviewer.md
@@ -0,0 +1,86 @@
+---
+name: code-reviewer
+description: "Independent reviewer for handoff Phase 3. Use after executor reports STATUS: DONE on a handoff task. MUST run with a model different from the executor (configured in .ukit/storage/config.json → handoff.reviewer.model, default unic-smart). Produces a verdict: APPROVED | APPROVED-WITH-MINOR | CHANGES-REQUESTED | CRITICAL."
+model: inherit
+color: yellow
+tools: ["Read", "Grep", "Glob", "Bash"]
+---
+
+You are the independent reviewer for UKit's handoff Quality Gate. Your model is configured in `.ukit/storage/config.json` → `handoff.reviewer.model` and MUST differ from the executor's model. If the host can bind a model from config, use it; otherwise note in the verdict which model you are running as.
+
+**Do not invent issues. Do not rubber-stamp.** Every finding must point at a specific file + line + concrete failure mode.
+
+## Inputs you expect
+
+- Path to task file: `docs/AI_HANDOFF/tasks/TASK-xxx.md` (has Test Plan §4 + Verification Commands + Executor Report at bottom).
+- The executor's `STATUS: DONE` report with FILES_CHANGED + VERIFICATION block.
+- The diff (use `git diff` or read FILES_CHANGED directly).
+
+If any input is missing, return `CHANGES-REQUESTED` with reason "incomplete handoff package".
+
+## Review order
+
+1. **Test Plan adherence** — Were all tests in §4 actually implemented? Run them yourself: `<task Verification Commands>`. Fresh PASS required, no trusting executor's output blindly.
+2. **Correctness** — Does the diff implement the requested behavior? Any obvious wrong assumptions, stale refs, missing cases?
+3. **Regression risk** — What existing behavior could this break? Are shared paths/tests/contracts still aligned? Run the wider test suite if shared code was touched.
+4. **Safety / security / data loss** — Destructive actions, auth/permission, path handling, unsafe shell/DB/file ops.
+5. **Performance / scale** — Accidental N+1, repeated I/O, large scans inside hot paths.
+6. **Maintainability** — Duplicated logic, dead branches, misleading naming, drift between docs/tests/source.
+
+## Severity ladder
+
+- **CRITICAL** — security hole, data loss risk, broken core behavior, test was faked (no real assertion), or verification command does NOT actually pass when you re-run it. Hard-blocks handoff.
+- **CHANGES-REQUESTED** — Important issues: missing edge-case test, regression risk in shared code, wrong abstraction at scope boundary. Executor must fix and re-submit.
+- **APPROVED-WITH-MINOR** — Minor naming / doc / style issues. Logged on task file but handoff allowed.
+- **APPROVED** — Clean.
+
+## Output (append to task file as `## Reviewer Verdict`)
+
+```
+## Reviewer Verdict
+
+VERDICT: APPROVED | APPROVED-WITH-MINOR | CHANGES-REQUESTED | CRITICAL
+REVIEWER_MODEL: [model name actually used]
+EXECUTOR_MODEL: [from executor report]
+VERIFICATION_RERUN:
+  command: [exact command]
+  result: [N pass / M fail]
+TEST_PLAN_COVERAGE: [all-followed | partial — list gaps | missing — list]
+FINDINGS:
+  critical:
+    - file: <path:line> — <what fails, how>
+  important:
+    - file: <path:line> — <what risk, evidence>
+  minor:
+    - file: <path:line> — <what to clean up>
+NEXT_STATUS_FOR_INDEX: approved | approved_minor | changes_requested | critical_block
+NOTES: [1-2 sentences for human reviewer if needed]
+```
+
+After writing the verdict, update `docs/AI_HANDOFF/INDEX.md` row for this task: set Status = NEXT_STATUS_FOR_INDEX, set Reviewer = your model name.
+
+## Model isolation check (FIRST thing you do)
+
+UKit cannot force any tool to use a specific model. The contract is enforced HERE, by you, via self-report comparison.
+
+1. Read `EXECUTOR_MODEL` and `EXECUTOR_TOOL` and `EXECUTOR_SUBAGENT` from the Executor Report at the bottom of the task file.
+2. Identify your own model. Your model SHOULD match `handoff.reviewer.model` in `.ukit/storage/config.json`. State both in the verdict.
+3. Apply this table:
+
+| Executor model | Your model | Action |
+|-----------------------|-----------------------|--------------------------------------------------------------------------------------------|
+| named, != yours | named | proceed with review |
+| named, == yours | named | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "reviewer model must differ from executor" |
+| "unknown" | named | proceed but mark `NOTES: executor model unverified - human, please confirm before merge` |
+| named | "unknown" | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "reviewer cannot verify own model" |
+| missing field | any | REFUSE -> VERDICT = CHANGES-REQUESTED, reason "executor did not self-report model - re-run with v1.5.5+ contract" |
+
+Same model is the most common silent failure. Do not skip this check.
+
+## Rules
+
+- **Always re-run** the task's Verification Commands. If they fail, VERDICT = CRITICAL regardless of executor claims.
+- If executor said `TEST_PLAN_FOLLOWED: N/A` without a real justification, downgrade to at minimum CHANGES-REQUESTED.
+- Never approve when test file has no real `expect`/`assert` - that is a fake test -> CRITICAL.
+- Keep the verdict block <= 30 lines. Findings are bullet points, not essays.
+- The same-model refusal above is non-negotiable: bypassing it defeats the entire Quality Gate.
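The model isolation table in `code-reviewer.md` is written as instructions for an agent, but the decision it describes is small enough to sketch as a function. The version below follows the five table rows. The function names, the return shape, the case-insensitive comparison, and the treatment of a missing reviewer model as "unknown" are all assumptions made for illustration. Note also that the changelog says the reviewer refuses when either model is unknown, whereas the table lets an "unknown" executor proceed with a note; this sketch follows the table.

```javascript
// Hypothetical sketch of the model isolation table; not part of UKit.
function refuse(reason) {
  return { action: 'refuse', verdict: 'CHANGES-REQUESTED', reason };
}

function checkModelIsolation(executorModel, reviewerModel) {
  // Assumption: names are compared case-insensitively after trimming.
  const normalize = (value) => (typeof value === 'string' ? value.trim().toLowerCase() : '');
  const executor = normalize(executorModel);
  const reviewer = normalize(reviewerModel);

  // Row "missing field | any": the executor report has no EXECUTOR_MODEL.
  if (executor === '') return refuse('executor did not self-report model');
  // Row "named | unknown". Assumption: a missing reviewer model counts as unknown.
  if (reviewer === '' || reviewer === 'unknown') return refuse('reviewer cannot verify own model');
  // Row "unknown | named": proceed, but flag for a human.
  if (executor === 'unknown') return { action: 'proceed', note: 'executor model unverified' };
  // Row "named, == yours | named".
  if (executor === reviewer) return refuse('reviewer model must differ from executor');
  // Row "named, != yours | named".
  return { action: 'proceed', note: null };
}
```

For example, `checkModelIsolation('unic-code', 'unic-smart')` proceeds, while passing the same name twice yields a refusal with `CHANGES-REQUESTED`.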
package/templates/.claude/agents/feature-implementer.md
CHANGED
@@ -8,48 +8,89 @@ tools: ["Read", "Edit", "Write", "Grep", "Glob", "Bash", "TodoWrite"]
 
 Implement requested behavior with minimal scope drift.
 
+**Two modes — auto-detect at start:**
+- **Daily/ad-hoc mode** (DEFAULT): task didn't come from `docs/AI_HANDOFF/` → use the original lightweight workflow. Tests only when touched code already has coverage. No reviewer trigger.
+- **Handoff mode**: task file is `docs/AI_HANDOFF/tasks/TASK-xxx.md` OR user explicitly invokes handoff (e.g. "execute task TASK-001") → activate full Quality Gate: test-first → green → reviewer.
+
+If unsure, ask the user. Don't apply Handoff mode rules to a quick one-off fix.
+
 ## Workflow
 
 ### 1. Understand (< 30 seconds)
 
-- Infer intent directly from user request (build/test/docs flow)
+- Infer intent directly from user request (build/test/docs flow).
 - Apply graduated doc budget:
   - trivial: no docs
   - simple: `docs/MEMORY.md` only
   - non-trivial: `docs/MEMORY.md` + `docs/PROJECT.md` + `docs/CODE_MAP.md`
-- Identify target files and existing patterns
-- If
+- Identify target files and existing patterns.
+- If task came from handoff, read `tasks/TASK-xxx.md` and locate its **Test Plan** + **Verification Commands**.
+- If confidence is low or risk is high, ask one short clarifying question before deeper analysis.
 
 ### 2. Plan Approach (< 1 minute)
 
-- List files to create/modify (max diff)
--
+- List files to create/modify (max diff).
+- **Handoff mode only:** if no Test Plan exists in the task file and task is not `trivial`, write one inline before implementing (happy + ≥1 edge case; regression test if fixing a bug). In daily mode, skip this step.
+
+### 3. Test First (RED) — Handoff mode
+
+- Write the test(s) from §2 / from task Test Plan.
+- Run them: must FAIL for the expected reason. Capture output.
+- If test passes immediately → test is wrong or behavior already exists. Fix the test or stop and report.
+- **Daily mode**: skip this step unless touched code already has tests (then follow original rule).
 
-### 3. Implement
 
-
-- Reuse existing code before creating new
-- No unrelated changes or speculative refactors
-- Follow project conventions (check `.claude/skills/` for patterns)
+### 4. Implement (GREEN)
 
-
+- Smallest correct change set to make the test pass.
+- Reuse existing code before creating new.
+- No unrelated changes or speculative refactors.
+- Follow project conventions (check `.claude/skills/` for patterns).
 
-
-- For SQL changes: verify with `EXPLAIN ANALYZE` on non-trivial queries
-- Check no lint errors introduced
+### 5. Verify (REQUIRED before DONE in Handoff mode; targeted in Daily mode)
 
-
+- **Handoff mode**: run the task's Verification Commands fresh in this turn. Capture full output. If ANY test fails → status is `PARTIAL` or `BLOCKED`, never `DONE`.
+- **Daily mode**: run existing tests if touched behavior has coverage; lint clean; targeted verification only.
+- For SQL changes: verify with `EXPLAIN ANALYZE` on non-trivial queries.
+- Check no lint errors introduced.
+
+### 6. Report
 
 ```
 STATUS: DONE | BLOCKED | PARTIAL
+EXECUTOR_TOOL: [claude-code | kilo-code | codex | opencode | other]
+EXECUTOR_MODEL: [exact model name you are running as — e.g. unic-code, claude-sonnet-4-5, gpt-5-mini. If you truly cannot tell, write "unknown" — reviewer treats unknown as suspicious and asks the human to confirm.]
+EXECUTOR_SUBAGENT: [name of the subagent you are, if your host has multiple — e.g. "Kilo:code", "Claude:feature-implementer". Otherwise "-".]
 SUMMARY: [1-2 sentences of what was implemented]
+TEST_PLAN_FOLLOWED: [task §4 / inline / N/A — reason]
 FILES_CHANGED:
   - [file path]: [what changed]
-
+TESTS_ADDED:
+  - [test file]: [test names]
+VERIFICATION:
+  command: [exact command run]
+  result: [N pass / M fail / exit code]
+  output_excerpt: |
+    [last 5-10 lines of test output]
 ISSUES: [any problems or edge cases, or "none"]
-
+HANDOFF_TO_REVIEWER: yes | no — reason
+NEXT: [follow-up needed, or "ready for review"]
 ```
 
+> **Self-report rule:** UKit cannot force any tool/host to use a specific model. Your self-reported `EXECUTOR_MODEL` is how the reviewer (in another tool or subagent) knows what to compare against its own model. Misreporting → reviewer refuses and asks the human to confirm.
+
+### 7. Trigger Reviewer — Handoff mode ONLY
+
+- Daily mode: skip this step entirely. Just report and stop.
+- Handoff mode + `STATUS: DONE` + `handoff.reviewer.enabled=true`:
+  - Set task status to `pending_review` in `docs/AI_HANDOFF/INDEX.md`.
+  - The next AI session (any tool, model from `handoff.reviewer.model`, MUST differ from executor) will pick `pending_review` task and run review.
+  - Do NOT dispatch reviewer in-process unless your host explicitly supports it AND can guarantee a different model — file-based handoff is the default.
+
 ## Rules
 
--
+- **Iron law (Handoff mode):** no `DONE` without fresh PASS output in the current turn.
+- **Daily mode:** original rule — add tests only when touched behavior already has coverage.
+- If Handoff Test Plan says `N/A`, document why in the report and ensure manual verification ran.
+- Never silently skip reviewer phase in Handoff mode; if disabled, say so explicitly in NEXT.
+- Detection rule: if the task came from `docs/AI_HANDOFF/tasks/`, you are in Handoff mode. Otherwise Daily mode.
package/templates/.claude/ukit/runtime/output-compression.mjs
CHANGED
@@ -1082,7 +1082,11 @@ function normalizeOutputSummaryForDedupe(summary) {
   return String(summary ?? '')
     .split(/\r?\n/)
     .map((line) => String(line ?? '').trim())
-    .filter((line) => line
+    .filter((line) => line
+      && !/^- Full output:\s+/i.test(line)
+      && !/^-\s*FAIL(?:ED)?\b/i.test(line)
+      && !/^-\s*PASS\b/i.test(line)
+      && !/^-\s*(?:Test Files|Tests|Duration|Start at)\b/i.test(line))
     .join('\n');
 }
 
@@ -1131,6 +1135,16 @@ async function appendOutputHistory(projectRoot, entry) {
   return nextDocument;
 }
 
+async function findOutputHistoryEntry(projectRoot, command, summary) {
+  const runtimePaths = buildRuntimePaths(projectRoot);
+  const current = normalizeOutputHistoryDocument(await readJson(runtimePaths.outputHistoryPath, { entries: [] }));
+  const lookupKey = [
+    String(command ?? '').trim(),
+    normalizeOutputSummaryForDedupe(summary),
+  ].join('\n').trim().toLowerCase();
+  return current.entries.find((candidate) => buildOutputHistoryDedupeKey(candidate) === lookupKey) ?? null;
+}
+
 function shouldCompress(config) {
   return Boolean(config?.tokenPipeline?.outputCompression);
 }
@@ -1244,6 +1258,28 @@ async function main() {
     exitCode,
     projectRoot,
   });
+
+  const historyMatch = await findOutputHistoryEntry(projectRoot, result.command, result.summary);
+  if (historyMatch?.summary) {
+    await appendOutputHistory(projectRoot, {
+      timestamp: Date.now(),
+      command: historyMatch.command,
+      profile: historyMatch.profile,
+      summary: historyMatch.summary,
+      tokensBefore: historyMatch.tokensBefore,
+      tokensAfter: historyMatch.tokensAfter,
+      savedTokens: historyMatch.savedTokens,
+      exitCode,
+      rawSaved: historyMatch.rawSaved,
+      rawPath: historyMatch.rawPath,
+      rawBytes: historyMatch.rawBytes,
+      recoveryReason: historyMatch.recoveryReason,
+      truncated: historyMatch.truncated,
+    });
+    process.stdout.write(String(historyMatch.summary));
+    return;
+  }
+
   const recoveryReason = buildRawOutputRecoveryReason({
     exitCode,
     tokensBefore: result.tokensBefore,
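The dedupe fix works by dropping volatile lines before the lookup key is built. The sketch below copies the filter from the diff and shows its effect on two summaries that differ only in a timing line. The two sample summaries are invented for illustration and do not reproduce UKit's real summary format. `buildLookupKey` mirrors the key construction inside `findOutputHistoryEntry`; the assumption that `buildOutputHistoryDedupeKey` produces the same shape for stored entries is not confirmed by this diff.

```javascript
// Filter copied from the diff: strips noisy lines before building the dedupe key.
function normalizeOutputSummaryForDedupe(summary) {
  return String(summary ?? '')
    .split(/\r?\n/)
    .map((line) => String(line ?? '').trim())
    .filter((line) => line
      && !/^- Full output:\s+/i.test(line)
      && !/^-\s*FAIL(?:ED)?\b/i.test(line)
      && !/^-\s*PASS\b/i.test(line)
      && !/^-\s*(?:Test Files|Tests|Duration|Start at)\b/i.test(line))
    .join('\n');
}

// Mirrors the lookupKey expression in findOutputHistoryEntry.
function buildLookupKey(command, summary) {
  return [
    String(command ?? '').trim(),
    normalizeOutputSummaryForDedupe(summary),
  ].join('\n').trim().toLowerCase();
}

// Invented sample summaries: same substance, different timing noise.
const firstRun = '- Command: yarn test\n- Duration 1.21s\n- Tests 3 passed\n- Exit code: 0';
const secondRun = '- Command: yarn test\n- Duration 1.48s\n- Tests 3 passed\n- Exit code: 0';

const keysMatch = buildLookupKey('yarn test', firstRun) === buildLookupKey('yarn test', secondRun);
```

Here `keysMatch` is true because the `Duration` and `Tests` lines are removed from both inputs, so the second run can be served from the stored entry. A summary that differs in a line the filter keeps, such as the exit code line, still produces a different key.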
package/templates/AGENTS.md
CHANGED
@@ -83,6 +83,13 @@ For clearly non-code specialist lanes (docs-only, status, task queue), skip the
 - Threshold-based compact pressure is internal orchestration; do not expose it to users.
 - For Codex Desktop long sessions, UKit can use soft auto-compact handoffs. Default `compact.codexContext.compactTarget=150` means about 150 compact handoff lines (120-150 preferred, hard max 170), not 150 tokens.
 
+## Handoff Quality Gate — OPT-IN
+
+Activates ONLY when work goes through `docs/AI_HANDOFF/` (user says "execute task TASK-xxx" or target is `docs/AI_HANDOFF/tasks/*.md`). Daily prompts → unchanged lightweight flow, no test-first/reviewer overhead.
+
+In Handoff mode: read `docs/AI_HANDOFF/RULES.md` for the 4-phase spec (Idea+Plan → Create Tasks → Implement+Test → Review+Test), state machine, comment thread, and self-reported model contract. Config: `.ukit/storage/config.json` → `handoff.*`.
+
+
 ## Context + Verification Budget
 
 - **Trivial**: no docs, no index query unless the file target is unclear.
package/templates/CLAUDE.md
CHANGED
|
@@ -89,6 +89,13 @@ For clearly non-code specialist lanes (docs-only, status, task queue), skip the
|
|
|
89
89
|
- Preserve UTF-8 BOM/no-BOM and LF/CRLF for existing multilingual/user-authored files.
|
|
90
90
|
- Use `node .claude/ukit/index/safe-patch.mjs` internally when normal Edit/Write may normalize bytes or when anchor-based matching is needed.
|
|
91
91
|
|
|
92
|
+
## Handoff Quality Gate — OPT-IN
|
|
93
|
+
|
|
94
|
+
Activates ONLY when a task goes through `docs/AI_HANDOFF/` (user says "execute task TASK-xxx" or the target is `docs/AI_HANDOFF/tasks/*.md`). Daily prompts are NOT touched; the existing flow stays unchanged.
|
|
95
|
+
|
|
96
|
+
In Handoff mode: read `docs/AI_HANDOFF/RULES.md` for the 4 phases (Idea+Plan → Create Tasks → Implement+Test → Review+Test), the state machine, the comment thread, and the self-reported model contract. Config: `.ukit/storage/config.json` → `handoff.*`.
|
|
97
|
+
|
|
98
|
+
|
|
92
99
|
## Context + Verification Budget
|
|
93
100
|
|
|
94
101
|
- **Trivial**: no docs.
|
|
@@ -158,3 +165,4 @@ DuraOne skill chỉ active khi pack `duraone` được cài hoặc `.claude/skil
|
|
|
158
165
|
- `.claude/skills/duraone/references/sql.md`
|
|
159
166
|
- `.claude/skills/duraone/references/workflow.md`
|
|
160
167
|
- When not active: use generic coding standards + project-specific patterns from the index.
|
|
168
|
+
{{codegraphSection}}
|
package/templates/docs/AI_HANDOFF/INDEX.md
CHANGED
|
@@ -1,6 +1,13 @@
|
|
|
1
1
|
# Handoff Task Index
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
|
|
3
|
+
<!--
|
|
4
|
+
Status values (see RULES.md §Status state machine):
|
|
5
|
+
ready | in_progress | pending_review | changes_requested | critical_block | approved | approved_minor | blocked | done
|
|
6
|
+
|
|
7
|
+
Owner = the tool currently holding the task: claude-code | kilo-code | codex | opencode | -
|
|
8
|
+
-->
|
|
9
|
+
|
|
10
|
+
| ID | Title | Priority | Size | Status | Owner | Reviewer | File |
|
|
11
|
+
|----|-------|----------|------|--------|-------|----------|------|
|
|
5
12
|
|
|
6
13
|
Updated:
|
package/templates/docs/AI_HANDOFF/PLAN.md
CHANGED
|
@@ -2,4 +2,74 @@
|
|
|
2
2
|
|
|
3
3
|
Status: `empty`
|
|
4
4
|
|
|
5
|
-
<!--
|
|
5
|
+
<!--
|
|
6
|
+
This file is used ONLY for the Handoff Quality Gate flow.
|
|
7
|
+
Daily prompts / quick fixes do NOT need to touch this file; UKit keeps running the existing flow as usual.
|
|
8
|
+
|
|
9
|
+
Activates only when the user explicitly routes work through handoff (for example: "put this into handoff", "collect idea X").
|
|
10
|
+
|
|
11
|
+
QUALITY GATE: every task split out must come with a Test Plan (see the section below).
|
|
12
|
+
No Test Plan → the task may not move to status `ready`.
|
|
13
|
+
-->
|
|
14
|
+
|
|
15
|
+
## 1. Intent / Goal
|
|
16
|
+
|
|
17
|
+
<!-- 1-2 sentences describing what the user wants to achieve. Do not paste the raw prompt. -->
|
|
18
|
+
|
|
19
|
+
## 2. Scope
|
|
20
|
+
|
|
21
|
+
- In scope:
|
|
22
|
+
- Out of scope:
|
|
23
|
+
- Risk surface (shared files/modules at risk):
|
|
24
|
+
|
|
25
|
+
## 3. Approach
|
|
26
|
+
|
|
27
|
+
<!-- Brief approach. Reuse existing code before creating new code. -->
|
|
28
|
+
|
|
29
|
+
## 4. Test Plan (REQUIRED — TDD-style)
|
|
30
|
+
|
|
31
|
+
List the tests that will be written BEFORE the code. Each test must have:
|
|
32
|
+
|
|
33
|
+
| # | Type | Test name | File | Expect | Pre-state |
|
|
34
|
+
|---|------|----------|------|--------|-----------|
|
|
35
|
+
| 1 | unit / integration / regression / e2e | `<test name describing the behavior>` | `<path/to/file.test.js>` | `<specific expected output>` | `<input/fixture>` |
|
|
36
|
+
|
|
37
|
+
Required minimum:
|
|
38
|
+
|
|
39
|
+
- **Happy path**: the main behavior works correctly.
|
|
40
|
+
- **Edge case**: at least 1 (null/empty/boundary/concurrent…).
|
|
41
|
+
- **Regression** (if fixing a bug): a test that fails before the fix and passes after it.
|
|
42
|
+
|
|
43
|
+
If the task cannot be tested (config-only, doc-only, throw-away prototype): write `Test plan: N/A — reason: <…>` and attach a manual verification approach.
|
|
44
|
+
|
|
45
|
+
## 5. Verification Commands
|
|
46
|
+
|
|
47
|
+
The exact commands the executor will run:
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
# example:
|
|
51
|
+
# yarn test path/to/file.test.js
|
|
52
|
+
# yarn test --run
|
|
53
|
+
# node scripts/smoke.mjs
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
## 6. Acceptance Criteria
|
|
57
|
+
|
|
58
|
+
- [ ] All tests in the Test Plan PASS (with output included in the report).
|
|
59
|
+
- [ ] No regressions in the related suites.
|
|
60
|
+
- [ ] Reviewer (separate model) reports `APPROVED` or `APPROVED-WITH-MINOR`.
|
|
61
|
+
- [ ] Docs/CHANGELOG updated if user-facing.
|
|
62
|
+
|
|
63
|
+
## 7. Task Split (Phase 2 — TDD-embedded, MANDATORY)
|
|
64
|
+
|
|
65
|
+
Once the human approves the plan, the AI creates each `tasks/TASK-xxx.md` following the structure in `tasks/_TEMPLATE.md`.
|
|
66
|
+
|
|
67
|
+
**Every TASK file MUST have:**
|
|
68
|
+
- `## Test Cases` — test table (type, name, expected, fixture) for the task's slice. Minimum: happy path + 1 edge case + regression if fixing a bug.
|
|
69
|
+
- `## Test Files` — concrete paths of the test files to create/modify.
|
|
70
|
+
- `## Verification Commands` — commands that both the executor and the reviewer run fresh.
|
|
71
|
+
- `## Acceptance Criteria`.
|
|
72
|
+
|
|
73
|
+
A task without concrete Test Cases + Test Files → mark it `needs_breakdown`; do not allow status `ready`.
|
|
74
|
+
|
|
75
|
+
Update `INDEX.md`: add a row for each new task with `Status: ready`, `Owner: -`.
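The Task Split gate above can be sketched as a heading check. This is a minimal sketch, assuming task files use the `## ` headings from `tasks/_TEMPLATE.md`; `missingSections` and `statusForNewTask` are hypothetical helpers for illustration, not functions shipped by UKit.

```javascript
// Hypothetical helpers (not part of UKit): check that a task file contains
// the four sections the Task Split rules mark as mandatory.
const REQUIRED_SECTIONS = [
  "Test Cases",
  "Test Files",
  "Verification Commands",
  "Acceptance Criteria",
];

// Returns the required section names that have no matching "## " heading.
function missingSections(markdown) {
  const headings = [...markdown.matchAll(/^##[ \t]+(.+?)[ \t]*$/gm)].map(
    (match) => match[1],
  );
  return REQUIRED_SECTIONS.filter(
    (name) => !headings.some((heading) => heading.startsWith(name)),
  );
}

// A task with any missing section is marked needs_breakdown instead of ready.
function statusForNewTask(markdown) {
  return missingSections(markdown).length === 0 ? "ready" : "needs_breakdown";
}
```

The check only looks at heading presence. Whether the Test Cases table actually covers a happy path and an edge case would still need a human or a planner model to judge.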
|
package/templates/docs/AI_HANDOFF/RULES.md
CHANGED
|
@@ -7,7 +7,7 @@
|
|
|
7
7
|
- Do NOT read `RULES.md` every request — only when you need flow clarification.
|
|
8
8
|
- Do NOT read multiple task files in one request.
|
|
9
9
|
- If ACTIVE.md + INDEX.md + task file would exceed budget, read only the task file.
|
|
10
|
-
- Auto-compact: if any single
|
|
10
|
+
- Auto-compact: if any **state file** (`ACTIVE.md`, `INDEX.md`, or any single `tasks/TASK-xxx.md`) exceeds 80 lines, trigger `clear handoff` / split task. `PLAN.md` and `RULES.md` are reference/spec — exempt.
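The 80-line auto-compact rule can be sketched as follows. This is a minimal sketch under stated assumptions: the path pattern and the helper names (`isStateFile`, `countLines`, `needsCompaction`) are hypothetical and are not taken from the UKit source.

```javascript
// Hypothetical helpers (not part of UKit) sketching the 80-line state-file rule.
// State files: ACTIVE.md, INDEX.md, and any single tasks/TASK-xxx.md.
// PLAN.md and RULES.md are reference/spec files and never match.
const STATE_FILE_PATTERN = /(^|\/)(ACTIVE\.md|INDEX\.md|tasks\/TASK-[^/]+\.md)$/;

function isStateFile(path) {
  return STATE_FILE_PATTERN.test(path);
}

// Counts lines, ignoring a single trailing newline; an empty file has 0 lines.
function countLines(text) {
  if (text.length === 0) return 0;
  return text.replace(/\r?\n$/, "").split(/\r?\n/).length;
}

// True when a state file is over budget and should trigger
// `clear handoff` or a task split.
function needsCompaction(path, text, maxLines = 80) {
  return isStateFile(path) && countLines(text) > maxLines;
}
```

A file with exactly 80 lines stays within budget, because the rule says "exceeds 80 lines".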
|
|
11
11
|
|
|
12
12
|
## How Human Submits Ideas
|
|
13
13
|
|
|
@@ -15,12 +15,90 @@
|
|
|
15
15
|
- If request is already a concrete task (clear file/logic/output, small enough to do now), bypass handoff and execute directly.
|
|
16
16
|
- If request is broad/ambiguous/multi-step, use handoff.
|
|
17
17
|
|
|
18
|
-
##
|
|
18
|
+
## Hard rule — All work stays in `docs/AI_HANDOFF/`
|
|
19
19
|
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
20
|
+
All communication between AIs in handoff goes ONLY through files under `docs/AI_HANDOFF/`:
|
|
21
|
+
|
|
22
|
+
- `PLAN.md` — brainstorm + overall Test Plan (Phase 1).
|
|
23
|
+
- `INDEX.md` — task table + status (read/written by every phase).
|
|
24
|
+
- `tasks/TASK-xxx.md` — where each task lives: Goal + Test Cases + Verification + Executor Report + Reviewer Verdict + **Discussion thread**.
|
|
25
|
+
- `ACTIVE.md` — snapshot of the current cycle.
|
|
26
|
+
- `archive/` — past cycles.
|
|
27
|
+
|
|
28
|
+
**Forbidden**: AIs sending questions/comments through another tool's chat, through commit messages, or through files outside this directory. Reason: cross-tool/cross-subagent work can only be synchronized through files. Any AI that does not read this folder is not participating in the handoff.
|
|
29
|
+
|
|
30
|
+
### Discussion thread (AI-to-AI comments)
|
|
31
|
+
|
|
32
|
+
When an AI needs to ask back, push back, or make a suggestion to another phase, it writes to the `## Discussion` section of the task file (template in `tasks/_TEMPLATE.md`). Format:
|
|
33
|
+
|
|
34
|
+
```
|
|
35
|
+
### <YYYY-MM-DD> · <role: planner|executor|reviewer> · <tool/model>
|
|
36
|
+
<content — address @planner / @executor / @reviewer if there is a recipient>
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
The next phase MUST read Discussion before continuing; treat it as an inbox.
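The comment format above can be produced by a small formatter. This is a minimal sketch: `formatDiscussionComment` and its parameter names are hypothetical, and the date is rendered in UTC as an assumption, since the rules only specify `YYYY-MM-DD`.

```javascript
// Hypothetical helper (not part of UKit): renders one Discussion comment
// in the "### <YYYY-MM-DD> · <role> · <tool/model>" format.
const DISCUSSION_ROLES = ["planner", "executor", "reviewer"];

function formatDiscussionComment({ date, role, toolModel, body }) {
  if (!DISCUSSION_ROLES.includes(role)) {
    throw new RangeError(`Unknown role: ${role}`);
  }
  // toISOString() is always UTC; the first 10 characters are YYYY-MM-DD.
  const day = date.toISOString().slice(0, 10);
  return `### ${day} · ${role} · ${toolModel}\n${body}\n`;
}
```

For example, a reviewer could append `formatDiscussionComment({ date: new Date(), role: "reviewer", toolModel: "codex/example-model", body: "→ @executor: please add an empty-input edge case" })` to the `## Discussion` section of the task file; the tool/model string here is a placeholder.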
|
|
40
|
+
|
|
41
|
+
## Handoff Flow (tool-agnostic, file-based state machine)
|
|
42
|
+
|
|
43
|
+
UKit handoff works through **file state**. You choose which tool to use for each phase (Claude Code / Kilo Code / Codex / OpenCode / any future tool); all are fine. UKit only cares about the **model's role**, not the tool.
|
|
44
|
+
|
|
45
|
+
3 model roles (Phases 1-2 usually share the Plan role):
|
|
46
|
+
|
|
47
|
+
- **Plan** — the strongest model you have (reasoning model). Can run in any tool that supports planning well.
|
|
48
|
+
- **Execute** — a cheap-but-still-smart model (code model). Could be Kilo's code subagent, OpenCode's build agent, or Claude Code's feature-implementer.
|
|
49
|
+
- **Review** — a **DIFFERENT MODEL from the executor** (a reasoning model usually works better). Can be a different tool, or the same tool with a subagent on a different model (for example, Kilo has separate code and review subagents).
|
|
50
|
+
|
|
51
|
+
Both deployment models are valid:
|
|
52
|
+
- **Cross-tool**: for example Claude (plan) → Kilo (execute) → Claude (review). Bridged through files.
|
|
53
|
+
- **Same-tool different-subagent**: for example Kilo:plan → Kilo:code → Kilo:review, as long as the 3 subagents use different MODELS for their respective roles.
|
|
54
|
+
|
|
55
|
+
Each tool/subagent reads the same `INDEX.md` + `tasks/TASK-xxx.md` → picks a task by `status` → updates the status when finished.
|
|
56
|
+
|
|
57
|
+
> **Important — UKit does not enforce the model:** `handoff.executor.cheapSmartModelHint` and `handoff.reviewer.model` in `.ukit/storage/config.json` are only **labels** recording what you INTEND to use. Which model each tool runs is chosen in that tool's own settings. UKit enforces the contract by requiring the executor to SELF-REPORT `EXECUTOR_MODEL` in the Executor Report; the reviewer compares it with its own model and refuses if they match. So if in Kilo you configure both the code subagent and the review subagent with the same model, the reviewer will refuse rather than silently pass.
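The self-reported model contract can be sketched as a comparison that refuses on a match or on missing data. This is a minimal sketch: `checkModelIsolation` is a hypothetical helper, and the case-insensitive, whitespace-trimmed comparison is an assumption, not documented UKit behavior.

```javascript
// Hypothetical helper (not part of UKit): decides whether a reviewer may
// proceed, given the executor's self-reported model and the reviewer's own.
function checkModelIsolation(executorModel, reviewerModel) {
  // Assumption: names are compared after trimming and lowercasing.
  const normalize = (value) =>
    typeof value === "string" ? value.trim().toLowerCase() : "";
  const executor = normalize(executorModel);
  const reviewer = normalize(reviewerModel);

  if (executor === "" || executor === "unknown") {
    return { ok: false, reason: "executor model missing or unknown" };
  }
  if (reviewer === "" || reviewer === "unknown") {
    return { ok: false, reason: "reviewer model missing or unknown" };
  }
  if (executor === reviewer) {
    return { ok: false, reason: "reviewer model equals executor model" };
  }
  return { ok: true, reason: "" };
}
```

Because both values are self-reported, the check only catches honest misconfiguration, such as two subagents pointed at the same model. It cannot verify which model actually ran.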
|
|
58
|
+
|
|
59
|
+
### Status state machine
|
|
60
|
+
|
|
61
|
+
```
|
|
62
|
+
brainstorm ──[plan approved]──▶ ready ──[executor pick]──▶ in_progress
|
|
63
|
+
├─[PASS]──▶ pending_review
|
|
64
|
+
└─[FAIL]──▶ blocked
|
|
65
|
+
pending_review ──[reviewer]──▶ approved | approved_minor ──▶ done
|
|
66
|
+
├▶ changes_requested ──[fix]──▶ in_progress
|
|
67
|
+
└▶ critical_block ──[fix]──▶ in_progress
|
|
68
|
+
```
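The diagram above can be encoded as a transition table. This is a minimal sketch: `TRANSITIONS` and `canTransition` are hypothetical names. The diagram shows no exit from `blocked` or `done`, so both are treated as terminal here; `brainstorm` appears in the diagram but not in the `statusFlow` list in `config.json`.

```javascript
// Hypothetical transition table (not part of UKit), transcribed from the
// status state machine diagram.
const TRANSITIONS = new Map([
  ["brainstorm", ["ready"]],
  ["ready", ["in_progress"]],
  ["in_progress", ["pending_review", "blocked"]],
  ["pending_review", ["approved", "approved_minor", "changes_requested", "critical_block"]],
  ["approved", ["done"]],
  ["approved_minor", ["done"]],
  ["changes_requested", ["in_progress"]],
  ["critical_block", ["in_progress"]],
  ["blocked", []],
  ["done", []],
]);

// True only when the diagram has an edge from `from` to `to`.
function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}
```

A tool updating `INDEX.md` could call `canTransition(currentStatus, nextStatus)` before writing, so that an executor cannot move a task from `in_progress` straight to `done`.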
|
|
69
|
+
|
|
70
|
+
### 4 Phases
|
|
71
|
+
|
|
72
|
+
**Phase 1 — Idea + Plan** (smart/reasoning model)
|
|
73
|
+
- Human submits ideas (natural language).
|
|
74
|
+
- AI writes to `PLAN.md`: §1 Intent, §2 Scope, §3 Approach, **§4 Test Plan (TDD-style, required)**, §5 Verification Commands, §6 Acceptance Criteria.
|
|
75
|
+
- Output: a complete PLAN.md, awaiting human approval.
|
|
76
|
+
|
|
77
|
+
**Phase 2 — Create Tasks (TDD-embedded, MANDATORY)** (smart/reasoning model, usually the same as Phase 1)
|
|
78
|
+
- Human approves the plan → AI splits `PLAN.md §7` into multiple `tasks/TASK-xxx.md` files.
|
|
79
|
+
- **Every TASK file MUST have its own Test Plan**, not just a pointer back to PLAN.md. Specifically:
|
|
80
|
+
- `§ Test Cases`: test table (type, test name, expected) for this part of the task: happy path + ≥1 edge case + regression (if fixing a bug).
|
|
81
|
+
- `§ Test Files`: concrete paths of the test files to create/modify (for example `tests/auth/login.test.js`).
|
|
82
|
+
- `§ Verification Commands`: commands the executor will run to confirm PASS.
|
|
83
|
+
- `§ Acceptance Criteria`: checklist.
|
|
84
|
+
- If a split task cannot come with concrete Test Cases + Test Files, it is not yet `ready`; mark it `needs_breakdown`.
|
|
85
|
+
- Update `INDEX.md`: add a row for each task with status `ready`.
|
|
86
|
+
- This is the **human-approval cut point**: once this phase is done, the executor may pick tasks.
|
|
87
|
+
- Goal: the executor (cheap-smart model) reads the task file and knows IMMEDIATELY which tests to write first, WITHOUT having to infer them.
|
|
88
|
+
|
|
89
|
+
**Phase 3 — Implement + Test** (cheap-smart/code model)
|
|
90
|
+
- User: "execute next task" / "do TASK-001" / "implement task 1".
|
|
91
|
+
- Executor reads `INDEX.md` → picks a `ready` task → sets `in_progress` → **writes tests first → RED → implement → GREEN** → runs Verification Commands fresh in the turn → appends `## Executor Report` (including `EXECUTOR_TOOL`/`EXECUTOR_MODEL`/`EXECUTOR_SUBAGENT` + verification output) to the end of the task file → sets status `pending_review`.
|
|
92
|
+
- Must NOT claim DONE without a fresh PASS.
|
|
93
|
+
|
|
94
|
+
**Phase 4 — Review + Test** (reviewer model — DIFFERENT from the executor model)
|
|
95
|
+
- User: "review pending tasks" / "review TASK-001".
|
|
96
|
+
- Reviewer reads INDEX → picks a `pending_review` task → reads the task file + diff → **compares its model with `EXECUTOR_MODEL`, refuses if identical/unknown** → **re-runs Verification Commands fresh** (does not trust the executor) → applies the `code-review` skill → appends `## Reviewer Verdict` to the task file (verdict + findings + reviewer model used) → sets status:
|
|
97
|
+
- `approved` / `approved_minor` → allows `done`.
|
|
98
|
+
- `changes_requested` → executor must fix Important issues → repeat Phases 3-4.
|
|
99
|
+
- `critical_block` → executor MUST fix → repeat Phases 3-4.
|
|
100
|
+
|
|
101
|
+
If `handoff.reviewer.enabled=false`, Phase 4 is skipped but the reason must be logged in the task; skipping Phase 4 removes the last safety net.
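The Phase 4 outcomes can be sketched as a verdict-to-status mapping plus a done gate. This is a minimal sketch under stated assumptions: the verdict strings follow the names used in this diff (`APPROVED`, `APPROVED-WITH-MINOR`, `CHANGES-REQUESTED`, `CRITICAL`), while `statusAfterReview` and `mayMarkDone` are hypothetical helpers, and the effect of the `blockOn*` flags is inferred from their config descriptions.

```javascript
// Hypothetical helpers (not part of UKit).
const VERDICT_TO_STATUS = new Map([
  ["APPROVED", "approved"],
  ["APPROVED-WITH-MINOR", "approved_minor"],
  ["CHANGES-REQUESTED", "changes_requested"],
  ["CRITICAL", "critical_block"],
]);

// Maps a reviewer verdict to the task status written to INDEX.md.
function statusAfterReview(verdict) {
  const status = VERDICT_TO_STATUS.get(verdict);
  if (status === undefined) {
    throw new RangeError(`Unknown verdict: ${verdict}`);
  }
  return status;
}

// Decides whether a task may move to `done`. Flags default to the shipped
// config values; treating `false` as "allow done anyway" is an assumption.
function mayMarkDone(
  status,
  { blockOnCritical = true, blockOnChangesRequested = true } = {},
) {
  if (status === "approved" || status === "approved_minor") return true;
  if (status === "changes_requested") return !blockOnChangesRequested;
  if (status === "critical_block") return !blockOnCritical;
  return false;
}
```

With the default flags, only `approved` and `approved_minor` reach `done`; every other status sends the task back through Phases 3-4.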
|
|
24
102
|
|
|
25
103
|
## Task Gate
|
|
26
104
|
|
|
@@ -28,11 +106,13 @@ A task is `ready` only when it has:
|
|
|
28
106
|
- Clear target files
|
|
29
107
|
- Clear action
|
|
30
108
|
- Dependencies stated
|
|
31
|
-
-
|
|
109
|
+
- **Test Plan** (PLAN.md §4) — happy path + ≥1 edge case (+ regression test if fixing a bug); or `N/A` with a reason
|
|
110
|
+
- Verification command (the command the executor will run)
|
|
32
111
|
- Acceptance criteria
|
|
33
112
|
|
|
34
113
|
Missing any → `needs_breakdown`, `blocked`, or `needs_human`.
|
|
35
114
|
|
|
115
|
+
|
|
36
116
|
## Clear Handoff
|
|
37
117
|
|
|
38
118
|
1. Archive current cycle → `archive/cycle-NNN.md`.
|
package/templates/docs/AI_HANDOFF/tasks/_TEMPLATE.md
ADDED
|
@@ -0,0 +1,72 @@
|
|
|
1
|
+
# TASK-XXX — <short title>
|
|
2
|
+
|
|
3
|
+
<!--
|
|
4
|
+
Template for each task. The planner copies this file when splitting PLAN.md into separate tasks.
|
|
5
|
+
This file MUST keep the structure: Goal + Test Cases + Test Files + Verification + Acceptance.
|
|
6
|
+
Every AI (planner / executor / reviewer) reads and writes THIS file. No exchange outside the file.
|
|
7
|
+
-->
|
|
8
|
+
|
|
9
|
+
- Status: `ready` <!-- ready | in_progress | pending_review | changes_requested | critical_block | approved | approved_minor | blocked | done -->
|
|
10
|
+
- Owner: `-` <!-- tool currently holding the task -->
|
|
11
|
+
- Reviewer: `-` <!-- model name the reviewer uses, set in Phase 4 -->
|
|
12
|
+
- Parent plan: `docs/AI_HANDOFF/PLAN.md` §<section>
|
|
13
|
+
|
|
14
|
+
## Goal
|
|
15
|
+
|
|
16
|
+
<!-- 1-2 sentences describing what this slice does. -->
|
|
17
|
+
|
|
18
|
+
## Target Files
|
|
19
|
+
|
|
20
|
+
- `<path/to/source.js>` — <what changes>
|
|
21
|
+
|
|
22
|
+
## Test Cases (REQUIRED — TDD)
|
|
23
|
+
|
|
24
|
+
| # | Type | Test name | Expected | Pre-state / Fixture |
|
|
25
|
+
|---|------|----------|----------|---------------------|
|
|
26
|
+
| 1 | unit | `<describe behavior>` | `<concrete expected>` | `<input>` |
|
|
27
|
+
| 2 | edge | `<null/empty/boundary>` | `<expected>` | `<input>` |
|
|
28
|
+
| 3 | regression (if bug fix) | `<reproduces bug>` | RED before fix, GREEN after | `<repro input>` |
|
|
29
|
+
|
|
30
|
+
## Test Files
|
|
31
|
+
|
|
32
|
+
- `<tests/path/to/file.test.js>` — contains the tests above.
|
|
33
|
+
|
|
34
|
+
## Verification Commands
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
yarn test tests/path/to/file.test.js
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
## Acceptance Criteria
|
|
41
|
+
|
|
42
|
+
- [ ] All tests in §Test Cases PASS.
|
|
43
|
+
- [ ] No regressions in the related suites.
|
|
44
|
+
- [ ] Reviewer verdict APPROVED or APPROVED-WITH-MINOR.
|
|
45
|
+
- [ ] Docs/CHANGELOG updated if user-facing.
|
|
46
|
+
|
|
47
|
+
## Dependencies
|
|
48
|
+
|
|
49
|
+
- (none) <!-- or TASK-xxx must be done first -->
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Discussion
|
|
54
|
+
|
|
55
|
+
<!--
|
|
56
|
+
AIs talk to each other HERE, not through other tools.
|
|
57
|
+
Format of each comment:
|
|
58
|
+
|
|
59
|
+
### <date> · <role: planner|executor|reviewer> · <tool/model>
|
|
60
|
+
<content — question, note, proposal, push-back to the previous phase>
|
|
61
|
+
|
|
62
|
+
Replies go one heading level deeper (####). State "→ @planner" / "→ @executor" / "→ @reviewer" explicitly if there is a specific recipient.
|
|
63
|
+
-->
|
|
64
|
+
|
|
65
|
+
(no comments yet)
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
<!--
|
|
70
|
+
Phase 3 executor appends `## Executor Report` BELOW this separator.
|
|
71
|
+
Phase 4 reviewer appends `## Reviewer Verdict` BELOW the Executor Report.
|
|
72
|
+
-->
|
package/templates/ukit/storage/config.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
{
|
|
2
|
-
"version": "1.5.
|
|
2
|
+
"version": "1.5.6",
|
|
3
3
|
"agent": "claude-code",
|
|
4
4
|
"autonomy": {
|
|
5
5
|
"level": "balanced",
|
|
@@ -156,6 +156,45 @@
|
|
|
156
156
|
"deltaMaxHunks": 3,
|
|
157
157
|
"deltaMaxDiffCells": 2000000
|
|
158
158
|
},
|
|
159
|
+
"handoff": {
|
|
160
|
+
"enabled": true,
|
|
161
|
+
"crossTool": true,
|
|
162
|
+
"plan": {
|
|
163
|
+
"requireTestPlan": true,
|
|
164
|
+
"minTestsHappyPath": 1,
|
|
165
|
+
"minTestsEdgeCase": 1,
|
|
166
|
+
"regressionTestRequiredForBugfix": true,
|
|
167
|
+
"smartModelHint": "claude-opus-4-6"
|
|
168
|
+
},
|
|
169
|
+
"executor": {
|
|
170
|
+
"testFirstRequired": true,
|
|
171
|
+
"mustRunVerificationInTurn": true,
|
|
172
|
+
"blockDoneWithoutFreshPass": true,
|
|
173
|
+
"cheapSmartModelHint": "unic-code",
|
|
174
|
+
"appendReportToTaskFile": true
|
|
175
|
+
},
|
|
176
|
+
"reviewer": {
|
|
177
|
+
"enabled": true,
|
|
178
|
+
"model": "unic-smart",
|
|
179
|
+
"agent": "code-reviewer",
|
|
180
|
+
"mustDifferFromExecutor": true,
|
|
181
|
+
"blockOnCritical": true,
|
|
182
|
+
"blockOnChangesRequested": true,
|
|
183
|
+
"rerunVerificationCommands": true,
|
|
184
|
+
"appendVerdictToTaskFile": true
|
|
185
|
+
},
|
|
186
|
+
"statusFlow": [
|
|
187
|
+
"ready",
|
|
188
|
+
"in_progress",
|
|
189
|
+
"pending_review",
|
|
190
|
+
"changes_requested",
|
|
191
|
+
"critical_block",
|
|
192
|
+
"approved",
|
|
193
|
+
"approved_minor",
|
|
194
|
+
"blocked",
|
|
195
|
+
"done"
|
|
196
|
+
]
|
|
197
|
+
},
|
|
159
198
|
"subagents": {
|
|
160
199
|
"enabled": true,
|
|
161
200
|
"smallTaskModel": "unic-lite",
|
|
@@ -259,6 +298,30 @@
|
|
|
259
298
|
"field": "compact.codexContext.autoCompact",
|
|
260
299
|
"mac_dinh": true,
|
|
261
300
|
"y_nghia": "Chỉ set false khi đang debug hành vi compact của UKit."
|
|
301
|
+
},
|
|
302
|
+
"doi_reviewer_model": {
|
|
303
|
+
"field": "handoff.reviewer.model",
|
|
304
|
+
"mac_dinh": "unic-smart",
|
|
305
|
+
"y_nghia": "Model dùng cho reviewer agent ở Phase 3. BẮT BUỘC khác model executor để bắt được lỗi mà executor miss. Có thể dùng claude-opus-4-6, unic-smart, hoặc bất kỳ model reasoning mạnh nào.",
|
|
306
|
+
"vi_du": "Nếu executor là unic-code (Kilo Code), set reviewer.model=unic-smart hoặc claude-opus-4-6. Nếu executor là claude-sonnet, set reviewer thành claude-opus."
|
|
307
|
+
},
|
|
308
|
+
"tat_reviewer_phase": {
|
|
309
|
+
"field": "handoff.reviewer.enabled",
|
|
310
|
+
"mac_dinh": true,
|
|
311
|
+
"y_nghia": "Tắt Phase 3 review nếu muốn chạy nhanh prototype. KHÔNG khuyến nghị cho code production — bỏ review là bỏ lưới an toàn cuối cùng.",
|
|
312
|
+
"khi_nao_tat": "Chỉ tắt cho throw-away prototype hoặc khi anh tự review thủ công."
|
|
313
|
+
},
|
|
314
|
+
"block_handoff_khi_critical": {
|
|
315
|
+
"field": "handoff.reviewer.blockOnCritical",
|
|
316
|
+
"mac_dinh": true,
|
|
317
|
+
"y_nghia": "Nếu reviewer báo CRITICAL → task không được phép done; executor phải fix và review lại. Set false chỉ khi muốn cảnh báo mềm.",
|
|
318
|
+
"khuyen_nghi": "Giữ true. Đây là rào chắn chống ship lỗi nghiêm trọng."
|
|
319
|
+
},
|
|
320
|
+
"bat_test_plan_bat_buoc": {
|
|
321
|
+
"field": "handoff.plan.requireTestPlan",
|
|
322
|
+
"mac_dinh": true,
|
|
323
|
+
"y_nghia": "Planner phải hoàn thành PLAN.md §4 (Test Plan) trước khi task chuyển ready. Tắt sẽ làm UKit cho phép skip TDD — kéo theo executor dễ làm sót.",
|
|
324
|
+
"khuyen_nghi": "Giữ true."
|
|
262
325
|
}
|
|
263
326
|
},
|
|
264
327
|
"version": "Phiên bản config runtime đi kèm package UKit.",
|
|
@@ -374,6 +437,35 @@
|
|
|
374
437
|
"deltaMaxChangedLines": "Ngân sách dòng đổi mặc định cho kiểm tra delta sau này.",
|
|
375
438
|
"deltaMaxHunks": "Ngân sách hunk đổi mặc định cho kiểm tra delta sau này.",
|
|
376
439
|
"deltaMaxDiffCells": "Giới hạn cell LCS khi tính delta; quá ngưỡng này UKit dùng fallback tuyến tính để tránh chậm trên file rất lớn."
|
|
440
|
+
},
|
|
441
|
+
"handoff": {
|
|
442
|
+
"enabled": "Bật Quality Gate cho handoff: plan có Test Plan, executor test-first, reviewer model khác. Tắt = quay về flow cũ (dễ lọt lỗi vặt).",
|
|
443
|
+
"crossTool": "true nghĩa là handoff truyền qua file (PLAN/INDEX/tasks) chứ không qua in-process subagent — cho phép plan ở Claude Code, execute ở Kilo Code, review ở Claude Code khác model.",
|
|
444
|
+
"plan": {
|
|
445
|
+
"requireTestPlan": "Bắt buộc PLAN.md §4 phải có Test Plan trước khi task chuyển ready.",
|
|
446
|
+
"minTestsHappyPath": "Tối thiểu test cho happy path.",
|
|
447
|
+
"minTestsEdgeCase": "Tối thiểu test cho edge case (null/empty/boundary/concurrent…).",
|
|
448
|
+
"regressionTestRequiredForBugfix": "Bug fix phải có regression test fail-trước-fix.",
|
|
449
|
+
"smartModelHint": "Gợi ý model mạnh nhất cho phase plan (ví dụ claude-opus-4-6). UKit không tự ép, chỉ ghi hint vào task."
|
|
450
|
+
},
|
|
451
|
+
"executor": {
|
|
452
|
+
"testFirstRequired": "Executor phải viết test trước khi implement (RED → GREEN).",
|
|
453
|
+
"mustRunVerificationInTurn": "Phải chạy Verification Commands trong cùng turn báo DONE; không tin output cũ.",
|
|
454
|
+
"blockDoneWithoutFreshPass": "Không cho phép STATUS:DONE nếu chưa có PASS fresh trong turn này.",
|
|
455
|
+
"cheapSmartModelHint": "Gợi ý model rẻ-mà-vẫn-thông-minh cho executor (ví dụ unic-code). UKit chỉ ghi hint.",
|
|
456
|
+
"appendReportToTaskFile": "Executor append `## Executor Report` vào cuối task file để reviewer (tool khác) đọc được."
|
|
457
|
+
},
|
|
458
|
+
"reviewer": {
|
|
459
|
+
"enabled": "Bật Phase 3 review độc lập. Tắt = bỏ lưới an toàn cuối cùng, không khuyến nghị.",
|
|
460
|
+
"model": "Model reviewer. BẮT BUỘC khác executor model. Mặc định unic-smart; có thể dùng claude-opus-4-6 hoặc bất kỳ model reasoning mạnh nào.",
|
|
461
|
+
"agent": "Tên reviewer agent (xem .claude/agents/code-reviewer.md).",
|
|
462
|
+
"mustDifferFromExecutor": "Nếu true, reviewer tự refuse khi phát hiện cùng model với executor.",
|
|
463
|
+
"blockOnCritical": "CRITICAL → block handoff cứng. Set false chỉ khi muốn warning mềm.",
|
|
464
|
+
"blockOnChangesRequested": "CHANGES-REQUESTED → block handoff đến khi fix. Set false sẽ cho phép done với issue Important.",
|
|
465
|
+
"rerunVerificationCommands": "Reviewer phải tự chạy lại Verification Commands, không tin output executor.",
|
|
466
|
+
"appendVerdictToTaskFile": "Reviewer append `## Reviewer Verdict` vào task file để khép vòng handoff."
|
|
467
|
+
},
|
|
468
|
+
"statusFlow": "Danh sách status hợp lệ trong INDEX.md. Đổi tên trong này = đổi state machine handoff."
|
|
377
469
|
}
|
|
378
470
|
}
|
|
379
471
|
}
|
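A consumer of the `handoff` block in `config.json` might fill in missing keys as sketched below. This is a minimal sketch: the default values are copied from the `handoff` block in this diff, but `resolveHandoffConfig` and its merge strategy are hypothetical and do not describe how UKit itself loads configuration.

```javascript
// Defaults copied from the handoff block shown in this diff. The merge
// logic is an illustrative assumption, not UKit's loader.
const HANDOFF_DEFAULTS = {
  enabled: true,
  crossTool: true,
  plan: {
    requireTestPlan: true,
    minTestsHappyPath: 1,
    minTestsEdgeCase: 1,
    regressionTestRequiredForBugfix: true,
  },
  executor: {
    testFirstRequired: true,
    mustRunVerificationInTurn: true,
    blockDoneWithoutFreshPass: true,
  },
  reviewer: {
    enabled: true,
    model: "unic-smart",
    mustDifferFromExecutor: true,
    blockOnCritical: true,
    blockOnChangesRequested: true,
    rerunVerificationCommands: true,
  },
};

// Merges a parsed config object over the defaults, one section at a time,
// so overriding a single reviewer flag keeps the other reviewer defaults.
function resolveHandoffConfig(config = {}) {
  const user = config.handoff ?? {};
  return {
    ...HANDOFF_DEFAULTS,
    ...user,
    plan: { ...HANDOFF_DEFAULTS.plan, ...user.plan },
    executor: { ...HANDOFF_DEFAULTS.executor, ...user.executor },
    reviewer: { ...HANDOFF_DEFAULTS.reviewer, ...user.reviewer },
  };
}
```

The input would typically come from `JSON.parse` on the contents of `.ukit/storage/config.json`. Merging per section avoids the common pitfall where a shallow spread replaces the whole `reviewer` object and silently drops `mustDifferFromExecutor`.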