tink-harness 1.13.0 → 1.14.0

@@ -1,7 +1,7 @@
  {
    "name": "tink",
    "description": "A small harness layer for Claude Code and Codex.",
-   "version": "1.13.0",
+   "version": "1.14.0",
    "author": {
      "name": "dotori"
    }
package/CHANGELOG.md CHANGED
@@ -2,6 +2,11 @@

  All notable changes to Tink are tracked here.

+ ## [1.14.0] - 2026-06-19
+
+ - Added `CLAUDE_CONFIG_DIR` support: global installs now respect the env var (set via direnv or shell) so commands and skills land in the right config directory instead of always defaulting to `~/.claude`.
+ - Added `tink-harness update --all-repos`: finds every repo under the home directory that has Tink installed and updates each one. Uses `direnv exec` when available so per-repo `.envrc` overrides (including `CLAUDE_CONFIG_DIR`) are applied automatically; falls back to parsing simple `export` lines from `.envrc` otherwise.
+
  ## [1.13.0] - 2026-06-19

  - Added focused opt-in harnesses for recurring agent workflows: `issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, and `architecture-deepening`.
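
The 1.14.0 entry above mentions a fallback that parses simple `export` lines from `.envrc` when direnv is unavailable. The following is a minimal sketch of that idea, not the package's own `parseEnvrc`: the function name `parseSimpleExports`, the explicit `home` parameter, and the choice to skip command substitution are all assumptions made for illustration.

```javascript
const os = require('os');

// Sketch: collect `export NAME=value` lines from .envrc text.
// Lines that are not plain exports (comments, direnv stdlib calls) are ignored,
// and values that need a shell to evaluate are skipped rather than guessed at.
function parseSimpleExports(text, repoDir, home = os.homedir()) {
  const env = {};
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)$/);
    if (!m) continue;
    // Drop one pair of matching surrounding quotes, if present.
    let value = m[2].trim().replace(/^(["'])(.*)\1$/, '$2');
    // Command substitution cannot be resolved without running a shell.
    if (/\$\(|`/.test(value)) continue;
    value = value
      .replace(/\$\{HOME\}|\$HOME\b/g, () => home)
      .replace(/\$\{PWD\}|\$PWD\b/g, () => repoDir)
      .replace(/^~(?=\/|$)/, () => home);
    env[m[1]] = value;
  }
  return env;
}

// Hypothetical .envrc content.
const sample = [
  '# per-repo Claude config',
  'export CLAUDE_CONFIG_DIR="$HOME/.claude-work"',
  'export TOKEN=$(pass show token)',
  'layout node',
].join('\n');

console.log(parseSimpleExports(sample, '/srv/projects/demo', '/home/dev'));
```

A parser of this kind only covers static assignments, which is presumably why `direnv exec` is preferred whenever direnv is installed.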
package/README.ko.md CHANGED
@@ -10,7 +10,7 @@ Tink는 사소하지 않은 모든 에이전트 작업을 눈에 보이는 파

  <sub>A small harness layer for Claude Code and Codex</sub>

- **Latest package:** v1.13.0 - Adds focused opt-in harnesses for issue triage, hard-bug diagnosis loops, reviews, decision maps, and architecture deepening, and updates cast routing and docs for both Claude Code and Codex. See [CHANGELOG](CHANGELOG.md) for the full release history.
+ **Latest package:** v1.14.0 - Global installs now respect the `CLAUDE_CONFIG_DIR` environment variable, and `update --all-repos` updates every Tink repo under the home directory in one go. When direnv is available, each repo's `.envrc` is loaded automatically. See [CHANGELOG](CHANGELOG.md) for the full release history.

  [English](README.md) · **한국어** · [Changelog](CHANGELOG.md)

package/README.md CHANGED
@@ -24,7 +24,7 @@
  <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
  </p>

- <p><strong>Latest package:</strong> v1.13.0 - Tink adds focused opt-in harnesses for issue triage, hard-bug diagnosis loops, two-axis reviews, decision maps, and architecture deepening, with cast routing and docs updated for both Claude Code and Codex. See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
+ <p><strong>Latest package:</strong> v1.14.0 - Tink respects <code>CLAUDE_CONFIG_DIR</code> for global installs and adds <code>update --all-repos</code> to refresh every Tink-installed repo in one command, with direnv support for per-repo env overrides. See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>

  **English** · [한국어](README.ko.md) · [Changelog](CHANGELOG.md)

package/VERSIONING.md CHANGED
@@ -1,6 +1,6 @@
  # Versioning

- Current version: `1.13.0`
+ Current version: `1.14.0`

  Tink follows semver from `1.0.0` onward.

package/bin/install.js CHANGED
@@ -126,7 +126,7 @@ function argValue(name) {
  }

  function usage() {
- console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n npx tink-harness@latest [install]\n npx tink-harness@latest update\n\nCommands:\n install Install Tink.\n update Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n dashboard Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n 1. Select language\n 2. Show TINK wizard\n 3. Select Claude Code, Codex, or both\n 4. Select components\n 5. Select repo/global installation scope\n 6. Select Advanced options\n 7. Select git tracking policy for project state\n\nAdvanced options:\n --dry-run Preview only. Show what would be written or removed, but do not change files.\n --force Overwrite user-modified files. Use only when you want official templates to replace local edits.\n --clean-codex-picker Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n\nEnvironment:\n TINK_INSTALL_SURFACES=claude|codex|all\n TINK_CLEAN_CODEX_PICKER=1\n\nScopes:\n repo Install shared .tink files into the current project.\n global Install shared .tink files into your home directory.\n`);
+ console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness update --all-repos\n tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n npx tink-harness@latest [install]\n npx tink-harness@latest update\n\nCommands:\n install Install Tink.\n update Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n dashboard Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n 1. Select language\n 2. Show TINK wizard\n 3. Select Claude Code, Codex, or both\n 4. Select components\n 5. Select repo/global installation scope\n 6. Select Advanced options\n 7. Select git tracking policy for project state\n\nAdvanced options:\n --dry-run Preview only. Show what would be written or removed, but do not change files.\n --force Overwrite user-modified files. Use only when you want official templates to replace local edits.\n --clean-codex-picker Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n --all-repos Update all repos with Tink under the home directory. Uses direnv if available to load per-repo .envrc.\n\nEnvironment:\n TINK_INSTALL_SURFACES=claude|codex|all\n TINK_CLEAN_CODEX_PICKER=1\n CLAUDE_CONFIG_DIR Override ~/.claude for global installs (e.g. set by direnv per project)\n CODEX_HOME Override ~/.codex for Codex skill installs\n\nScopes:\n repo Install shared .tink files into the current project.\n global Install shared .tink files into your home directory.\n`);
  }

  function findTinkRoot() {
@@ -228,6 +228,15 @@ function codexHome() {
    return process.env.CODEX_HOME || path.join(os.homedir(), '.codex');
  }

+ // CLAUDE_CONFIG_DIR replaces ~/.claude for global installs (like direnv per-project overrides).
+ // Repo-scope installs always use <repo>/.claude regardless of this env var.
+ function claudeDir(target) {
+   if (process.env.CLAUDE_CONFIG_DIR && target === os.homedir()) {
+     return process.env.CLAUDE_CONFIG_DIR;
+   }
+   return path.join(target, '.claude');
+ }
+
  function legacyComponentOptionsFor(agent, language) {
    const options = COMPONENTS[language].filter((item) => {
      if (item.value === 'commands') return includesClaude(agent);
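
To make the `claudeDir` rule added in this hunk concrete, here is a minimal sketch of the same resolution logic. It assumes an explicit `env` parameter (not present in the diff) so both branches can be exercised without changing `process.env`; the name `resolveClaudeDir` and the sample paths are hypothetical.

```javascript
const os = require('os');
const path = require('path');

// Sketch of the resolution rule: the override applies only when the install
// target is exactly the home directory, i.e. a global install.
function resolveClaudeDir(target, env = process.env) {
  if (env.CLAUDE_CONFIG_DIR && target === os.homedir()) {
    return env.CLAUDE_CONFIG_DIR;
  }
  return path.join(target, '.claude');
}

const overrideEnv = { CLAUDE_CONFIG_DIR: '/tmp/claude-work' }; // hypothetical override

// Global install: the override wins.
console.log(resolveClaudeDir(os.homedir(), overrideEnv));
// Repo install: the override is ignored and <repo>/.claude is used.
console.log(resolveClaudeDir('/srv/projects/demo', overrideEnv));
```

Because the check is an exact string comparison, a target written with a trailing slash or through an unresolved symlink would fall through to `<target>/.claude` in this sketch.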
@@ -364,8 +373,8 @@ function locationSummary(agent, scope) {
    return [
      `Repo target: ${repoTarget}`,
      `Shared .tink target: ${path.join(installTarget, '.tink')}`,
-     includesClaude(agent) ? `Claude Code command target: ${path.join(installTarget, '.claude/commands/tink')}` : null,
-     includesClaude(agent) ? `Claude Code skill target: ${path.join(installTarget, '.claude/skills/tink')}` : null,
+     includesClaude(agent) ? `Claude Code command target: ${path.join(claudeDir(installTarget), 'commands/tink')}` : null,
+     includesClaude(agent) ? `Claude Code skill target: ${path.join(claudeDir(installTarget), 'skills/tink')}` : null,
      includesCodex(agent) ? `Codex skills target: ${path.join(codexHome(), 'skills')}` : null,
      includesCodex(agent) ? `Codex picker cleanup target: ${path.join(process.cwd(), '.claude')}` : null
    ].filter(Boolean).join('\n');
@@ -710,12 +719,12 @@ function copyDir(src, dest, base) {

  function copyTinkCommands(templateRoot, target) {
    const commandSrc = path.join(templateRoot, 'claude/commands/tink');
-   const commandDest = path.join(target, '.claude/commands/tink');
-   const flatCommandDest = path.join(target, '.claude/commands');
+   const commandDest = path.join(claudeDir(target), 'commands/tink');
+   const flatCommandDest = path.join(claudeDir(target), 'commands');
    const legacyFlatCommands = ['tink-setup.md', 'tink-forge.md', 'tink-list.md', 'tink-purge.md', 'tink-hone.md'];
    const legacyNamespaceCommands = ['forge.md', 'purge.md', 'hone.md'];
    const legacyTinyCommands = ['tiny-setup.md', 'tiny-use.md', 'tiny-list.md', 'tiny-save.md'];
-   const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(target, '.claude/skills/tiny')];
+   const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(claudeDir(target), 'skills/tiny')];
    for (const name of legacyFlatCommands) {
      const legacy = path.join(flatCommandDest, name);
      if (fs.existsSync(legacy)) {
@@ -863,7 +872,7 @@ function hookCommandFor(scope, target) {
  }

  function registerClaudeHook(target, scope, base) {
-   const settingsPath = path.join(target, '.claude/settings.json');
+   const settingsPath = path.join(claudeDir(target), 'settings.json');
    const settings = readJsonFile(settingsPath, {});
    const command = hookCommandFor(scope, target);
    settings.hooks ||= {};
@@ -893,7 +902,7 @@ function copySelected(scope, components, agent) {
    }
    if (wantsClaudeSkill(components)) {
      if (includesClaude(agent) && !cleanupCodexPicker) {
-       copyDir(path.join(templateRoot, 'claude/skills'), path.join(target, '.claude/skills'), target);
+       copyDir(path.join(templateRoot, 'claude/skills'), path.join(claudeDir(target), 'skills'), target);
      }
    }
    if (wantsCodexSkills(components)) {
@@ -995,8 +1004,8 @@ function doneLineFor(agent) {

  function updateResultSummary(agent, targets) {
    const locations = [
-     includesClaude(agent) ? `Claude Code commands: ${path.join(targets.installTarget, '.claude/commands/tink')}` : null,
-     includesClaude(agent) ? `Claude Code skill: ${path.join(targets.installTarget, '.claude/skills/tink')}` : null,
+     includesClaude(agent) ? `Claude Code commands: ${path.join(claudeDir(targets.installTarget), 'commands/tink')}` : null,
+     includesClaude(agent) ? `Claude Code skill: ${path.join(claudeDir(targets.installTarget), 'skills/tink')}` : null,
      includesCodex(agent) ? `Codex skills: ${path.join(targets.codexTarget, 'skills')}` : null,
      `Tink shared files: ${path.join(targets.installTarget, '.tink')}`
    ].filter(Boolean);
@@ -1216,12 +1225,119 @@ async function resolveChoices() {
    return { agent, scope, components, gitPolicy, hookScope, language };
  }

+ function findAllTinkRepos() {
+   const found = [];
+   const skip = new Set(['node_modules', '.git', 'vendor', 'dist', 'build', 'out', 'target', '.cache']);
+
+   function scan(dir, depth) {
+     if (depth > 4) return;
+     let entries;
+     try { entries = fs.readdirSync(dir, { withFileTypes: true }); } catch { return; }
+     let hasTink = false;
+     for (const entry of entries) {
+       if (!entry.isDirectory()) continue;
+       if (entry.name === '.tink') { hasTink = true; continue; }
+       if (skip.has(entry.name) || entry.name.startsWith('.')) continue;
+       scan(path.join(dir, entry.name), depth + 1);
+     }
+     if (hasTink) found.push(dir);
+   }
+
+   scan(os.homedir(), 0);
+   return found;
+ }
+
+ function isDirenvAvailable() {
+   return spawnSync('direnv', ['version'], { encoding: 'utf8' }).status === 0;
+ }
+
+ function parseEnvrc(envrcPath, repoDir) {
+   if (!fs.existsSync(envrcPath)) return {};
+   const env = {};
+   for (const line of fs.readFileSync(envrcPath, 'utf8').split('\n')) {
+     const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)/);
+     if (!m) continue;
+     let val = m[2].trim().replace(/^["']|["']$/g, '');
+     val = val
+       .replace(/\$HOME|\bHOME\b/g, os.homedir())
+       .replace(/\$PWD|\bPWD\b/g, repoDir)
+       .replace(/^~/, os.homedir());
+     env[m[1]] = val;
+   }
+   return env;
+ }
+
+ async function runAllRepos() {
+   const allRepos = findAllTinkRepos();
+   const sourceRoot = path.resolve(root);
+   const repos = allRepos.filter((r) => path.resolve(r) !== sourceRoot);
+
+   if (repos.length === 0) {
+     console.log('No repos with Tink installed found under home directory.');
+     return;
+   }
+
+   const hasDirenv = isDirenvAvailable();
+   const installScript = path.join(root, 'bin/install.js');
+
+   console.log(`Found ${repos.length} repo(s) with Tink installed:\n`);
+   for (const repo of repos) {
+     const envrc = path.join(repo, '.envrc');
+     const envVars = hasDirenv ? {} : parseEnvrc(envrc, repo);
+     const claudeTarget = envVars.CLAUDE_CONFIG_DIR
+       ? envVars.CLAUDE_CONFIG_DIR
+       : path.join(repo, '.claude');
+     const note = fs.existsSync(envrc)
+       ? hasDirenv
+         ? `(direnv)`
+         : envVars.CLAUDE_CONFIG_DIR
+           ? `(.envrc → CLAUDE_CONFIG_DIR=${envVars.CLAUDE_CONFIG_DIR})`
+           : `(.envrc, no CLAUDE_CONFIG_DIR)`
+       : '';
+     console.log(` ${repo} ${note}`);
+     console.log(` → ${claudeTarget}/commands/tink`);
+   }
+   console.log('');
+
+   for (const repo of repos) {
+     console.log(`▶ ${path.basename(repo)} (${repo})`);
+     const envrc = path.join(repo, '.envrc');
+     const extraEnv = hasDirenv ? {} : parseEnvrc(envrc, repo);
+     const mergedEnv = { ...process.env, ...extraEnv };
+
+     let result;
+     if (hasDirenv && fs.existsSync(envrc)) {
+       result = spawnSync(
+         'direnv', ['exec', repo, 'node', installScript, 'update', '--yes', '--scope=repo'],
+         { cwd: repo, env: process.env, stdio: 'inherit', encoding: 'utf8' }
+       );
+     } else {
+       result = spawnSync(
+         process.execPath, [installScript, 'update', '--yes', '--scope=repo'],
+         { cwd: repo, env: mergedEnv, stdio: 'inherit', encoding: 'utf8' }
+       );
+     }
+
+     if (result.status !== 0) {
+       console.error(` ✗ failed (exit ${result.status})`);
+     } else {
+       console.log(` ✓ done`);
+     }
+     console.log('');
+   }
+ }
+
  async function main() {
    if (command === 'help' || args.includes('--help')) {
      usage();
      process.exit(0);
    }

+   if (command === 'update' && args.includes('--all-repos')) {
+     await runAllRepos();
+     return;
+   }
+
    if (command === 'dashboard') {
      runDashboard();
      return;
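
The repo discovery behind `--all-repos` is a depth-limited directory walk. The sketch below reproduces that idea against a throwaway temp directory so it can be run safely; it is an illustration under assumptions rather than the package's code. The names `findTinkRepos` and `maxDepth` are hypothetical, and the skip list is a shortened version of the one in the diff.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Directory names that are never entered (a subset chosen for this sketch).
const SKIP = new Set(['node_modules', 'vendor', 'dist', 'build', 'out', 'target']);

// Depth-limited walk: a directory is reported when it directly contains a
// `.tink` directory. Hidden and skipped directories are not descended into.
function findTinkRepos(rootDir, maxDepth = 4) {
  const found = [];
  function scan(dir, depth) {
    if (depth > maxDepth) return;
    let entries;
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      return; // unreadable directory: skip it
    }
    for (const entry of entries) {
      if (!entry.isDirectory()) continue;
      if (entry.name === '.tink') {
        found.push(dir);
        continue;
      }
      if (entry.name.startsWith('.') || SKIP.has(entry.name)) continue;
      scan(path.join(dir, entry.name), depth + 1);
    }
  }
  scan(rootDir, 0);
  return found;
}

// Build a throwaway tree so the walk never touches a real home directory.
const tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'tink-scan-'));
fs.mkdirSync(path.join(tmpRoot, 'work', 'app', '.tink'), { recursive: true });
fs.mkdirSync(path.join(tmpRoot, 'work', 'lib', 'node_modules', 'dep', '.tink'), { recursive: true });
fs.mkdirSync(path.join(tmpRoot, 'a', 'b', 'c', 'd', 'e', '.tink'), { recursive: true });

const repos = findTinkRepos(tmpRoot).map((dir) => path.relative(tmpRoot, dir));
console.log(repos);

fs.rmSync(tmpRoot, { recursive: true, force: true });
```

Only `work/app` is reported here: the copy under `node_modules` is skipped, and the one five levels down is beyond the depth limit. With the limit of 4 shown in the diff, repos nested more than four levels below the home directory would likewise go unnoticed.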
package/commands/cast.md CHANGED
@@ -160,6 +160,38 @@ Optional current-run artifacts are created only when their harness is selected:
  - `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
  - `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.

+ ## Evidence Split
+ Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
+
+ Use Evidence Split at cast time and again during implementation when:
+ - the first plan has several uncertain facts,
+ - implementation starts coupling several files or concepts,
+ - a check fails and the next action is unclear,
+ - context is becoming broad or stale,
+ - independent verification, review, or handoff would reduce risk.
+
+ Skip it for tiny, obvious edits where a packet would not change the next action.
+
+ Packet vocabulary:
+ - `probe`: answer one unknown with 1-3 inputs.
+ - `patch`: make one narrow implementation change.
+ - `verify`: prove one success condition or failure recovery.
+ - `review`: inspect one risk, regression, or omission.
+ - `decision`: record one branch, chosen option, and evidence.
+
+ Represent packets in existing run state:
+ - `steps.json`: packetized steps and status.
+ - `context-map.json`: the input files, sources, or excluded context for each packet.
+ - `notes.md`: why work was split or re-split during implementation.
+ - `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+
+ Safety defaults:
+ - Do not start workers, tmux panes, worktrees, or external agents automatically.
+ - Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+ - Do not let multiple packets edit the same file concurrently.
+ - Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+ - Keep each packet to 1-3 primary inputs when possible.
+
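
The packet vocabulary and safety defaults above can be read as a small set of checks over packetized steps. The sketch below shows one way to express them. The packet shape (`id`, `kind`, `status`, `inputs`) and the function `checkPackets` are hypothetical, since `cast.md` does not define a schema for `steps.json` in this diff.

```javascript
// Hypothetical packet shape; only the five kind names come from cast.md.
const PACKET_KINDS = new Set(['probe', 'patch', 'verify', 'review', 'decision']);

// Returns human-readable warnings for packets that break the stated defaults.
function checkPackets(packets) {
  const warnings = [];
  const editors = new Map(); // file -> id of the in-progress patch packet editing it
  for (const packet of packets) {
    if (!PACKET_KINDS.has(packet.kind)) {
      warnings.push(`${packet.id}: unknown kind "${packet.kind}"`);
    }
    if (packet.inputs.length < 1 || packet.inputs.length > 3) {
      warnings.push(`${packet.id}: expected 1-3 inputs, got ${packet.inputs.length}`);
    }
    // Two in-progress patch packets must not touch the same file.
    if (packet.kind === 'patch' && packet.status === 'in_progress') {
      for (const file of packet.inputs) {
        if (editors.has(file)) {
          warnings.push(`${packet.id}: ${file} is already being edited by ${editors.get(file)}`);
        } else {
          editors.set(file, packet.id);
        }
      }
    }
  }
  return warnings;
}

const packets = [
  { id: 'p1', kind: 'probe', status: 'done', inputs: ['bin/install.js'] },
  { id: 'p2', kind: 'patch', status: 'in_progress', inputs: ['bin/install.js'] },
  { id: 'p3', kind: 'patch', status: 'in_progress', inputs: ['bin/install.js', 'README.md'] },
  { id: 'p4', kind: 'verify', status: 'pending', inputs: ['a', 'b', 'c', 'd'] },
];

const warnings = checkPackets(packets);
console.log(warnings);
```

Treating the 1-3 input guideline as a warning rather than an error matches the "when possible" wording of the safety defaults.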
  Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:

  ```json
@@ -480,12 +512,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
  - new pattern not covered yet

  These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
- 6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
+ 6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
+ 7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
  - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
  - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
  - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
  - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
- 7. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
+ 8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
  - Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
  - Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
  - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
@@ -496,7 +529,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
  - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
  - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 line.
  - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
- 8. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
+ 9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.

  After selecting, run a quick quality check using the index metadata for each chosen harness:
  - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
@@ -504,26 +537,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
  - If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
  Feed any signals into the Stitch evaluation at step 16.

- 9. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
- 10. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
- 11. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
- 12. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
- 13. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
- 14. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
- 15. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
- 16. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
- 17. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
- 18. Ask for explicit approval before non-trivial work.
- 19. After approval, read only the selected harness files and any approved run-only draft.
- 20. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
- 21. Execute the first safe step immediately:
+ 10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
+ 11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
+ 12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
+ 13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
+ 14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
+ 15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
+ 16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
+ 17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
+ 18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
+ 19. Ask for explicit approval before non-trivial work.
+ 20. After approval, read only the selected harness files and any approved run-only draft.
+ 21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
+ 22. Execute the first safe step immediately:
  - inspect relevant files,
  - run a read-only diagnostic,
  - draft the first artifact,
  - or reproduce the issue.
- 22. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
- 23. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
- 24. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
+ 23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
+ 24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
+ 25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.


  ## Synthesis probe
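
The three probe outcomes listed in step 12 above map a count of "yes" answers to a fit level. A minimal sketch of that mapping follows, assuming the probe asks five yes/no questions (implied by the 4-5 range, but not shown in this hunk). The function name and the outcome labels are hypothetical.

```javascript
// Maps the number of "yes" answers (0-5) to one of the three outcomes.
// `harnessMatched` covers the "no harness matches" case, which is always no fit.
function classifyProbe(yesCount, harnessMatched = true) {
  if (!Number.isInteger(yesCount) || yesCount < 0 || yesCount > 5) {
    throw new RangeError('yesCount must be an integer from 0 to 5');
  }
  if (!harnessMatched || yesCount >= 4) return 'no-fit';
  if (yesCount >= 2) return 'generic-fit';
  return 'strong-fit';
}

for (let yes = 0; yes <= 5; yes += 1) {
  console.log(yes, classifyProbe(yes));
}
```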
@@ -91,17 +91,18 @@ Standalone CLI를 더 짧게 입력하고, 로컬 health report를 더 쉽게
  - `dashboard` creates only local static files by default: no server, watcher, hidden cache, or automatic harness edits.
  - Consider an optional open/export flag only after the generated file path is stable across platforms.

- ## Swarm Fast Lane
+ ## Evidence Split / Parallel Evidence

- Research a multi-agent harness for parallelizing work without turning Tink into a separate multi-agent runtime. The detailed plan lives in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
+ Before parallelizing work, add Evidence Split to Tink's default work loop. Rather than turning Tink into a separate multi-agent runtime, first stabilize the basic behavior of splitting large work into small evidence packets. The detailed research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.

- - Workers see only small packets with 1-3 inputs, not the whole task.
- - Workers do not edit files directly by default; they return only evidence and patch candidates.
+ - `/tink:cast` and `$tink:cast` check, before harness selection, whether the work can be split into `probe`, `patch`, `verify`, `review`, and `decision` packets.
+ - During actual work, re-split into packets whenever uncertainty, failed verification, context growth, or coupled changes appear.
+ - A packet sees a small unit with only 1-3 inputs, not the whole task.
+ - Even when external workers are needed, they do not edit files directly by default; they return only evidence and patch candidates.
  - Only the main agent is responsible for final patch selection, file edits, and verification.
  - Success is measured not by "always faster" but by less main context, less rework, earlier failure detection, and an equal or better verification pass rate.
- - For initial modes, evaluate `parallel-probe`, `patch-candidate-race`, `micro-contract-split`, `speculative-verifier`, and `context-starvation-mode` as candidates.
- - Document when `/tink:cast` and `$tink:cast` select this harness and when they reject it.
- - Worker output is limited to 300 words or fewer, evidence-only, with confidence included.
+ - The initial mode is Evidence Split as a core behavior; the actual worker runtime is deferred to separate follow-up work.
+ - Worker output stays limited to 300 words or fewer, evidence-only, with confidence included, even in a future runtime.
  - Do not select it for work that requires public contracts, secrets, broad repo scans, or concurrent edits to the same file.

  ## Excluded
@@ -91,17 +91,18 @@ Make the standalone CLI easier to type and make the local health report easier t
  - Keep `dashboard` local and static by default: no server, watcher, hidden cache, or automatic harness edits.
  - Allow an optional open/export flag only after the generated file path behavior is stable across platforms.

- ## Swarm Fast Lane
+ ## Evidence Split / Parallel Evidence

- Research a multi-agent harness for parallel work without turning Tink into a separate multi-agent runtime. The detailed plan lives in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
+ Before adding parallel workers, add Evidence Split to Tink's default work loop. Tink should not become a separate multi-agent runtime; it should first make large work divisible into small evidence packets. The research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.

- - Workers see small packets with only 1-3 inputs, not the whole task.
- - Workers do not edit files by default; they return evidence and patch candidates.
+ - `/tink:cast` and `$tink:cast` check whether work should split into `probe`, `patch`, `verify`, `review`, or `decision` packets before harness selection.
+ - During implementation, Tink re-splits work when uncertainty, failed checks, context sprawl, or coupled changes appear.
+ - Packets see only 1-3 inputs, not the whole task.
+ - If external workers are used later, they do not edit files by default; they return evidence and patch candidates.
  - The main agent owns final patch selection, file edits, and verification.
  - Success is measured by less main-agent context, less rework, earlier failure detection, and equal or better verification pass rate, not by claiming universal raw speed.
- - Initial mode candidates are `parallel-probe`, `patch-candidate-race`, `micro-contract-split`, `speculative-verifier`, and `context-starvation-mode`.
- - `/tink:cast` and `$tink:cast` should document when to select or reject this harness.
- - Worker output is capped at 300 words and must include evidence and confidence.
+ - The initial implementation is the core Evidence Split behavior; actual worker runtime remains deferred.
+ - Future worker output should be capped at 300 words and include evidence and confidence.
  - Do not select it for unclear public contracts, secrets, broad repository scans, or same-file concurrent edits.

  ## Excluded
@@ -1,16 +1,16 @@
1
- # Swarm Fast Lane Research Plan
1
+ # Evidence Split / Parallel Evidence Research Plan
2
2
 
3
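- This document is a research plan for using multiple agents to parallelize work while keeping Tink from becoming a separate, large runtime. The goal is not "spawning many agents" but exploring small context packets in parallel to reduce the main agent's rework and overall context load.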
3
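+ This document is a research plan for the step before multi-agent work parallelization: it limits Tink to gaining a default behavior of splitting work into small evidence packets. The goal is not "spawning many agents" but separating investigation, modification, verification, review, and decisions into small context packets to reduce the main agent's rework and overall context load.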
4
4
 
5
5
  ## Problem Definition
6
6
 
7
7
  Typical multi-agent parallelization uses more tokens. Each worker rereads the same context, separate edits conflict, and the main agent pays the merge cost again.
8
8
 
9
- `swarm-fast-lane` approaches this problem from the opposite direction.
9
+ Evidence Split approaches this problem from the opposite direction.
10
10
 
11
- - Workers do not understand the whole task.
12
- - Workers do not read broad files.
13
- - Workers do not edit directly by default.
11
+ - Packets do not understand the whole task.
12
+ - Packets do not read broad files.
13
+ - Even when external workers are used, they do not edit directly by default.
14
14
  - Worker output is limited to short evidence and patch candidates.
15
15
  - Only the main agent chooses the final path and edits files.
16
16
 
@@ -80,14 +80,16 @@ worker는 파일 수정 없이 관련 파일, 위험, 테스트 후보만 찾는
80
80
 
81
81
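  Workers are intentionally given only incomplete, minimal context. The purpose is not a good implementation but cheaply finding problems that can be caught even with little information.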
82
82
 
83
- ## Harness Contract
83
+ ## Core Behavior Contract
84
84
 
85
- Select the `swarm-fast-lane` harness only when the following conditions are met.
85
+ Evidence Split is not a separate harness; it is the default behavior of `/tink:cast` and `$tink:cast`. Use it under the following conditions.
86
86
 
87
87
  - The work splits into 2-5 independent packets.
88
88
  - Each packet is limited to 1-3 input files or questions.
89
- - Worker output is limited to 300 words or fewer.
90
- - Workers do not edit files directly by default.
89
+ - The packet type is one of `probe`, `patch`, `verify`, `review`, or `decision`.
90
+ - During real work, re-split into packets when uncertainty, verification failures, context sprawl, or coupled changes appear.
91
+ - External worker output stays limited to 300 words or fewer, even in a future runtime.
92
+ - External workers do not edit files directly by default.
91
93
  - Worker output includes evidence, a recommended action, and confidence.
92
94
  - The main agent is responsible for the final patch and verification.
93
95
 
@@ -118,15 +120,14 @@ worker에게 의도적으로 불완전한 최소 컨텍스트만 준다. 목적
118
120
 
119
121
  The first implementation slice counts as done when the following hold.
120
122
 
121
- - A draft of the `swarm-fast-lane` harness exists.
122
- - It is documented when `/tink:cast` and `$tink:cast` select the harness and when they reject it.
123
- - The worker packet format is expressed in `.tink/current/delegation.md` or a separate run artifact.
123
+ - Evidence Split is included as default behavior in the Tink core rules and in the `/tink:cast` and `$tink:cast` docs.
124
+ - The packet format is expressed in `steps.json`, `context-map.json`, `notes.md`, and, when needed, `.tink/current/delegation.md`.
124
125
  - Direct worker edits are disabled by default.
125
- - At least one fixture or documentation example exists.
126
+ - There is a lightweight rule that it can be skipped for small work.
126
127
  - Verification does not assert "faster"; it records evidence of context reduction and rework reduction.
127
128
 
128
129
  ## Open Questions
129
130
 
130
131
  - It must be decided whether actual worker execution thinly calls existing Codex/Claude Code features or whether Tink stops at documenting packets.
131
- - It must be decided whether to merge the worker result schema into `delegation-brief` or keep it as a separate harness.
132
- - User-facing wording must be adjusted so the name "fast lane" does not imply an excessive performance guarantee.
132
+ - It must be decided whether to merge the worker result schema into `delegation-brief` or keep it as a separate runtime artifact.
133
+ - Keep the `swarm-fast-lane` name only as a temporary name for the research docs; user-facing wording prefers Evidence Split or Parallel Evidence.
@@ -1,16 +1,16 @@
1
- # Swarm Fast Lane Research Plan
1
+ # Evidence Split / Parallel Evidence Research Plan
2
2
 
3
- This document describes a constrained research plan for using multi-agent parallelism without turning Tink into a large standalone runtime. The goal is not to spawn more agents by default. The goal is to split work into tiny context packets so workers can explore independent evidence while the main agent reduces rework and context load.
3
+ This document describes the step before multi-agent parallelism: Tink should first split large work into small evidence packets without becoming a separate runtime. The goal is not to spawn more agents by default. The goal is to separate probe, patch, verify, review, and decision work into tiny context packets so the main agent reduces rework and context load.
4
4
 
5
5
  ## Problem
6
6
 
7
7
  Naive multi-agent parallelism usually spends more tokens. Each worker rereads context, independent edits conflict, and the main agent still pays a reconciliation cost.
8
8
 
9
- `swarm-fast-lane` inverts that model.
9
+ Evidence Split inverts that model.
10
10
 
11
- - Workers do not understand the whole task.
12
- - Workers do not read broad context.
13
- - Workers do not edit files by default.
11
+ - Packets do not understand the whole task.
12
+ - Packets do not read broad context.
13
+ - If external workers are used, they do not edit files by default.
14
14
  - Worker output is limited to short evidence and patch candidates.
15
15
  - The main agent chooses the final path and owns file edits.
16
16
 
@@ -80,14 +80,16 @@ Workers look only for reasons the current implementation approach will fail. Thi
80
80
 
81
81
  Workers intentionally receive incomplete minimal context. The point is not high-quality implementation; it is cheaply detecting problems that are visible with little information.
82
82
 
83
- ## Harness Contract
83
+ ## Core Behavior Contract
84
84
 
85
- The `swarm-fast-lane` harness is eligible only when:
85
+ Evidence Split is not a separate harness. It is default behavior inside `/tink:cast` and `$tink:cast`. Use it when:
86
86
 
87
87
  - the task splits into 2-5 independent packets
88
88
  - each packet is limited to 1-3 input files or questions
89
- - each worker output is limited to 300 words
90
- - workers do not edit files by default
89
+ - each packet type is `probe`, `patch`, `verify`, `review`, or `decision`
90
+ - work should be re-split during implementation because uncertainty, failed checks, context sprawl, or coupled changes appeared
91
+ - future worker output is limited to 300 words
92
+ - external workers do not edit files by default
91
93
  - worker output includes evidence, recommended action, and confidence
92
94
  - the main agent owns final patching and verification
93
95
 
@@ -118,15 +120,14 @@ The first version can start with estimates, but run artifacts should record evid
118
120
 
119
121
  The first implementation slice is done when:
120
122
 
121
- - a `swarm-fast-lane` harness draft exists
122
- - `/tink:cast` and `$tink:cast` document when to select or reject it
123
- - worker packet format is represented in `.tink/current/delegation.md` or another run artifact
123
+ - Evidence Split is documented as default behavior in Tink core rules and `/tink:cast`, `$tink:cast`
124
+ - packet format is represented in `steps.json`, `context-map.json`, `notes.md`, and optionally `.tink/current/delegation.md`
124
125
  - direct worker edits are disabled by default
125
- - at least one fixture or example exists
126
+ - tiny work can skip the packet ceremony
126
127
  - verification records context reduction and rework reduction evidence instead of claiming raw speed
127
128
 
128
129
  ## Open Questions
129
130
 
130
131
  - Should actual worker execution call existing Codex/Claude Code features, or should Tink only document packets?
131
- - Should worker result schema extend `delegation-brief`, or should this be a separate harness?
132
- - Should the user-facing name avoid implying guaranteed speed?
132
+ - Should worker result schema extend `delegation-brief`, or should it use a separate runtime artifact?
133
+ - Keep `swarm-fast-lane` only as a research placeholder; prefer Evidence Split or Parallel Evidence in user-facing copy.
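The worker-output limits in the Core Behavior Contract (300 words, with evidence, a recommended action, and confidence) could be checked mechanically once a worker runtime exists. The following is a minimal sketch under stated assumptions: the object shape (`text`, `evidence`, `recommendedAction`, `confidence`, `editedFiles`) and the function names are hypothetical, since the research plan deliberately leaves the worker result schema as an open question.

```javascript
// Hypothetical worker result shape (not defined by Tink):
// { text, evidence, recommendedAction, confidence, editedFiles? }
const MAX_WORDS = 300; // cap taken from the contract above

// Count whitespace-separated words; an empty string counts as zero.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Return a list of contract problems; an empty list means the output conforms.
function checkWorkerOutput(output) {
  const problems = [];
  if (countWords(output.text) > MAX_WORDS) problems.push("over 300 words");
  if (!output.evidence) problems.push("missing evidence");
  if (!output.recommendedAction) problems.push("missing recommended action");
  if (typeof output.confidence !== "number") problems.push("missing confidence");
  // Workers return evidence and patch candidates; they do not edit files by default.
  if (output.editedFiles && output.editedFiles.length > 0) {
    problems.push("worker edited files directly");
  }
  return problems;
}

console.log(
  checkWorkerOutput({
    text: "The null check is missing in the parser entry point.",
    evidence: "parser.js, parse() entry",
    recommendedAction: "add a guard before tokenizing",
    confidence: 0.7,
  })
);
```

Returning a list of problems rather than throwing keeps the check advisory, which matches the plan's position that the main agent owns final reconciliation.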
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tink-harness",
3
- "version": "1.13.0",
3
+ "version": "1.14.0",
4
4
  "description": "Self-growing harnesses for Claude Code and Codex.",
5
5
  "license": "MIT",
6
6
  "type": "module",
@@ -160,6 +160,38 @@ Optional current-run artifacts are created only when their harness is selected:
160
160
  - `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
161
161
  - `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
162
162
 
163
+ ## Evidence Split
164
+ Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
165
+
166
+ Use Evidence Split at cast time and again during implementation when:
167
+ - the first plan has several uncertain facts,
168
+ - implementation starts coupling several files or concepts,
169
+ - a check fails and the next action is unclear,
170
+ - context is becoming broad or stale,
171
+ - independent verification, review, or handoff would reduce risk.
172
+
173
+ Skip it for tiny, obvious edits where a packet would not change the next action.
174
+
175
+ Packet vocabulary:
176
+ - `probe`: answer one unknown with 1-3 inputs.
177
+ - `patch`: make one narrow implementation change.
178
+ - `verify`: prove one success condition or failure recovery.
179
+ - `review`: inspect one risk, regression, or omission.
180
+ - `decision`: record one branch, chosen option, and evidence.
181
+
182
+ Represent packets in existing run state:
183
+ - `steps.json`: packetized steps and status.
184
+ - `context-map.json`: the input files, sources, or excluded context for each packet.
185
+ - `notes.md`: why work was split or re-split during implementation.
186
+ - `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
187
+
188
+ Safety defaults:
189
+ - Do not start workers, tmux panes, worktrees, or external agents automatically.
190
+ - Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
191
+ - Do not let multiple packets edit the same file concurrently.
192
+ - Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
193
+ - Keep each packet to 1-3 primary inputs when possible.
194
+
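The packet vocabulary and safety defaults above can be illustrated with a small validator. This is a sketch only: Tink does not publish a packet schema in this section, so the `id`, `type`, `inputs`, and `edits` fields and the `validatePackets` name are assumptions made for illustration. The 1-3 input rule is stated as "when possible", so the sketch reports warnings instead of rejecting a plan.

```javascript
// The five packet types named in the Evidence Split vocabulary.
const PACKET_TYPES = ["probe", "patch", "verify", "review", "decision"];

// Hypothetical packet shape: { id, type, inputs: string[], edits?: string[] }
function validatePackets(packets) {
  const warnings = [];
  const claimedFiles = new Map(); // file path -> id of the patch packet that claimed it

  for (const packet of packets) {
    if (!PACKET_TYPES.includes(packet.type)) {
      warnings.push(`${packet.id}: unknown type "${packet.type}"`);
    }
    if (packet.inputs.length < 1 || packet.inputs.length > 3) {
      warnings.push(`${packet.id}: expected 1-3 inputs, got ${packet.inputs.length}`);
    }
    // Safety default: multiple packets must not edit the same file concurrently.
    if (packet.type === "patch") {
      for (const file of packet.edits ?? []) {
        if (claimedFiles.has(file)) {
          warnings.push(`${packet.id}: ${file} is already claimed by ${claimedFiles.get(file)}`);
        } else {
          claimedFiles.set(file, packet.id);
        }
      }
    }
  }
  return warnings;
}

console.log(
  validatePackets([
    { id: "p1", type: "probe", inputs: ["src/config.js"] },
    { id: "p2", type: "patch", inputs: ["src/config.js"], edits: ["src/config.js"] },
    { id: "p3", type: "verify", inputs: ["test/config.test.js"] },
  ])
);
```

A check like this would fit at cast time, before the packets are written into `steps.json` and `context-map.json`; where it actually runs is left open here.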
163
195
  Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
164
196
 
165
197
  ```json
@@ -480,12 +512,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
480
512
  - new pattern not covered yet
481
513
 
482
514
  These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
483
- 6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
515
+ 6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
516
+ 7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
484
517
  - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
485
518
  - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
486
519
  - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
487
520
  - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
488
- 7. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
521
+ 8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
489
522
  - Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
490
523
  - Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
491
524
  - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
@@ -496,7 +529,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
496
529
  - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
497
530
  - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 (overlay check) line.
498
531
  - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
499
- 8. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
532
+ 9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
500
533
 
501
534
  After selecting, run a quick quality check using the index metadata for each chosen harness:
502
535
  - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
@@ -504,26 +537,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
504
537
  - If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
505
538
  Feed any signals into the Stitch evaluation at step 16.
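The `use_when` word-match check above can be sketched as follows. The source does not define what counts as a word, so the tokenizer here (lowercase, split on anything that is not a letter or digit) is an assumption, as are the function names. Only the threshold of fewer than 2 case-insensitive matches comes from the text.

```javascript
// Assumed tokenizer: lowercase, then split on runs of non-letter, non-digit characters.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Count distinct use_when words that also appear in the task description.
function countUseWhenMatches(useWhen, taskDescription) {
  const taskWords = new Set(tokenize(taskDescription));
  const matched = new Set(tokenize(useWhen).filter((word) => taskWords.has(word)));
  return matched.size;
}

// Fewer than 2 matching words is treated as a harness-mismatch signal.
function isHarnessMismatch(useWhen, taskDescription) {
  return countUseWhenMatches(useWhen, taskDescription) < 2;
}

console.log(isHarnessMismatch("review a PR diff", "Please review this PR"));
```

Exact word matching is crude: common words such as "a" or "the" count as matches, and inflected forms such as "regression" and "regressions" do not. A real check might add a stop-word list or stemming; this sketch does neither.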
506
539
 
507
- 9. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
508
- 10. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
509
- 11. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
510
- 12. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
511
- 13. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
512
- 14. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
513
- 15. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
514
- 16. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
515
- 17. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
516
- 18. Ask for explicit approval before non-trivial work.
517
- 19. After approval, read only the selected harness files and any approved run-only draft.
518
- 20. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
519
- 21. Execute the first safe step immediately:
540
+ 10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
541
+ 11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
542
+ 12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
543
+ 13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
544
+ 14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
545
+ 15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
546
+ 16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
547
+ 17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
548
+ 18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
549
+ 19. Ask for explicit approval before non-trivial work.
550
+ 20. After approval, read only the selected harness files and any approved run-only draft.
551
+ 21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
552
+ 22. Execute the first safe step immediately:
520
553
  - inspect relevant files,
521
554
  - run a read-only diagnostic,
522
555
  - draft the first artifact,
523
556
  - or reproduce the issue.
524
- 22. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
525
- 23. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
526
- 24. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
557
+ 23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
558
+ 24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
559
+ 25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
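The three synthesis probe outcomes in the steps above map directly onto the number of "yes" answers. A minimal sketch, assuming the probe yields a yes-count from 0 to 5 plus a flag for whether any harness matched; the function name, the `harnessMatched` parameter, and the `NEXT_ACTION` table are illustrative and not part of Tink.

```javascript
// Classify a synthesis probe result using the thresholds stated in the steps:
// strong fit (0-1 yes), generic fit (2-3 yes), no fit (4-5 yes or no harness matches).
function classifyProbe(yesCount, harnessMatched = true) {
  if (!Number.isInteger(yesCount) || yesCount < 0 || yesCount > 5) {
    throw new RangeError("yesCount must be an integer from 0 to 5");
  }
  if (!harnessMatched || yesCount >= 4) return "no fit";
  if (yesCount >= 2) return "generic fit";
  return "strong fit";
}

// Follow-up action for each outcome, paraphrased from the steps above.
const NEXT_ACTION = {
  "strong fit": "keep the selected harness",
  "generic fit": "propose a run-only draft alongside the base run or selected harness",
  "no fit": "load harness-synthesis and draft a domain-specific harness",
};

const outcome = classifyProbe(3);
console.log(outcome, "->", NEXT_ACTION[outcome]);
```

Checking `harnessMatched` first means that a low yes-count cannot be read as a strong fit when nothing was selected in the first place.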
527
560
 
528
561
 
529
562
  ## Synthesis probe
@@ -26,23 +26,24 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
26
26
  6. If `.tink/current/` exists and continuity is uncertain, read `plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, and `contract.json` when present; summarize goal, last safe point, next step, open questions, and verification; then ask resume/archive/replace/cancel before continuing.
27
27
  7. Run the synthesis probe before committing to `.tink/current/`. Strong fit keeps the harness; generic fit adds a run-only draft; no fit loads `harness-synthesis`.
28
28
  8. If too many tools, skills, agents, or harnesses are available, use `harness-curation` to choose the smallest effective set before loading more context.
29
- 9. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
30
- 10. Run Stitch once before committing to `.tink/current/`: evaluate every time, show exactly one proposal only for high-impact quality or safety branches, and use the configured language.
31
- 11. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
32
- 12. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
33
- 13. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
34
- 14. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
35
- 15. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
36
- 16. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
37
- 17. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
38
- 18. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
39
- 19. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`.
40
- 20. Do not stop at recommendation. Execute the first safe step after run state exists.
41
- 21. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
42
- 22. Store reusable memory or rule updates under `.tink/` only after separate approval.
43
- 23. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
44
- 24. Keep context compact. Do not paste raw logs or full diffs.
45
- 25. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
29
+ 9. Treat Evidence Split as a base-run habit, not a harness: for non-trivial work, first ask whether the task should be split into `probe`, `patch`, `verify`, `review`, or `decision` packets. Use it at cast time and again during implementation when uncertainty grows, a check fails, context gets broad, or several changes start to couple. Keep it lightweight for tiny tasks and skip it when it would add ceremony without changing the next action.
30
+ 10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
31
+ 11. Run Stitch once before committing to `.tink/current/`: evaluate every time, show exactly one proposal only for high-impact quality or safety branches, and use the configured language.
32
+ 12. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
33
+ 13. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
34
+ 14. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
35
+ 15. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
36
+ 16. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
37
+ 17. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
38
+ 18. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
39
+ 19. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
40
+ 20. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`. Evidence Split packets live in these run files; do not add a new public command or standalone runtime file for them.
41
+ 21. Do not stop at recommendation. Execute the first safe step after run state exists.
42
+ 22. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
+ 23. Store reusable memory or rule updates under `.tink/` only after separate approval.
+ 24. If a check fails, update `.tink/current/notes.md` and state the failure, the last safe point, and the next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
+ 25. Keep context compact. Do not paste raw logs or full diffs.
+ 26. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
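
The prioritization described in item 19 can be sketched roughly as follows. This is an illustration under stated assumptions, not Tink's implementation: only `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` are named in the text, and their value types are not shown. The sketch assumes `confidence` is one of `low`, `medium`, or `high`, that `evidence_handles` is a list, and that each entry has a hypothetical `name` key.

```python
# Sketch only: the "name" key and the string confidence levels are
# assumptions, not the real harness-lifecycle schema.
CONFIDENCE_RANK = {"high": 2, "medium": 1, "low": 0}


def rank_candidates(entries):
    """Split summary entries into ranked candidates and observations.

    Low-confidence (or unrecognized) entries stay as observations.
    The returned order is a suggestion only; it never counts as approval.
    """
    candidates, observations = [], []
    for entry in entries:
        level = CONFIDENCE_RANK.get(entry.get("confidence"), 0)
        if level == 0:
            observations.append(entry)
        else:
            candidates.append(entry)
    # Higher confidence first, then entries backed by more evidence handles.
    candidates.sort(
        key=lambda e: (
            CONFIDENCE_RANK[e["confidence"]],
            len(e.get("evidence_handles", [])),
        ),
        reverse=True,
    )
    return candidates, observations


entries = [
    {"name": "a", "confidence": "low", "evidence_handles": ["run-1"]},
    {"name": "b", "confidence": "high", "evidence_handles": ["run-2"]},
    {"name": "c", "confidence": "medium", "evidence_handles": ["run-3", "run-4"]},
]
candidates, observations = rank_candidates(entries)
print([e["name"] for e in candidates])
print([e["name"] for e in observations])
```

Keeping observations in a separate list mirrors the rule that low-confidence entries are recorded but not acted on. Any edit, merge, archive, or deletion would still go through the reusable-state approval gate.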
 
  ## Codex Approval Protocol
 
@@ -120,6 +121,39 @@ Optional current-run artifacts:
  - `.tink/current/goals.json`: create only when `goal-checkpoint` is selected. Keep 2-6 goals, one active goal, status, done criteria, verification, evidence, and next action.
  - `.tink/current/delegation.md`: create only when `delegation-brief` is selected. Include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
 
+ ## Evidence Split
+
+ Evidence Split is Tink's default way to keep real work small while it is happening. It is not a separate harness and it does not imply parallel execution.
+
+ Use Evidence Split when a task is non-trivial and any of these signals appears:
+ - the first plan has several uncertain facts,
+ - implementation starts coupling several files or concepts,
+ - a check fails and the next action is unclear,
+ - context is becoming broad or stale,
+ - independent verification, review, or handoff would reduce risk.
+
+ Skip it for tiny, obvious edits where a packet would not change the next action.
+
+ Packet vocabulary:
+ - `probe`: answer one unknown with 1-3 inputs.
+ - `patch`: make one narrow implementation change.
+ - `verify`: prove one success condition or failure recovery.
+ - `review`: inspect one risk, regression, or omission.
+ - `decision`: record one branch, chosen option, and evidence.
+
+ Represent packets in existing run state:
+ - `steps.json`: packetized steps and status.
+ - `context-map.json`: the input files, sources, or excluded context for each packet.
+ - `notes.md`: why work was split or re-split during implementation.
+ - `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
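
As a rough illustration of how packetized steps might be recorded in `steps.json`, the sketch below validates each packet against the five vocabulary words and serializes the result. The real `steps.json` schema is not shown in this diff, so the field names (`id`, `kind`, `goal`, `inputs`, `status`), the top-level `steps` key, and the `src/install.js` path are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field

# The five packet kinds named in the packet vocabulary.
PACKET_KINDS = {"probe", "patch", "verify", "review", "decision"}


@dataclass
class Packet:
    # Hypothetical field names; the real steps.json schema may differ.
    id: str
    kind: str
    goal: str
    inputs: list = field(default_factory=list)
    status: str = "pending"

    def __post_init__(self):
        # Reject anything outside the documented vocabulary.
        if self.kind not in PACKET_KINDS:
            raise ValueError(f"unknown packet kind: {self.kind!r}")


def to_steps_json(packets):
    """Serialize packets as a JSON document with one entry per step."""
    return json.dumps({"steps": [asdict(p) for p in packets]}, indent=2)


packets = [
    Packet("p1", "probe", "Find where the config dir is resolved", ["src/install.js"]),
    Packet("p2", "patch", "Read the env var in that one function", ["src/install.js"]),
    Packet("p3", "verify", "Confirm the install lands in the override dir"),
]
print(to_steps_json(packets))
```

Each packet carries one goal and a short input list, which matches the intent that a packet answers, changes, or proves exactly one thing.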
+
+ Safety defaults:
+ - Do not start workers, tmux panes, worktrees, or external agents automatically.
+ - Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+ - Do not let multiple packets edit the same file concurrently.
+ - Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+ - Keep each packet to 1-3 primary inputs when possible.
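
Two of the safety defaults above lend themselves to a mechanical check: no concurrent edits to the same file, and 1-3 primary inputs per packet. The sketch below is one possible way to lint a packet list for them. It is not part of Tink; the packet keys (`id`, `status`, `inputs`, `edits`) and the `in_progress` status value are assumptions made for illustration.

```python
from collections import defaultdict

MAX_PRIMARY_INPUTS = 3


def check_packets(packets):
    """Return warnings for packets that break the two checkable defaults."""
    warnings = []
    editors = defaultdict(list)
    for packet in packets:
        # "When possible" in the rule suggests a warning, not a hard failure.
        if len(packet.get("inputs", [])) > MAX_PRIMARY_INPUTS:
            warnings.append(
                f"{packet['id']}: more than {MAX_PRIMARY_INPUTS} primary inputs"
            )
        # Only packets that are running right now can collide on a file.
        if packet.get("status") == "in_progress":
            for path in packet.get("edits", []):
                editors[path].append(packet["id"])
    for path, ids in sorted(editors.items()):
        if len(ids) > 1:
            warnings.append(f"{path}: edited concurrently by {', '.join(ids)}")
    return warnings


packets = [
    {"id": "p1", "status": "in_progress", "inputs": ["a"], "edits": ["README.md"]},
    {"id": "p2", "status": "in_progress", "inputs": ["a", "b", "c", "d"], "edits": ["README.md"]},
    {"id": "p3", "status": "pending", "edits": ["README.md"]},
]
for warning in check_packets(packets):
    print(warning)
```

The remaining defaults (no automatic workers, and main-agent control of secrets, releases, and final reconciliation) are policy decisions that a lint like this cannot verify.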
+
  GJC-style harness selection rules:
 
  - Ambiguous ideas, early product concepts, vague bug reports, broad "make it better" requests, and underspecified implementation prompts should start with `requirements-interview`, usually alone until the user clarifies enough to plan or code.