tink-harness 1.13.0 → 1.14.0

Files changed (14)

package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +5 -0
package/README.ko.md +1 -1
package/README.md +1 -1
package/VERSIONING.md +1 -1
package/bin/install.js +126 -10
package/commands/cast.md +52 -19
package/docs/planned-work-units.ko.md +8 -7
package/docs/planned-work-units.md +8 -7
package/docs/swarm-fast-lane.ko.md +17 -16
package/docs/swarm-fast-lane.md +17 -16
package/package.json +1 -1
package/templates/claude/commands/tink/cast.md +52 -19
package/templates/codex/skills/tink-core/RULES.md +51 -17

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "tink",
   "description": "A small harness layer for Claude Code and Codex.",
-  "version": "1.13.0",
+  "version": "1.14.0",
   "author": {
     "name": "dotori"
   }

package/CHANGELOG.md CHANGED

@@ -2,6 +2,11 @@
 All notable changes to Tink are tracked here.
+## [1.14.0] - 2026-06-19
+- Added `CLAUDE_CONFIG_DIR` support: global installs now respect the env var (set via direnv or shell) so commands and skills land in the right config directory instead of always defaulting to `~/.claude`.
+- Added `tink-harness update --all-repos`: finds every repo under the home directory that has Tink installed and updates each one. Uses `direnv exec` when available so per-repo `.envrc` overrides (including `CLAUDE_CONFIG_DIR`) are applied automatically; falls back to parsing simple `export` lines from `.envrc` otherwise.
 ## [1.13.0] - 2026-06-19
 - Added focused opt-in harnesses for recurring agent workflows: `issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, and `architecture-deepening`.
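
The first 1.14.0 entry can be illustrated with a minimal sketch of the scope rule it describes: only a global install (target equal to the home directory) honors `CLAUDE_CONFIG_DIR`, while a repo-scope install keeps using `<repo>/.claude`. The helper name `resolveClaudeDir` and the sample paths are assumptions made for illustration; they are not taken from the package.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper (name and signature are assumptions): returns the Claude
// config directory for an install target. env and home are injectable so the
// rule can be exercised without touching the real environment.
function resolveClaudeDir(target, env = process.env, home = os.homedir()) {
  // Only a global install (target === home directory) honors the override.
  if (env.CLAUDE_CONFIG_DIR && target === home) {
    return env.CLAUDE_CONFIG_DIR;
  }
  // Repo-scope installs always use <target>/.claude.
  return path.join(target, '.claude');
}

// Sample values, invented for the example.
const home = '/home/dev';
const env = { CLAUDE_CONFIG_DIR: '/home/dev/.config/claude-work' };

console.log(resolveClaudeDir(home, env, home));                     // global scope, override set
console.log(resolveClaudeDir('/home/dev/projects/app', env, home)); // repo scope
console.log(resolveClaudeDir(home, {}, home));                      // global scope, no override
```

Passing `env` and `home` as parameters is a choice made for this sketch so that the three cases can be compared side by side.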

package/README.ko.md CHANGED

@@ -10,7 +10,7 @@ Tink는 사소하지 않은 모든 에이전트 작업을 눈에 보이는 파
 <sub>A small harness layer for Claude Code and Codex</sub>
-**Latest package:** v1.13.0 - Adds focused opt-in harnesses for issue triage, hard-bug diagnosis loops, two-axis reviews, decision maps, and architecture deepening, and updates cast routing and docs for both Claude Code and Codex. See [CHANGELOG](CHANGELOG.md) for the full change history.
+**Latest package:** v1.14.0 - Global installs now honor the `CLAUDE_CONFIG_DIR` environment variable, and `update --all-repos` updates every Tink repo under the home directory in one pass. When direnv is available, each repo's `.envrc` is loaded automatically. See [CHANGELOG](CHANGELOG.md) for the full change history.
 [English](README.md) · **한국어** · [Changelog](CHANGELOG.md)

package/README.md CHANGED

@@ -24,7 +24,7 @@
   <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
 </p>
-<p><strong>Latest package:</strong> v1.13.0 - Tink adds focused opt-in harnesses for issue triage, hard-bug diagnosis loops, two-axis reviews, decision maps, and architecture deepening, with cast routing and docs updated for both Claude Code and Codex. See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
+<p><strong>Latest package:</strong> v1.14.0 - Tink respects <code>CLAUDE_CONFIG_DIR</code> for global installs and adds <code>update --all-repos</code> to refresh every Tink-installed repo in one command, with direnv support for per-repo env overrides. See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
 **English** · [한국어](README.ko.md) · [Changelog](CHANGELOG.md)

package/VERSIONING.md CHANGED

@@ -1,6 +1,6 @@
 # Versioning
-Current version: `1.13.0`
+Current version: `1.14.0`
 Tink follows semver from `1.0.0` onward.

package/bin/install.js CHANGED

@@ -126,7 +126,7 @@ function argValue(name) {
 }
 function usage() {
-  console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n  tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n  npx tink-harness@latest [install]\n  npx tink-harness@latest update\n\nCommands:\n  install  Install Tink.\n  update   Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n  dashboard  Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n  1. Select language\n  2. Show TINK wizard\n  3. Select Claude Code, Codex, or both\n  4. Select components\n  5. Select repo/global installation scope\n  6. Select Advanced options\n  7. Select git tracking policy for project state\n\nAdvanced options:\n  --dry-run             Preview only. Show what would be written or removed, but do not change files.\n  --force               Overwrite user-modified files. Use only when you want official templates to replace local edits.\n  --clean-codex-picker  Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n\nEnvironment:\n  TINK_INSTALL_SURFACES=claude|codex|all\n  TINK_CLEAN_CODEX_PICKER=1\n\nScopes:\n  repo    Install shared .tink files into the current project.\n  global  Install shared .tink files into your home directory.\n`);
+  console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n  tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness update --all-repos\n  tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n  npx tink-harness@latest [install]\n  npx tink-harness@latest update\n\nCommands:\n  install  Install Tink.\n  update   Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n  dashboard  Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n  1. Select language\n  2. Show TINK wizard\n  3. Select Claude Code, Codex, or both\n  4. Select components\n  5. Select repo/global installation scope\n  6. Select Advanced options\n  7. Select git tracking policy for project state\n\nAdvanced options:\n  --dry-run             Preview only. Show what would be written or removed, but do not change files.\n  --force               Overwrite user-modified files. Use only when you want official templates to replace local edits.\n  --clean-codex-picker  Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n  --all-repos           Update all repos with Tink under the home directory. Uses direnv if available to load per-repo .envrc.\n\nEnvironment:\n  TINK_INSTALL_SURFACES=claude|codex|all\n  TINK_CLEAN_CODEX_PICKER=1\n  CLAUDE_CONFIG_DIR  Override ~/.claude for global installs (e.g. set by direnv per project)\n  CODEX_HOME         Override ~/.codex for Codex skill installs\n\nScopes:\n  repo    Install shared .tink files into the current project.\n  global  Install shared .tink files into your home directory.\n`);
 }
 function findTinkRoot() {
@@ -228,6 +228,15 @@ function codexHome() {
   return process.env.CODEX_HOME || path.join(os.homedir(), '.codex');
 }
+// CLAUDE_CONFIG_DIR replaces ~/.claude for global installs (like direnv per-project overrides).
+// Repo-scope installs always use <repo>/.claude regardless of this env var.
+function claudeDir(target) {
+  if (process.env.CLAUDE_CONFIG_DIR && target === os.homedir()) {
+    return process.env.CLAUDE_CONFIG_DIR;
+  }
+  return path.join(target, '.claude');
+}
 function legacyComponentOptionsFor(agent, language) {
   const options = COMPONENTS[language].filter((item) => {
     if (item.value === 'commands') return includesClaude(agent);
@@ -364,8 +373,8 @@ function locationSummary(agent, scope) {
   return [
     `Repo target: ${repoTarget}`,
     `Shared .tink target: ${path.join(installTarget, '.tink')}`,
-    includesClaude(agent) ? `Claude Code command target: ${path.join(installTarget, '.claude/commands/tink')}` : null,
-    includesClaude(agent) ? `Claude Code skill target: ${path.join(installTarget, '.claude/skills/tink')}` : null,
+    includesClaude(agent) ? `Claude Code command target: ${path.join(claudeDir(installTarget), 'commands/tink')}` : null,
+    includesClaude(agent) ? `Claude Code skill target: ${path.join(claudeDir(installTarget), 'skills/tink')}` : null,
     includesCodex(agent) ? `Codex skills target: ${path.join(codexHome(), 'skills')}` : null,
     includesCodex(agent) ? `Codex picker cleanup target: ${path.join(process.cwd(), '.claude')}` : null
   ].filter(Boolean).join('\n');
@@ -710,12 +719,12 @@ function copyDir(src, dest, base) {
 function copyTinkCommands(templateRoot, target) {
   const commandSrc = path.join(templateRoot, 'claude/commands/tink');
-  const commandDest = path.join(target, '.claude/commands/tink');
-  const flatCommandDest = path.join(target, '.claude/commands');
+  const commandDest = path.join(claudeDir(target), 'commands/tink');
+  const flatCommandDest = path.join(claudeDir(target), 'commands');
   const legacyFlatCommands = ['tink-setup.md', 'tink-forge.md', 'tink-list.md', 'tink-purge.md', 'tink-hone.md'];
   const legacyNamespaceCommands = ['forge.md', 'purge.md', 'hone.md'];
   const legacyTinyCommands = ['tiny-setup.md', 'tiny-use.md', 'tiny-list.md', 'tiny-save.md'];
-  const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(target, '.claude/skills/tiny')];
+  const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(claudeDir(target), 'skills/tiny')];
   for (const name of legacyFlatCommands) {
     const legacy = path.join(flatCommandDest, name);
     if (fs.existsSync(legacy)) {
@@ -863,7 +872,7 @@ function hookCommandFor(scope, target) {
 }
 function registerClaudeHook(target, scope, base) {
-  const settingsPath = path.join(target, '.claude/settings.json');
+  const settingsPath = path.join(claudeDir(target), 'settings.json');
   const settings = readJsonFile(settingsPath, {});
   const command = hookCommandFor(scope, target);
   settings.hooks ||= {};
@@ -893,7 +902,7 @@ function copySelected(scope, components, agent) {
   }
   if (wantsClaudeSkill(components)) {
     if (includesClaude(agent) && !cleanupCodexPicker) {
-      copyDir(path.join(templateRoot, 'claude/skills'), path.join(target, '.claude/skills'), target);
+      copyDir(path.join(templateRoot, 'claude/skills'), path.join(claudeDir(target), 'skills'), target);
     }
   }
   if (wantsCodexSkills(components)) {
@@ -995,8 +1004,8 @@ function doneLineFor(agent) {
 function updateResultSummary(agent, targets) {
   const locations = [
-    includesClaude(agent) ? `Claude Code commands: ${path.join(targets.installTarget, '.claude/commands/tink')}` : null,
-    includesClaude(agent) ? `Claude Code skill: ${path.join(targets.installTarget, '.claude/skills/tink')}` : null,
+    includesClaude(agent) ? `Claude Code commands: ${path.join(claudeDir(targets.installTarget), 'commands/tink')}` : null,
+    includesClaude(agent) ? `Claude Code skill: ${path.join(claudeDir(targets.installTarget), 'skills/tink')}` : null,
     includesCodex(agent) ? `Codex skills: ${path.join(targets.codexTarget, 'skills')}` : null,
     `Tink shared files: ${path.join(targets.installTarget, '.tink')}`
   ].filter(Boolean);
@@ -1216,12 +1225,119 @@ async function resolveChoices() {
   return { agent, scope, components, gitPolicy, hookScope, language };
 }
+function findAllTinkRepos() {
+  const found = [];
+  const skip = new Set(['node_modules', '.git', 'vendor', 'dist', 'build', 'out', 'target', '.cache']);
+  function scan(dir, depth) {
+    if (depth > 4) return;
+    let entries;
+    try { entries = fs.readdirSync(dir, { withFileTypes: true }); } catch { return; }
+    let hasTink = false;
+    for (const entry of entries) {
+      if (!entry.isDirectory()) continue;
+      if (entry.name === '.tink') { hasTink = true; continue; }
+      if (skip.has(entry.name) || entry.name.startsWith('.')) continue;
+      scan(path.join(dir, entry.name), depth + 1);
+    }
+    if (hasTink) found.push(dir);
+  }
+  scan(os.homedir(), 0);
+  return found;
+}
+function isDirenvAvailable() {
+  return spawnSync('direnv', ['version'], { encoding: 'utf8' }).status === 0;
+}
+function parseEnvrc(envrcPath, repoDir) {
+  if (!fs.existsSync(envrcPath)) return {};
+  const env = {};
+  for (const line of fs.readFileSync(envrcPath, 'utf8').split('\n')) {
+    const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)/);
+    if (!m) continue;
+    let val = m[2].trim().replace(/^["']|["']$/g, '');
+    val = val
+      .replace(/\$HOME|\bHOME\b/g, os.homedir())
+      .replace(/\$PWD|\bPWD\b/g, repoDir)
+      .replace(/^~/, os.homedir());
+    env[m[1]] = val;
+  }
+  return env;
+}
+async function runAllRepos() {
+  const allRepos = findAllTinkRepos();
+  const sourceRoot = path.resolve(root);
+  const repos = allRepos.filter((r) => path.resolve(r) !== sourceRoot);
+  if (repos.length === 0) {
+    console.log('No repos with Tink installed found under home directory.');
+    return;
+  }
+  const hasDirenv = isDirenvAvailable();
+  const installScript = path.join(root, 'bin/install.js');
+  console.log(`Found ${repos.length} repo(s) with Tink installed:\n`);
+  for (const repo of repos) {
+    const envrc = path.join(repo, '.envrc');
+    const envVars = hasDirenv ? {} : parseEnvrc(envrc, repo);
+    const claudeTarget = envVars.CLAUDE_CONFIG_DIR
+      ? envVars.CLAUDE_CONFIG_DIR
+      : path.join(repo, '.claude');
+    const note = fs.existsSync(envrc)
+      ? hasDirenv
+        ? `(direnv)`
+        : envVars.CLAUDE_CONFIG_DIR
+          ? `(.envrc → CLAUDE_CONFIG_DIR=${envVars.CLAUDE_CONFIG_DIR})`
+          : `(.envrc, no CLAUDE_CONFIG_DIR)`
+      : '';
+    console.log(`  ${repo} ${note}`);
+    console.log(`    → ${claudeTarget}/commands/tink`);
+  }
+  console.log('');
+  for (const repo of repos) {
+    console.log(`▶ ${path.basename(repo)} (${repo})`);
+    const envrc = path.join(repo, '.envrc');
+    const extraEnv = hasDirenv ? {} : parseEnvrc(envrc, repo);
+    const mergedEnv = { ...process.env, ...extraEnv };
+    let result;
+    if (hasDirenv && fs.existsSync(envrc)) {
+      result = spawnSync(
+        'direnv', ['exec', repo, 'node', installScript, 'update', '--yes', '--scope=repo'],
+        { cwd: repo, env: process.env, stdio: 'inherit', encoding: 'utf8' }
+      );
+    } else {
+      result = spawnSync(
+        process.execPath, [installScript, 'update', '--yes', '--scope=repo'],
+        { cwd: repo, env: mergedEnv, stdio: 'inherit', encoding: 'utf8' }
+      );
+    }
+    if (result.status !== 0) {
+      console.error(`  ✗ failed (exit ${result.status})`);
+    } else {
+      console.log(`  ✓ done`);
+    }
+    console.log('');
+  }
+}
 async function main() {
   if (command === 'help' || args.includes('--help')) {
     usage();
     process.exit(0);
   }
+  if (command === 'update' && args.includes('--all-repos')) {
+    await runAllRepos();
+    return;
+  }
   if (command === 'dashboard') {
     runDashboard();
     return;
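
The `.envrc` fallback in the hunk above reads simple `export NAME=value` lines when direnv is not installed. As a hedged illustration of that idea, here is a narrower variant that expands only `$HOME`, `${HOME}`, `$PWD`, `${PWD}`, and an unquoted leading `~`, and leaves single-quoted values literal. It is a sketch, not the package's `parseEnvrc`: the function name `parseExportLines`, its options object, and the sample `.envrc` text are all assumptions.

```javascript
// Hypothetical alternative to the fallback parser (not the package's code).
// It reads only `export NAME=value` lines and ignores everything else.
function parseExportLines(text, { home, pwd }) {
  const known = { HOME: home, PWD: pwd };
  const env = {};
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$/);
    if (!m) continue;
    let val = m[2].trim();
    // Strip one pair of matching surrounding quotes, remembering which kind.
    const quoted = val.match(/^(["'])(.*)\1$/);
    if (quoted) val = quoted[2];
    const literal = quoted !== null && quoted[1] === "'";
    if (!literal) {
      // Expand ${HOME}/${PWD}, or $HOME/$PWD when not followed by more name
      // characters (so $HOMEBREW_PREFIX is left alone).
      val = val.replace(
        /\$\{(HOME|PWD)\}|\$(HOME|PWD)(?![A-Za-z0-9_])/g,
        (_, braced, bare) => known[braced || bare]
      );
    }
    // A leading "~" is expanded only for unquoted values.
    if (!quoted) val = val.replace(/^~(?=\/|$)/, home);
    env[m[1]] = val;
  }
  return env;
}

// Sample .envrc content, invented for the example.
const sample = [
  'export CLAUDE_CONFIG_DIR="$HOME/.claude-work"',
  'export CACHE=${PWD}/.cache',
  "export RAW='$HOME/literal'",
  'export BREW=$HOMEBREW_PREFIX/bin',
  'PATH_add bin',
].join('\n');

const parsed = parseExportLines(sample, { home: '/home/dev', pwd: '/repo' });
console.log(parsed);
```

Anything beyond plain assignments (command substitution, `source_up`, conditionals) is out of scope for a line-based reader like this, which is presumably why the changelog describes using `direnv exec` when it is available.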

package/commands/cast.md CHANGED

@@ -160,6 +160,38 @@ Optional current-run artifacts are created only when their harness is selected:
 - `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
 - `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
+## Evidence Split
+Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
+Use Evidence Split at cast time and again during implementation when:
+- the first plan has several uncertain facts,
+- implementation starts coupling several files or concepts,
+- a check fails and the next action is unclear,
+- context is becoming broad or stale,
+- independent verification, review, or handoff would reduce risk.
+Skip it for tiny, obvious edits where a packet would not change the next action.
+Packet vocabulary:
+- `probe`: answer one unknown with 1-3 inputs.
+- `patch`: make one narrow implementation change.
+- `verify`: prove one success condition or failure recovery.
+- `review`: inspect one risk, regression, or omission.
+- `decision`: record one branch, chosen option, and evidence.
+Represent packets in existing run state:
+- `steps.json`: packetized steps and status.
+- `context-map.json`: the input files, sources, or excluded context for each packet.
+- `notes.md`: why work was split or re-split during implementation.
+- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+Safety defaults:
+- Do not start workers, tmux panes, worktrees, or external agents automatically.
+- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+- Do not let multiple packets edit the same file concurrently.
+- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+- Keep each packet to 1-3 primary inputs when possible.
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
 ```json
@@ -480,12 +512,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - new pattern not covered yet
    These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
-6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
+6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
+7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
    - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
    - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
    - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
    - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
-7. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
+8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
    - Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
    - Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
@@ -496,7 +529,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
    - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 line.
    - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
-8. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
+9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
    After selecting, run a quick quality check using the index metadata for each chosen harness:
    - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
@@ -504,26 +537,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
    Feed any signals into the Stitch evaluation at step 16.
-9. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
-10. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
-11. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
-12. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
-13. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
-14. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
-15. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
-16. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
-17. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
-18. Ask for explicit approval before non-trivial work.
-19. After approval, read only the selected harness files and any approved run-only draft.
-20. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
-21. Execute the first safe step immediately:
+10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
+11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
+12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
+13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
+14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
+15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
+16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
+17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
+18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
+19. Ask for explicit approval before non-trivial work.
+20. After approval, read only the selected harness files and any approved run-only draft.
+21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
+22. Execute the first safe step immediately:
    - inspect relevant files,
    - run a read-only diagnostic,
    - draft the first artifact,
    - or reproduce the issue.
-22. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
-23. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
-24. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
+23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
+24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
+25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
 ## Synthesis probe
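
The Evidence Split rules added to `cast.md` above (five packet types, 1-3 primary inputs per packet, no two packets editing the same file concurrently) can be expressed as a small advisory check. This is a sketch under assumptions: the diff does not show the `steps.json` schema, so the field names `id`, `type`, `inputs`, and `edits` are hypothetical, and the check treats every listed packet as potentially concurrent.

```javascript
// Packet types come from the cast.md vocabulary; the packet field names
// (id, type, inputs, edits) are assumptions, not the real steps.json schema.
const PACKET_TYPES = new Set(['probe', 'patch', 'verify', 'review', 'decision']);

// Returns a list of human-readable problems; an empty list means no findings.
function checkPackets(packets) {
  const problems = [];
  const editedBy = new Map(); // file path -> id of the first packet editing it
  for (const p of packets) {
    if (!PACKET_TYPES.has(p.type)) {
      problems.push(`${p.id}: unknown type "${p.type}"`);
    }
    if (p.inputs.length < 1 || p.inputs.length > 3) {
      problems.push(`${p.id}: expected 1-3 inputs, got ${p.inputs.length}`);
    }
    for (const file of p.edits ?? []) {
      if (editedBy.has(file)) {
        problems.push(`${p.id}: ${file} is already edited by ${editedBy.get(file)}`);
      } else {
        editedBy.set(file, p.id);
      }
    }
  }
  return problems;
}

// Sample packets, invented for the example.
const packets = [
  { id: 'p1', type: 'probe', inputs: ['bin/install.js'] },
  { id: 'p2', type: 'patch', inputs: ['bin/install.js', 'README.md'], edits: ['bin/install.js'] },
  { id: 'p3', type: 'patch', inputs: ['a.md', 'b.md', 'c.md', 'd.md'], edits: ['bin/install.js'] },
  { id: 'p4', type: 'refactor', inputs: ['docs/notes.md'] },
];

const problems = checkPackets(packets);
console.log(problems);
```

Because the source says to keep packets to 1-3 inputs "when possible", the input-count finding is best read as a prompt to re-split rather than a hard failure.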

package/docs/planned-work-units.ko.md CHANGED

@@ -91,17 +91,18 @@ Standalone CLI를 더 짧게 입력하고, 로컬 health report를 더 쉽게
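 - `dashboard` creates only local static files by default. It does not run a server or watcher, keep a hidden cache, or modify harnesses automatically.
 - Consider an optional open/export flag only after the generated file path is stable across platforms.
-## Swarm Fast Lane
+## Evidence Split / Parallel Evidence
-Research a multi-agent harness for parallelizing work without turning Tink into a separate multi-agent runtime. The detailed plan lives in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
+Before parallelizing work, put Evidence Split into Tink's default work loop. Without turning Tink into a separate multi-agent runtime, first stabilize the basic behavior of splitting large work into small evidence packets. The detailed research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
-- A worker sees a small packet with only 1-3 inputs, not the whole task.
-- By default a worker does not edit files directly; it returns only evidence and patch candidates.
+- `/tink:cast` and `$tink:cast` check, before harness selection, whether the work can be split into `probe`, `patch`, `verify`, `review`, and `decision` packets.
+- During actual work, split into packets again when uncertainty, verification failures, context growth, or coupled changes appear.
+- A packet sees a small unit with only 1-3 inputs, not the whole task.
+- Even when an external worker is needed, by default it does not edit files directly and returns only evidence and patch candidates.
 - Only the main agent is responsible for final patch selection, file edits, and verification.
 - The success metric is not "always faster" but less main context, less rework, earlier failure detection, and a maintained or improved verification pass rate.
-- For the initial modes, review the candidates `parallel-probe`, `patch-candidate-race`, `micro-contract-split`, `speculative-verifier`, and `context-starvation-mode`.
-- Document when `/tink:cast` and `$tink:cast` select this harness and when they reject it.
-- Worker output is limited to 300 words or fewer, evidence-only, with confidence included.
+- The initial mode is Evidence Split as a core behavior; the actual worker runtime is deferred to separate follow-up work.
+- Worker output stays limited to 300 words or fewer, evidence-only, with confidence included, even in a future runtime.
 - Do not select it for work that needs public contracts, secrets, broad repo scans, or concurrent edits to the same file.
 ## Excluded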

package/docs/planned-work-units.md CHANGED

@@ -91,17 +91,18 @@ Make the standalone CLI easier to type and make the local health report easier t
 - Keep `dashboard` local and static by default: no server, watcher, hidden cache, or automatic harness edits.
 - Allow an optional open/export flag only after the generated file path behavior is stable across platforms.
-## Swarm Fast Lane
+## Evidence Split / Parallel Evidence
-Research a multi-agent harness for parallel work without turning Tink into a separate multi-agent runtime. The detailed plan lives in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
+Before adding parallel workers, add Evidence Split to Tink's default work loop. Tink should not become a separate multi-agent runtime; it should first make large work divisible into small evidence packets. The research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
-- Workers see small packets with only 1-3 inputs, not the whole task.
-- Workers do not edit files by default; they return evidence and patch candidates.
+- `/tink:cast` and `$tink:cast` check whether work should split into `probe`, `patch`, `verify`, `review`, or `decision` packets before harness selection.
+- During implementation, Tink re-splits work when uncertainty, failed checks, context sprawl, or coupled changes appear.
+- Packets see only 1-3 inputs, not the whole task.
+- If external workers are used later, they do not edit files by default; they return evidence and patch candidates.
 - The main agent owns final patch selection, file edits, and verification.
 - Success is measured by less main-agent context, less rework, earlier failure detection, and equal or better verification pass rate, not by claiming universal raw speed.
-- Initial mode candidates are `parallel-probe`, `patch-candidate-race`, `micro-contract-split`, `speculative-verifier`, and `context-starvation-mode`.
-- `/tink:cast` and `$tink:cast` should document when to select or reject this harness.
-- Worker output is capped at 300 words and must include evidence and confidence.
+- The initial implementation is the core Evidence Split behavior; actual worker runtime remains deferred.
+- Future worker output should be capped at 300 words and include evidence and confidence.
 - Do not select it for unclear public contracts, secrets, broad repository scans, or same-file concurrent edits.
 ## Excluded

package/docs/swarm-fast-lane.ko.md CHANGED

@@ -1,16 +1,16 @@
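-# Swarm Fast Lane Research Plan
+# Evidence Split / Parallel Evidence Research Plan
-This document is a research plan for using multiple agents to parallelize work while constraining Tink so it does not become a separate, large runtime. The goal is not "spawning many agents"; it is to explore small context packets in parallel and reduce the main agent's rework and overall context load.
+This document is a research plan for the step before multi-agent parallelization: it limits Tink to a default behavior that splits large work into small evidence packets. The goal is not "spawning many agents"; it is to separate probing, patching, verification, review, and decisions into small context packets and reduce the main agent's rework and overall context load.
 ## Problem Definition
 Typical multi-agent parallelization spends more tokens, because each worker rereads the same context, separate edits conflict, and the main agent pays the merge cost again.
-`swarm-fast-lane` approaches this problem from the opposite direction.
+Evidence Split approaches this problem from the opposite direction.
-- Workers do not understand the whole task.
-- Workers do not read broad files.
-- Workers do not edit directly by default.
+- Packets do not understand the whole task.
+- Packets do not read broad files.
+- Even if external workers are used, they do not edit directly by default.
 - Worker output is limited to short evidence and patch candidates.
 - Only the main agent chooses the final path and edits files.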
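@@ -80,14 +80,16 @@ Without editing files, workers find only related files, risks, and test candidates
 Workers are intentionally given only incomplete, minimal context. The purpose is not a good implementation; it is to cheaply find problems that can be caught even with little information.
-## Harness Contract
+## Core Behavior Contract
-Select the `swarm-fast-lane` harness only when the following conditions are met.
+Evidence Split is not a separate harness; it is the default behavior of `/tink:cast` and `$tink:cast`. Use it under the following conditions.
 - The work splits into 2-5 independent packets.
 - Each packet is limited to 1-3 input files or questions.
-- Each worker's output is limited to 300 words or fewer.
-- Workers do not edit files directly by default.
+- The packet type is one of `probe`, `patch`, `verify`, `review`, or `decision`.
+- If uncertainty, failed verification, context growth, or coupled changes arise during actual work, split the work into packets again.
+- External worker output is limited to 300 words or fewer, even in a future runtime.
+- External workers do not edit files directly by default.
 - Worker output includes evidence, a recommended action, and confidence.
 - The main agent is responsible for the final patch and verification.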
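@@ -118,15 +120,14 @@ Workers are intentionally given only incomplete, minimal context. The purpose
 The first implementation slice is considered done when:
-- A draft of the `swarm-fast-lane` harness exists.
-- `/tink:cast` and `$tink:cast` document when to select and when to reject this harness.
-- The worker packet format is represented in `.tink/current/delegation.md` or a separate run artifact.
+- Evidence Split is included as default behavior in the Tink core rules and in the `/tink:cast` and `$tink:cast` documentation.
+- The packet format is represented in `steps.json`, `context-map.json`, `notes.md`, and, when needed, `.tink/current/delegation.md`.
 - Direct worker edits are disabled by default.
-- At least one fixture or documentation example exists.
+- A lightweight rule states that it can be skipped for small work.
 - Verification does not assert "faster"; it records evidence of reduced context and reduced rework.
 ## Open Questions
 - Decide whether actual worker execution should thinly call existing Codex/Claude Code features, or whether Tink should stop at documenting packets.
-- Decide whether to integrate the worker result schema into `delegation-brief` or keep it as a separate harness.
-- Adjust user-facing copy so the name "fast lane" does not imply an excessive performance guarantee.
+- Decide whether to integrate the worker result schema into `delegation-brief` or keep it as a separate runtime artifact.
+- Keep the `swarm-fast-lane` name only as a temporary name for the research document; user-facing copy prefers Evidence Split or Parallel Evidence.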

package/docs/swarm-fast-lane.md CHANGED

@@ -1,16 +1,16 @@
-# Swarm Fast Lane Research Plan
+# Evidence Split / Parallel Evidence Research Plan
-This document describes a constrained research plan for using multi-agent parallelism without turning Tink into a large standalone runtime. The goal is not to spawn more agents by default. The goal is to split work into tiny context packets so workers can explore independent evidence while the main agent reduces rework and context load.
+This document describes the step before multi-agent parallelism: Tink should first split large work into small evidence packets without becoming a separate runtime. The goal is not to spawn more agents by default. The goal is to separate probe, patch, verify, review, and decision work into tiny context packets so the main agent reduces rework and context load.
 ## Problem
 Naive multi-agent parallelism usually spends more tokens. Each worker rereads context, independent edits conflict, and the main agent still pays a reconciliation cost.
-`swarm-fast-lane` inverts that model.
+Evidence Split inverts that model.
-- Workers do not understand the whole task.
-- Workers do not read broad context.
-- Workers do not edit files by default.
+- Packets do not understand the whole task.
+- Packets do not read broad context.
+- If external workers are used, they do not edit files by default.
 - Worker output is limited to short evidence and patch candidates.
 - The main agent chooses the final path and owns file edits.
@@ -80,14 +80,16 @@ Workers look only for reasons the current implementation approach will fail. Thi
 Workers intentionally receive incomplete minimal context. The point is not high-quality implementation; it is cheaply detecting problems that are visible with little information.
-## Harness Contract
+## Core Behavior Contract
-The `swarm-fast-lane` harness is eligible only when:
+Evidence Split is not a separate harness. It is default behavior inside `/tink:cast` and `$tink:cast`. Use it when:
 - the task splits into 2-5 independent packets
 - each packet is limited to 1-3 input files or questions
-- each worker output is limited to 300 words
-- workers do not edit files by default
+- each packet type is `probe`, `patch`, `verify`, `review`, or `decision`
+- uncertainty, failed checks, context sprawl, or coupled changes appear during implementation and the work needs to be re-split
+- future worker output is limited to 300 words
+- external workers do not edit files by default
 - worker output includes evidence, recommended action, and confidence
 - the main agent owns final patching and verification
@@ -118,15 +120,14 @@ The first version can start with estimates, but run artifacts should record evid
 The first implementation slice is done when:
-- a `swarm-fast-lane` harness draft exists
-- `/tink:cast` and `$tink:cast` document when to select or reject it
-- worker packet format is represented in `.tink/current/delegation.md` or another run artifact
+- Evidence Split is documented as default behavior in Tink core rules and `/tink:cast`, `$tink:cast`
+- packet format is represented in `steps.json`, `context-map.json`, `notes.md`, and optionally `.tink/current/delegation.md`
 - direct worker edits are disabled by default
-- at least one fixture or example exists
+- tiny work can skip the packet ceremony
 - verification records context reduction and rework reduction evidence instead of claiming raw speed
 ## Open Questions
 - Should actual worker execution call existing Codex/Claude Code features, or should Tink only document packets?
-- Should worker result schema extend `delegation-brief`, or should this be a separate harness?
-- Should the user-facing name avoid implying guaranteed speed?
+- Should worker result schema extend `delegation-brief`, or should it use a separate runtime artifact?
+- Keep `swarm-fast-lane` only as a research placeholder; prefer Evidence Split or Parallel Evidence in user-facing copy.
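
The numeric limits in the Core Behavior Contract above (2-5 packets, 1-3 inputs each, a fixed packet vocabulary, and a 300-word cap on future worker output) can be sketched as a small check. This is only an illustration: the function names, the packet fields (`id`, `type`, `inputs`), and the output fields (`text`, `evidence`, `recommendedAction`, `confidence`) are assumptions and are not taken from the Tink package.

```javascript
// Hypothetical sketch; names and object shapes are assumptions, not Tink's API.
const PACKET_TYPES = ["probe", "patch", "verify", "review", "decision"];

// Counts whitespace-separated words; an empty string counts as zero.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns a list of contract violations for a proposed packet plan.
function checkPacketPlan(packets) {
  const problems = [];
  if (packets.length < 2 || packets.length > 5) {
    problems.push(`expected 2-5 packets, got ${packets.length}`);
  }
  for (const p of packets) {
    if (!PACKET_TYPES.includes(p.type)) {
      problems.push(`${p.id}: unknown packet type "${p.type}"`);
    }
    if (p.inputs.length < 1 || p.inputs.length > 3) {
      problems.push(`${p.id}: expected 1-3 inputs, got ${p.inputs.length}`);
    }
  }
  return problems;
}

// Checks a (future) worker result against the output limits in the contract.
function checkWorkerOutput(output) {
  const problems = [];
  if (countWords(output.text) > 300) problems.push("output exceeds 300 words");
  for (const field of ["evidence", "recommendedAction", "confidence"]) {
    if (output[field] === undefined || output[field] === "") {
      problems.push(`missing ${field}`);
    }
  }
  return problems;
}
```

An empty result from either function means the plan or output stays within the documented limits; the main agent would still own final patching and verification.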

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "tink-harness",
-  "version": "1.13.0",
+  "version": "1.14.0",
   "description": "Self-growing harnesses for Claude Code and Codex.",
   "license": "MIT",
   "type": "module",

package/templates/claude/commands/tink/cast.md CHANGED

@@ -160,6 +160,38 @@ Optional current-run artifacts are created only when their harness is selected:
 - `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
 - `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
+## Evidence Split
+Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
+Use Evidence Split at cast time and again during implementation when:
+- the first plan has several uncertain facts,
+- implementation starts coupling several files or concepts,
+- a check fails and the next action is unclear,
+- context is becoming broad or stale,
+- independent verification, review, or handoff would reduce risk.
+Skip it for tiny, obvious edits where a packet would not change the next action.
+Packet vocabulary:
+- `probe`: answer one unknown with 1-3 inputs.
+- `patch`: make one narrow implementation change.
+- `verify`: prove one success condition or failure recovery.
+- `review`: inspect one risk, regression, or omission.
+- `decision`: record one branch, chosen option, and evidence.
+Represent packets in existing run state:
+- `steps.json`: packetized steps and status.
+- `context-map.json`: the input files, sources, or excluded context for each packet.
+- `notes.md`: why work was split or re-split during implementation.
+- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+Safety defaults:
+- Do not start workers, tmux panes, worktrees, or external agents automatically.
+- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+- Do not let multiple packets edit the same file concurrently.
+- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+- Keep each packet to 1-3 primary inputs when possible.
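
As a rough illustration of the safety default "Do not let multiple packets edit the same file concurrently", the sketch below scans packetized steps for files claimed by more than one active packet. The step shape (`id`, `type`, `status`, `edits`) is a hypothetical stand-in for whatever `steps.json` actually stores; the diff does not define that schema.

```javascript
// Hypothetical packetized steps, roughly what steps.json might hold.
// The field names (id, type, status, edits) are assumptions for illustration.
const steps = [
  { id: "p1", type: "probe", status: "done", edits: [] },
  { id: "p2", type: "patch", status: "active", edits: ["src/config.js"] },
  { id: "p3", type: "patch", status: "active", edits: ["src/config.js", "src/cli.js"] },
  { id: "p4", type: "verify", status: "pending", edits: [] },
];

// Returns an object mapping each contested file to the ids of the
// active packets that would edit it. Files with one owner are omitted.
function findSameFileConflicts(packets) {
  const owners = new Map();
  for (const p of packets) {
    if (p.status !== "active") continue;
    for (const file of p.edits) {
      if (!owners.has(file)) owners.set(file, new Set());
      owners.get(file).add(p.id);
    }
  }
  const conflicts = {};
  for (const [file, ids] of owners) {
    if (ids.size > 1) conflicts[file] = [...ids];
  }
  return conflicts;
}

// Here src/config.js is claimed by both p2 and p3, so it is reported.
const conflicts = findSameFileConflicts(steps);
```

A non-empty result would be a cue to serialize the conflicting `patch` packets or merge them into one before continuing.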
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
 ```json
@@ -480,12 +512,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - new pattern not covered yet
    These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
-6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
+6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
+7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
    - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
    - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
    - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
    - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
-7. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
+8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
    - Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
    - Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
@@ -496,7 +529,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
    - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 line.
    - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
-8. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
+9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
    After selecting, run a quick quality check using the index metadata for each chosen harness:
    - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
@@ -504,26 +537,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
    Feed any signals into the Stitch evaluation at step 16.
-9. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
-10. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
-11. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
-12. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
-13. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
-14. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
-15. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
-16. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
-17. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
-18. Ask for explicit approval before non-trivial work.
-19. After approval, read only the selected harness files and any approved run-only draft.
-20. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
-21. Execute the first safe step immediately:
+10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
+11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
+12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
+13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
+14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
+15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
+16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
+17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
+18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
+19. Ask for explicit approval before non-trivial work.
+20. After approval, read only the selected harness files and any approved run-only draft.
+21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
+22. Execute the first safe step immediately:
    - inspect relevant files,
    - run a read-only diagnostic,
    - draft the first artifact,
    - or reproduce the issue.
-22. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
-23. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
-24. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
+23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
+24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
+25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
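
The three synthesis probe outcomes named in the steps above (strong fit at 0-1 yes, generic fit at 2-3 yes, no fit at 4-5 yes or when no harness matches) can be sketched as a small classifier. The function name, the outcome labels, and the `harnessMatched` flag are hypothetical; the probe questions themselves are defined elsewhere in `cast.md` and are not reproduced here.

```javascript
// Hypothetical sketch of the probe thresholds; not code from the Tink package.
// yesCount: number of probe questions answered "yes" (0-5).
// harnessMatched: false when no harness matches the task at all.
function classifyProbe(yesCount, harnessMatched = true) {
  if (!Number.isInteger(yesCount) || yesCount < 0 || yesCount > 5) {
    throw new RangeError("yesCount must be an integer from 0 to 5");
  }
  if (!harnessMatched || yesCount >= 4) return "no-fit";
  if (yesCount >= 2) return "generic-fit";
  return "strong-fit";
}

// Next actions paraphrased from the numbered steps above.
const NEXT_ACTION = {
  "strong-fit": "keep the selected harness",
  "generic-fit": "propose a run-only draft harness or domain rules; do not save by default",
  "no-fit": "load harness-synthesis and draft a domain-specific harness",
};
```

For example, `NEXT_ACTION[classifyProbe(3)]` yields the generic-fit action, while `classifyProbe(1, false)` is treated as no fit because no harness matched.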
 ## Synthesis probe

package/templates/codex/skills/tink-core/RULES.md CHANGED

@@ -26,23 +26,24 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
 6. If `.tink/current/` exists and continuity is uncertain, read `plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, and `contract.json` when present; summarize goal, last safe point, next step, open questions, and verification; then ask resume/archive/replace/cancel before continuing.
 7. Run the synthesis probe before committing to `.tink/current/`. Strong fit keeps the harness; generic fit adds a run-only draft; no fit loads `harness-synthesis`.
 8. If too many tools, skills, agents, or harnesses are available, use `harness-curation` to choose the smallest effective set before loading more context.
-9. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
-10. Run Stitch once before committing to `.tink/current/`: evaluate every time, show exactly one proposal only for high-impact quality or safety branches, and use the configured language.
-11. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
-12. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
-13. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
-14. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
-15. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
-16. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
-17. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
-18. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
-19. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`.
-20. Do not stop at recommendation. Execute the first safe step after run state exists.
-21. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
-22. Store reusable memory or rule updates under `.tink/` only after separate approval.
-23. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
-24. Keep context compact. Do not paste raw logs or full diffs.
-25. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
+9. Treat Evidence Split as a base-run habit, not a harness: for non-trivial work, first ask whether the task should be split into `probe`, `patch`, `verify`, `review`, or `decision` packets. Use it at cast time and again during implementation when uncertainty grows, a check fails, context gets broad, or several changes start to couple. Keep it lightweight for tiny tasks and skip it when it would add ceremony without changing the next action.
+10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
+11. Run Stitch once before committing to `.tink/current/`: evaluate every time, show exactly one proposal only for high-impact quality or safety branches, and use the configured language.
+12. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
+13. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
+14. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
+15. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
+16. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
+17. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
+18. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
+19. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
+20. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`. Evidence Split packets live in these run files; do not add a new public command or standalone runtime file for them.
+21. Do not stop at recommendation. Execute the first safe step after run state exists.
+22. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
+23. Store reusable memory or rule updates under `.tink/` only after separate approval.
+24. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
+25. Keep context compact. Do not paste raw logs or full diffs.
+26. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
 ## Codex Approval Protocol
@@ -120,6 +121,39 @@ Optional current-run artifacts:
 - `.tink/current/goals.json`: create only when `goal-checkpoint` is selected. Keep 2-6 goals, one active goal, status, done criteria, verification, evidence, and next action.
 - `.tink/current/delegation.md`: create only when `delegation-brief` is selected. Include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
+## Evidence Split
+Evidence Split is Tink's default way to keep real work small while it is happening. It is not a separate harness and it does not imply parallel execution.
+Use Evidence Split when a task is non-trivial and any of these signals appears:
+- the first plan has several uncertain facts,
+- implementation starts coupling several files or concepts,
+- a check fails and the next action is unclear,
+- context is becoming broad or stale,
+- independent verification, review, or handoff would reduce risk.
+Skip it for tiny, obvious edits where a packet would not change the next action.
+Packet vocabulary:
+- `probe`: answer one unknown with 1-3 inputs.
+- `patch`: make one narrow implementation change.
+- `verify`: prove one success condition or failure recovery.
+- `review`: inspect one risk, regression, or omission.
+- `decision`: record one branch, chosen option, and evidence.
+Represent packets in existing run state:
+- `steps.json`: packetized steps and status.
+- `context-map.json`: the input files, sources, or excluded context for each packet.
+- `notes.md`: why work was split or re-split during implementation.
+- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+Safety defaults:
+- Do not start workers, tmux panes, worktrees, or external agents automatically.
+- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+- Do not let multiple packets edit the same file concurrently.
+- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+- Keep each packet to 1-3 primary inputs when possible.
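
The trigger rule in this section (split only non-trivial work, and only when at least one listed signal appears) might be sketched as follows. The signal keys and the `tiny` flag are hypothetical labels for the bullets above, not identifiers from the Tink rules.

```javascript
// Hypothetical labels for the five signals listed in this section.
const SPLIT_SIGNALS = [
  "uncertainFacts",
  "couplingFilesOrConcepts",
  "failedCheckUnclearNext",
  "broadOrStaleContext",
  "independentReviewReducesRisk",
];

// Decides whether to run Evidence Split and which signals justify it.
// task.tiny marks a tiny, obvious edit where packets add only ceremony.
function shouldSplit(task) {
  if (task.tiny) return { split: false, reasons: [] };
  const signals = task.signals || {};
  const reasons = SPLIT_SIGNALS.filter((name) => signals[name] === true);
  return { split: reasons.length > 0, reasons };
}
```

The returned `reasons` list is the kind of detail that could be recorded in `notes.md` to explain why work was split or re-split.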
 GJC-style harness selection rules:
 - Ambiguous ideas, early product concepts, vague bug reports, broad "make it better" requests, and underspecified implementation prompts should start with `requirements-interview`, usually alone until the user clarifies enough to plan or code.