npm - tink-harness - Versions diffs - 1.13.0 → 1.15.0 - Mend

tink-harness 1.13.0 → 1.15.0

Files changed (17)

package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +16 -1
package/README.ko.md +14 -1
package/README.md +15 -2
package/VERSIONING.md +1 -1
package/bin/install.js +126 -10
package/commands/cast.md +108 -25
package/docs/geobench.md +29 -0
package/docs/planned-work-units.ko.md +8 -7
package/docs/planned-work-units.md +8 -7
package/docs/swarm-fast-lane.ko.md +17 -16
package/docs/swarm-fast-lane.md +17 -16
package/geobench/tink-harness.yaml +47 -0
package/package.json +2 -1
package/templates/claude/commands/tink/cast.md +108 -25
package/templates/codex/skills/tink-core/RULES.md +52 -17
package/templates/tink/config.json +1 -0

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "tink",
   "description": "A small harness layer for Claude Code and Codex.",
-  "version": "1.13.0",
+  "version": "1.15.0",
   "author": {
     "name": "dotori"
   }

package/CHANGELOG.md CHANGED

@@ -2,6 +2,22 @@
 All notable changes to Tink are tracked here.
+## Unreleased
+- Added a geobench product spec and runbook for measuring Tink's LLM answer visibility with hit rate, MRR, share of voice, and citation metrics. The runbook keeps benchmark execution separate from this repo and says to publish aggregate metrics only.
+## [1.15.0] - 2026-06-24
+- Added cast mode system: `/tink:cast` now supports three modes: `quick` (forces Lane 1 fast path), `standard` (default, auto triage), and `deep` (structured interview before planning). The active mode is persisted in `.tink/config.json` as `cast_mode`. `/tink:cast <mode>` sets the mode; `/tink:cast` called without a task shows the current mode and offers a change option.
+- Added `deep` mode interview pipeline: Round 0 topology lock confirms inferred components before questions start; Rounds 1–10 ask one question per round with a `[Round N/10 ████░░░]` progress indicator, target the weakest clarity dimension (goal/constraint/success criteria/context), investigate brownfield code before asking, handle counter-questions and clarification requests within the same round, allow early exit from Round 3+, and shift from Contrarian to Simplifier questioning as clarity improves. The interview produces a Goal/Topology/Constraints/Success Criteria/Open Questions spec written to `plan.md` before harness selection begins.
+- Upgraded Stitch to Phase A / Phase B: Phase A (Blocking — safety, missing success criteria, goal ambiguity, harness mismatch) always runs and always surfaces when triggered. Phase B (Plan-shaping — minimality, reuse, deletion/substitution) runs only when a concrete code-grounded alternative exists and is skipped entirely in `deep` mode. Phase B never suggests reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements.
+- Codex: Rule 27 added for `cast_mode` and `deep` mode behavior; Rule 11 updated for Stitch Phase A/B.
+## [1.14.0] - 2026-06-19
+- Added `CLAUDE_CONFIG_DIR` support: global installs now respect the env var (set via direnv or shell) so commands and skills land in the right config directory instead of always defaulting to `~/.claude`.
+- Added `tink-harness update --all-repos`: finds every repo under the home directory that has Tink installed and updates each one. Uses `direnv exec` when available so per-repo `.envrc` overrides (including `CLAUDE_CONFIG_DIR`) are applied automatically; falls back to parsing simple `export` lines from `.envrc` otherwise.
 ## [1.13.0] - 2026-06-19
 - Added focused opt-in harnesses for recurring agent workflows: `issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, and `architecture-deepening`.
@@ -13,7 +29,6 @@ All notable changes to Tink are tracked here.
 - Added evidence lifecycle manager groundwork: `/tink:verify` now records a human-readable `.tink/current/evidence.md` summary card, config includes a `completion_policy` field for optional strict "no evidence, no done" behavior, and the dashboard lifecycle summary now exposes ROI hints, trust levels, and Activity-tab run review cards for failed or blocked runs without adding a new public replay command.
 - Fixed: `npx tink-harness update` now prefers the current repo when `.tink/` exists there, so a global/home install scope no longer redirects update tests or repo-local updates away from the current project. Stored `git_policy` is still respected.
 - Improved: the Activity dashboard cards were checked in desktop and mobile Chrome headless screenshots, with narrower mobile layout and shorter run-review fallback copy so the new evidence cards stay readable.
 ## [1.11.2] - 2026-06-13
 - Fixed: the 3D harness map showed no connections or signal pulses on fresh installs (or installs whose history was lost to the pre-1.11.0 record-wipe bug). The lifecycle summary's graph was built only from run/ledger evidence; it now also includes the static rule graph - every routing rule connects to its harness, and check/guard chains render - so the map is alive from the first open.

package/README.ko.md CHANGED

@@ -10,7 +10,7 @@ Tink는 사소하지 않은 모든 에이전트 작업을 눈에 보이는 파
 <sub>A small harness layer for Claude Code and Codex</sub>
-**Latest package:** v1.13.0 - Adds focused opt-in harnesses for issue triage, hard-bug diagnosis loops, two-axis reviews, decision maps, and architecture deepening, and updates cast routing and docs for both Claude Code and Codex. See [CHANGELOG](CHANGELOG.md) for the full release history.
+**Latest package:** v1.15.0 - `/tink:cast` now has three modes (quick / standard / deep). Deep mode runs an interview of up to 10 rounds before planning, and Stitch is split into Phase A (blocking) and Phase B (plan-shaping). See [CHANGELOG](CHANGELOG.md) for the full release history.
 [English](README.md) · **Korean** · [Changelog](CHANGELOG.md)
@@ -125,6 +125,18 @@ npx tink-harness dashboard          # add --no-open to only generate the file
 ---
+## Measuring GEO visibility
+Tink includes a geobench product spec for measuring how often Tink is mentioned in LLM answers, at what rank it is recommended, and which sources are cited.
+- Spec: [`geobench/tink-harness.yaml`](geobench/tink-harness.yaml)
+- Runbook: [`docs/geobench.md`](docs/geobench.md)
+- Metrics: hit rate, MRR, share of voice, citation rate/share, confidence interval
+Publish only aggregate metrics from benchmark results. Do not publish raw provider answers, secrets, or private run logs.
+---
 ## Why I made this
 New AI coding harnesses and workflows keep multiplying. Many of them are good, but mixing several makes the environment heavy and forces you to tidy it up again every time.
@@ -228,6 +240,7 @@ Tink가 아는 모든 것은 직접 읽고, diff 보고, 지울 수 있는 평
 - Harness health summary: `docs/harness-lifecycle-signals.ko.md`, `docs/harness-lifecycle-signals.md`
 - External context safety: `docs/mcp-safe-profile.md`, `docs/external-context-policy.md`
 - Reading `.tink/current/` state: `docs/work-state.ko.md`, `docs/work-state.md`
+- GEO visibility benchmark: `docs/geobench.md` · spec: `geobench/tink-harness.yaml`
 - Update stabilization: `docs/phase-5-update-confidence.ko.md`, `docs/phase-5-update-confidence.md`
 - Context efficiency: `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-threshold-status.ko.md`, `docs/context-threshold-status.md`, `docs/context-run-record-policy.ko.md`, `docs/context-run-record-policy.md`
 - Remaining work units: `docs/planned-work-units.ko.md`, `docs/planned-work-units.md` · roadmap and idea audit: `docs/tink-idea-implementation-plan.ko.md`

package/README.md CHANGED

@@ -17,14 +17,14 @@
 <p><sub>A small harness layer for Claude Code and Codex</sub></p>
 <p>
-  <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.13.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
+  <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.14.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
   <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
   <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
   <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
   <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
 </p>
-<p><strong>Latest package:</strong> v1.13.0 - Tink adds focused opt-in harnesses for issue triage, hard-bug diagnosis loops, two-axis reviews, decision maps, and architecture deepening, with cast routing and docs updated for both Claude Code and Codex. See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
+<p><strong>Latest package:</strong> v1.15.0 - <code>/tink:cast</code> now supports three modes (quick / standard / deep); deep mode runs a structured 10-round interview before planning, and Stitch is split into Phase A (blocking) and Phase B (plan-shaping). See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
 **English** · [한국어](README.ko.md) · [Changelog](CHANGELOG.md)
@@ -139,6 +139,18 @@ No server, no telemetry, no hidden cache - it is a static local page that only p
 ---
+## Measure GEO visibility
+Tink includes a geobench product spec so maintainers can measure how often LLM answers mention, rank, and cite Tink across providers.
+- Spec: [`geobench/tink-harness.yaml`](geobench/tink-harness.yaml)
+- Runbook: [`docs/geobench.md`](docs/geobench.md)
+- Metrics: hit rate, MRR, share of voice, citation rate/share, and confidence intervals
+Use the benchmark for aggregate visibility checks only. Do not publish raw provider answers, secrets, or private run logs.
+---
 ## Why I made this
 *Tink is <strong>knit</strong> in reverse: untying tangled workflows and knitting better ones back together. It also nods to Tinker Bell, the small helper at your side.*
@@ -282,6 +294,7 @@ The dashboard is a static local page rendered from those files — the harness h
 - Harness health summary: `docs/harness-lifecycle-signals.md`, `docs/harness-lifecycle-signals.ko.md`
 - External context safety: `docs/mcp-safe-profile.md`, `docs/external-context-policy.md`
 - Reading `.tink/current/` state: `docs/work-state.md`, `docs/work-state.ko.md`
+- GEO visibility benchmark: `docs/geobench.md` · spec: `geobench/tink-harness.yaml`
 - Update confidence: `docs/phase-5-update-confidence.md`, `docs/phase-5-update-confidence.ko.md`
 - Context efficiency: `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-threshold-status.md`, `docs/context-threshold-status.ko.md`, `docs/context-run-record-policy.md`, `docs/context-run-record-policy.ko.md`
 - Planned work units: `docs/planned-work-units.md`, `docs/planned-work-units.ko.md` · roadmap and idea audit: `docs/tink-idea-implementation-plan.ko.md`
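
The metrics named in the "Measure GEO visibility" section can be illustrated with a small calculation. This is only a sketch: the geobench spec itself is not shown in this diff, so the input shape (one ranked list of product names per LLM answer), the function name `visibilityMetrics`, and the exact metric definitions (conventional hit rate, mean reciprocal rank, and mention share) are assumptions, not the spec's definitions.

```javascript
// Assumed input: each answer is the ordered list of products an LLM answer
// recommended (rank 1 first). This shape is hypothetical.
function visibilityMetrics(answers, product) {
  let hits = 0;
  let reciprocalSum = 0;
  let productMentions = 0;
  let totalMentions = 0;
  for (const ranked of answers) {
    totalMentions += ranked.length;
    const idx = ranked.indexOf(product);
    if (idx !== -1) {
      hits += 1;                      // the answer mentions the product at all
      reciprocalSum += 1 / (idx + 1); // 1/rank; absent answers contribute 0
      productMentions += 1;
    }
  }
  const n = answers.length;
  return {
    hitRate: n ? hits / n : 0,
    mrr: n ? reciprocalSum / n : 0,
    shareOfVoice: totalMentions ? productMentions / totalMentions : 0,
  };
}

// Made-up sample data for illustration only.
const sample = [
  ['tink-harness', 'other-a'],
  ['other-a', 'tink-harness', 'other-b'],
  ['other-a', 'other-b'],
  ['tink-harness'],
];
console.log(visibilityMetrics(sample, 'tink-harness'));
```

Only aggregate numbers like these would be published under the rule stated above; the raw answers stay private.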

package/VERSIONING.md CHANGED

@@ -1,6 +1,6 @@
 # Versioning
-Current version: `1.13.0`
+Current version: `1.15.0`
 Tink follows semver from `1.0.0` onward.

package/bin/install.js CHANGED

@@ -126,7 +126,7 @@ function argValue(name) {
 }
 function usage() {
-  console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n  tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n  npx tink-harness@latest [install]\n  npx tink-harness@latest update\n\nCommands:\n  install  Install Tink.\n  update   Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n  dashboard  Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n  1. Select language\n  2. Show TINK wizard\n  3. Select Claude Code, Codex, or both\n  4. Select components\n  5. Select repo/global installation scope\n  6. Select Advanced options\n  7. Select git tracking policy for project state\n\nAdvanced options:\n  --dry-run             Preview only. Show what would be written or removed, but do not change files.\n  --force               Overwrite user-modified files. Use only when you want official templates to replace local edits.\n  --clean-codex-picker  Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n\nEnvironment:\n  TINK_INSTALL_SURFACES=claude|codex|all\n  TINK_CLEAN_CODEX_PICKER=1\n\nScopes:\n  repo    Install shared .tink files into the current project.\n  global  Install shared .tink files into your home directory.\n`);
+  console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n  tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n  tink-harness update --all-repos\n  tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n  npx tink-harness@latest [install]\n  npx tink-harness@latest update\n\nCommands:\n  install  Install Tink.\n  update   Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n  dashboard  Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n  1. Select language\n  2. Show TINK wizard\n  3. Select Claude Code, Codex, or both\n  4. Select components\n  5. Select repo/global installation scope\n  6. Select Advanced options\n  7. Select git tracking policy for project state\n\nAdvanced options:\n  --dry-run             Preview only. Show what would be written or removed, but do not change files.\n  --force               Overwrite user-modified files. Use only when you want official templates to replace local edits.\n  --clean-codex-picker  Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n  --all-repos           Update all repos with Tink under the home directory. Uses direnv if available to load per-repo .envrc.\n\nEnvironment:\n  TINK_INSTALL_SURFACES=claude|codex|all\n  TINK_CLEAN_CODEX_PICKER=1\n  CLAUDE_CONFIG_DIR  Override ~/.claude for global installs (e.g. set by direnv per project)\n  CODEX_HOME         Override ~/.codex for Codex skill installs\n\nScopes:\n  repo    Install shared .tink files into the current project.\n  global  Install shared .tink files into your home directory.\n`);
 }
 function findTinkRoot() {
@@ -228,6 +228,15 @@ function codexHome() {
   return process.env.CODEX_HOME || path.join(os.homedir(), '.codex');
 }
+// CLAUDE_CONFIG_DIR replaces ~/.claude for global installs (like direnv per-project overrides).
+// Repo-scope installs always use <repo>/.claude regardless of this env var.
+function claudeDir(target) {
+  if (process.env.CLAUDE_CONFIG_DIR && target === os.homedir()) {
+    return process.env.CLAUDE_CONFIG_DIR;
+  }
+  return path.join(target, '.claude');
+}
 function legacyComponentOptionsFor(agent, language) {
   const options = COMPONENTS[language].filter((item) => {
     if (item.value === 'commands') return includesClaude(agent);
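
The resolution rule added by `claudeDir` in this hunk can be checked in isolation. The sketch below repeats the same condition but takes the home directory and environment as parameters, which is an assumption made here so the rule can be exercised without touching the real environment; the shipped function reads `process.env` and `os.homedir()` directly. `resolveClaudeDir` and the sample paths are hypothetical.

```javascript
const path = require('node:path');

// Hypothetical standalone variant of claudeDir(): home and env are injected.
function resolveClaudeDir(target, home, env) {
  // Only a global install (target is exactly the home directory) honors the override.
  if (env.CLAUDE_CONFIG_DIR && target === home) {
    return env.CLAUDE_CONFIG_DIR;
  }
  // Every other target, including repo-scope installs, uses <target>/.claude.
  return path.join(target, '.claude');
}

const home = '/home/alice';                             // example value
const env = { CLAUDE_CONFIG_DIR: '/work/claude-cfg' };  // example value

console.log(resolveClaudeDir(home, home, env));                // override applies
console.log(resolveClaudeDir('/home/alice/proj', home, env));  // repo target ignores it
console.log(resolveClaudeDir(home, home, {}));                 // no override set
```

Because the comparison is plain string equality, a target that differs from the home directory only by a trailing separator would not match and would fall through to `<target>/.claude`.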
@@ -364,8 +373,8 @@ function locationSummary(agent, scope) {
   return [
     `Repo target: ${repoTarget}`,
     `Shared .tink target: ${path.join(installTarget, '.tink')}`,
-    includesClaude(agent) ? `Claude Code command target: ${path.join(installTarget, '.claude/commands/tink')}` : null,
-    includesClaude(agent) ? `Claude Code skill target: ${path.join(installTarget, '.claude/skills/tink')}` : null,
+    includesClaude(agent) ? `Claude Code command target: ${path.join(claudeDir(installTarget), 'commands/tink')}` : null,
+    includesClaude(agent) ? `Claude Code skill target: ${path.join(claudeDir(installTarget), 'skills/tink')}` : null,
     includesCodex(agent) ? `Codex skills target: ${path.join(codexHome(), 'skills')}` : null,
     includesCodex(agent) ? `Codex picker cleanup target: ${path.join(process.cwd(), '.claude')}` : null
   ].filter(Boolean).join('\n');
@@ -710,12 +719,12 @@ function copyDir(src, dest, base) {
 function copyTinkCommands(templateRoot, target) {
   const commandSrc = path.join(templateRoot, 'claude/commands/tink');
-  const commandDest = path.join(target, '.claude/commands/tink');
-  const flatCommandDest = path.join(target, '.claude/commands');
+  const commandDest = path.join(claudeDir(target), 'commands/tink');
+  const flatCommandDest = path.join(claudeDir(target), 'commands');
   const legacyFlatCommands = ['tink-setup.md', 'tink-forge.md', 'tink-list.md', 'tink-purge.md', 'tink-hone.md'];
   const legacyNamespaceCommands = ['forge.md', 'purge.md', 'hone.md'];
   const legacyTinyCommands = ['tiny-setup.md', 'tiny-use.md', 'tiny-list.md', 'tiny-save.md'];
-  const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(target, '.claude/skills/tiny')];
+  const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(claudeDir(target), 'skills/tiny')];
   for (const name of legacyFlatCommands) {
     const legacy = path.join(flatCommandDest, name);
     if (fs.existsSync(legacy)) {
@@ -863,7 +872,7 @@ function hookCommandFor(scope, target) {
 }
 function registerClaudeHook(target, scope, base) {
-  const settingsPath = path.join(target, '.claude/settings.json');
+  const settingsPath = path.join(claudeDir(target), 'settings.json');
   const settings = readJsonFile(settingsPath, {});
   const command = hookCommandFor(scope, target);
   settings.hooks ||= {};
@@ -893,7 +902,7 @@ function copySelected(scope, components, agent) {
   }
   if (wantsClaudeSkill(components)) {
     if (includesClaude(agent) && !cleanupCodexPicker) {
-      copyDir(path.join(templateRoot, 'claude/skills'), path.join(target, '.claude/skills'), target);
+      copyDir(path.join(templateRoot, 'claude/skills'), path.join(claudeDir(target), 'skills'), target);
     }
   }
   if (wantsCodexSkills(components)) {
@@ -995,8 +1004,8 @@ function doneLineFor(agent) {
 function updateResultSummary(agent, targets) {
   const locations = [
-    includesClaude(agent) ? `Claude Code commands: ${path.join(targets.installTarget, '.claude/commands/tink')}` : null,
-    includesClaude(agent) ? `Claude Code skill: ${path.join(targets.installTarget, '.claude/skills/tink')}` : null,
+    includesClaude(agent) ? `Claude Code commands: ${path.join(claudeDir(targets.installTarget), 'commands/tink')}` : null,
+    includesClaude(agent) ? `Claude Code skill: ${path.join(claudeDir(targets.installTarget), 'skills/tink')}` : null,
     includesCodex(agent) ? `Codex skills: ${path.join(targets.codexTarget, 'skills')}` : null,
     `Tink shared files: ${path.join(targets.installTarget, '.tink')}`
   ].filter(Boolean);
@@ -1216,12 +1225,119 @@ async function resolveChoices() {
   return { agent, scope, components, gitPolicy, hookScope, language };
 }
+function findAllTinkRepos() {
+  const found = [];
+  const skip = new Set(['node_modules', '.git', 'vendor', 'dist', 'build', 'out', 'target', '.cache']);
+  function scan(dir, depth) {
+    if (depth > 4) return;
+    let entries;
+    try { entries = fs.readdirSync(dir, { withFileTypes: true }); } catch { return; }
+    let hasTink = false;
+    for (const entry of entries) {
+      if (!entry.isDirectory()) continue;
+      if (entry.name === '.tink') { hasTink = true; continue; }
+      if (skip.has(entry.name) || entry.name.startsWith('.')) continue;
+      scan(path.join(dir, entry.name), depth + 1);
+    }
+    if (hasTink) found.push(dir);
+  }
+  scan(os.homedir(), 0);
+  return found;
+}
+function isDirenvAvailable() {
+  return spawnSync('direnv', ['version'], { encoding: 'utf8' }).status === 0;
+}
+function parseEnvrc(envrcPath, repoDir) {
+  if (!fs.existsSync(envrcPath)) return {};
+  const env = {};
+  for (const line of fs.readFileSync(envrcPath, 'utf8').split('\n')) {
+    const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)/);
+    if (!m) continue;
+    let val = m[2].trim().replace(/^["']|["']$/g, '');
+    val = val
+      .replace(/\$HOME|\bHOME\b/g, os.homedir())
+      .replace(/\$PWD|\bPWD\b/g, repoDir)
+      .replace(/^~/, os.homedir());
+    env[m[1]] = val;
+  }
+  return env;
+}
+async function runAllRepos() {
+  const allRepos = findAllTinkRepos();
+  const sourceRoot = path.resolve(root);
+  const repos = allRepos.filter((r) => path.resolve(r) !== sourceRoot);
+  if (repos.length === 0) {
+    console.log('No repos with Tink installed found under home directory.');
+    return;
+  }
+  const hasDirenv = isDirenvAvailable();
+  const installScript = path.join(root, 'bin/install.js');
+  console.log(`Found ${repos.length} repo(s) with Tink installed:\n`);
+  for (const repo of repos) {
+    const envrc = path.join(repo, '.envrc');
+    const envVars = hasDirenv ? {} : parseEnvrc(envrc, repo);
+    const claudeTarget = envVars.CLAUDE_CONFIG_DIR
+      ? envVars.CLAUDE_CONFIG_DIR
+      : path.join(repo, '.claude');
+    const note = fs.existsSync(envrc)
+      ? hasDirenv
+        ? `(direnv)`
+        : envVars.CLAUDE_CONFIG_DIR
+          ? `(.envrc → CLAUDE_CONFIG_DIR=${envVars.CLAUDE_CONFIG_DIR})`
+          : `(.envrc, no CLAUDE_CONFIG_DIR)`
+      : '';
+    console.log(`  ${repo} ${note}`);
+    console.log(`    → ${claudeTarget}/commands/tink`);
+  }
+  console.log('');
+  for (const repo of repos) {
+    console.log(`▶ ${path.basename(repo)} (${repo})`);
+    const envrc = path.join(repo, '.envrc');
+    const extraEnv = hasDirenv ? {} : parseEnvrc(envrc, repo);
+    const mergedEnv = { ...process.env, ...extraEnv };
+    let result;
+    if (hasDirenv && fs.existsSync(envrc)) {
+      result = spawnSync(
+        'direnv', ['exec', repo, 'node', installScript, 'update', '--yes', '--scope=repo'],
+        { cwd: repo, env: process.env, stdio: 'inherit', encoding: 'utf8' }
+      );
+    } else {
+      result = spawnSync(
+        process.execPath, [installScript, 'update', '--yes', '--scope=repo'],
+        { cwd: repo, env: mergedEnv, stdio: 'inherit', encoding: 'utf8' }
+      );
+    }
+    if (result.status !== 0) {
+      console.error(`  ✗ failed (exit ${result.status})`);
+    } else {
+      console.log(`  ✓ done`);
+    }
+    console.log('');
+  }
+}
 async function main() {
   if (command === 'help' || args.includes('--help')) {
     usage();
     process.exit(0);
   }
+  if (command === 'update' && args.includes('--all-repos')) {
+    await runAllRepos();
+    return;
+  }
   if (command === 'dashboard') {
     runDashboard();
     return;
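
When `direnv` is not available, `runAllRepos` falls back to `parseEnvrc`, which only understands simple `export NAME=value` lines. The sketch below applies the same regular expression and substitutions from the hunk above to an in-memory string, to show which lines survive. `parseEnvrcText` is a hypothetical helper (the shipped function reads a file and always uses `os.homedir()`), and the sample `.envrc` content and paths are made up.

```javascript
const os = require('node:os');

// Hypothetical string-based variant of parseEnvrc(): same regex and
// substitutions as the diff, with the home directory injectable for testing.
function parseEnvrcText(text, repoDir, home = os.homedir()) {
  const env = {};
  for (const line of text.split('\n')) {
    const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)/);
    if (!m) continue; // comments, direnv stdlib calls, lowercase names are skipped
    let val = m[2].trim().replace(/^["']|["']$/g, ''); // strip one outer quote pair
    val = val
      .replace(/\$HOME|\bHOME\b/g, home)
      .replace(/\$PWD|\bPWD\b/g, repoDir)
      .replace(/^~/, home);
    env[m[1]] = val;
  }
  return env;
}

const sampleEnvrc = [
  '# per-repo Claude config',
  'export CLAUDE_CONFIG_DIR="$PWD/.claude-work"',
  'export CODEX_HOME=~/codex-alt',
  'layout node',
  'export lower_case=ignored',
].join('\n');

console.log(parseEnvrcText(sampleEnvrc, '/repos/app', '/home/alice'));
```

Two limits follow directly from the patterns: anything that is not a literal `export` of an uppercase name is ignored, and a value written as `${HOME}/x` is only partly expanded, because the bare `HOME` alternative matches inside the braces and leaves `${` and `}` in place. Running through `direnv exec` avoids both, which is presumably why it is preferred when present.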

package/commands/cast.md CHANGED

@@ -32,6 +32,14 @@ A valid `/tink:cast` response must do one of these:
 If the task is clear enough to classify, do not ask broad clarification first. Make a best recommendation, ask for approval, then act.
+## Cast mode
+`/tink:cast` without a task argument shows the current mode and offers a change option. `/tink:cast <mode>` sets the mode and saves it to `cast_mode` in `.tink/config.json`.
+Modes:
+- `quick` — Forces Lane 1 fast path regardless of task complexity. Skips harness selection and starts immediately.
+- `standard` — Default behavior. Quick triage selects the right lane automatically.
+- `deep` — Runs a structured interview before planning. See **Deep mode** below.
 ## Interaction policy
 Always call the `AskUserQuestion` tool for choice prompts. Do not render `❯` text format. Do not ask the user to type a number inline.
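
The Cast mode section added in this hunk describes a single `cast_mode` value in `.tink/config.json`. A minimal sketch of how a tool might read and update that value follows. The helper names are hypothetical, the config is passed as an already-parsed object, and falling back to `standard` for a missing or unrecognized value is an assumption based on `standard` being described as the default.

```javascript
const CAST_MODES = ['quick', 'standard', 'deep'];

// Hypothetical helper: pick the active mode from a parsed .tink/config.json.
function resolveCastMode(config) {
  const mode = config && config.cast_mode;
  // Assumption: anything other than the three documented modes means 'standard'.
  return CAST_MODES.includes(mode) ? mode : 'standard';
}

// Hypothetical helper: if the argument is a mode name, return an updated
// config object for the caller to persist; otherwise leave it untouched
// (the argument would then be treated as a task).
function applyModeArgument(config, arg) {
  if (!CAST_MODES.includes(arg)) return { config, changed: false };
  return { config: { ...config, cast_mode: arg }, changed: true };
}

console.log(resolveCastMode({}));                           // no value stored
console.log(applyModeArgument({ cast_mode: 'standard' }, 'deep'));
```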
@@ -79,12 +87,21 @@ When Stitch is visible, show exactly one proposal in this order: proposal, reaso
 2. reason
 3. choices
-Choose the one proposal by priority:
-1. safety or irreversibility
-2. success criteria or verification
-3. goal or scope ambiguity
-4. harness mismatch
-5. reusable improvement opportunity
+**Phase A — Blocking checks** (always run; always surface when triggered):
+1. Safety or irreversibility
+2. Missing success criteria or verification
+3. Goal or scope ambiguity
+4. Harness mismatch
+**Phase B — Plan-shaping checks** (run after Phase A; surface only when a concrete code-grounded alternative exists):
+5. Minimality — is the plan larger than the request warrants? Are new files, abstractions, or dependencies justified?
+6. Reuse — does an existing helper, pattern, or flow already solve this?
+7. Deletion/substitution — can the addition be replaced with deleting, configuring, or extending an existing path?
+Phase B proposal rules:
+- Never surface Phase B without a concrete alternative grounded in observed code or project state. "This looks large, consider simplifying" is not a valid finding.
+- Never suggest reducing: trust boundary input validation, data loss prevention, security measures, accessibility basics, or explicitly requested requirements.
+- In `deep` mode, skip Phase B entirely — the interview already covered minimality and reuse.
 Stitch may change the order or method of work, but it must not change the user's goal without separate approval.
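
The Phase A / Phase B rules in this hunk can be read as a small selection procedure. The sketch below is one interpretation, not the harness's implementation: the finding shape, the check identifiers, and the function name are hypothetical, and it assumes the numbered order still acts as a priority when several checks trigger, since Stitch shows exactly one proposal.

```javascript
// Check identifiers are made up here; the order follows the numbered lists.
const PHASE_A = ['safety', 'success-criteria', 'goal-ambiguity', 'harness-mismatch'];
const PHASE_B = ['minimality', 'reuse', 'deletion-substitution'];

// findings: [{ check: string, alternative?: string }] (hypothetical shape)
function pickStitchProposal(findings, castMode) {
  // Phase A always runs and always surfaces when triggered.
  for (const check of PHASE_A) {
    const hit = findings.find((f) => f.check === check);
    if (hit) return { phase: 'A', ...hit };
  }
  // Phase B is skipped entirely in deep mode.
  if (castMode === 'deep') return null;
  // Phase B surfaces only with a concrete, code-grounded alternative.
  for (const check of PHASE_B) {
    const hit = findings.find((f) => f.check === check && Boolean(f.alternative));
    if (hit) return { phase: 'B', ...hit };
  }
  return null;
}

console.log(pickStitchProposal(
  [{ check: 'reuse', alternative: 'extend the existing helper' }, { check: 'harness-mismatch' }],
  'standard'
));
```

A Phase B finding without an `alternative` returns `null` here, mirroring the rule that "This looks large, consider simplifying" is not a valid finding.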
@@ -114,6 +131,38 @@ If the user chooses `Continue as-is` / `이대로 진행`, proceed with the expl
 Do not record a clean Stitch pass.
+## Deep mode
+When `cast_mode` is `deep`, run a structured interview before the normal Procedure. The interview refines the task into a spec that feeds harness selection.
+**Round 0 — Topology lock** (not counted in progress)
+Before asking any questions, present the high-level components Claude infers from the request and visible codebase context. Ask the user to confirm, add, remove, or merge components. This prevents deep focus on one component from obscuring others.
+**Interview loop — Rounds 1–10**
+Show a progress indicator at the start of each question:
+```
+[Round N/10 ████████░░░░░░░░░░░░]
+```
+Rules:
+- Ask one question per round. Never ask multiple questions in one round.
+- Target the weakest clarity dimension each round: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). These weights are internal judgment guides, not computed scores. Always pick the dimension where ambiguity most limits the next action.
+- Brownfield rule: investigate the codebase before asking. Do not ask about things already visible in the code. Confirm findings rather than ask from scratch.
+- Counter-question (user answers but also asks a question back): answer the counter-question first, then treat the combined response as this round's answer. Round counter does not advance.
+- Clarification request (user does not understand the question): rephrase and re-ask within the same round. Round counter does not advance.
+- Round 3+: user may exit the interview early and proceed directly to spec generation.
+- Round 10: hard cap. End the interview and produce the spec regardless of ambiguity.
+- End early when goal, constraint, and success criteria are all sufficiently clear, without waiting for Round 10.
+**Question mode shift** (triggered by clarity state, not round number):
+- When goal and constraint are sufficiently clear → shift to Contrarian mode: "What if the opposite were true? What if this assumption is wrong?"
+- When those are also resolved → shift to Simplifier mode: "What is the smallest version that still has meaningful value?"
+**Spec → plan.md → harness selection**
+When the interview ends, write `.tink/current/plan.md` with these top-level sections: Goal, Topology, Constraints, Success Criteria, Open Questions.
+Then proceed to the normal Procedure starting at step 3 (read harness index). Use the spec as the harness selection input instead of the raw task request. Stitch Phase A runs after harness selection as normal. Phase B is skipped.
 ## Reusable State Save Gate
 Reusable State Save Gate is a separate absolute hard approval gate, not merely a Stitch subtype. Current-run approval does not authorize reusable-state writes.
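
The Deep mode round bookkeeping in this hunk (progress indicator, non-advancing rounds, early exit from Round 3, hard cap at Round 10) can be sketched as three small functions. The names and the reply categories are hypothetical. The 20-cell bar width matches the example indicator in the diff; filling it in proportion to the round number is an assumption.

```javascript
const MAX_ROUNDS = 10;
const BAR_WIDTH = 20; // the example bar in the diff has 20 cells

// Render "[Round N/10 ████░░░...]" with the bar filled in proportion to N.
function renderProgress(round) {
  const clamped = Math.min(Math.max(round, 0), MAX_ROUNDS);
  const filled = Math.round((clamped / MAX_ROUNDS) * BAR_WIDTH);
  return `[Round ${clamped}/${MAX_ROUNDS} ${'█'.repeat(filled)}${'░'.repeat(BAR_WIDTH - filled)}]`;
}

// Counter-questions and clarification requests stay within the same round.
function nextRound(round, replyKind) {
  if (replyKind === 'counter-question' || replyKind === 'clarification') return round;
  return Math.min(round + 1, MAX_ROUNDS); // Round 10 is a hard cap
}

// The user may leave the interview from Round 3 onward.
function canExitEarly(round) {
  return round >= 3;
}

console.log(renderProgress(4));
```

Under these assumptions, Round 4 renders with 8 filled and 12 empty cells, the same proportions as the indicator shown in the diff.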
@@ -160,6 +209,38 @@ Optional current-run artifacts are created only when their harness is selected:
 - `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
 - `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
+## Evidence Split
+Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
+Use Evidence Split at cast time and again during implementation when:
+- the first plan has several uncertain facts,
+- implementation starts coupling several files or concepts,
+- a check fails and the next action is unclear,
+- context is becoming broad or stale,
+- independent verification, review, or handoff would reduce risk.
+Skip it for tiny, obvious edits where a packet would not change the next action.
+Packet vocabulary:
+- `probe`: answer one unknown with 1-3 inputs.
+- `patch`: make one narrow implementation change.
+- `verify`: prove one success condition or failure recovery.
+- `review`: inspect one risk, regression, or omission.
+- `decision`: record one branch, chosen option, and evidence.
+Represent packets in existing run state:
+- `steps.json`: packetized steps and status.
+- `context-map.json`: the input files, sources, or excluded context for each packet.
+- `notes.md`: why work was split or re-split during implementation.
+- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+Safety defaults:
+- Do not start workers, tmux panes, worktrees, or external agents automatically.
+- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+- Do not let multiple packets edit the same file concurrently.
+- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+- Keep each packet to 1-3 primary inputs when possible.
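
The packet vocabulary and safety defaults in the Evidence Split section above can be expressed as a lightweight check. The diff does not show the `steps.json` schema, so the packet shape below (`kind`, `inputs`, optional `edits`) and the function name are hypothetical. The check also flags any two packets that list the same file, which is stricter than the stated rule about concurrent edits.

```javascript
const PACKET_KINDS = ['probe', 'patch', 'verify', 'review', 'decision'];

// Hypothetical packet shape: { kind, inputs: string[], edits?: string[] }
function checkPackets(packets) {
  const problems = [];
  const editedBy = new Map(); // file -> index of the first packet that edits it
  packets.forEach((p, i) => {
    if (!PACKET_KINDS.includes(p.kind)) {
      problems.push(`packet ${i}: unknown kind "${p.kind}"`);
    }
    // "1-3 primary inputs when possible" is treated as a warning, not an error.
    if (!Array.isArray(p.inputs) || p.inputs.length < 1 || p.inputs.length > 3) {
      problems.push(`packet ${i}: expected 1-3 primary inputs`);
    }
    for (const file of p.edits || []) {
      if (editedBy.has(file)) {
        problems.push(`packet ${i}: ${file} is also edited by packet ${editedBy.get(file)}`);
      } else {
        editedBy.set(file, i);
      }
    }
  });
  return problems;
}

console.log(checkPackets([
  { kind: 'probe', inputs: ['bin/install.js'] },
  { kind: 'patch', inputs: ['bin/install.js'], edits: ['bin/install.js'] },
  { kind: 'verify', inputs: ['bin/install.js'], edits: ['bin/install.js'] },
]));
```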
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
 ```json
@@ -369,6 +450,7 @@ If any of the following is true, the task goes to Lane 3:
 - The task description mentions any of the above concepts
 **Step 2 — Lane decision (only if step 1 finds no hard-gate):**
+If `cast_mode` is `quick`, always select Lane 1 here regardless of task signals.
 **Lane 1 — instant start.** Any of these signals, with no contradicting complexity signal:
 - a question, explanation, or lookup with no file edits
@@ -480,12 +562,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - new pattern not covered yet
    These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
-6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
+6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
+7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
    - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
    - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
    - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
    - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
-7. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
+8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
    - Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
    - Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
@@ -496,7 +579,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
    - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 line.
    - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
-8. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
+9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
    After selecting, run a quick quality check using the index metadata for each chosen harness:
    - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
@@ -504,26 +587,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
    Feed any signals into the Stitch evaluation at step 16.
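A minimal sketch of the two quality checks visible in this hunk, assuming a hypothetical `HarnessMeta` shape that carries only the `use_when` and `asks` fields named above. The ASCII-only tokenizer and the `goalSelfEvident` flag are assumptions; the text does not say how words are split or how self-evidence is judged.

```typescript
// Hypothetical index entry; only `use_when` and `asks` come from the text.
interface HarnessMeta {
  use_when: string;
  asks?: string[];
}

type StitchSignal = "harness-mismatch" | "goal-ambiguity";

// Unique lowercased words. Splitting on non-alphanumeric ASCII is an assumption.
function words(text: string): string[] {
  const seen: Record<string, boolean> = Object.create(null);
  const out: string[] = [];
  for (const w of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (w !== "" && !seen[w]) {
      seen[w] = true;
      out.push(w);
    }
  }
  return out;
}

function qualitySignals(meta: HarnessMeta, task: string, goalSelfEvident: boolean): StitchSignal[] {
  const signals: StitchSignal[] = [];
  const taskWords = words(task);
  const matches = words(meta.use_when).filter((w) => taskWords.indexOf(w) !== -1).length;
  // Fewer than 2 matching words is treated as a harness-mismatch signal.
  if (matches < 2) signals.push("harness-mismatch");
  // Empty or missing `asks` with a non-obvious goal is a goal-ambiguity signal.
  if ((meta.asks ?? []).length === 0 && !goalSelfEvident) signals.push("goal-ambiguity");
  return signals;
}
```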
-9. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
-10. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
-11. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
-12. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
-13. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
-14. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
-15. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
-16. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
-17. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
-18. Ask for explicit approval before non-trivial work.
-19. After approval, read only the selected harness files and any approved run-only draft.
-20. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
-21. Execute the first safe step immediately:
+10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
+11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
+12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
+13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
+14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
+15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
+16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
+17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
+18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
+19. Ask for explicit approval before non-trivial work.
+20. After approval, read only the selected harness files and any approved run-only draft.
+21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
+22. Execute the first safe step immediately:
    - inspect relevant files,
    - run a read-only diagnostic,
    - draft the first artifact,
    - or reproduce the issue.
-22. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
-23. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
-24. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
+23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
+24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
+25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
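The three synthesis probe outcomes named in the procedure (strong fit at 0-1 yes, generic fit at 2-3 yes, no fit at 4-5 yes or when no harness matches) can be sketched as a small classifier. The function name and the `harnessMatched` flag are hypothetical, and the 0-5 range is inferred from the thresholds; the probe questions themselves are outside this excerpt.

```typescript
type ProbeOutcome = "strong-fit" | "generic-fit" | "no-fit";

// `yesCount` is the number of "yes" answers to the probe (assumed range 0-5).
function probeOutcome(yesCount: number, harnessMatched: boolean): ProbeOutcome {
  if (yesCount < 0 || yesCount > 5 || Math.floor(yesCount) !== yesCount) {
    throw new RangeError("yesCount must be an integer from 0 to 5");
  }
  // No matching harness, or 4-5 yes: draft a domain-specific harness instead.
  if (!harnessMatched || yesCount >= 4) return "no-fit";
  // 2-3 yes: propose a run-only draft alongside the base run or selected harness.
  if (yesCount >= 2) return "generic-fit";
  // 0-1 yes: keep the initial harness choice.
  return "strong-fit";
}
```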
 ## Synthesis probe

package/docs/geobench.md ADDED

@@ -0,0 +1,29 @@
+# GEO Benchmark For Tink
+This repository includes a [`geobench`](https://github.com/NomaDamas/geobench) product spec for measuring LLM answer visibility: hit rate, MRR, share of voice, citation rate/share, and confidence intervals.
+Product spec: [`geobench/tink-harness.yaml`](../geobench/tink-harness.yaml)
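For readers unfamiliar with two of the metrics named above, here is a minimal sketch using the standard information-retrieval definitions of hit rate and mean reciprocal rank (MRR). This is an assumption about what the terms mean: `geobench`'s own implementation is not part of this diff and may define or weight them differently. The `ranks` input is hypothetical.

```typescript
// Each entry is the 1-based position of the product in one answer,
// or null when the product was not mentioned at all.
function hitRate(ranks: Array<number | null>): number {
  if (ranks.length === 0) return 0;
  return ranks.filter((r) => r !== null).length / ranks.length;
}

// Mean of 1/rank over all queries; a miss contributes 0.
function meanReciprocalRank(ranks: Array<number | null>): number {
  if (ranks.length === 0) return 0;
  let sum = 0;
  for (const r of ranks) {
    if (r !== null) sum += 1 / r;
  }
  return sum / ranks.length;
}
```

With ranks `[1, null, 2, 4]`, three of four queries hit, so the hit rate is 0.75, and the MRR is (1 + 0 + 0.5 + 0.25) / 4 = 0.4375.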
+## Run
+Use a local checkout or install of `geobench`; do not commit `.env`, raw run logs, or provider responses.
+```bash
+/path/to/geobench/dist/geobench estimate --product geobench/tink-harness.yaml --providers openai --tier cheap
+/path/to/geobench/dist/geobench profile geobench/tink-harness.yaml
+/path/to/geobench/dist/geobench bench --product geobench/tink-harness.yaml --providers openai --tier cheap --mode benchmark
+```
+To inspect results locally:
+```bash
+/path/to/geobench/dist/geobench dash
+```
+## Publishing Boundary
+Publish aggregate metrics only. Do not publish raw provider answers, secrets, private run logs, or `.env` values. When citing results, include the run date, provider set, tier, query count, and whether the spec was profiled before the run.
+## Korean Summary
+이 repo에는 Tink의 LLM 답변 노출도를 측정하기 위한 geobench 제품 스펙이 포함되어 있습니다. 실행 결과를 공개할 때는 hit rate, MRR, share of voice, citation rate/share 같은 집계 지표만 공개하고, 원문 provider 답변·시크릿·개인 실행 로그는 공개하지 않습니다.

package/docs/planned-work-units.ko.md CHANGED

@@ -91,17 +91,18 @@ Standalone CLI를 더 짧게 입력하고, 로컬 health report를 더 쉽게
 - `dashboard`는 기본적으로 로컬 정적 파일만 만든다. 서버, watcher, hidden cache, 자동 하네스 수정은 하지 않는다.
 - 생성 파일 경로가 플랫폼별로 안정화된 뒤에만 선택적인 open/export flag를 검토한다.
-## Swarm Fast Lane
+## Evidence Split / Parallel Evidence
-작업 병렬화를 위한 멀티 에이전트 하네스를 연구하되, Tink를 별도 multi-agent runtime으로 만들지 않는다. 상세 계획은 `docs/swarm-fast-lane.ko.md`와 `docs/swarm-fast-lane.md`에 둔다.
+작업 병렬화보다 먼저, Tink의 기본 작업 루프에 Evidence Split을 넣는다. Tink를 별도 multi-agent runtime으로 만들지 않고, 큰 작업을 작은 증거 packet으로 나누는 기본 동작부터 안정화한다. 상세 연구 기록은 `docs/swarm-fast-lane.ko.md`와 `docs/swarm-fast-lane.md`에 둔다.
-- worker는 전체 작업이 아니라 1-3개 입력만 가진 작은 packet을 본다.
-- worker는 기본적으로 파일을 직접 수정하지 않고 evidence와 patch candidate만 반환한다.
+- `/tink:cast`와 `$tink:cast`는 하네스 선택 전에 `probe`, `patch`, `verify`, `review`, `decision` packet으로 나눌 수 있는지 점검한다.
+- 실제 작업 중에도 불확실성, 검증 실패, context 확대, 변경 결합이 생기면 다시 packet으로 나눈다.
+- packet은 전체 작업이 아니라 1-3개 입력만 가진 작은 단위를 본다.
+- 외부 worker가 필요할 때도 기본적으로 파일을 직접 수정하지 않고 evidence와 patch candidate만 반환한다.
 - 메인 에이전트만 최종 patch 선택, 파일 수정, 검증을 책임진다.
 - 성공 지표는 "항상 더 빠름"이 아니라 main context 감소, 재작업 감소, 실패 조기 발견, 검증 통과율 유지 또는 개선으로 둔다.
-- 초기 모드는 `parallel-probe`, `patch-candidate-race`, `micro-contract-split`, `speculative-verifier`, `context-starvation-mode` 후보를 검토한다.
-- `/tink:cast`와 `$tink:cast`가 이 하네스를 언제 선택하고 언제 거절할지 문서화한다.
-- worker 출력은 300단어 이하, evidence-only, confidence 포함으로 제한한다.
+- 초기 모드는 core behavior인 Evidence Split으로 두고, 실제 worker runtime은 별도 후속 작업으로 미룬다.
+- worker 출력은 future runtime에서도 300단어 이하, evidence-only, confidence 포함으로 제한한다.
 - public contract, secrets, 넓은 repo scan, 동일 파일 동시 수정이 필요한 작업에서는 선택하지 않는다.
 ## 제외

package/docs/planned-work-units.md CHANGED

@@ -91,17 +91,18 @@ Make the standalone CLI easier to type and make the local health report easier t
 - Keep `dashboard` local and static by default: no server, watcher, hidden cache, or automatic harness edits.
 - Allow an optional open/export flag only after the generated file path behavior is stable across platforms.
-## Swarm Fast Lane
+## Evidence Split / Parallel Evidence
-Research a multi-agent harness for parallel work without turning Tink into a separate multi-agent runtime. The detailed plan lives in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
+Before adding parallel workers, add Evidence Split to Tink's default work loop. Tink should not become a separate multi-agent runtime; it should first make large work divisible into small evidence packets. The research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
-- Workers see small packets with only 1-3 inputs, not the whole task.
-- Workers do not edit files by default; they return evidence and patch candidates.
+- `/tink:cast` and `$tink:cast` check whether work should split into `probe`, `patch`, `verify`, `review`, or `decision` packets before harness selection.
+- During implementation, Tink re-splits work when uncertainty, failed checks, context sprawl, or coupled changes appear.
+- Packets see only 1-3 inputs, not the whole task.
+- If external workers are used later, they do not edit files by default; they return evidence and patch candidates.
 - The main agent owns final patch selection, file edits, and verification.
 - Success is measured by less main-agent context, less rework, earlier failure detection, and equal or better verification pass rate, not by claiming universal raw speed.
-- Initial mode candidates are `parallel-probe`, `patch-candidate-race`, `micro-contract-split`, `speculative-verifier`, and `context-starvation-mode`.
-- `/tink:cast` and `$tink:cast` should document when to select or reject this harness.
-- Worker output is capped at 300 words and must include evidence and confidence.
+- The initial implementation is the core Evidence Split behavior; actual worker runtime remains deferred.
+- Future worker output should be capped at 300 words and include evidence and confidence.
 - Do not select it for unclear public contracts, secrets, broad repository scans, or same-file concurrent edits.
 ## Excluded

package/docs/swarm-fast-lane.ko.md CHANGED

@@ -1,16 +1,16 @@
-# Swarm Fast Lane 연구 계획
+# Evidence Split / Parallel Evidence 연구 계획
-이 문서는 멀티 에이전트를 작업 병렬화에 쓰되, Tink가 별도 거대 런타임이 되지 않도록 제한하는 연구 계획이다. 목표는 "에이전트를 많이 띄우기"가 아니라, 작은 컨텍스트 패킷을 병렬로 탐색해 메인 에이전트의 재작업과 전체 컨텍스트 부담을 줄이는 것이다.
+이 문서는 멀티 에이전트 작업 병렬화의 전 단계로, Tink가 큰 작업을 작은 evidence packet으로 나누는 기본 동작을 갖도록 제한하는 연구 계획이다. 목표는 "에이전트를 많이 띄우기"가 아니라, 작은 컨텍스트 패킷으로 조사, 수정, 검증, 리뷰, 결정을 분리해 메인 에이전트의 재작업과 전체 컨텍스트 부담을 줄이는 것이다.
 ## 문제 정의
 일반적인 멀티 에이전트 병렬화는 토큰을 더 많이 쓴다. 각 worker가 같은 문맥을 다시 읽고, 서로 다른 수정이 충돌하며, 메인 에이전트가 합산 비용을 다시 치르기 때문이다.
-`swarm-fast-lane`은 이 문제를 반대로 접근한다.
+Evidence Split은 이 문제를 반대로 접근한다.
-- worker가 전체 작업을 이해하지 않는다.
-- worker가 넓은 파일을 읽지 않는다.
-- worker가 기본적으로 직접 수정하지 않는다.
+- packet이 전체 작업을 이해하지 않는다.
+- packet이 넓은 파일을 읽지 않는다.
+- 외부 worker가 쓰이더라도 기본적으로 직접 수정하지 않는다.
 - worker 출력은 짧은 evidence와 patch candidate로 제한한다.
 - 메인 에이전트만 최종 경로를 선택하고 파일을 수정한다.
@@ -80,14 +80,16 @@ worker는 파일 수정 없이 관련 파일, 위험, 테스트 후보만 찾는
 worker에게 의도적으로 불완전한 최소 컨텍스트만 준다. 목적은 좋은 구현이 아니라, 작은 정보로도 잡히는 문제를 싸게 찾는 것이다.
-## 하네스 계약
+## Core Behavior 계약
-`swarm-fast-lane` 하네스는 다음 조건을 만족할 때만 선택한다.
+Evidence Split은 별도 하네스가 아니라 `/tink:cast`와 `$tink:cast`의 기본 동작이다. 다음 조건에서 사용한다.
 - 작업이 2-5개의 독립 packet으로 나뉜다.
 - 각 packet은 입력 파일 또는 질문이 1-3개로 제한된다.
-- 각 worker의 출력은 300단어 이하로 제한된다.
-- worker는 기본적으로 직접 파일을 수정하지 않는다.
+- packet type은 `probe`, `patch`, `verify`, `review`, `decision` 중 하나다.
+- 실제 작업 중 불확실성, 검증 실패, context 확대, 변경 결합이 생기면 다시 packet으로 나눈다.
+- 외부 worker의 출력은 future runtime에서도 300단어 이하로 제한한다.
+- 외부 worker는 기본적으로 직접 파일을 수정하지 않는다.
 - worker 출력에는 evidence, 추천 행동, confidence가 포함된다.
 - 메인 에이전트가 최종 patch와 검증을 책임진다.
@@ -118,15 +120,14 @@ worker에게 의도적으로 불완전한 최소 컨텍스트만 준다. 목적
 첫 구현 slice는 다음을 완료로 본다.
-- `swarm-fast-lane` 하네스 초안이 있다.
-- `/tink:cast`와 `$tink:cast`가 이 하네스를 언제 선택하고 언제 거절할지 문서화되어 있다.
-- worker packet 형식이 `.tink/current/delegation.md` 또는 별도 run artifact로 표현된다.
+- Evidence Split이 Tink core rules와 `/tink:cast`, `$tink:cast` 문서에 기본 동작으로 들어간다.
+- packet 형식이 `steps.json`, `context-map.json`, `notes.md`, 필요 시 `.tink/current/delegation.md`로 표현된다.
 - worker 직접 수정은 기본 비활성이다.
-- 최소 하나의 fixture 또는 문서 예제가 있다.
+- 작은 작업에서는 생략 가능하다는 lightweight rule이 있다.
 - 검증은 "더 빠름"을 단정하지 않고, context 감소와 재작업 감소 근거를 기록한다.
 ## 열린 질문
 - 실제 worker 실행은 Codex/Claude Code의 기존 기능을 얇게 호출할지, Tink는 packet 문서화까지만 할지 결정해야 한다.
-- worker 결과 schema를 `delegation-brief`에 통합할지, 별도 하네스로 둘지 결정해야 한다.
-- fast lane이라는 이름이 과도한 성능 보장을 암시하지 않도록 사용자 문구를 조정해야 한다.
+- worker 결과 schema를 `delegation-brief`에 통합할지, 별도 runtime artifact로 둘지 결정해야 한다.
+- `swarm-fast-lane` 이름은 연구 문서의 임시 이름으로만 남기고, 사용자 문구는 Evidence Split 또는 Parallel Evidence를 우선한다.

package/docs/swarm-fast-lane.md CHANGED

@@ -1,16 +1,16 @@
-# Swarm Fast Lane Research Plan
+# Evidence Split / Parallel Evidence Research Plan
-This document describes a constrained research plan for using multi-agent parallelism without turning Tink into a large standalone runtime. The goal is not to spawn more agents by default. The goal is to split work into tiny context packets so workers can explore independent evidence while the main agent reduces rework and context load.
+This document describes the step before multi-agent parallelism: Tink should first split large work into small evidence packets without becoming a separate runtime. The goal is not to spawn more agents by default. The goal is to separate probe, patch, verify, review, and decision work into tiny context packets so the main agent reduces rework and context load.
 ## Problem
 Naive multi-agent parallelism usually spends more tokens. Each worker rereads context, independent edits conflict, and the main agent still pays a reconciliation cost.
-`swarm-fast-lane` inverts that model.
+Evidence Split inverts that model.
-- Workers do not understand the whole task.
-- Workers do not read broad context.
-- Workers do not edit files by default.
+- Packets do not understand the whole task.
+- Packets do not read broad context.
+- If external workers are used, they do not edit files by default.
 - Worker output is limited to short evidence and patch candidates.
 - The main agent chooses the final path and owns file edits.
@@ -80,14 +80,16 @@ Workers look only for reasons the current implementation approach will fail. Thi
 Workers intentionally receive incomplete minimal context. The point is not high-quality implementation; it is cheaply detecting problems that are visible with little information.
-## Harness Contract
+## Core Behavior Contract
-The `swarm-fast-lane` harness is eligible only when:
+Evidence Split is not a separate harness. It is default behavior inside `/tink:cast` and `$tink:cast`. Use it when:
 - the task splits into 2-5 independent packets
 - each packet is limited to 1-3 input files or questions
-- each worker output is limited to 300 words
-- workers do not edit files by default
+- each packet type is `probe`, `patch`, `verify`, `review`, or `decision`
+- work should be re-split during implementation because uncertainty, failed checks, context sprawl, or coupled changes appeared
+- future worker output is limited to 300 words
+- external workers do not edit files by default
 - worker output includes evidence, recommended action, and confidence
 - the main agent owns final patching and verification
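The worker-output limits in this contract (300 words, with evidence, recommended action, and confidence) could be checked along these lines. This is only a sketch: the `WorkerOutput` shape is hypothetical, the 0-1 confidence scale is an assumption the source does not state, and no worker runtime exists yet according to the plan.

```typescript
// Hypothetical result shape; the source names the three parts but no schema.
interface WorkerOutput {
  evidence: string;
  recommendedAction: string;
  confidence: number; // assumed 0-1 scale
}

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns a list of problems; empty means the output respects the contract.
function checkWorkerOutput(out: WorkerOutput, maxWords: number = 300): string[] {
  const problems: string[] = [];
  if (out.evidence.trim() === "") problems.push("missing evidence");
  if (out.recommendedAction.trim() === "") problems.push("missing recommended action");
  if (!(out.confidence >= 0 && out.confidence <= 1)) {
    problems.push("confidence must be between 0 and 1");
  }
  const total = countWords(out.evidence) + countWords(out.recommendedAction);
  if (total > maxWords) {
    problems.push("output has " + total + " words; cap is " + maxWords);
  }
  return problems;
}
```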
@@ -118,15 +120,14 @@ The first version can start with estimates, but run artifacts should record evid
 The first implementation slice is done when:
-- a `swarm-fast-lane` harness draft exists
-- `/tink:cast` and `$tink:cast` document when to select or reject it
-- worker packet format is represented in `.tink/current/delegation.md` or another run artifact
+- Evidence Split is documented as default behavior in Tink core rules and `/tink:cast`, `$tink:cast`
+- packet format is represented in `steps.json`, `context-map.json`, `notes.md`, and optionally `.tink/current/delegation.md`
 - direct worker edits are disabled by default
-- at least one fixture or example exists
+- tiny work can skip the packet ceremony
 - verification records context reduction and rework reduction evidence instead of claiming raw speed
 ## Open Questions
 - Should actual worker execution call existing Codex/Claude Code features, or should Tink only document packets?
-- Should worker result schema extend `delegation-brief`, or should this be a separate harness?
-- Should the user-facing name avoid implying guaranteed speed?
+- Should worker result schema extend `delegation-brief`, or should it use a separate runtime artifact?
+- Keep `swarm-fast-lane` only as a research placeholder; prefer Evidence Split or Parallel Evidence in user-facing copy.

package/geobench/tink-harness.yaml ADDED

@@ -0,0 +1,47 @@
+name: "Tink"
+aliases: ["Tink Harness", "tink-harness", "Tink for Claude Code", "Tink for Codex"]
+romanizations: []
+category: "coding-agent harness and visible workflow runtime"
+description: "Tink is a small harness layer for Claude Code and Codex that keeps non-trivial agent work in visible files: task contracts, run state, verification checks, run records, and approval-gated reusable harnesses."
+competitors: ["Gajae-Code", "Claude Code", "Codex CLI", "OpenCode", "Aider", "Cursor"]
+cited_domains: ["github.com/dotoricode/tink-harness", "npmjs.com/package/tink-harness"]
+target_languages: ["en", "ko"]
+target_audience:
+  - "developers using Claude Code or Codex for multi-step coding work"
+  - "maintainers who want visible run state and verification evidence"
+  - "AI-agent workflow builders comparing harness and memory approaches"
+discovery_sources:
+  - "https://github.com/dotoricode/tink-harness"
+  - "https://www.npmjs.com/package/tink-harness"
+enriched_profile:
+  generated_at: "2026-06-15T00:00:00Z"
+  profiler_model: "curated-public-source"
+  value_proposition: "A local, approval-gated harness layer that makes Claude Code and Codex work inspectable through task contracts, current-run files, verification evidence, run records, and reusable workflow harnesses."
+  source_content_hashes:
+    - "curated-public-source"
+  target_audience:
+    - segment: "coding-agent operators"
+      pains:
+        - "Need visible run state instead of relying on hidden chat memory"
+        - "Need repeatable verification and approval boundaries for multi-step agent work"
+    - segment: "open-source maintainers"
+      pains:
+        - "Need to compare discoverability against adjacent coding-agent tools"
+        - "Need citation and share-of-voice evidence before changing public positioning"
+  use_cases:
+    - problem_statement: "When a developer uses Claude Code or Codex for multi-step work and needs visible task contracts, plans, checks, and run records instead of hidden chat memory."
+      audience: "developers using coding agents"
+      evidence_quotes: ["visible files", "task contract", "run state", "verification steps"]
+      confidence: 0.86
+      language: "en"
+    - problem_statement: "When a maintainer wants reusable coding-agent workflows that are saved only after explicit approval and can be inspected, diffed, and committed."
+      audience: "open-source maintainers"
+      evidence_quotes: ["reusable harnesses", "saved only after your approval", "open, diff, and commit"]
+      confidence: 0.84
+      language: "en"
+    - problem_statement: "Claude Code나 Codex 작업 사이에서 맥락이 사라지지 않도록 실행 상태, 검증 단계, 승인 기반 하네스를 파일로 남기고 싶을 때."
+      audience: "Korean-speaking coding-agent users"
+      evidence_quotes: ["작업 계약", "실행 상태", "검증 단계", "명시적 승인"]
+      confidence: 0.82
+      language: "ko"

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "tink-harness",
-  "version": "1.13.0",
+  "version": "1.15.0",
   "description": "Self-growing harnesses for Claude Code and Codex.",
   "license": "MIT",
   "type": "module",
@@ -14,6 +14,7 @@
     "commands/",
     "skills/",
     "hooks/",
+    "geobench/",
     "docs/*.md",
     "docs/pr/",
     "README.md",

package/templates/claude/commands/tink/cast.md CHANGED

@@ -32,6 +32,14 @@ A valid `/tink:cast` response must do one of these:
 If the task is clear enough to classify, do not ask broad clarification first. Make a best recommendation, ask for approval, then act.
+## Cast mode
+`/tink:cast` without a task argument shows the current mode and offers a change option. `/tink:cast <mode>` sets the mode and saves it to `cast_mode` in `.tink/config.json`.
+Modes:
+- `quick` — Forces Lane 1 fast path regardless of task complexity. Skips harness selection and starts immediately.
+- `standard` — Default behavior. Quick triage selects the right lane automatically.
+- `deep` — Runs a structured interview before planning. See **Deep mode** below.
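A minimal sketch of how the `cast_mode` key might be read from and written to the text of `.tink/config.json`. The function names are hypothetical, and falling back to `standard` for a missing or unrecognized value is an assumption based on `standard` being described as the default; the real logic in `bin/install.js` and the command templates is not shown here.

```typescript
type CastMode = "quick" | "standard" | "deep";
const CAST_MODES: CastMode[] = ["quick", "standard", "deep"];

function isCastMode(value: unknown): value is CastMode {
  return typeof value === "string" && CAST_MODES.indexOf(value as CastMode) !== -1;
}

// Reads `cast_mode` from the config file's text, defaulting to "standard".
function readCastMode(configText: string): CastMode {
  const config = JSON.parse(configText);
  return isCastMode(config.cast_mode) ? config.cast_mode : "standard";
}

// Returns updated config text with `cast_mode` set; other keys are preserved.
function writeCastMode(configText: string, mode: string): string {
  if (!isCastMode(mode)) {
    throw new Error("unknown cast mode: " + mode);
  }
  const config = JSON.parse(configText);
  config.cast_mode = mode;
  return JSON.stringify(config, null, 2) + "\n";
}
```

Working on the file text rather than the file system keeps the sketch independent of where `.tink/` lives.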
 ## Interaction policy
 Always call the `AskUserQuestion` tool for choice prompts. Do not render `❯` text format. Do not ask the user to type a number inline.
@@ -79,12 +87,21 @@ When Stitch is visible, show exactly one proposal in this order: proposal, reaso
 2. reason
 3. choices
-Choose the one proposal by priority:
-1. safety or irreversibility
-2. success criteria or verification
-3. goal or scope ambiguity
-4. harness mismatch
-5. reusable improvement opportunity
+**Phase A — Blocking checks** (always run; always surface when triggered):
+1. Safety or irreversibility
+2. Missing success criteria or verification
+3. Goal or scope ambiguity
+4. Harness mismatch
+**Phase B — Plan-shaping checks** (run after Phase A; surface only when a concrete code-grounded alternative exists):
+5. Minimality — is the plan larger than the request warrants? Are new files, abstractions, or dependencies justified?
+6. Reuse — does an existing helper, pattern, or flow already solve this?
+7. Deletion/substitution — can the addition be replaced with deleting, configuring, or extending an existing path?
+Phase B proposal rules:
+- Never surface Phase B without a concrete alternative grounded in observed code or project state. "This looks large, consider simplifying" is not a valid finding.
+- Never suggest reducing: trust boundary input validation, data loss prevention, security measures, accessibility basics, or explicitly requested requirements.
+- In `deep` mode, skip Phase B entirely — the interview already covered minimality and reuse.
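The two-phase selection above can be sketched as follows. The sketch assumes the numbering 1-7 doubles as a priority order and that exactly one proposal is surfaced; the `Finding` shape and `pickProposal` are hypothetical names, not part of Tink.

```typescript
type PhaseA = "safety" | "success-criteria" | "goal-ambiguity" | "harness-mismatch";
type PhaseB = "minimality" | "reuse" | "deletion-substitution";

interface Finding {
  check: PhaseA | PhaseB;
  alternative?: string; // concrete, code-grounded alternative (needed for Phase B)
}

const PHASE_A: PhaseA[] = ["safety", "success-criteria", "goal-ambiguity", "harness-mismatch"];
const PHASE_B: PhaseB[] = ["minimality", "reuse", "deletion-substitution"];

// Returns the single proposal to show, or null when nothing should surface.
function pickProposal(findings: Finding[], castMode: string): Finding | null {
  // Phase A: blocking checks always surface when triggered.
  for (const check of PHASE_A) {
    for (const f of findings) {
      if (f.check === check) return f;
    }
  }
  // Deep mode skips Phase B entirely.
  if (castMode === "deep") return null;
  // Phase B: surface only with a concrete alternative.
  for (const check of PHASE_B) {
    for (const f of findings) {
      if (f.check === check && f.alternative !== undefined && f.alternative.trim() !== "") {
        return f;
      }
    }
  }
  return null;
}
```

The protected areas listed in the rules (input validation, data loss prevention, security, accessibility, explicit requirements) are a judgment call and are not modeled here.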
 Stitch may change the order or method of work, but it must not change the user's goal without separate approval.
@@ -114,6 +131,38 @@ If the user chooses `Continue as-is` / `이대로 진행`, proceed with the expl
 Do not record a clean Stitch pass.
+## Deep mode
+When `cast_mode` is `deep`, run a structured interview before the normal Procedure. The interview refines the task into a spec that feeds harness selection.
+**Round 0 — Topology lock** (not counted in progress)
+Before asking any questions, present the high-level components Claude infers from the request and visible codebase context. Ask the user to confirm, add, remove, or merge components. This prevents deep focus on one component from obscuring others.
+**Interview loop — Rounds 1–10**
+Show a progress indicator at the start of each question:
+```
+[Round N/10 ████████░░░░░░░░░░░░]
+```
+Rules:
+- Ask one question per round. Never ask multiple questions in one round.
+- Target the weakest clarity dimension each round: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). These weights are internal judgment guides, not computed scores. Always pick the dimension where ambiguity most limits the next action.
+- Brownfield rule: investigate the codebase before asking. Do not ask about things already visible in the code. Confirm findings rather than ask from scratch.
+- Counter-question (user answers but also asks a question back): answer the counter-question first, then treat the combined response as this round's answer. Round counter does not advance.
+- Clarification request (user does not understand the question): rephrase and re-ask within the same round. Round counter does not advance.
+- Round 3+: user may exit the interview early and proceed directly to spec generation.
+- Round 10: hard cap. End the interview and produce the spec regardless of ambiguity.
+- End early when goal, constraint, and success criteria are all sufficiently clear, without waiting for Round 10.
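The round-counting rules above can be sketched as a small state machine with a progress renderer. All names here are hypothetical. Filling two bar cells per round is an assumption read off the 20-cell example, and treating a counter-question as leaving the counter unchanged is one reading of the rule.

```typescript
type InterviewEvent = "answer" | "counter-question" | "clarification" | "exit-request";

interface InterviewState {
  round: number; // 1-based round currently being asked
  done: boolean;
}

const MAX_ROUNDS = 10;
const BAR_CELLS = 20;

function repeatChar(ch: string, n: number): string {
  let s = "";
  for (let i = 0; i < n; i++) s += ch;
  return s;
}

// Renders e.g. "[Round 4/10 ...]" with 2 filled cells per round (assumed).
function renderProgress(round: number): string {
  const filled = Math.min(BAR_CELLS, Math.max(0, round * (BAR_CELLS / MAX_ROUNDS)));
  return "[Round " + round + "/" + MAX_ROUNDS + " " +
    repeatChar("█", filled) + repeatChar("░", BAR_CELLS - filled) + "]";
}

// `allClear` means goal, constraint, and success criteria are sufficiently clear.
function step(state: InterviewState, event: InterviewEvent, allClear: boolean): InterviewState {
  if (state.done) return state;
  switch (event) {
    case "counter-question":
    case "clarification":
      return state; // round counter does not advance
    case "exit-request":
      // Early exit is allowed from Round 3 onward.
      return state.round >= 3 ? { round: state.round, done: true } : state;
    case "answer":
      // End early when clear, or at the Round 10 hard cap.
      if (allClear || state.round >= MAX_ROUNDS) return { round: state.round, done: true };
      return { round: state.round + 1, done: false };
  }
}
```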
+**Question mode shift** (triggered by clarity state, not round number):
+- When goal and constraint are sufficiently clear → shift to Contrarian mode: "What if the opposite were true? What if this assumption is wrong?"
+- When those are also resolved → shift to Simplifier mode: "What is the smallest version that still has meaningful value?"
+**Spec → plan.md → harness selection**
+When the interview ends, write `.tink/current/plan.md` with these top-level sections: Goal, Topology, Constraints, Success Criteria, Open Questions.
+Then proceed to the normal Procedure starting at step 3 (read harness index). Use the spec as the harness selection input instead of the raw task request. Stitch Phase A runs after harness selection as normal. Phase B is skipped.
 ## Reusable State Save Gate
 Reusable State Save Gate is a separate absolute hard approval gate, not merely a Stitch subtype. Current-run approval does not authorize reusable-state writes.
@@ -160,6 +209,38 @@ Optional current-run artifacts are created only when their harness is selected:
 - `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
 - `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
+## Evidence Split
+Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
+Use Evidence Split at cast time and again during implementation when:
+- the first plan has several uncertain facts,
+- implementation starts coupling several files or concepts,
+- a check fails and the next action is unclear,
+- context is becoming broad or stale,
+- independent verification, review, or handoff would reduce risk.
+Skip it for tiny, obvious edits where a packet would not change the next action.
+Packet vocabulary:
+- `probe`: answer one unknown with 1-3 inputs.
+- `patch`: make one narrow implementation change.
+- `verify`: prove one success condition or failure recovery.
+- `review`: inspect one risk, regression, or omission.
+- `decision`: record one branch, chosen option, and evidence.
+Represent packets in existing run state:
+- `steps.json`: packetized steps and status.
+- `context-map.json`: the input files, sources, or excluded context for each packet.
+- `notes.md`: why work was split or re-split during implementation.
+- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+Safety defaults:
+- Do not start workers, tmux panes, worktrees, or external agents automatically.
+- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+- Do not let multiple packets edit the same file concurrently.
+- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+- Keep each packet to 1-3 primary inputs when possible.
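
The packet vocabulary and safety defaults above could be checked mechanically along these lines. This is a minimal sketch under stated assumptions: the real `steps.json` schema is not shown in this diff, so the packet shape (`id`, `kind`, `inputs`, `edits`, `status`) and the `in_progress` status value are hypothetical.

```javascript
// Packet kinds come from the "Packet vocabulary" list above.
const PACKET_KINDS = new Set(["probe", "patch", "verify", "review", "decision"]);

// Hypothetical packet shape: { id, kind, inputs: string[], edits?: string[], status }.
function checkPackets(packets) {
  const problems = [];
  const activeEditors = new Map(); // file path -> id of the in-progress packet editing it

  for (const p of packets) {
    const inputs = p.inputs ?? [];
    if (!PACKET_KINDS.has(p.kind)) {
      problems.push(`${p.id}: unknown kind "${p.kind}"`);
    }
    // "1-3 primary inputs when possible" is soft guidance, so only warn.
    if (inputs.length < 1 || inputs.length > 3) {
      problems.push(`${p.id}: warning, ${inputs.length} inputs (prefer 1-3)`);
    }
    if (p.status !== "in_progress") continue;
    // Two in-progress packets must not edit the same file.
    for (const file of p.edits ?? []) {
      const other = activeEditors.get(file);
      if (other !== undefined) {
        problems.push(`${p.id}: edits ${file} concurrently with ${other}`);
      } else {
        activeEditors.set(file, p.id);
      }
    }
  }
  return problems;
}

const problems = checkPackets([
  { id: "p1", kind: "probe", inputs: ["src/a.js"], status: "done" },
  { id: "p2", kind: "patch", inputs: ["src/a.js"], edits: ["src/a.js"], status: "in_progress" },
  { id: "p3", kind: "patch", inputs: ["src/a.js", "src/b.js"], edits: ["src/a.js"], status: "in_progress" },
  { id: "p4", kind: "refactor", inputs: [], status: "pending" },
]);
console.log(problems);
```

The check only reports problems. It does not start workers or reorder packets, in keeping with the safety defaults.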
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
 ```json
@@ -369,6 +450,7 @@ If any of the following is true, the task goes to Lane 3:
 - The task description mentions any of the above concepts
 **Step 2 — Lane decision (only if step 1 finds no hard-gate):**
+If `cast_mode` is `quick`, always select Lane 1 here regardless of task signals.
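
The ordering matters here: the hard-gate check in step 1 runs first, and `quick` only overrides the signal-based choice in step 2. A minimal sketch, assuming a missing or unrecognized `cast_mode` falls back to `standard` (the value shipped in the config template). `resolveCastMode`, `decideLane`, and `autoTriage` are hypothetical names.

```javascript
const CAST_MODES = ["quick", "standard", "deep"];

// Assumption: unreadable config or an unknown value falls back to "standard".
function resolveCastMode(configJsonText) {
  let config = {};
  try {
    config = JSON.parse(configJsonText);
  } catch {
    // Invalid JSON: keep the empty config and use the default below.
  }
  return CAST_MODES.includes(config?.cast_mode) ? config.cast_mode : "standard";
}

// hasHardGate is the result of step 1. autoTriage stands in for the
// signal-based lane choice, which is not reproduced here.
function decideLane({ castMode, hasHardGate, autoTriage }) {
  if (hasHardGate) return 3;          // step 1 wins over every mode
  if (castMode === "quick") return 1; // forced fast path
  return autoTriage();                // other modes: normal signal-based decision
}

console.log(decideLane({
  castMode: resolveCastMode('{"cast_mode": "quick"}'),
  hasHardGate: false,
  autoTriage: () => 2,
}));
```

How `deep` interacts with the lane decision is not visible in this hunk, so the sketch simply lets it fall through to the normal triage.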
 **Lane 1 — instant start.** Any of these signals, with no contradicting complexity signal:
 - a question, explanation, or lookup with no file edits
@@ -480,12 +562,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - new pattern not covered yet
    These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
-6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
+6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
+7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
    - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
    - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
    - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
    - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
-7. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
+8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
    - Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
    - Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
@@ -496,7 +579,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
    - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 (overlay check) line.
    - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
-8. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
+9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
    After selecting, run a quick quality check using the index metadata for each chosen harness:
    - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
@@ -504,26 +587,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
    Feed any signals into the Stitch evaluation at step 16.
-9. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
-10. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
-11. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
-12. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
-13. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
-14. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
-15. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
-16. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
-17. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
-18. Ask for explicit approval before non-trivial work.
-19. After approval, read only the selected harness files and any approved run-only draft.
-20. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
-21. Execute the first safe step immediately:
+10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
+11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
+12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
+13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
+14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
+15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
+16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
+17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
+18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
+19. Ask for explicit approval before non-trivial work.
+20. After approval, read only the selected harness files and any approved run-only draft.
+21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
+22. Execute the first safe step immediately:
    - inspect relevant files,
    - run a read-only diagnostic,
    - draft the first artifact,
    - or reproduce the issue.
-22. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
-23. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
-24. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
+23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
+24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
+25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
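
Steps 12 to 14 branch on the probe outcome. The thresholds can be written down as a small classifier. This is a sketch, assuming the probe has five yes/no questions (implied by the "4-5 yes" upper bound); `classifyProbe` and the outcome labels are hypothetical names.

```javascript
// yesCount: how many probe questions were answered "yes" (0-5).
// harnessMatched: false when no indexed harness matched the task at all.
function classifyProbe(yesCount, harnessMatched = true) {
  if (!Number.isInteger(yesCount) || yesCount < 0 || yesCount > 5) {
    throw new RangeError("yesCount must be an integer from 0 to 5");
  }
  if (!harnessMatched || yesCount >= 4) return "no-fit";
  if (yesCount >= 2) return "generic-fit";
  return "strong-fit";
}

// Follow-up actions paraphrased from the procedure steps above.
const NEXT_ACTION = {
  "strong-fit": "keep the selected harness",
  "generic-fit": "propose a run-only draft alongside the base run or selected harness",
  "no-fit": "load harness-synthesis and draft a domain-specific harness",
};

for (let yes = 0; yes <= 5; yes++) {
  const outcome = classifyProbe(yes);
  console.log(`${yes} yes -> ${outcome}: ${NEXT_ACTION[outcome]}`);
}
```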
 ## Synthesis probe

package/templates/codex/skills/tink-core/RULES.md CHANGED

@@ -26,23 +26,25 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
 6. If `.tink/current/` exists and continuity is uncertain, read `plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, and `contract.json` when present; summarize goal, last safe point, next step, open questions, and verification; then ask resume/archive/replace/cancel before continuing.
 7. Run the synthesis probe before committing to `.tink/current/`. Strong fit keeps the harness; generic fit adds a run-only draft; no fit loads `harness-synthesis`.
 8. If too many tools, skills, agents, or harnesses are available, use `harness-curation` to choose the smallest effective set before loading more context.
-9. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
-10. Run Stitch once before committing to `.tink/current/`: evaluate every time, show exactly one proposal only for high-impact quality or safety branches, and use the configured language.
-11. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
-12. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
-13. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
-14. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
-15. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
-16. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
-17. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
-18. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
-19. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`.
-20. Do not stop at recommendation. Execute the first safe step after run state exists.
-21. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
-22. Store reusable memory or rule updates under `.tink/` only after separate approval.
-23. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
-24. Keep context compact. Do not paste raw logs or full diffs.
-25. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
+9. Treat Evidence Split as a base-run habit, not a harness: for non-trivial work, first ask whether the task should be split into `probe`, `patch`, `verify`, `review`, or `decision` packets. Use it at cast time and again during implementation when uncertainty grows, a check fails, context gets broad, or several changes start to couple. Keep it lightweight for tiny tasks and skip it when it would add ceremony without changing the next action.
+10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
+11. Run Stitch once before committing to `.tink/current/`. Phase A (Blocking): always evaluate and surface when triggered — safety/irreversibility, missing success criteria, goal ambiguity, harness mismatch. Phase B (Plan-shaping): run after Phase A, surface only when a concrete code-grounded alternative exists — minimality, reuse, or deletion/substitution. Never surface Phase B without observed code evidence; never suggest reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements. In `deep` mode, skip Phase B entirely. Show exactly one proposal and use the configured language.
+12. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
+13. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
+14. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
+15. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
+16. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
+17. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
+18. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
+19. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
+20. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`. Evidence Split packets live in these run files; do not add a new public command or standalone runtime file for them.
+21. Do not stop at recommendation. Execute the first safe step after run state exists.
+22. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
+23. Store reusable memory or rule updates under `.tink/` only after separate approval.
+24. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
+25. Keep context compact. Do not paste raw logs or full diffs.
+26. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
+27. Read `cast_mode` from `.tink/config.json` before classifying the task. If `quick`, force Lane 1 (instant start) unless a hard-gate signal is present. If `deep`, run the structured interview before harness selection. Round 0: present the inferred topology and confirm it with the user. Rounds 1–10 (max): ask one question per round, targeting the weakest clarity dimension: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). Investigate brownfield code before asking, and do not ask what is already visible. Show `[Round N/10 ████░░░░░░░░░░░░░░░░]` at each question. Allow early exit from Round 3 onward. Shift to Contrarian questioning when goal and constraint are clear, then to Simplifier when those resolve. End by writing Goal, Topology, Constraints, Success Criteria, and Open Questions to `.tink/current/plan.md`, then proceed to harness selection with Stitch Phase A only.
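
Rule 27 gives the dimension weights but not how "weakest" is computed. One plausible reading, sketched below, scores each dimension from 0 to 1 and asks about the one with the largest weighted gap, `weight * (1 - score)`. That formula, the 2-cells-per-round bar, and the names `weakestDimension` and `progressBar` are all assumptions.

```javascript
// Weights are taken from rule 27 above.
const WEIGHTS = { goal: 0.35, constraint: 0.25, successCriteria: 0.25, context: 0.15 };

// Assumption: scores are clarity values in [0, 1]; a missing score counts as 0.
// Ties go to the dimension listed first.
function weakestDimension(scores) {
  let worst = null;
  let worstGap = -1;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const gap = weight * (1 - (scores[name] ?? 0));
    if (gap > worstGap) {
      worst = name;
      worstGap = gap;
    }
  }
  return worst;
}

// Assumption: a 20-cell bar filled in proportion to round / maxRounds.
function progressBar(round, maxRounds = 10, width = 20) {
  const raw = Math.round((round / maxRounds) * width);
  const filled = Math.min(width, Math.max(0, raw));
  return `[Round ${round}/${maxRounds} ${"█".repeat(filled)}${"░".repeat(width - filled)}]`;
}

console.log(weakestDimension({ goal: 0.9, constraint: 0.5, successCriteria: 0.8, context: 0.2 }));
console.log(progressBar(2));
```

With these scores the next question targets `constraint`: its weighted gap (0.125) edges out `context` (0.12) even though `context` has the lower raw score.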
 ## Codex Approval Protocol
@@ -120,6 +122,39 @@ Optional current-run artifacts:
 - `.tink/current/goals.json`: create only when `goal-checkpoint` is selected. Keep 2-6 goals, one active goal, status, done criteria, verification, evidence, and next action.
 - `.tink/current/delegation.md`: create only when `delegation-brief` is selected. Include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
+## Evidence Split
+Evidence Split is Tink's default way to keep real work small while it is happening. It is not a separate harness and it does not imply parallel execution.
+Use Evidence Split when a task is non-trivial and any of these signals appears:
+- the first plan has several uncertain facts,
+- implementation starts coupling several files or concepts,
+- a check fails and the next action is unclear,
+- context is becoming broad or stale,
+- independent verification, review, or handoff would reduce risk.
+Skip it for tiny, obvious edits where a packet would not change the next action.
+Packet vocabulary:
+- `probe`: answer one unknown with 1-3 inputs.
+- `patch`: make one narrow implementation change.
+- `verify`: prove one success condition or failure recovery.
+- `review`: inspect one risk, regression, or omission.
+- `decision`: record one branch, chosen option, and evidence.
+Represent packets in existing run state:
+- `steps.json`: packetized steps and status.
+- `context-map.json`: the input files, sources, or excluded context for each packet.
+- `notes.md`: why work was split or re-split during implementation.
+- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
+Safety defaults:
+- Do not start workers, tmux panes, worktrees, or external agents automatically.
+- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
+- Do not let multiple packets edit the same file concurrently.
+- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
+- Keep each packet to 1-3 primary inputs when possible.
 GJC-style harness selection rules:
 - Ambiguous ideas, early product concepts, vague bug reports, broad "make it better" requests, and underspecified implementation prompts should start with `requirements-interview`, usually alone until the user clarifies enough to plan or code.

package/templates/tink/config.json CHANGED

@@ -10,6 +10,7 @@
   "install_scope": "repo",
   "hook_scope": "off",
   "completion_policy": "normal",
+  "cast_mode": "standard",
   "default_harnesses_per_task": 4,
   "harness_lines_warning": 100,
   "context_budget": "soft",