tink-harness 1.13.0 → 1.15.0
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +16 -1
- package/README.ko.md +14 -1
- package/README.md +15 -2
- package/VERSIONING.md +1 -1
- package/bin/install.js +126 -10
- package/commands/cast.md +108 -25
- package/docs/geobench.md +29 -0
- package/docs/planned-work-units.ko.md +8 -7
- package/docs/planned-work-units.md +8 -7
- package/docs/swarm-fast-lane.ko.md +17 -16
- package/docs/swarm-fast-lane.md +17 -16
- package/geobench/tink-harness.yaml +47 -0
- package/package.json +2 -1
- package/templates/claude/commands/tink/cast.md +108 -25
- package/templates/codex/skills/tink-core/RULES.md +52 -17
- package/templates/tink/config.json +1 -0
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,22 @@
 
 All notable changes to Tink are tracked here.
 
+## Unreleased
+
+- Added a geobench product spec and runbook for measuring Tink's LLM answer visibility with hit rate, MRR, share of voice, and citation metrics. The runbook keeps benchmark execution separate from this repo and says to publish aggregate metrics only.
+
+## [1.15.0] - 2026-06-24
+
+- Added cast mode system: `/tink:cast` now supports three modes — `quick` (forces Lane 1 fast path), `standard` (default, auto triage), and `deep` (structured interview before planning). The active mode is persisted in `.tink/config.json` as `cast_mode`. Setting the mode with `/tink:cast <mode>` shows the current mode and offers a change option when called without a task.
+- Added `deep` mode interview pipeline: Round 0 topology lock confirms inferred components before questions start; Rounds 1–10 ask one question per round with a `[Round N/10 ████░░░]` progress indicator, target the weakest clarity dimension (goal/constraint/success criteria/context), investigate brownfield code before asking, handle counter-questions and clarification requests within the same round, allow early exit from Round 3+, and shift from Contrarian to Simplifier questioning as clarity improves. The interview produces a Goal/Topology/Constraints/Success Criteria/Open Questions spec written to `plan.md` before harness selection begins.
+- Upgraded Stitch to Phase A / Phase B: Phase A (Blocking — safety, missing success criteria, goal ambiguity, harness mismatch) always runs and always surfaces when triggered. Phase B (Plan-shaping — minimality, reuse, deletion/substitution) runs only when a concrete code-grounded alternative exists and is skipped entirely in `deep` mode. Phase B never suggests reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements.
+- Codex: Rule 27 added for `cast_mode` and `deep` mode behavior; Rule 11 updated for Stitch Phase A/B.
+
+## [1.14.0] - 2026-06-19
+
+- Added `CLAUDE_CONFIG_DIR` support: global installs now respect the env var (set via direnv or shell) so commands and skills land in the right config directory instead of always defaulting to `~/.claude`.
+- Added `tink-harness update --all-repos`: finds every repo under the home directory that has Tink installed and updates each one. Uses `direnv exec` when available so per-repo `.envrc` overrides (including `CLAUDE_CONFIG_DIR`) are applied automatically; falls back to parsing simple `export` lines from `.envrc` otherwise.
+
 ## [1.13.0] - 2026-06-19
 
 - Added focused opt-in harnesses for recurring agent workflows: `issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, and `architecture-deepening`.
@@ -13,7 +29,6 @@ All notable changes to Tink are tracked here.
 - Added evidence lifecycle manager groundwork: `/tink:verify` now records a human-readable `.tink/current/evidence.md` summary card, config includes a `completion_policy` field for optional strict "no evidence, no done" behavior, and the dashboard lifecycle summary now exposes ROI hints, trust levels, and Activity-tab run review cards for failed or blocked runs without adding a new public replay command.
 - Fixed: `npx tink-harness update` now prefers the current repo when `.tink/` exists there, so a global/home install scope no longer redirects update tests or repo-local updates away from the current project. Stored `git_policy` is still respected.
 - Improved: the Activity dashboard cards were checked in desktop and mobile Chrome headless screenshots, with narrower mobile layout and shorter run-review fallback copy so the new evidence cards stay readable.
-
 ## [1.11.2] - 2026-06-13
 
 - Fixed: the 3D harness map showed no connections or signal pulses on fresh installs (or installs whose history was lost to the pre-1.11.0 record-wipe bug). The lifecycle summary's graph was built only from run/ledger evidence; it now also includes the static rule graph - every routing rule connects to its harness, and check/guard chains render - so the map is alive from the first open.
package/README.ko.md
CHANGED
@@ -10,7 +10,7 @@ Tink는 사소하지 않은 모든 에이전트 작업을 눈에 보이는 파
 
 <sub>A small harness layer for Claude Code and Codex</sub>
 
-**Latest package:** v1.
+**Latest package:** v1.15.0 - `/tink:cast` now has three modes (quick / standard / deep). Deep mode runs an interview of up to 10 rounds before planning, and Stitch is split into Phase A (blocking) and Phase B (plan-shaping). See [CHANGELOG](CHANGELOG.md) for the full change history.
 
 [English](README.md) · **한국어** · [Changelog](CHANGELOG.md)
 
@@ -125,6 +125,18 @@ npx tink-harness dashboard # add --no-open to only generate the file
 
 ---
 
+## Measure GEO visibility
+
+Tink includes a geobench product spec for measuring how often Tink is mentioned in LLM answers, at what rank it is recommended, and which sources are cited.
+
+- Spec: [`geobench/tink-harness.yaml`](geobench/tink-harness.yaml)
+- Runbook: [`docs/geobench.md`](docs/geobench.md)
+- Metrics: hit rate, MRR, share of voice, citation rate/share, confidence interval
+
+Publish only aggregate metrics from benchmark results. Do not publish raw provider answers, secrets, or private run logs.
+
+---
+
 ## Why I made this
 
 New AI coding harnesses and workflows keep multiplying. Many of them are good, but mixing several makes the environment heavy, and you end up reorganizing it every time.
@@ -228,6 +240,7 @@ Tink가 아는 모든 것은 직접 읽고, diff 보고, 지울 수 있는 평
 - Harness health summary: `docs/harness-lifecycle-signals.ko.md`, `docs/harness-lifecycle-signals.md`
 - External context safety: `docs/mcp-safe-profile.md`, `docs/external-context-policy.md`
 - Reading `.tink/current/` state: `docs/work-state.ko.md`, `docs/work-state.md`
+- GEO visibility benchmark: `docs/geobench.md` · spec: `geobench/tink-harness.yaml`
 - Update confidence: `docs/phase-5-update-confidence.ko.md`, `docs/phase-5-update-confidence.md`
 - Context efficiency: `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-threshold-status.ko.md`, `docs/context-threshold-status.md`, `docs/context-run-record-policy.ko.md`, `docs/context-run-record-policy.md`
 - Planned work units: `docs/planned-work-units.ko.md`, `docs/planned-work-units.md` · roadmap and idea audit: `docs/tink-idea-implementation-plan.ko.md`
package/README.md
CHANGED
@@ -17,14 +17,14 @@
 <p><sub>A small harness layer for Claude Code and Codex</sub></p>
 
 <p>
-<a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.
+<a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.14.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
 <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
 <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
 <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
 <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
 </p>
 
-<p><strong>Latest package:</strong> v1.
+<p><strong>Latest package:</strong> v1.15.0 - <code>/tink:cast</code> now supports three modes (quick / standard / deep); deep mode runs a structured 10-round interview before planning, and Stitch is split into Phase A (blocking) and Phase B (plan-shaping). See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
 
 **English** · [한국어](README.ko.md) · [Changelog](CHANGELOG.md)
 
@@ -139,6 +139,18 @@ No server, no telemetry, no hidden cache - it is a static local page that only p
 
 ---
 
+## Measure GEO visibility
+
+Tink includes a geobench product spec so maintainers can measure how often LLM answers mention, rank, and cite Tink across providers.
+
+- Spec: [`geobench/tink-harness.yaml`](geobench/tink-harness.yaml)
+- Runbook: [`docs/geobench.md`](docs/geobench.md)
+- Metrics: hit rate, MRR, share of voice, citation rate/share, and confidence intervals
+
+Use the benchmark for aggregate visibility checks only. Do not publish raw provider answers, secrets, or private run logs.
+
+---
+
 ## Why I made this
 
 *Tink is <strong>knit</strong> in reverse: untying tangled workflows and knitting better ones back together. It also nods to Tinker Bell, the small helper at your side.*
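For readers unfamiliar with the metrics named in the "Measure GEO visibility" hunk, the sketch below shows how hit rate and MRR are conventionally computed. It is not taken from `geobench/tink-harness.yaml` or the runbook: the input shape (`rank` as a 1-based position, `null` when Tink is absent from an answer) and both function names are assumptions made for illustration.

```javascript
// Hypothetical input shape: one entry per benchmark prompt.
// `rank` is the 1-based position at which Tink was recommended, or null if absent.

// Hit rate: fraction of prompts whose answer mentions Tink at all.
function hitRate(results) {
  if (results.length === 0) return 0;
  const hits = results.filter((r) => r.rank !== null).length;
  return hits / results.length;
}

// Mean reciprocal rank: average of 1/rank, counting misses as 0.
function meanReciprocalRank(results) {
  if (results.length === 0) return 0;
  const total = results.reduce(
    (sum, r) => sum + (r.rank === null ? 0 : 1 / r.rank),
    0
  );
  return total / results.length;
}

const sample = [{ rank: 1 }, { rank: 4 }, { rank: null }, { rank: 2 }];
console.log(hitRate(sample));            // 0.75
console.log(meanReciprocalRank(sample)); // (1 + 0.25 + 0 + 0.5) / 4 = 0.4375
```

Only aggregates like these two numbers would be published under the README's guidance; the raw provider answers behind each `rank` stay private.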
@@ -282,6 +294,7 @@ The dashboard is a static local page rendered from those files — the harness h
 - Harness health summary: `docs/harness-lifecycle-signals.md`, `docs/harness-lifecycle-signals.ko.md`
 - External context safety: `docs/mcp-safe-profile.md`, `docs/external-context-policy.md`
 - Reading `.tink/current/` state: `docs/work-state.md`, `docs/work-state.ko.md`
+- GEO visibility benchmark: `docs/geobench.md` · spec: `geobench/tink-harness.yaml`
 - Update confidence: `docs/phase-5-update-confidence.md`, `docs/phase-5-update-confidence.ko.md`
 - Context efficiency: `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-threshold-status.md`, `docs/context-threshold-status.ko.md`, `docs/context-run-record-policy.md`, `docs/context-run-record-policy.ko.md`
 - Planned work units: `docs/planned-work-units.md`, `docs/planned-work-units.ko.md` · roadmap and idea audit: `docs/tink-idea-implementation-plan.ko.md`
package/VERSIONING.md
CHANGED
package/bin/install.js
CHANGED
@@ -126,7 +126,7 @@ function argValue(name) {
 }
 
 function usage() {
-  console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n npx tink-harness@latest [install]\n npx tink-harness@latest update\n\nCommands:\n install Install Tink.\n update Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n dashboard Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n 1. Select language\n 2. Show TINK wizard\n 3. Select Claude Code, Codex, or both\n 4. Select components\n 5. Select repo/global installation scope\n 6. Select Advanced options\n 7. Select git tracking policy for project state\n\nAdvanced options:\n --dry-run Preview only. Show what would be written or removed, but do not change files.\n --force Overwrite user-modified files. Use only when you want official templates to replace local edits.\n --clean-codex-picker Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n\nEnvironment:\n TINK_INSTALL_SURFACES=claude|codex|all\n TINK_CLEAN_CODEX_PICKER=1\n\nScopes:\n repo Install shared .tink files into the current project.\n global Install shared .tink files into your home directory.\n`);
+  console.log(`Tink installer for Claude Code and Codex\n\nUsage:\n tink-harness [install] [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--with-hook] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness update [--scope=repo|global] [--global] [--lang=en|ko|zh] [--yes] [--clean-codex-picker] [--dry-run] [--force]\n tink-harness update --all-repos\n tink-harness dashboard [--no-open]\n\nIf the command is not installed yet, use:\n npx tink-harness@latest [install]\n npx tink-harness@latest update\n\nCommands:\n install Install Tink.\n update Update Tink to the latest templates. Asks only the agent surface; Tink-owned files always refresh, user-modified harness/memory/config files are kept.\n dashboard Generate the harness health report from local .tink records and open it in your browser. Use --no-open to skip opening.\n\nDefault interactive flow:\n 1. Select language\n 2. Show TINK wizard\n 3. Select Claude Code, Codex, or both\n 4. Select components\n 5. Select repo/global installation scope\n 6. Select Advanced options\n 7. Select git tracking policy for project state\n\nAdvanced options:\n --dry-run Preview only. Show what would be written or removed, but do not change files.\n --force Overwrite user-modified files. Use only when you want official templates to replace local edits.\n --clean-codex-picker Codex-only cleanup. Remove repo-local Claude Tink surfaces that show as Source Command Tink entries.\n --all-repos Update all repos with Tink under the home directory. Uses direnv if available to load per-repo .envrc.\n\nEnvironment:\n TINK_INSTALL_SURFACES=claude|codex|all\n TINK_CLEAN_CODEX_PICKER=1\n CLAUDE_CONFIG_DIR Override ~/.claude for global installs (e.g. set by direnv per project)\n CODEX_HOME Override ~/.codex for Codex skill installs\n\nScopes:\n repo Install shared .tink files into the current project.\n global Install shared .tink files into your home directory.\n`);
 }
 
 function findTinkRoot() {
@@ -228,6 +228,15 @@ function codexHome() {
   return process.env.CODEX_HOME || path.join(os.homedir(), '.codex');
 }
 
+// CLAUDE_CONFIG_DIR replaces ~/.claude for global installs (like direnv per-project overrides).
+// Repo-scope installs always use <repo>/.claude regardless of this env var.
+function claudeDir(target) {
+  if (process.env.CLAUDE_CONFIG_DIR && target === os.homedir()) {
+    return process.env.CLAUDE_CONFIG_DIR;
+  }
+  return path.join(target, '.claude');
+}
+
 function legacyComponentOptionsFor(agent, language) {
   const options = COMPONENTS[language].filter((item) => {
     if (item.value === 'commands') return includesClaude(agent);
@@ -364,8 +373,8 @@ function locationSummary(agent, scope) {
   return [
     `Repo target: ${repoTarget}`,
     `Shared .tink target: ${path.join(installTarget, '.tink')}`,
-    includesClaude(agent) ? `Claude Code command target: ${path.join(installTarget, '
-    includesClaude(agent) ? `Claude Code skill target: ${path.join(installTarget, '
+    includesClaude(agent) ? `Claude Code command target: ${path.join(claudeDir(installTarget), 'commands/tink')}` : null,
+    includesClaude(agent) ? `Claude Code skill target: ${path.join(claudeDir(installTarget), 'skills/tink')}` : null,
     includesCodex(agent) ? `Codex skills target: ${path.join(codexHome(), 'skills')}` : null,
     includesCodex(agent) ? `Codex picker cleanup target: ${path.join(process.cwd(), '.claude')}` : null
   ].filter(Boolean).join('\n');
@@ -710,12 +719,12 @@ function copyDir(src, dest, base) {
 
 function copyTinkCommands(templateRoot, target) {
   const commandSrc = path.join(templateRoot, 'claude/commands/tink');
-  const commandDest = path.join(target, '
-  const flatCommandDest = path.join(target, '
+  const commandDest = path.join(claudeDir(target), 'commands/tink');
+  const flatCommandDest = path.join(claudeDir(target), 'commands');
   const legacyFlatCommands = ['tink-setup.md', 'tink-forge.md', 'tink-list.md', 'tink-purge.md', 'tink-hone.md'];
   const legacyNamespaceCommands = ['forge.md', 'purge.md', 'hone.md'];
   const legacyTinyCommands = ['tiny-setup.md', 'tiny-use.md', 'tiny-list.md', 'tiny-save.md'];
-  const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(target, '
+  const legacyDirs = [path.join(flatCommandDest, 'tiny'), path.join(claudeDir(target), 'skills/tiny')];
   for (const name of legacyFlatCommands) {
     const legacy = path.join(flatCommandDest, name);
     if (fs.existsSync(legacy)) {
@@ -863,7 +872,7 @@ function hookCommandFor(scope, target) {
 }
 
 function registerClaudeHook(target, scope, base) {
-  const settingsPath = path.join(target, '
+  const settingsPath = path.join(claudeDir(target), 'settings.json');
   const settings = readJsonFile(settingsPath, {});
   const command = hookCommandFor(scope, target);
   settings.hooks ||= {};
@@ -893,7 +902,7 @@ function copySelected(scope, components, agent) {
   }
   if (wantsClaudeSkill(components)) {
     if (includesClaude(agent) && !cleanupCodexPicker) {
-      copyDir(path.join(templateRoot, 'claude/skills'), path.join(target, '
+      copyDir(path.join(templateRoot, 'claude/skills'), path.join(claudeDir(target), 'skills'), target);
     }
   }
   if (wantsCodexSkills(components)) {
@@ -995,8 +1004,8 @@ function doneLineFor(agent) {
 
 function updateResultSummary(agent, targets) {
   const locations = [
-    includesClaude(agent) ? `Claude Code commands: ${path.join(targets.installTarget, '
-    includesClaude(agent) ? `Claude Code skill: ${path.join(targets.installTarget, '
+    includesClaude(agent) ? `Claude Code commands: ${path.join(claudeDir(targets.installTarget), 'commands/tink')}` : null,
+    includesClaude(agent) ? `Claude Code skill: ${path.join(claudeDir(targets.installTarget), 'skills/tink')}` : null,
     includesCodex(agent) ? `Codex skills: ${path.join(targets.codexTarget, 'skills')}` : null,
     `Tink shared files: ${path.join(targets.installTarget, '.tink')}`
   ].filter(Boolean);
@@ -1216,12 +1225,119 @@ async function resolveChoices() {
   return { agent, scope, components, gitPolicy, hookScope, language };
 }
 
+function findAllTinkRepos() {
+  const found = [];
+  const skip = new Set(['node_modules', '.git', 'vendor', 'dist', 'build', 'out', 'target', '.cache']);
+
+  function scan(dir, depth) {
+    if (depth > 4) return;
+    let entries;
+    try { entries = fs.readdirSync(dir, { withFileTypes: true }); } catch { return; }
+    let hasTink = false;
+    for (const entry of entries) {
+      if (!entry.isDirectory()) continue;
+      if (entry.name === '.tink') { hasTink = true; continue; }
+      if (skip.has(entry.name) || entry.name.startsWith('.')) continue;
+      scan(path.join(dir, entry.name), depth + 1);
+    }
+    if (hasTink) found.push(dir);
+  }
+
+  scan(os.homedir(), 0);
+  return found;
+}
+
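To see what the depth limit and skip list in `findAllTinkRepos()` mean in practice, the sketch below runs the same walk against a throwaway directory. Taking the root and depth as parameters is an adaptation made for the example (the function in the diff always starts at `os.homedir()` with a fixed limit of 4), and `findTinkReposUnder` and the directory names are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const SKIP = new Set(['node_modules', '.git', 'vendor', 'dist', 'build', 'out', 'target', '.cache']);

// Mirrors the scan in the diff, but with the root and depth limit as
// parameters (an assumption for this sketch).
function findTinkReposUnder(rootDir, maxDepth = 4) {
  const found = [];
  function scan(dir, depth) {
    if (depth > maxDepth) return;
    let entries;
    try { entries = fs.readdirSync(dir, { withFileTypes: true }); } catch { return; }
    let hasTink = false;
    for (const entry of entries) {
      if (!entry.isDirectory()) continue;
      if (entry.name === '.tink') { hasTink = true; continue; }
      if (SKIP.has(entry.name) || entry.name.startsWith('.')) continue;
      scan(path.join(dir, entry.name), depth + 1);
    }
    if (hasTink) found.push(dir);
  }
  scan(rootDir, 0);
  return found;
}

// Build a small tree: one reachable repo and three that the walk should miss.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'tink-scan-'));
fs.mkdirSync(path.join(tmp, 'app', '.tink'), { recursive: true });                   // found
fs.mkdirSync(path.join(tmp, 'node_modules', 'pkg', '.tink'), { recursive: true });   // skipped by name
fs.mkdirSync(path.join(tmp, '.hidden', 'repo', '.tink'), { recursive: true });       // skipped (dot dir)
fs.mkdirSync(path.join(tmp, 'a', 'b', 'c', 'd', 'e', '.tink'), { recursive: true }); // depth 5, too deep

const relativeRepos = findTinkReposUnder(tmp).map((r) => path.relative(tmp, r));
console.log(relativeRepos); // [ 'app' ]

fs.rmSync(tmp, { recursive: true, force: true });
```

A repo nested more than four directory levels below the scan root, or inside any dot-directory, is therefore not picked up by `--all-repos`.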
+function isDirenvAvailable() {
+  return spawnSync('direnv', ['version'], { encoding: 'utf8' }).status === 0;
+}
+
+function parseEnvrc(envrcPath, repoDir) {
+  if (!fs.existsSync(envrcPath)) return {};
+  const env = {};
+  for (const line of fs.readFileSync(envrcPath, 'utf8').split('\n')) {
+    const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)/);
+    if (!m) continue;
+    let val = m[2].trim().replace(/^["']|["']$/g, '');
+    val = val
+      .replace(/\$HOME|\bHOME\b/g, os.homedir())
+      .replace(/\$PWD|\bPWD\b/g, repoDir)
+      .replace(/^~/, os.homedir());
+    env[m[1]] = val;
+  }
+  return env;
+}
+
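The `.envrc` fallback in `parseEnvrc()` handles only simple `export NAME=value` lines. The sketch below applies the same regular expressions to an in-memory string so the result is easy to inspect. Accepting the text and home directory as arguments is an adaptation for the example (the function in the diff reads the file and calls `os.homedir()`), and `parseEnvrcText`, `DATA_DIR`, and the sample paths are hypothetical.

```javascript
// Same matching and substitution steps as parseEnvrc() in the diff, applied
// to a string instead of a file (an assumption made for this sketch).
function parseEnvrcText(text, repoDir, homeDir) {
  const env = {};
  for (const line of text.split('\n')) {
    // Only upper-case variable names on plain `export NAME=value` lines match.
    const m = line.match(/^\s*export\s+([A-Z_][A-Z0-9_]*)=(.*)/);
    if (!m) continue;
    // Strip one leading and one trailing quote character, if present.
    let val = m[2].trim().replace(/^["']|["']$/g, '');
    val = val
      .replace(/\$HOME|\bHOME\b/g, homeDir)
      .replace(/\$PWD|\bPWD\b/g, repoDir)
      .replace(/^~/, homeDir);
    env[m[1]] = val;
  }
  return env;
}

const sampleEnvrc = [
  'export CLAUDE_CONFIG_DIR="$HOME/.claude-work"',
  'export DATA_DIR=$PWD/data',    // hypothetical variable
  'layout node',                  // direnv stdlib call: ignored by the fallback
  'export lower_case=skipped',    // lower-case name: does not match the pattern
].join('\n');

const parsed = parseEnvrcText(sampleEnvrc, '/repos/demo', '/home/dev');
console.log(parsed);
// { CLAUDE_CONFIG_DIR: '/home/dev/.claude-work', DATA_DIR: '/repos/demo/data' }
```

Anything beyond these plain exports, such as direnv stdlib functions or command substitution, is ignored by the fallback, which is presumably why the installer prefers `direnv exec` when direnv is installed.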
+async function runAllRepos() {
+  const allRepos = findAllTinkRepos();
+  const sourceRoot = path.resolve(root);
+  const repos = allRepos.filter((r) => path.resolve(r) !== sourceRoot);
+
+  if (repos.length === 0) {
+    console.log('No repos with Tink installed found under home directory.');
+    return;
+  }
+
+  const hasDirenv = isDirenvAvailable();
+  const installScript = path.join(root, 'bin/install.js');
+
+  console.log(`Found ${repos.length} repo(s) with Tink installed:\n`);
+  for (const repo of repos) {
+    const envrc = path.join(repo, '.envrc');
+    const envVars = hasDirenv ? {} : parseEnvrc(envrc, repo);
+    const claudeTarget = envVars.CLAUDE_CONFIG_DIR
+      ? envVars.CLAUDE_CONFIG_DIR
+      : path.join(repo, '.claude');
+    const note = fs.existsSync(envrc)
+      ? hasDirenv
+        ? `(direnv)`
+        : envVars.CLAUDE_CONFIG_DIR
+          ? `(.envrc → CLAUDE_CONFIG_DIR=${envVars.CLAUDE_CONFIG_DIR})`
+          : `(.envrc, no CLAUDE_CONFIG_DIR)`
+      : '';
+    console.log(` ${repo} ${note}`);
+    console.log(` → ${claudeTarget}/commands/tink`);
+  }
+  console.log('');
+
+  for (const repo of repos) {
+    console.log(`▶ ${path.basename(repo)} (${repo})`);
+    const envrc = path.join(repo, '.envrc');
+    const extraEnv = hasDirenv ? {} : parseEnvrc(envrc, repo);
+    const mergedEnv = { ...process.env, ...extraEnv };
+
+    let result;
+    if (hasDirenv && fs.existsSync(envrc)) {
+      result = spawnSync(
+        'direnv', ['exec', repo, 'node', installScript, 'update', '--yes', '--scope=repo'],
+        { cwd: repo, env: process.env, stdio: 'inherit', encoding: 'utf8' }
+      );
+    } else {
+      result = spawnSync(
+        process.execPath, [installScript, 'update', '--yes', '--scope=repo'],
+        { cwd: repo, env: mergedEnv, stdio: 'inherit', encoding: 'utf8' }
+      );
+    }
+
+    if (result.status !== 0) {
+      console.error(` ✗ failed (exit ${result.status})`);
+    } else {
+      console.log(` ✓ done`);
+    }
+    console.log('');
+  }
+}
+
 async function main() {
   if (command === 'help' || args.includes('--help')) {
     usage();
     process.exit(0);
   }
 
+  if (command === 'update' && args.includes('--all-repos')) {
+    await runAllRepos();
+    return;
+  }
+
   if (command === 'dashboard') {
     runDashboard();
     return;
package/commands/cast.md
CHANGED
@@ -32,6 +32,14 @@ A valid `/tink:cast` response must do one of these:
 
 If the task is clear enough to classify, do not ask broad clarification first. Make a best recommendation, ask for approval, then act.
 
+## Cast mode
+`/tink:cast` without a task argument shows the current mode and offers a change option. `/tink:cast <mode>` sets the mode and saves it to `cast_mode` in `.tink/config.json`.
+
+Modes:
+- `quick` — Forces Lane 1 fast path regardless of task complexity. Skips harness selection and starts immediately.
+- `standard` — Default behavior. Quick triage selects the right lane automatically.
+- `deep` — Runs a structured interview before planning. See **Deep mode** below.
+
 ## Interaction policy
 Always call the `AskUserQuestion` tool for choice prompts. Do not render `❯` text format. Do not ask the user to type a number inline.
 
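The `cast_mode` setting from the "Cast mode" hunk above is a single string field in `.tink/config.json`. A minimal sketch of reading and updating it on an already-parsed config object follows. The helper names are hypothetical, and falling back to `standard` for a missing or unrecognized value is an assumption based on `standard` being described as the default; the diff does not show how the agent or installer actually validates the field.

```javascript
const CAST_MODES = ['quick', 'standard', 'deep'];

// Resolve the active mode from a parsed config object.
// Assumption: missing or unknown values fall back to 'standard'.
function resolveCastMode(config) {
  const mode = config && config.cast_mode;
  return CAST_MODES.includes(mode) ? mode : 'standard';
}

// Return a copy of the config with cast_mode set, rejecting unknown modes.
function withCastMode(config, mode) {
  if (!CAST_MODES.includes(mode)) {
    throw new Error(`Unknown cast mode: ${mode}`);
  }
  return { ...config, cast_mode: mode };
}

const before = { git_policy: 'example-value' }; // placeholder for existing fields
const after = withCastMode(before, 'deep');

console.log(resolveCastMode(before)); // standard
console.log(resolveCastMode(after));  // deep
```

Returning a new object rather than mutating the input keeps other stored fields, such as `git_policy`, untouched when the mode changes.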
@@ -79,12 +87,21 @@ When Stitch is visible, show exactly one proposal in this order: proposal, reaso
 2. reason
 3. choices
 
-
-1.
-2. success criteria or verification
-3.
-4.
-
+**Phase A — Blocking checks** (always run; always surface when triggered):
+1. Safety or irreversibility
+2. Missing success criteria or verification
+3. Goal or scope ambiguity
+4. Harness mismatch
+
+**Phase B — Plan-shaping checks** (run after Phase A; surface only when a concrete code-grounded alternative exists):
+5. Minimality — is the plan larger than the request warrants? Are new files, abstractions, or dependencies justified?
+6. Reuse — does an existing helper, pattern, or flow already solve this?
+7. Deletion/substitution — can the addition be replaced with deleting, configuring, or extending an existing path?
+
+Phase B proposal rules:
+- Never surface Phase B without a concrete alternative grounded in observed code or project state. "This looks large, consider simplifying" is not a valid finding.
+- Never suggest reducing: trust boundary input validation, data loss prevention, security measures, accessibility basics, or explicitly requested requirements.
+- In `deep` mode, skip Phase B entirely — the interview already covered minimality and reuse.
 
 Stitch may change the order or method of work, but it must not change the user's goal without separate approval.
 
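The Phase A / Phase B gating described in the Stitch hunk above is written as prose rules for the agent, not as code. As an illustration only, the sketch below expresses the surfacing rules as a filter. The finding shape (`phase`, `check`, `alternative`) and the function name are hypothetical, and the sketch does not model the rule that forbids weakening validation, security, or accessibility.

```javascript
// Hypothetical finding shape: { phase: 'A' | 'B', check: string, alternative?: string }
function visibleStitchFindings(findings, castMode) {
  return findings.filter((f) => {
    if (f.phase === 'A') return true;      // blocking checks always surface
    if (castMode === 'deep') return false; // Phase B is skipped entirely in deep mode
    // Phase B needs a concrete, code-grounded alternative to be worth showing.
    return typeof f.alternative === 'string' && f.alternative.trim() !== '';
  });
}

const findings = [
  { phase: 'A', check: 'missing success criteria' },
  { phase: 'B', check: 'reuse', alternative: 'extend the existing copyDir() helper' },
  { phase: 'B', check: 'minimality' }, // vague: no concrete alternative
];

console.log(visibleStitchFindings(findings, 'standard').map((f) => f.check));
// [ 'missing success criteria', 'reuse' ]
console.log(visibleStitchFindings(findings, 'deep').map((f) => f.check));
// [ 'missing success criteria' ]
```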
@@ -114,6 +131,38 @@ If the user chooses `Continue as-is` / `이대로 진행`, proceed with the expl
 
 Do not record a clean Stitch pass.
 
+## Deep mode
+When `cast_mode` is `deep`, run a structured interview before the normal Procedure. The interview refines the task into a spec that feeds harness selection.
+
+**Round 0 — Topology lock** (not counted in progress)
+Before asking any questions, present the high-level components Claude infers from the request and visible codebase context. Ask the user to confirm, add, remove, or merge components. This prevents deep focus on one component from obscuring others.
+
+**Interview loop — Rounds 1–10**
+Show a progress indicator at the start of each question:
+
+```
+[Round N/10 ████████░░░░░░░░░░░░]
+```
+
+Rules:
+- Ask one question per round. Never ask multiple questions in one round.
+- Target the weakest clarity dimension each round: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). These weights are internal judgment guides, not computed scores. Always pick the dimension where ambiguity most limits the next action.
+- Brownfield rule: investigate the codebase before asking. Do not ask about things already visible in the code. Confirm findings rather than ask from scratch.
+- Counter-question (user answers but also asks a question back): answer the counter-question first, then treat the combined response as this round's answer. Round counter does not advance.
+- Clarification request (user does not understand the question): rephrase and re-ask within the same round. Round counter does not advance.
+- Round 3+: user may exit the interview early and proceed directly to spec generation.
+- Round 10: hard cap. End the interview and produce the spec regardless of ambiguity.
+- End early when goal, constraint, and success criteria are all sufficiently clear, without waiting for Round 10.
+
+**Question mode shift** (triggered by clarity state, not round number):
+- When goal and constraint are sufficiently clear → shift to Contrarian mode: "What if the opposite were true? What if this assumption is wrong?"
+- When those are also resolved → shift to Simplifier mode: "What is the smallest version that still has meaningful value?"
+
+**Spec → plan.md → harness selection**
+When the interview ends, write `.tink/current/plan.md` with these top-level sections: Goal, Topology, Constraints, Success Criteria, Open Questions.
+
+Then proceed to the normal Procedure starting at step 3 (read harness index). Use the spec as the harness selection input instead of the raw task request. Stitch Phase A runs after harness selection as normal. Phase B is skipped.
+
 ## Reusable State Save Gate
 Reusable State Save Gate is a separate absolute hard approval gate, not merely a Stitch subtype. Current-run approval does not authorize reusable-state writes.
 
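The `[Round N/10 ...]` progress indicator from the Deep mode hunk above is rendered by the agent as text. The sketch below shows one way such a bar could be produced. The 20-cell width is inferred from the example bar in the hunk, and the proportional fill is an assumption; the command file does not specify how many cells to fill per round, and `roundProgress` is a hypothetical name.

```javascript
// Render a "[Round N/10 ████░░░]" style indicator.
// Assumptions: 20 cells wide, filled in proportion to round/total.
function roundProgress(round, total = 10, width = 20) {
  const clamped = Math.min(Math.max(round, 0), total);
  const filled = Math.round((clamped * width) / total);
  return `[Round ${clamped}/${total} ${'█'.repeat(filled)}${'░'.repeat(width - filled)}]`;
}

console.log(roundProgress(4));  // [Round 4/10 ████████░░░░░░░░░░░░]
console.log(roundProgress(10)); // [Round 10/10 ████████████████████]
```

Under these assumptions the example bar in the command file (8 filled cells out of 20) corresponds to round 4. Because counter-questions and clarification requests do not advance the round, the same bar would be shown again when a question is rephrased.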
|
@@ -160,6 +209,38 @@ Optional current-run artifacts are created only when their harness is selected:
|
|
|
160
209
|
- `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
|
|
161
210
|
- `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
|
|
162
211
|
|
|
212
|
+
## Evidence Split
|
|
213
|
+
Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
|
|
214
|
+
|
|
215
|
+
Use Evidence Split at cast time and again during implementation when:
|
|
216
|
+
- the first plan has several uncertain facts,
|
|
217
|
+
- implementation starts coupling several files or concepts,
|
|
218
|
+
- a check fails and the next action is unclear,
|
|
219
|
+
- context is becoming broad or stale,
|
|
220
|
+
- independent verification, review, or handoff would reduce risk.
|
|
221
|
+
|
|
222
|
+
Skip it for tiny, obvious edits where a packet would not change the next action.
|
|
223
|
+
|
|
224
|
+
Packet vocabulary:
|
|
225
|
+
- `probe`: answer one unknown with 1-3 inputs.
|
|
226
|
+
- `patch`: make one narrow implementation change.
|
|
227
|
+
- `verify`: prove one success condition or failure recovery.
|
|
228
|
+
- `review`: inspect one risk, regression, or omission.
|
|
229
|
+
- `decision`: record one branch, chosen option, and evidence.
|
|
230
|
+
|
|
231
|
+
Represent packets in existing run state:
|
|
232
|
+
- `steps.json`: packetized steps and status.
|
|
233
|
+
- `context-map.json`: the input files, sources, or excluded context for each packet.
|
|
234
|
+
- `notes.md`: why work was split or re-split during implementation.
|
|
235
|
+
- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
|
|
236
|
+
|
|
237
|
+
Safety defaults:
|
|
238
|
+
- Do not start workers, tmux panes, worktrees, or external agents automatically.
|
|
239
|
+
- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
|
|
240
|
+
- Do not let multiple packets edit the same file concurrently.
|
|
241
|
+
- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
|
|
242
|
+
- Keep each packet to 1-3 primary inputs when possible.
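The packet vocabulary and safety defaults above can be illustrated with a small checker. This is a sketch under stated assumptions: the `Packet` shape and the `check_packets` helper are hypothetical, the real `steps.json` schema is not shown in this document, and the same-file check here is stricter than the rule (it flags any two packets that list the same file, concurrent or not).

```python
from dataclasses import dataclass, field

PACKET_TYPES = {"probe", "patch", "verify", "review", "decision"}


@dataclass
class Packet:
    """Hypothetical in-memory packet; not the actual steps.json schema."""
    id: str
    type: str
    inputs: list[str]                               # primary inputs, ideally 1-3
    edits: list[str] = field(default_factory=list)  # files a patch packet would touch


def check_packets(packets: list[Packet]) -> list[str]:
    """Return human-readable warnings; an empty list means the split looks safe."""
    warnings = []
    owners: dict[str, str] = {}
    for p in packets:
        if p.type not in PACKET_TYPES:
            warnings.append(f"{p.id}: unknown packet type {p.type!r}")
        if not 1 <= len(p.inputs) <= 3:
            warnings.append(f"{p.id}: has {len(p.inputs)} inputs, expected 1-3")
        for path in p.edits:
            if path in owners:
                warnings.append(f"{p.id}: {path} is already edited by {owners[path]}")
            else:
                owners[path] = p.id
    return warnings
```

The checker returns warnings instead of raising because the 1-3 input limit is phrased as "when possible", so the main agent stays free to accept an exception.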
|
|
243
|
+
|
|
163
244
|
Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
|
|
164
245
|
|
|
165
246
|
```json
|
|
@@ -369,6 +450,7 @@ If any of the following is true, the task goes to Lane 3:
|
|
|
369
450
|
- The task description mentions any of the above concepts
|
|
370
451
|
|
|
371
452
|
**Step 2 — Lane decision (only if step 1 finds no hard-gate):**
|
|
453
|
+
If `cast_mode` is `quick`, always select Lane 1 here regardless of task signals.
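The ordering of the two steps can be sketched as below. The function name `select_lane` and the `triage` callback are hypothetical; `triage` stands in for the signal-based lane decision, which this sketch does not reproduce, and `deep` is assumed to fall through to normal triage after its interview.

```python
from typing import Callable


def select_lane(cast_mode: str, hard_gate: bool, triage: Callable[[], int]) -> int:
    """Return the lane number (1-3) for a task."""
    if hard_gate:
        # Step 1: any hard-gate sends the task to Lane 3, even in quick mode.
        return 3
    if cast_mode == "quick":
        # Step 2 override: quick forces Lane 1 regardless of task signals.
        return 1
    # Standard behavior: fall back to signal-based triage.
    return triage()
```

The point of the sketch is the precedence: the `quick` override applies only after the hard-gate check has passed.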
|
|
372
454
|
|
|
373
455
|
**Lane 1 — instant start.** Any of these signals, with no contradicting complexity signal:
|
|
374
456
|
- a question, explanation, or lookup with no file edits
|
|
@@ -480,12 +562,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
|
|
|
480
562
|
- new pattern not covered yet
|
|
481
563
|
|
|
482
564
|
These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
|
|
483
|
-
6.
|
|
565
|
+
6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
|
|
566
|
+
7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
|
|
484
567
|
- If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
|
|
485
568
|
- If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
|
|
486
569
|
- If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
|
|
487
570
|
- If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
|
|
488
|
-
|
|
571
|
+
8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
|
|
489
572
|
- Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
|
|
490
573
|
- Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
|
|
491
574
|
- Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
|
|
@@ -496,7 +579,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
|
|
|
496
579
|
- `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
|
|
497
580
|
- `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 line.
|
|
498
581
|
- The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
|
|
499
|
-
|
|
582
|
+
9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
|
|
500
583
|
|
|
501
584
|
After selecting, run a quick quality check using the index metadata for each chosen harness:
|
|
502
585
|
- If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
|
|
@@ -504,26 +587,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
|
|
|
504
587
|
- If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
|
|
505
588
|
Feed any signals into the Stitch evaluation at step 18.
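The `use_when` word-match check can be sketched as follows. The tokenization is an assumption: this document does not say how words are split or whether common words such as "the" should be ignored, so the sketch simply compares lowercased word tokens.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased word tokens (Unicode-aware, so Hangul words also count)."""
    return set(re.findall(r"\w+", text.lower()))


def matching_words(use_when: str, task: str) -> set[str]:
    """Words from `use_when` that also occur in the task description."""
    return _words(use_when) & _words(task)


def harness_mismatch(use_when: str, task: str) -> bool:
    """True when fewer than 2 `use_when` words match the task description."""
    return len(matching_words(use_when, task)) < 2
```

For example, `harness_mismatch("Review PR diff", "please REVIEW this diff")` is `False` because both `review` and `diff` match.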
|
|
506
589
|
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
511
|
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
516
|
-
|
|
517
|
-
|
|
518
|
-
|
|
519
|
-
|
|
590
|
+
10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
|
|
591
|
+
11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
|
|
592
|
+
12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
|
|
593
|
+
13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
|
|
594
|
+
14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
|
|
595
|
+
15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
|
|
596
|
+
16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
|
|
597
|
+
17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
|
|
598
|
+
18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
|
|
599
|
+
19. Ask for explicit approval before non-trivial work.
|
|
600
|
+
20. After approval, read only the selected harness files and any approved run-only draft.
|
|
601
|
+
21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
|
|
602
|
+
22. Execute the first safe step immediately:
|
|
520
603
|
- inspect relevant files,
|
|
521
604
|
- run a read-only diagnostic,
|
|
522
605
|
- draft the first artifact,
|
|
523
606
|
- or reproduce the issue.
|
|
524
|
-
|
|
525
|
-
|
|
526
|
-
|
|
607
|
+
23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
|
|
608
|
+
24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
|
|
609
|
+
25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
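The three probe outcomes named in step 12 can be sketched as a simple mapping. The probe questions themselves are not reproduced here; `yes_count` and `harness_matched` are hypothetical parameter names for the number of "yes" answers and for whether any harness matched at all.

```python
def probe_outcome(yes_count: int, harness_matched: bool = True) -> str:
    """Map the number of 'yes' answers (0-5) to a synthesis probe outcome."""
    if not 0 <= yes_count <= 5:
        raise ValueError("yes_count must be between 0 and 5")
    if not harness_matched or yes_count >= 4:
        return "no fit"       # step 13: load `harness-synthesis`
    if yes_count >= 2:
        return "generic fit"  # step 14: propose a run-only draft
    return "strong fit"
```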
|
|
527
610
|
|
|
528
611
|
|
|
529
612
|
## Synthesis probe
|
package/docs/geobench.md
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# GEO Benchmark For Tink
|
|
2
|
+
|
|
3
|
+
This repository includes a [`geobench`](https://github.com/NomaDamas/geobench) product spec for measuring LLM answer visibility: hit rate, MRR, share of voice, citation rate/share, and confidence intervals.
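For readers unfamiliar with two of these metrics, here is a minimal sketch using the standard information-retrieval definitions. How `geobench` itself computes them is not described in this document, so treat the formulas and the `ranks` input shape (the 1-based position of the product in each answer, or `None` when it is absent) as assumptions.

```python
from typing import Optional


def hit_rate(ranks: list[Optional[int]]) -> float:
    """Fraction of queries whose answer mentioned the product at all."""
    return sum(r is not None for r in ranks) / len(ranks)


def mean_reciprocal_rank(ranks: list[Optional[int]]) -> float:
    """Average of 1/rank over all queries; a miss contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```

With `ranks = [1, None, 2, 4]`, the hit rate is 0.75 and the MRR is (1 + 0.5 + 0.25) / 4 = 0.4375. Both functions expect at least one query.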
|
|
4
|
+
|
|
5
|
+
Product spec: [`geobench/tink-harness.yaml`](../geobench/tink-harness.yaml)
|
|
6
|
+
|
|
7
|
+
## Run
|
|
8
|
+
|
|
9
|
+
Use a local checkout or install of `geobench`; do not commit `.env`, raw run logs, or provider responses.
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
/path/to/geobench/dist/geobench estimate --product geobench/tink-harness.yaml --providers openai --tier cheap
|
|
13
|
+
/path/to/geobench/dist/geobench profile geobench/tink-harness.yaml
|
|
14
|
+
/path/to/geobench/dist/geobench bench --product geobench/tink-harness.yaml --providers openai --tier cheap --mode benchmark
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
To inspect results locally:
|
|
18
|
+
|
|
19
|
+
```bash
|
|
20
|
+
/path/to/geobench/dist/geobench dash
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
## Publishing Boundary
|
|
24
|
+
|
|
25
|
+
Publish aggregate metrics only. Do not publish raw provider answers, secrets, private run logs, or `.env` values. When citing results, include the run date, provider set, tier, query count, and whether the spec was profiled before the run.
|
|
26
|
+
|
|
27
|
+
## Korean Summary
|
|
28
|
+
|
|
29
|
+
This repo includes a geobench product spec for measuring Tink's LLM answer visibility. When publishing run results, publish only aggregate metrics such as hit rate, MRR, share of voice, and citation rate/share; do not publish raw provider answers, secrets, or private run logs.
|
|
@@ -91,17 +91,18 @@ Standalone CLI를 더 짧게 입력하고, 로컬 health report를 더 쉽게
|
|
|
91
91
|
- `dashboard` only produces local static files by default. It does not run a server or watcher, keep a hidden cache, or edit harnesses automatically.
|
|
92
92
|
- Consider an optional open/export flag only after generated file paths are stable across platforms.
|
|
93
93
|
|
|
94
|
-
##
|
|
94
|
+
## Evidence Split / Parallel Evidence
|
|
95
95
|
|
|
96
|
-
작업
|
|
96
|
+
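Before parallelizing work, add Evidence Split to Tink's default work loop. Rather than turning Tink into a separate multi-agent runtime, first stabilize the default behavior of splitting large work into small evidence packets. Detailed research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.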
|
|
97
97
|
|
|
98
|
-
-
|
|
99
|
-
-
|
|
98
|
+
- `/tink:cast` and `$tink:cast` check, before harness selection, whether the work can be split into `probe`, `patch`, `verify`, `review`, and `decision` packets.
|
|
99
|
+
- During actual work, re-split into packets whenever uncertainty, failed verification, context growth, or coupled changes appear.
|
|
100
|
+
- A packet sees only a small unit with 1-3 inputs, not the whole task.
|
|
101
|
+
- Even when external workers are needed, by default they do not edit files directly; they return only evidence and patch candidates.
|
|
100
102
|
- Only the main agent is responsible for final patch selection, file edits, and verification.
|
|
101
103
|
- Success is measured not by "always faster" but by reduced main context, less rework, earlier failure detection, and an equal or better verification pass rate.
|
|
102
|
-
- 초기 모드는
|
|
103
|
-
-
|
|
104
|
-
- Worker output is limited to 300 words or fewer, evidence-only, with confidence included.
|
|
104
|
+
- The initial mode is Evidence Split as a core behavior; the actual worker runtime is deferred to separate follow-up work.
|
|
105
|
+
- Even in a future runtime, worker output is limited to 300 words or fewer, evidence-only, with confidence included.
|
|
105
106
|
- Do not select it for work that involves public contracts, secrets, broad repo scans, or concurrent edits to the same file.
|
|
106
107
|
|
|
107
108
|
## Excluded
|
|
@@ -91,17 +91,18 @@ Make the standalone CLI easier to type and make the local health report easier t
|
|
|
91
91
|
- Keep `dashboard` local and static by default: no server, watcher, hidden cache, or automatic harness edits.
|
|
92
92
|
- Allow an optional open/export flag only after the generated file path behavior is stable across platforms.
|
|
93
93
|
|
|
94
|
-
##
|
|
94
|
+
## Evidence Split / Parallel Evidence
|
|
95
95
|
|
|
96
|
-
|
|
96
|
+
Before adding parallel workers, add Evidence Split to Tink's default work loop. Tink should not become a separate multi-agent runtime; it should first make large work divisible into small evidence packets. The research notes live in `docs/swarm-fast-lane.ko.md` and `docs/swarm-fast-lane.md`.
|
|
97
97
|
|
|
98
|
-
-
|
|
99
|
-
-
|
|
98
|
+
- `/tink:cast` and `$tink:cast` check whether work should split into `probe`, `patch`, `verify`, `review`, or `decision` packets before harness selection.
|
|
99
|
+
- During implementation, Tink re-splits work when uncertainty, failed checks, context sprawl, or coupled changes appear.
|
|
100
|
+
- Packets see only 1-3 inputs, not the whole task.
|
|
101
|
+
- If external workers are used later, they do not edit files by default; they return evidence and patch candidates.
|
|
100
102
|
- The main agent owns final patch selection, file edits, and verification.
|
|
101
103
|
- Success is measured by less main-agent context, less rework, earlier failure detection, and equal or better verification pass rate, not by claiming universal raw speed.
|
|
102
|
-
-
|
|
103
|
-
-
|
|
104
|
-
- Worker output is capped at 300 words and must include evidence and confidence.
|
|
104
|
+
- The initial implementation is the core Evidence Split behavior; actual worker runtime remains deferred.
|
|
105
|
+
- Future worker output should be capped at 300 words and include evidence and confidence.
|
|
105
106
|
- Do not select it for unclear public contracts, secrets, broad repository scans, or same-file concurrent edits.
|
|
106
107
|
|
|
107
108
|
## Excluded
|
|
@@ -1,16 +1,16 @@
|
|
|
1
|
-
#
|
|
1
|
+
# Evidence Split / Parallel Evidence Research Plan
|
|
2
2
|
|
|
3
|
-
이 문서는 멀티
|
|
3
|
+
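This document is a research plan for the step before multi-agent parallelism, scoped to giving Tink a default behavior of splitting large work into small evidence packets. The goal is not to "spawn many agents"; it is to separate investigation, modification, verification, review, and decision work into small context packets so that the main agent's rework and overall context load go down.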
|
|
4
4
|
|
|
5
5
|
## Problem Definition
|
|
6
6
|
|
|
7
7
|
Typical multi-agent parallelism spends more tokens, because each worker rereads the same context, separate edits conflict, and the main agent then pays the cost of merging the results.
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Evidence Split approaches this problem from the opposite direction.
|
|
10
10
|
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
- Workers do not edit directly by default.
|
|
11
|
+
- Packets do not understand the whole task.
|
|
12
|
+
- Packets do not read broad sets of files.
|
|
13
|
+
- Even if external workers are used, they do not edit directly by default.
|
|
14
14
|
- Worker output is limited to short evidence and patch candidates.
|
|
15
15
|
- Only the main agent chooses the final path and edits files.
|
|
16
16
|
|
|
@@ -80,14 +80,16 @@ worker는 파일 수정 없이 관련 파일, 위험, 테스트 후보만 찾는
|
|
|
80
80
|
|
|
81
81
|
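Workers are intentionally given only incomplete, minimal context. The goal is not a good implementation but cheaply finding problems that can be caught even with little information.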
|
|
82
82
|
|
|
83
|
-
##
|
|
83
|
+
## Core Behavior Contract
|
|
84
84
|
|
|
85
|
-
|
|
85
|
+
Evidence Split is not a separate harness; it is the default behavior of `/tink:cast` and `$tink:cast`. Use it under the following conditions.
|
|
86
86
|
|
|
87
87
|
- The work splits into 2-5 independent packets.
|
|
88
88
|
- Each packet is limited to 1-3 input files or questions.
|
|
89
|
-
-
|
|
90
|
-
-
|
|
89
|
+
- The packet type is one of `probe`, `patch`, `verify`, `review`, or `decision`.
|
|
90
|
+
- During actual work, re-split into packets whenever uncertainty, failed verification, context growth, or coupled changes appear.
|
|
91
|
+
- External worker output is limited to 300 words or fewer, even in a future runtime.
|
|
92
|
+
- External workers do not edit files directly by default.
|
|
91
93
|
- Worker output includes evidence, a recommended action, and confidence.
|
|
92
94
|
- The main agent is responsible for the final patch and verification.
|
|
93
95
|
|
|
@@ -118,15 +120,14 @@ worker에게 의도적으로 불완전한 최소 컨텍스트만 준다. 목적
|
|
|
118
120
|
|
|
119
121
|
The first implementation slice is considered done when:
|
|
120
122
|
|
|
121
|
-
- `
|
|
122
|
-
-
|
|
123
|
-
- The worker packet format is represented in `.tink/current/delegation.md` or a separate run artifact.
|
|
123
|
+
- Evidence Split is included as default behavior in the Tink core rules and in the `/tink:cast` and `$tink:cast` docs.
|
|
124
|
+
- The packet format is represented in `steps.json`, `context-map.json`, `notes.md`, and, when needed, `.tink/current/delegation.md`.
|
|
124
125
|
- Direct worker edits are disabled by default.
|
|
125
|
-
-
|
|
126
|
+
- There is a lightweight rule that small work may skip it.
|
|
126
127
|
- Verification does not assert "faster"; it records evidence of reduced context and reduced rework.
|
|
127
128
|
|
|
128
129
|
## Open Questions
|
|
129
130
|
|
|
130
131
|
- Decide whether actual worker execution should thinly call existing Codex/Claude Code features, or whether Tink should stop at documenting packets.
|
|
131
|
-
- worker 결과 schema를 `delegation-brief`에 통합할지, 별도
|
|
132
|
-
- fast
|
|
132
|
+
- Decide whether to merge the worker result schema into `delegation-brief` or keep it as a separate runtime artifact.
|
|
133
|
+
- Keep the `swarm-fast-lane` name only as a temporary name for the research document; user-facing copy prefers Evidence Split or Parallel Evidence.
|
package/docs/swarm-fast-lane.md
CHANGED
|
@@ -1,16 +1,16 @@
|
|
|
1
|
-
#
|
|
1
|
+
# Evidence Split / Parallel Evidence Research Plan
|
|
2
2
|
|
|
3
|
-
This document describes
|
|
3
|
+
This document describes the step before multi-agent parallelism: Tink should first split large work into small evidence packets without becoming a separate runtime. The goal is not to spawn more agents by default. The goal is to separate probe, patch, verify, review, and decision work into tiny context packets so the main agent reduces rework and context load.
|
|
4
4
|
|
|
5
5
|
## Problem
|
|
6
6
|
|
|
7
7
|
Naive multi-agent parallelism usually spends more tokens. Each worker rereads context, independent edits conflict, and the main agent still pays a reconciliation cost.
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Evidence Split inverts that model.
|
|
10
10
|
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
-
|
|
11
|
+
- Packets do not understand the whole task.
|
|
12
|
+
- Packets do not read broad context.
|
|
13
|
+
- If external workers are used, they do not edit files by default.
|
|
14
14
|
- Worker output is limited to short evidence and patch candidates.
|
|
15
15
|
- The main agent chooses the final path and owns file edits.
|
|
16
16
|
|
|
@@ -80,14 +80,16 @@ Workers look only for reasons the current implementation approach will fail. Thi
|
|
|
80
80
|
|
|
81
81
|
Workers intentionally receive incomplete minimal context. The point is not high-quality implementation; it is cheaply detecting problems that are visible with little information.
|
|
82
82
|
|
|
83
|
-
##
|
|
83
|
+
## Core Behavior Contract
|
|
84
84
|
|
|
85
|
-
|
|
85
|
+
Evidence Split is not a separate harness. It is default behavior inside `/tink:cast` and `$tink:cast`. Use it when:
|
|
86
86
|
|
|
87
87
|
- the task splits into 2-5 independent packets
|
|
88
88
|
- each packet is limited to 1-3 input files or questions
|
|
89
|
-
- each
|
|
90
|
-
-
|
|
89
|
+
- each packet type is `probe`, `patch`, `verify`, `review`, or `decision`
|
|
90
|
+
- work should be re-split during implementation because uncertainty, failed checks, context sprawl, or coupled changes appeared
|
|
91
|
+
- future worker output is limited to 300 words
|
|
92
|
+
- external workers do not edit files by default
|
|
91
93
|
- worker output includes evidence, recommended action, and confidence
|
|
92
94
|
- the main agent owns final patching and verification
|
|
93
95
|
|
|
@@ -118,15 +120,14 @@ The first version can start with estimates, but run artifacts should record evid
|
|
|
118
120
|
|
|
119
121
|
The first implementation slice is done when:
|
|
120
122
|
|
|
121
|
-
-
|
|
122
|
-
-
|
|
123
|
-
- worker packet format is represented in `.tink/current/delegation.md` or another run artifact
|
|
123
|
+
- Evidence Split is documented as default behavior in the Tink core rules and in the `/tink:cast` and `$tink:cast` docs
|
|
124
|
+
- packet format is represented in `steps.json`, `context-map.json`, `notes.md`, and optionally `.tink/current/delegation.md`
|
|
124
125
|
- direct worker edits are disabled by default
|
|
125
|
-
-
|
|
126
|
+
- tiny work can skip the packet ceremony
|
|
126
127
|
- verification records context reduction and rework reduction evidence instead of claiming raw speed
|
|
127
128
|
|
|
128
129
|
## Open Questions
|
|
129
130
|
|
|
130
131
|
- Should actual worker execution call existing Codex/Claude Code features, or should Tink only document packets?
|
|
131
|
-
- Should worker result schema extend `delegation-brief`, or should
|
|
132
|
-
-
|
|
132
|
+
- Should worker result schema extend `delegation-brief`, or should it use a separate runtime artifact?
|
|
133
|
+
- Keep `swarm-fast-lane` only as a research placeholder; prefer Evidence Split or Parallel Evidence in user-facing copy.
|
|
@@ -0,0 +1,47 @@
|
|
|
1
|
+
name: "Tink"
|
|
2
|
+
aliases: ["Tink Harness", "tink-harness", "Tink for Claude Code", "Tink for Codex"]
|
|
3
|
+
romanizations: []
|
|
4
|
+
category: "coding-agent harness and visible workflow runtime"
|
|
5
|
+
description: "Tink is a small harness layer for Claude Code and Codex that keeps non-trivial agent work in visible files: task contracts, run state, verification checks, run records, and approval-gated reusable harnesses."
|
|
6
|
+
competitors: ["Gajae-Code", "Claude Code", "Codex CLI", "OpenCode", "Aider", "Cursor"]
|
|
7
|
+
cited_domains: ["github.com/dotoricode/tink-harness", "npmjs.com/package/tink-harness"]
|
|
8
|
+
target_languages: ["en", "ko"]
|
|
9
|
+
target_audience:
|
|
10
|
+
- "developers using Claude Code or Codex for multi-step coding work"
|
|
11
|
+
- "maintainers who want visible run state and verification evidence"
|
|
12
|
+
- "AI-agent workflow builders comparing harness and memory approaches"
|
|
13
|
+
discovery_sources:
|
|
14
|
+
- "https://github.com/dotoricode/tink-harness"
|
|
15
|
+
- "https://www.npmjs.com/package/tink-harness"
|
|
16
|
+
|
|
17
|
+
enriched_profile:
|
|
18
|
+
generated_at: "2026-06-15T00:00:00Z"
|
|
19
|
+
profiler_model: "curated-public-source"
|
|
20
|
+
value_proposition: "A local, approval-gated harness layer that makes Claude Code and Codex work inspectable through task contracts, current-run files, verification evidence, run records, and reusable workflow harnesses."
|
|
21
|
+
source_content_hashes:
|
|
22
|
+
- "curated-public-source"
|
|
23
|
+
target_audience:
|
|
24
|
+
- segment: "coding-agent operators"
|
|
25
|
+
pains:
|
|
26
|
+
- "Need visible run state instead of relying on hidden chat memory"
|
|
27
|
+
- "Need repeatable verification and approval boundaries for multi-step agent work"
|
|
28
|
+
- segment: "open-source maintainers"
|
|
29
|
+
pains:
|
|
30
|
+
- "Need to compare discoverability against adjacent coding-agent tools"
|
|
31
|
+
- "Need citation and share-of-voice evidence before changing public positioning"
|
|
32
|
+
use_cases:
|
|
33
|
+
- problem_statement: "When a developer uses Claude Code or Codex for multi-step work and needs visible task contracts, plans, checks, and run records instead of hidden chat memory."
|
|
34
|
+
audience: "developers using coding agents"
|
|
35
|
+
evidence_quotes: ["visible files", "task contract", "run state", "verification steps"]
|
|
36
|
+
confidence: 0.86
|
|
37
|
+
language: "en"
|
|
38
|
+
- problem_statement: "When a maintainer wants reusable coding-agent workflows that are saved only after explicit approval and can be inspected, diffed, and committed."
|
|
39
|
+
audience: "open-source maintainers"
|
|
40
|
+
evidence_quotes: ["reusable harnesses", "saved only after your approval", "open, diff, and commit"]
|
|
41
|
+
confidence: 0.84
|
|
42
|
+
language: "en"
|
|
43
|
+
- problem_statement: "Claude Code나 Codex 작업 사이에서 맥락이 사라지지 않도록 실행 상태, 검증 단계, 승인 기반 하네스를 파일로 남기고 싶을 때."
|
|
44
|
+
audience: "Korean-speaking coding-agent users"
|
|
45
|
+
evidence_quotes: ["작업 계약", "실행 상태", "검증 단계", "명시적 승인"]
|
|
46
|
+
confidence: 0.82
|
|
47
|
+
language: "ko"
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "tink-harness",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.15.0",
|
|
4
4
|
"description": "Self-growing harnesses for Claude Code and Codex.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"type": "module",
|
|
@@ -14,6 +14,7 @@
|
|
|
14
14
|
"commands/",
|
|
15
15
|
"skills/",
|
|
16
16
|
"hooks/",
|
|
17
|
+
"geobench/",
|
|
17
18
|
"docs/*.md",
|
|
18
19
|
"docs/pr/",
|
|
19
20
|
"README.md",
|
|
@@ -32,6 +32,14 @@ A valid `/tink:cast` response must do one of these:
|
|
|
32
32
|
|
|
33
33
|
If the task is clear enough to classify, do not ask broad clarification first. Make a best recommendation, ask for approval, then act.
|
|
34
34
|
|
|
35
|
+
## Cast mode
|
|
36
|
+
`/tink:cast` without a task argument shows the current mode and offers a change option. `/tink:cast <mode>` sets the mode and saves it to `cast_mode` in `.tink/config.json`.
|
|
37
|
+
|
|
38
|
+
Modes:
|
|
39
|
+
- `quick` — Forces Lane 1 fast path regardless of task complexity. Skips harness selection and starts immediately.
|
|
40
|
+
- `standard` — Default behavior. Quick triage selects the right lane automatically.
|
|
41
|
+
- `deep` — Runs a structured interview before planning. See **Deep mode** below.
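Persisting the mode could look roughly like this. It is a sketch, not the installer's actual code: the helper names `read_cast_mode` and `save_cast_mode` are hypothetical, and falling back to `standard` when the key or file is missing is an assumption based on `standard` being described as the default.

```python
import json
from pathlib import Path

CAST_MODES = ("quick", "standard", "deep")


def read_cast_mode(config_path: Path) -> str:
    """Return the saved cast mode, falling back to 'standard' when unset."""
    if not config_path.exists():
        return "standard"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("cast_mode", "standard")


def save_cast_mode(config_path: Path, mode: str) -> None:
    """Persist `mode` under the `cast_mode` key, keeping other keys intact."""
    if mode not in CAST_MODES:
        raise ValueError(f"unknown cast mode: {mode!r}")
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config["cast_mode"] = mode
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

A call such as `save_cast_mode(Path(".tink/config.json"), "deep")` would leave any other keys in the file untouched.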
|
|
42
|
+
|
|
35
43
|
## Interaction policy
|
|
36
44
|
Always call the `AskUserQuestion` tool for choice prompts. Do not render `❯` text format. Do not ask the user to type a number inline.
|
|
37
45
|
|
|
@@ -79,12 +87,21 @@ When Stitch is visible, show exactly one proposal in this order: proposal, reaso
|
|
|
79
87
|
2. reason
|
|
80
88
|
3. choices
|
|
81
89
|
|
|
82
|
-
|
|
83
|
-
1.
|
|
84
|
-
2. success criteria or verification
|
|
85
|
-
3.
|
|
86
|
-
4.
|
|
87
|
-
|
|
90
|
+
**Phase A — Blocking checks** (always run; always surface when triggered):
|
|
91
|
+
1. Safety or irreversibility
|
|
92
|
+
2. Missing success criteria or verification
|
|
93
|
+
3. Goal or scope ambiguity
|
|
94
|
+
4. Harness mismatch
|
|
95
|
+
|
|
96
|
+
**Phase B — Plan-shaping checks** (run after Phase A; surface only when a concrete code-grounded alternative exists):
|
|
97
|
+
5. Minimality — is the plan larger than the request warrants? Are new files, abstractions, or dependencies justified?
|
|
98
|
+
6. Reuse — does an existing helper, pattern, or flow already solve this?
|
|
99
|
+
7. Deletion/substitution — can the addition be replaced with deleting, configuring, or extending an existing path?
|
|
100
|
+
|
|
101
|
+
Phase B proposal rules:
|
|
102
|
+
- Never surface Phase B without a concrete alternative grounded in observed code or project state. "This looks large, consider simplifying" is not a valid finding.
|
|
103
|
+
- Never suggest reducing: trust boundary input validation, data loss prevention, security measures, accessibility basics, or explicitly requested requirements.
|
|
104
|
+
- In `deep` mode, skip Phase B entirely — the interview already covered minimality and reuse.
|
|
88
105
|
|
|
89
106
|
Stitch may change the order or method of work, but it must not change the user's goal without separate approval.
|
|
90
107
|
|
|
@@ -114,6 +131,38 @@ If the user chooses `Continue as-is` / `이대로 진행`, proceed with the expl
|
|
|
114
131
|
|
|
115
132
|
Do not record a clean Stitch pass.
|
|
116
133
|
|
|
134
|
+
## Deep mode
|
|
135
|
+
When `cast_mode` is `deep`, run a structured interview before the normal Procedure. The interview refines the task into a spec that feeds harness selection.
|
|
136
|
+
|
|
137
|
+
**Round 0 — Topology lock** (not counted in progress)
|
|
138
|
+
Before asking any questions, present the high-level components Claude infers from the request and visible codebase context. Ask the user to confirm, add, remove, or merge components. This prevents deep focus on one component from obscuring others.
|
|
139
|
+
|
|
140
|
+
**Interview loop — Rounds 1–10**
|
|
141
|
+
Show a progress indicator at the start of each question:
|
|
142
|
+
|
|
143
|
+
```
|
|
144
|
+
[Round N/10 ████████░░░░░░░░░░░░]
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
Rules:
|
|
148
|
+
- Ask one question per round. Never ask multiple questions in one round.
|
|
149
|
+
- Target the weakest clarity dimension each round: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). These weights are internal judgment guides, not computed scores. Always pick the dimension where ambiguity most limits the next action.
|
|
150
|
+
- Brownfield rule: investigate the codebase before asking. Do not ask about things already visible in the code. Confirm findings rather than ask from scratch.
|
|
151
|
+
- Counter-question (user answers but also asks a question back): answer the counter-question first, then treat the combined response as this round's answer. Round counter does not advance.
|
|
152
|
+
- Clarification request (user does not understand the question): rephrase and re-ask within the same round. Round counter does not advance.
|
|
153
|
+
- Round 3+: user may exit the interview early and proceed directly to spec generation.
|
|
154
|
+
- Round 10: hard cap. End the interview and produce the spec regardless of ambiguity.
|
|
155
|
+
- End early when goal, constraint, and success criteria are all sufficiently clear, without waiting for Round 10.
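The progress indicator shown above could be rendered with a helper like the one below. The 20-cell width is inferred from the example bar, and the function name `progress_bar` is hypothetical; nothing in this document fixes either.

```python
def progress_bar(round_no: int, total: int = 10, width: int = 20) -> str:
    """Render a '[Round N/10 ...]' indicator with a fixed-width block bar."""
    if not 1 <= round_no <= total:
        raise ValueError("round_no must be between 1 and total")
    filled = round_no * width // total  # two cells per round at the defaults
    return f"[Round {round_no}/{total} {'█' * filled}{'░' * (width - filled)}]"
```

With the defaults, `progress_bar(4)` fills 8 of the 20 cells. Because counter-questions and clarification requests do not advance the round, the caller would pass the same `round_no` again in those cases.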
|
|
156
|
+
|
|
157
|
+
**Question mode shift** (triggered by clarity state, not round number):
|
|
158
|
+
- When goal and constraint are sufficiently clear → shift to Contrarian mode: "What if the opposite were true? What if this assumption is wrong?"
|
|
159
|
+
- When those are also resolved → shift to Simplifier mode: "What is the smallest version that still has meaningful value?"
|
|
160
|
+
|
|
161
|
+
**Spec → plan.md → harness selection**
|
|
162
|
+
When the interview ends, write `.tink/current/plan.md` with these top-level sections: Goal, Topology, Constraints, Success Criteria, Open Questions.
|
|
163
|
+
|
|
164
|
+
Then proceed to the normal Procedure starting at step 3 (read harness index). Use the spec as the harness selection input instead of the raw task request. Stitch Phase A runs after harness selection as normal. Phase B is skipped.
|
|
165
|
+
|
|
117
166
|
## Reusable State Save Gate
|
|
118
167
|
Reusable State Save Gate is a separate absolute hard approval gate, not merely a Stitch subtype. Current-run approval does not authorize reusable-state writes.
|
|
119
168
|
|
|
@@ -160,6 +209,38 @@ Optional current-run artifacts are created only when their harness is selected:
|
|
|
160
209
|
- `goals.json`: current-run goals for `goal-checkpoint`; keep 2-6 goals, one active goal, status, done criteria, verification, and evidence.
|
|
161
210
|
- `delegation.md`: handoff or parallel-work packets for `delegation-brief`; include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
|
|
162
211
|
|
|
212
|
+
## Evidence Split
|
|
213
|
+
Evidence Split is a base-run habit, not a separate harness. It keeps real work small while the task is happening by splitting broad or uncertain work into evidence-sized packets.
|
|
214
|
+
|
|
215
|
+
Use Evidence Split at cast time and again during implementation when:
|
|
216
|
+
- the first plan has several uncertain facts,
|
|
217
|
+
- implementation starts coupling several files or concepts,
|
|
218
|
+
- a check fails and the next action is unclear,
|
|
219
|
+
- context is becoming broad or stale,
|
|
220
|
+
- independent verification, review, or handoff would reduce risk.
|
|
221
|
+
|
|
222
|
+
Skip it for tiny, obvious edits where a packet would not change the next action.
|
|
223
|
+
|
|
224
|
+
Packet vocabulary:
|
|
225
|
+
- `probe`: answer one unknown with 1-3 inputs.
|
|
226
|
+
- `patch`: make one narrow implementation change.
|
|
227
|
+
- `verify`: prove one success condition or failure recovery.
|
|
228
|
+
- `review`: inspect one risk, regression, or omission.
|
|
229
|
+
- `decision`: record one branch, chosen option, and evidence.
|
|
230
|
+
|
|
231
|
+
Represent packets in existing run state:
|
|
232
|
+
- `steps.json`: packetized steps and status.
|
|
233
|
+
- `context-map.json`: the input files, sources, or excluded context for each packet.
|
|
234
|
+
- `notes.md`: why work was split or re-split during implementation.
|
|
235
|
+
- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
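The packet vocabulary and its mapping onto run state could look roughly like this. The schemas of `steps.json` and `context-map.json` are not shown in this section, so every field name below (`id`, `kind`, `summary`, `status`, `packet`, `inputs`) is hypothetical; only the five packet kinds come from the text.

```python
PACKET_KINDS = {"probe", "patch", "verify", "review", "decision"}


def make_packet(packet_id: str, kind: str, summary: str, inputs: list[str]) -> tuple[dict, dict]:
    """Build one entry for steps.json and one for context-map.json.

    Field names are illustrative assumptions, not the harness's real schema.
    """
    if kind not in PACKET_KINDS:
        raise ValueError(f"unknown packet kind: {kind}")
    step_entry = {"id": packet_id, "kind": kind, "summary": summary, "status": "pending"}
    context_entry = {"packet": packet_id, "inputs": list(inputs)}
    return step_entry, context_entry
```

For example, `make_packet("p1", "probe", "Is cast_mode read before triage?", ["commands/cast.md"])` yields a step entry for the step list and a context entry that ties the same packet to its single input.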
|
|
236
|
+
|
|
237
|
+
Safety defaults:
|
|
238
|
+
- Do not start workers, tmux panes, worktrees, or external agents automatically.
|
|
239
|
+
- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
|
|
240
|
+
- Do not let multiple packets edit the same file concurrently.
|
|
241
|
+
- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
|
|
242
|
+
- Keep each packet to 1-3 primary inputs when possible.
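Two of the safety defaults are mechanical enough to check in code: no two packets editing the same file concurrently, and 1-3 primary inputs per packet. The sketch below assumes a hypothetical packet shape (`id`, `status`, `inputs`, `edits`) and treats the input guideline as a warning, since the text says "when possible".

```python
from collections import defaultdict


def check_packets(packets: list[dict]) -> list[str]:
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings: list[str] = []
    editors: dict[str, list[str]] = defaultdict(list)
    for packet in packets:
        count = len(packet.get("inputs", []))
        if not 1 <= count <= 3:
            warnings.append(f"{packet['id']}: {count} primary inputs (aim for 1-3)")
        # Only packets that are active at the same time can collide.
        if packet.get("status") == "active":
            for path in packet.get("edits", []):
                editors[path].append(packet["id"])
    for path, ids in sorted(editors.items()):
        if len(ids) > 1:
            warnings.append(f"{path}: edited concurrently by {', '.join(ids)}")
    return warnings
```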
|
|
243
|
+
|
|
163
244
|
Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
|
|
164
245
|
|
|
165
246
|
```json
|
|
@@ -369,6 +450,7 @@ If any of the following is true, the task goes to Lane 3:
|
|
|
369
450
|
- The task description mentions any of the above concepts
|
|
370
451
|
|
|
371
452
|
**Step 2 — Lane decision (only if step 1 finds no hard-gate):**
|
|
453
|
+
If `cast_mode` is `quick`, always select Lane 1 here regardless of task signals.
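The ordering matters here: the hard-gate check in step 1 runs first, and the `quick` override only applies inside step 2. A minimal sketch of that precedence, where `choose_lane` and its parameters are hypothetical names and the `deep` interview is deliberately not modeled:

```python
def choose_lane(cast_mode: str, hard_gate: bool, triaged_lane: int) -> int:
    """Return the lane number (1, 2, or 3) for a task.

    `triaged_lane` is the lane that normal signal-based triage would pick.
    """
    if hard_gate:
        # Step 1: any hard-gate signal sends the task to Lane 3.
        return 3
    if cast_mode == "quick":
        # Step 2 override: quick mode ignores task signals.
        return 1
    return triaged_lane
```

With this ordering, `quick` mode cannot bypass a hard gate; it only short-circuits the signal-based triage.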
|
|
372
454
|
|
|
373
455
|
**Lane 1 — instant start.** Any of these signals, with no contradicting complexity signal:
|
|
374
456
|
- a question, explanation, or lookup with no file edits
|
|
@@ -480,12 +562,13 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
|
|
|
480
562
|
- new pattern not covered yet
|
|
481
563
|
|
|
482
564
|
These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
|
|
483
|
-
6.
|
|
565
|
+
6. Apply the Evidence Split check before choosing harnesses. If it changes the next action, represent the first packets in `steps.json` and connect each packet to context or verification evidence in `context-map.json`. Keep this check lightweight and skip it for tiny work.
|
|
566
|
+
7. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
|
|
484
567
|
- If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
|
|
485
568
|
- If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
|
|
486
569
|
- If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
|
|
487
570
|
- If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
|
|
488
|
-
|
|
571
|
+
8. Consider focused work harnesses only when their trigger is strong enough to change the procedure:
|
|
489
572
|
- Use `issue-triage` for issue/PR/QA intake, ready-for-agent briefs, needs-info/wontfix decisions, or vertical issue slices.
|
|
490
573
|
- Use `bug-diagnosis-loop` for hard bugs, regressions, intermittent failures, or performance problems where a red-capable loop must come before code changes.
|
|
491
574
|
- Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
|
|
@@ -496,7 +579,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
|
|
|
496
579
|
- `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
|
|
497
580
|
- `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 (overlay check) line.
|
|
498
581
|
- The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
|
|
499
|
-
|
|
582
|
+
9. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
|
|
500
583
|
|
|
501
584
|
After selecting, run a quick quality check using the index metadata for each chosen harness:
|
|
502
585
|
- If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
|
|
@@ -504,26 +587,26 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
|
|
|
504
587
|
- If `asks` is empty or missing and the task goal is not self-evident → treat as a Stitch goal-ambiguity signal
|
|
505
588
|
Feed any signals into the Stitch evaluation at step 18.
|
|
506
589
|
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
511
|
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
516
|
-
|
|
517
|
-
|
|
518
|
-
|
|
519
|
-
|
|
590
|
+
10. Add any rule graph check candidates to `contract.json` verification if they are relevant and cheap. For risky commands, set `approval_required: true`.
|
|
591
|
+
11. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
|
|
592
|
+
12. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
|
|
593
|
+
13. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
|
|
594
|
+
14. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
|
|
595
|
+
15. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
|
|
596
|
+
16. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
|
|
597
|
+
17. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
|
|
598
|
+
18. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
|
|
599
|
+
19. Ask for explicit approval before non-trivial work.
|
|
600
|
+
20. After approval, read only the selected harness files and any approved run-only draft.
|
|
601
|
+
21. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `goals.json` for `goal-checkpoint` and `delegation.md` for `delegation-brief`.
|
|
602
|
+
22. Execute the first safe step immediately:
|
|
520
603
|
- inspect relevant files,
|
|
521
604
|
- run a read-only diagnostic,
|
|
522
605
|
- draft the first artifact,
|
|
523
606
|
- or reproduce the issue.
|
|
524
|
-
|
|
525
|
-
|
|
526
|
-
|
|
607
|
+
23. Keep `steps.json`, `notes.md`, `contract.json`, and `session.json` current as work progresses. Re-run Evidence Split when new uncertainty, coupling, failed checks, or context sprawl appears; update packetized steps and context evidence before continuing. When present, keep `goals.json` and `delegation.md` aligned with actual status and evidence. When the Progress display trigger applies, end every response with the progress block.
|
|
608
|
+
24. Before final, run `/tink:verify` behavior for required contract checks or state why verification is blocked.
|
|
609
|
+
25. If the task exposed a repeated mistake or reusable improvement, use the Reusable State Save Gate approval payload below. Save only after separate user approval.
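The synthesis probe thresholds from step 12, and the follow-up actions in steps 13 and 14, can be summarized as a small lookup. This is a sketch under assumptions: the probe questions themselves are defined elsewhere, the count range of 0-5 is inferred from the stated thresholds, and the names `probe_outcome` and `NEXT_ACTION` are hypothetical.

```python
def probe_outcome(yes_count: int, harness_matched: bool = True) -> str:
    """Map the number of 'yes' answers to one of the three probe outcomes."""
    if not 0 <= yes_count <= 5:
        raise ValueError("yes_count must be between 0 and 5")
    if not harness_matched or yes_count >= 4:
        return "no fit"
    if yes_count >= 2:
        return "generic fit"
    return "strong fit"


# Follow-up per outcome, paraphrasing steps 13 and 14.
NEXT_ACTION = {
    "strong fit": "keep the selected harness",
    "generic fit": "propose a run-only draft harness or domain rules; do not save by default",
    "no fit": "load harness-synthesis and draft a domain-specific harness for this run",
}
```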
|
|
527
610
|
|
|
528
611
|
|
|
529
612
|
## Synthesis probe
|
|
@@ -26,23 +26,25 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
|
|
|
26
26
|
6. If `.tink/current/` exists and continuity is uncertain, read `plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, and `contract.json` when present; summarize goal, last safe point, next step, open questions, and verification; then ask resume/archive/replace/cancel before continuing.
|
|
27
27
|
7. Run the synthesis probe before committing to `.tink/current/`. Strong fit keeps the harness; generic fit adds a run-only draft; no fit loads `harness-synthesis`.
|
|
28
28
|
8. If too many tools, skills, agents, or harnesses are available, use `harness-curation` to choose the smallest effective set before loading more context.
|
|
29
|
-
9. Treat
|
|
30
|
-
10.
|
|
31
|
-
11.
|
|
32
|
-
12.
|
|
33
|
-
13.
|
|
34
|
-
14.
|
|
35
|
-
15.
|
|
36
|
-
16.
|
|
37
|
-
17.
|
|
38
|
-
18.
|
|
39
|
-
19.
|
|
40
|
-
20.
|
|
41
|
-
21.
|
|
42
|
-
22.
|
|
43
|
-
23.
|
|
44
|
-
24.
|
|
45
|
-
25.
|
|
29
|
+
9. Treat Evidence Split as a base-run habit, not a harness: for non-trivial work, first ask whether the task should be split into `probe`, `patch`, `verify`, `review`, or `decision` packets. Use it at cast time and again during implementation when uncertainty grows, a check fails, context gets broad, or several changes start to couple. Keep it lightweight for tiny tasks and skip it when it would add ceremony without changing the next action.
|
|
30
|
+
10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
|
|
31
|
+
11. Run Stitch once before committing to `.tink/current/`. Phase A (Blocking): always evaluate and surface when triggered — safety/irreversibility, missing success criteria, goal ambiguity, harness mismatch. Phase B (Plan-shaping): run after Phase A, surface only when a concrete code-grounded alternative exists — minimality, reuse, or deletion/substitution. Never surface Phase B without observed code evidence; never suggest reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements. In `deep` mode, skip Phase B entirely. Show exactly one proposal and use the configured language.
|
|
32
|
+
12. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
|
|
33
|
+
13. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
|
|
34
|
+
14. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
|
|
35
|
+
15. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
|
|
36
|
+
16. Before saving a reusable rule graph update, run a structural gate: duplicate, breadth, evidence, verification, Claude Code/Codex compatibility, macOS/Windows compatibility, and portable commands. AI may propose a rule; saving it still requires separate approval.
|
|
37
|
+
17. `$tink:frog` may inspect rule quality as well as harness quality. Prefer keep, rewrite, split, merge, or needs-evidence recommendations before any removal proposal.
|
|
38
|
+
18. For `$tink:weave` or `$tink:frog`, prepare the harness health summary before ranking candidates. If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root and then read `.tink/maintenance/harness-lifecycle.json`. If the generator is missing, continue from compact run, queue, ledger, and friction evidence.
|
|
39
|
+
19. When `.tink/maintenance/harness-lifecycle.json` or another file following `.tink/schemas/harness-lifecycle.schema.json` exists, treat it as a plain harness health summary. Use `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to prioritize `$tink:weave` or `$tink:frog` candidates, but do not treat it as approval. Low-confidence entries stay as observation. Harness edits, rule updates, memory saves, merges, archives, and deletions still require the reusable-state approval gate.
|
|
40
|
+
20. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`. If selected, also create `.tink/current/goals.json` for `goal-checkpoint` and `.tink/current/delegation.md` for `delegation-brief`. Evidence Split packets live in these run files; do not add a new public command or standalone runtime file for them.
|
|
41
|
+
21. Do not stop at recommendation. Execute the first safe step after run state exists.
|
|
42
|
+
22. Run `$tink:verify` behavior before final when `contract.json` lists required checks. If `.tink/config.json` has `completion_policy: "strict"`, do not call the run done until required checks are represented in `.tink/current/verification.json`, `.tink/current/evidence.md` exists, and remaining risk is stated.
|
|
43
|
+
23. Store reusable memory or rule updates under `.tink/` only after separate approval.
|
|
44
|
+
24. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
|
|
45
|
+
25. Keep context compact. Do not paste raw logs or full diffs.
|
|
46
|
+
26. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
|
|
47
|
+
27. Read `cast_mode` from `.tink/config.json` before classifying the task. If `quick`, force Lane 1 (instant start) unless a hard-gate signal is present. If `deep`, run the structured interview before harness selection. Round 0: present the inferred topology and confirm it with the user. Rounds 1–10 (maximum): ask one question per round, targeting the weakest clarity dimension among goal (0.35 weight), constraint (0.25), success criteria (0.25), and context (0.15). Investigate brownfield code before asking, and do not ask what is already visible. Show `[Round N/10 ████░░░░░░░░░░░░░░░░]` with each question. Allow early exit from Round 3 onward. Shift to Contrarian questioning when goal and constraint are clear, then to Simplifier when those resolve. End by writing Goal, Topology, Constraints, Success Criteria, and Open Questions to `.tink/current/plan.md`, then proceed to harness selection with Stitch Phase A only.
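The weights and the progress indicator in rule 27 can be sketched as follows. The rule gives the weights but not how they are combined, so picking the dimension with the largest weighted clarity gap is an assumption, as are the 0-to-1 clarity scores, the function names, and the linear fill of the 20-cell bar.

```python
WEIGHTS = {"goal": 0.35, "constraint": 0.25, "success criteria": 0.25, "context": 0.15}


def weakest_dimension(clarity: dict[str, float]) -> str:
    """Pick the dimension with the largest weighted gap.

    `clarity` maps each dimension to a score in [0, 1]; missing entries count as 0.
    """
    return max(WEIGHTS, key=lambda dim: WEIGHTS[dim] * (1.0 - clarity.get(dim, 0.0)))


def progress_bar(round_no: int, total: int = 10, width: int = 20) -> str:
    """Render the round indicator; the linear fill is an assumption."""
    filled = round(width * round_no / total)
    return f"[Round {round_no}/{total} " + "█" * filled + "░" * (width - filled) + "]"
```

Under this reading, a moderately unclear goal can outrank a very unclear context because of its higher weight, which matches the intent of asking about the dimension that matters most first.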
|
|
46
48
|
|
|
47
49
|
## Codex Approval Protocol
|
|
48
50
|
|
|
@@ -120,6 +122,39 @@ Optional current-run artifacts:
|
|
|
120
122
|
- `.tink/current/goals.json`: create only when `goal-checkpoint` is selected. Keep 2-6 goals, one active goal, status, done criteria, verification, evidence, and next action.
|
|
121
123
|
- `.tink/current/delegation.md`: create only when `delegation-brief` is selected. Include packet scope, forbidden actions, expected evidence, and reconciliation notes. Do not start tmux panes, worktrees, workers, or external agents from this harness.
|
|
122
124
|
|
|
125
|
+
## Evidence Split
|
|
126
|
+
|
|
127
|
+
Evidence Split is Tink's default way to keep real work small while it is happening. It is not a separate harness and it does not imply parallel execution.
|
|
128
|
+
|
|
129
|
+
Use Evidence Split when a task is non-trivial and any of these signals appears:
|
|
130
|
+
- the first plan has several uncertain facts,
|
|
131
|
+
- implementation starts coupling several files or concepts,
|
|
132
|
+
- a check fails and the next action is unclear,
|
|
133
|
+
- context is becoming broad or stale,
|
|
134
|
+
- independent verification, review, or handoff would reduce risk.
|
|
135
|
+
|
|
136
|
+
Skip it for tiny, obvious edits where a packet would not change the next action.
|
|
137
|
+
|
|
138
|
+
Packet vocabulary:
|
|
139
|
+
- `probe`: answer one unknown with 1-3 inputs.
|
|
140
|
+
- `patch`: make one narrow implementation change.
|
|
141
|
+
- `verify`: prove one success condition or failure recovery.
|
|
142
|
+
- `review`: inspect one risk, regression, or omission.
|
|
143
|
+
- `decision`: record one branch, chosen option, and evidence.
|
|
144
|
+
|
|
145
|
+
Represent packets in existing run state:
|
|
146
|
+
- `steps.json`: packetized steps and status.
|
|
147
|
+
- `context-map.json`: the input files, sources, or excluded context for each packet.
|
|
148
|
+
- `notes.md`: why work was split or re-split during implementation.
|
|
149
|
+
- `delegation.md`: only when `delegation-brief` is selected or another human/agent packet is explicitly needed.
|
|
150
|
+
|
|
151
|
+
Safety defaults:
|
|
152
|
+
- Do not start workers, tmux panes, worktrees, or external agents automatically.
|
|
153
|
+
- Packet outputs are evidence, risks, recommendations, or patch candidates by default; direct edits require the main agent's normal approval and ownership.
|
|
154
|
+
- Do not let multiple packets edit the same file concurrently.
|
|
155
|
+
- Keep secrets, public contracts, broad refactors, release/publish actions, and final reconciliation under the main agent's control.
|
|
156
|
+
- Keep each packet to 1-3 primary inputs when possible.
|
|
157
|
+
|
|
123
158
|
GJC-style harness selection rules:
|
|
124
159
|
|
|
125
160
|
- Ambiguous ideas, early product concepts, vague bug reports, broad "make it better" requests, and underspecified implementation prompts should start with `requirements-interview`, usually alone until the user clarifies enough to plan or code.
|