tink-harness 1.14.0 → 1.15.0

@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "tink",
3
3
  "description": "A small harness layer for Claude Code and Codex.",
4
- "version": "1.14.0",
4
+ "version": "1.15.0",
5
5
  "author": {
6
6
  "name": "dotori"
7
7
  }
package/CHANGELOG.md CHANGED
@@ -2,6 +2,17 @@
2
2
 
3
3
  All notable changes to Tink are tracked here.
4
4
 
5
+ ## Unreleased
6
+
7
+ - Added a geobench product spec and runbook for measuring Tink's LLM answer visibility with hit rate, MRR, share of voice, and citation metrics. The runbook keeps benchmark execution separate from this repo and says to publish aggregate metrics only.
8
+
9
+ ## [1.15.0] - 2026-06-24
10
+
11
+ - Added cast mode system: `/tink:cast` now supports three modes: `quick` (forces Lane 1 fast path), `standard` (default, auto triage), and `deep` (structured interview before planning). The active mode is persisted in `.tink/config.json` as `cast_mode`. `/tink:cast <mode>` sets the mode; `/tink:cast` called without a task shows the current mode and offers a change option.
12
+ - Added `deep` mode interview pipeline: Round 0 topology lock confirms inferred components before questions start; Rounds 1–10 ask one question per round with a `[Round N/10 ████░░░]` progress indicator, target the weakest clarity dimension (goal/constraint/success criteria/context), investigate brownfield code before asking, handle counter-questions and clarification requests within the same round, allow early exit from Round 3+, and shift from Contrarian to Simplifier questioning as clarity improves. The interview produces a Goal/Topology/Constraints/Success Criteria/Open Questions spec written to `plan.md` before harness selection begins.
13
+ - Upgraded Stitch to Phase A / Phase B: Phase A (Blocking — safety, missing success criteria, goal ambiguity, harness mismatch) always runs and always surfaces when triggered. Phase B (Plan-shaping — minimality, reuse, deletion/substitution) runs only when a concrete code-grounded alternative exists and is skipped entirely in `deep` mode. Phase B never suggests reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements.
14
+ - Codex: Rule 27 added for `cast_mode` and `deep` mode behavior; Rule 11 updated for Stitch Phase A/B.
15
+
5
16
  ## [1.14.0] - 2026-06-19
6
17
 
7
18
  - Added `CLAUDE_CONFIG_DIR` support: global installs now respect the env var (set via direnv or shell) so commands and skills land in the right config directory instead of always defaulting to `~/.claude`.
@@ -18,7 +29,6 @@ All notable changes to Tink are tracked here.
18
29
  - Added evidence lifecycle manager groundwork: `/tink:verify` now records a human-readable `.tink/current/evidence.md` summary card, config includes a `completion_policy` field for optional strict "no evidence, no done" behavior, and the dashboard lifecycle summary now exposes ROI hints, trust levels, and Activity-tab run review cards for failed or blocked runs without adding a new public replay command.
19
30
  - Fixed: `npx tink-harness update` now prefers the current repo when `.tink/` exists there, so a global/home install scope no longer redirects update tests or repo-local updates away from the current project. Stored `git_policy` is still respected.
20
31
  - Improved: the Activity dashboard cards were checked in desktop and mobile Chrome headless screenshots, with narrower mobile layout and shorter run-review fallback copy so the new evidence cards stay readable.
21
-
22
32
  ## [1.11.2] - 2026-06-13
23
33
 
24
34
  - Fixed: the 3D harness map showed no connections or signal pulses on fresh installs (or installs whose history was lost to the pre-1.11.0 record-wipe bug). The lifecycle summary's graph was built only from run/ledger evidence; it now also includes the static rule graph - every routing rule connects to its harness, and check/guard chains render - so the map is alive from the first open.
package/README.ko.md CHANGED
@@ -10,7 +10,7 @@ Tink는 사소하지 않은 모든 에이전트 작업을 눈에 보이는 파
10
10
 
11
11
  <sub>Claude Code와 Codex를 위한 작은 하네스 레이어</sub>
12
12
 
13
- **최신 패키지:** v1.14.0 - 글로벌 설치 시 `CLAUDE_CONFIG_DIR` 환경변수를 반영하고, `update --all-repos`로 하위 모든 Tink 레포를 한 번에 업데이트할 수 있게 됐습니다. direnv가 있으면 레포별 `.envrc`를 자동으로 로드합니다. 전체 변경 이력은 [CHANGELOG](CHANGELOG.md)를 확인하세요.
13
+ **최신 패키지:** v1.15.0 - `/tink:cast`에 세 가지 모드(quick / standard / deep)가 생겼습니다. deep 모드는 계획 전에 최대 10라운드 인터뷰를 진행하고, Stitch는 Phase A(차단) · Phase B(계획 조정)로 나뉩니다. 전체 변경 이력은 [CHANGELOG](CHANGELOG.md)를 확인하세요.
14
14
 
15
15
  [English](README.md) · **한국어** · [변경 이력](CHANGELOG.md)
16
16
 
@@ -125,6 +125,18 @@ npx tink-harness dashboard # 파일만 만들려면 --no-open 추가
125
125
 
126
126
  ---
127
127
 
128
+ ## GEO 노출도 측정
129
+
130
+ Tink에는 LLM 답변에서 Tink가 얼마나 자주 언급되고, 어느 순위로 추천되며, 어떤 출처로 인용되는지 측정하기 위한 geobench 제품 스펙이 포함되어 있습니다.
131
+
132
+ - Spec: [`geobench/tink-harness.yaml`](geobench/tink-harness.yaml)
133
+ - Runbook: [`docs/geobench.md`](docs/geobench.md)
134
+ - 지표: hit rate, MRR, share of voice, citation rate/share, confidence interval
135
+
136
+ 벤치마크 결과는 집계 지표만 공개하세요. 원문 provider 답변, 시크릿, 개인 실행 로그는 공개하지 않습니다.
137
+
138
+ ---
139
+
128
140
  ## 왜 만들었나
129
141
 
130
142
  새로운 AI 코딩 하네스와 워크플로는 계속 늘어납니다. 좋은 것도 많지만, 여러 개를 섞다 보면 환경이 무거워지고 매번 다시 정리해야 합니다.
@@ -228,6 +240,7 @@ Tink가 아는 모든 것은 직접 읽고, diff 보고, 지울 수 있는 평
228
240
  - 하네스 건강 요약: `docs/harness-lifecycle-signals.ko.md`, `docs/harness-lifecycle-signals.md`
229
241
  - 외부 context 안전: `docs/mcp-safe-profile.md`, `docs/external-context-policy.md`
230
242
  - `.tink/current/` 상태 읽기: `docs/work-state.ko.md`, `docs/work-state.md`
243
+ - GEO 노출도 벤치마크: `docs/geobench.md` · spec: `geobench/tink-harness.yaml`
231
244
  - 업데이트 안정화: `docs/phase-5-update-confidence.ko.md`, `docs/phase-5-update-confidence.md`
232
245
  - Context 효율: `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-threshold-status.ko.md`, `docs/context-threshold-status.md`, `docs/context-run-record-policy.ko.md`, `docs/context-run-record-policy.md`
233
246
  - 남은 작업 단위: `docs/planned-work-units.ko.md`, `docs/planned-work-units.md` · 로드맵·아이디어 점검: `docs/tink-idea-implementation-plan.ko.md`
package/README.md CHANGED
@@ -17,14 +17,14 @@
17
17
  <p><sub>A small harness layer for Claude Code and Codex</sub></p>
18
18
 
19
19
  <p>
20
- <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.13.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
20
+ <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.14.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
21
21
  <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
22
22
  <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
23
23
  <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
24
24
  <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
25
25
  </p>
26
26
 
27
- <p><strong>Latest package:</strong> v1.14.0 - Tink respects <code>CLAUDE_CONFIG_DIR</code> for global installs and adds <code>update --all-repos</code> to refresh every Tink-installed repo in one command, with direnv support for per-repo env overrides. See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
27
+ <p><strong>Latest package:</strong> v1.15.0 - <code>/tink:cast</code> now supports three modes (quick / standard / deep); deep mode runs a structured 10-round interview before planning, and Stitch is split into Phase A (blocking) and Phase B (plan-shaping). See <a href="CHANGELOG.md">CHANGELOG</a> for release history.</p>
28
28
 
29
29
  **English** · [한국어](README.ko.md) · [Changelog](CHANGELOG.md)
30
30
 
@@ -139,6 +139,18 @@ No server, no telemetry, no hidden cache - it is a static local page that only p
139
139
 
140
140
  ---
141
141
 
142
+ ## Measure GEO visibility
143
+
144
+ Tink includes a geobench product spec so maintainers can measure how often LLM answers mention, rank, and cite Tink across providers.
145
+
146
+ - Spec: [`geobench/tink-harness.yaml`](geobench/tink-harness.yaml)
147
+ - Runbook: [`docs/geobench.md`](docs/geobench.md)
148
+ - Metrics: hit rate, MRR, share of voice, citation rate/share, and confidence intervals
149
+
150
+ Use the benchmark for aggregate visibility checks only. Do not publish raw provider answers, secrets, or private run logs.
151
+
152
+ ---
153
+
142
154
  ## Why I made this
143
155
 
144
156
  *Tink is <strong>knit</strong> in reverse: untying tangled workflows and knitting better ones back together. It also nods to Tinker Bell, the small helper at your side.*
@@ -282,6 +294,7 @@ The dashboard is a static local page rendered from those files — the harness h
282
294
  - Harness health summary: `docs/harness-lifecycle-signals.md`, `docs/harness-lifecycle-signals.ko.md`
283
295
  - External context safety: `docs/mcp-safe-profile.md`, `docs/external-context-policy.md`
284
296
  - Reading `.tink/current/` state: `docs/work-state.md`, `docs/work-state.ko.md`
297
+ - GEO visibility benchmark: `docs/geobench.md` · spec: `geobench/tink-harness.yaml`
285
298
  - Update confidence: `docs/phase-5-update-confidence.md`, `docs/phase-5-update-confidence.ko.md`
286
299
  - Context efficiency: `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-threshold-status.md`, `docs/context-threshold-status.ko.md`, `docs/context-run-record-policy.md`, `docs/context-run-record-policy.ko.md`
287
300
  - Planned work units: `docs/planned-work-units.md`, `docs/planned-work-units.ko.md` · roadmap and idea audit: `docs/tink-idea-implementation-plan.ko.md`
package/VERSIONING.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Versioning
2
2
 
3
- Current version: `1.14.0`
3
+ Current version: `1.15.0`
4
4
 
5
5
  Tink follows semver from `1.0.0` onward.
6
6
 
package/commands/cast.md CHANGED
@@ -32,6 +32,14 @@ A valid `/tink:cast` response must do one of these:
32
32
 
33
33
  If the task is clear enough to classify, do not ask broad clarification first. Make a best recommendation, ask for approval, then act.
34
34
 
35
+ ## Cast mode
36
+ `/tink:cast` without a task argument shows the current mode and offers a change option. `/tink:cast <mode>` sets the mode and saves it to `cast_mode` in `.tink/config.json`.
37
+
38
+ Modes:
39
+ - `quick` — Forces Lane 1 fast path regardless of task complexity. Skips harness selection and starts immediately.
40
+ - `standard` — Default behavior. Quick triage selects the right lane automatically.
41
+ - `deep` — Runs a structured interview before planning. See **Deep mode** below.
42
+
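A minimal sketch of how the persisted mode might be read and written, assuming `.tink/config.json` is a flat JSON object. The helper names `read_cast_mode` and `write_cast_mode` are hypothetical, and falling back to `standard` on a missing or unrecognized value is an assumption rather than documented Tink behavior.

```python
import json
from pathlib import Path

VALID_MODES = ("quick", "standard", "deep")  # the three modes listed above
DEFAULT_MODE = "standard"                    # documented default mode


def read_cast_mode(config_path: Path) -> str:
    """Return the persisted cast mode, or the default if it is missing or unknown."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return DEFAULT_MODE
    if not isinstance(config, dict):
        return DEFAULT_MODE
    mode = config.get("cast_mode", DEFAULT_MODE)
    # Assumption: unknown values silently fall back to the default.
    return mode if mode in VALID_MODES else DEFAULT_MODE


def write_cast_mode(config_path: Path, mode: str) -> None:
    """Persist a new mode while leaving every other config key untouched."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown cast mode: {mode!r}")
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config["cast_mode"] = mode
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

Under these assumptions, `write_cast_mode(Path(".tink/config.json"), "deep")` would change only the `cast_mode` key.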
35
43
  ## Interaction policy
36
44
  Always call the `AskUserQuestion` tool for choice prompts. Do not render `❯` text format. Do not ask the user to type a number inline.
37
45
 
@@ -79,12 +87,21 @@ When Stitch is visible, show exactly one proposal in this order: proposal, reaso
79
87
  2. reason
80
88
  3. choices
81
89
 
82
- Choose the one proposal by priority:
83
- 1. safety or irreversibility
84
- 2. success criteria or verification
85
- 3. goal or scope ambiguity
86
- 4. harness mismatch
87
- 5. reusable improvement opportunity
90
+ **Phase A Blocking checks** (always run; always surface when triggered):
91
+ 1. Safety or irreversibility
92
+ 2. Missing success criteria or verification
93
+ 3. Goal or scope ambiguity
94
+ 4. Harness mismatch
95
+
96
+ **Phase B — Plan-shaping checks** (run after Phase A; surface only when a concrete code-grounded alternative exists):
97
+ 5. Minimality — is the plan larger than the request warrants? Are new files, abstractions, or dependencies justified?
98
+ 6. Reuse — does an existing helper, pattern, or flow already solve this?
99
+ 7. Deletion/substitution — can the addition be replaced with deleting, configuring, or extending an existing path?
100
+
101
+ Phase B proposal rules:
102
+ - Never surface Phase B without a concrete alternative grounded in observed code or project state. "This looks large, consider simplifying" is not a valid finding.
103
+ - Never suggest reducing: trust boundary input validation, data loss prevention, security measures, accessibility basics, or explicitly requested requirements.
104
+ - In `deep` mode, skip Phase B entirely — the interview already covered minimality and reuse.
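The Phase A / Phase B rules above could be sketched roughly as follows. This is an illustration only: the check identifiers, the `Finding` type, and `pick_stitch_proposal` are hypothetical names, and treating the numbering 1 to 7 as a priority order is an assumption carried over from the 1.14.0 wording ("Choose the one proposal by priority").

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifiers for the numbered checks, in their listed order.
PHASE_A = ("safety", "success_criteria", "goal_ambiguity", "harness_mismatch")
PHASE_B = ("minimality", "reuse", "deletion_substitution")


@dataclass
class Finding:
    check: str                         # one of the PHASE_A or PHASE_B names
    alternative: Optional[str] = None  # concrete, code-grounded alternative (Phase B)


def pick_stitch_proposal(findings: List[Finding], cast_mode: str) -> Optional[Finding]:
    """Return the single proposal to surface, or None when nothing qualifies."""
    by_check = {finding.check: finding for finding in findings}
    # Phase A: always evaluated, always surfaced when triggered.
    for check in PHASE_A:
        if check in by_check:
            return by_check[check]
    # Phase B is skipped entirely in deep mode.
    if cast_mode == "deep":
        return None
    for check in PHASE_B:
        finding = by_check.get(check)
        # A Phase B finding without a concrete alternative is not valid.
        if finding is not None and finding.alternative:
            return finding
    return None
```

The guardrail list (trust-boundary validation, data-loss prevention, security, accessibility, explicit requirements) is not modeled here; it constrains what a Phase B alternative may propose rather than which check is selected.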
88
105
 
89
106
  Stitch may change the order or method of work, but it must not change the user's goal without separate approval.
90
107
 
@@ -114,6 +131,38 @@ If the user chooses `Continue as-is` / `이대로 진행`, proceed with the expl
114
131
 
115
132
  Do not record a clean Stitch pass.
116
133
 
134
+ ## Deep mode
135
+ When `cast_mode` is `deep`, run a structured interview before the normal Procedure. The interview refines the task into a spec that feeds harness selection.
136
+
137
+ **Round 0 — Topology lock** (not counted in progress)
138
+ Before asking any questions, present the high-level components Claude infers from the request and visible codebase context. Ask the user to confirm, add, remove, or merge components. This prevents deep focus on one component from obscuring others.
139
+
140
+ **Interview loop — Rounds 1–10**
141
+ Show a progress indicator at the start of each question:
142
+
143
+ ```
144
+ [Round N/10 ████████░░░░░░░░░░░░]
145
+ ```
146
+
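One way to render the indicator, as a sketch. The 20-cell width matches the sample bar above; filling the bar in proportion to the round number is an assumption, and `round_progress` is a hypothetical helper name.

```python
def round_progress(round_number: int, total_rounds: int = 10, width: int = 20) -> str:
    """Render a '[Round N/10 ...]' indicator with a proportional block bar."""
    if not 1 <= round_number <= total_rounds:
        raise ValueError("round_number must be between 1 and total_rounds")
    # Integer arithmetic avoids float rounding surprises.
    filled = width * round_number // total_rounds
    bar = "█" * filled + "░" * (width - filled)
    return f"[Round {round_number}/{total_rounds} {bar}]"
```

With these defaults each round fills two cells, so Round 4 produces the same eight filled cells as the sample.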
147
+ Rules:
148
+ - Ask one question per round. Never ask multiple questions in one round.
149
+ - Target the weakest clarity dimension each round: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). These weights are internal judgment guides, not computed scores. Always pick the dimension where ambiguity most limits the next action.
150
+ - Brownfield rule: investigate the codebase before asking. Do not ask about things already visible in the code. Confirm findings rather than ask from scratch.
151
+ - Counter-question (user answers but also asks a question back): answer the counter-question first, then treat the combined response as this round's answer. Round counter does not advance.
152
+ - Clarification request (user does not understand the question): rephrase and re-ask within the same round. Round counter does not advance.
153
+ - Round 3+: user may exit the interview early and proceed directly to spec generation.
154
+ - Round 10: hard cap. End the interview and produce the spec regardless of ambiguity.
155
+ - End early when goal, constraint, and success criteria are all sufficiently clear, without waiting for Round 10.
156
+
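The round-counting rules above can be sketched as a small state holder. The `Interview` class and its event names are hypothetical; the sketch follows the stated rules literally (counter-questions and clarification requests do not advance the counter, early exit is offered from Round 3, Round 10 is a hard cap) and leaves the choice of which clarity dimension to probe to judgment, as the rules require.

```python
from dataclasses import dataclass

MAX_ROUNDS = 10      # hard cap from the rules above
EARLY_EXIT_FROM = 3  # first round at which the user may exit early


@dataclass
class Interview:
    round_number: int = 1
    finished: bool = False

    def step(self, event: str, clear_enough: bool = False) -> None:
        """Apply one user reply: 'answer', 'counter_question', 'clarification', or 'exit'."""
        if self.finished:
            raise RuntimeError("interview already finished")
        if event in ("counter_question", "clarification"):
            return  # handled within the same round; the counter does not advance
        if event == "exit":
            if self.round_number < EARLY_EXIT_FROM:
                raise ValueError("early exit is only offered from Round 3")
            self.finished = True
            return
        if event != "answer":
            raise ValueError(f"unknown event: {event!r}")
        # End when clarity is sufficient or the hard cap is reached.
        if clear_enough or self.round_number >= MAX_ROUNDS:
            self.finished = True
        else:
            self.round_number += 1
```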
157
+ **Question mode shift** (triggered by clarity state, not round number):
158
+ - When goal and constraint are sufficiently clear → shift to Contrarian mode: "What if the opposite were true? What if this assumption is wrong?"
159
+ - When those are also resolved → shift to Simplifier mode: "What is the smallest version that still has meaningful value?"
160
+
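A compact sketch of the mode shift, driven by clarity state rather than round number. The function name, the `"clarifying"` label for the initial mode, and the reading of "those are also resolved" as "the challenged assumptions are settled" are all assumptions made for illustration.

```python
def question_mode(goal_clear: bool, constraint_clear: bool, assumptions_settled: bool) -> str:
    """Pick the questioning style from the current clarity state."""
    if not (goal_clear and constraint_clear):
        return "clarifying"   # hypothetical label for the default style
    if not assumptions_settled:
        return "contrarian"   # "What if this assumption is wrong?"
    return "simplifier"       # "What is the smallest version that still has value?"
```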
161
+ **Spec → plan.md → harness selection**
162
+ When the interview ends, write `.tink/current/plan.md` with these top-level sections: Goal, Topology, Constraints, Success Criteria, Open Questions.
163
+
164
+ Then proceed to the normal Procedure starting at step 3 (read harness index). Use the spec as the harness selection input instead of the raw task request. Stitch Phase A runs after harness selection as normal. Phase B is skipped.
165
+
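For illustration, the spec layout could be rendered like this. The five section names come from the text above; the `##` heading level, the bullet format, the placeholder for empty sections, and the `render_plan` helper are assumptions.

```python
from typing import Dict, List

SECTIONS = ("Goal", "Topology", "Constraints", "Success Criteria", "Open Questions")


def render_plan(spec: Dict[str, List[str]]) -> str:
    """Render the interview spec as Markdown with the five sections in fixed order."""
    lines: List[str] = []
    for title in SECTIONS:
        lines.append(f"## {title}")
        items = spec.get(title, [])
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            lines.append("- (none recorded)")  # assumption: keep empty sections visible
        lines.append("")
    return "\n".join(lines)
```

The result would then be written to `.tink/current/plan.md` and used as the harness selection input in place of the raw task request.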
117
166
  ## Reusable State Save Gate
118
167
  Reusable State Save Gate is a separate absolute hard approval gate, not merely a Stitch subtype. Current-run approval does not authorize reusable-state writes.
119
168
 
@@ -401,6 +450,7 @@ If any of the following is true, the task goes to Lane 3:
401
450
  - The task description mentions any of the above concepts
402
451
 
403
452
  **Step 2 — Lane decision (only if step 1 finds no hard-gate):**
453
+ If `cast_mode` is `quick`, always select Lane 1 here regardless of task signals.
404
454
 
405
455
  **Lane 1 — instant start.** Any of these signals, with no contradicting complexity signal:
406
456
  - a question, explanation, or lookup with no file edits
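The ordering of the two steps matters: the hard-gate check runs first, so `quick` mode only overrides the ordinary lane triage. A minimal sketch, in which `choose_lane` and its `triage_lane` argument (the lane the normal signals would have selected) are hypothetical:

```python
def choose_lane(cast_mode: str, hard_gate: bool, triage_lane: int) -> int:
    """Return the lane number for a task under the two-step decision above."""
    if hard_gate:
        return 3            # Step 1: any hard-gate signal sends the task to Lane 3
    if cast_mode == "quick":
        return 1            # Step 2: quick mode forces Lane 1 regardless of signals
    return triage_lane      # otherwise keep the lane chosen by normal triage
```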
package/docs/geobench.md ADDED
@@ -0,0 +1,29 @@
1
+ # GEO Benchmark For Tink
2
+
3
+ This repository includes a [`geobench`](https://github.com/NomaDamas/geobench) product spec for measuring LLM answer visibility: hit rate, MRR, share of voice, citation rate/share, and confidence intervals.
4
+
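As a rough guide to what these metrics mean, the sketch below uses the common textbook definitions. It is not geobench's implementation, whose exact formulas may differ; the function names and input shapes are hypothetical. Each entry in `ranks` is the 1-based position at which the product appears in one answer, or `None` when it is absent.

```python
from typing import Dict, List, Optional


def hit_rate(ranks: List[Optional[int]]) -> float:
    """Fraction of answers that mention the product at all."""
    return sum(rank is not None for rank in ranks) / len(ranks)


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Average of 1/rank, counting answers without a mention as 0."""
    return sum(1.0 / rank for rank in ranks if rank is not None) / len(ranks)


def share_of_voice(mention_counts: Dict[str, int], product: str) -> float:
    """Product mentions divided by mentions of all tracked products."""
    total = sum(mention_counts.values())
    return mention_counts.get(product, 0) / total if total else 0.0
```

For four answers with ranks `[1, None, 2, 4]`, the hit rate is 0.75 and the MRR is (1 + 0.5 + 0.25) / 4 = 0.4375.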
5
+ Product spec: [`geobench/tink-harness.yaml`](../geobench/tink-harness.yaml)
6
+
7
+ ## Run
8
+
9
+ Use a local checkout or install of `geobench`; do not commit `.env`, raw run logs, or provider responses.
10
+
11
+ ```bash
12
+ /path/to/geobench/dist/geobench estimate --product geobench/tink-harness.yaml --providers openai --tier cheap
13
+ /path/to/geobench/dist/geobench profile geobench/tink-harness.yaml
14
+ /path/to/geobench/dist/geobench bench --product geobench/tink-harness.yaml --providers openai --tier cheap --mode benchmark
15
+ ```
16
+
17
+ To inspect results locally:
18
+
19
+ ```bash
20
+ /path/to/geobench/dist/geobench dash
21
+ ```
22
+
23
+ ## Publishing Boundary
24
+
25
+ Publish aggregate metrics only. Do not publish raw provider answers, secrets, private run logs, or `.env` values. When citing results, include the run date, provider set, tier, query count, and whether the spec was profiled before the run.
26
+
27
+ ## Korean Summary
28
+
29
+ 이 repo에는 Tink의 LLM 답변 노출도를 측정하기 위한 geobench 제품 스펙이 포함되어 있습니다. 실행 결과를 공개할 때는 hit rate, MRR, share of voice, citation rate/share 같은 집계 지표만 공개하고, 원문 provider 답변·시크릿·개인 실행 로그는 공개하지 않습니다.
package/geobench/tink-harness.yaml ADDED
@@ -0,0 +1,47 @@
1
+ name: "Tink"
2
+ aliases: ["Tink Harness", "tink-harness", "Tink for Claude Code", "Tink for Codex"]
3
+ romanizations: []
4
+ category: "coding-agent harness and visible workflow runtime"
5
+ description: "Tink is a small harness layer for Claude Code and Codex that keeps non-trivial agent work in visible files: task contracts, run state, verification checks, run records, and approval-gated reusable harnesses."
6
+ competitors: ["Gajae-Code", "Claude Code", "Codex CLI", "OpenCode", "Aider", "Cursor"]
7
+ cited_domains: ["github.com/dotoricode/tink-harness", "npmjs.com/package/tink-harness"]
8
+ target_languages: ["en", "ko"]
9
+ target_audience:
10
+ - "developers using Claude Code or Codex for multi-step coding work"
11
+ - "maintainers who want visible run state and verification evidence"
12
+ - "AI-agent workflow builders comparing harness and memory approaches"
13
+ discovery_sources:
14
+ - "https://github.com/dotoricode/tink-harness"
15
+ - "https://www.npmjs.com/package/tink-harness"
16
+
17
+ enriched_profile:
18
+ generated_at: "2026-06-15T00:00:00Z"
19
+ profiler_model: "curated-public-source"
20
+ value_proposition: "A local, approval-gated harness layer that makes Claude Code and Codex work inspectable through task contracts, current-run files, verification evidence, run records, and reusable workflow harnesses."
21
+ source_content_hashes:
22
+ - "curated-public-source"
23
+ target_audience:
24
+ - segment: "coding-agent operators"
25
+ pains:
26
+ - "Need visible run state instead of relying on hidden chat memory"
27
+ - "Need repeatable verification and approval boundaries for multi-step agent work"
28
+ - segment: "open-source maintainers"
29
+ pains:
30
+ - "Need to compare discoverability against adjacent coding-agent tools"
31
+ - "Need citation and share-of-voice evidence before changing public positioning"
32
+ use_cases:
33
+ - problem_statement: "When a developer uses Claude Code or Codex for multi-step work and needs visible task contracts, plans, checks, and run records instead of hidden chat memory."
34
+ audience: "developers using coding agents"
35
+ evidence_quotes: ["visible files", "task contract", "run state", "verification steps"]
36
+ confidence: 0.86
37
+ language: "en"
38
+ - problem_statement: "When a maintainer wants reusable coding-agent workflows that are saved only after explicit approval and can be inspected, diffed, and committed."
39
+ audience: "open-source maintainers"
40
+ evidence_quotes: ["reusable harnesses", "saved only after your approval", "open, diff, and commit"]
41
+ confidence: 0.84
42
+ language: "en"
43
+ - problem_statement: "Claude Code나 Codex 작업 사이에서 맥락이 사라지지 않도록 실행 상태, 검증 단계, 승인 기반 하네스를 파일로 남기고 싶을 때."
44
+ audience: "Korean-speaking coding-agent users"
45
+ evidence_quotes: ["작업 계약", "실행 상태", "검증 단계", "명시적 승인"]
46
+ confidence: 0.82
47
+ language: "ko"
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tink-harness",
3
- "version": "1.14.0",
3
+ "version": "1.15.0",
4
4
  "description": "Self-growing harnesses for Claude Code and Codex.",
5
5
  "license": "MIT",
6
6
  "type": "module",
@@ -14,6 +14,7 @@
14
14
  "commands/",
15
15
  "skills/",
16
16
  "hooks/",
17
+ "geobench/",
17
18
  "docs/*.md",
18
19
  "docs/pr/",
19
20
  "README.md",
@@ -32,6 +32,14 @@ A valid `/tink:cast` response must do one of these:
32
32
 
33
33
  If the task is clear enough to classify, do not ask broad clarification first. Make a best recommendation, ask for approval, then act.
34
34
 
35
+ ## Cast mode
36
+ `/tink:cast` without a task argument shows the current mode and offers a change option. `/tink:cast <mode>` sets the mode and saves it to `cast_mode` in `.tink/config.json`.
37
+
38
+ Modes:
39
+ - `quick` — Forces Lane 1 fast path regardless of task complexity. Skips harness selection and starts immediately.
40
+ - `standard` — Default behavior. Quick triage selects the right lane automatically.
41
+ - `deep` — Runs a structured interview before planning. See **Deep mode** below.
42
+
35
43
  ## Interaction policy
36
44
  Always call the `AskUserQuestion` tool for choice prompts. Do not render `❯` text format. Do not ask the user to type a number inline.
37
45
 
@@ -79,12 +87,21 @@ When Stitch is visible, show exactly one proposal in this order: proposal, reaso
79
87
  2. reason
80
88
  3. choices
81
89
 
82
- Choose the one proposal by priority:
83
- 1. safety or irreversibility
84
- 2. success criteria or verification
85
- 3. goal or scope ambiguity
86
- 4. harness mismatch
87
- 5. reusable improvement opportunity
90
+ **Phase A Blocking checks** (always run; always surface when triggered):
91
+ 1. Safety or irreversibility
92
+ 2. Missing success criteria or verification
93
+ 3. Goal or scope ambiguity
94
+ 4. Harness mismatch
95
+
96
+ **Phase B — Plan-shaping checks** (run after Phase A; surface only when a concrete code-grounded alternative exists):
97
+ 5. Minimality — is the plan larger than the request warrants? Are new files, abstractions, or dependencies justified?
98
+ 6. Reuse — does an existing helper, pattern, or flow already solve this?
99
+ 7. Deletion/substitution — can the addition be replaced with deleting, configuring, or extending an existing path?
100
+
101
+ Phase B proposal rules:
102
+ - Never surface Phase B without a concrete alternative grounded in observed code or project state. "This looks large, consider simplifying" is not a valid finding.
103
+ - Never suggest reducing: trust boundary input validation, data loss prevention, security measures, accessibility basics, or explicitly requested requirements.
104
+ - In `deep` mode, skip Phase B entirely — the interview already covered minimality and reuse.
88
105
 
89
106
  Stitch may change the order or method of work, but it must not change the user's goal without separate approval.
90
107
 
@@ -114,6 +131,38 @@ If the user chooses `Continue as-is` / `이대로 진행`, proceed with the expl
114
131
 
115
132
  Do not record a clean Stitch pass.
116
133
 
134
+ ## Deep mode
135
+ When `cast_mode` is `deep`, run a structured interview before the normal Procedure. The interview refines the task into a spec that feeds harness selection.
136
+
137
+ **Round 0 — Topology lock** (not counted in progress)
138
+ Before asking any questions, present the high-level components Claude infers from the request and visible codebase context. Ask the user to confirm, add, remove, or merge components. This prevents deep focus on one component from obscuring others.
139
+
140
+ **Interview loop — Rounds 1–10**
141
+ Show a progress indicator at the start of each question:
142
+
143
+ ```
144
+ [Round N/10 ████████░░░░░░░░░░░░]
145
+ ```
146
+
147
+ Rules:
148
+ - Ask one question per round. Never ask multiple questions in one round.
149
+ - Target the weakest clarity dimension each round: goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15). These weights are internal judgment guides, not computed scores. Always pick the dimension where ambiguity most limits the next action.
150
+ - Brownfield rule: investigate the codebase before asking. Do not ask about things already visible in the code. Confirm findings rather than ask from scratch.
151
+ - Counter-question (user answers but also asks a question back): answer the counter-question first, then treat the combined response as this round's answer. Round counter does not advance.
152
+ - Clarification request (user does not understand the question): rephrase and re-ask within the same round. Round counter does not advance.
153
+ - Round 3+: user may exit the interview early and proceed directly to spec generation.
154
+ - Round 10: hard cap. End the interview and produce the spec regardless of ambiguity.
155
+ - End early when goal, constraint, and success criteria are all sufficiently clear, without waiting for Round 10.
156
+
157
+ **Question mode shift** (triggered by clarity state, not round number):
158
+ - When goal and constraint are sufficiently clear → shift to Contrarian mode: "What if the opposite were true? What if this assumption is wrong?"
159
+ - When those are also resolved → shift to Simplifier mode: "What is the smallest version that still has meaningful value?"
160
+
161
+ **Spec → plan.md → harness selection**
162
+ When the interview ends, write `.tink/current/plan.md` with these top-level sections: Goal, Topology, Constraints, Success Criteria, Open Questions.
163
+
164
+ Then proceed to the normal Procedure starting at step 3 (read harness index). Use the spec as the harness selection input instead of the raw task request. Stitch Phase A runs after harness selection as normal. Phase B is skipped.
165
+
117
166
  ## Reusable State Save Gate
118
167
  Reusable State Save Gate is a separate absolute hard approval gate, not merely a Stitch subtype. Current-run approval does not authorize reusable-state writes.
119
168
 
@@ -401,6 +450,7 @@ If any of the following is true, the task goes to Lane 3:
401
450
  - The task description mentions any of the above concepts
402
451
 
403
452
  **Step 2 — Lane decision (only if step 1 finds no hard-gate):**
453
+ If `cast_mode` is `quick`, always select Lane 1 here regardless of task signals.
404
454
 
405
455
  **Lane 1 — instant start.** Any of these signals, with no contradicting complexity signal:
406
456
  - a question, explanation, or lookup with no file edits
@@ -28,7 +28,7 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
28
28
  8. If too many tools, skills, agents, or harnesses are available, use `harness-curation` to choose the smallest effective set before loading more context.
29
29
  9. Treat Evidence Split as a base-run habit, not a harness: for non-trivial work, first ask whether the task should be split into `probe`, `patch`, `verify`, `review`, or `decision` packets. Use it at cast time and again during implementation when uncertainty grows, a check fails, context gets broad, or several changes start to couple. Keep it lightweight for tiny tasks and skip it when it would add ceremony without changing the next action.
30
30
  10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
31
- 11. Run Stitch once before committing to `.tink/current/`: evaluate every time, show exactly one proposal only for high-impact quality or safety branches, and use the configured language.
31
+ 11. Run Stitch once before committing to `.tink/current/`. Phase A (Blocking): always evaluate and surface when triggered — safety/irreversibility, missing success criteria, goal ambiguity, harness mismatch. Phase B (Plan-shaping): run after Phase A, surface only when a concrete code-grounded alternative exists — minimality, reuse, or deletion/substitution. Never surface Phase B without observed code evidence; never suggest reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements. In `deep` mode, skip Phase B entirely. Show exactly one proposal and use the configured language.
32
32
  12. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
33
33
  13. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
34
34
  14. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
@@ -44,6 +44,7 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
44
44
  24. If a check fails, update `.tink/current/notes.md`, state the failure, last safe point, and next single action. Append compact friction to `.tink/maintenance/friction.jsonl` when it exists. Feed repeated failures to `$tink:weave`.
45
45
  25. Keep context compact. Do not paste raw logs or full diffs.
46
46
  26. Use calm, clear, concise language. Prefer plain everyday words over technical terms. No jokes.
47
+ 27. Read `cast_mode` from `.tink/config.json` before classifying the task. If `quick`, force Lane 1 (instant start) unless a hard-gate signal is present. If `deep`, run the structured interview before harness selection: (Round 0) present inferred topology and confirm with the user; (Rounds 1–10 max) ask one question per round targeting the weakest clarity dimension — goal (0.35 weight), constraint (0.25), success criteria (0.25), context (0.15) — investigate brownfield code before asking, do not ask what is already visible; show `[Round N/10 ████░░░░░░░░░░░░░░░░]` at each question; allow early exit from Round 3+; shift to Contrarian questioning when goal and constraint are clear, then Simplifier when those resolve; end by writing Goal, Topology, Constraints, Success Criteria, Open Questions to `.tink/current/plan.md`, then proceed to harness selection with Stitch Phase A only.
47
48
 
48
49
  ## Codex Approval Protocol
49
50
 
@@ -10,6 +10,7 @@
10
10
  "install_scope": "repo",
11
11
  "hook_scope": "off",
12
12
  "completion_policy": "normal",
13
+ "cast_mode": "standard",
13
14
  "default_harnesses_per_task": 4,
14
15
  "harness_lines_warning": 100,
15
16
  "context_budget": "soft",