npm - tink-harness - Versions diffs - 1.16.1 → 1.17.1 - Mend

tink-harness 1.16.1 → 1.17.1

Files changed (14)

package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +9 -0
package/README.ko.md +62 -4
package/README.md +61 -13
package/VERSIONING.md +1 -1
package/commands/cast.md +2 -1
package/commands/deep-cast.md +2 -1
package/package.json +1 -1
package/templates/claude/commands/tink/cast.md +2 -1
package/templates/claude/commands/tink/deep-cast.md +2 -1
package/templates/codex/skills/tink-core/RULES.md +2 -1
package/templates/tink/harnesses/HARNESS.md +1 -0
package/templates/tink/harnesses/index.json +17 -0
package/templates/tink/harnesses/loop-engineering.md +57 -0

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "tink",
   "description": "A small harness layer for Claude Code and Codex.",
-  "version": "1.16.1",
+  "version": "1.17.1",
   "author": {
     "name": "dotori"
   }

package/CHANGELOG.md CHANGED

@@ -4,6 +4,15 @@ All notable changes to Tink are tracked here.
 ## Unreleased
+## [1.17.1] - 2026-06-30
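+- Added a Before→After contrast block and real examples of a run record and verify evidence to README / README.ko.md.
+## [1.17.0] - 2026-06-30
+- Added the `loop-engineering` harness: it iterates toward a measurable acceptance signal (tests/lint/build passing, coverage, benchmark, score, etc.), one bottleneck at a time, and reports the current state, cause, and next action when the budget is exceeded. It enforces independent evaluation (no self-grading) and an iteration log. Explicitly distinguished in `use_when` from `goal-checkpoint` (general multi-step work) and `bug-diagnosis-loop` (bug diagnosis).
+- Moved the README experience story to the top and expanded it.
 ## [1.16.1] - 2026-06-25
 - Fixed the `$tink:deep-cast` command not appearing in Codex; the cause was a missing `templates/codex/skills/tink-deep-cast/SKILL.md`.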

package/README.ko.md CHANGED

@@ -20,6 +20,10 @@
 ---
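+*I kept bolting on new AI coding tools. Each one was usable on its own, but the more I stacked, the heavier and more tangled my environment became, and I spent a fair share of tokens re-tuning the setup before any work even started. I wanted the opposite: something light, where the tool adapts to me rather than me adapting to the tool. That is Tink.*
+---
 Without Tink, the context of agent work vanishes into chat history every time. You repeat the same reviews, refactors, and debugging by hand, and the workflows you wrote down for reuse end up buried somewhere.
 With Tink, every non-trivial task leaves files you can read, diff, and commit: a task contract, a visible plan, verification steps. Reusable workflows (harnesses) are saved only after explicit approval and gradually improve based on real run records. Those records become a local health dashboard.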
@@ -79,6 +83,24 @@ $tink:cast refactor the auth module     # Codex
 ## What actually remains
+**Without Tink (Claude Code / Codex alone):**
+```text
+"refactor the auth module"
+→ plan exists only in chat history
+→ no completion criteria
+→ context lost in the next session
+```
+**With Tink (`/tink:cast refactor the auth module`):**
+```text
+→ .tink/current/contract.json  - completion conditions
+→ plan.md / checks.md          - visible plan + verification steps
+→ /tink:verify                 - proven with evidence, not "looks done"
+→ .tink/runs/…md               - concise, reusable record
+```
 Every non-trivial task leaves plain files you can open, diff, and commit:
 ```text
@@ -92,6 +114,34 @@ $tink:cast refactor the auth module     # Codex
   refactor-review.md                # reusable ways of working - saved only on approval
 ```
+<details>
+<summary><strong>What a finished run leaves behind</strong></summary>
+**Run record** (`.tink/runs/YYYY-MM-DD-HHMM-task-name.md`):
+```text
+Status: completed
+Goal: add entrypoint skills that can be invoked directly from Codex.
+Changed: updated the main `tink` skill to recognize `$tink:<action>` aliases; added wrapper skills.
+Evidence: `find … | rg 'tink(-|/)'` confirmed every SKILL.md and the main skill.
+Notes: no reusable memory or harness changes.
+```
+**Verify evidence** (`/tink:verify`, two cases):
+```text
+✅  evidence_kind: command
+    evidence_ref:  npm test
+    observed:      47 tests passed, no failures
+⚠️  evidence_kind: manual
+    evidence_ref:  clean-install smoke test (macOS)
+    observed:      not run - CI runners are Linux only
+    next_action:   must be run manually on macOS before release
+```
+</details>
 ## Why aren't CLAUDE.md, slash commands, and skills enough on their own?
 | Tool | What it provides | What Tink adds on top |
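@@ -127,16 +177,24 @@ npx tink-harness dashboard          # add --no-open to only generate the files
 ## Why I built it
-New AI coding harnesses and workflows keep multiplying. Many are good, but mixing several makes the environment heavy, and it has to be tidied up again every time.
+*Tink is knit spelled backwards. It means untangling knotted workflows and knitting them back into a better flow, and it also brings to mind Tinker Bell, quietly helping in small ways at your side.*
+AI tools and content pour in every day. The big, powerful harness-engineering tools work really well for specific tasks and scales. But their setup is heavy and hard to change each time, so I kept re-tuning my environment from scratch whenever I moved to a different task.
-What stayed with me from using Hermes Agent was *the way it gets better with use*. Repeated work became reusable skills, mistakes became memory, and the system slowly changed to fit the person using it.
+After using Hermes Agent for a while, what stayed with me was not any particular feature. It was *the principle of getting better with use*. Repeated work became reusable skills, mistakes became memory, and the system slowly changed to fit the person using it.
 Tink started from a simple question:
-> Could Claude Code or Codex grow with me in the same way?
+> Could an AI Agent tool like Claude Code grow with me in the same way?
 Not a big framework, and not running more agents: pick the harness that fits the current task, make a small one if none exists, and let the harness set improve little by little over time.
+There was one more thing I had to admit. People cannot write perfect prompts, and AI Agents are not perfect yet either. So the tool had to work in both directions: supplementing and correcting task instructions, not simply executing them fast. That is why `cast` runs an interview when the task is ambiguous, and why it records verification failures so the same mistake is not repeated.
+The knitting metaphor is not decoration. **cast** (코잡기) is the start: choosing or creating the harness that fits this task. **frog** (풀시오) is removing what has lost its usefulness. **weave** (실오라기 정리) is refining what remains more precisely. Approaches that fit well are saved as harnesses and reused; those that don't are deleted or merged.
+It is not finished yet. But I pull it out every day at work, and the more I use it, the more useful it becomes. There is one core premise: if neither people nor AI are perfect, the tool between them should help each make up for the other's shortcomings, rather than locking either side into a fixed configuration.
 ---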
 <details>
@@ -236,7 +294,7 @@ npx tink-harness@latest update
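 ## What Tink is not
-Tink is not a coding agent, workflow engine, multi-agent runtime, or prompt library. It is a small harness layer that sits on top of Claude Code and Codex.
+It is not a coding agent, a workflow engine, a multi-agent runtime, or a prompt library. It is a small harness layer that sits on top of Claude Code and Codex.
 ## Contributing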

package/README.md CHANGED

@@ -20,6 +20,10 @@
 ---
+*I kept adding new AI coding tools. Each was useful on its own, but stacking them made my setup heavy and tangled — and I'd burn a real slice of my token budget just reconfiguring before any actual work began. I wanted the opposite: something small that adapts to me, not the other way around. That's Tink.*
+---
 Without Tink, agent tasks live only in chat history — context resets on every run, workflows repeat by hand, and nothing gets better over time.
 With Tink, every non-trivial task leaves plain files you can read, diff, and commit: a task contract, a visible plan, verification steps. Reusable workflows — *harnesses* — are saved only after your explicit approval, then improved from real run data. One command turns those records into a local health dashboard.
@@ -79,6 +83,24 @@ $tink:cast refactor the auth module     # Codex
 ## What you actually get
+**Before (plain Claude Code / Codex):**
+```text
+"refactor the auth module"
+→ plan lives in chat only
+→ no completion criteria
+→ context lost next session
+```
+**After (`/tink:cast refactor the auth module`):**
+```text
+→ .tink/current/contract.json  — what must be true when done
+→ plan.md / checks.md          — visible plan and verification steps
+→ /tink:verify                 — proves "done" with evidence, not vibes
+→ .tink/runs/…md               — compact record, reusable next time
+```
 Every non-trivial task leaves plain files you can open, diff, and commit:
 ```text
@@ -92,6 +114,34 @@ Every non-trivial task leaves plain files you can open, diff, and commit:
   refactor-review.md                # reusable ways of working — approval-gated
 ```
+<details>
+<summary><strong>What a finished run actually looks like</strong></summary>
+**Run record** (`.tink/runs/YYYY-MM-DD-HHMM-task.md`):
+```text
+Status: completed
+Goal: make Tink easier to invoke in Codex with direct entrypoint skills.
+Changed: updated main `tink` skill to recognize `$tink:<action>` aliases; added thin wrapper skills.
+Evidence: `find … | rg 'tink(-|/)'` showed all SKILL.md files plus the main skill.
+Notes: no reusable memory or harness was changed.
+```
+**Verify evidence** (`/tink:verify`, two outcomes):
+```text
+✅  evidence_kind: command
+    evidence_ref:  npm test
+    observed:      47 tests passed, 0 failures
+⚠️  evidence_kind: manual
+    evidence_ref:  clean install smoke (macOS)
+    observed:      not run — CI runner is Linux only
+    next_action:   run manually on macOS before publish
+```
+</details>
 ## Why not just CLAUDE.md / slash commands / skills?
 | Tooling | What it gives you | What Tink adds on top |
@@ -129,15 +179,21 @@ Under the hood it runs two read-only helpers (`node .tink/tools/generate-harness
 *Tink is <strong>knit</strong> in reverse: untying tangled workflows and knitting better ones back together. It also nods to Tinker Bell, the small helper at your side.*
-New coding harnesses show up almost every day. Many are genuinely useful. But the more I mixed them, the more my environment got tangled. Resetting everything again and again was tiring.
+New AI tools and content flood in every day. Many are worth trying. But the bigger harness-engineering tools — the ones with real power — are built for specific tasks and scales. Switching between them is friction: heavy setup, hard to reconfigure, awkward to use for something small. I kept bouncing between them and resetting my environment from scratch.
-Then I used Hermes Agent for a while. What stayed with me was the way it gets better through use: repeated work turns into reusable skills, mistakes become memory, and the system slowly adapts to the person using it.
+Then I used Hermes Agent for a while. What stayed with me wasn't any feature — it was the *principle*. A system that gets better through use: repeated work turns into reusable skills, mistakes become memory, the tool slowly adapts to the person using it.
 Tink started from a simple question:
-> Could Claude Code or Codex grow with me in the same way?
+> Could an AI Agent tool like Claude Code or Codex grow with me in the same way?
+Not by adding a big framework. Not by running more agents. Just by helping the agent choose the right approach for the current task, create one when nothing fits, and quietly improve the set over time.
+I also had to accept something: people can't produce perfect prompts, and AI agents aren't perfect yet either. So the tool needed to go both ways — correcting and refining task instructions, not just executing them faster. That's why `cast` runs a structured interview when the task is unclear, and why failed checks get recorded so the same mistake doesn't repeat.
-Not by adding a big framework. Not by running more agents. Just by helping Claude or Codex choose the right harness for the current task, create one when nothing fits, and improve the set over time.
+The knitting metaphors aren't decorative. **cast** (코잡기) is how you start — picking or drafting the right harness for this task. **frog** (풀시오) is how you clean up what stopped being useful. **weave** (실오라기 정리) is how you tighten what stays. A harness that worked once can be saved and reused; one that didn't gets removed or merged.
+It's not finished. But I reach for it every day at work, and it keeps getting more useful. The core bet: if humans and AI agents are both imperfect, the tool between them should help each side compensate for the other — not lock either into a fixed setup.
 ---
@@ -246,15 +302,7 @@ Verify: `docs/update-verification-recipe.md` or `docs/update-verification-recipe
 ## What Tink is not
-Tink is not:
-- a coding agent
-- a workflow engine
-- a multi-agent runtime
-- a prompt library
-- a replacement for Claude Code or Codex
-It is a small harness layer for Claude Code or Codex.
+Tink is not a coding agent, workflow engine, multi-agent runtime, or prompt library. It is a small harness layer for Claude Code or Codex.
 ## Contributing

package/VERSIONING.md CHANGED

@@ -1,6 +1,6 @@
 # Versioning
-Current version: `1.16.1`
+Current version: `1.17.1`
 Tink follows semver from `1.0.0` onward.
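
The version bump in this diff (`1.16.1` → `1.17.1`) can be classified under semver with a small helper. This is only an illustrative sketch: `parse` and `classify_bump` are hypothetical names that are not part of the package, and the pattern deliberately ignores pre-release and build suffixes.

```python
import re

# Plain MAJOR.MINOR.PATCH only; pre-release/build metadata is out of scope here.
SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    match = SEMVER_RE.match(version)
    if match is None:
        raise ValueError(f"not a plain semver string: {version!r}")
    return tuple(int(part) for part in match.groups())


def classify_bump(old: str, new: str) -> str:
    """Name the highest-order component that changed between two versions."""
    o, n = parse(old), parse(new)
    if n == o:
        return "none"
    if n < o:
        return "downgrade"
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


print(classify_bump("1.16.1", "1.17.1"))
```

The jump is a minor bump, which matches the CHANGELOG: `1.17.0` added the `loop-engineering` harness and `1.17.1` followed with README-only changes.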

package/commands/cast.md CHANGED

@@ -533,7 +533,7 @@ Rule: while such a run is active, END every assistant response with a progress b
 ## Base run (no harness)
 Generic task-type harnesses (`code-change`, `bug-fix`, `research`, `review`, `docs`) are retired from the default set. Generic work runs as a **base run**: the run state contract alone - `plan.md`, `checks.md`, `steps.json`, `contract.json` - already enforces scope, verification commands, and evidence for ordinary code, bug, research, review, and docs work.
-- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
+- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`, `loop-engineering`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
 - Never force a loose-fit harness just to show a harness name. "No harness" is a valid and common selection.
 - In user-facing output call this `기본 절차` (Korean) or `base run` (English), with one short explanation line such as `기본 절차로 진행합니다 - 별도 하네스 없이 실행 상태 계약(계획·검증·증거)만 사용`.
 - The base run does not weaken anything: contract checks, Stitch, overlay rules, and the progress display still apply unchanged.
@@ -580,6 +580,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
    - Use `decision-map` only when a loose idea has multiple unresolved decisions that need research, prototype, or discussion tickets across sessions.
    - Use `architecture-deepening` only when the work is explicitly about module/interface/seam shape, deep modules, leverage, locality, or testability.
+   - Use `loop-engineering` only when the user explicitly asks to iterate until a measurable bar passes (tests/lint/build green, coverage, benchmark, score). Not for ordinary multi-step work (`goal-checkpoint`) or hard bugs (`bug-diagnosis-loop`).
    **Overlay selection is rule-bound, not taste.** After drafting the Goals list for the approval payload, re-check before presenting it:
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
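
The `goal-checkpoint` rule above is a plain any-of condition, so it can be written as a predicate. The sketch below is an assumption-laden illustration: `DraftPlan` and its field names are hypothetical, and "spans multiple components/directories" is read here as two or more.

```python
from dataclasses import dataclass


@dataclass
class DraftPlan:
    # Hypothetical summary of a drafted approval payload; not a Tink data structure.
    goals: int
    sequential_harnesses: int
    expected_steps: int
    components_touched: int


def goal_checkpoint_required(plan: DraftPlan) -> bool:
    """True when ANY of the four triggers from the rule holds."""
    return (
        plan.goals >= 2
        or plan.sequential_harnesses >= 2
        or plan.expected_steps >= 4
        or plan.components_touched >= 2  # assumed reading of "multiple"
    )


# A single-goal plan still trips the rule once it is expected to need 4+ steps.
print(goal_checkpoint_required(DraftPlan(1, 1, 4, 1)))
```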

package/commands/deep-cast.md CHANGED

@@ -488,7 +488,7 @@ Rule: while such a run is active, END every assistant response with a progress b
 ## Base run (no harness)
 Generic task-type harnesses (`code-change`, `bug-fix`, `research`, `review`, `docs`) are retired from the default set. Generic work runs as a **base run**: the run state contract alone - `plan.md`, `checks.md`, `steps.json`, `contract.json` - already enforces scope, verification commands, and evidence for ordinary code, bug, research, review, and docs work.
-- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
+- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`, `loop-engineering`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
 - Never force a loose-fit harness just to show a harness name. "No harness" is a valid and common selection.
 - In user-facing output call this `기본 절차` (Korean) or `base run` (English), with one short explanation line such as `기본 절차로 진행합니다 - 별도 하네스 없이 실행 상태 계약(계획·검증·증거)만 사용`.
 - The base run does not weaken anything: contract checks, Stitch, overlay rules, and the progress display still apply unchanged.
@@ -535,6 +535,7 @@ This is the full path after the interview produces a spec.
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
    - Use `decision-map` only when a loose idea has multiple unresolved decisions that need research, prototype, or discussion tickets across sessions.
    - Use `architecture-deepening` only when the work is explicitly about module/interface/seam shape, deep modules, leverage, locality, or testability.
+   - Use `loop-engineering` only when the user explicitly asks to iterate until a measurable bar passes (tests/lint/build green, coverage, benchmark, score). Not for ordinary multi-step work (`goal-checkpoint`) or hard bugs (`bug-diagnosis-loop`).
    **Overlay selection is rule-bound, not taste.** After drafting the Goals list for the approval payload, re-check before presenting it:
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "tink-harness",
-  "version": "1.16.1",
+  "version": "1.17.1",
   "description": "Self-growing harnesses for Claude Code and Codex.",
   "license": "MIT",
   "type": "module",

package/templates/claude/commands/tink/cast.md CHANGED

@@ -533,7 +533,7 @@ Rule: while such a run is active, END every assistant response with a progress b
 ## Base run (no harness)
 Generic task-type harnesses (`code-change`, `bug-fix`, `research`, `review`, `docs`) are retired from the default set. Generic work runs as a **base run**: the run state contract alone - `plan.md`, `checks.md`, `steps.json`, `contract.json` - already enforces scope, verification commands, and evidence for ordinary code, bug, research, review, and docs work.
-- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
+- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`, `loop-engineering`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
 - Never force a loose-fit harness just to show a harness name. "No harness" is a valid and common selection.
 - In user-facing output call this `기본 절차` (Korean) or `base run` (English), with one short explanation line such as `기본 절차로 진행합니다 - 별도 하네스 없이 실행 상태 계약(계획·검증·증거)만 사용`.
 - The base run does not weaken anything: contract checks, Stitch, overlay rules, and the progress display still apply unchanged.
@@ -580,6 +580,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
    - Use `decision-map` only when a loose idea has multiple unresolved decisions that need research, prototype, or discussion tickets across sessions.
    - Use `architecture-deepening` only when the work is explicitly about module/interface/seam shape, deep modules, leverage, locality, or testability.
+   - Use `loop-engineering` only when the user explicitly asks to iterate until a measurable bar passes (tests/lint/build green, coverage, benchmark, score). Not for ordinary multi-step work (`goal-checkpoint`) or hard bugs (`bug-diagnosis-loop`).
    **Overlay selection is rule-bound, not taste.** After drafting the Goals list for the approval payload, re-check before presenting it:
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.

package/templates/claude/commands/tink/deep-cast.md CHANGED

@@ -488,7 +488,7 @@ Rule: while such a run is active, END every assistant response with a progress b
 ## Base run (no harness)
 Generic task-type harnesses (`code-change`, `bug-fix`, `research`, `review`, `docs`) are retired from the default set. Generic work runs as a **base run**: the run state contract alone - `plan.md`, `checks.md`, `steps.json`, `contract.json` - already enforces scope, verification commands, and evidence for ordinary code, bug, research, review, and docs work.
-- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
+- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), focused work harnesses (`issue-triage`, `bug-diagnosis-loop`, `review-two-axis`, `decision-map`, `architecture-deepening`, `loop-engineering`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
 - Never force a loose-fit harness just to show a harness name. "No harness" is a valid and common selection.
 - In user-facing output call this `기본 절차` (Korean) or `base run` (English), with one short explanation line such as `기본 절차로 진행합니다 - 별도 하네스 없이 실행 상태 계약(계획·검증·증거)만 사용`.
 - The base run does not weaken anything: contract checks, Stitch, overlay rules, and the progress display still apply unchanged.
@@ -535,6 +535,7 @@ This is the full path after the interview produces a spec.
    - Use `review-two-axis` for PR/branch/diff review when Standards and Spec should be reported separately.
    - Use `decision-map` only when a loose idea has multiple unresolved decisions that need research, prototype, or discussion tickets across sessions.
    - Use `architecture-deepening` only when the work is explicitly about module/interface/seam shape, deep modules, leverage, locality, or testability.
+   - Use `loop-engineering` only when the user explicitly asks to iterate until a measurable bar passes (tests/lint/build green, coverage, benchmark, score). Not for ordinary multi-step work (`goal-checkpoint`) or hard bugs (`bug-diagnosis-loop`).
    **Overlay selection is rule-bound, not taste.** After drafting the Goals list for the approval payload, re-check before presenting it:
    - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.

package/templates/codex/skills/tink-core/RULES.md CHANGED

@@ -27,7 +27,7 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
 7. Run the synthesis probe before committing to `.tink/current/`. Strong fit keeps the harness; generic fit adds a run-only draft; no fit loads `harness-synthesis`.
 8. If too many tools, skills, agents, or harnesses are available, use `harness-curation` to choose the smallest effective set before loading more context.
 9. Treat Evidence Split as a base-run habit, not a harness: for non-trivial work, first ask whether the task should be split into `probe`, `patch`, `verify`, `review`, or `decision` packets. Use it at cast time and again during implementation when uncertainty grows, a check fails, context gets broad, or several changes start to couple. Keep it lightweight for tiny tasks and skip it when it would add ceremony without changing the next action.
-10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; and `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work.
+10. Treat visible-thinking and focused work workflows as ordinary Tink harness choices, not new commands. Actively consider them when their trigger changes the procedure: use `requirements-interview` for ambiguity, unclear scope, or missing acceptance criteria; `plan-consensus` for broad plans, migrations, API/schema/contract changes, or tradeoffs; `goal-checkpoint` for multi-file, multi-phase, resumed, release, or long runs; `delegation-brief` for handoff, independent verification, parallel review, or another agent/human brief; `issue-triage` for issue/PR/QA intake or vertical slices; `bug-diagnosis-loop` for hard bugs that need a red-capable loop before code changes; `review-two-axis` for Standards/Spec diff review; `decision-map` for multi-session unresolved decisions; `architecture-deepening` for deep module, interface, seam, leverage, locality, or testability work; and `loop-engineering` for explicit iterate-until-target goals (tests/lint/build green, coverage, benchmark, score) where one pass is not enough.
 11. Run Stitch once before committing to `.tink/current/`. Phase A (Blocking): always evaluate and surface when triggered — safety/irreversibility, missing success criteria, goal ambiguity, harness mismatch. Phase B (Plan-shaping): run after Phase A, surface only when a concrete code-grounded alternative exists — minimality, reuse, or deletion/substitution. Never surface Phase B without observed code evidence; never suggest reducing trust-boundary validation, data-loss prevention, security, accessibility, or explicitly requested requirements. In `deep` mode, skip Phase B entirely. Show exactly one proposal and use the configured language.
 12. For non-trivial `$tink:cast` runs, ask for current-run approval before creating `.tink/current/`, loading harness bodies, editing files, or executing the first step. Codex must not silently treat a command invocation as approval.
 13. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
@@ -170,6 +170,7 @@ Focused work harness selection rules:
 - Branch, PR, or WIP diff review should use `review-two-axis` when Standards and Spec need separate findings.
 - Loose ideas with multiple unresolved decisions across sessions should use `decision-map`; ordinary one-session planning should stay with `plan-consensus`.
 - Architecture health work should use `architecture-deepening` only when module/interface/seam shape, deep modules, leverage, locality, or testability are the point of the task.
+- Explicit iterate-until-target goals (tests/lint/build green, coverage, benchmark, score) where one pass is unlikely to succeed should use `loop-engineering` — it enforces one bottleneck per iteration, an independent acceptance signal, an iteration budget, and a final report if the budget is hit. Not for ordinary multi-step work (`goal-checkpoint`) or hard-bug diagnosis (`bug-diagnosis-loop`).
 When useful, enrich `context-map.json.included[]` and `context-map.json.excluded[]` entries with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to keep the first context pack small, mark stale or avoid-next-time context, and connect `verification_target` entries to command checks, manual checks, evidence refs, or verification hints. Do not claim any 90% efficiency score without measurement evidence.

package/templates/tink/harnesses/HARNESS.md CHANGED

@@ -17,6 +17,7 @@
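 - **[review-two-axis](./review-two-axis.md)** (small): Reviews a PR, branch, or diff along two separate axes, Standards and Spec.
 - **[decision-map](./decision-map.md)** (small-heavy): Manages loose ideas that need multiple sessions as a map of research/prototype/discuss tickets plus a frontier.
 - **[architecture-deepening](./architecture-deepening.md)** (small-heavy): Organizes structural improvement candidates and plans from the perspective of deep modules, interfaces, seams, leverage, and locality.
+- **[loop-engineering](./loop-engineering.md)** (small): Iterates (one bottleneck → fix → re-verify) toward a measurable acceptance signal (tests/lint/build passing, coverage, benchmark, etc.) and enforces termination within the budget.
 - **[ship](./ship.md)** (small): PR preparation, release, deployment. States risk and rollback explicitly. Safety guards are switched on in advance when cast starts.
 ## Management meta harnesses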

package/templates/tink/harnesses/index.json CHANGED

@@ -134,6 +134,23 @@
       "The final plan verifies at the chosen seam"
     ]
   },
+  {
+    "name": "loop-engineering",
+    "kind": "built-in",
+    "context": "small",
+    "use_when": "User explicitly wants to iterate act-observe-evaluate-fix against a measurable acceptance signal until it passes or a budget is hit. For 'make X pass / reach quality bar Y' goals. Not for ordinary multi-step work (goal-checkpoint) or hard-bug diagnosis (bug-diagnosis-loop).",
+    "asks": [
+      "What single acceptance signal judges each iteration?",
+      "What is the iteration budget before stopping to report?",
+      "What evaluates results independently of the change?"
+    ],
+    "checks": [
+      "One bottleneck fixed per iteration",
+      "Acceptance signal is runnable, not self-judged",
+      "Each iteration logged before the next change",
+      "Budget respected; stop-and-report instead of infinite loop"
+    ]
+  },
   {
     "name": "ship",
     "kind": "built-in",
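
The new `loop-engineering` entry has the same shape as its neighbours in this hunk, so a quick structural check is easy to sketch. The key list below is inferred from this one entry only and is an assumption, not a documented schema; `missing_keys` is a hypothetical helper.

```python
import json

# The entry added in this hunk, copied verbatim.
ENTRY_JSON = """
{
  "name": "loop-engineering",
  "kind": "built-in",
  "context": "small",
  "use_when": "User explicitly wants to iterate act-observe-evaluate-fix against a measurable acceptance signal until it passes or a budget is hit. For 'make X pass / reach quality bar Y' goals. Not for ordinary multi-step work (goal-checkpoint) or hard-bug diagnosis (bug-diagnosis-loop).",
  "asks": [
    "What single acceptance signal judges each iteration?",
    "What is the iteration budget before stopping to report?",
    "What evaluates results independently of the change?"
  ],
  "checks": [
    "One bottleneck fixed per iteration",
    "Acceptance signal is runnable, not self-judged",
    "Each iteration logged before the next change",
    "Budget respected; stop-and-report instead of infinite loop"
  ]
}
"""

# Assumed key set, inferred from the entry above rather than from a published schema.
REQUIRED_KEYS = ("name", "kind", "context", "use_when", "asks", "checks")


def missing_keys(entry: dict) -> list:
    """Return the assumed-required keys that the entry lacks."""
    return [key for key in REQUIRED_KEYS if key not in entry]


entry = json.loads(ENTRY_JSON)
print(entry["name"], missing_keys(entry), len(entry["asks"]), len(entry["checks"]))
```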

package/templates/tink/harnesses/loop-engineering.md ADDED

@@ -0,0 +1,57 @@
+# loop-engineering
+## When to use
+Use when the user explicitly wants to iterate — act, observe, evaluate, fix one
+bottleneck, re-evaluate — against a measurable acceptance signal until it passes
+or an iteration budget is hit. For "make X pass / reach quality bar Y" goals where
+one pass is unlikely to be enough.
+Good triggers:
+- "iterate / loop until tests, lint, build pass", "반복해서 ~까지 통과"
+- "raise the score / coverage / benchmark to N", "keep fixing until green"
+- `/tink:cast loop <task>` style explicit loop intent
+Do not use for ordinary multi-step work that just needs goals tracked — that is
+`goal-checkpoint`. Do not use for hard-bug diagnosis — that is `bug-diagnosis-loop`.
+If there is no runnable or observable acceptance signal, run `requirements-interview`
+first to define a measurable bar.
+## Ask first
+- What single acceptance signal judges each iteration (command, test, metric, manual check)?
+- What is the iteration budget (max iterations or time) before stopping to report?
+- What must never change or run during the loop (forbidden files, commands, deps)?
+- What evaluates results independently of the change, so the author does not grade itself?
+Do not repeat questions already answered in `.tink/current/answers.md`.
+## Plan
+1. Write the loop contract into `contract.json`: objective, acceptance signal(s) as
+   `verification.commands` / `manual_checks`, forbidden actions as `forbidden`; record
+   the iteration budget in `.tink/current/notes.md`.
+2. Measure the baseline once: run the acceptance signal, record the starting score or
+   failure in `notes.md` as iteration 0.
+3. Each iteration: identify the single biggest failure cause, make one focused change
+   for that cause only, then re-run the same acceptance signal.
+4. Append one log line per iteration to `notes.md`: iteration number, the one change,
+   result (pass/fail + score), next bottleneck.
+5. Stop when the acceptance signal passes, or when the budget is reached. Never stop
+   merely because a file was edited.
+6. On budget exhaustion, stop and report current state, failing check, suspected root
+   cause, and the next recommended action.
+## Checks
+- Each iteration changes one bottleneck, not several at once.
+- The acceptance signal is a runnable command or observable check, not a self-judgment.
+- Every iteration is logged in `notes.md` with its result before the next change.
+- Evaluation is independent of the change (separate command or review pass).
+- The loop respects the budget and stops with a report instead of looping forever.
+## Done means
+- All acceptance signals in `contract.json` pass, with the final run recorded as evidence; or
+- The budget was reached and the run is reported blocked with state, failing check, and next action.
+- The iteration log in `notes.md` shows what each iteration changed and measured.
+## If it fails, Tink back
+Return to the last iteration whose acceptance signal improved or held. Restate the active
+bottleneck, the last best result, and the single next change. If no acceptance signal can
+be built, stop and run `requirements-interview` to define a measurable bar before looping.
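
The Plan section of `loop-engineering.md` describes a budgeted measure-fix-remeasure loop. The sketch below shows one possible reading of that control flow. It is not Tink's implementation: the harness is a Markdown procedure followed by an agent, and `run_loop`, `LoopResult`, and `make_toy` are hypothetical names. The in-memory `log` list stands in for the per-iteration lines the harness writes to `notes.md`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    status: str                      # "passed" or "blocked" (budget exhausted)
    iterations: int                  # number of fix iterations actually run
    log: List[str] = field(default_factory=list)


def run_loop(
    measure: Callable[[], Tuple[bool, int]],
    fix_one_bottleneck: Callable[[int], str],
    budget: int,
) -> LoopResult:
    """Measure a baseline, then fix one bottleneck per iteration until pass or budget."""
    log: List[str] = []
    passed, score = measure()        # iteration 0: baseline, no change made yet
    log.append(f"iteration 0: baseline, pass={passed}, score={score}")
    iteration = 0
    while not passed and iteration < budget:
        iteration += 1
        change = fix_one_bottleneck(iteration)   # exactly one focused change
        passed, score = measure()                # re-run the same acceptance signal
        log.append(f"iteration {iteration}: {change}, pass={passed}, score={score}")
    return LoopResult("passed" if passed else "blocked", iteration, log)


def make_toy(failures: int):
    """Toy acceptance signal: passes when the remaining failure count reaches zero."""
    state = {"failures": failures}

    def measure() -> Tuple[bool, int]:
        return state["failures"] == 0, state["failures"]

    def fix(iteration: int) -> str:
        state["failures"] -= 1
        return f"fixed failure #{iteration}"

    return measure, fix


measure, fix = make_toy(3)
result = run_loop(measure, fix, budget=5)
print(result.status, result.iterations)
```

In this sketch the evaluation stays independent of the change because `measure` is a separate callable that the fixer cannot influence except through the state under test. A budget smaller than the number of failures ends the run as `blocked`, which corresponds to the harness's stop-and-report branch.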