tink-harness 1.9.21 → 1.10.0

Files changed (28)

package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +13 -2
package/README.ko.md +27 -16
package/README.md +28 -19
package/VERSIONING.md +1 -1
package/bin/install.js +144 -8
package/commands/cast.md +81 -31
package/commands/frog.md +9 -1
package/commands/list.md +104 -104
package/commands/setup.md +1 -1
package/commands/update.md +1 -0
package/commands/weave.md +32 -4
package/package.json +1 -1
package/templates/claude/commands/tink/cast.md +81 -31
package/templates/claude/commands/tink/frog.md +9 -1
package/templates/claude/commands/tink/list.md +104 -104
package/templates/claude/commands/tink/setup.md +1 -1
package/templates/claude/commands/tink/update.md +1 -0
package/templates/claude/commands/tink/weave.md +32 -4
package/templates/codex/skills/tink-core/RULES.md +7 -7
package/templates/tink/harnesses/HARNESS.md +4 -5
package/templates/tink/harnesses/index.json +0 -75
package/templates/tink/rules/index.json +0 -28
package/templates/tink/harnesses/bug-fix.md +0 -31
package/templates/tink/harnesses/code-change.md +0 -30
package/templates/tink/harnesses/docs.md +0 -30
package/templates/tink/harnesses/research.md +0 -31
package/templates/tink/harnesses/review.md +0 -31

package/templates/claude/commands/tink/cast.md CHANGED

@@ -42,7 +42,7 @@ Map prompt content to `AskUserQuestion` fields:
 - `description`: explanatory text for the option
 Label quality rules:
-- Use short, common, readable labels only. Good Korean labels are `승인`, `조정`, `취소`, `요구사항 입력`, `기본 하네스만 사용`, `새 하네스 초안 만들기`, `구조 점검`, `내용 점검`, `전체 점검`.
+- Use short, common, readable labels only. Good Korean labels are `승인`, `조정`, `취소`, `요구사항 입력`, `기본 절차만 사용`, `새 하네스 초안 만들기`, `구조 점검`, `내용 점검`, `전체 점검`.
 - Do not invent compressed Korean labels, transliterated fragments, or unclear summaries such as `콘데의달 지질`.
 - If the idea is too specific for a clean 1-5 word label, put the detail in `description` and use a generic label such as `내용 점검` or `전체 점검`.
 - Before calling `AskUserQuestion`, reread each Korean label. If it looks misspelled, unnatural, or semantically unclear, replace it with a plain fallback label.
@@ -282,11 +282,11 @@ Candidate limits:
 - Add more only when each extra entry changes the first action, verification, or safety boundary.
 - Do not load entire directories unless the directory itself is the artifact under review.
-Also append a compact run record to `.tink/runs/YYYY-MM-DD-HHMM-<slug>.md` when the task completes, is canceled, is blocked, or is superseded. Do not store secrets, raw logs, full diffs, or one-off private context.
+Also append a compact run record to `.tink/runs/YYYY-MM-DD-HHMM-<slug>.md` when the task completes, is canceled, is blocked, or is superseded. Do not store secrets, raw logs, full diffs, or one-off private context. If a run-only draft harness was used, record its name and its 2-4 domain rules compactly in the run record - `/tink:weave` treats drafts that repeat across runs as promotion candidates.
 When appending a run record, also append a signal to `.tink/maintenance/weave-queue.json` if it exists:
 ```json
-{ "id": "signal-<run_id>", "harness": "<primary_selected_harness>", "run": ".tink/runs/<slug>.md", "signal": "<outcome>", "auto": true, "timestamp": "<ISO>" }
+{ "id": "signal-<run_id>", "harness": "<primary_selected_harness or base-run>", "run": ".tink/runs/<slug>.md", "signal": "<outcome>", "auto": true, "timestamp": "<ISO>" }
 ```
 Use `check_failed` as the signal when any check in `checks.md` did not pass; otherwise use the run outcome (`completed`, `blocked`, `canceled`, or `superseded`). Do not create `.tink/maintenance/weave-queue.json` if it does not exist — only append when it is already present.
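The signal rule in this hunk can be sketched as a small pure function. This is only an illustration: `buildWeaveSignal`, its parameter names, and the `{ passed: boolean }` shape of a check are hypothetical and do not come from the package. The on-disk layout of `weave-queue.json` is not shown in this diff, so the sketch stops at building the record and does not append it.

```javascript
// Hypothetical helper, not part of tink-harness. Field names follow the
// JSON template in the hunk above; everything else is an assumption.
const OUTCOMES = ["completed", "blocked", "canceled", "superseded"];

function buildWeaveSignal({ runId, harness, slug, outcome, checks = [], now = new Date() }) {
  if (!OUTCOMES.includes(outcome)) {
    throw new Error(`unknown outcome: ${outcome}`);
  }
  // Any check that did not pass overrides the run outcome.
  const signal = checks.some((c) => !c.passed) ? "check_failed" : outcome;
  return {
    id: `signal-${runId}`,
    // 1.10.0 allows "base-run" when no harness was selected.
    harness: harness ?? "base-run",
    run: `.tink/runs/${slug}.md`,
    signal,
    auto: true,
    timestamp: now.toISOString(),
  };
}
```

A caller would still need to check that `.tink/maintenance/weave-queue.json` already exists before appending, since the text forbids creating it.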
@@ -376,10 +376,10 @@ Lane 1 behavior:
 - If the work changed files, append a compact run record on completion; pure Q&A needs no record.
 - If the task turns out bigger mid-work, stop, say so in one line, and re-enter triage as Lane 2 or 3 with what was learned.
-**Lane 2 — light harness.** Small but multi-step work: one obvious harness fits, roughly 2-3 files, no architecture or contract decisions, no ambiguity worth an interview.
+**Lane 2 — light run.** Small but multi-step work: roughly 2-3 files, no architecture or contract decisions, no ambiguity worth an interview. At most one obvious specialized harness fits - often none, and the base run is enough.
 Lane 2 behavior:
-- Announce the chosen harness in one line, create minimal run state (`steps.json` plus a short `plan.md`), and execute the first safe step in the same response.
+- Announce the chosen harness or the base run in one line, create minimal run state (`steps.json` plus a short `plan.md`), and execute the first safe step in the same response.
 - Do not ask an approval question for soft-gate work; state the working assumption inline (`범위가 다르면 말씀 주세요` / `Tell me if the scope is different`) and keep moving.
 - Stitch runs silently and surfaces only hard gates.
@@ -408,6 +408,43 @@ Rule: while such a run is active, END every assistant response with a progress b
 - On run completion, show the final 100% bar once with `✅` instead of `다음`.
 - Never skip the block because a response feels small; if the response is blocked, the block shows where work stopped.
+**Full progress view.** The compact block above is the every-response footer. At key moments, show the full map instead, so the user can plan how far to go today:
+- right after the plan is first created or restructured,
+- right after a goal/phase completes,
+- on the first response after resuming a run, and
+- whenever the user asks about progress or the plan.
+```text
+📊 전체 진행 상황
+✅ Phase 0  nav graph            ▓▓▓▓▓▓▓▓▓▓ 100%
+▶  Phase 1  Section Index        ░░░░░░░░░░   0%   ← 지금
+   Phase 2  Block Index          ░░░░░░░░░░   0%
+   Phase 3  query + aliases      ░░░░░░░░░░   0%
+──────────────────────────────────────
+전체 ▓▓▓░░░░░░░ 25% · 4개 중 1개 완료
+▶ Phase 1 세부
+  ✅ [1/4] build-block-index.mjs 작성
+  ▶  [2/4] sections/*.jsonl 생성   ← 지금
+     [3/4] validate-index.mjs 기본 구조
+     [4/4] line range 검증
+```
+- One row per goal/phase with its own 10-cell bar; mark completed rows `✅`, the active row `▶` plus `← 지금`. Below the divider, the same overall bar as the compact block.
+- The detail block lists the active phase's steps with `✅`/`▶`/blank markers - no mini bars; the markers carry the state.
+- Keep alignment tolerant: pad with two or more spaces instead of strict columns, because mixed Korean/English widths break exact tables.
+- The full view replaces the compact block in that response; the next response returns to the compact footer.
+## Base run (no harness)
+Generic task-type harnesses (`code-change`, `bug-fix`, `research`, `review`, `docs`) are retired from the default set. Generic work runs as a **base run**: the run state contract alone - `plan.md`, `checks.md`, `steps.json`, `contract.json` - already enforces scope, verification commands, and evidence for ordinary code, bug, research, review, and docs work.
+- Select a harness only when its specialized procedure changes what would actually happen: visible-thinking overlays (`requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`), risk gates (`ship`, `pre-publish-multi-agent-verify`, `pr-merge`), meta harnesses (`harness-curation`, `harness-synthesis`), `tink-feedback-apply`, or user-created and synthesized domain harnesses.
+- Never force a loose-fit harness just to show a harness name. "No harness" is a valid and common selection.
+- In user-facing output call this `기본 절차` (Korean) or `base run` (English), with one short explanation line such as `기본 절차로 진행합니다 - 별도 하네스 없이 실행 상태 계약(계획·검증·증거)만 사용`.
+- The base run does not weaken anything: contract checks, Stitch, overlay rules, and the progress display still apply unchanged.
+- If a legacy install still has the retired generic harnesses, do not select them; treat the task as a base run and let `/tink:update` or `/tink:frog` clean them up.
 ## Procedure
 This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip most of it.
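The 10-cell bars in the full progress view added by this hunk can be reproduced with a short sketch. `bar` and `overall` are hypothetical names. The rounding rule (half rounds up) and the "completed phases over total phases" formula are inferred from the example, where 25% renders as three filled cells and one of four phases is done; the source does not state either rule.

```javascript
// Hypothetical renderer for the 10-cell bars in the example above.
function bar(percent, cells = 10) {
  const clamped = Math.min(100, Math.max(0, percent));
  // Assumption: half cells round up, so 25% shows 3 of 10 cells.
  const filled = Math.round((clamped / 100) * cells);
  return "▓".repeat(filled) + "░".repeat(cells - filled);
}

// Assumption: overall progress is completed phases over total phases.
function overall(phases) {
  if (phases.length === 0) return 0;
  const done = phases.filter((p) => p.percent === 100).length;
  return Math.round((done / phases.length) * 100);
}
```

With the four phases from the example (one at 100%, three at 0%), `overall` gives 25 and `bar(25)` gives the same bar as the `전체` row.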
@@ -435,12 +472,19 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
    - docs
    - ship/release
    - new pattern not covered yet
+   These are task types, not harness names. Generic types (code change, bug fix, research, review, docs) default to the base run; a harness is added only when a specialized one genuinely fits.
 6. Consider GJC-style visible-thinking overlays as normal Tink harnesses, not as new command surfaces:
    - If the request is an ambiguous idea, early product concept, or underspecified implementation prompt, prefer `requirements-interview` before planning or coding. This is the default harness when Stitch is expected to trigger for goal ambiguity or missing acceptance criteria.
    - If the request asks for a plan, architecture decision, large refactor, migration, or broad public contract change, consider `plan-consensus`.
    - If the work naturally splits into multiple durable milestones, add `goal-checkpoint` and create `.tink/current/goals.json` after approval.
    - If parallel review, independent verification, or handoff would reduce risk, add `delegation-brief` and create `.tink/current/delegation.md` after approval. This harness prepares briefs only; it never starts tmux, worktrees, workers, or external agents.
-7. Pick the best existing harness set using the context budget policy below. Prefer 1-3 harnesses, but do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
+   **Overlay selection is rule-bound, not taste.** After drafting the Goals list for the approval payload, re-check before presenting it:
+   - `goal-checkpoint` is REQUIRED (not optional) when ANY of these is true: the Goals list has 2+ goals; 2+ harnesses run sequentially; the plan is expected to need 4+ steps; or the work spans multiple components/directories. Create `goals.json` after approval.
+   - `plan-consensus` must be explicitly considered for any from-scratch implementation, reimplementation, migration, or public contract/API design. If skipped, record a one-line reason in the 오버레이 점검 line.
+   - The context budget and the "prefer 1-3 harnesses" guidance never justify dropping a REQUIRED overlay: overlays are cheap state files, not extra loaded context. A large task judged "fine with default harnesses" because the synthesis probe found a fit is a selection bug - the probe only answers whether a custom procedure is needed, not whether overlays are needed.
+7. Pick the smallest effective set using the context budget policy below: the base run plus 0-3 specialized harnesses. When no specialized harness fits, select the base run alone - do not force a generic fit. Do not use a hard cap when several tiny harnesses add useful checks without crowding context. When the task is ambiguous (Stitch goal-ambiguity is expected to trigger), start with `requirements-interview` alone; add a second harness only after the user clarifies. Do not bundle 2+ harnesses for ambiguous tasks upfront.
    After selecting, run a quick quality check using the index metadata for each chosen harness:
    - If fewer than 2 words in `use_when` match the current task description (case-insensitive) → treat as a Stitch harness-mismatch signal
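The rule-bound `goal-checkpoint` check added in this hunk is an any-of test over four conditions, which a sketch makes explicit. The function name and the numeric plan fields (`goals`, `sequentialHarnesses`, `expectedSteps`, `components`) are hypothetical; the package expresses the rule only as prompt text.

```javascript
// Hypothetical check, not part of tink-harness: goal-checkpoint is
// REQUIRED when ANY of the four listed conditions holds.
function goalCheckpointRequired({ goals = 0, sequentialHarnesses = 0, expectedSteps = 0, components = 1 }) {
  const reasons = [];
  if (goals >= 2) reasons.push("2+ goals");
  if (sequentialHarnesses >= 2) reasons.push("2+ sequential harnesses");
  if (expectedSteps >= 4) reasons.push("4+ steps");
  if (components >= 2) reasons.push("multiple components/directories");
  return { required: reasons.length > 0, reasons };
}
```

Returning the reasons alongside the verdict matches the requirement that the `오버레이 점검` line carries a short justification for each overlay.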
@@ -452,7 +496,7 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
 9. Add opt-in guard candidates to `notes.md` only as suggestions. Do not register enforcement hooks unless the user separately approves.
 10. Run the synthesis probe on the initial harness choice. The probe produces one of three outcomes: strong fit (0-1 yes), generic fit (2-3 yes), or no fit (4-5 yes or no harness matches).
 11. If the probe finds no fit, load `harness-synthesis` and draft a domain-specific harness for this run instead of forcing a bad fit.
-12. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the built-in harness. Do not save it by default.
+12. If the probe finds a generic fit (2-3 yes), propose a run-only draft harness or domain rules alongside the base run or selected harness. Do not save it by default.
 13. If too many tools, skills, agents, or harnesses are available, load `harness-curation` and choose the smallest effective set before loading more context.
 14. If lightweight signals show a recurring operating habit, use `harness-curation` (its habit calibration section) to make one advisory recommendation without loading a separate body.
 15. If the user points to research, notes, examples, prior failures, or "what I learned today", synthesize from those inputs. Extract behavior-shaping rules and reusable procedure, not a summary.
@@ -471,17 +515,17 @@ This is the Lane 3 full path from Quick triage. Lanes 1 and 2 intentionally skip
 ## Synthesis probe
-Run this short probe even when a built-in harness seems usable. It prevents broad default harnesses from hiding repeatable domain workflows.
+Run this short probe even when the base run or an existing harness seems sufficient. It prevents broad defaults from hiding repeatable domain workflows.
 Answer yes/no:
 1. Is this likely to recur in this repo, product, customer segment, release process, or personal workflow?
 2. Would a domain-specific rule change the first action, the order of steps, the stop condition, or the verification evidence?
-3. Is the selected built-in harness only a loose or generic fit?
+3. Is the base run or selected harness only a loose or generic fit for this domain?
 4. Did the user correction, prior run note, failed check, research source, or named project context expose a reusable rule?
 5. Would a one-screen draft reduce future context or repeated explanation?
 Decision:
-- 0-1 yes: use the selected built-in harness only. Record why no draft is needed if relevant.
+- 0-1 yes: proceed with the base run or selected harness set as-is. Record why no draft is needed if relevant.
 - 2-3 yes: propose a run-only draft harness. It applies to this run, is written into `.tink/current/plan.md` or `notes.md`, and is not saved by default.
 - 4-5 yes: propose a run-only draft now and ask whether it should become a save candidate after the run. Saving still needs the approval payload.
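The probe's decision table maps a count of yes answers to one of three actions. A minimal sketch, assuming the five answers arrive as booleans; `probeDecision` and the returned labels are hypothetical, and the sketch covers only the count, not the separate "no harness matches" case from step 10 of the procedure.

```javascript
// Hypothetical mapping of the five yes/no answers to the decision
// bands above: 0-1, 2-3, and 4-5 yes answers.
function probeDecision(answers) {
  if (answers.length !== 5) {
    throw new Error("the probe has exactly five questions");
  }
  const yes = answers.filter(Boolean).length;
  if (yes <= 1) return "proceed-as-is";
  if (yes <= 3) return "run-only-draft";
  return "run-only-draft-and-ask-save";
}
```

In every band, saving a draft still goes through the approval payload; the third label only means the question is raised after the run.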
@@ -490,7 +534,7 @@ Run-only draft format:
 ```text
 임시 하네스 초안 (이번 작업 전용):
 - name: <specific-lowercase-name>
-- why not just built-in: <one sentence>
+- why base run is not enough: <one sentence>
 - domain rules: <2-4 bullets that change execution>
 - checks: <2-4 evidence checks>
 - save policy: 이번 run에는 적용, 저장은 반복 근거와 별도 승인 후만
@@ -528,8 +572,10 @@ Use concise, selection-oriented wording. The recommendation must include the fir
 User-facing approval wording:
 - Do not show internal terms such as `Probe`, `probe`, `합성 프로브`, `generic fit`, `제너릭 fit`, or `Stitch`.
-- Translate the synthesis probe result as `맞춤 절차 판단`.
-- Translate `generic fit` as `기본 하네스는 큰 틀만 맞음` or `기본 하네스만으로는 부족함`.
+- Translate the synthesis probe result as `맞춤 절차 판단`. Its "sufficient" verdict must read `별도 맞춤 절차는 불필요` - never `기본 하네스로 충분`, which wrongly implies the whole harness SET was judged sufficient. The probe only decides whether a custom synthesized procedure is needed.
+- Translate `generic fit` as `기본 절차는 큰 틀만 맞음` or `기본 절차만으로는 부족함`.
+- When no harness is selected, show `**🛠️ 선택한 하네스**: 기본 절차(하네스 없음)` with one short phrase about what the base run provides. Do not invent a harness name to fill the line.
+- Always include an `**오버레이 점검:**` line in the approval payload: one verdict per overlay harness that the rule-bound check makes relevant, e.g. `goal-checkpoint 선택(목표 3개·순차 2단계) · plan-consensus 제외(문서 보완은 기존 계약 범위)`. An omitted overlay without a reason is a selection bug.
 - Translate visible Stitch output as `확인할 점`, not `Stitch 점검`.
 - Explain what each selected harness does in one short phrase before asking for approval.
 - Show a short `하네스 선택 과정` when more than one harness or a run-only draft is selected: candidate considered, selected harnesses, and why each was chosen.
@@ -537,7 +583,7 @@ User-facing approval wording:
 Approval option counts (always exactly one applies):
 - Default (no Stitch, no run-only draft): 4 options — 승인 / 조정 / 새 하네스 초안 만들기 / 취소
-- Run-only draft offered: 4 options — 승인 / 조정 / 기본 하네스만 사용 / 취소
+- Run-only draft offered: 4 options — 승인 / 조정 / 기본 절차만 사용 / 취소
 - Stitch soft gate: 4 options — 승인 / 요구사항 입력 / 이대로 진행 / 취소
 - Stitch hard gate (or Save Gate): 3 options — 승인 / 요구사항 입력 / 취소. Never offer `이대로 진행` / `Continue as-is`.
@@ -547,16 +593,17 @@ Approval option counts (always exactly one applies):
 **🎯 Goals**
 - <goal>
-**🛠️ 선택한 하네스**: `code-change + review`
-- `code-change`: 범위가 정해진 코드 변경을 작게 수행하는 하네스
-- `review`: 변경 위험과 누락을 확인하는 점검 하네스
+**🛠️ 선택한 하네스**: 기본 절차 + `goal-checkpoint`
+- 기본 절차: 별도 하네스 없이 실행 상태 계약(계획·검증·증거)으로 진행
+- `goal-checkpoint`: 긴 작업을 목표 단위로 나눠 완료 증거를 남기는 하네스
 **하네스 선택 과정**
-- 후보: `code-change`, `review`, `harness-synthesis`
-- 선택: `code-change + review`
-- 이유: 변경 범위가 좁고, 회귀 확인이 필요합니다. 별도 맞춤 초안은 필요하지 않습니다.
+- 후보: 기본 절차, `goal-checkpoint`, `harness-synthesis`
+- 선택: 기본 절차 + `goal-checkpoint`
+- 이유: 일반 코드 변경이라 특화 하네스는 불필요하고, 목표가 2개 이상이라 goal-checkpoint가 필수입니다.
-- **맞춤 절차 판단:** 기본 하네스로 충분
+- **오버레이 점검:** <예: goal-checkpoint 선택(목표 2개) · plan-consensus 제외(범위 좁음)>
+- **맞춤 절차 판단:** 별도 맞춤 절차는 불필요
 - **첫 실행:** 관련 파일을 먼저 읽고 검증 명령 후보를 확정합니다.
 ? 진행할까요?
@@ -574,20 +621,22 @@ If a run-only draft or new harness is useful:
 **🎯 Goals**
 - <goal>
-**🛠️ 선택한 하네스**: `<built-in> + 임시 초안`
-- `<built-in>`: 기본 작업 흐름을 잡는 하네스
+**🛠️ 선택한 하네스**: 기본 절차 + 임시 초안
+- 기본 절차: 실행 상태 계약으로 작업 큰 틀을 잡음
 - `customer-interview-synthesis`: 이번 작업의 인터뷰 분석 순서를 보강하는 임시 초안
 **하네스 선택 과정**
-- 후보: `<built-in>`, `harness-synthesis`, `customer-interview-synthesis`
-- 선택: `<built-in> + customer-interview-synthesis`
-- 이유: 기본 하네스는 큰 틀만 맞고, 인터뷰 원문 근거와 pain point 구분은 별도 절차가 필요합니다.
+- 후보: 기본 절차, `harness-synthesis`, `customer-interview-synthesis`
+- 선택: 기본 절차 + `customer-interview-synthesis`
+- 이유: 기본 절차는 큰 틀만 맞고, 인터뷰 원문 근거와 pain point 구분은 별도 절차가 필요합니다.
+**오버레이 점검:** <상동 형식>
-**맞춤 절차 판단:** 기본 하네스만으로는 부족함
+**맞춤 절차 판단:** 기본 절차만으로는 부족함
 **임시 하네스 초안** (이번 작업 전용):
 - **name:** `customer-interview-synthesis`
-- **why not just built-in:** 일반 research보다 인터뷰 단위, 원문 근거, pain point 반복성이 중요합니다.
+- **why base run is not enough:** 기본 절차보다 인터뷰 단위, 원문 근거, pain point 반복성이 중요합니다.
 - **domain rules:**
   - 인터뷰별 원문 근거를 먼저 분리
   - 반복 pain point와 단발 의견을 구분
@@ -596,9 +645,9 @@ If a run-only draft or new harness is useful:
 - **save policy:** 이번 run에는 적용, 저장은 반복 근거와 별도 승인 후만
 ? 진행할까요?
-❯ 1. 승인 (권장) — 기본 하네스 + 임시 초안으로 `.tink/current/` 생성
+❯ 1. 승인 (권장) — 기본 절차 + 임시 초안으로 `.tink/current/` 생성
   2. 조정
-  3. 기본 하네스만 사용
+  3. 기본 절차만 사용
   4. 취소
 ```
@@ -623,7 +672,8 @@ If Stitch triggers as a soft gate, merge it into the approval format. The user-f
 - 선택: `<harness>`
 - 이유: <why selected>
-- **맞춤 절차 판단:** <기본 하네스로 충분 | 기본 하네스만으로는 부족함 | 새 맞춤 절차 필요>
+- **오버레이 점검:** <overlay별 선택/제외와 이유>
+- **맞춤 절차 판단:** <별도 맞춤 절차는 불필요 | 기본 절차만으로는 부족함 | 새 맞춤 절차 필요>
 - **이유:** ...
 - **첫 실행:** ...

package/templates/claude/commands/tink/frog.md CHANGED

@@ -50,17 +50,25 @@ Use Korean field values when `.tink/config.json` language is `ko` or `auto` with
    - weak: static index, git-only evidence, stale current notes, or model judgment
    If a lifecycle summary is present, treat it as a health summary, not as authority. Use its `confidence`, `evidence_grade`, `evidence_handles`, and `safe_next_action` to explain the recommendation. A low-confidence or weak lifecycle entry must default to `keep`, `observe`, or `needs evidence`.
    Sort lifecycle-backed candidates first by evidence strength, then by recommendation: `frog_candidate`, `merge_candidate`, `weave`, `observe`, `keep`.
+4b. When invoked without a specific target, the health summary's judgment IS the default agenda - do not wait for the user to name harnesses:
+   - every harness whose lifecycle recommendation is `frog_candidate`, `merge_candidate`, or `weave` appears in the proposal with its evidence grade and reason;
+   - harnesses judged `keep` or `observe` are compressed into one line (`그 외 N개: 유지/관찰`);
+   - the same applies to overlap findings (겹침 점검): harnesses flagged as overlapping each other are proposed as one merge/retire group, not listed separately.
+4c. Check for retired generic built-ins. `code-change`, `bug-fix`, `research`, `review`, and `docs` were retired from the default set - generic work now runs on the base run without a harness. If any of them remain in `.tink/harnesses/`:
+   - unmodified leftovers: recommend `npx tink-harness@latest update`, which removes them automatically; offer direct removal only as the fallback when update is not possible;
+   - user-modified leftovers (preserved by update): propose either narrowing them into a clearly-named user harness (specific `use_when`, `kind: "synthesized"`) or deleting them, with the user's modifications quoted as evidence.
 5. Identify candidates:
    - never used with strong evidence
    - not used recently with strong evidence
    - overlaps strongly with another harness
-   - too broad to guide behavior
+   - too broad to guide behavior (a generic-purpose harness is a retirement candidate by default policy: the default set is specialized-only)
    - repeatedly ignored during `/tink:cast`
 6. For each candidate, show evidence grade and recommendation:
    - keep
    - merge into another harness
    - delete
    - rewrite via `/tink:weave`
+6a. If `.tink/memory/candidate/` exists, also review stale draft entries: a candidate with no supporting run, ledger, or friction evidence after 30+ days, or one superseded by an approved memory or harness change, should be proposed for a move to `.tink/memory/rejected/` with a one-line reason. Promotion of live candidates belongs to `/tink:weave`; frog only clears the stale ones. Apply the same approval rules as other operations.
 6b. If `.tink/rules/index.json` exists, also inspect rule quality:
    - keep: concrete `when`, `reason`, and useful `checks` or `include_paths`
    - rewrite: too broad, unclear reason, or missing verification
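Step 4c in this hunk splits retired generic leftovers into two treatments, which can be sketched as below. The function name, the `{ name, modified }` input shape, and the action labels are hypothetical; how the updater detects a user modification is not shown in this diff, so `modified` is passed in as a given flag.

```javascript
// The five generic harnesses retired from the default set in 1.10.0.
const RETIRED_GENERICS = new Set(["code-change", "bug-fix", "research", "review", "docs"]);

// Hypothetical helper: picks the proposal for each retired leftover
// found in .tink/harnesses/. Other harnesses are ignored here.
function retiredLeftoverActions(harnesses) {
  return harnesses
    .filter((h) => RETIRED_GENERICS.has(h.name))
    .map((h) => ({
      name: h.name,
      // Unmodified: the update command removes them automatically.
      // Modified: update preserves them, so the user must decide.
      action: h.modified ? "narrow-or-delete" : "recommend-update",
    }));
}
```

Either action is still a proposal: the approval rules for frog operations apply before anything is removed.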

package/templates/claude/commands/tink/list.md CHANGED

@@ -1,104 +1,104 @@
----
-description: Inspect available Tink harnesses and recent usage signals.
----
-# /tink:list
-List available Tink harnesses without loading every harness body.
-## Procedure
-1. Read `.tink/harnesses/index.json`.
-2. Read only compact usage metadata from `.tink/runs/` (frontmatter `selected_harnesses` / `actually_loaded_harnesses` + dates), `.tink/maintenance/ledger.jsonl`, and `.tink/maintenance/weave-queue.json`. Do not load raw logs.
-3. Treat `.tink/current/` as weak evidence unless it is clearly from the same active conversation. If context is uncertain, label it `stale current candidate`, not proof of usage.
-4. Classify every harness into exactly one of three categories:
-   - **working** — directly performs tasks (e.g. `code-change`, `bug-fix`, `research`, `review`, `docs`, `ship`).
-   - **meta** — manages other harnesses or Tink itself. Treat these names as meta regardless of `kind`: `harness-synthesis`, `harness-curation`, `tink-feedback-apply`.
-   - **custom (this repo)** — `kind: synthesized` in `index.json` (created in this repo, not part of the default set). If a synthesized harness also matches a meta name, prefer meta.
-5. Compute the signal per harness:
-   - 🟢 **active** — appears in any `.tink/runs/*.md` frontmatter or `.tink/maintenance/ledger.jsonl` entry.
-   - ⚪ **unknown** — no run/ledger/memory evidence. Do not call it `quiet` or `candidate for purge` from the static index alone. Do not infer non-use from missing evidence.
-6. Show all three categories every time, even when one is empty. For an empty category, render `_(아직 없음)_` (or the English equivalent if the project language is `en`) instead of an item list.
-7. Do not output the `evidence` field. Usage is now compressed into `signal`.
-## Output format
-Always start with a header block that defines the fields and categories. Render each harness as a multi-line block — one field per line, never collapsed onto one line. Close with an assessment and command suggestions.
-Use this exact skeleton (translate field labels and descriptions to the language in `.tink/config.json`):
-````markdown
-### 🧶 Tink 하네스 목록
-> **필드 설명**
-> - **purpose** — 이 하네스가 다루는 작업
-> - **context** — Claude 컨텍스트 점유량
->   · `tiny` 아주 짧음  · `small` 보통 체크리스트  · `large` 별도 승인 후 읽는 큰 하네스
-> - **last used** — 가장 최근 실행 날짜 (없으면 `미사용`)
-> - **signal** — 🟢 `active` 사용 기록 있음  · ⚪ `unknown` 아직 사용 기록 없음
->
-> **카테고리 설명**
-> - **작업 하네스** — 실제 작업을 수행 (코드 변경·리뷰·문서 등)
-> - **메타 하네스** — 다른 하네스나 Tink 자체를 관리 (선택·합성·피드백 반영)
-> - **이 저장소 전용** — 이 프로젝트에서 직접 만들어 저장된 하네스
----
-#### 🛠️ 작업 하네스
-##### `<name>`
-- **purpose**: <one short sentence>
-- **context**: <tiny | small | large>
-- **last used**: <YYYY-MM-DD | 미사용>
-- **signal**: 🟢 active | ⚪ unknown
-#### 🧭 메타 하네스
-##### `<name>`
-- **purpose**: …
-- **context**: …
-- **last used**: …
-- **signal**: …
-(또는 비어 있으면)
-_(아직 없음)_
-#### 🔧 이 저장소 전용
-##### `<name>`
-- **purpose**: …
-- **context**: …
-- **last used**: …
-- **signal**: …
-(또는 비어 있으면)
-_(아직 없음)_
----
-### 📊 평가
-- **가장 활발**: …
-- **한 번도 안 쓴 하네스**: …
-- **균형/주의점**: 한두 문장 평가.
-### 💡 다음에 쓸 수 있는 명령
-- `/tink:cast <작업 설명>` — 적절한 하네스를 골라 작업 시작
-- `/tink:weave` — 자주 쓰는 하네스에 누적된 개선 사항 반영 (해당될 때만)
-- `/tink:frog` — 오래 사용 안 된 하네스 정리 후보 검토 (실제 삭제는 별도 승인)
-- `/tink:setup` — 언어·범위·훅 정책 등 Tink 설정 점검
-````
-## Assessment & command-suggestion rules
-- The 평가 section must mention at least: the most-used harness, every harness with an `unknown` signal, and any obvious imbalance (e.g. meta harnesses all untouched).
-- Always include `/tink:cast` and `/tink:setup` as default next steps.
-- Only suggest `/tink:weave` when at least one active harness has user-correction evidence, repeated runs of the same category, or items queued in `.tink/maintenance/weave-queue.json`.
-- Only suggest `/tink:frog` when at least one harness has been `unknown` for the entire visible history AND there is no plausible upcoming use. Frame it as "정리 후보 검토", not "삭제".
-## Output style
-Use bullets, not tables. One field per line per harness. Never collapse a harness into a single line.
-## Do not
-- Do not read every harness body by default.
-- Do not infer non-use from missing evidence.
-- Do not remove anything. Use `/tink:frog` for removal candidates.
-- Do not output the `evidence` field.
-- Do not hide a category because it has zero items — render `_(아직 없음)_` instead.
+---
+description: Inspect available Tink harnesses and recent usage signals.
+---
+# /tink:list
+List available Tink harnesses without loading every harness body.
+## Procedure
+1. Read `.tink/harnesses/index.json`.
+2. Read only compact usage metadata from `.tink/runs/` (frontmatter `selected_harnesses` / `actually_loaded_harnesses` + dates), `.tink/maintenance/ledger.jsonl`, and `.tink/maintenance/weave-queue.json`. Do not load raw logs.
+3. Treat `.tink/current/` as weak evidence unless it is clearly from the same active conversation. If context is uncertain, label it `stale current candidate`, not proof of usage.
+4. Classify every harness into exactly one of three categories:
+   - **working** — directly performs or gates tasks (e.g. `ship`, `pr-merge`, `requirements-interview`, `plan-consensus`, `goal-checkpoint`, `delegation-brief`). Generic work (code change, research, review, docs) runs on the base run without a harness, so it does not appear here.
+   - **meta** — manages other harnesses or Tink itself. Treat these names as meta regardless of `kind`: `harness-synthesis`, `harness-curation`, `tink-feedback-apply`.
+   - **custom (this repo)** — `kind: synthesized` in `index.json` (created in this repo, not part of the default set). If a synthesized harness also matches a meta name, prefer meta.
+5. Compute the signal per harness:
+   - 🟢 **active** — appears in any `.tink/runs/*.md` frontmatter or `.tink/maintenance/ledger.jsonl` entry.
+   - ⚪ **unknown** — no run/ledger/memory evidence. Do not call it `quiet` or `candidate for purge` from the static index alone. Do not infer non-use from missing evidence.
+6. Show all three categories every time, even when one is empty. For an empty category, render `_(아직 없음)_` (or the English equivalent if the project language is `en`) instead of an item list.
+7. Do not output the `evidence` field. Usage is now compressed into `signal`.
+## Output format
+Always start with a header block that defines the fields and categories. Render each harness as a multi-line block — one field per line, never collapsed onto one line. Close with an assessment and command suggestions.
+Use this exact skeleton (translate field labels and descriptions to the language in `.tink/config.json`):
+````markdown
+### 🧶 Tink 하네스 목록
+> **필드 설명**
+> - **purpose** — 이 하네스가 다루는 작업
+> - **context** — Claude 컨텍스트 점유량
+>   · `tiny` 아주 짧음  · `small` 보통 체크리스트  · `large` 별도 승인 후 읽는 큰 하네스
+> - **last used** — 가장 최근 실행 날짜 (없으면 `미사용`)
+> - **signal** — 🟢 `active` 사용 기록 있음  · ⚪ `unknown` 아직 사용 기록 없음
+>
+> **카테고리 설명**
+> - **작업 하네스** — 실제 작업을 수행하거나 안전판 역할 (출시·인터뷰·목표 관리 등). 일반 코드 변경·리뷰·문서는 하네스 없이 기본 절차로 진행됩니다.
+> - **메타 하네스** — 다른 하네스나 Tink 자체를 관리 (선택·합성·피드백 반영)
+> - **이 저장소 전용** — 이 프로젝트에서 직접 만들어 저장된 하네스
+---
+#### 🛠️ 작업 하네스
+##### `<name>`
+- **purpose**: <one short sentence>
+- **context**: <tiny | small | large>
+- **last used**: <YYYY-MM-DD | 미사용>
+- **signal**: 🟢 active | ⚪ unknown
+#### 🧭 메타 하네스
+##### `<name>`
+- **purpose**: …
+- **context**: …
+- **last used**: …
+- **signal**: …
+(또는 비어 있으면)
+_(아직 없음)_
+#### 🔧 이 저장소 전용
+##### `<name>`
+- **purpose**: …
+- **context**: …
+- **last used**: …
+- **signal**: …
+(또는 비어 있으면)
+_(아직 없음)_
+---
+### 📊 평가
+- **가장 활발**: …
+- **한 번도 안 쓴 하네스**: …
+- **균형/주의점**: 한두 문장 평가.
+### 💡 다음에 쓸 수 있는 명령
+- `/tink:cast <작업 설명>` — 적절한 하네스를 골라 작업 시작
+- `/tink:weave` — 자주 쓰는 하네스에 누적된 개선 사항 반영 (해당될 때만)
+- `/tink:frog` — 오래 사용 안 된 하네스 정리 후보 검토 (실제 삭제는 별도 승인)
+- `/tink:setup` — 언어·범위·훅 정책 등 Tink 설정 점검
+````
+## Assessment & command-suggestion rules
+- The 평가 section must mention at least: the most-used harness, every harness with an `unknown` signal, and any obvious imbalance (e.g. meta harnesses all untouched).
+- Always include `/tink:cast` and `/tink:setup` as default next steps.
+- Only suggest `/tink:weave` when at least one active harness has user-correction evidence, repeated runs of the same category, or items queued in `.tink/maintenance/weave-queue.json`.
+- Only suggest `/tink:frog` when at least one harness has been `unknown` for the entire visible history AND there is no plausible upcoming use. Frame it as "정리 후보 검토", not "삭제".
+## Output style
+Use bullets, not tables. One field per line per harness. Never collapse a harness into a single line.
+## Do not
+- Do not read every harness body by default.
+- Do not infer non-use from missing evidence.
+- Do not remove anything. Use `/tink:frog` for removal candidates.
+- Do not output the `evidence` field.
+- Do not hide a category because it has zero items — render `_(아직 없음)_` instead.
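The three-way classification in step 4 of `/tink:list` has a fixed precedence: meta names win over `kind`, then `synthesized` means custom, and everything else is working. A minimal sketch, assuming index entries expose `name` and `kind`; `classifyHarness` and `groupByCategory` are hypothetical names.

```javascript
// Names treated as meta regardless of `kind`, as listed in step 4.
const META_NAMES = new Set(["harness-synthesis", "harness-curation", "tink-feedback-apply"]);

// Hypothetical classifier: meta name first, then synthesized, else working.
function classifyHarness({ name, kind }) {
  if (META_NAMES.has(name)) return "meta";
  if (kind === "synthesized") return "custom";
  return "working";
}

// All three categories are always present, even when empty, so the
// renderer can print the placeholder for an empty category.
function groupByCategory(entries) {
  const groups = { working: [], meta: [], custom: [] };
  for (const entry of entries) {
    groups[classifyHarness(entry)].push(entry.name);
  }
  return groups;
}
```

Keeping empty arrays in the result mirrors step 6, which requires showing every category rather than hiding one with zero items.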

package/templates/claude/commands/tink/setup.md CHANGED

@@ -86,7 +86,7 @@ Use this wording in Korean:
 Tink는 두 종류의 파일을 씁니다.
 1. 재사용 하네스 (Reusable Harnesses): `.tink/harnesses/`
-   작업 방식 템플릿입니다. 예: bug-fix, research, review.
+   기능 특화 작업 방식 템플릿입니다. 예: ship, goal-checkpoint, plan-consensus.
    팀이 같이 쓰면 유용하므로 보통 git에 커밋합니다.
 2. 실행 상태 (Run State): `.tink/current/`, `.tink/runs/`, `.tink/cache/`

package/templates/claude/commands/tink/update.md CHANGED

@@ -40,6 +40,7 @@ npx tink-harness@latest update
 The `update` subcommand asks only one question - which agent surface to refresh (Claude Code, Codex, or both). Everything else updates automatically:
 - **Always overwrites**: commands, skills, maintenance, and runtime tools (`.claude/commands/tink/`, `.claude/skills/tink/`, `.tink/maintenance/`, `.tink/tools/`) — so you get the latest harness runner, report tools, and command behavior automatically.
 - **Preserves if modified**: harnesses, hooks, memory, and config (`.tink/harnesses/`, `.tink/hooks/`, `.tink/memory/`, `.tink/config.json`) — respects your `weave` customizations and local settings.
+- **Reuses stored choices**: language, install scope, and git policy come from `.tink/config.json`. With `git_policy: "none"` (커밋 안 함) the updater never creates or edits `.gitignore`, and an existing whole-directory `.tink/` ignore line is left as-is.
 ## Output format (source repo)

package/templates/claude/commands/tink/weave.md CHANGED

@@ -28,7 +28,11 @@ Use Korean field values when `.tink/config.json` language is `ko` or `auto` with
    If `.tink/maintenance/friction.jsonl` exists, read only compact recent entries and count repeated `check_failed`, `check_skipped`, `blocked`, gate denial, or rollback events. Repeated friction can justify a harness edit, rule graph update, or opt-in guard candidate.
    If `.tink/tools/generate-harness-lifecycle-summary.mjs` exists, run `node .tink/tools/generate-harness-lifecycle-summary.mjs` from the repo root before ranking candidates. The generated `.tink/maintenance/harness-lifecycle.json` is a report, not approval or reusable memory.
    If `.tink/maintenance/harness-lifecycle.json` or another summary following `.tink/schemas/harness-lifecycle.schema.json` exists, read it as a harness health summary. Prefer entries with recommendation `weave`, high or medium confidence, and concrete `evidence_handles`. Low-confidence entries should stay as observation unless the user explicitly asks to act on them.
-2. Identify one or a few active harnesses to improve using real failures and evidence:
+1b. Scan promotion candidates (임시초안 승격) - weave promotes as well as improves:
+   - **Run-only draft harnesses**: read recent `.tink/runs/*.md` for recorded draft names and domain rules. If the same draft (same name, or clearly the same domain rules) appears in 2+ run records, propose promoting it into `.tink/harnesses/<name>.md` plus an `index.json` entry with `kind: "synthesized"`. Score it with the harness synthesis contract (specificity, actionability, verifiability, reuse likelihood, context cost) before proposing.
+   - **Candidate memory**: read `.tink/memory/candidate/` when it exists. If 2+ runs, ledger, or friction entries support one candidate, propose moving it to `.tink/memory/approved/` as one compact file, with evidence handles recorded under `.tink/memory/evidence/`. If the user declines, move it to `.tink/memory/rejected/` with a one-line reason so it is not proposed again.
+   - Every promotion is a reusable-state write: it always goes through the Save Gate approval payload and is appended to `.tink/maintenance/ledger.jsonl`. Without 2+ independent evidence handles, present the candidate as an observation, not a proposal.
+2. Identify one or a few active harnesses to improve using real failures and evidence. When invoked without a specific target, the health summary's `weave` recommendations plus the promotion scan above form the default agenda:
    - repeated mistakes
    - user corrections
    - failed checks
@@ -76,16 +80,16 @@ Use Korean field values when `.tink/config.json` language is `ko` or `auto` with
 ## Approval format
 ```text
 Hone target:
-- code-change
+- ship
 Evidence:
-- source: `.tink/runs/2026-05-22-code-change.md`
+- source: `.tink/runs/2026-05-22-release-pipeline.md`
 - classification: repeated
 - observed failure: verification command was unclear in two runs
 Approval payload:
 - operation: weave
-- destination files: `.tink/harnesses/code-change.md`, `.tink/harnesses/index.json` if metadata changes, `.tink/rules/index.json` if routing changes
+- destination files: `.tink/harnesses/ship.md`, `.tink/harnesses/index.json` if metadata changes, `.tink/rules/index.json` if routing changes
 - context-cost delta: neutral or smaller
 - ledger: append op ID to `.tink/maintenance/ledger.jsonl`
 - rollback: revert this patch or rerun `/tink:weave` with the previous trigger
@@ -99,6 +103,30 @@ Proposed improvement:
   3. 취소
 ```
+Promotion proposal format (run-only draft → saved harness):
+```text
+승격 후보:
+- `customer-interview-synthesis` (임시 초안, 2개 run에서 반복 사용)
+Evidence:
+- `.tink/runs/2026-06-01-1010-interview-analysis.md`
+- `.tink/runs/2026-06-09-1430-interview-round2.md`
+- classification: repeated
+Approval payload:
+- operation: harness-create (promotion)
+- destination files: `.tink/harnesses/customer-interview-synthesis.md`, `.tink/harnesses/index.json`
+- synthesis score: specificity 4 · actionability 4 · verifiability 3 · reuse 4 · context cost tiny
+- ledger: append op ID to `.tink/maintenance/ledger.jsonl`
+- rollback: delete the file and index entry, or rerun `/tink:frog`
+? 진행할까요?
+❯ 1. 승인 — 하네스로 승격 저장
+  2. 조정
+  3. 거절 — 다시 제안하지 않도록 기록
+```
 ## Do not
 - Do not rewrite a harness from scratch unless the user asks.
 - Do not add broad principles that do not change behavior.
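
The "2+ run records" threshold in the promotion scan added to `weave.md` above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `runRecords` is a hypothetical pre-parsed form of `.tink/runs/*.md` (each entry carrying a `file` path and the `draftNames` recorded in it), and matching is by exact draft name only, so the "clearly the same domain rules" case is left out.

```javascript
// Sketch only: `runRecords` is a hypothetical pre-parsed form of `.tink/runs/*.md`.
function scanPromotionCandidates(runRecords) {
  // Collect the distinct run files in which each draft name appears.
  const runsByDraft = new Map();
  for (const record of runRecords) {
    for (const draft of record.draftNames) {
      if (!runsByDraft.has(draft)) runsByDraft.set(draft, new Set());
      runsByDraft.get(draft).add(record.file);
    }
  }
  // Two or more independent run files make a proposal; fewer stay an observation.
  return [...runsByDraft].map(([name, files]) => ({
    name,
    evidence: [...files].sort(),
    kind: files.size >= 2 ? "proposal" : "observation",
  }));
}

// Usage with the run file names from the promotion proposal example.
const candidates = scanPromotionCandidates([
  { file: ".tink/runs/2026-06-01-1010-interview-analysis.md", draftNames: ["customer-interview-synthesis"] },
  { file: ".tink/runs/2026-06-09-1430-interview-round2.md", draftNames: ["customer-interview-synthesis"] },
]);
console.log(candidates[0].kind);
```

A candidate marked `proposal` would still go through the Save Gate approval payload and a ledger append before anything is written. The sketch covers only the evidence count.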

package/templates/codex/skills/tink-core/RULES.md CHANGED

@@ -56,20 +56,20 @@ Codex `$tink:cast` must show a visible approval step for every non-trivial run.
 When multiple harnesses or a run-only draft are selected, briefly explain each harness and include a short section labeled `하네스 선택 과정`: candidates considered, selected harnesses, and the reason each earns its place. Use natural Korean scope wording such as `완료 기준을 먼저 나누겠습니다` or `이번 점검은 두 범위로 보겠습니다`; avoid awkward phrasing like `"더 잘 동작하기"의 기준이 두 갈래입니다`.
-Default Korean options are `승인`, `조정`, `취소`. If a run-only draft is proposed, use `승인`, `조정`, `기본 하네스만 사용`, `취소`. If a high-impact safety or quality branch is visible, use `승인`, `요구사항 입력`, `이대로 진행`, `취소`. For hard gates or reusable-state saves, use only `승인`, `요구사항 입력`, `취소`.
+Default Korean options are `승인`, `조정`, `취소`. If a run-only draft is proposed, use `승인`, `조정`, `기본 절차만 사용`, `취소`. If a high-impact safety or quality branch is visible, use `승인`, `요구사항 입력`, `이대로 진행`, `취소`. For hard gates or reusable-state saves, use only `승인`, `요구사항 입력`, `취소`.
-Option label quality rules: use short, common, readable labels only. Good Korean labels include `승인`, `조정`, `취소`, `요구사항 입력`, `기본 하네스만 사용`, `새 하네스 초안 만들기`, `구조 점검`, `내용 점검`, and `전체 점검`. Do not invent compressed Korean labels, transliterated fragments, or unclear summaries such as `콘데의달 지질`. If the idea is too specific for a clean 1-5 word label, put the detail in `description` and use a generic label such as `내용 점검` or `전체 점검`. Before calling `request_user_input`, reread each Korean label; if it looks misspelled, unnatural, or semantically unclear, replace it with a plain fallback label.
+Option label quality rules: use short, common, readable labels only. Good Korean labels include `승인`, `조정`, `취소`, `요구사항 입력`, `기본 절차만 사용`, `새 하네스 초안 만들기`, `구조 점검`, `내용 점검`, and `전체 점검`. Do not invent compressed Korean labels, transliterated fragments, or unclear summaries such as `콘데의달 지질`. If the idea is too specific for a clean 1-5 word label, put the detail in `description` and use a generic label such as `내용 점검` or `전체 점검`. Before calling `request_user_input`, reread each Korean label; if it looks misspelled, unnatural, or semantically unclear, replace it with a plain fallback label.
-When `request_user_input` is unavailable, write the same approval request as a normal assistant message and wait for the user's answer. Do not create run state, load harness bodies, edit files, run commands, or continue the task before the answer. A user's `$tink:cast` invocation means "prepare and ask for approval", not "start immediately". Exception - quick triage Lane 1: when the request is clearly simple and safe (a question, a read-only check, or one obvious localized edit with no hard-gate signals), start immediately with a one-line marker instead of asking; full preparation applies to non-trivial tasks. When an active plan has 3 or more steps, end every response with the Tink progress block (10-cell bar, current step, remaining steps).
+When `request_user_input` is unavailable, write the same approval request as a normal assistant message and wait for the user's answer. Do not create run state, load harness bodies, edit files, run commands, or continue the task before the answer. A user's `$tink:cast` invocation means "prepare and ask for approval", not "start immediately". Exception - quick triage Lane 1: when the request is clearly simple and safe (a question, a read-only check, or one obvious localized edit with no hard-gate signals), start immediately with a one-line marker instead of asking; full preparation applies to non-trivial tasks. Overlay selection is rule-bound: goal-checkpoint is REQUIRED when the run has 2+ goals, 2+ sequential harnesses, 4+ expected steps, or spans multiple components; plan-consensus must be explicitly considered (with a recorded reason if skipped) for from-scratch implementations, reimplementations, migrations, or public contract design. The synthesis-probe verdict only covers custom procedures and must never be presented as the whole harness set being sufficient. When an active plan has 3 or more steps, end every response with the Tink progress block (10-cell bar, current step, remaining steps); right after creating or restructuring a plan, completing a goal/phase, or resuming a run, show the full progress map instead (one bar per phase with the active row marked, an overall bar, and the active phase's steps).
-Use this compact approval request shape. Keep it short; do not expose internal terms such as Stitch, Probe, synthesis probe, generic fit, or hard gate in user-facing text. Translate them into plain wording such as `확인할 점`, `맞춤 절차 판단`, `기본 하네스로 충분`, or `기본 하네스만으로는 부족함`.
+Use this compact approval request shape. Keep it short; do not expose internal terms such as Stitch, Probe, synthesis probe, generic fit, or hard gate in user-facing text. Translate them into plain wording such as `확인할 점`, `맞춤 절차 판단`, `별도 맞춤 절차는 불필요`, or `기본 절차만으로는 부족함`. Never use `기본 하네스로 충분` - the probe verdict covers custom procedures only, not the whole harness set.
 Korean:
 ```md
 이 작업은 Tink run으로 잡고 진행하겠습니다.
-- 선택 하네스: `code-change`
+- 선택 하네스: 기본 절차(하네스 없음)
 - 범위: Codex 승인 UX 문구와 테스트만 수정
 - 제외: release, publish, unrelated refactor
 - 승인 후 첫 단계: Codex core rules에 승인 요청 형식 추가
@@ -84,7 +84,7 @@ English:
 ```md
 I will handle this as a Tink run.
-- selected harnesses: `code-change`
+- selected harnesses: base run (no harness)
 - scope: update Codex approval UX text and tests only
 - out of scope: release, publish, unrelated refactors
 - first step after approval: add the approval request format to Codex core rules
@@ -98,7 +98,7 @@ If `request_user_input` is available, map this content into the prompt and use o
 ## Harness Procedure
-For `$tink:cast`, classify the task as code change, bug fix, research, review, docs, ship/release, or new pattern. Ask for current-run approval using the Codex Approval Protocol, then load only selected harness bodies after approval. If no built-in harness fits, use `harness-synthesis` to draft a narrow run-only harness instead of forcing a generic fit.
+For `$tink:cast`, classify the task as code change, bug fix, research, review, docs, ship/release, or new pattern. These are task types, not harness names: generic types run as a base run (no harness - the run state contract alone provides scope, verification, and evidence), and a harness is selected only when a specialized one genuinely fits (overlays, ship/release gates, meta harnesses, or user/synthesized domain harnesses). Never force a loose-fit harness just to name one. Ask for current-run approval using the Codex Approval Protocol, then load only selected harness bodies after approval. If a repeatable domain procedure is missing, use `harness-synthesis` to draft a narrow run-only harness instead of forcing a generic fit.
 Create run state before deeper work: