npm - tink-harness - Versions diffs - 1.3.0 → 1.5.0 - Mend

tink-harness 1.3.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +31 -0
package/README.ko.md +11 -3
package/README.md +12 -4
package/VERSIONING.md +1 -1
package/bin/install.js +67 -0
package/commands/cast.md +4 -1
package/docs/context-budget-ledger.ko.md +2 -0
package/docs/context-budget-ledger.md +2 -0
package/docs/context-metrics-evaluator.ko.md +39 -0
package/docs/context-metrics-evaluator.md +39 -0
package/docs/context-run-history-rollup.ko.md +36 -0
package/docs/context-run-history-rollup.md +36 -0
package/docs/context-run-record-policy.ko.md +50 -0
package/docs/context-run-record-policy.md +50 -0
package/docs/context-threshold-status.ko.md +43 -0
package/docs/context-threshold-status.md +43 -0
package/docs/pr/2026-06-08-codex-surface-cleanup.ko.md +27 -0
package/docs/pr/2026-06-08-context-metrics-artifact.ko.md +30 -0
package/docs/pr/2026-06-08-context-metrics-evaluator.ko.md +28 -0
package/docs/pr/2026-06-08-context-run-history-rollup.ko.md +27 -0
package/docs/pr/2026-06-08-context-run-record-policy.ko.md +25 -0
package/docs/pr/2026-06-08-context-threshold-status.ko.md +25 -0
package/docs/pr/2026-06-08-v1.5.0.ko.md +30 -0
package/docs/update-troubleshooting.ko.md +6 -1
package/docs/update-troubleshooting.md +6 -1
package/docs/update-verification-recipe.ko.md +3 -0
package/docs/update-verification-recipe.md +3 -0
package/docs/work-state.ko.md +7 -3
package/docs/work-state.md +7 -3
package/package.json +1 -1
package/templates/claude/commands/tink/cast.md +4 -1
package/templates/codex/skills/tink-core/RULES.md +4 -1
package/templates/tink/schemas/context-metrics-evaluation.schema.json +89 -0

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "tink",
   "description": "A small harness layer for Claude Code and Codex.",
-  "version": "1.3.0",
+  "version": "1.5.0",
   "author": {
     "name": "dotori"
   }

package/CHANGELOG.md CHANGED

@@ -7,6 +7,37 @@ All notable changes to Tink are tracked here.
 No unreleased changes yet.
+## [1.5.0] - 2026-06-08
+### Changed
+- Codex-only install/update now removes repo-local Tink Claude Code command files under `.claude/commands/tink/*.md` so Codex no longer shows them as stale `Source Command Tink ...` entries after a Codex-only refresh.
+- Codex-only install/update now removes the old repo-local `.claude/skills/tink/SKILL.md` Tink surface when it matches the known generated Tink skill, while preserving unknown user-authored content.
+- Update troubleshooting and verification docs now explain when `Source Command Tink ...` is stale and when it is expected because the repo intentionally keeps the Claude Code surface.
+- README and Korean README now highlight the v1.5.0 update behavior for existing Codex users.
+### Added
+- Regression coverage for Codex-only update cleanup of repo-local Claude command and skill surfaces.
+- Korean PR history draft for the v1.5.0 release in `docs/pr/2026-06-08-v1.5.0.ko.md`.
+## [1.4.0] - 2026-06-08
+### Added
+- `context-metrics-evaluation.schema.json` for `.tink/current/context-metrics-evaluation.json`.
+- Context Metrics Evaluator run-state artifact guidance for `/tink:cast` and `$tink:cast`.
+- Fixture-ratio evaluation docs in Korean and English, explaining measured fixture scope versus production telemetry.
+- Test-backed context metrics evaluation fixture that calculates all six context-efficiency metrics at or above the 90% target within fixture scope.
+- Korean PR history draft for the Context Metrics Artifact work in `docs/pr/2026-06-08-context-metrics-artifact.ko.md`.
+### Changed
+- Work State Guide now includes `context-metrics-evaluation.json` in the reading order.
+- README and Korean README now link to the Context Metrics Evaluator docs without adding a new command.
 ## [1.3.0] - 2026-06-08
 ### Added

package/README.ko.md CHANGED

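@@ -8,7 +8,7 @@ A small harness layer for Claude Code and Codex.
 Tink picks the harness that fits the current task, makes run state visible, and improves the harness set using the failures and feedback that come up in real use.
-**Latest release:** v1.3.0 - A Context Budget Ledger foundation that scores context efficiency and links selected context to verification.
+**Latest release:** v1.5.0 - Codex-only update now cleans up the repo-local Claude Tink surface that appeared in the Codex skill picker as `Source Command Tink ...`.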
 [English](README.md) · **한국어**
@@ -59,14 +59,22 @@ npx tink-harness@latest update
 If Codex skills, schemas, or Windows warnings look wrong after an update, see `docs/update-troubleshooting.ko.md` or `docs/update-troubleshooting.md`.
-## What changed in 1.2.0
+## What changed in 1.5.0
+This release fixes a confusing Codex skill picker state in existing repos.
+- Codex-only `tink-harness update` cleans up repo-local `.claude/commands/tink/*.md` and the old repo-local `.claude/skills/tink/SKILL.md` Tink surface.
+- This reduces cases where `Source Command Tink Frog/List/...` or a broad `Tink` entry shows up in Codex when only `$tink:*` action skills are expected.
+- If you intentionally installed both Claude Code and Codex, repo-local Claude Code commands can remain, and `Source Command Tink ...` entries may then be expected. For details, see `docs/update-troubleshooting.ko.md` or `docs/update-troubleshooting.md`.
+## Foundation improvements since 1.2.0
 This release tidies Tink so it is easy to use as the same harness layer in Claude Code and Codex.
 - Codex installs so that only action skills such as `$tink:cast` and `$tink:verify` are visible, instead of one broad `tink` skill.
 - Non-trivial work records which context was used and which was left out in `context-pack.md`, `context-map.json`, and `excluded-context.md`.
 - Repo Signal and Context Graph Lite are used to pick relevant tests, schemas, sync files, and verification hints without creating a new `tink index` command.
-- The Context Budget Ledger for scoring context efficiency is described in `docs/context-budget-ledger.ko.md` and `docs/context-budget-ledger.md`.
+- Context efficiency scoring, fixture ratio calculation, run-history rollup, the 90% threshold status, and the boundary criteria for real run records are described in `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-threshold-status.ko.md`, `docs/context-threshold-status.md`, `docs/context-run-record-policy.ko.md`, and `docs/context-run-record-policy.md`.
 - `/tink:verify` and `$tink:verify` use the same Verify Runner model and leave verification evidence in `.tink/current/verification.json`.
 - External context follows the MCP Safe Profile: keep only the smallest source handle, mark confidence and sensitivity, and record risky or overly broad context separately in `excluded-context.md`.

package/README.md CHANGED

@@ -17,14 +17,14 @@
 </p>
 <p>
-  <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.3.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
+  <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.5.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
   <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
   <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
   <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
   <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
 </p>
-<p><strong>Latest release:</strong> v1.3.0 — Context Budget Ledger fields for scoring context efficiency and linking selected context to verification.</p>
+<p><strong>Latest release:</strong> v1.5.0 - Codex-only update now removes repo-local Claude Tink surfaces that appeared as <code>Source Command Tink ...</code>.</p>
 **English** · [한국어](README.ko.md)
@@ -124,14 +124,22 @@ To quickly verify the updated install, see `docs/update-verification-recipe.md`
 If an update looks stale or incomplete, see `docs/update-troubleshooting.md` or `docs/update-troubleshooting.ko.md`.
-## What's new in 1.2.0
+## What's new in 1.5.0
+This release fixes the Codex skill picker state after updating an existing repo.
+- Codex-only `tink-harness update` now removes repo-local `.claude/commands/tink/*.md` and the old repo-local `.claude/skills/tink/SKILL.md` Tink surface.
+- This prevents Codex from showing `Source Command Tink Frog/List/...` or a broad repo-local `Tink` entry when the user expects only `$tink:*` action skills.
+- If you intentionally install both Claude Code and Codex surfaces, repo-local Claude Code commands can remain and `Source Command Tink ...` entries may be expected. See `docs/update-troubleshooting.md` or `docs/update-troubleshooting.ko.md`.
+## Recent foundation from 1.2.0+
 This release makes Tink work as one harness layer across Claude Code and Codex.
 - Codex now installs focused `$tink:*` action skills instead of one broad visible `tink` skill, so the picker shows commands like `$tink:cast` and `$tink:verify` cleanly.
 - Non-trivial runs now create context artifacts: `context-pack.md`, `context-map.json`, and `excluded-context.md`.
 - Repo Signals and Context Graph Lite help `/tink:cast` and `$tink:cast` choose relevant tests, schemas, sync partners, and verification hints without adding a new `tink index` command.
-- Context Budget Ledger fields are documented in `docs/context-budget-ledger.md` and `docs/context-budget-ledger.ko.md` for scoring context efficiency without adding a new command.
+- Context Budget Ledger fields, fixture-ratio evaluation, run-history rollup, the 90 percent threshold status, and future real-run record boundaries are documented in `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-threshold-status.md`, `docs/context-threshold-status.ko.md`, `docs/context-run-record-policy.md`, and `docs/context-run-record-policy.ko.md` without adding a new command.
 - `/tink:verify` and `$tink:verify` share one portable Verify Runner model and write compact evidence to `.tink/current/verification.json`.
 - External context now follows the MCP Safe Profile: include only the smallest useful source handle, mark confidence and sensitivity, exclude unsafe context visibly, and connect important claims to verification.

package/VERSIONING.md CHANGED

@@ -1,6 +1,6 @@
 # Versioning
-Current version: `1.3.0`
+Current version: `1.5.0`
 Tink follows semver from `1.0.0` onward.

package/bin/install.js CHANGED

@@ -369,6 +369,70 @@ function removeLegacyCodexSkill(codexTarget) {
   if (!dryRun) fs.rmSync(legacyDir, { recursive: true, force: true });
 }
+function removeIfExists(base, filePath, label = 'legacy') {
+  if (!fs.existsSync(filePath)) return false;
+  log.message(`${dryRun ? `would remove ${label}` : `remove ${label}`} ${displayPath(base, filePath)}`);
+  recordOperation('removedLegacy', base, filePath);
+  if (!dryRun) fs.rmSync(filePath, { recursive: true, force: true });
+  return true;
+}
+function removeRepoLocalClaudeTinkSurface(target) {
+  const commandDir = path.join(target, '.claude/commands/tink');
+  const flatCommandDir = path.join(target, '.claude/commands');
+  const commandFiles = ['setup.md', 'cast.md', 'verify.md', 'list.md', 'frog.md', 'weave.md', 'update.md'];
+  for (const name of commandFiles) {
+    removeIfExists(target, path.join(commandDir, name), 'repo-local Claude command');
+  }
+  if (fs.existsSync(commandDir) && fs.readdirSync(commandDir).length === 0) {
+    removeIfExists(target, commandDir, 'empty repo-local Claude command dir');
+  }
+  const legacyFlatCommands = [
+    'tink-setup.md',
+    'tink-cast.md',
+    'tink-verify.md',
+    'tink-list.md',
+    'tink-frog.md',
+    'tink-weave.md',
+    'tink-update.md',
+    'tink-forge.md',
+    'tink-purge.md',
+    'tink-hone.md'
+  ];
+  for (const name of legacyFlatCommands) {
+    removeIfExists(target, path.join(flatCommandDir, name), 'repo-local Claude command');
+  }
+  if (fs.existsSync(flatCommandDir) && fs.readdirSync(flatCommandDir).length === 0) {
+    removeIfExists(target, flatCommandDir, 'empty repo-local Claude commands dir');
+  }
+  const skillDir = path.join(target, '.claude/skills/tink');
+  const skillParentDir = path.join(target, '.claude/skills');
+  const skillFile = path.join(skillDir, 'SKILL.md');
+  if (!fs.existsSync(skillDir)) {
+    if (fs.existsSync(skillParentDir) && fs.readdirSync(skillParentDir).length === 0) {
+      removeIfExists(target, skillParentDir, 'empty repo-local Claude skills dir');
+    }
+    return;
+  }
+  if (!fs.existsSync(skillFile)) {
+    log.message(`keep unknown ${displayPath(target, skillDir)}`);
+    recordOperation('keptUnknown', target, skillDir);
+    return;
+  }
+  const text = fs.readFileSync(skillFile, 'utf8');
+  if (text.includes('name: tink') && text.includes('# Tink')) {
+    removeIfExists(target, skillDir, 'repo-local Claude skill');
+    if (fs.existsSync(skillParentDir) && fs.readdirSync(skillParentDir).length === 0) {
+      removeIfExists(target, skillParentDir, 'empty repo-local Claude skills dir');
+    }
+    return;
+  }
+  log.message(`keep unknown ${displayPath(target, skillDir)}`);
+  recordOperation('keptUnknown', target, skillDir);
+}
 function readJsonFile(filePath, fallback) {
   if (!fs.existsSync(filePath)) return fallback;
   try {
@@ -417,6 +481,9 @@ function copySelected(scope, components, agent) {
   if (includesClaude(agent) && components.includes('commands')) {
     copyTinkCommands(templateRoot, target);
   }
+  if (agent === 'codex') {
+    removeRepoLocalClaudeTinkSurface(target);
+  }
   if (components.includes('skill')) {
     if (includesClaude(agent)) {
       copyDir(path.join(templateRoot, 'claude/skills'), path.join(target, '.claude/skills'), target);
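
The skill cleanup in the first hunk only deletes `.claude/skills/tink` when `SKILL.md` contains both `name: tink` and `# Tink`; anything else is logged as `keep unknown`. The decision can be sketched in isolation as below. This is a minimal sketch, not the package's code: `looksLikeGeneratedTinkSkill` and `classifySkillDir` are hypothetical names, and the sketch only classifies a directory in a throwaway temp folder without deleting user files.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: the same two-marker substring check used in the diff.
function looksLikeGeneratedTinkSkill(text) {
  return text.includes('name: tink') && text.includes('# Tink');
}

// Hypothetical helper: reports what the cleanup would do with a skill dir,
// without removing anything.
function classifySkillDir(skillDir) {
  if (!fs.existsSync(skillDir)) return 'absent';
  const skillFile = path.join(skillDir, 'SKILL.md');
  if (!fs.existsSync(skillFile)) return 'keep-unknown';
  const text = fs.readFileSync(skillFile, 'utf8');
  return looksLikeGeneratedTinkSkill(text) ? 'remove' : 'keep-unknown';
}

// Demo against a temporary directory (sample file contents are made up).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'tink-sketch-'));
const generated = path.join(root, 'generated');
const custom = path.join(root, 'custom');
fs.mkdirSync(generated);
fs.mkdirSync(custom);
fs.writeFileSync(path.join(generated, 'SKILL.md'), '---\nname: tink\n---\n# Tink\n');
fs.writeFileSync(path.join(custom, 'SKILL.md'), '# My own notes\n');

const results = {
  generated: classifySkillDir(generated),
  custom: classifySkillDir(custom),
  missing: classifySkillDir(path.join(root, 'missing')),
};
console.log(results);

// Clean up the temporary directory created for the demo.
fs.rmSync(root, { recursive: true, force: true });
```

Because both markers are plain substring matches, a user-authored `SKILL.md` that happens to contain both strings would also be classified for removal, which is worth keeping in mind when reviewing the "preserving unknown user-authored content" claim in the changelog.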

package/commands/cast.md CHANGED

@@ -144,6 +144,7 @@ After approval, create `.tink/current/` with these files before doing deeper wor
 - `session.json`: lightweight session metadata, especially rule ids already loaded by phase
 - `context-pack.md`: human-readable selected context, including why each item is relevant
 - `context-map.json`: machine-readable included and excluded context with reasons
+- `context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
 - `excluded-context.md`: notable omitted files, tools, sources, or claims and why they were excluded
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
@@ -189,9 +190,11 @@ Create context artifacts before deeper implementation work:
 - `context-pack.md` should name the user task, selected harnesses, contract summary, loaded rules, selected files/docs, selected external sources, and verification implications.
 - `context-map.json` should contain `task`, `included`, `excluded`, `signals`, and `generated_at`. Each included or excluded entry should include `path` or `source`, `kind`, `reason`, and `confidence`. When external context is selected, also write `external_context[]`.
 - When useful, enrich each context entry with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to explain why the first context pack is small enough, why excluded context should stay out, and which checks prove selected context matters.
+- `context-metrics-evaluation.json` should contain `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score is based only on fixtures or the current run, say so in `scope` and `limits`; do not claim production-wide 90% without run-history or telemetry evidence.
 - `excluded-context.md` should make important omissions visible, especially files skipped because they are out of scope, stale, risky, too broad, or unverified external claims.
 If `.tink/schemas/context-map.schema.json` exists, use it for `context-map.json`. Do not paste the schema into the user response.
+If `.tink/schemas/context-metrics-evaluation.schema.json` exists, use it for `context-metrics-evaluation.json`. Do not paste the schema into the user response.
 Use deterministic context selection inside cast. Do not create or require a separate `tink index` command for this phase.
@@ -401,7 +404,7 @@ A task is trivial only when ALL of the following are true:
 15. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
 16. Ask for explicit approval before non-trivial work.
 17. After approval, read only the selected harness files and any approved run-only draft.
-18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
+18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
 19. Execute the first safe step immediately:
    - inspect relevant files,
    - run a read-only diagnostic,
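
The field list added for `context-metrics-evaluation.json` can be illustrated with a small shape check. This is a sketch under assumptions: the field names come from the `cast.md` text above, but the value types, the `run` value, the formula string, and the use of `fixture-ratio-v1` as the `evaluator` value are guesses, because the contents of `context-metrics-evaluation.schema.json` are not shown in this diff. `makeScore` and `missingFields` are hypothetical helpers.

```javascript
// Required keys as named in the cast.md hunk above.
const REQUIRED_TOP = [
  'run', 'evaluator', 'target_threshold_percent',
  'measurement_status', 'scope', 'limits', 'scores',
];
const REQUIRED_SCORE = [
  'name', 'score_percent', 'formula', 'numerator', 'denominator', 'evidence_refs',
];

// Hypothetical helper: derives score_percent from numerator and denominator.
function makeScore(name, numerator, denominator, formula, evidenceRefs) {
  const scorePercent = denominator === 0 ? 0 : Math.round((numerator * 100) / denominator);
  return {
    name,
    score_percent: scorePercent,
    formula,
    numerator,
    denominator,
    evidence_refs: evidenceRefs,
  };
}

// Hypothetical helper: lists required keys that are absent from an artifact.
function missingFields(artifact) {
  const missing = REQUIRED_TOP.filter((key) => !(key in artifact));
  (artifact.scores || []).forEach((score, i) => {
    for (const key of REQUIRED_SCORE) {
      if (!(key in score)) missing.push(`scores[${i}].${key}`);
    }
  });
  return missing;
}

const artifact = {
  run: 'example-run',                 // assumed value
  evaluator: 'fixture-ratio-v1',      // assumed to be the evaluator name
  target_threshold_percent: 90,
  measurement_status: 'measured',
  scope: 'fixture',
  limits: ['Fixture evidence only; not production telemetry.'],
  scores: [
    makeScore(
      'verification_omission_detection',
      9,
      10,
      'linked_targets / verification_targets', // illustrative formula text
      ['context-map.json'],
    ),
  ],
};

console.log(missingFields(artifact));
console.log(artifact.scores[0].score_percent);
```

Stating `scope` and `limits` in the artifact itself is what lets a reader tell a fixture-only 90% from a production-wide claim.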

package/docs/context-budget-ledger.ko.md CHANGED

@@ -48,6 +48,8 @@ Context Budget Ledger는 Tink가 context를 많이 모으는 대신, 왜 넣었
 If there is no real telemetry, set `measurement_status: "estimated"` and record the limits alongside it. Do not mark 90% as achieved without evidence.
+If ratios can be calculated from fixtures, `measurement_status: "measured"` may be used. Even then, state clearly whether the scope is fixture or production telemetry. See `docs/context-metrics-evaluator.ko.md` for the detailed calculation rules.
 ## Compatibility criteria
 - Claude Code and Codex read the same schema and fixtures.

package/docs/context-budget-ledger.md CHANGED

@@ -48,6 +48,8 @@ Entries with `role: "verification_target"` should connect to a command, manual c
 If there is no runtime telemetry yet, mark the scores as `measurement_status: "estimated"` and include the limits. Do not claim 90% without evidence.
+If fixture ratios can be calculated, `measurement_status: "measured"` is acceptable. The artifact must still state whether the scope is fixture evidence or production telemetry. See `docs/context-metrics-evaluator.md` for the calculation rules.
 ## Compatibility
 - Claude Code and Codex read the same schema and fixtures.

package/docs/context-metrics-evaluator.ko.md ADDED

@@ -0,0 +1,39 @@
+# Context Metrics Evaluator
+Context Metrics Evaluator is the test standard for calculating the fields recorded in Context Budget Ledger as actual ratios.
+This feature is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. `/tink:cast` and `$tink:cast` leave `.tink/current/context-metrics-evaluation.json` as a run state artifact, and `tests/fixtures/current-run/context-metrics-evaluation.json` and `tests/test_templates.py` confirm that the same scores are calculated.
+The English version is at `docs/context-metrics-evaluator.md`.
+## Why it is needed
+If a person only writes scores into `context-map.json.efficiency_metrics` by hand, the evidence is weak even when the numbers look good. The evaluator re-reads the fixtures and calculates the following.
+- The ratio of excluded context that has `role`, `cost`, `reuse_signal`, `staleness`, and `reason`.
+- The ratio of included context that has `role` and `cost`, where high-cost entries also have `verification_link`.
+- The ratio of included context that has both `role` and `verification_link`.
+- The ratio of `verification_target` entries that are linked to an actual verification command or verification hint.
+- The ratio of repeated path-cases that have the expected context role.
+- The ratio of context-diff changes that are traced through a verification link and metric impact.
+## What the score means
+In `fixture-ratio-v1`, 90% or higher means "the example artifacts are internally measurable and almost no fields are missing."
+It does not yet mean "90% efficiency was achieved in every real user task." Until production telemetry or multiple run records accumulate, the scope is limited to `fixture`.
+## Done criteria
+- All six metrics are at or above 90% under the fixture calculation.
+- The scores in `context-map.json.efficiency_metrics.scores[]` and `context-metrics-evaluation.json` match.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `limit`.
+- `measurement_status` may be set to `measured` when calculated from fixtures, but the docs record the limits alongside it.
+- The installed schema is `.tink/schemas/context-metrics-evaluation.schema.json`.
+## Compatibility criteria
+- Claude Code and Codex must be able to read the same artifacts.
+- It must be verified with `npm test` on both macOS and Windows.
+- Reusable memory, harness, rule, and config are not saved without user approval.
+- Sentry and release evidence bundling are not included.

package/docs/context-metrics-evaluator.md ADDED

@@ -0,0 +1,39 @@
+# Context Metrics Evaluator
+Context Metrics Evaluator is a test-backed way to calculate the ratios recorded by Context Budget Ledger.
+It is not a new public command. It must not add a `tink index` command, watcher, generated cache, or hidden runtime index. `/tink:cast` and `$tink:cast` write `.tink/current/context-metrics-evaluation.json` as a run-state artifact, and `tests/fixtures/current-run/context-metrics-evaluation.json` plus `tests/test_templates.py` must calculate the same scores.
+Korean companion: `docs/context-metrics-evaluator.ko.md`.
+## Why It Exists
+If `context-map.json.efficiency_metrics` is only hand-written, the numbers can look better than the evidence. The evaluator re-reads the fixtures and calculates:
+- Excluded context with `role`, `cost`, `reuse_signal`, `staleness`, and `reason`.
+- Included context with `role` and `cost`, with high-cost entries justified by `verification_link`.
+- Included context with both `role` and `verification_link`.
+- `verification_target` entries linked to known verification commands or hints.
+- Repeated path-cases with expected context roles.
+- Context-diff changes traceable through verification links and metric impacts.
+## What The Score Means
+In `fixture-ratio-v1`, a score at or above 90% means the example artifacts are internally measurable and have very few missing fields.
+It does not mean that every real user run has reached 90% efficiency. Until production telemetry or multiple run records exist, the measurement scope stays `fixture`.
+## Done Criteria
+- All six metrics are at or above 90% under the fixture calculation.
+- `context-map.json.efficiency_metrics.scores[]` matches `context-metrics-evaluation.json`.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `limit`.
+- `measurement_status` may be `measured` for fixture calculations, but the docs must state the limit.
+- The installed schema is `.tink/schemas/context-metrics-evaluation.schema.json`.
+## Compatibility
+- Claude Code and Codex read the same artifacts.
+- macOS and Windows are both verified through `npm test`.
+- Reusable memory, harness, rule, and config saves still require explicit approval.
+- Sentry and release evidence bundling are out of scope.
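
Two of the ratios listed under "Why It Exists" can be sketched as predicate counts over included context entries. This is a sketch under assumptions, not the evaluator in `tests/test_templates.py`, which is not part of this diff: the sample entries, the `role` values other than `verification_target`, and the `cost` values `low` and `high` are made up for illustration, and `ratio` is a hypothetical helper.

```javascript
// Hypothetical helper: counts entries that satisfy a predicate and reports
// the ratio in the numerator/denominator/score_percent form the docs name.
function ratio(entries, predicate, target = 90) {
  const denominator = entries.length;
  const numerator = entries.filter(predicate).length;
  const scorePercent = denominator === 0 ? 0 : Math.round((numerator * 100) / denominator);
  return {
    numerator,
    denominator,
    score_percent: scorePercent,
    meets_target: scorePercent >= target,
  };
}

// Made-up included entries; only the field names come from the docs.
const included = [
  { path: 'src/a.js', role: 'implementation', cost: 'low', verification_link: 'npm test' },
  { path: 'src/b.js', role: 'implementation', cost: 'high', verification_link: 'npm test' },
  { path: 'docs/c.md', role: 'reference', cost: 'low' },
  { path: 'tests/d.js', role: 'verification_target', cost: 'high' },
];

// Included context with both `role` and `verification_link`.
const hasRoleAndLink = (e) => Boolean(e.role && e.verification_link);

// Included context with `role` and `cost`, where high-cost entries
// must also carry a `verification_link`.
const costJustified = (e) =>
  Boolean(e.role && e.cost) && (e.cost !== 'high' || Boolean(e.verification_link));

const roleAndLink = ratio(included, hasRoleAndLink);
const justified = ratio(included, costJustified);
console.log({ roleAndLink, justified });
```

With this sample, two of four entries carry both fields and three of four pass the cost rule, so both ratios fall below the 90% target. Recomputing from the entries, instead of trusting a hand-written score, is what makes such gaps visible.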

package/docs/context-run-history-rollup.ko.md ADDED

@@ -0,0 +1,36 @@
+# Context Run History Rollup
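+Context Run History Rollup is the standard for combining the `context-metrics-evaluation.json` scores of multiple runs to see whether the 90% target also holds across repeated work.
+This feature is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. At this stage, it confirms that `tests/fixtures/maintenance/context-metrics-rollup.json` and `tests/test_templates.py` calculate the same rollup scores.
+The English version is at `docs/context-run-history-rollup.md`.
+## Why it is needed
+Even if a single current run is at or above 90%, it is hard to say that repeated work as a whole is stable. The rollup collects the scores of multiple runs and checks the following.
+- The average score of each metric.
+- The minimum score of each metric.
+- Whether every run has all six metrics with none missing.
+- Whether each metric is at or above 90% for both the average and the minimum.
+## What the score means
+`scope: "run_history"` means the value combines multiple run records.
+The rollup in the fixture is not production telemetry. Until enough real `.tink/runs/*` records accumulate, say only "90% or higher on the representative run-history fixture."
+## Done criteria
+- All six metrics have a rollup average of 90% or higher.
+- All six metrics have a per-run minimum of 90% or higher.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `minimum_percent`.
+- `limits` states explicitly that this is not production telemetry.
+## Compatibility criteria
+- Claude Code and Codex must be able to read the same artifact names and metric names.
+- It must be verified with `npm test` on both macOS and Windows.
+- Reusable memory, harness, rule, and config are not saved without user approval.
+- Sentry and release evidence bundling are not included.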

package/docs/context-run-history-rollup.md ADDED

@@ -0,0 +1,36 @@
+# Context Run History Rollup
+Context Run History Rollup combines multiple `context-metrics-evaluation.json` scores to check whether the 90% target holds across repeated work.
+It is not a new public command. It must not add a `tink index` command, watcher, generated cache, or hidden runtime index. At this stage, `tests/fixtures/maintenance/context-metrics-rollup.json` and `tests/test_templates.py` must calculate the same rollup scores.
+Korean companion: `docs/context-run-history-rollup.ko.md`.
+## Why It Exists
+A single current run above 90% is useful, but it does not prove repeated work is stable. The rollup combines several runs and checks:
+- Average score for each metric.
+- Minimum score for each metric.
+- Whether every run records all six metrics.
+- Whether both average and minimum are at or above 90%.
+## What The Score Means
+`scope: "run_history"` means the score combines multiple run records.
+The fixture rollup is not production telemetry. Until enough real `.tink/runs/*` records exist, describe it as representative run-history fixture evidence only.
+## Done Criteria
+- All six metrics have rollup averages at or above 90%.
+- All six metrics have per-run minimums at or above 90%.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `minimum_percent`.
+- `limits` states that the data is not production telemetry.
+## Compatibility
+- Claude Code and Codex read the same artifact names and metric names.
+- macOS and Windows are both verified through `npm test`.
+- Reusable memory, harness, rule, and config saves still require explicit approval.
+- Sentry and release evidence bundling are out of scope.
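
The rollup checks described above (per-metric average, per-metric minimum, and completeness across runs) can be sketched as follows. This is a sketch under assumptions: the six metric names are taken from the status table in `docs/context-threshold-status.md`, but the run records, their percentages, and the `makeRun` and `rollup` helpers are hypothetical, and the real fixture format in `context-metrics-rollup.json` is not shown in this diff.

```javascript
const METRICS = [
  'unnecessary_context_reduction',
  'initial_context_pack_size_reduction',
  'review_evidence_lookup_time_reduction',
  'verification_omission_detection',
  'repeated_context_reuse_accuracy',
  'rework_probability_reduction',
];

// Hypothetical helper: builds a run record with one score per metric.
function makeRun(id, percents) {
  return {
    id,
    scores: METRICS.map((name, i) => ({ name, score_percent: percents[i] })),
  };
}

// Hypothetical helper: average and minimum per metric across runs.
function rollup(runs, target = 90) {
  return METRICS.map((name) => {
    const values = runs.map((run) => {
      const score = run.scores.find((s) => s.name === name);
      return score ? score.score_percent : undefined;
    });
    // A metric is complete only if every run recorded a numeric score for it.
    const complete = runs.length > 0 && values.every((v) => typeof v === 'number');
    const average = complete ? values.reduce((a, b) => a + b, 0) / values.length : null;
    const minimum = complete ? Math.min(...values) : null;
    return {
      name,
      complete,
      average_percent: average,
      minimum_percent: minimum,
      meets_target: complete && average >= target && minimum >= target,
    };
  });
}

// Made-up sample data, not the package's fixture values.
const runs = [
  makeRun('run-1', [100, 100, 100, 100, 100, 100]),
  makeRun('run-2', [92, 95, 90, 96, 91, 88]),
  makeRun('run-3', [96, 93, 92, 98, 94, 94]),
];
const summary = rollup(runs);
console.log(summary);
```

In this sample the last metric averages 94% but has one run at 88%, so it fails the target even though the average looks healthy. That is the case the separate minimum check is meant to catch.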

package/docs/context-run-record-policy.ko.md ADDED

@@ -0,0 +1,50 @@
+# Context Run Record Policy
+Context Run Record Policy is the standard that decides which records may be used when real `.tink/runs/*` records are rolled up later.
+This document is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It does not collect records automatically either.
+The English document is at `docs/context-run-record-policy.md`.
+## Why it is needed
+The current evidence for reaching 90% is based on repository fixtures and representative run-history fixtures. That alone is not grounds to say "production-wide 90% is guaranteed in every real project."
+Before moving to real run records, the following must be decided first.
+- Which `.tink/runs/*` records can be included in a rollup.
+- Which information is sensitive or too broad and must be excluded.
+- Whether metric scores are linked to verification evidence.
+- Whether Claude Code and Codex, and macOS and Windows, can verify against the same criteria.
+## Records that can be included
+Records that can be included must be completion records from a current run that the user approved.
+- Run id or run path.
+- Completion time.
+- Surface used: Claude Code or Codex.
+- Platform used: macOS or Windows.
+- Six metric scores in the shape of `context-metrics-evaluation.json`.
+- Verification result and check list.
+- Short evidence handle.
+- A limit stating whether it is production telemetry, a fixture, or a representative run.
+## Records that must be excluded
+The following are not put into a run-history rollup.
+- Tokens, credentials, and raw private payloads.
+- Full copies of private issues, dashboards, Figma files, or discussions.
+- Reusable state changes to `.tink/memory/*`, `.tink/rules/*`, or `.tink/harnesses/*` without separate approval.
+- Sentry integration.
+- Release evidence bundling.
+## Done criteria
+- All six metrics are present in the record.
+- Every metric score has evidence.
+- Verification result and checks are linked.
+- The limit clearly says whether or not it is production telemetry.
+- There is no new public command, watcher, generated cache, or hidden runtime index.
+- Both macOS and Windows can verify with `npm test`.

package/docs/context-run-record-policy.md ADDED

@@ -0,0 +1,50 @@
+# Context Run Record Policy
+Context Run Record Policy defines which real `.tink/runs/*` records can later be used for run-history rollup.
+This is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It also does not collect records automatically.
+Korean documentation is available in `docs/context-run-record-policy.ko.md`.
+## Why This Exists
+The current 90 percent evidence is based on repository fixtures and representative run-history fixtures. That is not enough to claim production-wide 90 percent behavior across every real project.
+Before moving to real run records, Tink needs a clear answer to these questions:
+- Which `.tink/runs/*` records can be included in a rollup?
+- Which data is sensitive or too broad and must be excluded?
+- Are metric scores linked to verification evidence?
+- Can Claude Code and Codex, on macOS and Windows, verify the same criteria?
+## Records That Can Be Included
+Included records must come from an approved current run and represent a completed work record.
+- Run id or run path.
+- Completion timestamp.
+- Surface: Claude Code or Codex.
+- Platform: macOS or Windows.
+- Six metric scores shaped like `context-metrics-evaluation.json`.
+- Verification result and check list.
+- Short evidence handles.
+- Explicit limits that say whether the record is production telemetry, a fixture, or a representative run.
+## Records That Must Be Excluded
+Do not include these in run-history rollup:
+- Tokens, credentials, or raw private payloads.
+- Full private issue text, whole dashboards, entire Figma files, or complete discussions.
+- Unapproved reusable state changes under `.tink/memory/*`, `.tink/rules/*`, or `.tink/harnesses/*`.
+- Sentry integration.
+- Release evidence bundling.
+## Completion Criteria
+- All six metrics are present.
+- Each metric score has evidence.
+- Verification result and checks are linked.
+- Limits clearly state whether the data is production telemetry.
+- No new public command, watcher, generated cache, or hidden runtime index exists.
+- macOS and Windows can verify the criteria with `npm test`.
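
The inclusion rules above can be read as a checklist applied to each candidate record before a rollup. The policy lists what a record should carry but defines no record format, so this is a sketch under assumptions: every field name (`approved`, `completed_at`, `surface`, `platform`, `scores`, `verification`, `limits`), the placeholder metric names, and the `exclusionReasons` helper are hypothetical.

```javascript
const SURFACES = ['Claude Code', 'Codex'];
const PLATFORMS = ['macOS', 'Windows'];
const METRIC_COUNT = 6;

// Hypothetical helper: returns the reasons a record should stay out of a
// rollup; an empty array means the record passes this checklist.
function exclusionReasons(record) {
  const reasons = [];
  if (!record.approved) reasons.push('run was not approved');
  if (!record.completed_at) reasons.push('missing completion timestamp');
  if (!SURFACES.includes(record.surface)) reasons.push('unknown surface');
  if (!PLATFORMS.includes(record.platform)) reasons.push('unknown platform');

  const scores = record.scores || [];
  if (scores.length !== METRIC_COUNT) reasons.push('expected six metric scores');
  if (scores.some((s) => !(s.evidence_refs && s.evidence_refs.length > 0))) {
    reasons.push('score without evidence');
  }

  if (!record.verification || !Array.isArray(record.verification.checks)) {
    reasons.push('verification not linked');
  }
  if (!record.limits || record.limits.length === 0) reasons.push('limits not stated');
  return reasons;
}

// Made-up records; metric names are placeholders, not the real metric ids.
const goodRecord = {
  id: 'run-1',
  approved: true,
  completed_at: '2026-06-08T00:00:00Z',
  surface: 'Codex',
  platform: 'macOS',
  scores: Array.from({ length: METRIC_COUNT }, (_, i) => ({
    name: `metric_${i + 1}`,
    score_percent: 95,
    evidence_refs: ['verification.json'],
  })),
  verification: { result: 'passed', checks: ['npm test'] },
  limits: ['Representative run; not production telemetry.'],
};
const badRecord = { id: 'run-2', approved: false, surface: 'Codex', platform: 'Linux' };

console.log(exclusionReasons(goodRecord));
console.log(exclusionReasons(badRecord));
```

The sketch covers only the structural criteria. The exclusions for tokens, credentials, and full private payloads depend on the content of a record and would need a separate review step that is not modeled here.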

package/docs/context-threshold-status.ko.md ADDED

@@ -0,0 +1,43 @@
+# Context Threshold Status
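+Context Threshold Status is a status board for checking at a glance whether the six context efficiency metrics have passed the 90% target.
+The current status board looks at `tests/fixtures/current-run/context-metrics-evaluation.json` and `tests/fixtures/maintenance/context-metrics-rollup.json` together. That is, it checks whether both the single current run fixture and the run-history rollup that combines several representative runs are at or above 90%.
+This document is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It also does not automatically collect data from the user's repo.
+The English document is at `docs/context-threshold-status.md`.
+## Current status
+| Metric | Current run | Rollup average | Rollup minimum | Status |
+| --- | ---: | ---: | ---: | --- |
+| Reduction in unnecessary context inclusion rate | 100% | 97% | 94% | 90% or higher |
+| Reduction in initial context pack size | 100% | 95% | 92% | 90% or higher |
+| Reduction in reviewer time spent finding evidence | 100% | 98% | 96% | 90% or higher |
+| Improvement in detection rate of missing verification | 100% | 99% | 98% | 90% or higher |
+| Context reuse accuracy in repeated work | 100% | 96% | 94% | 90% or higher |
+| Reduction in rework likelihood | 100% | 95% | 91% | 90% or higher |
+## Why it is needed
+Without this status board, one has to reason out again how far the claim of 90% or higher can be trusted.
+- The current run fixture checks whether the artifact just created is complete.
+- The run-history rollup checks whether scores hold up in repeated work as well.
+- The minimum score checks whether a particular unit of work drops below 90%.
+## Limits
+The current status is based on repository fixtures and representative run-history fixtures. It is not production telemetry.
+Therefore, do not tell users that "90% or higher is guaranteed in every real project." Once enough real `.tink/runs/*` records accumulate, the rollup must be redone with the same formulas.
+## Done criteria
+- All six metrics have a current run score of 90% or higher.
+- All six metrics have a rollup average of 90% or higher.
+- All six metrics have a rollup minimum of 90% or higher.
+- `limits` states the limit that this is not production telemetry.
+- Both Claude Code and Codex can read the same artifact names and metric names.
+- Both macOS and Windows can verify with `npm test`.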

package/docs/context-threshold-status.md ADDED

@@ -0,0 +1,43 @@
+# Context Threshold Status
+Context Threshold Status is a compact status board for checking whether all six context efficiency metrics meet the 90 percent target.
+The current status compares `tests/fixtures/current-run/context-metrics-evaluation.json` with `tests/fixtures/maintenance/context-metrics-rollup.json`. In plain terms, it checks both the single current-run fixture and the representative run-history rollup.
+This is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It also does not collect user repository data automatically.
+Korean documentation is available in `docs/context-threshold-status.ko.md`.
+## Current Status
+| Metric | Current run | Rollup average | Rollup minimum | Status |
+| --- | ---: | ---: | ---: | --- |
+| unnecessary_context_reduction | 100% | 97% | 94% | >= 90% |
+| initial_context_pack_size_reduction | 100% | 95% | 92% | >= 90% |
+| review_evidence_lookup_time_reduction | 100% | 98% | 96% | >= 90% |
+| verification_omission_detection | 100% | 99% | 98% | >= 90% |
+| repeated_context_reuse_accuracy | 100% | 96% | 94% | >= 90% |
+| rework_probability_reduction | 100% | 95% | 91% | >= 90% |
+## Why This Exists
+Without this status board, a reviewer has to infer what the 90 percent claim means from several separate files.
+- The current-run fixture checks whether the latest artifact shape is complete.
+- The run-history rollup checks whether repeated work stays above the target.
+- The minimum score catches a single work unit falling below 90 percent.
+## Limits
+This status is based on repository fixtures and representative run-history fixtures. It is not production telemetry.
+Do not claim production-wide 90 percent performance until enough real `.tink/runs/*` records are accumulated and rolled up with the same formulas.
+## Completion Criteria
+- All six current-run scores are at least 90 percent.
+- All six rollup averages are at least 90 percent.
+- All six rollup minimums are at least 90 percent.
+- `limits` clearly states that this is not production telemetry.
+- Claude Code and Codex can read the same artifact names and metric names.
+- macOS and Windows can verify the status with `npm test`.
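
The first three completion criteria above can be sketched as a small check. This is a minimal illustration, not the project's actual test in `tests/test_templates.py`: the numbers are transcribed from the Current Status table, and the names `STATUS`, `failing_metrics`, and `meets_target` are hypothetical.

```python
# Values transcribed from the Current Status table.
# Tuple order: (current run, rollup average, rollup minimum), in percent.
STATUS = {
    "unnecessary_context_reduction": (100, 97, 94),
    "initial_context_pack_size_reduction": (100, 95, 92),
    "review_evidence_lookup_time_reduction": (100, 98, 96),
    "verification_omission_detection": (100, 99, 98),
    "repeated_context_reuse_accuracy": (100, 96, 94),
    "rework_probability_reduction": (100, 95, 91),
}

TARGET_PERCENT = 90


def failing_metrics(status, target=TARGET_PERCENT):
    """Return the names of metrics where any of the three columns is below target."""
    return sorted(
        name
        for name, columns in status.items()
        if any(value < target for value in columns)
    )


def meets_target(status, target=TARGET_PERCENT):
    """True only when every metric clears the target in every column."""
    return not failing_metrics(status, target)
```

Checking all three columns rather than only the average is what lets the rollup minimum catch a single weak work unit. A passing result here still only covers fixtures, as the Limits section states.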

package/docs/pr/2026-06-08-codex-surface-cleanup.ko.md ADDED

@@ -0,0 +1,27 @@
+# Clean up the repo-local Claude surface on Codex-only update
+## Problem
+After running `npx tink-harness@latest update` in another repo, the Codex skill picker could show `Source Command Tink Frog/List/...` entries and a broad `Tink` entry.
+These entries were not Codex action skills; they were the repo-local `.claude/commands/tink/*.md` and `.claude/skills/tink/SKILL.md` being exposed to Codex as well. Because users expect only the Codex `$tink:*` action skills, the install could look broken.
+## Solution
+- Hardened the installer so that a Codex-only install/update cleans up the repo-local Tink Claude commands and Claude skill.
+- The `tink-cast`, `tink-verify`, `tink-list`, `tink-frog`, `tink-weave`, `tink-setup`, and `tink-update` Codex action skills are kept as they are.
+- Stated clearly in the troubleshooting doc that repo-local Claude commands may remain on the `all` surface, where both Claude Code and Codex are selected.
+- Added the expected Codex-only state to the update verification recipe.
+## Verification
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+## Notes
+- No new public command was added.
+- No `tink index`, watcher, generated cache, or hidden runtime index was added.
+- Sentry and release evidence bundling are not included.
+- An npm publish is required before this can be applied in other repos.

package/docs/pr/2026-06-08-context-metrics-artifact.ko.md ADDED

@@ -0,0 +1,30 @@
+# Context Metrics Artifact
+## Problem
+The Context Metrics Evaluator could now compute the six metrics against fixtures, but `context-metrics-evaluation.json` was not yet wired in as an official artifact of the real `/tink:cast` and `$tink:cast` run state. In that state it is hard to re-verify the 90% target on every current run.
+## Solution
+- Added `templates/tink/schemas/context-metrics-evaluation.schema.json`.
+- Added a rule to the `/tink:cast` and `$tink:cast` instructions for generating `.tink/current/context-metrics-evaluation.json`.
+- Added the metrics evaluation to the reading order in the Work State Guide.
+- Updated the Context Metrics Evaluator doc to describe the run-state artifact.
+- Updated the `v1.4.0` release metadata, README, CHANGELOG, and VERSIONING.
+## Verification
+- `npm test`
+- `git diff --check`
+- `claude plugin validate .claude-plugin/plugin.json`
+- `claude plugin validate .claude-plugin/marketplace.json`
+- `npm pack --dry-run --json`
+## Notes
+- No new public command was added.
+- No `tink index` command, watcher, generated cache, or hidden runtime index was created.
+- Sentry is not included.
+- Release evidence bundling is not included.
+- A score's scope and limits must be stated, and production-wide 90% is not claimed without run-history or telemetry.
+- Written on the basis of supporting Claude Code and Codex, and macOS and Windows, together.

package/docs/pr/2026-06-08-context-metrics-evaluator.ko.md ADDED

@@ -0,0 +1,28 @@
+# Context Metrics Evaluator
+## Problem
+The Context Budget Ledger made it possible to record role, cost, reuse signal, and verification link on each context entry, but the `efficiency_metrics` scores were still close to hand-written values. To keep improving the target metrics toward 90% or above, the scores have to be recomputed from actual fixtures.
+## Solution
+- Added `tests/fixtures/current-run/context-metrics-evaluation.json`, recording the formula, numerator, denominator, and evidence refs for the six metrics.
+- Updated `context-map.json.efficiency_metrics` to `measured` scores based on `fixture-ratio-v1`.
+- Made `tests/test_templates.py` re-read the context-map, context-diff, contract, repo signal, and path-case fixtures and compute the scores directly.
+- Added expected context roles to the repeated path-cases so that reuse accuracy can also be computed.
+- Added the Korean-first doc `docs/context-metrics-evaluator.ko.md` and its English companion.
+## Verification
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+## Notes
+- No new public command was added.
+- No `tink index` command, watcher, generated cache, or hidden runtime index was created.
+- Sentry is not included.
+- Release evidence bundling is not included.
+- These scores are measurements at fixture scope; they do not mean that production telemetry as a whole has reached 90%.
+- Kept the baseline of supporting Claude Code and Codex, and macOS and Windows, together.

package/docs/pr/2026-06-08-context-run-history-rollup.ko.md ADDED

@@ -0,0 +1,27 @@
+# Context Run History Rollup
+## Problem
+`context-metrics-evaluation.json` made it possible to compute the six metrics for a single current run, but it was still hard to confirm whether repeated work as a whole stays at 90% or above.
+## Solution
+- Added `tests/fixtures/maintenance/context-metrics-rollup.json` to bundle the scores of several runs.
+- Made the tests compute the rollup average, minimum, evidence refs, and missing metrics directly.
+- Added the Korean-first doc `docs/context-run-history-rollup.ko.md` and its English companion.
+- Added only a link in the README so its body does not grow.
+## Verification
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+## Notes
+- No new public command was added.
+- No `tink index` command, watcher, generated cache, or hidden runtime index was created.
+- Sentry is not included.
+- Release evidence bundling is not included.
+- These values are based on a representative run-history fixture; they do not mean that production telemetry as a whole has reached 90%.
+- Kept the baseline of supporting Claude Code and Codex, and macOS and Windows, together.
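
The rollup described in this PR note (an average and a minimum per metric, plus detection of missing metrics) can be sketched roughly as follows. This is an assumption-laden illustration: the input shape (one dict of metric name to score per run) and the function name `rollup` are hypothetical and are not taken from `context-metrics-rollup.json` or `tests/test_templates.py`.

```python
from statistics import mean


def rollup(runs, metric_names):
    """Summarize per-run scores into an average and a minimum for each metric.

    `runs` is a list of dicts mapping metric name -> score percent
    (a hypothetical shape chosen for this sketch).
    """
    summary = {}
    for name in metric_names:
        # A metric that is absent from any run is an error, not a silent zero.
        missing = [index for index, run in enumerate(runs) if name not in run]
        if missing:
            raise ValueError(f"metric {name!r} missing from run(s) {missing}")
        scores = [run[name] for run in runs]
        summary[name] = {"average": mean(scores), "minimum": min(scores)}
    return summary
```

Raising on a missing metric, rather than skipping it, keeps an incomplete run from inflating the average.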

package/docs/pr/2026-06-08-context-run-record-policy.ko.md ADDED

@@ -0,0 +1,25 @@
+# Add a safety baseline before real run rollup with the Context Run Record Policy
+## Problem
+The evidence for reaching 90% now extends to the current-run fixture and the representative run-history fixture, but the criteria for including real `.tink/runs/*` records in a rollup are not yet clear. Building automatic collection or a hidden cache first, without those criteria, could intrude on the user's territory.
+## Solution
+- Defined in `tests/fixtures/maintenance/context-run-record-policy.json` the include/exclude criteria needed before moving on to real run records.
+- Added `docs/context-run-record-policy.ko.md` and its English companion doc.
+- Made `tests/test_templates.py` verify that the new policy fixture excludes public commands, watchers, hidden runtime indexes, generated caches, and Sentry.
+- Added only a link to the related doc in the README.
+## Verification
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+## Notes
+- No new public command was added.
+- No `tink index`, watcher, generated cache, or hidden runtime index was added.
+- Real run records are not collected automatically.
+- Because this does not change user-visible install, command, or schema behavior, no version bump, npm publish, or GitHub Release is performed.

package/docs/pr/2026-06-08-context-threshold-status.ko.md ADDED

@@ -0,0 +1,25 @@
+# Consolidate the evidence for reaching 90% with Context Threshold Status
+## Problem
+The current-run fixture and the run-history rollup each exist, but it is hard to see in one place whether the six metrics are at 90% or above. A reviewer has to cross-check the current-run score, rollup average, and rollup minimum again.
+## Solution
+- Recorded the current-run score, rollup average, and rollup minimum for the six metrics together in `tests/fixtures/maintenance/context-threshold-status.json`.
+- Made `tests/test_templates.py` recompute this status board from the two existing fixtures and verify it.
+- Added `docs/context-threshold-status.ko.md` and its English companion doc to explain the scope and limits of the 90% result.
+- Added only a link to the related doc in the README, without lengthening the body.
+## Verification
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+## Notes
+- No new public command was added.
+- No `tink index`, watcher, generated cache, or hidden runtime index was added.
+- Sentry and release evidence bundling are not included.
+- The current evidence is repository fixtures and a representative run-history fixture, not production telemetry.

package/docs/pr/2026-06-08-v1.5.0.ko.md ADDED

@@ -0,0 +1,30 @@
+# v1.5.0 release
+## Problem
+After running `npx tink-harness@latest update` in another repo, the Codex skill picker showed `Source Command Tink Frog/List/...` entries and a broad `Tink` entry alongside the action skills.
+These entries were not new Codex `$tink:*` action skills; they were the repo-local `.claude/commands/tink/*.md` and `.claude/skills/tink/SKILL.md` being exposed to Codex as well. Codex-only users expect to see only the action skills they need, such as `$tink:cast` and `$tink:verify`, so the update result looked stale or broken.
+## Solution
+- Codex-only install/update now cleans up the repo-local Tink Claude Code command files.
+- Codex-only install/update also cleans up a repo-local `.claude/skills/tink/SKILL.md` when it has a known generated shape.
+- Unknown `.claude/skills/tink` content that may be user-authored is handled conservatively and not deleted.
+- Stated clearly in the troubleshooting doc that repo-local Claude commands may remain when both Claude Code and Codex are installed.
+- Reflected the v1.5.0 latest-release summary in the README and the Korean README.
+## Verification
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+- Codex-only update smoke check in a temp repo
+- GitHub Actions CI check pending
+## Notes
+- No new public command was added.
+- No `tink index`, watcher, generated cache, or hidden runtime index was added.
+- Sentry and release evidence bundling are not included.
+- This fix has to be picked up in other repos via `npx tink-harness@latest update`, so an npm publish is required.
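
The conservative cleanup described in the Solution list can be sketched as below. This is only an illustration of the stated behaviour, not the code in `bin/install.js` (which is JavaScript and not shown in this part of the diff). The marker string `GENERATED_MARKER`, the function name `cleanup_claude_surface`, and the way surfaces are passed in are all assumptions made for the sketch; how the real installer recognizes a "known generated shape" is not documented here.

```python
from pathlib import Path

# Hypothetical marker standing in for whatever the real installer treats
# as a "known generated shape" of SKILL.md.
GENERATED_MARKER = "generated-by: tink-harness"


def cleanup_claude_surface(repo_root, surfaces):
    """Remove repo-local Tink Claude files for a Codex-only install/update.

    Returns the removed paths, relative to repo_root, as POSIX strings.
    Does nothing unless the selected surfaces are exactly {"codex"}.
    """
    removed = []
    if set(surfaces) != {"codex"}:
        # With Claude Code also selected, the Claude files stay in place.
        return removed

    root = Path(repo_root)

    # Command files under .claude/commands/tink/ are removed.
    commands_dir = root / ".claude" / "commands" / "tink"
    if commands_dir.is_dir():
        for path in sorted(commands_dir.glob("*.md")):
            path.unlink()
            removed.append(path.relative_to(root).as_posix())

    # SKILL.md is removed only when it looks generated; unknown content,
    # which may be user-authored, is left untouched.
    skill_file = root / ".claude" / "skills" / "tink" / "SKILL.md"
    if skill_file.is_file():
        if GENERATED_MARKER in skill_file.read_text(encoding="utf-8"):
            skill_file.unlink()
            removed.append(skill_file.relative_to(root).as_posix())

    return removed
```

Returning the removed paths mirrors the idea of reporting them in the update summary, so a user can see what was cleaned up and what was deliberately left alone.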

package/docs/update-troubleshooting.ko.md CHANGED

@@ -19,13 +19,15 @@ README에는 긴 문제 해결 내용을 넣지 않는다. 평소에는 설치
 Possible causes:
 - An older version's broad Codex `skills/tink/SKILL.md` is still present.
+- Repo-local Claude Code files, `.claude/commands/tink/*.md`, are showing up in Codex as `Source Command Tink ...`.
+- The repo-local Claude Code skill `.claude/skills/tink/SKILL.md` is showing up in Codex as a broad `Tink` skill.
 - `CODEX_HOME` differs from the expected location.
 - The Codex UI has not yet reloaded the skill list.
 Check:
 ```bash
-npx tink-harness@latest update --yes
+TINK_INSTALL_SURFACES=codex npx tink-harness@latest update --yes
 ```
 Then look for the following in `Update Result Summary`.
@@ -37,9 +39,12 @@ npx tink-harness@latest update --yes
 Expected state:
 - `skills/tink` should be absent.
+- After a Codex-only update, `.claude/commands/tink/*.md` and `.claude/skills/tink/SKILL.md` should be absent.
 - `tink-cast`, `tink-verify`, `tink-list`, `tink-frog`, `tink-weave`, `tink-setup`, and `tink-update` should exist.
 - `tink-core/RULES.md` exists as internal shared rules, and it is normal for it not to appear as an action in the picker.
+If you installed with both Claude Code and Codex selected, repo-local Claude commands may remain. In that case the `Source Command Tink ...` entries appear because Claude Code commands are installed in the repo.
 ## When schema files are missing
 Possible causes:

package/docs/update-troubleshooting.md CHANGED

@@ -19,13 +19,15 @@ After `tink-harness update`, read `Update Result Summary`.
 Possible causes:
 - The old broad Codex `skills/tink/SKILL.md` is still present.
+- Repo-local Claude Code files such as `.claude/commands/tink/*.md` are being shown by Codex as `Source Command Tink ...`.
+- Repo-local Claude Code skill `.claude/skills/tink/SKILL.md` is being shown by Codex as a broad `Tink` skill.
 - `CODEX_HOME` points somewhere unexpected.
 - Codex has not refreshed the skill list yet.
 Run:
 ```bash
-npx tink-harness@latest update --yes
+TINK_INSTALL_SURFACES=codex npx tink-harness@latest update --yes
 ```
 Then check `Codex target`, `Removed legacy paths`, and `Codex skills` in the summary.
@@ -33,9 +35,12 @@ Then check `Codex target`, `Removed legacy paths`, and `Codex skills` in the sum
 Expected state:
 - `skills/tink` is gone.
+- `.claude/commands/tink/*.md` and `.claude/skills/tink/SKILL.md` are gone for a Codex-only update.
 - `tink-cast`, `tink-verify`, `tink-list`, `tink-frog`, `tink-weave`, `tink-setup`, and `tink-update` exist.
 - `tink-core/RULES.md` exists as shared internal rules, but should not appear as a user action in the picker.
+If you intentionally selected both Claude Code and Codex, repo-local Claude commands may remain. In that case `Source Command Tink ...` entries are expected because the repo still has Claude Code commands installed.
 ## Schema Files Are Missing
 Check that `Install target` is the repo you meant to update.

package/docs/update-verification-recipe.ko.md CHANGED

@@ -49,6 +49,7 @@ If you refreshed the Codex surface, the following should exist.
 - `skills/tink` should be removed.
 - The picker shows the action skills, and `tink-core` is used only as shared rules.
+- On a Codex-only update, the repo-local `.claude/commands/tink/*.md` and `.claude/skills/tink/SKILL.md` should be removed so that Codex shows neither `Source Command Tink ...` nor a broad repo-local `Tink` entry.
 ### 3. Check Claude Code commands
@@ -64,6 +65,8 @@ If you refreshed the Claude Code surface, the following commands should exist.
 During repo development, `npm test` checks the 3-copy sync.
+If you refreshed both Claude Code and Codex, the repo-local Claude commands may appear in Codex as `Source Command Tink ...`. This is an expected state on the `all` surface and is avoided by a Codex-only update.
 ### 4. Check schemas
 The following files should exist.

package/docs/update-verification-recipe.md CHANGED

@@ -49,6 +49,7 @@ Expected state:
 - `skills/tink` is removed.
 - The picker shows action skills.
+- For a Codex-only update, repo-local `.claude/commands/tink/*.md` and `.claude/skills/tink/SKILL.md` are removed so Codex does not show `Source Command Tink ...` or a broad repo-local `Tink` entry.
 - `tink-core` is shared internal guidance, not a user-facing action.
 ### 3. Claude Code Commands
@@ -65,6 +66,8 @@ For Claude Code, these commands should exist:
 When developing Tink itself, `npm test` verifies the 3-copy command sync rule.
+If you intentionally refreshed both Claude Code and Codex, these repo-local Claude commands may also appear in Codex as `Source Command Tink ...`. That is expected for the `all` surface and avoided by a Codex-only update.
 ### 4. Schemas
 These files should exist:

package/docs/work-state.ko.md CHANGED

@@ -21,11 +21,13 @@ Tink는 실행 상태를 파일로 남겨서 사람이 빠르게 네 가지를
    - Read the human-readable summary of the selected files, docs, rules, and external sources.
 3. `.tink/current/context-map.json`
    - Inspect `included`, `excluded`, `signals`, and `external_context` structurally.
-4. `.tink/current/excluded-context.md`
+4. `.tink/current/context-metrics-evaluation.json`
+   - Check the context-efficiency scores, formulas, numerators/denominators, evidence refs, measurement scope, and limits.
+5. `.tink/current/excluded-context.md`
    - Check context that was excluded because it is stale, risky, too broad, inaccessible, or out of scope.
-5. `.tink/current/verification.json`
+6. `.tink/current/verification.json`
    - Check the pass, fail, blocked, and skipped verification results and the final report.
-6. `.tink/current/notes.md`
+7. `.tink/current/notes.md`
    - Read the last safe point, recovery notes, and the short verification summary.
 ## How To Read Context
@@ -50,6 +52,8 @@ Tink는 실행 상태를 파일로 남겨서 사람이 빠르게 네 가지를
 See `docs/context-budget-ledger.ko.md` for the detailed criteria.
+`context-metrics-evaluation.json` shows how the scores were produced. When `scope: "fixture"` or `scope: "current_run"` is set, the values were measured only within that scope. Without multiple run records or production telemetry, do not say that all user work has reached 90%.
 When `signals[]` contains `kind: "context_graph_rule"`, read it as a small clue that `/tink:cast` or `$tink:cast` picked by looking at changed paths. It should point to a stable `source_ref` such as `context_graph_lite.rules.claude-command-sync` and explain why the related files were included together. This signal is only cast's internal selection rationale and does not imply a public `tink index` command, watcher, generated cache, or hidden runtime index.
 For external context, check the following items.

package/docs/work-state.md CHANGED

@@ -19,11 +19,13 @@ Start here when resuming, reviewing, or handing off a run:
    - Read the short human summary of selected files, docs, rules, and external sources.
 3. `.tink/current/context-map.json`
    - Inspect the structured `included`, `excluded`, `signals`, and `external_context` entries.
-4. `.tink/current/excluded-context.md`
+4. `.tink/current/context-metrics-evaluation.json`
+   - Check context-efficiency scores, formulas, numerators, denominators, evidence refs, measurement scope, and limits.
+5. `.tink/current/excluded-context.md`
    - Check what was skipped because it was stale, unsafe, too broad, unavailable, or outside scope.
-5. `.tink/current/verification.json`
+6. `.tink/current/verification.json`
    - Confirm pass, fail, blocked, or skipped checks and the final report.
-6. `.tink/current/notes.md`
+7. `.tink/current/notes.md`
    - Read the last safe point, recovery notes, and compact verification summaries.
 ## How To Read Context
@@ -48,6 +50,8 @@ When context entries include Context Budget Ledger fields, read them this way:
 See `docs/context-budget-ledger.md` for the detailed rules.
+`context-metrics-evaluation.json` explains how the score was produced. If `scope` is `fixture` or `current_run`, the value is measured only inside that boundary. Do not claim all user work has reached 90% without run-history or production telemetry evidence.
 When `signals[]` includes `kind: "context_graph_rule"`, read it as a small changed-path clue selected by `/tink:cast` or `$tink:cast`. It should point to a stable `source_ref` such as `context_graph_lite.rules.claude-command-sync`, explain why related files were included, and stay internal to cast. It must not imply a public `tink index` command, watcher, generated cache, or hidden runtime index.
 For external context, check:

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "tink-harness",
-  "version": "1.3.0",
+  "version": "1.5.0",
   "description": "Self-growing harnesses for Claude Code and Codex.",
   "license": "MIT",
   "type": "module",

package/templates/claude/commands/tink/cast.md CHANGED

@@ -144,6 +144,7 @@ After approval, create `.tink/current/` with these files before doing deeper wor
 - `session.json`: lightweight session metadata, especially rule ids already loaded by phase
 - `context-pack.md`: human-readable selected context, including why each item is relevant
 - `context-map.json`: machine-readable included and excluded context with reasons
+- `context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
 - `excluded-context.md`: notable omitted files, tools, sources, or claims and why they were excluded
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
@@ -189,9 +190,11 @@ Create context artifacts before deeper implementation work:
 - `context-pack.md` should name the user task, selected harnesses, contract summary, loaded rules, selected files/docs, selected external sources, and verification implications.
 - `context-map.json` should contain `task`, `included`, `excluded`, `signals`, and `generated_at`. Each included or excluded entry should include `path` or `source`, `kind`, `reason`, and `confidence`. When external context is selected, also write `external_context[]`.
 - When useful, enrich each context entry with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to explain why the first context pack is small enough, why excluded context should stay out, and which checks prove selected context matters.
+- `context-metrics-evaluation.json` should contain `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score is based only on fixtures or the current run, say so in `scope` and `limits`; do not claim production-wide 90% without run-history or telemetry evidence.
 - `excluded-context.md` should make important omissions visible, especially files skipped because they are out of scope, stale, risky, too broad, or unverified external claims.
 If `.tink/schemas/context-map.schema.json` exists, use it for `context-map.json`. Do not paste the schema into the user response.
+If `.tink/schemas/context-metrics-evaluation.schema.json` exists, use it for `context-metrics-evaluation.json`. Do not paste the schema into the user response.
 Use deterministic context selection inside cast. Do not create or require a separate `tink index` command for this phase.
@@ -401,7 +404,7 @@ A task is trivial only when ALL of the following are true:
 15. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
 16. Ask for explicit approval before non-trivial work.
 17. After approval, read only the selected harness files and any approved run-only draft.
-18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
+18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
 19. Execute the first safe step immediately:
    - inspect relevant files,
    - run a read-only diagnostic,

package/templates/codex/skills/tink-core/RULES.md CHANGED

@@ -31,7 +31,7 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
 11. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
 12. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
 13. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
-14. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
+14. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
 15. Do not stop at recommendation. Execute the first safe step after run state exists.
 16. Run `$tink:verify` behavior before final when `contract.json` lists required checks.
 17. Store reusable memory or rule updates under `.tink/` only after separate approval.
@@ -102,10 +102,13 @@ Create run state before deeper work:
 - `session.json`: loaded rule ids by phase and lightweight retrieval metadata
 - `.tink/current/context-pack.md`: human-readable selected context and why it matters
 - `.tink/current/context-map.json`: machine-readable included/excluded context and reasons
+- `.tink/current/context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
 - `.tink/current/excluded-context.md`: notable omitted context and why it was left out
 When useful, enrich `context-map.json.included[]` and `context-map.json.excluded[]` entries with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to keep the first context pack small, mark stale or avoid-next-time context, and connect `verification_target` entries to command checks, manual checks, evidence refs, or verification hints. Do not claim any 90% efficiency score without measurement evidence.
+When writing `context-metrics-evaluation.json`, include `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score comes only from fixture or current-run evidence, record that scope and do not claim production-wide 90% without run-history or telemetry evidence.
 When external context is needed for `$tink:cast`, write it through the MCP Safe Profile shape in `context-map.json.external_context[]`. Record `source`, `source_ref`, `kind`, `included`, `excluded`, `reason`, `confidence`, `sensitivity`, and `verification_hint` when useful. Treat Figma, GitHub, and official docs as representative examples, not the only supported sources; Linear, Jira, Supabase, dashboards, API responses, screenshots, attachments, and runbooks can follow the same shape.
 When repo signal fixtures contain `context_graph_lite.rules[]`, use those rules inside `$tink:cast` to choose the first related context candidates. Match changed paths against `when_paths`, consider `include_paths`, cite selected rules as `context_graph_rule` signals with `source_ref: "context_graph_lite.rules.<name>"`, and connect `signal_refs` to verification hints where relevant. If the fixture provides `context_budget_policy`, use it to assign roles, costs, reuse signals, verification links, staleness, and evidence kinds. Do not create a public `tink index` command, watch process, generated cache, or hidden runtime index.

package/templates/tink/schemas/context-metrics-evaluation.schema.json ADDED

@@ -0,0 +1,89 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://github.com/dotoricode/tink-harness/schemas/context-metrics-evaluation.schema.json",
+  "title": "Tink current run context metrics evaluation",
+  "type": "object",
+  "required": ["run", "evaluator", "target_threshold_percent", "measurement_status", "scope", "scores"],
+  "properties": {
+    "run": {
+      "type": "string",
+      "description": "Run state path, usually .tink/current."
+    },
+    "evaluator": {
+      "type": "string",
+      "description": "Stable evaluator id, such as fixture-ratio-v1."
+    },
+    "target_threshold_percent": {
+      "type": "number",
+      "minimum": 0,
+      "maximum": 100
+    },
+    "measurement_status": {
+      "type": "string",
+      "enum": ["estimated", "measured", "mixed"]
+    },
+    "scope": {
+      "type": "string",
+      "enum": ["fixture", "current_run", "run_history", "production_telemetry"]
+    },
+    "limits": {
+      "type": "array",
+      "items": { "type": "string" }
+    },
+    "scores": {
+      "type": "array",
+      "items": { "$ref": "#/$defs/metric_score" }
+    }
+  },
+  "$defs": {
+    "metric_score": {
+      "type": "object",
+      "required": [
+        "name",
+        "score_percent",
+        "formula",
+        "numerator",
+        "denominator",
+        "evidence_refs"
+      ],
+      "properties": {
+        "name": {
+          "type": "string",
+          "enum": [
+            "unnecessary_context_reduction",
+            "initial_context_pack_size_reduction",
+            "review_evidence_lookup_time_reduction",
+            "verification_omission_detection",
+            "repeated_context_reuse_accuracy",
+            "rework_probability_reduction"
+          ]
+        },
+        "score_percent": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 100
+        },
+        "formula": {
+          "type": "string"
+        },
+        "numerator": {
+          "type": "number",
+          "minimum": 0
+        },
+        "denominator": {
+          "type": "number",
+          "minimum": 0
+        },
+        "evidence_refs": {
+          "type": "array",
+          "items": { "type": "string" }
+        },
+        "limit": {
+          "type": "string"
+        }
+      },
+      "additionalProperties": true
+    }
+  },
+  "additionalProperties": true
+}
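
To make the schema's constraints concrete, here is a minimal standard-library sketch that checks a document against the required fields, enums, and percent ranges listed above. It is not a full JSON Schema validator (type checks and `$ref` resolution are skipped), and the function name `check_evaluation` and the `example` document with its values are illustrative assumptions, not real project output.

```python
REQUIRED_TOP = ("run", "evaluator", "target_threshold_percent",
                "measurement_status", "scope", "scores")
REQUIRED_SCORE = ("name", "score_percent", "formula",
                  "numerator", "denominator", "evidence_refs")
STATUSES = {"estimated", "measured", "mixed"}
SCOPES = {"fixture", "current_run", "run_history", "production_telemetry"}
METRICS = {
    "unnecessary_context_reduction",
    "initial_context_pack_size_reduction",
    "review_evidence_lookup_time_reduction",
    "verification_omission_detection",
    "repeated_context_reuse_accuracy",
    "rework_probability_reduction",
}


def check_evaluation(doc):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = [f"missing top-level field: {key}"
                for key in REQUIRED_TOP if key not in doc]
    if "measurement_status" in doc and doc["measurement_status"] not in STATUSES:
        problems.append("measurement_status is not an allowed value")
    if "scope" in doc and doc["scope"] not in SCOPES:
        problems.append("scope is not an allowed value")
    threshold = doc.get("target_threshold_percent")
    if threshold is not None and not 0 <= threshold <= 100:
        problems.append("target_threshold_percent is outside 0..100")
    for index, score in enumerate(doc.get("scores", [])):
        for key in REQUIRED_SCORE:
            if key not in score:
                problems.append(f"scores[{index}] is missing {key}")
        if "name" in score and score["name"] not in METRICS:
            problems.append(f"scores[{index}].name is not a known metric")
        percent = score.get("score_percent")
        if percent is not None and not 0 <= percent <= 100:
            problems.append(f"scores[{index}].score_percent is outside 0..100")
    return problems


# Illustrative document; every value below is made up for this sketch.
example = {
    "run": ".tink/current",
    "evaluator": "fixture-ratio-v1",
    "target_threshold_percent": 90,
    "measurement_status": "measured",
    "scope": "fixture",
    "limits": ["Fixture scope only; not production telemetry."],
    "scores": [{
        "name": "unnecessary_context_reduction",
        "score_percent": 100,
        "formula": "illustrative formula text",
        "numerator": 4,
        "denominator": 4,
        "evidence_refs": ["illustrative-ref"],
    }],
}
```

Note that `limits` is not in the schema's `required` list, so this sketch does not flag its absence either, even though the cast instructions ask for it to be written.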