npm - tink-harness - Versions diffs - 1.4.0 → 1.6.0 - Mend

tink-harness 1.4.0 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

package/.claude-plugin/plugin.json +1 -1
package/CHANGELOG.md +25 -0
package/README.ko.md +13 -5
package/README.md +14 -5
package/VERSIONING.md +1 -1
package/bin/install.js +67 -0
package/commands/cast.md +4 -0
package/commands/frog.md +9 -0
package/commands/weave.md +46 -38
package/docs/context-run-history-rollup.ko.md +36 -0
package/docs/context-run-history-rollup.md +36 -0
package/docs/context-run-record-policy.ko.md +50 -0
package/docs/context-run-record-policy.md +50 -0
package/docs/context-threshold-status.ko.md +43 -0
package/docs/context-threshold-status.md +43 -0
package/docs/graph-rule-adoption-plan.ko.md +215 -0
package/docs/pr/2026-06-08-codex-surface-cleanup.ko.md +27 -0
package/docs/pr/2026-06-08-context-run-history-rollup.ko.md +27 -0
package/docs/pr/2026-06-08-context-run-record-policy.ko.md +25 -0
package/docs/pr/2026-06-08-context-threshold-status.ko.md +25 -0
package/docs/pr/2026-06-08-v1.5.0.ko.md +30 -0
package/docs/pr/2026-06-09-graph-rule-adoption-plan.ko.md +26 -0
package/docs/pr/2026-06-09-graph-rule-seed-rules.ko.md +27 -0
package/docs/pr/2026-06-09-v1.6.0.ko.md +27 -0
package/docs/update-troubleshooting.ko.md +6 -1
package/docs/update-troubleshooting.md +6 -1
package/docs/update-verification-recipe.ko.md +3 -0
package/docs/update-verification-recipe.md +3 -0
package/package.json +1 -1
package/skills/tink/SKILL.md +75 -73
package/templates/claude/commands/tink/cast.md +4 -0
package/templates/claude/commands/tink/frog.md +9 -0
package/templates/claude/commands/tink/weave.md +46 -38
package/templates/claude/skills/tink/SKILL.md +75 -73
package/templates/codex/skills/tink-core/RULES.md +10 -8
package/templates/tink/rules/index.json +239 -128

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "tink",
   "description": "A small harness layer for Claude Code and Codex.",
-  "version": "1.4.0",
+  "version": "1.6.0",
   "author": {
     "name": "dotori"
   }

package/CHANGELOG.md CHANGED

@@ -7,6 +7,31 @@ All notable changes to Tink are tracked here.
 No unreleased changes yet.
+## [1.6.0] - 2026-06-09
+### Added
+- Graph-rule seed rules now route common Tink maintenance work to the right supporting files, harnesses, and verification checks without adding a public `tink index` command.
+- `/tink:weave`, `/tink:frog`, and `$tink:*` guidance now treats rule `reason`, `risk`, `include_paths`, and `checks` as reviewable context-engineering evidence.
+- Korean PR history draft for the graph-rule seed rules work in `docs/pr/2026-06-09-graph-rule-seed-rules.ko.md`.
+- Korean PR history draft for the v1.6.0 release in `docs/pr/2026-06-09-v1.6.0.ko.md`.
+## [1.5.0] - 2026-06-08
+### Changed
+- Codex-only install/update now removes repo-local Tink Claude Code command files under `.claude/commands/tink/*.md` so Codex no longer shows them as stale `Source Command Tink ...` entries after a Codex-only refresh.
+- Codex-only install/update now removes the old repo-local `.claude/skills/tink/SKILL.md` Tink surface when it matches the known generated Tink skill, while preserving unknown user-authored content.
+- Update troubleshooting and verification docs now explain when `Source Command Tink ...` is stale and when it is expected because the repo intentionally keeps the Claude Code surface.
+- README and Korean README now highlight the v1.5.0 update behavior for existing Codex users.
+### Added
+- Regression coverage for Codex-only update cleanup of repo-local Claude command and skill surfaces.
+- Korean PR history draft for the v1.5.0 release in `docs/pr/2026-06-08-v1.5.0.ko.md`.
 ## [1.4.0] - 2026-06-08
 ### Added

package/README.ko.md CHANGED

@@ -1,4 +1,4 @@
-<p align="center">
+<p align="center">
   <img src=".github/assets/hero.gif" alt="Tink Hero Banner" width="100%">
 </p>
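@@ -8,7 +8,7 @@ A small harness layer for Claude Code and Codex.
 Tink picks the harness that fits the current task, makes run state visible, and improves the harness set using failures and feedback from real use.
-**Latest release:** v1.4.0 - Context Metrics Evaluator artifact that records context-efficiency scores together with their formula, evidence refs, and measurement scope.
+**Latest release:** v1.6.0 - graph-rule seed routing does a better job of picking the related files, harnesses, and verification checks that repeated work needs.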
 [English](README.md) · **한국어**
@@ -59,14 +59,22 @@ npx tink-harness@latest update
 If the Codex skills, schema, or Windows warnings look wrong after an update, see `docs/update-troubleshooting.ko.md` or `docs/update-troubleshooting.md`.
-## What changed in 1.2.0
+## What changed in 1.6.0
+This release makes Tink's small rule graph more useful in real work.
+- Seed rules connect recurring work, such as README Korean/English sync, version metadata sync, Claude Code command 3-copy sync, and installer/update smoke checks, to the related files and verification checks it needs.
+- `/tink:cast` and `$tink:cast` guide you to record a rule's `reason`, `risk`, `include_paths`, and `checks` as context evidence.
+- `/tink:weave` and `/tink:frog` also review rule quality so rules can be sorted into keep, rewrite, split, merge, or needs evidence.
+- The graph stays small and file-based. This release does not add a public `tink index` command, watcher, generated cache, database, or external service either.
+## Foundation improvements since 1.2.0
 This release makes it easier to use Tink as the same harness layer in Claude Code and Codex.
 - Codex installs only action skills such as `$tink:cast` and `$tink:verify` instead of one broad `tink` skill.
 - Non-trivial work records which context was used and which was left out in `context-pack.md`, `context-map.json`, and `excluded-context.md`.
 - Repo Signal and Context Graph Lite are used to pick related tests, schemas, sync files, and verification hints without creating a new `tink index` command.
-- The criteria for context-efficiency scoring and fixture-ratio calculation are in `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, and `docs/context-metrics-evaluator.md`.
+- The criteria for context-efficiency scoring, fixture-ratio calculation, run-history rollup, the 90% threshold status, and real-run record boundaries are in `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-threshold-status.ko.md`, `docs/context-threshold-status.md`, `docs/context-run-record-policy.ko.md`, and `docs/context-run-record-policy.md`.
 - `/tink:verify` and `$tink:verify` use the same Verify Runner model and write verification evidence to `.tink/current/verification.json`.
 - External context follows the MCP Safe Profile. It keeps only the smallest source handle, marks confidence and sensitivity, and records risky or overly broad context separately in `excluded-context.md`.
@@ -114,7 +122,7 @@ Tink uses files you can inspect directly.
 The rule graph is kept small. Tink picks mandatory rules first, pulls in only the optional rules that match task facts or keywords, and records the rule ids already read in each phase so the same guidance is not repeated.
-Design notes live in `docs/`. The baseline compatibility policy is in `docs/compatibility-policy.md`, and new work must consider Claude Code and Codex, and macOS and Windows, together. Repo Signal behavior is described in `docs/repo-signals.ko.md` or `docs/repo-signals.md`, and the external context safety policy is described in `docs/mcp-safe-profile.md`. When reading or reviewing `.tink/current/` state, start with `docs/work-state.ko.md` or `docs/work-state.md`. The next update stabilization plan is in `docs/phase-5-update-confidence.ko.md` and `docs/phase-5-update-confidence.md`. The broader idea implementation audit and roadmap is in `docs/tink-idea-implementation-plan.ko.md`.
+Design notes live in `docs/`. The baseline compatibility policy is in `docs/compatibility-policy.md`, and new work must consider Claude Code and Codex, and macOS and Windows, together. Repo Signal behavior is described in `docs/repo-signals.ko.md` or `docs/repo-signals.md`, and the lightweight graph-rule adoption plan is in `docs/graph-rule-adoption-plan.ko.md`. The external context safety policy is described in `docs/mcp-safe-profile.md`. When reading or reviewing `.tink/current/` state, start with `docs/work-state.ko.md` or `docs/work-state.md`. The next update stabilization plan is in `docs/phase-5-update-confidence.ko.md` and `docs/phase-5-update-confidence.md`. The broader idea implementation audit and roadmap is in `docs/tink-idea-implementation-plan.ko.md`.
 The important principle is approval. Approval to proceed with the current task and approval to save state that will be reused in the future are separate. Saving a new harness, memory, rule graph, or hook guard always requires separate approval.

package/README.md CHANGED

@@ -17,14 +17,14 @@
 </p>
 <p>
-  <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.4.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
+  <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.6.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
   <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
   <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
   <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
   <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
 </p>
-<p><strong>Latest release:</strong> v1.4.0 — Context Metrics Evaluator artifact for measured context-efficiency scores.</p>
+<p><strong>Latest release:</strong> v1.6.0 - Graph-rule seed routing now helps Tink pick supporting files, harnesses, and verification checks for repeated work.</p>
 **English** · [한국어](README.ko.md)
@@ -124,14 +124,23 @@ To quickly verify the updated install, see `docs/update-verification-recipe.md`
 If an update looks stale or incomplete, see `docs/update-troubleshooting.md` or `docs/update-troubleshooting.ko.md`.
-## What's new in 1.2.0
+## What's new in 1.6.0
+This release makes Tink's small rule graph more useful during real work.
+- Seed rules now connect common maintenance work to related files, harnesses, and checks, such as README bilingual sync, version metadata sync, Claude Code command 3-copy sync, and installer/update smoke checks.
+- `/tink:cast` and `$tink:cast` guidance now records rule `reason`, `risk`, `include_paths`, and `checks` as reviewable context evidence instead of silently loading extra files.
+- `/tink:weave` and `/tink:frog` now include rule-quality review so reusable rules can be kept, rewritten, split, merged, or marked as needing more evidence.
+- The graph remains file-based and small. This release still does not add a public `tink index` command, watcher, generated cache, database, or external service.
+## Recent foundation from 1.2.0+
 This release makes Tink work as one harness layer across Claude Code and Codex.
 - Codex now installs focused `$tink:*` action skills instead of one broad visible `tink` skill, so the picker shows commands like `$tink:cast` and `$tink:verify` cleanly.
 - Non-trivial runs now create context artifacts: `context-pack.md`, `context-map.json`, and `excluded-context.md`.
 - Repo Signals and Context Graph Lite help `/tink:cast` and `$tink:cast` choose relevant tests, schemas, sync partners, and verification hints without adding a new `tink index` command.
-- Context Budget Ledger fields and fixture-ratio evaluation are documented in `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, and `docs/context-metrics-evaluator.ko.md` without adding a new command.
+- Context Budget Ledger fields, fixture-ratio evaluation, run-history rollup, the 90 percent threshold status, and future real-run record boundaries are documented in `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-run-history-rollup.md`, `docs/context-run-history-rollup.ko.md`, `docs/context-threshold-status.md`, `docs/context-threshold-status.ko.md`, `docs/context-run-record-policy.md`, and `docs/context-run-record-policy.ko.md` without adding a new command.
 - `/tink:verify` and `$tink:verify` share one portable Verify Runner model and write compact evidence to `.tink/current/verification.json`.
 - External context now follows the MCP Safe Profile: include only the smallest useful source handle, mark confidence and sensitivity, exclude unsafe context visibly, and connect important claims to verification.
@@ -193,7 +202,7 @@ Tink uses files you can inspect:
 The rule graph stays small on purpose. Tink loads matching mandatory rules first, retrieves only relevant optional rules by task facts or keywords, and records loaded rule ids by phase so the same guidance is not repeated in one run.
-Design notes live in `docs/`. The compatibility baseline is `docs/compatibility-policy.md`: every new slice should consider Claude Code and Codex, plus macOS and Windows. Repo signal behavior is described in `docs/repo-signals.md` or `docs/repo-signals.ko.md`. External context safety is described in `docs/mcp-safe-profile.md` and `docs/external-context-policy.md`. To read or review `.tink/current/` state, start with `docs/work-state.md` or `docs/work-state.ko.md`. Update confidence is still documented in `docs/phase-5-update-confidence.md` or `docs/phase-5-update-confidence.ko.md`. The planned work-unit list is `docs/planned-work-units.md` or `docs/planned-work-units.ko.md`, with details in the verification evidence, harness lifecycle, memory decision, context change, and update diagnosis docs. The broader Korean idea audit and roadmap is `docs/tink-idea-implementation-plan.ko.md`.
+Design notes live in `docs/`. The compatibility baseline is `docs/compatibility-policy.md`: every new slice should consider Claude Code and Codex, plus macOS and Windows. Repo signal behavior is described in `docs/repo-signals.md` or `docs/repo-signals.ko.md`. The lightweight graph-rule adoption plan is `docs/graph-rule-adoption-plan.ko.md`. External context safety is described in `docs/mcp-safe-profile.md` and `docs/external-context-policy.md`. To read or review `.tink/current/` state, start with `docs/work-state.md` or `docs/work-state.ko.md`. Update confidence is still documented in `docs/phase-5-update-confidence.md` or `docs/phase-5-update-confidence.ko.md`. The planned work-unit list is `docs/planned-work-units.md` or `docs/planned-work-units.ko.md`, with details in the verification evidence, harness lifecycle, memory decision, context change, and update diagnosis docs. The broader Korean idea audit and roadmap is `docs/tink-idea-implementation-plan.ko.md`.
 The important rule is approval.

package/VERSIONING.md CHANGED

@@ -1,6 +1,6 @@
 # Versioning
-Current version: `1.4.0`
+Current version: `1.6.0`
 Tink follows semver from `1.0.0` onward.

package/bin/install.js CHANGED

@@ -369,6 +369,70 @@ function removeLegacyCodexSkill(codexTarget) {
   if (!dryRun) fs.rmSync(legacyDir, { recursive: true, force: true });
 }
+function removeIfExists(base, filePath, label = 'legacy') {
+  if (!fs.existsSync(filePath)) return false;
+  log.message(`${dryRun ? `would remove ${label}` : `remove ${label}`} ${displayPath(base, filePath)}`);
+  recordOperation('removedLegacy', base, filePath);
+  if (!dryRun) fs.rmSync(filePath, { recursive: true, force: true });
+  return true;
+}
+function removeRepoLocalClaudeTinkSurface(target) {
+  const commandDir = path.join(target, '.claude/commands/tink');
+  const flatCommandDir = path.join(target, '.claude/commands');
+  const commandFiles = ['setup.md', 'cast.md', 'verify.md', 'list.md', 'frog.md', 'weave.md', 'update.md'];
+  for (const name of commandFiles) {
+    removeIfExists(target, path.join(commandDir, name), 'repo-local Claude command');
+  }
+  if (fs.existsSync(commandDir) && fs.readdirSync(commandDir).length === 0) {
+    removeIfExists(target, commandDir, 'empty repo-local Claude command dir');
+  }
+  const legacyFlatCommands = [
+    'tink-setup.md',
+    'tink-cast.md',
+    'tink-verify.md',
+    'tink-list.md',
+    'tink-frog.md',
+    'tink-weave.md',
+    'tink-update.md',
+    'tink-forge.md',
+    'tink-purge.md',
+    'tink-hone.md'
+  ];
+  for (const name of legacyFlatCommands) {
+    removeIfExists(target, path.join(flatCommandDir, name), 'repo-local Claude command');
+  }
+  if (fs.existsSync(flatCommandDir) && fs.readdirSync(flatCommandDir).length === 0) {
+    removeIfExists(target, flatCommandDir, 'empty repo-local Claude commands dir');
+  }
+  const skillDir = path.join(target, '.claude/skills/tink');
+  const skillParentDir = path.join(target, '.claude/skills');
+  const skillFile = path.join(skillDir, 'SKILL.md');
+  if (!fs.existsSync(skillDir)) {
+    if (fs.existsSync(skillParentDir) && fs.readdirSync(skillParentDir).length === 0) {
+      removeIfExists(target, skillParentDir, 'empty repo-local Claude skills dir');
+    }
+    return;
+  }
+  if (!fs.existsSync(skillFile)) {
+    log.message(`keep unknown ${displayPath(target, skillDir)}`);
+    recordOperation('keptUnknown', target, skillDir);
+    return;
+  }
+  const text = fs.readFileSync(skillFile, 'utf8');
+  if (text.includes('name: tink') && text.includes('# Tink')) {
+    removeIfExists(target, skillDir, 'repo-local Claude skill');
+    if (fs.existsSync(skillParentDir) && fs.readdirSync(skillParentDir).length === 0) {
+      removeIfExists(target, skillParentDir, 'empty repo-local Claude skills dir');
+    }
+    return;
+  }
+  log.message(`keep unknown ${displayPath(target, skillDir)}`);
+  recordOperation('keptUnknown', target, skillDir);
+}
 function readJsonFile(filePath, fallback) {
   if (!fs.existsSync(filePath)) return fallback;
   try {
@@ -417,6 +481,9 @@ function copySelected(scope, components, agent) {
   if (includesClaude(agent) && components.includes('commands')) {
     copyTinkCommands(templateRoot, target);
   }
+  if (agent === 'codex') {
+    removeRepoLocalClaudeTinkSurface(target);
+  }
   if (components.includes('skill')) {
     if (includesClaude(agent)) {
       copyDir(path.join(templateRoot, 'claude/skills'), path.join(target, '.claude/skills'), target);
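
The skill branch of `removeRepoLocalClaudeTinkSurface` above only removes `.claude/skills/tink` when `SKILL.md` contains both `name: tink` and `# Tink`, and keeps anything else as unknown. A minimal standalone sketch of that decision follows. The helper name `classifySkillDir` and its return labels are hypothetical and do not appear in `bin/install.js`; the sketch only classifies and deletes nothing outside its own temp directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper (not part of bin/install.js): report what the cleanup
// in the diff would do with a repo-local .claude/skills/tink directory.
function classifySkillDir(skillDir) {
  if (!fs.existsSync(skillDir)) return 'absent';
  const skillFile = path.join(skillDir, 'SKILL.md');
  // Without a SKILL.md the contents are unknown, so they are kept.
  if (!fs.existsSync(skillFile)) return 'keep-unknown';
  const text = fs.readFileSync(skillFile, 'utf8');
  // Same two substring markers the diff checks before removing the directory.
  const looksGenerated = text.includes('name: tink') && text.includes('# Tink');
  return looksGenerated ? 'remove' : 'keep-unknown';
}

// Demo inside a throwaway temp directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'tink-sketch-'));
const generated = path.join(root, 'generated');
const custom = path.join(root, 'custom');
fs.mkdirSync(generated);
fs.mkdirSync(custom);
fs.writeFileSync(path.join(generated, 'SKILL.md'), '---\nname: tink\n---\n# Tink\n');
fs.writeFileSync(path.join(custom, 'SKILL.md'), '# My own skill\n');

const results = {
  generated: classifySkillDir(generated),
  custom: classifySkillDir(custom),
  missing: classifySkillDir(path.join(root, 'missing')),
};
console.log(results);
fs.rmSync(root, { recursive: true, force: true });
```

Because both checks are plain substring matches, a user-authored file that happens to contain both markers would also be classified as generated; whether that matters in practice depends on how the real templates are written.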

package/commands/cast.md CHANGED

@@ -369,8 +369,12 @@ A task is trivial only when ALL of the following are true:
 2. Read `.tink/rules/index.json` if present. Use it as a small rule graph to choose candidate harnesses, checks, and opt-in guard candidates from contract facts. Do not read every harness.
    - Load `mandatory` nodes first when their `when` facts match the contract.
    - Retrieve `retrievable` nodes only when their `when` facts or `keywords` fit the task.
+   - Treat `select_harnesses`, `include_paths`, `checks`, `reason`, and `risk` as first-class routing data when present.
    - Respect `budget_cost` and `selection_policy.retrieval.max_retrievable_per_phase` when present.
    - Record every loaded rule id in `.tink/current/session.json` under `loaded_rule_ids_by_phase.<phase>`.
+   - Record selected `include_paths` in `context-map.json.included[]` with `role: "supporting"` or `role: "verification_target"` when the rule also adds a check.
+   - Record rule `checks` in `contract.json.verification.manual_checks[]` or `commands[]` only when they are relevant and cheap; otherwise record them in `notes.md` as deferred checks.
+   - Record rule `reason` and `risk` in `context-map.json.signals[]` with `kind: "rule_graph"` so reviewers can see why the context or check was chosen.
    - If a rule id is already listed for the same phase, do not repeat its guidance; cite the existing session entry instead.
 3. Read `.tink/harnesses/index.json`. Use it to validate the candidates from the rule graph and to fall back when no rule node matches.
 4. Read small memory files where `config.json` sets `memory_has_entries.<name>: true`. Skip files set to `false`. After a Save Gate approves a new memory entry, set that file's flag to `true` in `config.json`.
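
The new recording bullets in step 2 above can be sketched as a small function that folds one matched rule into a context map. This is an illustration under stated assumptions, not Tink's implementation: `included[]`, `role`, `signals[]`, and `kind: "rule_graph"` come from the guidance, while the entry keys `path` and `rule_id`, the function name `recordRule`, and the example rule are hypothetical.

```javascript
// Hypothetical sketch: record one matched rule in a context-map-like object.
// Assumption: paths from a rule that also adds checks get role
// "verification_target"; otherwise they get role "supporting".
function recordRule(rule, contextMap) {
  const checks = rule.checks ?? [];
  const role = checks.length > 0 ? 'verification_target' : 'supporting';
  for (const includePath of rule.include_paths ?? []) {
    contextMap.included.push({ path: includePath, role, rule_id: rule.id });
  }
  // Keep the rule's reason and risk visible so a reviewer can see why
  // the extra context or check was chosen.
  contextMap.signals.push({
    kind: 'rule_graph',
    rule_id: rule.id,
    reason: rule.reason ?? null,
    risk: rule.risk ?? null,
  });
  return contextMap;
}

// Hypothetical rule, loosely modeled on the README sync example in the changelog.
const exampleRule = {
  id: 'readme-bilingual-sync',
  include_paths: ['README.md', 'README.ko.md'],
  checks: ['compare section headings in both READMEs'],
  reason: 'README edits usually need the Korean companion updated too',
  risk: 'docs drift between languages',
};
const contextMap = recordRule(exampleRule, { included: [], signals: [] });
console.log(JSON.stringify(contextMap, null, 2));
```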

package/commands/frog.md CHANGED

@@ -26,6 +26,7 @@ Use Korean field values when `.tink/config.json` language is `ko` or `auto` with
    - `.tink/runs/` summaries
    - `.tink/maintenance/ledger.jsonl`
    - `.tink/maintenance/weave-queue.json`
+   - `.tink/rules/index.json`
    - references in memory files
    - recent git history touching harness files as weak context only
 2b. Check `.tink/runs/` accumulation against TTL config:
@@ -53,6 +54,14 @@ Use Korean field values when `.tink/config.json` language is `ko` or `auto` with
    - merge into another harness
    - delete
    - rewrite via `/tink:weave`
+6b. If `.tink/rules/index.json` exists, also inspect rule quality:
+   - keep: concrete `when`, `reason`, and useful `checks` or `include_paths`
+   - rewrite: too broad, unclear reason, or missing verification
+   - split: one rule mixes unrelated paths, tasks, or risks
+   - merge: multiple rules cover the same `when`, `include_paths`, and `checks`
+   - needs evidence: weak or no run, ledger, friction, or user-correction evidence
+   Prefer keep, rewrite, split, merge, or needs evidence before any removal proposal.
+   Report rule recommendations separately from harness recommendations.
 7. Only strong evidence may recommend `delete`. Medium evidence may recommend `merge` or `hone`. Weak evidence must default to `keep` or `needs evidence`.
 8. For each non-keep action, prepare an operation-specific approval payload with exact files, op ID, evidence handles, and rollback.
 9. If the recommendation is `weave`, write or present a weave handoff packet and, after approval, add it to `.tink/maintenance/weave-queue.json`:

package/commands/weave.md CHANGED

@@ -24,42 +24,50 @@ Use Korean field values when `.tink/config.json` language is `ko` or `auto` with
 1. Read `.tink/harnesses/index.json`. If `.tink/maintenance/weave-queue.json` exists, read it to find:
    - Handoff packets from `/tink:frog` (entries where `auto` is absent or false)
    - Auto signals from completed runs (entries where `auto: true`)
-   Count auto signals per harness: `check_failed` signals count as 2, all other outcomes count as 1. Use this frequency to rank improvement candidates — harnesses with the highest signal count should be improved first. If invoked from `/tink:frog`, also read the purge output and `.tink/current/notes.md` for the weave handoff packet.
-   If `.tink/maintenance/friction.jsonl` exists, read only compact recent entries and count repeated `check_failed`, `check_skipped`, `blocked`, gate denial, or rollback events. Repeated friction can justify a harness edit, rule graph update, or opt-in guard candidate.
+   Count auto signals per harness: `check_failed` signals count as 2, all other outcomes count as 1. Use this frequency to rank improvement candidates — harnesses with the highest signal count should be improved first. If invoked from `/tink:frog`, also read the purge output and `.tink/current/notes.md` for the weave handoff packet.
+   If `.tink/maintenance/friction.jsonl` exists, read only compact recent entries and count repeated `check_failed`, `check_skipped`, `blocked`, gate denial, or rollback events. Repeated friction can justify a harness edit, rule graph update, or opt-in guard candidate.
 2. Identify one or a few active harnesses to improve using real failures and evidence:
    - repeated mistakes
-   - user corrections
-   - failed checks
-   - repeated friction entries
-   - confusing approval prompts
+   - user corrections
+   - failed checks
+   - repeated friction entries
+   - confusing approval prompts
    - too much context footprint
    - missing done criteria
 3. Require concrete evidence handles before proposing a save:
-   - run record path or run ID
-   - current notes path when same-conversation certainty exists
-   - failed check name
-   - friction entry timestamp/type
-   - compact user correction snippet
+   - run record path or run ID
+   - current notes path when same-conversation certainty exists
+   - failed check name
+   - friction entry timestamp/type
+   - compact user correction snippet
    - purge handoff ID from `.tink/maintenance/weave-queue.json`
 4. Classify the evidence as repeated or single-run. Single-run evidence may suggest a trial edit, but should not become broad policy unless the user explicitly approves.
-5. Explain why the change belongs in the harness rather than `.tink/memory/` or `.tink/current/notes.md`.
-6. Decide the right destination:
-   - harness edit: a procedure, ask-first question, check, or recovery step should change;
-   - rule graph update: a contract fact should select a harness, check, or guard candidate earlier;
-   - opt-in hook guard candidate: the same failure should be blocked by `PreToolUse`, `PostToolUse`, or `Stop` after user approval;
-   - friction logging update: the run should record a missing evidence pattern more clearly.
-7. Read only the target harness files and `.tink/rules/index.json` when the evidence points to rule selection.
-8. Propose small edits:
-   - clearer when-to-use trigger
-   - better ask-first question
-   - tighter checks
-   - smaller context footprint
-   - explicit failure recovery
-   - rule graph node or edge
-   - opt-in guard template
-9. Show an approval payload: destination files, exact patch summary, evidence handles, repeated vs single-run classification, why reusable, context-cost delta, sensitive content excluded, rollback path.
-10. Ask for approval before saving.
-11. Apply surgical changes, update index metadata or `.tink/rules/index.json` if needed, mark the weave queue item status, and append the approval/result to `.tink/maintenance/ledger.jsonl`.
+5. Explain why the change belongs in the harness rather than `.tink/memory/` or `.tink/current/notes.md`.
+6. Decide the right destination:
+   - harness edit: a procedure, ask-first question, check, or recovery step should change;
+   - rule graph update: a contract fact should select a harness, check, or guard candidate earlier;
+   - opt-in hook guard candidate: the same failure should be blocked by `PreToolUse`, `PostToolUse`, or `Stop` after user approval;
+   - friction logging update: the run should record a missing evidence pattern more clearly.
+7. Read only the target harness files and `.tink/rules/index.json` when the evidence points to rule selection.
+   For rule graph updates, run a structural gate before proposing a save:
+   - duplicate: does an existing rule already cover the same `when`, `include_paths`, or `checks`?
+   - breadth: is the rule too broad, such as "always check docs", instead of tied to concrete paths, task facts, or risks?
+   - evidence: does the proposal cite a run, failed check, user correction, or friction entry?
+   - verification: does the rule add a check or explain why no check is needed?
+   - compatibility: does the rule make sense for both Claude Code and Codex, and for macOS and Windows?
+   - portability: does it avoid OS-specific shell syntax unless alternatives are listed?
+8. Propose small edits:
+   - clearer when-to-use trigger
+   - better ask-first question
+   - tighter checks
+   - smaller context footprint
+   - explicit failure recovery
+   - rule graph node or edge
+   - opt-in guard template
+9. Show an approval payload: destination files, exact patch summary, evidence handles, repeated vs single-run classification, why reusable, context-cost delta, sensitive content excluded, rollback path.
+   For rule graph updates, also show structural gate results: duplicate, breadth, evidence, verification, compatibility, and portability.
+10. Ask for approval before saving.
+11. Apply surgical changes, update index metadata or `.tink/rules/index.json` if needed, mark the weave queue item status, and append the approval/result to `.tink/maintenance/ledger.jsonl`.
 ## Approval format
 ```text
@@ -72,11 +80,11 @@ Evidence:
 - observed failure: verification command was unclear in two runs
 Approval payload:
-- operation: weave
-- destination files: `.tink/harnesses/code-change.md`, `.tink/harnesses/index.json` if metadata changes, `.tink/rules/index.json` if routing changes
-- context-cost delta: neutral or smaller
-- ledger: append op ID to `.tink/maintenance/ledger.jsonl`
-- rollback: revert this patch or rerun `/tink:weave` with the previous trigger
+- operation: weave
+- destination files: `.tink/harnesses/code-change.md`, `.tink/harnesses/index.json` if metadata changes, `.tink/rules/index.json` if routing changes
+- context-cost delta: neutral or smaller
+- ledger: append op ID to `.tink/maintenance/ledger.jsonl`
+- rollback: revert this patch or rerun `/tink:weave` with the previous trigger
 Proposed improvement:
 - Checks 섹션에 “검증 명령과 실패 시 마지막 안전 지점 기록” 추가
@@ -89,7 +97,7 @@ Proposed improvement:
 ## Do not
 - Do not rewrite a harness from scratch unless the user asks.
-- Do not add broad principles that do not change behavior.
-- Do not save one-off task progress as harness knowledge.
-- Do not propose a harness edit without evidence handles.
-- Do not register enforcement hooks by default. Save guard templates as opt-in candidates unless the user explicitly approves installation.
+- Do not add broad principles that do not change behavior.
+- Do not save one-off task progress as harness knowledge.
+- Do not propose a harness edit without evidence handles.
+- Do not register enforcement hooks by default. Save guard templates as opt-in candidates unless the user explicitly approves installation.
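
Step 1 of the weave procedure above weights auto signals per harness: `check_failed` counts as 2, every other outcome counts as 1, and the highest totals are improved first. A minimal sketch of that ranking follows, assuming queue entries carry `harness` and `outcome` fields; those two field names and the function name `rankHarnesses` are hypothetical, and only the `auto` flag is named in the source.

```javascript
// Hypothetical sketch of the auto-signal weighting described in weave step 1.
function rankHarnesses(queue) {
  const totals = new Map();
  for (const entry of queue) {
    // Entries without auto: true are handoff packets, not auto signals.
    if (entry.auto !== true) continue;
    const weight = entry.outcome === 'check_failed' ? 2 : 1;
    totals.set(entry.harness, (totals.get(entry.harness) ?? 0) + weight);
  }
  // Highest signal count first.
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([harness, score]) => ({ harness, score }));
}

const ranked = rankHarnesses([
  { auto: true, harness: 'code-change', outcome: 'check_failed' },
  { auto: true, harness: 'code-change', outcome: 'done' },
  { auto: true, harness: 'docs', outcome: 'done' },
  { harness: 'docs', outcome: 'check_failed' }, // handoff packet, ignored
]);
console.log(ranked);
```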

package/docs/context-run-history-rollup.ko.md ADDED

@@ -0,0 +1,36 @@
+# Context Run History Rollup
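+Context Run History Rollup is the standard for combining the `context-metrics-evaluation.json` scores of several runs to see whether the 90% target also holds across repeated work.
+This feature is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. At this stage, it checks that `tests/fixtures/maintenance/context-metrics-rollup.json` and `tests/test_templates.py` compute the same rollup scores.
+The English version is at `docs/context-run-history-rollup.md`.
+## Why it is needed
+Even if a single current run is at 90% or above, it is hard to say that repeated work as a whole is stable. The rollup gathers the scores of several runs and checks the following.
+- The average score of each metric.
+- The minimum score of each metric.
+- Whether every run has all six metrics with none missing.
+- Whether each metric is at 90% or above for both the average and the minimum.
+## What the score means
+`scope: "run_history"` means the value combines several run records.
+The rollup in the fixture is not production telemetry. Until enough real `.tink/runs/*` records have accumulated, say only "90% or above on a representative run-history fixture".
+## Done criteria
+- All six metrics have a rollup average of 90% or above.
+- All six metrics have a per-run minimum of 90% or above.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `minimum_percent`.
+- `limits` states explicitly that this is not production telemetry.
+## Compatibility criteria
+- Claude Code and Codex must be able to read the same artifact names and metric names.
+- It must be verified with `npm test` on both macOS and Windows.
+- Reusable memory, harness, rule, or config is not saved without user approval.
+- Sentry and release evidence bundling are not included.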

package/docs/context-run-history-rollup.md ADDED

@@ -0,0 +1,36 @@
+# Context Run History Rollup
+Context Run History Rollup combines multiple `context-metrics-evaluation.json` scores to check whether the 90% target holds across repeated work.
+It is not a new public command. It must not add a `tink index` command, watcher, generated cache, or hidden runtime index. At this stage, `tests/fixtures/maintenance/context-metrics-rollup.json` and `tests/test_templates.py` must calculate the same rollup scores.
+Korean companion: `docs/context-run-history-rollup.ko.md`.
+## Why It Exists
+A single current run above 90% is useful, but it does not prove repeated work is stable. The rollup combines several runs and checks:
+- Average score for each metric.
+- Minimum score for each metric.
+- Whether every run records all six metrics.
+- Whether both average and minimum are at or above 90%.
+## What The Score Means
+`scope: "run_history"` means the score combines multiple run records.
+The fixture rollup is not production telemetry. Until enough real `.tink/runs/*` records exist, describe it as representative run-history fixture evidence only.
+## Done Criteria
+- All six metrics have rollup averages at or above 90%.
+- All six metrics have per-run minimums at or above 90%.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `minimum_percent`.
+- `limits` states that the data is not production telemetry.
+## Compatibility
+- Claude Code and Codex read the same artifact names and metric names.
+- macOS and Windows are both verified through `npm test`.
+- Reusable memory, harness, rule, and config saves still require explicit approval.
+- Sentry and release evidence bundling are out of scope.
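
A minimal sketch of the rollup check described in this file, written in Python because the diff names `tests/test_templates.py` as the place the scores are calculated. The metric keys, the function names, and the per-run dict shape are assumptions made for illustration. The diff does not show the real `context-metrics-evaluation.json` schema or the actual test code. Only the five required score fields and the 90% target come from the text.

```python
from statistics import mean

TARGET_PERCENT = 90

# Hypothetical metric keys; the real names live in context-metrics-evaluation.json.
METRICS = (
    "unneeded_context",
    "context_pack_size",
    "evidence_lookup_time",
    "missed_verification",
    "reuse_accuracy",
    "rework_risk",
)

# Field names taken from the Done Criteria list.
REQUIRED_SCORE_FIELDS = (
    "formula",
    "numerator",
    "denominator",
    "evidence_refs",
    "minimum_percent",
)


def missing_score_fields(score):
    """Return the required fields that a single score entry lacks."""
    return [field for field in REQUIRED_SCORE_FIELDS if field not in score]


def rollup(runs):
    """Combine per-run percent scores into an average and minimum per metric."""
    if not runs:
        raise ValueError("at least one run record is required")

    # Every run must record all six metrics before anything is averaged.
    for index, run in enumerate(runs):
        missing = [metric for metric in METRICS if metric not in run]
        if missing:
            raise ValueError(f"run {index} is missing metrics: {missing}")

    report = {}
    for metric in METRICS:
        scores = [run[metric] for run in runs]
        average, minimum = mean(scores), min(scores)
        report[metric] = {
            "average_percent": average,
            "minimum_percent": minimum,
            # Both the average and the per-run minimum must reach the target.
            "passes": average >= TARGET_PERCENT and minimum >= TARGET_PERCENT,
        }
    return report
```

Checking the minimum as well as the average matters: two runs scoring 100 and 85 average 92.5, yet the rollup should still fail because one run fell below 90.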

package/docs/context-run-record-policy.ko.md ADDED

@@ -0,0 +1,50 @@
+# Context Run Record Policy
+Context Run Record Policy defines which real `.tink/runs/*` records may be used when they are rolled up later.
+This document is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It does not collect records automatically either.
+The English document is at `docs/context-run-record-policy.md`.
+## Why It Is Needed
+The current evidence for reaching 90% is based on repository fixtures and the representative run-history fixture. That alone does not justify saying “production-wide 90% is guaranteed across every real project”.
+Before moving to real run records, the following must be decided first.
+- Which `.tink/runs/*` records can be included in a rollup?
+- Which information is sensitive or too broad and must be excluded?
+- Are metric scores linked to verification evidence?
+- Can the same criteria be verified on Claude Code and Codex, and on macOS and Windows?
+## Records That Can Be Included
+Records that can be included must be completion records from a current run the user approved.
+- Run id or run path.
+- Completion time.
+- Surface used: Claude Code or Codex.
+- Platform used: macOS or Windows.
+- Six metric scores in the shape of `context-metrics-evaluation.json`.
+- Verification result and check list.
+- Short evidence handles.
+- A limit stating whether the record is production telemetry, a fixture, or a representative run.
+## Records That Must Be Excluded
+The following do not go into the run-history rollup.
+- Tokens, credentials, and raw private payloads.
+- Full copies of private issues, dashboards, Figma files, or discussions.
+- Reusable state changes under `.tink/memory/*`, `.tink/rules/*`, or `.tink/harnesses/*` made without separate approval.
+- Sentry integration.
+- Release evidence bundling.
+## Completion Criteria
+- All six metrics are present in the record.
+- Every metric score has supporting evidence.
+- The verification result and checks are linked.
+- The limit states clearly whether or not the data is production telemetry.
+- There is no new public command, watcher, generated cache, or hidden runtime index.
+- Both macOS and Windows can verify with `npm test`.

package/docs/context-run-record-policy.md ADDED

@@ -0,0 +1,50 @@
+# Context Run Record Policy
+Context Run Record Policy defines which real `.tink/runs/*` records can later be used for run-history rollup.
+This is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It also does not collect records automatically.
+Korean documentation is available in `docs/context-run-record-policy.ko.md`.
+## Why This Exists
+The current 90 percent evidence is based on repository fixtures and representative run-history fixtures. That is not enough to claim production-wide 90 percent behavior across every real project.
+Before moving to real run records, Tink needs a clear answer to these questions:
+- Which `.tink/runs/*` records can be included in a rollup?
+- Which data is sensitive or too broad and must be excluded?
+- Are metric scores linked to verification evidence?
+- Can Claude Code and Codex, on macOS and Windows, verify the same criteria?
+## Records That Can Be Included
+Included records must be completion records from a current run that the user approved.
+- Run id or run path.
+- Completion timestamp.
+- Surface: Claude Code or Codex.
+- Platform: macOS or Windows.
+- Six metric scores shaped like `context-metrics-evaluation.json`.
+- Verification result and check list.
+- Short evidence handles.
+- Explicit limits that say whether the record is production telemetry, a fixture, or a representative run.
+## Records That Must Be Excluded
+Do not include these in run-history rollup:
+- Tokens, credentials, or raw private payloads.
+- Full private issue text, whole dashboards, entire Figma files, or complete discussions.
+- Unapproved reusable state changes under `.tink/memory/*`, `.tink/rules/*`, or `.tink/harnesses/*`.
+- Sentry integration.
+- Release evidence bundling.
+## Completion Criteria
+- All six metrics are present.
+- Each metric score has evidence.
+- Verification result and checks are linked.
+- Limits clearly state whether the data is production telemetry.
+- No new public command, watcher, generated cache, or hidden runtime index exists.
+- macOS and Windows can verify the criteria with `npm test`.
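
The include and exclude lists in this policy can be read as a filter over candidate run records. The sketch below is one way to express that filter, assuming a run record is a plain dict. Every key name (`approved`, `run_id`, `completed_at`, `evidence_handles`, and so on) and the function `exclusion_reasons` are hypothetical. The policy names the kinds of data but does not define a schema, and it states that nothing is collected automatically, so this only checks a record that already exists.

```python
ALLOWED_SURFACES = {"Claude Code", "Codex"}
ALLOWED_PLATFORMS = {"macOS", "Windows"}

# Hypothetical key names, chosen for this sketch only.
REQUIRED_KEYS = ("completed_at", "metrics", "verification", "evidence_handles", "limits")
FORBIDDEN_KEYS = ("token", "credential", "raw_private_payload")


def exclusion_reasons(record, metric_count=6):
    """Return the reasons a run record should stay out of the rollup."""
    reasons = []
    if record.get("approved") is not True:
        reasons.append("run was not approved by the user")
    if not (record.get("run_id") or record.get("run_path")):
        reasons.append("missing run id or run path")
    for key in REQUIRED_KEYS:
        if key not in record:
            reasons.append(f"missing {key}")
    if record.get("surface") not in ALLOWED_SURFACES:
        reasons.append("surface must be Claude Code or Codex")
    if record.get("platform") not in ALLOWED_PLATFORMS:
        reasons.append("platform must be macOS or Windows")
    if len(record.get("metrics", {})) != metric_count:
        reasons.append(f"expected {metric_count} metric scores")
    for key in FORBIDDEN_KEYS:
        if key in record:
            reasons.append(f"forbidden field present: {key}")
    return reasons
```

An empty list means the record meets every check in the sketch. Returning reasons instead of a boolean keeps the decision reviewable, which fits the policy's emphasis on linked evidence.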

package/docs/context-threshold-status.ko.md ADDED

@@ -0,0 +1,43 @@
+# Context Threshold Status
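+Context Threshold Status is a status board for checking at a glance whether the six context-efficiency metrics have passed the 90% target.
+The status board currently reads `tests/fixtures/current-run/context-metrics-evaluation.json` together with `tests/fixtures/maintenance/context-metrics-rollup.json`. In other words, it checks that both the single current run fixture and the run-history rollup that combines several representative runs are at or above 90%.
+This document is not a new public command. It does not create a `tink index` command, watcher, generated cache, or hidden runtime index. It does not automatically collect data from the user's repo either.
+The English document is at `docs/context-threshold-status.md`.
+## Current Status
+| Metric | Current run | Rollup average | Rollup minimum | Status |
+| --- | ---: | ---: | ---: | --- |
+| Reduction in unnecessary context inclusion rate | 100% | 97% | 94% | At or above 90% |
+| Reduction in initial context pack size | 100% | 95% | 92% | At or above 90% |
+| Reduction in time reviewers spend finding evidence | 100% | 98% | 96% | At or above 90% |
+| Improvement in missed-verification detection rate | 100% | 99% | 98% | At or above 90% |
+| Context reuse accuracy for repeated work | 100% | 96% | 94% | At or above 90% |
+| Reduction in rework likelihood | 100% | 95% | 91% | At or above 90% |
+## Why It Is Needed
+Without this status board, readers have to work out again how far the “at or above 90%” claim can be trusted.
+- The current run fixture checks whether the artifact just produced is complete.
+- The run-history rollup checks whether scores hold up across repeated work.
+- The minimum score checks whether any single unit of work drops below 90%.
+## Limits
+The current status is based on repository fixtures and the representative run-history fixture. It is not production telemetry.
+So users must not be told “at or above 90% is guaranteed in every real project”. Once enough real `.tink/runs/*` records have accumulated, the rollup must be rerun with the same formulas.
+## Completion Criteria
+- All six metrics have a current run score at or above 90%.
+- All six metrics have a rollup average at or above 90%.
+- All six metrics have a rollup minimum at or above 90%.
+- `limits` states the limitation that the data is not production telemetry.
+- Both Claude Code and Codex can read the same artifact names and metric names.
+- Both macOS and Windows can verify with `npm test`.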
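
The first three completion criteria can be checked mechanically against the status table in this file. The sketch below copies the percentages from that table. The short metric labels, the column names, and the function `below_target` are assumptions for illustration and are not the identifiers used in the fixtures.

```python
TARGET_PERCENT = 90

# (current run, rollup average, rollup minimum), copied from the status table.
# The dictionary keys are shortened labels, not the fixture's metric names.
STATUS = {
    "unnecessary context inclusion": (100, 97, 94),
    "initial context pack size": (100, 95, 92),
    "reviewer evidence lookup time": (100, 98, 96),
    "missed-verification detection": (100, 99, 98),
    "repeated-work context reuse accuracy": (100, 96, 94),
    "rework likelihood": (100, 95, 91),
}

COLUMNS = ("current_run", "rollup_average", "rollup_minimum")


def below_target(status, target=TARGET_PERCENT):
    """List (metric, column) pairs whose score is under the target."""
    return [
        (metric, column)
        for metric, scores in status.items()
        for column, score in zip(COLUMNS, scores)
        if score < target
    ]


# The metric with the lowest per-run minimum has the least headroom.
weakest = min(STATUS, key=lambda metric: STATUS[metric][2])
```

With the table's values, `below_target(STATUS)` returns an empty list, and `weakest` is the rework likelihood metric at 91%, one point above the target. These are fixture numbers only, as the limits section states.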