tink-harness 1.2.2 → 1.4.0

@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "tink",
3
3
  "description": "A small harness layer for Claude Code and Codex.",
4
- "version": "1.2.2",
4
+ "version": "1.4.0",
5
5
  "author": {
6
6
  "name": "dotori"
7
7
  }
package/CHANGELOG.md CHANGED
@@ -7,6 +7,41 @@ All notable changes to Tink are tracked here.
7
7
  No unreleased changes yet.
8
8
 
9
9
 
10
+ ## [1.4.0] - 2026-06-08
11
+
12
+ ### Added
13
+
14
+ - `context-metrics-evaluation.schema.json` for `.tink/current/context-metrics-evaluation.json`.
15
+ - Context Metrics Evaluator run-state artifact guidance for `/tink:cast` and `$tink:cast`.
16
+ - Fixture-ratio evaluation docs in Korean and English, explaining measured fixture scope versus production telemetry.
17
+ - Test-backed context metrics evaluation fixture that calculates all six context-efficiency metrics at or above the 90% target within fixture scope.
18
+ - Korean PR history draft for the Context Metrics Artifact work in `docs/pr/2026-06-08-context-metrics-artifact.ko.md`.
19
+
20
+ ### Changed
21
+
22
+ - Work State Guide now includes `context-metrics-evaluation.json` in the reading order.
23
+ - README and Korean README now link to the Context Metrics Evaluator docs without adding a new command.
24
+
25
+
26
+ ## [1.3.0] - 2026-06-08
27
+
28
+ ### Added
29
+
30
+ - Context Budget Ledger fields for `context-map.json` entries: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`.
31
+ - `context-map.json.efficiency_metrics` example shape for recording six context-efficiency scores with basis, confidence, evidence refs, and limits.
32
+ - Current-run fixture examples that connect selected context to verification links and mark avoid-next-time exclusions.
33
+ - Repo signal `context_budget_policy` fixture guidance for scoring selected Context Graph Lite candidates without adding a public `tink index` command.
34
+ - Korean-first Context Budget Ledger documentation with an English companion.
35
+ - Korean context-engineering efficiency HTML explainer for the current operating model and expected improvement ranges.
36
+ - Korean PR history draft for the Context Budget Ledger work in `docs/pr/2026-06-08-context-budget-ledger.ko.md`.
37
+
38
+ ### Changed
39
+
40
+ - `/tink:cast` and `$tink:cast` guidance now asks context artifacts to record role, cost, reuse signal, verification link, staleness, and evidence kind when useful.
41
+ - Work State Guide now explains how to read Context Budget Ledger fields.
42
+ - README and Korean README now link to the Context Budget Ledger docs without expanding the main body.
43
+
44
+
10
45
  ## [1.2.2] - 2026-06-08
11
46
 
12
47
  ### Added
package/README.ko.md CHANGED
@@ -8,7 +8,7 @@ Claude Code와 Codex를 위한 작은 하네스 레이어입니다.
8
8
 
9
9
  Tink는 지금 작업에 맞는 하네스를 고르고, 실행 상태를 보이게 만들고, 실제 사용 중 생긴 실패와 피드백으로 하네스 세트를 개선합니다.
10
10
 
11
- **최신 릴리스:** v1.2.2: 업데이트 신뢰도, 작업 단위 계획, 검증 증거 세분화, memory 정책 기반.
11
+ **최신 릴리스:** v1.4.0: context 효율 점수를 계산식, evidence ref, 측정 scope와 함께 남기는 Context Metrics Evaluator artifact.
12
12
 
13
13
  [English](README.md) · **한국어**
14
14
 
@@ -66,6 +66,7 @@ npx tink-harness@latest update
66
66
  - Codex에는 하나의 넓은 `tink` 스킬 대신 `$tink:cast`, `$tink:verify` 같은 action skill만 보이도록 설치됩니다.
67
67
  - 비단순 작업은 `context-pack.md`, `context-map.json`, `excluded-context.md`로 어떤 context를 썼고 뺐는지 남깁니다.
68
68
  - Repo Signal과 Context Graph Lite는 새 `tink index` 명령을 만들지 않고도 관련 테스트, 스키마, 동기화 파일, 검증 힌트를 고르는 데 쓰입니다.
69
+ - context 효율 점수화와 fixture 비율 계산 기준은 `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`에서 확인할 수 있습니다.
69
70
  - `/tink:verify`와 `$tink:verify`는 같은 Verify Runner 모델을 쓰며 `.tink/current/verification.json`에 검증 증거를 남깁니다.
70
71
  - 외부 context는 MCP Safe Profile을 따릅니다. 가장 작은 source handle만 남기고, 신뢰도와 민감도를 표시하며, 위험하거나 너무 넓은 context는 `excluded-context.md`에 따로 기록합니다.
71
72
 
package/README.md CHANGED
@@ -17,14 +17,14 @@
17
17
  </p>
18
18
 
19
19
  <p>
20
- <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.2.2"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
20
+ <a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.4.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
21
21
  <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
22
22
  <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
23
23
  <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
24
24
  <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
25
25
  </p>
26
26
 
27
- <p><strong>Latest release:</strong> v1.2.2: update confidence, planned work units, evidence details, and memory policy scaffolding.</p>
27
+ <p><strong>Latest release:</strong> v1.4.0: Context Metrics Evaluator artifact for measured context-efficiency scores.</p>
28
28
 
29
29
  **English** · [한국어](README.ko.md)
30
30
 
@@ -131,6 +131,7 @@ This release makes Tink work as one harness layer across Claude Code and Codex.
131
131
  - Codex now installs focused `$tink:*` action skills instead of one broad visible `tink` skill, so the picker shows commands like `$tink:cast` and `$tink:verify` cleanly.
132
132
  - Non-trivial runs now create context artifacts: `context-pack.md`, `context-map.json`, and `excluded-context.md`.
133
133
  - Repo Signals and Context Graph Lite help `/tink:cast` and `$tink:cast` choose relevant tests, schemas, sync partners, and verification hints without adding a new `tink index` command.
134
+ - Context Budget Ledger fields and fixture-ratio evaluation are documented in `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, and `docs/context-metrics-evaluator.ko.md` without adding a new command.
134
135
  - `/tink:verify` and `$tink:verify` share one portable Verify Runner model and write compact evidence to `.tink/current/verification.json`.
135
136
  - External context now follows the MCP Safe Profile: include only the smallest useful source handle, mark confidence and sensitivity, exclude unsafe context visibly, and connect important claims to verification.
136
137
 
package/VERSIONING.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Versioning
2
2
 
3
- Current version: `1.2.2`
3
+ Current version: `1.4.0`
4
4
 
5
5
  Tink follows semver from `1.0.0` onward.
6
6
 
package/commands/cast.md CHANGED
@@ -144,6 +144,7 @@ After approval, create `.tink/current/` with these files before doing deeper wor
144
144
  - `session.json`: lightweight session metadata, especially rule ids already loaded by phase
145
145
  - `context-pack.md`: human-readable selected context, including why each item is relevant
146
146
  - `context-map.json`: machine-readable included and excluded context with reasons
147
+ - `context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
147
148
  - `excluded-context.md`: notable omitted files, tools, sources, or claims and why they were excluded
148
149
 
149
150
  Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
@@ -188,9 +189,12 @@ If `.tink/schemas/session.schema.json` exists, use it as the session shape. Do n
188
189
  Create context artifacts before deeper implementation work:
189
190
  - `context-pack.md` should name the user task, selected harnesses, contract summary, loaded rules, selected files/docs, selected external sources, and verification implications.
190
191
  - `context-map.json` should contain `task`, `included`, `excluded`, `signals`, and `generated_at`. Each included or excluded entry should include `path` or `source`, `kind`, `reason`, and `confidence`. When external context is selected, also write `external_context[]`.
192
+ - When useful, enrich each context entry with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to explain why the first context pack is small enough, why excluded context should stay out, and which checks prove selected context matters.
193
+ - `context-metrics-evaluation.json` should contain `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score is based only on fixtures or the current run, say so in `scope` and `limits`; do not claim production-wide 90% without run-history or telemetry evidence.
191
194
  - `excluded-context.md` should make important omissions visible, especially files skipped because they are out of scope, stale, risky, too broad, or unverified external claims.
192
195
 
193
196
  If `.tink/schemas/context-map.schema.json` exists, use it for `context-map.json`. Do not paste the schema into the user response.
197
+ If `.tink/schemas/context-metrics-evaluation.schema.json` exists, use it for `context-metrics-evaluation.json`. Do not paste the schema into the user response.
194
198
 
195
199
  Use deterministic context selection inside cast. Do not create or require a separate `tink index` command for this phase.
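
The `context-metrics-evaluation.json` shape described in this hunk can be illustrated with a small field check. This is a minimal sketch, not the package's validator: the key names come from the guidance above, while the value types, the `run` value, the score name, and the use of `fixture-ratio-v1` as the `evaluator` value are assumptions made for illustration.

```python
import json

# Key names taken from the cast guidance; value types below are assumed.
TOP_LEVEL_KEYS = [
    "run", "evaluator", "target_threshold_percent",
    "measurement_status", "scope", "limits", "scores",
]
SCORE_KEYS = [
    "name", "score_percent", "formula",
    "numerator", "denominator", "evidence_refs",
]


def missing_fields(evaluation):
    """Return the dotted names of expected keys that are absent."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in evaluation]
    for index, score in enumerate(evaluation.get("scores", [])):
        missing += [
            f"scores[{index}].{key}" for key in SCORE_KEYS if key not in score
        ]
    return missing


# Hypothetical artifact: every concrete value here is illustrative.
example = {
    "run": "example-run",
    "evaluator": "fixture-ratio-v1",
    "target_threshold_percent": 90,
    "measurement_status": "measured",
    "scope": "fixture",
    "limits": ["Fixture scope only; not production telemetry."],
    "scores": [
        {
            "name": "verification_omission_detection",
            "score_percent": 100.0,
            "formula": "linked verification_target entries / verification_target entries",
            "numerator": 2,
            "denominator": 2,
            "evidence_refs": ["context-map.json#/included"],
        }
    ],
}

print(json.dumps(missing_fields(example)))
```

When `.tink/schemas/context-metrics-evaluation.schema.json` is installed, that schema should take precedence over a hand-rolled check like this one.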
196
200
 
@@ -227,6 +231,7 @@ When a repo signal fixture exists, such as `tests/fixtures/repo-signals/*.json`
227
231
  - set `signal.source` to the fixture path and `signal.source_ref` to the relevant entry name or JSON path when useful;
228
232
  - do not include every fixture entry by default; select only entries that explain the current task, verification, or safety boundary;
229
233
  - if the fixture conflicts with live repo state, prefer live repo state and record the fixture mismatch as a medium-confidence signal.
234
+ - if the fixture provides `context_budget_policy`, use it to assign entry roles, cost, reuse signals, verification links, staleness, and evidence kinds; do not treat the policy as telemetry or claim a 90% score without evidence.
230
235
 
231
236
  Context Graph Lite rules may appear in the same fixture under `context_graph_lite.rules[]`. Use them only inside cast:
232
237
  - match changed paths against `when_paths`;
@@ -258,6 +263,7 @@ Exclusion rules:
258
263
  - Exclude product phases that are explicitly deferred, and name the deferral in `excluded-context.md`.
259
264
  - Prefer a short high-confidence context pack over a broad low-confidence one.
260
265
  - When unsure, include the uncertainty in `reason` and set `confidence` to `low` or `medium` rather than silently expanding scope.
266
+ - For repeated false starts, mark the entry with `reuse_signal: "avoid_next_time"` or `role: "stale"` instead of deleting the evidence. This lets later runs skip it faster while preserving the reason.
261
267
 
262
268
  Candidate limits:
263
269
  - Start with 5-12 included entries for normal code/doc work.
@@ -398,7 +404,7 @@ A task is trivial only when ALL of the following are true:
398
404
  15. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
399
405
  16. Ask for explicit approval before non-trivial work.
400
406
  17. After approval, read only the selected harness files and any approved run-only draft.
401
- 18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
407
+ 18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
402
408
  19. Execute the first safe step immediately:
403
409
  - inspect relevant files,
404
410
  - run a read-only diagnostic,
@@ -0,0 +1,58 @@
1
+ # Context Budget Ledger
2
+
3
+ Context Budget Ledger는 Tink가 context를 많이 모으는 대신, 왜 넣었고 왜 뺐는지 점수화할 수 있게 만드는 작은 기록 규칙이다.
4
+
5
+ 이 기능은 새 public command가 아니다. `tink index` 명령, watcher, generated cache, hidden runtime index도 만들지 않는다. `/tink:cast`와 `$tink:cast`가 이미 남기는 `context-map.json`과 `context-diff.json`의 항목을 더 읽기 쉽게 만드는 방식이다.
6
+
7
+ 영어판은 `docs/context-budget-ledger.md`에 있다.
8
+
9
+ ## 왜 필요한가
10
+
11
+ 지금의 `context-map.json`은 어떤 파일과 source를 포함하거나 제외했는지 설명할 수 있다. 하지만 반복 작업에서 다음 질문을 정량적으로 답하기에는 정보가 부족했다.
12
+
13
+ - 이 context가 핵심인가, 보조인가, 검증 대상인가?
14
+ - 처음 context pack에 넣기에는 비용이 높은가?
15
+ - 다음 비슷한 작업에서 다시 불러와야 하는가, 피해야 하는가?
16
+ - 어떤 검증 check와 연결되는가?
17
+ - 오래됐거나 stale한 정보인가?
18
+
19
+ 그래서 context entry에 다음 optional 필드를 추가한다.
20
+
21
+ - `role`: `primary`, `supporting`, `verification_target`, `external_evidence`, `exclusion_candidate`, `example_only`, `stale`, `avoid_next_time`.
22
+ - `cost`: `low`, `medium`, `high`.
23
+ - `reuse_signal`: `always`, `often`, `rare`, `example_only`, `avoid_next_time`.
24
+ - `verification_link`: 연결되는 check, evidence ref, verification hint.
25
+ - `staleness`: `fresh`, `aging`, `stale`, `unknown`.
26
+ - `evidence_kind`: `file`, `doc`, `schema`, `test`, `command`, `external`, `signal`, `diff`, `unknown`.
27
+
28
+ ## 어떻게 쓰는가
29
+
30
+ 작업 시작 시에는 `role`과 `cost`로 첫 context pack을 작게 고른다.
31
+
32
+ 작업 중에는 새로 필요해진 context를 `context-diff.json`에 남긴다. 이때 `verification_link`가 있으면 리뷰자가 왜 그 파일을 봤는지 다시 추리하지 않아도 된다.
33
+
34
+ 작업 후에는 `reuse_signal`과 `staleness`를 보고 다음 반복에서 제외할 후보를 더 빨리 고른다. 예를 들어 `reuse_signal: "avoid_next_time"`인 외부 research link는 비슷한 로컬-only 작업에서 다시 불러오지 않는다.
35
+
36
+ `role: "verification_target"`인 항목은 반드시 검증과 연결되어야 한다. 연결이 없으면 검증 누락 후보로 본다.
37
+
38
+ ## 점수화 방식
39
+
40
+ `context-map.json.efficiency_metrics`는 여섯 지표를 0-100%로 기록한다.
41
+
42
+ - 불필요 context 포함률 감소: 제외 항목이 `reuse_signal`, `staleness`, 제외 이유를 갖는 비율.
43
+ - 초기 context pack 크기 감소: included 항목이 `role`과 `cost`로 우선순위화되는 비율.
44
+ - 리뷰자가 근거 찾는 시간 감소: included 항목 중 `role`과 `verification_link`가 함께 있는 비율.
45
+ - 검증 누락 탐지율 개선: `verification_target` 항목 중 연결 check가 있는 비율.
46
+ - 반복 작업 context 재사용 정확도: `reuse_signal`이 있는 항목 비율과 path-case 재선택 결과.
47
+ - 재작업 가능성 감소: context-diff에서 뒤늦게 추가된 필수 context와 누락 check가 줄어드는지.
48
+
49
+ 실제 telemetry가 없으면 `measurement_status: "estimated"`로 두고 한계를 함께 적는다. 근거 없이 90% 달성으로 표시하지 않는다.
50
+
51
+ fixture에서 비율을 계산할 수 있으면 `measurement_status: "measured"`를 쓸 수 있다. 이때도 scope가 fixture인지 production telemetry인지 분명히 적어야 한다. 자세한 계산 기준은 `docs/context-metrics-evaluator.ko.md`를 본다.
52
+
53
+ ## 호환성 기준
54
+
55
+ - Claude Code와 Codex가 같은 schema와 fixture를 읽는다.
56
+ - macOS와 Windows 모두에서 동작해야 하므로 shell 전용 경로나 OS 전용 명령을 요구하지 않는다.
57
+ - 사용자 승인 없이 reusable memory, harness, rule, config를 저장하지 않는다.
58
+ - Sentry와 release evidence bundling은 포함하지 않는다.
@@ -0,0 +1,58 @@
1
+ # Context Budget Ledger
2
+
3
+ Context Budget Ledger is a small record format that helps Tink score why context was included, excluded, reused, or linked to verification.
4
+
5
+ It is not a new public command. It must not add a `tink index` command, watcher, generated cache, or hidden runtime index. It only enriches the existing `context-map.json` and `context-diff.json` artifacts created by `/tink:cast` and `$tink:cast`.
6
+
7
+ Korean companion: `docs/context-budget-ledger.ko.md`.
8
+
9
+ ## Why It Exists
10
+
11
+ The current context map can explain included and excluded context. It needs a little more structure to answer repeated-run questions:
12
+
13
+ - Is this context primary, supporting, or a verification target?
14
+ - Is it expensive for the first context pack?
15
+ - Should similar future runs reuse it or avoid it?
16
+ - Which check or evidence proves it matters?
17
+ - Is the information fresh or stale?
18
+
19
+ Context entries can now include optional fields:
20
+
21
+ - `role`: `primary`, `supporting`, `verification_target`, `external_evidence`, `exclusion_candidate`, `example_only`, `stale`, `avoid_next_time`.
22
+ - `cost`: `low`, `medium`, `high`.
23
+ - `reuse_signal`: `always`, `often`, `rare`, `example_only`, `avoid_next_time`.
24
+ - `verification_link`: related check, evidence ref, or verification hint.
25
+ - `staleness`: `fresh`, `aging`, `stale`, `unknown`.
26
+ - `evidence_kind`: `file`, `doc`, `schema`, `test`, `command`, `external`, `signal`, `diff`, `unknown`.
27
+
28
+ ## How To Use It
29
+
30
+ At cast time, use `role` and `cost` to keep the first context pack small.
31
+
32
+ During work, record late-added context in `context-diff.json`. A `verification_link` helps reviewers jump from context to the check that proves it matters.
33
+
34
+ After work, use `reuse_signal` and `staleness` to exclude weak or stale context faster in the next similar run.
35
+
36
+ Entries with `role: "verification_target"` should connect to a command, manual check, evidence ref, or verification hint. Missing links are verification omission candidates.
37
+
38
+ ## Scoring
39
+
40
+ `context-map.json.efficiency_metrics` records the six context-efficiency metrics as 0-100% scores.
41
+
42
+ - Unnecessary context reduction: excluded entries with reuse and staleness evidence.
43
+ - Initial context pack size reduction: included entries prioritized by role and cost.
44
+ - Reviewer evidence lookup time reduction: included entries with both role and verification_link.
45
+ - Verification omission detection: verification_target entries with matching checks.
46
+ - Repeated context reuse accuracy: entries with reuse_signal and matching path-case reuse.
47
+ - Rework probability reduction: fewer late-added required context entries and missing checks.
48
+
49
+ If there is no runtime telemetry yet, mark the scores as `measurement_status: "estimated"` and include the limits. Do not claim 90% without evidence.
50
+
51
+ If fixture ratios can be calculated, `measurement_status: "measured"` is acceptable. The artifact must still state whether the scope is fixture evidence or production telemetry. See `docs/context-metrics-evaluator.md` for the calculation rules.
52
+
53
+ ## Compatibility
54
+
55
+ - Claude Code and Codex read the same schema and fixtures.
56
+ - macOS and Windows are both supported; no OS-specific shell behavior is required.
57
+ - Reusable memory, harness, rule, and config saves still require explicit approval.
58
+ - Sentry and release evidence bundling are out of scope.
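
As an illustration of the scoring rules above, the verification omission metric could be computed roughly as follows. This is a sketch under assumptions: `verification_link` is treated as a plain string, entries are identified by `path` or `source`, and the example paths are hypothetical. It is not the calculation shipped in the package tests.

```python
def verification_link_coverage(entries):
    """Ratio of verification_target entries that carry a verification_link."""
    targets = [e for e in entries if e.get("role") == "verification_target"]
    linked = [e for e in targets if e.get("verification_link")]
    # Targets without a link are the "verification omission candidates".
    candidates = [
        e.get("path") or e.get("source")
        for e in targets
        if not e.get("verification_link")
    ]
    percent = 100.0 * len(linked) / len(targets) if targets else None
    return {
        "numerator": len(linked),
        "denominator": len(targets),
        "score_percent": percent,
        "omission_candidates": candidates,
    }


# Hypothetical context-map entries using the optional ledger fields.
included = [
    {"path": "src/cast.py", "role": "primary", "cost": "low"},
    {"path": "tests/test_cast.py", "role": "verification_target",
     "cost": "low", "verification_link": "npm test"},
    {"path": "docs/guide.md", "role": "verification_target", "cost": "medium"},
]

result = verification_link_coverage(included)
print(result)
```

Returning the numerator and denominator alongside the percentage keeps the score traceable, which matches the guidance to record limits instead of a bare number.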
@@ -0,0 +1,39 @@
1
+ # Context Metrics Evaluator
2
+
3
+ Context Metrics Evaluator는 Context Budget Ledger에 적힌 필드를 실제 비율로 계산하는 테스트 기준이다.
4
+
5
+ 이 기능은 새 public command가 아니다. `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않는다. `/tink:cast`와 `$tink:cast`는 `.tink/current/context-metrics-evaluation.json`을 run state artifact로 남기고, `tests/fixtures/current-run/context-metrics-evaluation.json`과 `tests/test_templates.py`는 같은 점수를 계산하는지 확인한다.
6
+
7
+ 영어판은 `docs/context-metrics-evaluator.md`에 있다.
8
+
9
+ ## 왜 필요한가
10
+
11
+ `context-map.json.efficiency_metrics`에 점수를 사람이 직접 적기만 하면, 숫자가 좋아 보여도 근거가 약하다. Evaluator는 fixture를 다시 읽어서 다음을 계산한다.
12
+
13
+ - excluded context가 `role`, `cost`, `reuse_signal`, `staleness`, `reason`을 갖는 비율.
14
+ - included context가 `role`과 `cost`를 갖고, high-cost 항목은 `verification_link`를 갖는 비율.
15
+ - included context가 `role`과 `verification_link`를 함께 갖는 비율.
16
+ - `verification_target` 항목이 실제 verification command나 verification hint와 연결되는 비율.
17
+ - 반복 path-case가 expected context role을 갖는 비율.
18
+ - context-diff 변화가 verification link와 metric impact로 추적되는 비율.
19
+
20
+ ## 점수의 의미
21
+
22
+ `fixture-ratio-v1`에서 90% 이상이라는 말은 “예시 artifact가 내부적으로 측정 가능하고 빠진 필드가 거의 없다”는 뜻이다.
23
+
24
+ 이것은 아직 “실제 모든 사용자 작업에서 90% 효율을 달성했다”는 뜻이 아니다. production telemetry나 여러 run record가 쌓이기 전까지는 scope를 `fixture`로 제한한다.
25
+
26
+ ## 완료 기준
27
+
28
+ - 여섯 지표가 모두 fixture 계산 기준으로 90% 이상이다.
29
+ - `context-map.json.efficiency_metrics.scores[]`와 `context-metrics-evaluation.json`의 점수가 일치한다.
30
+ - 각 점수에는 `formula`, `numerator`, `denominator`, `evidence_refs`, `limit`가 있다.
31
+ - `measurement_status`는 fixture에서 계산되면 `measured`로 둘 수 있지만, 문서에는 한계를 함께 적는다.
32
+ - 설치된 schema는 `.tink/schemas/context-metrics-evaluation.schema.json`이다.
33
+
34
+ ## 호환성 기준
35
+
36
+ - Claude Code와 Codex가 같은 artifact를 읽을 수 있어야 한다.
37
+ - macOS와 Windows 모두에서 `npm test`로 검증되어야 한다.
38
+ - 사용자 승인 없이 reusable memory, harness, rule, config를 저장하지 않는다.
39
+ - Sentry와 release evidence bundling은 포함하지 않는다.
@@ -0,0 +1,39 @@
1
+ # Context Metrics Evaluator
2
+
3
+ Context Metrics Evaluator is a test-backed way to calculate the ratios recorded by Context Budget Ledger.
4
+
5
+ It is not a new public command. It must not add a `tink index` command, watcher, generated cache, or hidden runtime index. `/tink:cast` and `$tink:cast` write `.tink/current/context-metrics-evaluation.json` as a run-state artifact, and `tests/fixtures/current-run/context-metrics-evaluation.json` plus `tests/test_templates.py` must calculate the same scores.
6
+
7
+ Korean companion: `docs/context-metrics-evaluator.ko.md`.
8
+
9
+ ## Why It Exists
10
+
11
+ If `context-map.json.efficiency_metrics` is only hand-written, the numbers can look better than the evidence. The evaluator re-reads the fixtures and calculates:
12
+
13
+ - Excluded context with `role`, `cost`, `reuse_signal`, `staleness`, and `reason`.
14
+ - Included context with `role` and `cost`, with high-cost entries justified by `verification_link`.
15
+ - Included context with both `role` and `verification_link`.
16
+ - `verification_target` entries linked to known verification commands or hints.
17
+ - Repeated path-cases with expected context roles.
18
+ - Context-diff changes traceable through verification links and metric impacts.
19
+
20
+ ## What The Score Means
21
+
22
+ In `fixture-ratio-v1`, a score at or above 90% means the example artifacts are internally measurable and have very few missing fields.
23
+
24
+ It does not mean that every real user run has reached 90% efficiency. Until production telemetry or multiple run records exist, the measurement scope stays `fixture`.
25
+
26
+ ## Done Criteria
27
+
28
+ - All six metrics are at or above 90% under the fixture calculation.
29
+ - `context-map.json.efficiency_metrics.scores[]` matches `context-metrics-evaluation.json`.
30
+ - Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `limit`.
31
+ - `measurement_status` may be `measured` for fixture calculations, but the docs must state the limit.
32
+ - The installed schema is `.tink/schemas/context-metrics-evaluation.schema.json`.
33
+
34
+ ## Compatibility
35
+
36
+ - Claude Code and Codex read the same artifacts.
37
+ - macOS and Windows are both verified through `npm test`.
38
+ - Reusable memory, harness, rule, and config saves still require explicit approval.
39
+ - Sentry and release evidence bundling are out of scope.
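
The done criterion that `context-map.json.efficiency_metrics.scores[]` matches `context-metrics-evaluation.json` can be sketched as a recomputation check. The following is a hedged example only: it assumes both score lists carry `name` and `score_percent`, that the percentage is `numerator / denominator * 100`, and that a small tolerance is acceptable. The real test logic in `tests/test_templates.py` may differ.

```python
def recompute_percent(score):
    """Recompute a percentage from numerator and denominator, or None."""
    if not score["denominator"]:
        return None
    return 100.0 * score["numerator"] / score["denominator"]


def mismatched_scores(map_scores, evaluation_scores, tolerance=0.01):
    """Names of scores whose recorded values disagree with the recomputation."""
    recorded = {s["name"]: s["score_percent"] for s in map_scores}
    mismatches = []
    for score in evaluation_scores:
        name = score["name"]
        recomputed = recompute_percent(score)
        if (
            recomputed is None
            or name not in recorded
            or abs(recorded[name] - recomputed) > tolerance
            or abs(score["score_percent"] - recomputed) > tolerance
        ):
            mismatches.append(name)
    return mismatches


# Hypothetical scores: "b" was hand-written as 95 but recomputes to 90.
map_scores = [
    {"name": "a", "score_percent": 100.0},
    {"name": "b", "score_percent": 95.0},
]
evaluation_scores = [
    {"name": "a", "score_percent": 100.0, "numerator": 4, "denominator": 4},
    {"name": "b", "score_percent": 90.0, "numerator": 9, "denominator": 10},
]

print(mismatched_scores(map_scores, evaluation_scores))
```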
@@ -0,0 +1,38 @@
1
+ # Context Budget Ledger
2
+
3
+ ## 문제
4
+
5
+ Tink는 `context-map.json`과 `excluded-context.md`로 어떤 context를 사용했는지 설명할 수 있었지만, 컨텍스트 엔지니어링 효율을 90% 목표까지 반복 개선하려면 더 정량적인 근거가 필요했다.
6
+
7
+ 특히 다음 항목은 기존 필드만으로는 자동 점검하기 어려웠다.
8
+
9
+ - 포함한 context가 핵심인지, 보조인지, 검증 대상인지.
10
+ - 처음 context pack에 넣기에는 비용이 높은지.
11
+ - 다음 비슷한 작업에서 재사용해야 하는지, 피해야 하는지.
12
+ - 선택한 context가 어떤 검증 check와 연결되는지.
13
+ - 오래됐거나 stale한 context를 얼마나 빨리 제외할 수 있는지.
14
+
15
+ ## 해결
16
+
17
+ - `context-map.schema.json`에 `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, `evidence_kind` 필드를 optional로 추가했다.
18
+ - `context-map.json.efficiency_metrics` 예시를 추가해 여섯 효율 지표를 0-100%와 근거로 기록할 수 있게 했다.
19
+ - current-run fixture와 context-diff fixture에 역할, 비용, 재사용 신호, 검증 연결, metric impact 예시를 추가했다.
20
+ - repo signal fixture에 `context_budget_policy`를 추가해 Context Graph Lite가 선택한 후보를 어떻게 점수화할지 설명했다.
21
+ - `/tink:cast`와 `$tink:cast` 지침에 Context Budget Ledger 필드를 기록하는 규칙을 추가했다.
22
+ - 한국어 우선 문서 `docs/context-budget-ledger.ko.md`와 영어 companion `docs/context-budget-ledger.md`를 추가하고 README에는 링크만 추가했다.
23
+ - 컨텍스트 동작 원리와 정량적 개선 추정치를 설명하는 `docs/context-engineering-efficiency.ko.html`을 repo 문서로 보존했다.
24
+
25
+ ## 검증
26
+
27
+ - `npm test`
28
+ - `git diff --check`
29
+ - `npm pack --dry-run --json`
30
+
31
+ ## 참고
32
+
33
+ - 새 public command는 추가하지 않았다. 특히 `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않았다.
34
+ - Sentry는 포함하지 않았다.
35
+ - release evidence bundling은 포함하지 않았다.
36
+ - 새 필드는 optional이므로 기존 `context-map.json`을 바로 깨뜨리지 않는다.
37
+ - 실제 telemetry가 없으면 `measurement_status: "estimated"`로 남기고, 근거 없이 90% 달성으로 표시하지 않는다.
38
+ - Claude Code와 Codex, macOS와 Windows 동시 지원을 기준으로 작성했다.
@@ -0,0 +1,30 @@
1
+ # Context Metrics Artifact
2
+
3
+ ## 문제
4
+
5
+ Context Metrics Evaluator가 fixture 기준으로 여섯 지표를 계산할 수 있게 되었지만, 실제 `/tink:cast`와 `$tink:cast` run state에는 아직 `context-metrics-evaluation.json`이 공식 산출물로 연결되어 있지 않았다. 이 상태에서는 90% 목표를 current run마다 반복 검증하기 어렵다.
6
+
7
+ ## 해결
8
+
9
+ - `templates/tink/schemas/context-metrics-evaluation.schema.json`을 추가했다.
10
+ - `/tink:cast`와 `$tink:cast` 지침에 `.tink/current/context-metrics-evaluation.json` 생성 규칙을 추가했다.
11
+ - Work State Guide에 metrics evaluation 읽기 순서를 추가했다.
12
+ - Context Metrics Evaluator 문서를 run-state artifact 기준으로 갱신했다.
13
+ - `v1.4.0` 릴리즈 메타데이터, README, CHANGELOG, VERSIONING을 갱신했다.
14
+
15
+ ## 검증
16
+
17
+ - `npm test`
18
+ - `git diff --check`
19
+ - `claude plugin validate .claude-plugin/plugin.json`
20
+ - `claude plugin validate .claude-plugin/marketplace.json`
21
+ - `npm pack --dry-run --json`
22
+
23
+ ## 참고
24
+
25
+ - 새 public command는 추가하지 않았다.
26
+ - `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않았다.
27
+ - Sentry는 포함하지 않았다.
28
+ - release evidence bundling은 포함하지 않았다.
29
+ - 점수의 scope와 limit를 명시해야 하며, run-history나 telemetry 없이 production-wide 90%를 주장하지 않는다.
30
+ - Claude Code와 Codex, macOS와 Windows 동시 지원을 기준으로 작성했다.
@@ -0,0 +1,28 @@
1
+ # Context Metrics Evaluator
2
+
3
+ ## 문제
4
+
5
+ Context Budget Ledger로 context entry에 역할, 비용, 재사용 신호, 검증 연결을 남길 수 있게 되었지만, `efficiency_metrics` 점수는 아직 사람이 적은 값에 가까웠다. 목표 지표를 90% 이상으로 반복 개선하려면 점수가 실제 fixture에서 다시 계산되어야 한다.
6
+
7
+ ## 해결
8
+
9
+ - `tests/fixtures/current-run/context-metrics-evaluation.json`을 추가해 여섯 지표의 계산식, 분자, 분모, evidence ref를 기록했다.
10
+ - `context-map.json.efficiency_metrics`를 `fixture-ratio-v1` 기준의 `measured` 점수로 갱신했다.
11
+ - `tests/test_templates.py`가 context-map, context-diff, contract, repo signal, path-case fixture를 다시 읽어 점수를 직접 계산하도록 했다.
12
+ - 반복 path-case에 expected context role을 보강해 재사용 정확도도 계산할 수 있게 했다.
13
+ - 한국어 우선 문서 `docs/context-metrics-evaluator.ko.md`와 영어 companion을 추가했다.
14
+
15
+ ## 검증
16
+
17
+ - `npm test`
18
+ - `git diff --check`
19
+ - `npm pack --dry-run --json`
20
+
21
+ ## 참고
22
+
23
+ - 새 public command는 추가하지 않았다.
24
+ - `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않았다.
25
+ - Sentry는 포함하지 않았다.
26
+ - release evidence bundling은 포함하지 않았다.
27
+ - 이번 점수는 fixture scope의 측정값이며, production telemetry 전체가 90%에 도달했다는 뜻은 아니다.
28
+ - Claude Code와 Codex, macOS와 Windows 동시 지원 기준을 유지했다.
@@ -21,11 +21,13 @@ Tink는 실행 상태를 파일로 남겨서 사람이 빠르게 네 가지를
21
21
  - 선택된 파일, 문서, 규칙, 외부 source의 사람용 요약을 읽습니다.
22
22
  3. `.tink/current/context-map.json`
23
23
  - `included`, `excluded`, `signals`, `external_context`를 구조적으로 확인합니다.
24
- 4. `.tink/current/excluded-context.md`
24
+ 4. `.tink/current/context-metrics-evaluation.json`
25
+ - context 효율 점수, 계산식, 분자/분모, evidence ref, 측정 scope와 한계를 확인합니다.
26
+ 5. `.tink/current/excluded-context.md`
25
27
  - 오래됐거나, 위험하거나, 너무 넓거나, 접근할 수 없거나, 범위 밖이라 제외한 context를 확인합니다.
26
- 5. `.tink/current/verification.json`
28
+ 6. `.tink/current/verification.json`
27
29
  - pass, fail, blocked, skipped 검증 결과와 최종 report를 확인합니다.
28
- 6. `.tink/current/notes.md`
30
+ 7. `.tink/current/notes.md`
29
31
  - 마지막 안전 지점, 복구 메모, 짧은 검증 요약을 읽습니다.
30
32
 
31
33
  ## Context 읽는 법
@@ -39,6 +41,19 @@ Tink는 실행 상태를 파일로 남겨서 사람이 빠르게 네 가지를
39
41
  - `signals`: repo signal, `context_graph_rule` 선택, verification hint, unmatched path, 선택 근거.
40
42
  - `external_context`: Figma, GitHub, official docs, dashboards, API responses, screenshots, attachments, runbooks 같은 외부 source.
41
43
 
44
+ 각 context entry에 Context Budget Ledger 필드가 있으면 다음처럼 읽습니다.
45
+
46
+ - `role`: 이 context가 핵심인지, 보조인지, 검증 대상인지, 다음에는 피해야 하는 후보인지 알려줍니다.
47
+ - `cost`: 처음 context pack에 넣기 위한 상대 비용입니다.
48
+ - `reuse_signal`: 다음 비슷한 작업에서 다시 쓸지, 예시로만 볼지, 피할지 알려줍니다.
49
+ - `verification_link`: 이 context가 어떤 check나 evidence와 연결되는지 보여줍니다.
50
+ - `staleness`: 오래된 정보인지 빠르게 판단하는 신호입니다.
51
+ - `evidence_kind`: 파일, 문서, 스키마, 테스트, 외부 evidence 같은 근거 종류입니다.
52
+
53
+ 자세한 기준은 `docs/context-budget-ledger.ko.md`를 봅니다.
54
+
55
+ `context-metrics-evaluation.json`은 점수가 어떻게 나왔는지 보여줍니다. `scope: "fixture"`나 `scope: "current_run"`이면 해당 범위 안에서만 측정된 값입니다. 여러 run record나 production telemetry가 없으면 전체 사용자 작업이 90%에 도달했다고 말하지 않습니다.
56
+
42
57
  `signals[]`에 `kind: "context_graph_rule"`가 있으면 `/tink:cast`나 `$tink:cast`가 changed path를 보고 고른 작은 단서로 읽습니다. `context_graph_lite.rules.claude-command-sync` 같은 안정적인 `source_ref`를 가리키고, 왜 관련 파일을 함께 포함했는지 설명해야 합니다. 이 신호는 cast 내부 선택 근거일 뿐이며 public `tink index` 명령, watcher, generated cache, hidden runtime index를 뜻하지 않습니다.
43
58
 
44
59
  외부 context는 다음 항목을 확인합니다.
@@ -19,11 +19,13 @@ Start here when resuming, reviewing, or handing off a run:
19
19
  - Read the short human summary of selected files, docs, rules, and external sources.
20
20
  3. `.tink/current/context-map.json`
21
21
  - Inspect the structured `included`, `excluded`, `signals`, and `external_context` entries.
22
- 4. `.tink/current/excluded-context.md`
22
+ 4. `.tink/current/context-metrics-evaluation.json`
23
+ - Check context-efficiency scores, formulas, numerators, denominators, evidence refs, measurement scope, and limits.
24
+ 5. `.tink/current/excluded-context.md`
23
25
  - Check what was skipped because it was stale, unsafe, too broad, unavailable, or outside scope.
24
- 5. `.tink/current/verification.json`
26
+ 6. `.tink/current/verification.json`
25
27
  - Confirm pass, fail, blocked, or skipped checks and the final report.
26
- 6. `.tink/current/notes.md`
28
+ 7. `.tink/current/notes.md`
27
29
  - Read the last safe point, recovery notes, and compact verification summaries.
28
30
 
29
31
  ## How To Read Context
@@ -37,6 +39,19 @@ Use `context-map.json` when you need traceability:
37
39
  - `signals`: repo signals, `context_graph_rule` selections, verification hints, unmatched paths, or other selection evidence.
38
40
  - `external_context`: outside sources such as Figma, GitHub, official docs, dashboards, API responses, screenshots, attachments, or runbooks.
39
41
 
42
+ When context entries include Context Budget Ledger fields, read them this way:
43
+
44
+ - `role`: whether the context is primary, supporting, a verification target, or something to avoid next time.
45
+ - `cost`: relative cost for putting the entry in the first context pack.
46
+ - `reuse_signal`: whether similar future runs should reuse, treat as an example, or avoid the entry.
47
+ - `verification_link`: the check, evidence ref, or verification hint connected to the entry.
48
+ - `staleness`: a quick freshness signal.
49
+ - `evidence_kind`: whether the evidence is a file, doc, schema, test, external source, or another kind.
50
+
51
+ See `docs/context-budget-ledger.md` for the detailed rules.
52
+
53
+ `context-metrics-evaluation.json` explains how the score was produced. If `scope` is `fixture` or `current_run`, the value is measured only inside that boundary. Do not claim all user work has reached 90% without run-history or production telemetry evidence.
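
One way to keep a reported claim inside the measured boundary is sketched below. It assumes `scope` is a plain string and that each score has `score_percent`; the function name and the message wording are hypothetical, and nothing here is part of the package.

```python
# Scopes named in the guide as bounded measurements.
BOUNDED_SCOPES = {"fixture", "current_run"}


def claim_for(evaluation):
    """Return a one-line claim that stays inside the measured scope."""
    scope = evaluation.get("scope", "unknown")
    threshold = evaluation.get("target_threshold_percent", 90)
    scores = evaluation.get("scores", [])
    all_met = bool(scores) and all(
        s["score_percent"] >= threshold for s in scores
    )
    verdict = "meet" if all_met else "do not all meet"
    if scope in BOUNDED_SCOPES:
        return f"Scores {verdict} {threshold}% within {scope} scope only."
    return (
        f"Scores {verdict} {threshold}% for scope '{scope}'; "
        "check limits before generalizing."
    )


# Hypothetical fixture-scoped evaluation.
fixture_eval = {
    "scope": "fixture",
    "target_threshold_percent": 90,
    "scores": [{"score_percent": 95.0}],
}

print(claim_for(fixture_eval))
```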
54
+
40
55
  When `signals[]` includes `kind: "context_graph_rule"`, read it as a small changed-path clue selected by `/tink:cast` or `$tink:cast`. It should point to a stable `source_ref` such as `context_graph_lite.rules.claude-command-sync`, explain why related files were included, and stay internal to cast. It must not imply a public `tink index` command, watcher, generated cache, or hidden runtime index.
41
56
 
42
57
  For external context, check:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tink-harness",
3
- "version": "1.2.2",
3
+ "version": "1.4.0",
4
4
  "description": "Self-growing harnesses for Claude Code and Codex.",
5
5
  "license": "MIT",
6
6
  "type": "module",
@@ -144,6 +144,7 @@ After approval, create `.tink/current/` with these files before doing deeper wor
144
144
  - `session.json`: lightweight session metadata, especially rule ids already loaded by phase
145
145
  - `context-pack.md`: human-readable selected context, including why each item is relevant
146
146
  - `context-map.json`: machine-readable included and excluded context with reasons
147
+ - `context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
147
148
  - `excluded-context.md`: notable omitted files, tools, sources, or claims and why they were excluded
148
149
 
149
150
  Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
@@ -188,9 +189,12 @@ If `.tink/schemas/session.schema.json` exists, use it as the session shape. Do n
188
189
  Create context artifacts before deeper implementation work:
189
190
  - `context-pack.md` should name the user task, selected harnesses, contract summary, loaded rules, selected files/docs, selected external sources, and verification implications.
190
191
  - `context-map.json` should contain `task`, `included`, `excluded`, `signals`, and `generated_at`. Each included or excluded entry should include `path` or `source`, `kind`, `reason`, and `confidence`. When external context is selected, also write `external_context[]`.
192
+ - When useful, enrich each context entry with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to explain why the first context pack is small enough, why excluded context should stay out, and which checks prove selected context matters.
193
+ - `context-metrics-evaluation.json` should contain `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score is based only on fixtures or the current run, say so in `scope` and `limits`; do not claim production-wide 90% without run-history or telemetry evidence.
191
194
  - `excluded-context.md` should make important omissions visible, especially files skipped because they are out of scope, stale, risky, too broad, or unverified external claims.
192
195
 
193
196
  If `.tink/schemas/context-map.schema.json` exists, use it for `context-map.json`. Do not paste the schema into the user response.
197
+ If `.tink/schemas/context-metrics-evaluation.schema.json` exists, use it for `context-metrics-evaluation.json`. Do not paste the schema into the user response.
194
198
 
195
199
  Use deterministic context selection inside cast. Do not create or require a separate `tink index` command for this phase.
196
200
 
@@ -227,6 +231,7 @@ When a repo signal fixture exists, such as `tests/fixtures/repo-signals/*.json`
227
231
  - set `signal.source` to the fixture path and `signal.source_ref` to the relevant entry name or JSON path when useful;
228
232
  - do not include every fixture entry by default; select only entries that explain the current task, verification, or safety boundary;
229
233
  - if the fixture conflicts with live repo state, prefer live repo state and record the fixture mismatch as a medium-confidence signal.
234
+ - if the fixture provides `context_budget_policy`, use it to assign entry roles, cost, reuse signals, verification links, staleness, and evidence kinds; do not treat the policy as telemetry or claim a 90% score without evidence.
230
235
 
231
236
  Context Graph Lite rules may appear in the same fixture under `context_graph_lite.rules[]`. Use them only inside cast:
232
237
  - match changed paths against `when_paths`;
@@ -258,6 +263,7 @@ Exclusion rules:
258
263
  - Exclude product phases that are explicitly deferred, and name the deferral in `excluded-context.md`.
259
264
  - Prefer a short high-confidence context pack over a broad low-confidence one.
260
265
  - When unsure, include the uncertainty in `reason` and set `confidence` to `low` or `medium` rather than silently expanding scope.
266
+ - For repeated false starts, mark the entry with `reuse_signal: "avoid_next_time"` or `role: "stale"` instead of deleting the evidence. This lets later runs skip it faster while preserving the reason.
261
267
 
262
268
  Candidate limits:
263
269
  - Start with 5-12 included entries for normal code/doc work.
@@ -398,7 +404,7 @@ A task is trivial only when ALL of the following are true:
398
404
  15. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
399
405
  16. Ask for explicit approval before non-trivial work.
400
406
  17. After approval, read only the selected harness files and any approved run-only draft.
401
- 18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
407
+ 18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
402
408
  19. Execute the first safe step immediately:
403
409
  - inspect relevant files,
404
410
  - run a read-only diagnostic,
@@ -31,7 +31,7 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
31
31
  11. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
32
32
  12. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
33
33
  13. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
34
- 14. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
34
+ 14. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
35
35
  15. Do not stop at recommendation. Execute the first safe step after run state exists.
36
36
  16. Run `$tink:verify` behavior before final when `contract.json` lists required checks.
37
37
  17. Store reusable memory or rule updates under `.tink/` only after separate approval.
@@ -102,11 +102,16 @@ Create run state before deeper work:
102
102
  - `session.json`: loaded rule ids by phase and lightweight retrieval metadata
103
103
  - `.tink/current/context-pack.md`: human-readable selected context and why it matters
104
104
  - `.tink/current/context-map.json`: machine-readable included/excluded context and reasons
105
+ - `.tink/current/context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
105
106
  - `.tink/current/excluded-context.md`: notable omitted context and why it was left out
106
107
 
108
+ When useful, enrich `context-map.json.included[]` and `context-map.json.excluded[]` entries with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to keep the first context pack small, mark stale or avoid-next-time context, and connect `verification_target` entries to command checks, manual checks, evidence refs, or verification hints. Do not claim any 90% efficiency score without measurement evidence.
109
+
110
+ When writing `context-metrics-evaluation.json`, include `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score comes only from fixture or current-run evidence, record that scope and do not claim production-wide 90% without run-history or telemetry evidence.
111
+
107
112
  When external context is needed for `$tink:cast`, write it through the MCP Safe Profile shape in `context-map.json.external_context[]`. Record `source`, `source_ref`, `kind`, `included`, `excluded`, `reason`, `confidence`, `sensitivity`, and `verification_hint` when useful. Treat Figma, GitHub, and official docs as representative examples, not the only supported sources; Linear, Jira, Supabase, dashboards, API responses, screenshots, attachments, and runbooks can follow the same shape.
108
113
 
109
- When repo signal fixtures contain `context_graph_lite.rules[]`, use those rules inside `$tink:cast` to choose the first related context candidates. Match changed paths against `when_paths`, consider `include_paths`, cite selected rules as `context_graph_rule` signals with `source_ref: "context_graph_lite.rules.<name>"`, and connect `signal_refs` to verification hints where relevant. Do not create a public `tink index` command, watch process, generated cache, or hidden runtime index.
114
+ When repo signal fixtures contain `context_graph_lite.rules[]`, use those rules inside `$tink:cast` to choose the first related context candidates. Match changed paths against `when_paths`, consider `include_paths`, cite selected rules as `context_graph_rule` signals with `source_ref: "context_graph_lite.rules.<name>"`, and connect `signal_refs` to verification hints where relevant. If the fixture provides `context_budget_policy`, use it to assign roles, costs, reuse signals, verification links, staleness, and evidence kinds. Do not create a public `tink index` command, watch process, generated cache, or hidden runtime index.
110
115
 
111
116
  External context safety checklist:
112
117
  - Select the smallest useful `source_ref`; avoid whole files, boards, dashboards, logs, or design systems when one issue, frame, section, screenshot, or attachment is enough.
@@ -1,154 +1,247 @@
1
- {
2
- "$schema": "https://json-schema.org/draft/2020-12/schema",
3
- "$id": "https://github.com/dotoricode/tink-harness/schemas/context-map.schema.json",
4
- "title": "Tink current run context map",
5
- "type": "object",
6
- "required": ["task", "included", "excluded", "signals", "generated_at"],
7
- "properties": {
8
- "task": {
9
- "type": "object",
10
- "required": ["summary"],
11
- "properties": {
12
- "summary": {
13
- "type": "string",
14
- "description": "Short description of the active user task."
15
- },
16
- "type": {
17
- "type": "string",
18
- "description": "Task class from contract.json when available."
19
- },
20
- "harnesses": {
21
- "type": "array",
22
- "items": { "type": "string" }
23
- }
24
- },
25
- "additionalProperties": true
26
- },
27
- "included": {
28
- "type": "array",
29
- "items": { "$ref": "#/$defs/context_entry" }
30
- },
31
- "excluded": {
32
- "type": "array",
33
- "items": { "$ref": "#/$defs/context_entry" }
34
- },
35
- "signals": {
36
- "type": "array",
37
- "items": {
38
- "type": "object",
39
- "required": ["kind", "value", "reason"],
40
- "properties": {
41
- "kind": { "type": "string" },
42
- "value": { "type": "string" },
43
- "reason": { "type": "string" },
44
- "source": {
45
- "type": "string",
46
- "description": "Optional source artifact for this signal, such as a repo signal fixture or current run file."
47
- },
48
- "source_ref": {
49
- "type": "string",
50
- "description": "Optional path-like reference inside the source artifact."
51
- },
52
- "confidence": {
53
- "type": "string",
54
- "enum": ["low", "medium", "high"]
55
- }
56
- },
57
- "additionalProperties": true
58
- }
59
- },
60
- "external_context": {
61
- "type": "array",
62
- "items": { "$ref": "#/$defs/external_context_profile" }
63
- },
64
- "generated_at": {
65
- "type": "string",
66
- "description": "ISO timestamp for when the context map was created."
67
- }
68
- },
69
- "$defs": {
70
- "context_entry": {
71
- "type": "object",
72
- "required": ["kind", "reason", "confidence"],
73
- "oneOf": [
74
- { "required": ["path"] },
75
- { "required": ["source"] }
76
- ],
77
- "properties": {
78
- "path": {
79
- "type": "string",
80
- "description": "Repo-local file or directory path."
81
- },
82
- "source": {
83
- "type": "string",
84
- "description": "External source, tool, connector, or non-file context."
85
- },
86
- "kind": {
87
- "type": "string",
88
- "description": "file, directory, doc, rule, memory, external, command, or other context kind."
89
- },
90
- "reason": {
91
- "type": "string",
92
- "description": "Why this context was included or excluded."
93
- },
94
- "confidence": {
95
- "type": "string",
96
- "enum": ["low", "medium", "high"]
97
- },
98
- "source_ref": {
99
- "type": "string",
100
- "description": "Optional source-local reference, such as issue id, frame id, URL label, or attachment name."
101
- },
102
- "sensitivity": {
103
- "type": "string",
104
- "enum": ["public", "internal", "sensitive", "secret"]
105
- },
106
- "verification_hint": {
107
- "type": "string",
108
- "description": "Follow-up check needed before treating this context as verified."
109
- }
110
- },
111
- "additionalProperties": true
112
- },
113
- "external_context_profile": {
114
- "type": "object",
115
- "required": ["source", "source_ref", "kind", "included", "excluded", "reason", "confidence", "sensitivity"],
116
- "properties": {
117
- "source": {
118
- "type": "string",
119
- "description": "External source name, such as Figma, GitHub, docs, attachment, dashboard, or another connector."
120
- },
121
- "source_ref": {
122
- "type": "string",
123
- "description": "Smallest useful source-local reference."
124
- },
125
- "kind": {
126
- "type": "string",
127
- "description": "External context kind, such as error_evidence, design_evidence, work_item, reference, runtime_evidence, or attachment."
128
- },
129
- "included": {
130
- "type": "array",
131
- "items": { "type": "string" }
132
- },
133
- "excluded": {
134
- "type": "array",
135
- "items": { "type": "string" }
136
- },
137
- "reason": { "type": "string" },
138
- "confidence": {
139
- "type": "string",
140
- "enum": ["low", "medium", "high"]
141
- },
142
- "sensitivity": {
143
- "type": "string",
144
- "enum": ["public", "internal", "sensitive", "secret"]
145
- },
146
- "verification_hint": { "type": "string" },
147
- "blocked_reason": { "type": "string" },
148
- "next_action": { "type": "string" }
149
- },
150
- "additionalProperties": true
151
- }
152
- },
153
- "additionalProperties": true
154
- }
1
+ {
2
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
3
+ "$id": "https://github.com/dotoricode/tink-harness/schemas/context-map.schema.json",
4
+ "title": "Tink current run context map",
5
+ "type": "object",
6
+ "required": ["task", "included", "excluded", "signals", "generated_at"],
7
+ "properties": {
8
+ "task": {
9
+ "type": "object",
10
+ "required": ["summary"],
11
+ "properties": {
12
+ "summary": {
13
+ "type": "string",
14
+ "description": "Short description of the active user task."
15
+ },
16
+ "type": {
17
+ "type": "string",
18
+ "description": "Task class from contract.json when available."
19
+ },
20
+ "harnesses": {
21
+ "type": "array",
22
+ "items": { "type": "string" }
23
+ }
24
+ },
25
+ "additionalProperties": true
26
+ },
27
+ "included": {
28
+ "type": "array",
29
+ "items": { "$ref": "#/$defs/context_entry" }
30
+ },
31
+ "excluded": {
32
+ "type": "array",
33
+ "items": { "$ref": "#/$defs/context_entry" }
34
+ },
35
+ "signals": {
36
+ "type": "array",
37
+ "items": {
38
+ "type": "object",
39
+ "required": ["kind", "value", "reason"],
40
+ "properties": {
41
+ "kind": { "type": "string" },
42
+ "value": { "type": "string" },
43
+ "reason": { "type": "string" },
44
+ "source": {
45
+ "type": "string",
46
+ "description": "Optional source artifact for this signal, such as a repo signal fixture or current run file."
47
+ },
48
+ "source_ref": {
49
+ "type": "string",
50
+ "description": "Optional path-like reference inside the source artifact."
51
+ },
52
+ "confidence": {
53
+ "type": "string",
54
+ "enum": ["low", "medium", "high"]
55
+ }
56
+ },
57
+ "additionalProperties": true
58
+ }
59
+ },
60
+ "external_context": {
61
+ "type": "array",
62
+ "items": { "$ref": "#/$defs/external_context_profile" }
63
+ },
64
+ "efficiency_metrics": {
65
+ "$ref": "#/$defs/efficiency_metrics"
66
+ },
67
+ "generated_at": {
68
+ "type": "string",
69
+ "description": "ISO timestamp for when the context map was created."
70
+ }
71
+ },
72
+ "$defs": {
73
+ "context_entry": {
74
+ "type": "object",
75
+ "required": ["kind", "reason", "confidence"],
76
+ "oneOf": [
77
+ { "required": ["path"] },
78
+ { "required": ["source"] }
79
+ ],
80
+ "properties": {
81
+ "path": {
82
+ "type": "string",
83
+ "description": "Repo-local file or directory path."
84
+ },
85
+ "source": {
86
+ "type": "string",
87
+ "description": "External source, tool, connector, or non-file context."
88
+ },
89
+ "kind": {
90
+ "type": "string",
91
+ "description": "file, directory, doc, rule, memory, external, command, or other context kind."
92
+ },
93
+ "reason": {
94
+ "type": "string",
95
+ "description": "Why this context was included or excluded."
96
+ },
97
+ "confidence": {
98
+ "type": "string",
99
+ "enum": ["low", "medium", "high"]
100
+ },
101
+ "role": {
102
+ "type": "string",
103
+ "enum": [
104
+ "primary",
105
+ "supporting",
106
+ "verification_target",
107
+ "external_evidence",
108
+ "exclusion_candidate",
109
+ "example_only",
110
+ "stale",
111
+ "avoid_next_time"
112
+ ],
113
+ "description": "How this entry should be used when building or reviewing the context pack."
114
+ },
115
+ "cost": {
116
+ "type": "string",
117
+ "enum": ["low", "medium", "high"],
118
+ "description": "Relative context cost for deciding whether the entry belongs in the first context pack."
119
+ },
120
+ "reuse_signal": {
121
+ "type": "string",
122
+ "enum": ["always", "often", "rare", "example_only", "avoid_next_time"],
123
+ "description": "Whether similar future runs should prefer, down-rank, or avoid this context."
124
+ },
125
+ "verification_link": {
126
+ "type": "string",
127
+ "description": "Check name, evidence ref, or verification hint that proves why this context matters."
128
+ },
129
+ "staleness": {
130
+ "type": "string",
131
+ "enum": ["fresh", "aging", "stale", "unknown"],
132
+ "description": "Freshness signal used to exclude stale context faster."
133
+ },
134
+ "evidence_kind": {
135
+ "type": "string",
136
+ "enum": ["file", "doc", "schema", "test", "command", "external", "signal", "diff", "unknown"],
137
+ "description": "Kind of evidence represented by this context entry."
138
+ },
139
+ "source_ref": {
140
+ "type": "string",
141
+ "description": "Optional source-local reference, such as issue id, frame id, URL label, or attachment name."
142
+ },
143
+ "sensitivity": {
144
+ "type": "string",
145
+ "enum": ["public", "internal", "sensitive", "secret"]
146
+ },
147
+ "verification_hint": {
148
+ "type": "string",
149
+ "description": "Follow-up check needed before treating this context as verified."
150
+ }
151
+ },
152
+ "additionalProperties": true
153
+ },
154
+ "efficiency_metrics": {
155
+ "type": "object",
156
+ "required": ["target_threshold_percent", "measurement_status", "scores"],
157
+ "properties": {
158
+ "target_threshold_percent": {
159
+ "type": "number",
160
+ "minimum": 0,
161
+ "maximum": 100
162
+ },
163
+ "measurement_status": {
164
+ "type": "string",
165
+ "enum": ["estimated", "measured", "mixed"]
166
+ },
167
+ "scores": {
168
+ "type": "array",
169
+ "items": { "$ref": "#/$defs/metric_score" }
170
+ },
171
+ "notes": {
172
+ "type": "string"
173
+ }
174
+ },
175
+ "additionalProperties": true
176
+ },
177
+ "metric_score": {
178
+ "type": "object",
179
+ "required": ["name", "score_percent", "basis", "confidence"],
180
+ "properties": {
181
+ "name": {
182
+ "type": "string"
183
+ },
184
+ "score_percent": {
185
+ "type": "number",
186
+ "minimum": 0,
187
+ "maximum": 100
188
+ },
189
+ "basis": {
190
+ "type": "string"
191
+ },
192
+ "confidence": {
193
+ "type": "string",
194
+ "enum": ["low", "medium", "high"]
195
+ },
196
+ "evidence_refs": {
197
+ "type": "array",
198
+ "items": { "type": "string" }
199
+ },
200
+ "limit": {
201
+ "type": "string"
202
+ }
203
+ },
204
+ "additionalProperties": true
205
+ },
206
+ "external_context_profile": {
207
+ "type": "object",
208
+ "required": ["source", "source_ref", "kind", "included", "excluded", "reason", "confidence", "sensitivity"],
209
+ "properties": {
210
+ "source": {
211
+ "type": "string",
212
+ "description": "External source name, such as Figma, GitHub, docs, attachment, dashboard, or another connector."
213
+ },
214
+ "source_ref": {
215
+ "type": "string",
216
+ "description": "Smallest useful source-local reference."
217
+ },
218
+ "kind": {
219
+ "type": "string",
220
+ "description": "External context kind, such as error_evidence, design_evidence, work_item, reference, runtime_evidence, or attachment."
221
+ },
222
+ "included": {
223
+ "type": "array",
224
+ "items": { "type": "string" }
225
+ },
226
+ "excluded": {
227
+ "type": "array",
228
+ "items": { "type": "string" }
229
+ },
230
+ "reason": { "type": "string" },
231
+ "confidence": {
232
+ "type": "string",
233
+ "enum": ["low", "medium", "high"]
234
+ },
235
+ "sensitivity": {
236
+ "type": "string",
237
+ "enum": ["public", "internal", "sensitive", "secret"]
238
+ },
239
+ "verification_hint": { "type": "string" },
240
+ "blocked_reason": { "type": "string" },
241
+ "next_action": { "type": "string" }
242
+ },
243
+ "additionalProperties": true
244
+ }
245
+ },
246
+ "additionalProperties": true
247
+ }
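
Reviewer note: the Context Budget Ledger fields added to `context_entry` above are all optional enums, so a small check can catch misspelled values before a run writes `context-map.json`. The sketch below is a minimal illustration, not part of the package. The enum values and required fields are copied from the schema diff above; the function name `check_context_entry`, the sample paths, and the sample check name are hypothetical. It checks only presence and enum membership and does not replace a real JSON Schema validator.

```python
from __future__ import annotations

# Allowed values copied from the context_entry definition in the schema diff.
LEDGER_ENUMS = {
    "confidence": {"low", "medium", "high"},
    "role": {
        "primary", "supporting", "verification_target", "external_evidence",
        "exclusion_candidate", "example_only", "stale", "avoid_next_time",
    },
    "cost": {"low", "medium", "high"},
    "reuse_signal": {"always", "often", "rare", "example_only", "avoid_next_time"},
    "staleness": {"fresh", "aging", "stale", "unknown"},
    "evidence_kind": {
        "file", "doc", "schema", "test", "command",
        "external", "signal", "diff", "unknown",
    },
}
REQUIRED = ("kind", "reason", "confidence")


def check_context_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in entry]
    # The schema's oneOf accepts an entry with exactly one of `path` or `source`.
    if ("path" in entry) == ("source" in entry):
        problems.append("entry needs exactly one of `path` or `source`")
    for field, allowed in LEDGER_ENUMS.items():
        if field in entry:
            value = entry[field]
            if not isinstance(value, str) or value not in allowed:
                problems.append(f"{field}={value!r} is not an allowed value")
    return problems


# Hypothetical included entry that uses every ledger field.
primary_entry = {
    "path": "schemas/context-map.schema.json",  # hypothetical path
    "kind": "file",
    "reason": "Defines the ledger fields under review.",
    "confidence": "high",
    "role": "primary",
    "cost": "low",
    "reuse_signal": "often",
    "verification_link": "schema-validation-check",  # hypothetical check name
    "staleness": "fresh",
    "evidence_kind": "schema",
}

# Hypothetical repeated false start, kept as evidence instead of deleted.
false_start = {
    "path": "docs/old-notes.md",  # hypothetical path
    "kind": "doc",
    "reason": "Led to a wrong lead twice; keep the reason, skip the file.",
    "confidence": "medium",
    "role": "stale",
    "reuse_signal": "avoid_next_time",
    "staleness": "stale",
}

# Misspelled enum value and neither `path` nor `source`.
broken_entry = {"kind": "doc", "reason": "demo", "confidence": "high", "cost": "cheap"}

for label, entry in [("primary", primary_entry), ("false_start", false_start), ("broken", broken_entry)]:
    print(label, check_context_entry(entry))
```

Because the schema keeps `additionalProperties: true`, unknown extra fields are deliberately not reported here.
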
@@ -0,0 +1,89 @@
1
+ {
2
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
3
+ "$id": "https://github.com/dotoricode/tink-harness/schemas/context-metrics-evaluation.schema.json",
4
+ "title": "Tink current run context metrics evaluation",
5
+ "type": "object",
6
+ "required": ["run", "evaluator", "target_threshold_percent", "measurement_status", "scope", "scores"],
7
+ "properties": {
8
+ "run": {
9
+ "type": "string",
10
+ "description": "Run state path, usually .tink/current."
11
+ },
12
+ "evaluator": {
13
+ "type": "string",
14
+ "description": "Stable evaluator id, such as fixture-ratio-v1."
15
+ },
16
+ "target_threshold_percent": {
17
+ "type": "number",
18
+ "minimum": 0,
19
+ "maximum": 100
20
+ },
21
+ "measurement_status": {
22
+ "type": "string",
23
+ "enum": ["estimated", "measured", "mixed"]
24
+ },
25
+ "scope": {
26
+ "type": "string",
27
+ "enum": ["fixture", "current_run", "run_history", "production_telemetry"]
28
+ },
29
+ "limits": {
30
+ "type": "array",
31
+ "items": { "type": "string" }
32
+ },
33
+ "scores": {
34
+ "type": "array",
35
+ "items": { "$ref": "#/$defs/metric_score" }
36
+ }
37
+ },
38
+ "$defs": {
39
+ "metric_score": {
40
+ "type": "object",
41
+ "required": [
42
+ "name",
43
+ "score_percent",
44
+ "formula",
45
+ "numerator",
46
+ "denominator",
47
+ "evidence_refs"
48
+ ],
49
+ "properties": {
50
+ "name": {
51
+ "type": "string",
52
+ "enum": [
53
+ "unnecessary_context_reduction",
54
+ "initial_context_pack_size_reduction",
55
+ "review_evidence_lookup_time_reduction",
56
+ "verification_omission_detection",
57
+ "repeated_context_reuse_accuracy",
58
+ "rework_probability_reduction"
59
+ ]
60
+ },
61
+ "score_percent": {
62
+ "type": "number",
63
+ "minimum": 0,
64
+ "maximum": 100
65
+ },
66
+ "formula": {
67
+ "type": "string"
68
+ },
69
+ "numerator": {
70
+ "type": "number",
71
+ "minimum": 0
72
+ },
73
+ "denominator": {
74
+ "type": "number",
75
+ "minimum": 0
76
+ },
77
+ "evidence_refs": {
78
+ "type": "array",
79
+ "items": { "type": "string" }
80
+ },
81
+ "limit": {
82
+ "type": "string"
83
+ }
84
+ },
85
+ "additionalProperties": true
86
+ }
87
+ },
88
+ "additionalProperties": true
89
+ }
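
Reviewer note: the new schema and the cast guidance together imply two checks on `context-metrics-evaluation.json`: the required fields must be present, and a score measured at `fixture` or `current_run` scope should not be reported as a production-wide result. The sketch below illustrates both under stated assumptions and is not part of the package. Field names, scope values, and the evaluator id `fixture-ratio-v1` come from the diff above. The helper names, the sample numbers, the formula string, and the evidence ref are hypothetical. Reading `score_percent` as `numerator / denominator * 100` is an assumption suggested by the field names; the schema leaves `formula` as a free string.

```python
from __future__ import annotations

TOP_REQUIRED = (
    "run", "evaluator", "target_threshold_percent",
    "measurement_status", "scope", "scores",
)
SCORE_REQUIRED = (
    "name", "score_percent", "formula",
    "numerator", "denominator", "evidence_refs",
)
# Scopes the guidance accepts as evidence beyond one fixture or one run.
BROAD_SCOPES = {"run_history", "production_telemetry"}


def missing_fields(evaluation: dict) -> list[str]:
    """List required fields that are absent at the top level or inside scores[]."""
    missing = [name for name in TOP_REQUIRED if name not in evaluation]
    for index, score in enumerate(evaluation.get("scores", [])):
        missing += [f"scores[{index}].{name}" for name in SCORE_REQUIRED if name not in score]
    return missing


def ratio_percent(score: dict) -> float | None:
    """Assumed reading of the score: numerator / denominator * 100."""
    if score["denominator"] == 0:  # the schema permits 0, so guard the division
        return None
    return score["numerator"] * 100 / score["denominator"]


def may_claim_beyond_scope(evaluation: dict) -> bool:
    """True only when every score meets the target AND the scope is broad."""
    scores = evaluation["scores"]
    threshold = evaluation["target_threshold_percent"]
    meets_target = bool(scores) and all(s["score_percent"] >= threshold for s in scores)
    return meets_target and evaluation["scope"] in BROAD_SCOPES


# Hypothetical fixture-scoped evaluation; the numbers are illustrative only.
evaluation = {
    "run": ".tink/current",
    "evaluator": "fixture-ratio-v1",
    "target_threshold_percent": 90,
    "measurement_status": "measured",
    "scope": "fixture",
    "limits": ["Measured inside one fixture; not production telemetry."],
    "scores": [
        {
            "name": "unnecessary_context_reduction",
            "score_percent": 92.0,
            "formula": "excluded_unnecessary / candidate_unnecessary * 100",  # hypothetical
            "numerator": 23,
            "denominator": 25,
            "evidence_refs": ["context-map.json#/excluded"],  # hypothetical ref
        }
    ],
}

print("missing:", missing_fields(evaluation))
print("recomputed:", ratio_percent(evaluation["scores"][0]))
print("claim beyond fixture scope allowed:", may_claim_beyond_scope(evaluation))
```

In this sample the score clears the 90% target, yet the claim check still returns false because `scope` is `fixture`. That mirrors the guidance in the diff: a fixture or current-run result stays inside its boundary until run-history or telemetry evidence exists.
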