tink-harness 1.3.0 → 1.4.0
- package/.claude-plugin/plugin.json +1 -1
- package/CHANGELOG.md +16 -0
- package/README.ko.md +2 -2
- package/README.md +3 -3
- package/VERSIONING.md +1 -1
- package/commands/cast.md +4 -1
- package/docs/context-budget-ledger.ko.md +2 -0
- package/docs/context-budget-ledger.md +2 -0
- package/docs/context-metrics-evaluator.ko.md +39 -0
- package/docs/context-metrics-evaluator.md +39 -0
- package/docs/pr/2026-06-08-context-metrics-artifact.ko.md +30 -0
- package/docs/pr/2026-06-08-context-metrics-evaluator.ko.md +28 -0
- package/docs/work-state.ko.md +7 -3
- package/docs/work-state.md +7 -3
- package/package.json +1 -1
- package/templates/claude/commands/tink/cast.md +4 -1
- package/templates/codex/skills/tink-core/RULES.md +4 -1
- package/templates/tink/schemas/context-metrics-evaluation.schema.json +89 -0
package/CHANGELOG.md
CHANGED
@@ -7,6 +7,22 @@ All notable changes to Tink are tracked here.
 No unreleased changes yet.
 
 
+## [1.4.0] - 2026-06-08
+
+### Added
+
+- `context-metrics-evaluation.schema.json` for `.tink/current/context-metrics-evaluation.json`.
+- Context Metrics Evaluator run-state artifact guidance for `/tink:cast` and `$tink:cast`.
+- Fixture-ratio evaluation docs in Korean and English, explaining measured fixture scope versus production telemetry.
+- Test-backed context metrics evaluation fixture that calculates all six context-efficiency metrics at or above the 90% target within fixture scope.
+- Korean PR history draft for the Context Metrics Artifact work in `docs/pr/2026-06-08-context-metrics-artifact.ko.md`.
+
+### Changed
+
+- Work State Guide now includes `context-metrics-evaluation.json` in the reading order.
+- README and Korean README now link to the Context Metrics Evaluator docs without adding a new command.
+
+
 ## [1.3.0] - 2026-06-08
 
 ### Added
package/README.ko.md
CHANGED
@@ -8,7 +8,7 @@ Claude Code와 Codex를 위한 작은 하네스 레이어입니다.
 
 Tink는 지금 작업에 맞는 하네스를 고르고, 실행 상태를 보이게 만들고, 실제 사용 중 생긴 실패와 피드백으로 하네스 세트를 개선합니다.
 
-**최신 릴리스:** v1.
+**최신 릴리스:** v1.4.0 — context 효율 점수를 계산식, evidence ref, 측정 scope와 함께 남기는 Context Metrics Evaluator artifact.
 
 [English](README.md) · **한국어**
 
@@ -66,7 +66,7 @@ npx tink-harness@latest update
 - Codex에는 하나의 넓은 `tink` 스킬 대신 `$tink:cast`, `$tink:verify` 같은 action skill만 보이도록 설치됩니다.
 - 비단순 작업은 `context-pack.md`, `context-map.json`, `excluded-context.md`로 어떤 context를 썼고 뺐는지 남깁니다.
 - Repo Signal과 Context Graph Lite는 새 `tink index` 명령을 만들지 않고도 관련 테스트, 스키마, 동기화 파일, 검증 힌트를 고르는 데 쓰입니다.
-- context 효율
+- context 효율 점수화와 fixture 비율 계산 기준은 `docs/context-budget-ledger.ko.md`, `docs/context-budget-ledger.md`, `docs/context-metrics-evaluator.ko.md`, `docs/context-metrics-evaluator.md`에서 확인할 수 있습니다.
 - `/tink:verify`와 `$tink:verify`는 같은 Verify Runner 모델을 쓰며 `.tink/current/verification.json`에 검증 증거를 남깁니다.
 - 외부 context는 MCP Safe Profile을 따릅니다. 가장 작은 source handle만 남기고, 신뢰도와 민감도를 표시하며, 위험하거나 너무 넓은 context는 `excluded-context.md`에 따로 기록합니다.
 
package/README.md
CHANGED
@@ -17,14 +17,14 @@
 </p>
 
 <p>
-<a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.
+<a href="https://github.com/dotoricode/tink-harness/releases/tag/v1.4.0"><img src="https://img.shields.io/github/v/release/dotoricode/tink-harness?label=release&color=2ea44f" alt="GitHub release"></a>
 <a href="https://www.npmjs.com/package/tink-harness"><img src="https://img.shields.io/npm/v/tink-harness?label=npm&color=cb3837" alt="npm version"></a>
 <a href="https://github.com/dotoricode/tink-harness/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/dotoricode/tink-harness/ci.yml?branch=main&label=ci" alt="CI"></a>
 <a href="https://github.com/dotoricode/tink-harness/blob/main/LICENSE"><img src="https://img.shields.io/github/license/dotoricode/tink-harness" alt="License"></a>
 <a href="https://github.com/dotoricode/tink-harness/stargazers"><img src="https://img.shields.io/github/stars/dotoricode/tink-harness?style=social" alt="GitHub stars"></a>
 </p>
 
-<p><strong>Latest release:</strong> v1.
+<p><strong>Latest release:</strong> v1.4.0 — Context Metrics Evaluator artifact for measured context-efficiency scores.</p>
 
 **English** · [한국어](README.ko.md)
 
@@ -131,7 +131,7 @@ This release makes Tink work as one harness layer across Claude Code and Codex.
 - Codex now installs focused `$tink:*` action skills instead of one broad visible `tink` skill, so the picker shows commands like `$tink:cast` and `$tink:verify` cleanly.
 - Non-trivial runs now create context artifacts: `context-pack.md`, `context-map.json`, and `excluded-context.md`.
 - Repo Signals and Context Graph Lite help `/tink:cast` and `$tink:cast` choose relevant tests, schemas, sync partners, and verification hints without adding a new `tink index` command.
-- Context Budget Ledger fields are documented in `docs/context-budget-ledger.md
+- Context Budget Ledger fields and fixture-ratio evaluation are documented in `docs/context-budget-ledger.md`, `docs/context-budget-ledger.ko.md`, `docs/context-metrics-evaluator.md`, and `docs/context-metrics-evaluator.ko.md` without adding a new command.
 - `/tink:verify` and `$tink:verify` share one portable Verify Runner model and write compact evidence to `.tink/current/verification.json`.
 - External context now follows the MCP Safe Profile: include only the smallest useful source handle, mark confidence and sensitivity, exclude unsafe context visibly, and connect important claims to verification.
 
package/VERSIONING.md
CHANGED
package/commands/cast.md
CHANGED
@@ -144,6 +144,7 @@ After approval, create `.tink/current/` with these files before doing deeper wor
 - `session.json`: lightweight session metadata, especially rule ids already loaded by phase
 - `context-pack.md`: human-readable selected context, including why each item is relevant
 - `context-map.json`: machine-readable included and excluded context with reasons
+- `context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
 - `excluded-context.md`: notable omitted files, tools, sources, or claims and why they were excluded
 
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
@@ -189,9 +190,11 @@ Create context artifacts before deeper implementation work:
 - `context-pack.md` should name the user task, selected harnesses, contract summary, loaded rules, selected files/docs, selected external sources, and verification implications.
 - `context-map.json` should contain `task`, `included`, `excluded`, `signals`, and `generated_at`. Each included or excluded entry should include `path` or `source`, `kind`, `reason`, and `confidence`. When external context is selected, also write `external_context[]`.
 - When useful, enrich each context entry with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to explain why the first context pack is small enough, why excluded context should stay out, and which checks prove selected context matters.
+- `context-metrics-evaluation.json` should contain `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score is based only on fixtures or the current run, say so in `scope` and `limits`; do not claim production-wide 90% without run-history or telemetry evidence.
 - `excluded-context.md` should make important omissions visible, especially files skipped because they are out of scope, stale, risky, too broad, or unverified external claims.
 
 If `.tink/schemas/context-map.schema.json` exists, use it for `context-map.json`. Do not paste the schema into the user response.
+If `.tink/schemas/context-metrics-evaluation.schema.json` exists, use it for `context-metrics-evaluation.json`. Do not paste the schema into the user response.
 
 Use deterministic context selection inside cast. Do not create or require a separate `tink index` command for this phase.
 
@@ -401,7 +404,7 @@ A task is trivial only when ALL of the following are true:
 15. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
 16. Ask for explicit approval before non-trivial work.
 17. After approval, read only the selected harness files and any approved run-only draft.
-18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
+18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
 19. Execute the first safe step immediately:
 - inspect relevant files,
 - run a read-only diagnostic,
package/docs/context-budget-ledger.ko.md
CHANGED
@@ -48,6 +48,8 @@ Context Budget Ledger는 Tink가 context를 많이 모으는 대신, 왜 넣었
 
 실제 telemetry가 없으면 `measurement_status: "estimated"`로 두고 한계를 함께 적는다. 근거 없이 90% 달성으로 표시하지 않는다.
 
+fixture에서 비율을 계산할 수 있으면 `measurement_status: "measured"`를 쓸 수 있다. 이때도 scope가 fixture인지 production telemetry인지 분명히 적어야 한다. 자세한 계산 기준은 `docs/context-metrics-evaluator.ko.md`를 본다.
+
 ## 호환성 기준
 
 - Claude Code와 Codex가 같은 schema와 fixture를 읽는다.
package/docs/context-budget-ledger.md
CHANGED
@@ -48,6 +48,8 @@ Entries with `role: "verification_target"` should connect to a command, manual c
 
 If there is no runtime telemetry yet, mark the scores as `measurement_status: "estimated"` and include the limits. Do not claim 90% without evidence.
 
+If fixture ratios can be calculated, `measurement_status: "measured"` is acceptable. The artifact must still state whether the scope is fixture evidence or production telemetry. See `docs/context-metrics-evaluator.md` for the calculation rules.
+
 ## Compatibility
 
 - Claude Code and Codex read the same schema and fixtures.
package/docs/context-metrics-evaluator.ko.md
@@ -0,0 +1,39 @@
+# Context Metrics Evaluator
+
+Context Metrics Evaluator는 Context Budget Ledger에 적힌 필드를 실제 비율로 계산하는 테스트 기준이다.
+
+이 기능은 새 public command가 아니다. `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않는다. `/tink:cast`와 `$tink:cast`는 `.tink/current/context-metrics-evaluation.json`을 run state artifact로 남기고, `tests/fixtures/current-run/context-metrics-evaluation.json`과 `tests/test_templates.py`는 같은 점수를 계산하는지 확인한다.
+
+영어판은 `docs/context-metrics-evaluator.md`에 있다.
+
+## 왜 필요한가
+
+`context-map.json.efficiency_metrics`에 점수를 사람이 직접 적기만 하면, 숫자가 좋아 보여도 근거가 약하다. Evaluator는 fixture를 다시 읽어서 다음을 계산한다.
+
+- excluded context가 `role`, `cost`, `reuse_signal`, `staleness`, `reason`을 갖는 비율.
+- included context가 `role`과 `cost`를 갖고, high-cost 항목은 `verification_link`를 갖는 비율.
+- included context가 `role`과 `verification_link`를 함께 갖는 비율.
+- `verification_target` 항목이 실제 verification command나 verification hint와 연결되는 비율.
+- 반복 path-case가 expected context role을 갖는 비율.
+- context-diff 변화가 verification link와 metric impact로 추적되는 비율.
+
+## 점수의 의미
+
+`fixture-ratio-v1`에서 90% 이상이라는 말은 “예시 artifact가 내부적으로 측정 가능하고 빠진 필드가 거의 없다”는 뜻이다.
+
+이것은 아직 “실제 모든 사용자 작업에서 90% 효율을 달성했다”는 뜻이 아니다. production telemetry나 여러 run record가 쌓이기 전까지는 scope를 `fixture`로 제한한다.
+
+## 완료 기준
+
+- 여섯 지표가 모두 fixture 계산 기준으로 90% 이상이다.
+- `context-map.json.efficiency_metrics.scores[]`와 `context-metrics-evaluation.json`의 점수가 일치한다.
+- 각 점수에는 `formula`, `numerator`, `denominator`, `evidence_refs`, `limit`가 있다.
+- `measurement_status`는 fixture에서 계산되면 `measured`로 둘 수 있지만, 문서에는 한계를 함께 적는다.
+- 설치된 schema는 `.tink/schemas/context-metrics-evaluation.schema.json`이다.
+
+## 호환성 기준
+
+- Claude Code와 Codex가 같은 artifact를 읽을 수 있어야 한다.
+- macOS와 Windows 모두에서 `npm test`로 검증되어야 한다.
+- 사용자 승인 없이 reusable memory, harness, rule, config를 저장하지 않는다.
+- Sentry와 release evidence bundling은 포함하지 않는다.
package/docs/context-metrics-evaluator.md
@@ -0,0 +1,39 @@
+# Context Metrics Evaluator
+
+Context Metrics Evaluator is a test-backed way to calculate the ratios recorded by Context Budget Ledger.
+
+It is not a new public command. It must not add a `tink index` command, watcher, generated cache, or hidden runtime index. `/tink:cast` and `$tink:cast` write `.tink/current/context-metrics-evaluation.json` as a run-state artifact, and `tests/fixtures/current-run/context-metrics-evaluation.json` plus `tests/test_templates.py` must calculate the same scores.
+
+Korean companion: `docs/context-metrics-evaluator.ko.md`.
+
+## Why It Exists
+
+If `context-map.json.efficiency_metrics` is only hand-written, the numbers can look better than the evidence. The evaluator re-reads the fixtures and calculates:
+
+- Excluded context with `role`, `cost`, `reuse_signal`, `staleness`, and `reason`.
+- Included context with `role` and `cost`, with high-cost entries justified by `verification_link`.
+- Included context with both `role` and `verification_link`.
+- `verification_target` entries linked to known verification commands or hints.
+- Repeated path-cases with expected context roles.
+- Context-diff changes traceable through verification links and metric impacts.
+
+## What The Score Means
+
+In `fixture-ratio-v1`, a score at or above 90% means the example artifacts are internally measurable and have very few missing fields.
+
+It does not mean that every real user run has reached 90% efficiency. Until production telemetry or multiple run records exist, the measurement scope stays `fixture`.
+
+## Done Criteria
+
+- All six metrics are at or above 90% under the fixture calculation.
+- `context-map.json.efficiency_metrics.scores[]` matches `context-metrics-evaluation.json`.
+- Each score has `formula`, `numerator`, `denominator`, `evidence_refs`, and `limit`.
+- `measurement_status` may be `measured` for fixture calculations, but the docs must state the limit.
+- The installed schema is `.tink/schemas/context-metrics-evaluation.schema.json`.
+
+## Compatibility
+
+- Claude Code and Codex read the same artifacts.
+- macOS and Windows are both verified through `npm test`.
+- Reusable memory, harness, rule, and config saves still require explicit approval.
+- Sentry and release evidence bundling are out of scope.
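The first ratio listed under "Why It Exists" (excluded context entries that carry `role`, `cost`, `reuse_signal`, `staleness`, and `reason`) can be illustrated with a small sketch. This is not the code in `tests/test_templates.py`, which the diff does not show. The function name `field_coverage`, the sample entries and their values, and the choice to score an empty list as 0.0 are all assumptions made for illustration.

```python
# Hypothetical sketch of a fixture-ratio calculation; not the package's own evaluator.

# Fields the evaluator doc lists for excluded context entries.
EXCLUDED_REQUIRED = ("role", "cost", "reuse_signal", "staleness", "reason")


def field_coverage(entries, required):
    """Return (numerator, denominator, score_percent) for entries carrying all required fields."""
    entries = list(entries)
    denominator = len(entries)
    # An entry counts only if every required field is present and non-empty.
    numerator = sum(
        1
        for entry in entries
        if all(entry.get(field) not in (None, "") for field in required)
    )
    # Assumption: an empty fixture scores 0.0 instead of raising on division by zero.
    score_percent = 100.0 * numerator / denominator if denominator else 0.0
    return numerator, denominator, round(score_percent, 1)


# Illustrative entries only; the paths and field values are made up.
excluded = [
    {
        "path": "docs/old-notes.md",
        "role": "background",
        "cost": "low",
        "reuse_signal": "avoid_next_time",
        "staleness": "stale",
        "reason": "superseded by newer docs",
    },
    # Missing reuse_signal and staleness, so it does not count toward the numerator.
    {"path": "logs/full-run.txt", "role": "noise", "cost": "high", "reason": "too broad"},
]

print(field_coverage(excluded, EXCLUDED_REQUIRED))  # (1, 2, 50.0)
```

Returning the numerator and denominator alongside the percentage mirrors the `numerator`, `denominator`, and `score_percent` fields that each score entry is expected to carry, so a reviewer can recompute the ratio from the artifact.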
package/docs/pr/2026-06-08-context-metrics-artifact.ko.md
@@ -0,0 +1,30 @@
+# Context Metrics Artifact
+
+## 문제
+
+Context Metrics Evaluator가 fixture 기준으로 여섯 지표를 계산할 수 있게 되었지만, 실제 `/tink:cast`와 `$tink:cast` run state에는 아직 `context-metrics-evaluation.json`이 공식 산출물로 연결되어 있지 않았다. 이 상태에서는 90% 목표를 current run마다 반복 검증하기 어렵다.
+
+## 해결
+
+- `templates/tink/schemas/context-metrics-evaluation.schema.json`을 추가했다.
+- `/tink:cast`와 `$tink:cast` 지침에 `.tink/current/context-metrics-evaluation.json` 생성 규칙을 추가했다.
+- Work State Guide에 metrics evaluation 읽기 순서를 추가했다.
+- Context Metrics Evaluator 문서를 run-state artifact 기준으로 갱신했다.
+- `v1.4.0` 릴리즈 메타데이터, README, CHANGELOG, VERSIONING을 갱신했다.
+
+## 검증
+
+- `npm test`
+- `git diff --check`
+- `claude plugin validate .claude-plugin/plugin.json`
+- `claude plugin validate .claude-plugin/marketplace.json`
+- `npm pack --dry-run --json`
+
+## 참고
+
+- 새 public command는 추가하지 않았다.
+- `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않았다.
+- Sentry는 포함하지 않았다.
+- release evidence bundling은 포함하지 않았다.
+- 점수의 scope와 limit를 명시해야 하며, run-history나 telemetry 없이 production-wide 90%를 주장하지 않는다.
+- Claude Code와 Codex, macOS와 Windows 동시 지원을 기준으로 작성했다.
package/docs/pr/2026-06-08-context-metrics-evaluator.ko.md
@@ -0,0 +1,28 @@
+# Context Metrics Evaluator
+
+## 문제
+
+Context Budget Ledger로 context entry에 역할, 비용, 재사용 신호, 검증 연결을 남길 수 있게 되었지만, `efficiency_metrics` 점수는 아직 사람이 적은 값에 가까웠다. 목표 지표를 90% 이상으로 반복 개선하려면 점수가 실제 fixture에서 다시 계산되어야 한다.
+
+## 해결
+
+- `tests/fixtures/current-run/context-metrics-evaluation.json`을 추가해 여섯 지표의 계산식, 분자, 분모, evidence ref를 기록했다.
+- `context-map.json.efficiency_metrics`를 `fixture-ratio-v1` 기준의 `measured` 점수로 갱신했다.
+- `tests/test_templates.py`가 context-map, context-diff, contract, repo signal, path-case fixture를 다시 읽어 점수를 직접 계산하도록 했다.
+- 반복 path-case에 expected context role을 보강해 재사용 정확도도 계산할 수 있게 했다.
+- 한국어 우선 문서 `docs/context-metrics-evaluator.ko.md`와 영어 companion을 추가했다.
+
+## 검증
+
+- `npm test`
+- `git diff --check`
+- `npm pack --dry-run --json`
+
+## 참고
+
+- 새 public command는 추가하지 않았다.
+- `tink index` 명령, watcher, generated cache, hidden runtime index를 만들지 않았다.
+- Sentry는 포함하지 않았다.
+- release evidence bundling은 포함하지 않았다.
+- 이번 점수는 fixture scope의 측정값이며, production telemetry 전체가 90%에 도달했다는 뜻은 아니다.
+- Claude Code와 Codex, macOS와 Windows 동시 지원 기준을 유지했다.
package/docs/work-state.ko.md
CHANGED
@@ -21,11 +21,13 @@ Tink는 실행 상태를 파일로 남겨서 사람이 빠르게 네 가지를
 - 선택된 파일, 문서, 규칙, 외부 source의 사람용 요약을 읽습니다.
 3. `.tink/current/context-map.json`
 - `included`, `excluded`, `signals`, `external_context`를 구조적으로 확인합니다.
-4. `.tink/current/
+4. `.tink/current/context-metrics-evaluation.json`
+- context 효율 점수, 계산식, 분자/분모, evidence ref, 측정 scope와 한계를 확인합니다.
+5. `.tink/current/excluded-context.md`
 - 오래됐거나, 위험하거나, 너무 넓거나, 접근할 수 없거나, 범위 밖이라 제외한 context를 확인합니다.
-
+6. `.tink/current/verification.json`
 - pass, fail, blocked, skipped 검증 결과와 최종 report를 확인합니다.
-
+7. `.tink/current/notes.md`
 - 마지막 안전 지점, 복구 메모, 짧은 검증 요약을 읽습니다.
 
 ## Context 읽는 법
@@ -50,6 +52,8 @@ Tink는 실행 상태를 파일로 남겨서 사람이 빠르게 네 가지를
 
 자세한 기준은 `docs/context-budget-ledger.ko.md`를 봅니다.
 
+`context-metrics-evaluation.json`은 점수가 어떻게 나왔는지 보여줍니다. `scope: "fixture"`나 `scope: "current_run"`이면 해당 범위 안에서만 측정된 값입니다. 여러 run record나 production telemetry가 없으면 전체 사용자 작업이 90%에 도달했다고 말하지 않습니다.
+
 `signals[]`에 `kind: "context_graph_rule"`가 있으면 `/tink:cast`나 `$tink:cast`가 changed path를 보고 고른 작은 단서로 읽습니다. `context_graph_lite.rules.claude-command-sync` 같은 안정적인 `source_ref`를 가리키고, 왜 관련 파일을 함께 포함했는지 설명해야 합니다. 이 신호는 cast 내부 선택 근거일 뿐이며 public `tink index` 명령, watcher, generated cache, hidden runtime index를 뜻하지 않습니다.
 
 외부 context는 다음 항목을 확인합니다.
package/docs/work-state.md
CHANGED
@@ -19,11 +19,13 @@ Start here when resuming, reviewing, or handing off a run:
 - Read the short human summary of selected files, docs, rules, and external sources.
 3. `.tink/current/context-map.json`
 - Inspect the structured `included`, `excluded`, `signals`, and `external_context` entries.
-4. `.tink/current/
+4. `.tink/current/context-metrics-evaluation.json`
+- Check context-efficiency scores, formulas, numerators, denominators, evidence refs, measurement scope, and limits.
+5. `.tink/current/excluded-context.md`
 - Check what was skipped because it was stale, unsafe, too broad, unavailable, or outside scope.
-
+6. `.tink/current/verification.json`
 - Confirm pass, fail, blocked, or skipped checks and the final report.
-
+7. `.tink/current/notes.md`
 - Read the last safe point, recovery notes, and compact verification summaries.
 
 ## How To Read Context
@@ -48,6 +50,8 @@ When context entries include Context Budget Ledger fields, read them this way:
 
 See `docs/context-budget-ledger.md` for the detailed rules.
 
+`context-metrics-evaluation.json` explains how the score was produced. If `scope` is `fixture` or `current_run`, the value is measured only inside that boundary. Do not claim all user work has reached 90% without run-history or production telemetry evidence.
+
 When `signals[]` includes `kind: "context_graph_rule"`, read it as a small changed-path clue selected by `/tink:cast` or `$tink:cast`. It should point to a stable `source_ref` such as `context_graph_lite.rules.claude-command-sync`, explain why related files were included, and stay internal to cast. It must not imply a public `tink index` command, watcher, generated cache, or hidden runtime index.
 
 For external context, check:
package/package.json
CHANGED
package/templates/claude/commands/tink/cast.md
CHANGED
@@ -144,6 +144,7 @@ After approval, create `.tink/current/` with these files before doing deeper wor
 - `session.json`: lightweight session metadata, especially rule ids already loaded by phase
 - `context-pack.md`: human-readable selected context, including why each item is relevant
 - `context-map.json`: machine-readable included and excluded context with reasons
+- `context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
 - `excluded-context.md`: notable omitted files, tools, sources, or claims and why they were excluded
 
 Create `contract.json` before loading harness bodies. It should be short, factual, and based on the user request plus visible project context:
@@ -189,9 +190,11 @@ Create context artifacts before deeper implementation work:
 - `context-pack.md` should name the user task, selected harnesses, contract summary, loaded rules, selected files/docs, selected external sources, and verification implications.
 - `context-map.json` should contain `task`, `included`, `excluded`, `signals`, and `generated_at`. Each included or excluded entry should include `path` or `source`, `kind`, `reason`, and `confidence`. When external context is selected, also write `external_context[]`.
 - When useful, enrich each context entry with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to explain why the first context pack is small enough, why excluded context should stay out, and which checks prove selected context matters.
+- `context-metrics-evaluation.json` should contain `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score is based only on fixtures or the current run, say so in `scope` and `limits`; do not claim production-wide 90% without run-history or telemetry evidence.
 - `excluded-context.md` should make important omissions visible, especially files skipped because they are out of scope, stale, risky, too broad, or unverified external claims.
 
 If `.tink/schemas/context-map.schema.json` exists, use it for `context-map.json`. Do not paste the schema into the user response.
+If `.tink/schemas/context-metrics-evaluation.schema.json` exists, use it for `context-metrics-evaluation.json`. Do not paste the schema into the user response.
 
 Use deterministic context selection inside cast. Do not create or require a separate `tink index` command for this phase.
 
@@ -401,7 +404,7 @@ A task is trivial only when ALL of the following are true:
 15. Run Stitch once before committing to `.tink/current/`. If it triggers, show exactly one proposal before approval. Call `AskUserQuestion` as described in the Interaction policy section.
 16. Ask for explicit approval before non-trivial work.
 17. After approval, read only the selected harness files and any approved run-only draft.
-18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
+18. Create `.tink/current/` files from the run state contract, including `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
 19. Execute the first safe step immediately:
 - inspect relevant files,
 - run a read-only diagnostic,
package/templates/codex/skills/tink-core/RULES.md
CHANGED
@@ -31,7 +31,7 @@ Accept legacy `$tink <action>` spelling for compatibility, but present `$tink:<a
 11. Use `request_user_input` for choice prompts when available. Otherwise stop and ask one concise blocking approval question directly in chat. Do not continue until the user answers.
 12. Treat reusable saves as a separate hard approval gate for `.tink/memory/*`, `.tink/harnesses/*`, `.tink/rules/*`, `.tink/config.json`, Codex skill files, and template/plugin files that affect future installs.
 13. Current-run approval never authorizes reusable-state writes. Before saving reusable state, show operation, destination files, exact entry or patch summary, reusable reason, sensitive content excluded, and rollback/removal path.
-14. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, and `excluded-context.md`.
+14. After approval, create `.tink/current/plan.md`, `checks.md`, `steps.json`, `notes.md`, `answers.md`, `contract.json`, `session.json`, `context-pack.md`, `context-map.json`, `context-metrics-evaluation.json`, and `excluded-context.md`.
 15. Do not stop at recommendation. Execute the first safe step after run state exists.
 16. Run `$tink:verify` behavior before final when `contract.json` lists required checks.
 17. Store reusable memory or rule updates under `.tink/` only after separate approval.
@@ -102,10 +102,13 @@ Create run state before deeper work:
 - `session.json`: loaded rule ids by phase and lightweight retrieval metadata
 - `.tink/current/context-pack.md`: human-readable selected context and why it matters
 - `.tink/current/context-map.json`: machine-readable included/excluded context and reasons
+- `.tink/current/context-metrics-evaluation.json`: measured or estimated context-efficiency scores, formulas, evidence refs, and limits
 - `.tink/current/excluded-context.md`: notable omitted context and why it was left out
 
 When useful, enrich `context-map.json.included[]` and `context-map.json.excluded[]` entries with Context Budget Ledger fields: `role`, `cost`, `reuse_signal`, `verification_link`, `staleness`, and `evidence_kind`. Use them to keep the first context pack small, mark stale or avoid-next-time context, and connect `verification_target` entries to command checks, manual checks, evidence refs, or verification hints. Do not claim any 90% efficiency score without measurement evidence.
 
+When writing `context-metrics-evaluation.json`, include `run`, `evaluator`, `target_threshold_percent`, `measurement_status`, `scope`, `limits`, and `scores[]`. Each score should include `name`, `score_percent`, `formula`, `numerator`, `denominator`, and `evidence_refs`. If the score comes only from fixture or current-run evidence, record that scope and do not claim production-wide 90% without run-history or telemetry evidence.
+
 When external context is needed for `$tink:cast`, write it through the MCP Safe Profile shape in `context-map.json.external_context[]`. Record `source`, `source_ref`, `kind`, `included`, `excluded`, `reason`, `confidence`, `sensitivity`, and `verification_hint` when useful. Treat Figma, GitHub, and official docs as representative examples, not the only supported sources; Linear, Jira, Supabase, dashboards, API responses, screenshots, attachments, and runbooks can follow the same shape.
 
 When repo signal fixtures contain `context_graph_lite.rules[]`, use those rules inside `$tink:cast` to choose the first related context candidates. Match changed paths against `when_paths`, consider `include_paths`, cite selected rules as `context_graph_rule` signals with `source_ref: "context_graph_lite.rules.<name>"`, and connect `signal_refs` to verification hints where relevant. If the fixture provides `context_budget_policy`, use it to assign roles, costs, reuse signals, verification links, staleness, and evidence kinds. Do not create a public `tink index` command, watch process, generated cache, or hidden runtime index.
package/templates/tink/schemas/context-metrics-evaluation.schema.json
@@ -0,0 +1,89 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://github.com/dotoricode/tink-harness/schemas/context-metrics-evaluation.schema.json",
+  "title": "Tink current run context metrics evaluation",
+  "type": "object",
+  "required": ["run", "evaluator", "target_threshold_percent", "measurement_status", "scope", "scores"],
+  "properties": {
+    "run": {
+      "type": "string",
+      "description": "Run state path, usually .tink/current."
+    },
+    "evaluator": {
+      "type": "string",
+      "description": "Stable evaluator id, such as fixture-ratio-v1."
+    },
+    "target_threshold_percent": {
+      "type": "number",
+      "minimum": 0,
+      "maximum": 100
+    },
+    "measurement_status": {
+      "type": "string",
+      "enum": ["estimated", "measured", "mixed"]
+    },
+    "scope": {
+      "type": "string",
+      "enum": ["fixture", "current_run", "run_history", "production_telemetry"]
+    },
+    "limits": {
+      "type": "array",
+      "items": { "type": "string" }
+    },
+    "scores": {
+      "type": "array",
+      "items": { "$ref": "#/$defs/metric_score" }
+    }
+  },
+  "$defs": {
+    "metric_score": {
+      "type": "object",
+      "required": [
+        "name",
+        "score_percent",
+        "formula",
+        "numerator",
+        "denominator",
+        "evidence_refs"
+      ],
+      "properties": {
+        "name": {
+          "type": "string",
+          "enum": [
+            "unnecessary_context_reduction",
+            "initial_context_pack_size_reduction",
+            "review_evidence_lookup_time_reduction",
+            "verification_omission_detection",
+            "repeated_context_reuse_accuracy",
+            "rework_probability_reduction"
+          ]
+        },
+        "score_percent": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 100
+        },
+        "formula": {
+          "type": "string"
+        },
+        "numerator": {
+          "type": "number",
+          "minimum": 0
+        },
+        "denominator": {
+          "type": "number",
+          "minimum": 0
+        },
+        "evidence_refs": {
+          "type": "array",
+          "items": { "type": "string" }
+        },
+        "limit": {
+          "type": "string"
+        }
+      },
+      "additionalProperties": true
+    }
+  },
+  "additionalProperties": true
+}
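To make the schema above concrete, here is a minimal sketch of one document that should satisfy it, with a hand-rolled check of the required keys and enum values. It uses only the standard library and is not a full JSON Schema validator; a real project would more likely run a Draft 2020-12 validator against the schema file. The function name `check_evaluation`, the example numbers, the formula string, and the evidence ref are hypothetical.

```python
# Hypothetical sketch: checks only required keys, enums, and the percent range.

TOP_REQUIRED = ("run", "evaluator", "target_threshold_percent",
                "measurement_status", "scope", "scores")
SCORE_REQUIRED = ("name", "score_percent", "formula",
                  "numerator", "denominator", "evidence_refs")
STATUSES = {"estimated", "measured", "mixed"}
SCOPES = {"fixture", "current_run", "run_history", "production_telemetry"}


def check_evaluation(doc):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for key in TOP_REQUIRED:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    if "measurement_status" in doc and doc["measurement_status"] not in STATUSES:
        problems.append(f"unknown measurement_status: {doc['measurement_status']}")
    if "scope" in doc and doc["scope"] not in SCOPES:
        problems.append(f"unknown scope: {doc['scope']}")
    for index, score in enumerate(doc.get("scores", [])):
        for key in SCORE_REQUIRED:
            if key not in score:
                problems.append(f"scores[{index}] missing key: {key}")
        percent = score.get("score_percent")
        if isinstance(percent, (int, float)) and not 0 <= percent <= 100:
            problems.append(f"scores[{index}] score_percent out of range: {percent}")
    return problems


# Illustrative document; every value here is made up for the example.
example = {
    "run": ".tink/current",
    "evaluator": "fixture-ratio-v1",
    "target_threshold_percent": 90,
    "measurement_status": "measured",
    "scope": "fixture",
    "limits": ["Measured on fixtures only; not production telemetry."],
    "scores": [
        {
            "name": "unnecessary_context_reduction",
            "score_percent": 100.0,
            "formula": "complete_excluded_entries / excluded_entries * 100",
            "numerator": 4,
            "denominator": 4,
            "evidence_refs": ["context-map.json#/excluded"],
        }
    ],
}

print(check_evaluation(example))  # []
```

The example sets `scope` to `fixture` and records the limitation in `limits`, which matches the guidance elsewhere in this diff that fixture-scope scores must not be presented as production-wide results.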