meto-cli 0.12.0 → 0.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,42 @@
# ai/rubric/

This directory holds the tester rubric and related evaluation artifacts for the project.

---

## Files

| File | Purpose |
|------|---------|
| `tester-rubric.md` | Blank rubric template — copy and fill in for each slice evaluation |
| `tester-calibration-log.md` | Running log of past misjudgments and corrected thresholds (added in slice-090) |
| `slice-NNN-score.md` | Completed rubric for slice NNN — created by @meto-tester after each evaluation |

---

## Score-to-Outcome Mapping

| Score pattern | Outcome | What happens next |
|---------------|---------|-------------------|
| All dimensions score 3 | **Clean Pass** | Task moves to `tasks-done.md` immediately |
| All dimensions score 2 or higher (at least one 2) | **Conditional Pass** | Task moves to `tasks-done.md`; tester opens a follow-up task for each 2-scored dimension if the issue is worth tracking |
| Any dimension scores 1 | **Automatic Fail** | Task moves back to `tasks-todo.md`; the full rubric with critiques is attached so the developer knows exactly what to fix |
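
Because the mapping is deterministic, it can be sketched as a small function. This is a hypothetical illustration only: the `Score` and `Outcome` types and the `outcomeFor` helper are not part of meto-cli.

```typescript
// Hypothetical sketch of the score-to-outcome mapping; names are illustrative.
type Score = 1 | 2 | 3;
type Outcome = "Clean Pass" | "Conditional Pass" | "Automatic Fail";

function outcomeFor(scores: Score[]): Outcome {
  if (scores.some((s) => s === 1)) return "Automatic Fail"; // any 1 fails outright
  if (scores.every((s) => s === 3)) return "Clean Pass";    // all 3s pass cleanly
  return "Conditional Pass";                                // all >= 2, at least one 2
}

console.log(outcomeFor([3, 3, 3, 3, 3])); // Clean Pass
console.log(outcomeFor([3, 2, 3, 3, 3])); // Conditional Pass
console.log(outcomeFor([3, 3, 2, 1, 3])); // Automatic Fail
```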

---

## Workflow

1. @meto-tester copies `tester-rubric.md` to `ai/rubric/slice-NNN-score.md` (where NNN is the slice ID).
2. Tester runs all verification commands (`npx vitest run`, `npx tsc --noEmit`, lint) and records exit codes in the rubric table.
3. Tester scores each of the 5 dimensions and writes critiques for any dimension below 3.
4. Tester writes the overall outcome (PASS / CONDITIONAL PASS / FAIL) and signs off.
5. Tester moves the task to `tasks-done.md` (PASS / CONDITIONAL PASS) or back to `tasks-todo.md` (FAIL), attaching the completed rubric file path in the task block.
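
Step 2 (run each verification command and record its exit code) can be sketched in Node. This helper is illustrative, not part of meto-cli; the stand-in `node -e` commands take the place of the real `npx vitest run`, `npx tsc --noEmit`, and lint invocations so the sketch is self-contained.

```typescript
import { spawnSync } from "node:child_process";

// Stand-in commands; a real evaluation would run
// ["npx", ["vitest", "run"]], ["npx", ["tsc", "--noEmit"]], and the lint command.
const commands: Array<[string, string[]]> = [
  ["node", ["-e", "process.exit(0)"]], // simulates a passing test suite
  ["node", ["-e", "process.exit(2)"]], // simulates a failing type check
];

// Render each command and its exit code as a row for the rubric's
// "Verification Commands Run" table.
const rows = commands.map(([cmd, args]) => {
  const { status } = spawnSync(cmd, args, { encoding: "utf8" });
  return `| \`${[cmd, ...args].join(" ")}\` | ${status} | |`;
});

console.log(rows.join("\n"));
```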

---

## NEVER DO

- Do not return a binary "pass" or "fail" without completing the rubric table.
- Do not skip running the verification commands — reading code is not evidence.
- Do not modify a completed rubric after sign-off. If a past score was wrong, record the correction in `tester-calibration-log.md`.
- Do not delete slice score files. They are the audit trail for the project's quality history.

@@ -0,0 +1,71 @@
# Tester Calibration Log — {{PROJECT_NAME}}

<!-- LOCATION: This file lives at ai/rubric/tester-calibration-log.md in the scaffolded project. -->
<!-- @meto-tester MUST read this file at session start before evaluating any slice. -->
<!-- Update "Current Calibration Rules" every time a new entry is added below. -->

---

## Current Calibration Rules

<!-- INSTRUCTIONS: Keep this section current. Each time you add a new log entry, -->
<!-- distil the rule from that entry into the list below. This section is the active -->
<!-- working rule set @meto-tester applies on every evaluation. -->
<!-- Delete example rules and replace with real ones when the first real entry is added. -->

1. **Test Coverage — partial implementation still scores 2, not 1** — If the implementation covers all acceptance criteria but lacks tests for edge cases that were never specified in the contract, score 2 (partial) rather than 1 (fail). Score 1 only when an AC is provably untested.
2. **Convention Adherence — a single stray debug log is score 2, not 1** — A committed `console.log` left in a non-critical path is a partial violation (score 2). Score 1 only when debug output meaningfully pollutes production behaviour or multiple violations exist.

---

## Log Entries

<!-- INSTRUCTIONS: Add new entries at the TOP (reverse-chronological). -->
<!-- Copy the entry template below for each new entry. -->
<!-- Fictional slice IDs (slice-000, slice-001) are used for illustration — delete these examples -->
<!-- once the first real entry is recorded. -->

---

### Entry 2 — Example

| Field | Value |
|---|---|
| **Date** | 2026-01-15 |
| **Slice ID** | slice-001 |
| **Dimension Affected** | Convention Adherence |
| **What was scored incorrectly** | Scored 1 (fail) because one `console.log` was found in a utility helper; the correct threshold is score 2 (partial) for a single isolated debug statement that does not affect production output |
| **Correct score in retrospect** | 2 — partial violation, not a full fail |
| **Rule update** | A single isolated `console.log` in a non-critical path is a partial violation (score 2); only score 1 when multiple violations exist or when debug output pollutes production behaviour |

---

### Entry 1 — Example

| Field | Value |
|---|---|
| **Date** | 2026-01-08 |
| **Slice ID** | slice-000 |
| **Dimension Affected** | Test Coverage |
| **What was scored incorrectly** | Scored 1 (fail) because an edge case not listed in the sprint contract had no test; the implementation itself covered all agreed acceptance criteria |
| **Correct score in retrospect** | 2 — all contracted criteria were covered; the missing edge case was out of scope for the contract |
| **Rule update** | Score Test Coverage based on contracted acceptance criteria only; uncontracted edge cases lower the score to 2 (partial) but never to 1 (fail) unless an AC is provably untested |

---

## Entry Template

<!-- Copy this block for each new entry. Replace all placeholder values. -->

<!--
### Entry N — [brief label]

| Field | Value |
|---|---|
| **Date** | YYYY-MM-DD |
| **Slice ID** | slice-NNN |
| **Dimension Affected** | [Code Quality / Type Safety / Test Coverage / Convention Adherence / Methodology Compliance] |
| **What was scored incorrectly** | [Describe the score you gave and why it was wrong] |
| **Correct score in retrospect** | [1 / 2 / 3 — and why] |
| **Rule update** | [One sentence: the threshold adjustment to apply in all future evaluations of this dimension] |
-->

@@ -0,0 +1,140 @@
# Tester Rubric — Slice {{SLICE_ID}}

<!-- Fill in all fields before submitting the rubric. Incomplete rubrics are not valid evaluations. -->

**Tester:** {{TESTER_NAME}}
**Date:** {{DATE}}
**Slice:** {{SLICE_ID}}

---

## Grading Dimensions

Score each dimension on a 1–3 scale using the criteria below. Every score below 3 requires a written critique (see the Critique Format section).

---

### 1. Code Quality

| Score | Criteria |
|-------|----------|
| 3 | Code is readable and well-structured, with no dead code, no magic numbers, and descriptive identifiers throughout |
| 2 | Mostly readable with minor issues: one or two unclear names, a stray commented block, or a magic number with obvious intent |
| 1 | Code is difficult to follow, contains dead code, relies on unexplained magic numbers, or has significant structural problems |

**Score:** ___

**Critique (required if score < 3):**

<!-- One sentence of actionable feedback. Point to a specific file and line number where possible. Example: "src/cli/audit/scanner.ts:42 uses the magic number 20 — extract it as a named constant MAX_BAR_WIDTH." -->

---

### 2. Type Safety

| Score | Criteria |
|-------|----------|
| 3 | No `any` types anywhere; `tsc --noEmit` exits 0 with zero errors or warnings |
| 2 | One suppressed `any` with a clear justification comment, or `tsc` exits 0 but with a minor type assertion that is technically safe |
| 1 | `any` used without justification, `tsc --noEmit` exits non-zero, or type errors are suppressed silently |

**Score:** ___

**Critique (required if score < 3):**

<!-- One sentence. Example: "src/cli/init/renderer.ts:88 uses `as any` to bypass a union type — the correct fix is to narrow the type with a type guard." -->

---

### 3. Test Coverage

| Score | Criteria |
|-------|----------|
| 3 | Failing test written before implementation (red→green confirmed), every acceptance criterion has at least one corresponding test, and `npx vitest run` exits 0 |
| 2 | Tests cover most acceptance criteria but one AC has no direct test, or the red→green order cannot be verified but tests are otherwise complete |
| 1 | Tests added after the fact without red→green discipline, one or more acceptance criteria have no test, or `npx vitest run` exits non-zero |

**Score:** ___

**Critique (required if score < 3):**

<!-- One sentence. Example: "The 'minimum pass threshold' AC has no test — add a test asserting that a dimension scored 1 causes the rubric evaluation to return FAIL regardless of other scores." -->

---

### 4. Convention Adherence

| Score | Criteria |
|-------|----------|
| 3 | Commit message matches the format `type(scope): description [agent-tag]`, file naming follows project conventions, no `console.log`, and no hardcoded scaffold content in `/src/` |
| 2 | One minor convention violation (e.g. a missing agent tag in the commit, a slightly inconsistent file name) with no functional impact |
| 1 | Commit message does not follow the format, `console.log` present in committed code, scaffold content hardcoded in source, or multiple convention violations |

**Score:** ___

**Critique (required if score < 3):**

<!-- One sentence. Example: "Commit a3f9c1b is missing the [dev-agent] tag — amend or note in the next commit message." -->

---

### 5. Methodology Compliance

| Score | Criteria |
|-------|----------|
| 3 | Sprint contract exists at `ai/contracts/slice-{{SLICE_ID}}-contract.md` and is signed by both agents, the task definition was followed exactly, and no out-of-scope work was delivered |
| 2 | Sprint contract exists and is signed but has one minor gap (e.g. an edge case that was agreed verbally but not written into the contract), or a very small out-of-scope addition with clear justification |
| 1 | No sprint contract, contract not signed before code was written, significant out-of-scope work delivered, or the task definition was materially deviated from |

**Score:** ___

**Critique (required if score < 3):**

<!-- One sentence. Example: "ai/contracts/slice-088-contract.md was not signed before implementation began — retroactive sign-off does not satisfy the contract-first rule." -->

---

## Minimum Pass Threshold

<!-- Do not modify this section. It is part of the rubric definition. -->

- All dimensions must score **2 or higher** to pass.
- Any dimension scoring **1 is an automatic fail** — the slice returns to todo regardless of other scores.
- Any dimension scoring **2** (with no 1s) is a **conditional pass** — the tester must note in the rubric what must be improved, either in the next slice or in a follow-up task.
- A score of 3 across all dimensions is a **clean pass**.

**Overall result:** PASS / CONDITIONAL PASS / FAIL _(circle one)_

---

## Critique Format

For each dimension that scored below 3, write exactly one sentence of actionable feedback. The sentence must:

1. Name the specific problem (not "code quality issues" — name the actual issue)
2. Point to a file and line number where possible
3. State the corrective action the developer should take

**Do not write critiques for dimensions that scored 3.** Silence on a dimension means it passed cleanly.

---

## Verification Commands Run

<!-- List every command you ran and its exit code. Never evaluate by reading code alone. -->

| Command | Exit Code | Notes |
|---------|-----------|-------|
| `npx vitest run` | | |
| `npx tsc --noEmit` | | |
| _(lint command if present)_ | | |

---

## Final Sign-off

By submitting this rubric, I confirm that I ran all verification commands in the current session and that every score reflects observed evidence, not assumption.

**Tester:** {{TESTER_NAME}}
**Date:** {{DATE}}
**Outcome:** PASS / CONDITIONAL PASS / FAIL _(circle one)_