yam-harness 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/COMMANDS.md CHANGED
@@ -22,6 +22,8 @@ $quick
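  ```text
  $ueye
  Using the mood of the attached reference image as a guide, refine the pricing card UI so it looks more premium.
+ Before implementing, briefly prove the design traits you read from the reference,
+ and after implementing, compare the reference with the result and summarize what is similar and what differs.
  Implement it in line with this project's design direction,
  and, where possible, check the default/mobile/error states on the actual rendered screen.
  ```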
@@ -33,6 +35,10 @@ $ueye
  propose a fix that is as safe as possible.
  ```
 
+ ```text
+ yam template ueye-comparison
+ ```
+
  ## Review-Only
 
  ```text
package/DECISIONS.md CHANGED
@@ -1,17 +1,17 @@
  # yam Decision Baseline
 
- Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style minimal harness principles.
+ Every `yam` change is evaluated against strict proof, modular skill, and minimal-core harness principles.
 
  ## Fixed Questions
 
- 1. What would Sneakoscope verify?
- 2. What would ECC make selective or low-context?
- 3. What would Karpathy remove to keep the core obeyable?
+ 1. What needs concrete evidence before completion?
+ 2. What should stay selective or low-context?
+ 3. What can be removed to keep the core obeyable?
  4. What should `yam` keep light by default, and what should deepen deliberately?
 
  ## Borrow
 
- ### Sneakoscope
+ ### Strict Proof
 
  - Truthful completion language.
  - Risk escalation.
@@ -19,7 +19,7 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
  - Fake versus real distinction.
  - Runtime/process proof only when explicitly requested.
 
- ### ECC
+ ### Modular Skills
 
  - Skills-first structure.
  - Selective install.
@@ -27,7 +27,7 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
  - Token optimization.
  - Project-specific rules instead of global bloat.
 
- ### Karpathy-Style Minimal Harness
+ ### Minimal Core
 
  - Short core.
  - Few route names.
@@ -36,21 +36,21 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
 
  ## Reject
 
- ### From Sneakoscope
+ ### From Strict Proof
 
  - Mandatory hooks.
  - Mandatory Team or subagent proof.
  - Always-on tmux/proof lifecycle.
  - Heavy memory systems for ordinary edits.
 
- ### From ECC
+ ### From Modular Skills
 
  - Full install by default.
  - Giant catalog context.
  - Hook runtime by default.
  - Too many always-on rules.
 
- ### From Minimal Harness
+ ### From Minimal Core
 
  - Under-verification.
  - Vague quality rules.
package/README.md CHANGED
@@ -17,7 +17,7 @@ End with remaining tasks and fix-first items.
  ## Routes
 
  - `$quick`: fast scoped implementation, small fixes, and quick error scans.
- - `$ueye`: design-heavy UI/UX implementation and screenshot-led visual review.
+ - `$ueye`: tight UI/UX/design implementation and visual review with reference-read proof, comparison, and quality judgment.
  - `$question`: direct Q&A without turning simple questions into research projects.
  - `$scout`: bounded investigation and recommendation.
  - `$deep`: single-agent heavy verification by request, including runtime/tmux/browser/process proof when needed.
@@ -134,6 +134,7 @@ node ./bin/yam.js hook disable lite --global
  node ./bin/yam.js budget ueye
  node ./bin/yam.js measure ueye --files 5 --commands 2 --report-lines 12 --seconds 180
  node ./bin/yam.js template ueye
+ node ./bin/yam.js template ueye-comparison
  node ./bin/yam.js template mission
  node ./bin/yam.js template proof
  node ./bin/yam.js tune-log /path/to/project
package/ROADMAP.md CHANGED
@@ -64,12 +64,13 @@ Tasks:
 
  ### 5. Ueye Workflow
 
- Goal: make screenshot/UI review more repeatable.
+ Goal: make screenshot/UI review and reference-led design implementation more repeatable.
 
  Tasks:
 
  - Merge `ui`, `eye`, and `review` into `ueye`. Done.
  - Add Ueye checklist refinements. Done.
+ - Add Visual Evidence Inventory, Reference Read Proof, Reference vs Implementation Matrix, and Design Quality Review. Done.
  - Add optional browser/screenshot capture notes.
  - Keep image generation optional, never a gate.
 
@@ -96,12 +97,14 @@ Tasks:
 
  ### 8. Scout / Research Workflow
 
- Goal: give yam a research lane comparable to Sneakoscope research, but lighter and more decision-oriented.
+ Goal: give yam a research lane that is evidence-bound, lightweight, and decision-oriented.
 
- ECC reference points:
+ Research reference points:
 
- - ECC has `deep-research`, `market-research`, `research-ops`, and `contexts/research.md`.
- - Useful parts to borrow: evidence boundaries, source freshness, fact/inference/recommendation separation, and decision-oriented summaries.
+ - Evidence boundaries.
+ - Source freshness.
+ - Fact/inference/recommendation separation.
+ - Decision-oriented summaries.
 
  Tasks:
 
@@ -142,13 +145,13 @@ Tasks:
 
  Goal: preserve durable lessons without turning yam into a heavy automatic memory system.
 
- Borrowed from Sneakoscope:
+ Kept:
 
  - Sparse one-record-per-file storage.
  - Wrongness-style records for repeated mistakes and wrong decisions.
  - Deliberate forgetting via resolve instead of permanent prompt injection.
 
- Borrowed from ECC:
+ Kept:
 
  - Evidence before recommendation.
  - Clear separation between observation and next action.
@@ -172,13 +175,13 @@ Tasks:
 
  Goal: prevent false runtime completion while keeping ordinary work fast.
 
- Borrowed from Sneakoscope:
+ Kept:
 
  - Runtime truth vocabulary.
  - Cleanup must be backed by exit/closure evidence.
  - tmux physical proof idea, reduced to route-level evidence notes.
 
- Borrowed from ECC:
+ Kept:
 
  - Evidence boundaries before recommendation.
  - Explicit partial/blocked/assumed language.
@@ -201,13 +204,13 @@ Tasks:
 
  Goal: provide one explicit heavy execution route without increasing total skill count.
 
- Borrowed from Sneakoscope:
+ Kept:
 
  - Real Team/subagent route boundary.
  - Cross-verification before completion.
  - Runtime/tmux/browser proof when mission evidence needs it.
 
- Borrowed from ECC:
+ Kept:
 
  - Role-specific work boundaries.
  - Evidence-first reporting.
@@ -233,25 +236,26 @@ Tasks:
 
  Goal: remove overlapping skill roles while preserving the best parts of the old routes.
 
- Borrowed from Sneakoscope actual image UX code:
+ Kept:
 
  - Source screenshot inventory before visual claims.
  - P0-P3 issue ledger.
  - P0/P1-first fix loop.
  - Partial truth cap for text-only or missing-screenshot review.
 
- Borrowed from ECC command docs:
+ Kept:
 
  - Smallest useful verification command.
  - Group errors by file and root cause.
  - Fix one error class at a time.
  - Compact PASS/FAIL reporting.
 
- Borrowed from Open Design local code and contribution rules:
+ Kept:
 
  - Real preview/screenshot evidence.
  - Compact design direction.
  - P0 visual quality gates over placeholder output.
+ - Post-implementation design quality judgment across hierarchy, spacing, typography, color, component detail, interaction, responsiveness, accessibility, and brand fit.
 
  Kept out by design:
 
@@ -273,14 +277,14 @@ Tasks:
  Goal: keep beginner momentum while creating a path toward professional proof-first work.
  The hook stays light, but the harness direction does not. `yam` should support a depth ladder: direction fit first, focused proof for ordinary work, strong proof for risky work, and real team proof for `$mission`.
 
- Borrowed from Sneakoscope:
+ Kept:
 
  - Hook status and trust reporting.
  - Tool readiness as evidence.
  - DB/Supabase safety thinking.
  - Runtime/tmux/process cleanup truth.
 
- Borrowed from ECC:
+ Kept:
 
  - Selective install and low-context operation.
  - Evidence boundaries instead of always-on gates.
package/bin/yam.js CHANGED
@@ -65,10 +65,10 @@ const ROUTE_BUDGETS = {
  limits: { files: 8, commands: 2, reportLines: 16, seconds: 300 }
  },
  ueye: {
- files: 'project direction, target UI surface, visual evidence, nearby component/styles',
+ files: 'project direction, target UI surface, reference/before/after evidence, nearby component/styles',
  commands: 'browser/screenshot when feasible; inspect 1-3 primary images by default; typecheck/build only if UI implementation changed code',
- report: 'design work, source evidence, max 5 inventory rows, states checked, P0-P3 ledger, before/after, truth cap',
- expand: 'when direction, reference image, or visual evidence requires it; do not do broad design archaeology for simple tweaks',
+ report: 'visual evidence inventory, reference read proof, reference-vs-implementation matrix, design quality review, P0-P3 ledger, truth cap',
+ expand: 'when reference fidelity, responsive/state risk, or visual evidence requires it; do not do broad design archaeology for simple tweaks',
  limits: { files: 10, commands: 3, reportLines: 28, seconds: 600 }
  },
  question: {
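The `ueye` entry in `ROUTE_BUDGETS` pairs prose guidance with numeric `limits`. A minimal sketch of comparing a manual measurement (the same values the README passes to `yam measure ueye`) against those limits might look like this; `checkBudget` and `UEYE_LIMITS` are hypothetical names, not part of `yam.js`.

```javascript
// Hypothetical helper; not part of yam.js.
// The numbers below are copied from the ueye entry in ROUTE_BUDGETS.
const UEYE_LIMITS = { files: 10, commands: 3, reportLines: 28, seconds: 600 };

// Returns which measured values exceed the route's numeric limits.
function checkBudget(limits, measured) {
  const over = [];
  for (const [key, limit] of Object.entries(limits)) {
    const value = measured[key];
    // Only numeric measurements are compared; missing keys are ignored.
    if (typeof value === 'number' && value > limit) {
      over.push({ key, value, limit });
    }
  }
  return { ok: over.length === 0, over };
}

// Same values as the README example: --files 5 --commands 2 --report-lines 12 --seconds 180
const withinBudget = checkBudget(UEYE_LIMITS, { files: 5, commands: 2, reportLines: 12, seconds: 180 });
const overBudget = checkBudget(UEYE_LIMITS, { files: 14, commands: 2, reportLines: 30, seconds: 180 });

console.log(withinBudget.ok); // true
console.log(overBudget.over.map((o) => o.key)); // [ 'files', 'reportLines' ]
```

In this sketch an overrun is only reported. Per the `expand` field, going past the ordinary budget should be tied to a stated reason such as reference fidelity or responsive/state risk.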
@@ -291,6 +291,7 @@ async function verify({ quiet = false } = {}) {
  'risk-escalation.md',
  'quick.md',
  'ueye.md',
+ 'ueye-proof.md',
  'ui-quality.md',
  'question.md',
  'mission.md',
@@ -319,7 +320,7 @@ async function verify({ quiet = false } = {}) {
  if (!hasHeading(projectTemplate, section)) issues.push(`project template missing section: ${section}`);
  }
  }
- for (const template of ['ueye-review.md', 'mission-plan.md', 'runtime-proof.md', 'tuning-log.md']) {
+ for (const template of ['ueye-review.md', 'ueye-comparison.md', 'mission-plan.md', 'runtime-proof.md', 'tuning-log.md']) {
  if (!await exists(path.join(ROOT, 'templates', template))) issues.push(`missing template: ${template}`);
  }
 
@@ -858,9 +859,6 @@ async function buildYamLiteContext({ cwd, prompt }) {
  if (docsHint) lines.push(docsHint);
  const routeHint = yamLiteRouteHint(prompt);
  if (routeHint) lines.push(routeHint);
- if (await exists(path.join(path.resolve(cwd), '.sneakoscope'))) {
- lines.push('Caution: active .sneakoscope detected; avoid mixing proof gates unless the user explicitly wants it.');
- }
  return lines.join('\n');
  }
 
@@ -1308,7 +1306,7 @@ async function inspectProjectPack(targetDir = process.cwd()) {
  const instructionSurfaces = await findInstructionSurfaces(resolved);
 
  if (missingSections.length) issues.push(`missing section(s): ${missingSections.join(', ')}`);
- if (words > 1200) warnings.push(`pack is long (${words} words); keep the Karpathy-style core compact`);
+ if (words > 1200) warnings.push(`pack is long (${words} words); keep the core compact`);
  if (words < 80) warnings.push(`pack is very short (${words} words); direction may be too thin to reuse`);
  if (packAgeDays > PACK_STALE_DAYS) warnings.push(`pack is ${packAgeDays} days old; review whether direction or commands changed`);
  if (placeholderLines > 12) warnings.push(`${placeholderLines} placeholder lines are still blank`);
@@ -1353,9 +1351,7 @@ async function findInstructionSurfaces(dir) {
  { path: 'CLAUDE.md', level: 'warning', note: 'active CLAUDE.md may carry non-yam instructions' },
  { path: 'RULES.md', level: 'warning', note: 'active RULES.md may carry non-yam instructions' },
  { path: '.codex/AGENTS.md', level: 'warning', note: 'active .codex/AGENTS.md may override project behavior' },
- { path: '.codex/SNEAKOSCOPE.md', level: 'issue', note: 'active Sneakoscope instruction file detected' },
  { path: '.codex/hooks.json', level: 'issue', note: 'active Codex hook file detected' },
- { path: '.sneakoscope', level: 'issue', note: 'active Sneakoscope directory detected' },
  { path: '.agents', level: 'warning', note: 'project-local .agents directory may add additional skills or instructions' }
  ];
  const found = [];
@@ -1653,6 +1649,8 @@ async function printTemplate(name = '') {
  const map = {
  project: PROJECT_PACK,
  ueye: 'ueye-review.md',
+ 'ueye-comparison': 'ueye-comparison.md',
+ ueyecompare: 'ueye-comparison.md',
  mission: 'mission-plan.md',
  proof: 'runtime-proof.md',
  runtime: 'runtime-proof.md',
@@ -1660,7 +1658,7 @@ async function printTemplate(name = '') {
  };
  const file = map[key];
  if (!file) {
- console.error('usage: yam template <project|ueye|mission|proof|tuning>');
+ console.error('usage: yam template <project|ueye|ueye-comparison|mission|proof|tuning>');
  process.exitCode = 1;
  return;
  }
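After this change, `yam template ueye-comparison` and the shorter `yam template ueyecompare` both resolve to `ueye-comparison.md`. The lookup can be sketched in isolation as below. `resolveTemplate` is hypothetical, `'project-pack'` stands in for `PROJECT_PACK`, map entries hidden between the two hunks are omitted, and the trim/lowercase normalization is an assumption because the diff does not show how `key` is derived from `name`.

```javascript
// Hypothetical standalone version of the printTemplate lookup.
// 'project-pack' stands in for the PROJECT_PACK constant, which the diff does not show.
const TEMPLATE_MAP = {
  project: 'project-pack',
  ueye: 'ueye-review.md',
  'ueye-comparison': 'ueye-comparison.md',
  ueyecompare: 'ueye-comparison.md',
  mission: 'mission-plan.md',
  proof: 'runtime-proof.md',
  runtime: 'runtime-proof.md'
};

const USAGE = 'usage: yam template <project|ueye|ueye-comparison|mission|proof|tuning>';

function resolveTemplate(name = '') {
  // Assumption: the real code normalizes the name before the lookup.
  const key = String(name).trim().toLowerCase();
  // Object.hasOwn prevents inherited keys such as 'constructor' from matching.
  const file = Object.hasOwn(TEMPLATE_MAP, key) ? TEMPLATE_MAP[key] : undefined;
  return file ? { ok: true, file } : { ok: false, error: USAGE };
}

console.log(resolveTemplate('ueye-comparison').file); // ueye-comparison.md
console.log(resolveTemplate('ueyecompare').file); // ueye-comparison.md
console.log(resolveTemplate('unknown').ok); // false
```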
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "yam-harness",
- "version": "0.1.1",
+ "version": "0.1.3",
  "description": "Progressive proof-first Codex harness: start fast, deepen deliberately, stay honest by design.",
  "type": "module",
  "author": "0kim0bos",
@@ -36,10 +36,10 @@ Or:
  Current-docs proof: skipped because this was stable/local/non-SDK work.
  ```
 
- ## Compared Baseline
+ ## Design Baseline
 
- Sneakoscope favors source-intelligence proof for current tool behavior.
+ Strict proof favors source-backed evidence for current tool behavior.
 
- ECC keeps research/context selective and low-context.
+ Modular skill workflows keep research/context selective and low-context.
 
- Karpathy-style minimalism says the rule is useful only when it changes the answer.
+ Minimal-core design says the rule is useful only when it changes the answer.
@@ -31,10 +31,10 @@ Before claiming safe:
  - A successful migration command is not automatically safe; it only proves that command execution completed.
  - Do not claim production safety without environment evidence.
 
- ## Compared Baseline
+ ## Design Baseline
 
- Sneakoscope would likely gate destructive DB work more aggressively.
+ Strict proof would gate destructive DB work more aggressively.
 
- ECC would keep the check selective and evidence-bound.
+ Modular skill workflows keep the check selective and evidence-bound.
 
- Karpathy-style minimalism keeps this as a short rule and a small detector, not a full DB policy engine.
+ Minimal-core design keeps this as a short rule and a small detector, not a full DB policy engine.
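The DB guard is described as "a short rule and a small detector". One way such a detector could work, assuming it only scans SQL text with a few regular expressions, is sketched below. The patterns and `flagDestructiveSql` are illustrative, not yam's code, and they are deliberately incomplete: a match means the statement deserves caution, while no match does not prove the migration is safe.

```javascript
// Hypothetical detector: flags SQL statements that commonly destroy data.
// Patterns are illustrative and intentionally incomplete (e.g. a WHERE inside a subquery can fool them).
const DESTRUCTIVE_PATTERNS = [
  { label: 'drop', re: /\bdrop\s+(table|schema|database|column)\b/i },
  { label: 'truncate', re: /\btruncate\b/i },
  // DELETE / UPDATE are only flagged when the statement has no WHERE clause.
  { label: 'delete-without-where', re: /\bdelete\s+from\b/i, unless: /\bwhere\b/i },
  { label: 'update-without-where', re: /\bupdate\s+\S+\s+set\b/i, unless: /\bwhere\b/i }
];

function flagDestructiveSql(sql) {
  const findings = [];
  // Split on semicolons so each statement is checked on its own.
  const statements = sql.split(';').map((s) => s.trim()).filter(Boolean);
  for (const statement of statements) {
    for (const { label, re, unless } of DESTRUCTIVE_PATTERNS) {
      if (re.test(statement) && !(unless && unless.test(statement))) {
        findings.push({ label, statement });
      }
    }
  }
  return findings;
}

const migration = `
  ALTER TABLE users ADD COLUMN nickname text;
  DELETE FROM sessions;
  DELETE FROM tokens WHERE expires_at < now();
  DROP TABLE legacy_orders;
`;
console.log(flagDestructiveSql(migration).map((f) => f.label));
// [ 'delete-without-where', 'drop' ]
```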
@@ -52,10 +52,10 @@ Runtime work needs stronger evidence because long-running processes can create f
  - No release-blocking runtime proof unless the user chooses `$deep` or `$mission`.
  - No full `$mission` claim without real subagent/team evidence; downgrade to `$deep`, or mark mission partial/blocked.
 
- Compared baseline:
+ Design baseline:
 
- - Sneakoscope would collect stronger physical proof and gate completion more aggressively.
- - ECC would keep evidence boundaries and report what is known vs inferred.
- - Karpathy-style minimalism would keep the rule short and obeyable.
+ - Strict proof collects stronger physical proof and gates completion more aggressively.
+ - Modular skill workflows keep evidence boundaries and report what is known vs inferred.
+ - Minimal-core design keeps the rule short and obeyable.
 
  `yam` keeps the guard explicit, cheap, and route-aware.
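The `$mission` guard above can be read as a small decision rule. The sketch below encodes it as a pure function; `capMissionClaim` and its field names are hypothetical, and mapping "subagents explicitly unavailable" to `blocked` (with everything else capped at `partial`) is an assumption, since the text allows either label.

```javascript
// Hypothetical encoding of the $mission guard; field names are illustrative.
function capMissionClaim({ route, subagentEvidence = false, subagentsAvailable, singleAgentProof = false }) {
  if (route !== '$mission') return { route, status: 'unchanged' };
  // A full mission claim needs real subagent/team evidence.
  if (subagentEvidence) return { route: '$mission', status: 'full' };
  // Without team evidence, fall back to an honest single-agent $deep claim when one exists.
  if (singleAgentProof) return { route: '$deep', status: 'downgraded' };
  // Otherwise keep the mission label but cap it (the blocked/partial split is assumed).
  return { route: '$mission', status: subagentsAvailable === false ? 'blocked' : 'partial' };
}

console.log(capMissionClaim({ route: '$mission', singleAgentProof: true }));
// { route: '$deep', status: 'downgraded' }
```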
@@ -14,7 +14,7 @@ Allowed:
  - Remind the agent not to overclaim verification, cleanup, or visual evidence.
  - Suggest `$quick`, `$ueye`, `$question`, `$scout`, `$deep`, or `$mission` based on obvious prompt signals.
  - Mention a project pack or memory summary when present.
- - Warn when `.sneakoscope` is active in the current project.
+ - Warn when conflicting proof-harness surfaces are active in the current project.
 
  Not allowed:
 
@@ -44,12 +44,12 @@ Project hooks write to `<project>/.codex/hooks.json`.
 
  `yam` backs up an existing hook file before enabling the lite hook.
 
- ## Compared Baseline
+ ## Design Baseline
 
- Sneakoscope uses hooks as a broad trust surface with route prep, tool evidence, permission gates, subagent evidence, and stop gates.
+ Broad hook systems often use route prep, tool evidence, permission gates, subagent evidence, and stop gates.
 
- ECC favors selective setup and lower-context workflows.
+ Selective skill systems favor lower-context workflows.
 
- Karpathy-style minimalism would avoid hooks unless the rule is short and changes behavior.
+ Minimal-core systems avoid hooks unless the rule is short and changes behavior.
 
  `yam` keeps this hook advisory-only so beginner momentum is preserved while the agent still receives a direction nudge. Deeper proof belongs in `$deep` and real team execution belongs in `$mission`, not in an always-on prompt hook.
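The lite hook may suggest a route "based on obvious prompt signals", but the diff only shows that `yamLiteRouteHint(prompt)` exists, not how it decides. A keyword-based sketch under that assumption follows; `routeHint`, the patterns, their order, and the choice to stay silent when the user already typed a route name are all illustrative. It returns an advisory string, never a gate.

```javascript
// Hypothetical keyword rules; the real yamLiteRouteHint logic is not shown in this diff.
// Heavier and more specific routes are checked first.
const ROUTE_SIGNALS = [
  { route: '$mission', re: /\b(subagents?|team|mission)\b/i },
  { route: '$deep', re: /\b(tmux|runtime proof|verify thoroughly|deep)\b/i },
  { route: '$ueye', re: /\b(ui|ux|design|screenshot|layout|reference image)\b/i },
  { route: '$scout', re: /\b(investigate|research|compare options|recommend)\b/i },
  { route: '$question', re: /^(what|why|how|is|does|can)\b.*\?\s*$/i },
  { route: '$quick', re: /\b(fix|typo|rename|lint error)\b/i }
];

function routeHint(prompt) {
  // If the user already named a route, stay silent instead of second-guessing it.
  if (/\$(quick|ueye|question|scout|deep|mission)\b/.test(prompt)) return '';
  const hit = ROUTE_SIGNALS.find(({ re }) => re.test(prompt));
  // Advisory only: a suggestion string, or '' when no signal is obvious.
  return hit ? `Suggested route: ${hit.route} (advisory; ignore if it does not fit).` : '';
}

console.log(routeHint('Make the pricing card UI feel more premium'));
// Suggested route: $ueye (advisory; ignore if it does not fit).
```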
@@ -2,21 +2,21 @@
 
  `yam` uses markdown as a small direction layer, not as an automatic control system.
 
- ## Compared Baseline
+ ## Design Baseline
 
- Sneakoscope:
+ Strict proof systems:
 
  - Creates and manages more markdown surfaces for agent control, route instructions, proof, and dashboards.
  - Good for strict verification and anti-fake-work pressure.
  - Risk: too much generated context and too much automatic intervention.
 
- ECC:
+ Modular skill systems:
 
  - Splits markdown into modular instructions, rules, skills, and commands.
  - Good for selective installation and low-context operation.
  - Risk: too many optional files can still become noisy if installed wholesale.
 
- Karpathy-style minimal harness:
+ Minimal-core systems:
 
  - Keeps the core instruction document short and human-readable.
  - Good for speed, obedience, and easy maintenance.
@@ -2,12 +2,12 @@
 
  `yam memory` is an opt-in, project-local memory layer.
 
- It borrows only the lightest useful parts from heavier harnesses:
+ It keeps only the lightest useful parts from heavier harness patterns:
 
- - Sneakoscope TriWiki: sparse records, one file per claim, deliberate forgetting instead of injecting every old claim.
- - Sneakoscope wrongness memory: remember repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
- - ECC research style: separate evidence, inference, and recommendation.
- - Karpathy-style minimalism: keep the mechanism small enough to obey.
+ - Sparse records, one file per durable claim, and deliberate forgetting instead of injecting every old claim.
+ - Wrongness memory for repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
+ - Separate evidence, inference, and recommendation.
+ - Keep the mechanism small enough to obey.
 
  Storage:
 
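The memory bullets describe one record per durable claim and forgetting via resolve. The storage details are not shown here, so the sketch below uses an in-memory `Map` as a stand-in for one file per record; `MemoryStore`, its methods, and the `kind-0001.json` naming are hypothetical.

```javascript
// Hypothetical in-memory model: each durable claim is its own "file" (a Map entry).
// Resolving a record hides it from the summary instead of editing a shared log.
class MemoryStore {
  constructor() {
    this.files = new Map(); // filename -> record
    this.nextId = 1;
  }

  add({ kind, evidence, inference = '', recommendation = '' }) {
    // Evidence, inference, and recommendation stay in separate fields.
    const id = String(this.nextId++).padStart(4, '0');
    const filename = `${kind}-${id}.json`; // illustrative naming scheme only
    this.files.set(filename, { id, kind, evidence, inference, recommendation, resolved: false });
    return filename;
  }

  resolve(filename) {
    const record = this.files.get(filename);
    if (record) record.resolved = true;
    return Boolean(record);
  }

  // Only unresolved records are summarized; resolved ones are deliberately forgotten.
  summary() {
    return [...this.files.values()]
      .filter((r) => !r.resolved)
      .map((r) => `${r.kind}: ${r.evidence}`);
  }
}

const memory = new MemoryStore();
const wrong = memory.add({ kind: 'wrongness', evidence: 'Claimed build passed without running it' });
memory.add({ kind: 'decision', evidence: 'Project uses pnpm, not npm' });
memory.resolve(wrong);
console.log(memory.summary()); // [ 'decision: Project uses pnpm, not npm' ]
```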
@@ -77,10 +77,10 @@ Doctor scan:
  Use `references/doctor-scan.md` before final completion.
  Keep the scan short, but cover direction fit, scope control, verification, runtime/cleanup, truth status, and fix-first items.
 
- Compared baseline:
+ Design baseline:
 
- - Sneakoscope would likely make this a Team route with stronger proof gates and required agent evidence.
- - ECC would split role responsibilities and keep evidence boundaries.
- - Karpathy-style minimalism would avoid adding this unless it clearly replaces a confusing middle route.
+ - Strict proof would likely make this a team route with stronger gates and required agent evidence.
+ - Modular skill workflows split role responsibilities and keep evidence boundaries.
+ - Minimal-core design avoids adding this unless it clearly replaces a confusing middle route.
 
  `yam` uses mission to replace the old standalone runtime route with a clearer heavy execution route.
@@ -2,15 +2,15 @@
 
  `quick` is the merged small-work route: fast patching, ordinary scoped implementation, and fast error scanning.
 
- ## Borrowed, With Weight Removed
+ ## Selected Principles
 
- From Sneakoscope:
+ Strict proof:
 
  - Honest completion language.
  - Real versus assumed verification.
  - Stop instead of claiming success when evidence is missing.
 
- From ECC:
+ Focused execution:
 
  - Detect the smallest useful command.
  - Group build/type/lint/test errors by file and root cause.
@@ -18,7 +18,7 @@ From ECC:
  - Re-run the same focused command after a fix.
  - Use a compact PASS/FAIL matrix.
 
- From Karpathy-style minimal harness:
+ Minimal core:
 
  - Keep the instruction short enough to obey.
  - Read the smallest useful context.
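The focused-execution bullets ask for errors grouped by file and root cause, plus a compact PASS/FAIL matrix. A sketch for TypeScript compiler output follows, assuming the non-pretty `path(line,col): error TSxxxx: message` format that `tsc` prints when output is not a terminal; the error code is used as a rough proxy for root cause, which will not always hold. `groupErrors` and `passFailRow` are hypothetical names.

```javascript
// Sketch: group compiler output by file and by error code (a rough proxy for root cause).
// Assumes TypeScript's non-pretty format: path(line,col): error TSxxxx: message
const TSC_LINE = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

function groupErrors(output) {
  const groups = new Map(); // file -> Map(code -> count)
  for (const line of output.split('\n')) {
    const m = TSC_LINE.exec(line.trim());
    if (!m) continue;
    const [, file, , , code] = m;
    if (!groups.has(file)) groups.set(file, new Map());
    const byCode = groups.get(file);
    byCode.set(code, (byCode.get(code) ?? 0) + 1);
  }
  return groups;
}

// One compact PASS/FAIL row per command.
function passFailRow(command, output) {
  const groups = groupErrors(output);
  if (groups.size === 0) return `PASS  ${command}`;
  const detail = [...groups].map(([file, codes]) =>
    `${file} (${[...codes].map(([code, n]) => `${code}x${n}`).join(', ')})`);
  return `FAIL  ${command}: ${detail.join('; ')}`;
}

const sample = [
  "src/card.ts(3,5): error TS2322: Type 'string' is not assignable to type 'number'.",
  "src/card.ts(9,1): error TS2322: Type 'string' is not assignable to type 'number'.",
  "src/api.ts(12,7): error TS2304: Cannot find name 'fetchPlan'."
].join('\n');

console.log(passFailRow('tsc --noEmit', sample));
// FAIL  tsc --noEmit: src/card.ts (TS2322x2); src/api.ts (TS2304x1)
console.log(passFailRow('tsc --noEmit', ''));
// PASS  tsc --noEmit
```

Re-running the same command after a fix and comparing the two rows gives the before/after check the route asks for.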
@@ -33,12 +33,12 @@ yam measure ueye --files 7 --commands 2 --report-lines 18 --seconds 260
  - `$deep`: can exceed ordinary budgets, but the reason must be risk-tied; single-agent runtime/tmux/browser checks belong here when verification needs them.
  - `$mission`: can spend more context on real subagent/team lanes, cross-verification, doctor scan, and runtime evidence, but only for approved plans where real subagents are used or explicitly unavailable/partial.
 
- ## Compared Baseline
+ ## Design Baseline
 
- Sneakoscope would favor stronger automatic evidence collection.
+ Strict proof would favor stronger automatic evidence collection.
 
- ECC would favor selective, low-context reporting.
+ Modular skill workflows favor selective, low-context reporting.
 
- Karpathy-style minimal harness would remove the measurement unless it changes behavior.
+ Minimal-core design removes the measurement unless it changes behavior.
 
  `yam` keeps manual measurement because it helps reduce over-reading without installing hooks.
@@ -49,7 +49,7 @@ Default:
  Advisory:
 
  - `yam-lite` hook may suggest routes and warn about overclaiming.
- - `yam pack` may warn about stale project direction, command drift, active hooks, or Sneakoscope surfaces.
+ - `yam pack` may warn about stale project direction, command drift, active hooks, or legacy proof surfaces.
 
  On demand:
 
@@ -60,7 +60,7 @@ On demand:
  - `yam tools doctor`: inspect tool readiness without changing project state.
  - `yam proof`: summarize actual evidence without running verification.
 
- ## Borrow From Sneakoscope
+ ## Strict Proof Inputs
 
  - Tool readiness checks.
  - Hook status and trust reporting.
@@ -71,14 +71,14 @@ On demand:
  - Destructive DB/Supabase command detection and production-write caution.
  - Feature/release inventory as an optional doctor, not a default gate.
 
- ## Borrow From ECC
+ ## Modular Skill Inputs
 
  - Selective install and profiles.
  - Evidence boundaries.
  - Low-context command detection.
  - Optional orchestration instead of always-on orchestration.
 
- ## Borrow From Open Design
+ ## Design Quality Inputs
 
  - Real preview/screenshot evidence.
  - Compact design direction.
@@ -0,0 +1,109 @@
+ # Ueye Proof
+
+ Ueye proof keeps visual claims honest without turning design review into a heavy release gate.
+
+ ## When To Use
+
+ Use Ueye proof artifacts when:
+
+ - The user asks for UI/UX/design review.
+ - A reference image, screenshot, or target design is part of the task.
+ - The final answer needs to claim visual quality, visual parity, or responsive correctness.
+ - A screenshot or browser check exists and should be tied to the conclusion.
+
+ Skip or compress them when:
+
+ - The change is text-only or documentation-only.
+ - The user asks for a quick opinion and no visual source exists.
+ - The only possible evidence is code reading; mark the result `assumed` or `partial`.
+
+ ## Artifact Set
+
+ ### Visual Evidence Inventory
+
+ Purpose: identify what visual sources were inspected.
+
+ Minimum useful fields:
+
+ - Label:
+ - Type:
+ - Path/URL:
+ - Dimensions:
+ - sha256:
+ - Viewport:
+ - State:
+ - Role:
+
+ Bounds:
+
+ - Default to 1-3 primary visual sources.
+ - Cap ordinary inventories at 5 rows.
+ - Record missing dimensions or hashes as `unknown`, not as failure.
+ - Use generated annotations only as derivative aids.
+
+ ### Reference Read Proof
+
+ Purpose: separate reading the reference from judging the implementation.
+
+ Capture:
+
+ - Layout:
+ - Hierarchy:
+ - Typography:
+ - Color/contrast:
+ - Component detail:
+ - Interaction/motion:
+ - Responsiveness:
+ - Brand/mood:
+
+ Keep this concise. It should explain what the reference asks for, not become a full design essay.
+
+ ### Reference vs Implementation Matrix
+
+ Purpose: compare reference direction against the actual implementation evidence.
+
+ Status values:
+
+ - `matched`
+ - `similar`
+ - `different`
+ - `not-verified`
+ - `not-applicable`
+
+ Rows to use only when relevant:
+
+ - Layout and spacing.
+ - Visual hierarchy.
+ - Typography.
+ - Color and contrast.
+ - Component detail.
+ - Interaction and motion.
+ - Responsiveness.
+ - Accessibility-relevant visuals.
+ - Brand and mood fit.
+
+ ### Design Quality Review
+
+ Purpose: judge the implemented UI on design quality after evidence is established.
+
+ Dimensions:
+
+ - Visual hierarchy.
+ - Layout and spacing.
+ - Typography.
+ - Color and contrast.
+ - Component detail.
+ - Interaction and motion.
+ - Responsiveness.
+ - Accessibility.
+ - Brand and mood fit.
+
+ Report findings as P0-P3. Keep P2/P3 short unless the user requested a full polish pass.
+ For each relevant design dimension, use `pass`, `needs-polish`, or `fails`.
+
+ ## Truth Caps
+
+ - Full visual verification requires real implementation evidence and relevant recheck.
+ - Reference-only evidence can support direction, not implementation proof.
+ - Generated-only evidence can support ideation or annotation, not implemented-screen verification.
+ - Missing screenshots, unavailable browser, or text-only review should cap the result at `partial`, `blocked`, or `assumed`.
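The truth caps above can be expressed as a function of the evidence inventory. In this sketch only `implementation screenshot` and `browser URL` sources count as implementation evidence, and the mapping of missing evidence to `blocked`, `assumed`, or `partial` is an assumption; the document permits any of the three. `visualTruthCap` and the `verified` label are hypothetical.

```javascript
// Sketch of the Ueye truth caps; not yam's implementation.
// Assumption: only these source types count as evidence of the implemented screen.
const IMPLEMENTATION_TYPES = new Set(['implementation screenshot', 'browser URL']);

function visualTruthCap({ sources = [], rechecked = false, browserUnavailable = false } = {}) {
  const hasImplementation = sources.some((s) => IMPLEMENTATION_TYPES.has(s.type));
  // Full verification needs real implementation evidence plus a relevant recheck.
  if (hasImplementation && rechecked) return 'verified';
  // Assumed mapping: no way to capture the implemented screen at all.
  if (browserUnavailable && !hasImplementation) return 'blocked';
  // Assumed mapping: code reading or text-only review, with no visual source.
  if (sources.length === 0) return 'assumed';
  // Reference-only, generated-only, or un-rechecked evidence supports direction, not proof.
  return 'partial';
}

console.log(visualTruthCap({ sources: [{ type: 'reference image' }] })); // partial
console.log(visualTruthCap({ sources: [{ type: 'implementation screenshot' }], rechecked: true })); // verified
console.log(visualTruthCap({})); // assumed
```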
@@ -2,9 +2,9 @@
 
  `ueye` is the merged UI/design route: design-heavy implementation, screenshot-led UX review, and visual QA.
 
- ## Borrowed, With Weight Removed
+ ## Selected Principles
 
- From Sneakoscope image UX review:
+ Visual proof:
 
  - Source-screen inventory before visual claims.
  - P0-P3 issue ledger.
@@ -12,21 +12,21 @@ From Sneakoscope image UX review:
  - Recheck changed or high-risk screens after fixes when feasible.
  - Cap text-only or missing-screenshot reviews as partial instead of fully verified.
 
- Kept out from Sneakoscope by design:
+ Kept out by design:
 
  - Mandatory generated annotated images.
  - Image voxel ledgers.
  - Release gates for every UI change.
  - Always-on proof loops.
 
- From Open Design:
+ Design quality:
 
  - Real examples and previews matter more than abstract prose.
  - Design direction should be compact and searchable.
  - P0 gates should reject placeholder visuals, generic UI, and broken responsive states.
  - UI work should be self-contained enough to inspect.
 
- From ECC:
+ Evidence boundaries:
 
  - Separate evidence from judgment.
  - Keep review output compact.
@@ -57,6 +57,94 @@ Bound:
  - Generated callout images are optional and should usually be at most one per review pass.
  - P0/P1 issues can expand the review; P2/P3 should stay top-few unless the user asks for a full polish pass.
 
+ ## Ueye Proof Artifacts
+
+ Use these when the task depends on visual truth, reference matching, or design quality judgment. They are proof aids, not a separate lite/deep split.
+
+ 1. Visual Evidence Inventory.
+ 2. Reference Read Proof.
+ 3. Reference vs Implementation Matrix.
+ 4. Design Quality Review.
+
+ Default bound:
+
+ - Use only the artifacts that support the claim being made.
+ - Keep evidence to the smallest set of screenshots, references, or URLs that can honestly support the result.
+ - Prefer paths, URLs, dimensions, and hashes for images when available.
+ - Do not require voxel grids, exhaustive callouts, or generated annotations.
+ - Do not let proof artifacts turn Ueye into always-on heavy orchestration.
+
+ ### Visual Evidence Inventory
+
+ Record the real screens and image sources behind the review.
+
+ Include when known:
+
+ - Label.
+ - Type: implementation screenshot, browser URL, user screenshot, reference image, artifact export, generated annotation.
+ - Path or URL.
+ - Dimensions.
+ - sha256.
+ - Viewport or device.
+ - State: default, loading, error, empty, disabled, hover/focus, mobile, or unknown.
+ - Role: proof, reference direction, annotation, or partial evidence.
+
+ Images without hashes or dimensions can still be useful, but mark the missing fields plainly.
+
+ ### Reference Read Proof
+
+ Before judging a reference match, state what was actually read from the reference.
+
+ Keep it visual and bounded:
+
+ - Layout structure.
+ - Hierarchy and emphasis.
+ - Typography feel.
+ - Color and contrast.
+ - Component shapes and details.
+ - Interaction or motion cues when visible.
+ - Responsive implication if the reference includes multiple sizes.
+ - Brand or mood fit.
+
+ Reference read proof describes the direction. It is not proof that the implementation matches.
+
111
+ ### Reference vs Implementation Matrix
112
+
113
+ Use when a user supplies, names, or implies a reference visual.
114
+
115
+ Compare only meaningful dimensions:
116
+
117
+ - Layout and spacing.
118
+ - Visual hierarchy.
119
+ - Typography.
120
+ - Color and contrast.
121
+ - Component detail.
122
+ - Interaction and motion.
123
+ - Responsiveness.
124
+ - Accessibility-relevant visual behavior.
125
+ - Brand or mood fit.
126
+
127
+ For each row, record `matched`, `similar`, `different`, `not-verified`, or `not-applicable`, plus the smallest evidence note.
128
+
129
+ ### Design Quality Review
130
+
131
+ Use as the judgment layer once proof evidence has been separated from reference and derivative evidence and the reference comparison is complete.
132
+
133
+ Review dimensions:
134
+
135
+ - Visual hierarchy.
136
+ - Layout and spacing.
137
+ - Typography.
138
+ - Color and contrast.
139
+ - Component detail.
140
+ - Interaction and motion.
141
+ - Responsiveness.
142
+ - Accessibility.
143
+ - Brand and mood fit.
144
+
145
+ For each relevant dimension, record `pass`, `needs-polish`, or `fails`.
146
+ Keep actionable findings in P0-P3 order. Prefer fixing P0/P1 and cheap local P2 issues before broad polish.
147
+
60
148
  ## P0-P3 Ledger
61
149
 
62
150
  - P0: blocker, unusable, impossible to complete primary workflow, severe accessibility or responsive failure.
@@ -67,12 +155,15 @@ Bound:
67
155
  ## Implementation Loop
68
156
 
69
157
  1. Direction fit.
70
- 2. Source-screen inventory.
71
- 3. Nearby pattern and token scan.
72
- 4. Implementation.
73
- 5. Screenshot/browser recheck when feasible.
74
- 6. P0/P1 closeout.
75
- 7. Truth status.
158
+ 2. Visual Evidence Inventory.
159
+ 3. Reference Read Proof when a reference is used.
160
+ 4. Nearby pattern and token scan.
161
+ 5. Implementation.
162
+ 6. Screenshot/browser recheck when feasible.
163
+ 7. Reference vs Implementation Matrix when reference fidelity matters.
164
+ 8. Design Quality Review.
165
+ 9. P0/P1 closeout.
166
+ 10. Truth status.
76
167
 
77
168
  ## Truth Caps
78
169
 
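The Visual Evidence Inventory asks for sha256 and dimensions when an image file is locally available, and for missing fields to be marked plainly. A minimal Python sketch of filling one inventory row, assuming PNG screenshots: dimensions are read from the PNG IHDR header, and any other format (or a missing file) keeps `unknown`. `EvidenceRow`, `png_dimensions`, and `inventory_row` are hypothetical helpers for illustration, not part of yam.

```python
import hashlib
import struct
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass
class EvidenceRow:
    # Hypothetical row type; fields follow the Visual Evidence Inventory list.
    label: str
    kind: str  # e.g. "implementation screenshot", "reference image"
    path_or_url: str
    dimensions: str = "unknown"
    sha256: str = "unknown"
    viewport: str = "unknown"
    state: str = "unknown"
    role: str = "partial evidence"


def png_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a PNG IHDR chunk, or None for non-PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", 4-byte width, 4-byte height.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def inventory_row(label: str, kind: str, path: Path, *, state: str, viewport: str, role: str) -> EvidenceRow:
    """Build one inventory row; fields that cannot be read stay 'unknown'."""
    row = EvidenceRow(label, kind, str(path), viewport=viewport, state=state, role=role)
    if not path.is_file():
        return row  # mark missing fields plainly instead of guessing them
    data = path.read_bytes()
    row.sha256 = hashlib.sha256(data).hexdigest()
    dims = png_dimensions(data)
    if dims is not None:
        row.dimensions = f"{dims[0]}x{dims[1]}"
    return row
```

Hashing the raw file bytes means two identical screenshots get the same sha256, which makes a reused "before" image easy to spot in the inventory.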
@@ -9,6 +9,7 @@ Use for:
9
9
 
10
10
  - Design-heavy UI/UX implementation.
11
11
  - Reference-image-based UI direction.
12
+ - Reference-image-based visual fidelity work where the result must be compared against the reference.
12
13
  - Screenshot, URL, or current-screen UX review.
13
14
  - Pre-fix and post-fix visual QA.
14
15
  - UI states, responsive behavior, hierarchy, CTA, contrast, alignment, spacing, and affordance.
@@ -23,6 +24,8 @@ Do not use for:
23
24
 
24
25
  - Direction before execution.
25
26
  - Visual evidence before visual claims.
27
+ - Ueye is the tight UX/UI/design route, not Quick with screenshots.
28
+ - Reference-based work requires reference read proof before implementation and reference comparison after implementation.
26
29
  - Context-reuse first.
27
30
  - Token economy is part of quality.
28
31
  - Product fit beats decoration.
@@ -31,6 +34,7 @@ Do not use for:
31
34
  - Text-only visual critique cannot be reported as fully verified when screenshot evidence was required.
32
35
  - Generated annotated images are optional, not a default gate.
33
36
  - Image evidence should stay bounded: inspect the primary screen first, then only the states/images needed to support the claim.
37
+ - Design quality judgment belongs after implementation/review: compare to the reference first, then judge whether the result is good design.
34
38
 
35
39
  ## Workflow
36
40
 
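The bounded-evidence rule (inspect the primary screen first, then only the states needed for the claim, 1-3 images by default) can be sketched as a small selection function. This is a hypothetical Python illustration: it assumes `images[0]` is the primary screen and that each image carries a `state` string from the inventory; the expansion triggers mirror the P0/P1 risk, responsive breakage, and deep visual QA exceptions in the Visual Truth Caps.

```python
from typing import Dict, List, Set

DEFAULT_MAX_IMAGES = 3  # "Inspect 1-3 primary images by default"


def images_to_inspect(
    images: List[Dict[str, str]],
    needed_states: Set[str],
    *,
    p0_p1_risk: bool = False,
    responsive_breakage: bool = False,
    deep_qa: bool = False,
) -> List[Dict[str, str]]:
    """Primary screen first, then only images whose state supports the claim."""
    if not images:
        return []
    primary, rest = images[0], images[1:]  # assumption: images[0] is the primary screen
    picked = [primary] + [img for img in rest if img["state"] in needed_states]
    if p0_p1_risk or responsive_breakage or deep_qa:
        return picked  # expand past the default bound only for these triggers
    return picked[:DEFAULT_MAX_IMAGES]
```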
@@ -41,17 +45,53 @@ Do not use for:
41
45
  - local/browser screenshot
42
46
  - exported static artifact image
43
47
  - URL/current screen, when accessible
44
- 4. Inspect nearby UI patterns, tokens, styles, and state handling.
45
- 5. Implement the smallest coherent design improvement when implementation is requested.
46
- 6. Check default, loading, error, empty, disabled, hover/focus, and mobile states when relevant.
47
- 7. Run browser/screenshot verification when feasible.
48
- 8. Produce a P0-P3 visual issue ledger and fix path.
49
- 9. Recheck changed/high-risk screens after fixes when feasible.
48
+ 4. When a reference image/screen is used, produce Reference Read Proof before changing UI:
49
+ - 5-8 concrete observations about layout, spacing, typography, color, hierarchy, component shape, interaction/motion, and brand mood.
50
+ - Mark any ambiguous or unobservable detail instead of inventing it.
51
+ 5. Inspect nearby UI patterns, tokens, styles, and state handling.
52
+ 6. Implement the smallest coherent design improvement when implementation is requested.
53
+ 7. Check default, loading, error, empty, disabled, hover/focus, and mobile states when relevant.
54
+ 8. Run browser/screenshot verification when feasible.
55
+ 9. Compare reference and implementation when reference fidelity was requested:
56
+ - matched, similar, different, not-applicable, or not-verified.
57
+ - Separate "faithful to reference" from "good design."
58
+ 10. Run Design Quality Review after implementation/review:
59
+ - visual hierarchy
60
+ - layout and spacing
61
+ - typography
62
+ - color and contrast
63
+ - component detail
64
+ - interaction and motion
65
+ - responsiveness
66
+ - accessibility
67
+ - brand or mood fit
68
+ - mark each relevant dimension as pass, needs-polish, or fails.
69
+ 11. Produce a P0-P3 visual issue ledger and fix path.
70
+ 12. Recheck changed/high-risk screens after fixes when feasible.
71
+
72
+ ## Required Ueye Artifacts
73
+
74
+ Use these artifacts for reference-led implementation or serious visual review. Keep them compact for small, single-screen changes, but do not omit them when the claim depends on reference fidelity.
75
+
76
+ 1. Visual Evidence Inventory:
77
+ - reference, before, and after evidence when available.
78
+ - path or URL, source type, state, viewport, and visual verification cap.
79
+ - sha256 and dimensions when an image file is locally available.
80
+ 2. Reference Read Proof:
81
+ - concrete observations from the reference before implementation.
82
+ - no implementation claim should depend on an unrecorded reference observation.
83
+ 3. Reference vs Implementation Matrix:
84
+ - compare the implemented result against the reference by aspect.
85
+ - record matched, similar, different, not-applicable, or not-verified.
86
+ 4. Design Quality Review:
87
+ - judge whether the result is good UI/UX/design after reference comparison.
88
+ - use pass, needs-polish, or fails for each relevant design dimension.
50
89
 
51
90
  ## Visual Truth Caps
52
91
 
53
92
  - Full visual verification requires real source-screen evidence.
54
93
  - Reference images guide direction; they do not prove the implemented screen unless compared with real source-screen evidence.
94
+ - Reference fidelity claims require both Reference Read Proof and Reference vs Implementation Matrix.
55
95
  - Generated or annotated images are derivative evidence; they cannot upgrade missing real screen evidence to `verified`.
56
96
  - Inspect 1-3 primary images by default; expand only for P0/P1 risk, responsive breakage, or user-requested deep visual QA.
57
97
  - Keep source-screen inventory to the 5 most important rows by default.
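The Visual Truth Caps above can be read as a small decision rule. The Python sketch below is one hypothetical encoding: it treats a row as real screen evidence only when its role is `proof` and its type is neither `reference image` nor `generated annotation`, and its mapping onto `verified` / `partial` / `assumed` is an assumption for illustration, not yam's own logic.

```python
from typing import Dict, List

DERIVATIVE_TYPES = {"reference image", "generated annotation"}


def visual_cap(
    rows: List[Dict[str, str]],
    *,
    claims_reference_fidelity: bool = False,
    has_read_proof: bool = False,
    has_matrix: bool = False,
) -> str:
    """Hypothetical mapping from inventory rows to a visual verification cap."""
    if not rows:
        return "assumed"  # text-only critique: no image evidence at all
    real = [r for r in rows if r["role"] == "proof" and r["type"] not in DERIVATIVE_TYPES]
    if not real:
        # Reference or generated images cannot upgrade missing real screens to verified.
        return "partial"
    if claims_reference_fidelity and not (has_read_proof and has_matrix):
        # Fidelity claims need both Reference Read Proof and the comparison matrix.
        return "partial"
    return "verified"
```

Checking the type as well as the role means a generated annotation mislabeled as `proof` still cannot lift the cap.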
@@ -84,6 +124,8 @@ Report:
84
124
 
85
125
  - What changed visually or what was reviewed.
86
126
  - Source evidence used.
127
+ - Reference Read Proof and reference comparison result when a reference was used.
128
+ - Design Quality Review result when implementation or serious visual review was requested.
87
129
  - P0-P3 issues or confirmation that no blockers were found.
88
130
  - States/viewports checked.
89
131
  - Truth status and visual verification cap.
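Reporting a Reference Read Proof presumes the 5-8 concrete observations required before implementation, with ambiguous or unobservable details marked rather than invented. A hypothetical Python check, assuming one note per reference-comparison template aspect (eight aspects, so the count can never exceed eight) and an illustrative `not observable in the reference` marker for gaps:

```python
from typing import Dict, List, Optional

READ_PROOF_ASPECTS = (
    "Layout", "Visual hierarchy", "Typography", "Color/contrast",
    "Component detail", "Interaction/motion", "Responsiveness", "Brand/mood fit",
)
UNOBSERVABLE = "not observable in the reference"  # hypothetical marker text


def read_proof(observations: Dict[str, Optional[str]]) -> List[str]:
    """Render Reference Read Proof lines; reject unknown aspects or too few observations."""
    unknown = set(observations) - set(READ_PROOF_ASPECTS)
    if unknown:
        raise ValueError(f"unknown aspects: {sorted(unknown)}")
    lines, concrete = [], 0
    for aspect in READ_PROOF_ASPECTS:
        note = (observations.get(aspect) or "").strip()
        if note:
            concrete += 1
        else:
            note = UNOBSERVABLE  # mark the gap instead of inventing a detail
        lines.append(f"- {aspect}: {note}")
    # One note per aspect caps the count at 8, so only the lower bound needs a check.
    if concrete < 5:
        raise ValueError(f"expected 5-8 concrete observations, got {concrete}")
    return lines
```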
@@ -33,7 +33,7 @@ Use this to tune route wording from real use.
33
33
 
34
34
  ## Compared Against
35
35
 
36
- - Sneakoscope:
37
- - ECC:
38
- - Karpathy:
36
+ - Strict proof:
37
+ - Modular skills:
38
+ - Minimal core:
39
39
  - yam decision:
@@ -0,0 +1,78 @@
1
+ # yam Ueye Reference Comparison
2
+
3
+ ## Input
4
+
5
+ - Implementation screenshot/URL:
6
+ - Reference screenshot/URL/path:
7
+ - Product or screen:
8
+ - Review bound:
9
+
10
+ ## Visual Evidence Inventory
11
+
12
+ | Label | Type | Path/URL | Dimensions | sha256 | Viewport | State | Role |
13
+ | --- | --- | --- | --- | --- | --- | --- | --- |
14
+ | implementation | | | unknown | unknown | | | proof |
15
+ | reference | | | unknown | unknown | | | reference direction |
16
+
17
+ ## Reference Read Proof
18
+
19
+ - Layout:
20
+ - Visual hierarchy:
21
+ - Typography:
22
+ - Color/contrast:
23
+ - Component detail:
24
+ - Interaction/motion:
25
+ - Responsiveness:
26
+ - Brand/mood fit:
27
+
28
+ ## Reference vs Implementation Matrix
29
+
30
+ Use `matched`, `similar`, `different`, `not-verified`, or `not-applicable`.
31
+
32
+ | Dimension | Status | Evidence | Suggested Fix |
33
+ | --- | --- | --- | --- |
34
+ | Layout and spacing | not-verified | | |
35
+ | Visual hierarchy | not-verified | | |
36
+ | Typography | not-verified | | |
37
+ | Color and contrast | not-verified | | |
38
+ | Component detail | not-verified | | |
39
+ | Interaction and motion | not-verified | | |
40
+ | Responsiveness | not-verified | | |
41
+ | Accessibility visuals | not-verified | | |
42
+ | Brand and mood fit | not-verified | | |
43
+
44
+ ## Design Quality Review
45
+
46
+ Use `pass`, `needs-polish`, or `fails` for each relevant design dimension before listing actionable issues.
47
+
48
+ | Dimension | Result | Evidence |
49
+ | --- | --- | --- |
50
+ | Visual hierarchy | needs-polish | |
51
+ | Layout/spacing | needs-polish | |
52
+ | Typography | needs-polish | |
53
+ | Color/contrast | needs-polish | |
54
+ | Component detail | needs-polish | |
55
+ | Interaction/motion | needs-polish | |
56
+ | Responsiveness | needs-polish | |
57
+ | Accessibility | needs-polish | |
58
+ | Brand/mood fit | needs-polish | |
59
+
60
+ ### P0 Blockers
61
+
62
+ - None.
63
+
64
+ ### P1 Major
65
+
66
+ - None.
67
+
68
+ ### P2 Quality
69
+
70
+ - None.
71
+
72
+ ### P3 Polish
73
+
74
+ - None.
75
+
76
+ ## Truth Status
77
+
78
+ - verified / partial / skipped / blocked / assumed:
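The Reference vs Implementation Matrix in this template is a fixed-row markdown table with a closed status vocabulary. A minimal Python sketch that renders it and rejects unknown statuses; absent rows keep the template default `not-verified`. Requiring an evidence note for `matched`, `similar`, or `different` is an assumption drawn from the "smallest evidence note" guidance, and `render_matrix` is a hypothetical helper.

```python
from typing import Dict, Tuple

MATRIX_DIMENSIONS = (
    "Layout and spacing", "Visual hierarchy", "Typography", "Color and contrast",
    "Component detail", "Interaction and motion", "Responsiveness",
    "Accessibility visuals", "Brand and mood fit",
)
STATUSES = {"matched", "similar", "different", "not-verified", "not-applicable"}
NEEDS_EVIDENCE = {"matched", "similar", "different"}  # assumption: a verdict must cite evidence


def render_matrix(rows: Dict[str, Tuple[str, str, str]]) -> str:
    """rows: dimension -> (status, evidence, suggested_fix); absent rows stay not-verified."""
    unknown = set(rows) - set(MATRIX_DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    out = ["| Dimension | Status | Evidence | Suggested Fix |", "| --- | --- | --- | --- |"]
    for dim in MATRIX_DIMENSIONS:
        status, evidence, fix = rows.get(dim, ("not-verified", "", ""))
        if status not in STATUSES:
            raise ValueError(f"{dim}: invalid status {status!r}")
        if status in NEEDS_EVIDENCE and not evidence.strip():
            raise ValueError(f"{dim}: '{status}' needs an evidence note")
        out.append(f"| {dim} | {status} | {evidence} | {fix} |")
    return "\n".join(out)
```

Defaulting absent rows to `not-verified` keeps the table honest: a dimension nobody compared can never silently read as a match.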
@@ -12,14 +12,61 @@
12
12
  - Fits project direction:
13
13
  - Mismatch:
14
14
 
15
- ## Source-Screen Inventory
15
+ ## Visual Evidence Inventory
16
16
 
17
17
  - Evidence bound: 1-3 primary images, max 5 inventory rows by default.
18
- - Source type:
19
- - State:
18
+ - Label:
19
+ - Type:
20
+ - Path/URL:
21
+ - Dimensions:
22
+ - sha256:
20
23
  - Viewport:
24
+ - State:
25
+ - Role: proof / reference direction / annotation / partial evidence
21
26
  - Visual verification cap:
22
- - Reference/generated image used only as direction or annotation:
27
+
28
+ ## Reference Read Proof
29
+
30
+ - Layout:
31
+ - Visual hierarchy:
32
+ - Typography:
33
+ - Color/contrast:
34
+ - Component detail:
35
+ - Interaction/motion:
36
+ - Responsiveness:
37
+ - Brand/mood fit:
38
+
39
+ ## Reference vs Implementation Matrix
40
+
41
+ | Dimension | Status | Evidence |
42
+ | --- | --- | --- |
43
+ | Layout and spacing | not-verified | |
44
+ | Visual hierarchy | not-verified | |
45
+ | Typography | not-verified | |
46
+ | Color and contrast | not-verified | |
47
+ | Component detail | not-verified | |
48
+ | Interaction and motion | not-verified | |
49
+ | Responsiveness | not-verified | |
50
+ | Accessibility visuals | not-verified | |
51
+ | Brand and mood fit | not-verified | |
52
+
53
+ Status values: matched / similar / different / not-verified / not-applicable.
54
+
55
+ ## Design Quality Review
56
+
57
+ | Dimension | Result | Evidence |
58
+ | --- | --- | --- |
59
+ | Visual hierarchy | needs-polish | |
60
+ | Layout/spacing | needs-polish | |
61
+ | Typography | needs-polish | |
62
+ | Color/contrast | needs-polish | |
63
+ | Component detail | needs-polish | |
64
+ | Interaction/motion | needs-polish | |
65
+ | Responsiveness | needs-polish | |
66
+ | Accessibility | needs-polish | |
67
+ | Brand/mood fit | needs-polish | |
68
+
69
+ Result values: pass / needs-polish / fails.
23
70
 
24
71
  ## P0-P3 Issues
25
72
 
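The skill text asks to keep actionable findings in P0-P3 order and to prefer fixing P0/P1 and cheap local P2 issues before broad polish. A hypothetical Python sketch of that split for the P0-P3 Issues section, where `cheap_local` is an assumed flag the reviewer sets by hand rather than anything yam computes:

```python
from dataclasses import dataclass
from typing import List, Tuple

PRIORITIES = ("P0", "P1", "P2", "P3")


@dataclass
class Finding:
    priority: str  # "P0".."P3"
    dimension: str  # Design Quality Review dimension, e.g. "Typography"
    note: str
    cheap_local: bool = False  # hypothetical flag: small fix contained to one component


def plan_fixes(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (fix_now, later), each kept in P0-P3 order."""
    for f in findings:
        if f.priority not in PRIORITIES:
            raise ValueError(f"unknown priority {f.priority!r}")
    # sorted() is stable, so findings with the same priority keep their input order.
    ordered = sorted(findings, key=lambda f: PRIORITIES.index(f.priority))
    fix_now: List[Finding] = []
    later: List[Finding] = []
    for f in ordered:
        urgent = f.priority in ("P0", "P1") or (f.priority == "P2" and f.cheap_local)
        (fix_now if urgent else later).append(f)
    return fix_now, later
```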
@@ -41,14 +88,15 @@
41
88
 
42
89
  ## Checks
43
90
 
44
- - Hierarchy:
45
- - CTA:
46
- - Spacing:
47
- - Alignment:
48
- - Contrast:
49
- - Density:
50
- - Text fit:
51
- - Mobile:
91
+ - Visual hierarchy:
92
+ - Layout/spacing:
93
+ - Typography:
94
+ - Color/contrast:
95
+ - Component detail:
96
+ - Interaction/motion:
97
+ - Responsiveness:
98
+ - Accessibility:
99
+ - Brand/mood fit:
52
100
  - Empty/loading/error/disabled/hover/focus states:
53
101
 
54
102
  ## Safe Fix Path
package/yam.manifest.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "yam",
3
- "version": "0.1.1",
3
+ "version": "0.1.3",
4
4
  "principles": [
5
5
  "Direction before execution.",
6
6
  "Start fast.",
@@ -22,7 +22,7 @@
22
22
  {
23
23
  "id": "ueye",
24
24
  "stage": "v0.3",
25
- "purpose": "Design-heavy UI/UX implementation and screenshot-led visual review with P0-P3 issue tracking."
25
+ "purpose": "Tight UI/UX/design implementation and visual review with reference-read proof, visual comparison, design quality review, and P0-P3 issue tracking."
26
26
  },
27
27
  {
28
28
  "id": "question",