npm - yam-harness - Versions diffs - 0.1.1 → 0.1.3 - Mend

yam-harness 0.1.1 → 0.1.3

Files changed (23)

package/COMMANDS.md +6 -0
package/DECISIONS.md +10 -10
package/README.md +2 -1
package/ROADMAP.md +20 -16
package/bin/yam.js +9 -11
package/package.json +1 -1
package/references/current-docs.md +4 -4
package/references/db-supabase-safety-lite.md +4 -4
package/references/honest-completion.md +4 -4
package/references/hook-lite.md +5 -5
package/references/markdown-management.md +4 -4
package/references/memory.md +5 -5
package/references/mission.md +4 -4
package/references/quick.md +4 -4
package/references/token-budget-reporter.md +4 -4
package/references/tool-trust-layer.md +4 -4
package/references/ueye-proof.md +109 -0
package/references/ueye.md +102 -11
package/skills/ueye/SKILL.md +48 -6
package/templates/tuning-log.md +3 -3
package/templates/ueye-comparison.md +78 -0
package/templates/ueye-review.md +60 -12
package/yam.manifest.json +2 -2

package/COMMANDS.md CHANGED

@@ -22,6 +22,8 @@ $quick
 ```text
 $ueye
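 Referring to the mood of the attached reference image, improve the pricing card UI so it feels more premium.
+Before implementing, briefly prove the design traits you read from the reference,
+and after implementing, compare the reference with the result and summarize the similarities and differences.
 Implement it in line with this project's design direction,
 and, where possible, check the default/mobile/error states against the actual screens.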
 ```
@@ -33,6 +35,10 @@ $ueye
 Suggest the safest feasible fix.
 ```
+```text
+yam template ueye-comparison
+```
 ## Review-Only
 ```text

package/DECISIONS.md CHANGED

@@ -1,17 +1,17 @@
 # yam Decision Baseline
-Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style minimal harness principles.
+Every `yam` change is evaluated against strict proof, modular skill, and minimal-core harness principles.
 ## Fixed Questions
-1. What would Sneakoscope verify?
-2. What would ECC make selective or low-context?
-3. What would Karpathy remove to keep the core obeyable?
+1. What needs concrete evidence before completion?
+2. What should stay selective or low-context?
+3. What can be removed to keep the core obeyable?
 4. What should `yam` keep light by default, and what should deepen deliberately?
 ## Borrow
-### Sneakoscope
+### Strict Proof
 - Truthful completion language.
 - Risk escalation.
@@ -19,7 +19,7 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
 - Fake versus real distinction.
 - Runtime/process proof only when explicitly requested.
-### ECC
+### Modular Skills
 - Skills-first structure.
 - Selective install.
@@ -27,7 +27,7 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
 - Token optimization.
 - Project-specific rules instead of global bloat.
-### Karpathy-Style Minimal Harness
+### Minimal Core
 - Short core.
 - Few route names.
@@ -36,21 +36,21 @@ Every `yam` change is evaluated against Sneakoscope, ECC, and Karpathy-style min
 ## Reject
-### From Sneakoscope
+### From Strict Proof
 - Mandatory hooks.
 - Mandatory Team or subagent proof.
 - Always-on tmux/proof lifecycle.
 - Heavy memory systems for ordinary edits.
-### From ECC
+### From Modular Skills
 - Full install by default.
 - Giant catalog context.
 - Hook runtime by default.
 - Too many always-on rules.
-### From Minimal Harness
+### From Minimal Core
 - Under-verification.
 - Vague quality rules.

package/README.md CHANGED

@@ -17,7 +17,7 @@ End with remaining tasks and fix-first items.
 ## Routes
 - `$quick`: fast scoped implementation, small fixes, and quick error scans.
-- `$ueye`: design-heavy UI/UX implementation and screenshot-led visual review.
+- `$ueye`: tight UI/UX/design implementation and visual review with reference-read proof, comparison, and quality judgment.
 - `$question`: direct Q&A without turning simple questions into research projects.
 - `$scout`: bounded investigation and recommendation.
 - `$deep`: single-agent heavy verification by request, including runtime/tmux/browser/process proof when needed.
@@ -134,6 +134,7 @@ node ./bin/yam.js hook disable lite --global
 node ./bin/yam.js budget ueye
 node ./bin/yam.js measure ueye --files 5 --commands 2 --report-lines 12 --seconds 180
 node ./bin/yam.js template ueye
+node ./bin/yam.js template ueye-comparison
 node ./bin/yam.js template mission
 node ./bin/yam.js template proof
 node ./bin/yam.js tune-log /path/to/project

package/ROADMAP.md CHANGED

@@ -64,12 +64,13 @@ Tasks:
 ### 5. Ueye Workflow
-Goal: make screenshot/UI review more repeatable.
+Goal: make screenshot/UI review and reference-led design implementation more repeatable.
 Tasks:
 - Merge `ui`, `eye`, and `review` into `ueye`. Done.
 - Add Ueye checklist refinements. Done.
+- Add Visual Evidence Inventory, Reference Read Proof, Reference vs Implementation Matrix, and Design Quality Review. Done.
 - Add optional browser/screenshot capture notes.
 - Keep image generation optional, never a gate.
@@ -96,12 +97,14 @@ Tasks:
 ### 8. Scout / Research Workflow
-Goal: give yam a research lane comparable to Sneakoscope research, but lighter and more decision-oriented.
+Goal: give yam a research lane that is evidence-bound, lightweight, and decision-oriented.
-ECC reference points:
+Research reference points:
-- ECC has `deep-research`, `market-research`, `research-ops`, and `contexts/research.md`.
-- Useful parts to borrow: evidence boundaries, source freshness, fact/inference/recommendation separation, and decision-oriented summaries.
+- Evidence boundaries.
+- Source freshness.
+- Fact/inference/recommendation separation.
+- Decision-oriented summaries.
 Tasks:
@@ -142,13 +145,13 @@ Tasks:
 Goal: preserve durable lessons without turning yam into a heavy automatic memory system.
-Borrowed from Sneakoscope:
+Kept:
 - Sparse one-record-per-file storage.
 - Wrongness-style records for repeated mistakes and wrong decisions.
 - Deliberate forgetting via resolve instead of permanent prompt injection.
-Borrowed from ECC:
+Kept:
 - Evidence before recommendation.
 - Clear separation between observation and next action.
@@ -172,13 +175,13 @@ Tasks:
 Goal: prevent false runtime completion while keeping ordinary work fast.
-Borrowed from Sneakoscope:
+Kept:
 - Runtime truth vocabulary.
 - Cleanup must be backed by exit/closure evidence.
 - tmux physical proof idea, reduced to route-level evidence notes.
-Borrowed from ECC:
+Kept:
 - Evidence boundaries before recommendation.
 - Explicit partial/blocked/assumed language.
@@ -201,13 +204,13 @@ Tasks:
 Goal: provide one explicit heavy execution route without increasing total skill count.
-Borrowed from Sneakoscope:
+Kept:
 - Real Team/subagent route boundary.
 - Cross-verification before completion.
 - Runtime/tmux/browser proof when mission evidence needs it.
-Borrowed from ECC:
+Kept:
 - Role-specific work boundaries.
 - Evidence-first reporting.
@@ -233,25 +236,26 @@ Tasks:
 Goal: remove overlapping skill roles while preserving the best parts of the old routes.
-Borrowed from Sneakoscope actual image UX code:
+Kept:
 - Source screenshot inventory before visual claims.
 - P0-P3 issue ledger.
 - P0/P1-first fix loop.
 - Partial truth cap for text-only or missing-screenshot review.
-Borrowed from ECC command docs:
+Kept:
 - Smallest useful verification command.
 - Group errors by file and root cause.
 - Fix one error class at a time.
 - Compact PASS/FAIL reporting.
-Borrowed from Open Design local code and contribution rules:
+Kept:
 - Real preview/screenshot evidence.
 - Compact design direction.
 - P0 visual quality gates over placeholder output.
+- Post-implementation design quality judgment across hierarchy, spacing, typography, color, component detail, interaction, responsiveness, accessibility, and brand fit.
 Kept out by design:
@@ -273,14 +277,14 @@ Tasks:
 Goal: keep beginner momentum while creating a path toward professional proof-first work.
 The hook stays light, but the harness direction does not. `yam` should support a depth ladder: direction fit first, focused proof for ordinary work, strong proof for risky work, and real team proof for `$mission`.
-Borrowed from Sneakoscope:
+Kept:
 - Hook status and trust reporting.
 - Tool readiness as evidence.
 - DB/Supabase safety thinking.
 - Runtime/tmux/process cleanup truth.
-Borrowed from ECC:
+Kept:
 - Selective install and low-context operation.
 - Evidence boundaries instead of always-on gates.

package/bin/yam.js CHANGED

@@ -65,10 +65,10 @@ const ROUTE_BUDGETS = {
     limits: { files: 8, commands: 2, reportLines: 16, seconds: 300 }
   },
   ueye: {
-    files: 'project direction, target UI surface, visual evidence, nearby component/styles',
+    files: 'project direction, target UI surface, reference/before/after evidence, nearby component/styles',
     commands: 'browser/screenshot when feasible; inspect 1-3 primary images by default; typecheck/build only if UI implementation changed code',
-    report: 'design work, source evidence, max 5 inventory rows, states checked, P0-P3 ledger, before/after, truth cap',
-    expand: 'when direction, reference image, or visual evidence requires it; do not do broad design archaeology for simple tweaks',
+    report: 'visual evidence inventory, reference read proof, reference-vs-implementation matrix, design quality review, P0-P3 ledger, truth cap',
+    expand: 'when reference fidelity, responsive/state risk, or visual evidence requires it; do not do broad design archaeology for simple tweaks',
     limits: { files: 10, commands: 3, reportLines: 28, seconds: 600 }
   },
   question: {
@@ -291,6 +291,7 @@ async function verify({ quiet = false } = {}) {
     'risk-escalation.md',
     'quick.md',
     'ueye.md',
+    'ueye-proof.md',
     'ui-quality.md',
     'question.md',
     'mission.md',
@@ -319,7 +320,7 @@ async function verify({ quiet = false } = {}) {
       if (!hasHeading(projectTemplate, section)) issues.push(`project template missing section: ${section}`);
     }
   }
-  for (const template of ['ueye-review.md', 'mission-plan.md', 'runtime-proof.md', 'tuning-log.md']) {
+  for (const template of ['ueye-review.md', 'ueye-comparison.md', 'mission-plan.md', 'runtime-proof.md', 'tuning-log.md']) {
     if (!await exists(path.join(ROOT, 'templates', template))) issues.push(`missing template: ${template}`);
   }
@@ -858,9 +859,6 @@ async function buildYamLiteContext({ cwd, prompt }) {
   if (docsHint) lines.push(docsHint);
   const routeHint = yamLiteRouteHint(prompt);
   if (routeHint) lines.push(routeHint);
-  if (await exists(path.join(path.resolve(cwd), '.sneakoscope'))) {
-    lines.push('Caution: active .sneakoscope detected; avoid mixing proof gates unless the user explicitly wants it.');
-  }
   return lines.join('\n');
 }
@@ -1308,7 +1306,7 @@ async function inspectProjectPack(targetDir = process.cwd()) {
   const instructionSurfaces = await findInstructionSurfaces(resolved);
   if (missingSections.length) issues.push(`missing section(s): ${missingSections.join(', ')}`);
-  if (words > 1200) warnings.push(`pack is long (${words} words); keep the Karpathy-style core compact`);
+  if (words > 1200) warnings.push(`pack is long (${words} words); keep the core compact`);
   if (words < 80) warnings.push(`pack is very short (${words} words); direction may be too thin to reuse`);
   if (packAgeDays > PACK_STALE_DAYS) warnings.push(`pack is ${packAgeDays} days old; review whether direction or commands changed`);
   if (placeholderLines > 12) warnings.push(`${placeholderLines} placeholder lines are still blank`);
@@ -1353,9 +1351,7 @@ async function findInstructionSurfaces(dir) {
     { path: 'CLAUDE.md', level: 'warning', note: 'active CLAUDE.md may carry non-yam instructions' },
     { path: 'RULES.md', level: 'warning', note: 'active RULES.md may carry non-yam instructions' },
     { path: '.codex/AGENTS.md', level: 'warning', note: 'active .codex/AGENTS.md may override project behavior' },
-    { path: '.codex/SNEAKOSCOPE.md', level: 'issue', note: 'active Sneakoscope instruction file detected' },
     { path: '.codex/hooks.json', level: 'issue', note: 'active Codex hook file detected' },
-    { path: '.sneakoscope', level: 'issue', note: 'active Sneakoscope directory detected' },
     { path: '.agents', level: 'warning', note: 'project-local .agents directory may add additional skills or instructions' }
   ];
   const found = [];
@@ -1653,6 +1649,8 @@ async function printTemplate(name = '') {
   const map = {
     project: PROJECT_PACK,
     ueye: 'ueye-review.md',
+    'ueye-comparison': 'ueye-comparison.md',
+    ueyecompare: 'ueye-comparison.md',
     mission: 'mission-plan.md',
     proof: 'runtime-proof.md',
     runtime: 'runtime-proof.md',
@@ -1660,7 +1658,7 @@ async function printTemplate(name = '') {
   };
   const file = map[key];
   if (!file) {
-    console.error('usage: yam template <project|ueye|mission|proof|tuning>');
+    console.error('usage: yam template <project|ueye|ueye-comparison|mission|proof|tuning>');
     process.exitCode = 1;
     return;
   }
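
Reviewer sketch, not package code: the `printTemplate` hunk above adds two keys that point at the same new file. A minimal, hypothetical lookup helper shows how such an alias map can resolve names. The name `resolveTemplate`, the trim/lowercase normalization, and the `null` return are assumptions for illustration; the real function derives `key` in lines the diff does not show, and the map entries not visible in the hunk are left out here.

```javascript
// Hypothetical sketch of an alias lookup; only the entries visible in the hunk are copied.
const TEMPLATE_ALIASES = {
  ueye: 'ueye-review.md',
  'ueye-comparison': 'ueye-comparison.md',
  ueyecompare: 'ueye-comparison.md',
  mission: 'mission-plan.md',
  proof: 'runtime-proof.md',
  runtime: 'runtime-proof.md'
};

// Returns the template file name for a known alias, or null for anything else.
// The own-property check keeps inherited keys such as "toString" from matching.
function resolveTemplate(name = '') {
  const key = String(name).trim().toLowerCase(); // normalization is an assumption
  return Object.prototype.hasOwnProperty.call(TEMPLATE_ALIASES, key)
    ? TEMPLATE_ALIASES[key]
    : null;
}
```

Under these assumptions, `resolveTemplate('ueyecompare')` and `resolveTemplate('ueye-comparison')` both resolve to `ueye-comparison.md`, and an unknown name falls through to the usage-message path.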

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "yam-harness",
-  "version": "0.1.1",
+  "version": "0.1.3",
   "description": "Progressive proof-first Codex harness: start fast, deepen deliberately, stay honest by design.",
   "type": "module",
   "author": "0kim0bos",

package/references/current-docs.md CHANGED

@@ -36,10 +36,10 @@ Or:
 Current-docs proof: skipped because this was stable/local/non-SDK work.
 ```
-## Compared Baseline
+## Design Baseline
-Sneakoscope favors source-intelligence proof for current tool behavior.
+Strict proof favors source-backed evidence for current tool behavior.
-ECC keeps research/context selective and low-context.
+Modular skill workflows keep research/context selective and low-context.
-Karpathy-style minimalism says the rule is useful only when it changes the answer.
+Minimal-core design says the rule is useful only when it changes the answer.

package/references/db-supabase-safety-lite.md CHANGED

@@ -31,10 +31,10 @@ Before claiming safe:
 - A successful migration command is not automatically safe; it only proves that command execution completed.
 - Do not claim production safety without environment evidence.
-## Compared Baseline
+## Design Baseline
-Sneakoscope would likely gate destructive DB work more aggressively.
+Strict proof would gate destructive DB work more aggressively.
-ECC would keep the check selective and evidence-bound.
+Modular skill workflows keep the check selective and evidence-bound.
-Karpathy-style minimalism keeps this as a short rule and a small detector, not a full DB policy engine.
+Minimal-core design keeps this as a short rule and a small detector, not a full DB policy engine.

package/references/honest-completion.md CHANGED

@@ -52,10 +52,10 @@ Runtime work needs stronger evidence because long-running processes can create f
 - No release-blocking runtime proof unless the user chooses `$deep` or `$mission`.
 - No full `$mission` claim without real subagent/team evidence; downgrade to `$deep`, or mark mission partial/blocked.
-Compared baseline:
+Design baseline:
-- Sneakoscope would collect stronger physical proof and gate completion more aggressively.
-- ECC would keep evidence boundaries and report what is known vs inferred.
-- Karpathy-style minimalism would keep the rule short and obeyable.
+- Strict proof collects stronger physical proof and gates completion more aggressively.
+- Modular skill workflows keep evidence boundaries and report what is known vs inferred.
+- Minimal-core design keeps the rule short and obeyable.
 `yam` keeps the guard explicit, cheap, and route-aware.

package/references/hook-lite.md CHANGED

@@ -14,7 +14,7 @@ Allowed:
 - Remind the agent not to overclaim verification, cleanup, or visual evidence.
 - Suggest `$quick`, `$ueye`, `$question`, `$scout`, `$deep`, or `$mission` based on obvious prompt signals.
 - Mention a project pack or memory summary when present.
-- Warn when `.sneakoscope` is active in the current project.
+- Warn when conflicting proof-harness surfaces are active in the current project.
 Not allowed:
@@ -44,12 +44,12 @@ Project hooks write to `<project>/.codex/hooks.json`.
 `yam` backs up an existing hook file before enabling the lite hook.
-## Compared Baseline
+## Design Baseline
-Sneakoscope uses hooks as a broad trust surface with route prep, tool evidence, permission gates, subagent evidence, and stop gates.
+Broad hook systems often use route prep, tool evidence, permission gates, subagent evidence, and stop gates.
-ECC favors selective setup and lower-context workflows.
+Selective skill systems favor lower-context workflows.
-Karpathy-style minimalism would avoid hooks unless the rule is short and changes behavior.
+Minimal-core systems avoid hooks unless the rule is short and changes behavior.
 `yam` keeps this hook advisory-only so beginner momentum is preserved while the agent still receives a direction nudge. Deeper proof belongs in `$deep` and real team execution belongs in `$mission`, not in an always-on prompt hook.

package/references/markdown-management.md CHANGED

@@ -2,21 +2,21 @@
 `yam` uses markdown as a small direction layer, not as an automatic control system.
-## Compared Baseline
+## Design Baseline
-Sneakoscope:
+Strict proof systems:
 - Creates and manages more markdown surfaces for agent control, route instructions, proof, and dashboards.
 - Good for strict verification and anti-fake-work pressure.
 - Risk: too much generated context and too much automatic intervention.
-ECC:
+Modular skill systems:
 - Splits markdown into modular instructions, rules, skills, and commands.
 - Good for selective installation and low-context operation.
 - Risk: too many optional files can still become noisy if installed wholesale.
-Karpathy-style minimal harness:
+Minimal-core systems:
 - Keeps the core instruction document short and human-readable.
 - Good for speed, obedience, and easy maintenance.

package/references/memory.md CHANGED

@@ -2,12 +2,12 @@
 `yam memory` is an opt-in, project-local memory layer.
-It borrows only the lightest useful parts from heavier harnesses:
+It keeps only the lightest useful parts from heavier harness patterns:
-- Sneakoscope TriWiki: sparse records, one file per claim, deliberate forgetting instead of injecting every old claim.
-- Sneakoscope wrongness memory: remember repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
-- ECC research style: separate evidence, inference, and recommendation.
-- Karpathy-style minimalism: keep the mechanism small enough to obey.
+- Sparse records, one file per durable claim, and deliberate forgetting instead of injecting every old claim.
+- Wrongness memory for repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
+- Separate evidence, inference, and recommendation.
+- Keep the mechanism small enough to obey.
 Storage:

package/references/mission.md CHANGED

@@ -77,10 +77,10 @@ Doctor scan:
 Use `references/doctor-scan.md` before final completion.
 Keep the scan short, but cover direction fit, scope control, verification, runtime/cleanup, truth status, and fix-first items.
-Compared baseline:
+Design baseline:
-- Sneakoscope would likely make this a Team route with stronger proof gates and required agent evidence.
-- ECC would split role responsibilities and keep evidence boundaries.
-- Karpathy-style minimalism would avoid adding this unless it clearly replaces a confusing middle route.
+- Strict proof would likely make this a team route with stronger gates and required agent evidence.
+- Modular skill workflows split role responsibilities and keep evidence boundaries.
+- Minimal-core design avoids adding this unless it clearly replaces a confusing middle route.
 `yam` uses mission to replace the old standalone runtime route with a clearer heavy execution route.

package/references/quick.md CHANGED

@@ -2,15 +2,15 @@
 `quick` is the merged small-work route: fast patching, ordinary scoped implementation, and fast error scanning.
-## Borrowed, With Weight Removed
+## Selected Principles
-From Sneakoscope:
+Strict proof:
 - Honest completion language.
 - Real versus assumed verification.
 - Stop instead of claiming success when evidence is missing.
-From ECC:
+Focused execution:
 - Detect the smallest useful command.
 - Group build/type/lint/test errors by file and root cause.
@@ -18,7 +18,7 @@ From ECC:
 - Re-run the same focused command after a fix.
 - Use a compact PASS/FAIL matrix.
-From Karpathy-style minimal harness:
+Minimal core:
 - Keep the instruction short enough to obey.
 - Read the smallest useful context.

package/references/token-budget-reporter.md CHANGED

@@ -33,12 +33,12 @@ yam measure ueye --files 7 --commands 2 --report-lines 18 --seconds 260
 - `$deep`: can exceed ordinary budgets, but the reason must be risk-tied; single-agent runtime/tmux/browser checks belong here when verification needs them.
 - `$mission`: can spend more context on real subagent/team lanes, cross-verification, doctor scan, and runtime evidence, but only for approved plans where real subagents are used or explicitly unavailable/partial.
-## Compared Baseline
+## Design Baseline
-Sneakoscope would favor stronger automatic evidence collection.
+Strict proof would favor stronger automatic evidence collection.
-ECC would favor selective, low-context reporting.
+Modular skill workflows favor selective, low-context reporting.
-Karpathy-style minimal harness would remove the measurement unless it changes behavior.
+Minimal-core design removes the measurement unless it changes behavior.
 `yam` keeps manual measurement because it helps reduce over-reading without installing hooks.

package/references/tool-trust-layer.md CHANGED

@@ -49,7 +49,7 @@ Default:
 Advisory:
 - `yam-lite` hook may suggest routes and warn about overclaiming.
-- `yam pack` may warn about stale project direction, command drift, active hooks, or Sneakoscope surfaces.
+- `yam pack` may warn about stale project direction, command drift, active hooks, or legacy proof surfaces.
 On demand:
@@ -60,7 +60,7 @@ On demand:
 - `yam tools doctor`: inspect tool readiness without changing project state.
 - `yam proof`: summarize actual evidence without running verification.
-## Borrow From Sneakoscope
+## Strict Proof Inputs
 - Tool readiness checks.
 - Hook status and trust reporting.
@@ -71,14 +71,14 @@ On demand:
 - Destructive DB/Supabase command detection and production-write caution.
 - Feature/release inventory as an optional doctor, not a default gate.
-## Borrow From ECC
+## Modular Skill Inputs
 - Selective install and profiles.
 - Evidence boundaries.
 - Low-context command detection.
 - Optional orchestration instead of always-on orchestration.
-## Borrow From Open Design
+## Design Quality Inputs
 - Real preview/screenshot evidence.
 - Compact design direction.

package/references/ueye-proof.md ADDED

@@ -0,0 +1,109 @@
+# Ueye Proof
+Ueye proof keeps visual claims honest without turning design review into a heavy release gate.
+## When To Use
+Use Ueye proof artifacts when:
+- The user asks for UI/UX/design review.
+- A reference image, screenshot, or target design is part of the task.
+- The final answer needs to claim visual quality, visual parity, or responsive correctness.
+- A screenshot or browser check exists and should be tied to the conclusion.
+Skip or compress them when:
+- The change is text-only or documentation-only.
+- The user asks for a quick opinion and no visual source exists.
+- The only possible evidence is code reading; mark the result `assumed` or `partial`.
+## Artifact Set
+### Visual Evidence Inventory
+Purpose: identify what visual sources were inspected.
+Minimum useful fields:
+- Label:
+- Type:
+- Path/URL:
+- Dimensions:
+- sha256:
+- Viewport:
+- State:
+- Role:
+Bounds:
+- Default to 1-3 primary visual sources.
+- Cap ordinary inventories at 5 rows.
+- Record missing dimensions or hashes as `unknown`, not as failure.
+- Use generated annotations only as derivative aids.
+### Reference Read Proof
+Purpose: separate reading the reference from judging the implementation.
+Capture:
+- Layout:
+- Hierarchy:
+- Typography:
+- Color/contrast:
+- Component detail:
+- Interaction/motion:
+- Responsiveness:
+- Brand/mood:
+Keep this concise. It should explain what the reference asks for, not become a full design essay.
+### Reference vs Implementation Matrix
+Purpose: compare reference direction against the actual implementation evidence.
+Status values:
+- `matched`
+- `similar`
+- `different`
+- `not-verified`
+- `not-applicable`
+Rows to use only when relevant:
+- Layout and spacing.
+- Visual hierarchy.
+- Typography.
+- Color and contrast.
+- Component detail.
+- Interaction and motion.
+- Responsiveness.
+- Accessibility-relevant visuals.
+- Brand and mood fit.
+### Design Quality Review
+Purpose: judge the implemented UI on design quality after evidence is established.
+Dimensions:
+- Visual hierarchy.
+- Layout and spacing.
+- Typography.
+- Color and contrast.
+- Component detail.
+- Interaction and motion.
+- Responsiveness.
+- Accessibility.
+- Brand and mood fit.
+Report findings as P0-P3. Keep P2/P3 short unless the user requested a full polish pass.
+For each relevant design dimension, use `pass`, `needs-polish`, or `fails`.
+## Truth Caps
+- Full visual verification requires real implementation evidence and relevant recheck.
+- Reference-only evidence can support direction, not implementation proof.
+- Generated-only evidence can support ideation or annotation, not implemented-screen verification.
+- Missing screenshots, unavailable browser, or text-only review should cap the result at `partial`, `blocked`, or `assumed`.
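
Reviewer sketch, not package code: the truth caps above can be read as a small decision rule over the Visual Evidence Inventory. The helper below is a hypothetical illustration. The function name `truthCap`, the `rechecked` option, the `verified` label, and the choice of which inventory types count as implementation evidence are all assumptions; the type strings are borrowed from the `Type` list in `references/ueye.md`, and the source does not say how to choose among `partial`, `blocked`, and `assumed`.

```javascript
// Hypothetical sketch; which types count as implementation evidence is an assumption.
const IMPLEMENTATION_TYPES = new Set(['implementation screenshot', 'browser URL']);

// Returns the strongest status the evidence can honestly support.
function truthCap(inventory, { rechecked = false } = {}) {
  // No visual source at all: only code reading is possible.
  if (inventory.length === 0) return 'assumed';
  const hasImplementation = inventory.some((row) => IMPLEMENTATION_TYPES.has(row.type));
  // Reference-only or generated-only evidence supports direction, not implementation proof.
  if (!hasImplementation) return 'partial';
  // Full verification needs real implementation evidence plus a relevant recheck.
  return rechecked ? 'verified' : 'partial';
}
```

With this reading, an inventory that holds only a reference image is capped at `partial` no matter how detailed the Reference Read Proof is.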

package/references/ueye.md CHANGED

@@ -2,9 +2,9 @@
 `ueye` is the merged UI/design route: design-heavy implementation, screenshot-led UX review, and visual QA.
-## Borrowed, With Weight Removed
+## Selected Principles
-From Sneakoscope image UX review:
+Visual proof:
 - Source-screen inventory before visual claims.
 - P0-P3 issue ledger.
@@ -12,21 +12,21 @@ From Sneakoscope image UX review:
 - Recheck changed or high-risk screens after fixes when feasible.
 - Cap text-only or missing-screenshot reviews as partial instead of fully verified.
-Kept out from Sneakoscope by design:
+Kept out by design:
 - Mandatory generated annotated images.
 - Image voxel ledgers.
 - Release gates for every UI change.
 - Always-on proof loops.
-From Open Design:
+Design quality:
 - Real examples and previews matter more than abstract prose.
 - Design direction should be compact and searchable.
 - P0 gates should reject placeholder visuals, generic UI, and broken responsive states.
 - UI work should be self-contained enough to inspect.
-From ECC:
+Evidence boundaries:
 - Separate evidence from judgment.
 - Keep review output compact.
@@ -57,6 +57,94 @@ Bound:
 - Generated callout images are optional and should usually be at most one per review pass.
 - P0/P1 issues can expand the review; P2/P3 should stay top-few unless the user asks for a full polish pass.
+## Ueye Proof Artifacts
+Use these when the task depends on visual truth, reference matching, or design quality judgment. They are proof aids, not a separate lite/deep split.
+1. Visual Evidence Inventory.
+2. Reference Read Proof.
+3. Reference vs Implementation Matrix.
+4. Design Quality Review.
+Default bound:
+- Use only the artifacts that support the claim being made.
+- Keep evidence to the smallest set of screenshots, references, or URLs that can honestly support the result.
+- Prefer paths, URLs, dimensions, and hashes for images when available.
+- Do not require voxel grids, exhaustive callouts, or generated annotations.
+- Do not let proof artifacts turn Ueye into always-on heavy orchestration.
+### Visual Evidence Inventory
+Record the real screens and image sources behind the review.
+Include when known:
+- Label.
+- Type: implementation screenshot, browser URL, user screenshot, reference image, artifact export, generated annotation.
+- Path or URL.
+- Dimensions.
+- sha256.
+- Viewport or device.
+- State: default, loading, error, empty, disabled, hover/focus, mobile, or unknown.
+- Role: proof, reference direction, annotation, or partial evidence.
+Images without hashes or dimensions can still be useful, but mark the missing fields plainly.
+### Reference Read Proof
+Before judging a reference match, state what was actually read from the reference.
+Keep it visual and bounded:
+- Layout structure.
+- Hierarchy and emphasis.
+- Typography feel.
+- Color and contrast.
+- Component shapes and details.
+- Interaction or motion cues when visible.
+- Responsive implication if the reference includes multiple sizes.
+- Brand or mood fit.
+Reference read proof describes the direction. It is not proof that the implementation matches.
+### Reference vs Implementation Matrix
+Use when a user supplies, names, or implies a reference visual.
+Compare only meaningful dimensions:
+- Layout and spacing.
+- Visual hierarchy.
+- Typography.
+- Color and contrast.
+- Component detail.
+- Interaction and motion.
+- Responsiveness.
+- Accessibility-relevant visual behavior.
+- Brand or mood fit.
+For each row, record `matched`, `similar`, `different`, `not-verified`, or `not-applicable`, plus the smallest evidence note.
+### Design Quality Review
+Use as the judgment layer after evidence is separated and reference comparison is complete.
+Review dimensions:
+- Visual hierarchy.
+- Layout and spacing.
+- Typography.
+- Color and contrast.
+- Component detail.
+- Interaction and motion.
+- Responsiveness.
+- Accessibility.
+- Brand and mood fit.
+For each relevant dimension, record `pass`, `needs-polish`, or `fails`.
+Keep actionable findings in P0-P3 order. Prefer fixing P0/P1 and cheap local P2 issues before broad polish.
 ## P0-P3 Ledger
 - P0: blocker, unusable, impossible to complete primary workflow, severe accessibility or responsive failure.
@@ -67,12 +155,15 @@ Bound:
 ## Implementation Loop
 1. Direction fit.
-2. Source-screen inventory.
-3. Nearby pattern and token scan.
-4. Implementation.
-5. Screenshot/browser recheck when feasible.
-6. P0/P1 closeout.
-7. Truth status.
+2. Visual Evidence Inventory.
+3. Reference Read Proof when a reference is used.
+4. Nearby pattern and token scan.
+5. Implementation.
+6. Screenshot/browser recheck when feasible.
+7. Reference vs Implementation Matrix when reference fidelity matters.
+8. Design Quality Review.
+9. P0/P1 closeout.
+10. Truth status.
 ## Truth Caps
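
Reviewer sketch, not package code: the Reference vs Implementation Matrix added in this file uses five fixed status values. A hypothetical tally helper makes the vocabulary concrete and rejects anything outside it. The function name `summarizeMatrix` and the row shape `{ dimension, status }` are assumptions for illustration; the package itself ships markdown templates and does not define such a function.

```javascript
// Status values copied from the matrix section; everything else here is hypothetical.
const MATRIX_STATUSES = ['matched', 'similar', 'different', 'not-verified', 'not-applicable'];

// Counts rows per status and throws on a status outside the documented vocabulary.
function summarizeMatrix(rows) {
  const counts = Object.fromEntries(MATRIX_STATUSES.map((status) => [status, 0]));
  for (const { dimension, status } of rows) {
    if (!MATRIX_STATUSES.includes(status)) {
      throw new Error(`unknown status for ${dimension}: ${status}`);
    }
    counts[status] += 1;
  }
  return counts;
}
```

A nonzero `not-verified` count is a signal that the final claim should stay below fully verified rather than a reason to drop the row.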

package/skills/ueye/SKILL.md CHANGED

@@ -9,6 +9,7 @@ Use for:
 - Design-heavy UI/UX implementation.
 - Reference-image-based UI direction.
+- Reference-image-based visual fidelity work where the result must be compared against the reference.
 - Screenshot, URL, or current-screen UX review.
 - Pre-fix and post-fix visual QA.
 - UI states, responsive behavior, hierarchy, CTA, contrast, alignment, spacing, and affordance.
@@ -23,6 +24,8 @@ Do not use for:
 - Direction before execution.
 - Visual evidence before visual claims.
+- Ueye is the tight UX/UI/design route, not Quick with screenshots.
+- Reference-based work requires reference read proof before implementation and reference comparison after implementation.
 - Context-reuse first.
 - Token economy is part of quality.
 - Product fit beats decoration.
@@ -31,6 +34,7 @@ Do not use for:
 - Text-only visual critique cannot be reported as fully verified when screenshot evidence was required.
 - Generated annotated images are optional, not a default gate.
 - Image evidence should stay bounded: inspect the primary screen first, then only the states/images needed to support the claim.
+- Design quality judgment belongs after implementation/review: compare to the reference first, then judge whether the result is good design.
 ## Workflow
@@ -41,17 +45,53 @@ Do not use for:
    - local/browser screenshot
    - exported static artifact image
    - URL/current screen, when accessible
-4. Inspect nearby UI patterns, tokens, styles, and state handling.
-5. Implement the smallest coherent design improvement when implementation is requested.
-6. Check default, loading, error, empty, disabled, hover/focus, and mobile states when relevant.
-7. Run browser/screenshot verification when feasible.
-8. Produce a P0-P3 visual issue ledger and fix path.
-9. Recheck changed/high-risk screens after fixes when feasible.
+4. When a reference image/screen is used, produce Reference Read Proof before changing UI:
+   - 5-8 concrete observations about layout, spacing, typography, color, hierarchy, component shape, interaction/motion, and brand mood.
+   - Mark any ambiguous or unobservable detail instead of inventing it.
+5. Inspect nearby UI patterns, tokens, styles, and state handling.
+6. Implement the smallest coherent design improvement when implementation is requested.
+7. Check default, loading, error, empty, disabled, hover/focus, and mobile states when relevant.
+8. Run browser/screenshot verification when feasible.
+9. Compare reference and implementation when reference fidelity was requested:
+   - matched, similar, different, not-applicable, or not-verified.
+   - Separate "faithful to reference" from "good design."
+10. Run Design Quality Review after implementation/review:
+   - visual hierarchy
+   - layout and spacing
+   - typography
+   - color and contrast
+   - component detail
+   - interaction and motion
+   - responsiveness
+   - accessibility
+   - brand or mood fit
+   - mark each relevant dimension as pass, needs-polish, or fails.
+11. Produce a P0-P3 visual issue ledger and fix path.
+12. Recheck changed/high-risk screens after fixes when feasible.
+## Required Ueye Artifacts
+Use these artifacts for reference-led implementation or serious visual review. Keep them compact for small screens, but do not omit them when the claim depends on reference fidelity.
+1. Visual Evidence Inventory:
+   - reference, before, and after evidence when available.
+   - path or URL, source type, state, viewport, and visual verification cap.
+   - sha256 and dimensions when an image file is locally available.
+2. Reference Read Proof:
+   - concrete observations from the reference before implementation.
+   - no implementation claim should depend on an unrecorded reference observation.
+3. Reference vs Implementation Matrix:
+   - compare the implemented result against the reference by aspect.
+   - record matched, similar, different, not-applicable, or not-verified.
+4. Design Quality Review:
+   - judge whether the result is good UI/UX/design after reference comparison.
+   - use pass, needs-polish, or fails for each relevant design dimension.
 ## Visual Truth Caps
 - Full visual verification requires real source-screen evidence.
 - Reference images guide direction; they do not prove the implemented screen unless compared with real source-screen evidence.
+- Reference fidelity claims require both Reference Read Proof and Reference vs Implementation Matrix.
 - Generated or annotated images are derivative evidence; they cannot upgrade missing real screen evidence to `verified`.
 - Inspect 1-3 primary images by default; expand only for P0/P1 risk, responsive breakage, or user-requested deep visual QA.
 - Keep source-screen inventory to the 5 most important rows by default.
@@ -84,6 +124,8 @@ Report:
 - What changed visually or what was reviewed.
 - Source evidence used.
+- Reference Read Proof and reference comparison result when a reference was used.
+- Design Quality Review result when implementation or serious visual review was requested.
 - P0-P3 issues or confirmation that no blockers were found.
 - States/viewports checked.
 - Truth status and visual verification cap.
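
The SKILL.md rule that reference fidelity claims require both Reference Read Proof and Reference vs Implementation Matrix can be read as a simple gate. The following is a minimal sketch, not code from the package: the function name `canClaimReferenceFidelity`, the input shape, and the thresholds (at least 5 observations, at least one row that was actually compared) are assumptions made for illustration.

```javascript
// Hypothetical gate, not part of yam-harness: may a report claim fidelity to the reference?
function canClaimReferenceFidelity({ readProof = [], matrix = [] }) {
  // Assumption: the lower bound of the "5-8 concrete observations" guidance
  // is what counts as a recorded Reference Read Proof.
  const hasReadProof = readProof.filter((o) => o.trim() !== "").length >= 5;
  // Assumption: a matrix only counts if at least one row was actually compared,
  // i.e. its status is matched, similar, or different.
  const hasMatrix = matrix.some(
    (row) => row.status !== "not-verified" && row.status !== "not-applicable"
  );
  return hasReadProof && hasMatrix;
}
```

When the gate returns `false`, the report would keep a lower truth status rather than state that the result is faithful to the reference.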

package/templates/tuning-log.md CHANGED

@@ -33,7 +33,7 @@ Use this to tune route wording from real use.
 ## Compared Against
-- Sneakoscope:
-- ECC:
-- Karpathy:
+- Strict proof:
+- Modular skills:
+- Minimal core:
 - yam decision:

package/templates/ueye-comparison.md ADDED

@@ -0,0 +1,78 @@
+# yam Ueye Reference Comparison
+## Input
+- Implementation screenshot/URL:
+- Reference screenshot/URL/path:
+- Product or screen:
+- Review bound:
+## Visual Evidence Inventory
+| Label | Type | Path/URL | Dimensions | sha256 | Viewport | State | Role |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| implementation |  |  | unknown | unknown |  |  | proof |
+| reference |  |  | unknown | unknown |  |  | reference direction |
+## Reference Read Proof
+- Layout:
+- Visual hierarchy:
+- Typography:
+- Color/contrast:
+- Component detail:
+- Interaction/motion:
+- Responsiveness:
+- Brand/mood fit:
+## Reference vs Implementation Matrix
+Use `matched`, `similar`, `different`, `not-verified`, or `not-applicable`.
+| Dimension | Status | Evidence | Suggested Fix |
+| --- | --- | --- | --- |
+| Layout and spacing | not-verified |  |  |
+| Visual hierarchy | not-verified |  |  |
+| Typography | not-verified |  |  |
+| Color and contrast | not-verified |  |  |
+| Component detail | not-verified |  |  |
+| Interaction and motion | not-verified |  |  |
+| Responsiveness | not-verified |  |  |
+| Accessibility visuals | not-verified |  |  |
+| Brand and mood fit | not-verified |  |  |
+## Design Quality Review
+Use `pass`, `needs-polish`, or `fails` for each relevant design dimension before listing actionable issues.
+| Dimension | Result | Evidence |
+| --- | --- | --- |
+| Visual hierarchy | needs-polish |  |
+| Layout/spacing | needs-polish |  |
+| Typography | needs-polish |  |
+| Color/contrast | needs-polish |  |
+| Component detail | needs-polish |  |
+| Interaction/motion | needs-polish |  |
+| Responsiveness | needs-polish |  |
+| Accessibility | needs-polish |  |
+| Brand/mood fit | needs-polish |  |
+### P0 Blockers
+- None.
+### P1 Major
+- None.
+### P2 Quality
+- None.
+### P3 Polish
+- None.
+## Truth Status
+- verified / partial / skipped / blocked / assumed:
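
A filled-in Reference vs Implementation Matrix from the template above could be checked mechanically. The sketch below is illustrative only and assumes the matrix is kept as a pipe-delimited markdown table with the status in the second column; `summarizeMatrix` is a hypothetical helper, not a function shipped in the package.

```javascript
// The five status values named in the template.
const STATUSES = ["matched", "similar", "different", "not-verified", "not-applicable"];

// Hypothetical helper: counts statuses and lists rows whose status is not one of the five.
function summarizeMatrix(tableText) {
  const counts = Object.fromEntries(STATUSES.map((s) => [s, 0]));
  const invalid = [];
  for (const line of tableText.split("\n")) {
    const cells = line.split("|").map((c) => c.trim());
    // A row looks like "| a | b | c |", so cells[0] is empty and cells[1] is the first column.
    // Skip non-table lines, the header row, and the "---" separator row.
    if (cells.length < 3 || cells[1] === "Dimension" || /^-+$/.test(cells[1])) continue;
    const [, dimension, status] = cells;
    if (STATUSES.includes(status)) counts[status] += 1;
    else invalid.push(dimension);
  }
  return { counts, invalid };
}
```

A non-empty `invalid` list or a high `not-verified` count would suggest the comparison is not yet complete enough to support a fidelity claim.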

package/templates/ueye-review.md CHANGED

@@ -12,14 +12,61 @@
 - Fits project direction:
 - Mismatch:
-## Source-Screen Inventory
+## Visual Evidence Inventory
 - Evidence bound: 1-3 primary images, max 5 inventory rows by default.
-- Source type:
-- State:
+- Label:
+- Type:
+- Path/URL:
+- Dimensions:
+- sha256:
 - Viewport:
+- State:
+- Role: proof / reference direction / annotation / partial evidence
 - Visual verification cap:
-- Reference/generated image used only as direction or annotation:
+## Reference Read Proof
+- Layout:
+- Visual hierarchy:
+- Typography:
+- Color/contrast:
+- Component detail:
+- Interaction/motion:
+- Responsiveness:
+- Brand/mood fit:
+## Reference vs Implementation Matrix
+| Dimension | Status | Evidence |
+| --- | --- | --- |
+| Layout and spacing | not-verified |  |
+| Visual hierarchy | not-verified |  |
+| Typography | not-verified |  |
+| Color and contrast | not-verified |  |
+| Component detail | not-verified |  |
+| Interaction and motion | not-verified |  |
+| Responsiveness | not-verified |  |
+| Accessibility visuals | not-verified |  |
+| Brand and mood fit | not-verified |  |
+Status values: matched / similar / different / not-verified / not-applicable.
+## Design Quality Review
+| Dimension | Result | Evidence |
+| --- | --- | --- |
+| Visual hierarchy | needs-polish |  |
+| Layout/spacing | needs-polish |  |
+| Typography | needs-polish |  |
+| Color/contrast | needs-polish |  |
+| Component detail | needs-polish |  |
+| Interaction/motion | needs-polish |  |
+| Responsiveness | needs-polish |  |
+| Accessibility | needs-polish |  |
+| Brand/mood fit | needs-polish |  |
+Result values: pass / needs-polish / fails.
 ## P0-P3 Issues
@@ -41,14 +88,15 @@
 ## Checks
-- Hierarchy:
-- CTA:
-- Spacing:
-- Alignment:
-- Contrast:
-- Density:
-- Text fit:
-- Mobile:
+- Visual hierarchy:
+- Layout/spacing:
+- Typography:
+- Color/contrast:
+- Component detail:
+- Interaction/motion:
+- Responsiveness:
+- Accessibility:
+- Brand/mood fit:
 - Empty/loading/error/disabled/hover/focus states:
 ## Safe Fix Path
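
The `Dimensions` and `sha256` fields added to the Visual Evidence Inventory can be filled from a locally available image file. The sketch below is one possible approach, not the package's own implementation: `describeImage` is a hypothetical helper, it only reads PNG headers, and it falls back to `unknown` (the template's default value) for anything else.

```javascript
const crypto = require("node:crypto");

// PNG files start with this fixed 8-byte signature.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: returns the sha256 and dimensions values for one inventory row.
function describeImage(bytes) {
  const sha256 = crypto.createHash("sha256").update(bytes).digest("hex");
  const isPng = bytes.length >= 24 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
  // In a PNG, the IHDR chunk stores width and height as big-endian uint32 at offsets 16 and 20.
  const dimensions = isPng
    ? `${bytes.readUInt32BE(16)}x${bytes.readUInt32BE(20)}`
    : "unknown";
  return { sha256, dimensions };
}
```

In practice the bytes would come from something like `fs.readFileSync(path)`; other image formats would need their own header parsing.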

package/yam.manifest.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "yam",
-  "version": "0.1.1",
+  "version": "0.1.3",
   "principles": [
     "Direction before execution.",
     "Start fast.",
@@ -22,7 +22,7 @@
     {
       "id": "ueye",
       "stage": "v0.3",
-      "purpose": "Design-heavy UI/UX implementation and screenshot-led visual review with P0-P3 issue tracking."
+      "purpose": "Tight UI/UX/design implementation and visual review with reference-read proof, visual comparison, design quality review, and P0-P3 issue tracking."
     },
     {
       "id": "question",