npm - yam-harness - Versions diffs - 0.1.0 - Mend

yam-harness 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43) hide show

package/AGENTS.md +18 -0
package/COMMANDS.md +144 -0
package/DECISIONS.md +70 -0
package/LICENSE +21 -0
package/README.md +159 -0
package/ROADMAP.md +308 -0
package/bin/yam.js +1966 -0
package/package.json +74 -0
package/references/context-reuse.md +59 -0
package/references/current-docs.md +45 -0
package/references/db-supabase-safety-lite.md +40 -0
package/references/doctor-scan.md +56 -0
package/references/eye.md +30 -0
package/references/final-report.md +61 -0
package/references/honest-completion.md +61 -0
package/references/hook-lite.md +55 -0
package/references/markdown-management.md +56 -0
package/references/memory.md +59 -0
package/references/mission.md +86 -0
package/references/question.md +25 -0
package/references/quick.md +70 -0
package/references/risk-escalation.md +27 -0
package/references/runtime-orchestration.md +57 -0
package/references/scout.md +38 -0
package/references/token-budget-reporter.md +44 -0
package/references/token-economy.md +61 -0
package/references/tool-trust-layer.md +113 -0
package/references/truth-matrix.md +44 -0
package/references/ueye.md +83 -0
package/references/ui-quality.md +23 -0
package/references/verification-levels.md +53 -0
package/skills/deep/SKILL.md +76 -0
package/skills/mission/SKILL.md +105 -0
package/skills/question/SKILL.md +45 -0
package/skills/quick/SKILL.md +81 -0
package/skills/scout/SKILL.md +71 -0
package/skills/ueye/SKILL.md +90 -0
package/templates/mission-plan.md +46 -0
package/templates/runtime-proof.md +54 -0
package/templates/tuning-log.md +39 -0
package/templates/ueye-review.md +62 -0
package/templates/yam.project.md +71 -0
package/yam.manifest.json +48 -0

package/skills/deep/SKILL.md ADDED Viewed

@@ -0,0 +1,76 @@
+---
+name: deep
+description: Heavy verification route for risky or user-requested deep work. Use when the user invokes $deep or explicitly asks for strong verification.
+---
+# yam Deep
+Use for:
+- Auth, payment, DB, security, deployment, or release work.
+- Broad refactors.
+- Regressions with unclear cause.
+- User-requested heavy verification.
+- Work where false completion would be costly.
+- Long-running verification using dev servers, test watchers, browser QA, tmux, or process cleanup.
+- Broad or risky work that does not require real subagent/team execution.
+- `$mission` requests that cannot use real subagents; downgrade to `$deep` and report why.
+## Principles
+- Direction before execution.
+- Token economy still matters, even in deep mode.
+- Reuse `yam.project.md` before broad context reading when present.
+- Strong verification is allowed here, but still bounded.
+- Prefer focused evidence over ceremony.
+- Stay single-agent by default; if real team/subagent execution is required, use `$mission`.
+- Do not claim verified without actual evidence.
+- Use runtime/tmux/process orchestration only when the verification claim needs it.
+- Do not start long-running processes just to make small work look more proven.
+- Do not claim cleanup unless process exit, tmux pane/session closure, or intentional persistence is confirmed.
+## Workflow
+1. Define the risk surface.
+2. Identify acceptance criteria.
+3. Build a focused verification plan.
+4. Implement or inspect within scope.
+5. Start dev servers, test watchers, tmux panes, or browser QA only when they materially support verification.
+6. Run appropriate checks: tests, typecheck, build, browser QA, security, runtime, or data safety checks.
+7. Confirm cleanup when claiming cleanup.
+8. Classify truth status with `references/truth-matrix.md`.
+9. Report proof summary and remaining risk.
+## Verification
+Use Level 4 from `references/verification-levels.md`.
+Use `references/token-economy.md`; wider context is allowed only when tied to the risk surface.
+Use `references/context-reuse.md`; update stale project packs narrowly when needed.
+Use `references/markdown-management.md` before writing broad proof or direction markdown.
+Use `references/runtime-orchestration.md` when long-running processes, tmux, browser QA, or cleanup proof are needed.
+Use `references/db-supabase-safety-lite.md` for destructive DB/Supabase, migration, production-write, or RLS/policy work.
+Use `references/current-docs.md` when SDK/API/cloud-service behavior may be current or version-sensitive.
+Use `references/honest-completion.md`; do not overclaim verification, runtime, cleanup, or visual proof.
+Use `references/final-report.md` to close with remaining tasks and fix-first items when useful.
+Use `references/token-budget-reporter.md` when a run needs measured budget feedback.
+Deep verification may include:
+- Test suite or relevant suites.
+- Build.
+- Browser QA.
+- Dev server, test watcher, tmux, or process lifecycle checks.
+- Security or migration checks.
+- Before/after screenshot.
+- Risk-specific manual inspection.
+## Final Response
+Include:
+- What was changed or reviewed.
+- Evidence gathered.
+- Truth status.
+- Blockers or remaining risk.
+- Remaining tasks.
+- Fix-first items before planned tasks.

package/skills/mission/SKILL.md ADDED Viewed

@@ -0,0 +1,105 @@
+---
+name: mission
+description: Explicit real-subagent/team execution route for approved implementation plans. Use when the user invokes $mission or asks for team implementation with real subagents, cross-verification, doctor scan, runtime/tmux/browser QA, and final proof summary.
+---
+# yam Mission
+Use for:
+- Approved implementation plans or scenarios.
+- Broad implementation where real subagent/team separation reduces risk.
+- Team execution with real implementer, reviewer, verifier, and doctor lanes when tool support is available.
+- Work that needs implementation plus cross-verification.
+- Work that may need deep runtime/tmux/browser QA as part of final proof.
+Do not use for:
+- Tiny changes or ordinary scoped implementation. Use `$quick`.
+- Design-heavy UI/UX implementation or screenshot-led review. Use `$ueye`.
+- Pure investigation. Use `$scout`.
+- Pure verification without real subagent/team execution. Use `$deep`.
+- Heavy single-agent work, even with runtime/tmux/browser proof. Use `$deep`.
+- Mission-shaped work when real subagents are unavailable or unsafe to use. Downgrade to `$deep` and report the downgrade.
+## Principles
+- Direction before execution.
+- Mission is explicit-only; never auto-escalate small tasks into mission.
+- Start from the user's approved plan or ask for one if the plan is missing.
+- Mission requires real subagent/team orchestration; role-only self-review belongs in `$deep`.
+- Keep role separation real and useful, not theatrical.
+- Token economy still matters.
+- Use real subagents only when they are available and the work has separable, high-risk, or parallel lanes.
+- If real subagents are not available, unsafe, or not worth using, downgrade to `$deep` by default.
+- If the user explicitly insists on `$mission` despite unavailable subagents, mark the mission `partial` or `blocked` instead of pretending role-play is team execution.
+- Use tmux/dev server/browser QA only when the mission needs runtime evidence.
+- Cross-verify before claiming completion.
+- Do not claim verified, cleaned up, or visually checked without evidence.
+- Doctor scan is mandatory for mission finalization, but it should stay concise.
+## Workflow
+1. Restate the mission goal, scope, no-go rules, and acceptance criteria.
+2. Confirm real subagent/team availability and a meaningful split.
+3. If real subagents are unavailable, unsafe, or not useful, stop mission setup and downgrade to `$deep` unless the user explicitly asks to proceed with a partial/blocked mission.
+4. Split work into real lanes:
+   - Implementer: makes scoped changes.
+   - Reviewer: checks code, risk, and project direction.
+   - UX/browser verifier: checks screen behavior when relevant.
+   - Doctor/scanner: checks direction fit, scope control, verification, cleanup, stale context, and false-completion risk.
+5. Execute the implementation in bounded steps.
+6. Use `$deep`-style runtime verification when the mission needs dev server, tmux, test watcher, browser QA, cleanup, or before/after evidence.
+7. Cross-check findings and resolve contradictions.
+8. Run the smallest honest final verification set.
+9. Run doctor scan with `references/doctor-scan.md`.
+10. Confirm cleanup or explicitly report intentionally running processes.
+11. Produce final proof summary, truth status, remaining tasks, and fix-first items.
+## Proof Summary
+Include:
+- Mission goal.
+- Role work completed.
+- Subagent decision: used / downgraded_to_deep / unavailable_partial / blocked, with reason.
+- Files or surfaces changed.
+- Runtime/tmux/browser evidence when used.
+- Cross-verification result.
+- Doctor/scanner result.
+- Cleanup status.
+- Truth status: proven, verified, partial, fixture_only, fixture_instrumented_real, integration_optional, real_required_missing, skipped, blocked, or assumed.
+Use `references/context-reuse.md`; read project pack before broad context.
+Use `references/runtime-orchestration.md` only when runtime evidence is needed.
+Use `references/doctor-scan.md` before final mission completion.
+Use `references/db-supabase-safety-lite.md` for destructive DB/Supabase, migration, production-write, or RLS/policy lanes.
+Use `references/current-docs.md` when any lane depends on current SDK/API/cloud-service behavior.
+Use `references/honest-completion.md`; do not overclaim.
+Use `references/final-report.md` to close with remaining tasks and fix-first items.
+Use `references/token-budget-reporter.md` when budget drift matters.
+## Prompt Pattern
+Good mission prompts include:
+- Goal.
+- Scope.
+- No-go rules.
+- Acceptance criteria.
+- Runtime/browser needs.
+- Required final checks.
+If the user invokes `$mission` without an approved plan, ask for the plan or propose a compact plan first.
+## Final Response
+Report:
+- What the mission changed.
+- Role/cross-verification summary.
+- Evidence gathered.
+- Truth status.
+- Cleanup status.
+- Remaining tasks.
+- Fix-first items before planned tasks.

package/skills/question/SKILL.md ADDED Viewed

@@ -0,0 +1,45 @@
+---
+name: question
+description: Very-light Q&A route. Use when the user invokes $question or asks a direct conceptual or practical question that can be answered without code changes or broad research.
+---
+# yam Question
+Use for:
+- Simple conceptual explanations.
+- "What is X?" or "Can I do Y?" questions.
+- Local harness/project usage questions.
+- Clarifying a decision that does not need fresh source gathering.
+Do not use for:
+- Code changes. Use `$quick` or `$ueye`.
+- Multi-option research or current market/tool comparisons. Use `$scout`.
+- Risky verification or broad debugging. Use `$deep`.
+Principles:
+- Direction before execution.
+- Answer the question first.
+- Keep context tiny.
+- Do not browse or inspect files unless freshness, local truth, or accuracy requires it.
+- Separate fact, inference, and recommendation when the distinction matters.
+- State uncertainty plainly.
+- Token economy is part of quality.
+- End with remaining tasks and fix-first items only when they are useful.
+Workflow:
+1. Identify the exact question.
+2. Answer from stable knowledge or already-loaded context.
+3. Check local files only when the answer depends on installed/configured state.
+4. Browse only when the fact is current, niche, or likely to have changed.
+5. If the answer becomes comparison/research, switch to `$scout`.
+Output:
+- Direct answer.
+- Short explanation.
+- Practical next step, if any.
+- Uncertainty or verification note, if relevant.
+Token budget:
+Use `references/token-budget-reporter.md`.
+Use `references/current-docs.md` only when the question depends on current SDK/API/cloud-service behavior; otherwise answer directly.
+Default: 0-2 files, 0 commands, short final answer.

package/skills/quick/SKILL.md ADDED Viewed

@@ -0,0 +1,81 @@
+---
+name: quick
+description: Fast implementation route for small changes, focused bug fixes, and quick error scans. Use when the user invokes $quick or asks for a quick scoped fix.
+---
+# yam Quick
+Use for:
+- Copy, label, spacing, color, small CSS, and small docs edits.
+- Narrow bug fixes and ordinary scoped implementation.
+- Fast error scans for build/type/lint/test failures.
+- Small UI tweaks that do not need design exploration or visual review loops.
+Do not use for:
+- Design-heavy UI work or reference-image interpretation. Use `$ueye`.
+- Risky, broad, or runtime-heavy work. Use `$deep`.
+- DB/Supabase destructive commands, production writes, migrations, RLS, or schema changes. Recommend `$deep`.
+- Real subagent/team implementation. Use `$mission`.
+- Pure Q&A. Use `$question`.
+## Principles
+- Direction before execution.
+- Context-reuse first.
+- Token economy is part of quality.
+- Start with the smallest likely edit surface.
+- Follow existing project architecture, naming, UX flow, and test style.
+- Verify at the lightest level that honestly supports completion.
+- Do not use teams, orchestration, structured proof, or tmux.
+- Do not run broad test suites for tiny changes.
+## Lanes
+Patch lane:
+1. Read `yam.project.md` or `.yam/memory/summary.md` only when present and useful.
+2. Inspect the smallest relevant file or nearby pattern.
+3. Make the minimal change.
+4. Re-read the changed snippet.
+5. Run at most one or two focused checks when useful.
+Build-fix lane:
+1. Detect the smallest useful command from package scripts or project pack.
+2. Group errors by file and root cause.
+3. Fix one error class at a time.
+4. Read only the local error context, usually the file and nearby imports.
+5. Re-run the same focused command.
+6. Stop if the same error survives three attempts, errors expand, dependency installation is needed, or the fix implies architecture change.
+Scan lane:
+1. Inspect the current error output or run the smallest detector.
+2. Report grouped issues and the safest first fix.
+3. Edit only when the user asked for implementation.
+## Verification
+- Copy/CSS/docs: Level 0 is often enough after re-reading the changed snippet.
+- TS/JS or app logic: prefer typecheck, related test, lint, or build in that order when available.
+- Build-fix: use a compact PASS/FAIL matrix for command results.
+- If verification is skipped, partial, blocked, or assumed, say that plainly.
+Use `references/quick.md` for the merged fast/build rules.
+Use `references/truth-matrix.md` for truth labels.
+Use `references/db-supabase-safety-lite.md` when a command or prompt contains DB/Supabase mutation signals.
+Use `references/token-economy.md` to keep context small.
+Use `references/context-reuse.md` before broad project reading.
+Use `references/final-report.md` to close with remaining tasks and fix-first items when useful.
+Use `references/token-budget-reporter.md` when a run needs measured budget feedback.
+## Final Response
+Keep it compact:
+- What changed or what the scan found.
+- What was checked.
+- What was skipped, blocked, or still risky.
+- Remaining tasks and fix-first items, only when useful.

package/skills/scout/SKILL.md ADDED Viewed

@@ -0,0 +1,71 @@
+---
+name: scout
+description: Lightweight investigation and judgment route. Use when the user invokes $scout or asks to find options, references, docs, tools, product direction, or third-party perspective before implementation.
+---
+# yam Scout
+Use for:
+- Tool/library comparison.
+- Current documentation lookup.
+- Design or product references.
+- Technical direction checks.
+- Risk discovery before implementation.
+- Third-party judgment before committing to a direction.
+- Objective and subjective evaluation together.
+- Macro, realistic, and future-facing judgment.
+## Principles
+- Scout, do not sprawl.
+- Token economy is part of quality.
+- Reuse `yam.project.md` before broad context reading when present.
+- Prefer official and primary sources.
+- Use current docs proof only when SDK/API/cloud-service freshness matters.
+- Keep the question narrow.
+- Do not change code unless the user asks.
+- Return a practical recommendation.
+- Separate fact, inference, opinion, and recommendation.
+- Treat "objective" as evidence plus constraints, not certainty theater.
+- Treat "subjective" as named taste, product instinct, and likely user perception.
+- Keep source count bounded by default; use `$deep` only when the user asks for heavy verification.
+## Workflow
+1. Clarify the decision being scouted.
+2. Choose the lane:
+   - quick lookup
+   - option comparison
+   - design/reference scan
+   - product/technical direction memo
+   - risk discovery
+3. Read `yam.project.md` first when project direction matters.
+4. Gather 3-7 high-signal sources by default.
+5. Compare options by fit, risk, cost, implementation effort, and durability.
+6. Give both objective and subjective judgment when useful.
+7. Recommend a direction.
+8. State uncertainty and what would change the recommendation.
+## Output
+Use concise sections:
+- Best pick.
+- Objective judgment.
+- Subjective judgment.
+- Macro / realistic / future view when the decision benefits from it.
+- Alternatives.
+- Risks.
+- Sources or local evidence.
+Use `references/token-economy.md`; default to 3-7 high-signal sources.
+Use `references/current-docs.md` for current SDK/API/cloud-service questions.
+Use `references/context-reuse.md`; do not rescout known decisions unless they may be stale.
+Use `references/markdown-management.md` before creating or updating project packs.
+Use `references/final-report.md` to close with remaining tasks and fix-first items when useful.
+Use `references/token-budget-reporter.md` when a run needs measured budget feedback.
+## Final Response
+Mention the recommended direction, key tradeoffs, remaining tasks, and fix-first items before planned tasks when useful.

package/skills/ueye/SKILL.md ADDED Viewed

@@ -0,0 +1,90 @@
+---
+name: ueye
+description: UI/UX/design implementation and visual review route with screenshot evidence. Use when the user invokes $ueye or asks for design-heavy UI work, reference-image-based implementation, or visual UX review.
+---
+# yam Ueye
+Use for:
+- Design-heavy UI/UX implementation.
+- Reference-image-based UI direction.
+- Screenshot, URL, or current-screen UX review.
+- Pre-fix and post-fix visual QA.
+- UI states, responsive behavior, hierarchy, CTA, contrast, alignment, spacing, and affordance.
+Do not use for:
+- Tiny UI tweaks. Use `$quick`.
+- Pure review of non-visual code risk. Use `$deep` or `$mission` when risk is high.
+- Broad implementation plans with role split, runtime proof, or doctor scan. Use `$mission`.
+## Principles
+- Direction before execution.
+- Visual evidence before visual claims.
+- Context-reuse first.
+- Token economy is part of quality.
+- Product fit beats decoration.
+- Use existing tokens, components, typography, icons, and layout patterns first.
+- Build the actual usable screen, not a marketing detour.
+- Text-only visual critique cannot be reported as fully verified when screenshot evidence was required.
+- Generated annotated images are optional, not a default gate.
+- Image evidence should stay bounded: inspect the primary screen first, then only the states/images needed to support the claim.
+## Workflow
+1. Read `yam.project.md` and `.yam/memory/summary.md` only when present and useful.
+2. Identify the target screen, product direction, audience, and reference image or URL.
+3. Build or capture a source-screen inventory:
+   - user-provided screenshot
+   - local/browser screenshot
+   - exported static artifact image
+   - URL/current screen, when accessible
+4. Inspect nearby UI patterns, tokens, styles, and state handling.
+5. Implement the smallest coherent design improvement when implementation is requested.
+6. Check default, loading, error, empty, disabled, hover/focus, and mobile states when relevant.
+7. Run browser/screenshot verification when feasible.
+8. Produce a P0-P3 visual issue ledger and fix path.
+9. Recheck changed/high-risk screens after fixes when feasible.
+## Visual Truth Caps
+- Full visual verification requires real source-screen evidence.
+- Reference images guide direction; they do not prove the implemented screen unless compared with real source-screen evidence.
+- Generated or annotated images are derivative evidence; they cannot upgrade missing real screen evidence to `verified`.
+- Inspect 1-3 primary images by default; expand only for P0/P1 risk, responsive breakage, or user-requested deep visual QA.
+- Keep source-screen inventory to the 5 most important rows by default.
+- Screenshot unavailable: cap the result at `partial` or `blocked`.
+- Text-only review: cap the result at `partial`.
+- Mock or invented screenshots: cap the result at `partial` and mark the source.
+- Browser unavailable but source files reviewed: report implementation verification separately from visual verification.
+- Generated callout images may improve review quality, but missing generated images must not block ordinary Ueye work.
+## Design Checks
+Use `references/ueye.md` and `references/ui-quality.md`.
+Always consider:
+- Direction fit.
+- Hierarchy.
+- CTA clarity.
+- Spacing and alignment.
+- Contrast.
+- Density.
+- Text fit.
+- Responsive behavior.
+- State coverage.
+- Interaction affordance.
+- Consistency with the project's visual language.
+## Final Response
+Report:
+- What changed visually or what was reviewed.
+- Source evidence used.
+- P0-P3 issues or confirmation that no blockers were found.
+- States/viewports checked.
+- Truth status and visual verification cap.
+- Remaining tasks and fix-first items before planned tasks when useful.

package/templates/mission-plan.md ADDED Viewed

@@ -0,0 +1,46 @@
+# yam Mission Prompt
+```text
+$mission
+아래 구현 계획은 확정됐어.
+목표:
+-
+범위:
+-
+금지사항:
+-
+Acceptance criteria:
+-
+역할:
+- Implementer: 범위 안에서 구현
+- Reviewer: 코드/구조/리스크/방향성 검토
+- UX/browser verifier: 화면/상태/브라우저 흐름 확인
+- Doctor/scanner: stale context, 과검증/검증부족, false-completion risk, cleanup, 남은 fix-first 점검
+Subagent 판단:
+- 실제 subagent/team 사용 가능 여부:
+- meaningful split:
+- decision: used / downgraded_to_deep / unavailable_partial / blocked
+- 이유:
+- subagent가 불가능하거나 불필요하면 기본적으로 $deep으로 전환:
+검증:
+- 필요한 가장 작은 검증 명령을 우선 실행
+- 필요하면 tmux/dev server/browser QA/process cleanup proof 사용
+- 검증하지 못한 것은 skipped/blocked/assumed로 명확히 보고
+최종 보고:
+- 구현 요약
+- 역할별 교차 검증 결과
+- subagent decision
+- 실제 evidence
+- truth status
+- cleanup status
+- fix-first items
+- remaining tasks
+```

package/templates/runtime-proof.md ADDED Viewed

@@ -0,0 +1,54 @@
+# yam Deep/Mission Runtime Proof Summary
+## Goal
+-
+## Plan
+-
+## Processes
+- Command:
+- PID/session/pane:
+- Port:
+- Started:
+- Stopped:
+- Exit/closure verified:
+## tmux
+- Used: yes / no
+- Session:
+- Pane:
+- Before-drain observation:
+- After-drain observation:
+- Final observation:
+## Evidence
+- Before:
+- During:
+- After:
+## Verification
+- Command/check:
+- Result:
+- Truth status:
+## Cleanup
+- Cleanup performed:
+- Process exit verified:
+- tmux pane/session closed or intentionally left running:
+- Remaining process intentionally left running:
+## Blockers
+- None.
+## Final Truth Status
+- proven / verified / partial / fixture_only / fixture_instrumented_real / integration_optional / real_required_missing / skipped / blocked / assumed:

package/templates/tuning-log.md ADDED Viewed

@@ -0,0 +1,39 @@
+# yam Tuning Log
+Use this to tune route wording from real use.
+## Entry
+- Date:
+- Project:
+- Route:
+- Task:
+## What Happened
+- Over-read context:
+- Under-verified:
+- Report too long:
+- Direction mismatch:
+- Useful behavior:
+## Budget Measurement
+- Files read:
+- Commands run:
+- Report lines:
+- Seconds:
+- Budget result:
+## Fix To Harness
+- Skill to edit:
+- Proposed wording:
+- Keep/remove:
+## Compared Against
+- Sneakoscope:
+- ECC:
+- Karpathy:
+- yam decision:

package/templates/ueye-review.md ADDED Viewed

@@ -0,0 +1,62 @@
+# yam Ueye Review
+## Input
+- Screenshot/URL:
+- Product or screen:
+- Reference direction:
+- Source evidence:
+## Direction Fit
+- Fits project direction:
+- Mismatch:
+## Source-Screen Inventory
+- Evidence bound: 1-3 primary images, max 5 inventory rows by default.
+- Source type:
+- State:
+- Viewport:
+- Visual verification cap:
+- Reference/generated image used only as direction or annotation:
+## P0-P3 Issues
+### P0 Blockers
+- None.
+### P1 Major
+- None.
+### P2 Quality
+- None.
+### P3 Polish
+- None.
+## Checks
+- Hierarchy:
+- CTA:
+- Spacing:
+- Alignment:
+- Contrast:
+- Density:
+- Text fit:
+- Mobile:
+- Empty/loading/error/disabled/hover/focus states:
+## Safe Fix Path
+1.
+2.
+3.
+## Truth Status
+- verified / partial / skipped / blocked / assumed: