npm - yam-harness - Versions diffs - 0.1.0 - Mend

yam-harness 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43)

package/AGENTS.md +18 -0
package/COMMANDS.md +144 -0
package/DECISIONS.md +70 -0
package/LICENSE +21 -0
package/README.md +159 -0
package/ROADMAP.md +308 -0
package/bin/yam.js +1966 -0
package/package.json +74 -0
package/references/context-reuse.md +59 -0
package/references/current-docs.md +45 -0
package/references/db-supabase-safety-lite.md +40 -0
package/references/doctor-scan.md +56 -0
package/references/eye.md +30 -0
package/references/final-report.md +61 -0
package/references/honest-completion.md +61 -0
package/references/hook-lite.md +55 -0
package/references/markdown-management.md +56 -0
package/references/memory.md +59 -0
package/references/mission.md +86 -0
package/references/question.md +25 -0
package/references/quick.md +70 -0
package/references/risk-escalation.md +27 -0
package/references/runtime-orchestration.md +57 -0
package/references/scout.md +38 -0
package/references/token-budget-reporter.md +44 -0
package/references/token-economy.md +61 -0
package/references/tool-trust-layer.md +113 -0
package/references/truth-matrix.md +44 -0
package/references/ueye.md +83 -0
package/references/ui-quality.md +23 -0
package/references/verification-levels.md +53 -0
package/skills/deep/SKILL.md +76 -0
package/skills/mission/SKILL.md +105 -0
package/skills/question/SKILL.md +45 -0
package/skills/quick/SKILL.md +81 -0
package/skills/scout/SKILL.md +71 -0
package/skills/ueye/SKILL.md +90 -0
package/templates/mission-plan.md +46 -0
package/templates/runtime-proof.md +54 -0
package/templates/tuning-log.md +39 -0
package/templates/ueye-review.md +62 -0
package/templates/yam.project.md +71 -0
package/yam.manifest.json +48 -0

package/package.json ADDED

@@ -0,0 +1,74 @@
+{
+  "name": "yam-harness",
+  "version": "0.1.0",
+  "description": "Progressive proof-first Codex harness: start fast, deepen deliberately, stay honest by design.",
+  "type": "module",
+  "author": "0kim0bos",
+  "homepage": "https://github.com/0kim0bos/yam#readme",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/0kim0bos/yam.git"
+  },
+  "bugs": {
+    "url": "https://github.com/0kim0bos/yam/issues"
+  },
+  "engines": {
+    "node": ">=18"
+  },
+  "bin": {
+    "yam": "bin/yam.js"
+  },
+  "files": [
+    "AGENTS.md",
+    "COMMANDS.md",
+    "DECISIONS.md",
+    "LICENSE",
+    "README.md",
+    "ROADMAP.md",
+    "bin",
+    "references",
+    "skills",
+    "templates",
+    "yam.manifest.json"
+  ],
+  "scripts": {
+    "install:skills": "node ./bin/yam.js install",
+    "uninstall:skills": "node ./bin/yam.js uninstall",
+    "status": "node ./bin/yam.js status",
+    "verify": "node ./bin/yam.js verify",
+    "doctor": "node ./bin/yam.js doctor",
+    "list": "node ./bin/yam.js list",
+    "examples": "node ./bin/yam.js examples",
+    "verify:self": "node --check ./bin/yam.js && node ./bin/yam.js verify && node ./bin/yam.js doctor",
+    "prepack": "npm run verify:self",
+    "pack:dry-run": "npm pack --dry-run",
+    "budget": "node ./bin/yam.js budget",
+    "measure": "node ./bin/yam.js measure",
+    "tools": "node ./bin/yam.js tools",
+    "proof": "node ./bin/yam.js proof",
+    "safety": "node ./bin/yam.js safety",
+    "detect": "node ./bin/yam.js detect",
+    "pack": "node ./bin/yam.js pack",
+    "memory": "node ./bin/yam.js memory",
+    "template": "node ./bin/yam.js template",
+    "tune-log": "node ./bin/yam.js tune-log",
+    "init-project": "node ./bin/yam.js init-project"
+  },
+  "keywords": [
+    "codex",
+    "harness",
+    "codex-skills",
+    "ai-agents",
+    "developer-tools",
+    "verification",
+    "ui-review",
+    "proof-first",
+    "skills",
+    "proof",
+    "ui"
+  ],
+  "publishConfig": {
+    "access": "public"
+  },
+  "license": "MIT"
+}

package/references/context-reuse.md ADDED

@@ -0,0 +1,59 @@
+# Context Reuse
+One purpose of `yam` is to avoid re-reading the whole project and re-planning from scratch every run.
+Default order:
+```text
+1. Read project direction pack if present.
+2. Read `.yam/memory/summary.md` only when repeated mistakes or direction changes matter.
+3. Read the specific files needed for the request.
+4. Reuse known commands and constraints from the pack.
+5. Expand context only when evidence shows the pack is stale or incomplete.
+```
+Preferred project pack:
+```text
+yam.project.md
+```
+What it should contain:
+- Product direction.
+- Current UI/design direction.
+- Tech stack.
+- Important commands.
+- Test/build expectations.
+- Key directories.
+- Things not to do.
+- Current known risks.
+- Recent decisions.
+Optional memory summary:
+```text
+.yam/memory/summary.md
+```
+Use it only for known wrong decisions, repeated mistakes, direction changes, and lessons that would otherwise be rediscovered.
+Rules:
+- Do not regenerate a broad plan when `yam.project.md` already answers the direction question.
+- Do not reread architecture docs unless the task touches architecture.
+- Do not reread full design docs for tiny UI work when the pack has the relevant style direction.
+- If the pack is stale, update the pack narrowly instead of carrying stale assumptions in chat.
+- Keep the pack short; it is a context-saving artifact, not a second README.
+- Do not read every `.yam/memory/records/*.json` file by default; use `yam memory summary`.
+- Follow `references/markdown-management.md` when creating or updating markdown surfaces.
+Suggested size:
+```text
+500 to 1200 words.
+```
+Truth rule:
+If no project pack exists, say that direction was inferred from local files, not from a maintained project pack.
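Reviewer's sketch (not part of the package): the default order in `references/context-reuse.md` can be read as a small planning function. The function name `planContextReads` and its input flags are hypothetical; the file paths and the truth-rule wording come from the reference above.

```javascript
// Hypothetical helper: decide which context sources to read, in order.
// The input flags are illustrative and are not part of yam's real API.
function planContextReads({
  hasProjectPack = false,
  memoryRelevant = false,
  requestFiles = [],
  packLooksStale = false,
} = {}) {
  const reads = [];
  if (hasProjectPack) reads.push("yam.project.md");          // step 1: project direction pack
  if (memoryRelevant) reads.push(".yam/memory/summary.md");  // step 2: only when it matters
  reads.push(...requestFiles);                               // step 3: files the request needs
  // Step 4 (reuse known commands and constraints) needs no extra read.
  return {
    reads,
    expandContext: Boolean(packLooksStale),                  // step 5: expand only on evidence
    // Truth rule: say so when direction was inferred without a pack.
    note: hasProjectPack
      ? null
      : "direction inferred from local files, not from a maintained project pack",
  };
}
```

Keeping the memory summary behind an explicit flag mirrors the rule that it is read only when repeated mistakes or direction changes matter.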

package/references/current-docs.md ADDED

@@ -0,0 +1,45 @@
+# Current Docs Rule
+Use current docs proof only when stale knowledge is a realistic risk.
+## Require Current Docs Proof
+Use Context7, official docs, or primary sources when the task depends on current behavior for:
+- modern SDK/API syntax
+- cloud services such as Supabase, Vercel, OpenAI, Stripe, or auth providers
+- framework version behavior, migration, deprecation, or breaking changes
+- CLI flags, deployment behavior, pricing/limits, security rules, or platform integrations
+- user wording such as latest, current, recently changed, new version, official docs, migration, or upgrade
+## Usually Skip
+Do not force current-docs proof for:
+- stable programming concepts
+- local codebase pattern matching
+- small copy/CSS/UI polish
+- questions answered by the project pack or local source
+- implementation that follows already-installed project conventions without external API uncertainty
+## Output Line
+Use one concise line:
+```text
+Current-docs proof: Context7/official docs checked for <SDK/service>; result applied to <decision>.
+```
+Or:
+```text
+Current-docs proof: skipped because this was stable/local/non-SDK work.
+```
+## Compared Baseline
+Sneakoscope favors source-intelligence proof for current tool behavior.
+ECC keeps research/context selective and low-context.
+Karpathy-style minimalism says the rule is useful only when it changes the answer.
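Reviewer's sketch (not part of the package): one way to approximate the "user wording" trigger and the two output lines in `references/current-docs.md`. The function names are hypothetical, and the regex covers only the wording signals listed above, not the SDK, cloud-service, or CLI categories, which need human or agent judgment.

```javascript
// Wording signals copied from the reference; matching is case-insensitive.
const WORDING_SIGNALS =
  /\b(latest|current|recently changed|new version|official docs|migration|upgrade)\b/i;

// Hypothetical check: does the prompt wording suggest stale knowledge is a risk?
function needsCurrentDocsProof(prompt) {
  return WORDING_SIGNALS.test(prompt);
}

// Build the one-line report in the shape the reference prescribes.
function currentDocsLine({ checked, target, decision }) {
  return checked
    ? `Current-docs proof: Context7/official docs checked for ${target}; result applied to ${decision}.`
    : "Current-docs proof: skipped because this was stable/local/non-SDK work.";
}
```

A wording match is only a hint that a docs check may be worth the cost; it does not replace the check itself.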

package/references/db-supabase-safety-lite.md ADDED

@@ -0,0 +1,40 @@
+# DB/Supabase Safety Lite
+This is an advisory proof-first trust layer, not a database scanner.
+Use it when a prompt, command, migration, or code change touches database mutation, Supabase, RLS, production data, or schema changes.
+## Risk Signals
+Recommend `$deep` when you see:
+- `DROP`, `TRUNCATE`, `DELETE FROM`, broad `UPDATE`, or `ALTER TABLE`.
+- `supabase db reset`, `supabase db push`, migration generation, migration apply, or remote/linked project commands.
+- ORM migration commands such as Prisma, Drizzle, Knex, or Sequelize migrations.
+- RLS/policy/permission changes: `CREATE POLICY`, `ALTER POLICY`, `GRANT`, `REVOKE`.
+- Production/remote signals: `prod`, `production`, `live`, `DATABASE_URL`, `service_role`, `--db-url`, `--linked`, `--remote`, or `--project-ref`.
+## Guardrail
+Before claiming safe:
+- identify local/staging/production target
+- prefer read-only inspection first
+- require explicit user approval for destructive or production writes
+- know the rollback or backup path when data can be lost
+- run the smallest honest verification that matches the claim
+## Truth Language
+- Pattern detection is `assumed`, not proof.
+- Read-only inspection can be `partial` or `verified` for the inspected surface.
+- A successful migration command is not automatically safe; it only proves that command execution completed.
+- Do not claim production safety without environment evidence.
+## Compared Baseline
+Sneakoscope would likely gate destructive DB work more aggressively.
+ECC would keep the check selective and evidence-bound.
+Karpathy-style minimalism keeps this as a short rule and a small detector, not a full DB policy engine.
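Reviewer's sketch (not part of the package): the risk signals above lend themselves to a small pattern detector. This is an assumption about what such a detector could look like, not the logic in `bin/yam.js`; the labels and the `detectDbRisk` name are hypothetical. It deliberately omits "broad `UPDATE`" and ORM migration commands, which are hard to recognize with a regex alone.

```javascript
// Patterns taken from the "Risk Signals" list; labels are illustrative.
const RISK_PATTERNS = [
  { label: "destructive SQL", re: /\b(DROP|TRUNCATE|DELETE\s+FROM|ALTER\s+TABLE)\b/i },
  { label: "supabase db command", re: /\bsupabase\s+db\s+(reset|push)\b/i },
  { label: "policy/permission change", re: /\b(CREATE\s+POLICY|ALTER\s+POLICY|GRANT|REVOKE)\b/i },
  {
    label: "production/remote signal",
    re: /\b(prod|production|live|DATABASE_URL|service_role)\b|--(db-url|linked|remote|project-ref)\b/i,
  },
];

// Returns matched signal labels and an advisory route recommendation.
function detectDbRisk(text) {
  const signals = RISK_PATTERNS.filter((p) => p.re.test(text)).map((p) => p.label);
  return {
    signals,
    recommend: signals.length > 0 ? "$deep" : null,
    // Per the truth language above, a pattern match is `assumed`, not proof.
    truthStatus: "assumed",
  };
}
```

The patterns are over-inclusive by design (for example, "drop" in ordinary prose also matches), which fits an advisory layer that recommends a route rather than blocking work.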

package/references/doctor-scan.md ADDED

@@ -0,0 +1,56 @@
+# Doctor Scan
+Doctor scan is the final mission pass that looks for false completion, stale context, hidden failures, and avoidable follow-up risk.
+It is not another broad implementation pass.
+## Checklist
+Direction:
+- `yam.project.md` was read when project direction matters.
+- The implementation still matches product direction, UI direction, tech stack, and no-go rules.
+- New complexity is justified by the mission scope.
+Changed surface:
+- Changed files are within the approved mission scope.
+- No unrelated refactors or metadata churn were introduced.
+- Any generated files or markdown artifacts are intentional.
+Verification:
+- The smallest honest checks were run.
+- Failed or skipped checks are reported.
+- Browser, screenshot, or visual claims have actual visual evidence.
+- Runtime/tmux/process claims have PID, port, session, pane, log, or equivalent evidence when relevant.
+Cleanup:
+- Dev servers, watchers, tmux panes, or child processes are stopped or intentionally left running.
+- Cleanup is not claimed unless exit/closure was checked.
+- Remaining running processes are named.
+Truth status:
+- `verified` or `proven` is used only when evidence supports it.
+- `partial`, `blocked`, `skipped`, or `assumed` is used when appropriate.
+- Fixture or mock evidence is not promoted to real runtime proof.
+Report hygiene:
+- Final answer includes remaining tasks when real work remains.
+- Fix-first items are listed before planned tasks when they can block or distort the next run.
+- The report is concise enough to avoid wasting context in the next session.
+## Output Shape
+```text
+Doctor scan:
+- Direction fit:
+- Scope control:
+- Verification:
+- Runtime/cleanup:
+- Truth status:
+- Fix-first:
+```
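Reviewer's sketch (not part of the package): the output shape above can be produced from a plain findings object. The key names (`directionFit`, `scopeControl`, and so on) and the "not checked" fallback are assumptions chosen for illustration; only the labels and their order come from the reference.

```javascript
// Field order and labels follow the "Output Shape" block; keys are hypothetical.
const DOCTOR_FIELDS = [
  ["directionFit", "Direction fit"],
  ["scopeControl", "Scope control"],
  ["verification", "Verification"],
  ["runtimeCleanup", "Runtime/cleanup"],
  ["truthStatus", "Truth status"],
  ["fixFirst", "Fix-first"],
];

// Render a doctor scan; missing findings are reported rather than hidden.
function formatDoctorScan(findings = {}) {
  const lines = DOCTOR_FIELDS.map(
    ([key, label]) => `- ${label}: ${findings[key] ?? "not checked"}`
  );
  return ["Doctor scan:", ...lines].join("\n");
}
```

Printing an explicit fallback for missing fields keeps skipped checks visible, in line with the rule that failed or skipped checks are reported.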

package/references/eye.md ADDED

@@ -0,0 +1,30 @@
+# Eye
+`eye` is visual inspection for UI/UX quality.
+Inputs:
+- Screenshot.
+- URL.
+- Current local app screen.
+- Reference image.
+Review dimensions:
+- Direction fit.
+- Visual hierarchy.
+- CTA clarity.
+- Spacing and alignment.
+- Contrast.
+- Responsive behavior.
+- Empty, loading, error, disabled states.
+- Interaction affordance.
+Output:
+- P0: blocker.
+- P1: major issue.
+- P2: noticeable quality issue.
+- P3: polish.
+- Safe fix suggestions.

package/references/final-report.md ADDED

@@ -0,0 +1,61 @@
+# Final Report
+Every `yam` route should end with a compact handoff that helps the next run avoid re-reading and re-planning.
+## Required Closing Check
+Include these when they are useful and keep them short:
+- What changed or what was found.
+- What verification ran.
+- What was not checked.
+- Truth status, when verification or runtime claims matter.
+- Remaining tasks.
+- Fix-first items before planned tasks.
+## Anti-False-Completion Check
+Before final response, compare claim strength to evidence:
+- Do not say `verified` unless a relevant check actually ran and passed.
+- Do not say cleanup is complete unless exit/closure was checked or the process is intentionally left running.
+- Do not say UI was reviewed unless a screen, screenshot, browser check, or equivalent visual evidence was inspected.
+- Do not hide skipped or blocked checks.
+Use `references/honest-completion.md` and `references/truth-matrix.md` when the route has meaningful verification claims.
+## Remaining Tasks
+Use this for work that still belongs to the current roadmap or request.
+Examples:
+- Implement the next UI state.
+- Add browser verification.
+- Move large component logic into a helper.
+- Run build after a data-flow change.
+## Fix-First Items
+Use this for issues that should be considered before starting the next planned task because they can slow work, break verification, or distort product direction.
+Examples:
+- Current lint/build errors.
+- Failing tests.
+- Broken dev server.
+- Stale `yam.project.md`.
+- Active hooks or instruction files that conflict with the route.
+- Warnings that keep obscuring real problems.
+## Route Weight
+For `$quick`, one short paragraph or a compact verification matrix is enough.
+For `$ueye`, include source evidence, states/viewports checked, and P0-P3 issues only when they matter.
+For `$deep` and `$mission`, include evidence, remaining risks, remaining tasks, and fix-first items.
+Do not pad the final answer when there are no meaningful remaining tasks or fix-first items.
+When budget drift matters, include or run `yam measure <route>` with approximate files, commands, report lines, and seconds.

package/references/honest-completion.md ADDED

@@ -0,0 +1,61 @@
+# Honest Completion
+`yam` should prevent false completion without turning every task into a heavy proof ceremony.
+## Completion Rule
+Never claim that work is done, verified, cleaned up, visually checked, deployed, or fixed unless the matching evidence exists.
+Use precise language:
+- `done`: the requested edit or answer was completed.
+- `verified`: an actual relevant check passed.
+- `partial`: meaningful evidence exists, but not the full surface.
+- `assumed`: inferred from code or context, with no execution evidence.
+- `blocked`: the check could not run.
+- `skipped`: intentionally not checked because it was not worth the cost.
+## Default Guard
+Every route should ask before final response:
+- What did I actually change or answer?
+- What evidence do I actually have?
+- What did I not check?
+- Is the truth status stronger than the evidence?
+- Are there fix-first items before the next planned task?
+## Lightweight By Default
+For small work, the guard can be one sentence:
+```text
+Verification: not run; change was limited to copy/CSS and inspected locally.
+```
+For larger work, include proof summary and residual risk.
+## Runtime Guard
+Runtime work needs stronger evidence because long-running processes can create false success:
+- Record the command or process that was started.
+- Record the PID, port, session, or tmux pane when available.
+- Do not claim cleanup unless process exit or intended persistence is confirmed.
+- Do not claim browser/visual verification unless a browser check, screenshot, or equivalent evidence exists.
+## What yam Does Not Do By Default
+- No automatic proof gates.
+- No forced subagents.
+- No automatic tmux for ordinary work.
+- No release-blocking runtime proof unless the user chooses `$deep` or `$mission`.
+- No full `$mission` claim without real subagent/team evidence; downgrade to `$deep`, or mark mission partial/blocked.
+Compared baseline:
+- Sneakoscope would collect stronger physical proof and gate completion more aggressively.
+- ECC would keep evidence boundaries and report what is known vs inferred.
+- Karpathy-style minimalism would keep the rule short and obeyable.
+`yam` keeps the guard explicit, cheap, and route-aware.
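Reviewer's sketch (not part of the package): the default-guard question "Is the truth status stronger than the evidence?" can be expressed as a tiny check. The function name and the two evidence flags are hypothetical; the status terms are the ones defined in the completion rule above. Only `verified` and `partial` are evaluated here, since those are the terms that assert evidence.

```javascript
// Hypothetical guard: true when the claimed status outruns the evidence.
//   hasEvidence: some meaningful evidence exists (inspection, partial check, ...)
//   checkPassed: an actual relevant check ran and passed
function isOverclaim(claimed, { hasEvidence = false, checkPassed = false } = {}) {
  if (claimed === "verified") return !checkPassed; // needs a passing check
  if (claimed === "partial") return !hasEvidence;  // needs some real evidence
  // `assumed`, `blocked`, and `skipped` already admit missing evidence;
  // `done` is about the edit itself and is not evaluated in this sketch.
  return false;
}
```

For example, `isOverclaim("verified", { hasEvidence: true })` is `true`: evidence exists, but no check passed, so `partial` is the honest term.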

package/references/hook-lite.md ADDED

@@ -0,0 +1,55 @@
+# Hook Lite
+`yam-lite` is an opt-in advisory hook.
+It exists to keep yam's direction present without turning every request into a proof gate.
+It does not mean `yam` itself should stay shallow; it only keeps the always-on layer small enough to preserve momentum.
+## Contract
+Allowed:
+- Add short route guidance through `UserPromptSubmit` additional context.
+- Remind the agent to check project direction and keep ordinary work moving.
+- Remind the agent not to overclaim verification, cleanup, or visual evidence.
+- Suggest `$quick`, `$ueye`, `$question`, `$scout`, `$deep`, or `$mission` based on obvious prompt signals.
+- Mention a project pack or memory summary when present.
+- Warn when `.sneakoscope` is active in the current project.
+Not allowed:
+- Run verification commands.
+- Start tmux, browser QA, dev servers, or subagents.
+- Block tools or permissions.
+- Read broad project context.
+- Force `$quick` or any other route.
+- Install dependencies.
+- Modify source files.
+## Toggle
+```bash
+yam hook status --global
+yam hook enable lite --global
+yam hook disable lite --global
+yam hook status --project /path/to/project
+yam hook enable lite --project /path/to/project
+yam hook disable lite --project /path/to/project
+```
+Global hooks write to `~/.codex/hooks.json`.
+Project hooks write to `<project>/.codex/hooks.json`.
+`yam` backs up an existing hook file before enabling the lite hook.
+## Compared Baseline
+Sneakoscope uses hooks as a broad trust surface with route prep, tool evidence, permission gates, subagent evidence, and stop gates.
+ECC favors selective setup and lower-context workflows.
+Karpathy-style minimalism would avoid hooks unless the rule is short and changes behavior.
+`yam` keeps this hook advisory-only so beginner momentum is preserved while the agent still receives a direction nudge. Deeper proof belongs in `$deep` and real team execution belongs in `$mission`, not in an always-on prompt hook.
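Reviewer's sketch (not part of the package): the `--global` and `--project` toggles above map to two hook file locations. The helper below only illustrates that mapping; its name and parameters are hypothetical, it assumes POSIX-style path separators, and it does not model the backup step, whose file naming the reference does not specify.

```javascript
// Hypothetical resolver for the hook file that a toggle would write to.
function hooksFilePath({ scope, homeDir, projectDir }) {
  if (scope === "global") return `${homeDir}/.codex/hooks.json`;      // ~/.codex/hooks.json
  if (scope === "project") return `${projectDir}/.codex/hooks.json`;  // <project>/.codex/hooks.json
  throw new Error(`unknown scope: ${scope}`);
}
```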

package/references/markdown-management.md ADDED

@@ -0,0 +1,56 @@
+# Markdown Management
+`yam` uses markdown as a small direction layer, not as an automatic control system.
+## Compared Baseline
+Sneakoscope:
+- Creates and manages more markdown surfaces for agent control, route instructions, proof, and dashboards.
+- Good for strict verification and anti-fake-work pressure.
+- Risk: too much generated context and too much automatic intervention.
+ECC:
+- Splits markdown into modular instructions, rules, skills, and commands.
+- Good for selective installation and low-context operation.
+- Risk: too many optional files can still become noisy if installed wholesale.
+Karpathy-style minimal harness:
+- Keeps the core instruction document short and human-readable.
+- Good for speed, obedience, and easy maintenance.
+- Risk: weaker automated structure when work becomes broad or risky.
+## yam Policy
+- `yam.project.md` is project-local, short, and user-owned.
+- `SKILL.md` files are route instructions managed by the `yam` source.
+- `references/*.md` files are optional detail and should be opened only when needed.
+- `.yam/*.md` files are optional logs, summaries, or proof notes.
+- `.yam/memory/records/*.json` files are opt-in sparse memory records, not an automatic control layer.
+- Do not install hooks or automations to keep markdown "fresh".
+- Do not overwrite an existing project pack during normal init.
+- Update stale project packs narrowly instead of re-reading the whole project every run.
+## Project Pack Size
+Target:
+```text
+500 to 1200 words
+```
+Hard preference:
+- Short enough to read before each route.
+- Specific enough to prevent re-planning from scratch.
+- Focused on product direction, UI direction, commands, risks, and no-go rules.
+## Write Rules
+- Create `yam.project.md` only when missing.
+- Never replace a user-edited pack without explicit approval.
+- If command detection changes, report the new command and let the user or route update the pack narrowly.
+- Keep generated sections clearly marked.
+- Prefer one project pack over multiple competing instruction files.
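Reviewer's sketch (not part of the package): the 500 to 1200 word target for `yam.project.md` is easy to check mechanically. The helper name and the `short`/`ok`/`long` verdicts are hypothetical, and whitespace-delimited counting is an assumption about what "words" means here.

```javascript
// Count whitespace-delimited words and compare against the suggested range.
function packSizeCheck(markdown, { min = 500, max = 1200 } = {}) {
  const words = markdown.trim().split(/\s+/).filter(Boolean).length;
  const verdict = words < min ? "short" : words > max ? "long" : "ok";
  return { words, verdict };
}
```

Because the target is a preference rather than a gate, a `long` verdict is best treated as a prompt to trim the pack, not as a failure.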

package/references/memory.md ADDED

@@ -0,0 +1,59 @@
+# Memory
+`yam memory` is an opt-in, project-local memory layer.
+It borrows only the lightest useful parts from heavier harnesses:
+- Sneakoscope TriWiki: sparse records, one file per claim, deliberate forgetting instead of injecting every old claim.
+- Sneakoscope wrongness memory: remember repeated mistakes, wrong decisions, stale assumptions, and overconfident claims.
+- ECC research style: separate evidence, inference, and recommendation.
+- Karpathy-style minimalism: keep the mechanism small enough to obey.
+Storage:
+```text
+.yam/memory/records/<id>.json
+.yam/memory/summary.md
+```
+Use it for:
+- Wrong decisions that should not be repeated.
+- Repeated mistakes that waste time.
+- Project direction changes.
+- Lessons learned after a bug, failed implementation, or UX review.
+- Verification command notes that should survive across sessions.
+Do not use it for:
+- Secrets, tokens, credentials, local private paths that should not be committed, or personal data.
+- Full proof logs.
+- Large research reports.
+- Automatic gates or route enforcement.
+- Every minor thought from a session.
+Commands:
+```bash
+yam memory init .
+yam memory add . --kind wrong_decision --summary "..." --evidence "..." --action "..."
+yam memory list .
+yam memory summary .
+yam memory resolve . mem-YYYYMMDD... --note "..."
+```
+Kinds:
+- `wrong_decision`
+- `repeat_mistake`
+- `direction_change`
+- `lesson`
+- `risk`
+- `command`
+Route behavior:
+- Routes may read `.yam/memory/summary.md` only when the task is likely to repeat a known mistake or touch project direction.
+- Routes should not read all memory records by default.
+- If memory is stale or noisy, report that as a fix-first item instead of expanding context.
+- Memory never proves completion by itself; it only informs direction and avoids repeated waste.
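Reviewer's sketch (not part of the package): a memory record could be validated against the listed kinds before it is written. The record fields below mirror the `--kind`, `--summary`, `--evidence`, and `--action` flags shown above, but the actual JSON schema is not given in this reference, so the shape (including the `resolved` field) is an assumption, as is the `buildMemoryRecord` name.

```javascript
// Kinds copied from the reference.
const MEMORY_KINDS = [
  "wrong_decision",
  "repeat_mistake",
  "direction_change",
  "lesson",
  "risk",
  "command",
];

// Hypothetical builder: validates input and returns the target path plus record.
function buildMemoryRecord({ id, kind, summary, evidence = "", action = "" }) {
  if (!MEMORY_KINDS.includes(kind)) throw new Error(`unknown kind: ${kind}`);
  if (!summary || !summary.trim()) throw new Error("summary is required");
  return {
    path: `.yam/memory/records/${id}.json`, // one file per record
    record: { id, kind, summary: summary.trim(), evidence, action, resolved: false },
  };
}
```

Rejecting unknown kinds keeps the store sparse, which matches the intent that memory holds a few deliberate records rather than every minor thought.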

package/references/mission.md ADDED

@@ -0,0 +1,86 @@
+# Mission
+`mission` is the real-subagent/team execution route for approved plans.
+It exists so `deep` can stay single-agent and verification-centered while `mission` owns real team/subagent implementation and cross-verification.
+Use it when:
+- The user has approved a broad implementation plan.
+- Real subagent/team lanes would reduce risk.
+- Implementation, review, visual/runtime verification, and final proof need to happen together.
+- The work is too broad for `quick` or `ueye`.
+Do not use it for:
+- Small edits.
+- Ordinary scoped feature work.
+- Pure research.
+- Pure verification or single-agent heavy work.
+- Tasks without an approved plan or clear acceptance criteria.
+- Tasks where real subagents are unavailable, unsafe, or not worth the token/time cost; use `deep` instead.
+Role model:
+- Implementer: makes the scoped changes.
+- Reviewer: checks correctness, risk, architecture, and direction.
+- UX/browser verifier: checks UI behavior, browser state, screenshots, or flows when relevant.
+- Doctor/scanner: checks direction fit, scope control, command output, stale instructions, runtime/cleanup evidence, false-completion risk, and remaining fix-first items.
+Subagent policy:
+- Mission requires real subagent/team orchestration to be considered a full mission.
+- Use mission when the environment supports subagents and the work can be split into independent implementation, review, or verification lanes.
+- If the task is bounded, the split is artificial, or the token/time cost would exceed the safety benefit, choose `deep` instead.
+- If the user invoked mission but real subagents are unavailable or unsafe, downgrade to `deep` by default and report `subagent decision: downgraded_to_deep`.
+- If the user explicitly insists on mission despite missing subagents, mark the result `partial` or `blocked`; do not treat self-review as team execution.
+- The final proof must record the subagent decision: `used`, `downgraded_to_deep`, `unavailable_partial`, or `blocked`, with the reason.
+Runtime use:
+- Mission may use deep runtime verification when needed, but runtime proof alone does not make a mission if subagents were not used.
+- tmux is recommended when a persistent dev server, watcher, or browser QA loop materially improves evidence.
+- tmux is not mandatory for every mission.
+Completion rule:
+- No verified claim without evidence.
+- No cleanup claim without exit/closure confirmation or intentional persistence.
+- No visual claim without screen, browser, or screenshot evidence.
+Suggested prompt:
+```text
+$mission
+The implementation plan below is finalized.
+Goal:
+-
+Scope:
+-
+Prohibited:
+-
+Acceptance criteria:
+-
+Implement this with real subagent/team lanes,
+and cross-verify across the implementer/reviewer/UX verifier/doctor lanes.
+If needed, use tmux/dev server/browser QA/process cleanup proof as well.
+Include evidence, truth status, cleanup status, fix-first items, and remaining tasks in the final report.
+```
+Doctor scan:
+Use `references/doctor-scan.md` before final completion.
+Keep the scan short, but cover direction fit, scope control, verification, runtime/cleanup, truth status, and fix-first items.
+Compared baseline:
+- Sneakoscope would likely make this a Team route with stronger proof gates and required agent evidence.
+- ECC would split role responsibilities and keep evidence boundaries.
+- Karpathy-style minimalism would avoid adding this unless it clearly replaces a confusing middle route.
+`yam` uses mission to replace the old standalone runtime route with a clearer heavy execution route.
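Reviewer's sketch (not part of the package): the subagent policy in `references/mission.md` reduces to a short decision function. The four decision values come from the reference; the function name, the input flags, and in particular the `canProceedSolo` flag used to choose between `unavailable_partial` and `blocked` are assumptions, since the reference does not say how to pick between those two.

```javascript
// Hypothetical decision helper for the mission route.
function subagentDecision({
  available,
  safe,
  userInsists = false,
  canProceedSolo = true, // assumption: separates partial from blocked
  reason = "",
}) {
  // Real subagents usable: this can count as a full mission.
  if (available && safe) return { decision: "used", route: "mission", reason };
  // Default when subagents are missing or unsafe: downgrade to deep.
  if (!userInsists) return { decision: "downgraded_to_deep", route: "deep", reason };
  // User insists on mission anyway: never report self-review as team execution.
  return canProceedSolo
    ? { decision: "unavailable_partial", route: "mission", truthStatus: "partial", reason }
    : { decision: "blocked", route: "mission", truthStatus: "blocked", reason };
}
```

The `reason` string is passed through unchanged because the final proof must record the decision together with its reason.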