npm - omakaseagent - Versions diffs - 0.1.0 - Mend

omakaseagent 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (187) hide show

package/dist/cursor/.cursor/skills/omakase/reference/archivist-workflows.md ADDED Viewed

@@ -0,0 +1,178 @@
+# Archivist workflows — git recap & chat preferences
+Protocols for **@omakase-archivist**. Not user-facing skills. Distilled from cursor-team-kit `weekly-review`, `what-did-i-get-done`, and `workflow-from-chats` (Omakase voice).
+**Router:** Use the native Archivist for these. Router `taste` is for reading/updating memory on demand — not weekly recaps.
+**Phase G:** Full repo `omakase learn` (CLI + project agents) is separate; this doc covers chat/git evidence only.
+---
+## Setup (all workflows)
+Read `.omakaseagent/taste.md` and `decisions.md` when present. Include **Memory consulted** in the output (exact entries or “none yet — init seed only”).
+## When to run
+| User intent | Workflow |
+|-------------|----------|
+| Weekly recap, retro, “what did I ship” | Git recap (below) |
+| Specific date range (“since Monday”, “last 3 days”) | Same git recap — set `window` |
+| “Mine chats”, “learn my preferences”, encode workflow | Chat preferences (below) |
+| Cross-memory + git + chats → patterns, gaps, evolution | Delegate **Memory Synthesizer** with charter citing this doc |
+---
+## 1. Git recap (one workflow, `window` parameter)
+**Goal:** A short executive summary a human would actually send — **not** a `git log` dump. If the output is only commit subjects pasted in a list, the workflow failed.
+### Setup
+1. Resolve **author** — default `git config user.email`. If the user asks for team/we scope, use all authors or named emails they give. **Sanity check:** if that email has zero commits in the window, run `git shortlog -sn main --since="<since>"` and ask the user which author(s) to use, or match a name they gave (e.g. `--author="Rick"`). Cloud/agent environments often use a bot email that is not the human author.
+2. Resolve **window** — default last **7 days**; weekly-style asks may use up to **10**. User ranges (“yesterday”, “since 2026-05-28”) → concrete dates; state the range used in the output.
+3. **Branch context** — default `main` (or repo default: `git symbolic-ref refs/remotes/origin/HEAD` shortened to branch name). Use current branch only if the user scoped to it.
+4. Collect commits on that branch in the window for the author(s). **Exclude merge commits.**
+### Commands (starting point — adapt if repo differs)
+```bash
+git config user.email
+git log main --no-merges --author="<email>" --since="<since>" --until="<until>" --format="%h %ad %s" --date=short
+```
+For substantial-looking subjects, skim the change (not every commit):
+```bash
+git show <hash> --stat
+git show <hash> -p -- <limit large diffs mentally; skip formatting-only>
+```
+### Synthesis rules
+- Group into **2–5 themes** (what moved the project), not one bullet per commit.
+- **Omit** cosmetic-only work (formatting, import churn, rename-only) unless it was the whole week.
+- Describe changes **functionally** — do not infer motivation (“because you wanted…”).
+- **Classification paragraph** (required, short): split the week into likely **bugfix**, **tech debt**, and **net-new** using subjects + diff skim. No separate Omakase taxonomy — three buckets are enough.
+### Memory writes (git only)
+- **Default: report only** — recap is the deliverable.
+- **Optional:** If a commit message or diff clearly records a **decision** (architecture, policy, tool choice), propose a **`decisions.md` row** (Context / Decision / Why / Revisit if) in a fenced diff block — **do not apply** until the user confirms.
+- Do **not** bulk-append recap bullets to `taste.md`.
+### Output shape
+```markdown
+## Shipped recap (<start> – <end>, `main`, <author>)
+- <theme 1>
+- <theme 2>
+…
+**Classification:** <2–4 sentences: bugfix vs tech debt vs net-new>
+**Proposed decisions.md** (if any; omit section if none):
+<diff or table rows for user confirm>
+```
+---
+## 2. Chat preferences (7-day default)
+**Goal:** Durable workflow rules for **future** agents — not chat summaries.
+### Capability
+1. If Cursor-style parent transcripts are readable under the project (agent may search `agent-transcripts` under the Cursor project dir) → run full protocol.
+2. If not → ask for pasted excerpts or topics; state what could not be verified. **Do not** print local filesystem paths in user-facing output.
+### Protocol
+1. **Window** — default **7 days** unless the user specifies otherwise.
+2. **Inventory** (internal): topic, parent conversation id, approximate date, why it may contain preferences. Subagent transcripts are evidence only; **cite parent conversations** in output (title + id).
+3. **Scan** for explicit signals: “I prefer”, “always”, “never”, “not what I asked”, “stop”, review/PR/CI/skill corrections.
+4. **Preference atoms:** trigger, rule, quality bar, stop condition, evidence, confidence (**strong / medium / weak / contradicted**).
+5. **Contradicted** — two parents support incompatible rules (e.g. “always squash” vs “never squash”). **Do not write files.** List under **Contradicted** and ask the user which wins.
+6. **Cluster** by shape: shipping, review, simplification, debugging, communication, delegation, validation.
+7. **Artifact choice:**
+   - **Default:** patch `taste.md` (observable preferences) or `decisions.md` (one-off choices with Why).
+   - **Ask first** if the user would need a **new skill, rule, or workflow doc** outside `.omakaseagent/`.
+8. **Strong** preferences only for proposed patches; medium/weak → **Consider**; contradicted → ask (above).
+### Writes
+- Show **exact diffs** for `taste.md` / `decisions.md`. **Wait for user confirm** before applying.
+- Never overwrite the user’s voice; add or sharpen.
+- No raw chat dumps, secrets, customer data, or credentials in output.
+### Output shape
+```markdown
+## Chat preference synthesis (<window>)
+**Target workflow:** <one sentence>
+**Evidence:** <parent title> (<conversation id>) — <one line why it counts>
+**Adopt** (strong → proposed patches below)
+**Consider** (medium/weak)
+**Dismissed** (stale, one-off, low signal)
+**Contradicted** (needs your call — do not patch until resolved)
+### Proposed patches (confirm to apply)
+(diff blocks for taste.md / decisions.md — user confirms before apply)
+```
+---
+## 3. Delegation to Memory Synthesizer
+Delegate when the charter needs **compiled truth** across `taste.md`, `decisions.md`, git themes, and chat atoms — evolution narrative, gap analysis, or co-curator structure proposals.
+The lead may still run git or chat workflows alone when the ask is a recap or a preference pass only.
+---
+## 4. Drift audit (skill source vs dist vs TEAMS.md)
+**Goal:** Catch persona/router drift before it reaches users — especially after `skill/` edits without `npm run build`.
+### When to run
+| Trigger | Action |
+|---------|--------|
+| After merging Class 2 `skill/` or native generator changes | `npm run verify:drift` (CI runs this on main/PR) |
+| Weekly maintenance or “does dist match skill?” | Same + `npm run build` if drift found |
+| Before adding a new specialist | Check overlap per `reference/team-architecture.md` |
+### Mechanical check
+```bash
+npm run verify:drift
+```
+Validates: `TEAMS.md` lists core leads; `dist/*/agents/omakase-{engineer,critic,archivist}.md` include `teams/` bundles; persona tree under `skill/teams/` is non-empty.
+### Report shape (default: report only)
+```markdown
+## Drift audit (<date>)
+- **TEAMS.md:** pass | issues
+- **dist bundles:** pass | rebuild needed
+- **scenario evals:** `npm run verify:scenario-evals` pass | fail
+**Proposed decisions.md** (if structural drift found): …
+```
+Do not auto-edit `skill/` or `dist/` — propose rebuild or fix, human confirms.
+---
+## Quality bar
+- Git recap must change how someone understands the week (themes + classification), not duplicate `git log`.
+- Chat work must produce **actionable** memory candidates or honest “nothing durable found.”
+- Apply Omakase Internal Critique Pass on non-trivial synthesis (Context Fidelity: no invented quotes or decisions).

package/dist/cursor/.cursor/skills/omakase/reference/backlog-audit.md ADDED Viewed

@@ -0,0 +1,168 @@
+# Backlog audit — codebase improvement the Omakase way
+**No separate slash command.** This workflow is triggered by plain goals to **@omakase-engineer** (or the router when native agents are absent):
+- "Audit the codebase and tell me what's worth fixing"
+- "Find tech debt / security / perf issues"
+- "What should we improve next?"
+- "Review what this branch changes before I open a PR"
+- "Reconcile the backlog" / "refresh stale plans"
+- "Write an execution plan for fixing X" (skip audit — recon + single plan)
+**Read with:** `reference/dark-factory.md`, `.omakaseagent/factory.md`, `taste.md`, `decisions.md`, `reference/execution-plan.md`.
+---
+## What this is (and is not)
+| It **is** | It **is not** |
+|-----------|----------------|
+| Engineer-led **advisor** pass: understand, judge, specify | A second router command or external skill install |
+| Findings + **execution plans** in `.omakaseagent/backlog/` | Strategic `/omakase plan` output (see `reference/plan.md`) |
+| Factory execution afterward: critic + gate + memory | "Plan passed checklist" without Omakase evidence |
+| Read-only on **source** during the audit phase | Engineer implementing fixes during the same audit turn |
+**Economics (borrowed principle):** use the capable model for understanding and specifying; delegate implementation to `omakase-implementation-lead` or a follow-up Engineer session with a self-contained execution plan. The **execution plan** is the handoff product — its quality determines whether the executor succeeds.
+---
+## Hard rules (audit phase)
+1. **Do not modify source code during audit.** The only writes in the audit turn go under `.omakaseagent/backlog/` (and optional `.omakaseagent/handoffs/` for the findings summary). Implementation happens in a **separate** work phase via factory orchestration.
+2. **Do not run commands that mutate the working tree** during audit — no installs, commits, formatters. Read, search, read-only analysis (`tsc --noEmit`, lint check mode, cheap test runs if side-effect free).
+3. **Every execution plan must be self-contained** — see `reference/execution-plan.md`. The executor has not seen the audit session.
+4. **Never reproduce secret values** — `file:line` and credential type only; recommend rotation.
+5. **If the user asks you to implement during audit,** finish or skip to spec: write/update the execution plan, then run factory orchestration in a new phase — do not "quick fix while you're in there."
+6. **Cite memory** — findings and plans must respect `taste.md` and `decisions.md`; do not re-litigate recorded rejections without new evidence.
+---
+## Workflow
+### Phase 1 — Recon (always)
+- Read `README`, `AGENTS.md`, root configs, CI, directory layout.
+- Pull mechanical commands from `.omakaseagent/factory.md` when present (not guessed).
+- Note conventions with an exemplar file path for plans to reference.
+- `git log --oneline -30` and churn hotspots when useful.
+If there is no verification baseline (broken build, no tests), say so — "establish baseline" is often finding #1 and blocks risky plans in dependency order.
+### Phase 2 — Audit (parallel when repo is non-trivial)
+Audit across these categories (scope by effort level below):
+| Category | Look for |
+|----------|----------|
+| Correctness / bugs | Logic errors, race conditions, error swallowing |
+| Security | Injection, authz gaps, secrets in repo, unsafe defaults |
+| Performance | N+1, hot loops, unnecessary allocation |
+| Test coverage | Untested critical paths, missing regression tests |
+| Tech debt & architecture | Duplication, drift, boundary violations vs taste |
+| Dependencies & migrations | Stale deps, breaking upgrades, dead code |
+| DX & tooling | Broken scripts, slow CI, confusing local setup |
+| Docs | Wrong README, missing runbooks for real workflows |
+| Direction | Grounded next features — **evidence from repo only**, not generic slop |
+**Effort level** (user sets in plain language; default **standard**):
+| | quick | standard | deep |
+|---|-------|----------|------|
+| Coverage | Hotspots only | Hotspot-weighted | Whole repo |
+| Subagents | 0–1 | ≤4 concurrent | ≤8, per category |
+| Findings | Top ~6, HIGH confidence | Full vetted table | Full + LOW "investigate" |
+Fan out read-only subagents per category when the harness supports it. Each subagent prompt must include: repo path, category scope, recon facts, instruction to return **findings only** (no fixes), and the finding format below.
+Say in the report what was **not** audited.
+**Branch scope** (when user asks about current branch / pre-PR): limit to files changed since merge-base with default branch plus direct importers. Tag each finding **`introduced`** or **`pre-existing`**. Separate them in the table — do not blame the branch for legacy debt in touched files.
+### Phase 3 — Vet (mandatory before presenting)
+Subagents over-report. For every finding in the table, **open the cited code yourself**. Expect:
+- **By-design** behavior reported as bug (e.g. standard `https_proxy` flagged as SSRF)
+- **Mis-attributed** evidence (real issue, wrong file/line)
+- **Duplicates** across categories
+Downgrade, correct, or reject. Record rejections in `.omakaseagent/backlog/README.md` under **Findings considered and rejected** — durable policy rejections also belong in `decisions.md` (offer **@omakase-archivist** when the user wants them persisted).
+### Phase 4 — Present and select
+Findings table ordered by leverage (impact ÷ effort × confidence):
+| # | Finding | Category | Impact | Effort | Risk class | Confidence | Evidence |
+Present **direction** suggestions separately (2–4 max) — options to weigh, not ranked against bugs.
+Ask which findings become execution plans (suggest top 3–5 + user picks). Note **dependency order** (e.g. characterization tests before refactor).
+Wait for selection. Do not write plans nobody asked for.
+**Skip audit** when the user already named the work: recon only enough to spec → one execution plan via `reference/execution-plan.md`.
+### Phase 5 — Write execution plans
+For each selected finding, write `.omakaseagent/backlog/NNN-<slug>.md` using `reference/execution-plan.md`.
+- Stamp `git rev-parse --short HEAD` on each plan.
+- Excerpts from **your own reads**, never from subagent reports alone.
+- Monotonic numbering; reconcile with existing `backlog/README.md` — no duplicate findings.
+- Update `backlog/README.md`: execution order, dependencies, status column, rejected findings.
+### Phase 6 — Execute (separate phase — factory orchestration)
+When the user says implement / execute / ship a backlog item:
+1. Treat the execution plan as the **task brief charter** (`reference/task-intake.md`).
+2. Class **2+:** scenarios + one confirm before deep work.
+3. Delegate to `omakase-implementation-lead` when isolated context helps; charter = full plan file path.
+4. Run `factory.md` mechanical checks.
+5. **@omakase-critic** — rubric **and** plan done criteria.
+6. Gate file links the backlog plan path and records checklist results in `## Mechanical evidence`.
+7. Human checkpoint — merging stays human-owned.
+Never close Class 2+ with only plan checkboxes and no gate.
+### Phase 7 — Reconcile
+When the user asks to refresh the backlog or at the start of a new audit if `backlog/` exists:
+- **DONE** — verify gate exists or code clearly landed; mark DONE in index.
+- **BLOCKED** — investigate; rewrite plan or retire with reason.
+- **Drifted** — plan SHA vs HEAD on in-scope paths; refresh excerpts or mark STALE.
+- **Fixed independently** — retire finding; note in rejected/retired section.
+- Cross-check: backlog status should not say DONE without evidence (gate or explicit Class 0–1 light checkpoint).
+---
+## Finding format (for audit subagents and the table)
+Each finding needs:
+- **Evidence** — `path:line` (multiple if needed)
+- **Impact** — what breaks, slows, or risks if ignored
+- **Effort** — S / M / L
+- **Risk class** — 0–3+ per `factory.md` / `dark-factory.md` (fixes to auth = 3+)
+- **Confidence** — HIGH / MED / LOW
+- **Why not now** (optional) — honest deprioritization
+No vibes-only findings.
+---
+## Relationship to other Omakase artifacts
+| Artifact | Role |
+|----------|------|
+| `reference/plan.md` | Strategic plan — why, options, phasing (router `plan` or Engineer when shaping direction) |
+| `reference/execution-plan.md` | Tactical plan — how, steps, STOP, verify gates (backlog/) |
+| `reference/handoff.md` | Session continuity; audit summary can land in `handoffs/` |
+| `reference/factory-orchestration.md` | Mandatory loop after plan selection |
+---
+## Tone
+Advising, not selling. Prefer "not worth doing" over padding the list. A short list of high-confidence plans beats thirty vague ones. Findings must pass the taste bar — no AI slop recommendations, no generic "add monitoring" without repo evidence.

package/dist/cursor/.cursor/skills/omakase/reference/critique.md ADDED Viewed

@@ -0,0 +1,92 @@
+# Critique — Domain-Aware Quality Gate
+`critique` is one of the most important commands in the system. It is a smart traffic-cop.
+It never guesses the domain. It aggressively gathers context, detects the nature of the work, loads the appropriate extensions from `reference/`, **merges** them additively into the core 8-bullet Omakase Critique Rubric, and runs the combined standard with senior rigor.
+## Core Principle (Non-Negotiable)
+The 8 bullets in `OMAKASE-CRITIQUE.md` are the unchanging foundation for *every* project that uses Omakase.
+Domain-specific standards (starting with Engineering) are **additions only**. They never replace or weaken the core. This is what keeps the system consistent and trustworthy while still allowing real excellence in specific domains.
+## Detection Logic (How to Decide What to Merge)
+Gather signals from multiple layers, in rough priority order:
+**Strong Engineering signals** (merge `reference/engineering.md` extensions):
+- Direct code, diffs, architecture discussion, implementation requests
+- Words like "refactor", "review this code", "make this production", "implement", "architecture for", or "data model" in a technical context
+- File paths or discussion of specific modules, types, performance, boundaries
+- Recent context that was already in Engineering persona
+**Skill package signals** (delegate to **The Skill Judge** via The Critic, or run `reference/skill-judge.md` when you are the Critic handling it directly):
+- "Evaluate this skill", "audit SKILL.md", "score this persona", reviewing an import before merge
+- Target is primarily `SKILL.md`, skill-shaped reference under `skill/`, or persona markdown for team packaging
+- Do **not** merge engineering extensions for pure skill audits; use the skill-judge rubric + core Omakase slop/taste bullets
+**Non-Engineering / Core-Only Signals** (use core Omakase Critique Rubric *only*; do NOT merge engineering extensions):
+- Explicit qualifiers: "high-level", "product strategy", "GTM", "positioning", "messaging", "voice and tone", "exec brief", "one-pager", "process design", "operating rhythm", "customer communication", "without any implementation or code details"
+- Pure writing, narrative, or documentation critique ("review this email", "strengthen the argument in this doc", "improve clarity of this strategy brief")
+- High-level product or process discussion ("plan the GTM", "critique our feature intake process for decision quality")
+- Design, writing, or planning work where the request explicitly avoids or disclaims technical depth
+**Weaker / Mixed signals**:
+- Ambiguous or borderline cases (e.g., "plan the developer platform improvements" or "add X feature" without qualifiers) → **ask once**: "This request has elements that could be product/strategy focused or involve implementation. Should I apply the full engineering critique standards (code judo, file health, deslop, etc.) in addition to the core rubric, or stick to core standards only?"
+- When the line is blurry, err on the side of asking rather than silently merging heavy extensions.
+When in doubt, prefer **asking once** over guessing. Never silently apply the wrong extensions. For pure non-engineering work, the critique must still pass the full core 8-bullet rubric, but interpreted through the appropriate domain lens (e.g., "structural integrity" and "pragmatic craftsmanship" apply to the strategy doc, process, or writing artifact, not code).
+## Merge Rules
+1. Always load the core 8-bullet rubric first.
+2. Load domain extensions additively. Engineering extensions are never applied to pure product, strategy, writing, process, or high-level design work.
+3. **Domain Detection & Merge Declaration (mandatory for every critique output)**: At the very top, after the summary verdict, explicitly declare:
+   - The detected domain (e.g., "Pure product strategy", "Mixed (product positioning + technical data model)", "High-level process design / writing").
+   - Exactly what was merged (or not): e.g. "Core Omakase Critique Rubric only (no engineering extensions: request was high-level GTM strategy with explicit 'no implementation details' framing)."
+   - Or: "Core + Engineering extensions (code judo, file & module health, deslop density) because the request includes technical architecture and implementation decisions."
+   This ensures every output transparently documents whether engineering standards were correctly or incorrectly applied.
+4. A deliverable can pass the core 8 and still fail under the merged engineering lens when applicable. Both standards apply simultaneously only when engineering content is present.
+**Current Engineering extensions to merge** (from `reference/engineering.md`):
+- Code judo & structural simplification opportunities
+- File/module health (especially ~1000 line smell)
+- Spaghetti growth and boundary violations
+- Direct/boring/maintainable vs magic or thin abstractions
+- Type and contract clarity
+- Pervasive deslop (comments, defensive code, repeated mutation, scattered state)
+Future domains will add their own additive sections using the same pattern.
+## Expected Output Structure (Use This)
+For any non-trivial critique:
+1. **Summary verdict** (1-2 sentences)
+2. **Domain Detection & Merge Declaration** (mandatory; see Merge Rules #3 for exact phrasing and examples. This makes the full merged critique transparent — note explicitly when engineering extensions were correctly avoided for pure non-engineering work, or correctly applied for mixed/eng work.)
+3. **Standards applied** (explicitly list core + any merged extensions, cross-referencing the declaration above)
+4. **Score table** (adapt the 8 core bullets + any merged engineering bullets; use 0-4 where relevant. For core-only critiques, interpret "Pragmatic Craftsmanship" and "Structural Integrity" relative to the actual artifact type — strategy doc, email, process design — not code.)
+5. **What's working** (2-4 specific strengths with evidence)
+6. **Priority issues** (P0–P3, with "What", "Why it matters", "Suggested fix")
+7. **"Why this approach" / taste reasoning** (for the critique itself when the target was substantial)
+8. **Recommended next actions** (specific commands the user can run, e.g. `/omakase engineer fix the state management in X` or `/omakase critique the revised GTM narrative`)
+**Depth Adaptation (mandatory for ruthless simplicity)**: The full 8-part structure with complete score table is the default for substantial, ambiguous, or borderline targets. For small targets (< ~150 lines or with immediately obvious P0 structural/spaghetti issues), use the short form to avoid bloat: Summary verdict (2 sentences) + Domain Detection & Merge Declaration + Standards applied (one line) + Top 3 Priority Issues (full P0-P2 fields) + "Why this approach" (for the critique) + Recommended next actions. When the target is a previous critique output, the final Recommended next actions **must** include an explicit self-application step ("Now run the full merged critique on this critique document").
+Be direct. Specific. Evidence-based. Never vague or hedged when the standard has been missed.
+## Tone
+You are a senior craftsperson who has high standards and zero tolerance for slop or mediocrity.
+You respect the person, but you do not respect mediocre work. Your job is to make the output excellent, not to make the author feel good.
+## Self-Application
+The `critique` command must itself pass the Critique Rubric (core + any relevant extensions). This is especially true when critiquing the Omakase system itself.
+## Edge Cases
+- Very small / obvious targets → keep the critique short but still apply the rubric. Do not skip standards just because the thing is tiny.
+- Purely non-engineering work in an engineering-heavy project → still start with core; only merge engineering if the user or context makes it relevant.
+- The target is the Omakase skill itself → apply the highest bar. We eat our own dogfood without exception.

package/dist/cursor/.cursor/skills/omakase/reference/dark-factory.md ADDED Viewed

@@ -0,0 +1,111 @@
+# Dark factory — Level 4 with Omakase
+**Read this first if you are an agent.** Per-repo commands and checks live in `.omakaseagent/factory.md` (created by `omakase learn`). Day-to-day intake: `reference/task-intake.md`.
+---
+## What this pattern is (and is not)
+**Omakase "factory"** is a **trust and evidence system** for agent engineering — not a deployment pipeline and not lights-out automation.
+| It **is** | It **is not** |
+|-----------|----------------|
+| A way to earn **longer agent runs** without the human reading every line | Level 5 dark factory (unattended merge, ship, deploy) |
+| **Scenarios** humans approve once; agents prove behavior later | A DOT/Attractor runner or custom orchestration engine (v1) |
+| **Mechanical checks** agents run (`build`, `test`, CI scripts) | Replacing the repo's CI — it complements CI |
+| **Gate reports** that bundle evidence for human checkpoint | Vague "done" in chat |
+| **Risk classes** — more autonomy on low risk, more human on high | Same rules for docs and auth migrations |
+**Goal:** Humans spend review time on **intent and proof**, not routine diff reading. Agents spend effort on **implementation + running checks + writing evidence**. Omakase supplies **taste, critique, memory, and gate shape**.
+Industry "dark factory" often means full autonomy. **Omakase targets Level 4 (Dan Shapiro):** human approves what should be true; agent proves it; human accepts at checkpoint.
+---
+## What "automation" means here
+**Automated today (agent responsibility):**
+- Co-create task brief + scenarios from plain user goals (`task-intake.md`)
+- Run repo mechanical commands listed in `factory.md`
+- Produce gate report under `.omakaseagent/gates/`
+- Cite memory; propose memory updates when durable
+- Offer `omakase learn` when factory layout is missing
+**Automated in CI (repo scripts):**
+- Gate report headings — `npm run verify:gate-reports`
+- Class 2 PR gate discipline — `npm run verify:pr-gate-diff`
+- Scenario eval contracts — `npm run verify:scenario-evals` (`evals/*.eval.json`)
+- Skill/dist drift — `npm run verify:drift`
+**Automated later (live harness evals, Phase 5+):**
+- With-skill vs baseline runs on seed prompts
+- Narrow task classes may earn more autonomy **after** evidence history — still human accept
+**Never automate in v1:**
+- Merging, deploying, production changes without explicit human accept
+- Judging "taste" or "slop" purely with scripts — use **@omakase-critic**
+- Inventing scenarios that change product intent without user confirm (Class 2+)
+**Operating rule (encode, don't re-review):** If a human would check the same thing on every task, propose a **scenario** or **mechanical check** and add it to `factory.md` / CI — do not make the human repeat the inspection.
+---
+## Rule
+> Humans approve what should be true. Agents prove it became true.
+| Human owns | Agent owns | Omakase owns |
+|------------|------------|--------------|
+| Intent, constraints, scenario approval, risk class, final accept | Implementation, running checks, evidence collection, gate draft | Taste bar, critique, memory shape, gate language |
+---
+## Loop (one task)
+1. **Task brief** — agent co-writes from user goal (no "seed" jargon for users)
+2. **Scenarios** — agent proposes; human confirms before Class 2+ deep work
+3. **Work** — `@omakase-engineer` between gates; memory first
+4. **Evidence** — scenarios + mechanical + critic + memory
+5. **Checkpoint** — gate file; human reviews evidence stack
+---
+## Risk classes
+| Class | Autonomy | Examples |
+|-------|----------|----------|
+| 0 | High — brief inline, light checkpoint | Docs, README |
+| 1 | Medium — run mechanical checks | CI, scripts |
+| 2 | Confirm brief + scenarios first | Features, personas, CLI |
+| 3+ | Stay interactive | Auth, money, migrations |
+Repo-specific examples: `.omakaseagent/factory.md`.
+---
+## Quality gates (Omakase rubric applied to the work)
+1. Context loaded (memory cited)
+2. Task/scenario clarity
+3. Anti-slop critique
+4. Verification (fresh command output, not "should work")
+5. Memory update when durable
+6. Checkpoint artifact exists (Class 2+)
+---
+## Commands
+```bash
+npx omakase init    # memory + agents
+npx omakase learn   # per-repo factory.md + starter scenarios
+npx omakase learn --dry-run
+```
+**Team loop (Class 2+):** `reference/factory-orchestration.md`. Worked example: `examples/factory-e2e/`.
+**Backlog audit (Engineer, no extra command):** `reference/backlog-audit.md` — findings and execution plans in `.omakaseagent/backlog/`; factory loop unchanged for implementation.