npm - engsys - Versions diffs - 1.0.0 - Mend

engsys 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (173) hide show

package/LICENSE +21 -0
package/README.md +202 -0
package/core/agents/aaron.md +152 -0
package/core/agents/bert.md +115 -0
package/core/agents/isabelle.md +136 -0
package/core/agents/jody.md +150 -0
package/core/agents/leith.md +111 -0
package/core/agents/marcelo.md +282 -0
package/core/agents/melvin.md +101 -0
package/core/agents/nyx.md +152 -0
package/core/agents/otto.md +168 -0
package/core/agents/patricia.md +283 -0
package/core/commands/design-audit-local.md +155 -0
package/core/commands/design-audit.md +235 -0
package/core/commands/design-critique.md +96 -0
package/core/commands/file-issue.md +22 -0
package/core/commands/generate-project.md +45 -0
package/core/commands/implement-issue.md +37 -0
package/core/commands/implement-project.md +40 -0
package/core/commands/naturalize.md +61 -0
package/core/commands/pre-push.md +29 -0
package/core/commands/prep-review-collect.md +130 -0
package/core/commands/prep-review-finalize.md +121 -0
package/core/commands/prep-review-publish.md +113 -0
package/core/commands/prep-review.md +65 -0
package/core/commands/project-closeout.md +25 -0
package/core/skills/agentic-eval/SKILL.md +195 -0
package/core/skills/chrome-devtools/SKILL.md +97 -0
package/core/skills/code-review/SKILL.md +26 -0
package/core/skills/gh-cli/SKILL.md +2202 -0
package/core/skills/git-commit/SKILL.md +124 -0
package/core/skills/git-workflow-agents/SKILL.md +462 -0
package/core/skills/git-workflow-agents/reference.md +220 -0
package/core/skills/github-actions/SKILL.md +190 -0
package/core/skills/github-issues/SKILL.md +154 -0
package/core/skills/llm-structured-outputs/SKILL.md +323 -0
package/core/skills/llm-structured-outputs/references/provider-details.md +392 -0
package/core/skills/pre-push/SKILL.md +115 -0
package/core/skills/refactor/SKILL.md +645 -0
package/core/skills/web-design-reviewer/SKILL.md +371 -0
package/core/skills/webapp-testing/SKILL.md +127 -0
package/core/skills/webapp-testing/test-helper.js +56 -0
package/core/templates/CLAUDE.md.tmpl +98 -0
package/core/templates/adr-template.md +67 -0
package/core/templates/gh-issue-templates/bug.md +39 -0
package/core/templates/gh-issue-templates/content.md +42 -0
package/core/templates/gh-issue-templates/enhancement.md +36 -0
package/core/templates/gh-issue-templates/feature.md +39 -0
package/core/templates/gh-issue-templates/infrastructure.md +41 -0
package/core/templates/post-edit-reminders.sh.tmpl +19 -0
package/core/templates/settings.json.tmpl +90 -0
package/core/templates/settings.local.json.tmpl +3 -0
package/core/workflows/agent-implementation-workflow.md +346 -0
package/core/workflows/generate-project.md +258 -0
package/core/workflows/implement-project-workflow.md +190 -0
package/core/workflows/issue-tracking.md +89 -0
package/core/workflows/project-closeout-ceremony.md +77 -0
package/core/workflows/review-workflow.md +266 -0
package/engsys.config.example.yaml +46 -0
package/install +202 -0
package/lessons-library/README.md +80 -0
package/lessons-library/async-callbacks-verify-liveness.md +15 -0
package/lessons-library/change-isnt-done-until-every-surface-updated.md +15 -0
package/lessons-library/claim-then-act-for-irreversible-ops.md +16 -0
package/lessons-library/co-commit-entangled-work.md +15 -0
package/lessons-library/dependabot-triage-playbook.md +17 -0
package/lessons-library/deploy-by-digest-and-verify-the-running-revision.md +15 -0
package/lessons-library/enforce-your-guarantee-at-your-boundary.md +16 -0
package/lessons-library/gate-changes-on-measurement-not-vibes.md +15 -0
package/lessons-library/iac-first-no-console-changes.md +15 -0
package/lessons-library/independent-objective-review-gate.md +15 -0
package/lessons-library/keep-an-immutable-source-of-truth.md +15 -0
package/lessons-library/long-agent-runs-checkpoint-not-poll.md +15 -0
package/lessons-library/model-identity-with-stable-ids-and-provenance.md +15 -0
package/lessons-library/operator-choices-are-first-class.md +15 -0
package/lessons-library/prefer-tool-enforced-structured-output.md +15 -0
package/lessons-library/prove-causation-before-acting.md +15 -0
package/lessons-library/re-read-state-before-acting.md +14 -0
package/lessons-library/read-layer-tolerates-unbackfilled-rows.md +15 -0
package/lessons-library/shell-safety-pipefail-and-validate-before-teardown.md +14 -0
package/lessons-library/shift-correctness-left-and-distrust-false-greens.md +15 -0
package/lessons-library/stray-control-bytes-hide-changes.md +14 -0
package/lessons-library/tests-can-assert-the-bug.md +15 -0
package/lessons-library/verify-ground-truth-not-reports.md +15 -0
package/lessons-library/worktrees-need-bootstrap-from-origin-main.md +15 -0
package/lib/commands.js +356 -0
package/lib/generate-team-avatars.mjs +251 -0
package/lib/manifest.js +155 -0
package/lib/render.js +135 -0
package/lib/selftest.js +90 -0
package/lib/util.js +89 -0
package/lib/yaml.js +156 -0
package/optional-agents/gary.md +86 -0
package/optional-agents/jos.md +136 -0
package/optional-agents/sandy.md +101 -0
package/optional-agents/steve.md +161 -0
package/package.json +43 -0
package/stacks/cloud/aws/claude.fragment.md +17 -0
package/stacks/cloud/aws/settings.fragment.json +39 -0
package/stacks/cloud/aws/skills/aws-deployment-preflight/SKILL.md +165 -0
package/stacks/cloud/aws/skills/cloud-architecture-aws/SKILL.md +265 -0
package/stacks/cloud/azure/claude.fragment.md +17 -0
package/stacks/cloud/azure/settings.fragment.json +45 -0
package/stacks/cloud/azure/skills/azure-deployment-preflight/SKILL.md +175 -0
package/stacks/cloud/azure/skills/cloud-architecture-azure/SKILL.md +211 -0
package/stacks/cloud/cloudflare/claude.fragment.md +21 -0
package/stacks/cloud/cloudflare/settings.fragment.json +31 -0
package/stacks/cloud/cloudflare/skills/cloud-architecture-cloudflare/SKILL.md +294 -0
package/stacks/cloud/cloudflare/skills/cloudflare-deployment-preflight/SKILL.md +175 -0
package/stacks/cloud/gcp/claude.fragment.md +17 -0
package/stacks/cloud/gcp/settings.fragment.json +40 -0
package/stacks/cloud/gcp/skills/cloud-architecture-gcp/SKILL.md +208 -0
package/stacks/cloud/gcp/skills/gcp-deployment-preflight/SKILL.md +137 -0
package/stacks/db/mongo/skills/mongo-conventions/SKILL.md +96 -0
package/stacks/db/prisma/claude.fragment.md +49 -0
package/stacks/db/prisma/skills/docker-database-package-copy/SKILL.md +44 -0
package/stacks/db/prisma/skills/prisma-conventions/SKILL.md +37 -0
package/stacks/domain/mobile-growth/skills/apple-ads/SKILL.md +184 -0
package/stacks/domain/mobile-growth/skills/apple-ads/references/benchmark-notes.md +47 -0
package/stacks/domain/mobile-growth/skills/apple-ads/references/official-links.md +53 -0
package/stacks/domain/mobile-growth/skills/google-play-growth/SKILL.md +197 -0
package/stacks/domain/mobile-growth/skills/google-play-growth/references/benchmark-notes.md +47 -0
package/stacks/domain/mobile-growth/skills/google-play-growth/references/official-links.md +45 -0
package/stacks/iac/bicep/claude.fragment.md +14 -0
package/stacks/iac/bicep/settings.fragment.json +20 -0
package/stacks/iac/bicep/skills/iac-bicep/SKILL.md +113 -0
package/stacks/iac/cdk/claude.fragment.md +14 -0
package/stacks/iac/cdk/settings.fragment.json +23 -0
package/stacks/iac/cdk/skills/iac-cdk/SKILL.md +104 -0
package/stacks/iac/terraform/claude.fragment.md +13 -0
package/stacks/iac/terraform/settings.fragment.json +25 -0
package/stacks/iac/terraform/skills/iac-terraform/SKILL.md +93 -0
package/stacks/iac/terraform/skills/terraform-conventions/SKILL.md +87 -0
package/stacks/lang/kotlin/skills/android-testing/SKILL.md +263 -0
package/stacks/lang/kotlin/skills/jetpack-compose/SKILL.md +264 -0
package/stacks/lang/kotlin/skills/kotlin-coroutines/SKILL.md +329 -0
package/stacks/lang/python/skills/python-conventions/SKILL.md +61 -0
package/stacks/lang/shell/skills/shell-scripting/SKILL.md +110 -0
package/stacks/lang/swift/skills/swift-concurrency/SKILL.md +423 -0
package/stacks/lang/swift/skills/swift-concurrency/references/approachable-concurrency.md +80 -0
package/stacks/lang/swift/skills/swift-concurrency/references/concurrency-patterns.md +233 -0
package/stacks/lang/swift/skills/swift-concurrency/references/swiftui-concurrency.md +187 -0
package/stacks/lang/swift/skills/swift-concurrency/references/synchronization-primitives.md +341 -0
package/stacks/lang/swift/skills/swift-testing/SKILL.md +497 -0
package/stacks/lang/swift/skills/swift-testing/references/testing-advanced.md +106 -0
package/stacks/lang/swift/skills/swift-testing/references/testing-patterns.md +504 -0
package/stacks/lang/swift/skills/swiftdata/SKILL.md +334 -0
package/stacks/lang/swift/skills/swiftdata/references/core-data-coexistence.md +504 -0
package/stacks/lang/swift/skills/swiftdata/references/swiftdata-advanced.md +975 -0
package/stacks/lang/swift/skills/swiftdata/references/swiftdata-queries.md +675 -0
package/stacks/lang/swift/skills/swiftui-patterns/SKILL.md +371 -0
package/stacks/lang/swift/skills/swiftui-patterns/references/architecture-patterns.md +486 -0
package/stacks/lang/swift/skills/swiftui-patterns/references/deprecated-migration.md +1097 -0
package/stacks/lang/swift/skills/swiftui-patterns/references/design-polish.md +780 -0
package/stacks/lang/swift/skills/swiftui-patterns/references/platform-and-sharing.md +696 -0
package/stacks/lang/typescript/skills/typescript-conventions/SKILL.md +91 -0
package/stacks/platform/android/claude.fragment.md +40 -0
package/stacks/platform/android/hooks/pre-push-gradle.sh +70 -0
package/stacks/platform/android/settings.fragment.json +13 -0
package/stacks/platform/android/skills/android-build-conventions/SKILL.md +247 -0
package/stacks/platform/ios/claude.fragment.md +24 -0
package/stacks/platform/ios/hooks/pre-push-xcodebuild.sh +82 -0
package/stacks/platform/ios/settings.fragment.json +21 -0
package/stacks/platform/ios/skills/xcodebuildmcp-simulator-logs/SKILL.md +76 -0
package/stacks/platform/web/skills/frontend-testing/SKILL.md +246 -0
package/stacks/platform/web/skills/react-conventions/SKILL.md +261 -0
package/stacks/platform/web/skills/web-platform-conventions/SKILL.md +55 -0
package/stacks/tooling/issue-tracker-github/claude.fragment.md +10 -0
package/stacks/tooling/issue-tracker-github/settings.fragment.json +24 -0
package/stacks/tooling/issue-tracker-github/skills/issue-tracker-github/SKILL.md +278 -0
package/stacks/tooling/issue-tracker-linear/claude.fragment.md +17 -0
package/stacks/tooling/issue-tracker-linear/settings.fragment.json +9 -0
package/stacks/tooling/issue-tracker-linear/skills/issue-tracker-linear/SKILL.md +183 -0

package/core/commands/design-audit-local.md ADDED Viewed

@@ -0,0 +1,155 @@
+---
+description: Iterate on surface design fidelity live in the browser against the running app — compare a surface to the design reference, fix gaps in-loop, re-verify, repeat
+argument-hint: <surface name or a route path>
+---
+# /design-audit-local
+Tight, interactive fidelity loop: open the **design reference** (gallery/showcase or mockup) **and** the live local surface side by side in the browser, find where the surface diverges from the reference, **fix the gap in source**, let the dev server hot-reload, re-verify — repeat until it matches or a gap is escalated. The deliverable is working code on this branch, not a spec.
+Target: $ARGUMENTS
+## The mapping (read this first)
+The **reference** is whatever the project uses as design truth (declared in `CLAUDE.md` / design docs): a **design system as code** rendered in a **gallery/showcase**, or a **mockup/proposal** pipeline. The audit checks whether the **live surfaces compose that reference faithfully** — right tokens, right components, right variants/states. So the side-by-side is *reference vs. live surface*.
+## When to use this vs `/design-audit`
+| You want…                                                              | Use                |
+| ---------------------------------------------------------------------- | ------------------ |
+| A fast compare-and-fix loop against the running app, edits land now    | **this command**   |
+| A thorough remediation spec to review, then turn into a tracker project | `/design-audit`   |
+| A pure design critique with no implementation in scope                 | `/design-critique` |
+A gap this loop **escalates** (P0 structural / S0–S1 design concern) graduates into `/design-audit` for a proper spec, or `/file-issue`. The P0–P3 / S0–S3 vocabulary below is shared with `/design-audit` on purpose.
+## Anchors (resolve to the project's actual paths)
+| What                       | Where (project-defined)                                         |
+| -------------------------- | -------------------------------------------------------------- |
+| Component showcase (ref)   | the rendered gallery/showcase, if one exists                   |
+| Design tokens (truth)      | colors / typography / spacing / elevation / motion sources      |
+| Components                 | the project's component library                                 |
+| Mockups / proposals        | the proposal pipeline, if one exists                           |
+| Live surfaces              | the consuming app surfaces                                      |
+| Routing / surface split    | the app's router                                               |
+| Brand / voice / lexicon    | the project's design/brand brief                               |
+| Local dev / git conventions | `CLAUDE.md`                                                    |
+| Browser control            | a browser-control MCP (e.g. `chrome-devtools`)                 |
+## Phase 0: Preflight — is the dev server up?
+This loop is worthless if the app isn't running. Check before anything else. **Do not silently start services** — if it's down, report and stop.
+```bash
+# The dev server must be listening (port is project-defined)
+lsof -nP -iTCP:<dev-port> -sTCP:LISTEN
+```
+- **Listening** → proceed.
+- **Down** → tell the operator to start the project's dev server and stop.
+**Worktree bootstrap — check this when the session runs inside a git worktree.** A freshly created worktree may have no installed dependencies. Detect a worktree:
+```bash
+# In a worktree, --git-dir differs from --git-common-dir; in the main checkout they're equal.
+[ "$(cd "$(git rev-parse --git-dir)" && pwd)" != "$(cd "$(git rev-parse --git-common-dir)" && pwd)" ] && echo "worktree"
+```
+Bootstrap per the project (install deps + any codegen) before the dev server can start.
+**Worktree gotcha — check this explicitly.** Hot reload only picks up edits if the dev server's working directory is inside *this* checkout. If the server is running from a different worktree or the main checkout, every fix will appear to do nothing. If you can't confirm the server's cwd, say so and ask the operator to restart it from this directory.
+If the browser-control MCP tools are unavailable, stop and tell the operator — this command depends on them. A screenshot-only MCP is a fallback for capture but can't drive the interactive loop as well.
+## Phase 1: Scope and route resolution
+1. Resolve `$ARGUMENTS` to a live surface and its route (confirm by navigating, don't assume) — the surface→route→source mapping is project-defined.
+2. If the surface carries a **demo/state switcher** (so you can reach empty/loading/error and any lifecycle states without a backend), use it to cover those states.
+3. If `$ARGUMENTS` is ambiguous, ask the operator once with `AskUserQuestion`.
+## Phase 2: Side-by-side capture
+Open both in the browser so every comparison is concrete:
+1. **Reference** — open the gallery/showcase (or the mockup) for the component(s) in scope.
+2. **Live surface** — open the resolved route from Phase 1.
+3. Resize both to **1440×900** (primary desktop). Screenshot each. This pair is the comparison anchor.
+4. If responsive is in scope, repeat at **768** (tablet) and **375** (mobile).
+5. Audit **both themes** if the project ships light/dark — drift often hides in one theme only.
+Use the accessibility-tree snapshot to find/identify elements, and computed-style reads when a visual diff is ambiguous (e.g. is that the token, or a magic hex?). Use screenshots for the visual record.
+## Phase 3: Interactive audit walk
+Walk the live surface against the reference. Because this is the *running* app, you can exercise things a source read can't.
+**Fidelity dimensions** (same as `/design-audit` — don't skip any): Layout & IA · Components (system vs hand-rolled, right variant/state) · Copy (lexicon + voice) · Design tokens (**magic hex / magic px = P1 gap**) · States (drive lifecycle + hover/focus/loading/error) · Interactions (hover/active/focus, motion tokens, `prefers-reduced-motion`, disabled, form validation — actually click and observe) · Responsive (1440 / 768 / 375, no horizontal scroll) · Accessibility (run an a11y audit; contrast, focus ring, ARIA on icon-only buttons, heading order).
+**Live-only checks — capture these too** (a static `/design-audit` misses them):
+- Console messages — JS errors, framework warnings, prop-type complaints.
+- Layout shift / flashes on load.
+For each divergence, log a gap (Phase 5 format). Triage immediately into two buckets:
+- **Fix-live** — P1 visual drift, P2 polish, wrong tokens, wrong copy, wrong component, wrong hover/focus/transition. Goes into Phase 4.
+- **Escalate** — P0 structural (a whole component or section missing — may be unbuilt work, not drift), or an S0/S1 design concern (the *design itself* is questionable). Surface to the operator; don't hack a structural feature into place to chase a screenshot. For an S0/S1 judgment call, pull in the **leith** subagent for a brand/UX second opinion (and **nyx** if it touches a security boundary).
+## Phase 4: Rapid fix loop
+The core of the command. For each **fix-live** gap the operator greenlights, one at a time:
+1. **Locate** the source — the surface file, or a shared component. Use the snapshot + visible text/classes to pin the element to a file.
+2. **Edit** — make the smallest change that closes the gap. Prefer fixing the **surface to use the system** (swap hand-rolled markup for the design-system component, swap a magic hex for the token) over patching pixels locally. Never introduce a magic hex/px.
+3. **Reload** — hot reload usually reflects the change in ~1s. Navigate with reload if it didn't catch. Watch console messages for new errors.
+4. **Re-verify** — re-screenshot the live route at the same viewport (and both themes if relevant), compare to the reference again. Fixed → mark resolved. Not fixed → iterate.
+5. **Iteration cap** — if a single gap isn't closed after **3** attempts, stop, write down what's blocking it, move on.
+Work the buckets in order: **P0 escalations first** (operator decides scope), then **P1**, then **P2**. Leave **P3** unless the operator pulls them in.
+Keep the operator in the loop — this is interactive. After a cluster of related fixes, show the before/after and let them react before moving on.
+## Phase 5: Gap log (lightweight)
+Track findings in `tmp/design-audit-local-<surface>-<YYYY-MM-DD>.md` (`mkdir -p tmp` first; gitignored). A working scratchpad, **not** a `docs/specs/` deliverable — keep it terse. It exists so nothing is lost and so escalated gaps can graduate into a `/design-audit` spec.
+Per gap:
+```markdown
+### Gap: <one-line summary>
+- **Severity**: P0 broken | P1 visual drift | P2 polish | P3 nice-to-have (or S0–S3 for a design concern)
+- **Type**: missing | wrong-state | wrong-tokens | wrong-copy | wrong-component | wrong-interaction | wrong-layout | a11y | runtime-error
+- **Reference**: <gallery demo / mockup / component / token file>
+- **Implementation**: `<surface file>:<line>`
+- **Observed (live)**: <what the running app does — attach screenshot ref>
+- **Expected (system)**: <what the reference prescribes>
+- **Disposition**: FIXED-LIVE (commit <pending>) | ESCALATED (<why — needs spec / unbuilt feature / S0-S1 design concern>) | BLOCKED (<what stopped the fix>)
+```
+## Phase 6: Handoff
+Final report to the operator:
+- **Surface + route audited**, viewports + themes covered.
+- **Fixed live** — count + one line each. These are uncommitted edits on this branch.
+- **Escalated** — gaps that need a spec or are unbuilt features. Recommend `/design-audit <surface>` for a full remediation spec, or `/file-issue` per gap.
+- **Blocked** — gaps that resisted 3 fix attempts, with the blocker.
+- **Live-only findings** — console errors, a11y misses.
+- **State coverage** — which lifecycle/empty/error states you verified, which you couldn't (and why).
+- **Next step** — remind the operator the fixes are uncommitted: run `/pre-push` to gate, then commit per `CLAUDE.md` § Git / PR conventions. Don't auto-commit or auto-push.
+## Operating rules
+- **Preflight is non-negotiable.** Don't start the loop against an app that isn't running, and don't silently boot services — report and let the operator start it.
+- **Fix-live means visual fidelity, not feature work.** A whole missing component or section is a P0 escalation, not something to scaffold mid-loop.
+- **Use the system, don't reinvent it.** The best fix is usually making the surface consume the design-system component instead of hand-rolled markup. Token compliance is not optional — any color/spacing/type/radius not from the project's tokens is a P1, even if it looks identical.
+- **One gap at a time.** Edit → reload → re-verify → next. Batching makes it impossible to tell which edit broke what.
+- **The system is the reference, not gospel.** If the *design itself* is wrong (S0/S1), escalate it — don't faithfully implement a bad design. That's a `/design-audit` Phase 3 / `/design-critique` conversation.
+- **Sanity before fidelity.** If the surface has a fundamental design problem, surface it loudly before grinding pixel gaps.
+- **Cite, don't summarize.** Every gap references the reference side and `<surface file>:<line>` so the operator can verify in 30 seconds.
+- **States are the product.** Use the demo switcher to reach lifecycle/empty/error states; don't log a gap for a state you couldn't actually observe.
+- **Don't commit or push.** This loop ends with uncommitted edits. The operator runs `/pre-push` and commits.
+See `CLAUDE.md` § Pre-push gate and § Git / PR conventions for the gate and commit rules, and the project's design/brand brief for brand/voice.

package/core/commands/design-audit.md ADDED Viewed

@@ -0,0 +1,235 @@
+---
+description: Audit a live surface against the design reference (mockups or design system + tokens/components) with a fine-grained eye, produce a remediation spec
+argument-hint: <surface name, a route path, a screenshot path, or a supplied mockup file>
+---
+Compare a live surface to its design reference with a **fine-fine-fine eye** and produce a remediation spec.
+Target: $ARGUMENTS
+## The mapping (read this first)
+The **reference side** is whatever the project uses as design truth — pick whichever the project defines in `CLAUDE.md` / its design docs:
+- A **design-proposal pipeline** (mockup HTML / Figma exports / image comps) to diff the live surface against, OR
+- A **design system as code** (tokens + components) and its **rendered gallery/showcase**, where the audit checks whether the live surfaces compose that system faithfully (right tokens, right components, right variants/states).
+The **live side** is the consuming surface(s) in the app.
+This command supports three input forms for `$ARGUMENTS`:
+1. **A surface or route** — audit that live surface against the reference.
+2. **A screenshot path** — audit the rendered pixels in the image against the reference; you won't have line numbers for the "live" side, so cite the screenshot region and the nearest source file you can locate.
+3. **A supplied mockup** (an HTML/image file handed to you for a *not-yet-built* surface) — treat the mockup as the proposed design, run the sanity check on it, and audit how well it would compose the existing system.
+## Anchors (resolve to the project's actual paths)
+| What                       | Where (project-defined)                                          |
+| -------------------------- | --------------------------------------------------------------- |
+| Design tokens (truth)      | colors / typography / spacing / elevation / motion sources       |
+| Components                 | the project's component library                                  |
+| Component showcase (ref)   | the rendered gallery/showcase, if one exists                    |
+| Mockups / proposals        | the proposal pipeline, if one exists                            |
+| Live surfaces              | the consuming app surfaces                                       |
+| Routing / surface split    | the app's router                                                |
+| Brand / voice / lexicon    | the project's design/brand brief                                |
+| Personas + flows           | the project's persona/flow docs                                 |
+| Prior work                 | `gh pr list --state merged --search "<keyword>"` · `gh issue list --search "<keyword>"` |
+## Phase 1: Scope
+1. Resolve `$ARGUMENTS` to a target (a surface, route, screenshot, or mockup). Confirm by navigating, don't assume.
+2. If ambiguous, ask the user once with `AskUserQuestion`.
+3. For a live surface, list the component tree that renders it and note which design-system components it consumes vs. hand-rolled local markup.
+## Phase 2: Load prior context
+Pull what's been tried, what's open:
+```bash
+gh pr list --state merged --search "<keyword>" --json number,title,mergedAt --limit 20
+gh issue list --search "<keyword>" --json number,title,labels --limit 20
+```
+If the work is tracked on a ProjectV2 board, use `gh project item-list <num> --owner <owner> --format json` (the MCP can't do ProjectV2). Note any **tried-and-stuck** patterns — closed issues with PRs that didn't land the design correctly need a different approach than untouched gaps.
+## Phase 3: Design sanity check (does this design even make sense?)
+**Before** spending cycles on fidelity, ask: is the design itself good? A faithful build of a bad design is still a bad product.
+Use the **`design-critique`** skill as the engine if available. Pass it the target (surface source, screenshot, or mockup), the project's actors/personas, and the brand/voice brief. Get a second opinion from **leith** for brand fit. Loop in **nyx** if anything touches a security boundary (tenant isolation, trust frame, sandbox). Loop in **otto** if an AI/LLM interaction's cost or latency looks off.
+Work through the ten sanity questions from [/design-critique](design-critique.md) for the surface, and record sanity findings separately from fidelity findings:
+```markdown
+### Sanity finding: <one-line summary>
+- **Severity**: S0 fundamental (redesign before implementing) | S1 significant (rework recommended) | S2 minor (next iteration) | S3 nitpick
+- **Lens**: one of the ten sanity questions
+- **Observed**: <what the design does>
+- **Concern**: <why this won't serve the actor / brand>
+- **Evidence**: <persona, lexicon/voice rule, heuristic, prior failure>
+- **Recommended change**: <specific design adjustment, not just "fix it">
+```
+**Stop-flag** — if you find any **S0**, surface it loudly. A fidelity audit on top of a fundamentally flawed design is wasted effort.
+## Phase 4: Audit (fine-fine-fine)
+For each surface, work through every dimension. Use the **leith** subagent for UX/brand judgment calls — pass her the live source paths and the reference.
+**Dimensions** (don't skip any):
+| Dimension         | What to check                                                                                                       |
+| ----------------- | ------------------------------------------------------------------------------------------------------------------ |
+| **Layout & IA**   | Grid, section order, surface chrome, nav entry, breadcrumbs.                                                        |
+| **Components**    | Does the surface use the **design-system component**, or hand-roll markup the system already provides? Right variant? Right default state? Compare against the reference (gallery / mockup / component source). |
+| **Copy**          | Headings, labels, microcopy, button text, empty-state, error-state, tooltips. Must follow the lexicon and the project's voice. Often the most-divergent dimension. |
+| **Design tokens** | Colors/type/spacing/radii/elevation must be the project's tokens — **no magic hex / magic px** in surface code.      |
+| **States**        | Empty, loading, error, dense, plus any product lifecycle states. Find the states the live surface is missing.        |
+| **Interactions**  | Hover/active/focus. Transitions honor the motion tokens and `prefers-reduced-motion`. Click affordances, disabled states, form validation. |
+| **Responsive**    | Layout holds at desktop / tablet / mobile. No horizontal scroll.                                                    |
+| **Accessibility** | Contrast meets WCAG AA. Visible focus ring. ARIA on icon-only buttons. Logical heading order. Keyboard nav. (Lean on the project's a11y/contrast checks if it ships any.) |
+For each gap found, record:
+```markdown
+### Gap: <one-line summary>
+- **Severity**: P0 broken | P1 visual drift | P2 polish | P3 nice-to-have
+- **Type**: missing | wrong-state | wrong-tokens | wrong-copy | wrong-component | wrong-interaction | wrong-layout | a11y
+- **Reference**: <gallery demo / mockup / component / token file>
+- **Implementation**: `<surface file>:<line>` (or screenshot region if the input was an image)
+- **Observed**: <what's actually there>
+- **Expected**: <what the system / reference prescribes>
+- **Proposed fix**: <concrete change>
+- **Effort**: S | M | L
+- **Prior attempt**: <link to issue/PR if one already tried this>
+```
+**Optional live verification** — if the app's dev server is running and a browser-control MCP (e.g. `chrome-devtools`) is available, capture the actual rendered DOM/computed CSS and diff against the reference for any gap where the static read is ambiguous (e.g. "is that the token or a magic hex?"). Otherwise rely on source analysis.
+> For a **fast compare-and-fix loop** against the running app — where edits land immediately rather than into a spec — use [/design-audit-local](design-audit-local.md). This command (`/design-audit`) is the thorough, spec-producing audit; they share the P0–P3 / S0–S3 vocabulary so findings move between them.
+## Phase 5: Write the spec
+Output path: `docs/specs/fidelity-<surface>-<YYYY-MM-DD>.md` (`mkdir -p docs/specs` first).
+Skeleton:
+```markdown
+# <Surface> Fidelity Audit — <date>
+Status: Draft
+Owner: <operator>
+Generated by: /design-audit
+## Goal
+Bring `<surface>` to fidelity with the design reference. **If the design sanity check below surfaced S0/S1 concerns, address those before fidelity work.**
+## Scope
+- Surface(s) audited: <list>
+- Implementation files in scope: <list>
+- Reference: <design-system components + gallery, or mockup pipeline>
+- Prior attempts: <merged PRs, open issues>
+## Design sanity (does this design make sense?)
+### S0 — Fundamental (redesign before implementing)
+<entries — or "None">
+### S1 — Significant (rework recommended in parallel with fidelity fixes)
+<entries>
+### S2 — Minor (note for next iteration)
+<entries>
+### S3 — Nitpick
+<entries>
+**Verdict**: <one of>
+- ✅ Design is sound — proceed straight to fidelity remediation.
+- ⚠️ Design needs rework on N items — recommend a redesign pass before / alongside fidelity work.
+- 🛑 Design is fundamentally broken — stop fidelity work; redesign first.
+## Fidelity findings
+### P0 — Broken
+<gap entries>
+### P1 — Visual drift
+<gap entries>
+### P2 — Polish
+<gap entries>
+### P3 — Nice-to-have
+<gap entries>
+## Phased remediation plan
+### Phase 1: P0 fixes (one PR)
+- <issue stubs>
+### Phase 2: P1 visual drift (one PR)
+- <issue stubs>
+### Phase 3: P2 polish (one or more PRs)
+- <issue stubs>
+P3s deferred unless explicitly pulled in.
+## Tried-and-stuck patterns
+<attempts that didn't land the design correctly + a recommended different approach>
+## Acceptance criteria (per phase)
+<testable criteria for "done" — Marcelo should verify these are testable before handoff>
+## Open questions
+<anything the operator needs to decide before implementation>
+```
+## Phase 6: Testability pass
+Hand the draft to the **marcelo** subagent for a testability review. Every acceptance criterion must be specific and testable ("the sign-in CTA renders the primary `Button` with text `Sign in`" — not "button works"). Update the spec with Marcelo's revisions.
+## Phase 7: Report
+Final response to operator:
+- Spec path.
+- **Design sanity verdict** — sound / rework / broken. Counts: S0 / S1 / S2 / S3.
+- **Fidelity counts**: P0 / P1 / P2 / P3.
+- Top 3 most-impactful fixes (mix of sanity + fidelity, prioritized by actor impact).
+- Anything from prior attempts that needs a different approach this time.
+- **If any S0 sanity findings**, lead with them — "⚠️ This design needs rework on X before fidelity work is worthwhile."
+- Explicit request: "review the spec; when ready, run `/generate-project @<spec-path>` to convert it to a tracker project with phased issues."
+## Operating rules
+- **Sanity before fidelity.** Run Phase 3 first. If you find an S0, surface it immediately and shortcut the fine-fine-fine fidelity audit — a perfect build of a broken design is still broken. Operator decides whether to redesign or proceed.
+- **No surface-level scans.** Open every relevant file. Read every design-system component the surface uses. The whole point of this command is the fine eye.
+- **Use the system, don't reinvent it.** Hand-rolled markup that duplicates a design-system component is a finding even if it looks identical — the system is the contract.
+- **Token compliance is not optional.** Any color/spacing/type/radius/shadow not from the project's tokens is a P1, even if visually close.
+- **Don't auto-create issues yet.** The user reviews the spec first. Issue creation happens via `/generate-project` on the operator's command.
+- **Cite, don't summarize.** Findings reference the reference side and `<surface file>:<line>` so the reviewer can verify each gap in 30 seconds.
+- **States count.** A design that breaks down on empty/error/lifecycle states is a finding — those states are the product, not error cases.
+See `CLAUDE.md` § Filing issues for how the eventual issues should be shaped.

package/core/commands/design-critique.md ADDED Viewed

@@ -0,0 +1,96 @@
+---
+description: Sanity-check a proposal, mockup, screenshot, or live screen for design quality (does this make sense? will users bounce off it? are we designing this well?)
+argument-hint: <proposal name, route path, file path, screenshot path, URL, or freeform description>
+---
+Stand-alone design critique. Use this to evaluate a design **before** committing to implementation, or to review a live screen for design quality. It is artifact-agnostic: the target can be a mockup, a proposal, a screenshot, a freeform description, or a live surface.
+Target: $ARGUMENTS
+For a full reference-vs-live comparison (sanity + fidelity), use [/design-audit](design-audit.md) instead. For a fast live fix loop, use [/design-audit-local](design-audit-local.md).
+## What "the reference" is
+The project's **design reference** is whatever the project uses as its source of truth — pick whichever exists:
+- A **design-proposal pipeline** (mockup HTML / Figma exports / image comps), or
+- A **design system as code** (tokens + components) plus a rendered component gallery/showcase, or
+- A documented **brand / voice / lexicon** and persona docs.
+`CLAUDE.md` (or the project's design docs) declares where these live. A critique asks two things: is the *design idea* sound, and does it *belong to this product's system and voice*?
+## Anchors (resolve to the project's actual paths)
+| What                    | Where (project-defined)                                          |
+| ----------------------- | --------------------------------------------------------------- |
+| Design tokens (truth)   | the project's token source (colors, type, spacing, elevation, motion) |
+| Components              | the project's component library                                  |
+| Component showcase/gallery | the rendered reference, if one exists                        |
+| Mockups / proposals     | the proposal pipeline, if one exists                            |
+| Brand / voice / lexicon | the project's design/brand brief                                |
+| Personas + flows        | the project's persona/flow docs                                 |
+| Brand context (agent)   | `.claude/agents/leith.md`                                       |
+## Engine
+Use the **`design-critique`** skill as the engine if available. Get a second opinion from the **leith** subagent for brand fit (she holds the voice and value prop). Loop in **nyx** if anything raises tenant-isolation, sandbox-escape, or PII concerns. Loop in **otto** if there's an AI/LLM interaction whose cost or latency model looks off.
+## The ten questions (work through every one)
+| #   | Question                                                                                  | Lens                 |
+| --- | ----------------------------------------------------------------------------------------- | -------------------- |
+| 1   | Does this serve the actor's actual job-to-be-done?                                         | Product strategy     |
+| 2   | Is the IA / mental model right for that actor?                                             | UX architecture      |
+| 3   | Is the most important thing the most prominent?                                           | Visual hierarchy     |
+| 4   | How many clicks / decisions / words to get value?                                         | Cognitive load       |
+| 5   | What does this look like across every realistic state — empty, partial, loading, error, dense, and any lifecycle states? States are the product, not error cases. | Realistic states |
+| 6   | Does it feel like *this product*, or generic SaaS chrome?                                  | Brand alignment      |
+| 7   | Are we asking for / showing data we shouldn't? Cross-tenant / PII leak risk?               | Privacy/security     |
+| 8   | Where does this actor typically get stuck on this kind of screen?                          | Actor empathy        |
+| 9   | Better, worse, or same vs the closest competitor for this job?                             | Competitive position |
+| 10  | If a new user landed here cold, value understood in < 30s?                                 | Onboarding fitness   |
+## Severity scale
+- **S0 fundamental** — design needs rework before any implementation; would actively harm the product.
+- **S1 significant** — design rework recommended; will frustrate an actor or undercut the value prop.
+- **S2 minor** — note for next iteration; works but suboptimal.
+- **S3 nitpick** — taste-level; surface but don't block on.
+## Finding format
+```markdown
+### <one-line summary>
+- **Severity**: S0 | S1 | S2 | S3
+- **Lens**: which of the ten questions
+- **Observed**: <what the design does>
+- **Concern**: <why this won't serve the actor / brand / user>
+- **Evidence**: <persona, competitor, lexicon/voice rule, heuristic, prior failure>
+- **Recommended change**: <specific design adjustment — not just "fix it">
+- **Effort to redesign**: S | M | L
+```
+## Output
+Two modes depending on scope:
+- **Quick critique** (default for small scope — one screen, no implementation file given): respond inline with a findings table + verdict. No file written. Operator reads, decides.
+- **Formal critique** (target is a full surface/proposal or operator asks for a deliverable): write `docs/critiques/<target-slug>-<YYYY-MM-DD>.md` (`mkdir -p docs/critiques` first) with findings, verdict, and a recommended next step.
+Final verdict, regardless of mode:
+- ✅ **Design is sound.** Proceed to implementation (or fidelity audit via `/design-audit` if implementation already exists).
+- ⚠️ **Design needs rework on N items.** List the S0/S1s. Recommend a redesign pass before implementation.
+- 🛑 **Design is fundamentally broken.** Don't implement. Redesign first.
+## Operating rules
+- **Be specific.** "This is confusing" is not a finding. "The action panel has three CTAs at equal weight — the user can't tell which to click first" is.
+- **Cite evidence.** Reference the actor, a lexicon/voice rule, a token/component the system already provides, a competitor's pattern, or a usability heuristic. Don't critique on taste alone.
+- **Distinguish design from implementation.** This command critiques the design itself. If the user asks "is the live page good?", separate "the design is bad" from "the implementation drifted from the system" — that second one is a `/design-audit` conversation.
+- **Don't pull punches on brand.** Generic SaaS chrome that doesn't feel like the product is an S1, not an S3. The brand is part of the product.
+- **Empty / loading / error / lifecycle states count.** A design that's only good when everything is mid-flight is an S1.
+- **Speculate honestly about actor reaction**, and label it as speculation. "Best guess: the user will hesitate here because nothing signals X" — say it, mark it a guess, not a fact.
+When in doubt about S1 vs S2, err toward S1. Bad design is hard to recover from later.

package/core/commands/file-issue.md ADDED Viewed

@@ -0,0 +1,22 @@
+---
+description: File a tracker issue using Bert's workflow (investigate, write body to tmp/, create via the issue-tracker skill)
+argument-hint: <brief description of the issue>
+---
+Use the **bert** subagent to investigate and file an issue. Full reference: [.claude/workflows/issue-tracking.md](.claude/workflows/issue-tracking.md).
+Relies on the project's installed **issue-tracker skill** (`.claude/skills/issue-tracker-*/`) for the issue write; its named operations work the same whether the tracker is GitHub or Linear.
+Issue context: $ARGUMENTS
+Bert should:
+1. **Investigate** the codebase for the real root cause — no surface-level fixes, no band-aids.
+2. **Write the body** to `tmp/issue-body-<slug>.md` using the Write tool. No heredocs / cat pipes.
+3. **Classify the issue** by type and lead the title with the matching emoji: 🐛 bug · ✨ feature · 💄 enhancement · 📄 content · 🏗️ infrastructure. Body should cover problem/RCA, evidence with `file:line`, fix direction, and acceptance criteria.
+4. **Create the issue** via the project's issue-tracker skill's `create-issue` operation (`.claude/skills/issue-tracker-*/`), passing the `tmp/issue-body-<slug>.md` body file, the title, and the `<type>[,<area>]` label(s). On GitHub the skill runs `gh issue create --body-file …`.
+5. **Report back**: number/ID, URL, short RCA + fix summary, labels.
+Conventions: `CLAUDE.md` § Filing issues.
+If unsure whether to file or fix: ask. Typos and one-liners just fix.

package/core/commands/generate-project.md ADDED Viewed

@@ -0,0 +1,45 @@
+---
+description: Generate a feature spec, plan, and tracker project from a goal (multi-agent design loop through Leith, Melvin, Nyx, Marcelo, Jody)
+argument-hint: <goal description, plus any attachment paths or links>
+---
+Follow the 8-phase workflow in [.claude/workflows/generate-project.md](.claude/workflows/generate-project.md).
+Tracker project + issue writes go through the project's installed **issue-tracker skill** (`.claude/skills/issue-tracker-*/`) and its named operations (`create-issue`, `create-board`, `add-to-board`, `set-board-field`, `query-board`) — the same flow works whether the tracker is GitHub or Linear. On GitHub the skill carries the `gh project` / `gh api graphql` specifics.
+Goal: $ARGUMENTS
+## Quick orientation
+You (the main session) are the orchestrator. Subagents enrich the spec; you reconcile and own the merge.
+**Subagents** (`.claude/agents/`):
+- **leith** — product/UX, user stories, acceptance criteria from a user perspective
+- **melvin** — architecture, service impact, data/consistency, scale/latency/cost
+- **nyx** — security/privacy, threat model, tenant isolation, abuse cases
+- **marcelo** — testing strategy, input validation matrix, quality gates
+- **jody** — phased plan, tracker project + issues, dependencies, owner/labels
+**Phase sequence** (per the workflow doc):
+1. **Intake** — capture goal, attachments, constraints.
+2. **Explore** — `docs/agent-lessons/` (project-local lessons), `docs/architecture/`, `CLAUDE.md`, existing specs, related code, existing issues/projects.
+3. **Clarifying questions** — one batched set, 5–8 max, each unlocks a decision.
+4. **Draft spec** — `docs/specs/<feature-slug>.md` with the section skeleton from the workflow doc.
+5. **Design loop** — Leith and Melvin in parallel, then Nyx, reconcile findings, then Marcelo, then Jody.
+6. **Tracker project mechanics** — use the issue-tracker skill's `create-board` / `add-to-board` / `set-board-field` operations to stand up the board and set Phase/Priority/Owner (on GitHub these run `gh project` / `gh api graphql` for ProjectV2; the `github` MCP doesn't do projects). Issue bodies in `tmp/issue-body-<slug>.md` via `create-issue` — never heredoc.
+7. **Sanity check** — fresh subagent (e.g. `Plan` or `general-purpose`) reviews spec/project/issues with no design-loop context.
+8. **Reflect** — durable lessons → `docs/agent-lessons/` (and PR generalizable ones back to the engsys `lessons-library/`); agent role changes → `.claude/agents/*.md`; automatic behaviors → `CLAUDE.md` or `.claude/commands/*.md`.
+## Key invariants
+- Don't ask generic questions before exploring the repo.
+- Don't proceed past Nyx until security-required product/architecture changes are reconciled.
+- One project phase = one implementation PR. Each issue = one commit. This shape must hold in Jody's plan so [/implement-issue](implement-issue.md) and [/implement-project](implement-project.md) work cleanly downstream.
+- Boards and custom fields go through the issue-tracker skill's `create-board` / `set-board-field` operations. On GitHub those **must** use `gh project` / `gh api graphql` (the `github` MCP server doesn't support ProjectV2); the skill handles that.
+- Every project carries a `Phase` single-select field with `P<n>: <name>` options — without it `/implement-project` can't batch. No item may be left with an empty Phase.
+- Issue bodies in `tmp/issue-body-<slug>.md`, created via the skill's `create-issue` operation (GitHub: `gh issue create --body-file …`). Never HEREDOC.
+- Final report: spec path, project URL, issues by phase, subagent contributions, sanity-check result, learning updates, open questions, explicit request for operator review before implementation starts.
+See `CLAUDE.md` for the project's tool-preference order and filing-issue conventions, and the installed issue-tracker skill (`.claude/skills/issue-tracker-*/`) for the concrete board/issue mechanics on the active tracker.

package/core/commands/implement-issue.md ADDED Viewed

@@ -0,0 +1,37 @@
+---
+description: Implement a tracker issue in an isolated worktree, run the full PR cycle (commit, local code review, push, PR)
+argument-hint: <issue-number>
+---
+Use the **isabelle** subagent, following the per-issue cycle in [.claude/workflows/agent-implementation-workflow.md](.claude/workflows/agent-implementation-workflow.md).
+Issue/work-item reads and writes (claim, findings) go through the project's installed **issue-tracker skill** (`.claude/skills/issue-tracker-*/`) and its named operations — the same flow works whether the tracker is GitHub or Linear. PR creation and CI stay on `gh` / GitHub.
+Issue: #$ARGUMENTS
+**Before invoking Isabelle**, read the issue (skill `get-issue`) and evaluate against the project's model-escalation criteria in `CLAUDE.md`. Isabelle defaults to **Sonnet**; pass `model: "opus"` on the Agent call if those criteria are met.
+Isabelle should:
+1. **Claim**: assign the issue to herself via the issue-tracker skill's `update-issue` operation (GitHub: `gh issue edit $ARGUMENTS --add-assignee @me`).
+2. **Worktree + branch (with `-b` flag, critical)** — create the worktree and branch in one command, from `origin/main`:
+   ```bash
+   git worktree add ../worktrees/issue-$ARGUMENTS-<slug> -b agent/$ARGUMENTS-<slug> origin/main
+   cd ../worktrees/issue-$ARGUMENTS-<slug>
+   # bring over any local env the build needs (project-specific); fall back to the template
+   cp ../../.env .env 2>/dev/null || cp .env.example .env 2>/dev/null || true
+   # install deps + any codegen the project requires (see CLAUDE.md)
+   ```
+   Run every build/lint/test command **inside this worktree dir** (not the main checkout).
+3. **Implement** — read the issue and existing code first; match patterns; one implementation commit referencing `(#$ARGUMENTS)`.
+4. **Pre-push gate** (no exceptions): run the project's pre-push gate / precheck (build, lint, format, test, plus any path-gated checks for migrations / IaC / containers the project defines). See [/pre-push](pre-push.md).
+5. **Local code review** (before push): run a local code review with the built-in `/code-review` skill against `origin/main`. Fix Critical + Warning findings, re-run once to confirm clean, cap at ~2 passes. Go in clean.
+6. **Push + PR**: branch first, then `gh pr create --draft --base main --body-file tmp/pr-body-$ARGUMENTS.md` (PR creation stays on `gh`). Link/close the work item per the issue-tracker skill's `link-pr` operation — on GitHub the PR body uses **one Closes per line**, `Closes #$ARGUMENTS` (a comma-list only closes the first); other trackers per the skill. Then persist the local review findings onto the work item via the skill's `comment-issue` operation (each finding + resolution) — the durable record the closeout ceremony mines.
+7. **Cleanup after merge**: empty the worktree dir, `git worktree remove`, delete the local branch, `git worktree prune`.
+Conventions: `CLAUDE.md` § Git / PR conventions and § Code review.
+Reflection: if work exposed a reusable failure, update `docs/agent-lessons/` (and PR generalizable lessons back to the engsys `lessons-library/`) before handoff.