npm - @appsforgood/next-supabase-kit - Versions diffs - 0.1.4 → 0.1.6 - Mend

@appsforgood/next-supabase-kit 0.1.4 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (49)

package/CHANGELOG.md +12 -0
package/DOGFOOD.md +24 -0
package/LOOP_CODING.md +107 -0
package/MAINTAINER_RELEASE.md +100 -0
package/README.md +40 -4
package/REPOSITORY_SETTINGS.md +7 -3
package/SUPPLY_CHAIN.md +5 -5
package/UPGRADE.md +2 -1
package/antigravity/commands/accessibility-pass.toml +16 -0
package/antigravity/commands/browser-qa.toml +18 -0
package/antigravity/commands/distinctiveness-pass.toml +16 -0
package/antigravity/commands/frontend.toml +5 -4
package/antigravity/commands/layout-cleanup.toml +16 -0
package/antigravity/commands/responsive-cleanup.toml +16 -0
package/antigravity/commands/screenshot-critique.toml +16 -0
package/antigravity/commands/ui-audit.toml +17 -0
package/antigravity/commands/ui-polish.toml +17 -0
package/antigravity/plugin.json +9 -0
package/checklists/ui-acceptance-rubric.md +58 -0
package/checklists/ui-detectors.md +75 -0
package/dist/index.js +1090 -411
package/dist/index.js.map +1 -1
package/dist/studio/office/assets/office.css +188 -29
package/dist/studio/office/assets/office.js +72 -50
package/dist/studio/wizard/assets/wizard.css +157 -26
package/dist/studio/wizard/assets/wizard.js +78 -70
package/examples/next-supabase-installed/.agent-kit/agent-roster.json +7 -3
package/examples/next-supabase-installed/.agent-kit/manifest.json +13 -11
package/examples/next-supabase-installed/audit-output.json +22 -2
package/examples/next-supabase-installed/tree.txt +1 -0
package/package.json +28 -7
package/prompts/ui-command-index.md +124 -0
package/research/summaries/agentic-engineering-maturity-levels.md +54 -0
package/rosters/next-supabase-default-council.json +37 -12
package/runtime-skills/ui-improvement-harness/SKILL.md +12 -0
package/schemas/agentic-level.schema.json +47 -0
package/schemas/onboarding-state.schema.json +4 -1
package/skills/ui-improvement-harness.md +96 -0
package/templates/next-supabase/AGENT_ROSTER.md +6 -3
package/templates/next-supabase/ASSISTANT_ADAPTERS.md +3 -1
package/templates/next-supabase/DECISIONS.md +14 -0
package/templates/next-supabase/DESIGN.md +3 -0
package/templates/next-supabase/DOCS.md +7 -1
package/templates/next-supabase/LOOP_CODING.md +98 -0
package/templates/next-supabase/QUALITY_GATES.md +4 -2
package/templates/next-supabase/SKILLS.md +14 -0
package/templates/next-supabase/SPEC.md +5 -1
package/templates/next-supabase/STYLE_GUIDE.md +3 -1
package/templates/next-supabase/TESTING.md +14 -0

package/templates/next-supabase/DECISIONS.md CHANGED

@@ -44,6 +44,20 @@ Runtime command files are adapters. `AGENTS.md`, `.agent-kit/agent-roster.json`,
 Native commands improve invocation ergonomics, but project policy, security gates, handoff rules, model routing, and documentation contracts stay centralized in Agent Kit files.
+## UI Improvement Harness Rule
+### Context
+Frontend work needs repeatable audit, polish, screenshot, responsive, accessibility, distinctiveness, and browser QA loops rather than one-off taste review.
+### Decision
+Use `.agent-kit/prompts/ui-command-index.md`, `.agent-kit/checklists/ui-detectors.md`, `.agent-kit/checklists/ui-acceptance-rubric.md`, and `.agent-kit/skills/ui-improvement-harness.md` as the source of truth for UI improvement commands and detector severity.
+### Consequences
+Meaningful UI work must classify blocker, major, and minor findings, require desktop and mobile screenshot evidence, and include authenticated or permission-state evidence for protected screens.
 ## Agent Kit Model Routing
 ### Context

package/templates/next-supabase/DESIGN.md CHANGED

@@ -103,6 +103,7 @@ Run `.agent-kit/prompts/frontend-distinctiveness-benchmark.md` before accepting
 | Asset provenance | Real, generated, licensed, and placeholder assets identified with usage constraints |
 | State proof | Loading, empty, error, disabled, success, permission, and focus states captured where relevant |
 | Visual QA proof | Desktop, mobile, and high-risk state evidence reviewed for the change risk |
+| UI detector proof | `.agent-kit/checklists/ui-detectors.md` completed with blocker, major, minor, pass, and not-applicable findings |
 Distinctiveness verdict:
@@ -166,6 +167,8 @@ Frontend work is not accepted until the following evidence exists:
 - A frontend distinctiveness benchmark records first-screen proof, content fingerprint, reference benchmark, creative divergence, asset provenance, state proof, visual QA proof, generic-risk, and source-safety risks.
 - A product-quality scorecard records user/task fit, content specificity, visual identity, information architecture, component states, accessibility and interaction, source safety, total score, and verdict.
 - Desktop and mobile screenshots were reviewed.
+- UI detector findings were classified and blockers were resolved.
+- Authenticated or permission-state screenshots were reviewed when the changed surface requires login, roles, tenant context, or permissions.
 - Accessibility risks and component states were reviewed.
 - Visual QA tier is documented in `TESTING.md` for high-risk UI changes.
 - Baseline visual changes are approved intentionally when visual regression tooling exists.

package/templates/next-supabase/DOCS.md CHANGED

@@ -11,6 +11,8 @@ npm run dev
 Add required environment variables in `.env.example`. Never place real secrets in docs.
+Run `agent-kit setup` to open Agent Office. The office and wizard show **Agentic Engineering Level** (L3–L6 computed from audit/adapter/context signals). See `LOOP_CODING.md` and kit `DOCS.md` for climb patterns. Do not confuse Agentic L5/L6 with audit readiness or visual QA tiers.
 ## Architecture Overview
 Document:
@@ -40,6 +42,8 @@ Document primary workflows, including:
 - Planning and core-change handoffs from `AGENT_ROSTER.md`
 - Tool-specific assistant activation from `ASSISTANT_ADAPTERS.md`
 - Runtime command validation with `agent-kit adapter validate antigravity` when Antigravity is active
+- UI improvement command workflows from `.agent-kit/prompts/ui-command-index.md`
+- Deterministic UI detector and acceptance review from `.agent-kit/checklists/ui-detectors.md` and `.agent-kit/checklists/ui-acceptance-rubric.md`
 - Model-selection setup, enforcement status, and limitations from `MODEL_ROUTING.md`
 - Council-session evidence capture from `COUNCIL.md`
 - Upgrade review, conflict handling, migration review, and rollback evidence from `UPGRADE.md`
@@ -55,7 +59,9 @@ Document primary workflows, including:
 - Data creation and update workflow
 - Deployment workflow
-Runtime command files are adapters only. Native commands such as `/plan`, `/security`, `/frontend`, `/copy`, `/handoff`, `/audit`, `/setup`, `/upgrade`, and `/ship` should point back to `AGENTS.md`, `.agent-kit/agent-roster.json`, `QUALITY_GATES.md`, `.agent-kit/skills/`, and Agent Studio evidence.
+Runtime command files are adapters only. Native commands such as `/plan`, `/security`, `/frontend`, `/ui-audit`, `/ui-polish`, `/layout-cleanup`, `/responsive-cleanup`, `/accessibility-pass`, `/distinctiveness-pass`, `/screenshot-critique`, `/browser-qa`, `/copy`, `/handoff`, `/audit`, `/setup`, `/upgrade`, and `/ship` should point back to `AGENTS.md`, `.agent-kit/agent-roster.json`, `QUALITY_GATES.md`, `.agent-kit/skills/`, and Agent Studio evidence.
+High-risk UI work must include desktop and mobile screenshots plus authenticated or permission-state evidence when the surface requires login, roles, tenant context, or permissions.
 ## Integration Points

package/templates/next-supabase/LOOP_CODING.md ADDED

@@ -0,0 +1,98 @@
+# Loop Coding
+Loop coding means the agent repeats **plan → act → check → fix** until a stop condition, instead of finishing in one chat turn. The Agent Kit is opinionated about **which loops are safe** and **which checkpoints must stay in place**.
+This document describes loop types, kit-safe patterns, and limits.
+## Loop Types
+| Loop type | What it means | Kit-safe version |
+| --- | --- | --- |
+| **Agent loop** | Same agent iterates on feedback until done | Use scoped prompts (for example `.agent-kit/prompts/implement-feature.md`); review each turn; do not remove Security Reviewer or QA gates |
+| **Eval-driven loop** | Code changes until **tests, audit, or evals pass** | `npm test` + `agent-kit audit` + CI |
+| **Self-improving loop** | Agent critiques its own output and revises | Manual: delegate to `@qa-engineer` or run tests between passes; **avoid fully unsupervised self-critique on auth, RLS, or release tooling** |
+| **Council / team loop** | Planner → specialist → Security → QA handoffs | `agent-kit session handoff` + IDE subagents — the kit's core operating model |
+| **Background / overnight loop** | Runs without a human present | **Defer by default** — requires worktree policy, cost caps, kill switches, and stronger eval gates than agent freedom |
+## Practical Rule
+Climb maturity by adding **checkpoints** (tests, audit, guards, human review), not by removing them. Unsupervised loops are only healthy when **eval gates are stronger than the agent's freedom**.
+## Eval-Driven PR Loop (recommended)
+For feature work:
+1. **Plan** — Planner classifies scope; Lead Architect maps affected layers when the change is core.
+2. **Implement** — Next.js / Supabase engineers (or general agent with council rules loaded).
+3. **Check** — run the smallest reliable gate set:
+   ```bash
+   npm test
+   agent-kit audit --min-readiness baseline-setup
+   agent-kit adapter validate all   # when IDE surfaces change
+   ```
+4. **Fix** — repeat implement/check until green or blocked on a documented gap.
+5. **Record** — `agent-kit session render` and mirror summary in `COUNCIL.md` for meaningful multi-agent work.
+See [TESTING.md](TESTING.md) for CI gate expectations.
+## Council Loop (multi-agent)
+The default handoff order lives in `AGENTS.md` and `.agent-kit/agent-roster.json`:
+1. Planner — scope, workflow, council selection
+2. Lead Architect — core changes
+3. Domain engineers — data, UI, copy as needed
+4. Security Reviewer — auth, mutations, secrets, dependencies, release risk
+5. QA Engineer — behavior evidence
+6. Documentation Maintainer — living docs and council record
+Use Agent Studio when the CLI is available:
+```bash
+agent-kit session start --workflow core-change --request "Short title"
+agent-kit session handoff --from planner --to lead-architect --decision "..." --risk "..." --next "..." --evidence "..."
+agent-kit session verify --command "npm test" --result pass
+agent-kit session output phased-checklist --status complete --evidence "..."
+agent-kit session render
+```
+When CLI tooling is unavailable, append the session template in `COUNCIL.md`.
+## Hooks And Local Automation (Level 6 enablers)
+The kit does **not** ship unsupervised orchestration. It documents safe local enablers:
+| Pattern | Purpose | Starting point |
+| --- | --- | --- |
+| Pre-commit test or audit | Catch drift before commit | `.agent-kit/prompts/audit-project-setup.md`, project `npm test` |
+| Post-edit lint/typecheck | Fast feedback on save | Project ESLint / `tsc --noEmit` in editor or CI |
+| PR CI audit gate | Block merge below readiness | `.github/workflows/agent-kit-audit.yml` template |
+| Adapter validate on PR | Prove IDE activation stays valid | `agent-kit adapter validate all` when adapter assets change |
+For Cursor-specific hook/automation patterns, see Cursor Automations docs and keep Planner-first triage **opt-in** — never as a replacement for Security Reviewer or human release approval.
+## MCP Routing (delegation hint)
+Match MCP servers to council roles:
+| Role | Typical MCP use |
+| --- | --- |
+| Supabase/Postgres Engineer | Schema, migrations, RLS, logs, advisors |
+| Security Reviewer | Dependency/advisory checks; no broad production writes without review |
+| Deployment/Observability Engineer | Hosting logs, release status, error tracking |
+| QA Engineer | Test runners, visual diff tools where configured |
+Record active MCP surfaces in `ASSISTANT_ADAPTERS.md` when they affect council behavior.
+## What Not To Default
+- Overnight unsupervised agent runs on auth, RLS, or release tooling
+- Agents managing agents without eval harness and kill switches
+- Removing human review from publish, migration, or security-sensitive paths
+## Related Docs
+- `AGENTS.md` — council roles and default handoffs
+- `QUALITY_GATES.md` — Baseline / Strong / Mature evidence tiers
+- `COUNCIL.md` — session evidence template
+- `TESTING.md` — project test and CI gate expectations
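
The eval-driven PR loop in `LOOP_CODING.md` repeats implement, check, and fix until the gates pass or the work is blocked on a documented gap. Below is a minimal Bash sketch of the bounded check-and-retry part only. `MAX_ROUNDS`, `run_checks`, `run_fix_step`, and `eval_loop` are hypothetical names, not Agent Kit commands. `run_fix_step` only logs the point where the agent or human fix turn would happen. The only kit commands it calls are the `npm test` and `agent-kit audit --min-readiness baseline-setup` gates already listed in the diff.

```bash
# Hypothetical sketch of the bounded implement -> check -> fix loop from LOOP_CODING.md.
# MAX_ROUNDS, run_checks, run_fix_step, and eval_loop are illustrative names, not Agent Kit features.
MAX_ROUNDS="${MAX_ROUNDS:-3}"   # stop condition: give up after this many check rounds

# Run the smallest reliable gate set; any failing command fails the round.
run_checks() {
  npm test || return 1
  agent-kit audit --min-readiness baseline-setup || return 1
}

# Placeholder for the human or agent "fix" step between rounds; it only logs here.
run_fix_step() {
  echo "round $1 failed: fix, then re-run checks" >&2
}

# Repeat checks until green (exit 0) or until MAX_ROUNDS is exhausted (exit 1).
eval_loop() {
  local round
  for ((round = 1; round <= MAX_ROUNDS; round++)); do
    if run_checks; then
      echo "green after round $round"
      return 0
    fi
    run_fix_step "$round"
  done
  echo "blocked after $MAX_ROUNDS rounds: record the gap" >&2
  return 1
}
```

To use it, source the file and call `eval_loop`. Add `agent-kit adapter validate all` to `run_checks` only when IDE or adapter assets change. The round cap is the stop condition: it ends the loop with a recorded gap instead of letting it run on without review.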

package/templates/next-supabase/QUALITY_GATES.md CHANGED

@@ -56,7 +56,7 @@ Best-practice means evidence can survive handoff, release, and later audit.
 - Multi-agent work has local Agent Studio evidence: context loaded, corrections considered, decisions and handoffs recorded, required outputs tracked, artifacts linked, verification captured, and rendered Markdown current.
 - Supabase RLS policies are inventory-backed, least-privilege, and tested for cross-user or cross-tenant access.
 - Production readiness covers Next.js routing/rendering, caching, error boundaries, metadata, accessibility, performance, security headers, and Core Web Vitals evidence.
-- Frontend work starts from brand/content intake, reference-set review, anti-references, and creative-direction options, then proves first-screen proof, content fingerprint, asset provenance, product-quality scorecard, distinctiveness, desktop, mobile, key states, keyboard flow, and visual QA evidence.
+- Frontend work starts from brand/content intake, reference-set review, anti-references, and creative-direction options, then proves first-screen proof, content fingerprint, asset provenance, product-quality scorecard, distinctiveness, UI detector findings, desktop, mobile, key states, keyboard flow, and visual QA evidence.
 - Public-facing and conversion-facing copy starts from discovery questions, audience, pain, outcome, differentiator, proof, objections, voice/tone, and CTA hierarchy, with unsupported claims marked as assumptions.
 - Test evidence includes the smallest useful unit/regression checks plus critical-path smoke coverage.
 - Release evidence includes install or production smoke, migration order, dependency audit, package or deployment verification, logs, and rollback notes.
@@ -71,7 +71,7 @@ Best-practice means evidence can survive handoff, release, and later audit.
 | Planning or roadmap | Planner, Documentation Maintainer | Updated roadmap or checklist with owner, status, and acceptance evidence |
 | Core architecture | Planner, Lead Architect, QA, Docs | Affected-layer map, preserved contracts, tests, updated `SPEC.md` or `DECISIONS.md` |
 | Supabase/Auth/RLS | Lead Architect, Supabase/Postgres Engineer, Security Reviewer, QA | Migration notes, RLS inventory, negative authorization test, rollback risk |
-| Frontend/UI | Frontend Design Lead, QA, Docs | Brand/content intake, reference-set evidence, design critique verdict, distinctiveness benchmark, product-quality scorecard, creative direction, component states, accessibility, desktop/mobile visual QA |
+| Frontend/UI | Frontend Design Lead, QA, Docs | Brand/content intake, reference-set evidence, design critique verdict, distinctiveness benchmark, product-quality scorecard, creative direction, UI detector severity findings, component states, accessibility, desktop/mobile visual QA, authenticated screen evidence when applicable |
 | Marketing/copy | Marketing Copy Lead, Frontend Design Lead, QA, Docs | `MESSAGING.md`, audience and pain, value proposition, proof, objections, voice/tone, CTA hierarchy, risky-claim review |
 | Security-sensitive | Security Reviewer, Lead Architect, QA | OWASP review, boundary validation, dependency/secret review, regression or smoke evidence |
 | Release/package | Deployment/Observability Engineer, Security Reviewer, QA, Docs | Release gate output, dependency audit, install/deploy smoke, provenance or publish identity evidence |
@@ -83,6 +83,8 @@ Best-practice means evidence can survive handoff, release, and later audit.
 - A checklist item is not done until the evidence is linked or named.
 - A test is not evidence unless it covers the behavior, risk, or contract being claimed.
 - A screenshot is not visual QA unless it covers the important viewport, state, and content.
+- A UI detector pass is not complete until blockers, majors, minors, accepted exceptions, screenshots, viewport, auth state, and data state are named.
+- A high-risk UI change is not accepted while blocker detector findings remain or authenticated workflow evidence is missing.
 - A research finding is not a best practice until it is promoted into templates, skills, checklists, audit checks, tests, release gates, or documented decisions.
 - A runtime command is not canonical policy; it is accepted only when it wraps `AGENTS.md`, `.agent-kit/agent-roster.json`, `QUALITY_GATES.md`, canonical skills, and Agent Studio evidence.
 - A fresh install can be baseline setup while still warning on `TBD`, example rows, or starter instruction text; those placeholders must be replaced before claiming strong or best-practice maturity.
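
The new `QUALITY_GATES.md` rule says a UI detector pass is not complete until blockers, majors, minors, accepted exceptions, screenshots, viewport, auth state, and data state are named. Below is a minimal sketch of that completeness check. It assumes the record is a plain `key: value` text file, and the key names are hypothetical. The real `.agent-kit/checklists/ui-detectors.md` format is not shown in this diff.

```bash
# Hypothetical completeness check for a UI detector record (rule from QUALITY_GATES.md).
# Assumption: the record is a "key: value" text file; these key names are illustrative only.
REQUIRED_KEYS="blockers majors minors accepted_exceptions screenshots viewport auth_state data_state"

# Prints "complete" (exit 0) or "incomplete: <missing keys>" (exit 1).
detector_record_missing() {
  local record="$1" key missing=""
  for key in $REQUIRED_KEYS; do
    # A key counts as named only when it appears at line start with a non-empty value.
    if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$record"; then
      missing="$missing $key"
    fi
  done
  if [ -n "$missing" ]; then
    echo "incomplete:$missing"
    return 1
  fi
  echo "complete"
}
```

A value such as `none` still counts as named. The check only confirms that each item was addressed. Judging whether the findings are acceptable is a separate review step.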

package/templates/next-supabase/SKILLS.md CHANGED

@@ -108,6 +108,20 @@ Required checks:
 - Use the matching `.agent-kit/design-briefs/*` brief for SaaS, admin, marketplace, content, tool, ecommerce, portfolio/venue, education, community/social, or AI workflow surfaces.
 - Review final desktop and mobile screenshots with `.agent-kit/prompts/screenshot-review.md`.
+## UI Improvement Harness
+Use for operational UI audit, polish, layout cleanup, responsive cleanup, accessibility pass, screenshot critique, visual distinctiveness pass, and live browser QA loops.
+Required checks:
+- Use `.agent-kit/prompts/ui-command-index.md` to pick the workflow: UI audit, UI polish, layout cleanup, responsive cleanup, accessibility pass, distinctiveness pass, screenshot critique, or browser QA.
+- Run `.agent-kit/checklists/ui-detectors.md` and classify findings as blocker, major, minor, pass, or not applicable.
+- Apply `.agent-kit/checklists/ui-acceptance-rubric.md` before release.
+- Require desktop and mobile screenshot evidence for meaningful UI changes.
+- Require authenticated or permission-state evidence for protected app screens.
+- Block release when blocker detector findings remain.
+- Fix major findings or document accepted exceptions before high-risk UI changes ship.
+- Record route, viewport, auth state, data state, screenshots, detector findings, and residual risks.
 ## Content-First Creative Direction
 Use before designing or changing a user-facing site, product screen, dashboard, tool, marketplace, content experience, ecommerce flow, portfolio, venue page, education product, community surface, or AI workflow UI.
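
The UI Improvement Harness checks in `SKILLS.md` reduce to a small release gate:

- Open blockers always block.
- Open majors block high-risk changes unless fixed or accepted as exceptions.
- Desktop and mobile screenshots are required.
- Protected screens need authenticated or permission-state evidence.

The Bash sketch below assumes a hypothetical input format of one `<severity> <status>` pair per line on stdin. It also reads hypothetical `yes`/`no` environment flags for the evidence. None of these names come from the kit, and the gate is meant to run only for meaningful UI changes.

```bash
# Hypothetical UI release gate sketched from the SKILLS.md harness rules.
# Input (assumed format): one finding per line on stdin, "<severity> <status>",
#   severity: blocker|major|minor|pass|not-applicable; status: open|fixed|accepted.
# Evidence flags (illustrative env vars, yes/no): HIGH_RISK, PROTECTED,
#   HAS_DESKTOP, HAS_MOBILE, HAS_AUTH_EVIDENCE.
ui_gate() {
  local severity status open_blockers=0 open_majors=0
  # The "|| [ -n ... ]" guard also processes a last line without a trailing newline.
  while read -r severity status || [ -n "$severity" ]; do
    if [ "$status" = "open" ]; then
      case "$severity" in
        blocker) open_blockers=$((open_blockers + 1)) ;;
        major) open_majors=$((open_majors + 1)) ;;
      esac
    fi
  done
  if [ "$open_blockers" -gt 0 ]; then
    echo "BLOCKED: $open_blockers open blocker finding(s)"; return 1
  fi
  # Majors block only high-risk changes; fixed or accepted majors do not count.
  if [ "${HIGH_RISK:-no}" = "yes" ] && [ "$open_majors" -gt 0 ]; then
    echo "BLOCKED: $open_majors major finding(s) neither fixed nor accepted"; return 1
  fi
  if [ "${HAS_DESKTOP:-no}" != "yes" ] || [ "${HAS_MOBILE:-no}" != "yes" ]; then
    echo "BLOCKED: desktop and mobile screenshots required"; return 1
  fi
  if [ "${PROTECTED:-no}" = "yes" ] && [ "${HAS_AUTH_EVIDENCE:-no}" != "yes" ]; then
    echo "BLOCKED: authenticated or permission-state evidence missing"; return 1
  fi
  echo "PASS"
}
```

Example: `printf 'major accepted\nminor open\n' | HIGH_RISK=yes HAS_DESKTOP=yes HAS_MOBILE=yes ui_gate` passes, because the only major finding is an accepted exception and open minors never block.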

package/templates/next-supabase/SPEC.md CHANGED

@@ -25,6 +25,7 @@ List behavior that must be preserved during changes:
 - Agent council routing in `.agent-kit/agent-roster.json`
 - Model profile routing in `MODEL_ROUTING.md` and `.agent-kit/model-routing.json`
 - Optional runtime adapter commands and portable `SKILL.md` wrappers, when activated
+- UI improvement command workflows, detector severity findings, and acceptance rubric when frontend work is in scope
 - Council-session evidence in `COUNCIL.md`
 - Agent, council-session, model-routing, and audit-report schema contracts in `.agent-kit/schemas/`
 - Planner default ownership for planning and Lead Architect review for core changes
@@ -70,7 +71,7 @@ Record the current maturity target and evidence.
 | Architecture | TBD | TBD | TBD | Affected-layer map, `DECISIONS.md` |
 | Supabase/RLS | TBD | TBD | TBD | RLS inventory, migration tests |
 | Messaging | TBD | TBD | TBD | `MESSAGING.md`, proof map, objection handling, CTA hierarchy |
-| Frontend | TBD | TBD | TBD | `DESIGN.md`, reference-set evidence, design critique verdict, product-quality scorecard, screenshots, visual QA |
+| Frontend | TBD | TBD | TBD | `DESIGN.md`, reference-set evidence, design critique verdict, product-quality scorecard, UI detector findings, screenshots, visual QA |
 | Testing | TBD | TBD | TBD | Unit, regression, smoke, visual evidence |
 | Release | TBD | TBD | TBD | `DEPLOYMENT.md`, logs, rollback notes |
@@ -83,6 +84,8 @@ Record the current maturity target and evidence.
 - Audience, pain, desired outcome, differentiator, proof, objections, voice, and CTA hierarchy are documented before public-facing or conversion-facing copy is accepted.
 - Reference set, anti-references, source-safety notes, and design critique verdict are documented before accepting significant frontend work.
 - Frontend product-quality scorecard is documented before accepting significant frontend work.
+- UI detector findings are classified before accepting meaningful audit, polish, layout, responsive, accessibility, screenshot, distinctiveness, or browser QA work.
+- Authenticated or permission-state screenshots are reviewed when the changed surface requires login, roles, tenant context, or permissions.
 - First screens show the real product, task, object, content, or workflow.
 ## Brand And Content Inventory
@@ -102,6 +105,7 @@ Track the inputs that make the UI specific to this product.
 | Chosen creative direction | TBD | Creative-direction matrix and screenshots |
 | Design critique verdict | TBD | `DESIGN.md`, critique-gate review |
 | Visual QA tier | TBD | `TESTING.md`, Storybook, Playwright report, visual-regression service, or screenshot artifacts |
+| UI detector evidence | TBD | `.agent-kit/checklists/ui-detectors.md`, `.agent-kit/checklists/ui-acceptance-rubric.md`, browser QA notes |
 ## Component And State Inventory

package/templates/next-supabase/STYLE_GUIDE.md CHANGED

@@ -51,7 +51,9 @@ Use `.agent-kit/prompts/design-critique-gate.md` before accepting significant fr
 Use `.agent-kit/prompts/frontend-distinctiveness-benchmark.md` before accepting significant frontend work. `DESIGN.md` should prove first-screen specificity, content fingerprint, reference benchmark, asset provenance, state proof, and visual QA proof so a design cannot pass while remaining interchangeable with another product in the same category.
-Use `.agent-kit/prompts/frontend-product-quality-scorecard.md` before accepting significant frontend work. `DESIGN.md` should score user/task fit, content specificity, visual identity, information architecture, component states, accessibility and interaction, and source safety. Reject work with critical zeroes or a total score below `10/14`; reserve best-practice claims for `12/14` or higher with desktop/mobile and visual QA evidence.
+Use `.agent-kit/prompts/frontend-product-quality-scorecard.md` before accepting significant frontend work. `DESIGN.md` should score user/task fit, content specificity, visual identity, information architecture, component states, accessibility and interaction, and source safety. Reject work with critical zeroes or a total score below `10/14`; reserve best-practice claims for `12/14` or higher with desktop/mobile, authenticated screen evidence when applicable, UI detector findings, and visual QA evidence.
+Use `.agent-kit/prompts/ui-command-index.md`, `.agent-kit/checklists/ui-detectors.md`, and `.agent-kit/checklists/ui-acceptance-rubric.md` for UI audit, polish, layout cleanup, responsive cleanup, accessibility pass, distinctiveness pass, screenshot critique, and browser QA loops.
 ## Messaging And Copy Rules

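The `STYLE_GUIDE.md` scorecard thresholds work as follows:

- Reject on a critical zero or a total below `10/14`.
- Reserve best-practice for `12/14` or higher with the listed evidence.

They can be sketched as a verdict helper that rests on three assumptions:

- Seven criteria, each scored 0 to 2. This is consistent with the `/14` total but is not stated in the diff.
- Any zero counts as critical, because the diff does not say which criteria are critical.
- `EVIDENCE_COMPLETE` is a hypothetical flag standing in for the evidence list.

```bash
# Hypothetical verdict helper for the frontend product-quality scorecard in STYLE_GUIDE.md.
# Assumptions (not from Agent Kit): seven criteria scored 0-2 each (7 x 2 = 14),
# any zero counts as a critical zero, and EVIDENCE_COMPLETE=yes stands in for
# desktop/mobile, authenticated-screen (when applicable), UI detector, and visual QA evidence.
scorecard_verdict() {
  if [ "$#" -ne 7 ]; then
    echo "usage: scorecard_verdict FIT CONTENT IDENTITY IA STATES A11Y SOURCE_SAFETY" >&2
    return 2
  fi
  local score total=0 zeros=0
  for score in "$@"; do
    case "$score" in
      0|1|2) ;;   # valid score
      *) echo "invalid score: $score" >&2; return 2 ;;
    esac
    if [ "$score" -eq 0 ]; then
      zeros=$((zeros + 1))
    fi
    total=$((total + score))
  done
  if [ "$zeros" -gt 0 ]; then
    echo "reject (critical zero)"
    return 1
  fi
  if [ "$total" -lt 10 ]; then
    echo "reject ($total/14)"
    return 1
  fi
  # 10 or more is acceptable; 12 or more is best-practice only with complete evidence.
  if [ "$total" -ge 12 ] && [ "${EVIDENCE_COMPLETE:-no}" = "yes" ]; then
    echo "best-practice ($total/14)"
  else
    echo "accept ($total/14)"
  fi
}
```

Example: `scorecard_verdict 2 2 2 2 2 1 1` reports `accept (12/14)`. The same scores with `EVIDENCE_COMPLETE=yes` report `best-practice (12/14)`, so a high score alone never earns the best-practice claim.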
package/templates/next-supabase/TESTING.md CHANGED

@@ -9,6 +9,7 @@ Testing should be proportional to risk. Auth, data mutations, payments, admin ac
 - Integration tests for API, Server Actions, and Supabase interactions where practical.
 - Playwright smoke tests for auth and critical user workflows.
 - Visual QA for important user-facing screens and reusable component states.
+- UI detector review for audit, polish, layout, responsive, accessibility, screenshot, distinctiveness, and browser QA workflows.
 - Runtime adapter validation for plugin manifests, native commands, portable `SKILL.md` wrappers, source-of-truth references, package allowlists, and secret safety.
 ## Critical Smoke Paths
@@ -35,6 +36,8 @@ Choose the smallest reliable visual QA tier for the project:
 Required rules:
 - Capture default, loading, empty, error, disabled, success, permission-denied, and mobile states where relevant.
+- Run `.agent-kit/checklists/ui-detectors.md` for meaningful UI audit or polish work and classify blocker, major, minor, pass, and not-applicable findings.
+- High-risk UI changes require desktop and mobile screenshots plus authenticated or permission-state evidence when the workflow requires login, tenant context, roles, or permissions.
 - Stabilize dynamic data, animations, dates, avatars, generated media, and third-party widgets before visual comparison.
 - Review baseline updates as product changes; do not auto-accept visual diffs without rationale.
 - Keep accessibility, semantic, keyboard, auth, and data-boundary tests separate from visual checks.
@@ -53,8 +56,19 @@ Recommended baseline:
 - `agent-kit audit --min-readiness baseline-setup`
 - Playwright smoke tests for critical paths
 - Visual QA evidence for high-risk UI changes
+- UI detector findings and accepted exceptions for meaningful UI changes
 - `agent-kit adapter validate antigravity` and `agent-kit package validate` when adapter/package assets change
+### Eval-driven PR loop
+Repeat **implement → check → fix** until tests and audit pass. See the kit's [LOOP_CODING.md](LOOP_CODING.md) for loop types, council handoffs, and safe automation limits. Minimum check commands:
+```bash
+npm test
+agent-kit audit --min-readiness baseline-setup
+agent-kit adapter validate all   # when IDE or adapter assets change
+```
 ## Security-Focused Tests
 Prioritize: