npm - @amityco/social-plus-vise - Versions diffs - 0.14.6 → 0.14.8 - Mend

@amityco/social-plus-vise 0.14.6 → 0.14.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/CHANGELOG.md +15 -0
package/README.md +53 -80
package/dist/capabilities.js +289 -11
package/dist/outcomes.js +51 -3
package/dist/tools/compliance.js +82 -16
package/dist/tools/design.js +43 -8
package/dist/tools/integration.js +87 -17
package/dist/tools/project.js +162 -3
package/package.json +3 -2
package/rules/chat.yaml +79 -5
package/rules/feed.yaml +111 -1

package/CHANGELOG.md CHANGED Viewed

@@ -4,6 +4,21 @@ All notable changes to `@amityco/social-plus-vise` are documented in this file.
 The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## 0.14.8 — 2026-06-05
+### Added
+- **Platform capability availability preflight:** `vise plan` now checks the bundled SDK surface before feed-forwarding product capabilities, reports available/unavailable/unknown capabilities, filters optional feed choices to features available on the selected platform, and ignores unsupported optional selections during `init`.
+- **End-to-end product-flow smoke:** `npm run validate` now includes a local temp-project flow covering design extraction, plan feed-forward, blocking intake, answered init, capability check, design conformance, and sensor discovery in one scenario.
+## 0.14.7 — 2026-06-05
+### Fixed
+- **Android broad-social intake:** requests that combine feed/posts/comments with chat, profile, followers, or following now surface the blocking `feature_surface` question instead of collapsing silently to a single feed contract.
+- **Android rich feed rendering:** `android.feed.post-datatype-handled` now rejects placeholder labels such as `[Image post]`, `[Poll post]`, or `[text post]` when the renderer does not resolve the actual SDK media/poll/clip/room data.
+- **Android text-only composer gap:** new `android.feed.rich-post-composer-surfaced` rule flags feed composers that only call `createTextPost` without implementing or explicitly scoping out image, video, file, poll, clip, room, livestream, or mixed-attachment creation.
+- **Android comment tray state gap:** new `android.comments.thread-ui-states-present` rule flags paged comment trays that show an empty list before handling loading/error states.
+- **Android chat/profile regressions:** new `android.chat.unread-visible`, `android.chat.sort-explicit`, and `android.profile.social-counts-from-sdk` rules flag missing unread badges, implicit message ordering, and follower/following placeholder counts.
 ## 0.14.6 — 2026-06-05
 **Theme:** Intake questions must reach the human before implementation.

package/README.md CHANGED Viewed

@@ -45,7 +45,7 @@ See [Usage Flow](#usage-flow) for the full step-by-step diagram.
 Instead of just providing a CLI or AI skills, Vise implements a technique called **Agentic Workflow Governance**. Think of it as a customer-project integration harness: the governed build loop runs inside the target repo, grounded in the real project, real docs, and real validation signals.
-Vise wraps your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 300 platform-specific compliance rules, checks the generated UI against the customer's design system, gates narrow baseline capabilities so nothing mandatory is silently dropped, offers richer feed capabilities as explicit opt-ins, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
+Vise wraps your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 300+ platform-specific compliance rules, checks the generated UI against the customer's design system, gates narrow baseline capabilities so nothing mandatory is silently dropped, offers richer feed capabilities as explicit opt-ins, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
 At a glance, Vise sits between the user's prompt and the agent's code changes. The agent still edits the app; Vise turns the request into a grounded plan, records the local contract, and keeps checking until the integration is ready to ship.
@@ -54,7 +54,7 @@ flowchart LR
     Prompt["User prompt<br/>Add a social.plus feature"] --> Skill["AI skill<br/>drives the loop"]
     Skill --> Inspect["Inspect project<br/>platform, app surface,<br/>design signals"]
     Inspect --> Plan["Plan<br/>outcome, docs,<br/>intake questions"]
-    Plan --> Design["Design + completeness<br/>tokens, feature checklist,<br/>explicit opt-outs"]
+    Plan --> Design["Design + capability preflight<br/>preview confirmation,<br/>feature checklist"]
     Design --> Build["Agent builds<br/>edits customer code locally"]
     Build --> Check["Vise check<br/>SDK compliance gate"]
     Check -->|findings| Build
@@ -75,8 +75,8 @@ Vise validates on three layers, and the layer is set by the *kind of claim* —
 | Layer | Claim | How | Enforcement |
 |---|---|---|---|
-| **SDK compliance** | "this is **wrong**" | 300 deterministic rules (session renewal, live-collection vs one-shot, no secret in logs, parent-child rendering, ban-state gating…) | **Hard gate** — `vise check` blocks until green or attested. A small advisory subset surfaces as informational only and never blocks. |
-| **Design conformance** | "this **looks off**" | extract the customer's design system into a contract, then check token usage | **Advisory** — `vise design check`/`preview`; never fails a build |
+| **SDK compliance** | "this is **wrong**" | 300+ deterministic rules (session renewal, live-collection vs one-shot, no secret in logs, parent-child rendering, ban-state gating…) | **Hard gate** — `vise check` blocks until green or attested. A small advisory subset surfaces as informational only and never blocks. |
+| **Design conformance** | "this **looks off**" | extract the customer's design system into a contract, render a preview for confirmation, then check token usage | **Advisory** — `vise design check`/`preview`; never fails a build |
 | **Feature completeness** | "this is **missing**" | Vise proposes a narrow baseline per outcome; for add-feed, pagination is mandatory, while richer feed capabilities are opt-in choices from `vise plan` | **Decision gate** — `vise check` exits `completeness-gap` until each baseline capability is built or validly opted out; selected optional capabilities run separate sensors |
 Correctness is gated by deterministic rules or attestations. Baseline completeness is gated by explicit scope decisions: if a baseline capability is legitimately out of scope, record `// vise: scope-omit <id> — <reason>` and it no longer blocks. Optional feed capabilities such as image upload, poll creation, and edit post are offered during planning and become checked only after the user opts in. Conformance remains advisory because "matches the brand" is legitimately subjective. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
@@ -127,7 +127,7 @@ Required customer or agent actions:
 ### Design-conformant UI
-Vise can ingest the customer's aesthetic into a **design contract** and guide generation to match it — from an HTML/CSS prototype (`vise design extract`) or from the host app's own design system across web + Android + Flutter + iOS (`vise design extract --from-project`: CSS vars/Tailwind/token modules, `colors.xml`, Flutter `Color(0x…)`, iOS `.colorset`/Swift). `vise design check` reports token conformance; `vise design preview` writes a visual review; `vise design reference` generates a full visual design-system spec (swatches, type samples, component demos). All advisory.
+Vise can ingest the customer's aesthetic into a **design contract** and guide generation to match it — from an HTML/CSS prototype (`vise design extract`) or from the host app's own design system across web + Android + Flutter + iOS (`vise design extract --from-project`: CSS vars/Tailwind/token modules, `colors.xml`, Flutter `Color(0x…)`, iOS `.colorset`/Swift). `vise design extract` writes both `sp-vise/design-contract.json` and `sp-vise/design-preview.html`; `vise plan`/`init` withhold design feed-forward until the preview is explicitly accepted with `--answer design_contract_confirmation=yes`. If the preview is rejected (`=no`), Vise asks for a replacement design source and records no design-conformance claim. `vise design check` reports token conformance; `vise design preview` writes a visual review; `vise design reference` generates a full visual design-system spec (swatches, type samples, component demos). All advisory.
 **For social.plus-specific styling:** `vise design init-tokens` scaffolds `src/styles/social-plus-tokens.css` in your project — a dedicated token file for social.plus features that you can edit independently from your main app's design system. Greenfield projects get sensible `--sp-*` defaults; brownfield projects get their existing token values seeded in. Edit the file, run `vise design extract --from-project` to refresh the contract, and future agent builds inherit the updated palette — no AI agent needed in the update loop.
@@ -141,91 +141,63 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
 ---
-## Benchmark: Phase 1 Results
+## Benchmark Results: Current Claim
-> **The compliance gaps agents ship on their own, they close under Vise's check loop.**
-> Across two capable coding agents (Cursor / Composer 2.5 and Claude Sonnet 4.6), the features with *secondary* compliance requirements — Chat, Moderation, Push — failed without Vise and passed with it; both agents reached 9/9 with Vise. This is a **strong directional signal at N=1 per cell, not a settled statistical finding.** The [Commune paper](docs/commune-paper-2026-05-30.md) is the full, honest version — methodology, per-cell results, threats to validity, and a complementary bug-fix benchmark where Vise showed *no* advantage.
+> **Vise gives AI coding agents a governed workflow for social.plus integrations, improving feature completeness, SDK compliance, and design consistency in greenfield work.**
-### What "delivered correctly" means
+The strongest current claim is not a universal speed or quality promise. It is narrower and more useful: when agents build greenfield social.plus SDK features, the Vise workflow makes scope explicit, checks local code, and turns missing capabilities or SDK violations into concrete repair work before the agent stops.
-"Correct" doesn't just mean the code compiles. It means every feature handles the edge cases that matter to real users and real moderation teams:
+### Latest headline: feed completeness
-- A **banned user** cannot type or submit a post — the send button is hidden, not just disabled-on-submit
-- **Push notification preferences** are wired to the Amity API so users who opt out actually stop receiving notifications
-- **Moderation actions** (report, flag, block) are surfaced in the UI so users can act on them, not buried in a hook
-- **Chat and feed queries** use live, reactive subscriptions — not one-time fetches that go stale
+The latest Vise 0.14.5 opt-in comparison is the headline product proof. Same feed request, same SDK docs, no Vise workflow in the baseline; the Vise arm explicitly selected `feed_optional_capabilities=post-image-upload,post-poll-creation,post-edit`, persisted that choice into `sp-vise/compliance.json`, and activated the selected sensors.
-Without Vise, AI agents frequently implement the primary feature correctly but miss these secondary requirements. They know about them in the abstract — but when building a chat screen, "ban state" feels out of scope and gets skipped. `vise check` turns that vague awareness into a specific, actionable finding.
+| Agent / model | Docs-only baseline | Vise 0.14.5 opt-in arm | Readout |
+|---|---:|---:|---|
+| Cursor / Composer 2.5 | 30% (3.3/11 avg) | **97% (32/33)** | One seed surfaced the remaining item instead of silently dropping it. |
+| Claude / Sonnet 4.6 medium | 27% (3.0/11 avg) | **100% (33/33)** | All three Vise seeds reached 11/11. |
+| Codex / GPT-5.4 medium | 21% (2.3/11 avg) | **100% (33/33)** | All three Vise seeds reached 11/11. |
-### The experiment: three conditions, nine features
+Aggregate: **98/99 expected feed capabilities** and **27/27 selected optional capabilities** implemented across the latest Vise arm. See the full table and per-seed grader links in [`benchmarks/CURSOR_VISE_0.14.5_RESULTS.html`](benchmarks/CURSOR_VISE_0.14.5_RESULTS.html).
-We ran a controlled experiment — the **Commune Benchmark** — to measure not just *whether* Vise helps, but *why*. Each of the nine features below was built from scratch by an AI agent under three independent conditions:
+### Supporting proof
-**Nine features built:**
-SDK setup · User presence · Social feed · Events · Chat & DMs · Push notifications · User profile · Content moderation · Comments
-| Condition | What the agent had | The question it answers |
+| Surface | Safe claim | Evidence |
 |---|---|---|
-| **Pure MCP** | Access to social.plus docs only — no compliance guidance | Baseline: how well does the agent do on its own? |
-| **Rules-as-Markdown** | The full 1,013-line compliance rulebook pasted directly into the prompt | Is the problem just that the agent doesn't know the rules? |
-| **Vise + Skill** | Full Vise CLI — `vise check` runs automatically, agent reads specific findings, fixes them, repeats until green | Does an active feedback loop change the outcome? |
-The Rules-as-Markdown condition is the key isolation: if the agent already knows all the rules, does giving it the spec document fix the problem? The answer turned out to be **no** — knowing the rules and being forced to act on specific findings are different things.
+| **Feature completeness** | Vise helps agents build more of the expected SDK capability surface. | Latest comparison: **21-30% without Vise vs 97-100% with Vise 0.14.5**. Earlier pre-registered Capability Matrix Row 2 also shipped a feature-completeness win: silently dropped items fell from 7.67/11 to 4.0/11. |
+| **SDK compliance** | Vise checks catch greenfield SDK compliance gaps that docs or static guidance can miss. | Commune benchmark: Vise averaged **100% greenfield SDK compliance** where docs/RAG-style controls averaged **67%** across the reported slices. |
+| **Design conformance** | Vise design checks reduce design drift under ambiguous briefs. | Ambiguous Spotify-style design test: Vise design runs produced **0 / 0 / 0 hex literals** across three seeds; without Vise, runs varied **0 / 2 / 15**. This supports variance reduction, not pixel-perfect visual quality. |
-### Results — features delivered without production gaps
+### Why the workflow works
-| Coding agent (model) | Pure MCP | Rules-as-Markdown | Vise + Skill |
-|---|---|---|---|
-| **Cursor (Composer 2.5)** | 6 out of 9 ✗ | 5 out of 9 ✗ | **9 out of 9 ✅** |
-| **Claude Code (Sonnet 4.6)** | 6 out of 9 ✗ | 7 out of 9 ✗ | **9 out of 9 ✅** |
+The benchmark story is the product flow:
-The three features that consistently fail without Vise — **Chat**, **Moderation**, and **Push Notifications** — are exactly the ones with secondary compliance requirements (ban-state, report affordances, Amity preference API). `vise check` catches these with a specific finding; the rules doc does not.
+1. **Inspect** — Vise detects platform, app surface, SDK surface, sensors, and design signals from the local repo.
+2. **Plan** — Vise classifies the outcome, asks blocking intake questions, surfaces capability availability, and offers optional feed capabilities only when the platform SDK surface supports them.
+3. **Confirm design** — `vise design extract` writes a preview; `plan`/`init` withhold design feed-forward until the user confirms `design_contract_confirmation=yes`.
+4. **Initialize** — `vise init` records the resolved compliance contract, intake answers, selected optional capabilities, inspection result, and accepted design digest.
+5. **Build / check / repair** — the agent edits locally, runs `vise check`, fixes deterministic findings, resolves completeness gaps or selected-capability failures, and then runs project sensors.
-Both agents reached 9/9 with Vise. The Rules-as-Markdown arm did **not** reliably beat the plain-docs control — 5/9 on Cursor (*below* control) and 7/9 on Sonnet — and at N=1 per cell neither gap is distinguishable from noise. The robust, reproducible signal is narrower and mechanistic: **Chat and Moderation never pass under either control arm, and always pass under Vise.** Passes were scored by a deterministic grader, not by hand — see [Reproducibility & honest caveats](#reproducibility--honest-caveats) for what that grader does and doesn't establish.
+Static docs can tell an agent what exists. Vise changes the stopping condition: the agent is not done until the local contract is green, attested, or blocked on explicit customer input.
-### Why it matters
+### How to read the evidence
-A failing feature without Vise is *invisible* until a user hits it: the code compiles, the demo works, and the ban-state gap surfaces only when a banned user posts. Vise turns that latent gap into a specific finding the agent fixes before you ship. (We did not separately measure remediation effort, so this makes no rework-cost claim — only that the gaps are real and silent without a checker.)
+The benchmark suite is intentionally reported with boundaries:
-### Reproducibility & honest caveats
+- **Latest feed-completeness comparison** is the current product claim for the opt-in capability flow in Vise 0.14.5. It is a best-case/opt-in comparison across Cursor, Claude, and Codex, not a universal result for every prompt.
+- **Capability Matrix v1** remains the pre-registered follow-up. It shipped the Row 2 feature-completeness claim, found **no Row 1 SDK-compliance claim** on chat/moderation/push under its registered margin, and withheld the Row 3 design claim on a technicality despite higher by-name token use.
+- **Commune Phase 1** remains useful historical evidence for the compliance loop: two agents reached 9/9 with Vise vs 5-7/9 under controls, but it was N=1 per cell and the grader overlaps Vise's own rules.
+- **Design tests** support design-drift reduction and token cleanup. They do not prove visual taste, pixel perfection, or production-ready UI without human review.
+- **Negative results must travel with the claim:** no measured Vise advantage on day-2 bug fixing; the push slice exposed a non-converging attestation loop when docs and SDK disagreed; earlier enumerative plan-time design guidance measured negative and was retracted; the original `scope-omit` affordance went unused in the matrix.
-- **Scoring is deterministic — and it overlaps with what Vise enforces.** Each cell is graded on four dimensions: `vise check --ci` (the same compliance ruleset), the project's own sensors (build / typecheck / lint), and hand-authored string-inclusion acceptance patterns. Because the metric overlaps Vise's own rules — and only the Vise arm iterates against that checker — read the headline as "Vise's checks pass," not as a fully independent oracle. The acceptance patterns are literal string matching (not AST), so they involve authoring judgment.
-- **Vise-arm passes were deterministic-pass**, not attestation exceptions — agents fixed the code. (The grader applies a narrow, *symmetric* auto-attestation for absence / type-stub findings across **all** arms including the controls; it cannot satisfy the acceptance patterns, so it does not tilt the result toward Vise.)
-- **Three arms, separate tooling.** The Rules-as-Markdown arm has no Vise checker available — it cannot run `vise check`.
-- **Built from scratch** (greenfield seed), capable models with prior SDK familiarity. A complementary **bug-fix** benchmark showed **no Vise advantage** — the loop helps on greenfield integration, not local bug hunts.
-- **N=1 per cell.** A strong directional signal (the Chat/Moderation/Push mechanism reproduces across both models), **not** a statistically settled finding.
-- **Follow-up evidence cuts both ways.** The pre-registered [Capability Matrix](#benchmark-capability-matrix-v1-pre-registered) (n=3, arm-independent grading, different briefs) found **no Row 1 claim** on chat/moderation/push — chat and moderation tied, push +1, below the registered margin — and a new, registered win on feature completeness instead. Read the two together, not Phase 1 alone.
-- **Full methodology, per-cell analysis, and threats to validity:** [the Commune paper](docs/commune-paper-2026-05-30.md). The [`benchmarks/FINDINGS.html`](benchmarks/FINDINGS.html) and [`benchmarks/RULES_AS_MARKDOWN.html`](benchmarks/RULES_AS_MARKDOWN.html) files are **summary report tables**, not raw transcripts or workspace diffs.
+Full evidence: [`benchmarks/CURSOR_VISE_0.14.5_RESULTS.html`](benchmarks/CURSOR_VISE_0.14.5_RESULTS.html), [`benchmarks/capability-matrix/RESULTS.md`](benchmarks/capability-matrix/RESULTS.md), [`benchmarks/commune/RESULTS.md`](benchmarks/commune/RESULTS.md), and [`benchmarks/brand/design-test/RESULTS.md`](benchmarks/brand/design-test/RESULTS.md).
 ### Which mode should I use?
 | If you… | Use | Why |
 |---|---|---|
-| Building new social features with an AI agent | **Vise CLI + Skill** | The mode that closed every secondary-compliance gap in our benchmark |
-| Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the full ruleset |
-| Enforcing compliance in a CI pipeline | `vise check --ci` | Exits non-zero on failures; structured JSON output for logs |
----
-## Benchmark: Capability Matrix v1 (pre-registered)
-> One row won, one tied, one was withheld on a technicality. The registered protocol requires all three to be reported with equal prominence — so here they are.
-Phase 1 measured one thing (secondary-compliance gaps) under a harness that overlaps Vise's own ruleset. The **Capability Matrix** is the stricter follow-up: a [pre-registered protocol](benchmarks/capability-matrix/PROTOCOL.md) frozen before any cell ran (7 dated amendments, each committed before the data it governs), **firewalled grading instruments** (authors barred from Vise's rules and config — [authorship record](benchmarks/capability-matrix/AUTHORSHIP.md)), **blind structural judges**, and 27 fresh isolated cells: n=3 seeds per cell, Cursor / Composer 2.5, docs-only control vs Vise loop, identical offline docs bundle in both arms. Full numbers, per-behavior provenance, and the rig-integrity record: [RESULTS.md](benchmarks/capability-matrix/RESULTS.md).
-| Capability claim | Registered outcome | What the data says |
-|---|---|---|
-| **Feature completeness** — open feed request scored against an 11-item firewalled ground truth | ✅ **Claim ships** (won 3/3 seed pairs and the mean) | The Vise loop roughly **halved silently-dropped SDK capabilities: 4.0 vs 7.67 of 11** — by *building more of the SDK surface*, not by surfacing trade-offs. Mechanism not isolated: plan-time capability enumeration, persistence against the check gate, and ~1.8× time-on-task are all live candidates. |
-| **SDK compliance** — chat / moderation / push slices | ➖ **No claim** (7/9 vs 6/9; push +1, below the registered ≥+2 margin) | Brief-explicit behaviors tie, as pre-registered. One level down the defects are *symmetric*: 2/3 control cells shipped ban gates reading a field that doesn't exist on the real SDK (compiles, never fires; 0/3 Vise cells) — while 2/3 Vise cells over-parameterized `targetType` and never bound the brief's value, plausibly steered by Vise's own rule. |
-| **Design conformance** — ambiguous brief, brand-token usage | ⚠️ **Withheld on a technicality** | By-name token usage **+18.1 points** for the contract+check loop (90.3 vs 72.2 mean) ✓ — but the second registered condition (hex literals strictly *lower*) tied at 0–0, so the claim is withheld and reported descriptively. Vise controls are reused from an earlier round (cross-temporal). |
-**Registered negative results** (the protocol requires these in any publication of the matrix):
-- **Push stop-condition non-convergence.** Where docs and SDK disagree (push registration — the docs themselves carry a gap warning), "iterate until `vise check` is green" did not converge within the 30-minute cap in *any* Vise cell: agents looped on attestation dialect. Shipped behavior still passed 3/3 — the cost is wall-clock, not correctness. (2/3 control cells also hit the cap doing SDK archaeology.)
-- **Scope-omit affordance unused.** No agent in any cell used the `// vise: scope-omit` surfacing marker or wrote a qualifying scope note. The advisory surfacing mechanism, as shipped in 0.14.1, influenced zero cells.
-- Unchanged from prior rounds: **no Vise advantage on day-2 bug fixes**; **enumerative plan-time design guidance** measured negative twice and was retracted in 0.14.1.
-All Capability Matrix findings are directional at n=3 under one model and one executor — not settled statistics.
+| Building new social features with an AI agent | **Vise CLI + Skill** | This is the full governed workflow measured by the benchmarks. |
+| Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the recorded compliance contract. |
+| Enforcing compliance in CI | `vise check --ci` | Exits non-zero on deterministic failures, missing baseline capabilities, failed selected optional sensors, blockers, or drift. |
 ---
@@ -239,7 +211,7 @@ All Capability Matrix findings are directional at n=3 under one model and one ex
 | **Android (Kotlin)** | ✅ Full | Gradle assemble, unit tests |
 | **iOS (Swift)** | ✅ Full | Static rule checks fully operational. Build sensor not wired (`xcodebuild` environment requirements make it fragile) — `vise run-sensors` returns no-sensors for iOS; compliance rules run regardless. |
-Each platform has 52–54 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
+Each platform has dozens of rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
 ---
@@ -285,23 +257,23 @@ The skill is plain Markdown; you can read it any time with `vise print-skill`.
 flowchart LR
     A[User asks AI agent<br/>'Add a social feed'] --> B[Agent runs<br/>vise inspect]
     B --> C[Agent runs<br/>vise plan --request &quot;...&quot;]
-    C --> D{Intake<br/>questions?}
-    D -->|Yes| E[Agent asks user<br/>for feed target,<br/>design source, etc.]
+    C --> D{Blocking intake<br/>or design preview?}
+    D -->|Yes| E[Agent surfaces questions<br/>and opens design preview<br/>for user confirmation]
     D -->|No| F
-    E --> F[Agent runs<br/>vise init<br/>with answers]
+    E --> F[Agent reruns plan/init<br/>with answers]
     F --> G[Agent runs<br/>vise search-docs<br/>vise get-doc-page]
     G --> H[Agent edits<br/>your code]
     H --> I[Agent runs<br/>vise check]
     I --> J{Green?}
-    J -->|No, fixable| K[Agent fixes<br/>findings]
+    J -->|No, fixable| K[Agent fixes<br/>findings, completeness gaps,<br/>or selected sensors]
     K --> I
     J -->|No, attest| L[Agent calls<br/>vise attest with<br/>evidence]
     L --> I
     J -->|Yes| M[Agent runs<br/>vise run-sensors]
-    M --> N[Done.<br/>sp-vise/ contract<br/>committed]
+    M --> N[Done.<br/>sp-vise/ contract<br/>and evidence committed]
 ```
-The flow above is what the skill teaches your AI agent. You — the human — drive intent and approve edits; the agent runs Vise commands deterministically; Vise grounds the agent in real docs and real compliance rules. If blocking intake is still unresolved, `vise init` refuses to initialize, returns `status: "needs-clarification"`, and exits 7 so the agent must surface the questions instead of guessing.
+The flow above is what the skill teaches your AI agent. You — the human — drive intent, answer scope questions, and approve the design preview before Vise feeds a design contract forward. The agent runs Vise commands deterministically; Vise grounds the agent in real docs and real compliance rules. If blocking intake or design confirmation is still unresolved, `vise init` refuses to initialize, returns `status: "needs-clarification"`, and exits 7 so the agent must surface the questions instead of guessing.
 ---
@@ -325,14 +297,14 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
 | Command | Purpose |
 |---|---|
-| `vise design extract <prototypePath> [--repo .] [--no-write]` | Read an HTML/CSS prototype and write a graded `sp-vise/design-contract.json` (declared CSS custom properties become exact tokens; repeated literals become inferred/advisory tokens; single-use literals are dropped) so generated social.plus UI can match the customer's aesthetic |
-| `vise design extract --from-project [path] [--no-write]` | No external prototype? Derive the contract from the host project's **own** design system — CSS custom properties (incl. shadcn `:root` and Tailwind v4 `@theme`), TS/JS token modules, inline tailwind configs, **Android** `colors.xml`/`dimens.xml`, **Flutter** `Color(0x…)`, and **iOS** `.xcassets/*.colorset` + Swift `Color(hex:)`/`Color(red:g:b:)`. Reference values (`var()`/`theme()`/`calc()`) are skipped, so a var-mapped config contributes nothing rather than wrong tokens |
+| `vise design extract <prototypePath> [--repo .] [--no-write]` | Read an HTML/CSS prototype and write a graded `sp-vise/design-contract.json` plus `sp-vise/design-preview.html` (declared CSS custom properties become exact tokens; repeated literals become inferred/advisory tokens; single-use literals are dropped) so generated social.plus UI can match the customer's aesthetic after preview confirmation |
+| `vise design extract --from-project [path] [--no-write]` | No external prototype? Derive the contract from the host project's **own** design system and write a preview — CSS custom properties (incl. shadcn `:root` and Tailwind v4 `@theme`), TS/JS token modules, inline tailwind configs, **Android** `colors.xml`/`dimens.xml`, **Flutter** `Color(0x…)`, and **iOS** `.xcassets/*.colorset` + Swift `Color(hex:)`/`Color(red:g:b:)`. Reference values (`var()`/`theme()`/`calc()`) are skipped, so a var-mapped config contributes nothing rather than wrong tokens |
 | `vise design check [path]` | Advisory, **non-blocking** report on how closely the UI code matches the contract (token coverage + on/off-contract color literals). Never fails a build and is **not** a `vise check` gate |
 | `vise design preview [path] [--reference <prototype>]` | Write a self-contained `sp-vise/design-preview.html`: the contract's tokens as visual swatches + the conformance report + the HTML reference embedded for side-by-side review. Vise renders the artifact; a human/VLM judges the visual match. Dependency-free — **not** an automated pixel diff |
 | `vise design reference [path] [--title <name>]` | Write a self-contained `sp-vise/design-reference.html`: human/VLM-readable design-system spec — token swatches, type samples, component demos, and a growth-layer summary. Pairs with `design-contract.json` (machine-readable). Use `--title` to name the design system (e.g. `--title Streamly`). Advisory — **not** an enforcement gate |
 | `vise design init-tokens [path] [--force]` | Scaffold `src/styles/social-plus-tokens.css` — the dedicated, customer-editable token file for social.plus features. **Greenfield:** neutral defaults (full `--sp-*` token set). **Brownfield:** seeded from your existing concrete tokens. Idempotent — never overwrites an existing file (use `--force` to override). After editing, run `vise design extract --from-project` to refresh the contract |
-The extracted contract is **advisory input for generation**, not an enforcement gate: a token-poor prototype yields a weaker — never wrong — contract, and absence of a prototype simply means no contract (the existing `*.design.reuse-detected-tokens` rules still cover reuse of a host project's own design system).
+The extracted contract is **advisory input for generation**, not an enforcement gate: a token-poor prototype yields a weaker — never wrong — contract, and absence of a prototype simply means no contract (the existing `*.design.reuse-detected-tokens` rules still cover reuse of a host project's own design system). A contract becomes feed-forward guidance only after the user confirms the preview with `--answer design_contract_confirmation=yes`; rejecting it with `=no` asks for a replacement design source and records no design-conformance digest.
 ### Documentation grounding & Troubleshooting
@@ -466,11 +438,12 @@ After a successful `vise init`, your project gets a `sp-vise/` directory. If ini
 | File | Created by | What it contains |
 |---|---|---|
-| `sp-vise/compliance.json` | `vise init` | The rules selected for this integration, the Vise version, the ruleset digest, the target app surface, and an optional engagement link. |
-| `sp-vise/intake.json` | `vise init` | The request, outcome, intake answers, remaining blocking count, and any retrospective `--allow-unresolved-intake` acknowledgement. |
+| `sp-vise/compliance.json` | `vise init` | The rules selected for this integration, the Vise version, the ruleset digest, the target app surface, selected optional capabilities, optional engagement link, and an accepted design-contract digest when confirmed. |
+| `sp-vise/intake.json` | `vise init` | The request, outcome, intake answers, remaining blocking count, design-review status (`absent`, `needs-confirmation`, `accepted`, or `rejected`), and any retrospective `--allow-unresolved-intake` acknowledgement. |
 | `sp-vise/attestations/*.json` | `vise sync` (deterministic) or `vise attest` (host-agent / human) | Per-rule evidence: signer, confidence, rationale, cited files (with source fingerprints for drift detection). |
 | `sp-vise/inspection.json` | `vise init` | The platform, monorepo surface, and design-token signals detected at init time. |
 | `sp-vise/design-contract.json` | `vise design extract` | The extracted design contract: declared tokens, breakpoints, advisory components, source file digests (for freshness detection), and a stable digest over design facts. |
+| `sp-vise/design-preview.html` | `vise design extract` or `vise design preview` | Self-contained visual review of the design contract, embedded prototype when available, token swatches, and design-check conformance summary. Open this before answering `design_contract_confirmation`. |
 | `sp-vise/design-reference.html` | `vise design reference` | Self-contained HTML design-system spec (token swatches, type samples, components). Human/VLM-readable; open in a browser alongside the app. |
 | `sp-vise/engagement.json` | `vise engagement init` (optional) | Contractual scope: tier, customer ID, contracted outcomes, reviewer assignment. |