@amityco/social-plus-vise 0.14.6 → 0.14.8

package/CHANGELOG.md CHANGED
@@ -4,6 +4,21 @@ All notable changes to `@amityco/social-plus-vise` are documented in this file.
4
4
 
5
5
  The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
6
 
7
+ ## 0.14.8 — 2026-06-05
8
+
9
+ ### Added
10
+ - **Platform capability availability preflight:** `vise plan` now checks the bundled SDK surface before feed-forwarding product capabilities, reports available/unavailable/unknown capabilities, filters optional feed choices to features available on the selected platform, and ignores unsupported optional selections during `init`.
11
+ - **End-to-end product-flow smoke:** `npm run validate` now includes a local temp-project flow covering design extraction, plan feed-forward, blocking intake, answered init, capability check, design conformance, and sensor discovery in one scenario.
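The capability preflight described in the 0.14.8 entry can be pictured with a small sketch. This is not Vise's implementation: the function name `preflight`, the availability map, and the rule that only `available` capabilities are offered or kept (so `unknown` is treated like `unavailable`) are assumptions for illustration. Only the three status labels and the capability ids are taken from the text of this diff.

```python
from typing import Dict, List

# Hypothetical availability map for one platform. The real data comes from
# the bundled SDK surface, whose format is not shown in this diff.
AVAILABILITY: Dict[str, str] = {
    "post-image-upload": "available",
    "post-poll-creation": "unavailable",
    "post-edit": "unknown",
}


def preflight(availability: Dict[str, str], selected: List[str]) -> dict:
    """Group capabilities by status and filter optional selections.

    Assumption: only capabilities marked "available" are offered as
    optional choices or kept from the user's selection.
    """
    report: Dict[str, List[str]] = {"available": [], "unavailable": [], "unknown": []}
    for capability, status in availability.items():
        # Any unrecognized status label is filed under "unknown".
        report.get(status, report["unknown"]).append(capability)

    offered = list(report["available"])
    kept = [c for c in selected if availability.get(c) == "available"]
    ignored = [c for c in selected if c not in kept]
    return {"report": report, "offered": offered, "kept": kept, "ignored": ignored}


result = preflight(AVAILABILITY, ["post-image-upload", "post-poll-creation"])
print(result["offered"], result["kept"], result["ignored"])
```

In this sketch an unsupported selection is dropped quietly and listed under `ignored`, which mirrors the changelog wording that unsupported optional selections are ignored during `init`.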
12
+
13
+ ## 0.14.7 — 2026-06-05
14
+
15
+ ### Fixed
16
+ - **Android broad-social intake:** requests that combine feed/posts/comments with chat, profile, followers, or following now surface the blocking `feature_surface` question instead of collapsing silently to a single feed contract.
17
+ - **Android rich feed rendering:** `android.feed.post-datatype-handled` now rejects placeholder labels such as `[Image post]`, `[Poll post]`, or `[text post]` when the renderer does not resolve the actual SDK media/poll/clip/room data.
18
+ - **Android text-only composer gap:** new `android.feed.rich-post-composer-surfaced` rule flags feed composers that only call `createTextPost` without implementing or explicitly scoping out image, video, file, poll, clip, room, livestream, or mixed-attachment creation.
19
+ - **Android comment tray state gap:** new `android.comments.thread-ui-states-present` rule flags paged comment trays that show an empty list before handling loading/error states.
20
+ - **Android chat/profile regressions:** new `android.chat.unread-visible`, `android.chat.sort-explicit`, and `android.profile.social-counts-from-sdk` rules flag missing unread badges, implicit message ordering, and follower/following placeholder counts.
21
+
7
22
  ## 0.14.6 — 2026-06-05
8
23
 
9
24
  **Theme:** Intake questions must reach the human before implementation.
package/README.md CHANGED
@@ -45,7 +45,7 @@ See [Usage Flow](#usage-flow) for the full step-by-step diagram.
45
45
 
46
46
  Instead of just providing a CLI or AI skills, Vise implements a technique called **Agentic Workflow Governance**. Think of it as a customer-project integration harness: the governed build loop runs inside the target repo, grounded in the real project, real docs, and real validation signals.
47
47
 
48
- Vise wraps your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 300 platform-specific compliance rules, checks the generated UI against the customer's design system, gates narrow baseline capabilities so nothing mandatory is silently dropped, offers richer feed capabilities as explicit opt-ins, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
48
+ Vise wraps your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 300+ platform-specific compliance rules, checks the generated UI against the customer's design system, gates narrow baseline capabilities so nothing mandatory is silently dropped, offers richer feed capabilities as explicit opt-ins, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
49
49
 
50
50
  At a glance, Vise sits between the user's prompt and the agent's code changes. The agent still edits the app; Vise turns the request into a grounded plan, records the local contract, and keeps checking until the integration is ready to ship.
51
51
 
@@ -54,7 +54,7 @@ flowchart LR
54
54
  Prompt["User prompt<br/>Add a social.plus feature"] --> Skill["AI skill<br/>drives the loop"]
55
55
  Skill --> Inspect["Inspect project<br/>platform, app surface,<br/>design signals"]
56
56
  Inspect --> Plan["Plan<br/>outcome, docs,<br/>intake questions"]
57
- Plan --> Design["Design + completeness<br/>tokens, feature checklist,<br/>explicit opt-outs"]
57
+ Plan --> Design["Design + capability preflight<br/>preview confirmation,<br/>feature checklist"]
58
58
  Design --> Build["Agent builds<br/>edits customer code locally"]
59
59
  Build --> Check["Vise check<br/>SDK compliance gate"]
60
60
  Check -->|findings| Build
@@ -75,8 +75,8 @@ Vise validates on three layers, and the layer is set by the *kind of claim* —
75
75
 
76
76
  | Layer | Claim | How | Enforcement |
77
77
  |---|---|---|---|
78
- | **SDK compliance** | "this is **wrong**" | 300 deterministic rules (session renewal, live-collection vs one-shot, no secret in logs, parent-child rendering, ban-state gating…) | **Hard gate** — `vise check` blocks until green or attested. A small advisory subset surfaces as informational only and never blocks. |
79
- | **Design conformance** | "this **looks off**" | extract the customer's design system into a contract, then check token usage | **Advisory** — `vise design check`/`preview`; never fails a build |
78
+ | **SDK compliance** | "this is **wrong**" | 300+ deterministic rules (session renewal, live-collection vs one-shot, no secret in logs, parent-child rendering, ban-state gating…) | **Hard gate** — `vise check` blocks until green or attested. A small advisory subset surfaces as informational only and never blocks. |
79
+ | **Design conformance** | "this **looks off**" | extract the customer's design system into a contract, render a preview for confirmation, then check token usage | **Advisory** — `vise design check`/`preview`; never fails a build |
80
80
  | **Feature completeness** | "this is **missing**" | Vise proposes a narrow baseline per outcome; for add-feed, pagination is mandatory, while richer feed capabilities are opt-in choices from `vise plan` | **Decision gate** — `vise check` exits `completeness-gap` until each baseline capability is built or validly opted out; selected optional capabilities run separate sensors |
81
81
 
82
82
  Correctness is gated by deterministic rules or attestations. Baseline completeness is gated by explicit scope decisions: if a baseline capability is legitimately out of scope, record `// vise: scope-omit <id> — <reason>` and it no longer blocks. Optional feed capabilities such as image upload, poll creation, and edit post are offered during planning and become checked only after the user opts in. Conformance remains advisory because "matches the brand" is legitimately subjective. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
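To make the `// vise: scope-omit` marker concrete, here is a minimal sketch of how such markers could be collected from a source file. It is an illustration only, not how Vise parses them: the regular expression, the function name `find_scope_omits`, and the capability id `feed-pagination` are hypothetical, and the sketch assumes the separator between id and reason is the dash character U+2014 exactly as printed in the README.

```python
import re
from typing import Dict

# Marker shape from the README: "// vise: scope-omit <id> <dash> <reason>",
# where <dash> is U+2014. The pattern is an assumption, not Vise's grammar.
SCOPE_OMIT = re.compile(r"//\s*vise:\s*scope-omit\s+(\S+)\s+\u2014\s+(.+)")


def find_scope_omits(source: str) -> Dict[str, str]:
    """Return {capability_id: reason} for every marker found in the text."""
    omits: Dict[str, str] = {}
    for line in source.splitlines():
        match = SCOPE_OMIT.search(line)
        if match:
            omits[match.group(1)] = match.group(2).strip()
    return omits


example = (
    "val feed = repository.getFeed()\n"
    "// vise: scope-omit feed-pagination \u2014 single-page demo feed\n"
)
print(find_scope_omits(example))
```

A marker without a reason does not match in this sketch, which reflects the README's requirement that an opt-out carry a stated reason.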
@@ -127,7 +127,7 @@ Required customer or agent actions:
127
127
 
128
128
  ### Design-conformant UI
129
129
 
130
- Vise can ingest the customer's aesthetic into a **design contract** and guide generation to match it — from an HTML/CSS prototype (`vise design extract`) or from the host app's own design system across web + Android + Flutter + iOS (`vise design extract --from-project`: CSS vars/Tailwind/token modules, `colors.xml`, Flutter `Color(0x…)`, iOS `.colorset`/Swift). `vise design check` reports token conformance; `vise design preview` writes a visual review; `vise design reference` generates a full visual design-system spec (swatches, type samples, component demos). All advisory.
130
+ Vise can ingest the customer's aesthetic into a **design contract** and guide generation to match it — from an HTML/CSS prototype (`vise design extract`) or from the host app's own design system across web + Android + Flutter + iOS (`vise design extract --from-project`: CSS vars/Tailwind/token modules, `colors.xml`, Flutter `Color(0x…)`, iOS `.colorset`/Swift). `vise design extract` writes both `sp-vise/design-contract.json` and `sp-vise/design-preview.html`; `vise plan`/`init` withhold design feed-forward until the preview is explicitly accepted with `--answer design_contract_confirmation=yes`. If the preview is rejected (`=no`), Vise asks for a replacement design source and records no design-conformance claim. `vise design check` reports token conformance; `vise design preview` writes a visual review; `vise design reference` generates a full visual design-system spec (swatches, type samples, component demos). All advisory.
131
131
 
132
132
  **For social.plus-specific styling:** `vise design init-tokens` scaffolds `src/styles/social-plus-tokens.css` in your project — a dedicated token file for social.plus features that you can edit independently from your main app's design system. Greenfield projects get sensible `--sp-*` defaults; brownfield projects get their existing token values seeded in. Edit the file, run `vise design extract --from-project` to refresh the contract, and future agent builds inherit the updated palette — no AI agent needed in the update loop.
133
133
 
@@ -141,91 +141,63 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
141
141
 
142
142
  ---
143
143
 
144
- ## Benchmark: Phase 1 Results
144
+ ## Benchmark Results: Current Claim
145
145
 
146
- > **The compliance gaps agents ship on their own, they close under Vise's check loop.**
147
- > Across two capable coding agents (Cursor / Composer 2.5 and Claude Sonnet 4.6), the features with *secondary* compliance requirements — Chat, Moderation, Push — failed without Vise and passed with it; both agents reached 9/9 with Vise. This is a **strong directional signal at N=1 per cell, not a settled statistical finding.** The [Commune paper](docs/commune-paper-2026-05-30.md) is the full, honest version — methodology, per-cell results, threats to validity, and a complementary bug-fix benchmark where Vise showed *no* advantage.
146
+ > **Vise gives AI coding agents a governed workflow for social.plus integrations, improving feature completeness, SDK compliance, and design consistency in greenfield work.**
148
147
 
149
- ### What "delivered correctly" means
148
+ The strongest current claim is not a universal speed or quality promise. It is narrower and more useful: when agents build greenfield social.plus SDK features, the Vise workflow makes scope explicit, checks local code, and turns missing capabilities or SDK violations into concrete repair work before the agent stops.
150
149
 
151
- "Correct" doesn't just mean the code compiles. It means every feature handles the edge cases that matter to real users and real moderation teams:
150
+ ### Latest headline: feed completeness
152
151
 
153
- - A **banned user** cannot type or submit a post: the send button is hidden, not just disabled-on-submit
154
- - **Push notification preferences** are wired to the Amity API so users who opt out actually stop receiving notifications
155
- - **Moderation actions** (report, flag, block) are surfaced in the UI so users can act on them, not buried in a hook
156
- - **Chat and feed queries** use live, reactive subscriptions — not one-time fetches that go stale
152
+ The latest Vise 0.14.5 opt-in comparison is the headline product proof. Same feed request, same SDK docs, no Vise workflow in the baseline; the Vise arm explicitly selected `feed_optional_capabilities=post-image-upload,post-poll-creation,post-edit`, persisted that choice into `sp-vise/compliance.json`, and activated the selected sensors.
157
153
 
158
- Without Vise, AI agents frequently implement the primary feature correctly but miss these secondary requirements. They know about them in the abstract, but when building a chat screen, "ban state" feels out of scope and gets skipped. `vise check` turns that vague awareness into a specific, actionable finding.
154
+ | Agent / model | Docs-only baseline | Vise 0.14.5 opt-in arm | Readout |
155
+ |---|---:|---:|---|
156
+ | Cursor / Composer 2.5 | 30% (3.3/11 avg) | **97% (32/33)** | One seed surfaced the remaining item instead of silently dropping it. |
157
+ | Claude / Sonnet 4.6 medium | 27% (3.0/11 avg) | **100% (33/33)** | All three Vise seeds reached 11/11. |
158
+ | Codex / GPT-5.4 medium | 21% (2.3/11 avg) | **100% (33/33)** | All three Vise seeds reached 11/11. |
159
159
 
160
- ### The experiment: three conditions, nine features
160
+ Aggregate: **98/99 expected feed capabilities** and **27/27 selected optional capabilities** implemented across the latest Vise arm. See the full table and per-seed grader links in [`benchmarks/CURSOR_VISE_0.14.5_RESULTS.html`](benchmarks/CURSOR_VISE_0.14.5_RESULTS.html).
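The percentages and the aggregate can be re-derived from the table's own counts. The short check below is only arithmetic on the published figures; it assumes 3 seeds per agent and 11 expected capabilities per seed (which is what 33 per agent implies), plus the 3 opted-in optional capabilities named earlier.

```python
# (implemented, expected) feed capabilities per agent in the Vise arm,
# copied from the table: 3 seeds x 11 capabilities = 33 expected each.
vise_arm = {
    "Cursor / Composer 2.5": (32, 33),
    "Claude / Sonnet 4.6 medium": (33, 33),
    "Codex / GPT-5.4 medium": (33, 33),
}

# Average capabilities implemented (out of 11) in the docs-only baseline.
baseline_avg = {
    "Cursor / Composer 2.5": 3.3,
    "Claude / Sonnet 4.6 medium": 3.0,
    "Codex / GPT-5.4 medium": 2.3,
}

CAPABILITIES_PER_SEED = 11


def pct(numerator: float, denominator: float) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


total_implemented = sum(done for done, _ in vise_arm.values())
total_expected = sum(expected for _, expected in vise_arm.values())

# 3 agents x 3 seeds x 3 selected optional capabilities.
optional_expected = len(vise_arm) * 3 * 3

for agent, (done, expected) in vise_arm.items():
    print(agent, pct(baseline_avg[agent], CAPABILITIES_PER_SEED), pct(done, expected))
print(total_implemented, total_expected, optional_expected)
```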
161
161
 
162
- We ran a controlled experiment — the **Commune Benchmark** — to measure not just *whether* Vise helps, but *why*. Each of the nine features below was built from scratch by an AI agent under three independent conditions:
162
+ ### Supporting proof
163
163
 
164
- **Nine features built:**
165
- SDK setup · User presence · Social feed · Events · Chat & DMs · Push notifications · User profile · Content moderation · Comments
166
-
167
- | Condition | What the agent had | The question it answers |
164
+ | Surface | Safe claim | Evidence |
168
165
  |---|---|---|
169
- | **Pure MCP** | Access to social.plus docs only, no compliance guidance | Baseline: how well does the agent do on its own? |
170
- | **Rules-as-Markdown** | The full 1,013-line compliance rulebook pasted directly into the prompt | Is the problem just that the agent doesn't know the rules? |
171
- | **Vise + Skill** | Full Vise CLI: `vise check` runs automatically, agent reads specific findings, fixes them, repeats until green | Does an active feedback loop change the outcome? |
172
-
173
- The Rules-as-Markdown condition is the key isolation: if the agent already knows all the rules, does giving it the spec document fix the problem? The answer turned out to be **no** — knowing the rules and being forced to act on specific findings are different things.
166
+ | **Feature completeness** | Vise helps agents build more of the expected SDK capability surface. | Latest comparison: **21-30% without Vise vs 97-100% with Vise 0.14.5**. Earlier pre-registered Capability Matrix Row 2 also shipped a feature-completeness win: silently dropped items fell from 7.67/11 to 4.0/11. |
167
+ | **SDK compliance** | Vise checks catch greenfield SDK compliance gaps that docs or static guidance can miss. | Commune benchmark: Vise averaged **100% greenfield SDK compliance** where docs/RAG-style controls averaged **67%** across the reported slices. |
168
+ | **Design conformance** | Vise design checks reduce design drift under ambiguous briefs. | Ambiguous Spotify-style design test: Vise design runs produced **0 / 0 / 0 hex literals** across three seeds; without Vise, runs varied **0 / 2 / 15**. This supports variance reduction, not pixel-perfect visual quality. |
174
169
 
175
- ### Results: features delivered without production gaps
170
+ ### Why the workflow works
176
171
 
177
- | Coding agent (model) | Pure MCP | Rules-as-Markdown | Vise + Skill |
178
- |---|---|---|---|
179
- | **Cursor (Composer 2.5)** | 6 out of 9 ✗ | 5 out of 9 ✗ | **9 out of 9 ✅** |
180
- | **Claude Code (Sonnet 4.6)** | 6 out of 9 ✗ | 7 out of 9 ✗ | **9 out of 9 ✅** |
172
+ The benchmark story is the product flow:
181
173
 
182
- The three features that consistently fail without Vise (**Chat**, **Moderation**, and **Push Notifications**) are exactly the ones with secondary compliance requirements (ban-state, report affordances, Amity preference API). `vise check` catches these with a specific finding; the rules doc does not.
174
+ 1. **Inspect** — Vise detects platform, app surface, SDK surface, sensors, and design signals from the local repo.
175
+ 2. **Plan** — Vise classifies the outcome, asks blocking intake questions, surfaces capability availability, and offers optional feed capabilities only when the platform SDK surface supports them.
176
+ 3. **Confirm design** — `vise design extract` writes a preview; `plan`/`init` withhold design feed-forward until the user confirms `design_contract_confirmation=yes`.
177
+ 4. **Initialize** — `vise init` records the resolved compliance contract, intake answers, selected optional capabilities, inspection result, and accepted design digest.
178
+ 5. **Build / check / repair** — the agent edits locally, runs `vise check`, fixes deterministic findings, resolves completeness gaps or selected-capability failures, and then runs project sensors.
183
179
 
184
- Both agents reached 9/9 with Vise. The Rules-as-Markdown arm did **not** reliably beat the plain-docs control (5/9 on Cursor, *below* control, and 7/9 on Sonnet), and at N=1 per cell neither gap is distinguishable from noise. The robust, reproducible signal is narrower and mechanistic: **Chat and Moderation never pass under either control arm, and always pass under Vise.** Passes were scored by a deterministic grader, not by hand; see [Reproducibility & honest caveats](#reproducibility--honest-caveats) for what that grader does and doesn't establish.
180
+ Static docs can tell an agent what exists. Vise changes the stopping condition: the agent is not done until the local contract is green, attested, or blocked on explicit customer input.
185
181
 
186
- ### Why it matters
182
+ ### How to read the evidence
187
183
 
188
- A failing feature without Vise is *invisible* until a user hits it: the code compiles, the demo works, and the ban-state gap surfaces only when a banned user posts. Vise turns that latent gap into a specific finding the agent fixes before you ship. (We did not separately measure remediation effort, so this makes no rework-cost claim — only that the gaps are real and silent without a checker.)
184
+ The benchmark suite is intentionally reported with boundaries:
189
185
 
190
- ### Reproducibility & honest caveats
186
+ - **Latest feed-completeness comparison** is the current product claim for the opt-in capability flow in Vise 0.14.5. It is a best-case/opt-in comparison across Cursor, Claude, and Codex, not a universal result for every prompt.
187
+ - **Capability Matrix v1** remains the pre-registered follow-up. It shipped the Row 2 feature-completeness claim, found **no Row 1 SDK-compliance claim** on chat/moderation/push under its registered margin, and withheld the Row 3 design claim on a technicality despite higher by-name token use.
188
+ - **Commune Phase 1** remains useful historical evidence for the compliance loop: two agents reached 9/9 with Vise vs 5-7/9 under controls, but it was N=1 per cell and the grader overlaps Vise's own rules.
189
+ - **Design tests** support design-drift reduction and token cleanup. They do not prove visual taste, pixel perfection, or production-ready UI without human review.
190
+ - **Negative results must travel with the claim:** no measured Vise advantage on day-2 bug fixing; the push slice exposed a non-converging attestation loop when docs and SDK disagreed; earlier enumerative plan-time design guidance measured negative and was retracted; the original `scope-omit` affordance went unused in the matrix.
191
191
 
192
- - **Scoring is deterministic — and it overlaps with what Vise enforces.** Each cell is graded on four dimensions: `vise check --ci` (the same compliance ruleset), the project's own sensors (build / typecheck / lint), and hand-authored string-inclusion acceptance patterns. Because the metric overlaps Vise's own rules — and only the Vise arm iterates against that checker — read the headline as "Vise's checks pass," not as a fully independent oracle. The acceptance patterns are literal string matching (not AST), so they involve authoring judgment.
193
- - **Vise-arm passes were deterministic-pass**, not attestation exceptions — agents fixed the code. (The grader applies a narrow, *symmetric* auto-attestation for absence / type-stub findings across **all** arms including the controls; it cannot satisfy the acceptance patterns, so it does not tilt the result toward Vise.)
194
- - **Three arms, separate tooling.** The Rules-as-Markdown arm has no Vise checker available — it cannot run `vise check`.
195
- - **Built from scratch** (greenfield seed), capable models with prior SDK familiarity. A complementary **bug-fix** benchmark showed **no Vise advantage** — the loop helps on greenfield integration, not local bug hunts.
196
- - **N=1 per cell.** A strong directional signal (the Chat/Moderation/Push mechanism reproduces across both models), **not** a statistically settled finding.
197
- - **Follow-up evidence cuts both ways.** The pre-registered [Capability Matrix](#benchmark-capability-matrix-v1-pre-registered) (n=3, arm-independent grading, different briefs) found **no Row 1 claim** on chat/moderation/push — chat and moderation tied, push +1, below the registered margin — and a new, registered win on feature completeness instead. Read the two together, not Phase 1 alone.
198
- - **Full methodology, per-cell analysis, and threats to validity:** [the Commune paper](docs/commune-paper-2026-05-30.md). The [`benchmarks/FINDINGS.html`](benchmarks/FINDINGS.html) and [`benchmarks/RULES_AS_MARKDOWN.html`](benchmarks/RULES_AS_MARKDOWN.html) files are **summary report tables**, not raw transcripts or workspace diffs.
192
+ Full evidence: [`benchmarks/CURSOR_VISE_0.14.5_RESULTS.html`](benchmarks/CURSOR_VISE_0.14.5_RESULTS.html), [`benchmarks/capability-matrix/RESULTS.md`](benchmarks/capability-matrix/RESULTS.md), [`benchmarks/commune/RESULTS.md`](benchmarks/commune/RESULTS.md), and [`benchmarks/brand/design-test/RESULTS.md`](benchmarks/brand/design-test/RESULTS.md).
199
193
 
200
194
  ### Which mode should I use?
201
195
 
202
196
  | If you… | Use | Why |
203
197
  |---|---|---|
204
- | Building new social features with an AI agent | **Vise CLI + Skill** | The mode that closed every secondary-compliance gap in our benchmark |
205
- | Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the full ruleset |
206
- | Enforcing compliance in a CI pipeline | `vise check --ci` | Exits non-zero on failures; structured JSON output for logs |
207
-
208
- ---
209
-
210
- ## Benchmark: Capability Matrix v1 (pre-registered)
211
-
212
- > One row won, one tied, one was withheld on a technicality. The registered protocol requires all three to be reported with equal prominence — so here they are.
213
-
214
- Phase 1 measured one thing (secondary-compliance gaps) under a harness that overlaps Vise's own ruleset. The **Capability Matrix** is the stricter follow-up: a [pre-registered protocol](benchmarks/capability-matrix/PROTOCOL.md) frozen before any cell ran (7 dated amendments, each committed before the data it governs), **firewalled grading instruments** (authors barred from Vise's rules and config — [authorship record](benchmarks/capability-matrix/AUTHORSHIP.md)), **blind structural judges**, and 27 fresh isolated cells: n=3 seeds per cell, Cursor / Composer 2.5, docs-only control vs Vise loop, identical offline docs bundle in both arms. Full numbers, per-behavior provenance, and the rig-integrity record: [RESULTS.md](benchmarks/capability-matrix/RESULTS.md).
215
-
216
- | Capability claim | Registered outcome | What the data says |
217
- |---|---|---|
218
- | **Feature completeness** — open feed request scored against an 11-item firewalled ground truth | ✅ **Claim ships** (won 3/3 seed pairs and the mean) | The Vise loop roughly **halved silently-dropped SDK capabilities: 4.0 vs 7.67 of 11** — by *building more of the SDK surface*, not by surfacing trade-offs. Mechanism not isolated: plan-time capability enumeration, persistence against the check gate, and ~1.8× time-on-task are all live candidates. |
219
- | **SDK compliance** — chat / moderation / push slices | ➖ **No claim** (7/9 vs 6/9; push +1, below the registered ≥+2 margin) | Brief-explicit behaviors tie, as pre-registered. One level down the defects are *symmetric*: 2/3 control cells shipped ban gates reading a field that doesn't exist on the real SDK (compiles, never fires; 0/3 Vise cells) — while 2/3 Vise cells over-parameterized `targetType` and never bound the brief's value, plausibly steered by Vise's own rule. |
220
- | **Design conformance** — ambiguous brief, brand-token usage | ⚠️ **Withheld on a technicality** | By-name token usage **+18.1 points** for the contract+check loop (90.3 vs 72.2 mean) ✓ — but the second registered condition (hex literals strictly *lower*) tied at 0–0, so the claim is withheld and reported descriptively. Vise controls are reused from an earlier round (cross-temporal). |
221
-
222
- **Registered negative results** (the protocol requires these in any publication of the matrix):
223
-
224
- - **Push stop-condition non-convergence.** Where docs and SDK disagree (push registration — the docs themselves carry a gap warning), "iterate until `vise check` is green" did not converge within the 30-minute cap in *any* Vise cell: agents looped on attestation dialect. Shipped behavior still passed 3/3 — the cost is wall-clock, not correctness. (2/3 control cells also hit the cap doing SDK archaeology.)
225
- - **Scope-omit affordance unused.** No agent in any cell used the `// vise: scope-omit` surfacing marker or wrote a qualifying scope note. The advisory surfacing mechanism, as shipped in 0.14.1, influenced zero cells.
226
- - Unchanged from prior rounds: **no Vise advantage on day-2 bug fixes**; **enumerative plan-time design guidance** measured negative twice and was retracted in 0.14.1.
227
-
228
- All Capability Matrix findings are directional at n=3 under one model and one executor — not settled statistics.
198
+ | Building new social features with an AI agent | **Vise CLI + Skill** | This is the full governed workflow measured by the benchmarks. |
199
+ | Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the recorded compliance contract. |
200
+ | Enforcing compliance in CI | `vise check --ci` | Exits non-zero on deterministic failures, missing baseline capabilities, failed selected optional sensors, blockers, or drift. |
229
201
 
230
202
  ---
231
203
 
@@ -239,7 +211,7 @@ All Capability Matrix findings are directional at n=3 under one model and one ex
239
211
  | **Android (Kotlin)** | ✅ Full | Gradle assemble, unit tests |
240
212
  | **iOS (Swift)** | ✅ Full | Static rule checks fully operational. Build sensor not wired (`xcodebuild` environment requirements make it fragile) — `vise run-sensors` returns no-sensors for iOS; compliance rules run regardless. |
241
213
 
242
- Each platform has 52–54 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
214
+ Each platform has dozens of rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
243
215
 
244
216
  ---
245
217
 
@@ -285,23 +257,23 @@ The skill is plain Markdown; you can read it any time with `vise print-skill`.
285
257
  flowchart LR
286
258
  A[User asks AI agent<br/>'Add a social feed'] --> B[Agent runs<br/>vise inspect]
287
259
  B --> C[Agent runs<br/>vise plan --request &quot;...&quot;]
288
- C --> D{Intake<br/>questions?}
289
- D -->|Yes| E[Agent asks user<br/>for feed target,<br/>design source, etc.]
260
+ C --> D{Blocking intake<br/>or design preview?}
261
+ D -->|Yes| E[Agent surfaces questions<br/>and opens design preview<br/>for user confirmation]
290
262
  D -->|No| F
291
- E --> F[Agent runs<br/>vise init<br/>with answers]
263
+ E --> F[Agent reruns plan/init<br/>with answers]
292
264
  F --> G[Agent runs<br/>vise search-docs<br/>vise get-doc-page]
293
265
  G --> H[Agent edits<br/>your code]
294
266
  H --> I[Agent runs<br/>vise check]
295
267
  I --> J{Green?}
296
- J -->|No, fixable| K[Agent fixes<br/>findings]
268
+ J -->|No, fixable| K[Agent fixes<br/>findings, completeness gaps,<br/>or selected sensors]
297
269
  K --> I
298
270
  J -->|No, attest| L[Agent calls<br/>vise attest with<br/>evidence]
299
271
  L --> I
300
272
  J -->|Yes| M[Agent runs<br/>vise run-sensors]
301
- M --> N[Done.<br/>sp-vise/ contract<br/>committed]
273
+ M --> N[Done.<br/>sp-vise/ contract<br/>and evidence committed]
302
274
  ```
303
275
 
304
- The flow above is what the skill teaches your AI agent. You — the human — drive intent and approve edits; the agent runs Vise commands deterministically; Vise grounds the agent in real docs and real compliance rules. If blocking intake is still unresolved, `vise init` refuses to initialize, returns `status: "needs-clarification"`, and exits 7 so the agent must surface the questions instead of guessing.
276
+ The flow above is what the skill teaches your AI agent. You — the human — drive intent, answer scope questions, and approve the design preview before Vise feeds a design contract forward. The agent runs Vise commands deterministically; Vise grounds the agent in real docs and real compliance rules. If blocking intake or design confirmation is still unresolved, `vise init` refuses to initialize, returns `status: "needs-clarification"`, and exits 7 so the agent must surface the questions instead of guessing.
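The stopping behaviour described above can be sketched as a small decision function an agent wrapper might apply to a `vise init` result. The exit code 7 and the `needs-clarification` status come from the README text; everything else is an assumption for illustration: that stdout is a JSON object with a `status` field, that exit code 0 means success, and the names `InitOutcome` and `interpret_init`.

```python
import json
from dataclasses import dataclass

NEEDS_CLARIFICATION_EXIT = 7  # exit code stated in the README text


@dataclass
class InitOutcome:
    action: str  # "proceed", "ask-user", or "fail"
    detail: str


def interpret_init(exit_code: int, stdout: str) -> InitOutcome:
    """Map a `vise init` exit code and output to the agent's next step."""
    status = None
    try:
        payload = json.loads(stdout)
        if isinstance(payload, dict):
            status = payload.get("status")
    except json.JSONDecodeError:
        # Assumption: if the output is not JSON, rely on the exit code alone.
        pass

    if exit_code == NEEDS_CLARIFICATION_EXIT or status == "needs-clarification":
        return InitOutcome(
            "ask-user",
            "surface blocking intake or design confirmation to the human",
        )
    if exit_code == 0:  # assumed success convention
        return InitOutcome("proceed", "contract initialized")
    return InitOutcome("fail", f"unexpected exit code {exit_code}")


print(interpret_init(7, '{"status": "needs-clarification"}'))
```

The point of the sketch is the ordering: the clarification branch is checked first, so the agent surfaces questions to the human before it considers any other path.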
305
277
 
306
278
  ---
307
279
 
@@ -325,14 +297,14 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
325
297
 
326
298
  | Command | Purpose |
327
299
  |---|---|
328
- | `vise design extract <prototypePath> [--repo .] [--no-write]` | Read an HTML/CSS prototype and write a graded `sp-vise/design-contract.json` (declared CSS custom properties become exact tokens; repeated literals become inferred/advisory tokens; single-use literals are dropped) so generated social.plus UI can match the customer's aesthetic |
329
- | `vise design extract --from-project [path] [--no-write]` | No external prototype? Derive the contract from the host project's **own** design system — CSS custom properties (incl. shadcn `:root` and Tailwind v4 `@theme`), TS/JS token modules, inline tailwind configs, **Android** `colors.xml`/`dimens.xml`, **Flutter** `Color(0x…)`, and **iOS** `.xcassets/*.colorset` + Swift `Color(hex:)`/`Color(red:g:b:)`. Reference values (`var()`/`theme()`/`calc()`) are skipped, so a var-mapped config contributes nothing rather than wrong tokens |
300
+ | `vise design extract <prototypePath> [--repo .] [--no-write]` | Read an HTML/CSS prototype and write a graded `sp-vise/design-contract.json` plus `sp-vise/design-preview.html` (declared CSS custom properties become exact tokens; repeated literals become inferred/advisory tokens; single-use literals are dropped) so generated social.plus UI can match the customer's aesthetic after preview confirmation |
301
+ | `vise design extract --from-project [path] [--no-write]` | No external prototype? Derive the contract from the host project's **own** design system and write a preview — CSS custom properties (incl. shadcn `:root` and Tailwind v4 `@theme`), TS/JS token modules, inline tailwind configs, **Android** `colors.xml`/`dimens.xml`, **Flutter** `Color(0x…)`, and **iOS** `.xcassets/*.colorset` + Swift `Color(hex:)`/`Color(red:g:b:)`. Reference values (`var()`/`theme()`/`calc()`) are skipped, so a var-mapped config contributes nothing rather than wrong tokens |
330
302
  | `vise design check [path]` | Advisory, **non-blocking** report on how closely the UI code matches the contract (token coverage + on/off-contract color literals). Never fails a build and is **not** a `vise check` gate |
331
303
  | `vise design preview [path] [--reference <prototype>]` | Write a self-contained `sp-vise/design-preview.html`: the contract's tokens as visual swatches + the conformance report + the HTML reference embedded for side-by-side review. Vise renders the artifact; a human/VLM judges the visual match. Dependency-free — **not** an automated pixel diff |
332
304
  | `vise design reference [path] [--title <name>]` | Write a self-contained `sp-vise/design-reference.html`: human/VLM-readable design-system spec — token swatches, type samples, component demos, and a growth-layer summary. Pairs with `design-contract.json` (machine-readable). Use `--title` to name the design system (e.g. `--title Streamly`). Advisory — **not** an enforcement gate |
333
305
  | `vise design init-tokens [path] [--force]` | Scaffold `src/styles/social-plus-tokens.css` — the dedicated, customer-editable token file for social.plus features. **Greenfield:** neutral defaults (full `--sp-*` token set). **Brownfield:** seeded from your existing concrete tokens. Idempotent — never overwrites an existing file (use `--force` to override). After editing, run `vise design extract --from-project` to refresh the contract |
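
The grading described in the `vise design extract` row (declared custom properties become exact tokens, repeated literals become inferred tokens, single-use literals are dropped, reference values are skipped) can be sketched roughly as follows. This is a minimal illustration, not the Vise implementation: `gradeTokens`, `GradedToken`, and the `inferred-color-N` naming are hypothetical, and the sketch only looks at hex color literals.

```typescript
// Hypothetical sketch of the grading rule; names and shapes are assumptions,
// not part of the Vise API.
type Grade = "exact" | "inferred";

interface GradedToken {
  name: string; // custom property name, or a synthetic name for a repeated literal
  value: string;
  grade: Grade;
}

function gradeTokens(css: string): GradedToken[] {
  const tokens: GradedToken[] = [];
  const declaredValues = new Set<string>();

  // 1. Declared custom properties (`--name: value`) become exact tokens.
  //    Reference values (var()/theme()/calc()) are skipped rather than guessed.
  const declRe = /(--[\w-]+)\s*:\s*([^;}]+)/g;
  let decl: RegExpExecArray | null;
  while ((decl = declRe.exec(css)) !== null) {
    const name = decl[1] ?? "";
    const value = (decl[2] ?? "").trim();
    if (/^(var|theme|calc)\(/.test(value)) continue;
    tokens.push({ name, value, grade: "exact" });
    declaredValues.add(value.toLowerCase());
  }

  // 2. Count hex color literals that appear outside custom property declarations.
  const counts = new Map<string, number>();
  const withoutDecls = css.replace(/(--[\w-]+)\s*:\s*([^;}]+)/g, "");
  const hexRe = /#[0-9a-fA-F]{3,8}\b/g;
  let hex: RegExpExecArray | null;
  while ((hex = hexRe.exec(withoutDecls)) !== null) {
    const literal = hex[0].toLowerCase();
    if (declaredValues.has(literal)) continue; // already covered by an exact token
    counts.set(literal, (counts.get(literal) ?? 0) + 1);
  }

  // 3. Repeated literals become inferred (advisory) tokens; single-use ones are dropped.
  let index = 1;
  counts.forEach((count, literal) => {
    if (count >= 2) {
      tokens.push({ name: `inferred-color-${index++}`, value: literal, grade: "inferred" });
    }
  });
  return tokens;
}
```

Under these assumptions, a stylesheet that only maps variables to other variables contributes no tokens at all, which matches the "nothing rather than wrong tokens" behavior described for var-mapped configs.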
334
306
 
335
- The extracted contract is **advisory input for generation**, not an enforcement gate: a token-poor prototype yields a weaker — never wrong — contract, and absence of a prototype simply means no contract (the existing `*.design.reuse-detected-tokens` rules still cover reuse of a host project's own design system).
307
+ The extracted contract is **advisory input for generation**, not an enforcement gate: a token-poor prototype yields a weaker — never wrong — contract, and absence of a prototype simply means no contract (the existing `*.design.reuse-detected-tokens` rules still cover reuse of a host project's own design system). A contract becomes feed-forward guidance only after the user confirms the preview with `--answer design_contract_confirmation=yes`; rejecting it with `=no` asks for a replacement design source and records no design-conformance digest.
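
The confirmation step added in this paragraph can be modeled as a small state function. The sketch below is illustrative only: the four status strings come from the `sp-vise/intake.json` description in this diff, but `reviewDesignContract`, the `DesignReview` shape, and the assumption that `absent` means "no contract was extracted" are hypothetical and not taken from the Vise source.

```typescript
// Hypothetical model of the design-review flow; not the Vise implementation.
type DesignReviewStatus = "absent" | "needs-confirmation" | "accepted" | "rejected";

interface DesignReview {
  status: DesignReviewStatus;
  // Recorded only when the user confirms the preview.
  acceptedDigest?: string;
}

function reviewDesignContract(
  contractDigest: string | undefined,
  answer?: "yes" | "no",
): DesignReview {
  // Assumption: no extracted contract means there is nothing to review.
  if (contractDigest === undefined) return { status: "absent" };
  // `design_contract_confirmation=yes`: the contract becomes feed-forward guidance.
  if (answer === "yes") return { status: "accepted", acceptedDigest: contractDigest };
  // `design_contract_confirmation=no`: no digest is recorded; a new design source is needed.
  if (answer === "no") return { status: "rejected" };
  // A contract exists but the preview has not been confirmed yet.
  return { status: "needs-confirmation" };
}
```

The point of the sketch is that a digest is attached in exactly one branch, so an unreviewed or rejected contract can never be mistaken for accepted guidance.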
336
308
 
337
309
  ### Documentation grounding & Troubleshooting
338
310
 
@@ -466,11 +438,12 @@ After a successful `vise init`, your project gets a `sp-vise/` directory. If ini
466
438
 
467
439
  | File | Created by | What it contains |
468
440
  |---|---|---|
469
- | `sp-vise/compliance.json` | `vise init` | The rules selected for this integration, the Vise version, the ruleset digest, the target app surface, and an optional engagement link. |
470
- | `sp-vise/intake.json` | `vise init` | The request, outcome, intake answers, remaining blocking count, and any retrospective `--allow-unresolved-intake` acknowledgement. |
441
+ | `sp-vise/compliance.json` | `vise init` | The rules selected for this integration, the Vise version, the ruleset digest, the target app surface, selected optional capabilities, optional engagement link, and an accepted design-contract digest when confirmed. |
442
+ | `sp-vise/intake.json` | `vise init` | The request, outcome, intake answers, remaining blocking count, design-review status (`absent`, `needs-confirmation`, `accepted`, or `rejected`), and any retrospective `--allow-unresolved-intake` acknowledgement. |
471
443
  | `sp-vise/attestations/*.json` | `vise sync` (deterministic) or `vise attest` (host-agent / human) | Per-rule evidence: signer, confidence, rationale, cited files (with source fingerprints for drift detection). |
472
444
  | `sp-vise/inspection.json` | `vise init` | The platform, monorepo surface, and design-token signals detected at init time. |
473
445
  | `sp-vise/design-contract.json` | `vise design extract` | The extracted design contract: declared tokens, breakpoints, advisory components, source file digests (for freshness detection), and a stable digest over design facts. |
446
+ | `sp-vise/design-preview.html` | `vise design extract` or `vise design preview` | Self-contained visual review of the design contract, embedded prototype when available, token swatches, and design-check conformance summary. Open this before answering `design_contract_confirmation`. |
474
447
  | `sp-vise/design-reference.html` | `vise design reference` | Self-contained HTML design-system spec (token swatches, type samples, components). Human/VLM-readable; open in a browser alongside the app. |
475
448
  | `sp-vise/engagement.json` | `vise engagement init` (optional) | Contractual scope: tier, customer ID, contracted outcomes, reviewer assignment. |
476
449