@amityco/social-plus-vise 0.14.5 → 0.14.8
- package/CHANGELOG.md +33 -0
- package/README.md +96 -83
- package/dist/capabilities.js +289 -11
- package/dist/outcomes.js +51 -3
- package/dist/server.js +82 -5
- package/dist/tools/blocks.js +385 -0
- package/dist/tools/compliance.js +180 -14
- package/dist/tools/design.js +62 -11
- package/dist/tools/integration.js +99 -20
- package/dist/tools/project.js +162 -3
- package/package.json +4 -2
- package/rules/chat.yaml +79 -5
- package/rules/feed.yaml +111 -1
- package/skills/social-plus-vise/SKILL.md +4 -0
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,39 @@ All notable changes to `@amityco/social-plus-vise` are documented in this file.
|
|
|
4
4
|
|
|
5
5
|
The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
6
6
|
|
|
7
|
+
## 0.14.8 — 2026-06-05
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
- **Platform capability availability preflight:** `vise plan` now checks the bundled SDK surface before feed-forwarding product capabilities, reports available/unavailable/unknown capabilities, filters optional feed choices to features available on the selected platform, and ignores unsupported optional selections during `init`.
|
|
11
|
+
- **End-to-end product-flow smoke:** `npm run validate` now includes a local temp-project flow covering design extraction, plan feed-forward, blocking intake, answered init, capability check, design conformance, and sensor discovery in one scenario.
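The preflight in the 0.14.8 entry can be pictured as a three-way classification followed by a filter. The sketch below is illustrative only: `SdkSurface`, `classifyCapability`, `filterOptionalChoices`, and the SDK symbol names other than `createTextPost` are hypothetical, and the actual logic in `dist/capabilities.js` may differ.

```typescript
// Hypothetical sketch of a capability availability preflight.
// Names and shapes are assumptions for illustration, not the package's API.
type Availability = "available" | "unavailable" | "unknown";

// Assumed shape: symbols the bundled SDK surface is known to expose or lack.
interface SdkSurface {
  present: string[];
  absent: string[];
}

// Unavailable if any required symbol is known absent, available if every
// required symbol is known present, otherwise unknown.
function classifyCapability(required: string[], surface: SdkSurface): Availability {
  if (required.some((s) => surface.absent.indexOf(s) !== -1)) return "unavailable";
  if (required.every((s) => surface.present.indexOf(s) !== -1)) return "available";
  return "unknown";
}

// Keep only the optional choices the selected platform can support.
function filterOptionalChoices(
  choices: Record<string, string[]>,
  surface: SdkSurface,
): string[] {
  return Object.keys(choices).filter(
    (id) => classifyCapability(choices[id], surface) === "available",
  );
}

// Made-up surface for one platform.
const exampleSurface: SdkSurface = {
  present: ["createTextPost", "createImagePost"],
  absent: ["createPollPost"],
};

const offered = filterOptionalChoices(
  {
    "post-image-upload": ["createImagePost"],
    "post-poll-creation": ["createPollPost"],
    "post-edit": ["editPost"],
  },
  exampleSurface,
);
console.log(offered);
```

In this sketch an `unknown` capability is not offered as an optional choice. That conservative choice is an assumption; the changelog says only that unknown capabilities are reported.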
|
|
12
|
+
|
|
13
|
+
## 0.14.7 — 2026-06-05
|
|
14
|
+
|
|
15
|
+
### Fixed
|
|
16
|
+
- **Android broad-social intake:** requests that combine feed/posts/comments with chat, profile, followers, or following now surface the blocking `feature_surface` question instead of collapsing silently to a single feed contract.
|
|
17
|
+
- **Android rich feed rendering:** `android.feed.post-datatype-handled` now rejects placeholder labels such as `[Image post]`, `[Poll post]`, or `[text post]` when the renderer does not resolve the actual SDK media/poll/clip/room data.
|
|
18
|
+
- **Android text-only composer gap:** new `android.feed.rich-post-composer-surfaced` rule flags feed composers that only call `createTextPost` without implementing or explicitly scoping out image, video, file, poll, clip, room, livestream, or mixed-attachment creation.
|
|
19
|
+
- **Android comment tray state gap:** new `android.comments.thread-ui-states-present` rule flags paged comment trays that show an empty list before handling loading/error states.
|
|
20
|
+
- **Android chat/profile regressions:** new `android.chat.unread-visible`, `android.chat.sort-explicit`, and `android.profile.social-counts-from-sdk` rules flag missing unread badges, implicit message ordering, and follower/following placeholder counts.
|
|
21
|
+
|
|
22
|
+
## 0.14.6 — 2026-06-05
|
|
23
|
+
|
|
24
|
+
**Theme:** Intake questions must reach the human before implementation.
|
|
25
|
+
|
|
26
|
+
### Added
|
|
27
|
+
- **`vise init` now enforces unresolved blocking intake:** normal init returns `status: "needs-clarification"` and exits 7 when required planning questions are still unanswered. This prevents host agents from skipping surfaced questions and writing a compliance sidecar as if scope were resolved.
|
|
28
|
+
- **`sp-vise/intake.json`:** successful init records the request, outcome, answers, remaining blocking count, and whether unresolved intake was explicitly acknowledged.
|
|
29
|
+
- **`--allow-unresolved-intake`:** explicit bypass for retrospective benchmark/harness initialization only. The acknowledgement is recorded in `sp-vise/intake.json`; customer implementations should answer the blocking questions instead.
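A minimal sketch of the gate these three entries describe, assuming a simple in-memory model. The `needs-clarification` status and exit code 7 come from the changelog; the `initialized` status, exit code 0, and all field and function names are assumptions, and the real `sp-vise/intake.json` schema is not shown in this diff.

```typescript
// Hypothetical model of the init gate; field and function names are assumptions.
interface IntakeRecord {
  request: string;
  outcome: string;
  answers: Record<string, string>;
  remainingBlocking: number;
  unresolvedAcknowledged: boolean;
}

interface InitResult {
  status: "initialized" | "needs-clarification"; // "initialized" is assumed
  exitCode: number;
  intake?: IntakeRecord; // recorded only when init succeeds
}

function gateInit(
  request: string,
  outcome: string,
  blockingQuestions: string[],
  answers: Record<string, string>,
  allowUnresolvedIntake: boolean,
): InitResult {
  // Count the blocking questions that still have no answer.
  const remaining = blockingQuestions.filter(
    (q) => !Object.prototype.hasOwnProperty.call(answers, q),
  ).length;

  // Normal init refuses to proceed while blocking intake is unresolved.
  if (remaining > 0 && !allowUnresolvedIntake) {
    return { status: "needs-clarification", exitCode: 7 };
  }

  // Success path: record the answers and any explicit bypass.
  return {
    status: "initialized",
    exitCode: 0,
    intake: {
      request,
      outcome,
      answers,
      remainingBlocking: remaining,
      unresolvedAcknowledged: remaining > 0 && allowUnresolvedIntake,
    },
  };
}

const blocked = gateInit("add a feed and chat", "add-feed", ["feature_surface"], {}, false);
console.log(blocked.status, blocked.exitCode);
```

A host agent that sees the blocked result would surface the unanswered questions to the user rather than retrying with the bypass flag.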
|
|
30
|
+
|
|
31
|
+
### Fixed
|
|
32
|
+
- **Host-project `style.css` design extraction:** `vise design extract --from-project` now detects non-theme-named CSS files that declare custom properties, so common roots like `style.css` produce strong host-project design contracts instead of empty weak contracts.
|
|
33
|
+
- **Design-source wording:** `vise plan` now labels host-project design contracts as the host app's design system, not the customer's prototype, and empty contracts tell the agent to ask for the correct design source instead of implying tokens exist.
|
|
34
|
+
|
|
35
|
+
### Docs
|
|
36
|
+
- **Skill and tool docs now require surfacing blocking intake questions** before implementation, and release smoke guidance uses `--allow-unresolved-intake` only where a disposable retrospective init is intended.
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
7
40
|
## 0.14.5 — 2026-06-04
|
|
8
41
|
|
|
9
42
|
**Theme:** Optional feed capabilities become explicit opt-in sensors.
|
package/README.md
CHANGED
|
@@ -45,7 +45,7 @@ See [Usage Flow](#usage-flow) for the full step-by-step diagram.
|
|
|
45
45
|
|
|
46
46
|
Instead of just providing a CLI or AI skills, Vise implements a technique called **Agentic Workflow Governance**. Think of it as a customer-project integration harness: the governed build loop runs inside the target repo, grounded in the real project, real docs, and real validation signals.
|
|
47
47
|
|
|
48
|
-
Vise wraps your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 300 platform-specific compliance rules, checks the generated UI against the customer's design system, gates narrow baseline capabilities so nothing mandatory is silently dropped, offers richer feed capabilities as explicit opt-ins, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
|
|
48
|
+
Vise wraps your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 300+ platform-specific compliance rules, checks the generated UI against the customer's design system, gates narrow baseline capabilities so nothing mandatory is silently dropped, offers richer feed capabilities as explicit opt-ins, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
|
|
49
49
|
|
|
50
50
|
At a glance, Vise sits between the user's prompt and the agent's code changes. The agent still edits the app; Vise turns the request into a grounded plan, records the local contract, and keeps checking until the integration is ready to ship.
|
|
51
51
|
|
|
@@ -54,7 +54,7 @@ flowchart LR
|
|
|
54
54
|
Prompt["User prompt<br/>Add a social.plus feature"] --> Skill["AI skill<br/>drives the loop"]
|
|
55
55
|
Skill --> Inspect["Inspect project<br/>platform, app surface,<br/>design signals"]
|
|
56
56
|
Inspect --> Plan["Plan<br/>outcome, docs,<br/>intake questions"]
|
|
57
|
-
Plan --> Design["Design +
|
|
57
|
+
Plan --> Design["Design + capability preflight<br/>preview confirmation,<br/>feature checklist"]
|
|
58
58
|
Design --> Build["Agent builds<br/>edits customer code locally"]
|
|
59
59
|
Build --> Check["Vise check<br/>SDK compliance gate"]
|
|
60
60
|
Check -->|findings| Build
|
|
@@ -75,8 +75,8 @@ Vise validates on three layers, and the layer is set by the *kind of claim* —
|
|
|
75
75
|
|
|
76
76
|
| Layer | Claim | How | Enforcement |
|
|
77
77
|
|---|---|---|---|
|
|
78
|
-
| **SDK compliance** | "this is **wrong**" | 300 deterministic rules (session renewal, live-collection vs one-shot, no secret in logs, parent-child rendering, ban-state gating…) | **Hard gate** — `vise check` blocks until green or attested. A small advisory subset surfaces as informational only and never blocks. |
|
|
79
|
-
| **Design conformance** | "this **looks off**" | extract the customer's design system into a contract, then check token usage | **Advisory** — `vise design check`/`preview`; never fails a build |
|
|
78
|
+
| **SDK compliance** | "this is **wrong**" | 300+ deterministic rules (session renewal, live-collection vs one-shot, no secret in logs, parent-child rendering, ban-state gating…) | **Hard gate** — `vise check` blocks until green or attested. A small advisory subset surfaces as informational only and never blocks. |
|
|
79
|
+
| **Design conformance** | "this **looks off**" | extract the customer's design system into a contract, render a preview for confirmation, then check token usage | **Advisory** — `vise design check`/`preview`; never fails a build |
|
|
80
80
|
| **Feature completeness** | "this is **missing**" | Vise proposes a narrow baseline per outcome; for add-feed, pagination is mandatory, while richer feed capabilities are opt-in choices from `vise plan` | **Decision gate** — `vise check` exits `completeness-gap` until each baseline capability is built or validly opted out; selected optional capabilities run separate sensors |
|
|
81
81
|
|
|
82
82
|
Correctness is gated by deterministic rules or attestations. Baseline completeness is gated by explicit scope decisions: if a baseline capability is legitimately out of scope, record `// vise: scope-omit <id> — <reason>` and it no longer blocks. Optional feed capabilities such as image upload, poll creation, and edit post are offered during planning and become checked only after the user opts in. Conformance remains advisory because "matches the brand" is legitimately subjective. See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
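For illustration, a scope-omit marker of the form quoted above could be recognized with a small parser. This is a sketch, not Vise's implementation: `parseScopeOmit` is a hypothetical name, accepting a plain hyphen as separator is an assumption, and requiring a non-empty reason is also an assumption.

```typescript
// Hypothetical parser for a "// vise: scope-omit <id> <dash> <reason>" comment.
interface ScopeOmit {
  id: string;
  reason: string;
}

// \u2014 is the em dash used in the README marker; the "-" alternative is assumed.
const SCOPE_OMIT = /^\s*\/\/\s*vise:\s*scope-omit\s+(\S+)\s+(?:\u2014|-)\s+(.+?)\s*$/;

function parseScopeOmit(line: string): ScopeOmit | null {
  const match = SCOPE_OMIT.exec(line);
  if (match === null) return null; // not a marker, or the reason is missing
  return { id: match[1], reason: match[2] };
}

// The capability id below is made up for the example.
const parsed = parseScopeOmit("// vise: scope-omit feed.pagination \u2014 single-page admin view");
console.log(parsed);
```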
|
|
@@ -86,13 +86,48 @@ Correctness is gated by deterministic rules or attestations. Baseline completene
|
|
|
86
86
|
Vise has two deliberately separate roles:
|
|
87
87
|
|
|
88
88
|
- **Customer integration helper:** runs inside customer projects to inspect, plan, validate, and sensor-check social.plus SDK integrations.
|
|
89
|
-
- **Block Factory SDK facts provider:**
|
|
89
|
+
- **Block Factory SDK facts provider:** internal mode for social.plus Block Factory to verify SDK capabilities, symbols, and model schemas before reusable blocks are generated or released.
|
|
90
|
+
- **Block installer governance:** customer-project workflow that consumes a Block Factory registry, plans safe package/source changes, writes `sp-vise/blocks.json`, and validates installed block state.
|
|
90
91
|
|
|
91
|
-
Vise owns SDK truth and customer-project governance. social.plus Block Factory owns block contracts, package adapters, previews, conformance tests, and release readiness. See [docs/SDK_FACTS_FOR_BLOCK_FACTORY.md](docs/SDK_FACTS_FOR_BLOCK_FACTORY.md) for the internal provider-side plan.
|
|
92
|
+
Vise owns SDK truth and customer-project governance. social.plus Block Factory owns block contracts, package adapters, registry metadata, previews, conformance tests, and release readiness. See [docs/SDK_FACTS_FOR_BLOCK_FACTORY.md](docs/SDK_FACTS_FOR_BLOCK_FACTORY.md) for the internal provider-side plan.
|
|
93
|
+
|
|
94
|
+
### Block Factory user experience
|
|
95
|
+
|
|
96
|
+
When Vise is used with social.plus Block Factory, the customer experience should feel like asking an AI agent for a social capability rather than manually assembling SDK calls and UI states:
|
|
97
|
+
|
|
98
|
+
> "Add social.plus comments to this app and match my existing design system."
|
|
99
|
+
|
|
100
|
+
The agent uses Vise to turn that request into a governed install workflow:
|
|
101
|
+
|
|
102
|
+
```sh
|
|
103
|
+
vise blocks list --registry ./registry/blocks.json --format json
|
|
104
|
+
vise blocks plan . --block comments --registry ./registry/blocks.json --format json
|
|
105
|
+
vise blocks add . --block comments --registry ./registry/blocks.json --package-source npm --dry-run
|
|
106
|
+
vise blocks add . --block comments --registry ./registry/blocks.json --package-source npm --apply
|
|
107
|
+
vise blocks validate . --block comments --registry ./registry/blocks.json --format json
|
|
108
|
+
vise run-sensors --dry-run
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
What Vise does:
|
|
112
|
+
|
|
113
|
+
- Inspects the customer project and detects the target platform.
|
|
114
|
+
- Reads Block Factory registry metadata without owning block product data.
|
|
115
|
+
- Plans package, source-anchor, sidecar, design-contract, and sensor requirements.
|
|
116
|
+
- Applies changes only to package manifests, `sp-vise/blocks.json`, and source files with explicit install anchors.
|
|
117
|
+
- Returns `needs-review` when a brownfield project has no safe install anchor.
|
|
118
|
+
- Validates installed block state and detects drift after future code changes.
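The write-scope rule in the list above can be sketched as a planner that touches only three kinds of path and otherwise defers to a human. Everything here is an assumption made for illustration: the `ready` status, the `package.json` manifest path, and all type and function names. Only `needs-review` and `sp-vise/blocks.json` come from the README.

```typescript
// Hypothetical install planner; it builds a plan and writes nothing.
type ChangeKind = "package-manifest" | "sidecar" | "anchored-source";

interface SourceFile {
  path: string;
  hasInstallAnchor: boolean;
}

interface PlannedChange {
  path: string;
  kind: ChangeKind;
}

interface InstallPlan {
  status: "ready" | "needs-review"; // "ready" is an assumed label
  changes: PlannedChange[];
}

function planBlockInstall(manifestPath: string, sources: SourceFile[]): InstallPlan {
  const anchored = sources.filter((f) => f.hasInstallAnchor);

  // No explicit anchor: do not guess where to edit the project.
  if (anchored.length === 0) {
    return { status: "needs-review", changes: [] };
  }

  const changes: PlannedChange[] = [
    { path: manifestPath, kind: "package-manifest" },
    { path: "sp-vise/blocks.json", kind: "sidecar" },
  ];
  anchored.forEach((f) => changes.push({ path: f.path, kind: "anchored-source" }));
  return { status: "ready", changes };
}

const plan = planBlockInstall("package.json", [
  { path: "src/App.tsx", hasInstallAnchor: true },
  { path: "src/legacy/Feed.tsx", hasInstallAnchor: false },
]);
console.log(plan.status, plan.changes.map((c) => c.path));
```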
|
|
119
|
+
|
|
120
|
+
Required customer or agent actions:
|
|
121
|
+
|
|
122
|
+
- Choose the block id and package source.
|
|
123
|
+
- Review the dry-run plan before using `--apply`.
|
|
124
|
+
- Add explicit install anchors when Vise cannot safely edit the project.
|
|
125
|
+
- Keep `sp-vise/blocks.json` committed so future validation has state to compare.
|
|
126
|
+
- Run the target project's normal build, lint, test, and Vise sensors before shipping.
|
|
92
127
|
|
|
93
128
|
### Design-conformant UI
|
|
94
129
|
|
|
95
|
-
Vise can ingest the customer's aesthetic into a **design contract** and guide generation to match it — from an HTML/CSS prototype (`vise design extract`) or from the host app's own design system across web + Android + Flutter + iOS (`vise design extract --from-project`: CSS vars/Tailwind/token modules, `colors.xml`, Flutter `Color(0x…)`, iOS `.colorset`/Swift). `vise design check` reports token conformance; `vise design preview` writes a visual review; `vise design reference` generates a full visual design-system spec (swatches, type samples, component demos). All advisory.
|
|
130
|
+
Vise can ingest the customer's aesthetic into a **design contract** and guide generation to match it — from an HTML/CSS prototype (`vise design extract`) or from the host app's own design system across web + Android + Flutter + iOS (`vise design extract --from-project`: CSS vars/Tailwind/token modules, `colors.xml`, Flutter `Color(0x…)`, iOS `.colorset`/Swift). `vise design extract` writes both `sp-vise/design-contract.json` and `sp-vise/design-preview.html`; `vise plan`/`init` withhold design feed-forward until the preview is explicitly accepted with `--answer design_contract_confirmation=yes`. If the preview is rejected (`=no`), Vise asks for a replacement design source and records no design-conformance claim. `vise design check` reports token conformance; `vise design preview` writes a visual review; `vise design reference` generates a full visual design-system spec (swatches, type samples, component demos). All advisory.
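As a rough illustration of the confirmation gate, the sketch below parses repeated `--answer key=value` flags and maps the `design_contract_confirmation` answer to one of three outcomes. The outcome labels and function names are hypothetical. The README states only that feed-forward is withheld until `=yes` and that `=no` triggers a request for a replacement design source.

```typescript
// Hypothetical gate logic; outcome labels are illustrative, not Vise output.
type DesignGate = "feed-forward" | "ask-for-replacement" | "awaiting-confirmation";

// Collect every "--answer key=value" pair from an argument list.
function parseAnswers(argv: string[]): Record<string, string> {
  const answers: Record<string, string> = {};
  for (let i = 0; i < argv.length - 1; i++) {
    if (argv[i] !== "--answer") continue;
    const pair = argv[i + 1];
    const eq = pair.indexOf("=");
    if (eq > 0) answers[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return answers;
}

function designGate(answers: Record<string, string>): DesignGate {
  const value = answers["design_contract_confirmation"];
  if (value === "yes") return "feed-forward";
  if (value === "no") return "ask-for-replacement";
  return "awaiting-confirmation"; // preview not yet accepted or rejected
}

const gate = designGate(
  parseAnswers(["--request", "add comments", "--answer", "design_contract_confirmation=yes"]),
);
console.log(gate);
```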
|
|
96
131
|
|
|
97
132
|
**For social.plus-specific styling:** `vise design init-tokens` scaffolds `src/styles/social-plus-tokens.css` in your project — a dedicated token file for social.plus features that you can edit independently from your main app's design system. Greenfield projects get sensible `--sp-*` defaults; brownfield projects get their existing token values seeded in. Edit the file, run `vise design extract --from-project` to refresh the contract, and future agent builds inherit the updated palette — no AI agent needed in the update loop.
|
|
98
133
|
|
|
@@ -106,91 +141,63 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
|
|
|
106
141
|
|
|
107
142
|
---
|
|
108
143
|
|
|
109
|
-
## Benchmark:
|
|
110
|
-
|
|
111
|
-
> **The compliance gaps agents ship on their own, they close under Vise's check loop.**
|
|
112
|
-
> Across two capable coding agents (Cursor / Composer 2.5 and Claude Sonnet 4.6), the features with *secondary* compliance requirements — Chat, Moderation, Push — failed without Vise and passed with it; both agents reached 9/9 with Vise. This is a **strong directional signal at N=1 per cell, not a settled statistical finding.** The [Commune paper](docs/commune-paper-2026-05-30.md) is the full, honest version — methodology, per-cell results, threats to validity, and a complementary bug-fix benchmark where Vise showed *no* advantage.
|
|
144
|
+
## Benchmark Results: Current Claim
|
|
113
145
|
|
|
114
|
-
|
|
146
|
+
> **Vise gives AI coding agents a governed workflow for social.plus integrations, improving feature completeness, SDK compliance, and design consistency in greenfield work.**
|
|
115
147
|
|
|
116
|
-
|
|
148
|
+
The strongest current claim is not a universal speed or quality promise. It is narrower and more useful: when agents build greenfield social.plus SDK features, the Vise workflow makes scope explicit, checks local code, and turns missing capabilities or SDK violations into concrete repair work before the agent stops.
|
|
117
149
|
|
|
118
|
-
|
|
119
|
-
- **Push notification preferences** are wired to the Amity API so users who opt out actually stop receiving notifications
|
|
120
|
-
- **Moderation actions** (report, flag, block) are surfaced in the UI so users can act on them, not buried in a hook
|
|
121
|
-
- **Chat and feed queries** use live, reactive subscriptions — not one-time fetches that go stale
|
|
150
|
+
### Latest headline: feed completeness
|
|
122
151
|
|
|
123
|
-
|
|
152
|
+
The latest Vise 0.14.5 opt-in comparison is the headline product proof. Same feed request, same SDK docs, no Vise workflow in the baseline; the Vise arm explicitly selected `feed_optional_capabilities=post-image-upload,post-poll-creation,post-edit`, persisted that choice into `sp-vise/compliance.json`, and activated the selected sensors.
|
|
124
153
|
|
|
125
|
-
|
|
154
|
+
| Agent / model | Docs-only baseline | Vise 0.14.5 opt-in arm | Readout |
|
|
155
|
+
|---|---:|---:|---|
|
|
156
|
+
| Cursor / Composer 2.5 | 30% (3.3/11 avg) | **97% (32/33)** | One seed surfaced the remaining item instead of silently dropping it. |
|
|
157
|
+
| Claude / Sonnet 4.6 medium | 27% (3.0/11 avg) | **100% (33/33)** | All three Vise seeds reached 11/11. |
|
|
158
|
+
| Codex / GPT-5.4 medium | 21% (2.3/11 avg) | **100% (33/33)** | All three Vise seeds reached 11/11. |
|
|
126
159
|
|
|
127
|
-
|
|
160
|
+
Aggregate: **98/99 expected feed capabilities** and **27/27 selected optional capabilities** implemented across the latest Vise arm. See the full table and per-seed grader links in [`benchmarks/CURSOR_VISE_0.14.5_RESULTS.html`](benchmarks/CURSOR_VISE_0.14.5_RESULTS.html).
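The aggregate can be rechecked from the table rows with a few lines of arithmetic. The figures below are copied from the table; only the helper names are new. Each Vise cell has a denominator of 33, which matches three seeds scored against 11 expected capabilities.

```typescript
// Recompute the table's percentages and the 98/99 aggregate from its own rows.
const viseArm = [
  { agent: "Cursor / Composer 2.5", built: 32, expected: 33 },
  { agent: "Claude / Sonnet 4.6 medium", built: 33, expected: 33 },
  { agent: "Codex / GPT-5.4 medium", built: 33, expected: 33 },
];

// Whole-number percentage, rounded to nearest.
function percent(numerator: number, denominator: number): number {
  return Math.round((numerator / denominator) * 100);
}

const totalBuilt = viseArm.reduce((sum, row) => sum + row.built, 0);
const totalExpected = viseArm.reduce((sum, row) => sum + row.expected, 0);

console.log(`${totalBuilt}/${totalExpected}`);
console.log(viseArm.map((row) => percent(row.built, row.expected)));
// Docs-only baselines, average capabilities out of 11.
console.log([3.3, 3.0, 2.3].map((avg) => percent(avg, 11)));
```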
|
|
128
161
|
|
|
129
|
-
|
|
130
|
-
SDK setup · User presence · Social feed · Events · Chat & DMs · Push notifications · User profile · Content moderation · Comments
|
|
162
|
+
### Supporting proof
|
|
131
163
|
|
|
132
|
-
|
|
|
164
|
+
| Surface | Safe claim | Evidence |
|
|
133
165
|
|---|---|---|
|
|
134
|
-
| **
|
|
135
|
-
| **
|
|
136
|
-
| **Vise
|
|
166
|
+
| **Feature completeness** | Vise helps agents build more of the expected SDK capability surface. | Latest comparison: **21-30% without Vise vs 97-100% with Vise 0.14.5**. Earlier pre-registered Capability Matrix Row 2 also shipped a feature-completeness win: silently dropped items fell from 7.67/11 to 4.0/11. |
|
|
167
|
+
| **SDK compliance** | Vise checks catch greenfield SDK compliance gaps that docs or static guidance can miss. | Commune benchmark: Vise averaged **100% greenfield SDK compliance** where docs/RAG-style controls averaged **67%** across the reported slices. |
|
|
168
|
+
| **Design conformance** | Vise design checks reduce design drift under ambiguous briefs. | Ambiguous Spotify-style design test: Vise design runs produced **0 / 0 / 0 hex literals** across three seeds; without Vise, runs varied **0 / 2 / 15**. This supports variance reduction, not pixel-perfect visual quality. |
|
|
137
169
|
|
|
138
|
-
|
|
170
|
+
### Why the workflow works
|
|
139
171
|
|
|
140
|
-
|
|
172
|
+
The benchmark story is the product flow:
|
|
141
173
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
The three features that consistently fail without Vise — **Chat**, **Moderation**, and **Push Notifications** — are exactly the ones with secondary compliance requirements (ban-state, report affordances, Amity preference API). `vise check` catches these with a specific finding; the rules doc does not.
|
|
174
|
+
1. **Inspect** — Vise detects platform, app surface, SDK surface, sensors, and design signals from the local repo.
|
|
175
|
+
2. **Plan** — Vise classifies the outcome, asks blocking intake questions, surfaces capability availability, and offers optional feed capabilities only when the platform SDK surface supports them.
|
|
176
|
+
3. **Confirm design** — `vise design extract` writes a preview; `plan`/`init` withhold design feed-forward until the user confirms `design_contract_confirmation=yes`.
|
|
177
|
+
4. **Initialize** — `vise init` records the resolved compliance contract, intake answers, selected optional capabilities, inspection result, and accepted design digest.
|
|
178
|
+
5. **Build / check / repair** — the agent edits locally, runs `vise check`, fixes deterministic findings, resolves completeness gaps or selected-capability failures, and then runs project sensors.
|
|
148
179
|
|
|
149
|
-
|
|
180
|
+
Static docs can tell an agent what exists. Vise changes the stopping condition: the agent is not done until the local contract is green, attested, or blocked on explicit customer input.
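The stopping condition can be written down as a predicate over check results. This is a sketch under stated assumptions: `completeness-gap` is the only state name taken from the README, and the other labels and both function names are hypothetical.

```typescript
// Hypothetical model of the stopping condition for the build/check/repair loop.
type CheckState =
  | "green"
  | "attested"
  | "blocked-on-customer-input"
  | "findings"
  | "completeness-gap";

// The agent may stop only on a terminal state.
function mayStop(state: CheckState): boolean {
  return (
    state === "green" ||
    state === "attested" ||
    state === "blocked-on-customer-input"
  );
}

// Replay a sequence of check results; return how many checks ran before the
// agent was allowed to stop, or -1 if it never reached a terminal state.
function checksUntilStop(results: CheckState[]): number {
  for (let i = 0; i < results.length; i++) {
    if (mayStop(results[i])) return i + 1;
  }
  return -1;
}

console.log(checksUntilStop(["findings", "completeness-gap", "green"]));
```

Modelling "blocked on explicit customer input" as a terminal state reflects the README's wording: the agent stops and asks instead of guessing.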
|
|
150
181
|
|
|
151
|
-
###
|
|
182
|
+
### How to read the evidence
|
|
152
183
|
|
|
153
|
-
|
|
184
|
+
The benchmark suite is intentionally reported with boundaries:
|
|
154
185
|
|
|
155
|
-
|
|
186
|
+
- **Latest feed-completeness comparison** is the current product claim for the opt-in capability flow in Vise 0.14.5. It is a best-case/opt-in comparison across Cursor, Claude, and Codex, not a universal result for every prompt.
|
|
187
|
+
- **Capability Matrix v1** remains the pre-registered follow-up. It shipped the Row 2 feature-completeness claim, found **no Row 1 SDK-compliance claim** on chat/moderation/push under its registered margin, and withheld the Row 3 design claim on a technicality despite higher by-name token use.
|
|
188
|
+
- **Commune Phase 1** remains useful historical evidence for the compliance loop: two agents reached 9/9 with Vise vs 5-7/9 under controls, but it was N=1 per cell and the grader overlaps Vise's own rules.
|
|
189
|
+
- **Design tests** support design-drift reduction and token cleanup. They do not prove visual taste, pixel perfection, or production-ready UI without human review.
|
|
190
|
+
- **Negative results must travel with the claim:** no measured Vise advantage on day-2 bug fixing; the push slice exposed a non-converging attestation loop when docs and SDK disagreed; earlier enumerative plan-time design guidance measured negative and was retracted; the original `scope-omit` affordance went unused in the matrix.
|
|
156
191
|
|
|
157
|
-
|
|
158
|
-
- **Vise-arm passes were deterministic-pass**, not attestation exceptions — agents fixed the code. (The grader applies a narrow, *symmetric* auto-attestation for absence / type-stub findings across **all** arms including the controls; it cannot satisfy the acceptance patterns, so it does not tilt the result toward Vise.)
|
|
159
|
-
- **Three arms, separate tooling.** The Rules-as-Markdown arm has no Vise checker available — it cannot run `vise check`.
|
|
160
|
-
- **Built from scratch** (greenfield seed), capable models with prior SDK familiarity. A complementary **bug-fix** benchmark showed **no Vise advantage** — the loop helps on greenfield integration, not local bug hunts.
|
|
161
|
-
- **N=1 per cell.** A strong directional signal (the Chat/Moderation/Push mechanism reproduces across both models), **not** a statistically settled finding.
|
|
162
|
-
- **Follow-up evidence cuts both ways.** The pre-registered [Capability Matrix](#benchmark-capability-matrix-v1-pre-registered) (n=3, arm-independent grading, different briefs) found **no Row 1 claim** on chat/moderation/push — chat and moderation tied, push +1, below the registered margin — and a new, registered win on feature completeness instead. Read the two together, not Phase 1 alone.
|
|
163
|
-
- **Full methodology, per-cell analysis, and threats to validity:** [the Commune paper](docs/commune-paper-2026-05-30.md). The [`benchmarks/FINDINGS.html`](benchmarks/FINDINGS.html) and [`benchmarks/RULES_AS_MARKDOWN.html`](benchmarks/RULES_AS_MARKDOWN.html) files are **summary report tables**, not raw transcripts or workspace diffs.
|
|
192
|
+
Full evidence: [`benchmarks/CURSOR_VISE_0.14.5_RESULTS.html`](benchmarks/CURSOR_VISE_0.14.5_RESULTS.html), [`benchmarks/capability-matrix/RESULTS.md`](benchmarks/capability-matrix/RESULTS.md), [`benchmarks/commune/RESULTS.md`](benchmarks/commune/RESULTS.md), and [`benchmarks/brand/design-test/RESULTS.md`](benchmarks/brand/design-test/RESULTS.md).
|
|
164
193
|
|
|
165
194
|
### Which mode should I use?
|
|
166
195
|
|
|
167
196
|
| If you… | Use | Why |
|
|
168
197
|
|---|---|---|
|
|
169
|
-
| Building new social features with an AI agent | **Vise CLI + Skill** |
|
|
170
|
-
| Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the
|
|
171
|
-
| Enforcing compliance in
|
|
172
|
-
|
|
173
|
-
---
|
|
174
|
-
|
|
175
|
-
## Benchmark: Capability Matrix v1 (pre-registered)
|
|
176
|
-
|
|
177
|
-
> One row won, one tied, one was withheld on a technicality. The registered protocol requires all three to be reported with equal prominence — so here they are.
|
|
178
|
-
|
|
179
|
-
Phase 1 measured one thing (secondary-compliance gaps) under a harness that overlaps Vise's own ruleset. The **Capability Matrix** is the stricter follow-up: a [pre-registered protocol](benchmarks/capability-matrix/PROTOCOL.md) frozen before any cell ran (7 dated amendments, each committed before the data it governs), **firewalled grading instruments** (authors barred from Vise's rules and config — [authorship record](benchmarks/capability-matrix/AUTHORSHIP.md)), **blind structural judges**, and 27 fresh isolated cells: n=3 seeds per cell, Cursor / Composer 2.5, docs-only control vs Vise loop, identical offline docs bundle in both arms. Full numbers, per-behavior provenance, and the rig-integrity record: [RESULTS.md](benchmarks/capability-matrix/RESULTS.md).
|
|
180
|
-
|
|
181
|
-
| Capability claim | Registered outcome | What the data says |
|
|
182
|
-
|---|---|---|
|
|
183
|
-
| **Feature completeness** — open feed request scored against an 11-item firewalled ground truth | ✅ **Claim ships** (won 3/3 seed pairs and the mean) | The Vise loop roughly **halved silently-dropped SDK capabilities: 4.0 vs 7.67 of 11** — by *building more of the SDK surface*, not by surfacing trade-offs. Mechanism not isolated: plan-time capability enumeration, persistence against the check gate, and ~1.8× time-on-task are all live candidates. |
|
|
184
|
-
| **SDK compliance** — chat / moderation / push slices | ➖ **No claim** (7/9 vs 6/9; push +1, below the registered ≥+2 margin) | Brief-explicit behaviors tie, as pre-registered. One level down the defects are *symmetric*: 2/3 control cells shipped ban gates reading a field that doesn't exist on the real SDK (compiles, never fires; 0/3 Vise cells) — while 2/3 Vise cells over-parameterized `targetType` and never bound the brief's value, plausibly steered by Vise's own rule. |
|
|
185
|
-
| **Design conformance** — ambiguous brief, brand-token usage | ⚠️ **Withheld on a technicality** | By-name token usage **+18.1 points** for the contract+check loop (90.3 vs 72.2 mean) ✓ — but the second registered condition (hex literals strictly *lower*) tied at 0–0, so the claim is withheld and reported descriptively. Vise controls are reused from an earlier round (cross-temporal). |
|
|
186
|
-
|
|
187
|
-
**Registered negative results** (the protocol requires these in any publication of the matrix):
|
|
188
|
-
|
|
189
|
-
- **Push stop-condition non-convergence.** Where docs and SDK disagree (push registration — the docs themselves carry a gap warning), "iterate until `vise check` is green" did not converge within the 30-minute cap in *any* Vise cell: agents looped on attestation dialect. Shipped behavior still passed 3/3 — the cost is wall-clock, not correctness. (2/3 control cells also hit the cap doing SDK archaeology.)
|
|
190
|
-
- **Scope-omit affordance unused.** No agent in any cell used the `// vise: scope-omit` surfacing marker or wrote a qualifying scope note. The advisory surfacing mechanism, as shipped in 0.14.1, influenced zero cells.
|
|
191
|
-
- Unchanged from prior rounds: **no Vise advantage on day-2 bug fixes**; **enumerative plan-time design guidance** measured negative twice and was retracted in 0.14.1.
|
|
192
|
-
|
|
193
|
-
All Capability Matrix findings are directional at n=3 under one model and one executor — not settled statistics.
|
|
198
|
+
| Building new social features with an AI agent | **Vise CLI + Skill** | This is the full governed workflow measured by the benchmarks. |
|
|
199
|
+
| Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the recorded compliance contract. |
|
|
200
|
+
| Enforcing compliance in CI | `vise check --ci` | Exits non-zero on deterministic failures, missing baseline capabilities, failed selected optional sensors, blockers, or drift. |
|
|
194
201
|
|
|
195
202
|
---
|
|
196
203
|
|
|
@@ -204,7 +211,7 @@ All Capability Matrix findings are directional at n=3 under one model and one ex
|
|
|
204
211
|
| **Android (Kotlin)** | ✅ Full | Gradle assemble, unit tests |
|
|
205
212
|
| **iOS (Swift)** | ✅ Full | Static rule checks fully operational. Build sensor not wired (`xcodebuild` environment requirements make it fragile) — `vise run-sensors` returns no-sensors for iOS; compliance rules run regardless. |
|
|
206
213
|
|
|
207
|
-
Each platform has
|
|
214
|
+
Each platform has dozens of rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
|
|
208
215
|
|
|
209
216
|
---
|
|
210
217
|
|
|
@@ -250,23 +257,23 @@ The skill is plain Markdown; you can read it any time with `vise print-skill`.
|
|
|
250
257
|
flowchart LR
|
|
251
258
|
A[User asks AI agent<br/>'Add a social feed'] --> B[Agent runs<br/>vise inspect]
|
|
252
259
|
B --> C[Agent runs<br/>vise plan --request "..."]
|
|
253
|
-
C --> D{
|
|
254
|
-
D -->|Yes| E[Agent
|
|
260
|
+
C --> D{Blocking intake<br/>or design preview?}
|
|
261
|
+
D -->|Yes| E[Agent surfaces questions<br/>and opens design preview<br/>for user confirmation]
|
|
255
262
|
D -->|No| F
|
|
256
|
-
E --> F[Agent
|
|
263
|
+
E --> F[Agent reruns plan/init<br/>with answers]
|
|
257
264
|
F --> G[Agent runs<br/>vise search-docs<br/>vise get-doc-page]
|
|
258
265
|
G --> H[Agent edits<br/>your code]
|
|
259
266
|
H --> I[Agent runs<br/>vise check]
|
|
260
267
|
I --> J{Green?}
|
|
261
|
-
J -->|No, fixable| K[Agent fixes<br/>findings]
|
|
268
|
+
J -->|No, fixable| K[Agent fixes<br/>findings, completeness gaps,<br/>or selected sensors]
|
|
262
269
|
K --> I
|
|
263
270
|
J -->|No, attest| L[Agent calls<br/>vise attest with<br/>evidence]
|
|
264
271
|
L --> I
|
|
265
272
|
J -->|Yes| M[Agent runs<br/>vise run-sensors]
|
|
266
|
-
M --> N[Done.<br/>sp-vise/ contract<br/>committed]
|
|
273
|
+
M --> N[Done.<br/>sp-vise/ contract<br/>and evidence committed]
|
|
267
274
|
```
|
|
268
275
|
|
|
269
|
-
The flow above is what the skill teaches your AI agent. You — the human — drive intent and approve
|
|
276
|
+
The flow above is what the skill teaches your AI agent. You — the human — drive intent, answer scope questions, and approve the design preview before Vise feeds a design contract forward. The agent runs Vise commands deterministically; Vise grounds the agent in real docs and real compliance rules. If blocking intake or design confirmation is still unresolved, `vise init` refuses to initialize, returns `status: "needs-clarification"`, and exits 7 so the agent must surface the questions instead of guessing.
|
|
270
277
|
|
|
271
278
|
---
|
|
272
279
|
|
|
@@ -280,20 +287,24 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
|
|
|
280
287
|
| `vise inspect [path]` | Detect platform, monorepo surfaces, design signals, available sensors |
|
|
281
288
|
| `vise plan [path] --request "..."` | Produce a grounded implementation plan with intake questions and docs citations |
|
|
282
289
|
| `vise plan-harness [path] --request "..."` | (Pre-planning step) Build the harness around the request |
|
|
283
|
-
| `vise init [path] --request "..."` | Write the `sp-vise/` compliance contract
|
|
290
|
+
| `vise init [path] --request "..." [--answer key=value]` | Write the `sp-vise/` compliance contract after blocking intake is answered; returns `needs-clarification` and exits 7 if required answers are missing |
|
|
291
|
+
| `vise blocks list --registry <path>` | Read a social.plus Block Factory registry |
|
|
292
|
+
| `vise blocks plan [path] --block <id> --registry <path>` | Plan safe block package, source-anchor, sidecar, and sensor changes |
|
|
293
|
+
| `vise blocks add [path] --block <id> --registry <path> [--dry-run\|--apply]` | Dry-run or apply safe block installation inside a customer project |
|
|
294
|
+
| `vise blocks validate [path] [--block <id>] --registry <path>` | Validate installed block sidecar state, package presence, and source anchors |
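
As an illustration of the `vise init [path] --request "..." [--answer key=value]` form in the table above, the following sketch assembles an argument list from collected answers. The helper name `buildInitArgs` is hypothetical, and it assumes `--answer` may be repeated once per `key=value` pair, which the table does not state explicitly.

```javascript
// Hypothetical helper: build the argument list for `vise init`.
// Assumption: `--answer` can be passed once per key=value pair.
function buildInitArgs({ path = ".", request, answers = {} }) {
  if (typeof request !== "string" || request.trim() === "") {
    throw new Error("request must be a non-empty string");
  }
  const args = ["init", path, "--request", request];
  for (const [key, value] of Object.entries(answers)) {
    if (key.includes("=")) {
      // A key containing "=" would make the key=value pair ambiguous.
      throw new Error(`answer key must not contain "=": ${key}`);
    }
    args.push("--answer", `${key}=${value}`);
  }
  return args;
}
```

Passing the resulting array to Node's `child_process.spawnSync("vise", args)` avoids shell quoting of the request string.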
|
|
284
295
|
|
|
285
296
|
### Design contract (UI generation)
|
|
286
297
|
|
|
287
298
|
| Command | Purpose |
|
|
288
299
|
|---|---|
|
|
289
|
-
| `vise design extract <prototypePath> [--repo .] [--no-write]` | Read an HTML/CSS prototype and write a graded `sp-vise/design-contract.json` (declared CSS custom properties become exact tokens; repeated literals become inferred/advisory tokens; single-use literals are dropped) so generated social.plus UI can match the customer's aesthetic |
|
|
290
|
-
| `vise design extract --from-project [path] [--no-write]` | No external prototype? Derive the contract from the host project's **own** design system — CSS custom properties (incl. shadcn `:root` and Tailwind v4 `@theme`), TS/JS token modules, inline tailwind configs, **Android** `colors.xml`/`dimens.xml`, **Flutter** `Color(0x…)`, and **iOS** `.xcassets/*.colorset` + Swift `Color(hex:)`/`Color(red:g:b:)`. Reference values (`var()`/`theme()`/`calc()`) are skipped, so a var-mapped config contributes nothing rather than wrong tokens |
|
|
300
|
+
| `vise design extract <prototypePath> [--repo .] [--no-write]` | Read an HTML/CSS prototype and write a graded `sp-vise/design-contract.json` plus `sp-vise/design-preview.html` (declared CSS custom properties become exact tokens; repeated literals become inferred/advisory tokens; single-use literals are dropped) so generated social.plus UI can match the customer's aesthetic after preview confirmation |
|
|
301
|
+
| `vise design extract --from-project [path] [--no-write]` | No external prototype? Derive the contract from the host project's **own** design system and write a preview — CSS custom properties (incl. shadcn `:root` and Tailwind v4 `@theme`), TS/JS token modules, inline tailwind configs, **Android** `colors.xml`/`dimens.xml`, **Flutter** `Color(0x…)`, and **iOS** `.xcassets/*.colorset` + Swift `Color(hex:)`/`Color(red:g:b:)`. Reference values (`var()`/`theme()`/`calc()`) are skipped, so a var-mapped config contributes nothing rather than wrong tokens |
|
|
291
302
|
| `vise design check [path]` | Advisory, **non-blocking** report on how closely the UI code matches the contract (token coverage + on/off-contract color literals). Never fails a build and is **not** a `vise check` gate |
|
|
292
303
|
| `vise design preview [path] [--reference <prototype>]` | Write a self-contained `sp-vise/design-preview.html`: the contract's tokens as visual swatches + the conformance report + the HTML reference embedded for side-by-side review. Vise renders the artifact; a human/VLM judges the visual match. Dependency-free — **not** an automated pixel diff |
|
|
293
304
|
| `vise design reference [path] [--title <name>]` | Write a self-contained `sp-vise/design-reference.html`: human/VLM-readable design-system spec — token swatches, type samples, component demos, and a growth-layer summary. Pairs with `design-contract.json` (machine-readable). Use `--title` to name the design system (e.g. `--title Streamly`). Advisory — **not** an enforcement gate |
|
|
294
305
|
| `vise design init-tokens [path] [--force]` | Scaffold `src/styles/social-plus-tokens.css` — the dedicated, customer-editable token file for social.plus features. **Greenfield:** neutral defaults (full `--sp-*` token set). **Brownfield:** seeded from your existing concrete tokens. Idempotent — never overwrites an existing file (use `--force` to override). After editing, run `vise design extract --from-project` to refresh the contract |
|
|
295
306
|
|
|
296
|
-
The extracted contract is **advisory input for generation**, not an enforcement gate: a token-poor prototype yields a weaker — never wrong — contract, and absence of a prototype simply means no contract (the existing `*.design.reuse-detected-tokens` rules still cover reuse of a host project's own design system).
|
|
307
|
+
The extracted contract is **advisory input for generation**, not an enforcement gate: a token-poor prototype yields a weaker — never wrong — contract, and absence of a prototype simply means no contract (the existing `*.design.reuse-detected-tokens` rules still cover reuse of a host project's own design system). A contract becomes feed-forward guidance only after the user confirms the preview with `--answer design_contract_confirmation=yes`; rejecting it with `=no` asks for a replacement design source and records no design-conformance digest.
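
The confirmation rule above can be modelled as a small state function. This is a hedged sketch rather than Vise's implementation: the status names are the ones the README lists for `sp-vise/intake.json` (`absent`, `needs-confirmation`, `accepted`, `rejected`), while the function name `resolveDesignReview`, its input shape, and the `feedForward` flag are hypothetical.

```javascript
// Hypothetical model of the design-review states; not Vise's actual code.
// `answer` stands for the value given to design_contract_confirmation, if any.
function resolveDesignReview({ hasContract, answer }) {
  if (!hasContract) {
    // No prototype or project-derived contract: nothing to confirm.
    return { status: "absent", feedForward: false };
  }
  if (answer === "yes") {
    // Only a confirmed contract becomes feed-forward guidance.
    return { status: "accepted", feedForward: true };
  }
  if (answer === "no") {
    // Rejected: a replacement design source is requested instead.
    return { status: "rejected", feedForward: false };
  }
  // Contract exists but the user has not answered yet.
  return { status: "needs-confirmation", feedForward: false };
}
```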
|
|
297
308
|
|
|
298
309
|
### Documentation grounding & Troubleshooting
|
|
299
310
|
|
|
@@ -423,14 +434,16 @@ jobs:
|
|
|
423
434
|
|
|
424
435
|
## Compliance Contract
|
|
425
436
|
|
|
426
|
-
After `vise init`, your project gets a `sp-vise/` directory. These files become part of your repo and travel through code review:
|
|
437
|
+
After a successful `vise init`, your project gets a `sp-vise/` directory. If init returns `needs-clarification`, no compliance sidecar is written; answer the blocking questions and run init again. These files become part of your repo and travel through code review:
|
|
427
438
|
|
|
428
439
|
| File | Created by | What it contains |
|
|
429
440
|
|---|---|---|
|
|
430
|
-
| `sp-vise/compliance.json` | `vise init` | The rules selected for this integration, the Vise version, the ruleset digest, the target app surface,
|
|
441
|
+
| `sp-vise/compliance.json` | `vise init` | The rules selected for this integration, the Vise version, the ruleset digest, the target app surface, selected optional capabilities, optional engagement link, and an accepted design-contract digest when confirmed. |
|
|
442
|
+
| `sp-vise/intake.json` | `vise init` | The request, outcome, intake answers, remaining blocking count, design-review status (`absent`, `needs-confirmation`, `accepted`, or `rejected`), and any retrospective `--allow-unresolved-intake` acknowledgement. |
|
|
431
443
|
| `sp-vise/attestations/*.json` | `vise sync` (deterministic) or `vise attest` (host-agent / human) | Per-rule evidence: signer, confidence, rationale, cited files (with source fingerprints for drift detection). |
|
|
432
444
|
| `sp-vise/inspection.json` | `vise init` | The platform, monorepo surface, and design-token signals detected at init time. |
|
|
433
445
|
| `sp-vise/design-contract.json` | `vise design extract` | The extracted design contract: declared tokens, breakpoints, advisory components, source file digests (for freshness detection), and a stable digest over design facts. |
|
|
446
|
+
| `sp-vise/design-preview.html` | `vise design extract` or `vise design preview` | Self-contained visual review of the design contract, embedded prototype when available, token swatches, and design-check conformance summary. Open this before answering `design_contract_confirmation`. |
|
|
434
447
|
| `sp-vise/design-reference.html` | `vise design reference` | Self-contained HTML design-system spec (token swatches, type samples, components). Human/VLM-readable; open in a browser alongside the app. |
|
|
435
448
|
| `sp-vise/engagement.json` | `vise engagement init` (optional) | Contractual scope: tier, customer ID, contracted outcomes, reviewer assignment. |
|
|
436
449
|
|