@amityco/social-plus-vise 1.1.0 → 1.1.1
- package/CHANGELOG.md +7 -0
- package/README.md +13 -6
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -4,6 +4,13 @@ All notable changes to `@amityco/social-plus-vise` are documented in this file.
 
 The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## 1.1.1 — 2026-06-16
+
+Docs-only patch; no CLI, MCP, rule, or sidecar behavior changes.
+
+### Changed
+- **README Evidence section reworked.** Dropped the pre-1.0 "like-for-like" feed-completeness result (it was measured against an earlier dev build on early-June agent versions); presented the two genuine matched comparisons (SDK compliance, design conformance) as a scannable table; relabeled the former "best case" as **"Feed completeness (capabilities selected)"** and reframed it as *absolute* opt-in completeness rather than a baseline-beating lift; led design conformance with the robust by-name-token signal instead of the fragile one-seed hex anecdote (and dropped the model attribution, given a cross-temporal confound now noted under Boundaries); tightened the limitations and lifted the "stopping condition" framing into the section intro. No benchmark numbers were invented; the SDK-compliance benchmark stays described generically (its internal codename remains denylisted from the shipped surface).
+
 ## 1.1.0 — 2026-06-16
 
 Public GA-readiness pass. No change to `vise check` exit codes or sidecar **formats**. The rule corpus gains three corrected rationales (stale/fictional SDK method names) — a deliberate pre-GA contract reset, with the `test:sidecar-compat` baseline re-frozen accordingly (no published 0.14.x consumers to protect; see below).
package/README.md
CHANGED
@@ -130,13 +130,20 @@ Vise can ingest your aesthetic from an HTML/CSS prototype (`vise design extract`
 
 ## Evidence
 
-Vise's measured effect is on
+Vise's measured effect is on whether an agent builds the *whole* outcome — pagination, empty and error states, the optional capabilities you asked for — or just the happy path. The durable finding across benchmarks is the **mechanism**, not any single percentage: a checked verification loop beats the same rules pasted into a prompt. What Vise reliably changes is the **stopping condition** — the agent isn't done when the code compiles; it's done when the local contract is green, attested, or blocked on an explicit customer decision.
 
-
-
-
-
-
+**Matched comparisons** — Vise's verification loop vs a docs-only baseline, same task and agent:
+
+| Benchmark | What's measured | Docs-only | With Vise |
+|---|---|---|---|
+| **SDK compliance** | correctness slices passed (greenfield; 9 independent slices, single run per cell, two agents) | 6/9 (67%) | **9/9 (100%)** |
+| **Design conformance** | the brand's *named* design tokens used vs hardcoded values (ambiguous brief, n=3) | ~72% | **~90%** |
+
+The mechanism is the durable claim: a checked loop beats the same rules in the prompt.
+
+**Feed completeness (capabilities selected):** when the optional capabilities — image, poll, edit — are selected up front, the Vise arm completed **97–100%** of an 11-item feed checklist across three agents (Cursor/Composer, Claude/Sonnet, Codex/GPT). This is *absolute* completeness from answering Vise's capability questions — **not** a lift over a baseline.
+
+**Boundaries:** these are social.plus's own measurements on greenfield work — self-reported, no third-party audit, measured on earlier builds, and not universal claims. Negative results travel with them: no measured advantage on day-2 bug fixing, and the design-conformance arms were measured at different times (the figure is the robust by-name-token signal, not pixel perfection).
 
 <sub>Cursor, Claude, Codex, GitHub Copilot, VS Code, and other product names are trademarks of their respective owners; social.plus is not affiliated with or endorsed by them. Benchmark figures are from social.plus's own measurements.</sub>
 