@amityco/social-plus-vise 0.8.1 → 0.11.0
- package/CHANGELOG.md +149 -0
- package/README.md +76 -40
- package/dist/outcomes.js +19 -3
- package/dist/server.js +39 -0
- package/dist/tools/compliance.js +68 -20
- package/dist/tools/debug.js +267 -0
- package/dist/tools/harness.js +17 -1
- package/dist/tools/project.js +222 -25
- package/dist/types.js +4 -0
- package/package.json +14 -6
- package/rules/auth.yaml +298 -38
- package/rules/feed.yaml +1 -1
- package/rules/live-data.yaml +316 -36
- package/rules/push.yaml +140 -0
- package/rules/sdk-lifecycle.yaml +1421 -131
- package/rules/security.yaml +60 -0
- package/skills/social-plus-vise/SKILL.md +45 -2
- package/skills/vise-harness-engineer/SKILL.md +35 -0
- package/social.plus-vise.png +0 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,149 @@
+# Changelog
+
+All notable changes to `@amityco/social-plus-vise` are documented in this file.
+
+The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## 0.10.0 — 2026-05-29
+
+**Theme:** Benchmark-driven sensor expansion. The Commune benchmark (9 new SDK domains: chat, push, social graph, moderation, comments) produced the first measured, defensible advantage for vise+skill over pure MCP: **7/9 working features vs 3/9** with the same agent on the same prompts. This release ships the sensors, rules, and findings.json improvements that produced that result.
+
+### Added
+- `react-native.chat.channel-type-dm` / `typescript.chat.channel-type-dm` (`warning`) — DM channels must use `type: 'conversation'`, not `type: 'community'`. Agents consistently choose `community` for 1-to-1 chats because it sounds plausible but silently creates a group channel with the wrong shape. Sensor requires `userIds` co-occurrence to avoid firing on legitimate community broadcasts.
+- `react-native.follow.status-subscription` (`warning`) — `getFollowStatus` must be wrapped in a live subscription, not a one-shot query. A one-shot call captures state at mount and never updates — follow/unfollow actions are not reflected in the UI until the user navigates away.
+- `rationale` field in `sp-vise/findings.json` — agents see *why* each rule exists, not just *what* it requires. Improves attestation quality on rules that allow it.
+- Compliance.json rule entries now include a `title` field (digest-stable, separate from hashing) so agents and humans can identify rules without grepping definitions.
+- Corpus grew from **262 → 265 rules**.
+
+### Changed
+- **`vise init` now writes `sp-vise/findings.json` immediately** — agents see current rule violations on startup with no exploration needed. Combined with the `npm run sp-check` script added to scaffolded workspaces, agents follow a directed (read findings → fix → verify) loop instead of an exploratory (search → search → search → implement) loop.
+- **`live-collection.api-mismatch`, `posts.activity-tag-filter`, `posts.reaction-stale-post-ref`, `user.ban-state-respected`** — all now skip `.d.ts` files to eliminate false positives from type stubs.
+- **`user.ban-state-respected`** — `flagComment` and `flagPost` added to the recognised write-pattern list. Flagging is a moderation action and must be ban-guarded.
+- **`react-native.push.unregister.present`** — recommendation generalised; no longer references benchmark-specific state variables. Surfaces the exact `useEffect` cleanup pattern needed.
+- Reactive markers now include `.on('dataUpdated', ...)` — the event-emitter style of subscribing to LiveCollection updates is now recognised as a valid alternative to property-callback subscription.
+- README updated with a step-by-step Quick Start that references `findings.json` directly.
+
+### Benchmark infrastructure (`benchmarks/`)
+- **Commune benchmark** added — 9-slice React Native scenario (CM-SETUP, CM-PRESENCE, CM-FEED, CM-EVENTS, CM-CHAT, CM-PUSH, CM-PROFILE, CM-MODERATE, CM-COMMENTS) covering chat, push, social graph, and moderation domains absent from TouchTunes. Three seed types per slice (`baseline`, `broken`, `greenfield`) for 27 fixture sets total.
+- **Rules-as-markdown control arm** (`benchmarks/commune/run-commune-rules-arm.sh`) — injects the rule corpus as a static document into the agent prompt. Built to isolate whether vise's measured advantage comes from *information delivery* (the rules) or the *iterative verification loop* (sp-check).
+- **TouchTunes runner improvements** — workspace isolation (`workspaces/broken/` vs `workspaces/baseline/` so agents can't peek at the answer), `< /dev/null` stdin redirect fix that was causing agy/codex to silently skip cells, `|| true` per-cell error isolation, and grader auto-attestation for no-file and `.d.ts`-pointing rules.
+- **agy + codex runners** (`run-agy-cells.sh`, `run-codex-cells.sh`) — production-quality scripts with TTY-detection fixes and workspace isolation.
+
+### Findings & reports
+- `benchmarks/FINDINGS.html` — engineering-facing summary of the benchmark methodology, results, and what was/wasn't proven.
+- `benchmarks/MARKETING.html` — three-tier marketing-claim framework (safe / concrete / honest / aspirational) with supporting wallclock data and a list of metrics to instrument next.
+
+### Honest claim
+On 9 new SDK domain implementations with codex gpt-5.4, vise+skill produced 7 working features vs 3 for pure MCP — same agent, same prompts. The cost: +28% wallclock per session. The net: −52% wallclock per *working* feature, because more features ship on the first try. Vise consistently catches five bug classes that capable models otherwise miss: wrong DM channel type, missing push register/unregister lifecycle, one-shot queries where live subscriptions are required, missing ban checks before write operations, and missing flag affordances on user-generated content.
+
+---
+
+## 0.9.0 — 2026-05-27
+
+**Theme:** Business model-grounded gap analysis; Next.js / SSR guard; environment hygiene expanded to all platforms.
+
+### Added
+- `typescript.client.no-ssr-init` (`error`) — SDK client must not be initialized in a Next.js Server Component, `layout.tsx` without `'use client'`, or inside `getServerSideProps`/`getStaticProps`. The primary demo-invisible failure mode for AI-native Next.js customers: `next dev` recovers from the error gracefully; `next build` + production does not.
+- `react-native.secret.env-gitignore` — React Native env files containing secret-shaped keys must be excluded by `.gitignore`.
+- `react-native.secret.env-example` — A `.env.example` or `.env.sample` must accompany any gitignored React Native env file.
+- `flutter.secret.env-gitignore` — Flutter `.env` or `secrets.dart` files containing secret-shaped keys must be excluded by `.gitignore`.
+- `android.secret.env-gitignore` — `local.properties` containing secret-shaped keys must be excluded by `.gitignore`.
+- `ios.secret.env-gitignore` — `Secrets.plist` or `*.xcconfig` files containing secret-shaped keys must be excluded by `.gitignore`.
+- Corpus grew from **256 → 262 rules**.
+- `benchmarks/SDK_INTEGRATION_GAP_ANALYSIS.md` — business model-grounded gap analysis mapping every SDK-relevant value claim to Vise rule coverage, with a prioritised improvement backlog.
+
+### Changed
+- **Skill — "Stop Instead Of Guessing":** intake list now asks about Next.js rendering mode (Server Component vs `'use client'` vs Pages Router) before implementing SDK initialization.
+- **Skill — "Session Renewal":** new feedforward: SDK collection queries must not fire before `login()` completes; gate collection setup behind the session-active signal.
+- **Skill — "Live Collection API Mismatch":** new guidance: handle connection-state changes and render a reconnecting indicator when the WebSocket drops.
+- **Skill — "Debugging & Troubleshooting":** compact `--brief` flag documented; `repairBrief` output described.
+
+---
+
+## 0.7.0 — 2026-05-23
+
+**Theme:** SDK-specific rule corpus expansion + measured cross-tool benchmark.
+
+### Added
+- 17 new SDK-specific rule families across 5 platforms = **85 new compliance rules** (corpus grew from 167 → 252):
+  - **Tier 1 — Silent-failure traps:** `session-handler.retained`, `live-collection.api-mismatch`, `posts.status-filter-applied`, `pagination.cursor-opaque`, `posts.parent-child-rendered`
+  - **Tier 2 — Wrong-target / silent misroute:** `feed.target-type-explicit`, `comment.reference-type-enum`, `channel.type-matches-shape`
+  - **Tier 3 — Moderator-only data leaking to user UI:** `moderation.role-gated-action`, `flag-count.not-leaked-to-non-mods`, `user.ban-state-respected`
+  - **Tier 4 — Notifications & unread state:** `notifications.amity-preferences-configured`, `unread.subscribed-not-counted`
+  - **Tier 5 — Custom config & types:** `reactions.configured-name-used`, `custom-post-type.dataType-declared`
+  - **Tier 6 — File upload & media:** `file-upload.via-amity-file-client`, `image-post.child-resolution-awaited`
+- Multi-outcome measured benchmark (chat / comments / push on React + Flutter) with cross-tool validation (Antigravity / Gemini 3.5 Flash). See `benchmarks/RESULTS.md`.
+- Fixture-foundation gates: `run-happy-path-clean.mjs` (every canonical happy-path must fire zero rules) and `run-fixture-symmetry.mjs` (every rule's positive fixture must not fire the rule).
+- Dedicated React Native canonical happy-path fixture (previously shared with TypeScript).
+- New CI exit code `4` for `contract-drift` (rules in `sp-vise/compliance.json` no longer match current ruleset).
+
+### Fixed
+- `*.secret.inline-api-key` now catches env-fallback literal leaks: `String.fromEnvironment(..., defaultValue: 'literal')` (Dart), `process.env.X ?? 'literal'` (JS/TS), ternary fallback. Previously these forms slipped past the regex because the literal wasn't directly assigned to `apiKey`.
+- Four web/Flutter rule false-positives on idiomatic guarded code: `typescript.client.region` now accepts env-sourced and positional region declarations; `*.network.error-handling-present` recognizes React error-state idiom; `flutter.design.reuse-detected-tokens` credits `Theme.of(context)` reuse.
+- Pre-existing CLI version assertion in `test/run-cli.mjs` (was pinned to `0.4.0`).
+
+### Changed
+- Project structure flattened — `packages/foundry/` layer removed; npm package now publishes from the repo root.
+- README consolidated from two files (brand + developer) into a single customer-facing canonical README; internal architecture moved to `docs/`.
+
+## 0.6.0 — 2026-05-22
+
+**Theme:** v0.6 compliance expansion + 5-platform measured benchmark.
+
+### Added
+- Corpus grew to 167 rules across 10 domains.
+- Outcomes: `add-comments`, `add-moderation`, `add-chat`.
+- Five-platform measured benchmark (TypeScript / React Native / Flutter / Android / iOS) with real `vise check` and `vise run-sensors` artifacts.
+
+### Fixed
+- React Native platform detection priority (previously misdetected as TypeScript when both signals were present).
+
+## 0.5.0 — 2026-05-21
+
+**Theme:** AST-based sensors.
+
+### Added
+- Tree-sitter AST sensors for Kotlin / Swift / Dart literal detection.
+- Phase 1 pilot: `typescript.auth.no-literal-user-id` resolves identifier-via-constant indirections.
+- Phase 4: AST-aware comment stripping for `ui-states-present` and `design-reuse-detected-tokens` rules.
+
+## 0.4.0 — 2026-05-20
+
+**Theme:** Compliance harness.
+
+### Added
+- `vise check --ci`: read-only verification with structured exit codes for CI pipelines.
+- Attestation flow: `vise attest` with rule id, signer, confidence, evidence, and rationale.
+- `vise sync`: persist deterministic-pass attestation files.
+- Engagement tracking: `vise engagement init/show` for tier / customer-id / scope metadata.
+- `sp-vise/` sidecar directory: customer-visible compliance contract (`compliance.json`, `attestations/`, `engagement.json`, `inspection.json`).
+- Cross-platform rule corpus.
+- Native project skill installs.
+
+## 0.3.0 — 2026-05-19
+
+**Theme:** Foundry → Vise rename.
+
+### Changed
+- Renamed npm package to `@amityco/social-plus-vise`.
+- Added `vise` short binary alias; kept `foundry-mcp` as a compatibility binary alias.
+- Added Claude Code skill targets (`--target claude`, `--target claude-project .`).
+- Documented Cursor, Copilot, and VS Code instruction installs.
+
+## 0.2.1 — 2026-05-18
+
+### Added
+- `vise install-skill`, `vise print-skill`, and `vise skill-path` for bundled-skill installation.
+
+## 0.2.0 — 2026-05-17
+
+### Added
+- Skill-guided CLI commands: `inspect`, `plan`, `validate`, `run-sensors`.
+- The `social-plus-vise` skill guidance shipped as part of the package.
+
+## 0.1.1 — 2026-05-16
+
+### Added
+- Initial npm publish.
+- MCP adapter (stdio).
+- Doc search backed by `https://learn.social.plus/llms-full.txt`.
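The 0.10.0 notes above describe the `chat.channel-type-dm` sensor as a co-occurrence check: `type: 'community'` is flagged only when `userIds` also appears. A minimal sketch of that idea follows, assuming a plain-text scan. The function name `flagsDmChannelType`, both regexes, and the `createChannel` sample strings are hypothetical illustrations and are not taken from the package's sensor implementation.

```javascript
// Hypothetical sketch of the co-occurrence heuristic described in the changelog.
// Not the package's actual sensor: names and regexes are illustrative assumptions.
const COMMUNITY_TYPE = /\btype\s*:\s*['"]community['"]/;
const USER_IDS = /\buserIds\b/;

function flagsDmChannelType(source) {
  // Fire only when both signals appear, so community broadcasts
  // (which carry no explicit userIds list) are left alone.
  return COMMUNITY_TYPE.test(source) && USER_IDS.test(source);
}

// Sample inputs are made-up strings, not real SDK calls.
const dmWithWrongType = "createChannel({ type: 'community', userIds: [me, peer] })";
const legitBroadcast = "createChannel({ type: 'community', displayName: 'News' })";
const correctDm = "createChannel({ type: 'conversation', userIds: [me, peer] })";

console.log(flagsDmChannelType(dmWithWrongType)); // true
console.log(flagsDmChannelType(legitBroadcast));  // false
console.log(flagsDmChannelType(correctDm));       // false
```

A text-level check like this trades precision for simplicity; the changelog's 0.5.0 entry suggests the real sensors may work on an AST for some rules.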
package/README.md
CHANGED
@@ -1,7 +1,3 @@
-<p align="center">
-<img src="./social.plus-vise.png" alt="social.plus Vise" width="320" />
-</p>
-
 <h1 align="center">social.plus Vise</h1>
 
 <p align="center">
@@ -45,9 +41,11 @@ See [Usage Flow](#usage-flow) for the full step-by-step diagram.
 
 ---
 
-## What Vise Does
+## What Vise Does: Agentic Workflow Governance
 
-
+Instead of just providing a CLI or AI skills, Vise implements a technique called **Agentic Workflow Governance**. Think of it as building a software factory directly on top of the customer's project.
+
+Vise acts as the foreman of this factory, wrapping your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 262 platform-specific compliance rules, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
 
 | Layer | Purpose |
 |---|---|
@@ -61,53 +59,78 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
 
 ---
 
-## Benchmark:
+## Benchmark: Phase 1 Results
+
+> **Every feature delivered correctly — confirmed independently with two different AI coding tools.**
+> With Vise, both agents built all 9 social features with no production gaps. Without Vise, 3 out of 9 features had hidden problems that would only surface after users complained.
+
+### What "delivered correctly" means
+
+"Correct" doesn't just mean the code compiles. It means every feature handles the edge cases that matter to real users and real moderation teams:
+
+- A **banned user** cannot type or submit a post — the send button is hidden, not just disabled-on-submit
+- **Push notification preferences** are wired to the Amity API so users who opt out actually stop receiving notifications
+- **Moderation actions** (report, flag, block) are surfaced in the UI so users can act on them, not buried in a hook
+- **Chat and feed queries** use live, reactive subscriptions — not one-time fetches that go stale
+
+Without Vise, AI agents frequently implement the primary feature correctly but miss these secondary requirements. They know about them in the abstract — but when building a chat screen, "ban state" feels out of scope and gets skipped. `sp-check` turns that vague awareness into a specific, actionable finding.
+
+### The experiment: three conditions, nine features
+
+We ran a controlled experiment — the **Commune Benchmark** — to measure not just *whether* Vise helps, but *why*. Each of the nine features below was built from scratch by an AI agent under three independent conditions:
 
-
-
-> **76% cheaper · 28% faster · 86% fewer issues**
+**Nine features built:**
+SDK setup · User presence · Social feed · Events · Chat & DMs · Push notifications · User profile · Content moderation · Comments
 
-
+| Condition | What the agent had | The question it answers |
+|---|---|---|
+| **Pure MCP** | Access to social.plus docs only — no compliance guidance | Baseline: how well does the agent do on its own? |
+| **Rules-as-Markdown** | The full 1,013-line compliance rulebook pasted directly into the prompt | Is the problem just that the agent doesn't know the rules? |
+| **Vise + Skill** | Full Vise CLI — `sp-check` runs automatically, agent reads specific findings, fixes them, repeats until green | Does an active feedback loop change the outcome? |
+
+The Rules-as-Markdown condition is the key isolation: if the agent already knows all the rules, does giving it the spec document fix the problem? The answer turned out to be **no** — knowing the rules and being forced to act on specific findings are different things.
 
-###
+### Results — features delivered without production gaps
 
-|
-
-| **
-| **
-| **Vise CLI + Skill** (full workflow) | ✅ 2/2 | 1 | 8,733 | $0.0024 | 447s |
+| Coding agent (model) | Pure MCP | Rules-as-Markdown | Vise + Skill |
+|---|---|---|---|
+| **Cursor (Composer 2.5)** | 6 out of 9 ✗ | 5 out of 9 ✗ | **9 out of 9 ✅** |
+| **Claude Code (Sonnet 4.6)** | 6 out of 9 ✗ | 7 out of 9 ✗ | **9 out of 9 ✅** |
 
-
+The three features that consistently fail without Vise — **Chat**, **Moderation**, and **Push Notifications** — are exactly the ones with secondary compliance requirements (ban-state, report affordances, Amity preference API). Vise's `sp-check` catches these with a specific finding; the rules doc does not.
 
-
+Both agents reached a perfect score with Vise. Neither could reach it with the compliance spec pasted into the prompt. All 9 passes were independently verified by code inspection — no scoring shortcuts.
 
-
+### Efficiency — rework sessions needed
 
-
+Vise delivers all 9 features correctly in a single session. The other conditions leave failing features that require additional sessions to diagnose (the gap isn't visible without `sp-check`) and fix.
 
-|
+| Coding agent (model) | Condition | Features correct | Rework sessions needed |
 |---|---|---|---|
-|
-|
-|
-|
-|
-|
-| Manual rework needed? | Yes | No | Zero rework |
+| **Cursor (Composer 2.5)** | Pure MCP | 6 / 9 ✗ | +3 or more |
+| **Cursor (Composer 2.5)** | Rules-as-Markdown | 5 / 9 ✗ | +4 or more |
+| **Cursor (Composer 2.5)** | **Vise + Skill** | **9 / 9 ✅** | **0 ✅** |
+| **Claude Code (Sonnet 4.6)** | Pure MCP | 6 / 9 ✗ | +3 or more |
+| **Claude Code (Sonnet 4.6)** | Rules-as-Markdown | 7 / 9 ✗ | +2 or more |
+| **Claude Code (Sonnet 4.6)** | **Vise + Skill** | **9 / 9 ✅** | **0 ✅** |
 
-
+<sub>Rework sessions are additional developer-initiated prompts needed after the initial session to diagnose and fix the failing features. Each failing feature typically requires at least one session to identify the gap and one to fix it — and that's without the benefit of `sp-check` pointing directly at the problem.</sub>
 
-
+### Reproducibility
+
+- **Gate-checked:** Every pass was verified by code inspection — the Vise workspaces contain an actual UI-level ban gate; the pure-MCP workspaces do not. Zero attestation shortcuts.
+- **Built from scratch** (greenfield seed) — not patching existing code.
+- **Three arms run with separate tooling.** The Rules-as-Markdown arm has no `sp-check` tool available — it cannot "cheat" by running the checker.
+- **N=1 per cell (Phase 1).** Each agent ran each scenario once. Repeatability seeds on the three most discriminating slices (CM-CHAT, CM-MODERATE, CM-PUSH) are pending. These results should be treated as a strong initial signal, not a statistically settled finding.
+- Full per-feature scorecards, agent transcripts, and workspace diffs: [`benchmarks/FINDINGS.html`](benchmarks/FINDINGS.html) · [`benchmarks/RULES_AS_MARKDOWN.html`](benchmarks/RULES_AS_MARKDOWN.html)
 
 ### Which mode should I use?
 
-| If you
+| If you… | Use | Why |
 |---|---|---|
-|
-|
-|
-
-For the full interactive report with charts, see [`benchmarks/report.html`](./benchmarks/report.html). For per-cell scorecards and prior benchmark versions, see [`benchmarks/RESULTS.md`](./benchmarks/RESULTS.md).
+| Building new social features with an AI agent | **Vise CLI + Skill** | The only mode that reliably delivers all features correctly |
+| Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the full ruleset |
+| Enforcing compliance in a CI pipeline | `vise check --ci` | Exits non-zero on failures; structured JSON output for logs |
 
 ---
 
@@ -121,7 +144,7 @@ For the full interactive report with charts, see [`benchmarks/report.html`](./be
 | **Android (Kotlin)** | ✅ Full | Gradle assemble, unit tests |
 | **iOS (Swift)** | ✅ Full | (static rule checks; runtime sensors WIP) |
 
-Each platform has
+Each platform has 52–54 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
 
 ---
 
@@ -199,12 +222,13 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
 | `vise plan-harness [path] --request "..."` | (Pre-planning step) Build the harness around the request |
 | `vise init [path] --request "..."` | Write the `sp-vise/` compliance contract for this project |
 
-### Documentation grounding
+### Documentation grounding & Troubleshooting
 
 | Command | Purpose |
 |---|---|
 | `vise search-docs "<query>"` | Search social.plus docs for relevant pages |
 | `vise get-doc-page <path>` | Fetch a specific doc page by path |
+| `vise debug [path] --error "..." [--brief]` | Debug an SDK-specific runtime failure and emit a likely-cause summary plus a minimal repair brief |
 
 ### Compliance verification
 
@@ -225,6 +249,18 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
 | `vise run-sensors [path]` | Run detected project commands (npm scripts, Gradle, Flutter, lint, typecheck, SDK import smokes); never executes arbitrary shell |
 | `vise run-sensors [path] --dry-run` | List what would run without executing |
 
+### Troubleshooting quick loop
+
+For SDK-specific runtime issues, start with the compact debug flow before broader repo exploration:
+
+```sh
+vise debug . --error-file logs/crash.log --brief
+vise check . --ci
+vise run-sensors .
+```
+
+`vise debug --brief` returns the likely rule, minimum patch shape, invariants to preserve, and verification commands for the first repair pass.
+
 ### Skill management
 
 | Command | Purpose |
@@ -266,7 +302,7 @@ MCP-capable hosts can call Vise as structured tool calls instead of shell comman
 
 ### Tool names (snake_case per MCP convention)
 
-`inspect_project`, `plan_harness`, `plan_integration`, `init_compliance`, `check_compliance`, `sync_compliance`, `attest_rule`, `explain_rule`, `init_engagement`, `show_engagement`, `search_docs`, `get_doc_page`, `validate_setup`, `run_sensors`.
+`inspect_project`, `plan_harness`, `plan_integration`, `init_compliance`, `check_compliance`, `sync_compliance`, `attest_rule`, `explain_rule`, `init_engagement`, `show_engagement`, `search_docs`, `get_doc_page`, `debug_issue`, `validate_setup`, `run_sensors`.
 
 These are the same operations as the CLI commands above, exposed as MCP tools.
 
package/dist/outcomes.js
CHANGED
@@ -5,15 +5,15 @@ export function hasAnswer(answers, id) {
 const CLASSIFY_ORDER = [
     "setup-push",
     "setup-live-data",
+    "add-comments",
     "add-moderation",
     "add-chat",
-    "add-comments",
     "add-feed",
     "troubleshoot",
     "validate-setup",
     "setup-sdk",
 ];
-const PUSH_PATTERNS = [/\b(push
+const PUSH_PATTERNS = [/\b(push(?:\s+notification)?|push(?:\s+notifications)?|firebase|fcm|apns)\b/];
 const LIVE_PATTERNS = [
     /\b(live object|live objects|live collection|live collections|realtime collection|real-time collection|observe|observer|subscribe|subscription|unsubscribe|live update|live updates)\b/,
 ];
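Moving `"add-comments"` ahead of `"add-moderation"` only changes behaviour if classification is first-match-wins, which this hunk does not show. The sketch below works under that assumption. `PUSH_PATTERNS` is copied from the hunk; the comment and moderation patterns, the `PATTERNS` table, and the `classify` helper are hypothetical stand-ins for code that is not visible here.

```javascript
// PUSH_PATTERNS is copied from the diff; everything else is an illustrative assumption.
const PUSH_PATTERNS = [/\b(push(?:\s+notification)?|push(?:\s+notifications)?|firebase|fcm|apns)\b/];
// Hypothetical stand-ins: the real comment/moderation patterns are not shown in this hunk.
const COMMENT_PATTERNS = [/\bcomments?\b/];
const MODERATION_PATTERNS = [/\b(moderation|flag|report)\b/];

const PATTERNS = {
  "setup-push": PUSH_PATTERNS,
  "add-comments": COMMENT_PATTERNS,
  "add-moderation": MODERATION_PATTERNS,
};

// Assumed behaviour: walk the order list and return the first outcome with a matching pattern.
function classify(request, order) {
  const text = request.toLowerCase();
  return order.find((id) => PATTERNS[id].some((re) => re.test(text))) ?? null;
}

const oldOrder = ["setup-push", "add-moderation", "add-comments"];
const newOrder = ["setup-push", "add-comments", "add-moderation"];
const request = "Add comments with a report button";

console.log(classify(request, oldOrder)); // add-moderation
console.log(classify(request, newOrder)); // add-comments
console.log(classify("Wire up FCM push notifications", newOrder)); // setup-push
```

Under this reading, a request that mentions both comments and a moderation affordance now resolves to the comments outcome, which is the effect the reordering appears to aim for.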
@@ -31,7 +31,7 @@ const CHAT_PATTERNS = [
 ];
 const TROUBLESHOOT_PATTERNS = [/\b(error|broken|crash|not working|fail|timeout|401|403)\b/];
 const VALIDATE_PATTERNS = [/\b(validate|check|correct|setup right|initiali[sz])\b/];
-const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure)\b/];
+const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure|init sdk|sdk setup|session lifecycle)\b|initialise?s?\b/];
 export const BROAD_SOCIAL_REGEX = /\b(nice|social features|social feature|engagement|community experience)\b/i;
 export const DESIGN_REGEX = /\bdesign token|design tokens|theme|same design|design system|brand/i;
 export function classifyOutcome(request) {
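The widened `SETUP_PATTERNS` regex can be exercised directly to see which requests it picks up. In the sketch below the regex is copied verbatim from the hunk; the `matchesSetup` helper and the sample requests are made up, and lower-casing the input first is an assumption about how the caller prepares it.

```javascript
// Regex copied verbatim from the new SETUP_PATTERNS line in the hunk.
const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure|init sdk|sdk setup|session lifecycle)\b|initialise?s?\b/];

// Assumption: the caller lower-cases the request before matching.
function matchesSetup(request) {
  const text = request.toLowerCase();
  return SETUP_PATTERNS.some((re) => re.test(text));
}

console.log(matchesSetup("Init SDK for the web app"));        // true
console.log(matchesSetup("Fix session lifecycle handling"));  // true
console.log(matchesSetup("Initialise the client on launch")); // true
console.log(matchesSetup("Initialize the client on launch")); // false
```

As written, the second alternative matches the `s` spelling only, so a request using `initialize` matches this pattern only when another keyword such as `setup` or `configure` is also present.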
@@ -134,10 +134,18 @@ const setupSdk = {
             step: "Initialize the social.plus client exactly once with API key and explicit region.",
             evidence: [platformQuickStart(ctx.platform).path, "requiredInputs.social.plus API key local env/config variable", "requiredInputs.social.plus region"],
         },
+        {
+            step: "When repairing setup, reuse the app's existing region or endpoint config source instead of hardcoding a guessed default value.",
+            evidence: [platformQuickStart(ctx.platform).path, "requiredInputs.social.plus region"],
+        },
         {
             step: "Wire login after user identity is known and before social.plus API queries/subscriptions.",
             evidence: ["social-plus-sdk/getting-started/authentication", "requiredInputs.user identity source for login"],
         },
+        {
+            step: "When adding renewal handling, keep it in the existing login path and retain the handler for the full session lifetime.",
+            evidence: ["social-plus-sdk/getting-started/authentication", "requiredInputs.user identity source for login"],
+        },
         { step: "Run validate_setup and detected command sensors after edits.", evidence: ["validate_setup", "run_sensors"] },
     ],
     validation: (platform) => [`${platform}.setup.present`, `${platform}.login.present`, `${platform}.region.explicit`],
@@ -470,6 +478,13 @@ const addFeed = {
                 "social-plus-sdk/core-concepts/realtime-communication/live-objects-collections/overview",
             ],
         },
+        {
+            step: "When repairing or refactoring a feed query, preserve existing pagination inputs and state (for example pageToken, nextPage, hasMore/loadMore, or infinite-query wiring) unless the customer explicitly changes feed behavior.",
+            evidence: [
+                "requiredInputs.feed scope",
+                "implementationRules.file-specific edits",
+            ],
+        },
         { step: "Reuse the host app's existing visual system for the social surface.", evidence: designEvidence },
         { step: "Implement loading, empty, error, and data states.", evidence: ["implementationRules.file-specific edits"] },
         { step: "Run validate_setup and detected command sensors after edits.", evidence: ["validate_setup", "run_sensors"] },
@@ -878,6 +893,7 @@ const troubleshoot = {
     implementationRules: () => [],
     implementationSteps: () => [
         { step: "Gather more evidence before implementation.", evidence: ["stopConditions", "search_docs", "inspect_project"] },
+        { step: "If the issue is an SDK-specific runtime symptom, run vise debug first and use the repair brief before broader repo exploration.", evidence: ["search_docs", "inspect_project"] },
     ],
     validation: () => [],
     stopConditions: () => [],
package/dist/server.js
CHANGED
@@ -13,6 +13,7 @@ import { planIntegrationTool } from "./tools/integration.js";
 import { inspectProjectTool, validateSetupTool } from "./tools/project.js";
 import { resolveRequestTool, suggestPatchTool } from "./tools/resolve.js";
 import { runSensorsTool } from "./tools/sensors.js";
+import { debugIssueTool, debugIssue } from "./tools/debug.js";
 import { packageName, packageVersion } from "./version.js";
 const tools = new Map([
     searchDocsTool,
@@ -31,6 +32,7 @@ const tools = new Map([
     runSensorsTool,
     validateSetupTool,
     suggestPatchTool,
+    debugIssueTool,
 ].map((tool) => [tool.name, tool]));
 const bundledSkillName = "social-plus-vise";
 const cliResult = await handleCli(process.argv.slice(2));
@@ -119,6 +121,32 @@ async function handleCli(args) {
         });
         return "exit";
     }
+    if (command === "debug") {
+        assertOnlyKnownFlags(args, ["error", "error-file", "brief"], "debug");
+        let errorMessage = flagValue(args, "error");
+        if (!errorMessage) {
+            const errorFile = flagValue(args, "error-file");
+            if (errorFile) {
+                errorMessage = await readFile(path.resolve(errorFile), "utf8");
+            }
+            else if (!process.stdin.isTTY) {
+                const { readFileSync } = await import("node:fs");
+                try {
+                    errorMessage = readFileSync(0, "utf-8");
+                }
+                catch {
+                    errorMessage = undefined;
+                }
+            }
+        }
+        if (!errorMessage) {
+            console.error("debug requires --error, --error-file, or piped stdin.");
+            process.exitCode = 1;
+            return "exit";
+        }
+        console.log(JSON.stringify(await debugIssue(positionalRepoPath(args.slice(1)), errorMessage, { brief: hasFlag(args, "brief") }), null, 2));
+        return "exit";
+    }
     if (command === "plan" || command === "plan-integration") {
         await printToolResult(planIntegrationTool, {
             repoPath: positionalRepoPath(args.slice(1)),
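The new `debug` branch resolves its error text from three sources in a fixed order: the `--error` flag, then `--error-file`, then piped stdin. The sketch below isolates that precedence as a small function so it can be read and tested on its own. `resolveErrorMessage` and the injected `readers` object are hypothetical; the handler in the hunk does the same work inline with `readFile` and `readFileSync(0, "utf-8")`, and this sketch is synchronous only to keep the example short.

```javascript
// Hypothetical helper mirroring the precedence in the debug handler:
// --error wins, then --error-file, then piped stdin. Readers are injected for illustration.
function resolveErrorMessage({ error, errorFile, stdinIsTTY }, readers) {
  if (error) return error;
  if (errorFile) return readers.readFile(errorFile);
  if (!stdinIsTTY) {
    try {
      return readers.readStdin();
    } catch {
      // As in the handler, a failed stdin read is treated as "no input".
      return undefined;
    }
  }
  return undefined;
}

// Fake readers so the example runs without touching the file system or stdin.
const readers = {
  readFile: (p) => `contents of ${p}`,
  readStdin: () => "piped text",
};

console.log(resolveErrorMessage({ error: "401 Unauthorized", errorFile: "crash.log", stdinIsTTY: false }, readers)); // 401 Unauthorized
console.log(resolveErrorMessage({ errorFile: "crash.log", stdinIsTTY: false }, readers)); // contents of crash.log
console.log(resolveErrorMessage({ stdinIsTTY: false }, readers)); // piped text
console.log(resolveErrorMessage({ stdinIsTTY: true }, readers));  // undefined
```

When the result is `undefined`, the handler in the hunk prints a usage error and sets `process.exitCode = 1` instead of calling `debugIssue`.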
@@ -349,6 +377,16 @@ Run deterministic social.plus setup validation for the current project.
 
 Usage:
   vise validate [repoPath] [--platform typescript] [--surface apps/web]`;
+    }
+    if (command === "debug") {
+        return `${packageName} debug
+
+Correlate an SDK-specific runtime failure to likely compliance issues and emit a minimal repair brief.
+
+Usage:
+  vise debug [repoPath] --error "401 Unauthorized: TokenExpiredException during social.plus session renewal"
+  vise debug [repoPath] --error-file logs/crash.log
+  vise debug [repoPath] --error-file logs/crash.log --brief`;
     }
     if (command === "run-sensors" || command === "run-sensor" || command === "run_sensor") {
         return `${packageName} run-sensors
@@ -425,6 +463,7 @@ Usage:
   vise install-skill --target codex     Install bundled skill guidance
   vise print-skill                      Print bundled skill markdown
   vise inspect [repoPath]               Inspect platform and design signals
+  vise debug [repoPath] --error ...     Debug an SDK-specific runtime error and emit a repair brief
   vise plan [repoPath] --request "..."  Create an implementation plan
   vise init [repoPath] --request "..."  Initialize compliance sidecar
   vise check [repoPath]                 Check compliance contract