@amityco/social-plus-vise 0.8.0 → 0.11.0

package/CHANGELOG.md ADDED
@@ -0,0 +1,149 @@
+ # Changelog
+
+ All notable changes to `@amityco/social-plus-vise` are documented in this file.
+
+ The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## 0.10.0 — 2026-05-29
+
+ **Theme:** Benchmark-driven sensor expansion. The Commune benchmark (9 new SDK domains: chat, push, social graph, moderation, comments) produced the first measured, defensible advantage for vise+skill over pure MCP: **7/9 working features vs 3/9** with the same agent on the same prompts. This release ships the sensors, rules, and findings.json improvements that produced that result.
+
+ ### Added
+ - `react-native.chat.channel-type-dm` / `typescript.chat.channel-type-dm` (`warning`) — DM channels must use `type: 'conversation'`, not `type: 'community'`. Agents consistently choose `community` for 1-to-1 chats because it sounds plausible but silently creates a group channel with the wrong shape. Sensor requires `userIds` co-occurrence to avoid firing on legitimate community broadcasts.
+ - `react-native.follow.status-subscription` (`warning`) — `getFollowStatus` must be wrapped in a live subscription, not a one-shot query. A one-shot call captures state at mount and never updates — follow/unfollow actions are not reflected in the UI until the user navigates away.
+ - `rationale` field in `sp-vise/findings.json` — agents see *why* each rule exists, not just *what* it requires. Improves attestation quality on rules that allow it.
+ - Compliance.json rule entries now include a `title` field (digest-stable, separate from hashing) so agents and humans can identify rules without grepping definitions.
+ - Corpus grew from **262 → 265 rules**.
+
+ ### Changed
+ - **`vise init` now writes `sp-vise/findings.json` immediately** — agents see current rule violations on startup with no exploration needed. Combined with the `npm run sp-check` script added to scaffolded workspaces, agents follow a directed (read findings → fix → verify) loop instead of an exploratory (search → search → search → implement) loop.
+ - **`live-collection.api-mismatch`, `posts.activity-tag-filter`, `posts.reaction-stale-post-ref`, `user.ban-state-respected`** — all now skip `.d.ts` files to eliminate false positives from type stubs.
+ - **`user.ban-state-respected`** — `flagComment` and `flagPost` added to the recognised write-pattern list. Flagging is a moderation action and must be ban-guarded.
+ - **`react-native.push.unregister.present`** — recommendation generalised; no longer references benchmark-specific state variables. Surfaces the exact `useEffect` cleanup pattern needed.
+ - Reactive markers now include `.on('dataUpdated', ...)` — the event-emitter style of subscribing to LiveCollection updates is now recognised as a valid alternative to property-callback subscription.
+ - README updated with a step-by-step Quick Start that references `findings.json` directly.
+
+ ### Benchmark infrastructure (`benchmarks/`)
+ - **Commune benchmark** added — 9-slice React Native scenario (CM-SETUP, CM-PRESENCE, CM-FEED, CM-EVENTS, CM-CHAT, CM-PUSH, CM-PROFILE, CM-MODERATE, CM-COMMENTS) covering chat, push, social graph, and moderation domains absent from TouchTunes. Three seed types per slice (`baseline`, `broken`, `greenfield`) for 27 fixture sets total.
+ - **Rules-as-markdown control arm** (`benchmarks/commune/run-commune-rules-arm.sh`) — injects the rule corpus as a static document into the agent prompt. Built to isolate whether vise's measured advantage comes from *information delivery* (the rules) or the *iterative verification loop* (sp-check).
+ - **TouchTunes runner improvements** — workspace isolation (`workspaces/broken/` vs `workspaces/baseline/` so agents can't peek at the answer), `< /dev/null` stdin redirect fix that was causing agy/codex to silently skip cells, `|| true` per-cell error isolation, and grader auto-attestation for no-file and `.d.ts`-pointing rules.
+ - **agy + codex runners** (`run-agy-cells.sh`, `run-codex-cells.sh`) — production-quality scripts with TTY-detection fixes and workspace isolation.
+
+ ### Findings & reports
+ - `benchmarks/FINDINGS.html` — engineering-facing summary of the benchmark methodology, results, and what was/wasn't proven.
+ - `benchmarks/MARKETING.html` — three-tier marketing-claim framework (safe / concrete / honest / aspirational) with supporting wallclock data and a list of metrics to instrument next.
+
+ ### Honest claim
+ On 9 new SDK domain implementations with codex gpt-5.4, vise+skill produced 7 working features vs 3 for pure MCP — same agent, same prompts. The cost: +28% wallclock per session. The net: −52% wallclock per *working* feature, because more features ship on the first try. Vise consistently catches five bug classes that capable models otherwise miss: wrong DM channel type, missing push register/unregister lifecycle, one-shot queries where live subscriptions are required, missing ban checks before write operations, and missing flag affordances on user-generated content.
+
+ ---
+
+ ## 0.9.0 — 2026-05-27
+
+ **Theme:** Business model-grounded gap analysis; Next.js / SSR guard; environment hygiene expanded to all platforms.
+
+ ### Added
+ - `typescript.client.no-ssr-init` (`error`) — SDK client must not be initialized in a Next.js Server Component, `layout.tsx` without `'use client'`, or inside `getServerSideProps`/`getStaticProps`. The primary demo-invisible failure mode for AI-native Next.js customers: `next dev` recovers from the error gracefully; `next build` + production does not.
+ - `react-native.secret.env-gitignore` — React Native env files containing secret-shaped keys must be excluded by `.gitignore`.
+ - `react-native.secret.env-example` — A `.env.example` or `.env.sample` must accompany any gitignored React Native env file.
+ - `flutter.secret.env-gitignore` — Flutter `.env` or `secrets.dart` files containing secret-shaped keys must be excluded by `.gitignore`.
+ - `android.secret.env-gitignore` — `local.properties` containing secret-shaped keys must be excluded by `.gitignore`.
+ - `ios.secret.env-gitignore` — `Secrets.plist` or `*.xcconfig` files containing secret-shaped keys must be excluded by `.gitignore`.
+ - Corpus grew from **256 → 262 rules**.
+ - `benchmarks/SDK_INTEGRATION_GAP_ANALYSIS.md` — business model-grounded gap analysis mapping every SDK-relevant value claim to Vise rule coverage, with a prioritised improvement backlog.
+
+ ### Changed
+ - **Skill — "Stop Instead Of Guessing":** intake list now asks about Next.js rendering mode (Server Component vs `'use client'` vs Pages Router) before implementing SDK initialization.
+ - **Skill — "Session Renewal":** new feedforward: SDK collection queries must not fire before `login()` completes; gate collection setup behind the session-active signal.
+ - **Skill — "Live Collection API Mismatch":** new guidance: handle connection-state changes and render a reconnecting indicator when the WebSocket drops.
+ - **Skill — "Debugging & Troubleshooting":** compact `--brief` flag documented; `repairBrief` output described.
+
+ ---
+
+ ## 0.7.0 — 2026-05-23
+
+ **Theme:** SDK-specific rule corpus expansion + measured cross-tool benchmark.
+
+ ### Added
+ - 17 new SDK-specific rule families across 5 platforms = **85 new compliance rules** (corpus grew from 167 → 252):
+   - **Tier 1 — Silent-failure traps:** `session-handler.retained`, `live-collection.api-mismatch`, `posts.status-filter-applied`, `pagination.cursor-opaque`, `posts.parent-child-rendered`
+   - **Tier 2 — Wrong-target / silent misroute:** `feed.target-type-explicit`, `comment.reference-type-enum`, `channel.type-matches-shape`
+   - **Tier 3 — Moderator-only data leaking to user UI:** `moderation.role-gated-action`, `flag-count.not-leaked-to-non-mods`, `user.ban-state-respected`
+   - **Tier 4 — Notifications & unread state:** `notifications.amity-preferences-configured`, `unread.subscribed-not-counted`
+   - **Tier 5 — Custom config & types:** `reactions.configured-name-used`, `custom-post-type.dataType-declared`
+   - **Tier 6 — File upload & media:** `file-upload.via-amity-file-client`, `image-post.child-resolution-awaited`
+ - Multi-outcome measured benchmark (chat / comments / push on React + Flutter) with cross-tool validation (Antigravity / Gemini 3.5 Flash). See `benchmarks/RESULTS.md`.
+ - Fixture-foundation gates: `run-happy-path-clean.mjs` (every canonical happy-path must fire zero rules) and `run-fixture-symmetry.mjs` (every rule's positive fixture must not fire the rule).
+ - Dedicated React Native canonical happy-path fixture (previously shared with TypeScript).
+ - New CI exit code `4` for `contract-drift` (rules in `sp-vise/compliance.json` no longer match current ruleset).
+
+ ### Fixed
+ - `*.secret.inline-api-key` now catches env-fallback literal leaks: `String.fromEnvironment(..., defaultValue: 'literal')` (Dart), `process.env.X ?? 'literal'` (JS/TS), ternary fallback. Previously these forms slipped past the regex because the literal wasn't directly assigned to `apiKey`.
+ - Four web/Flutter rule false-positives on idiomatic guarded code: `typescript.client.region` now accepts env-sourced and positional region declarations; `*.network.error-handling-present` recognizes React error-state idiom; `flutter.design.reuse-detected-tokens` credits `Theme.of(context)` reuse.
+ - Pre-existing CLI version assertion in `test/run-cli.mjs` (was pinned to `0.4.0`).
+
+ ### Changed
+ - Project structure flattened — `packages/foundry/` layer removed; npm package now publishes from the repo root.
+ - README consolidated from two files (brand + developer) into a single customer-facing canonical README; internal architecture moved to `docs/`.
+
+ ## 0.6.0 — 2026-05-22
+
+ **Theme:** v0.6 compliance expansion + 5-platform measured benchmark.
+
+ ### Added
+ - Corpus grew to 167 rules across 10 domains.
+ - Outcomes: `add-comments`, `add-moderation`, `add-chat`.
+ - Five-platform measured benchmark (TypeScript / React Native / Flutter / Android / iOS) with real `vise check` and `vise run-sensors` artifacts.
+
+ ### Fixed
+ - React Native platform detection priority (previously misdetected as TypeScript when both signals were present).
+
+ ## 0.5.0 — 2026-05-21
+
+ **Theme:** AST-based sensors.
+
+ ### Added
+ - Tree-sitter AST sensors for Kotlin / Swift / Dart literal detection.
+ - Phase 1 pilot: `typescript.auth.no-literal-user-id` resolves identifier-via-constant indirections.
+ - Phase 4: AST-aware comment stripping for `ui-states-present` and `design-reuse-detected-tokens` rules.
+
+ ## 0.4.0 — 2026-05-20
+
+ **Theme:** Compliance harness.
+
+ ### Added
+ - `vise check --ci`: read-only verification with structured exit codes for CI pipelines.
+ - Attestation flow: `vise attest` with rule id, signer, confidence, evidence, and rationale.
+ - `vise sync`: persist deterministic-pass attestation files.
+ - Engagement tracking: `vise engagement init/show` for tier / customer-id / scope metadata.
+ - `sp-vise/` sidecar directory: customer-visible compliance contract (`compliance.json`, `attestations/`, `engagement.json`, `inspection.json`).
+ - Cross-platform rule corpus.
+ - Native project skill installs.
+
+ ## 0.3.0 — 2026-05-19
+
+ **Theme:** Foundry → Vise rename.
+
+ ### Changed
+ - Renamed npm package to `@amityco/social-plus-vise`.
+ - Added `vise` short binary alias; kept `foundry-mcp` as a compatibility binary alias.
+ - Added Claude Code skill targets (`--target claude`, `--target claude-project .`).
+ - Documented Cursor, Copilot, and VS Code instruction installs.
+
+ ## 0.2.1 — 2026-05-18
+
+ ### Added
+ - `vise install-skill`, `vise print-skill`, and `vise skill-path` for bundled-skill installation.
+
+ ## 0.2.0 — 2026-05-17
+
+ ### Added
+ - Skill-guided CLI commands: `inspect`, `plan`, `validate`, `run-sensors`.
+ - The `social-plus-vise` skill guidance shipped as part of the package.
+
+ ## 0.1.1 — 2026-05-16
+
+ ### Added
+ - Initial npm publish.
+ - MCP adapter (stdio).
+ - Doc search backed by `https://learn.social.plus/llms-full.txt`.
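Reviewer note on the 0.7.0 "Fixed" entry for `*.secret.inline-api-key`: the JS/TS fallback forms it names (`process.env.X ?? 'literal'` and the ternary fallback) can be illustrated with a small line scanner. This is a minimal sketch and not the sensor shipped in the package; the function name, both regexes, the sample key value, and the inclusion of `||` alongside `??` are assumptions made for illustration.

```javascript
// Hypothetical sketch: NOT the actual Vise sensor.
// Matches `process.env.X ?? 'literal'` and (as an assumption) `process.env.X || 'literal'`.
const NULLISH_OR_LOGICAL_FALLBACK = /process\.env\.\w+\s*(?:\?\?|\|\|)\s*(['"`])[^'"`]+\1/;
// Matches `process.env.X ? <anything> : 'literal'` on a single line.
const TERNARY_FALLBACK = /process\.env\.\w+\s*\?(?!\?)[^:?]*:\s*(['"`])[^'"`]+\1/;

// Returns the 1-based line numbers where an env lookup falls back to a string literal.
function findEnvFallbackLiterals(source) {
  const hits = [];
  source.split("\n").forEach((line, index) => {
    if (NULLISH_OR_LOGICAL_FALLBACK.test(line) || TERNARY_FALLBACK.test(line)) {
      hits.push(index + 1);
    }
  });
  return hits;
}

// Made-up sample input; the key value is a placeholder.
const sample = [
  "const a = process.env.SP_API_KEY;",
  "const b = process.env.SP_API_KEY ?? 'b0efdeadbeef';",
  "const c = process.env.SP_API_KEY ? process.env.SP_API_KEY : 'b0efdeadbeef';",
].join("\n");

console.log(findEnvFallbackLiterals(sample)); // [ 2, 3 ]
```

The plain lookup on line 1 is not flagged because no literal follows it; only the two fallback forms are. A real sensor would also need to handle multi-line expressions, which this sketch ignores.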
package/README.md CHANGED
@@ -1,7 +1,3 @@
- <p align="center">
- <img src="./social.plus-vise.png" alt="social.plus Vise" width="320" />
- </p>
-
  <h1 align="center">social.plus Vise</h1>
  
  <p align="center">
@@ -31,24 +27,25 @@ vise install-skill --target claude # Claude Code (personal)
  vise install-skill --target cursor . # Cursor (project-local)
  vise install-skill --target copilot . # GitHub Copilot / VS Code
  
- # 3. Inside your project, let your AI agent run the Vise loop
- cd your-app
- vise inspect # detect platform, surface, design signals
- vise plan --request "Add a social feed" # produce a grounded implementation plan
- vise init --request "Add a social feed" # write the sp-vise/ compliance contract
-
- # 4. After the agent makes edits
- vise check # verify the integration against the contract
- vise run-sensors # run your project's own build/typecheck/lint
+ # 3. Ask your AI agent to integrate with social.plus
+ # (the skill handles the rest — inspect, plan, init, code, check)
  ```
  
- That's it. The skill at `skills/social-plus-vise/SKILL.md` (installed in step 2) teaches your AI agent when to run each command. Skip to [Usage Flow](#usage-flow) for the full picture.
+ **Step 3 in practice:** Open your AI coding tool in your project and prompt:
+
+ > "Add a social feed to this app using the social.plus SDK."
+
+ The installed skill teaches your agent to run `vise inspect` → `vise plan` → `vise init` → edit code → `vise check` → `vise run-sensors` automatically. You drive intent; Vise keeps the agent on the rails.
+
+ See [Usage Flow](#usage-flow) for the full step-by-step diagram.
  
  ---
  
- ## What Vise Does
+ ## What Vise Does: Agentic Workflow Governance
+
+ Instead of just providing a CLI or AI skills, Vise implements a technique called **Agentic Workflow Governance**. Think of it as building a software factory directly on top of the customer's project.
  
- Vise is a **local CLI + AI skill** that wraps coding agents in deterministic compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 250+ platform-specific compliance rules, and runs your project's own build/lint/typecheck sensors — all locally. **Your source code never leaves your machine.**
+ Vise acts as the foreman of this factory, wrapping your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 262 platform-specific compliance rules, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
  
  | Layer | Purpose |
  |---|---|
@@ -62,6 +59,81 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
  
  ---
  
+ ## Benchmark: Phase 1 Results
+
+ > **Every feature delivered correctly — confirmed independently with two different AI coding tools.**
+ > With Vise, both agents built all 9 social features with no production gaps. Without Vise, 3 out of 9 features had hidden problems that would only surface after users complained.
+
+ ### What "delivered correctly" means
+
+ "Correct" doesn't just mean the code compiles. It means every feature handles the edge cases that matter to real users and real moderation teams:
+
+ - A **banned user** cannot type or submit a post — the send button is hidden, not just disabled-on-submit
+ - **Push notification preferences** are wired to the Amity API so users who opt out actually stop receiving notifications
+ - **Moderation actions** (report, flag, block) are surfaced in the UI so users can act on them, not buried in a hook
+ - **Chat and feed queries** use live, reactive subscriptions — not one-time fetches that go stale
+
+ Without Vise, AI agents frequently implement the primary feature correctly but miss these secondary requirements. They know about them in the abstract — but when building a chat screen, "ban state" feels out of scope and gets skipped. `sp-check` turns that vague awareness into a specific, actionable finding.
+
+ ### The experiment: three conditions, nine features
+
+ We ran a controlled experiment — the **Commune Benchmark** — to measure not just *whether* Vise helps, but *why*. Each of the nine features below was built from scratch by an AI agent under three independent conditions:
+
+ **Nine features built:**
+ SDK setup · User presence · Social feed · Events · Chat & DMs · Push notifications · User profile · Content moderation · Comments
+
+ | Condition | What the agent had | The question it answers |
+ |---|---|---|
+ | **Pure MCP** | Access to social.plus docs only — no compliance guidance | Baseline: how well does the agent do on its own? |
+ | **Rules-as-Markdown** | The full 1,013-line compliance rulebook pasted directly into the prompt | Is the problem just that the agent doesn't know the rules? |
+ | **Vise + Skill** | Full Vise CLI — `sp-check` runs automatically, agent reads specific findings, fixes them, repeats until green | Does an active feedback loop change the outcome? |
+
+ The Rules-as-Markdown condition is the key isolation: if the agent already knows all the rules, does giving it the spec document fix the problem? The answer turned out to be **no** — knowing the rules and being forced to act on specific findings are different things.
+
+ ### Results — features delivered without production gaps
+
+ | Coding agent (model) | Pure MCP | Rules-as-Markdown | Vise + Skill |
+ |---|---|---|---|
+ | **Cursor (Composer 2.5)** | 6 out of 9 ✗ | 5 out of 9 ✗ | **9 out of 9 ✅** |
+ | **Claude Code (Sonnet 4.6)** | 6 out of 9 ✗ | 7 out of 9 ✗ | **9 out of 9 ✅** |
+
+ The three features that consistently fail without Vise — **Chat**, **Moderation**, and **Push Notifications** — are exactly the ones with secondary compliance requirements (ban-state, report affordances, Amity preference API). Vise's `sp-check` catches these with a specific finding; the rules doc does not.
+
+ Both agents reached a perfect score with Vise. Neither could reach it with the compliance spec pasted into the prompt. All 9 passes were independently verified by code inspection — no scoring shortcuts.
+
+ ### Efficiency — rework sessions needed
+
+ Vise delivers all 9 features correctly in a single session. The other conditions leave failing features that require additional sessions to diagnose (the gap isn't visible without `sp-check`) and fix.
+
+ | Coding agent (model) | Condition | Features correct | Rework sessions needed |
+ |---|---|---|---|
+ | **Cursor (Composer 2.5)** | Pure MCP | 6 / 9 ✗ | +3 or more |
+ | **Cursor (Composer 2.5)** | Rules-as-Markdown | 5 / 9 ✗ | +4 or more |
+ | **Cursor (Composer 2.5)** | **Vise + Skill** | **9 / 9 ✅** | **0 ✅** |
+ | **Claude Code (Sonnet 4.6)** | Pure MCP | 6 / 9 ✗ | +3 or more |
+ | **Claude Code (Sonnet 4.6)** | Rules-as-Markdown | 7 / 9 ✗ | +2 or more |
+ | **Claude Code (Sonnet 4.6)** | **Vise + Skill** | **9 / 9 ✅** | **0 ✅** |
+
+ <sub>Rework sessions are additional developer-initiated prompts needed after the initial session to diagnose and fix the failing features. Each failing feature typically requires at least one session to identify the gap and one to fix it — and that's without the benefit of `sp-check` pointing directly at the problem.</sub>
+
+ ### Reproducibility
+
+ - **Gate-checked:** Every pass was verified by code inspection — the Vise workspaces contain an actual UI-level ban gate; the pure-MCP workspaces do not. Zero attestation shortcuts.
+ - **Built from scratch** (greenfield seed) — not patching existing code.
+ - **Three arms run with separate tooling.** The Rules-as-Markdown arm has no `sp-check` tool available — it cannot "cheat" by running the checker.
+ - **N=1 per cell (Phase 1).** Each agent ran each scenario once. Repeatability seeds on the three most discriminating slices (CM-CHAT, CM-MODERATE, CM-PUSH) are pending. These results should be treated as a strong initial signal, not a statistically settled finding.
+ - Full per-feature scorecards, agent transcripts, and workspace diffs: [`benchmarks/FINDINGS.html`](benchmarks/FINDINGS.html) · [`benchmarks/RULES_AS_MARKDOWN.html`](benchmarks/RULES_AS_MARKDOWN.html)
+
+ ### Which mode should I use?
+
+ | If you… | Use | Why |
+ |---|---|---|
+ | Building new social features with an AI agent | **Vise CLI + Skill** | The only mode that reliably delivers all features correctly |
+ | Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the full ruleset |
+ | Enforcing compliance in a CI pipeline | `vise check --ci` | Exits non-zero on failures; structured JSON output for logs |
+
+ ---
+
  ## Supported Platforms
  
  | Platform | Coverage | Sensors |
@@ -72,7 +144,7 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
  | **Android (Kotlin)** | ✅ Full | Gradle assemble, unit tests |
  | **iOS (Swift)** | ✅ Full | (static rule checks; runtime sensors WIP) |
  
- Each platform has 50–55 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
+ Each platform has 52–54 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
  
  ---
  
@@ -150,12 +222,13 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
150
222
  | `vise plan-harness [path] --request "..."` | (Pre-planning step) Build the harness around the request |
151
223
  | `vise init [path] --request "..."` | Write the `sp-vise/` compliance contract for this project |
152
224
 
153
- ### Documentation grounding
225
+ ### Documentation grounding & Troubleshooting
154
226
 
155
227
  | Command | Purpose |
156
228
  |---|---|
157
229
  | `vise search-docs "<query>"` | Search social.plus docs for relevant pages |
158
230
  | `vise get-doc-page <path>` | Fetch a specific doc page by path |
231
+ | `vise debug [path] --error "..." [--brief]` | Debug an SDK-specific runtime failure and emit a likely-cause summary plus a minimal repair brief |
159
232
 
160
233
  ### Compliance verification
161
234
 
@@ -176,6 +249,18 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
176
249
  | `vise run-sensors [path]` | Run detected project commands (npm scripts, Gradle, Flutter, lint, typecheck, SDK import smokes); never executes arbitrary shell |
177
250
  | `vise run-sensors [path] --dry-run` | List what would run without executing |
178
251
 
252
+ ### Troubleshooting quick loop
253
+
254
+ For SDK-specific runtime issues, start with the compact debug flow before broader repo exploration:
255
+
256
+ ```sh
257
+ vise debug . --error-file logs/crash.log --brief
258
+ vise check . --ci
259
+ vise run-sensors .
260
+ ```
261
+
262
+ `vise debug --brief` returns the likely rule, minimum patch shape, invariants to preserve, and verification commands for the first repair pass.
263
+
179
264
  ### Skill management
180
265
 
181
266
  | Command | Purpose |
@@ -217,7 +302,7 @@ MCP-capable hosts can call Vise as structured tool calls instead of shell comman
217
302
 
218
303
  ### Tool names (snake_case per MCP convention)
219
304
 
220
- `inspect_project`, `plan_harness`, `plan_integration`, `init_compliance`, `check_compliance`, `sync_compliance`, `attest_rule`, `explain_rule`, `init_engagement`, `show_engagement`, `search_docs`, `get_doc_page`, `validate_setup`, `run_sensors`.
305
+ `inspect_project`, `plan_harness`, `plan_integration`, `init_compliance`, `check_compliance`, `sync_compliance`, `attest_rule`, `explain_rule`, `init_engagement`, `show_engagement`, `search_docs`, `get_doc_page`, `debug_issue`, `validate_setup`, `run_sensors`.
221
306
 
222
307
  These are the same operations as the CLI commands above, exposed as MCP tools.
223
308
 
@@ -289,22 +374,6 @@ Attestation files record source fingerprints (SHA-256 of cited files) so subsequ
289
374
 
290
375
  ---
291
376
 
292
- ## Benchmark Headline
293
-
294
- | Platform | Pure MCP findings | Vise findings | Pure MCP CI | Vise CI |
295
- |---|---|---|---|---|
296
- | React / Next.js | 7 (3 errors) | 2 (warnings) | ❌ FAIL | ✅ PASS |
297
- | React Native | 6 | 2 | ❌ FAIL | ✅ PASS |
298
- | Flutter | 9 | 2 | ❌ FAIL | ✅ PASS |
299
- | Android (Kotlin) | 9 | 0 | ❌ FAIL | ✅ PASS |
300
- | iOS (Swift) | 8 | 0 | ❌ FAIL | ✅ PASS |
301
-
302
- Measured runs of the same AI agent (Claude Sonnet 4.6) implementing "add a global feed" on each platform, with and without Vise. Without Vise: every run ships a hardcoded API key (a deterministic failure that cannot be attested). With Vise: every run passes CI with zero deterministic failures.
303
-
304
- For full methodology, per-cell scorecards, and the v0.7 multi-outcome cross-tool validation (chat / comments / push on React + Flutter, plus Antigravity/Gemini cross-tool), see [`benchmarks/RESULTS.md`](./benchmarks/RESULTS.md) in the installed npm package.
305
-
306
- ---
307
-
308
377
  ## Changelog
309
378
 
310
379
  See [`CHANGELOG.md`](./CHANGELOG.md) for the full version history.
package/dist/outcomes.js CHANGED
@@ -5,15 +5,15 @@ export function hasAnswer(answers, id) {
  const CLASSIFY_ORDER = [
      "setup-push",
      "setup-live-data",
+     "add-comments",
      "add-moderation",
      "add-chat",
-     "add-comments",
      "add-feed",
      "troubleshoot",
      "validate-setup",
      "setup-sdk",
  ];
- const PUSH_PATTERNS = [/\b(push|notification|firebase|fcm|apns)\b/];
+ const PUSH_PATTERNS = [/\b(push(?:\s+notification)?|push(?:\s+notifications)?|firebase|fcm|apns)\b/];
  const LIVE_PATTERNS = [
      /\b(live object|live objects|live collection|live collections|realtime collection|real-time collection|observe|observer|subscribe|subscription|unsubscribe|live update|live updates)\b/,
  ];
@@ -31,7 +31,7 @@ const CHAT_PATTERNS = [
  ];
  const TROUBLESHOOT_PATTERNS = [/\b(error|broken|crash|not working|fail|timeout|401|403)\b/];
  const VALIDATE_PATTERNS = [/\b(validate|check|correct|setup right|initiali[sz])\b/];
- const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure)\b/];
+ const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure|init sdk|sdk setup|session lifecycle)\b|initialise?s?\b/];
  export const BROAD_SOCIAL_REGEX = /\b(nice|social features|social feature|engagement|community experience)\b/i;
  export const DESIGN_REGEX = /\bdesign token|design tokens|theme|same design|design system|brand/i;
  export function classifyOutcome(request) {
@@ -134,10 +134,18 @@ const setupSdk = {
              step: "Initialize the social.plus client exactly once with API key and explicit region.",
              evidence: [platformQuickStart(ctx.platform).path, "requiredInputs.social.plus API key local env/config variable", "requiredInputs.social.plus region"],
          },
+         {
+             step: "When repairing setup, reuse the app's existing region or endpoint config source instead of hardcoding a guessed default value.",
+             evidence: [platformQuickStart(ctx.platform).path, "requiredInputs.social.plus region"],
+         },
          {
              step: "Wire login after user identity is known and before social.plus API queries/subscriptions.",
              evidence: ["social-plus-sdk/getting-started/authentication", "requiredInputs.user identity source for login"],
          },
+         {
+             step: "When adding renewal handling, keep it in the existing login path and retain the handler for the full session lifetime.",
+             evidence: ["social-plus-sdk/getting-started/authentication", "requiredInputs.user identity source for login"],
+         },
          { step: "Run validate_setup and detected command sensors after edits.", evidence: ["validate_setup", "run_sensors"] },
      ],
      validation: (platform) => [`${platform}.setup.present`, `${platform}.login.present`, `${platform}.region.explicit`],
@@ -470,6 +478,13 @@ const addFeed = {
                  "social-plus-sdk/core-concepts/realtime-communication/live-objects-collections/overview",
              ],
          },
+         {
+             step: "When repairing or refactoring a feed query, preserve existing pagination inputs and state (for example pageToken, nextPage, hasMore/loadMore, or infinite-query wiring) unless the customer explicitly changes feed behavior.",
+             evidence: [
+                 "requiredInputs.feed scope",
+                 "implementationRules.file-specific edits",
+             ],
+         },
          { step: "Reuse the host app's existing visual system for the social surface.", evidence: designEvidence },
          { step: "Implement loading, empty, error, and data states.", evidence: ["implementationRules.file-specific edits"] },
          { step: "Run validate_setup and detected command sensors after edits.", evidence: ["validate_setup", "run_sensors"] },
@@ -878,6 +893,7 @@ const troubleshoot = {
      implementationRules: () => [],
      implementationSteps: () => [
          { step: "Gather more evidence before implementation.", evidence: ["stopConditions", "search_docs", "inspect_project"] },
+         { step: "If the issue is an SDK-specific runtime symptom, run vise debug first and use the repair brief before broader repo exploration.", evidence: ["search_docs", "inspect_project"] },
      ],
      validation: () => [],
      stopConditions: () => [],
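Reviewer note on the two regex changes in `package/dist/outcomes.js`: the snippet below copies the old and new patterns from the hunks above and probes them with a few sample requests. The sample strings are invented for illustration, and they are lowercase on the assumption (not visible in this diff) that `classifyOutcome` normalizes case before matching, since the patterns carry no `i` flag.

```javascript
// Patterns copied verbatim from the outcomes.js hunks; variable names are ours.
const OLD_PUSH = /\b(push|notification|firebase|fcm|apns)\b/;
const NEW_PUSH = /\b(push(?:\s+notification)?|push(?:\s+notifications)?|firebase|fcm|apns)\b/;
const NEW_SETUP = /\b(setup|set up|install|integrate|wire|configure|init sdk|sdk setup|session lifecycle)\b|initialise?s?\b/;

// A bare "notification" used to classify as push; it no longer does.
console.log(OLD_PUSH.test("mute comment notification")); // true
console.log(NEW_PUSH.test("mute comment notification")); // false

// "push" on its own still matches, so push requests are unaffected.
console.log(NEW_PUSH.test("add push notifications")); // true

// The `initialise?s?` alternative has no leading \b, so it also matches
// inside longer words, and it does not cover the "initialize" spelling.
console.log(NEW_SETUP.test("reinitialise the client")); // true
console.log(NEW_SETUP.test("initialize the client"));   // false
```

Two observations follow from this. The second `push(?:\s+notifications)?` alternative appears redundant, because the first alternative already matches `push` whenever `push` is present. And the unanchored `initialise?s?` alternative may be worth a follow-up if "initialize" requests are meant to land on the setup outcome.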
package/dist/server.js CHANGED
@@ -13,6 +13,7 @@ import { planIntegrationTool } from "./tools/integration.js";
  import { inspectProjectTool, validateSetupTool } from "./tools/project.js";
  import { resolveRequestTool, suggestPatchTool } from "./tools/resolve.js";
  import { runSensorsTool } from "./tools/sensors.js";
+ import { debugIssueTool, debugIssue } from "./tools/debug.js";
  import { packageName, packageVersion } from "./version.js";
  const tools = new Map([
      searchDocsTool,
@@ -31,6 +32,7 @@ const tools = new Map([
      runSensorsTool,
      validateSetupTool,
      suggestPatchTool,
+     debugIssueTool,
  ].map((tool) => [tool.name, tool]));
  const bundledSkillName = "social-plus-vise";
  const cliResult = await handleCli(process.argv.slice(2));
@@ -119,6 +121,32 @@ async function handleCli(args) {
          });
          return "exit";
      }
+     if (command === "debug") {
+         assertOnlyKnownFlags(args, ["error", "error-file", "brief"], "debug");
+         let errorMessage = flagValue(args, "error");
+         if (!errorMessage) {
+             const errorFile = flagValue(args, "error-file");
+             if (errorFile) {
+                 errorMessage = await readFile(path.resolve(errorFile), "utf8");
+             }
+             else if (!process.stdin.isTTY) {
+                 const { readFileSync } = await import("node:fs");
+                 try {
+                     errorMessage = readFileSync(0, "utf-8");
+                 }
+                 catch {
+                     errorMessage = undefined;
+                 }
+             }
+         }
+         if (!errorMessage) {
+             console.error("debug requires --error, --error-file, or piped stdin.");
+             process.exitCode = 1;
+             return "exit";
+         }
+         console.log(JSON.stringify(await debugIssue(positionalRepoPath(args.slice(1)), errorMessage, { brief: hasFlag(args, "brief") }), null, 2));
+         return "exit";
+     }
      if (command === "plan" || command === "plan-integration") {
          await printToolResult(planIntegrationTool, {
              repoPath: positionalRepoPath(args.slice(1)),
@@ -349,6 +377,16 @@ Run deterministic social.plus setup validation for the current project.
  
  Usage:
  vise validate [repoPath] [--platform typescript] [--surface apps/web]`;
+     }
+     if (command === "debug") {
+         return `${packageName} debug
+
+ Correlate an SDK-specific runtime failure to likely compliance issues and emit a minimal repair brief.
+
+ Usage:
+ vise debug [repoPath] --error "401 Unauthorized: TokenExpiredException during social.plus session renewal"
+ vise debug [repoPath] --error-file logs/crash.log
+ vise debug [repoPath] --error-file logs/crash.log --brief`;
      }
      if (command === "run-sensors" || command === "run-sensor" || command === "run_sensor") {
          return `${packageName} run-sensors
@@ -425,6 +463,7 @@ Usage:
  vise install-skill --target codex Install bundled skill guidance
  vise print-skill Print bundled skill markdown
  vise inspect [repoPath] Inspect platform and design signals
+ vise debug [repoPath] --error ... Debug an SDK-specific runtime error and emit a repair brief
  vise plan [repoPath] --request "..." Create an implementation plan
  vise init [repoPath] --request "..." Initialize compliance sidecar
  vise check [repoPath] Check compliance contract
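Reviewer note on the new `debug` branch in `package/dist/server.js`: the error text is resolved in a fixed order, `--error` first, then `--error-file`, then piped stdin, and the command exits with code 1 if none yields a message. The sketch below restates that precedence as a pure function so it can be exercised without a shell. `resolveErrorMessage`, the `flags` shape, and the injected `io` object are hypothetical names introduced here; they are not part of the package.

```javascript
// Hypothetical helper mirroring the precedence in the `debug` branch.
// `io.readFile` and `io.readStdin` are injected stand-ins, not Vise functions.
function resolveErrorMessage(flags, io) {
  if (flags.error) return flags.error; // 1. --error wins
  if (flags.errorFile) return io.readFile(flags.errorFile); // 2. then --error-file
  if (!io.stdinIsTTY) { // 3. then piped stdin
    try {
      return io.readStdin();
    } catch {
      return undefined; // an unreadable stdin counts as "no message"
    }
  }
  return undefined; // the caller reports the usage error and sets exit code 1
}

// Fake I/O so the example runs anywhere.
const fakeIo = {
  stdinIsTTY: false,
  readFile: (p) => `contents of ${p}`,
  readStdin: () => "piped text",
};

console.log(resolveErrorMessage({ error: "401 Unauthorized", errorFile: "logs/crash.log" }, fakeIo)); // 401 Unauthorized
console.log(resolveErrorMessage({ errorFile: "logs/crash.log" }, fakeIo)); // contents of logs/crash.log
console.log(resolveErrorMessage({}, fakeIo)); // piped text
console.log(resolveErrorMessage({}, { ...fakeIo, stdinIsTTY: true })); // undefined
```

As in the diffed code, a failed file read is not caught and would propagate, while a failed stdin read is swallowed. Whether that asymmetry is intended is not clear from the diff alone.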