npm - @amityco/social-plus-vise - Versions diffs - 0.8.0 → 0.11.0 - Mend

@amityco/social-plus-vise 0.8.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/CHANGELOG.md +149 -0
package/README.md +104 -35
package/dist/outcomes.js +19 -3
package/dist/server.js +39 -0
package/dist/tools/compliance.js +68 -20
package/dist/tools/debug.js +267 -0
package/dist/tools/harness.js +17 -1
package/dist/tools/project.js +222 -25
package/dist/types.js +4 -0
package/package.json +14 -6
package/rules/auth.yaml +298 -38
package/rules/feed.yaml +1 -1
package/rules/live-data.yaml +316 -36
package/rules/push.yaml +140 -0
package/rules/sdk-lifecycle.yaml +1421 -131
package/rules/security.yaml +60 -0
package/skills/social-plus-vise/SKILL.md +45 -2
package/skills/vise-harness-engineer/SKILL.md +35 -0
package/social.plus-vise.png +0 -0

package/CHANGELOG.md ADDED

@@ -0,0 +1,149 @@
+# Changelog
+All notable changes to `@amityco/social-plus-vise` are documented in this file.
+The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## 0.10.0 — 2026-05-29
+**Theme:** Benchmark-driven sensor expansion. The Commune benchmark (9 new SDK domains: chat, push, social graph, moderation, comments) produced the first measured, defensible advantage for vise+skill over pure MCP: **7/9 working features vs 3/9** with the same agent on the same prompts. This release ships the sensors, rules, and findings.json improvements that produced that result.
+### Added
+- `react-native.chat.channel-type-dm` / `typescript.chat.channel-type-dm` (`warning`) — DM channels must use `type: 'conversation'`, not `type: 'community'`. Agents consistently choose `community` for 1-to-1 chats because it sounds plausible but silently creates a group channel with the wrong shape. Sensor requires `userIds` co-occurrence to avoid firing on legitimate community broadcasts.
+- `react-native.follow.status-subscription` (`warning`) — `getFollowStatus` must be wrapped in a live subscription, not a one-shot query. A one-shot call captures state at mount and never updates — follow/unfollow actions are not reflected in the UI until the user navigates away.
+- `rationale` field in `sp-vise/findings.json` — agents see *why* each rule exists, not just *what* it requires. Improves attestation quality on rules that allow it.
+- Compliance.json rule entries now include a `title` field (digest-stable, separate from hashing) so agents and humans can identify rules without grepping definitions.
+- Corpus grew from **262 → 265 rules**.
+### Changed
+- **`vise init` now writes `sp-vise/findings.json` immediately** — agents see current rule violations on startup with no exploration needed. Combined with the `npm run sp-check` script added to scaffolded workspaces, agents follow a directed (read findings → fix → verify) loop instead of an exploratory (search → search → search → implement) loop.
+- **`live-collection.api-mismatch`, `posts.activity-tag-filter`, `posts.reaction-stale-post-ref`, `user.ban-state-respected`** — all now skip `.d.ts` files to eliminate false positives from type stubs.
+- **`user.ban-state-respected`** — `flagComment` and `flagPost` added to the recognised write-pattern list. Flagging is a moderation action and must be ban-guarded.
+- **`react-native.push.unregister.present`** — recommendation generalised; no longer references benchmark-specific state variables. Surfaces the exact `useEffect` cleanup pattern needed.
+- Reactive markers now include `.on('dataUpdated', ...)` — the event-emitter style of subscribing to LiveCollection updates is now recognised as a valid alternative to property-callback subscription.
+- README updated with a step-by-step Quick Start that references `findings.json` directly.
+### Benchmark infrastructure (`benchmarks/`)
+- **Commune benchmark** added — 9-slice React Native scenario (CM-SETUP, CM-PRESENCE, CM-FEED, CM-EVENTS, CM-CHAT, CM-PUSH, CM-PROFILE, CM-MODERATE, CM-COMMENTS) covering chat, push, social graph, and moderation domains absent from TouchTunes. Three seed types per slice (`baseline`, `broken`, `greenfield`) for 27 fixture sets total.
+- **Rules-as-markdown control arm** (`benchmarks/commune/run-commune-rules-arm.sh`) — injects the rule corpus as a static document into the agent prompt. Built to isolate whether vise's measured advantage comes from *information delivery* (the rules) or the *iterative verification loop* (sp-check).
+- **TouchTunes runner improvements** — workspace isolation (`workspaces/broken/` vs `workspaces/baseline/` so agents can't peek at the answer), `< /dev/null` stdin redirect fix that was causing agy/codex to silently skip cells, `|| true` per-cell error isolation, and grader auto-attestation for no-file and `.d.ts`-pointing rules.
+- **agy + codex runners** (`run-agy-cells.sh`, `run-codex-cells.sh`) — production-quality scripts with TTY-detection fixes and workspace isolation.
+### Findings & reports
+- `benchmarks/FINDINGS.html` — engineering-facing summary of the benchmark methodology, results, and what was/wasn't proven.
+- `benchmarks/MARKETING.html` — three-tier marketing-claim framework (safe / concrete / honest / aspirational) with supporting wallclock data and a list of metrics to instrument next.
+### Honest claim
+On 9 new SDK domain implementations with codex gpt-5.4, vise+skill produced 7 working features vs 3 for pure MCP — same agent, same prompts. The cost: +28% wallclock per session. The net: −52% wallclock per *working* feature, because more features ship on the first try. Vise consistently catches five bug classes that capable models otherwise miss: wrong DM channel type, missing push register/unregister lifecycle, one-shot queries where live subscriptions are required, missing ban checks before write operations, and missing flag affordances on user-generated content.
+---
+## 0.9.0 — 2026-05-27
+**Theme:** Business model-grounded gap analysis; Next.js / SSR guard; environment hygiene expanded to all platforms.
+### Added
+- `typescript.client.no-ssr-init` (`error`) — SDK client must not be initialized in a Next.js Server Component, `layout.tsx` without `'use client'`, or inside `getServerSideProps`/`getStaticProps`. The primary demo-invisible failure mode for AI-native Next.js customers: `next dev` recovers from the error gracefully; `next build` + production does not.
+- `react-native.secret.env-gitignore` — React Native env files containing secret-shaped keys must be excluded by `.gitignore`.
+- `react-native.secret.env-example` — A `.env.example` or `.env.sample` must accompany any gitignored React Native env file.
+- `flutter.secret.env-gitignore` — Flutter `.env` or `secrets.dart` files containing secret-shaped keys must be excluded by `.gitignore`.
+- `android.secret.env-gitignore` — `local.properties` containing secret-shaped keys must be excluded by `.gitignore`.
+- `ios.secret.env-gitignore` — `Secrets.plist` or `*.xcconfig` files containing secret-shaped keys must be excluded by `.gitignore`.
+- Corpus grew from **256 → 262 rules**.
+- `benchmarks/SDK_INTEGRATION_GAP_ANALYSIS.md` — business model-grounded gap analysis mapping every SDK-relevant value claim to Vise rule coverage, with a prioritised improvement backlog.
+### Changed
+- **Skill — "Stop Instead Of Guessing":** intake list now asks about Next.js rendering mode (Server Component vs `'use client'` vs Pages Router) before implementing SDK initialization.
+- **Skill — "Session Renewal":** new feedforward: SDK collection queries must not fire before `login()` completes; gate collection setup behind the session-active signal.
+- **Skill — "Live Collection API Mismatch":** new guidance: handle connection-state changes and render a reconnecting indicator when the WebSocket drops.
+- **Skill — "Debugging & Troubleshooting":** compact `--brief` flag documented; `repairBrief` output described.
+---
+## 0.7.0 — 2026-05-23
+**Theme:** SDK-specific rule corpus expansion + measured cross-tool benchmark.
+### Added
+- 17 new SDK-specific rule families across 5 platforms = **85 new compliance rules** (corpus grew from 167 → 252):
+  - **Tier 1 — Silent-failure traps:** `session-handler.retained`, `live-collection.api-mismatch`, `posts.status-filter-applied`, `pagination.cursor-opaque`, `posts.parent-child-rendered`
+  - **Tier 2 — Wrong-target / silent misroute:** `feed.target-type-explicit`, `comment.reference-type-enum`, `channel.type-matches-shape`
+  - **Tier 3 — Moderator-only data leaking to user UI:** `moderation.role-gated-action`, `flag-count.not-leaked-to-non-mods`, `user.ban-state-respected`
+  - **Tier 4 — Notifications & unread state:** `notifications.amity-preferences-configured`, `unread.subscribed-not-counted`
+  - **Tier 5 — Custom config & types:** `reactions.configured-name-used`, `custom-post-type.dataType-declared`
+  - **Tier 6 — File upload & media:** `file-upload.via-amity-file-client`, `image-post.child-resolution-awaited`
+- Multi-outcome measured benchmark (chat / comments / push on React + Flutter) with cross-tool validation (Antigravity / Gemini 3.5 Flash). See `benchmarks/RESULTS.md`.
+- Fixture-foundation gates: `run-happy-path-clean.mjs` (every canonical happy-path must fire zero rules) and `run-fixture-symmetry.mjs` (every rule's positive fixture must not fire the rule).
+- Dedicated React Native canonical happy-path fixture (previously shared with TypeScript).
+- New CI exit code `4` for `contract-drift` (rules in `sp-vise/compliance.json` no longer match current ruleset).
+### Fixed
+- `*.secret.inline-api-key` now catches env-fallback literal leaks: `String.fromEnvironment(..., defaultValue: 'literal')` (Dart), `process.env.X ?? 'literal'` (JS/TS), ternary fallback. Previously these forms slipped past the regex because the literal wasn't directly assigned to `apiKey`.
+- Four web/Flutter rule false-positives on idiomatic guarded code: `typescript.client.region` now accepts env-sourced and positional region declarations; `*.network.error-handling-present` recognizes React error-state idiom; `flutter.design.reuse-detected-tokens` credits `Theme.of(context)` reuse.
+- Pre-existing CLI version assertion in `test/run-cli.mjs` (was pinned to `0.4.0`).
+### Changed
+- Project structure flattened — `packages/foundry/` layer removed; npm package now publishes from the repo root.
+- README consolidated from two files (brand + developer) into a single customer-facing canonical README; internal architecture moved to `docs/`.
+## 0.6.0 — 2026-05-22
+**Theme:** v0.6 compliance expansion + 5-platform measured benchmark.
+### Added
+- Corpus grew to 167 rules across 10 domains.
+- Outcomes: `add-comments`, `add-moderation`, `add-chat`.
+- Five-platform measured benchmark (TypeScript / React Native / Flutter / Android / iOS) with real `vise check` and `vise run-sensors` artifacts.
+### Fixed
+- React Native platform detection priority (previously misdetected as TypeScript when both signals were present).
+## 0.5.0 — 2026-05-21
+**Theme:** AST-based sensors.
+### Added
+- Tree-sitter AST sensors for Kotlin / Swift / Dart literal detection.
+- Phase 1 pilot: `typescript.auth.no-literal-user-id` resolves identifier-via-constant indirections.
+- Phase 4: AST-aware comment stripping for `ui-states-present` and `design-reuse-detected-tokens` rules.
+## 0.4.0 — 2026-05-20
+**Theme:** Compliance harness.
+### Added
+- `vise check --ci`: read-only verification with structured exit codes for CI pipelines.
+- Attestation flow: `vise attest` with rule id, signer, confidence, evidence, and rationale.
+- `vise sync`: persist deterministic-pass attestation files.
+- Engagement tracking: `vise engagement init/show` for tier / customer-id / scope metadata.
+- `sp-vise/` sidecar directory: customer-visible compliance contract (`compliance.json`, `attestations/`, `engagement.json`, `inspection.json`).
+- Cross-platform rule corpus.
+- Native project skill installs.
+## 0.3.0 — 2026-05-19
+**Theme:** Foundry → Vise rename.
+### Changed
+- Renamed npm package to `@amityco/social-plus-vise`.
+- Added `vise` short binary alias; kept `foundry-mcp` as a compatibility binary alias.
+- Added Claude Code skill targets (`--target claude`, `--target claude-project .`).
+- Documented Cursor, Copilot, and VS Code instruction installs.
+## 0.2.1 — 2026-05-18
+### Added
+- `vise install-skill`, `vise print-skill`, and `vise skill-path` for bundled-skill installation.
+## 0.2.0 — 2026-05-17
+### Added
+- Skill-guided CLI commands: `inspect`, `plan`, `validate`, `run-sensors`.
+- The `social-plus-vise` skill guidance shipped as part of the package.
+## 0.1.1 — 2026-05-16
+### Added
+- Initial npm publish.
+- MCP adapter (stdio).
+- Doc search backed by `https://learn.social.plus/llms-full.txt`.
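
The 0.7.0 "Fixed" entry for `*.secret.inline-api-key` describes env-fallback literal leaks such as `process.env.X ?? 'literal'`. The following is a minimal sketch of what detecting the JS/TS form could look like. The regex, the `||` variant, the function name `findEnvFallbackLiterals`, and the sample key are assumptions made for illustration; the changelog does not show the sensor Vise actually ships.

```javascript
// Hypothetical illustration only; not the sensor shipped in the package.
// Flags `process.env.NAME ?? 'literal'` (and, as an assumed extension, `|| 'literal'`).
const ENV_FALLBACK_LITERAL =
  /process\.env\.([A-Za-z_][A-Za-z0-9_]*)\s*(?:\?\?|\|\|)\s*(['"`])([^'"`]+)\2/g;

function findEnvFallbackLiterals(source) {
  const hits = [];
  // matchAll requires the `g` flag and works on a copy of the regex,
  // so repeated calls do not share lastIndex state.
  for (const match of source.matchAll(ENV_FALLBACK_LITERAL)) {
    hits.push({ envVar: match[1], literal: match[3] });
  }
  return hits;
}

// Made-up inputs: the first leaks a literal through the fallback, the second does not.
const leaky = "const apiKey = process.env.SP_API_KEY ?? 'demo-key-123';";
const safe = "const apiKey = process.env.SP_API_KEY;";

console.log(findEnvFallbackLiterals(leaky));
console.log(findEnvFallbackLiterals(safe));
```

A line-based regex like this would miss the ternary form the changelog also mentions, as well as fallbacks split across lines; it is meant only to show why a literal that is not assigned directly to `apiKey` needs its own pattern.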

package/README.md CHANGED

@@ -1,7 +1,3 @@
-<p align="center">
-  <img src="./social.plus-vise.png" alt="social.plus Vise" width="320" />
-</p>
 <h1 align="center">social.plus Vise</h1>
 <p align="center">
@@ -31,24 +27,25 @@ vise install-skill --target claude        # Claude Code (personal)
 vise install-skill --target cursor .      # Cursor (project-local)
 vise install-skill --target copilot .     # GitHub Copilot / VS Code
-# 3. Inside your project, let your AI agent run the Vise loop
-cd your-app
-vise inspect                              # detect platform, surface, design signals
-vise plan --request "Add a social feed"   # produce a grounded implementation plan
-vise init --request "Add a social feed"   # write the sp-vise/ compliance contract
-# 4. After the agent makes edits
-vise check                                # verify the integration against the contract
-vise run-sensors                          # run your project's own build/typecheck/lint
+# 3. Ask your AI agent to integrate with social.plus
+#    (the skill handles the rest — inspect, plan, init, code, check)
 ```
-That's it. The skill at `skills/social-plus-vise/SKILL.md` (installed in step 2) teaches your AI agent when to run each command. Skip to [Usage Flow](#usage-flow) for the full picture.
+**Step 3 in practice:** Open your AI coding tool in your project and prompt:
+> "Add a social feed to this app using the social.plus SDK."
+The installed skill teaches your agent to run `vise inspect` → `vise plan` → `vise init` → edit code → `vise check` → `vise run-sensors` automatically. You drive intent; Vise keeps the agent on the rails.
+See [Usage Flow](#usage-flow) for the full step-by-step diagram.
 ---
-## What Vise Does
+## What Vise Does: Agentic Workflow Governance
+Instead of just providing a CLI or AI skills, Vise implements a technique called **Agentic Workflow Governance**. Think of it as building a software factory directly on top of the customer's project.
-Vise is a **local CLI + AI skill** that wraps coding agents in deterministic compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 250+ platform-specific compliance rules, and runs your project's own build/lint/typecheck sensors — all locally. **Your source code never leaves your machine.**
+Vise acts as the foreman of this factory, wrapping your local coding agents in compliance guardrails when they integrate social.plus SDKs. It inspects your project, grounds the agent in hosted docs, enforces 262 platform-specific compliance rules, and runs your project's own build/lint/typecheck sensors. **Your source code never leaves your machine.**
 | Layer | Purpose |
 |---|---|
@@ -62,6 +59,81 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
 ---
+## Benchmark: Phase 1 Results
+> **Every feature delivered correctly — confirmed independently with two different AI coding tools.**
+> With Vise, both agents built all 9 social features with no production gaps. Without Vise, 3 out of 9 features had hidden problems that would only surface after users complained.
+### What "delivered correctly" means
+"Correct" doesn't just mean the code compiles. It means every feature handles the edge cases that matter to real users and real moderation teams:
+- A **banned user** cannot type or submit a post — the send button is hidden, not just disabled-on-submit
+- **Push notification preferences** are wired to the Amity API so users who opt out actually stop receiving notifications
+- **Moderation actions** (report, flag, block) are surfaced in the UI so users can act on them, not buried in a hook
+- **Chat and feed queries** use live, reactive subscriptions — not one-time fetches that go stale
+Without Vise, AI agents frequently implement the primary feature correctly but miss these secondary requirements. They know about them in the abstract — but when building a chat screen, "ban state" feels out of scope and gets skipped. `sp-check` turns that vague awareness into a specific, actionable finding.
+### The experiment: three conditions, nine features
+We ran a controlled experiment — the **Commune Benchmark** — to measure not just *whether* Vise helps, but *why*. Each of the nine features below was built from scratch by an AI agent under three independent conditions:
+**Nine features built:**
+SDK setup · User presence · Social feed · Events · Chat & DMs · Push notifications · User profile · Content moderation · Comments
+| Condition | What the agent had | The question it answers |
+|---|---|---|
+| **Pure MCP** | Access to social.plus docs only — no compliance guidance | Baseline: how well does the agent do on its own? |
+| **Rules-as-Markdown** | The full 1,013-line compliance rulebook pasted directly into the prompt | Is the problem just that the agent doesn't know the rules? |
+| **Vise + Skill** | Full Vise CLI — `sp-check` runs automatically, agent reads specific findings, fixes them, repeats until green | Does an active feedback loop change the outcome? |
+The Rules-as-Markdown condition is the key isolation: if the agent already knows all the rules, does giving it the spec document fix the problem? The answer turned out to be **no** — knowing the rules and being forced to act on specific findings are different things.
+### Results — features delivered without production gaps
+| Coding agent (model) | Pure MCP | Rules-as-Markdown | Vise + Skill |
+|---|---|---|---|
+| **Cursor (Composer 2.5)** | 6 out of 9 ✗ | 5 out of 9 ✗ | **9 out of 9 ✅** |
+| **Claude Code (Sonnet 4.6)** | 6 out of 9 ✗ | 7 out of 9 ✗ | **9 out of 9 ✅** |
+The three features that consistently fail without Vise — **Chat**, **Moderation**, and **Push Notifications** — are exactly the ones with secondary compliance requirements (ban-state, report affordances, Amity preference API). Vise's `sp-check` catches these with a specific finding; the rules doc does not.
+Both agents reached a perfect score with Vise. Neither could reach it with the compliance spec pasted into the prompt. All 9 passes were independently verified by code inspection — no scoring shortcuts.
+### Efficiency — rework sessions needed
+Vise delivers all 9 features correctly in a single session. The other conditions leave failing features that require additional sessions to diagnose (the gap isn't visible without `sp-check`) and fix.
+| Coding agent (model) | Condition | Features correct | Rework sessions needed |
+|---|---|---|---|
+| **Cursor (Composer 2.5)** | Pure MCP | 6 / 9 ✗ | +3 or more |
+| **Cursor (Composer 2.5)** | Rules-as-Markdown | 5 / 9 ✗ | +4 or more |
+| **Cursor (Composer 2.5)** | **Vise + Skill** | **9 / 9 ✅** | **0 ✅** |
+| **Claude Code (Sonnet 4.6)** | Pure MCP | 6 / 9 ✗ | +3 or more |
+| **Claude Code (Sonnet 4.6)** | Rules-as-Markdown | 7 / 9 ✗ | +2 or more |
+| **Claude Code (Sonnet 4.6)** | **Vise + Skill** | **9 / 9 ✅** | **0 ✅** |
+<sub>Rework sessions are additional developer-initiated prompts needed after the initial session to diagnose and fix the failing features. Each failing feature typically requires at least one session to identify the gap and one to fix it — and that's without the benefit of `sp-check` pointing directly at the problem.</sub>
+### Reproducibility
+- **Gate-checked:** Every pass was verified by code inspection — the Vise workspaces contain an actual UI-level ban gate; the pure-MCP workspaces do not. Zero attestation shortcuts.
+- **Built from scratch** (greenfield seed) — not patching existing code.
+- **Three arms run with separate tooling.** The Rules-as-Markdown arm has no `sp-check` tool available — it cannot "cheat" by running the checker.
+- **N=1 per cell (Phase 1).** Each agent ran each scenario once. Repeatability seeds on the three most discriminating slices (CM-CHAT, CM-MODERATE, CM-PUSH) are pending. These results should be treated as a strong initial signal, not a statistically settled finding.
+- Full per-feature scorecards, agent transcripts, and workspace diffs: [`benchmarks/FINDINGS.html`](benchmarks/FINDINGS.html) · [`benchmarks/RULES_AS_MARKDOWN.html`](benchmarks/RULES_AS_MARKDOWN.html)
+### Which mode should I use?
+| If you… | Use | Why |
+|---|---|---|
+| Building new social features with an AI agent | **Vise CLI + Skill** | The only mode that reliably delivers all features correctly |
+| Auditing existing social.plus code | `vise check --ci` | Grades any codebase against the full ruleset |
+| Enforcing compliance in a CI pipeline | `vise check --ci` | Exits non-zero on failures; structured JSON output for logs |
+---
 ## Supported Platforms
 | Platform | Coverage | Sensors |
@@ -72,7 +144,7 @@ A bench vise holds the workpiece steady so the craftsman's hands are free to sha
 | **Android (Kotlin)** | ✅ Full | Gradle assemble, unit tests |
 | **iOS (Swift)** | ✅ Full | (static rule checks; runtime sensors WIP) |
-Each platform has 50–55 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
+Each platform has 52–54 rules across 10 compliance domains (feed, comments, moderation, chat, secrets, session & auth, notifications, live objects, logging hygiene, design tokens).
 ---
@@ -150,12 +222,13 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
 | `vise plan-harness [path] --request "..."` | (Pre-planning step) Build the harness around the request |
 | `vise init [path] --request "..."` | Write the `sp-vise/` compliance contract for this project |
-### Documentation grounding
+### Documentation grounding & Troubleshooting
 | Command | Purpose |
 |---|---|
 | `vise search-docs "<query>"` | Search social.plus docs for relevant pages |
 | `vise get-doc-page <path>` | Fetch a specific doc page by path |
+| `vise debug [path] --error "..." [--brief]` | Debug an SDK-specific runtime failure and emit a likely-cause summary plus a minimal repair brief |
 ### Compliance verification
@@ -176,6 +249,18 @@ The flow above is what the skill teaches your AI agent. You — the human — dr
 | `vise run-sensors [path]` | Run detected project commands (npm scripts, Gradle, Flutter, lint, typecheck, SDK import smokes); never executes arbitrary shell |
 | `vise run-sensors [path] --dry-run` | List what would run without executing |
+### Troubleshooting quick loop
+For SDK-specific runtime issues, start with the compact debug flow before broader repo exploration:
+```sh
+vise debug . --error-file logs/crash.log --brief
+vise check . --ci
+vise run-sensors .
+```
+`vise debug --brief` returns the likely rule, minimum patch shape, invariants to preserve, and verification commands for the first repair pass.
 ### Skill management
 | Command | Purpose |
@@ -217,7 +302,7 @@ MCP-capable hosts can call Vise as structured tool calls instead of shell comman
 ### Tool names (snake_case per MCP convention)
-`inspect_project`, `plan_harness`, `plan_integration`, `init_compliance`, `check_compliance`, `sync_compliance`, `attest_rule`, `explain_rule`, `init_engagement`, `show_engagement`, `search_docs`, `get_doc_page`, `validate_setup`, `run_sensors`.
+`inspect_project`, `plan_harness`, `plan_integration`, `init_compliance`, `check_compliance`, `sync_compliance`, `attest_rule`, `explain_rule`, `init_engagement`, `show_engagement`, `search_docs`, `get_doc_page`, `debug_issue`, `validate_setup`, `run_sensors`.
 These are the same operations as the CLI commands above, exposed as MCP tools.
@@ -289,22 +374,6 @@ Attestation files record source fingerprints (SHA-256 of cited files) so subsequ
 ---
-## Benchmark Headline
-| Platform | Pure MCP findings | Vise findings | Pure MCP CI | Vise CI |
-|---|---|---|---|---|
-| React / Next.js | 7 (3 errors) | 2 (warnings) | ❌ FAIL | ✅ PASS |
-| React Native | 6 | 2 | ❌ FAIL | ✅ PASS |
-| Flutter | 9 | 2 | ❌ FAIL | ✅ PASS |
-| Android (Kotlin) | 9 | 0 | ❌ FAIL | ✅ PASS |
-| iOS (Swift) | 8 | 0 | ❌ FAIL | ✅ PASS |
-Measured runs of the same AI agent (Claude Sonnet 4.6) implementing "add a global feed" on each platform, with and without Vise. Without Vise: every run ships a hardcoded API key (a deterministic failure that cannot be attested). With Vise: every run passes CI with zero deterministic failures.
-For full methodology, per-cell scorecards, and the v0.7 multi-outcome cross-tool validation (chat / comments / push on React + Flutter, plus Antigravity/Gemini cross-tool), see [`benchmarks/RESULTS.md`](./benchmarks/RESULTS.md) in the installed npm package.
----
 ## Changelog
 See [`CHANGELOG.md`](./CHANGELOG.md) for the full version history.
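
The README hunks above document `vise debug` with `--error` and `--error-file`, and the `debug` handler in `package/dist/server.js` additionally falls back to piped stdin. The precedence can be sketched as a small pure function. The name `resolveErrorText`, its parameters, and the injected reader callbacks are hypothetical; the real handler reads the file asynchronously and reads stdin via `readFileSync(0, "utf-8")`.

```javascript
// Sketch of the input precedence only; names and signature are hypothetical.
function resolveErrorText({ error, errorFile, stdinIsTTY }, readFile, readStdin) {
  // 1. An explicit --error "..." value wins.
  if (error) return error;
  // 2. Otherwise --error-file <path> is read; stdin is not consulted in this branch.
  if (errorFile) return readFile(errorFile);
  // 3. Otherwise, piped (non-TTY) stdin is read, and a read failure is swallowed.
  if (!stdinIsTTY) {
    try {
      return readStdin();
    } catch {
      return undefined;
    }
  }
  // Nothing supplied: the CLI reports a usage error and sets exit code 1.
  return undefined;
}

// Stub readers so the sketch runs without touching the filesystem.
const fromFile = () => "contents of crash.log";
const fromStdin = () => "piped error text";

console.log(resolveErrorText({ error: "401 Unauthorized", errorFile: "logs/crash.log", stdinIsTTY: false }, fromFile, fromStdin));
console.log(resolveErrorText({ errorFile: "logs/crash.log", stdinIsTTY: false }, fromFile, fromStdin));
console.log(resolveErrorText({ stdinIsTTY: false }, fromFile, fromStdin));
console.log(resolveErrorText({ stdinIsTTY: true }, fromFile, fromStdin));
```

Under this ordering, an interactive terminal with no flags yields no error text, which matches the "debug requires --error, --error-file, or piped stdin." message in the handler.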

package/dist/outcomes.js CHANGED

@@ -5,15 +5,15 @@ export function hasAnswer(answers, id) {
 const CLASSIFY_ORDER = [
     "setup-push",
     "setup-live-data",
+    "add-comments",
     "add-moderation",
     "add-chat",
-    "add-comments",
     "add-feed",
     "troubleshoot",
     "validate-setup",
     "setup-sdk",
 ];
-const PUSH_PATTERNS = [/\b(push|notification|firebase|fcm|apns)\b/];
+const PUSH_PATTERNS = [/\b(push(?:\s+notification)?|push(?:\s+notifications)?|firebase|fcm|apns)\b/];
 const LIVE_PATTERNS = [
     /\b(live object|live objects|live collection|live collections|realtime collection|real-time collection|observe|observer|subscribe|subscription|unsubscribe|live update|live updates)\b/,
 ];
@@ -31,7 +31,7 @@ const CHAT_PATTERNS = [
 ];
 const TROUBLESHOOT_PATTERNS = [/\b(error|broken|crash|not working|fail|timeout|401|403)\b/];
 const VALIDATE_PATTERNS = [/\b(validate|check|correct|setup right|initiali[sz])\b/];
-const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure)\b/];
+const SETUP_PATTERNS = [/\b(setup|set up|install|integrate|wire|configure|init sdk|sdk setup|session lifecycle)\b|initialise?s?\b/];
 export const BROAD_SOCIAL_REGEX = /\b(nice|social features|social feature|engagement|community experience)\b/i;
 export const DESIGN_REGEX = /\bdesign token|design tokens|theme|same design|design system|brand/i;
 export function classifyOutcome(request) {
@@ -134,10 +134,18 @@ const setupSdk = {
             step: "Initialize the social.plus client exactly once with API key and explicit region.",
             evidence: [platformQuickStart(ctx.platform).path, "requiredInputs.social.plus API key local env/config variable", "requiredInputs.social.plus region"],
         },
+        {
+            step: "When repairing setup, reuse the app's existing region or endpoint config source instead of hardcoding a guessed default value.",
+            evidence: [platformQuickStart(ctx.platform).path, "requiredInputs.social.plus region"],
+        },
         {
             step: "Wire login after user identity is known and before social.plus API queries/subscriptions.",
             evidence: ["social-plus-sdk/getting-started/authentication", "requiredInputs.user identity source for login"],
         },
+        {
+            step: "When adding renewal handling, keep it in the existing login path and retain the handler for the full session lifetime.",
+            evidence: ["social-plus-sdk/getting-started/authentication", "requiredInputs.user identity source for login"],
+        },
         { step: "Run validate_setup and detected command sensors after edits.", evidence: ["validate_setup", "run_sensors"] },
     ],
     validation: (platform) => [`${platform}.setup.present`, `${platform}.login.present`, `${platform}.region.explicit`],
@@ -470,6 +478,13 @@ const addFeed = {
                     "social-plus-sdk/core-concepts/realtime-communication/live-objects-collections/overview",
                 ],
             },
+            {
+                step: "When repairing or refactoring a feed query, preserve existing pagination inputs and state (for example pageToken, nextPage, hasMore/loadMore, or infinite-query wiring) unless the customer explicitly changes feed behavior.",
+                evidence: [
+                    "requiredInputs.feed scope",
+                    "implementationRules.file-specific edits",
+                ],
+            },
             { step: "Reuse the host app's existing visual system for the social surface.", evidence: designEvidence },
             { step: "Implement loading, empty, error, and data states.", evidence: ["implementationRules.file-specific edits"] },
             { step: "Run validate_setup and detected command sensors after edits.", evidence: ["validate_setup", "run_sensors"] },
@@ -878,6 +893,7 @@ const troubleshoot = {
     implementationRules: () => [],
     implementationSteps: () => [
         { step: "Gather more evidence before implementation.", evidence: ["stopConditions", "search_docs", "inspect_project"] },
+        { step: "If the issue is an SDK-specific runtime symptom, run vise debug first and use the repair brief before broader repo exploration.", evidence: ["search_docs", "inspect_project"] },
     ],
     validation: () => [],
     stopConditions: () => [],
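
The `PUSH_PATTERNS` change in this hunk drops the bare `notification` alternative, so a request has to mention `push`, `firebase`, `fcm`, or `apns` to match. The sketch below runs both patterns, copied verbatim from the hunk, against a few made-up requests. How `classifyOutcome` consumes the patterns (for example, whether it lowercases the request first) is not shown in the diff, so the sample strings are lowercase by assumption.

```javascript
// Both patterns are copied verbatim from the outcomes.js hunk above.
const OLD_PUSH = /\b(push|notification|firebase|fcm|apns)\b/;
const NEW_PUSH = /\b(push(?:\s+notification)?|push(?:\s+notifications)?|firebase|fcm|apns)\b/;

// Made-up requests for illustration.
const requests = [
  "add push notifications",
  "show a notification badge on new comments",
  "wire up fcm tokens",
];

for (const request of requests) {
  console.log(request, { old: OLD_PUSH.test(request), new: NEW_PUSH.test(request) });
}
```

Only the second request changes: the old pattern matches it on `notification`, the new one does not. Taken together with `add-comments` moving ahead of `add-moderation` in `CLASSIFY_ORDER`, this suggests the intent is to stop comment-related requests that merely mention notifications from being claimed by `setup-push`, though the hunk itself does not state that.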

package/dist/server.js CHANGED

@@ -13,6 +13,7 @@ import { planIntegrationTool } from "./tools/integration.js";
 import { inspectProjectTool, validateSetupTool } from "./tools/project.js";
 import { resolveRequestTool, suggestPatchTool } from "./tools/resolve.js";
 import { runSensorsTool } from "./tools/sensors.js";
+import { debugIssueTool, debugIssue } from "./tools/debug.js";
 import { packageName, packageVersion } from "./version.js";
 const tools = new Map([
     searchDocsTool,
@@ -31,6 +32,7 @@ const tools = new Map([
     runSensorsTool,
     validateSetupTool,
     suggestPatchTool,
+    debugIssueTool,
 ].map((tool) => [tool.name, tool]));
 const bundledSkillName = "social-plus-vise";
 const cliResult = await handleCli(process.argv.slice(2));
@@ -119,6 +121,32 @@ async function handleCli(args) {
             });
             return "exit";
         }
+        if (command === "debug") {
+            assertOnlyKnownFlags(args, ["error", "error-file", "brief"], "debug");
+            let errorMessage = flagValue(args, "error");
+            if (!errorMessage) {
+                const errorFile = flagValue(args, "error-file");
+                if (errorFile) {
+                    errorMessage = await readFile(path.resolve(errorFile), "utf8");
+                }
+                else if (!process.stdin.isTTY) {
+                    const { readFileSync } = await import("node:fs");
+                    try {
+                        errorMessage = readFileSync(0, "utf-8");
+                    }
+                    catch {
+                        errorMessage = undefined;
+                    }
+                }
+            }
+            if (!errorMessage) {
+                console.error("debug requires --error, --error-file, or piped stdin.");
+                process.exitCode = 1;
+                return "exit";
+            }
+            console.log(JSON.stringify(await debugIssue(positionalRepoPath(args.slice(1)), errorMessage, { brief: hasFlag(args, "brief") }), null, 2));
+            return "exit";
+        }
         if (command === "plan" || command === "plan-integration") {
             await printToolResult(planIntegrationTool, {
                 repoPath: positionalRepoPath(args.slice(1)),
@@ -349,6 +377,16 @@ Run deterministic social.plus setup validation for the current project.
 Usage:
   vise validate [repoPath] [--platform typescript] [--surface apps/web]`;
+    }
+    if (command === "debug") {
+        return `${packageName} debug
+Correlate an SDK-specific runtime failure to likely compliance issues and emit a minimal repair brief.
+Usage:
+  vise debug [repoPath] --error "401 Unauthorized: TokenExpiredException during social.plus session renewal"
+  vise debug [repoPath] --error-file logs/crash.log
+  vise debug [repoPath] --error-file logs/crash.log --brief`;
     }
     if (command === "run-sensors" || command === "run-sensor" || command === "run_sensor") {
         return `${packageName} run-sensors
@@ -425,6 +463,7 @@ Usage:
   vise install-skill --target codex Install bundled skill guidance
   vise print-skill                 Print bundled skill markdown
   vise inspect [repoPath]          Inspect platform and design signals
+  vise debug [repoPath] --error ... Debug an SDK-specific runtime error and emit a repair brief
   vise plan [repoPath] --request "..." Create an implementation plan
   vise init [repoPath] --request "..." Initialize compliance sidecar
   vise check [repoPath]            Check compliance contract