livepilot 1.17.1 → 1.17.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,129 @@
1
1
  # Changelog
2
2
 
3
+ ## 1.17.2 — iterate_toward_goal + preview-studio truth-gap (April 23 2026)
4
+
5
+ ### Added
6
+
7
+ - **`iterate_toward_goal` MCP tool** (`mcp_server/tools/agent_os.py`,
8
+ `mcp_server/tools/_agent_os_engine/iteration.py`): closes the outer
9
+ evaluation loop. Given a compiled `GoalVector` and a list of candidate
10
+ move sets, runs up to N experiments sequentially. Each iteration
11
+ creates an experiment, runs all branches (with per-branch
12
+ apply-snapshot-undo already handled by the existing experiment engine),
13
+ scores the top branch against the goal, and either commits (score ≥
14
+ threshold) or discards and tries the next candidate set. On timeout,
15
+ commits the best-so-far (`on_timeout="commit_best"`, default) or
16
+ commits nothing (`on_timeout="discard_on_timeout"`). Per-branch undo
17
+ stays inside `run_experiment` — this loop never issues a raw undo.
18
+ Tool count: 426 → 427.
19
+
20
+ Engine ships as both a pure-sync `iterate_toward_goal_engine` (for
21
+ tests with in-memory fakes) and `iterate_toward_goal_engine_async`
22
+ (for the live MCP wrapper with coroutine callbacks); the sync entry
23
+ auto-detects coroutine callbacks and dispatches accordingly. Covered
24
+ by 11 tests in `tests/test_iterate_toward_goal.py` spanning happy
25
+ path, exhaustion + commit-best, exhaustion + discard, no candidates,
26
+ no-winner iterations, max_iterations capping, async coroutine
27
+ callbacks, and MCP registration.
28
+
29
+ This is the P0 item from the v1.17.1 review gap-analysis between
30
+ "tool orchestration" and "agentic optimization" — the create /
31
+ run / compare / commit primitives existed but nothing drove them
32
+ toward a scalar goal. `iterate_toward_goal` is that driver.
33
+
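  The control flow above can be sketched in a few lines. This is an
  illustrative model, not the real engine API: `run_experiment`,
  `score_branch`, `commit`, and `discard` are hypothetical stand-ins for
  the existing experiment primitives, and only the threshold / timeout
  semantics come from this entry.

  ```python
  import time

  def iterate_toward_goal(candidates, threshold, max_iterations, budget_s,
                          run_experiment, score_branch, commit, discard,
                          on_timeout="commit_best"):
      """Sketch: try candidate move sets until one scores past the
      threshold, otherwise fall back per on_timeout."""
      deadline = time.monotonic() + budget_s
      best = None  # (score, experiment) seen so far
      for moves in candidates[:max_iterations]:
          if time.monotonic() >= deadline:
              break
          exp = run_experiment(moves)   # per-branch apply/snapshot/undo
                                        # happens inside run_experiment
          score = score_branch(exp.top_branch)
          if score >= threshold:
              commit(exp)               # winner found: commit and stop
              return {"committed": True, "score": score}
          if best is None or score > best[0]:
              best = (score, exp)
          discard(exp)                  # below threshold: try the next set
      # Exhausted or timed out: commit the best-so-far, or nothing.
      if best is not None and on_timeout == "commit_best":
          commit(best[1])               # real engine re-applies this branch
          return {"committed": True, "score": best[0],
                  "fallback": "best_so_far"}
      return {"committed": False}
  ```

  Note that the loop never issues a raw undo itself; cleanup is entirely
  the experiment engine's job, matching the invariant stated above.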
34
+ ### Fixed
35
+
36
+ - **Preview Studio truth-gap** (`mcp_server/preview_studio/engine.py`,
37
+ `mcp_server/preview_studio/tools.py`): two compounding bugs made the
38
+ system lie about committed state.
39
+ 1. `compare_variants()` scored every variant without filtering for
40
+ `status="blocked"` or missing `compiled_plan`. A blocked /
41
+ analytical-only variant could win the recommendation even with a
42
+ higher taste_fit than the only executable option. Fix: partition
43
+ variants into executable vs analytical, score only the executable
44
+ list, surface the analytical bucket on a new `analytical_candidates`
45
+ field for introspection. `recommended` stays a bare string (or
46
+ `None` when no executable variant exists) so no API shape breaks.
47
+ 2. `commit_preview_variant()` called `engine.commit_variant()` — which
48
+ flips `preview_set.status = "committed"` and discards every sibling
49
+ variant — BEFORE checking whether the chosen variant had a compiled
50
+ plan. Analytical-only picks therefore got recorded as committed
51
+ with `committed=False` in the response and the preview set's
52
+ in-memory state said the opposite. Wonder lifecycle also advanced
53
+ to `resolved`. Fix: short-circuit analytical/blocked picks at the
54
+ top of the handler, return `{committed: False, reason:
55
+ "analytical_only" | "blocked", ...}`, leave `preview_set.status`
56
+ untouched, and gate Wonder lifecycle hooks behind the executable
57
+ branch. New regression tests in `tests/test_preview_studio_truth_gap.py`
58
+ lock all four scenarios (A1-A4 from the remediation plan).
59
+ - **Runtime capability probes stop lying about `web` and `flucoma`**
60
+ (`mcp_server/runtime/tools.py`, `mcp_server/runtime/capability_state.py`):
61
+ `get_capability_state` previously hardcoded `web_ok=False` and never
62
+ emitted a `flucoma` domain at all, causing `route_request` to pick
63
+ degraded research/perception paths on machines where those
64
+ capabilities were actually available. `_probe_web()` now runs a
65
+ 500 ms HEAD request to `https://api.github.com` using stdlib
66
+ `urllib.request` (no new dependency); `_probe_flucoma()` uses
67
+ `importlib.util.find_spec("flucoma")` with safe exception swallowing.
68
+ The `flucoma` domain is now emitted unconditionally so consumers can
69
+ distinguish "probed and missing" from "not probed yet".
70
+ - **`build_song_brain` flags degraded responses**
71
+ (`mcp_server/song_brain/tools.py`): When `get_session_info` fails,
72
+ the tool injected `{tempo: 120.0, track_count: 0}` and returned a
73
+ polished SongBrain with no indication the inputs were synthesized.
74
+ The fallback is preserved for backward compatibility but the
75
+ response now carries a top-level `degradation` payload
76
+ (`{is_degraded, reasons, substituted_fields}`) so callers can branch
77
+ on synthesized vs real data.
78
+ - **`create_preview_set` flags the empty-kernel fallback**
79
+ (`mcp_server/preview_studio/engine.py`,
80
+ `mcp_server/preview_studio/models.py`): When the caller omits a real
81
+ session kernel, `create_preview_set` synthesizes an empty-but-valid
82
+ shape so compilers degrade to no-op steps. `PreviewSet` now carries a
83
+ `degradation` field that is marked
84
+ `is_degraded=True, reasons=["empty_kernel_fallback"]` whenever that
85
+ substitution fires, so downstream consumers can tell a synthesized
86
+ compile from a kernel-backed one.
87
+
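  The two probes described above can be approximated as follows. This is
  an illustrative sketch, not the code in `capability_state.py`; only the
  500 ms HEAD budget, the `api.github.com` target, and the
  `find_spec`-based check come from this entry.

  ```python
  import importlib.util
  import urllib.request

  def _probe_web(url: str = "https://api.github.com",
                 timeout: float = 0.5) -> bool:
      """HEAD request with a 500 ms budget; any failure means web is down."""
      try:
          req = urllib.request.Request(url, method="HEAD")
          with urllib.request.urlopen(req, timeout=timeout):
              return True
      except Exception:
          return False

  def _probe_flucoma() -> bool:
      """True when the flucoma package is importable; never raises."""
      try:
          return importlib.util.find_spec("flucoma") is not None
      except Exception:  # e.g. broken installs where find_spec itself raises
          return False
  ```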
88
+ ### Added
89
+
90
+ - **`DegradationInfo` dataclass** (`mcp_server/runtime/degradation.py`):
91
+ New shared payload that engines attach to their responses whenever
92
+ they substitute fallback data. Three fields:
93
+ `is_degraded: bool`, `reasons: list[str]`, `substituted_fields: list[str]`.
94
+ Intentionally minimal and import-safe so any engine can adopt it
95
+ without circular-import risk. Wired into `song_brain` and
96
+ `preview_studio`; other engines will adopt it as audits surface more
97
+ silent-fallback paths.
98
+ - **`flucoma` capability domain** now emitted by
99
+ `build_capability_state` alongside `session_access`, `analyzer`,
100
+ `memory`, `web`, and `research`. Matches the existing
101
+ `CapabilityDomain` schema.
102
+
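  A minimal sketch of the payload, matching the three fields listed above
  (the real dataclass lives in `mcp_server/runtime/degradation.py`; the
  `to_dict` serializer is inferred from how `PreviewSet.to_dict()` emits
  it):

  ```python
  from dataclasses import asdict, dataclass, field

  @dataclass
  class DegradationInfo:
      """Attached to responses whenever an engine substitutes fallback data."""
      is_degraded: bool = False
      reasons: list[str] = field(default_factory=list)
      substituted_fields: list[str] = field(default_factory=list)

      def to_dict(self) -> dict:
          return asdict(self)
  ```

  A clean response therefore carries
  `{"is_degraded": False, "reasons": [], "substituted_fields": []}`, so
  callers can always branch on `degradation["is_degraded"]` without a
  key-existence check.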
103
+ ### Changed
104
+
105
+ - **`capability-modes.md` reference doc rewritten to match the actual
106
+ response shape** (`livepilot/skills/livepilot-evaluation/references/capability-modes.md`).
107
+ The old example JSON described a flat
108
+ `{mode, analyzer_connected, bridge_version, spectral_cache_age_ms, flucoma_available, session_connected}`
109
+ shape that hasn't matched `get_capability_state` output for several releases. The
110
+ new section documents the nested `capability_state.domains.<name>`
111
+ structure, explicit per-domain and per-field definitions, and
112
+ explicitly scopes the `web` domain as *"server-side outbound HTTP
113
+ capability; does NOT imply curated research corpora are installed"*.
114
+
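  For illustration, an assumed instance of the nested shape. Only the six
  domain names come from this changelog; the per-domain field names
  (`ok`, `detail`) are hypothetical placeholders, not the documented
  schema:

  ```python
  # Hypothetical example of the nested capability_state.domains.<name>
  # structure; "ok"/"detail" are illustrative field names only.
  capability_state = {
      "domains": {
          "session_access": {"ok": True,  "detail": "bridge connected"},
          "analyzer":       {"ok": False, "detail": "no UDP packets yet"},
          "memory":         {"ok": True,  "detail": ""},
          "web":            {"ok": True,  "detail": "outbound HTTP reachable"},
          "research":       {"ok": False, "detail": "corpora not installed"},
          "flucoma":        {"ok": False, "detail": "package not importable"},
      },
  }
  ```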
115
+ ### Tests
116
+
117
+ - `tests/test_preview_studio_truth_gap.py` — 5 tests locking the four
118
+ A1-A4 scenarios from the remediation plan.
119
+ - `tests/test_runtime_capability_probes.py` — 6 tests covering the
120
+ web probe (true/false/exception-swallow) and the flucoma probe
121
+ (emitted-when-importable, emitted-when-missing, find_spec-backed).
122
+ - `tests/test_degradation_signalling.py` — 8 tests covering the
123
+ `DegradationInfo` dataclass defaults, `song_brain` degradation on
124
+ session failure, and `preview_studio` degradation on empty-kernel
125
+ fallback.
126
+
3
127
  ## 1.17.1 — Splice auto-reconnect + Codex installer fix (April 23 2026)
4
128
 
5
129
  Two bug fixes discovered in a parallel worktree hours after v1.17.0
package/README.md CHANGED
@@ -17,7 +17,7 @@
17
17
 
18
18
  <p align="center">
19
19
  An agentic production system for Ableton Live 12.<br>
20
- 426 tools. 52 domains. Device atlas. Plan-aware Splice integration. Auto-composition. Spectral perception. Technique memory. Drum-rack pad builder. Live dead-device detection.
20
+ 427 tools. 52 domains. Device atlas. Plan-aware Splice integration. Auto-composition. Spectral perception. Technique memory. Drum-rack pad builder. Live dead-device detection.
21
21
  </p>
22
22
 
23
23
  <br>
@@ -80,7 +80,7 @@ Most MCP servers are tool collections — they execute commands. LivePilot is an
80
80
  │ └─────────────────┼──────────────────┘ │
81
81
  │ ▼ │
82
82
  │ ┌─────────────────┐ │
83
- │ │ 426 MCP Tools │ │
83
+ │ │ 427 MCP Tools │ │
84
84
  │ │ 52 domains │ │
85
85
  │ └────────┬────────┘ │
86
86
  │ │ │
@@ -121,7 +121,7 @@ Most MCP servers are tool collections — they execute commands. LivePilot is an
121
121
 
122
122
  ## The Intelligence Layer
123
123
 
124
- 12 engines sit on top of the 426 tools. They give the AI musical judgment, not just musical execution.
124
+ 12 engines sit on top of the 427 tools. They give the AI musical judgment, not just musical execution.
125
125
 
126
126
  ### SongBrain — What the Song Is
127
127
 
@@ -173,7 +173,7 @@ Every engine follows: **measure before → act → measure after → compare**.
173
173
 
174
174
  ## Tools
175
175
 
176
- 426 tools across 52 domains. Highlights below — [full catalog here](docs/manual/tool-catalog.md).
176
+ 427 tools across 52 domains. Highlights below — [full catalog here](docs/manual/tool-catalog.md).
177
177
 
178
178
  <br>
179
179
 
@@ -208,7 +208,8 @@ The M4L Analyzer sits on the master track. UDP 9880 carries spectral data to the
208
208
  > Most tools work without the analyzer — it adds 32 spectral/analyzer tools (frequency, loudness, perception) and closes the feedback loop.
209
209
 
210
210
  ```
211
- SPECTRAL ─────── 8-band frequency decomposition (sub → air)
211
+ SPECTRAL ─────── 9-band frequency decomposition (sub_low → air)
212
+ sub_low (20-60 Hz) split off so kick fundamentals don't hide inside sub
212
213
  true RMS / peak metering
213
214
  Krumhansl-Schmuckler key detection
214
215
 
@@ -361,7 +362,7 @@ The V2 intelligence layer. These tools analyze, diagnose, plan, evaluate, and le
361
362
  | Creative Constraints | 5 | constraint activation, reference-inspired variants |
362
363
  | Preview Studio | 5 | variant creation, preview rendering, comparison, commit |
363
364
 
364
- > **[View all 426 tools →](docs/manual/tool-catalog.md)**
365
+ > **[View all 427 tools →](docs/manual/tool-catalog.md)**
365
366
 
366
367
  <br>
367
368
 
@@ -588,7 +589,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for architecture details, code guidelines
588
589
 
589
590
  | Document | What's inside |
590
591
  |----------|---------------|
591
- | [Manual](docs/manual/index.md) | Complete reference: architecture, all 426 tools, workflows |
592
+ | [Manual](docs/manual/index.md) | Complete reference: architecture, all 427 tools, workflows |
592
593
  | [Intelligence Layer](docs/manual/intelligence.md) | How the 12 engines connect — conductor, moves, preview, evaluation |
593
594
  | [Device Atlas](docs/manual/device-atlas.md) | 1305 devices indexed — search, suggest, chain building |
594
595
  | [Samples & Slicing](docs/manual/samples.md) | 3-source search, fitness critics, slice workflows |
@@ -32,27 +32,31 @@ We tap the audio for analysis without affecting the pass-through.
32
32
  4. Add object: `[*~ 0.5]` (scale to prevent clipping)
33
33
  5. Connect: `[+~]` outlet → `[*~ 0.5]` inlet
34
34
 
35
- ## Step 4: 8-Band Spectrum Analysis
36
-
37
- 1. Add object: `[fffb~ 8]` (fast 8-band filter bank)
38
- 2. Connect: `[*~ 0.5]` outlet `[fffb~ 8]` inlet
39
- 3. Set `fffb~` frequencies in Inspector or via message:
40
- - Band 1: 40 Hz (sub)
41
- - Band 2: 130 Hz (low)
42
- - Band 3: 350 Hz (low-mid)
43
- - Band 4: 1000 Hz (mid)
44
- - Band 5: 3000 Hz (high-mid)
45
- - Band 6: 6000 Hz (high)
46
- - Band 7: 10000 Hz (presence)
47
- - Band 8: 16000 Hz (air)
48
-
49
- To set: add `[loadmess 40 130 350 1000 3000 6000 10000 16000]` → `[fffb~ 8]` right inlet
50
-
51
- 4. For each of the 8 outlets of `[fffb~ 8]`:
35
+ ## Step 4: 9-Band Spectrum Analysis
36
+
37
+ (v1.16+ layout. Pre-v1.16 devices used `[fffb~ 8]`; the server still accepts
38
+ 8-band payloads for backward compatibility, but new builds should use 9.)
39
+
40
+ 1. Add object: `[fffb~ 9]` (fast 9-band filter bank)
41
+ 2. Connect: `[*~ 0.5]` outlet → `[fffb~ 9]` inlet
42
+ 3. Set `fffb~` center frequencies in Inspector or via message:
43
+ - Band 1: 35 Hz (sub_low) — kick fundamentals, Villalobos subs
44
+ - Band 2: 85 Hz (sub) — 808s, sub-bass body
45
+ - Band 3: 175 Hz (low) — bass body, warmth
46
+ - Band 4: 350 Hz (low_mid) — mud zone
47
+ - Band 5: 700 Hz (mid) — vocal presence, snare body
48
+ - Band 6: 1400 Hz (high_mid) — consonants, pick attack
49
+ - Band 7: 2800 Hz (high) — presence, intelligibility
50
+ - Band 8: 5600 Hz (presence) — cymbal definition
51
+ - Band 9: 12000 Hz (air) — shimmer, sparkle
52
+
53
+ To set: add `[loadmess 35. 85. 175. 350. 700. 1400. 2800. 5600. 12000.]` → `[fffb~ 9]` right inlet
54
+
55
+ 4. For each of the 9 outlets of `[fffb~ 9]`:
52
56
  - Add `[abs~]` (rectify to positive)
53
57
  - Add `[snapshot~ 200]` (sample at 5 Hz)
54
58
 
55
- 5. Add `[pack f f f f f f f f]` and connect all 8 `[snapshot~]` outlets to it
59
+ 5. Add `[pack f f f f f f f f f]` and connect all 9 `[snapshot~]` outlets to it
56
60
  6. Add `[prepend /spectrum]` → connect from `[pack]`
57
61
  7. Add `[udpsend 127.0.0.1 9880]` → connect from `[prepend]`
58
62
 
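  On the receiving side, the payload built by steps 4-7 can be labelled
  like this. An illustrative sketch, not the server's actual parser; it
  only shows how an 8- or 9-float `/spectrum` payload maps onto the band
  names above.

  ```python
  # v1.16+ nine-band layout (sub_low split off from sub) and the legacy
  # eight-band layout the server still accepts for backward compatibility.
  BANDS_9 = ["sub_low", "sub", "low", "low_mid", "mid",
             "high_mid", "high", "presence", "air"]
  BANDS_8 = ["sub", "low", "low_mid", "mid",
             "high_mid", "high", "presence", "air"]

  def label_spectrum(values: list[float]) -> dict[str, float]:
      """Map a /spectrum payload to named bands, accepting both layouts."""
      if len(values) == 9:
          names = BANDS_9
      elif len(values) == 8:
          names = BANDS_8  # legacy pre-v1.16 .amxd
      else:
          raise ValueError(f"expected 8 or 9 bands, got {len(values)}")
      return dict(zip(names, values))
  ```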
@@ -131,7 +135,7 @@ We tap the audio for analysis without affecting the pass-through.
131
135
 
132
136
  1. Drop `LivePilot Analyzer` on the **master track**
133
137
  2. Play some audio
134
- 3. In Claude Code, run: `get_master_spectrum` — should return 8 band values
138
+ 3. In Claude Code, run: `get_master_spectrum` — should return 9 band values (v1.16+) or 8 values (pre-v1.16 .amxd)
135
139
  4. Run: `get_master_rms` — should return RMS and peak
136
140
  5. After 8+ bars: `get_detected_key` — should return key and scale
137
141
 
@@ -143,7 +147,7 @@ We tap the audio for analysis without affecting the pass-through.
143
147
  │ │
144
148
  plugin~ ──┤──L+R──► plugout~ (pass-through) │
145
149
  │ │
146
- │──L+R──► +~ ──► *~ 0.5 ──┬──► fffb~ 8 ──► UDP │
150
+ │──L+R──► +~ ──► *~ 0.5 ──┬──► fffb~ 9 ──► UDP │
147
151
  │ ├──► peakamp~ ──► UDP │
148
152
  │ ├──► average~ ──► UDP │
149
153
  │ └──► sigmund~ ──► JS │
Binary file
@@ -95,7 +95,7 @@ function anything() {
95
95
  function dispatch(cmd, args) {
96
96
  switch(cmd) {
97
97
  case "ping":
98
- send_response({"ok": true, "version": "1.17.1"});
98
+ send_response({"ok": true, "version": "1.17.2"});
99
99
  break;
100
100
  case "get_params":
101
101
  cmd_get_params(args);
@@ -1,2 +1,2 @@
1
1
  """LivePilot MCP Server — bridges MCP protocol to Ableton Live."""
2
- __version__ = "1.17.1"
2
+ __version__ = "1.17.2"
@@ -471,7 +471,8 @@ class SpectralReceiver(asyncio.DatagramProtocol):
471
471
  """Receives OSC-formatted UDP packets from the M4L device.
472
472
 
473
473
  OSC messages:
474
- /spectrum f f f f f f f f — 8-band spectrum
474
+ /spectrum f f f f f f f f [f] — 8 or 9 band spectrum
475
+ (9 = v1.16+ with sub_low; 8 = legacy)
475
476
  /peak f — peak level
476
477
  /rms f — RMS level
477
478
  /pitch f f — MIDI note, amplitude
@@ -11,6 +11,7 @@ import json
11
11
  import time
12
12
  from typing import Optional
13
13
 
14
+ from ..runtime.degradation import DegradationInfo
14
15
  from .models import PreviewSet, PreviewVariant
15
16
 
16
17
 
@@ -52,7 +53,10 @@ def create_preview_set(
52
53
  kernel: the live session kernel (track topology + device chains). Compilers
53
54
  resolve targets from it — without it, variants degrade into no-ops or
54
55
  generic reads. Callers that have a `ctx` should fetch a real kernel
55
- via runtime.tools.get_session_kernel(ctx).
56
+ via runtime.tools.get_session_kernel(ctx). When omitted the engine
57
+ synthesizes an empty-but-valid kernel (see ``_build_triptych``) and
58
+ flags the resulting PreviewSet with ``degradation.is_degraded=True``
59
+ so callers can tell a synthesized compile from a real one.
56
60
  """
57
61
  set_id = _compute_set_id(request_text, kernel_id)
58
62
  now = int(time.time() * 1000)
@@ -61,6 +65,18 @@ def create_preview_set(
61
65
  song_brain = song_brain or {}
62
66
  taste_graph = taste_graph or {}
63
67
 
68
+ # Degradation bookkeeping — if the caller didn't supply a kernel the
69
+ # compiler receives a synthesized one (see engine.py line 128 area)
70
+ # and every variant is scored against that synthetic topology.
71
+ if kernel:
72
+ degradation = DegradationInfo()
73
+ else:
74
+ degradation = DegradationInfo(
75
+ is_degraded=True,
76
+ reasons=["empty_kernel_fallback"],
77
+ substituted_fields=["compile_kernel"],
78
+ )
79
+
64
80
  if strategy == "creative_triptych":
65
81
  variants = _build_triptych(
66
82
  request_text, moves, song_brain, taste_graph, set_id, now, kernel,
@@ -79,6 +95,7 @@ def create_preview_set(
79
95
  source_kernel_id=kernel_id,
80
96
  variants=variants,
81
97
  created_at_ms=now,
98
+ degradation=degradation,
82
99
  )
83
100
  store_preview_set(ps)
84
101
  return ps
@@ -258,31 +275,66 @@ def _build_binary(
258
275
  # ── Comparison ────────────────────────────────────────────────────
259
276
 
260
277
 
278
+ _NON_EXECUTABLE_STATUSES = {"blocked", "failed"}
279
+
280
+
281
+ def _is_executable(variant: PreviewVariant) -> bool:
282
+ """A variant is executable when it has a compiled plan AND its status
283
+ hasn't been flagged as blocked/failed upstream.
284
+
285
+ The compiled plan may be a non-empty list of steps OR a dict with a
286
+ non-empty ``steps`` key — both shapes exist in the wild.
287
+ """
288
+ if variant.status in _NON_EXECUTABLE_STATUSES:
289
+ return False
290
+ plan = variant.compiled_plan
291
+ if plan is None:
292
+ return False
293
+ if isinstance(plan, list):
294
+ return len(plan) > 0
295
+ if isinstance(plan, dict):
296
+ return len(plan.get("steps") or []) > 0
297
+ # Any other truthy shape is treated as executable; falsy as not.
298
+ return bool(plan)
299
+
300
+
261
301
  def compare_variants(
262
302
  preview_set: PreviewSet,
263
303
  criteria: Optional[dict] = None,
264
304
  ) -> dict:
265
- """Compare variants within a preview set and rank them."""
305
+ """Compare variants within a preview set and rank them.
306
+
307
+ Truth-gap fix (PR-A): variants that are blocked/failed OR lack a
308
+ compiled_plan are partitioned out of the scored ranking. They appear
309
+ in ``analytical_candidates`` (just their variant_ids) and ALSO stay
310
+ in ``rankings`` at the bottom for introspection, but they can never
311
+ populate ``recommended``. When no executable variant exists,
312
+ ``recommended`` is ``None`` so callers can surface a clear message
313
+ instead of silently committing a no-op.
314
+ """
266
315
  criteria = criteria or {}
267
316
  weight_taste = criteria.get("taste_weight", 0.3)
268
317
  weight_novelty = criteria.get("novelty_weight", 0.2)
269
318
  weight_identity = criteria.get("identity_weight", 0.5)
270
319
 
271
- rankings = []
320
+ executable: list[PreviewVariant] = []
321
+ analytical: list[PreviewVariant] = []
272
322
  for v in preview_set.variants:
273
- # Score components
323
+ (executable if _is_executable(v) else analytical).append(v)
324
+
325
+ def _score(v: PreviewVariant) -> float:
274
326
  taste_score = v.taste_fit
275
327
  novelty_score = 1.0 - abs(v.novelty_level - 0.5) * 2 # bell curve around 0.5
276
328
  identity_score = _identity_effect_score(v.identity_effect)
277
-
278
329
  composite = (
279
330
  taste_score * weight_taste
280
331
  + novelty_score * weight_novelty
281
332
  + identity_score * weight_identity
282
333
  )
283
- v.score = round(composite, 3)
334
+ return round(composite, 3)
284
335
 
285
- rankings.append({
336
+ def _row(v: PreviewVariant) -> dict:
337
+ return {
286
338
  "variant_id": v.variant_id,
287
339
  "label": v.label,
288
340
  "score": v.score,
@@ -292,13 +344,35 @@ def compare_variants(
292
344
  "summary": v.intent,
293
345
  "what_preserved": v.what_preserved,
294
346
  "why_it_matters": v.why_it_matters,
295
- })
296
-
297
- rankings.sort(key=lambda r: r["score"], reverse=True)
347
+ "status": v.status,
348
+ }
349
+
350
+ executable_rows: list[dict] = []
351
+ for v in executable:
352
+ v.score = _score(v)
353
+ executable_rows.append(_row(v))
354
+ executable_rows.sort(key=lambda r: r["score"], reverse=True)
355
+
356
+ # Analytical variants still get a score computed so introspection
357
+ # shows the same shape, but they're appended AFTER the sorted
358
+ # executables so they can never land at position 0.
359
+ analytical_rows: list[dict] = []
360
+ for v in analytical:
361
+ v.score = _score(v)
362
+ analytical_rows.append(_row(v))
363
+
364
+ rankings = executable_rows + analytical_rows
365
+
366
+ recommended: Optional[str]
367
+ if executable_rows:
368
+ recommended = executable_rows[0]["variant_id"]
369
+ else:
370
+ recommended = None
298
371
 
299
372
  comparison = {
300
373
  "rankings": rankings,
301
- "recommended": rankings[0]["variant_id"] if rankings else "",
374
+ "recommended": recommended,
375
+ "analytical_candidates": [v.variant_id for v in analytical],
302
376
  "criteria_used": {
303
377
  "taste_weight": weight_taste,
304
378
  "novelty_weight": weight_novelty,
@@ -6,6 +6,8 @@ import time
6
6
  from dataclasses import asdict, dataclass, field
7
7
  from typing import Optional
8
8
 
9
+ from ..runtime.degradation import DegradationInfo
10
+
9
11
 
10
12
  @dataclass
11
13
  class PreviewVariant:
@@ -59,6 +61,11 @@ class PreviewSet:
59
61
  committed_variant_id: str = ""
60
62
  status: str = "pending" # pending, compared, committed, discarded
61
63
  created_at_ms: int = field(default_factory=lambda: int(time.time() * 1000))
64
+ # Degradation signalling — set when the engine substituted a fallback
65
+ # (e.g. an empty-but-valid kernel) during variant compilation. Callers
66
+ # can inspect .degradation.is_degraded to tell synthesized preview
67
+ # topology apart from a real kernel-backed compile.
68
+ degradation: DegradationInfo = field(default_factory=DegradationInfo)
62
69
 
63
70
  def to_dict(self) -> dict:
64
71
  return {
@@ -71,4 +78,5 @@ class PreviewSet:
71
78
  "committed_variant_id": self.committed_variant_id,
72
79
  "status": self.status,
73
80
  "variant_count": len(self.variants),
81
+ "degradation": self.degradation.to_dict(),
74
82
  }
@@ -270,7 +270,68 @@ async def commit_preview_variant(
270
270
  if not ps:
271
271
  return {"error": f"Preview set {set_id} not found"}
272
272
 
273
+ # Resolve the chosen variant WITHOUT mutating state yet. We have to
274
+ # short-circuit analytical-only / blocked picks BEFORE engine.commit_variant
275
+ # runs, otherwise `preview_set.status` gets flipped to "committed" and
276
+ # sibling variants get discarded even though nothing executed.
277
+ chosen = None
278
+ for v in ps.variants:
279
+ if v.variant_id == variant_id:
280
+ chosen = v
281
+ break
282
+ if not chosen:
283
+ available = [v.variant_id for v in ps.variants]
284
+ return {
285
+ "error": f"Variant {variant_id} not found in set {set_id}",
286
+ "available_variants": available,
287
+ }
288
+
289
+ # ── Truth-gap guard: refuse to "commit" a variant that can't execute ──
290
+ # If the variant was flagged blocked/failed upstream or lacks a
291
+ # compiled plan, the old code still marked preview_set.status='committed'
292
+ # and returned committed=False as a silent contradiction. Close that
293
+ # gap: return an honest no-op and leave state untouched so the caller
294
+ # can pick a different variant.
295
+ plan = chosen.compiled_plan
296
+ plan_is_empty = (
297
+ plan is None
298
+ or (isinstance(plan, list) and len(plan) == 0)
299
+ or (isinstance(plan, dict) and len(plan.get("steps") or []) == 0)
300
+ )
301
+ blocked = chosen.status in {"blocked", "failed"}
302
+ if plan_is_empty or blocked:
303
+ reason = "blocked" if blocked and plan_is_empty is False else "analytical_only"
304
+ return {
305
+ "committed": False,
306
+ "status": reason,
307
+ "reason": reason,
308
+ "preview_set_id": set_id,
309
+ "variant_id": chosen.variant_id,
310
+ "label": chosen.label,
311
+ "intent": chosen.intent,
312
+ "move_id": chosen.move_id,
313
+ "identity_effect": chosen.identity_effect,
314
+ "what_preserved": chosen.what_preserved,
315
+ "message": (
316
+ "chose analytical variant; no session changes applied"
317
+ if reason == "analytical_only"
318
+ else "variant is blocked; no session changes applied"
319
+ ),
320
+ "note": (
321
+ "Variant has no compiled plan (analytical-only). Preview set "
322
+ "was left in its pre-commit state so you can pick a different "
323
+ "variant."
324
+ if reason == "analytical_only"
325
+ else "Variant is blocked/failed. Preview set was left in its "
326
+ "pre-commit state so you can pick a different variant."
327
+ ),
328
+ }
329
+
330
+ # Only now do we flip state — the chosen variant has an executable plan.
273
331
  chosen = engine.commit_variant(ps, variant_id)
332
+ # engine.commit_variant cannot return None here (we already verified
333
+ # the variant_id exists), but keep the defensive check for the type
334
+ # checker.
274
335
  if not chosen:
275
336
  available = [v.variant_id for v in ps.variants]
276
337
  return {
@@ -289,55 +350,44 @@ async def commit_preview_variant(
289
350
  }
290
351
 
291
352
  # ── v1.10.3: actually execute the compiled plan ──
292
- # If there's no compiled plan, the variant is analytical-only — record
293
- # the choice and return honestly instead of pretending it was applied.
294
- if not chosen.compiled_plan:
295
- result["committed"] = False
296
- result["status"] = "analytical_only"
297
- result["note"] = (
298
- "Variant has no compiled plan (analytical-only). Preview set "
299
- "marked the choice but no session changes were made. Use an "
300
- "executable variant if you want the commit to apply changes."
301
- )
353
+ from ..runtime.execution_router import execute_plan_steps_async
354
+ plan = chosen.compiled_plan
355
+ steps = plan if isinstance(plan, list) else plan.get("steps", []) or []
356
+ ableton = _get_ableton(ctx)
357
+ bridge = ctx.lifespan_context.get("m4l")
358
+ mcp_registry = ctx.lifespan_context.get("mcp_dispatch", {})
359
+
360
+ exec_results = await execute_plan_steps_async(
361
+ steps,
362
+ ableton=ableton,
363
+ bridge=bridge,
364
+ mcp_registry=mcp_registry,
365
+ ctx=ctx,
366
+ stop_on_failure=False,
367
+ )
368
+ log = [
369
+ {
370
+ "tool": r.tool,
371
+ "backend": r.backend,
372
+ "ok": r.ok,
373
+ **({"result": r.result} if r.ok else {"error": r.error}),
374
+ }
375
+ for r in exec_results
376
+ ]
377
+ steps_ok = sum(1 for r in exec_results if r.ok)
378
+ steps_failed = len(exec_results) - steps_ok
379
+
380
+ result["execution_log"] = log
381
+ result["steps_ok"] = steps_ok
382
+ result["steps_failed"] = steps_failed
383
+
384
+ if steps_failed == 0 and steps_ok > 0:
385
+ result["status"] = "committed"
386
+ elif steps_ok > 0:
387
+ result["status"] = "committed_with_errors"
302
388
  else:
303
- from ..runtime.execution_router import execute_plan_steps_async
304
- plan = chosen.compiled_plan
305
- steps = plan if isinstance(plan, list) else plan.get("steps", []) or []
306
- ableton = _get_ableton(ctx)
307
- bridge = ctx.lifespan_context.get("m4l")
308
- mcp_registry = ctx.lifespan_context.get("mcp_dispatch", {})
309
-
310
- exec_results = await execute_plan_steps_async(
311
- steps,
312
- ableton=ableton,
313
- bridge=bridge,
314
- mcp_registry=mcp_registry,
315
- ctx=ctx,
316
- stop_on_failure=False,
317
- )
318
- log = [
319
- {
320
- "tool": r.tool,
321
- "backend": r.backend,
322
- "ok": r.ok,
323
- **({"result": r.result} if r.ok else {"error": r.error}),
324
- }
325
- for r in exec_results
326
- ]
327
- steps_ok = sum(1 for r in exec_results if r.ok)
328
- steps_failed = len(exec_results) - steps_ok
329
-
330
- result["execution_log"] = log
331
- result["steps_ok"] = steps_ok
332
- result["steps_failed"] = steps_failed
333
-
334
- if steps_failed == 0 and steps_ok > 0:
335
- result["status"] = "committed"
336
- elif steps_ok > 0:
337
- result["status"] = "committed_with_errors"
338
- else:
339
- result["status"] = "failed"
340
- result["committed"] = False
389
+ result["status"] = "failed"
390
+ result["committed"] = False
341
391
 
342
392
  # Wonder lifecycle hooks
343
393
  ws = _find_wonder_session_by_preview(set_id)