npm - livepilot - Versions diffs - 1.17.2 → 1.17.4 - Mend

livepilot 1.17.2 → 1.17.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/CHANGELOG.md +107 -0
package/m4l_device/LivePilot_Analyzer.amxd +0 -0
package/m4l_device/livepilot_bridge.js +1 -1
package/mcp_server/__init__.py +1 -1
package/mcp_server/preview_studio/tools.py +20 -14
package/mcp_server/runtime/tools.py +29 -2
package/mcp_server/tools/_agent_os_engine/iteration.py +151 -14
package/package.json +1 -1
package/remote_script/LivePilot/__init__.py +1 -1
package/server.json +2 -2

package/CHANGELOG.md CHANGED

@@ -1,5 +1,112 @@
 # Changelog
+## 1.17.4 — Shape cleanup + memory probe (April 23 2026)
+### Fixed
+- **`get_session_kernel` now probes the memory store** instead of
+  hardcoding `memory_ok=True` (`mcp_server/runtime/tools.py`). If the
+  underlying technique store raises on `list_techniques` (disk full,
+  corrupted index, permissions error), the kernel previously still
+  reported memory as available to orchestration planners. Same
+  truth-gap class as the v1.17.3 web/flucoma fix — should have been
+  caught by the same review pass. Now probed the same way
+  `get_capability_state` does, wrapped in try/except.
+- **`capability_state` flat shape** in session kernel
+  (`mcp_server/runtime/tools.py`): `state.to_dict()` wraps its output as
+  `{"capability_state": {...}}` — that's the right shape for the
+  standalone `get_capability_state` tool, but when stored on the kernel
+  it produced the ugly double-nested
+  `kernel["capability_state"]["capability_state"]["domains"]`. v1.17.3
+  probe tests worked around it with defensive
+  `outer.get("capability_state", outer)`. Fix: unwrap the outer key
+  once before passing to `build_session_kernel`. Consumer path is
+  now `kernel["capability_state"]["domains"]` directly. Standalone
+  `get_capability_state` return shape unchanged.
+### Tests
+- 4 new TDD tests in `tests/test_runtime_capability_probes.py`:
+  - memory probe raises → kernel reports memory unavailable
+  - memory probe succeeds → kernel reports available
+  - kernel's capability_state has no nested `capability_state` key
+  - end-to-end flat access without defensive fallbacks
+- Consumer updates:
+  - `test_session_kernel.py:203` — removed extra level
+  - `test_runtime_capability_probes.py` (4 places) — removed
+    defensive `outer.get('capability_state', outer)` pattern now that
+    the shape is known-flat
+2722 → 2726 passing.
+### Known follow-up
+Audit while writing this release flagged a third bug in
+`mcp_server/runtime/safety_kernel.py:244`: the safety kernel reads
+`capability_state.get("mode", "normal")` but the actual shape uses
+`overall_mode`, not `mode`. The `.get(..., "normal")` default silently
+falls back, so `read_only` mode gating never kicks in. Separate fix,
+out of scope for this release.
+## 1.17.3 — Truth-gap remediation, for real (April 23 2026)
+### Fixed
+- **`iterate_toward_goal` now inspects `commit_fn` return value** (P1,
+  `mcp_server/tools/_agent_os_engine/iteration.py`): prior to this release
+  the iteration loop awaited the commit callback and dropped the return
+  value on the floor, then unconditionally returned `status="committed"`.
+  If the underlying `commit_branch_async` applied zero steps or partially
+  succeeded, the iteration result claimed success — the exact bug pattern
+  the release was meant to fix elsewhere. New `_classify_commit_result()`
+  helper maps known payload shapes to three statuses: `"committed"` (clean),
+  `"committed_with_errors"` (steps_ok > 0 AND steps_failed > 0), and
+  `"commit_failed"` (committed=False, ok=False, status="failed", or
+  steps_ok == 0). Both sync and async cores now zero out
+  `committed_experiment_id`/`committed_branch_id` when the commit truly
+  failed, and surface the raw commit payload on `IterationResult.commit_result`.
+- **Preview Studio commit-before-execute ordering** (P1,
+  `mcp_server/preview_studio/tools.py`): `commit_preview_variant()` called
+  `engine.commit_variant()` BEFORE `execute_plan_steps_async` ran. That
+  flipped `preview_set.status = "committed"` and `committed_variant_id`
+  up front, so when every execution step failed the response correctly
+  said `committed: false / status: "failed"` but the stored state still
+  said the opposite. Wonder lifecycle advance also fired regardless.
+  Reorder: execute first, then flip state only when `steps_ok > 0`.
+  Zero-success path now returns honestly and leaves `preview_set` and
+  WonderSession untouched. Partial-success stays a legitimate commit
+  with `status="committed_with_errors"`.
+- **`get_session_kernel` propagates web + flucoma probe results** (P2,
+  `mcp_server/runtime/tools.py`): the kernel builder called
+  `build_capability_state(...)` with only session/analyzer/memory params,
+  so `web_ok` and `flucoma_ok` silently defaulted to `False`. Meanwhile
+  `get_capability_state()` correctly probed both. Planners that read
+  the kernel (the documented orchestration entrypoint) stayed on
+  degraded paths even when probes would have reported available. Fix:
+  call `_probe_web()` + `_probe_flucoma()` inside `get_session_kernel`
+  and pass through.
+### Added
+- **10 new tests** covering the three truth-gap classes:
+  - `test_iterate_toward_goal.py`: 4 tests for commit inspection
+    (failed commit, partial commit, timeout commit_best, back-compat
+    clean success).
+  - `test_preview_studio_truth_gap.py`: 3 tests for
+    executable-variant-fails paths (all-steps-fail preserves state,
+    Wonder not advanced, partial-success honest commit).
+  - `test_runtime_capability_probes.py`: 3 tests for kernel
+    propagation (web probe → kernel, flucoma probe → kernel,
+    both-unavailable back-compat).
+- **`IterationResult.commit_result`** — the raw commit_fn payload,
+  surfaced on the returned dict whenever a commit was attempted.
+  Callers can inspect `result["commit_result"]["steps_failed"]`,
+  `result["commit_result"]["error"]`, etc.
+This release closes what the post-v1.17.2 review correctly flagged:
+the feature we shipped to "close the evaluation loop" had a truth-gap
+at the innermost step. 2712 → 2722 tests pass.
 ## 1.17.2 — iterate_toward_goal + preview-studio truth-gap (April 23 2026)
 ### Added
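The known follow-up in the 1.17.4 entry is a silent-default bug: a `.get()` on a key that never exists always returns the fallback. The sketch below is illustrative only. It assumes a flat `capability_state` dict with an `overall_mode` key whose value can be `"read_only"`, and its function names are hypothetical, not taken from `safety_kernel.py`. It contrasts the lookup the changelog describes with a strict variant that fails loudly when the key is missing.

```python
from typing import Any


def mode_lookup_as_described(capability_state: dict[str, Any]) -> str:
    # The lookup the changelog attributes to safety_kernel.py:244.
    # "mode" is never present in the real shape, so this always returns the default.
    return capability_state.get("mode", "normal")


def mode_lookup_strict(capability_state: dict[str, Any]) -> str:
    # Hypothetical fix: read the key the shape actually uses, and raise
    # instead of silently assuming "normal" if it is missing.
    if "overall_mode" not in capability_state:
        raise KeyError("capability_state is missing 'overall_mode'")
    return capability_state["overall_mode"]


def writes_allowed(capability_state: dict[str, Any]) -> bool:
    # Hypothetical gate for mutating tools, built on the strict lookup.
    return mode_lookup_strict(capability_state) != "read_only"


# Assumed flat capability_state as of 1.17.4 (other fields omitted).
example = {"overall_mode": "read_only", "domains": {}}
print(mode_lookup_as_described(example))  # normal  (read_only gating skipped)
print(writes_allowed(example))            # False
```

Raising on a missing key is one design choice. Keeping a default but reading the correct key would also close this gap, at the cost of hiding any future shape drift the same way.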

package/m4l_device/LivePilot_Analyzer.amxd CHANGED

Binary file

package/m4l_device/livepilot_bridge.js CHANGED

@@ -95,7 +95,7 @@ function anything() {
 function dispatch(cmd, args) {
     switch(cmd) {
         case "ping":
-            send_response({"ok": true, "version": "1.17.2"});
+            send_response({"ok": true, "version": "1.17.4"});
             break;
         case "get_params":
             cmd_get_params(args);

package/mcp_server/__init__.py CHANGED

@@ -1,2 +1,2 @@
 """LivePilot MCP Server — bridges MCP protocol to Ableton Live."""
-__version__ = "1.17.2"
+__version__ = "1.17.4"

package/mcp_server/preview_studio/tools.py CHANGED

@@ -327,20 +327,17 @@ async def commit_preview_variant(
             ),
         }
-    # Only now do we flip state — the chosen variant has an executable plan.
-    chosen = engine.commit_variant(ps, variant_id)
-    # engine.commit_variant cannot return None here (we already verified
-    # the variant_id exists), but keep the defensive check for the type
-    # checker.
-    if not chosen:
-        available = [v.variant_id for v in ps.variants]
-        return {
-            "error": f"Variant {variant_id} not found in set {set_id}",
-            "available_variants": available,
-        }
+    # ── P1#2 fix (v1.17.3): execute BEFORE flipping state ──
+    # Prior behavior: engine.commit_variant() ran here, BEFORE execution.
+    # If every step then failed, the returned payload correctly said
+    # committed=False / status='failed' — but preview_set.status was
+    # already "committed" and Wonder lifecycle advance fired regardless.
+    # Response and stored state contradicted each other.
+    #
+    # New flow: we already have `chosen` from the resolution block above.
+    # Execute the plan first, count successes, THEN flip state only when
+    # at least one step actually applied. Zero successes = honest no-op.
     result = {
-        "committed": True,
         "variant_id": chosen.variant_id,
         "label": chosen.label,
         "intent": chosen.intent,
@@ -381,15 +378,24 @@ async def commit_preview_variant(
     result["steps_ok"] = steps_ok
     result["steps_failed"] = steps_failed
+    # ── P1#2: only flip preview-set state when at least one step succeeded ──
     if steps_failed == 0 and steps_ok > 0:
         result["status"] = "committed"
+        result["committed"] = True
+        engine.commit_variant(ps, variant_id)
     elif steps_ok > 0:
         result["status"] = "committed_with_errors"
+        result["committed"] = True  # partial but real commit
+        engine.commit_variant(ps, variant_id)
     else:
+        # Every step failed — do NOT flip preview-set state, do NOT advance
+        # Wonder. The response already reflects the failure; the stored
+        # state must agree.
         result["status"] = "failed"
         result["committed"] = False
+        return result
-    # Wonder lifecycle hooks
+    # Wonder lifecycle hooks — only reached when steps_ok > 0.
     ws = _find_wonder_session_by_preview(set_id)
     if ws:
         ws.selected_variant_id = variant_id
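The reordering in `commit_preview_variant()` comes down to "execute, count, then mutate". Below is a minimal sketch of that ordering under stated assumptions: `PreviewSet`, `WonderSessionStub` and `commit_variant_execute_first` are hypothetical stand-ins, and plan steps are modelled as zero-argument callables that raise on failure. The real code instead goes through `execute_plan_steps_async` and `engine.commit_variant()`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PreviewSet:
    # Hypothetical stand-in for the stored preview set.
    status: str = "pending"
    committed_variant_id: Optional[str] = None


@dataclass
class WonderSessionStub:
    # Hypothetical stand-in for the Wonder lifecycle state.
    selected_variant_id: Optional[str] = None


def commit_variant_execute_first(
    ps: PreviewSet,
    variant_id: str,
    steps: list[Callable[[], None]],
    wonder: Optional[WonderSessionStub] = None,
) -> dict:
    """Run every step, then flip stored state only if at least one step applied."""
    steps_ok = steps_failed = 0
    for step in steps:
        try:
            step()
            steps_ok += 1
        except Exception:
            steps_failed += 1

    result = {"variant_id": variant_id, "steps_ok": steps_ok, "steps_failed": steps_failed}
    if steps_ok == 0:
        # Zero successes: report failure, leave preview set and Wonder untouched.
        result.update(status="failed", committed=False)
        return result

    result["status"] = "committed" if steps_failed == 0 else "committed_with_errors"
    result["committed"] = True
    # Only now mutate stored state, so the response and the state agree.
    ps.status = "committed"
    ps.committed_variant_id = variant_id
    if wonder is not None:
        wonder.selected_variant_id = variant_id
    return result
```

As in the diff, a partial success still counts as a real commit; only the all-steps-failed path returns early without touching state.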

package/mcp_server/runtime/tools.py CHANGED

@@ -179,11 +179,29 @@ def get_session_kernel(
         if analyzer_ok:
             analyzer_fresh = spectral.get("spectrum") is not None
+    # P2#3 (v1.17.3): probe web + flucoma the same way get_capability_state
+    # does, and propagate through. Without this the kernel's capability view
+    # lies to orchestration planners.
+    web_ok = _probe_web()
+    flucoma_ok = _probe_flucoma()
+    # v1.17.4: probe memory the same way too. Previously memory_ok=True was
+    # hardcoded — if the store raised, the kernel still reported memory
+    # available. Same truth-gap class as the v1.17.3 web/flucoma fix.
+    memory_ok = False
+    try:
+        _memory_store.list_techniques(limit=1)
+        memory_ok = True
+    except Exception as exc:
+        logger.debug("get_session_kernel memory probe failed: %s", exc)
     state = build_capability_state(
         session_ok=session_ok,
         analyzer_ok=analyzer_ok,
         analyzer_fresh=analyzer_fresh,
-        memory_ok=True,
+        memory_ok=memory_ok,
+        web_ok=web_ok,
+        flucoma_ok=flucoma_ok,
     )
     # Optional subcomponents — degrade gracefully, but reach into the SAME
@@ -240,9 +258,18 @@ def get_session_kernel(
     except Exception as e:
         kernel_warnings.append(f"session_memory_unavailable: {e}")
+    # v1.17.4: state.to_dict() wraps its output as {"capability_state": {...}}
+    # because that shape is what the standalone get_capability_state tool
+    # returns. When building the session kernel, that wrapper becomes the
+    # ugly double-nested kernel["capability_state"]["capability_state"]["domains"]
+    # path. Unwrap once here so kernel consumers get
+    # kernel["capability_state"]["domains"] directly.
+    _cap_dict = state.to_dict()
+    _cap_flat = _cap_dict.get("capability_state", _cap_dict)
     kernel = build_session_kernel(
         session_info=session_info,
-        capability_state=state.to_dict(),
+        capability_state=_cap_flat,
         request_text=request_text,
         mode=mode,
         aggression=aggression,
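Both `get_session_kernel` changes follow small, reusable patterns: probe a dependency with a cheap call instead of hardcoding availability, and unwrap one known outer key. The sketch below isolates them. `TechniqueStore`, `probe_memory`, `flatten_capability_state` and the two toy stores are hypothetical names; the only store method assumed is `list_techniques(limit=...)`, the call the diff itself makes.

```python
import logging
from typing import Any, Protocol

logger = logging.getLogger(__name__)


class TechniqueStore(Protocol):
    # Assumed interface: only the method the probe calls.
    def list_techniques(self, limit: int) -> Any: ...


def probe_memory(store: TechniqueStore) -> bool:
    """Return True only if a cheap read against the store succeeds."""
    try:
        store.list_techniques(limit=1)
        return True
    except Exception as exc:
        logger.debug("memory probe failed: %s", exc)
        return False


def flatten_capability_state(cap_dict: dict[str, Any]) -> dict[str, Any]:
    """Unwrap {"capability_state": {...}} once; pass flat dicts through unchanged."""
    return cap_dict.get("capability_state", cap_dict)


class BrokenStore:
    # Toy store simulating the "disk full" failure the changelog mentions.
    def list_techniques(self, limit: int) -> Any:
        raise OSError("disk full")


class WorkingStore:
    def list_techniques(self, limit: int) -> Any:
        return []
```

Because the unwrap falls back to the dict itself, applying it to an already-flat dict is a no-op, so callers stay correct if the wrapper is ever dropped upstream.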

package/mcp_server/tools/_agent_os_engine/iteration.py CHANGED

@@ -39,11 +39,16 @@ class IterationResult:
     """Final result of iterate_toward_goal.
     status:
-      - "committed" — a winner hit threshold, was committed permanently
-      - "exhausted" — max_iterations reached, committed best-so-far (on_timeout=commit_best)
+      - "committed" — a winner hit threshold AND commit succeeded (steps_ok>0, steps_failed==0)
+      - "committed_with_errors" — commit applied some steps but not all (steps_ok>0 AND steps_failed>0)
+      - "commit_failed" — commit was attempted but applied zero steps (steps_ok==0 OR committed:false)
+      - "exhausted" — max_iterations reached, committed best-so-far cleanly (on_timeout=commit_best)
       - "timeout_no_commit" — max_iterations reached, no commit (on_timeout=discard_on_timeout)
       - "no_candidates" — caller provided empty candidate_move_sets
-      - "error" — unrecoverable error; see reason
+    commit_result: the raw dict returned by commit_fn, surfaced for caller
+      inspection. Populated whenever commit_fn was called (regardless of
+      whether the commit succeeded). None when no commit was attempted.
     """
     status: str
     iterations_run: int
@@ -52,9 +57,10 @@ class IterationResult:
     final_score: float
     steps: list[IterationStep] = field(default_factory=list)
     reason: str = ""
+    commit_result: Optional[dict] = None
     def to_dict(self) -> dict:
-        return {
+        d = {
             "status": self.status,
             "iterations_run": self.iterations_run,
             "committed_experiment_id": self.committed_experiment_id,
@@ -63,6 +69,55 @@ class IterationResult:
             "steps": [s.to_dict() for s in self.steps],
             "reason": self.reason,
         }
+        if self.commit_result is not None:
+            d["commit_result"] = self.commit_result
+        return d
+def _classify_commit_result(result: Any) -> str:
+    """Inspect a commit_fn return value and classify into an IterationResult
+    status. Conservative: any failure signal produces 'commit_failed', any
+    partial signal produces 'committed_with_errors', only clean success
+    produces 'committed'.
+    Known failure signals:
+      - {"committed": False, ...}
+      - {"status": "failed", ...}
+      - {"ok": False, ...}
+      - {"error": ...} present at top level (unless committed explicitly True)
+      - {"steps_ok": 0, ...}
+    Known partial signals:
+      - {"status": "committed_with_errors", ...}
+      - {"steps_failed": N, "steps_ok": M>0} where N>0
+    """
+    if not isinstance(result, dict):
+        # Non-dict returns: trust the caller but don't confirm partial/error.
+        return "committed"
+    # Hard failure signals
+    if result.get("committed") is False:
+        return "commit_failed"
+    if result.get("ok") is False:
+        return "commit_failed"
+    if result.get("status") == "failed":
+        return "commit_failed"
+    steps_ok = result.get("steps_ok")
+    steps_failed = result.get("steps_failed")
+    if steps_ok == 0 and (steps_failed is None or steps_failed > 0):
+        return "commit_failed"
+    # Partial success
+    if result.get("status") == "committed_with_errors":
+        return "committed_with_errors"
+    if (
+        isinstance(steps_failed, int) and steps_failed > 0
+        and isinstance(steps_ok, int) and steps_ok > 0
+    ):
+        return "committed_with_errors"
+    # Otherwise: clean success
+    return "committed"
 def iterate_toward_goal_engine(
@@ -195,15 +250,37 @@ def _iterate_sync_core(
             # otherwise the old non-winning experiment leaks in the store.
             if best_exp_id is not None and best_exp_id != exp_id:
                 discard_fn(best_exp_id)
-            commit_fn(exp_id, winner_branch_id)
+            commit_payload = commit_fn(exp_id, winner_branch_id)
+            commit_status = _classify_commit_result(commit_payload)
+            commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+            if commit_status == "commit_failed":
+                return IterationResult(
+                    status="commit_failed",
+                    iterations_run=i + 1,
+                    committed_experiment_id=None,
+                    committed_branch_id=None,
+                    final_score=winner_score,
+                    steps=steps,
+                    reason=(
+                        f"threshold {threshold} met on iteration {i} but commit "
+                        f"applied no steps; see commit_result"
+                    ),
+                    commit_result=commit_dict,
+                )
             return IterationResult(
-                status="committed",
+                status=commit_status,  # "committed" or "committed_with_errors"
                 iterations_run=i + 1,
                 committed_experiment_id=exp_id,
                 committed_branch_id=winner_branch_id,
                 final_score=winner_score,
                 steps=steps,
-                reason=f"threshold {threshold} met on iteration {i}",
+                reason=(
+                    f"threshold {threshold} met on iteration {i}"
+                    if commit_status == "committed"
+                    else f"threshold {threshold} met on iteration {i}; "
+                         f"commit applied with partial failures (see commit_result)"
+                ),
+                commit_result=commit_dict,
             )
         if winner_branch_id is not None and winner_score > best_score:
@@ -217,9 +294,26 @@ def _iterate_sync_core(
             discard_fn(exp_id)
     if on_timeout == "commit_best" and best_exp_id and best_branch_id:
-        commit_fn(best_exp_id, best_branch_id)
+        commit_payload = commit_fn(best_exp_id, best_branch_id)
+        commit_status = _classify_commit_result(commit_payload)
+        commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+        if commit_status == "commit_failed":
+            return IterationResult(
+                status="commit_failed",
+                iterations_run=n,
+                committed_experiment_id=None,
+                committed_branch_id=None,
+                final_score=best_score,
+                steps=steps,
+                reason=(
+                    f"max_iterations={n} reached; commit_best selected best-so-far "
+                    f"(score {best_score}) but the commit applied no steps; "
+                    f"see commit_result"
+                ),
+                commit_result=commit_dict,
+            )
         return IterationResult(
-            status="exhausted",
+            status="exhausted" if commit_status == "committed" else "committed_with_errors",
             iterations_run=n,
             committed_experiment_id=best_exp_id,
             committed_branch_id=best_branch_id,
@@ -228,7 +322,9 @@ def _iterate_sync_core(
             reason=(
                 f"max_iterations={n} reached, threshold {threshold} never met; "
                 f"committed best-so-far with score {best_score}"
+                + ("" if commit_status == "committed" else " (partial commit — see commit_result)")
             ),
+            commit_result=commit_dict,
         )
     if best_exp_id:
@@ -296,15 +392,37 @@ async def _iterate_async_core(
         if met:
             if best_exp_id is not None and best_exp_id != exp_id:
                 await _maybe_await(discard_fn(best_exp_id))
-            await _maybe_await(commit_fn(exp_id, winner_branch_id))
+            commit_payload = await _maybe_await(commit_fn(exp_id, winner_branch_id))
+            commit_status = _classify_commit_result(commit_payload)
+            commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+            if commit_status == "commit_failed":
+                return IterationResult(
+                    status="commit_failed",
+                    iterations_run=i + 1,
+                    committed_experiment_id=None,
+                    committed_branch_id=None,
+                    final_score=winner_score,
+                    steps=steps,
+                    reason=(
+                        f"threshold {threshold} met on iteration {i} but commit "
+                        f"applied no steps; see commit_result"
+                    ),
+                    commit_result=commit_dict,
+                )
             return IterationResult(
-                status="committed",
+                status=commit_status,
                 iterations_run=i + 1,
                 committed_experiment_id=exp_id,
                 committed_branch_id=winner_branch_id,
                 final_score=winner_score,
                 steps=steps,
-                reason=f"threshold {threshold} met on iteration {i}",
+                reason=(
+                    f"threshold {threshold} met on iteration {i}"
+                    if commit_status == "committed"
+                    else f"threshold {threshold} met on iteration {i}; "
+                         f"commit applied with partial failures (see commit_result)"
+                ),
+                commit_result=commit_dict,
             )
         if winner_branch_id is not None and winner_score > best_score:
@@ -317,9 +435,26 @@ async def _iterate_async_core(
             await _maybe_await(discard_fn(exp_id))
     if on_timeout == "commit_best" and best_exp_id and best_branch_id:
-        await _maybe_await(commit_fn(best_exp_id, best_branch_id))
+        commit_payload = await _maybe_await(commit_fn(best_exp_id, best_branch_id))
+        commit_status = _classify_commit_result(commit_payload)
+        commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+        if commit_status == "commit_failed":
+            return IterationResult(
+                status="commit_failed",
+                iterations_run=n,
+                committed_experiment_id=None,
+                committed_branch_id=None,
+                final_score=best_score,
+                steps=steps,
+                reason=(
+                    f"max_iterations={n} reached; commit_best selected best-so-far "
+                    f"(score {best_score}) but the commit applied no steps; "
+                    f"see commit_result"
+                ),
+                commit_result=commit_dict,
+            )
         return IterationResult(
-            status="exhausted",
+            status="exhausted" if commit_status == "committed" else "committed_with_errors",
             iterations_run=n,
             committed_experiment_id=best_exp_id,
             committed_branch_id=best_branch_id,
@@ -328,7 +463,9 @@ async def _iterate_async_core(
             reason=(
                 f"max_iterations={n} reached, threshold {threshold} never met; "
                 f"committed best-so-far with score {best_score}"
+                + ("" if commit_status == "committed" else " (partial commit — see commit_result)")
             ),
+            commit_result=commit_dict,
         )
     if best_exp_id:
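To see how `_classify_commit_result` maps `commit_fn` payloads to `IterationResult` statuses, the function can be exercised on its own. The copy below keeps the branch order from the diff; the payloads and the `summarize` caller are hypothetical. The last two cases record the behavior as written: a top-level `"error"` key on its own is not inspected (although the docstring lists it as a failure signal), and a payload with `steps_ok == 0` and `steps_failed == 0` falls through to `"committed"`.

```python
from typing import Any


def classify_commit_result(result: Any) -> str:
    # Same branch order as _classify_commit_result in the diff above.
    if not isinstance(result, dict):
        return "committed"
    if result.get("committed") is False:
        return "commit_failed"
    if result.get("ok") is False:
        return "commit_failed"
    if result.get("status") == "failed":
        return "commit_failed"
    steps_ok = result.get("steps_ok")
    steps_failed = result.get("steps_failed")
    if steps_ok == 0 and (steps_failed is None or steps_failed > 0):
        return "commit_failed"
    if result.get("status") == "committed_with_errors":
        return "committed_with_errors"
    if (
        isinstance(steps_failed, int) and steps_failed > 0
        and isinstance(steps_ok, int) and steps_ok > 0
    ):
        return "committed_with_errors"
    return "committed"


# Hypothetical commit_fn payloads and the status each maps to.
cases = [
    ({"committed": True, "steps_ok": 3, "steps_failed": 0}, "committed"),
    ({"steps_ok": 2, "steps_failed": 1}, "committed_with_errors"),
    ({"steps_ok": 0, "steps_failed": 4}, "commit_failed"),
    ({"ok": False, "error": "bridge timeout"}, "commit_failed"),
    (None, "committed"),                                # non-dict: trusted as success
    ({"error": "boom"}, "committed"),                   # lone "error" key is not checked
    ({"steps_ok": 0, "steps_failed": 0}, "committed"),  # zero total steps falls through
]
for payload, expected in cases:
    assert classify_commit_result(payload) == expected, (payload, expected)


def summarize(result: dict) -> str:
    # Hypothetical caller: branch on status, then read the raw commit_result.
    status = result["status"]
    payload = result.get("commit_result") or {}
    if status == "commit_failed":
        return f"nothing applied: {payload.get('error', 'unknown error')}"
    if status == "committed_with_errors":
        return f"partial: {payload.get('steps_failed', '?')} step(s) failed"
    return status
```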

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "livepilot",
-  "version": "1.17.2",
+  "version": "1.17.4",
   "mcpName": "io.github.dreamrec/livepilot",
   "description": "Agentic production system for Ableton Live 12 — 427 tools, 52 domains. Device atlas (1305 devices), sample engine (Splice + browser + filesystem), auto-composition, spectral perception, technique memory, creative intelligence (12 engines)",
   "author": "Pilot Studio",

package/remote_script/LivePilot/__init__.py CHANGED

@@ -5,7 +5,7 @@ Entry point for the ControlSurface. Ableton calls create_instance(c_instance)
 when this script is selected in Preferences > Link, Tempo & MIDI.
 """
-__version__ = "1.17.2"
+__version__ = "1.17.4"
 from _Framework.ControlSurface import ControlSurface
 from . import router

package/server.json CHANGED

@@ -6,12 +6,12 @@
     "url": "https://github.com/dreamrec/LivePilot",
     "source": "github"
   },
-  "version": "1.17.2",
+  "version": "1.17.4",
   "packages": [
     {
       "registryType": "npm",
       "identifier": "livepilot",
-      "version": "1.17.2",
+      "version": "1.17.4",
       "transport": {
         "type": "stdio"
       }
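This release bumps the same version string in five files. A pre-publish check that they agree is easy to sketch. The regexes and helper names below are hypothetical and assume the declarations keep the formats shown in this diff; file contents are passed in as strings so the check can be exercised without a checkout. The binary `LivePilot_Analyzer.amxd` is not covered.

```python
import json
import re


def versions_in_python(text: str) -> list[str]:
    # Matches module-level lines like: __version__ = "1.17.4"
    return re.findall(r'^__version__\s*=\s*"([^"]+)"', text, flags=re.MULTILINE)


def versions_in_bridge_js(text: str) -> list[str]:
    # Matches the ping reply literal, e.g. {"ok": true, "version": "1.17.4"}
    return re.findall(r'"version":\s*"([^"]+)"', text)


def versions_in_package_json(text: str) -> list[str]:
    return [json.loads(text)["version"]]


def versions_in_server_json(text: str) -> list[str]:
    # Top-level "version" plus each entry under "packages".
    data = json.loads(text)
    return [data["version"]] + [p["version"] for p in data.get("packages", [])]


def find_mismatches(expected: str, found: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {path: versions} for files declaring nothing or anything but `expected`."""
    return {
        path: versions
        for path, versions in found.items()
        if not versions or any(v != expected for v in versions)
    }


# Usage with inline file contents; one stale string is planted on purpose.
found = {
    "mcp_server/__init__.py": versions_in_python('__version__ = "1.17.4"\n'),
    "remote_script/LivePilot/__init__.py": versions_in_python('__version__ = "1.17.4"\n'),
    "m4l_device/livepilot_bridge.js": versions_in_bridge_js(
        'send_response({"ok": true, "version": "1.17.2"});'
    ),
    "package.json": versions_in_package_json('{"name": "livepilot", "version": "1.17.4"}'),
    "server.json": versions_in_server_json(
        '{"version": "1.17.4", "packages": [{"version": "1.17.4"}]}'
    ),
}
print(find_mismatches("1.17.4", found))
# {'m4l_device/livepilot_bridge.js': ['1.17.2']}
```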