livepilot 1.17.2 → 1.17.4

package/CHANGELOG.md CHANGED
@@ -1,5 +1,112 @@
1
1
  # Changelog
2
2
 
3
+ ## 1.17.4 — Shape cleanup + memory probe (April 23 2026)
4
+
5
+ ### Fixed
6
+
7
+ - **`get_session_kernel` now probes the memory store** instead of
8
+ hardcoding `memory_ok=True` (`mcp_server/runtime/tools.py`). If the
9
+ underlying technique store raises on `list_techniques` (disk full,
10
+ corrupted index, permissions error), the kernel previously still
11
+ reported memory as available to orchestration planners. Same
12
+ truth-gap class as the v1.17.3 web/flucoma fix — should have been
13
+ caught by the same review pass. Now probed the same way
14
+ `get_capability_state` does, wrapped in try/except.
15
+ - **`capability_state` flat shape** in session kernel
16
+ (`mcp_server/runtime/tools.py`): `state.to_dict()` wraps its output as
17
+ `{"capability_state": {...}}` — that's the right shape for the
18
+ standalone `get_capability_state` tool, but when stored on the kernel
19
+ it produced the ugly double-nested
20
+ `kernel["capability_state"]["capability_state"]["domains"]`. v1.17.3
21
+ probe tests worked around it with defensive
22
+ `outer.get("capability_state", outer)`. Fix: unwrap the outer key
23
+ once before passing to `build_session_kernel`. Consumer path is
24
+ now `kernel["capability_state"]["domains"]` directly. Standalone
25
+ `get_capability_state` return shape unchanged.
26
+
27
+ ### Tests
28
+
29
+ - 4 new TDD tests in `tests/test_runtime_capability_probes.py`:
30
+ - memory probe raises → kernel reports memory unavailable
31
+ - memory probe succeeds → kernel reports available
32
+ - kernel's capability_state has no nested `capability_state` key
33
+ - end-to-end flat access without defensive fallbacks
34
+ - Consumer updates:
35
+ - `test_session_kernel.py:203` — removed extra level
36
+ - `test_runtime_capability_probes.py` (4 places) — removed
37
+ defensive `outer.get('capability_state', outer)` pattern now that
38
+ the shape is known-flat
39
+
40
+ 2722 → 2726 passing.
41
+
42
+ ### Known follow-up
43
+
44
+ Audit while writing this release flagged a third bug in
45
+ `mcp_server/runtime/safety_kernel.py:244`: the safety kernel reads
46
+ `capability_state.get("mode", "normal")` but the actual shape uses
47
+ `overall_mode`, not `mode`. The `.get(..., "normal")` default silently
48
+ falls back, so `read_only` mode gating never kicks in. Separate fix,
49
+ out of scope for this release.
50
+
51
+ ## 1.17.3 — Truth-gap remediation, for real (April 23 2026)
52
+
53
+ ### Fixed
54
+
55
+ - **`iterate_toward_goal` now inspects `commit_fn` return value** (P1,
56
+ `mcp_server/tools/_agent_os_engine/iteration.py`): prior to this release
57
+ the iteration loop awaited the commit callback and dropped the return
58
+ value on the floor, then unconditionally returned `status="committed"`.
59
+ If the underlying `commit_branch_async` applied zero steps or partially
60
+ succeeded, the iteration result claimed success — the exact bug pattern
61
+ the release was meant to fix elsewhere. New `_classify_commit_result()`
62
+ helper maps known payload shapes to three statuses: `"committed"` (clean),
63
+ `"committed_with_errors"` (steps_ok > 0 AND steps_failed > 0), and
64
+ `"commit_failed"` (committed=False, ok=False, status="failed", or
65
+ steps_ok == 0). Both sync and async cores now zero out
66
+ `committed_experiment_id`/`committed_branch_id` when the commit truly
67
+ failed, and surface the raw commit payload on `IterationResult.commit_result`.
68
+ - **Preview Studio commit-before-execute ordering** (P1,
69
+ `mcp_server/preview_studio/tools.py`): `commit_preview_variant()` called
70
+ `engine.commit_variant()` BEFORE `execute_plan_steps_async` ran. That
71
+ flipped `preview_set.status = "committed"` and `committed_variant_id`
72
+ up front, so when every execution step failed the response correctly
73
+ said `committed: false / status: "failed"` but the stored state still
74
+ said the opposite. Wonder lifecycle advance also fired regardless.
75
+ Reorder: execute first, then flip state only when `steps_ok > 0`.
76
+ Zero-success path now returns honestly and leaves `preview_set` and
77
+ WonderSession untouched. Partial-success stays a legitimate commit
78
+ with `status="committed_with_errors"`.
79
+ - **`get_session_kernel` propagates web + flucoma probe results** (P2,
80
+ `mcp_server/runtime/tools.py`): the kernel builder called
81
+ `build_capability_state(...)` with only session/analyzer/memory params,
82
+ so `web_ok` and `flucoma_ok` silently defaulted to `False`. Meanwhile
83
+ `get_capability_state()` correctly probed both. Planners that read
84
+ the kernel (the documented orchestration entrypoint) stayed on
85
+ degraded paths even when probes would have reported available. Fix:
86
+ call `_probe_web()` + `_probe_flucoma()` inside `get_session_kernel`
87
+ and pass through.
88
+
89
+ ### Added
90
+
91
+ - **10 new tests** covering the three truth-gap classes:
92
+ - `test_iterate_toward_goal.py`: 4 tests for commit inspection
93
+ (failed commit, partial commit, timeout commit_best, back-compat
94
+ clean success).
95
+ - `test_preview_studio_truth_gap.py`: 3 tests for
96
+ executable-variant-fails paths (all-steps-fail preserves state,
97
+ Wonder not advanced, partial-success honest commit).
98
+ - `test_runtime_capability_probes.py`: 3 tests for kernel
99
+ propagation (web probe → kernel, flucoma probe → kernel,
100
+ both-unavailable back-compat).
101
+ - **`IterationResult.commit_result`** — the raw commit_fn payload,
102
+ surfaced on the returned dict whenever a commit was attempted.
103
+ Callers can inspect `result["commit_result"]["steps_failed"]`,
104
+ `result["commit_result"]["error"]`, etc.
105
+
106
+ This release closes what the post-v1.17.2 review correctly flagged:
107
+ the feature we shipped to "close the evaluation loop" had a truth-gap
108
+ at the innermost step. 2712 → 2722 tests pass.
109
+
3
110
  ## 1.17.2 — iterate_toward_goal + preview-studio truth-gap (April 23 2026)
4
111
 
5
112
  ### Added
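Review note on the "Known follow-up" entry in 1.17.4: a minimal sketch of why the `.get("mode", "normal")` default hides the key mismatch instead of failing loudly. Only the key names `mode` and `overall_mode` and the `"normal"` default come from the changelog. The payload contents and the two `read_mode_*` helpers are hypothetical, and the "fixed" variant is one possible repair, not the shipped patch.

```python
# Hypothetical capability_state payload. Only the "overall_mode" key name
# is taken from the changelog entry; the values are illustrative.
capability_state = {"overall_mode": "read_only", "domains": {}}


def read_mode_buggy(state: dict) -> str:
    # Mirrors the lookup described for safety_kernel.py: the key "mode"
    # is absent, so the default is returned and no error is raised.
    return state.get("mode", "normal")


def read_mode_fixed(state: dict) -> str:
    # Assumed fix: read the key that the payload actually carries.
    return state.get("overall_mode", "normal")


print(read_mode_buggy(capability_state))  # normal
print(read_mode_fixed(capability_state))  # read_only
```

Because the fallback value is also a legitimate mode, the buggy lookup is indistinguishable from a healthy session, which is consistent with the changelog's statement that `read_only` gating never triggers.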
Binary file
@@ -95,7 +95,7 @@ function anything() {
95
95
  function dispatch(cmd, args) {
96
96
  switch(cmd) {
97
97
  case "ping":
98
- send_response({"ok": true, "version": "1.17.2"});
98
+ send_response({"ok": true, "version": "1.17.4"});
99
99
  break;
100
100
  case "get_params":
101
101
  cmd_get_params(args);
@@ -1,2 +1,2 @@
1
1
  """LivePilot MCP Server — bridges MCP protocol to Ableton Live."""
2
- __version__ = "1.17.2"
2
+ __version__ = "1.17.4"
@@ -327,20 +327,17 @@ async def commit_preview_variant(
327
327
  ),
328
328
  }
329
329
 
330
- # Only now do we flip state the chosen variant has an executable plan.
331
- chosen = engine.commit_variant(ps, variant_id)
332
- # engine.commit_variant cannot return None here (we already verified
333
- # the variant_id exists), but keep the defensive check for the type
334
- # checker.
335
- if not chosen:
336
- available = [v.variant_id for v in ps.variants]
337
- return {
338
- "error": f"Variant {variant_id} not found in set {set_id}",
339
- "available_variants": available,
340
- }
341
-
330
+ # ── P1#2 fix (v1.17.3): execute BEFORE flipping state ──
331
+ # Prior behavior: engine.commit_variant() ran here, BEFORE execution.
332
+ # If every step then failed, the returned payload correctly said
333
+ # committed=False / status='failed' but preview_set.status was
334
+ # already "committed" and Wonder lifecycle advance fired regardless.
335
+ # Response and stored state contradicted each other.
336
+ #
337
+ # New flow: we already have `chosen` from the resolution block above.
338
+ # Execute the plan first, count successes, THEN flip state only when
339
+ # at least one step actually applied. Zero successes = honest no-op.
342
340
  result = {
343
- "committed": True,
344
341
  "variant_id": chosen.variant_id,
345
342
  "label": chosen.label,
346
343
  "intent": chosen.intent,
@@ -381,15 +378,24 @@ async def commit_preview_variant(
381
378
  result["steps_ok"] = steps_ok
382
379
  result["steps_failed"] = steps_failed
383
380
 
381
+ # ── P1#2: only flip preview-set state when at least one step succeeded ──
384
382
  if steps_failed == 0 and steps_ok > 0:
385
383
  result["status"] = "committed"
384
+ result["committed"] = True
385
+ engine.commit_variant(ps, variant_id)
386
386
  elif steps_ok > 0:
387
387
  result["status"] = "committed_with_errors"
388
+ result["committed"] = True # partial but real commit
389
+ engine.commit_variant(ps, variant_id)
388
390
  else:
391
+ # Every step failed — do NOT flip preview-set state, do NOT advance
392
+ # Wonder. The response already reflects the failure; the stored
393
+ # state must agree.
389
394
  result["status"] = "failed"
390
395
  result["committed"] = False
396
+ return result
391
397
 
392
- # Wonder lifecycle hooks
398
+ # Wonder lifecycle hooks — only reached when steps_ok > 0.
393
399
  ws = _find_wonder_session_by_preview(set_id)
394
400
  if ws:
395
401
  ws.selected_variant_id = variant_id
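Review note on the `commit_preview_variant` hunk: the reordering can be illustrated with a toy model that runs every step first and changes stored state only when at least one step succeeded. `PreviewSet`, `commit_after_execute`, and the boolean step callables are hypothetical stand-ins for illustration; the real code goes through `engine.commit_variant` and `execute_plan_steps_async`, whose signatures are not fully visible in this diff.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PreviewSet:
    """Hypothetical stand-in for the stored preview-set state."""
    status: str = "open"
    committed_variant_id: Optional[str] = None


def commit_after_execute(
    ps: PreviewSet,
    variant_id: str,
    steps: List[Callable[[], bool]],
) -> dict:
    """Run every step first; flip stored state only if one succeeded."""
    outcomes = [bool(step()) for step in steps]
    steps_ok = sum(outcomes)
    steps_failed = len(outcomes) - steps_ok
    result = {
        "variant_id": variant_id,
        "steps_ok": steps_ok,
        "steps_failed": steps_failed,
    }

    if steps_ok == 0:
        # Zero successes: report failure and leave stored state untouched.
        result.update(status="failed", committed=False)
        return result

    # At least one step applied, so the stored state may now change.
    ps.status = "committed"
    ps.committed_variant_id = variant_id
    result["committed"] = True
    result["status"] = (
        "committed" if steps_failed == 0 else "committed_with_errors"
    )
    return result


ps = PreviewSet()
failed = commit_after_execute(ps, "v1", [lambda: False, lambda: False])
print(failed["status"], ps.status)    # failed open
partial = commit_after_execute(ps, "v1", [lambda: True, lambda: False])
print(partial["status"], ps.status)   # committed_with_errors committed
```

The early `return` in the zero-success branch plays the same role as the added `return result` in the hunk: anything placed after the branch, such as the Wonder lifecycle hooks, is reached only when `steps_ok > 0`.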
@@ -179,11 +179,29 @@ def get_session_kernel(
179
179
  if analyzer_ok:
180
180
  analyzer_fresh = spectral.get("spectrum") is not None
181
181
 
182
+ # P2#3 (v1.17.3): probe web + flucoma the same way get_capability_state
183
+ # does, and propagate through. Without this the kernel's capability view
184
+ # lies to orchestration planners.
185
+ web_ok = _probe_web()
186
+ flucoma_ok = _probe_flucoma()
187
+
188
+ # v1.17.4: probe memory the same way too. Previously memory_ok=True was
189
+ # hardcoded — if the store raised, the kernel still reported memory
190
+ # available. Same truth-gap class as the v1.17.3 web/flucoma fix.
191
+ memory_ok = False
192
+ try:
193
+ _memory_store.list_techniques(limit=1)
194
+ memory_ok = True
195
+ except Exception as exc:
196
+ logger.debug("get_session_kernel memory probe failed: %s", exc)
197
+
182
198
  state = build_capability_state(
183
199
  session_ok=session_ok,
184
200
  analyzer_ok=analyzer_ok,
185
201
  analyzer_fresh=analyzer_fresh,
186
- memory_ok=True,
202
+ memory_ok=memory_ok,
203
+ web_ok=web_ok,
204
+ flucoma_ok=flucoma_ok,
187
205
  )
188
206
 
189
207
  # Optional subcomponents — degrade gracefully, but reach into the SAME
@@ -240,9 +258,18 @@ def get_session_kernel(
240
258
  except Exception as e:
241
259
  kernel_warnings.append(f"session_memory_unavailable: {e}")
242
260
 
261
+ # v1.17.4: state.to_dict() wraps its output as {"capability_state": {...}}
262
+ # because that shape is what the standalone get_capability_state tool
263
+ # returns. When building the session kernel, that wrapper becomes the
264
+ # ugly double-nested kernel["capability_state"]["capability_state"]["domains"]
265
+ # path. Unwrap once here so kernel consumers get
266
+ # kernel["capability_state"]["domains"] directly.
267
+ _cap_dict = state.to_dict()
268
+ _cap_flat = _cap_dict.get("capability_state", _cap_dict)
269
+
243
270
  kernel = build_session_kernel(
244
271
  session_info=session_info,
245
- capability_state=state.to_dict(),
272
+ capability_state=_cap_flat,
246
273
  request_text=request_text,
247
274
  mode=mode,
248
275
  aggression=aggression,
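Review note on the two `get_session_kernel` hunks: a small sketch of the probe-then-report pattern and of the single unwrap step, runnable outside the server. `BrokenStore`, `WorkingStore`, `probe_memory`, and `unwrap_capability_state` are hypothetical names introduced here; only the `list_techniques(limit=1)` call and the `.get("capability_state", ...)` unwrap are taken from the diff.

```python
import logging

logger = logging.getLogger("kernel_probe_sketch")


class BrokenStore:
    """Hypothetical store whose list_techniques always raises."""

    def list_techniques(self, limit: int = 1):
        raise OSError("disk full")


class WorkingStore:
    """Hypothetical store that answers the probe with an empty list."""

    def list_techniques(self, limit: int = 1):
        return []


def probe_memory(store) -> bool:
    # Same shape as the hunk: report unavailable unless the call succeeds.
    try:
        store.list_techniques(limit=1)
        return True
    except Exception as exc:
        logger.debug("memory probe failed: %s", exc)
        return False


def unwrap_capability_state(cap_dict: dict) -> dict:
    # Drops the outer {"capability_state": ...} wrapper when present;
    # a dict without that key is returned unchanged.
    return cap_dict.get("capability_state", cap_dict)


print(probe_memory(BrokenStore()))   # False
print(probe_memory(WorkingStore()))  # True

wrapped = {"capability_state": {"domains": {"memory": "available"}}}
print(unwrap_capability_state(wrapped)["domains"])  # {'memory': 'available'}
```

An empty result from the store still counts as available here, since the probe only tests that the call completes without raising.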
@@ -39,11 +39,16 @@ class IterationResult:
39
39
  """Final result of iterate_toward_goal.
40
40
 
41
41
  status:
42
- - "committed" — a winner hit threshold, was committed permanently
43
- - "exhausted" — max_iterations reached, committed best-so-far (on_timeout=commit_best)
42
+ - "committed" — a winner hit threshold AND commit succeeded (steps_ok>0, steps_failed==0)
43
+ - "committed_with_errors" — commit applied some steps but not all (steps_ok>0 AND steps_failed>0)
44
+ - "commit_failed" — commit was attempted but applied zero steps (steps_ok==0 OR committed:false)
45
+ - "exhausted" — max_iterations reached, committed best-so-far cleanly (on_timeout=commit_best)
44
46
  - "timeout_no_commit" — max_iterations reached, no commit (on_timeout=discard_on_timeout)
45
47
  - "no_candidates" — caller provided empty candidate_move_sets
46
- - "error" — unrecoverable error; see reason
48
+
49
+ commit_result: the raw dict returned by commit_fn, surfaced for caller
50
+ inspection. Populated whenever commit_fn was called (regardless of
51
+ whether the commit succeeded). None when no commit was attempted.
47
52
  """
48
53
  status: str
49
54
  iterations_run: int
@@ -52,9 +57,10 @@ class IterationResult:
52
57
  final_score: float
53
58
  steps: list[IterationStep] = field(default_factory=list)
54
59
  reason: str = ""
60
+ commit_result: Optional[dict] = None
55
61
 
56
62
  def to_dict(self) -> dict:
57
- return {
63
+ d = {
58
64
  "status": self.status,
59
65
  "iterations_run": self.iterations_run,
60
66
  "committed_experiment_id": self.committed_experiment_id,
@@ -63,6 +69,55 @@ class IterationResult:
63
69
  "steps": [s.to_dict() for s in self.steps],
64
70
  "reason": self.reason,
65
71
  }
72
+ if self.commit_result is not None:
73
+ d["commit_result"] = self.commit_result
74
+ return d
75
+
76
+
77
+ def _classify_commit_result(result: Any) -> str:
78
+ """Inspect a commit_fn return value and classify into an IterationResult
79
+ status. Conservative: any failure signal produces 'commit_failed', any
80
+ partial signal produces 'committed_with_errors', only clean success
81
+ produces 'committed'.
82
+
83
+ Known failure signals:
84
+ - {"committed": False, ...}
85
+ - {"status": "failed", ...}
86
+ - {"ok": False, ...}
87
+ - {"error": ...} present at top level (unless committed explicitly True)
88
+ - {"steps_ok": 0, ...}
89
+
90
+ Known partial signals:
91
+ - {"status": "committed_with_errors", ...}
92
+ - {"steps_failed": N, "steps_ok": M>0} where N>0
93
+ """
94
+ if not isinstance(result, dict):
95
+ # Non-dict returns: trust the caller but don't confirm partial/error.
96
+ return "committed"
97
+
98
+ # Hard failure signals
99
+ if result.get("committed") is False:
100
+ return "commit_failed"
101
+ if result.get("ok") is False:
102
+ return "commit_failed"
103
+ if result.get("status") == "failed":
104
+ return "commit_failed"
105
+ steps_ok = result.get("steps_ok")
106
+ steps_failed = result.get("steps_failed")
107
+ if steps_ok == 0 and (steps_failed is None or steps_failed > 0):
108
+ return "commit_failed"
109
+
110
+ # Partial success
111
+ if result.get("status") == "committed_with_errors":
112
+ return "committed_with_errors"
113
+ if (
114
+ isinstance(steps_failed, int) and steps_failed > 0
115
+ and isinstance(steps_ok, int) and steps_ok > 0
116
+ ):
117
+ return "committed_with_errors"
118
+
119
+ # Otherwise: clean success
120
+ return "committed"
66
121
 
67
122
 
68
123
  def iterate_toward_goal_engine(
@@ -195,15 +250,37 @@ def _iterate_sync_core(
195
250
  # otherwise the old non-winning experiment leaks in the store.
196
251
  if best_exp_id is not None and best_exp_id != exp_id:
197
252
  discard_fn(best_exp_id)
198
- commit_fn(exp_id, winner_branch_id)
253
+ commit_payload = commit_fn(exp_id, winner_branch_id)
254
+ commit_status = _classify_commit_result(commit_payload)
255
+ commit_dict = commit_payload if isinstance(commit_payload, dict) else None
256
+ if commit_status == "commit_failed":
257
+ return IterationResult(
258
+ status="commit_failed",
259
+ iterations_run=i + 1,
260
+ committed_experiment_id=None,
261
+ committed_branch_id=None,
262
+ final_score=winner_score,
263
+ steps=steps,
264
+ reason=(
265
+ f"threshold {threshold} met on iteration {i} but commit "
266
+ f"applied no steps; see commit_result"
267
+ ),
268
+ commit_result=commit_dict,
269
+ )
199
270
  return IterationResult(
200
- status="committed",
271
+ status=commit_status, # "committed" or "committed_with_errors"
201
272
  iterations_run=i + 1,
202
273
  committed_experiment_id=exp_id,
203
274
  committed_branch_id=winner_branch_id,
204
275
  final_score=winner_score,
205
276
  steps=steps,
206
- reason=f"threshold {threshold} met on iteration {i}",
277
+ reason=(
278
+ f"threshold {threshold} met on iteration {i}"
279
+ if commit_status == "committed"
280
+ else f"threshold {threshold} met on iteration {i}; "
281
+ f"commit applied with partial failures (see commit_result)"
282
+ ),
283
+ commit_result=commit_dict,
207
284
  )
208
285
 
209
286
  if winner_branch_id is not None and winner_score > best_score:
@@ -217,9 +294,26 @@ def _iterate_sync_core(
217
294
  discard_fn(exp_id)
218
295
 
219
296
  if on_timeout == "commit_best" and best_exp_id and best_branch_id:
220
- commit_fn(best_exp_id, best_branch_id)
297
+ commit_payload = commit_fn(best_exp_id, best_branch_id)
298
+ commit_status = _classify_commit_result(commit_payload)
299
+ commit_dict = commit_payload if isinstance(commit_payload, dict) else None
300
+ if commit_status == "commit_failed":
301
+ return IterationResult(
302
+ status="commit_failed",
303
+ iterations_run=n,
304
+ committed_experiment_id=None,
305
+ committed_branch_id=None,
306
+ final_score=best_score,
307
+ steps=steps,
308
+ reason=(
309
+ f"max_iterations={n} reached; commit_best selected best-so-far "
310
+ f"(score {best_score}) but the commit applied no steps; "
311
+ f"see commit_result"
312
+ ),
313
+ commit_result=commit_dict,
314
+ )
221
315
  return IterationResult(
222
- status="exhausted",
316
+ status="exhausted" if commit_status == "committed" else "committed_with_errors",
223
317
  iterations_run=n,
224
318
  committed_experiment_id=best_exp_id,
225
319
  committed_branch_id=best_branch_id,
@@ -228,7 +322,9 @@ def _iterate_sync_core(
228
322
  reason=(
229
323
  f"max_iterations={n} reached, threshold {threshold} never met; "
230
324
  f"committed best-so-far with score {best_score}"
325
+ + ("" if commit_status == "committed" else " (partial commit — see commit_result)")
231
326
  ),
327
+ commit_result=commit_dict,
232
328
  )
233
329
 
234
330
  if best_exp_id:
@@ -296,15 +392,37 @@ async def _iterate_async_core(
296
392
  if met:
297
393
  if best_exp_id is not None and best_exp_id != exp_id:
298
394
  await _maybe_await(discard_fn(best_exp_id))
299
- await _maybe_await(commit_fn(exp_id, winner_branch_id))
395
+ commit_payload = await _maybe_await(commit_fn(exp_id, winner_branch_id))
396
+ commit_status = _classify_commit_result(commit_payload)
397
+ commit_dict = commit_payload if isinstance(commit_payload, dict) else None
398
+ if commit_status == "commit_failed":
399
+ return IterationResult(
400
+ status="commit_failed",
401
+ iterations_run=i + 1,
402
+ committed_experiment_id=None,
403
+ committed_branch_id=None,
404
+ final_score=winner_score,
405
+ steps=steps,
406
+ reason=(
407
+ f"threshold {threshold} met on iteration {i} but commit "
408
+ f"applied no steps; see commit_result"
409
+ ),
410
+ commit_result=commit_dict,
411
+ )
300
412
  return IterationResult(
301
- status="committed",
413
+ status=commit_status,
302
414
  iterations_run=i + 1,
303
415
  committed_experiment_id=exp_id,
304
416
  committed_branch_id=winner_branch_id,
305
417
  final_score=winner_score,
306
418
  steps=steps,
307
- reason=f"threshold {threshold} met on iteration {i}",
419
+ reason=(
420
+ f"threshold {threshold} met on iteration {i}"
421
+ if commit_status == "committed"
422
+ else f"threshold {threshold} met on iteration {i}; "
423
+ f"commit applied with partial failures (see commit_result)"
424
+ ),
425
+ commit_result=commit_dict,
308
426
  )
309
427
 
310
428
  if winner_branch_id is not None and winner_score > best_score:
@@ -317,9 +435,26 @@ async def _iterate_async_core(
317
435
  await _maybe_await(discard_fn(exp_id))
318
436
 
319
437
  if on_timeout == "commit_best" and best_exp_id and best_branch_id:
320
- await _maybe_await(commit_fn(best_exp_id, best_branch_id))
438
+ commit_payload = await _maybe_await(commit_fn(best_exp_id, best_branch_id))
439
+ commit_status = _classify_commit_result(commit_payload)
440
+ commit_dict = commit_payload if isinstance(commit_payload, dict) else None
441
+ if commit_status == "commit_failed":
442
+ return IterationResult(
443
+ status="commit_failed",
444
+ iterations_run=n,
445
+ committed_experiment_id=None,
446
+ committed_branch_id=None,
447
+ final_score=best_score,
448
+ steps=steps,
449
+ reason=(
450
+ f"max_iterations={n} reached; commit_best selected best-so-far "
451
+ f"(score {best_score}) but the commit applied no steps; "
452
+ f"see commit_result"
453
+ ),
454
+ commit_result=commit_dict,
455
+ )
321
456
  return IterationResult(
322
- status="exhausted",
457
+ status="exhausted" if commit_status == "committed" else "committed_with_errors",
323
458
  iterations_run=n,
324
459
  committed_experiment_id=best_exp_id,
325
460
  committed_branch_id=best_branch_id,
@@ -328,7 +463,9 @@ async def _iterate_async_core(
328
463
  reason=(
329
464
  f"max_iterations={n} reached, threshold {threshold} never met; "
330
465
  f"committed best-so-far with score {best_score}"
466
+ + ("" if commit_status == "committed" else " (partial commit — see commit_result)")
331
467
  ),
468
+ commit_result=commit_dict,
332
469
  )
333
470
 
334
471
  if best_exp_id:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "livepilot",
3
- "version": "1.17.2",
3
+ "version": "1.17.4",
4
4
  "mcpName": "io.github.dreamrec/livepilot",
5
5
  "description": "Agentic production system for Ableton Live 12 — 427 tools, 52 domains. Device atlas (1305 devices), sample engine (Splice + browser + filesystem), auto-composition, spectral perception, technique memory, creative intelligence (12 engines)",
6
6
  "author": "Pilot Studio",
@@ -5,7 +5,7 @@ Entry point for the ControlSurface. Ableton calls create_instance(c_instance)
5
5
  when this script is selected in Preferences > Link, Tempo & MIDI.
6
6
  """
7
7
 
8
- __version__ = "1.17.2"
8
+ __version__ = "1.17.4"
9
9
 
10
10
  from _Framework.ControlSurface import ControlSurface
11
11
  from . import router
package/server.json CHANGED
@@ -6,12 +6,12 @@
6
6
  "url": "https://github.com/dreamrec/LivePilot",
7
7
  "source": "github"
8
8
  },
9
- "version": "1.17.2",
9
+ "version": "1.17.4",
10
10
  "packages": [
11
11
  {
12
12
  "registryType": "npm",
13
13
  "identifier": "livepilot",
14
- "version": "1.17.2",
14
+ "version": "1.17.4",
15
15
  "transport": {
16
16
  "type": "stdio"
17
17
  }