livepilot 1.17.2 → 1.17.3

package/CHANGELOG.md CHANGED
@@ -1,5 +1,64 @@
  # Changelog

+ ## 1.17.3 — Truth-gap remediation, for real (April 23 2026)
+
+ ### Fixed
+
+ - **`iterate_toward_goal` now inspects `commit_fn` return value** (P1,
+   `mcp_server/tools/_agent_os_engine/iteration.py`): prior to this release
+   the iteration loop awaited the commit callback and dropped the return
+   value on the floor, then unconditionally returned `status="committed"`.
+   If the underlying `commit_branch_async` applied zero steps or partially
+   succeeded, the iteration result claimed success — the exact bug pattern
+   the release was meant to fix elsewhere. New `_classify_commit_result()`
+   helper maps known payload shapes to three statuses: `"committed"` (clean),
+   `"committed_with_errors"` (steps_ok > 0 AND steps_failed > 0), and
+   `"commit_failed"` (committed=False, ok=False, status="failed", or
+   steps_ok == 0). Both sync and async cores now zero out
+   `committed_experiment_id`/`committed_branch_id` when the commit truly
+   failed, and surface the raw commit payload on `IterationResult.commit_result`.
+ - **Preview Studio commit-before-execute ordering** (P1,
+   `mcp_server/preview_studio/tools.py`): `commit_preview_variant()` called
+   `engine.commit_variant()` BEFORE `execute_plan_steps_async` ran. That
+   flipped `preview_set.status = "committed"` and `committed_variant_id`
+   up front, so when every execution step failed the response correctly
+   said `committed: false / status: "failed"` but the stored state still
+   said the opposite. Wonder lifecycle advance also fired regardless.
+   Reorder: execute first, then flip state only when `steps_ok > 0`.
+   Zero-success path now returns honestly and leaves `preview_set` and
+   WonderSession untouched. Partial-success stays a legitimate commit
+   with `status="committed_with_errors"`.
+ - **`get_session_kernel` propagates web + flucoma probe results** (P2,
+   `mcp_server/runtime/tools.py`): the kernel builder called
+   `build_capability_state(...)` with only session/analyzer/memory params,
+   so `web_ok` and `flucoma_ok` silently defaulted to `False`. Meanwhile
+   `get_capability_state()` correctly probed both. Planners that read
+   the kernel (the documented orchestration entrypoint) stayed on
+   degraded paths even when probes would have reported available. Fix:
+   call `_probe_web()` + `_probe_flucoma()` inside `get_session_kernel`
+   and pass through.
+
+ ### Added
+
+ - **10 new tests** covering the three truth-gap classes:
+   - `test_iterate_toward_goal.py`: 4 tests for commit inspection
+     (failed commit, partial commit, timeout commit_best, back-compat
+     clean success).
+   - `test_preview_studio_truth_gap.py`: 3 tests for
+     executable-variant-fails paths (all-steps-fail preserves state,
+     Wonder not advanced, partial-success honest commit).
+   - `test_runtime_capability_probes.py`: 3 tests for kernel
+     propagation (web probe → kernel, flucoma probe → kernel,
+     both-unavailable back-compat).
+ - **`IterationResult.commit_result`** — the raw commit_fn payload,
+   surfaced on the returned dict whenever a commit was attempted.
+   Callers can inspect `result["commit_result"]["steps_failed"]`,
+   `result["commit_result"]["error"]`, etc.
+
+ This release closes what the post-v1.17.2 review correctly flagged:
+ the feature we shipped to "close the evaluation loop" had a truth-gap
+ at the innermost step. 2712 → 2722 tests pass.
+
  ## 1.17.2 — iterate_toward_goal + preview-studio truth-gap (April 23 2026)

  ### Added
Binary file
@@ -95,7 +95,7 @@ function anything() {
  function dispatch(cmd, args) {
      switch(cmd) {
          case "ping":
-             send_response({"ok": true, "version": "1.17.2"});
+             send_response({"ok": true, "version": "1.17.3"});
              break;
          case "get_params":
              cmd_get_params(args);
@@ -1,2 +1,2 @@
  """LivePilot MCP Server — bridges MCP protocol to Ableton Live."""
- __version__ = "1.17.2"
+ __version__ = "1.17.3"
@@ -327,20 +327,17 @@ async def commit_preview_variant(
              ),
          }

-     # Only now do we flip state the chosen variant has an executable plan.
-     chosen = engine.commit_variant(ps, variant_id)
-     # engine.commit_variant cannot return None here (we already verified
-     # the variant_id exists), but keep the defensive check for the type
-     # checker.
-     if not chosen:
-         available = [v.variant_id for v in ps.variants]
-         return {
-             "error": f"Variant {variant_id} not found in set {set_id}",
-             "available_variants": available,
-         }
-
+     # ── P1#2 fix (v1.17.3): execute BEFORE flipping state ──
+     # Prior behavior: engine.commit_variant() ran here, BEFORE execution.
+     # If every step then failed, the returned payload correctly said
+     # committed=False / status='failed' but preview_set.status was
+     # already "committed" and Wonder lifecycle advance fired regardless.
+     # Response and stored state contradicted each other.
+     #
+     # New flow: we already have `chosen` from the resolution block above.
+     # Execute the plan first, count successes, THEN flip state only when
+     # at least one step actually applied. Zero successes = honest no-op.
      result = {
-         "committed": True,
          "variant_id": chosen.variant_id,
          "label": chosen.label,
          "intent": chosen.intent,
@@ -381,15 +378,24 @@ async def commit_preview_variant(
      result["steps_ok"] = steps_ok
      result["steps_failed"] = steps_failed

+     # ── P1#2: only flip preview-set state when at least one step succeeded ──
      if steps_failed == 0 and steps_ok > 0:
          result["status"] = "committed"
+         result["committed"] = True
+         engine.commit_variant(ps, variant_id)
      elif steps_ok > 0:
          result["status"] = "committed_with_errors"
+         result["committed"] = True # partial but real commit
+         engine.commit_variant(ps, variant_id)
      else:
+         # Every step failed — do NOT flip preview-set state, do NOT advance
+         # Wonder. The response already reflects the failure; the stored
+         # state must agree.
          result["status"] = "failed"
          result["committed"] = False
+         return result

-     # Wonder lifecycle hooks
+     # Wonder lifecycle hooks — only reached when steps_ok > 0.
      ws = _find_wonder_session_by_preview(set_id)
      if ws:
          ws.selected_variant_id = variant_id
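Review note: the reordering in this hunk (execute first, flip stored state only when at least one step applied) can be illustrated in isolation. The sketch below is a minimal model, not the package's code: `PreviewSet`, its initial `"open"` status, `commit_after_execute`, and the representation of plan steps as zero-argument callables returning `bool` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PreviewSet:
    """Hypothetical stand-in for the stored preview-set state."""
    status: str = "open"  # assumed initial value, not taken from the package
    committed_variant_id: Optional[str] = None


def commit_after_execute(
    ps: PreviewSet, variant_id: str, steps: List[Callable[[], bool]]
) -> dict:
    """Run every step first; flip stored state only if at least one succeeded."""
    steps_ok = 0
    steps_failed = 0
    for step in steps:
        try:
            succeeded = bool(step())
        except Exception:
            succeeded = False  # a raising step is counted as a failed step
        if succeeded:
            steps_ok += 1
        else:
            steps_failed += 1

    result = {
        "variant_id": variant_id,
        "steps_ok": steps_ok,
        "steps_failed": steps_failed,
    }
    if steps_ok == 0:
        # Nothing applied: report failure and leave the stored state untouched.
        result["status"] = "failed"
        result["committed"] = False
        return result

    # At least one step applied, so the stored state may now agree with the response.
    ps.status = "committed"
    ps.committed_variant_id = variant_id
    result["committed"] = True
    result["status"] = "committed" if steps_failed == 0 else "committed_with_errors"
    return result
```

With two failing steps the returned dict reports `status: "failed"` and the `PreviewSet` keeps its initial values; with one success and one failure the dict reports `"committed_with_errors"` and the stored state is flipped, which mirrors the three branches in the hunk.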
@@ -179,11 +179,19 @@ def get_session_kernel(
      if analyzer_ok:
          analyzer_fresh = spectral.get("spectrum") is not None

+     # P2#3 (v1.17.3): probe web + flucoma the same way get_capability_state
+     # does, and propagate through. Without this the kernel's capability view
+     # lies to orchestration planners.
+     web_ok = _probe_web()
+     flucoma_ok = _probe_flucoma()
+
      state = build_capability_state(
          session_ok=session_ok,
          analyzer_ok=analyzer_ok,
          analyzer_fresh=analyzer_fresh,
          memory_ok=True,
+         web_ok=web_ok,
+         flucoma_ok=flucoma_ok,
      )

      # Optional subcomponents — degrade gracefully, but reach into the SAME
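Review note: the effect of the two omitted keyword arguments can be reproduced with a small stand-in. The sketch below assumes a `build_capability_state` whose `web_ok` and `flucoma_ok` parameters default to `False`, as the changelog describes; the returned dict shape, the `kernel_capabilities` helper, and its `propagate` flag are hypothetical and exist only to contrast the old and new call sites.

```python
from typing import Callable


def build_capability_state(
    *,
    session_ok: bool,
    analyzer_ok: bool,
    analyzer_fresh: bool,
    memory_ok: bool,
    web_ok: bool = False,
    flucoma_ok: bool = False,
) -> dict:
    """Hypothetical stand-in: optional capabilities default to unavailable."""
    return {
        "session": session_ok,
        "analyzer": analyzer_ok and analyzer_fresh,
        "memory": memory_ok,
        "web": web_ok,
        "flucoma": flucoma_ok,
    }


def kernel_capabilities(
    probe_web: Callable[[], bool],
    probe_flucoma: Callable[[], bool],
    *,
    propagate: bool,
) -> dict:
    """Build the kernel's capability view with or without probe propagation."""
    base = dict(session_ok=True, analyzer_ok=True, analyzer_fresh=True, memory_ok=True)
    if propagate:
        # New call shape: run the probes and pass their results through.
        return build_capability_state(
            **base, web_ok=probe_web(), flucoma_ok=probe_flucoma()
        )
    # Old call shape: the probes are never consulted, so the defaults win.
    return build_capability_state(**base)
```

Calling `kernel_capabilities(lambda: True, lambda: True, propagate=False)` reports both optional capabilities as unavailable even though both probes would succeed, which is the mismatch the hunk removes.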
@@ -39,11 +39,16 @@ class IterationResult:
      """Final result of iterate_toward_goal.

      status:
-       - "committed" — a winner hit threshold, was committed permanently
-       - "exhausted" — max_iterations reached, committed best-so-far (on_timeout=commit_best)
+       - "committed" — a winner hit threshold AND commit succeeded (steps_ok>0, steps_failed==0)
+       - "committed_with_errors" — commit applied some steps but not all (steps_ok>0 AND steps_failed>0)
+       - "commit_failed" — commit was attempted but applied zero steps (steps_ok==0 OR committed:false)
+       - "exhausted" — max_iterations reached, committed best-so-far cleanly (on_timeout=commit_best)
        - "timeout_no_commit" — max_iterations reached, no commit (on_timeout=discard_on_timeout)
        - "no_candidates" — caller provided empty candidate_move_sets
-       - "error" — unrecoverable error; see reason
+
+     commit_result: the raw dict returned by commit_fn, surfaced for caller
+       inspection. Populated whenever commit_fn was called (regardless of
+       whether the commit succeeded). None when no commit was attempted.
      """
      status: str
      iterations_run: int
@@ -52,9 +57,10 @@ class IterationResult:
      final_score: float
      steps: list[IterationStep] = field(default_factory=list)
      reason: str = ""
+     commit_result: Optional[dict] = None

      def to_dict(self) -> dict:
-         return {
+         d = {
              "status": self.status,
              "iterations_run": self.iterations_run,
              "committed_experiment_id": self.committed_experiment_id,
@@ -63,6 +69,55 @@ class IterationResult:
              "steps": [s.to_dict() for s in self.steps],
              "reason": self.reason,
          }
+         if self.commit_result is not None:
+             d["commit_result"] = self.commit_result
+         return d
+
+
+ def _classify_commit_result(result: Any) -> str:
+     """Inspect a commit_fn return value and classify into an IterationResult
+     status. Conservative: any failure signal produces 'commit_failed', any
+     partial signal produces 'committed_with_errors', only clean success
+     produces 'committed'.
+
+     Known failure signals:
+       - {"committed": False, ...}
+       - {"status": "failed", ...}
+       - {"ok": False, ...}
+       - {"error": ...} present at top level (unless committed explicitly True)
+       - {"steps_ok": 0, ...}
+
+     Known partial signals:
+       - {"status": "committed_with_errors", ...}
+       - {"steps_failed": N, "steps_ok": M>0} where N>0
+     """
+     if not isinstance(result, dict):
+         # Non-dict returns: trust the caller but don't confirm partial/error.
+         return "committed"
+
+     # Hard failure signals
+     if result.get("committed") is False:
+         return "commit_failed"
+     if result.get("ok") is False:
+         return "commit_failed"
+     if result.get("status") == "failed":
+         return "commit_failed"
+     steps_ok = result.get("steps_ok")
+     steps_failed = result.get("steps_failed")
+     if steps_ok == 0 and (steps_failed is None or steps_failed > 0):
+         return "commit_failed"
+
+     # Partial success
+     if result.get("status") == "committed_with_errors":
+         return "committed_with_errors"
+     if (
+         isinstance(steps_failed, int) and steps_failed > 0
+         and isinstance(steps_ok, int) and steps_ok > 0
+     ):
+         return "committed_with_errors"
+
+     # Otherwise: clean success
+     return "committed"


  def iterate_toward_goal_engine(
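Review note: the classifier added in this hunk can be exercised on its own. The sketch below copies its branch order under a local name (`classify_commit_result`, without the leading underscore, to make clear it is a standalone copy and not an import from the package) and runs a handful of assumed payload shapes through it. Two cases are worth a second look: as far as this hunk shows, a payload carrying only a top-level `error` key, and one with both `steps_ok` and `steps_failed` equal to 0, each fall through to `"committed"`, although the docstring and changelog list them among the failure signals.

```python
from typing import Any


def classify_commit_result(result: Any) -> str:
    """Standalone copy of the branch order shown in the hunk above."""
    if not isinstance(result, dict):
        return "committed"  # non-dict returns are trusted

    # Hard failure signals
    if result.get("committed") is False:
        return "commit_failed"
    if result.get("ok") is False:
        return "commit_failed"
    if result.get("status") == "failed":
        return "commit_failed"
    steps_ok = result.get("steps_ok")
    steps_failed = result.get("steps_failed")
    if steps_ok == 0 and (steps_failed is None or steps_failed > 0):
        return "commit_failed"

    # Partial success
    if result.get("status") == "committed_with_errors":
        return "committed_with_errors"
    if (
        isinstance(steps_failed, int) and steps_failed > 0
        and isinstance(steps_ok, int) and steps_ok > 0
    ):
        return "committed_with_errors"

    return "committed"


# Assumed example payloads; real commit_fn payloads may carry more keys.
SAMPLES = {
    "clean": {"steps_ok": 3, "steps_failed": 0},
    "partial": {"steps_ok": 2, "steps_failed": 1},
    "all_failed": {"steps_ok": 0, "steps_failed": 4},
    "explicit_false": {"committed": False},
    "non_dict": None,
    "error_only": {"error": "boom"},
    "zero_zero": {"steps_ok": 0, "steps_failed": 0},
}

if __name__ == "__main__":
    for name, payload in SAMPLES.items():
        print(f"{name}: {classify_commit_result(payload)}")
```

If the last two outcomes are unintended, an explicit check for a top-level `error` key and a plain `steps_ok == 0` test would bring the code in line with the docstring; whether that is the desired behaviour is a question for the maintainers.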
@@ -195,15 +250,37 @@ def _iterate_sync_core(
              # otherwise the old non-winning experiment leaks in the store.
              if best_exp_id is not None and best_exp_id != exp_id:
                  discard_fn(best_exp_id)
-             commit_fn(exp_id, winner_branch_id)
+             commit_payload = commit_fn(exp_id, winner_branch_id)
+             commit_status = _classify_commit_result(commit_payload)
+             commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+             if commit_status == "commit_failed":
+                 return IterationResult(
+                     status="commit_failed",
+                     iterations_run=i + 1,
+                     committed_experiment_id=None,
+                     committed_branch_id=None,
+                     final_score=winner_score,
+                     steps=steps,
+                     reason=(
+                         f"threshold {threshold} met on iteration {i} but commit "
+                         f"applied no steps; see commit_result"
+                     ),
+                     commit_result=commit_dict,
+                 )
              return IterationResult(
-                 status="committed",
+                 status=commit_status, # "committed" or "committed_with_errors"
                  iterations_run=i + 1,
                  committed_experiment_id=exp_id,
                  committed_branch_id=winner_branch_id,
                  final_score=winner_score,
                  steps=steps,
-                 reason=f"threshold {threshold} met on iteration {i}",
+                 reason=(
+                     f"threshold {threshold} met on iteration {i}"
+                     if commit_status == "committed"
+                     else f"threshold {threshold} met on iteration {i}; "
+                     f"commit applied with partial failures (see commit_result)"
+                 ),
+                 commit_result=commit_dict,
              )

          if winner_branch_id is not None and winner_score > best_score:
@@ -217,9 +294,26 @@ def _iterate_sync_core(
              discard_fn(exp_id)

      if on_timeout == "commit_best" and best_exp_id and best_branch_id:
-         commit_fn(best_exp_id, best_branch_id)
+         commit_payload = commit_fn(best_exp_id, best_branch_id)
+         commit_status = _classify_commit_result(commit_payload)
+         commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+         if commit_status == "commit_failed":
+             return IterationResult(
+                 status="commit_failed",
+                 iterations_run=n,
+                 committed_experiment_id=None,
+                 committed_branch_id=None,
+                 final_score=best_score,
+                 steps=steps,
+                 reason=(
+                     f"max_iterations={n} reached; commit_best selected best-so-far "
+                     f"(score {best_score}) but the commit applied no steps; "
+                     f"see commit_result"
+                 ),
+                 commit_result=commit_dict,
+             )
          return IterationResult(
-             status="exhausted",
+             status="exhausted" if commit_status == "committed" else "committed_with_errors",
              iterations_run=n,
              committed_experiment_id=best_exp_id,
              committed_branch_id=best_branch_id,
@@ -228,7 +322,9 @@ def _iterate_sync_core(
              reason=(
                  f"max_iterations={n} reached, threshold {threshold} never met; "
                  f"committed best-so-far with score {best_score}"
+                 + ("" if commit_status == "committed" else " (partial commit — see commit_result)")
              ),
+             commit_result=commit_dict,
          )

      if best_exp_id:
@@ -296,15 +392,37 @@ async def _iterate_async_core(
          if met:
              if best_exp_id is not None and best_exp_id != exp_id:
                  await _maybe_await(discard_fn(best_exp_id))
-             await _maybe_await(commit_fn(exp_id, winner_branch_id))
+             commit_payload = await _maybe_await(commit_fn(exp_id, winner_branch_id))
+             commit_status = _classify_commit_result(commit_payload)
+             commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+             if commit_status == "commit_failed":
+                 return IterationResult(
+                     status="commit_failed",
+                     iterations_run=i + 1,
+                     committed_experiment_id=None,
+                     committed_branch_id=None,
+                     final_score=winner_score,
+                     steps=steps,
+                     reason=(
+                         f"threshold {threshold} met on iteration {i} but commit "
+                         f"applied no steps; see commit_result"
+                     ),
+                     commit_result=commit_dict,
+                 )
              return IterationResult(
-                 status="committed",
+                 status=commit_status,
                  iterations_run=i + 1,
                  committed_experiment_id=exp_id,
                  committed_branch_id=winner_branch_id,
                  final_score=winner_score,
                  steps=steps,
-                 reason=f"threshold {threshold} met on iteration {i}",
+                 reason=(
+                     f"threshold {threshold} met on iteration {i}"
+                     if commit_status == "committed"
+                     else f"threshold {threshold} met on iteration {i}; "
+                     f"commit applied with partial failures (see commit_result)"
+                 ),
+                 commit_result=commit_dict,
              )

          if winner_branch_id is not None and winner_score > best_score:
@@ -317,9 +435,26 @@ async def _iterate_async_core(
              await _maybe_await(discard_fn(exp_id))

      if on_timeout == "commit_best" and best_exp_id and best_branch_id:
-         await _maybe_await(commit_fn(best_exp_id, best_branch_id))
+         commit_payload = await _maybe_await(commit_fn(best_exp_id, best_branch_id))
+         commit_status = _classify_commit_result(commit_payload)
+         commit_dict = commit_payload if isinstance(commit_payload, dict) else None
+         if commit_status == "commit_failed":
+             return IterationResult(
+                 status="commit_failed",
+                 iterations_run=n,
+                 committed_experiment_id=None,
+                 committed_branch_id=None,
+                 final_score=best_score,
+                 steps=steps,
+                 reason=(
+                     f"max_iterations={n} reached; commit_best selected best-so-far "
+                     f"(score {best_score}) but the commit applied no steps; "
+                     f"see commit_result"
+                 ),
+                 commit_result=commit_dict,
+             )
          return IterationResult(
-             status="exhausted",
+             status="exhausted" if commit_status == "committed" else "committed_with_errors",
              iterations_run=n,
              committed_experiment_id=best_exp_id,
              committed_branch_id=best_branch_id,
@@ -328,7 +463,9 @@ async def _iterate_async_core(
              reason=(
                  f"max_iterations={n} reached, threshold {threshold} never met; "
                  f"committed best-so-far with score {best_score}"
+                 + ("" if commit_status == "committed" else " (partial commit — see commit_result)")
              ),
+             commit_result=commit_dict,
          )

      if best_exp_id:
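Review note: with `commit_result` now surfaced on the result dict, a caller can branch on the new statuses and avoid treating every return as success. The sketch below is one possible caller-side handler, assuming the dict shape produced by `IterationResult.to_dict()` in the hunks above; the function name `describe_iteration` and the message wording are hypothetical, and the `error` and `steps_failed` keys are the ones the changelog names as inspectable.

```python
def describe_iteration(result: dict) -> str:
    """Summarise an iterate_toward_goal result dict for a log line or UI."""
    status = result.get("status")
    # commit_result is absent when no commit was attempted or the payload was not a dict.
    commit = result.get("commit_result") or {}

    if status == "commit_failed":
        detail = commit.get("error") or "no steps applied"
        return f"nothing was committed: {detail}"
    if status == "committed_with_errors":
        failed = commit.get("steps_failed", "an unknown number of")
        return f"partial commit: {failed} step(s) failed"
    if status in ("committed", "exhausted"):
        return "commit applied cleanly"
    return f"no commit attempted ({status})"
```

For example, a result with `status="commit_failed"` and `commit_result={"steps_ok": 0, "steps_failed": 2, "error": "device missing"}` yields `"nothing was committed: device missing"`, while `status="timeout_no_commit"` yields `"no commit attempted (timeout_no_commit)"`.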
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "livepilot",
-   "version": "1.17.2",
+   "version": "1.17.3",
    "mcpName": "io.github.dreamrec/livepilot",
    "description": "Agentic production system for Ableton Live 12 — 427 tools, 52 domains. Device atlas (1305 devices), sample engine (Splice + browser + filesystem), auto-composition, spectral perception, technique memory, creative intelligence (12 engines)",
    "author": "Pilot Studio",
@@ -5,7 +5,7 @@ Entry point for the ControlSurface. Ableton calls create_instance(c_instance)
  when this script is selected in Preferences > Link, Tempo & MIDI.
  """

- __version__ = "1.17.2"
+ __version__ = "1.17.3"

  from _Framework.ControlSurface import ControlSurface
  from . import router
package/server.json CHANGED
@@ -6,12 +6,12 @@
      "url": "https://github.com/dreamrec/LivePilot",
      "source": "github"
    },
-   "version": "1.17.2",
+   "version": "1.17.3",
    "packages": [
      {
        "registryType": "npm",
        "identifier": "livepilot",
-       "version": "1.17.2",
+       "version": "1.17.3",
        "transport": {
          "type": "stdio"
        }