npm - livepilot - Versions diffs - 1.10.2 → 1.10.3 - Mend

livepilot 1.10.2 → 1.10.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

package/.claude-plugin/marketplace.json +1 -1
package/AGENTS.md +1 -1
package/CHANGELOG.md +138 -0
package/README.md +6 -4
package/livepilot/.Codex-plugin/plugin.json +1 -1
package/livepilot/.claude-plugin/plugin.json +1 -1
package/livepilot/skills/livepilot-core/references/overview.md +1 -1
package/livepilot/skills/livepilot-evaluation/references/capability-modes.md +1 -1
package/livepilot.mcpb +0 -0
package/m4l_device/livepilot_bridge.js +1 -1
package/manifest.json +1 -1
package/mcp_server/__init__.py +1 -1
package/mcp_server/composer/engine.py +17 -22
package/mcp_server/composer/sample_resolver.py +150 -11
package/mcp_server/experiment/engine.py +212 -16
package/mcp_server/experiment/models.py +10 -0
package/mcp_server/experiment/tools.py +28 -10
package/mcp_server/persistence/project_store.py +61 -7
package/mcp_server/preview_studio/tools.py +73 -7
package/package.json +1 -1
package/remote_script/LivePilot/__init__.py +1 -1
package/scripts/sync_metadata.py +4 -4

package/.claude-plugin/marketplace.json CHANGED

@@ -10,7 +10,7 @@
     {
       "name": "livepilot",
       "description": "Agentic production system for Ableton Live 12 — 317 tools, 43 domains, device atlas, spectral perception, technique memory, sample intelligence, auto-composition, neo-Riemannian harmony, Euclidean rhythm, species counterpoint, MIDI I/O",
-      "version": "1.10.2",
+      "version": "1.10.3",
       "author": {
         "name": "Pilot Studio"
       },

package/AGENTS.md CHANGED

@@ -1,4 +1,4 @@
-# LivePilot v1.10.2 — Ableton Live 12
+# LivePilot v1.10.3 — Ableton Live 12
 ## Project
 - **Repo:** This directory (LivePilot)

package/CHANGELOG.md CHANGED

@@ -1,5 +1,143 @@
 # Changelog
+## 1.10.3 — Truth Release (April 14 2026)
+A correctness pass focused on making the top-layer workflows **trustworthy
+in real use**. No new tool families, no new domains, no new breadth. Every
+change is a truth-release fix: execution paths are real, emitted plans are
+valid, sample matching is musically sane, and product language matches
+implementation.
+The four flagship workflows this release optimizes for:
+  1. **Session understanding** — already strong, unchanged
+  2. **Sample-guided section building** — fixed by §2 + §3
+  3. **Wonder rescue** — fixed by §1
+  4. **Targeted improvement ("tighten the low end")** — already strong, unchanged
+If a feature couldn't be made true in this cycle, it was downgraded honestly
+rather than preserved as fake capability.
+### Fixed — Execution truth (§1)
+- **Experiments now route through the async execution router.**
+  `mcp_server/experiment/engine.py` had two code paths (`run_branch` and
+  `commit_branch`) that called `ableton.send_command(tool, params)` directly
+  and suppressed every failure with a silent `except Exception: pass`. They
+  now go through `execute_plan_steps_async` with per-step results recorded
+  on `branch.execution_log`. Branch status reflects reality: `evaluated`
+  when steps ran, `failed` when zero succeeded, `committed_with_errors`
+  when a commit was partial. Users can see exactly which tools succeeded
+  and which didn't.
+- **`commit_preview_variant` actually applies the variant now.**
+  Previously this tool only marked the variant as chosen in an in-memory
+  store and updated taste memory — the comment said *"the caller should
+  then apply the variant's compiled plan"* which was a trust leak. Users
+  reasonably expected `commit` to **apply** the variant. It now runs the
+  variant's compiled plan through `execute_plan_steps_async` and returns
+  `execution_log` + `steps_ok` / `steps_failed` + explicit `status`
+  (`committed` / `committed_with_errors` / `failed`). Analytical-only
+  variants (no compiled plan) return `status="analytical_only"` and
+  `committed=False` instead of pretending to apply anything.
+### Fixed — Composer truthfulness (§2)
+- **`suggest_sample_technique` removed from the executable plan.**
+  The composer was emitting `{"tool": "suggest_sample_technique", "params":
+  {"technique_id": layer.technique_id}}` in both `compose()` and `augment()`.
+  The real tool's signature is `(file_path required, intent, philosophy,
+  max_suggestions)` — `technique_id` is not a parameter and `file_path` is
+  required. This step would have always failed at runtime. It's now dropped
+  from the executable plan entirely; `layer.technique_id` still surfaces
+  in the descriptive `result.layers[*].technique_id` output for user
+  inspection. The agent can call `suggest_sample_technique` separately with
+  a real file path if it wants per-sample recipe advice.
+  All 12 remaining composer tool emissions validated against real signatures
+  — they're all correct.
+### Fixed — Sample resolution quality (§3)
+- **Role-aware scored ranking replaces naive first-hit substring matching.**
+  The old `_filesystem_match` returned the first audio file whose name
+  contained the layer's role OR any query token. This produced obvious
+  musical mistakes: a `lead` layer asking for *"techno melody Am"* would
+  get matched to `drums_techno.wav` because of the shared "techno" token.
+  The new scorer considers:
+  * role word in filename (+3.0)
+  * filename's primary role matches layer role (+1.5 bonus)
+  * filename's primary role is a **different** role (−5.0 penalty — this
+    is what blocks the drums-for-lead failure)
+  * role-adjacent hint words (kick/snare for drums, sub/808 for bass, etc.)
+    (+2.0)
+  * query token overlap excluding the role word (+0.5 per token)
+  * tempo token overlap between filename and query (+1.0)
+  A candidate must score strictly above 0.0 to be returned — files with
+  no signal at all return `unresolved` instead of an arbitrary first pick.
+  Six new regression tests lock out specific failure patterns.
+### Fixed — Project identity stability (§5)
+- **`project_hash` uses much more entropy.** The old hash was
+  `tempo + track_count + sorted_track_names` — the author's own comment
+  said *"this is imperfect"*. It collided whenever two songs shared the
+  same tempo and track names, and it was invariant to track reordering,
+  scene changes, and arrangement length. The new hash includes:
+  * tempo (1 decimal)
+  * time signature
+  * song length in beats (arrangement duration — very distinguishing)
+  * **ordered** track list: `(index, name, color, has_midi_input)` per track
+  * return track count + names
+  * **ordered** scene list: `(index, name, color)` per scene
+  Six new tests lock out: track reordering collision, song-length collision,
+  scene-list collision, time-signature collision, and track-rename detection.
+  Not a true project ID (that still needs Live set file path access from
+  the Remote Script, deferred) but substantially less fragile in practice.
+### Changed — Product language (§6)
+- **README.md**: "Producer Agent — autonomous multi-step production"
+  rewritten as *"an orchestrated multi-step assistant for building,
+  layering and refining sessions. [...] The agent proposes plans; the user
+  confirms and listens. LivePilot is a high-trust operator, not an
+  autonomous producer."*
+- **docs/manual/getting-started.md**: "An autonomous agent that can build
+  entire tracks from high-level descriptions" rewritten to frame output as
+  a *"playable baseline — a starting point, not a finished track. You
+  listen, decide what works, and iterate."*
+- **docs/manual/intelligence.md**: `agentic_loop` workflow mode description
+  changed from *"Full autonomous loop with evaluation"* to *"Multi-step
+  plan-and-evaluate loop with explicit checkpoints"*.
+### Tests
+- **1756 passing**, 1 skipped (was 1740 in v1.10.2; +16 net new regressions):
+  * +2 composer: `suggest_sample_technique` NOT in compose/augment plan
+  * +6 sample resolver: role-aware ranking lockouts
+  * +2 preview studio: `commit_preview_variant` executes + analytical-only honesty
+  * +6 project persistence: hash collision-resistance
+### Note — what was intentionally NOT fixed in this cycle
+- **`mcp_dispatch` registry expansion.** Only `load_sample_to_simpler` is
+  registered. The other 9 `MCP_TOOLS` entries are not currently emitted by
+  any compiled plan I can find. The router returns a clear "not in dispatch"
+  error if an unregistered MCP tool ever gets emitted, which is *honest
+  failure* — not silent. Adding stub entries would be preemptive scope.
+- **Wonder Mode full SessionKernel.** Wonder passes real `session_info` from
+  Ableton to the variant compilers when connected — the kernel SHAPE is
+  minimal (`{session_info, mode}`) but the semantic-move compilers only
+  read `kernel.session_info.tracks`, so the extra fields don't change
+  behavior. Low value, deferred.
+- **Silent `except: pass` in non-execution paths.** `commit_preview_variant`
+  has two silent excepts around taste-memory and turn-resolution updates.
+  These are bookkeeping side effects, not execution-critical, and failing
+  them shouldn't abort the commit. Left as-is.
+- **Project identity via Live set file path.** The real fix for §5 would
+  be to pull `song.song_document_path` from Live via a new Remote Script
+  handler. Deferred — the stronger hash is a substantial improvement
+  without adding new Remote Script surface area.
+---
 ## 1.10.2 — npm Distribution Fix + Tool-Count Audit (April 14 2026)
 Patch release. The orchestration hardening shipped in 1.10.1 was correct on
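The changelog lists what now goes into `project_hash`, but this page does not reproduce the body of the `project_store.py` diff. The sketch below is a minimal version of that idea. It assumes a plain dict session snapshot with hypothetical key names, and it uses SHA-256 over canonical JSON. The real digest and encoding are not shown in this diff.

```python
from __future__ import annotations

import hashlib
import json


def project_hash(session: dict) -> str:
    """Fingerprint a Live set from the fields listed in the 1.10.3 changelog.

    Hypothetical: the session dict keys, the canonical-JSON encoding and the
    SHA-256 digest are assumptions, not LivePilot's project_store code.
    """
    returns = session.get("return_tracks", [])
    payload = {
        "tempo": round(float(session["tempo"]), 1),
        "time_signature": [session["signature_numerator"], session["signature_denominator"]],
        "song_length": session["song_length"],  # arrangement length in beats
        # Ordered lists, so reordering tracks or scenes changes the hash.
        "tracks": [
            [i, t["name"], t.get("color"), t.get("has_midi_input", False)]
            for i, t in enumerate(session["tracks"])
        ],
        "returns": [len(returns), [r["name"] for r in returns]],
        "scenes": [
            [i, s.get("name", ""), s.get("color")]
            for i, s in enumerate(session.get("scenes", []))
        ],
    }
    # sort_keys + compact separators give a stable byte encoding to hash.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


base = {
    "tempo": 120.0, "signature_numerator": 4, "signature_denominator": 4,
    "song_length": 256.0,
    "tracks": [{"name": "Drums", "color": 1}, {"name": "Bass", "color": 2}],
    "return_tracks": [{"name": "Reverb"}],
    "scenes": [{"name": "Intro"}],
}
swapped = dict(base, tracks=list(reversed(base["tracks"])))
```

Swapping the two tracks changes the hash. The old `tempo + track_count + sorted_track_names` scheme could not detect that change, because sorting discards track order. A tempo of 120.04 still hashes like 120.0 because of the one-decimal rounding.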

package/README.md CHANGED

@@ -514,10 +514,12 @@ claude plugin add github:dreamrec/LivePilot/plugin
 | `/evaluate` | Before/after evaluation of recent changes |
 | `/memory` | Technique library management |
-**Producer Agent** — autonomous multi-step production.
-Consults memory for style context, searches the atlas for instruments,
-searches samples, creates tracks, programs MIDI, chains effects,
-reads the spectrum to verify, and arranges sections.
+**Producer Agent** — an orchestrated multi-step assistant for building,
+layering and refining sessions. Consults memory for style context, searches
+the atlas for instruments, searches samples, creates tracks, programs MIDI,
+chains effects, reads the spectrum to verify, and arranges sections. The
+agent proposes plans; the user confirms and listens. LivePilot is a high-
+trust operator, not an autonomous producer.
 **Core Skill** — operational discipline connecting all layers.
 Consult atlas before loading. Read analyzer after mixing.

package/livepilot/.Codex-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "livepilot",
-  "version": "1.10.2",
+  "version": "1.10.3",
   "description": "Agentic production system for Ableton Live 12 — 317 tools, 43 domains, device atlas, sample intelligence, auto-composition, spectral perception, technique memory, neo-Riemannian harmony, Euclidean rhythm, species counterpoint, MIDI I/O",
   "author": {
     "name": "Pilot Studio"

package/livepilot/.claude-plugin/plugin.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "livepilot",
-  "version": "1.10.2",
+  "version": "1.10.3",
   "description": "Agentic production system for Ableton Live 12 — 317 tools, 43 domains, device atlas, sample intelligence, auto-composition, spectral perception, technique memory, neo-Riemannian harmony, Euclidean rhythm, species counterpoint, MIDI I/O",
   "author": {
     "name": "Pilot Studio"

package/livepilot/skills/livepilot-core/references/overview.md CHANGED

@@ -1,4 +1,4 @@
-# LivePilot v1.10.2 — Architecture & Tool Reference
+# LivePilot v1.10.3 — Architecture & Tool Reference
 Agentic production system for Ableton Live 12. 317 tools across 43 domains. Device atlas (1305 devices, 81 enriched), spectral perception (M4L analyzer), technique memory, automation intelligence (16 curve types, 15 recipes), music theory (Krumhansl-Schmuckler, species counterpoint), generative algorithms (Euclidean rhythm, tintinnabuli, phase shift, additive process), neo-Riemannian harmony (PRL transforms, Tonnetz), MIDI file I/O.

package/livepilot/skills/livepilot-evaluation/references/capability-modes.md CHANGED

@@ -104,7 +104,7 @@ Call `get_capability_state` at the start of any evaluation session. The response
 {
   "mode": "normal",
   "analyzer_connected": true,
-  "bridge_version": "1.10.2",
+  "bridge_version": "1.10.3",
   "spectral_cache_age_ms": 1200,
   "flucoma_available": false,
   "session_connected": true

package/livepilot.mcpb CHANGED

Binary file

package/m4l_device/livepilot_bridge.js CHANGED

@@ -84,7 +84,7 @@ function anything() {
 function dispatch(cmd, args) {
     switch(cmd) {
         case "ping":
-            send_response({"ok": true, "version": "1.10.2"});
+            send_response({"ok": true, "version": "1.10.3"});
             break;
         case "get_params":
             cmd_get_params(args);

package/manifest.json CHANGED

@@ -2,7 +2,7 @@
   "manifest_version": "0.3",
   "name": "livepilot",
   "display_name": "LivePilot — AI for Ableton Live",
-  "version": "1.10.2",
+  "version": "1.10.3",
   "description": "Agentic production system for Ableton Live 12. Make beats, mix tracks, design sounds, and arrange songs with 317 AI-powered tools.",
   "long_description": "LivePilot is an agentic production system for Ableton Live 12. 317 tools across 43 domains — device atlas (1305 devices), sample intelligence (Splice + browser + filesystem), auto-composition, spectral perception, technique memory, and 12 creative engines.\n\n**What it does:**\n- Creates MIDI clips with notes, chords, and rhythms\n- Loads instruments and effects via Device Atlas (1305 devices indexed)\n- Searches samples across Splice, Ableton browser, and filesystem\n- Plans compositions from text prompts with genre-aware layering\n- Slices samples with intent-based MIDI generation\n- Mixes with volume, panning, sends, and automation\n- Analyzes your mix with real-time spectral data (M4L bridge)\n- Diagnoses stuck sessions and generates creative rescue variants\n- Remembers your production style across sessions\n\n**How it works:**\nLivePilot installs a Remote Script in Ableton that communicates with the AI over a local TCP connection. Everything runs on your machine — no audio leaves your computer.",
   "author": {

package/mcp_server/__init__.py CHANGED

@@ -1,2 +1,2 @@
 """LivePilot MCP Server — bridges MCP protocol to Ableton Live."""
-__version__ = "1.10.2"
+__version__ = "1.10.3"
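Most of the files in this release change only one string, from 1.10.2 to 1.10.3. That string appears as a JSON `"version"` field, in the bridge `ping` reply, as `__version__`, and in Markdown headings. The hypothetical consistency check below assumes the version appears in one of the three forms visible in this diff. It is an illustration only. It is not the contents of `scripts/sync_metadata.py`, whose diff is not shown here.

```python
from __future__ import annotations

import re

# Hypothetical patterns for the three forms the version takes in this diff.
_VERSION_PATTERNS = [
    re.compile(r'"version"\s*:\s*"(\d+\.\d+\.\d+)"'),            # *.json, bridge ping reply
    re.compile(r'__version__\s*=\s*"(\d+\.\d+\.\d+)"'),          # mcp_server/__init__.py
    re.compile(r"^# LivePilot v(\d+\.\d+\.\d+)", re.MULTILINE),  # AGENTS.md, overview.md
]


def find_versions(text: str) -> set[str]:
    """Collect every version string that any known pattern finds in text."""
    found: set[str] = set()
    for pattern in _VERSION_PATTERNS:
        found.update(pattern.findall(text))
    return found


def stale_files(files: dict[str, str], expected: str) -> list[str]:
    """Names of files whose versions are missing or differ from expected."""
    return [name for name, text in sorted(files.items())
            if find_versions(text) != {expected}]
```

The JSON pattern requires a quote directly before `version`, so keys such as `"manifest_version"` and `"bridge_version"` are not picked up.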

package/mcp_server/composer/engine.py CHANGED

@@ -126,19 +126,18 @@ def _step_load_sample_to_simpler(track_index: int, layer: LayerSpec, file_path:
     }
-def _step_suggest_technique(track_index: int, layer: LayerSpec) -> dict:
-    """Real tool — returns technique recipe for the agent to interpret.
-    Not a pseudo-tool: suggest_sample_technique is a registered MCP tool.
-    The agent reads the returned recipe and applies the steps manually; we
-    don't try to auto-apply here because the recipe is open-ended.
-    """
-    return {
-        "tool": "suggest_sample_technique",
-        "params": {"technique_id": layer.technique_id},
-        "description": f"Get technique recipe '{layer.technique_id}' for track {track_index}",
-        "role": layer.role,
-    }
+# NOTE: there used to be a _step_suggest_technique helper here that emitted a
+# `suggest_sample_technique` step into the executable plan with params
+# {"technique_id": layer.technique_id}. This was broken: the real tool's
+# signature is (file_path, intent, philosophy, max_suggestions) and takes
+# no technique_id param. The step would have failed at runtime with a
+# "required file_path missing" error.
+#
+# Removed in v1.10.3 (Truth Release). Technique suggestions for composer
+# layers are now surfaced in the descriptive result output (result.layers[*].
+# technique_id) — the agent can call suggest_sample_technique separately
+# with the resolved sample path if it wants per-sample recipe advice. The
+# executable plan emits only real, validated tool calls.
 def _processing_steps_with_binding(
@@ -366,8 +365,9 @@ class ComposerEngine:
             plan.append(_step_load_sample_to_simpler(track_index, layer, file_path))
-            if layer.technique_id:
-                plan.append(_step_suggest_technique(track_index, layer))
+            # technique_id intentionally NOT emitted as an executable step —
+            # see note above _step_suggest_technique removal. layer.technique_id
+            # is still surfaced in result.layers for descriptive output.
             plan.extend(_processing_steps_with_binding(track_index, layer, layer_idx))
             plan.extend(_mix_steps(track_index, layer))
@@ -458,13 +458,8 @@ class ComposerEngine:
                 "role": layer.role,
             })
-            if layer.technique_id:
-                plan.append({
-                    "tool": "suggest_sample_technique",
-                    "params": {"technique_id": layer.technique_id},
-                    "description": f"Get technique recipe '{layer.technique_id}'",
-                    "role": layer.role,
-                })
+            # technique_id intentionally NOT emitted (see compose() above).
+            # Surfaced in result.new_layers for descriptive output only.
             for dev_idx, device in enumerate(layer.processing):
                 device_name = device.get("name", "")
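The removed composer step failed because its params (`technique_id`) did not bind to the tool's parameters (`file_path` required, plus `intent`, `philosophy`, `max_suggestions`). One way to catch this class of bug before runtime is a pre-flight check that binds each step's params against the tool's Python signature with `inspect.signature(...).bind()`. This release does not add such a check; the sketch below is hypothetical. The stub only mirrors the parameter names quoted in the changelog, its defaults are assumptions, and the `registry` dict is made up for the example.

```python
from __future__ import annotations

import inspect


def suggest_sample_technique(file_path: str, intent: str | None = None,
                             philosophy: str | None = None,
                             max_suggestions: int = 3) -> list:
    """Stub with the parameter names quoted in the changelog (defaults assumed)."""
    return []


def validate_step(step: dict, registry: dict) -> str | None:
    """Return None if the step's params bind to the tool's signature, else a reason."""
    tool = registry.get(step.get("tool", ""))
    if tool is None:
        return f"unknown tool: {step.get('tool')!r}"
    try:
        inspect.signature(tool).bind(**step.get("params", {}))
    except TypeError as exc:  # missing required or unexpected keyword argument
        return str(exc)
    return None


registry = {"suggest_sample_technique": suggest_sample_technique}
old_step = {"tool": "suggest_sample_technique", "params": {"technique_id": "chop"}}
fixed_step = {"tool": "suggest_sample_technique", "params": {"file_path": "/samples/vox.wav"}}
```

`validate_step(old_step, registry)` reports the missing `file_path`. A step with a real file path binds cleanly.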

package/mcp_server/composer/sample_resolver.py CHANGED

@@ -16,10 +16,31 @@ Returns (local_path, source) where source is one of:
 Preference order is fixed: filesystem > splice_local > splice_remote > browser.
 Filesystem wins even if Splice has a faster hit — local files are free.
+Role-aware filesystem ranking (v1.10.3)
+----------------------------------------
+Filesystem matching used to return the first file whose name contained the
+role OR any query token. This caused obvious musical mistakes — a `lead`
+layer would get matched to `drums_techno.wav` because both share the genre
+token "techno". The Truth Release (v1.10.3) replaces that with a scored
+ranker that considers:
+  * role word in filename                 (+3.0)
+  * filename's primary role == layer role (+1.5 bonus)
+  * filename's primary role == a DIFFERENT role (-5.0 penalty)
+  * role-adjacent hint words (e.g. kick/snare for drums) (+2.0)
+  * query token overlap, excluding the role word itself (+0.5 per token)
+  * tempo token (e.g. "128bpm") shared between filename and query (+1.0)
+A candidate must score strictly above 0.0 to be returned. This blocks the
+obvious failure mode where genre-only matches override role matches or
+where unrelated files with no signal get returned just because they're
+the first audio file found.
 """
 from __future__ import annotations
+import re
 from pathlib import Path
 from typing import Optional, Tuple
@@ -28,6 +49,32 @@ from .layer_planner import LayerSpec
 _AUDIO_EXTENSIONS = (".wav", ".aif", ".aiff", ".flac")
+# Role-adjacent hint words (NOT the role itself — that's scored separately).
+# These are words commonly found in filenames that indicate the layer role
+# without using the literal role name.
+_ROLE_HINTS: dict[str, frozenset[str]] = {
+    "drums":      frozenset(["kick", "snare", "hat", "clap", "perc", "break", "beat", "loop", "hihat"]),
+    "bass":       frozenset(["sub", "808", "low", "deep", "bassline"]),
+    "lead":       frozenset(["synth", "arp", "mel", "melody", "riff", "hook"]),
+    "pad":        frozenset(["ambient", "atmos", "drone", "string", "warm"]),
+    "texture":    frozenset(["atmos", "ambient", "drone", "swell", "noise"]),
+    "vocal":      frozenset(["vox", "voice", "chop", "phrase", "acapella"]),
+    "percussion": frozenset(["shaker", "tamb", "bongo", "conga", "tom", "ride", "cowbell"]),
+    "fx":         frozenset(["sfx", "riser", "impact", "sweep", "whoosh", "rise", "fall", "hit"]),
+}
+# Flat set of every known "primary role word" that might appear at the start
+# of a filename. Used to classify a filename's dominant role.
+_ALL_ROLE_WORDS: frozenset[str] = frozenset(
+    {role for role in _ROLE_HINTS}
+    | {"drum"}  # singular form of "drums"
+    | {h for hints in _ROLE_HINTS.values() for h in hints}
+)
+# Tempo token pattern — matches 2-3 digit BPM values in filenames like
+# "kick_128bpm.wav", "drums_120_loop.wav", "bass128.wav".
+_TEMPO_RE = re.compile(r"(\d{2,3})")
 def _query_tokens(query: str) -> list[str]:
     """Return lowercase query tokens meaningful for matching (len > 2)."""
@@ -42,21 +89,113 @@ def _iter_candidates(root: Path):
         yield from root.rglob(f"*{ext}")
-def _filesystem_match(layer: LayerSpec, search_roots: list[Path]) -> Optional[str]:
-    """First filename-substring match on role or any query token.
+def _primary_role_of(filename_stem: str) -> Optional[str]:
+    """Identify the dominant 'role' of a filename based on its first token.
+    Example: "drums_techno_128.wav" -> "drums". "bass_sub_808.aif" -> "bass".
+    Returns None if the first token isn't a known role word.
+    """
+    # Split on underscores, hyphens, spaces, dots
+    parts = re.split(r"[_\-\s.]+", filename_stem.lower())
+    for p in parts:
+        if p in _ALL_ROLE_WORDS:
+            return p
+    return None
-    Sync helper — no network, no async needed.
+def _role_matches(primary: str, role: str) -> bool:
+    """True if the filename's primary role belongs to the same role family
+    as the layer's role (handles role == 'drums' vs primary == 'kick')."""
+    if primary == role:
+        return True
+    # "drum" is the singular of "drums"
+    if primary == "drum" and role == "drums":
+        return True
+    # primary is one of the role's hints (e.g. "kick" is a drum hint)
+    hints = _ROLE_HINTS.get(role, frozenset())
+    return primary in hints
+def _score_candidate(path: Path, layer: LayerSpec, query_tempos: set[str]) -> float:
+    """Return a ranking score for this candidate file.
+    Scores combine role fit, role hints, query tokens, and tempo match.
+    A negative score is possible (and disqualifying) when the filename's
+    primary role is clearly a DIFFERENT role family — that blocks the
+    "lead layer grabs drums via shared genre token" failure pattern.
     """
+    name = path.stem.lower()
+    role = (layer.role or "").lower()
+    score = 0.0
+    # 1. Role word literally in filename
+    if role and role in name:
+        score += 3.0
+    # 2. Primary-role classification of the filename
+    primary = _primary_role_of(name)
+    if primary:
+        if _role_matches(primary, role):
+            score += 1.5  # bonus: filename is "about" this layer's role
+        else:
+            score -= 5.0  # heavy penalty: filename is about a different role
+    # 3. Role-adjacent hint words in filename
+    hints = _ROLE_HINTS.get(role, frozenset())
+    for hint in hints:
+        if hint in name:
+            score += 2.0
+            break  # count at most once
+    # 4. Query token overlap (excluding the role word — already scored above)
     tokens = _query_tokens(layer.search_query)
-    role = layer.role.lower()
+    for tok in tokens:
+        if tok == role:
+            continue
+        if tok in name:
+            score += 0.5
+    # 5. Tempo match — if query mentions e.g. "128bpm" and filename has "128"
+    if query_tempos:
+        filename_tempos = set(_TEMPO_RE.findall(name))
+        # Only count digits that are plausible BPMs (60-200)
+        filename_tempos = {t for t in filename_tempos if 60 <= int(t) <= 200}
+        if query_tempos & filename_tempos:
+            score += 1.0
+    return score
+def _extract_query_tempos(query: str) -> set[str]:
+    """Pull tempo tokens (e.g. '128bpm', '120') out of a search query."""
+    tempos = set()
+    for match in _TEMPO_RE.findall(query.lower()):
+        if 60 <= int(match) <= 200:
+            tempos.add(match)
+    return tempos
+def _filesystem_match(layer: LayerSpec, search_roots: list[Path]) -> Optional[str]:
+    """Score every audio file across the search_roots and return the best.
+    Returns None if no file scores above zero. "Above zero" is the
+    threshold for "has any role or token signal" — anything at or below
+    zero is considered unresolved (to avoid returning arbitrary files
+    that happen to be first in alphabetical order).
+    """
+    query_tempos = _extract_query_tempos(layer.search_query)
+    best_path: Optional[Path] = None
+    best_score: float = 0.0  # must strictly exceed this to win
     for root in search_roots:
         for path in _iter_candidates(Path(root)):
-            name = path.name.lower()
-            if role and role in name:
-                return str(path)
-            if any(tok in name for tok in tokens):
-                return str(path)
-    return None
+            score = _score_candidate(path, layer, query_tempos)
+            if score > best_score:
+                best_score = score
+                best_path = path
+    return str(best_path) if best_path is not None else None
 async def _splice_resolve(
@@ -128,7 +267,7 @@ async def resolve_sample_for_layer(
     """
     roots = [Path(r) for r in (search_roots or []) if r]
-    # 1. Filesystem — always try first, no network
+    # 1. Filesystem — always try first, no network. Scored ranking since v1.10.3.
     fs_hit = _filesystem_match(layer, roots)
     if fs_hit:
         return fs_hit, "filesystem"

package/mcp_server/experiment/engine.py CHANGED

@@ -101,41 +101,128 @@ def run_branch(
     The branch is updated in-place with snapshots and status.
     """
+    # NOTE: this function was converted to an async wrapper around the
+    # async execution router in v1.10.3 (Truth Release). The synchronous
+    # _run_branch_sync stays for any caller that still uses it, but it now
+    # fails loudly on execution errors instead of silently swallowing them.
+    # The canonical path is run_branch_async below. Callers (tools.py) use
+    # the async variant directly.
+    return _run_branch_sync(branch, ableton, compiled_plan, capture_fn)
+def _run_branch_sync(branch, ableton, compiled_plan, capture_fn):
+    """Legacy sync run_branch body. Preserved for back-compat only.
+    Experiment tools now use run_branch_async which routes through the
+    unified execution substrate.
+    """
     branch.status = "running"
     branch.compiled_plan = compiled_plan
-    # 1. Capture before
     branch.before_snapshot = capture_fn()
-    # 2. Execute plan steps
     steps_executed = 0
+    log = []
     for step in compiled_plan.get("steps", []):
         tool = step.get("tool", "")
         params = step.get("params", {})
         if not tool:
             continue
-        # Skip read-only verification steps
         if tool in ("get_track_meters", "get_master_spectrum", "analyze_mix"):
             continue
         try:
-            ableton.send_command(tool, params)
+            result = ableton.send_command(tool, params)
             steps_executed += 1
-        except Exception:
-            pass  # Best effort — continue with remaining steps
+            log.append({"tool": tool, "backend": "remote_command", "ok": True, "result": result})
+        except Exception as exc:
+            log.append({"tool": tool, "backend": "remote_command", "ok": False, "error": str(exc)})
+    branch.execution_log = log
     branch.executed_at_ms = int(time.time() * 1000)
+    branch.after_snapshot = capture_fn()
+    for _ in range(steps_executed):
+        try:
+            ableton.send_command("undo", {})
+        except Exception:
+            break
-    # 3. Capture after
+    branch.status = "evaluated" if steps_executed > 0 else "failed"
+    return branch
+async def run_branch_async(
+    branch,
+    ableton,
+    compiled_plan: dict,
+    capture_fn,
+    bridge=None,
+    mcp_registry=None,
+    ctx=None,
+):
+    """Run a single branch experiment through the async execution router.
+    Same semantics as run_branch (apply → capture → evaluate → undo) but
+    dispatches each step through execute_plan_steps_async so remote /
+    bridge / mcp backends are all routed correctly and per-step failures
+    are visible in branch.execution_log.
+    Read-only verification steps (get_track_meters, get_master_spectrum,
+    analyze_mix) are skipped in the apply pass — they're used for snapshot
+    capture separately.
+    """
+    from ..runtime.execution_router import execute_plan_steps_async
+    branch.status = "running"
+    branch.compiled_plan = compiled_plan
+    branch.before_snapshot = capture_fn()
+    # Filter out read-only verification steps from the apply pass
+    all_steps = compiled_plan.get("steps", []) or []
+    apply_steps = [
+        s for s in all_steps
+        if s.get("tool") and s.get("tool") not in (
+            "get_track_meters", "get_master_spectrum", "analyze_mix",
+        )
+    ]
+    exec_results = await execute_plan_steps_async(
+        apply_steps,
+        ableton=ableton,
+        bridge=bridge,
+        mcp_registry=mcp_registry or {},
+        ctx=ctx,
+        stop_on_failure=False,  # best-effort, but log every failure
+    )
+    # Record per-step results on the branch for visibility
+    branch.execution_log = [
+        {
+            "tool": r.tool,
+            "backend": r.backend,
+            "ok": r.ok,
+            **({"result": r.result} if r.ok else {"error": r.error}),
+        }
+        for r in exec_results
+    ]
+    steps_executed = sum(1 for r in exec_results if r.ok)
+    branch.executed_at_ms = int(time.time() * 1000)
     branch.after_snapshot = capture_fn()
-    # 4. Undo all changes back to checkpoint
+    # Undo all successful steps back to checkpoint. Undo is a remote_command,
+    # route it through the normal ableton.send_command path for simplicity.
     for _ in range(steps_executed):
         try:
             ableton.send_command("undo", {})
         except Exception:
             break
-    branch.status = "evaluated"
+    # A branch is "evaluated" only if it actually applied at least one step.
+    # If every step failed, mark it "failed" — this is the truth-release
+    # behavior that makes the experiment honest instead of pretending
+    # a broken branch produced a neutral result.
+    branch.status = "evaluated" if steps_executed > 0 else "failed"
     return branch
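`run_branch_async` depends on `execute_plan_steps_async`, whose body lives in `mcp_server/runtime/execution_router.py` and is not part of this diff. The sketch below models only the behavior this code relies on:

- Each attempted step yields a result with `tool`, `backend`, `ok` and `result` or `error`.
- A failure is recorded rather than swallowed.
- `stop_on_failure=False` keeps going after a failed step.

The handler table, backend labels, tool names and the "not in dispatch" message format are all assumptions.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

# Read-only tools skipped in the apply pass, as in run_branch_async above.
READ_ONLY = {"get_track_meters", "get_master_spectrum", "analyze_mix"}


@dataclass
class StepResult:
    tool: str
    backend: str
    ok: bool
    result: Any = None
    error: str | None = None


async def run_steps(steps: list[dict],
                    handlers: dict[str, Callable[[dict], Awaitable[Any]]],
                    stop_on_failure: bool = False) -> list[StepResult]:
    """Hypothetical router: one StepResult per attempted step, never a silent pass."""
    results: list[StepResult] = []
    for step in steps:
        tool = step["tool"]
        handler = handlers.get(tool)
        if handler is None:
            results.append(StepResult(tool, "none", False, error=f"{tool} not in dispatch"))
        else:
            try:
                out = await handler(step.get("params", {}))
                results.append(StepResult(tool, "remote_command", True, result=out))
            except Exception as exc:  # recorded, not swallowed
                results.append(StepResult(tool, "remote_command", False, error=str(exc)))
        if stop_on_failure and not results[-1].ok:
            break
    return results


def to_log(results: list[StepResult]) -> list[dict]:
    # Same entry shape as branch.execution_log in the diff.
    return [{"tool": r.tool, "backend": r.backend, "ok": r.ok,
             **({"result": r.result} if r.ok else {"error": r.error})}
            for r in results]


def branch_status(results: list[StepResult]) -> str:
    # "evaluated" only if at least one step actually applied.
    return "evaluated" if any(r.ok for r in results) else "failed"


# Usage with hypothetical tool names and handlers.
async def _set_volume(params: dict) -> dict:
    return {"volume": params["value"]}


async def _broken(params: dict) -> dict:
    raise RuntimeError("track index out of range")


plan = [
    {"tool": "set_track_volume", "params": {"value": 0.7}},
    {"tool": "get_track_meters", "params": {}},
    {"tool": "set_device_parameter", "params": {}},
    {"tool": "mystery_tool", "params": {}},
]
apply_steps = [s for s in plan if s.get("tool") and s["tool"] not in READ_ONLY]
handlers = {"set_track_volume": _set_volume, "set_device_parameter": _broken}
results = asyncio.run(run_steps(apply_steps, handlers))
log = to_log(results)
```

The meter read is filtered out before execution. The three remaining steps produce one success, one recorded exception and one "not in dispatch" entry. Because one step succeeded, the branch counts as `evaluated` rather than `failed`.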
@@ -160,12 +247,102 @@ def evaluate_branch(
 # ── Commit / discard ─────────────────────────────────────────────────────────
+async def commit_branch_async(
+    experiment: ExperimentSet,
+    branch_id: str,
+    ableton,
+    bridge=None,
+    mcp_registry=None,
+    ctx=None,
+) -> dict:
+    """Re-apply the winning branch's moves permanently, through the async
+    execution router. No undo — the changes stick.
+    Returns a dict with the committed branch info AND the execution_log
+    (per-step ok/error results). If any step failed, the branch is marked
+    'committed_with_errors' so the caller can tell the commit was partial.
+    """
+    from ..runtime.execution_router import execute_plan_steps_async
+    branch = experiment.get_branch(branch_id)
+    if not branch:
+        return {"error": f"Branch {branch_id} not found"}
+    if not branch.compiled_plan:
+        return {"error": "Branch has no compiled plan"}
+    all_steps = branch.compiled_plan.get("steps", []) or []
+    apply_steps = [
+        s for s in all_steps
+        if s.get("tool") and s.get("tool") not in (
+            "get_track_meters", "get_master_spectrum", "analyze_mix",
+        )
+    ]
+    exec_results = await execute_plan_steps_async(
+        apply_steps,
+        ableton=ableton,
+        bridge=bridge,
+        mcp_registry=mcp_registry or {},
+        ctx=ctx,
+        stop_on_failure=False,  # best-effort commit — record everything
+    )
+    log = [
+        {
+            "tool": r.tool,
+            "backend": r.backend,
+            "ok": r.ok,
+            **({"result": r.result} if r.ok else {"error": r.error}),
+        }
+        for r in exec_results
+    ]
+    branch.execution_log = log
+    steps_ok = sum(1 for r in exec_results if r.ok)
+    steps_failed = len(exec_results) - steps_ok
+    if steps_failed == 0 and steps_ok > 0:
+        branch.status = "committed"
+    elif steps_ok > 0:
+        branch.status = "committed_with_errors"
+    else:
+        # Zero successful steps — don't claim the commit happened
+        branch.status = "failed"
+        return {
+            "committed": False,
+            "branch_id": branch_id,
+            "branch_name": branch.name,
+            "error": "No steps executed successfully",
+            "steps_attempted": len(apply_steps),
+            "execution_log": log,
+        }
+    experiment.winner_branch_id = branch_id
+    experiment.status = "committed"
+    return {
+        "committed": True,
+        "branch_id": branch_id,
+        "branch_name": branch.name,
+        "steps_executed": steps_ok,
+        "steps_failed": steps_failed,
+        "status": branch.status,
+        "score": branch.score,
+        "execution_log": log,
+    }
 def commit_branch(
     experiment: ExperimentSet,
     branch_id: str,
     ableton,
 ) -> dict:
-    """Re-apply the winning branch's moves permanently."""
+    """Legacy sync wrapper kept for any direct caller. The canonical path
+    is commit_branch_async through tools.py → execute_plan_steps_async.
+    Still truth-honest: records per-step ok/error, marks branches as
+    'committed_with_errors' on partial failure rather than lying about it.
+    """
     branch = experiment.get_branch(branch_id)
     if not branch:
         return {"error": f"Branch {branch_id} not found"}
@@ -173,7 +350,6 @@ def commit_branch(
     if not branch.compiled_plan:
         return {"error": "Branch has no compiled plan"}
-    # Re-execute the plan (this time without undoing)
     executed = []
     for step in branch.compiled_plan.get("steps", []):
         tool = step.get("tool", "")
@@ -182,11 +358,29 @@ def commit_branch(
             continue
         try:
             result = ableton.send_command(tool, params)
-            executed.append({"tool": tool, "ok": True})
+            executed.append({"tool": tool, "ok": True, "backend": "remote_command"})
         except Exception as exc:
-            executed.append({"tool": tool, "ok": False, "error": str(exc)})
+            executed.append({"tool": tool, "ok": False, "backend": "remote_command", "error": str(exc)})
+    branch.execution_log = executed
+    ok_count = sum(1 for e in executed if e.get("ok"))
+    failed_count = len(executed) - ok_count
+    if failed_count == 0 and ok_count > 0:
+        branch.status = "committed"
+    elif ok_count > 0:
+        branch.status = "committed_with_errors"
+    else:
+        branch.status = "failed"
+        return {
+            "committed": False,
+            "branch_id": branch_id,
+            "branch_name": branch.name,
+            "error": "No steps executed successfully",
+            "steps_attempted": len(executed),
+            "execution_log": executed,
+        }
-    branch.status = "committed"
     experiment.winner_branch_id = branch_id
     experiment.status = "committed"
@@ -194,7 +388,9 @@ def commit_branch(
         "committed": True,
         "branch_id": branch_id,
         "branch_name": branch.name,
-        "steps_executed": len(executed),
+        "steps_executed": ok_count,
+        "steps_failed": failed_count,
+        "status": branch.status,
         "score": branch.score,
     }
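After this change, `commit_branch_async` and the legacy `commit_branch` both derive the branch status from per-step results using the same three-way rule. `run_branch` follows the same idea: a branch counts as "evaluated" only if at least one step applied. The sketch below pulls that rule out into a standalone function so it is easy to see. `classify_commit` is a hypothetical name; the package inlines this logic rather than exposing a helper.

```python
def classify_commit(step_ok_flags: list[bool]) -> tuple[str, bool]:
    """Map per-step ok flags to (branch_status, committed).

    Hypothetical helper mirroring the inline logic in commit_branch_async
    and commit_branch; it is not part of the livepilot package.
    """
    steps_ok = sum(1 for ok in step_ok_flags if ok)
    steps_failed = len(step_ok_flags) - steps_ok
    if steps_failed == 0 and steps_ok > 0:
        return "committed", True
    if steps_ok > 0:
        # Partial success: the steps that applied stick, so report it as such.
        return "committed_with_errors", True
    # Nothing applied, including an empty plan: do not claim a commit happened.
    return "failed", False


print(classify_commit([True, True]))    # ('committed', True)
print(classify_commit([True, False]))   # ('committed_with_errors', True)
print(classify_commit([]))              # ('failed', False)
```

Because the async path first filters out read-only tools (`get_track_meters`, `get_master_spectrum`, `analyze_mix`), a plan made up only of those tools yields no results and ends up as "failed" rather than "committed".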

package/mcp_server/experiment/models.py CHANGED

@@ -55,6 +55,12 @@ class ExperimentBranch:
     evaluation: Optional[dict] = None
     score: float = 0.0
+    # Execution log — per-step results from the async router. Non-empty when
+    # a branch has been run through run_branch or committed via commit_branch.
+    # Each entry: {tool, backend, ok, error, result}. Surfaced on to_dict()
+    # so callers can see exactly which steps succeeded or failed.
+    execution_log: list = field(default_factory=list)
     # Metadata
     created_at_ms: int = 0
     executed_at_ms: int = 0
@@ -77,6 +83,10 @@ class ExperimentBranch:
             d["after_snapshot"] = self.after_snapshot.to_dict()
         if self.evaluation:
             d["evaluation"] = self.evaluation
+        if self.execution_log:
+            d["execution_log"] = self.execution_log
+            d["steps_ok"] = sum(1 for e in self.execution_log if e.get("ok"))
+            d["steps_failed"] = sum(1 for e in self.execution_log if not e.get("ok"))
         return d
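The new `execution_log` field uses `field(default_factory=list)`, so each branch gets its own list instead of sharing one mutable default. `to_dict()` adds the log and its summary counts only after the branch has actually run. Here is a minimal sketch of that pattern. `BranchSketch` is a hypothetical, cut-down stand-in for `ExperimentBranch`, its `"pending"` default status is an assumption, and the tool names in the usage example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class BranchSketch:
    """Hypothetical stand-in for ExperimentBranch, keeping only what is
    needed to show how execution_log is surfaced."""
    name: str
    status: str = "pending"  # assumed default, for illustration only
    execution_log: list = field(default_factory=list)  # fresh list per instance

    def to_dict(self) -> dict:
        d = {"name": self.name, "status": self.status}
        # Emit the log and derived counts only when there is something to report.
        if self.execution_log:
            d["execution_log"] = self.execution_log
            d["steps_ok"] = sum(1 for e in self.execution_log if e.get("ok"))
            d["steps_failed"] = sum(1 for e in self.execution_log if not e.get("ok"))
        return d


ran = BranchSketch("brighten", execution_log=[
    {"tool": "tool_a", "backend": "remote_command", "ok": True},
    {"tool": "tool_b", "backend": "remote_command", "ok": False, "error": "timeout"},
])
print(ran.to_dict()["steps_ok"], ran.to_dict()["steps_failed"])  # 1 1
```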

package/mcp_server/experiment/tools.py CHANGED Viewed

@@ -116,7 +116,7 @@ def create_experiment(
 @mcp.tool()
-def run_experiment(
+async def run_experiment(
     ctx: Context,
     experiment_id: str,
 ) -> dict:
@@ -125,10 +125,11 @@ def run_experiment(
     For each branch:
     1. Compile the semantic move against current session
     2. Capture before state
-    3. Execute the compiled plan
+    3. Execute the compiled plan (through the async router — v1.10.3 truth)
     4. Capture after state
-    5. Undo all changes (revert to checkpoint)
+    5. Undo all successful steps (revert to checkpoint)
     6. Evaluate the branch
+    7. Record per-step results on branch.execution_log
     Branches run sequentially (Ableton has linear undo).
     """
@@ -137,6 +138,8 @@ def run_experiment(
         return {"error": f"Experiment {experiment_id} not found"}
     ableton = _get_ableton(ctx)
+    bridge = ctx.lifespan_context.get("m4l")
+    mcp_registry = ctx.lifespan_context.get("mcp_dispatch", {})
     # Import compiler
     from ..semantic_moves import registry, compiler
@@ -149,7 +152,7 @@ def run_experiment(
         # Compile the move
         move = registry.get_move(branch.move_id)
         if not move:
-            branch.status = "evaluated"
+            branch.status = "failed"
             branch.score = 0.0
             branch.evaluation = {"error": f"Move {branch.move_id} not found"}
             results.append(branch.to_dict())
@@ -160,12 +163,15 @@ def run_experiment(
         plan = compiler.compile(move, kernel)
         compiled_dict = plan.to_dict()
-        # Run the branch (apply → capture → undo)
-        engine.run_branch(
+        # Run the branch through the async router
+        await engine.run_branch_async(
             branch=branch,
             ableton=ableton,
             compiled_plan=compiled_dict,
             capture_fn=lambda: _capture_snapshot(ctx),
+            bridge=bridge,
+            mcp_registry=mcp_registry,
+            ctx=ctx,
         )
         # Evaluate
@@ -236,22 +242,34 @@ def compare_experiments(
 @mcp.tool()
-def commit_experiment(
+async def commit_experiment(
     ctx: Context,
     experiment_id: str,
     branch_id: str,
 ) -> dict:
     """Commit the winning branch — re-apply its moves permanently.
-    This executes the branch's compiled plan again, this time without undoing.
-    The experiment is marked as committed.
+    Routes the compiled plan through the async router (v1.10.3 truth).
+    Returns a result dict with per-step execution_log. If any step failed,
+    branch.status is set to 'committed_with_errors' and the response
+    reports steps_failed > 0, so callers can tell the commit was partial.
     """
     experiment = engine.get_experiment(experiment_id)
     if not experiment:
         return {"error": f"Experiment {experiment_id} not found"}
     ableton = _get_ableton(ctx)
-    return engine.commit_branch(experiment, branch_id, ableton)
+    bridge = ctx.lifespan_context.get("m4l")
+    mcp_registry = ctx.lifespan_context.get("mcp_dispatch", {})
+    return await engine.commit_branch_async(
+        experiment,
+        branch_id,
+        ableton,
+        bridge=bridge,
+        mcp_registry=mcp_registry,
+        ctx=ctx,
+    )
 @mcp.tool()
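`run_experiment` and `commit_experiment` become `async def`, presumably because the execution router is a coroutine and only an async function can `await` it. With `stop_on_failure=False`, every step runs and each one produces a log entry. The sketch below reproduces the log shape built in `commit_branch_async` against a fake router. `StepResult` and `fake_execute_plan_steps_async` are hypothetical stand-ins; the diff only shows that router results expose `.tool`, `.backend`, `.ok`, `.result` and `.error`.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepResult:
    """Hypothetical stand-in for the router's per-step result."""
    tool: str
    backend: str
    ok: bool
    result: Any = None
    error: Optional[str] = None


async def fake_execute_plan_steps_async(steps, stop_on_failure=False):
    """Hypothetical stub router: fails any step whose params set fail=True."""
    results = []
    for step in steps:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        if step.get("params", {}).get("fail"):
            results.append(StepResult(step["tool"], "remote_command", False, error="boom"))
            if stop_on_failure:
                break
        else:
            results.append(StepResult(step["tool"], "remote_command", True, result={"done": True}))
    return results


def to_log(results):
    # Same shape as execution_log in commit_branch_async: successful steps
    # carry "result", failed steps carry "error".
    return [
        {
            "tool": r.tool,
            "backend": r.backend,
            "ok": r.ok,
            **({"result": r.result} if r.ok else {"error": r.error}),
        }
        for r in results
    ]


STEPS = [{"tool": "step_a"}, {"tool": "step_b", "params": {"fail": True}}, {"tool": "step_c"}]


async def main():
    return to_log(await fake_execute_plan_steps_async(STEPS, stop_on_failure=False))


log = asyncio.run(main())
for entry in log:
    print(entry)
```

Because the commit runs best-effort, `step_c` still executes after `step_b` fails. That is why partial commits can occur and need the `committed_with_errors` status.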

package/mcp_server/persistence/project_store.py CHANGED

@@ -20,15 +20,69 @@ _MAX_WONDER_OUTCOMES = 10
 def project_hash(session_info: dict) -> str:
-    """Compute a stable project fingerprint from session info.
-    Uses tempo + track count + sorted track names. This is imperfect
-    but stable enough for per-song state within a production session.
+    """Compute a project fingerprint from session info.
+    v1.10.3 Truth Release: this used to use `tempo + len(tracks) + sorted
+    track names`, which had obvious collisions — any two songs at the same
+    tempo with the same track names collided even if the tracks were in
+    different order, the scenes were different, or the arrangement length
+    differed. The author's own comment acknowledged the weakness.
+    The new hash uses a lot more entropy from the session:
+      * tempo (1 decimal)
+      * time signature (num/denom)
+      * song_length (arrangement length in beats) — very distinguishing
+      * ORDERED track list: (index, name, color_index, has_midi_input)
+      * ORDERED scene list: (index, name, color_index)
+      * return track count + names
+    This is still a fingerprint, not a true project ID (for that we'd need
+    the Live set file path, which requires a new Remote Script handler).
+    But it's collision-resistant across the common failure modes:
+      * template-based starts diverge once the user renames a track, adds
+        a scene, or adjusts the arrangement length
+      * track reordering produces a new hash (correctly — it's a real edit)
+      * two songs at 128 BPM with tracks named Drums/Bass no longer collide
+        unless they also share identical scene lists AND song length
     """
     tempo = session_info.get("tempo", 120.0)
-    tracks = session_info.get("tracks", [])
-    track_names = sorted(t.get("name", "") for t in tracks if isinstance(t, dict))
-    seed = f"{tempo:.1f}|{len(tracks)}|{'|'.join(track_names)}"
+    sig_num = session_info.get("signature_numerator", 4)
+    sig_denom = session_info.get("signature_denominator", 4)
+    song_length = session_info.get("song_length", 0.0)
+    tracks = session_info.get("tracks", []) or []
+    # Ordered track signature — (index, name, color, has_midi_input)
+    track_sig = "|".join(
+        f"{t.get('index', i)}:{t.get('name', '')}:{t.get('color_index', 0)}:{int(t.get('has_midi_input', False))}"
+        for i, t in enumerate(tracks)
+        if isinstance(t, dict)
+    )
+    return_tracks = session_info.get("return_tracks", []) or []
+    return_sig = "|".join(
+        f"{r.get('index', i)}:{r.get('name', '')}"
+        for i, r in enumerate(return_tracks)
+        if isinstance(r, dict)
+    )
+    scenes = session_info.get("scenes", []) or []
+    scene_sig = "|".join(
+        f"{s.get('index', i)}:{s.get('name', '')}:{s.get('color_index', 0)}"
+        for i, s in enumerate(scenes)
+        if isinstance(s, dict)
+    )
+    seed = "||".join([
+        f"t={tempo:.1f}",
+        f"sig={sig_num}/{sig_denom}",
+        f"len={song_length:.2f}",
+        f"n_tracks={len(tracks)}",
+        f"tracks=[{track_sig}]",
+        f"n_returns={len(return_tracks)}",
+        f"returns=[{return_sig}]",
+        f"n_scenes={len(scenes)}",
+        f"scenes=[{scene_sig}]",
+    ])
     return hashlib.sha256(seed.encode()).hexdigest()[:12]
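To see the collision the docstring describes, compare the old seed with a condensed version of the new one. The sketch reimplements both from the diff. To keep it short, `new_project_hash` drops the time signature, return tracks, `has_midi_input` and scene colors, so it is an illustration rather than the package's exact function. The session dicts are made-up examples.

```python
import hashlib


def old_project_hash(session_info: dict) -> str:
    # Pre-1.10.3 seed: tempo + track count + *sorted* track names.
    tempo = session_info.get("tempo", 120.0)
    tracks = session_info.get("tracks", [])
    names = sorted(t.get("name", "") for t in tracks if isinstance(t, dict))
    seed = f"{tempo:.1f}|{len(tracks)}|{'|'.join(names)}"
    return hashlib.sha256(seed.encode()).hexdigest()[:12]


def new_project_hash(session_info: dict) -> str:
    # Condensed 1.10.3-style seed: ordered tracks, scenes and song length.
    tempo = session_info.get("tempo", 120.0)
    song_length = session_info.get("song_length", 0.0)
    tracks = session_info.get("tracks", []) or []
    scenes = session_info.get("scenes", []) or []
    track_sig = "|".join(
        f"{t.get('index', i)}:{t.get('name', '')}:{t.get('color_index', 0)}"
        for i, t in enumerate(tracks) if isinstance(t, dict)
    )
    scene_sig = "|".join(
        f"{s.get('index', i)}:{s.get('name', '')}"
        for i, s in enumerate(scenes) if isinstance(s, dict)
    )
    seed = "||".join([
        f"t={tempo:.1f}", f"len={song_length:.2f}",
        f"tracks=[{track_sig}]", f"scenes=[{scene_sig}]",
    ])
    return hashlib.sha256(seed.encode()).hexdigest()[:12]


a = {"tempo": 128.0, "song_length": 64.0, "tracks": [{"name": "Drums"}, {"name": "Bass"}]}
b = {"tempo": 128.0, "song_length": 64.0, "tracks": [{"name": "Bass"}, {"name": "Drums"}]}
c = {"tempo": 128.0, "song_length": 96.0, "tracks": [{"name": "Drums"}, {"name": "Bass"}]}
print(old_project_hash(a) == old_project_hash(b), new_project_hash(a) == new_project_hash(b))
```

The flip side of the extra entropy is that the fingerprint changes whenever any hashed property changes. Renaming a track, adding a scene or extending the arrangement all produce a new key, so per-project state is keyed to the current shape of the set rather than to the set file itself.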

package/mcp_server/preview_studio/tools.py CHANGED

@@ -225,18 +225,33 @@ def compare_preview_variants(
 @mcp.tool()
-def commit_preview_variant(
+async def commit_preview_variant(
     ctx: Context,
     set_id: str,
     variant_id: str,
 ) -> dict:
-    """Commit the chosen variant from a preview set.
-    Marks the variant as committed and discards the others.
-    The caller should then apply the variant's compiled plan.
+    """Commit the chosen variant from a preview set — APPLIES the plan.
+    v1.10.3 Truth Release: this tool used to only mark the variant as
+    committed in the in-memory store and leave plan application to the
+    caller, which was a trust leak — users expected "commit" to actually
+    apply the chosen variant. It now actually runs the variant's compiled
+    plan through the async execution router. No undo after, the changes
+    stick.
+    Returns:
+        {
+            committed: bool (true if all steps applied, false if plan failed),
+            variant_id, label, intent, move_id, identity_effect, what_preserved,
+            execution_log: [{tool, backend, ok, error/result} per step],
+            steps_ok: int,
+            steps_failed: int,
+            status: "committed" | "committed_with_errors" | "failed",
+        }
-    set_id: the preview set
-    variant_id: the chosen variant to commit
+    If the variant is analytical-only (no compiled_plan), the tool records
+    the choice and returns status="analytical_only" WITHOUT pretending to
+    execute anything — callers get a clear signal instead of a silent no-op.
     """
     ps = engine.get_preview_set(set_id)
     if not ps:
@@ -260,6 +275,57 @@ def commit_preview_variant(
         "what_preserved": chosen.what_preserved,
     }
+    # ── v1.10.3: actually execute the compiled plan ──
+    # If there's no compiled plan, the variant is analytical-only — record
+    # the choice and return honestly instead of pretending it was applied.
+    if not chosen.compiled_plan:
+        result["committed"] = False
+        result["status"] = "analytical_only"
+        result["note"] = (
+            "Variant has no compiled plan (analytical-only). Preview set "
+            "marked the choice but no session changes were made. Use an "
+            "executable variant if you want the commit to apply changes."
+        )
+    else:
+        from ..runtime.execution_router import execute_plan_steps_async
+        plan = chosen.compiled_plan
+        steps = plan if isinstance(plan, list) else plan.get("steps", []) or []
+        ableton = _get_ableton(ctx)
+        bridge = ctx.lifespan_context.get("m4l")
+        mcp_registry = ctx.lifespan_context.get("mcp_dispatch", {})
+        exec_results = await execute_plan_steps_async(
+            steps,
+            ableton=ableton,
+            bridge=bridge,
+            mcp_registry=mcp_registry,
+            ctx=ctx,
+            stop_on_failure=False,
+        )
+        log = [
+            {
+                "tool": r.tool,
+                "backend": r.backend,
+                "ok": r.ok,
+                **({"result": r.result} if r.ok else {"error": r.error}),
+            }
+            for r in exec_results
+        ]
+        steps_ok = sum(1 for r in exec_results if r.ok)
+        steps_failed = len(exec_results) - steps_ok
+        result["execution_log"] = log
+        result["steps_ok"] = steps_ok
+        result["steps_failed"] = steps_failed
+        if steps_failed == 0 and steps_ok > 0:
+            result["status"] = "committed"
+        elif steps_ok > 0:
+            result["status"] = "committed_with_errors"
+        else:
+            result["status"] = "failed"
+            result["committed"] = False
     # Wonder lifecycle hooks
     ws = _find_wonder_session_by_preview(set_id)
     if ws:
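`commit_preview_variant` accepts a compiled plan either as a bare list of steps or as a dict with a `"steps"` key. It treats a falsy plan as analytical-only. In `plan if isinstance(plan, list) else plan.get("steps", []) or []`, the conditional expression binds more loosely than `or`, so the `or []` only guards the dict branch. The sketch below isolates that branching. Both function names are hypothetical, `run_step` stands in for the router, and it assumes `result["committed"]` starts out `True`, which the visible part of the hunk does not show.

```python
from typing import Callable, Optional, Union

Plan = Union[dict, list, None]


def normalize_steps(plan: Plan) -> Optional[list]:
    """Return the plan's steps, or None for an analytical-only variant."""
    if not plan:
        return None
    # Parses as: plan if isinstance(plan, list) else (plan.get("steps", []) or [])
    return plan if isinstance(plan, list) else plan.get("steps", []) or []


def commit_variant_sketch(plan: Plan, run_step: Callable[[dict], bool]) -> dict:
    result = {"committed": True}  # assumed initial value
    steps = normalize_steps(plan)
    if steps is None:
        result["committed"] = False
        result["status"] = "analytical_only"
        return result
    oks = [bool(run_step(step)) for step in steps]  # best effort: every step runs
    steps_ok = sum(oks)
    steps_failed = len(oks) - steps_ok
    result["steps_ok"], result["steps_failed"] = steps_ok, steps_failed
    if steps_failed == 0 and steps_ok > 0:
        result["status"] = "committed"
    elif steps_ok > 0:
        result["status"] = "committed_with_errors"
    else:
        result["status"] = "failed"
        result["committed"] = False
    return result
```

Note the edge case: a dict plan with an empty `"steps"` list is truthy, so it is executed and reported as "failed" rather than "analytical_only".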

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "livepilot",
-  "version": "1.10.2",
+  "version": "1.10.3",
   "mcpName": "io.github.dreamrec/livepilot",
   "description": "Agentic production system for Ableton Live 12 — 317 tools, 43 domains. Device atlas (1305 devices), sample engine (Splice + browser + filesystem), auto-composition, spectral perception, technique memory, creative intelligence (12 engines)",
   "author": "Pilot Studio",

package/remote_script/LivePilot/__init__.py CHANGED

@@ -5,7 +5,7 @@ Entry point for the ControlSurface. Ableton calls create_instance(c_instance)
 when this script is selected in Preferences > Link, Tempo & MIDI.
 """
-__version__ = "1.10.2"
+__version__ = "1.10.3"
 from _Framework.ControlSurface import ControlSurface
 from .server import LivePilotServer

package/scripts/sync_metadata.py CHANGED

@@ -19,13 +19,13 @@ ROOT = Path(__file__).resolve().parents[1]
 def get_version() -> str:
     """Read version from package.json (source of truth)."""
-    pkg = json.loads((ROOT / "package.json").read_text())
+    pkg = json.loads((ROOT / "package.json").read_text(encoding="utf-8"))
     return pkg["version"]
 def get_tool_count() -> int:
     """Read tool count from test_tools_contract.py assertion."""
-    src = (ROOT / "tests" / "test_tools_contract.py").read_text()
+    src = (ROOT / "tests" / "test_tools_contract.py").read_text(encoding="utf-8")
     match = re.search(r"assert len\(tools\) == (\d+)", src)
     if match:
         return int(match.group(1))
@@ -73,7 +73,7 @@ def check_version(version: str) -> list[str]:
         path = ROOT / rel_path
         if not path.exists():
             continue
-        content = path.read_text()
+        content = path.read_text(encoding="utf-8")
         if version not in content:
             # Find what version IS there
             old = re.search(r"1\.\d+\.\d+", content)
@@ -91,7 +91,7 @@ def check_tool_count(count: int) -> list[str]:
         path = ROOT / rel_path
         if not path.exists():
             continue
-        content = path.read_text()
+        content = path.read_text(encoding="utf-8")
         # Look for "N tools" pattern
         matches = re.findall(r"(\d+)\s+tools", content)
         for m in matches:
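The `sync_metadata.py` change pins `read_text()` to UTF-8. Without an explicit encoding, `Path.read_text()` decodes with the locale's preferred encoding, which is often cp1252 on Windows. The metadata files contain non-ASCII characters such as the em dash in the package description, so the default can either raise or silently mis-decode them. Here is a small sketch of the pattern. `count_tool_mentions` is a hypothetical helper that reuses the script's `(\d+)\s+tools` regex, and the temporary file is made up.

```python
import re
import tempfile
from pathlib import Path


def count_tool_mentions(path: Path) -> list[int]:
    # Explicit UTF-8 makes decoding independent of the OS locale.
    content = path.read_text(encoding="utf-8")
    return [int(m) for m in re.findall(r"(\d+)\s+tools", content)]


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    # Non-ASCII on purpose (U+2014), like the real package description.
    readme.write_text("Ableton Live 12 \u2014 317 tools, 43 domains", encoding="utf-8")
    found = count_tool_mentions(readme)

print(found)  # [317]
```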