npm - livepilot - Versions diffs - 1.26.0 → 1.26.2 - Mend

livepilot 1.26.0 → 1.26.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (191) hide show

package/livepilot/skills/livepilot-evaluation/SKILL.md ADDED Viewed

@@ -0,0 +1,195 @@
+---
+name: livepilot-evaluation
+description: This skill should be used when the user asks to "evaluate a change", "was that good", "keep or undo", "A/B compare", "rate my change", "check if that helped", or wants to use the universal evaluation loop to judge production moves.
+---
+# Evaluation Engine — Universal Move Judgment
+The evaluation engine is the shared decision layer used by all other engines (mix, sound design, composition, performance). It determines whether a change improved the session, and whether to keep it, undo it, or learn from it.
+## The Universal Evaluation Loop
+Every production move follows this loop regardless of which engine initiated it.
+### Step 1 — Compile Goal Vector
+Call `compile_goal_vector(request_text, targets, protect, mode, aggression)` to establish what you are trying to achieve.
+**request_text**: a plain-text description of the intended improvement (e.g., "reduce masking between bass and kick in 100-200 Hz range").
+**targets**: dict of dimension → weight (e.g., `{"punch": 0.4, "weight": 0.3, "clarity": 0.3}`). Weights are normalized to sum to 1.0.
+**Valid dimensions split into two families:**
+**Family A — Technical / measurable** (derived from spectrum, RMS, LUFS, masking reports):
+energy, punch, weight, density, brightness, warmth, width, depth, motion, contrast, clarity, cohesion, groove, tension, novelty, polish, emotion.
+**Family B — Artistic / identity** (derived from concept packets, motif graphs, section labels, taste graphs): required on any goal vector invoked by `livepilot-creative-director`:
+- **style_fit** — how well the result matches the concept packet's sonic_identity
+- **distinctiveness** — how much the result differs from the baseline completion pattern
+- **motif_coherence** — whether recurring motifs survive / evolve meaningfully
+- **section_contrast** — whether section-level differences stay legible
+- **restraint** — whether the result respects protected qualities and avoids additive drift
+- **surprise_without_breakage** — whether novelty was introduced without violating identity
+Both families share the same 0.0 – 1.0 scale. A technically improved result can still be artistically worse — both must pass for a creative-director turn to be kept.
+**protect**: dict of dimension → minimum threshold (e.g., `{"clarity": 0.7}`). If a dimension drops below this value after a move, the move is undone.
+**Modes** control how aggressively you act:
+- `observe` — read-only analysis, no changes. Use for diagnostics and status checks.
+- `improve` — targeted fixes for specific issues. The default mode. Make the smallest change that addresses the problem.
+- `explore` — creative experimentation. Wider parameter ranges, more tolerance for unexpected results. Use when the user says "try something", "experiment", or "surprise me".
+- `finish` — polish and finalize. Conservative moves only, protect what already works. Use when the user says "almost done", "final touches", or "wrap it up".
+- `diagnose` — identify problems without fixing them. Like observe but with critic analysis. Use when the user says "what's wrong" without asking for fixes.
+### Step 2 — Build World Model
+Call `build_world_model` to snapshot the current session state and available capabilities:
+- Session info: tracks, clips, devices, tempo, time signature
+- Capability state: analyzer connected, M4L bridge active, FluCoMa available
+- Recent actions: last moves taken and their outcomes
+- Active constraints: performance mode safety limits, user anti-preferences
+The world model determines what tools are available and what measurements are possible.
+### Step 3 — Get Turn Budget
+Call `get_turn_budget(mode)` to determine how many moves you should make this turn:
+- `observe` / `diagnose`: 0 moves (read-only)
+- `improve`: 1-3 moves per turn, evaluate after each
+- `explore`: 1-5 moves per turn, wider tolerance
+- `finish`: 1 move per turn, strict evaluation
+Do not exceed the turn budget. If more work is needed, complete the current turn, report progress, and start a new turn.
+### Step 4 — Capture Before
+Take measurements appropriate to the engine context:
+- **Mix engine**: `get_master_spectrum` + `get_master_rms`
+- **Sound design**: `get_device_parameters` + `get_master_spectrum`
+- **Composition**: `get_notes` or `get_arrangement_notes` + `get_section_graph`
+- **Universal**: `get_mix_snapshot` for full session state
+Always capture before executing. Without a before snapshot, evaluation is meaningless.
+### Step 5 — Execute Intervention
+Apply the planned change using the appropriate tool. Execute exactly one move, then proceed to evaluation.
+### Step 6 — Capture After
+Repeat the same measurements from Step 4. Use identical tool calls to ensure comparable data.
+### Step 7 — Evaluate
+Call the appropriate evaluator:
+- `evaluate_move(goal_vector, before_snapshot, after_snapshot)` — universal evaluator. `goal_vector` is the dict returned by `compile_goal_vector`. Snapshots should contain `spectrum` (9-band dict sub_low → air, or 8-band from pre-v1.16 .amxd builds), `rms`, `peak`.
+- `evaluate_mix_move(before_snapshot, after_snapshot, targets, protect)` — mix-specific with protection constraints. `targets` and `protect` are dicts of dimension → weight/threshold.
+- `evaluate_composition_move(before_snapshot, after_snapshot, goal_vector)` — composition-specific
+- `evaluate_with_fabric(engine, before_snapshot, after_snapshot, targets, protect)` — routes to the appropriate engine-specific evaluator. `engine` must be one of: `"sonic"`, `"composition"`, `"mix"`, `"transition"`, `"translation"`.
+### Step 8 — Read the Verdict
+Every evaluator returns:
+- `keep_change` (bool): whether the change should stay
+- `score` (0.0-1.0): magnitude of improvement (0.5 = neutral, >0.5 = better, <0.5 = worse)
+- `goal_progress` (0.0-1.0): how much closer to the stated goal
+- `collateral_damage` (list): things that got worse as a side effect
+- `explanation` (string): human-readable judgment summary
+### Step 8b — Creative-Success Verdict (for creative-director turns)
+When the evaluation is invoked by `livepilot-creative-director`, also
+assign one of five verdict tags. This is a structured classification
+on top of `keep_change` / `score`, for learning and debugging:
+| Verdict | Meaning | Technical score | Artistic score | User kept | Action |
+|---|---|---|---|---|---|
+| `safe_win` | Low-novelty move that landed cleanly | ≥ 0.6 | ≥ 0.55 | yes | Memory-learn candidate. Use as baseline for future similar asks. |
+| `bold_win` | High-novelty move that landed and stuck | ≥ 0.55 | ≥ 0.65 | yes | STRONG memory-learn candidate. Surface when user asks for similar in the future. |
+| `interesting_failure` | Novel, didn't land technically, but user kept for study | < 0.55 | ≥ 0.60 | yes | Note in memory but DO NOT promote for auto-replay. User keeping it ≠ reusability. |
+| `identity_break` | Technically OK but violated protected qualities | ≥ 0.55 | < 0.45 | no | Auto-undo. `record_anti_preference` with the protected quality that was violated. |
+| `generic_fallback` | Both scores ≤ mid, collapsed to a default pattern | < 0.55 | < 0.55 | no | Auto-undo. This is the "stuck in patterns" signature. `record_anti_preference` with the family+target combo. |
+Include the verdict in `memory_learn` payload so future sessions can
+filter by type. The director's Phase 8 uses these tags explicitly.
+### Step 9 — Keep or Undo
+If `keep_change` is `false`:
+1. Call `undo()` immediately
+2. Report to the user: what was tried, why it was undone, citing `collateral_damage`
+3. Consider an alternative approach for the same goal
+If `keep_change` is `true`:
+1. Report the improvement with score and explanation
+2. If `score > 0.7`, this is a memory promotion candidate (see below)
+### Step 10 — Repeat or Stop
+Check turn budget remaining. If budget allows and goal_progress < 1.0, return to Step 4 for the next move. Otherwise, summarize progress and stop.
+## Capability Modes
+The world model includes a capability state that affects what measurements are available. Call `get_capability_state` to check.
+### normal
+Full measured evaluation. M4L analyzer connected, all spectral/RMS/key tools available. Critics use real data. This is the best mode.
+### measured_degraded
+Analyzer data is stale (older than 5 seconds) or intermittent. Measurements may not reflect current state. Re-trigger analysis before trusting cached values. Inform the user that data freshness is limited.
+### judgment_only
+No analyzer connected. Evaluation relies on device parameter changes, track structure, and role-based heuristics. Critics cannot use spectral evidence. Inform the user: "Evaluation is based on parameter analysis only — spectral verification unavailable."
+### read_only
+Session disconnected or in an error state. No tools can modify the session. Only read operations from cached data. Inform the user and suggest reconnecting.
+## Action Ledger
+Every move is recorded in the action ledger for accountability and learning.
+- `get_action_ledger_summary` — summary of all actions taken this session with scores
+- `get_recent_actions` — last N actions with full detail
+- `get_last_move` — the most recent action and its evaluation result
+Use the ledger to avoid repeating failed approaches. If a move type has been undone twice for the same issue, try a different strategy.
+## Memory Promotion
+Successful moves can be promoted to persistent memory for future sessions.
+- `get_promotion_candidates` — list moves from this session that scored > 0.7 and are eligible for saving
+- `memory_learn(name, type, qualities, payload, tags)` — save a technique to memory. `type` must be one of: `beat_pattern`, `device_chain`, `mix_template`, `browser_pin`, `preference`. `qualities` must include a `summary` field. `payload` contains the technique data.
+- `record_anti_preference(dimension, direction)` — record something the user explicitly rejected. `dimension` is the quality axis (e.g., "brightness", "width"), `direction` must be `"increase"` or `"decrease"`. This ensures the rejected direction is never suggested again.
+### Promotion Rules
+1. Only promote moves the user confirmed satisfaction with — a high score alone is not enough
+2. Anti-preferences are permanent until explicitly deleted
+3. Check `get_anti_preferences` before suggesting any move to avoid repeating rejected ideas
+4. Promotion is optional — never force it. Suggest when appropriate: "That scored 0.85 — want me to save this technique for future sessions?"
+5. **For creative-director turns**, apply the verdict-driven promotion rubric:
+   - `safe_win` → promote if user keeps ≥ 2 subsequent turns (avoid over-promoting minor moves)
+   - `bold_win` → promote immediately; tag with concept packet + novelty_budget context
+   - `interesting_failure` → DO NOT auto-promote; store as a "curiosity" note only
+   - `identity_break` → NEVER promote; `record_anti_preference` instead
+   - `generic_fallback` → NEVER promote; `record_anti_preference` with family+target combo
+   - See `livepilot-core/references/memory-guide.md` for the full promotion rubric.
+## A/B Comparison
+When the user asks "was that good?" or "A/B compare":
+1. The before snapshot is A, the after snapshot is B
+2. Call the evaluator to get the score
+3. Present the comparison: "Before: [metrics]. After: [metrics]. Score: [score]. [explanation]"
+4. If the user prefers A, call `undo()` to revert to it
+5. If the user prefers B, keep the current state

package/livepilot/skills/livepilot-evaluation/references/capability-modes.md ADDED Viewed

@@ -0,0 +1,176 @@
+# Capability Modes Reference
+The evaluation engine adapts its behavior based on what measurement capabilities are available. Call `get_capability_state` to determine the current mode.
+## Mode: normal
+Full measurement capabilities available.
+**Requirements:**
+- Ableton Live connected via TCP port 9878
+- M4L analyzer bridge running on master track
+- UDP 9880 (M4L -> Server) and OSC 9881 (Server -> M4L) active
+- SpectralCache receiving fresh data (age < 5 seconds)
+**Available measurements:**
+- `get_master_spectrum` — 9-band spectral analysis (sub_low → air), real-time
+- `get_master_rms` — RMS and peak levels
+- `get_detected_key` — key detection from audio
+- `get_mel_spectrum` — mel-scaled spectral representation
+- `get_chroma` — chromagram for harmonic analysis
+- `get_onsets` — transient detection
+- `get_momentary_loudness` — short-term loudness
+- `get_spectral_shape` — centroid, spread, skewness, kurtosis
+- All device parameter reads and session state tools
+**Evaluation quality:** Highest. Critics use measured spectral evidence. Before/after comparisons are numerically precise.
+## Mode: measured_degraded
+Analyzer data is present but stale or intermittent.
+**Indicators:**
+- SpectralCache age > 5 seconds
+- Intermittent UDP packet loss from M4L device
+- M4L bridge loaded but analyzer section not receiving audio
+**Available measurements:**
+- All session state tools (tracks, clips, devices, parameters)
+- Cached spectral data (may not reflect current audio)
+- Device parameter reads (always fresh)
+**Evaluation quality:** Moderate. Spectral comparisons may be inaccurate if data is stale. Always check cache age before trusting spectrum values.
+**User notification:** "Analyzer data may be stale. For accurate spectral evaluation, play audio through the master bus and wait 2-3 seconds for the cache to refresh."
+## Mode: judgment_only
+No M4L analyzer connected. The evaluation engine operates on structural and parametric data only.
+**Indicators:**
+- M4L bridge not loaded on master track
+- UDP 9880 not receiving data
+- `get_master_spectrum` returns error or empty data
+**Available measurements:**
+- All session state tools
+- Device parameter reads
+- Track structure (names, types, device chains)
+- Note and clip data
+- Role-based heuristics (bass tracks should have low content, etc.)
+**Evaluation quality:** Limited. No spectral evidence for masking, balance, or loudness judgments. Critics infer from:
+- Track names and roles (a track named "Bass" should have low-frequency content)
+- Device chains (a track with EQ Eight + Compressor is likely processed)
+- Parameter values (filter cutoff position, compressor threshold)
+- Volume/pan/send positions
+**User notification:** "M4L analyzer is not connected. Evaluation is based on track structure and parameter analysis only. For spectral verification, load the LivePilot Bridge device on the master track."
+## Mode: read_only
+Session disconnected or in an error state.
+**Indicators:**
+- TCP connection to port 9878 failed or timed out
+- Remote Script not responding
+- Ableton Live not running or crashed
+**Available measurements:**
+- Cached session data from last successful connection
+- Memory system (technique recall, preferences)
+- No live reads from the session
+**Evaluation quality:** None for current state. Can only reference cached data and memory.
+**User notification:** "Session disconnected. Cannot evaluate current state. Reconnect to Ableton Live to resume."
+## Capability Fallback Chain
+When a measurement fails, fall back gracefully:
+1. Try the primary measurement tool
+2. If it fails, check if degraded data is available in cache
+3. If no cache, use parametric/structural heuristics
+4. If no session connection, report inability and suggest reconnection
+Never silently skip evaluation. Always inform the user which capability mode is active and how it affects the quality of judgment.
+## Checking Capability State
+Call `get_capability_state` at the start of any evaluation session. The response is a nested `domains` dict keyed by capability name — NOT the flat shape older docs described.
+```json
+{
+  "capability_state": {
+    "generated_at_ms": 1776929160866,
+    "overall_mode": "normal",
+    "domains": {
+      "session_access": {"name": "session_access", "available": true,  "confidence": 1.0, "mode": "healthy",     "reasons": []},
+      "analyzer":       {"name": "analyzer",       "available": true,  "confidence": 0.9, "mode": "available",   "reasons": []},
+      "memory":         {"name": "memory",         "available": true,  "confidence": 1.0, "mode": "available",   "reasons": []},
+      "web":            {"name": "web",            "available": true,  "confidence": 0.7, "mode": "available",   "reasons": []},
+      "research":       {"name": "research",       "available": true,  "confidence": 0.9, "mode": "available",   "reasons": []},
+      "flucoma":        {"name": "flucoma",        "available": false, "confidence": 0.0, "mode": "unavailable", "reasons": ["flucoma_not_installed"]}
+    }
+  }
+}
+```
+### Top-level fields
+- `capability_state.generated_at_ms`: Unix-ms timestamp of the probe.
+- `capability_state.overall_mode`: one of `"normal"`, `"measured_degraded"`, `"judgment_only"`, `"read_only"` — the global evaluation-quality tier computed from the per-domain signals.
+- `capability_state.domains`: dict keyed by domain name; each value is a capability-domain record.
+### Per-domain fields
+Every entry in `domains` has the same shape:
+- `name`: the domain key (`"session_access"`, `"analyzer"`, `"memory"`, `"web"`, `"research"`, `"flucoma"`).
+- `available`: boolean — is this capability ready to use right now?
+- `confidence`: 0.0–1.0 — how much to trust the `available` flag (e.g. stale analyzer data lowers confidence).
+- `mode`: short human label specific to the domain (`"healthy"`, `"available"`, `"measured"`, `"stale"`, `"targeted_only"`, `"full"`, `"unavailable"`).
+- `reasons`: list of short machine-readable tokens explaining why the domain is in its current state (`"analyzer_offline"`, `"web_unavailable"`, `"flucoma_not_installed"`, …). Empty when healthy.
+- `freshness_ms`: optional — milliseconds since the domain last received fresh data (currently only the analyzer domain populates this).
+### Domain definitions
+- **session_access** — live TCP connectivity to the Ableton Remote Script on port 9878. `available=true` means a `get_session_info` round-trip succeeded.
+- **analyzer** — the M4L bridge + spectral cache. `available=true` requires the bridge to be connected AND the spectral cache to have recently received data.
+- **memory** — the local technique-store / taste memory. `available=true` means the persistent stores can be read and written.
+- **web** — server-side outbound HTTP capability. True when the MCP host can reach an arbitrary public URL (probed by a 500 ms HEAD request to `https://api.github.com`). Does NOT imply curated research corpora are installed — see the `research` domain for that.
+- **research** — composite over `session_access`, `memory`, and `web`. `mode="full"` when all three are available; `"targeted_only"` when at least one source is up; `"unavailable"` when nothing is reachable.
+- **flucoma** — whether the optional `flucoma` Python package is importable (probed via `importlib.util.find_spec`). FluCoMa-backed tools (`check_flucoma`, `extract_timbre_fingerprint`, etc.) degrade gracefully when this domain is unavailable.
+## Collaborative Mode (Live 12.4+)
+Live 12.4 introduces a new capability tier that unlocks native LOM access for sample replacement. This tier is separate from the evaluation modes above — it affects routing behavior in the MCP server, not spectral measurement.
+**Version gate:** Live 12.4.0+
+**Detection flag:** `LiveVersionCapabilities.has_replace_sample_native == True`
+(exposed on `LiveVersionCapabilities.capability_tier == "collaborative"`)
+**What changes at this tier:**
+- `SimplerDevice.replace_sample(path)` is available as a native LOM call.
+  The MCP tools `replace_simpler_sample` and `load_sample_to_simpler`
+  route to this native path automatically.
+- The native path handles empty Simplers — the long-standing limitation
+  (documented in `feedback_load_browser_item_is_source_of_truth.md`)
+  that required `load_browser_item` as a workaround no longer applies
+  on Live 12.4+.
+**Backward compatibility:**
+- Live 12.0–12.3.x: `has_replace_sample_native == False`. All sample
+  replacement still routes through the M4L bridge. Zero behavior change.
+- Live 12.4+: native path preferred; M4L bridge used only on fallback.
+**Tool signatures:** unchanged. Callers do not need to detect the tier —
+routing is transparent.
+**Follow-up plans (not yet shipped):**
+- Link Audio (tempo-sync sharing between Live sets) — tracked as a
+  future Collaborative-tier feature.
+- Stem Separation v2 — tracked as a future Collaborative-tier feature.
+Neither is available in the 1.26.2 release — still pending.

package/livepilot/skills/livepilot-evaluation/references/evaluation-contracts.md ADDED Viewed

@@ -0,0 +1,121 @@
+# Evaluation Contracts Reference
+Every evaluator returns the same base contract. Engine-specific evaluators extend it with additional fields.
+## Base Evaluation Contract
+Returned by `evaluate_move`:
+```json
+{
+  "keep_change": true,
+  "score": 0.72,
+  "goal_progress": 0.6,
+  "collateral_damage": [],
+  "explanation": "Filter cut at 250 Hz reduced masking by 4 dB without affecting bass body.",
+  "before_metrics": {
+    "master_rms_db": -12.4,
+    "master_peak_db": -3.2,
+    "spectrum": [...]
+  },
+  "after_metrics": {
+    "master_rms_db": -12.8,
+    "master_peak_db": -3.5,
+    "spectrum": [...]
+  }
+}
+```
+### Field Definitions
+- **keep_change** (bool): `true` if the change improved the target without unacceptable regression. `false` if the change should be undone.
+- **score** (float 0.0-1.0): 0.0 = catastrophic regression, 0.5 = neutral (no change), 1.0 = perfect improvement. Scores below 0.4 trigger automatic undo recommendation.
+- **goal_progress** (float 0.0-1.0): how much of the stated goal has been achieved. 1.0 means the goal is fully met. Use this to decide whether to continue iterating.
+- **collateral_damage** (list of strings): side effects that got worse. Empty list means no regressions detected. Examples: "bass lost 2 dB of body", "stereo width narrowed by 15%".
+- **explanation** (string): one-sentence human-readable summary of the judgment. Always report this to the user.
+## Mix Evaluation Contract
+Returned by `evaluate_mix_move`, extends base with:
+```json
+{
+  "targets": {
+    "reduce_masking": { "before": 0.72, "after": 0.35, "improved": true },
+    "maintain_headroom": { "before": -3.2, "after": -3.5, "ok": true }
+  },
+  "protect": {
+    "bass_body": { "before": -14.2, "after": -14.8, "ok": true },
+    "vocal_presence": { "before": -8.1, "after": -8.0, "ok": true }
+  },
+  "spectral_delta_db": {
+    "sub": 0.1, "low": -0.3, "low_mid": -2.1,
+    "mid": 0.2, "high_mid": 0.1, "high": 0.0
+  }
+}
+```
+- **targets**: what the move aimed to improve, with before/after measurements
+- **protect**: what must not get worse, with tolerance checking
+- **spectral_delta_db**: per-band change in spectral energy
+## Composition Evaluation Contract
+Returned by `evaluate_composition_move`, extends base with:
+```json
+{
+  "structural_coherence": 0.85,
+  "thematic_continuity": 0.78,
+  "energy_delta": 0.15,
+  "transition_smoothness": 0.82,
+  "note_count_delta": 12
+}
+```
+- **structural_coherence**: how well the change fits the overall form
+- **thematic_continuity**: whether existing motifs are maintained or developed (not broken)
+- **energy_delta**: change in section energy level
+- **transition_smoothness**: quality of section boundaries after the change
+## Fabric Evaluation Contract
+Returned by `evaluate_with_fabric`, extends base with:
+```json
+{
+  "taste_alignment": 0.88,
+  "anti_preference_violations": [],
+  "similar_past_moves": [
+    { "memory_id": "mix_001", "similarity": 0.91, "past_score": 0.85 }
+  ],
+  "novelty_score": 0.3
+}
+```
+- **taste_alignment**: how well the move matches the user's saved taste profile
+- **anti_preference_violations**: list of anti-preferences this move conflicts with (should be empty)
+- **similar_past_moves**: techniques from memory that resemble this move, with their past scores
+- **novelty_score**: how different this move is from past approaches (high = novel, low = familiar)
+## Scoring Thresholds
+| Score Range | Interpretation | Action |
+|------------|---------------|--------|
+| 0.0 - 0.3 | Significant regression | Auto-undo, explain damage |
+| 0.3 - 0.45 | Mild regression | Undo recommended, ask user |
+| 0.45 - 0.55 | Neutral / no effect | Keep but note it had no impact |
+| 0.55 - 0.7 | Mild improvement | Keep, continue iterating |
+| 0.7 - 0.85 | Clear improvement | Keep, suggest memory promotion |
+| 0.85 - 1.0 | Excellent improvement | Keep, strongly suggest promotion |
+## Collateral Damage Categories
+Common side effects to check for:
+- **bass_body_loss**: EQ cuts in the low-mid range reduced bass warmth
+- **stereo_narrowing**: mono compatibility fix reduced perceived width
+- **headroom_reduction**: boost increased master peak level
+- **transient_loss**: compression removed punch from drums
+- **vocal_masking**: frequency boost created new masking with vocal track
+- **phase_issue**: stereo manipulation introduced phase cancellation

package/livepilot/skills/livepilot-evaluation/references/memory-promotion.md ADDED Viewed

@@ -0,0 +1,110 @@
+# Memory Promotion Reference
+Memory promotion saves successful production moves to persistent storage for recall in future sessions.
+## Promotion Flow
+1. A move scores > 0.7 in evaluation
+2. Call `get_promotion_candidates` to list all eligible moves from this session
+3. Present the candidate to the user with score and description
+4. If the user confirms, call `memory_learn(name, type, qualities, payload)` to save
+5. The technique is now available via `memory_recall` in future sessions
+## Promotion Candidates
+`get_promotion_candidates` returns moves that meet all criteria:
+- Evaluation score > 0.7
+- `keep_change` was `true`
+- The move has not already been promoted
+- The move does not conflict with any anti-preference
+Response format:
+```json
+{
+  "candidates": [
+    {
+      "action_id": "act_001",
+      "move_type": "eq_cut",
+      "score": 0.85,
+      "goal": "reduce masking between bass and kick",
+      "parameters": {
+        "track": "Bass",
+        "device": "EQ Eight",
+        "band": 3,
+        "frequency": 250,
+        "gain_db": -4.5,
+        "q": 2.0
+      },
+      "explanation": "4.5 dB cut at 250 Hz on bass cleared kick presence without losing bass warmth"
+    }
+  ]
+}
+```
+## Memory Types
+When calling `memory_learn`, specify the type:
+- `mix_template` — mixing techniques (EQ curves, compression settings, gain staging recipes)
+- `sound_design` — patch design moves (modulation settings, filter configurations, oscillator tuning)
+- `composition` — structural techniques (transition patterns, arrangement gestures, motif transformations)
+- `automation` — automation curves and recipes
+- `performance` — live performance patterns (scene orderings, safe macro ranges)
+## Anti-Preferences
+Anti-preferences are the inverse of promotion — they record moves the user explicitly rejected.
+### Recording
+Call `record_anti_preference(dimension, direction)` when:
+- The user says "I hate that", "never do that again", "that's wrong"
+- A move is undone and the user expresses displeasure (not just neutral undo)
+- The user explicitly states a preference against a technique
+### Checking
+Call `get_anti_preferences` before suggesting any move. The response lists all recorded anti-preferences with descriptions and creation dates.
+### Format
+```json
+{
+  "anti_preferences": [
+    {
+      "id": "ap_001",
+      "description": "Never boost above 10 kHz on vocals — user finds it harsh",
+      "created": "2026-04-08T14:30:00Z"
+    },
+    {
+      "id": "ap_002",
+      "description": "No sidechain compression on pads — user prefers volume automation for ducking",
+      "created": "2026-04-09T09:15:00Z"
+    }
+  ]
+}
+```
+### Rules
+1. Always check anti-preferences before planning any move
+2. If a planned move matches an anti-preference, skip it and choose an alternative
+3. Anti-preferences are permanent until the user explicitly asks to remove one
+4. When skipping a move due to anti-preference, tell the user: "Skipping [move] because you previously indicated [anti-preference]."
+## Promotion Best Practices
+1. **Do not auto-promote.** Always ask: "That scored [score] — want me to save this technique?"
+2. **Include context in the saved data.** A raw parameter value without context (genre, source material, goal) is less useful on recall.
+3. **Group related moves.** If three EQ cuts together solved a masking problem, save them as one technique, not three.
+4. **Tag with genre and role.** A bass EQ technique for house music may not apply to jazz. Include tags for future filtering.
+5. **Review periodically.** Suggest `memory_list` to the user occasionally to prune outdated techniques.
+## Recall Integration
+When starting a new production task:
+1. Call `memory_recall(type, query)` to find relevant past techniques
+2. Present matches with their past scores: "I found a similar technique from a past session (scored 0.85). Want me to try it here?"
+3. If the user agrees, apply the recalled technique and evaluate as normal
+4. If the recalled technique scores lower in the new context, note this — context sensitivity means not all techniques transfer