npm - shmakk - Versions diffs - 1.2.0 → 1.2.2 - Mend

shmakk 1.2.0 → 1.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.md +68 -2
package/package.json +2 -2
package/scripts/demo/record.py +196 -0
package/scripts/demo/scenes.html +913 -0
package/skills/media-video-compose.md +320 -0
package/skills/media-video-script.md +204 -0
package/skills/media-video-voice.md +184 -0
package/src/agent-overview.js +320 -0
package/src/agent-roster.js +53 -0
package/src/agent.js +178 -18
package/src/cli.js +220 -86
package/src/completions.js +3 -1
package/src/correction.js +11 -4
package/src/endpoints.js +94 -31
package/src/guard.js +101 -0
package/src/index.js +19 -5
package/src/llm.js +462 -52
package/src/markdown.js +217 -0
package/src/notify.js +34 -0
package/src/pty.js +1 -1
package/src/review.js +8 -1
package/src/self-commands.js +108 -2
package/src/session.js +58 -2
package/src/ssh.js +255 -0
package/src/subagent.js +12 -1
package/src/taskClassifier.js +2 -2
package/src/team.js +22 -0
package/src/tools.js +487 -1
package/src/workflows.js +32 -0

package/skills/media-video-voice.md ADDED

@@ -0,0 +1,184 @@
+---
+name: video-voice
+description: Generate per-segment voice-over audio using the tts_generate tool. Takes a storyboard JSON from the script agent and produces a WAV audio file for each segment. Part of the video production pipeline.
+category: media
+---
+# Video Voice-Over
+Generate spoken audio for each segment of a video storyboard. This agent receives the script agent's JSON output and produces individual WAV files — one per segment — that the compositor will synchronize with visuals.
+## When to use
+- You receive a storyboard JSON with `segments` containing `narration` fields
+- You are the voice agent in the video production pipeline, running in parallel with the visual agent
+- The user explicitly asks to generate voice-over audio for a video script
+## When not to use
+- The user wants a single TTS clip outside the video pipeline (use `tts_generate` directly)
+- There is no storyboard — wait for the script agent to finish first
+- All narration fields are empty strings (music-only video — skip voice generation)
+## Input format
+You receive the script agent's handoff — a JSON object with a `segments` array:
+```json
+{
+  "segments": [
+    {
+      "index": 0,
+      "durationSec": 3.5,
+      "startSec": 0.0,
+      "narration": "Your best ideas don't wait for the right moment.",
+      "visualDesc": "...",
+      "transition": null
+    }
+  ]
+}
+```
+## Tool: `tts_generate`
+The `tts_generate` tool wraps the local Kokoro TTS engine (`src/services/tts.js`). It takes text and produces a WAV file.
+### Parameters
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `text` | string | Yes | — | The text to synthesize. Must be non-empty. |
+| `voice` | string | No | `"af_heart"` | Voice ID. See available voices below. |
+| `speed` | number | No | `1.5` | Speech rate multiplier. 1.0 = normal, 1.5 = slightly faster (good for video). Range: 0.5–3.0. |
+| `outputPath` | string | No | auto-generated temp path | Where to save the WAV file. Always provide an explicit path in `$SHMAKK_OUTPUT_DIR/voice/` so the compositor can find the files. |
+### Returns
+```json
+{
+  "audioPath": "/path/to/output.wav",
+  "voice": "af_heart",
+  "durationSec": 3.4
+}
+```
+### Available voices
+Kokoro provides multiple voices. Run `tts.listVoices()` to get the full list. Common voices:
+| Voice ID | Language | Gender | Character |
+|----------|----------|--------|-----------|
+| `af_heart` | en-us | female | Warm, natural, good default for narration |
+| `af_bella` | en-us | female | Energetic, younger sounding |
+| `af_nicole` | en-us | female | Calm, professional |
+| `af_sarah` | en-us | female | Bright, articulate |
+| `af_sky` | en-us | female | Soft, gentle |
+| `am_adam` | en-us | male | Deep, authoritative |
+| `am_michael` | en-us | male | Neutral, clear |
+| `am_eric` | en-us | male | Friendly, casual |
+| `am_jesse` | en-us | male | Relaxed, conversational |
+### Voice selection strategy
+- **Single narrator:** Pick one voice for all segments and use it consistently. `af_heart` (female) or `am_michael` (male) are solid defaults unless the user specifies a preference.
+- **Multiple speakers:** If the storyboard narration fields contain speaker labels like `[Interviewer]: ...` and `[Speaker]: ...`, assign different voices to each role. Extract the speaker label, strip it from the text sent to TTS, and apply the assigned voice.
+- **User preference:** If the user specifies a voice (e.g., "use a British female voice" or "deep male voice"), map that to the closest available Kokoro voice. If no match exists, pick the closest and note the choice in the output.
+## Workflow
+### Step 1: Receive the storyboard
+Extract the `segments` array from the script agent's handoff. Validate that it is an array with at least one segment and that segments have non-empty `narration` fields (skip segments where narration is empty).
+### Step 2: Choose voice(s)
+- If the storyboard uses speaker labels, identify all unique speakers
+- Assign a distinct voice to each speaker role
+- If no speaker labels, pick a single voice based on:
+  1. User's explicit request (if any)
+  2. Content tone: energetic → `af_bella`, professional → `af_nicole`, warm → `af_heart`, authoritative → `am_adam`
+  3. Default: `af_heart`
+### Step 3: Create output directory
+Use `make_dir` to create the output directory. The convention is:
+```
+output/voice/
+```
+All audio files go here so the compositor can reference them by path.
+### Step 4: Generate audio per segment
+For each segment with non-empty narration:
+1. **Extract text:** If narration contains a speaker label like `"[Speaker]: text here"`, strip the label and only pass the text after the colon to TTS.
+2. **Call `tts_generate`:** Pass the text, voice, and explicit output path.
+3. **Name files predictably:** Use the pattern `segment-{index}.wav` (e.g., `segment-0.wav`, `segment-1.wav`). This makes it trivial for the compositor to match audio to segments.
+For segments with empty narration, skip generation and mark the segment with `audioPath: null`.
+### Step 5: Collect results
+After all TTS calls complete, assemble the output payload:
+```json
+{
+  "voice": "af_heart",
+  "speed": 1.5,
+  "segments": [
+    {
+      "index": 0,
+      "audioPath": "output/voice/segment-0.wav",
+      "durationSec": 3.4,
+      "voice": "af_heart"
+    },
+    {
+      "index": 1,
+      "audioPath": "output/voice/segment-1.wav",
+      "durationSec": 5.1,
+      "voice": "af_heart"
+    }
+  ]
+}
+```
+### Step 6: Hand off
+Return this payload. The compositor will merge it with the visual agent's output to assemble the final video. Include the `durationSec` for each segment (from `tts_generate` return value) — the compositor uses this to verify timing alignment.
+## Budget awareness
+`tts_generate` costs 1 budget point per call. However, TTS runs locally on Kokoro (no API cost). Be mindful of:
+- Each non-empty narration segment = 1 `tts_generate` call
+- A 12-segment video with all narration = 12 budget points
+- If budget is tight, consider whether segments with very short narration (< 10 words) can be merged with adjacent segments (coordinate with script agent — but if you already have the storyboard, proceed as-is; script changes are the script agent's responsibility)
+## Edge cases
+- **Empty narration (music-only segment):** Skip `tts_generate`. Set `audioPath: null` and `durationSec: null` in the output for that segment. The compositor will use the visual duration for timing.
+- **Very long narration (> 75 words):** Kokoro handles it fine, but video pacing may suffer. Flag in a note but generate the audio anyway.
+- **Speaker label with no colon:** Treat the entire string as narration text. If the label pattern is ambiguous, generate as-is.
+- **TTS generation fails:** If a single segment fails, retry once with `speed: 1.0` (some voices handle slower speeds more reliably). If it still fails, log the error and set `audioPath: null` for that segment — the compositor can still assemble the video with silence for that segment.
+- **Voice not found:** Run `tts.listVoices()` to get the available voice list. Pick the closest match by gender/language. If the user specified a voice that does not exist, explain and pick a fallback.
+## Example
+```bash
+# After receiving storyboard, create output directory and generate audio:
+make_dir output/voice/
+# For each segment with narration:
+tts_generate --text "Your best ideas don't wait for the right moment." \
+  --voice af_heart \
+  --speed 1.5 \
+  --outputPath output/voice/segment-0.wav
+tts_generate --text "They arrive in the shower, on a walk, or right before you fall asleep." \
+  --voice af_heart \
+  --speed 1.5 \
+  --outputPath output/voice/segment-1.wav
+```
+Note: The actual `tts_generate` tool is invoked via the LLM function call interface, not shell commands. The examples above illustrate the parameter values, not the invocation syntax.
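
Steps 4 and 5 of the skill file above describe stripping speaker labels, naming files `segment-{index}.wav`, and skipping empty narration. The following is a minimal sketch of that planning logic, not code from the package: `parseNarration`, `planVoiceCalls`, and the `voiceBySpeaker` option are hypothetical names, and the sketch only builds the argument objects for `tts_generate` rather than invoking the tool.

```javascript
// Hypothetical helper: split an optional "[Label]: text" prefix off a narration string.
// A string without the "[Label]:" pattern is treated entirely as narration text.
function parseNarration(narration) {
  const raw = String(narration || '').trim();
  const match = raw.match(/^\[([^\]]+)\]:\s*([\s\S]*)$/);
  if (!match) return { speaker: null, text: raw };
  return { speaker: match[1].trim(), text: match[2].trim() };
}

// Hypothetical planner: turn a storyboard into tts_generate argument objects.
// voiceBySpeaker maps speaker labels to voice IDs; defaultVoice covers everything else.
function planVoiceCalls(storyboard, options = {}) {
  const {
    outputDir = 'output/voice',
    defaultVoice = 'af_heart',
    speed = 1.5,
    voiceBySpeaker = {},
  } = options;
  const segments = Array.isArray(storyboard && storyboard.segments) ? storyboard.segments : [];
  if (segments.length === 0) throw new Error('storyboard has no segments');

  return segments.map((segment) => {
    const { speaker, text } = parseNarration(segment.narration);
    // Empty narration means a music-only segment: no TTS call is planned.
    if (!text) return { index: segment.index, call: null };
    return {
      index: segment.index,
      call: {
        text,
        voice: (speaker && voiceBySpeaker[speaker]) || defaultVoice,
        speed,
        outputPath: `${outputDir}/segment-${segment.index}.wav`,
      },
    };
  });
}

const plan = planVoiceCalls(
  {
    segments: [
      { index: 0, narration: '[Interviewer]: What sparked the idea?' },
      { index: 1, narration: '' },
      { index: 2, narration: 'They arrive in the shower, on a walk.' },
    ],
  },
  { voiceBySpeaker: { Interviewer: 'am_michael' } },
);
console.log(JSON.stringify(plan, null, 2));
```

Keeping the planning step separate from the tool calls makes the budget cost easy to read off: it equals the number of entries whose `call` is not `null`.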

package/src/agent-overview.js ADDED

@@ -0,0 +1,320 @@
+// Agent overview — live tracking registry for multi-agent team execution.
+//
+// Maintains an in-memory registry of all agents (active and completed) during
+// a team run. Provides query methods for the overview self-commands so users
+// can see which agents are working, which skills they use, and drill into
+// specific agents for detailed output.
+//
+// Architecture:
+//   team.js → agentOverview.register(id, meta)   when an agent starts
+//   team.js → agentOverview.update(id, patch)     when an agent finishes
+//   self-commands → agentOverview.getAll()        for overview display
+//   self-commands → agentOverview.get(id)         for detailed drill-down
+const MAX_HISTORY = 50;  // keep at most N completed entries after reset
+// In-memory state — one registry per process lifetime.
+// Keys: agent id (string). Values: entry object.
+const registry = new Map();
+// Stable order of registration (for overview listing).
+const order = [];
+let teamRunActive = false;
+let teamRunId = null;
+// ── Public API ────────────────────────────────────────────────────────────────
+function startTeamRun(id) {
+  teamRunActive = true;
+  teamRunId = id || `team-${Date.now()}`;
+}
+function endTeamRun() {
+  teamRunActive = false;
+  teamRunId = null;
+}
+function isTeamRunActive() {
+  return teamRunActive;
+}
+function register(id, meta) {
+  if (!id) id = `agent-${Date.now()}-${Math.random().toString(36).slice(2, 6)}`;
+  const entry = {
+    id,
+    role: meta.role || 'unknown',
+    skill: meta.skill || null,
+    skillSource: meta.skillSource || null,
+    task: meta.task || '',
+    fileScope: meta.fileScope || null,
+    topology: meta.topology || 'unknown',
+    status: meta.status || 'pending',   // pending | running | done | error
+    startTime: meta.startTime || Date.now(),
+    endTime: null,
+    toolCount: 0,
+    error: null,
+    output: '',       // truncated on store; full accessed via separate buffer
+    outputPreview: '', // first 2000 chars for quick display
+  };
+  registry.set(id, entry);
+  order.push(id);
+  return id;
+}
+function update(id, patch) {
+  const entry = registry.get(id);
+  if (!entry) return false;
+  if (patch.status) entry.status = patch.status;
+  if (patch.endTime) entry.endTime = patch.endTime;
+  if (patch.toolCount !== undefined) entry.toolCount = patch.toolCount;
+  if (patch.error !== undefined) entry.error = patch.error;
+  if (patch.skill !== undefined) entry.skill = patch.skill;
+  if (patch.skillSource !== undefined) entry.skillSource = patch.skillSource;
+  if (patch.output !== undefined) {
+    const stripped = stripAnsi(String(patch.output));
+    entry.outputPreview = stripped.slice(0, 2000);
+    entry.output = stripped.slice(0, 8000); // keep reasonable cap
+  }
+  return true;
+}
+// Mark an agent as running (transitions from pending → running).
+function markRunning(id) {
+  return update(id, { status: 'running', startTime: Date.now() });
+}
+// Mark an agent as completed with result data.
+function markDone(id, { toolCount, output, skill, skillSource } = {}) {
+  return update(id, {
+    status: 'done',
+    endTime: Date.now(),
+    toolCount: toolCount || 0,
+    output: output || '',
+    skill: skill || undefined,
+    skillSource: skillSource || undefined,
+  });
+}
+// Mark an agent as errored.
+function markError(id, error) {
+  return update(id, { status: 'error', endTime: Date.now(), error: String(error || '') });
+}
+// ── Query API ─────────────────────────────────────────────────────────────────
+function get(id) {
+  const entry = registry.get(id);
+  if (!entry) return null;
+  return { ...entry };  // defensive copy
+}
+function getAll() {
+  // Return entries in registration order, newest last.
+  return order.map(id => {
+    const e = registry.get(id);
+    return e ? { ...e } : null;
+  }).filter(Boolean);
+}
+function getActive() {
+  return getAll().filter(e => e.status === 'running' || e.status === 'pending');
+}
+function getCompleted() {
+  return getAll().filter(e => e.status === 'done' || e.status === 'error');
+}
+// Find by role name (case-insensitive partial match).
+function findByRole(role) {
+  const lower = String(role).toLowerCase();
+  return getAll().filter(e => e.role.toLowerCase().includes(lower));
+}
+// Find by skill name.
+function findBySkill(skill) {
+  const lower = String(skill).toLowerCase();
+  return getAll().filter(e => e.skill && e.skill.toLowerCase().includes(lower));
+}
+// ── Reset ─────────────────────────────────────────────────────────────────────
+// Called at the end of a team run to clear state for the next run.
+// Keeps a small history window for post-run inspection.
+function reset() {
+  const completed = getCompleted();
+  // Trim to MAX_HISTORY, keeping most recent.
+  const toKeep = completed.slice(-MAX_HISTORY);
+  registry.clear();
+  order.length = 0;
+  for (const e of toKeep) {
+    registry.set(e.id, e);
+    order.push(e.id);
+  }
+  teamRunActive = false;
+  teamRunId = null;
+}
+// ── Formatting helpers ────────────────────────────────────────────────────────
+function stripAnsi(s) {
+  return String(s || '').replace(/\x1b\[[0-9;]*m/g, '');
+}
+function formatDuration(ms) {
+  if (!ms || ms < 0) return '—';
+  const s = ms / 1000;
+  if (s < 1) return `${Math.round(ms)}ms`;
+  if (s < 60) return `${s.toFixed(1)}s`;
+  const m = Math.floor(s / 60);
+  const sec = Math.round(s % 60);
+  return `${m}m ${sec}s`;
+}
+function statusIcon(status) {
+  switch (status) {
+    case 'pending': return '\x1b[2m○\x1b[0m';     // dim circle
+    case 'running': return '\x1b[36m◉\x1b[0m';     // cyan filled circle
+    case 'done': return '\x1b[32m●\x1b[0m';        // green filled circle
+    case 'error': return '\x1b[31m●\x1b[0m';       // red filled circle
+    default: return '\x1b[2m?\x1b[0m';
+  }
+}
+// Build a compact overview table as an array of strings.
+function formatOverview(agents) {
+  if (!agents || agents.length === 0) {
+    return ['\x1b[2mNo agents registered.\x1b[0m'];
+  }
+  const now = Date.now();
+  const lines = [];
+  // Header
+  const teamTag = teamRunId ? ` \x1b[2m(team: ${teamRunId.slice(-8)})\x1b[0m` : '';
+  lines.push(`\x1b[1mAgent Overview${teamTag}\x1b[0m`);
+  lines.push('');
+  // Column widths
+  const roleWidth = Math.max(8, ...agents.map(a => a.role.length));
+  const skillWidth = Math.max(5, ...agents.map(a => (a.skill || '—').length));
+  const taskWidth = Math.min(60, Math.max(4, ...agents.map(a => (a.task || '').length)));
+  for (const a of agents) {
+    const icon = statusIcon(a.status);
+    const role = a.role.padEnd(roleWidth);
+    const skill = (a.skill || '\x1b[2m—\x1b[0m').padEnd(skillWidth + (a.skill ? 0 : 9)); // +9 for ANSI codes
+    const task = (a.task || '').slice(0, taskWidth);
+    const elapsed = a.endTime
+      ? formatDuration(a.endTime - a.startTime)
+      : formatDuration(now - a.startTime);
+    const tools = a.toolCount > 0 ? `${a.toolCount} tools` : '';
+    lines.push(` ${icon} \x1b[36m${role}\x1b[0m  \x1b[2m${skill.trim()}\x1b[0m  ${task}`);
+    const infoParts = [];
+    if (elapsed) infoParts.push(elapsed);
+    if (tools) infoParts.push(tools);
+    if (a.error) infoParts.push(`\x1b[31m${a.error}\x1b[0m`);
+    lines.push(`   \x1b[2m${' '.repeat(roleWidth)}  ${infoParts.join(' · ')}\x1b[0m`);
+  }
+  // Summary line
+  const active = agents.filter(a => a.status === 'running' || a.status === 'pending').length;
+  const done = agents.filter(a => a.status === 'done').length;
+  const errors = agents.filter(a => a.status === 'error').length;
+  const summaryParts = [];
+  if (active) summaryParts.push(`\x1b[36m${active} active\x1b[0m`);
+  if (done) summaryParts.push(`\x1b[32m${done} done\x1b[0m`);
+  if (errors) summaryParts.push(`\x1b[31m${errors} errors\x1b[0m`);
+  lines.push('');
+  lines.push(`\x1b[2m${agents.length} total${summaryParts.length ? ' · ' + summaryParts.join(' · ') : ''}\x1b[0m`);
+  return lines;
+}
+// Build a detailed single-agent view.
+function formatAgentDetail(agent) {
+  if (!agent) return ['\x1b[31mAgent not found.\x1b[0m'];
+  const now = Date.now();
+  const lines = [];
+  lines.push(`\x1b[1m${agent.role}\x1b[0m  ${statusIcon(agent.status)} ${agent.status}`);
+  lines.push(`\x1b[2mid: ${agent.id}\x1b[0m`);
+  lines.push('');
+  if (agent.task) {
+    lines.push(`\x1b[1mTask\x1b[0m`);
+    lines.push(`  ${agent.task}`);
+    lines.push('');
+  }
+  lines.push(`\x1b[1mDetails\x1b[0m`);
+  lines.push(`  Role:      ${agent.role}`);
+  lines.push(`  Skill:     ${agent.skill || '\x1b[2m(none — using roster hint)\x1b[0m'}`);
+  if (agent.skillSource) lines.push(`  Source:    \x1b[2m${agent.skillSource}\x1b[0m`);
+  lines.push(`  Topology:  ${agent.topology}`);
+  if (agent.fileScope) lines.push(`  Scope:     ${agent.fileScope}`);
+  const elapsed = agent.endTime
+    ? formatDuration(agent.endTime - agent.startTime)
+    : `\x1b[36m${formatDuration(now - agent.startTime)} (running)\x1b[0m`;
+  lines.push(`  Duration:  ${elapsed}`);
+  lines.push(`  Tools:     ${agent.toolCount}`);
+  if (agent.error) {
+    lines.push(`  Error:     \x1b[31m${agent.error}\x1b[0m`);
+  }
+  lines.push('');
+  if (agent.outputPreview) {
+    lines.push(`\x1b[1mOutput\x1b[0m (first 2000 chars)`);
+    lines.push('\x1b[2m──────────────────────────────────────────────────────\x1b[0m');
+    lines.push(agent.outputPreview);
+    lines.push('\x1b[2m──────────────────────────────────────────────────────\x1b[0m');
+    if (agent.output && agent.output.length >= 2000) {
+      lines.push(`\x1b[2m... output truncated (${agent.output.length} total)\x1b[0m`);
+    }
+  }
+  return lines;
+}
+// ── Exports ───────────────────────────────────────────────────────────────────
+module.exports = {
+  // lifecycle
+  startTeamRun,
+  endTeamRun,
+  isTeamRunActive,
+  reset,
+  // mutation
+  register,
+  update,
+  markRunning,
+  markDone,
+  markError,
+  // query
+  get,
+  getAll,
+  getActive,
+  getCompleted,
+  findByRole,
+  findBySkill,
+  // formatting
+  formatOverview,
+  formatAgentDetail,
+  statusIcon,
+  formatDuration,
+};
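
The file above pairs a `Map` (lookup by id) with an `order` array (stable listing), returns defensive copies from its query functions, and has `reset()` keep only the most recent completed entries. The sketch below isolates that pattern in miniature. It is an illustration under assumptions, not the module itself: `createRegistry`, `setStatus`, and the `maxHistory` argument are hypothetical names, and entries carry only `id`, `role`, and `status`.

```javascript
// Hypothetical miniature of the pattern: a Map for lookup plus an array for order.
function createRegistry(maxHistory) {
  const entries = new Map();
  const order = [];

  return {
    register(id, role) {
      entries.set(id, { id, role, status: 'pending' });
      order.push(id);
    },
    setStatus(id, status) {
      const entry = entries.get(id);
      if (!entry) return false;
      entry.status = status;
      return true;
    },
    // Copies are returned so callers cannot mutate registry state.
    getAll() {
      return order.map((id) => ({ ...entries.get(id) }));
    },
    // Drop everything except the most recent finished entries.
    reset() {
      const finished = this.getAll().filter((e) => e.status === 'done' || e.status === 'error');
      const keep = finished.slice(-maxHistory);
      entries.clear();
      order.length = 0;
      for (const e of keep) {
        entries.set(e.id, e);
        order.push(e.id);
      }
    },
  };
}

const registry = createRegistry(2);
registry.register('a', 'script');
registry.register('b', 'voice');
registry.register('c', 'visual');
registry.register('d', 'compositor');
registry.setStatus('a', 'done');
registry.setStatus('b', 'error');
registry.setStatus('c', 'done');
// 'd' is still pending, so reset() discards it along with the oldest finished entry.
registry.reset();
console.log(registry.getAll().map((e) => e.id)); // [ 'b', 'c' ]
```

The separate order array is not strictly required, since a JavaScript `Map` already iterates in insertion order, but it mirrors the structure used in the file.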

package/src/agent-roster.js ADDED

@@ -0,0 +1,53 @@
+// Additional agent roster entries for media/video production roles.
+//
+// These extend the main AGENT_ROSTER in src/team.js with specialist roles
+// for the video production pipeline: script writing and video compositing.
+//
+// The voice and visual roles are handled by the existing media-video-voice
+// and media-imagegen skills respectively.
+//
+// Each entry maps to a skill file in the skills/ directory:
+//   script     → skills/media-video-script.md
+//   compositor → skills/media-video-compose.md
+const AGENT_ROSTER_EXTENSIONS = {
+  script: {
+    profile: 'deep',
+    hint: `Specialist: Video Script Writer
+Focus: turning user prompts into structured timed storyboards for video production.
+Guidelines:
+- Output valid JSON: an array of segments, each with startTime, endTime, narration, visualDesc.
+- startTime/endTime in seconds (floating-point). Total duration must match user request.
+- narration: conversational text suitable for TTS. Keep each segment under 30 seconds of speech (~75 words max).
+- visualDesc: detailed visual prompt for image generation. Describe scene, style, composition, color palette.
+- Match the user's requested tone, pacing, and style. For explainer videos, prefer clear logical flow. For demos, prefer step-by-step walkthrough.
+- If duration or segment count is unclear, ask before finalizing.`,
+    skill: 'media-video-script',
+  },
+  compositor: {
+    profile: 'builder',
+    hint: `Specialist: Video Compositor
+Tools: video_compose (assemble clips/images/audio into a segment), video_concat (join rendered segments), video_probe (inspect metadata).
+Focus: assembling audio, images, and transitions into a final video file.
+Guidelines:
+- Read the script agent's output first — it defines the timeline and assets per segment.
+- For each segment: call video_compose with the image path, audio path, startTime, and endTime to render that segment.
+- Use video_probe to verify audio duration and image dimensions before composing.
+- After all segments are rendered, call video_concat to join them into the final output.
+- Transitions: prefer crossfade (0.3–0.5s) between segments unless otherwise specified.
+- Output format: H.264 video (libx264), AAC audio, .mp4 container. Match the first segment's resolution.
+- If an asset is missing or has wrong duration, report the exact segment and path — do not silently skip.
+- Verify the final output with video_probe: check total duration matches expected.`,
+    skill: 'media-video-compose',
+  },
+};
+// Role-to-skill mapping for these extensions. Used by src/team.js to look up
+// skill files that provide the full agent instructions.
+const ROLE_TO_SKILL_EXTENSIONS = {
+  script: 'media-video-script',
+  compositor: 'media-video-compose',
+};
+module.exports = { AGENT_ROSTER_EXTENSIONS, ROLE_TO_SKILL_EXTENSIONS };
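
The comments in the file above say these entries extend the main `AGENT_ROSTER` in `src/team.js` and that each role maps to a file in `skills/`. How `team.js` actually performs the merge is not visible in this part of the diff, so the following is only a sketch of one plausible approach. `BASE_ROSTER`, its `reviewer` entry, and `skillFileForRole` are hypothetical, and the hint strings are omitted for brevity.

```javascript
// Trimmed copy of the extension entries shown above (hints omitted).
const AGENT_ROSTER_EXTENSIONS = {
  script: { profile: 'deep', skill: 'media-video-script' },
  compositor: { profile: 'builder', skill: 'media-video-compose' },
};

// Hypothetical stand-in for the main AGENT_ROSTER in src/team.js.
const BASE_ROSTER = {
  reviewer: { profile: 'deep', skill: null },
};

// Later spreads win, so an extension would override a base role of the same name.
const roster = { ...BASE_ROSTER, ...AGENT_ROSTER_EXTENSIONS };

// Resolve a role to its skill file, following the skills/<name>.md convention.
function skillFileForRole(role) {
  const entry = roster[role];
  if (!entry || !entry.skill) return null;
  return `skills/${entry.skill}.md`;
}

console.log(skillFileForRole('script'));     // skills/media-video-script.md
console.log(skillFileForRole('compositor')); // skills/media-video-compose.md
console.log(skillFileForRole('reviewer'));   // null
```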