npm - maestro-flow - Versions diffs - 0.4.19 → 0.4.21 - Mend

maestro-flow 0.4.19 → 0.4.21


Files changed (163)

package/.agents/skills/team-adversarial-swarm/specs/convergence-criteria.md ADDED

@@ -0,0 +1,75 @@
+# Convergence Criteria — Adversarial Edition
+When does the swarm stop? Two-layer convergence: Python signal + adversarial debate.
+## Two-Layer Convergence
+Unlike team-swarm (Python-only), adversarial-swarm uses two layers:
+```
+Layer 1: Python aco.py converged → signal (data-driven)
+Layer 2: wf-swarm-converge → adversarial debate (judgment-driven)
+```
+Python provides the raw signal (stagnation, entropy, budget). The Workflow module
+runs a prosecutor/defender/judge debate to make the final call.
+## Python Stop Conditions (any-of)
+| Criterion | Default | Description |
+|-----------|---------|-------------|
+| `max_iterations` | 5 | Hard cap; always enforced |
+| `stagnation` | patience=2 | Best score improves by less than `min_delta` (0.01) for `patience` consecutive iterations |
+| `entropy_floor` | 0.5 | Pheromone entropy below threshold |
+| `budget_tokens` | 100000 | Total token cost exceeds the budget (disabled in the default configuration) |
+| `target_score` | 0.95 | Best verified_score crosses target |
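
The any-of rule in the table above can be sketched as follows. This is an illustration, not the code in `aco.py`: the names `StopConfig` and `stop_reason` are hypothetical, the defaults are copied from the table, and a `budget_tokens` of `None` stands in for the disabled state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopConfig:
    # Defaults mirror the table; the field layout itself is an assumption.
    max_iterations: int = 5
    patience: int = 2
    min_delta: float = 0.01
    entropy_floor: float = 0.5
    budget_tokens: Optional[int] = None  # None means the criterion is disabled
    target_score: float = 0.95


def stop_reason(iteration, best_scores, entropy, tokens_used, cfg=None):
    """Return the name of the first criterion that fires, or None to continue.

    best_scores holds the best verified_score after each completed iteration.
    """
    cfg = cfg or StopConfig()
    if iteration >= cfg.max_iterations:
        return "max_iterations"
    if best_scores and best_scores[-1] >= cfg.target_score:
        return "target_score"
    if len(best_scores) > cfg.patience:
        # patience consecutive gains need patience + 1 data points
        recent = best_scores[-(cfg.patience + 1):]
        gains = [b - a for a, b in zip(recent, recent[1:])]
        if all(abs(g) < cfg.min_delta for g in gains):
            return "stagnation"
    if entropy < cfg.entropy_floor:
        return "entropy_floor"
    if cfg.budget_tokens is not None and tokens_used > cfg.budget_tokens:
        return "budget_tokens"
    return None
```

Because the conditions are any-of, the order of the checks only affects which reason is reported, not whether the signal fires.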
+## Adversarial Override
+The Python signal is an INPUT to the adversarial debate, not the final decision:
+| Python says | Adversarial debate can |
+|-------------|----------------------|
+| converged=true | Override to CONTINUE if prosecutor makes strong case (rare) |
+| converged=false | Override to STOP if defender makes strong case (quality sufficient) |
+| max_iterations reached | MUST converge (no override — hard safety net) |
+## Debate Decision Rules
+From wf-swarm-converge.js:
+1. `iteration >= max_iterations` → MUST converge (no debate needed)
+2. `iteration == 1` → MUST NOT converge (too early)
+3. Stagnation signal + defender confidence > 60% → converge
+4. Prosecutor confidence > 80% + best score < 0.5 → continue (insufficient quality)
+5. Defender concedes major points → continue
+6. Prosecutor concedes major points → stop
+7. Otherwise → weigh evidence quality
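
The seven rules read as an ordered cascade. A minimal sketch of that ordering follows, assuming the rules are evaluated top to bottom and that "concedes major points" has already been reduced to a boolean; the function name and parameters are hypothetical, and in the package itself the rules are given to the judge agent as prompt text rather than executed as code.

```python
def apply_decision_rules(iteration, max_iterations, stagnating,
                         prosecutor_conf, defender_conf, best_score,
                         prosecutor_concedes_major=False,
                         defender_concedes_major=False):
    """Return 'converge', 'continue', or 'weigh' (rule 7, left to the judge)."""
    if iteration >= max_iterations:                # rule 1: hard cap
        return "converge"
    if iteration == 1:                             # rule 2: too early
        return "continue"
    if stagnating and defender_conf > 60:          # rule 3
        return "converge"
    if prosecutor_conf > 80 and best_score < 0.5:  # rule 4
        return "continue"
    if defender_concedes_major:                    # rule 5
        return "continue"
    if prosecutor_concedes_major:                  # rule 6
        return "converge"
    return "weigh"                                 # rule 7
```

Rules 1 and 2 are the only ones that do not depend on the debate, so they are the natural candidates for enforcement outside the judge.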
+## Configuration
+```json
+{
+  "convergence": {
+    "max_iterations": 5,
+    "stagnation": { "enabled": true, "patience": 2, "min_delta": 0.01 },
+    "entropy_floor": { "enabled": true, "threshold": 0.5 },
+    "budget_tokens": { "enabled": false, "max": 100000 },
+    "target_score": { "enabled": true, "value": 0.95 }
+  }
+}
+```
+## Why Two-Layer
+Single-layer convergence is fragile:
+- Python-only misses qualitative signals (a solution can be "good enough" before the score plateaus)
+- LLM-only is unreliable for numeric comparisons (misreads stagnation signals)
+- Combined: Python provides hard data, adversarial debate provides judgment
+## Anti-Patterns
+- DO NOT disable `max_iterations` — runaway risk
+- DO NOT use `stagnation.patience < 2` — noise triggers false stops
+- DO NOT skip the adversarial debate on iteration 1; run it, but force the outcome to CONTINUE
+- DO NOT let prosecutor override `max_iterations` — hard cap is sacred
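
Two of the anti-patterns above are properties of the configuration and can be checked before a run starts. The sketch below assumes the `convergence` block has the shape shown under Configuration; `lint_convergence` is a hypothetical helper, not something the package ships.

```python
def lint_convergence(config):
    """Return a list of warnings for a parsed swarm config dict."""
    conv = config.get("convergence", {})
    warnings = []

    max_iter = conv.get("max_iterations")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(max_iter, bool) or not isinstance(max_iter, int) or max_iter < 1:
        warnings.append("max_iterations missing or < 1: runaway risk")

    stagnation = conv.get("stagnation", {})
    if stagnation.get("enabled") and stagnation.get("patience", 2) < 2:
        warnings.append("stagnation.patience < 2: noise may trigger false stops")

    return warnings
```

The other two anti-patterns concern runtime behaviour of the debate and cannot be detected from the configuration alone.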

package/.agents/skills/team-adversarial-swarm/specs/pheromone-schema.md ADDED

@@ -0,0 +1,90 @@
+# Pheromone Schema
+Pheromone matrix structure, update formula, evaporation rule.
+Authoritative spec for `pheromone/current.json` and history snapshots.
+Inherited from team-swarm — ACO math is identical.
+## File Layout
+```
+<session>/pheromone/
+├── current.json         # latest state, overwritten each iteration
+├── history/
+│   ├── 1.json           # snapshot after iteration 1
+│   ├── 2.json
+│   └── ...
+└── init.json            # snapshot of initial state (immutable)
+```
+## Schema (pheromone/current.json)
+```json
+{
+  "version": "1.0",
+  "iteration": 3,
+  "n_nodes": 42,
+  "matrix_type": "edge_weighted_sparse",
+  "tau": {
+    "<node_a>::<node_b>": 0.85,
+    "<node_a>::<node_c>": 1.20
+  },
+  "node_tau": {
+    "<node_a>": 0.92,
+    "<node_b>": 1.05
+  },
+  "metadata": {
+    "alpha": 1.0,
+    "beta": 2.0,
+    "rho": 0.2,
+    "q": 1.0,
+    "tau_init": 1.0,
+    "tau_min": 0.01,
+    "tau_max": 10.0
+  },
+  "stats": {
+    "mean": 0.91,
+    "max": 2.34,
+    "min": 0.05,
+    "entropy": 3.21,
+    "n_edges_active": 87
+  }
+}
+```
+## Update Formula
+After iteration k, for each ant a:
+```
+delta_tau_a(edge) = q * verified_score_a  if edge in path_a, else 0
+```
+Then:
+```
+tau(edge) = (1 - rho) * tau(edge) + sum_over_ants(delta_tau_a(edge))
+tau(edge) = clip(tau(edge), tau_min, tau_max)
+```
+**Note**: In adversarial-swarm, `verified_score` comes from the 3-vote adversarial scoring
+module (wf-swarm-score) rather than a single scorer. The pheromone math is identical —
+only the score source changes.
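
A minimal sketch of the update formula, assuming directed edges keyed as `"a::b"` (as in the schema's `tau` map) and assuming an edge that is absent from the sparse map starts at `tau_init`. The function name and argument layout are hypothetical; the defaults are the `metadata` values from the schema.

```python
def update_pheromone(tau, ants, rho=0.2, q=1.0,
                     tau_init=1.0, tau_min=0.01, tau_max=10.0):
    """Evaporate every edge, deposit q * verified_score on visited edges, clip.

    tau:  dict mapping "a::b" edge keys to pheromone levels
    ants: list of (path, verified_score), where path is a list of node ids
    """
    deposits = {}
    for path, score in ants:
        # A set, so a path that revisits an edge deposits on it once,
        # matching "if edge in path_a" in the formula.
        for edge in {f"{a}::{b}" for a, b in zip(path, path[1:])}:
            deposits[edge] = deposits.get(edge, 0.0) + q * score

    updated = {}
    for edge in tau.keys() | deposits.keys():
        level = (1 - rho) * tau.get(edge, tau_init) + deposits.get(edge, 0.0)
        updated[edge] = min(max(level, tau_min), tau_max)
    return updated
```

With `rho = 0.2`, an edge at 1.0 that is visited by one ant scoring 0.5 moves to 0.8 + 0.5 = 1.3, while an unvisited edge at 1.0 decays to 0.8.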
+## Selection Probability
+```
+p(i -> j) = (tau(i,j)^alpha * eta(i,j)^beta) / sum_{k in N(i)}(tau(i,k)^alpha * eta(i,k)^beta)
+```
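
The selection rule can be sketched as below. Here `eta` is the heuristic desirability term from standard ACO; the spec does not say how `aco.py` computes it, so the `eta` map in this example is an assumption, as are the function name and the uniform fallback for an all-zero denominator.

```python
def transition_probabilities(current, neighbors, tau, eta, alpha=1.0, beta=2.0):
    """Probability of moving from `current` to each node in `neighbors`.

    tau and eta are dicts keyed by "a::b" edge strings.
    """
    weights = {}
    for j in neighbors:
        edge = f"{current}::{j}"
        weights[j] = (tau[edge] ** alpha) * (eta[edge] ** beta)

    total = sum(weights.values())
    if total == 0:
        # Degenerate case: fall back to a uniform choice (assumption).
        return {j: 1.0 / len(neighbors) for j in neighbors}
    return {j: w / total for j, w in weights.items()}
```

With equal pheromone on both edges and `eta` of 1.0 and 2.0, `beta = 2.0` gives weights 1 and 4, so the probabilities are 0.2 and 0.8.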
+## Path-Hints vs Full-Path
+`aco.py select` returns weighted starting nodes + edge probabilities.
+Ant agents make actual node-by-node choices with freedom to deviate.
+## Entropy as Convergence Signal
+Shannon entropy of normalized pheromone:
+- High H → diverse exploration (early stage)
+- Low H → concentrated on few paths (converging)
+- Used by both Python `converged` check AND adversarial convergence debate
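
For reference, the entropy signal can be computed as in this sketch. It assumes the entropy is taken over the edge `tau` map and uses the natural logarithm; the spec does not state the log base or whether `node_tau` is included, so treat both as assumptions.

```python
import math


def pheromone_entropy(tau):
    """Shannon entropy H = -sum(p * ln p) with p = tau(edge) / sum(tau)."""
    total = sum(tau.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for level in tau.values():
        p = level / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy
```

A uniform matrix over n edges gives the maximum, ln(n), and a matrix concentrated on a single edge gives 0, which is the direction the `entropy_floor` criterion watches for.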

package/.agents/skills/team-adversarial-swarm/specs/swarm-config-template.json ADDED

@@ -0,0 +1,66 @@
+{
+  "_comment": "Config template for team-adversarial-swarm. Generated by coordinator in Phase 1.",
+  "swarm": {
+    "n_ants": 5,
+    "max_iterations": 5,
+    "elite_keep": 3
+  },
+  "aco": {
+    "alpha": 1.0,
+    "beta": 2.0,
+    "rho": 0.2,
+    "q": 1.0,
+    "tau_init": 1.0,
+    "tau_min": 0.01,
+    "tau_max": 10.0
+  },
+  "task_space": {
+    "type": "graph",
+    "nodes": ["node_a", "node_b", "node_c"],
+    "_alt_auto_discover": "auto_discover_from: 'src/**/*.ts'",
+    "max_path_length": 5,
+    "start_nodes": "any",
+    "edges": "complete"
+  },
+  "scoring": {
+    "_comment": "mode: adversarial (3-vote per ant) | script | fallback",
+    "mode": "adversarial",
+    "rubric": "",
+    "self_score_discount": 0.5,
+    "_alt_script": "script_path: './my-scoring-rule.py'"
+  },
+  "ant_prompt": {
+    "objective": "Find the file with the highest density of suspicious code patterns",
+    "evidence_requirements": [
+      "At least 1 file:line reference per node visited",
+      "Concrete code snippet showing the suspicious pattern"
+    ],
+    "tools_hint": "Use Grep + Read for code-based exploration"
+  },
+  "convergence": {
+    "max_iterations": 5,
+    "stagnation": {
+      "enabled": true,
+      "patience": 2,
+      "min_delta": 0.01
+    },
+    "entropy_floor": {
+      "enabled": true,
+      "threshold": 0.5
+    },
+    "budget_tokens": {
+      "enabled": false,
+      "max": 100000
+    },
+    "target_score": {
+      "enabled": true,
+      "value": 0.95
+    }
+  }
+}

package/.agents/skills/team-adversarial-swarm/specs/swarm-protocol.md ADDED

@@ -0,0 +1,105 @@
+# Swarm Protocol — Adversarial Edition
+Defines how the SKILL.md coordinator, Python ACO controller, and modular Workflow scripts interface.
+## Design Philosophy
+| Principle | Rationale |
+|-----------|-----------|
+| **Outer loop = coordinator, inner loop = Workflow** | Coordinator drives iteration lifecycle; Workflow scripts handle multi-agent parallelism |
+| **ACO math = Python script** | Optimization math is cheap and deterministic; kept in Python for correctness |
+| **Exploration = Workflow agents** | LLM agents explore task space in parallel via `parallel()` |
+| **Every decision = adversarial** | No single-agent verdicts — scoring, convergence, synthesis all use adversarial patterns |
+| **Schema-locked ant output** | Same contract as team-swarm; LLM output → algorithm input via strict JSON |
+| **Modular composition** | 4 independent Workflow scripts, composable in any order |
+## Three-Component Architecture
+```
++-----------------------------------------------------+
+|  SKILL.md Coordinator                               |
+|  - Parses user task → emits swarm-config.json       |
+|  - Iteration loop: calls Python + Workflow modules  |
+|  - Data bridge between Workflow calls               |
++--------+--------------------------------------------+
+         | Bash subprocess     | Workflow({scriptPath})
+         v                     v
++---------------------+  +--------------------------------+
+|  Python ACO         |  |  Workflow Modules (4x)          |
+|  scripts/aco.py     |  |  wf-swarm-explore.js            |
+|  - init/select/     |  |  wf-swarm-score.js              |
+|    update/converge  |  |  wf-swarm-converge.js           |
+|  - Owns pheromone   |  |  wf-swarm-synthesize.js         |
++---------------------+  |  - parallel()/pipeline() agents |
+                          |  - Adversarial decision gates   |
+                          +--------------------------------+
+```
+## Iteration Lifecycle
+```
+[Coordinator] Phase 1: generate swarm-config.json
+[Coordinator] Phase 2: python aco.py init
+[Coordinator] Phase 3: iteration k = 1..K
+  ├─ python aco.py select --iter k → assignments
+  ├─ Workflow(wf-swarm-explore, args={assignments...}) → ant_results
+  ├─ Workflow(wf-swarm-score, args={ant_results...}) → verified_scores
+  ├─ Write scores → python aco.py update --iter k → pheromone updated
+  ├─ Workflow(wf-swarm-converge, args={best, history...}) → {converged}
+  └─ if converged: break
+[Coordinator] Phase 4:
+  ├─ python aco.py report → best + top_k + curve
+  └─ Workflow(wf-swarm-synthesize, args={best, top_k...}) → best-solution.md
+```
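
The lifecycle above can be read as the following control flow. In the package the coordinator is a SKILL.md prompt, not a Python program, so this is only a sketch of the ordering: `aco`, `workflow`, and `persist` are hypothetical injected callables, and the shape of `best` passed to the converge module is an assumption.

```python
def run_swarm(aco, workflow, persist, max_iterations=5):
    """Drive Phases 2-4 with injected callables (all three are hypothetical).

    aco(subcommand, **kwargs)  -> parsed stdout JSON of `python aco.py <subcommand>`
    workflow(name, args)       -> return value of the named Workflow module
    persist(iteration, scores) -> writes scores to disk before `update`
    """
    aco("init")
    history = []
    iterations_run = 0
    for k in range(1, max_iterations + 1):
        iterations_run = k
        assignments = aco("select", iteration=k)["assignments"]
        explored = workflow("wf-swarm-explore",
                            {"iteration": k, "assignments": assignments})
        scored = workflow("wf-swarm-score",
                          {"iteration": k, "ant_results": explored["ant_results"]})
        persist(k, scored)  # coordinator writes to disk between Workflow calls
        summary = aco("update", iteration=k)
        history.append(summary)
        verdict = workflow("wf-swarm-converge", {
            "iteration": k,
            "best": {"score": summary["best_score"]},
            "history": history,
        })
        if verdict["converged"]:
            break

    report = aco("report")
    synthesis = workflow("wf-swarm-synthesize", {
        "best": report.get("best"),
        "top_k": report.get("top_k", []),
        "total_iterations": iterations_run,
    })
    return {"iterations": iterations_run, "synthesis": synthesis}
```

Passing the three collaborators in as callables keeps the loop testable without a session directory or live agents.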
+## Script ↔ Coordinator Contract
+Same as team-swarm. All scripts:
+- Read from `<session>/...` (via `--session` flag)
+- Emit JSON to stdout
+- Exit 0 = success, 1 = error, 2 = config invalid
+- Idempotent: calling update twice for same iteration is safe
+| Subcommand | Input | Output (stdout JSON) | Side effects |
+|------------|-------|---------------------|--------------|
+| `init` | swarm-config.json | `{status, pheromone_path, n_nodes}` | writes pheromone/current.json, task-space.json |
+| `select --iter k` | pheromone/current.json | `{assignments: [{ant_id, path_hints, ...}]}` | none |
+| `update --iter k` | artifacts/ant-k-*.json, scores | `{mean_score, best_score, delta}` | writes pheromone + trails + best.json |
+| `converged` | history/, best.json, config | `{converged, reason, metrics}` | none |
+| `report` | best.json, history/ | full report JSON | none |
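
A sketch of how a caller might honour this contract from Python. The `--session` and `--iter` flags and the exit codes come from the text above; the script path default, the position of `--session` on the command line, and the helper names are assumptions.

```python
import json
import subprocess
import sys

EXIT_MEANINGS = {0: "success", 1: "error", 2: "config invalid"}


def build_argv(subcommand, session, iteration=None, script="scripts/aco.py"):
    """Assemble the command line for one aco.py subcommand."""
    argv = [sys.executable, script, subcommand, "--session", session]
    if iteration is not None:
        argv += ["--iter", str(iteration)]
    return argv


def run_aco(subcommand, session, iteration=None):
    """Run the subcommand and return its stdout parsed as JSON."""
    proc = subprocess.run(build_argv(subcommand, session, iteration),
                          capture_output=True, text=True)
    if proc.returncode != 0:
        meaning = EXIT_MEANINGS.get(proc.returncode, "unknown exit code")
        raise RuntimeError(
            f"aco.py {subcommand} failed ({meaning}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

Distinguishing exit code 2 from 1 lets a caller regenerate the configuration instead of retrying the same command.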
+## Coordinator ↔ Workflow Contract
+Each Workflow module receives `args` and returns structured JSON.
+| Module | args (input) | return (output) |
+|--------|-------------|-----------------|
+| explore | `{iteration, assignments[], objective, session, config, task_space, wisdom}` | `{ant_results[], metadata}` |
+| score | `{iteration, ant_results[], objective, rubric}` | `{votes, calibration, metadata}` |
+| converge | `{iteration, best, history[], config}` | `{iteration, converged, reason, confidence, adversarial_outcome, debate, metadata}` |
+| synthesize | `{best, top_k[], convergence_story, objective, total_iterations, total_ants}` | `{perspectives, synthesis, metadata}` |
+**Key rule**: Coordinator writes ant artifacts and scores to disk BETWEEN Workflow calls.
+Workflow agents can read files but structured data flows through args/return.
+## Adversarial Patterns by Module
+| Module | Pattern | Agents per decision |
+|--------|---------|-------------------|
+| explore | Parallel ants + cross-validation | N ants + N validators |
+| score | Prosecutor/Defender/Judge per ant | 3 × N ants + 1 calibrator |
+| converge | Prosecutor(continue) / Defender(stop) / Judge | 3 agents |
+| synthesize | 3-perspective (why-won/stability/caveats) + Arbitrator | 4 agents |
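
The agent counts in the table translate directly into a per-run estimate. The sketch below assumes explore, score, and converge each run once per iteration and synthesize runs once at the end, as in the iteration lifecycle; the function names are hypothetical.

```python
def agents_per_iteration(n_ants):
    """Agent calls in one iteration, following the table above."""
    explore = n_ants + n_ants   # N ants + N validators
    score = 3 * n_ants + 1      # prosecutor/defender/judge per ant + 1 calibrator
    converge = 3                # prosecutor, defender, judge
    return {"explore": explore, "score": score, "converge": converge,
            "total": explore + score + converge}


def agents_per_run(n_ants, iterations):
    """Total agent calls for a run, including the 4 synthesize agents."""
    return agents_per_iteration(n_ants)["total"] * iterations + 4
```

With the template's `n_ants` of 5 this gives 10 + 16 + 3 = 29 agent calls per iteration, and 29 × 5 + 4 = 149 for a run that uses all 5 iterations.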
+## vs team-swarm Protocol
+| Aspect | team-swarm | team-adversarial-swarm |
+|--------|-----------|----------------------|
+| Worker spawning | delegate_subagent(team-worker) + callback | Workflow parallel()/pipeline() |
+| Scoring | Single scorer OR script | 3-vote adversarial per ant |
+| Convergence | Python script only | Python signal + adversarial debate |
+| Synthesis | Single analyst | 3-perspective + arbitrator |
+| Session management | create_team + team_msg | Coordinator direct file I/O |
+| Data flow | Message bus + file artifacts | args → Workflow → return + files |

package/.agents/skills/team-adversarial-swarm/workflows/wf-swarm-converge.js ADDED

@@ -0,0 +1,197 @@
+export const meta = {
+  name: 'wf-swarm-converge',
+  description: 'Adversarial convergence decision — prosecutor(continue) vs defender(stop) vs judge resolves',
+  whenToUse: 'After each ACO iteration: adversarial debate on whether swarm has converged or should continue',
+  phases: [
+    { title: 'Argue', detail: 'Prosecutor argues to continue, Defender argues to stop' },
+    { title: 'Judge', detail: 'Judge resolves debate with evidence-weighted verdict' },
+  ],
+}
+const ARGUMENT_SCHEMA = {
+  type: 'object',
+  properties: {
+    role: { type: 'string', enum: ['prosecutor', 'defender'] },
+    stance: { type: 'string', enum: ['continue', 'stop'] },
+    argument: { type: 'string' },
+    key_points: {
+      type: 'array',
+      items: {
+        type: 'object',
+        properties: {
+          point: { type: 'string' },
+          evidence: { type: 'string' },
+          strength: { type: 'string', enum: ['strong', 'moderate', 'weak'] },
+        },
+        required: ['point', 'evidence', 'strength'],
+      },
+    },
+    concessions: { type: 'array', items: { type: 'string' } },
+    confidence: { type: 'number', minimum: 0, maximum: 100 },
+  },
+  required: ['role', 'stance', 'argument', 'key_points', 'confidence'],
+}
+const VERDICT_SCHEMA = {
+  type: 'object',
+  properties: {
+    converged: { type: 'boolean' },
+    reason: { type: 'string' },
+    confidence: { type: 'number', minimum: 0, maximum: 100 },
+    adversarial_outcome: {
+      type: 'object',
+      properties: {
+        prosecutor_confidence: { type: 'number' },
+        defender_confidence: { type: 'number' },
+        decisive_factor: { type: 'string' },
+        prosecutor_concessions: { type: 'array', items: { type: 'string' } },
+        defender_concessions: { type: 'array', items: { type: 'string' } },
+      },
+      required: ['prosecutor_confidence', 'defender_confidence', 'decisive_factor'],
+    },
+    recommendation: { type: 'string' },
+  },
+  required: ['converged', 'reason', 'confidence', 'adversarial_outcome'],
+}
+const iteration = args?.iteration || 1
+const best = args?.best || {}
+const history = args?.history || []
+const config = args?.config || {}
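+// Accept both flat keys and the nested `stagnation` block used in the convergence spec
+const patience = config.patience || config.stagnation?.patience || 2
+const minImprovement = config.min_improvement || config.stagnation?.min_delta || 0.01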
+const maxIterations = config.max_iterations || 5
+const historyDigest = history.map((h, i) =>
+  `Iter ${i + 1}: best=${h.best_score} mean=${h.mean_score} delta=${h.delta ?? 'n/a'} ants=${h.completed_ants ?? 'n/a'}`
+).join('\n')
+const improvementTrend = history.length >= 2
+  ? history.slice(-patience).map(h => h.delta || 0)
+  : []
+const stagnating = improvementTrend.length >= patience && improvementTrend.every(d => Math.abs(d) < minImprovement)
+// Phase 1: Adversarial Debate
+phase('Argue')
+log(`Iteration ${iteration}: adversarial convergence debate...`)
+const debate = await parallel([
+  () => agent(
+    `You are the PROSECUTOR. Argue that the swarm should CONTINUE exploring.
+## Current State
+Iteration: ${iteration} of ${maxIterations}
+Best score: ${best.score || 'unknown'}
+Best ant: ${best.ant_id || 'unknown'}
+## Score History
+${historyDigest || 'No history yet'}
+## Convergence Config
+Patience: ${patience} (stop if no improvement for this many iterations)
+Min improvement: ${minImprovement}
+Max iterations: ${maxIterations}
+## Stagnation Signal
+${stagnating ? 'YES — last ' + patience + ' iterations show < ' + minImprovement + ' improvement' : 'NO — improvement still occurring or not enough data'}
+## Your Job: Argue to CONTINUE
+Build the strongest case that the swarm should keep going:
+- Best score isn't good enough yet (absolute quality)
+- Score variance across ants suggests unexplored promising regions
+- Pheromone entropy is still high (many paths competitive)
+- Budget allows more iterations
+- Recent deviations from pheromone hints produced discoveries
+Acknowledge when stopping would be reasonable — concessions add credibility.
+Your confidence reflects how genuinely strong your case is.`,
+    { label: 'prosecutor:continue', phase: 'Argue', schema: ARGUMENT_SCHEMA }
+  ),
+  () => agent(
+    `You are the DEFENDER. Argue that the swarm should STOP and declare convergence.
+## Current State
+Iteration: ${iteration} of ${maxIterations}
+Best score: ${best.score || 'unknown'}
+Best ant: ${best.ant_id || 'unknown'}
+## Score History
+${historyDigest || 'No history yet'}
+## Convergence Config
+Patience: ${patience} (stop if no improvement for this many iterations)
+Min improvement: ${minImprovement}
+Max iterations: ${maxIterations}
+## Stagnation Signal
+${stagnating ? 'YES — last ' + patience + ' iterations show < ' + minImprovement + ' improvement' : 'NO — improvement still occurring or not enough data'}
+## Your Job: Argue to STOP
+Build the strongest case that the swarm has converged:
+- Best score is stable across recent iterations
+- Multiple ants converging on similar paths (low entropy)
+- Diminishing returns — each iteration yields less improvement
+- Best solution quality is sufficient for the objective
+- Further iterations would waste budget without meaningful gain
+Acknowledge when continuing might help — concessions add credibility.
+Your confidence reflects how genuinely strong your case is.`,
+    { label: 'defender:stop', phase: 'Argue', schema: ARGUMENT_SCHEMA }
+  ),
+])
+const validDebate = debate.filter(Boolean)
+const prosecutor = validDebate.find(a => a.role === 'prosecutor')
+const defender = validDebate.find(a => a.role === 'defender')
+const debateDigest = validDebate.map(a =>
+  `### ${a.role.toUpperCase()} (stance: ${a.stance}, confidence: ${a.confidence}%)\n${a.argument}\nKey points:\n${a.key_points.map(p => '- [' + p.strength + '] ' + p.point).join('\n')}\nConcessions: ${(a.concessions || []).join('; ') || 'none'}`
+).join('\n\n---\n\n')
+log(`Prosecutor: ${prosecutor ? prosecutor.confidence : '?'}% for continue | Defender: ${defender ? defender.confidence : '?'}% for stop`)
+// Phase 2: Judge resolves
+phase('Judge')
+log('Judge resolving convergence debate...')
+const verdict = await agent(
+  `You are the JUDGE. Two advocates debated whether this swarm should continue or stop.
+=== DEBATE ===
+${debateDigest}
+=== OBJECTIVE DATA ===
+Iteration: ${iteration} of ${maxIterations}
+Best score: ${best.score || 'unknown'}
+Score history: ${historyDigest || 'none'}
+Stagnation signal: ${stagnating ? 'YES' : 'NO'}
+=== DECISION RULES ===
+1. If iteration >= max_iterations → MUST converge (safety net)
+2. If iteration == 1 → MUST NOT converge (need at least 2 iterations)
+3. If stagnation signal AND defender confidence > 60% → converge
+4. If prosecutor confidence > 80% AND best score < 0.5 → continue (insufficient quality)
+5. If defender concedes major points → likely should continue
+6. If prosecutor concedes major points → likely should stop
+7. Otherwise → weigh evidence quality from both sides
+Record the adversarial_outcome with both confidences, concessions, and the decisive factor.
+Provide a recommendation for what to focus on if continuing.`,
+  { label: 'judge', phase: 'Judge', schema: VERDICT_SCHEMA }
+)
+return {
+  iteration: iteration,
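+  converged: verdict ? verdict.converged : (iteration >= maxIterations),
+  // Without a verdict, report the cap as the reason only if it was actually reached
+  reason: verdict ? verdict.reason : (iteration >= maxIterations ? 'max_iterations_reached' : 'judge_unavailable'),
+  confidence: verdict ? verdict.confidence : (iteration >= maxIterations ? 100 : 0),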
+  adversarial_outcome: verdict ? verdict.adversarial_outcome : null,
+  debate: { prosecutor: prosecutor, defender: defender },
+  metadata: {
+    best_score: best.score,
+    stagnation_signal: stagnating,
+    iteration_of_max: iteration + '/' + maxIterations,
+    prosecutor_confidence: prosecutor ? prosecutor.confidence : null,
+    defender_confidence: defender ? defender.confidence : null,
+  },
+}