npm - @yemi33/minions - Versions diffs - 0.1.2122 → 0.1.2123 - Mend

@yemi33/minions 0.1.2122 → 0.1.2123

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/docs/harness-mode.md ADDED

@@ -0,0 +1,92 @@
+# Tri-Agent Harness Mode
+> Status: opt-in feature flag on scheduled tasks (`harness_mode: "tri_agent"`).
+> Shipped: W-mq07a9gf000jbc2b. Module: [`engine/harness.js`](../engine/harness.js).
+## What it is
+A way to turn one schedule firing into a coordinated **Planner → Generator → Evaluator** trio that iterates on a shared on-disk artifact until the artifact meets a rubric or hits an iteration cap. Useful for "produce a piece of work, then improve it" loops where a single agent call would either underspecify the task or produce uneven quality.
+The three roles in order:
+1. **Planner** (`ask` type, read-only for project code): reads the rubric and writes a numbered plan into the shared artifact in the mission directory.
+2. **Generator** (inherits `sched.type`, default `ask`): appends its output to the artifact at `<MINIONS_DIR>/engine/harness/<missionId>/artifact.md`, following the plan.
+3. **Evaluator** (`ask`, read-only for project code): appends a scored evaluation of the artifact against the rubric and reports a verdict.
+If the evaluator's verdict is a fail (score below `harness_threshold`, or an explicit `FAIL` with no score) and the iteration cap hasn't been hit, the engine appends a fresh `Generator → Evaluator` pair and injects the evaluator's feedback into the next generator's prompt. The loop stops on a pass, at the cap, or on an inconclusive verdict.
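A minimal sketch of the score-driven loop, in JavaScript to match the engine. `simulateHarness` is a hypothetical helper and is not part of `engine/harness.js`. It replays scripted evaluator scores and deliberately ignores explicit PASS/FAIL flags and the inconclusive case, so the threshold/cap arithmetic stays visible.

```javascript
// Hypothetical helper (not part of engine/harness.js): replay scripted evaluator
// scores through the score-only iteration rule described above.
function simulateHarness(scores, threshold = 0.7, maxIterations = 5) {
  for (let iteration = 1; iteration <= scores.length; iteration++) {
    const score = scores[iteration - 1];
    if (score >= threshold) return { outcome: 'pass', iterations: iteration };
    if (iteration >= maxIterations) return { outcome: 'cap', iterations: iteration };
    // Otherwise another Generator -> Evaluator pair would be appended.
  }
  return { outcome: 'incomplete', iterations: scores.length }; // ran out of script
}

console.log(JSON.stringify(simulateHarness([0.4, 0.6, 0.75])));       // {"outcome":"pass","iterations":3}
console.log(JSON.stringify(simulateHarness([0.2, 0.3, 0.5], 0.7, 2))); // {"outcome":"cap","iterations":2}
```

The second call shows the cap winning: with `harness_max_iterations: 2`, the mission stops after two failing rounds even though the score is still below threshold.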
+## Config schema (add to a schedule in `config.json`)
+```json
+{
+  "id": "weekly-design-review",
+  "title": "Tri-agent design review",
+  "cron": "0 9 * * MON",
+  "type": "ask",
+  "harness_mode": "tri_agent",
+  "harness_rubric": "Score 0-1. 1.0 = all sections complete with code examples. 0 = missing sections.",
+  "harness_threshold": 0.7,
+  "harness_max_iterations": 5
+}
+```
+| Field                    | Required | Default | Notes                                                                 |
+|--------------------------|----------|---------|-----------------------------------------------------------------------|
+| `harness_mode`           | yes      | —       | Must equal `"tri_agent"` to enable. Any other value falls back to plain scheduled work. |
+| `harness_rubric`         | yes      | —       | Non-empty string. Injected into every role's prompt. The evaluator scores against this. |
+| `harness_threshold`      | no       | `0.7`   | Number in `(0, 1]`. Verdict score `>= threshold` = pass; `<` = iterate. |
+| `harness_max_iterations` | no       | `5`     | Integer in `[1, 20]`; values above `20` are rejected as invalid, not clamped. Counts Generator → Evaluator rounds; the initial Planner → Generator → Evaluator trio is iteration 1. |
+Invalid harness config logs a warning and **skips the firing without recording a schedule run**, so fixing the config and waiting for the next cron tick is enough to recover — no manual reset needed.
+## Lifecycle
+```
+cron fires
+  └─ scheduler.discoverScheduledWork detects harness_mode === 'tri_agent'
+       └─ validateHarnessConfig (skip+warn on failure)
+            └─ createTriAgentMission → 3 work items
+                 ├─ Planner   (iteration 1)
+                 ├─ Generator (iteration 1, depends on Planner)
+                 └─ Evaluator (iteration 1, depends on Generator)
+                      │
+                      ▼ (on success)
+                 lifecycle.runPostCompletionHooks
+                  └─ handleHarnessIterationResult
+                       └─ parseEvaluatorVerdict + shouldIterateAgain
+                            ├─ if iterate: append Generator + Evaluator (iteration N+1)
+                            │    └─ next tick dispatches them
+                            └─ if pass / cap / inconclusive: mission terminal
+```
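The `depends_on` links are what serialize the loop. The sketch below makes three simplifying assumptions. The dispatcher is a hypothetical `dispatchable` function. The IDs are shortened; real ones look like `sched-<scheduleId>-<role>-i<N>-<ms>-<rand>`. And iteration 2 is listed up front for brevity, whereas the engine only appends that pair after `evaluator-i1` completes with a failing verdict.

```javascript
// Hypothetical dispatcher sketch: an item is dispatchable once every id in its
// depends_on list has completed. IDs are shortened for readability.
function dispatchable(items, doneIds) {
  return items.filter(
    (item) => !doneIds.has(item.id) && item.depends_on.every((dep) => doneIds.has(dep)),
  );
}

const items = [
  { id: 'planner-i1', depends_on: [] },
  { id: 'generator-i1', depends_on: ['planner-i1'] },
  { id: 'evaluator-i1', depends_on: ['generator-i1'] },
  // In the engine this pair is appended only after evaluator-i1 fails.
  { id: 'generator-i2', depends_on: ['evaluator-i1'] },
  { id: 'evaluator-i2', depends_on: ['generator-i2'] },
];

// Simulate ticks: everything dispatchable in a tick completes before the next.
const done = new Set();
const order = [];
for (let batch = dispatchable(items, done); batch.length > 0; batch = dispatchable(items, done)) {
  for (const item of batch) {
    order.push(item.id);
    done.add(item.id);
  }
}
console.log(order.join(' -> '));
// planner-i1 -> generator-i1 -> evaluator-i1 -> generator-i2 -> evaluator-i2
```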
+## Artifact layout
+```
+<MINIONS_DIR>/engine/harness/<missionId>/
+  └─ artifact.md            ← Planner writes the plan; Generator and Evaluator append sections
+```
+Mission ID format: `<scheduleId>-<unixMs>-<rand6>`, where `<rand6>` is a short base-36 random suffix. The mission directory is the contract: agents in all three roles get the same artifact path injected into their prompts.
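Schedule IDs such as `weekly-design-review` contain hyphens, so splitting a mission ID on every `-` goes wrong. `parseMissionId` below is a hypothetical helper (not exported by `engine/harness.js`) that anchors on the last two segments instead. It returns `null` for IDs that don't follow the format, which can happen when a caller supplies `opts.missionId` explicitly.

```javascript
// Hypothetical helper (not exported by engine/harness.js): split a mission ID
// of the form `<scheduleId>-<unixMs>-<rand6>` back into its parts. Schedule IDs
// may contain hyphens, so the pattern anchors on the last two segments.
const MISSION_ID_RE = /^(.+)-(\d+)-([a-z0-9]{1,6})$/;

function parseMissionId(missionId) {
  const m = MISSION_ID_RE.exec(missionId);
  if (!m) return null;
  return { scheduleId: m[1], createdMs: Number(m[2]), rand: m[3] };
}

console.log(JSON.stringify(parseMissionId('weekly-design-review-1700000000000-k3x9qa')));
// {"scheduleId":"weekly-design-review","createdMs":1700000000000,"rand":"k3x9qa"}
```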
+## Evaluator verdict protocol
+The evaluator can report its score and pass/fail verdict in either of two ways:
+- **Preferred (structured):** include the fields in the completion report sidecar:
+  ```json
+  { "harness_pass": true, "harness_score": 0.82, "harness_feedback": "all sections present" }
+  ```
+- **Fallback (text):** include `Score: 0.82` and `PASS` / `FAIL` in the evaluator's output. Structured fields win when both are present. The text patterns are matched case-insensitively, and `FAIL` takes precedence when both `PASS` and `FAIL` appear.
+If neither signal is parseable, the harness treats the verdict as inconclusive and stops iterating (`shouldIterateAgain` returns false) to avoid an infinite loop driven by a silent agent.
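To see how the text fallback behaves, here is a simplified sketch. It reuses the three regexes verbatim from `engine/harness.js`, but `textVerdict` itself is a hypothetical reduction of `parseEvaluatorVerdict`: it has no structured report, no feedback extraction, and no threshold check.

```javascript
// SCORE_RE, PASS_RE and FAIL_RE are copied verbatim from engine/harness.js.
const SCORE_RE = /(?:^|\W)Score\s*[:=]\s*([0-1](?:\.\d+)?|\.\d+)/i;
const PASS_RE = /(?:^|[^\w])(PASS|✅\s*PASS|verdict\s*[:=]\s*pass)\b/i;
const FAIL_RE = /(?:^|[^\w])(FAIL|❌\s*FAIL|verdict\s*[:=]\s*fail)\b/i;

// Simplified text-only fallback (hypothetical name): score clamping plus
// FAIL-over-PASS precedence, nothing else.
function textVerdict(stdout) {
  const m = SCORE_RE.exec(stdout);
  const score = m ? Math.max(0, Math.min(1, parseFloat(m[1]))) : null;
  const pass = FAIL_RE.test(stdout) ? false : PASS_RE.test(stdout) ? true : null;
  return { score, pass };
}

for (const text of [
  'Score: 0.82\nPASS',
  'Score: 0.55\nFAIL: the pass criteria were not met',
  'Looks fine overall.',
  'Did not pass the rubric. Score: 0.4',
]) {
  console.log(JSON.stringify(textVerdict(text)));
}
// {"score":0.82,"pass":true}
// {"score":0.55,"pass":false}
// {"score":null,"pass":null}
// {"score":0.4,"pass":true}
```

The last case is worth noting. Because matching is case-insensitive, "did not pass" registers as a pass, and `shouldIterateAgain` stops on `pass === true` before it ever compares the 0.4 score against the threshold. This is one reason to prefer the structured sidecar fields.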
+## Dedup behavior (engine.js)
+Within a single tick the standard scheduled-work dedup is keyed by `_scheduleId`, which would collapse the harness trio to one item. The three harness items share a `_missionId`, so engine.js snapshots active mission IDs **before** the dedup loop and lets all three land together, while plain scheduled items keep the original `_scheduleId` dedup.
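The engine.js dedup code is not part of this diff, so the following is only a sketch of the idea. The function name `dedupTick` and the item shapes are assumptions, and so is the rule that a mission already active at tick start is skipped. The key point is that the mission-ID set is built before the loop and never updated inside it. Otherwise, admitting the planner would make its generator and evaluator siblings look like duplicates.

```javascript
// Hypothetical sketch of the tick-level dedup; engine.js is not in this diff,
// so the function name and item shapes are assumptions.
function dedupTick(candidates, activeMissionIds) {
  // Snapshot taken BEFORE the loop and never updated inside it.
  const missionsAtStart = new Set(activeMissionIds);
  const seenSchedules = new Set();
  const accepted = [];
  for (const item of candidates) {
    if (item._missionId) {
      // Harness items: admit every sibling of a mission not already active.
      if (!missionsAtStart.has(item._missionId)) accepted.push(item);
      continue;
    }
    // Plain scheduled items: at most one per _scheduleId per tick.
    if (seenSchedules.has(item._scheduleId)) continue;
    seenSchedules.add(item._scheduleId);
    accepted.push(item);
  }
  return accepted;
}

const trio = ['planner', 'generator', 'evaluator'].map((role) => ({
  id: role, _scheduleId: 'weekly', _missionId: 'weekly-1700000000000-abc123',
}));
const plain = [{ id: 'a', _scheduleId: 'nightly' }, { id: 'b', _scheduleId: 'nightly' }];
console.log(dedupTick([...trio, ...plain], []).map((i) => i.id).join(','));
// planner,generator,evaluator,a
console.log(dedupTick(trio, ['weekly-1700000000000-abc123']).length); // 0
```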
+## Operational notes
+- Tri-agent items are **schedule-driven** — there's no manual "fire a harness mission" entry point. Add a schedule with `harness_mode: "tri_agent"` to opt in.
+- Iteration pairs always reuse the original mission's artifact path, threshold, max-iterations, and rubric. The evaluator's verdict feedback is appended to the next generator's prompt.
+- Mission state lives entirely on disk: the mission's work items in work-items.json plus the artifact file. No new DB tables.
+- Every role in every iteration is a separate work item, so dispatch retries, cooldowns, and steering apply normally to each one.

package/engine/ado.js CHANGED

@@ -975,6 +975,15 @@ async function forEachActivePr(config, token, callback) {
       continue;
     }
+    // Per-project throttle skip — emit one log line per skipped project, then continue.
+    // Sub-item W-mq03l6zh0006f0a1-b will replace the global isAdoThrottled() probe with
+    // a per-org `isOrgBaseThrottled(orgBase)` check so a 429 on one org no longer pauses
+    // polling for healthy orgs.
+    if (isAdoThrottled()) {
+      log('info', `[ado] PR poll skipped for ${project.name || project.repoName || 'unknown project'} — org ${orgBase} throttled`);
+      continue;
+    }
     // Parallelize PR polling within each project (max 5 concurrent to avoid rate limits)
     const CONCURRENCY = 5;
     for (let i = 0; i < activePrs.length; i += CONCURRENCY) {
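The comment above names a future `isOrgBaseThrottled(orgBase)` check but does not show it. Below is a minimal sketch of what a per-org registry could look like, assuming a retry-after window per org. Everything except the `isOrgBaseThrottled` name is hypothetical, including `createOrgThrottle`, `markThrottled`, and the placeholder org names. The injectable clock is there only to make the sketch testable.

```javascript
// Hypothetical per-org throttle registry. Only the name isOrgBaseThrottled
// appears in the ado.js comment; the rest of this shape is assumed.
function createOrgThrottle(now = () => Date.now()) {
  const resumeAtByOrg = new Map(); // orgBase -> epoch ms when polling may resume

  return {
    // Record a throttle (e.g. after a 429) for a single org.
    markThrottled(orgBase, retryAfterMs) {
      const resumeAt = now() + retryAfterMs;
      // Keep the later deadline if the org is already throttled.
      resumeAtByOrg.set(orgBase, Math.max(resumeAtByOrg.get(orgBase) || 0, resumeAt));
    },
    isOrgBaseThrottled(orgBase) {
      const resumeAt = resumeAtByOrg.get(orgBase);
      if (resumeAt === undefined) return false;
      if (now() >= resumeAt) {
        resumeAtByOrg.delete(orgBase); // window elapsed, forget the entry
        return false;
      }
      return true;
    },
  };
}

// Usage with a fake clock and placeholder org names.
let clock = 0;
const throttle = createOrgThrottle(() => clock);
throttle.markThrottled('orgA', 60000);
console.log(throttle.isOrgBaseThrottled('orgA')); // true
console.log(throttle.isOrgBaseThrottled('orgB')); // false, healthy orgs keep polling
clock = 60000;
console.log(throttle.isOrgBaseThrottled('orgA')); // false, window elapsed
```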

package/engine/github.js CHANGED

@@ -295,7 +295,10 @@ function resetSlugBackoff(slug) {
 // ─── GitHub Rate-Limit Throttle ────────────────────────────────────────────
 // Tracks rate-limiting from GitHub API (gh CLI exits non-zero with rate-limit messages).
 // GitHub rate limits reset hourly, so cap at 60 min.
-const _ghThrottle = createThrottleTracker({ label: 'gh', baseBackoffMs: 60000, maxBackoffMs: 60 * 60000 });
+// jitterRatio: 0.2 applies ±20% random jitter to the backoff to avoid a thundering herd
+// when many concurrent gh calls race the same 1-hr reset window. See sub-item
+// W-mq03l6zh0006f0a1-a for the createThrottleTracker jitter math.
+const _ghThrottle = createThrottleTracker({ label: 'gh', baseBackoffMs: 60000, maxBackoffMs: 60 * 60000, jitterRatio: 0.2 });
 /** Returns true if GitHub is rate-limited and retryAfter hasn't elapsed. */
 const isGhThrottled = () => _ghThrottle.isThrottled();
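The createThrottleTracker jitter math lives in a sub-item that is not part of this diff, so the following is an assumption-laden sketch rather than the engine's implementation. It assumes doubling per consecutive throttle, a uniform factor in `[1 - jitterRatio, 1 + jitterRatio)`, and a final clamp to `maxBackoffMs`. `backoffWithJitter` is a hypothetical name.

```javascript
// Hypothetical sketch: exponential backoff with ±jitterRatio jitter. The real
// createThrottleTracker math is not shown in this diff; this is an assumption.
function backoffWithJitter(attempt, opts, rand = Math.random) {
  const { baseBackoffMs, maxBackoffMs, jitterRatio = 0 } = opts;
  // Double the wait for each consecutive throttle (assumed growth policy),
  // never exceeding the hard cap.
  const raw = Math.min(baseBackoffMs * 2 ** (attempt - 1), maxBackoffMs);
  // Scale by a factor drawn uniformly from [1 - jitterRatio, 1 + jitterRatio).
  const factor = 1 + jitterRatio * (2 * rand() - 1);
  // Clamp after jitter so the wait never exceeds maxBackoffMs.
  return Math.round(Math.min(raw * factor, maxBackoffMs));
}

const opts = { baseBackoffMs: 60000, maxBackoffMs: 60 * 60000, jitterRatio: 0.2 };
console.log(backoffWithJitter(1, opts, () => 0));    // 48000 (60 s minus 20%)
console.log(backoffWithJitter(1, opts, () => 0.5));  // 60000 (no jitter)
console.log(backoffWithJitter(7, opts, () => 0.99)); // 3600000 (capped at 60 min)
```

Clamping after jitter keeps the documented 60-minute ceiling. The trade-off is that once the backoff reaches the cap, jitter can only shorten waits, so callers are spread only on the early side of the reset window.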

package/engine/harness.js ADDED

@@ -0,0 +1,592 @@
+/**
+ * engine/harness.js — Tri-agent harness mode for long-running missions.
+ *
+ * Inspired by Anthropic's Harness Design (April 2026), this module implements
+ * the "tri_agent" mission mode: a schedule firing produces three coordinated
+ * work items — Planner → Generator → Evaluator — that operate against a
+ * shared artifact on disk. The Evaluator scores the artifact against a rubric
+ * declared in the schedule config; if it scores below the threshold (default
+ * 0.7), the engine spawns another Generator+Evaluator iteration with the
+ * Evaluator's feedback injected. The cycle terminates on pass, on an
+ * inconclusive verdict, or after `harness_max_iterations` (default 5) iterations, whichever comes first.
+ *
+ * Why a separate module:
+ *   - Keeps scheduler.js focused on cron parsing + run-dedup.
+ *   - The iteration loop lives in lifecycle.js so the engine's existing
+ *     post-completion hook chain owns retry orchestration.
+ *   - Pure helpers here (no side effects beyond reading shared.MINIONS_DIR)
+ *     are easy to test in isolation.
+ *
+ * Schedule config shape (additive on top of the cron schedule schema):
+ *   {
+ *     id: 'daily-research',
+ *     cron: '0 9 * * *',
+ *     title: 'Daily research pass',
+ *     description: 'Summarize new arxiv papers',
+ *     project: 'minions',                    // optional
+ *     type: 'ask',                            // generator work-type (default 'ask')
+ *     harness_mode: 'tri_agent',              // REQUIRED to enable harness mode
+ *     harness_rubric: 'Must cite ≥3 papers', // REQUIRED — passed to Evaluator
+ *     harness_threshold: 0.7,                 // default 0.7, must be (0, 1]
+ *     harness_max_iterations: 5,              // default 5, integer in [1, 20]
+ *   }
+ *
+ * Per-item _harness meta (carried through dispatch + lifecycle):
+ *   {
+ *     role: 'planner' | 'generator' | 'evaluator',
+ *     iteration: 1,                  // generators/evaluators bump this on retry
+ *     missionId: 'sched-id-1700-abc',
+ *     artifactPath: '<MINIONS_DIR>/engine/harness/<missionId>/artifact.md',
+ *     rubric: '...',                 // verbatim from schedule
+ *     threshold: 0.7,
+ *     maxIterations: 5,
+ *     generatorType: 'ask',          // remembered so iteration N+1 reuses it
+ *   }
+ *
+ * Zero dependencies beyond Node built-ins + engine/shared.
+ */
+'use strict';
+const path = require('path');
+const shared = require('./shared');
+const { MINIONS_DIR, WI_STATUS, ts } = shared;
+const HARNESS_MODE = Object.freeze({
+  TRI_AGENT: 'tri_agent',
+});
+const HARNESS_ROLE = Object.freeze({
+  PLANNER: 'planner',
+  GENERATOR: 'generator',
+  EVALUATOR: 'evaluator',
+});
+const HARNESS_DEFAULTS = Object.freeze({
+  threshold: 0.7,
+  maxIterations: 5,
+  // Used when sched.type is absent. Ask is read-only and produces no PR, which
+  // matches the "research / synthesis" use case the harness was designed for.
+  generatorType: 'ask',
+});
+const HARNESS_MAX_ITERATIONS_CAP = 20;
+// Filesystem layout for harness artifacts. Each mission gets its own dir so
+// concurrent missions don't stomp each other's artifacts.
+function harnessRootDir() {
+  return path.join(MINIONS_DIR, 'engine', 'harness');
+}
+function harnessMissionDir(missionId) {
+  return path.join(harnessRootDir(), missionId);
+}
+function harnessArtifactPath(missionId) {
+  return path.join(harnessMissionDir(missionId), 'artifact.md');
+}
+/**
+ * Validate a schedule's tri-agent harness configuration.
+ * Returns { valid: boolean, errors: string[], resolved?: { threshold, maxIterations, generatorType } }.
+ * The `resolved` object reflects the defaults that will be applied (only populated when valid=true).
+ */
+function validateHarnessConfig(sched) {
+  const errors = [];
+  if (!sched || typeof sched !== 'object') {
+    return { valid: false, errors: ['schedule must be an object'] };
+  }
+  if (sched.harness_mode !== HARNESS_MODE.TRI_AGENT) {
+    errors.push(`harness_mode must be "${HARNESS_MODE.TRI_AGENT}" (got ${JSON.stringify(sched.harness_mode)})`);
+  }
+  if (typeof sched.harness_rubric !== 'string' || sched.harness_rubric.trim().length === 0) {
+    errors.push('harness_rubric is required (non-empty string)');
+  }
+  let threshold = HARNESS_DEFAULTS.threshold;
+  if (sched.harness_threshold !== undefined && sched.harness_threshold !== null) {
+    if (typeof sched.harness_threshold !== 'number' || !Number.isFinite(sched.harness_threshold)
+        || sched.harness_threshold <= 0 || sched.harness_threshold > 1) {
+      errors.push(`harness_threshold must be a number in (0, 1] (got ${JSON.stringify(sched.harness_threshold)})`);
+    } else {
+      threshold = sched.harness_threshold;
+    }
+  }
+  let maxIterations = HARNESS_DEFAULTS.maxIterations;
+  if (sched.harness_max_iterations !== undefined && sched.harness_max_iterations !== null) {
+    const n = sched.harness_max_iterations;
+    if (!Number.isInteger(n) || n < 1 || n > HARNESS_MAX_ITERATIONS_CAP) {
+      errors.push(`harness_max_iterations must be a positive integer ≤ ${HARNESS_MAX_ITERATIONS_CAP} (got ${JSON.stringify(n)})`);
+    } else {
+      maxIterations = n;
+    }
+  }
+  if (errors.length > 0) return { valid: false, errors };
+  return {
+    valid: true,
+    errors: [],
+    resolved: {
+      threshold,
+      maxIterations,
+      generatorType: typeof sched.type === 'string' && sched.type.trim() ? sched.type.trim() : HARNESS_DEFAULTS.generatorType,
+    },
+  };
+}
+// ─── ID + path helpers ──────────────────────────────────────────────────────
+function _shortRand() {
+  // Up to 6 base-36 chars from Math.random(). Collisions are vanishingly rare
+  // for distinct scheduler ticks, and harness has no cross-process write
+  // contention (each mission's artifact dir is freshly created).
+  return Math.random().toString(36).slice(2, 8);
+}
+function _buildMissionId(sched, nowMs) {
+  const base = sched && typeof sched.id === 'string' ? sched.id : 'mission';
+  return `${base}-${nowMs}-${_shortRand()}`;
+}
+function _buildItemId(scheduleId, role, iteration, nowMs) {
+  return `sched-${scheduleId}-${role}-i${iteration}-${nowMs}-${_shortRand()}`;
+}
+// ─── Prompt builders ────────────────────────────────────────────────────────
+const RUBRIC_HEADING = '## Rubric';
+const ARTIFACT_HEADING = '## Shared Artifact';
+function _buildPlannerDescription(sched, ctx) {
+  const { artifactPath, missionId, threshold, maxIterations, rubric, iteration } = ctx;
+  return [
+    `# Tri-Agent Mission — Planner (iteration ${iteration})`,
+    '',
+    `**Mission:** ${sched.title || sched.id}`,
+    '',
+    `**Goal:** ${(sched.description || sched.title || '').trim()}`,
+    '',
+    'You are the **Planner** in a three-agent harness loop. Your job is to',
+    'decompose the mission goal above into a numbered list of concrete subtasks',
+    'that the Generator will execute next, then write the plan to the shared',
+    'artifact below. Keep subtasks small and verifiable.',
+    '',
+    ARTIFACT_HEADING,
+    '',
+    `Write your plan to: \`${artifactPath}\``,
+    '',
+    'Structure the file as:',
+    '```',
+    `# Mission ${missionId}`,
+    '',
+    '## Plan',
+    '1. <subtask 1>',
+    '2. <subtask 2>',
+    '...',
+    '```',
+    '',
+    'Do not execute the subtasks yourself — that is the Generator\'s job.',
+    'Create the artifact directory if it does not exist.',
+    '',
+    RUBRIC_HEADING,
+    '',
+    'The Evaluator will eventually score the completed artifact against this',
+    'rubric. Plan with the rubric in mind:',
+    '',
+    '> ' + rubric.split('\n').join('\n> '),
+    '',
+    `Threshold: ${threshold} · Max iterations: ${maxIterations}`,
+    '',
+    `Mission ID: \`${missionId}\``,
+  ].join('\n');
+}
+function _buildGeneratorDescription(sched, ctx, opts) {
+  const { artifactPath, missionId, threshold, maxIterations, rubric, iteration } = ctx;
+  const previousFeedback = opts && opts.previousFeedback ? String(opts.previousFeedback).trim() : '';
+  const lines = [
+    `# Tri-Agent Mission — Generator (iteration ${iteration})`,
+    '',
+    `**Mission:** ${sched.title || sched.id}`,
+    '',
+    `**Goal:** ${(sched.description || sched.title || '').trim()}`,
+    '',
+    'You are the **Generator** in a three-agent harness loop. Read the Planner\'s',
+    'subtask list from the shared artifact, execute each subtask in order, and',
+    'append your outputs to the artifact under a clearly-labelled section.',
+    '',
+    ARTIFACT_HEADING,
+    '',
+    `Shared artifact: \`${artifactPath}\``,
+    '',
+    `Append a section titled \`## Generator Output (iteration ${iteration})\` to the`,
+    'artifact. Within it, address each numbered subtask from the plan.',
+    '',
+    'Do NOT delete or rewrite earlier sections — append only.',
+  ];
+  if (previousFeedback) {
+    lines.push(
+      '',
+      `## Previous Evaluator Feedback (iteration ${iteration - 1})`,
+      '',
+      'The previous iteration failed the rubric. The Evaluator provided this feedback:',
+      '',
+      '> ' + previousFeedback.split('\n').join('\n> '),
+      '',
+      'Address this feedback explicitly in your new output.',
+    );
+  }
+  lines.push(
+    '',
+    RUBRIC_HEADING,
+    '',
+    'The Evaluator will score your output against this rubric:',
+    '',
+    '> ' + rubric.split('\n').join('\n> '),
+    '',
+    `Threshold: ${threshold} · Max iterations: ${maxIterations}`,
+    '',
+    `Mission ID: \`${missionId}\``,
+  );
+  return lines.join('\n');
+}
+function _buildEvaluatorDescription(sched, ctx) {
+  const { artifactPath, missionId, threshold, maxIterations, rubric, iteration } = ctx;
+  return [
+    `# Tri-Agent Mission — Evaluator (iteration ${iteration})`,
+    '',
+    `**Mission:** ${sched.title || sched.id}`,
+    '',
+    'You are the **Evaluator** in a three-agent harness loop. Read the shared',
+    'artifact (including the Planner\'s plan and the Generator\'s output) and',
+    'score it against the rubric below.',
+    '',
+    ARTIFACT_HEADING,
+    '',
+    `Shared artifact: \`${artifactPath}\``,
+    '',
+    'Append a section titled `## Evaluation (iteration ' + iteration + ')` containing:',
+    '- A numeric score in `[0, 1]` formatted as `Score: 0.NN`',
+    '- A `PASS` or `FAIL` verdict on its own line',
+    '- Concrete feedback under `### Feedback` explaining strengths and gaps',
+    '',
+    RUBRIC_HEADING,
+    '',
+    '> ' + rubric.split('\n').join('\n> '),
+    '',
+    `**Threshold:** ${threshold} — a score < ${threshold} is a FAIL and triggers another Generator iteration`,
+    `(up to ${maxIterations} total iterations).`,
+    '',
+    '## Completion Report',
+    '',
+    'In your JSON completion report include these fields so the engine can route',
+    'the next iteration deterministically (in addition to the standard schema):',
+    '```json',
+    '{',
+    '  "status": "success",',
+    '  "summary": "<one-line verdict>",',
+    '  "harness_score": 0.NN,',
+    '  "harness_pass": true | false,',
+    '  "harness_feedback": "<machine-readable feedback the next Generator should address>"',
+    '}',
+    '```',
+    '',
+    'If you cannot evaluate (artifact missing, malformed), set `harness_pass: false`,',
+    '`harness_score: 0`, and explain in `harness_feedback`.',
+    '',
+    `Mission ID: \`${missionId}\``,
+  ].join('\n');
+}
+// ─── Mission creation ───────────────────────────────────────────────────────
+function _commonItemFields(sched, role, iteration) {
+  return {
+    title: `[harness:${role}:i${iteration}] ${sched.title || sched.id}`,
+    priority: sched.priority || 'medium',
+    status: WI_STATUS.PENDING,
+    created: ts(),
+    createdBy: 'scheduler:harness',
+    project: sched.project || null,
+    agent: null,
+    _scheduleId: sched.id,
+  };
+}
+function _buildHarnessMeta(missionId, role, iteration, resolved, sched, artifactPath) {
+  return {
+    role,
+    iteration,
+    missionId,
+    artifactPath,
+    rubric: sched.harness_rubric,
+    threshold: resolved.threshold,
+    maxIterations: resolved.maxIterations,
+    generatorType: resolved.generatorType,
+  };
+}
+/**
+ * Build the initial Planner → Generator → Evaluator trio for a tri-agent
+ * schedule firing. Throws if the schedule's harness config is invalid — the
+ * caller is responsible for validating + logging upstream when desired.
+ *
+ * Returns { items, missionId, artifactPath } where items[0..2] are the three
+ * work items in dispatch order, already linked by depends_on.
+ */
+function createTriAgentMission(sched, opts) {
+  const { valid, errors, resolved } = validateHarnessConfig(sched);
+  if (!valid) throw new Error(`tri_agent harness config invalid for schedule ${sched && sched.id}: ${errors.join('; ')}`);
+  const nowMs = opts && Number.isFinite(opts.now) ? opts.now : Date.now();
+  const missionId = (opts && typeof opts.missionId === 'string' && opts.missionId) || _buildMissionId(sched, nowMs);
+  const artifactPath = harnessArtifactPath(missionId);
+  const iteration = 1;
+  const ctx = {
+    artifactPath, missionId, iteration,
+    threshold: resolved.threshold,
+    maxIterations: resolved.maxIterations,
+    rubric: sched.harness_rubric,
+  };
+  const plannerId = _buildItemId(sched.id, HARNESS_ROLE.PLANNER, iteration, nowMs);
+  const generatorId = _buildItemId(sched.id, HARNESS_ROLE.GENERATOR, iteration, nowMs);
+  const evaluatorId = _buildItemId(sched.id, HARNESS_ROLE.EVALUATOR, iteration, nowMs);
+  // Planner + Evaluator are read-only (ask) by design — they don't mutate
+  // project code, they only read/write the shared harness artifact. The
+  // generator inherits sched.type (default 'ask').
+  const planner = {
+    id: plannerId,
+    type: 'ask',
+    description: _buildPlannerDescription(sched, ctx),
+    depends_on: [],
+    ..._commonItemFields(sched, HARNESS_ROLE.PLANNER, iteration),
+    _missionId: missionId,
+    _harness: _buildHarnessMeta(missionId, HARNESS_ROLE.PLANNER, iteration, resolved, sched, artifactPath),
+  };
+  const generator = {
+    id: generatorId,
+    type: resolved.generatorType,
+    description: _buildGeneratorDescription(sched, ctx, {}),
+    depends_on: [plannerId],
+    ..._commonItemFields(sched, HARNESS_ROLE.GENERATOR, iteration),
+    _missionId: missionId,
+    _harness: _buildHarnessMeta(missionId, HARNESS_ROLE.GENERATOR, iteration, resolved, sched, artifactPath),
+  };
+  const evaluator = {
+    id: evaluatorId,
+    type: 'ask',
+    description: _buildEvaluatorDescription(sched, ctx),
+    depends_on: [generatorId],
+    ..._commonItemFields(sched, HARNESS_ROLE.EVALUATOR, iteration),
+    _missionId: missionId,
+    _harness: _buildHarnessMeta(missionId, HARNESS_ROLE.EVALUATOR, iteration, resolved, sched, artifactPath),
+  };
+  return { items: [planner, generator, evaluator], missionId, artifactPath };
+}
+/**
+ * Build the Generator+Evaluator pair for iteration N+1 after the Evaluator
+ * fails the rubric. The Planner only runs once per mission — its plan is
+ * already in the shared artifact.
+ *
+ * `prevEvaluatorItem` is the work item that just completed (must carry
+ * `_harness` meta with role='evaluator'). The new generator depends on it so
+ * the engine's dispatch loop won't fire it until the artifact is fully written.
+ */
+function createIterationWorkItems(prevEvaluatorItem, verdict, opts) {
+  if (!prevEvaluatorItem || !prevEvaluatorItem._harness) {
+    throw new Error('createIterationWorkItems: prevEvaluatorItem missing _harness meta');
+  }
+  const prevMeta = prevEvaluatorItem._harness;
+  if (prevMeta.role !== HARNESS_ROLE.EVALUATOR) {
+    throw new Error(`createIterationWorkItems: prev item must be an evaluator (got role=${prevMeta.role})`);
+  }
+  const iteration = (Number(prevMeta.iteration) || 1) + 1;
+  const nowMs = opts && Number.isFinite(opts.now) ? opts.now : Date.now();
+  const sched = {
+    id: prevEvaluatorItem._scheduleId,
+    title: prevEvaluatorItem.title || prevMeta.missionId,
+    description: '', // Carried via artifact + feedback, not re-rendered.
+    harness_rubric: prevMeta.rubric,
+    project: prevEvaluatorItem.project || null,
+    priority: prevEvaluatorItem.priority || 'medium',
+  };
+  const ctx = {
+    artifactPath: prevMeta.artifactPath,
+    missionId: prevMeta.missionId,
+    iteration,
+    threshold: prevMeta.threshold,
+    maxIterations: prevMeta.maxIterations,
+    rubric: prevMeta.rubric,
+  };
+  const resolved = {
+    threshold: prevMeta.threshold,
+    maxIterations: prevMeta.maxIterations,
+    generatorType: prevMeta.generatorType || HARNESS_DEFAULTS.generatorType,
+  };
+  const generatorId = _buildItemId(sched.id || 'mission', HARNESS_ROLE.GENERATOR, iteration, nowMs);
+  const evaluatorId = _buildItemId(sched.id || 'mission', HARNESS_ROLE.EVALUATOR, iteration, nowMs);
+  const feedback = verdict && verdict.feedback ? verdict.feedback : '(no feedback supplied)';
+  const generator = {
+    id: generatorId,
+    type: resolved.generatorType,
+    title: `[harness:generator:i${iteration}] ${sched.title}`,
+    description: _buildGeneratorDescription(sched, ctx, { previousFeedback: feedback }),
+    depends_on: [prevEvaluatorItem.id],
+    priority: sched.priority,
+    status: WI_STATUS.PENDING,
+    created: ts(),
+    createdBy: 'harness:iterate',
+    project: sched.project,
+    agent: null,
+    _scheduleId: sched.id,
+    _missionId: prevMeta.missionId,
+    _harness: _buildHarnessMeta(prevMeta.missionId, HARNESS_ROLE.GENERATOR, iteration, resolved, sched, prevMeta.artifactPath),
+  };
+  const evaluator = {
+    id: evaluatorId,
+    type: 'ask',
+    title: `[harness:evaluator:i${iteration}] ${sched.title}`,
+    description: _buildEvaluatorDescription(sched, ctx),
+    depends_on: [generatorId],
+    priority: sched.priority,
+    status: WI_STATUS.PENDING,
+    created: ts(),
+    createdBy: 'harness:iterate',
+    project: sched.project,
+    agent: null,
+    _scheduleId: sched.id,
+    _missionId: prevMeta.missionId,
+    _harness: _buildHarnessMeta(prevMeta.missionId, HARNESS_ROLE.EVALUATOR, iteration, resolved, sched, prevMeta.artifactPath),
+  };
+  return [generator, evaluator];
+}
+// ─── Verdict parsing + iteration gate ───────────────────────────────────────
+const SCORE_RE = /(?:^|\W)Score\s*[:=]\s*([0-1](?:\.\d+)?|\.\d+)/i;
+const PASS_RE = /(?:^|[^\w])(PASS|✅\s*PASS|verdict\s*[:=]\s*pass)\b/i;
+const FAIL_RE = /(?:^|[^\w])(FAIL|❌\s*FAIL|verdict\s*[:=]\s*fail)\b/i;
+/**
+ * Extract { score, pass, feedback } from the Evaluator's completion report and
+ * stdout. Structured fields in the completion report win when present.
+ *
+ * Returns:
+ *   { score: number | null, pass: boolean | null, feedback: string }
+ * `score=null` and `pass=null` together mean "no signal" — the caller should
+ * treat this as inconclusive (do NOT retry blindly).
+ */
+function parseEvaluatorVerdict(stdout, structuredCompletion) {
+  let score = null;
+  let pass = null;
+  let feedback = '';
+  // Structured fields take precedence — they're the documented contract in
+  // the evaluator prompt and not vulnerable to text-format drift.
+  if (structuredCompletion && typeof structuredCompletion === 'object') {
+    if (typeof structuredCompletion.harness_score === 'number' && Number.isFinite(structuredCompletion.harness_score)) {
+      score = Math.max(0, Math.min(1, structuredCompletion.harness_score));
+    }
+    if (typeof structuredCompletion.harness_pass === 'boolean') {
+      pass = structuredCompletion.harness_pass;
+    }
+    if (typeof structuredCompletion.harness_feedback === 'string' && structuredCompletion.harness_feedback.trim()) {
+      feedback = structuredCompletion.harness_feedback.trim();
+    } else if (typeof structuredCompletion.summary === 'string' && structuredCompletion.summary.trim()) {
+      feedback = structuredCompletion.summary.trim();
+    }
+  }
+  // Text fallback — only fill in fields the structured report did not provide.
+  if ((score === null || pass === null || !feedback) && typeof stdout === 'string' && stdout.length > 0) {
+    if (score === null) {
+      const m = SCORE_RE.exec(stdout);
+      if (m) {
+        const n = parseFloat(m[1]);
+        if (Number.isFinite(n)) score = Math.max(0, Math.min(1, n));
+      }
+    }
+    if (pass === null) {
+      const failMatch = FAIL_RE.exec(stdout);
+      const passMatch = PASS_RE.exec(stdout);
+      // FAIL takes precedence over PASS when both appear (the evaluator's
+      // explanation of failure may mention 'pass criteria' etc).
+      if (failMatch) pass = false;
+      else if (passMatch) pass = true;
+    }
+    if (!feedback) {
+      // Best-effort: take the last non-empty line as the feedback summary.
+      const lines = stdout.split(/\r?\n/).map(l => l.trim()).filter(Boolean);
+      if (lines.length > 0) feedback = lines[lines.length - 1].slice(0, 2000);
+    }
+  }
+  // If score is set but pass is not, the caller (shouldIterateAgain below)
+  // compares the score against the per-mission threshold, so pass stays null
+  // here rather than baking in a default.
+  return { score, pass, feedback };
+}
+/**
+ * Decide whether to spawn another Generator+Evaluator iteration.
+ *
+ * Rules (in order):
+ *   1. If verdict.pass === true, stop (mission succeeded).
+ *   2. If iteration >= maxIterations, stop (cap reached).
+ *   3. If we have a numeric score AND score >= threshold, treat as pass and stop.
+ *   4. If we have a numeric score AND score <  threshold, iterate.
+ *   5. If verdict.pass === false explicitly (no score), iterate.
+ *   6. Otherwise (no score and no pass signal), STOP — silent agents would
+ *      loop forever; require explicit failure to retry.
+ */
+function shouldIterateAgain(harnessMeta, verdict) {
+  if (!harnessMeta || !verdict) return false;
+  const iteration = Number(harnessMeta.iteration) || 1;
+  const maxIterations = Number(harnessMeta.maxIterations) || HARNESS_DEFAULTS.maxIterations;
+  const threshold = Number(harnessMeta.threshold);
+  const t = Number.isFinite(threshold) ? threshold : HARNESS_DEFAULTS.threshold;
+  if (verdict.pass === true) return false;
+  if (iteration >= maxIterations) return false;
+  if (typeof verdict.score === 'number' && Number.isFinite(verdict.score)) {
+    return verdict.score < t;
+  }
+  if (verdict.pass === false) return true;
+  return false;
+}
+module.exports = {
+  HARNESS_MODE,
+  HARNESS_ROLE,
+  HARNESS_DEFAULTS,
+  HARNESS_MAX_ITERATIONS_CAP,
+  harnessRootDir,
+  harnessMissionDir,
+  harnessArtifactPath,
+  validateHarnessConfig,
+  createTriAgentMission,
+  createIterationWorkItems,
+  parseEvaluatorVerdict,
+  shouldIterateAgain,
+  // Exported for direct unit tests (per docs/skills.md skill 'export-internal-helpers-for-direct-unit-tests').
+  _buildPlannerDescription,
+  _buildGeneratorDescription,
+  _buildEvaluatorDescription,
+};

package/engine/lifecycle.js CHANGED

@@ -14,6 +14,7 @@ const { trackEngineUsage } = require('./llm');
 const { resolveRuntime } = require('./runtimes');
 const adoGitAuth = require('./ado-git-auth');
 const queries = require('./queries');
+const harness = require('./harness');
 const { isBranchActive } = require('./cooldown');
 const { worktreeMatchesBranch, getWorktreeBranch, cleanupMergedPrLocalBranch } = require('./cleanup');
 const { getConfig, getInboxFiles, getNotes, getPrs, getDispatch,
@@ -4040,6 +4041,82 @@ function handleDecompositionResult(stdout, meta, config, runtimeName) {
   return 0;
 }
+/**
+ * Tri-agent harness mode (W-mq07a9gf000jbc2b): when an evaluator completes,
+ * parse its verdict against the configured rubric/threshold and — if the
+ * artifact didn't pass and the iteration cap hasn't been hit — append a
+ * fresh Generator+Evaluator pair so the harness can iterate on its own
+ * artifact. Returns the number of work items appended (0 = no new iteration:
+ * pass, cap reached, inconclusive verdict, or a parse/build failure).
+ *
+ * Called from runPostCompletionHooks after a successful run when the
+ * dispatched item carries _harness.role === 'evaluator'.
+ */
+function handleHarnessIterationResult(stdout, structuredCompletion, meta, config) {
+  const evaluatorItem = meta?.item;
+  if (!evaluatorItem?._harness || evaluatorItem._harness.role !== harness.HARNESS_ROLE.EVALUATOR) return 0;
+  let verdict;
+  try {
+    verdict = harness.parseEvaluatorVerdict(stdout || '', structuredCompletion || null);
+  } catch (err) {
+    log('warn', `Harness ${evaluatorItem._harness.missionId}: verdict parse failed — ${err.message}; treating as terminal stop`);
+    return 0;
+  }
+  if (!harness.shouldIterateAgain(evaluatorItem._harness, verdict)) {
+    const reason = verdict.pass === true ? 'passed' :
+      (evaluatorItem._harness.iteration >= evaluatorItem._harness.maxIterations ? 'max iterations reached' :
+      'inconclusive verdict');
+    log('info', `Harness mission ${evaluatorItem._harness.missionId} terminal stop (iteration ${evaluatorItem._harness.iteration}, ${reason}, score=${verdict.score ?? 'n/a'})`);
+    return 0;
+  }
+  let nextItems;
+  try {
+    nextItems = harness.createIterationWorkItems(evaluatorItem, verdict, {});
+  } catch (err) {
+    log('warn', `Harness ${evaluatorItem._harness.missionId}: iteration build failed — ${err.message}`);
+    return 0;
+  }
+  if (!Array.isArray(nextItems) || nextItems.length === 0) return 0;
+  // Mirror handleDecompositionResult: scan central + per-project work-items.json
+  // and append into the file that owns the evaluator (the trio always lands in
+  // the central file in practice — scheduler.discoverScheduledWork writes
+  // directly to engine/work-items.json via engine.js — but iterate defensively).
+  const projects = shared.getProjects(config);
+  const allPaths = [path.join(MINIONS_DIR, 'work-items.json')];
+  for (const p of projects) allPaths.push(shared.projectWorkItemsPath(p));
+  let appendedTo = null;
+  for (const wiPath of allPaths) {
+    let found = false;
+    mutateJsonFileLocked(wiPath, data => {
+      if (!Array.isArray(data)) return data;
+      const evaluator = data.find(i => i.id === evaluatorItem.id);
+      if (!evaluator) return data;
+      found = true;
+      // De-dupe by id in case a previous tick already appended the next pair.
+      const existingIds = new Set(data.map(i => i.id));
+      for (const it of nextItems) {
+        if (existingIds.has(it.id)) continue;
+        data.push(it);
+      }
+      return data;
+    }, { defaultValue: [] });
+    if (found) { appendedTo = wiPath; break; }
+  }
+  if (!appendedTo) {
+    log('warn', `Harness ${evaluatorItem._harness.missionId}: evaluator ${evaluatorItem.id} not found in any work-items.json — iteration skipped`);
+    return 0;
+  }
+  log('info', `Harness mission ${evaluatorItem._harness.missionId} iterating: appended ${nextItems.length} work items (next iteration: ${nextItems[0]._harness.iteration}, score=${verdict.score ?? 'n/a'})`);
+  return nextItems.length;
+}
 /**
  * W-mpg58wv3 — auto-dispatch a re-review WI when a fix-WI born from a minion
  * REQUEST_CHANGES marks done. Closure-loop for the shared Yemi reviewer slot:
@@ -4386,6 +4463,19 @@ async function runPostCompletionHooks(dispatchItem, agentId, code, stdout, confi
     }
   }
+  // Tri-agent harness iteration (W-mq07a9gf000jbc2b): if the evaluator just
+  // completed successfully and verdict says retry, append the next Gen+Eval
+  // pair into the same work-items.json. Engine will dispatch them on the
+  // next tick. No interaction with skipDoneStatus — the evaluator itself
+  // still marks DONE; iteration is a sibling write, not a parent decomp.
+  if (effectiveSuccess && meta?.item?._harness?.role === harness.HARNESS_ROLE.EVALUATOR) {
+    try {
+      handleHarnessIterationResult(stdout, structuredCompletion, meta, config);
+    } catch (err) {
+      log('warn', `Harness iteration hook failed for ${meta.item.id}: ${err.message}`);
+    }
+  }
   // Verify review work items include a verdict — must run BEFORE updateWorkItemStatus(DONE),
   // same pattern as plan-to-prd (#893): updateWorkItemStatus deletes _retryCount, so the check
   // must read/increment it before that happens. Also sets skipDoneStatus so completedAt isn't
@@ -5204,6 +5294,7 @@ module.exports = {
   isPrAttachmentRequired,
   extractDecompositionJson,
   handleDecompositionResult,
+  handleHarnessIterationResult,
   processCompletionFollowups,
   // W-mpg58wv3 — closure-loop dispatch helpers (exported for testing).
   dispatchReReviewForFix,
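
The append step in `handleHarnessIterationResult` first scans the central work-items.json and then each project's file. It appends the next pair to whichever file contains the evaluator. A minimal in-memory sketch follows. The `Map` of path to array is a hypothetical stand-in for the locked reads and writes that `mutateJsonFileLocked` performs. `appendIterationItems` is an illustrative name, not an engine export.

```javascript
// In-memory sketch of "append the next Generator+Evaluator pair to the file
// that owns the evaluator, de-duplicating by id". `files` maps a path to its
// parsed work-item array and stands in for the engine's locked JSON I/O.
function appendIterationItems(files, evaluatorId, nextItems) {
  for (const [wiPath, data] of files) {
    if (!Array.isArray(data)) continue;                 // unreadable or foreign shape
    if (!data.some(i => i.id === evaluatorId)) continue; // not the owning file
    const existingIds = new Set(data.map(i => i.id));
    let appended = 0;
    for (const it of nextItems) {
      if (existingIds.has(it.id)) continue;             // an earlier tick already added it
      data.push(it);
      existingIds.add(it.id);
      appended++;
    }
    return { appendedTo: wiPath, appended };
  }
  return { appendedTo: null, appended: 0 };             // evaluator not found anywhere
}
```

The hunk logs `nextItems.length`. This sketch instead reports how many items were actually pushed, so a replayed tick shows up as `appended: 0`.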

package/engine/scheduler.js CHANGED

@@ -25,7 +25,8 @@ const fs = require('fs');
 const path = require('path');
 const shared = require('./shared');
 const routing = require('./routing');
-const { safeJson, safeWrite, mutateJsonFileLocked, mutateScheduleRuns, ts, dateStamp, WI_STATUS, WORK_TYPE } = shared;
+const harness = require('./harness');
+const { safeJson, safeWrite, mutateJsonFileLocked, mutateScheduleRuns, ts, dateStamp, log, WI_STATUS, WORK_TYPE } = shared;
 const SCHEDULE_RUNS_PATH = path.join(shared.MINIONS_DIR, 'engine', 'schedule-runs.json');
@@ -186,9 +187,9 @@ function createScheduledWorkItem(sched) {
   };
 }
-function writeScheduleRunEntry(runs, scheduleId, workItemId) {
+function writeScheduleRunEntry(runs, scheduleId, workItemId, extra) {
   const existing = typeof runs[scheduleId] === 'object' && runs[scheduleId] ? runs[scheduleId] : {};
-  runs[scheduleId] = { ...existing, lastRun: ts(), lastWorkItemId: workItemId };
+  runs[scheduleId] = { ...existing, lastRun: ts(), lastWorkItemId: workItemId, ...(extra || {}) };
   return runs[scheduleId];
 }
@@ -222,6 +223,42 @@ function discoverScheduledWork(config) {
       const lastRun = typeof runEntry === 'string' ? runEntry : (runEntry?.lastRun || null);
       if (!shouldRunNow(sched, lastRun)) continue;
+      // Tri-agent harness mode (W-mq07a9gf000jbc2b): a single schedule firing
+      // produces a coordinated Planner → Generator → Evaluator trio rather than
+      // a single work item. Validate config first — on bad config, skip this
+      // tick WITHOUT recording a schedule run so the operator can fix the
+      // config and the next tick will pick it up.
+      if (sched.harness_mode === harness.HARNESS_MODE.TRI_AGENT) {
+        const validation = harness.validateHarnessConfig(sched);
+        if (!validation.valid) {
+          log('warn', `Scheduler: harness config invalid for ${sched.id} — skipping (errors: ${validation.errors.join('; ')})`);
+          continue;
+        }
+        try {
+          // Resolve schedule-time template variables on the title/description
+          // BEFORE handing the schedule to the harness builder so subtask
+          // prompts inherit the same substitutions as regular schedules.
+          const resolvedSched = {
+            ...sched,
+            title: resolveScheduleTemplateVars(sched.title),
+            description: resolveScheduleTemplateVars(sched.description || sched.title),
+            harness_rubric: resolveScheduleTemplateVars(sched.harness_rubric),
+          };
+          const mission = harness.createTriAgentMission(resolvedSched);
+          for (const it of mission.items) work.push(it);
+          // Record the mission's planner id as lastWorkItemId for compatibility
+          // with the existing schedule-runs shape, plus lastMissionId so the
+          // dashboard and consolidation tooling can join across the trio.
+          writeScheduleRunEntry(runs, sched.id, mission.items[0].id, {
+            lastMissionId: mission.missionId,
+            harnessMode: harness.HARNESS_MODE.TRI_AGENT,
+          });
+        } catch (err) {
+          log('warn', `Scheduler: tri-agent mission build failed for ${sched.id}: ${err.message}`);
+        }
+        continue;
+      }
       // Substitute schedule-time template vars (e.g. {{date}}) before the work
       // item is written — single-pass playbook rendering can't reach placeholders
       // embedded inside task_description, so they must be resolved up front.
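
The widened `writeScheduleRunEntry` merges an optional `extra` object last. Harness firings use this to record `lastMissionId` and `harnessMode` alongside the usual fields. The sketch copies the function body. The injectable `now` clock is a hypothetical addition that stands in for `shared.ts()`, so the output is deterministic.

```javascript
// Copy of the widened helper with a hypothetical injectable clock (`now`)
// replacing shared.ts(). Legacy string entries are treated as empty objects.
function writeScheduleRunEntry(runs, scheduleId, workItemId, extra, now = () => new Date().toISOString()) {
  const existing = typeof runs[scheduleId] === 'object' && runs[scheduleId] ? runs[scheduleId] : {};
  // `extra` is spread last, so its keys override the standard fields.
  runs[scheduleId] = { ...existing, lastRun: now(), lastWorkItemId: workItemId, ...(extra || {}) };
  return runs[scheduleId];
}
```

The existing entry is spread first. As a result, keys written through `extra` remain on the entry after later firings that pass no `extra`.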

package/engine.js CHANGED

@@ -3944,7 +3944,7 @@ function reconcileItemsWithPrs(items, allPrs, { onlyIds } = {}) {
 // ─── Inbox Consolidation (extracted to engine/consolidation.js) ──────────────
 const { consolidateInbox } = require('./engine/consolidation');
-const { pollPrStatus, pollPrHumanComments, reconcilePrs, checkLiveReviewStatus: adoCheckLiveReview, checkLiveBuildAndConflict: adoCheckLiveBuildAndConflict, needsAdoPollRetry, getAdoToken, isAdoThrottled } = require('./engine/ado');
+const { pollPrStatus, pollPrHumanComments, reconcilePrs, checkLiveReviewStatus: adoCheckLiveReview, checkLiveBuildAndConflict: adoCheckLiveBuildAndConflict, needsAdoPollRetry, getAdoToken, isAdoThrottled, getAdoThrottleStateAll } = require('./engine/ado');
 const { pollPrStatus: ghPollPrStatus, pollPrHumanComments: ghPollPrHumanComments, reconcilePrs: ghReconcilePrs, checkLiveReviewStatus: ghCheckLiveReview, checkLiveBuildAndConflict: ghCheckLiveBuildAndConflict, isGhThrottled } = require('./engine/github');
 // ─── State Snapshot ─────────────────────────────────────────────────────────
@@ -6878,12 +6878,35 @@ async function discoverWork(config) {
         mutateJsonFileLocked(centralPath, (items) => {
           if (!Array.isArray(items)) items = [];
           let added = 0;
+          // Snapshot active dedup keys BEFORE the loop so multiple items in the
+          // same harness mission (same _missionId) all land in one tick. Without
+          // this snapshot, the first item's push would block subsequent items
+          // in the same mission from joining (W-mq07a9gf000jbc2b — tri-agent
+          // harness mode requires Planner+Generator+Evaluator to land together).
+          const activeMissionIds = new Set();
+          const activeScheduleIds = new Set();
+          for (const existing of items) {
+            if (existing.status === WI_STATUS.DONE || existing.status === WI_STATUS.FAILED) continue;
+            if (existing._missionId) activeMissionIds.add(existing._missionId);
+            if (existing._scheduleId) activeScheduleIds.add(existing._scheduleId);
+          }
+          const addedScheduleIdsThisTick = new Set();
           for (const item of taskItems) {
-            if (!items.some(i => i._scheduleId === item._scheduleId && i.status !== WI_STATUS.DONE && i.status !== WI_STATUS.FAILED)) {
-              items.push(item);
-              added++;
-              log('info', `Scheduled task fired: ${item._scheduleId} → ${item.title}`);
+            // Mission items dedup by _missionId against pre-existing rows only
+            // (the trio's other items added later in this loop must not block
+            // each other). Plain scheduled items keep the original scheduleId
+            // dedup AND skip if a sibling item from the same tick already
+            // claimed the schedule slot.
+            if (item._missionId) {
+              if (activeMissionIds.has(item._missionId)) continue;
+            } else {
+              if (activeScheduleIds.has(item._scheduleId)) continue;
+              if (addedScheduleIdsThisTick.has(item._scheduleId)) continue;
             }
+            items.push(item);
+            if (!item._missionId && item._scheduleId) addedScheduleIdsThisTick.add(item._scheduleId);
+            added++;
+            log('info', `Scheduled task fired: ${item._scheduleId} → ${item.title}`);
           }
           return items;
         }, { defaultValue: [] });
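
The dedup rules in the `discoverWork` hunk can be exercised as a pure function. In this sketch, `mergeScheduledItems` is a hypothetical name. The `'done'` and `'failed'` strings are assumed stand-ins for `WI_STATUS.DONE` and `WI_STATUS.FAILED`.

```javascript
// Pure-function sketch of the snapshot-then-merge dedup. Mutates `items` and
// returns the number of task items added. Status values are assumptions.
const DONE = 'done';
const FAILED = 'failed';

function mergeScheduledItems(items, taskItems) {
  // Snapshot active ids BEFORE the loop so items pushed in this pass
  // cannot block their own mission siblings.
  const activeMissionIds = new Set();
  const activeScheduleIds = new Set();
  for (const existing of items) {
    if (existing.status === DONE || existing.status === FAILED) continue;
    if (existing._missionId) activeMissionIds.add(existing._missionId);
    if (existing._scheduleId) activeScheduleIds.add(existing._scheduleId);
  }
  const addedScheduleIdsThisTick = new Set();
  let added = 0;
  for (const item of taskItems) {
    if (item._missionId) {
      if (activeMissionIds.has(item._missionId)) continue;       // mission already in flight
    } else {
      if (activeScheduleIds.has(item._scheduleId)) continue;     // schedule already in flight
      if (addedScheduleIdsThisTick.has(item._scheduleId)) continue; // sibling claimed the slot
    }
    items.push(item);
    if (!item._missionId && item._scheduleId) addedScheduleIdsThisTick.add(item._scheduleId);
    added++;
  }
  return added;
}
```

A trio that shares one `_missionId` lands in a single pass because membership is checked only against the pre-loop snapshot. A later firing of the same mission stays blocked while any of its items is still non-terminal.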
@@ -7349,10 +7372,18 @@ async function tickInner() {
     lastPrStatusPollAt = now;
     // Build promise array — enabled+unthrottled polls run concurrently via Promise.allSettled
     const statusPolls = [];
-    if (adoPollEnabled && !isAdoThrottled()) {
-      statusPolls.push(pollPrStatus(config).catch(err => { log('warn', `ADO PR status poll error: ${err?.message || err}${err?.stack ? ' | ' + err.stack.split('\n')[1]?.trim() : ''}`); }));
-    } else if (adoPollEnabled && isAdoThrottled()) {
-      log('info', '[ado] PR status poll skipped — throttled');
+    if (adoPollEnabled) {
+      // Per-org throttle skip happens inside forEachActivePr (one log line per skipped project).
+      // Top-level short-circuit: when every known ADO org is throttled, skip the whole phase
+      // with one log line to avoid the per-project iteration cost.
+      const adoThrottleStates = getAdoThrottleStateAll() || {};
+      const adoOrgCount = Object.keys(adoThrottleStates).length;
+      const allAdoThrottled = adoOrgCount > 0 && Object.values(adoThrottleStates).every(s => s && s.throttled);
+      if (allAdoThrottled) {
+        log('info', `[ado] PR status poll skipped — all ${adoOrgCount} known orgs throttled`);
+      } else {
+        statusPolls.push(pollPrStatus(config).catch(err => { log('warn', `ADO PR status poll error: ${err?.message || err}${err?.stack ? ' | ' + err.stack.split('\n')[1]?.trim() : ''}`); }));
+      }
     }
     if (ghPollEnabled && !isGhThrottled()) {
       statusPolls.push(ghPollPrStatus(config).catch(err => { log('warn', `GitHub PR status poll error: ${err?.message || err}${err?.stack ? ' | ' + err.stack.split('\n')[1]?.trim() : ''}`); }));
@@ -7395,10 +7426,18 @@ async function tickInner() {
     lastPrCommentsPollAt = now;
     // Build promise array — enabled+unthrottled comment polls run concurrently via Promise.allSettled
     const commentPolls = [];
-    if (adoPollEnabled && !isAdoThrottled()) {
-      commentPolls.push(pollPrHumanComments(config).catch(err => { log('warn', `ADO PR comment poll error: ${err?.message || err}${err?.stack ? ' | ' + err.stack.split('\n')[1]?.trim() : ''}`); }));
-    } else if (adoPollEnabled && isAdoThrottled()) {
-      log('info', '[ado] PR comment poll skipped — throttled');
+    if (adoPollEnabled) {
+      // Per-org throttle skip happens inside forEachActivePr (one log line per skipped project).
+      // Top-level short-circuit: when every known ADO org is throttled, skip the whole phase
+      // with one log line to avoid the per-project iteration cost.
+      const adoThrottleStates = getAdoThrottleStateAll() || {};
+      const adoOrgCount = Object.keys(adoThrottleStates).length;
+      const allAdoThrottled = adoOrgCount > 0 && Object.values(adoThrottleStates).every(s => s && s.throttled);
+      if (allAdoThrottled) {
+        log('info', `[ado] PR comment poll skipped — all ${adoOrgCount} known orgs throttled`);
+      } else {
+        commentPolls.push(pollPrHumanComments(config).catch(err => { log('warn', `ADO PR comment poll error: ${err?.message || err}${err?.stack ? ' | ' + err.stack.split('\n')[1]?.trim() : ''}`); }));
+      }
     }
     if (ghPollEnabled && !isGhThrottled()) {
       commentPolls.push(ghPollPrHumanComments(config).catch(err => { log('warn', `GitHub PR comment poll error: ${err?.message || err}${err?.stack ? ' | ' + err.stack.split('\n')[1]?.trim() : ''}`); }));
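
The new top-level short-circuit skips an ADO poll phase only when every known org is throttled. The predicate is isolated below as a sketch. `allOrgsThrottled` is a hypothetical helper name. The `{ [org]: { throttled } }` shape is inferred from how the hunk reads `getAdoThrottleStateAll()`.

```javascript
// Sketch of the "skip the whole phase" predicate used by both poll blocks.
// Returns true only when at least one org is known and every org is throttled.
function allOrgsThrottled(states) {
  const values = Object.values(states || {});
  return values.length > 0 && values.every(v => Boolean(v && v.throttled));
}
```

An empty state map returns `false`, so the poll still runs before any org has been seen. Partially throttled setups fall through to the per-org skip inside `forEachActivePr`.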

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@yemi33/minions",
-  "version": "0.1.2122",
+  "version": "0.1.2123",
   "description": "Multi-agent AI dev team that runs from ~/.minions/ — five autonomous agents share a single engine, dashboard, and knowledge base",
   "bin": {
     "minions": "bin/minions.js"