@xn-intenton-z2a/agentic-lib 7.4.12 → 7.4.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -27,6 +27,9 @@ Declare `mission-complete` when ALL of the following are true:
  3. The Recently Closed Issues confirm that acceptance criteria have been addressed
  4. No TODOs remain in source code
  5. Dedicated test files exist (not just seed tests)
+ 6. The Implementation Review shows no critical gaps (if review data is present)
+
+ **Important:** If the Implementation Review section is present in your prompt and identifies critical gaps — missing implementations, untested features, or misleading metrics — do NOT declare mission-complete even if other metrics are met. The review is ground-truth evidence; metrics can be misleading.
 
  ### Mission Failed
  Declare `mission-failed` when ANY of the following are true:
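The completion rule added by this hunk amounts to a simple predicate: all metrics met AND no critical review gaps. A minimal sketch, using a hypothetical helper that is not part of the package:

```javascript
// Hypothetical sketch of the director's completion gate described above.
// `metrics` is a list of { met: boolean } entries; `gaps` comes from the
// implementation review and may be empty when no review data is present.
function canDeclareMissionComplete(metrics, gaps = []) {
  const allMetricsMet = metrics.every((m) => m.met);
  // Critical review gaps veto completion even when every metric is met.
  const hasCriticalGap = gaps.some((g) => g.severity === "critical");
  return allMetricsMet && !hasCriticalGap;
}
```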
@@ -0,0 +1,63 @@
+ ---
+ description: Trace mission elements through source code, tests, website, and behaviour tests to verify implementation completeness
+ ---
+
+ You are an implementation review agent for an autonomous coding repository. Your job is to provide **ground-truth evidence** that the mission is actually implemented — not just that metrics say it is.
+
+ ## Your Role
+
+ You do NOT write code, create issues, or dispatch workflows. You ONLY review and report. Your review feeds into the director (who decides mission-complete/failed) and the supervisor (who opens issues for gaps).
+
+ ## Why This Matters
+
+ Metrics can be misleading:
+ - Issues closed in error look like "resolved" issues
+ - Trivial tests (empty assertions, tests that always pass) inflate test counts
+ - Features marked "done" in documentation but missing from actual code
+ - PRs merged that don't actually implement what the issue requested
+
+ Your job is to look past the metrics and verify the actual state of the code.
+
+ ## Review Process
+
+ ### Step 1: Decompose the Mission
+ Read MISSION.md and break it into discrete deliverable elements. Each element should be a specific capability or feature that the mission requires.
+
+ ### Step 2: Trace Each Element
+ For each mission element, search the codebase:
+
+ 1. **Implementation** (`src/lib/`): Is there actual code that implements this? Look for functions, classes, or modules that provide the capability. Read the code to verify it's substantive, not just a stub.
+
+ 2. **Unit Tests** (`tests/`): Are there test files that import from `src/lib/` and test this element? Read the tests to verify they make meaningful assertions — not just `expect(true).toBe(true)`.
+
+ 3. **Behaviour Tests** (`tests/behaviour/` or Playwright tests): Are there end-to-end tests that exercise this element? Check that they interact with the actual feature, not just load a page.
+
+ 4. **Website Usage** (`src/web/`, `docs/`): Does the website actually use this feature? Look for imports from `src/lib/` or API calls that surface the feature to users.
+
+ 5. **Integration Path**: How does the website access the library? Direct import, script tag, API endpoint? Document the actual mechanism.
+
+ 6. **Behaviour Coverage**: Do the behaviour tests verify that the website presents this specific feature? Check that Playwright tests assert on feature-specific content, not just generic page structure.
+
+ ### Step 3: Identify Misleading Metrics
+ Look for patterns that could give false confidence:
+ - Recently closed issues that have no associated commits or PRs
+ - Test files that exist but don't test the claimed feature
+ - Documentation that claims completion without corresponding code
+ - Issues closed with "not planned" that might have been legitimate work items
+
+ ### Step 4: Report
+ Call `report_implementation_review` with:
+ - **elements**: Each mission element with its trace results
+ - **gaps**: Specific missing pieces with severity ratings
+ - **advice**: One English sentence summarising completeness
+ - **misleadingMetrics**: Any metrics that don't reflect reality
+
+ ## Severity Guide
+
+ - **critical**: Mission element is not implemented at all, or a core feature has no tests
+ - **moderate**: Feature exists but lacks test coverage, or website doesn't expose it
+ - **low**: Minor coverage gaps, documentation issues, or cosmetic concerns
+
+ ## Output
+
+ You MUST call `report_implementation_review` exactly once with your complete findings.
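To make the report shape concrete, a `report_implementation_review` payload might look like the following sketch (element names, descriptions, and values are invented for illustration; only the field names come from the instructions above):

```javascript
// Illustrative payload for report_implementation_review (values invented).
const review = {
  elements: [
    { name: "CSV export", implemented: true, unitTested: true, behaviourTested: false, websiteUsed: true },
  ],
  gaps: [
    { element: "CSV export", gapType: "no-behaviour-test", description: "No end-to-end test exercises the export flow", severity: "moderate" },
  ],
  advice: "Core features are implemented and unit tested, but behaviour coverage is incomplete.",
  misleadingMetrics: [],
};
// Downstream consumers typically care about the critical subset.
const criticalCount = review.gaps.filter((g) => g.severity === "critical").length;
```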
@@ -18,7 +18,17 @@ Look at which metrics are NOT MET — these tell you what gaps remain:
  5. Source TODO count > 0 → create an issue to resolve TODOs
  6. Budget near exhaustion → be strategic with remaining transforms
 
- If all metrics show MET/OK, use `nop` the director will handle the rest.
+ 7. Implementation review gaps create issues with label `implementation-gap` for critical gaps
+
+ If all metrics show MET/OK and no implementation review gaps exist, use `nop` — the director will handle the rest.
+
+ ### Implementation Review
+
+ If an **Implementation Review** section is present in the prompt, examine it carefully. The review traces each mission element through source code, tests, website, and behaviour tests. It provides ground-truth evidence of what is actually implemented — not just what metrics suggest.
+
+ - **Critical gaps** should result in creating a focused issue (label: `implementation-gap`) that describes exactly what is missing
+ - **Moderate gaps** should be noted but may not need immediate action
+ - **Misleading metrics** should inform your decision-making — don't take actions based on metrics the review has flagged as unreliable
 
  ## Priority Order
 
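The critical-gap rule in this hunk can be sketched as a small mapping from review gaps to issue drafts. This is a hypothetical helper, not package code; only the `implementation-gap` label and the gap fields come from the instructions:

```javascript
// Hypothetical sketch: turn critical review gaps into issue drafts
// carrying the `implementation-gap` label, as the supervisor rules describe.
function draftGapIssues(gaps) {
  return gaps
    .filter((g) => g.severity === "critical")
    .map((g) => ({
      title: `Implementation gap: ${g.element} (${g.gapType})`,
      body: g.description,
      labels: ["implementation-gap"],
    }));
}
```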
@@ -708,11 +708,79 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
+
+ # ─── Implementation Review: traces mission elements through code/tests/website ──
+ implementation-review:
+ needs: [params]
+ if: |
+ !cancelled() &&
+ (needs.params.outputs.mode == 'full' || needs.params.outputs.mode == 'maintain-only') &&
+ needs.params.result == 'success'
+ runs-on: ubuntu-latest
+ outputs:
+ review-advice: ${{ steps.review.outputs.completeness-advice }}
+ review-gaps: ${{ steps.review.outputs.gaps }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ fetch-depth: 0
+ ref: ${{ inputs.ref || github.sha }}
+
+ - name: Fetch log and agent logs from log branch
+ env:
+ LOG_FILE: ${{ needs.params.outputs.log-file }}
+ LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
+ SCREENSHOT_FILE: ${{ needs.params.outputs.screenshot-file }}
+ run: |
+ for f in "${LOG_FILE}" "intention.md" "${SCREENSHOT_FILE}"; do
+ git show "origin/${LOG_BRANCH}:${f}" > "$f" 2>/dev/null || true
+ done
+ mkdir -p .agent-logs
+ git fetch origin "${LOG_BRANCH}" 2>/dev/null || true
+ for f in $(git ls-tree --name-only "origin/${LOG_BRANCH}" 2>/dev/null | grep '^agent-log-' || true); do
+ git show "origin/${LOG_BRANCH}:${f}" > ".agent-logs/${f}" 2>/dev/null || true
+ done
+ echo "Fetched $(ls .agent-logs/agent-log-*.md 2>/dev/null | wc -l | tr -d ' ') agent log files"
+
+ - uses: actions/setup-node@v6
+ with:
+ node-version: "24"
+
+ - name: Self-init (agentic-lib dev only)
+ if: hashFiles('scripts/self-init.sh') != '' && hashFiles('.github/agentic-lib/actions/agentic-step/package.json') == ''
+ run: bash scripts/self-init.sh
+
+ - name: Install agentic-step dependencies
+ working-directory: .github/agentic-lib/actions/agentic-step
+ run: |
+ npm ci
+ if [ -d "../../copilot" ]; then
+ ln -sf "$(pwd)/node_modules" ../../copilot/node_modules
+ fi
+
+ - name: Run implementation review
+ id: review
+ if: github.repository != 'xn-intenton-z2a/agentic-lib'
+ uses: ./.github/agentic-lib/actions/agentic-step
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
+ with:
+ task: "implementation-review"
+ config: ${{ needs.params.outputs.config-path }}
+ instructions: ".github/agents/agent-implementation-review.md"
+ model: ${{ needs.params.outputs.model }}
+
+ - name: Push log to log branch
+ if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
+ env:
+ LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  # ─── Director: LLM evaluates mission status (complete/failed/in-progress) ──
  director:
- needs: [params, telemetry, maintain]
+ needs: [params, telemetry, maintain, implementation-review]
  if: |
  !cancelled() &&
  (needs.params.outputs.mode == 'full' || needs.params.outputs.mode == 'dev-only') &&
@@ -760,6 +828,8 @@ jobs:
  env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
+ REVIEW_ADVICE: ${{ needs.implementation-review.outputs.review-advice }}
+ REVIEW_GAPS: ${{ needs.implementation-review.outputs.review-gaps }}
  with:
  task: "direct"
  config: ${{ needs.params.outputs.config-path }}
@@ -770,11 +840,11 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  # ─── Supervisor: LLM decides what to do (after director evaluates) ──
  supervisor:
- needs: [params, pr-cleanup, telemetry, maintain, director]
+ needs: [params, pr-cleanup, telemetry, maintain, implementation-review, director]
  if: |
  !cancelled() &&
  (needs.params.outputs.mode == 'full' || needs.params.outputs.mode == 'dev-only') &&
@@ -821,6 +891,8 @@ jobs:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
  INPUT_MESSAGE: ${{ needs.params.outputs.message }}
+ REVIEW_ADVICE: ${{ needs.implementation-review.outputs.review-advice }}
+ REVIEW_GAPS: ${{ needs.implementation-review.outputs.review-gaps }}
  with:
  task: "supervise"
  config: ${{ needs.params.outputs.config-path }}
@@ -831,7 +903,7 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  # ─── Fix stuck PRs with failing checks ─────────────────────────────
  fix-stuck:
@@ -1403,7 +1475,7 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  - name: Create PR and attempt merge
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && steps.issue.outputs.issue-number != '' && needs.params.outputs.dry-run != 'true' && steps.pre-commit-test.outputs.tests-passed == 'true' && steps.pre-commit-behaviour-test.outputs.tests-passed != 'false'
@@ -14,7 +14,7 @@
  // npx @xn-intenton-z2a/agentic-lib maintain-library
  // npx @xn-intenton-z2a/agentic-lib fix-code
 
- import { copyFileSync, existsSync, mkdirSync, rmSync, rmdirSync, readdirSync, readFileSync, writeFileSync } from "fs";
+ import { copyFileSync, existsSync, mkdirSync, rmSync, rmdirSync, readdirSync, readFileSync, writeFileSync, unlinkSync } from "fs";
  import { applyDistTransform } from "../src/dist-transform.js";
  import { resolve, dirname, join } from "path";
  import { fileURLToPath } from "url";
@@ -785,6 +785,19 @@ function initPurge(seedsDir, missionName, initTimestamp) {
  initTransformFile(tomlSource, resolve(target, "agentic-lib.toml"), "SEED: agentic-lib.toml (transformed)");
  }
 
+ // Clear agent log files (written by implementation-review and other tasks)
+ try {
+ const agentLogs = readdirSync(target).filter((f) => f.startsWith("agent-log-") && f.endsWith(".md"));
+ for (const f of agentLogs) {
+ console.log(` DELETE: ${f} (agent log)`);
+ if (!dryRun) {
+ unlinkSync(resolve(target, f));
+ }
+ initChanges++;
+ }
+ if (agentLogs.length > 0) console.log(` Cleared ${agentLogs.length} agent log file(s)`);
+ } catch { /* ignore — directory may not have agent logs */ }
+
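The purge logic added above keys on the `agent-log-` prefix and the `.md` suffix. In isolation, the filename filter behaves like this (sample filenames invented):

```javascript
// Same filename predicate as the initPurge change above, shown standalone.
const isAgentLog = (f) => f.startsWith("agent-log-") && f.endsWith(".md");

// Example inputs: only the first entry matches both prefix and suffix.
const files = ["agent-log-2026-01-01T00-00-00-000Z.md", "MISSION.md", "agent-log.txt"];
const toDelete = files.filter(isAgentLog);
```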
 
  // Copy mission seed file as MISSION.md
  const missionsDir = resolve(seedsDir, "missions");
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@xn-intenton-z2a/agentic-lib",
- "version": "7.4.12",
+ "version": "7.4.13",
  "description": "Agentic-lib Agentic Coding Systems SDK powering automated GitHub workflows.",
  "type": "module",
  "scripts": {
@@ -8,7 +8,7 @@
  import * as core from "@actions/core";
  import * as github from "@actions/github";
  import { loadConfig, getWritablePaths } from "./config-loader.js";
- import { logActivity, generateClosingNotes } from "./logging.js";
+ import { logActivity, generateClosingNotes, writeAgentLog } from "./logging.js";
  import { readFileSync, existsSync } from "fs";
  import {
  buildMissionMetrics, buildMissionReadiness,
@@ -30,12 +30,14 @@ import { reviewIssue } from "./tasks/review-issue.js";
  import { discussions } from "./tasks/discussions.js";
  import { supervise } from "./tasks/supervise.js";
  import { direct } from "./tasks/direct.js";
+ import { implementationReview } from "./tasks/implementation-review.js";
 
  const TASKS = {
  "resolve-issue": resolveIssue, "fix-code": fixCode, "transform": transform,
  "maintain-features": maintainFeatures, "maintain-library": maintainLibrary,
  "enhance-issue": enhanceIssue, "review-issue": reviewIssue,
  "discussions": discussions, "supervise": supervise, "direct": direct,
+ "implementation-review": implementationReview,
  };
 
  async function run() {
@@ -138,6 +140,20 @@ async function run() {
  });
  }
 
+ // Write standalone agent log file (pushed to agentic-lib-logs branch by workflow)
+ try {
+ const agentLogFile = writeAgentLog({
+ task, outcome: result.outcome || "completed",
+ model: result.model || model, durationMs, tokensUsed: result.tokensUsed,
+ narrative: result.narrative, contextNotes: result.contextNotes,
+ reviewTable: result.reviewTable, completenessAdvice: result.completenessAdvice,
+ missionMetrics,
+ });
+ core.info(`Agent log written: ${agentLogFile}`);
+ } catch (err) {
+ core.warning(`Could not write agent log: ${err.message}`);
+ }
+
  core.info(`agentic-step completed: outcome=${result.outcome}`);
  } catch (error) {
  core.setFailed(`agentic-step failed: ${error.message}`);
@@ -5,7 +5,7 @@
  // Appends structured entries to the intentïon.md activity log,
  // including commit URLs and safety-check outcomes.
 
- import { writeFileSync, readFileSync, appendFileSync, existsSync, mkdirSync, copyFileSync } from "fs";
+ import { writeFileSync, readFileSync, appendFileSync, existsSync, mkdirSync, copyFileSync, readdirSync } from "fs";
  import { dirname, basename } from "path";
  import { join } from "path";
  import * as core from "@actions/core";
@@ -172,6 +172,87 @@ export function logActivity({
  }
  }
 
+ /**
+ * Write a standalone agent log file for a single task execution.
+ * Each file is uniquely named with a filesystem-safe datetime stamp.
+ *
+ * @param {Object} options
+ * @param {string} options.task - The task name
+ * @param {string} options.outcome - The task outcome
+ * @param {string} [options.model] - Model used
+ * @param {number} [options.durationMs] - Task duration in milliseconds
+ * @param {string} [options.narrative] - LLM-generated narrative
+ * @param {Array} [options.reviewTable] - Implementation review table rows
+ * @param {string} [options.completenessAdvice] - English completeness assessment
+ * @param {string} [options.contextNotes] - Additional context notes
+ * @param {Array} [options.missionMetrics] - Mission metrics entries
+ * @param {number} [options.tokensUsed] - Total tokens consumed
+ * @returns {string} The filename of the written log file
+ */
+ export function writeAgentLog({
+ task, outcome, model, durationMs, narrative,
+ reviewTable, completenessAdvice, contextNotes,
+ missionMetrics, tokensUsed,
+ }) {
+ const now = new Date();
+ const stamp = now.toISOString().replace(/:/g, "-").replace(/\./g, "-");
+ const filename = `agent-log-${stamp}.md`;
+
+ const parts = [
+ `# Agent Log: ${task} at ${now.toISOString()}`,
+ "",
+ "## Summary",
+ `**Task:** ${task}`,
+ `**Outcome:** ${outcome}`,
+ ];
+
+ if (model) parts.push(`**Model:** ${model}`);
+ if (tokensUsed) parts.push(`**Tokens:** ${tokensUsed}`);
+ if (durationMs) {
+ const secs = Math.round(durationMs / 1000);
+ parts.push(`**Duration:** ${secs}s`);
+ }
+
+ if (reviewTable && reviewTable.length > 0) {
+ parts.push("", "## Implementation Review");
+ parts.push("| Element | Implemented | Unit Tested | Behaviour Tested | Website Used | Notes |");
+ parts.push("|---------|-------------|-------------|------------------|--------------|-------|");
+ for (const row of reviewTable) {
+ parts.push(`| ${row.element || ""} | ${row.implemented || ""} | ${row.unitTested || ""} | ${row.behaviourTested || ""} | ${row.websiteUsed || ""} | ${row.notes || ""} |`);
+ }
+ }
+
+ if (completenessAdvice) {
+ parts.push("", "## Completeness Assessment");
+ parts.push(completenessAdvice);
+ }
+
+ if (missionMetrics && missionMetrics.length > 0) {
+ parts.push("", "## Mission Metrics");
+ parts.push("| Metric | Value | Target | Status |");
+ parts.push("|--------|-------|--------|--------|");
+ for (const m of missionMetrics) {
+ parts.push(`| ${m.metric} | ${m.value} | ${m.target} | ${m.status} |`);
+ }
+ }
+
+ if (narrative) {
+ parts.push("", "## Narrative");
+ parts.push(narrative);
+ }
+
+ if (contextNotes) {
+ parts.push("", "## Context Notes");
+ parts.push(contextNotes);
+ }
+
+ parts.push("", "---");
+ parts.push(`Generated by agentic-step ${task} at ${now.toISOString()}`);
+
+ writeFileSync(filename, parts.join("\n"));
+ return filename;
+ }
+
  /**
  * Generate closing notes from limits status, flagging limits at or approaching capacity.
  *
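The filename stamping in `writeAgentLog` replaces the characters in the ISO timestamp that are unsafe in filenames on some systems. The transform on its own:

```javascript
// Same stamp derivation as writeAgentLog: colons and dots in the
// ISO timestamp become hyphens so the name is filesystem-safe.
const now = new Date("2026-02-03T04:05:06.789Z");
const stamp = now.toISOString().replace(/:/g, "-").replace(/\./g, "-");
const filename = `agent-log-${stamp}.md`;
// filename → "agent-log-2026-02-03T04-05-06-789Z.md"
```

Note the `.md` extension is appended after the substitution, so the only dot left in the name is the one before the extension.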
@@ -71,6 +71,14 @@ function buildMetricAssessment(ctx, config) {
  const minTests = thresholds.minDedicatedTests ?? 1;
  const maxTodos = thresholds.maxSourceTodos ?? 0;
 
+ // Implementation review gaps (passed from workflow via env)
+ let reviewGaps = [];
+ try {
+ const gapsJson = process.env.REVIEW_GAPS;
+ if (gapsJson) reviewGaps = JSON.parse(gapsJson);
+ } catch { /* ignore parse errors */ }
+ const criticalGaps = reviewGaps.filter((g) => g.severity === "critical");
+
  const metrics = [
  { metric: "Open issues", value: ctx.issuesSummary.length, target: 0, met: ctx.issuesSummary.length === 0 },
  { metric: "Open PRs", value: ctx.prsSummary.length, target: 0, met: ctx.prsSummary.length === 0 },
@@ -78,6 +86,7 @@ function buildMetricAssessment(ctx, config) {
  { metric: "Dedicated tests", value: ctx.dedicatedTestCount, target: minTests, met: ctx.dedicatedTestCount >= minTests },
  { metric: "Source TODOs", value: ctx.sourceTodoCount, target: maxTodos, met: ctx.sourceTodoCount <= maxTodos },
  { metric: "Budget", value: ctx.cumulativeTransformationCost, target: ctx.transformationBudget || "unlimited", met: !(ctx.transformationBudget > 0 && ctx.cumulativeTransformationCost >= ctx.transformationBudget) },
+ { metric: "Implementation review", value: criticalGaps.length === 0 ? "No critical gaps" : `${criticalGaps.length} critical gap(s)`, target: "No critical gaps", met: criticalGaps.length === 0 },
  ];
 
  const allMet = metrics.every((m) => m.met);
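The defensive parse of `REVIEW_GAPS` added in this hunk tolerates a missing or malformed value by treating it as "no gaps". The same pattern can be exercised standalone (here with a local argument standing in for the environment variable):

```javascript
// Same parse-and-filter pattern as buildMetricAssessment, with the
// process.env lookup replaced by a parameter for illustration.
function parseCriticalGaps(gapsJson) {
  let reviewGaps = [];
  try {
    if (gapsJson) reviewGaps = JSON.parse(gapsJson);
  } catch { /* ignore parse errors — treat as no gaps */ }
  return reviewGaps.filter((g) => g.severity === "critical");
}
```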
@@ -124,10 +133,29 @@ function buildPrompt(ctx, agentInstructions, metricAssessment) {
  `Source TODOs: ${ctx.sourceTodoCount}`,
  `Transformation budget: ${ctx.cumulativeTransformationCost}/${ctx.transformationBudget || "unlimited"}`,
  "",
+ ...(process.env.REVIEW_ADVICE ? [
+ "## Implementation Review",
+ `**Completeness:** ${process.env.REVIEW_ADVICE}`,
+ ...((() => {
+ try {
+ const gaps = JSON.parse(process.env.REVIEW_GAPS || "[]");
+ if (gaps.length > 0) {
+ return [
+ "",
+ "### Gaps Found",
+ ...gaps.map((g) => `- [${g.severity}] ${g.element}: ${g.description} (${g.gapType})`),
+ ];
+ }
+ } catch { /* ignore */ }
+ return [];
+ })()),
+ "",
+ ] : []),
  "## Your Task",
  "Use list_issues and list_prs to review open work items.",
  "Use read_file to inspect source code and tests for completeness.",
  "Use git_diff or git_status for additional context if needed.",
+ "Consider the implementation review findings — if critical gaps exist, do NOT declare mission-complete.",
  "Then call report_director_decision with your determination.",
  "",
  "**You MUST call report_director_decision exactly once.**",
@@ -0,0 +1,232 @@
1
+ // SPDX-License-Identifier: GPL-3.0-only
2
+ // Copyright (C) 2025-2026 Polycode Limited
3
+ // tasks/implementation-review.js — Trace mission elements through code, tests, website
4
+ //
5
+ // Uses runCopilotSession with read-only tools to decompose the mission and
6
+ // verify each element is implemented, tested, and presented on the website.
7
+ // Produces a structured review with test-result table and completeness advice.
8
+
9
+ import * as core from "@actions/core";
10
+ import { existsSync, readdirSync } from "fs";
11
+ import { readOptionalFile, extractNarrative, NARRATIVE_INSTRUCTION } from "../copilot.js";
12
+ import { runCopilotSession } from "../../../copilot/copilot-session.js";
13
+ import { createGitHubTools, createGitTools } from "../../../copilot/github-tools.js";
14
+
15
+ function buildReviewPrompt(mission, config, agentInstructions, agentLogsSummary) {
16
+ const sourcePath = config.paths?.source?.path || "src/lib/";
17
+ const testsPath = config.paths?.tests?.path || "tests/";
18
+ const webPath = config.paths?.web?.path || "src/web/";
19
+ const behaviourPath = config.paths?.behaviour?.path || "tests/behaviour/";
20
+ const featuresPath = config.paths?.features?.path || "features/";
21
+
22
+ return [
23
+ "## Instructions",
24
+ agentInstructions,
25
+ "",
26
+ "## Mission",
27
+ mission || "(no mission defined)",
28
+ "",
29
+ "## Repository Paths",
30
+ `- Source: \`${sourcePath}\``,
31
+ `- Tests: \`${testsPath}\``,
32
+ `- Web: \`${webPath}\``,
33
+ `- Behaviour tests: \`${behaviourPath}\``,
34
+ `- Features: \`${featuresPath}\``,
35
+ "",
36
+ ...(agentLogsSummary ? ["## Previous Reviews", agentLogsSummary, ""] : []),
37
+ "## Your Task",
38
+ "1. Read MISSION.md and decompose it into discrete deliverable elements.",
39
+ "2. For each element, use list_files and read_file to trace it through:",
40
+ " - Source implementation in src/lib/",
41
+ " - Unit tests in tests/",
42
+ " - Behaviour tests (Playwright)",
43
+ " - Website usage in src/web/ or docs/",
44
+ " - Integration path (how website accesses the library)",
45
+ " - Behaviour test coverage of the website feature",
46
+ "3. Flag misleading patterns:",
47
+ " - Issues closed without corresponding code changes",
48
+ " - Tests that don't assert anything meaningful (empty/trivial)",
49
+ " - Features listed as done in docs but missing from code",
50
+ " - PRs merged without test coverage for the claimed feature",
51
+ "4. Call report_implementation_review with your findings.",
52
+ "",
53
+ "**You MUST call report_implementation_review exactly once.**",
54
+ ].join("\n");
55
+ }
56
+
57
+ /**
58
+ * Implementation review task: decompose mission, trace through code/tests/website.
59
+ *
60
+ * @param {Object} context - Task context from index.js
61
+ * @returns {Promise<Object>} Result with outcome, review data, tokensUsed, model
62
+ */
63
+ export async function implementationReview(context) {
64
+ const { config, instructions, model, octokit, repo, logFilePath, screenshotFilePath } = context;
65
+ const t = config.tuning || {};
66
+
67
+ const mission = readOptionalFile(config.paths.mission.path);
68
+ if (!mission) {
69
+ return { outcome: "nop", details: "No mission defined — skipping implementation review" };
70
+ }
71
+
72
+ if (existsSync("MISSION_COMPLETE.md") && config.supervisor !== "maintenance") {
73
+ return { outcome: "nop", details: "Mission already complete" };
74
+ }
75
+ if (existsSync("MISSION_FAILED.md")) {
76
+ return { outcome: "nop", details: "Mission already failed" };
77
+ }
78
+
79
+ // Check for previous agent logs
80
+ const agentLogsDir = ".agent-logs";
81
+ let agentLogsSummary = "";
82
+ if (existsSync(agentLogsDir)) {
83
+ try {
84
+ const files = readdirSync(agentLogsDir).filter((f) => f.startsWith("agent-log-") && f.endsWith(".md"));
85
+ if (files.length > 0) {
86
+ agentLogsSummary = `${files.length} previous agent log file(s) available. Use list_files and read_file on .agent-logs/ to review them.`;
87
+ }
88
+ } catch { /* ignore */ }
89
+ }
90
+
91
+ const agentInstructions = instructions || "Review the implementation completeness of the mission.";
92
+ const prompt = buildReviewPrompt(mission, config, agentInstructions, agentLogsSummary);
93
+
94
+ const systemPrompt =
95
+ "You are an implementation review agent for an autonomous coding repository. " +
96
+ "Your job is to trace each mission element through the codebase — verifying that it is " +
97
+ "implemented in source code, covered by unit tests, exercised by behaviour tests, " +
98
+ "presented on the website, and that the behaviour tests verify the website presentation. " +
99
+ "Focus on ground-truth evidence, not metrics. Metrics can be misleading — issues closed " +
100
+ "in error, trivial tests, or features marked done without code all create false confidence." +
101
+ NARRATIVE_INSTRUCTION;
102
+
103
+ // Shared mutable state to capture the review
104
+ const reviewResult = { elements: [], gaps: [], advice: "", misleadingMetrics: [] };
105
+
106
+ const createTools = (defineTool, _wp, logger) => {
107
+ const ghTools = createGitHubTools(octokit, repo, defineTool, logger);
108
+ const gitTools = createGitTools(defineTool, logger);
109
+
110
+ const reportReview = defineTool("report_implementation_review", {
111
+ description: "Report the implementation review findings. Call this exactly once with all traced elements, identified gaps, and completeness advice.",
112
+ parameters: {
113
+ type: "object",
114
+ properties: {
115
+ elements: {
116
+ type: "array",
117
+ items: {
118
+ type: "object",
119
+ properties: {
120
+ name: { type: "string", description: "Mission element name" },
121
+ implemented: { type: "boolean", description: "Found in source code" },
122
+ unitTested: { type: "boolean", description: "Has unit test coverage" },
123
+ behaviourTested: { type: "boolean", description: "Has behaviour/Playwright test coverage" },
124
+ websiteUsed: { type: "boolean", description: "Used on the website" },
125
+ integrationPath: { type: "string", description: "How the website accesses this feature" },
126
+ behaviourCoverage: { type: "string", description: "How behaviour tests verify website presentation" },
127
+ notes: { type: "string", description: "Additional observations" },
128
+ },
129
+ required: ["name", "implemented"],
130
+ },
131
+ description: "Mission elements traced through the codebase",
132
+ },
133
+ gaps: {
134
+ type: "array",
135
+ items: {
136
+ type: "object",
137
+ properties: {
138
+ element: { type: "string", description: "Which mission element has the gap" },
139
+ gapType: {
140
+ type: "string",
141
+ enum: ["not-implemented", "not-tested", "not-on-website", "no-behaviour-test", "misleading-metric"],
142
+ description: "Type of gap",
143
+ },
144
+ description: { type: "string", description: "What is missing" },
145
+ severity: {
146
+ type: "string",
147
+ enum: ["critical", "moderate", "low"],
148
+ description: "How important this gap is",
149
+ },
150
+ },
151
+ required: ["element", "gapType", "description", "severity"],
152
+ },
153
+ description: "Identified implementation gaps",
154
+ },
155
+ advice: { type: "string", description: "Single English sentence summarising overall completeness" },
156
+ misleadingMetrics: {
157
+ type: "array",
158
+ items: {
159
+ type: "object",
160
+ properties: {
161
+ metric: { type: "string", description: "Which metric is misleading" },
162
+ reason: { type: "string", description: "Why it is misleading" },
163
+ evidence: { type: "string", description: "What evidence supports this" },
164
+ },
165
+ required: ["metric", "reason"],
166
+ },
167
+ description: "Metrics that may be misleading about actual progress",
168
+ },
169
+ },
170
+ required: ["elements", "gaps", "advice"],
171
+ },
172
+ handler: async ({ elements, gaps, advice, misleadingMetrics }) => {
173
+ reviewResult.elements = elements || [];
174
+ reviewResult.gaps = gaps || [];
175
+ reviewResult.advice = advice || "";
176
+ reviewResult.misleadingMetrics = misleadingMetrics || [];
177
+ return { textResultForLlm: `Review recorded: ${elements?.length || 0} elements traced, ${gaps?.length || 0} gaps found` };
178
+ },
179
+ });
180
+
+    return [...ghTools, ...gitTools, reportReview];
+  };
+
+  const attachments = [];
+  if (logFilePath) attachments.push({ type: "file", path: logFilePath });
+  if (screenshotFilePath) attachments.push({ type: "file", path: screenshotFilePath });
+
+  const result = await runCopilotSession({
+    workspacePath: process.cwd(),
+    model,
+    tuning: t,
+    agentPrompt: systemPrompt,
+    userPrompt: prompt,
+    writablePaths: [],
+    createTools,
+    attachments,
+    excludedTools: ["write_file", "run_command", "run_tests", "dispatch_workflow", "close_issue", "label_issue", "post_discussion_comment", "create_issue", "comment_on_issue"],
+    logger: { info: core.info, warning: core.warning, error: core.error, debug: core.debug },
+  });
+
+  const tokensUsed = result.tokensIn + result.tokensOut;
+
+  // Build review table for logging
+  const reviewTable = reviewResult.elements.map((e) => ({
+    element: e.name,
+    implemented: e.implemented ? "YES" : "NO",
+    unitTested: e.unitTested ? "YES" : "NO",
+    behaviourTested: e.behaviourTested ? "YES" : "NO",
+    websiteUsed: e.websiteUsed ? "YES" : "NO",
+    notes: e.notes || "",
+  }));
+
+  // Set outputs for downstream jobs
+  core.setOutput("completeness-advice", (reviewResult.advice || "").substring(0, 500));
+  core.setOutput("gaps", JSON.stringify((reviewResult.gaps || []).slice(0, 20)));
+  core.setOutput("review-table", JSON.stringify(reviewTable));
+
+  return {
+    outcome: "implementation-reviewed",
+    tokensUsed,
+    inputTokens: result.tokensIn,
+    outputTokens: result.tokensOut,
+    cost: 0,
+    model,
+    narrative: result.narrative || reviewResult.advice,
+    reviewTable,
+    reviewGaps: reviewResult.gaps,
+    completenessAdvice: reviewResult.advice,
+    misleadingMetrics: reviewResult.misleadingMetrics,
+    details: `Traced ${reviewResult.elements.length} element(s), found ${reviewResult.gaps.length} gap(s)`,
+  };
+}
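The `gaps` output is a JSON string (truncated to 20 entries), so a downstream consumer has to parse it before filtering. A minimal sketch of that consumer, using invented sample data:

```javascript
// Sketch of a downstream job consuming the "gaps" output; the gap data is invented.
const gapsOutput = JSON.stringify([
  { element: "CSV export", gapType: "not-tested", description: "No unit tests", severity: "critical" },
  { element: "Login", gapType: "not-on-website", description: "Not linked from the site", severity: "low" },
]);

const gaps = JSON.parse(gapsOutput || "[]");
const criticalGaps = gaps.filter((g) => g.severity === "critical");
console.log(criticalGaps.length); // → 1
```

The `|| "[]"` fallback mirrors the defensive parsing used elsewhere in this diff, so a missing output yields an empty list instead of a parse error.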
@@ -474,6 +474,26 @@ function buildPrompt(ctx, agentInstructions, config) {
     `### Recent Activity`,
     ctx.recentActivity || "none",
     "",
+    ...(process.env.REVIEW_ADVICE ? [
+      "### Implementation Review",
+      `**Completeness:** ${process.env.REVIEW_ADVICE}`,
+      ...((() => {
+        try {
+          const gaps = JSON.parse(process.env.REVIEW_GAPS || "[]");
+          if (gaps.length > 0) {
+            return [
+              "",
+              "**Gaps Found:**",
+              ...gaps.map((g) => `- [${g.severity}] ${g.element}: ${g.description} (${g.gapType})`),
+              "",
+              "Consider creating issues with label 'implementation-gap' for critical gaps.",
+            ];
+          }
+        } catch { /* ignore */ }
+        return [];
+      })()),
+      "",
+    ] : []),
     "## Your Task",
     "Use list_issues, list_prs, read_file, and search_discussions to explore the repository state.",
     "Then call report_supervisor_plan with your chosen actions and reasoning.",
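The Implementation Review block above only renders when `REVIEW_ADVICE` is set. A standalone re-creation of its gap-formatting logic, run against invented sample values, shows what the supervisor actually sees in its prompt:

```javascript
// Re-creates the gap-formatting logic from the prompt block with invented sample data.
process.env.REVIEW_ADVICE = "One element lacks behaviour tests.";
process.env.REVIEW_GAPS = JSON.stringify([
  { severity: "critical", element: "CSV export", description: "No behaviour coverage", gapType: "no-behaviour-test" },
]);

const gaps = JSON.parse(process.env.REVIEW_GAPS || "[]");
const lines = [
  "### Implementation Review",
  `**Completeness:** ${process.env.REVIEW_ADVICE}`,
  ...(gaps.length > 0
    ? ["", "**Gaps Found:**", ...gaps.map((g) => `- [${g.severity}] ${g.element}: ${g.description} (${g.gapType})`)]
    : []),
];
console.log(lines.join("\n"));
```

Each gap becomes one bullet of the form `- [severity] element: description (gapType)`; malformed `REVIEW_GAPS` JSON is swallowed by the `try/catch` in the real code, leaving only the completeness line.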
@@ -147,6 +147,35 @@ export function readCumulativeCost(intentionFilepath) {
   return [...costMatches].reduce((sum, m) => sum + parseInt(m[1], 10), 0);
 }
 
+/**
+ * Gather and parse all agent-log-*.md files from a directory.
+ * Returns structured data from each log file for use in prompts and metrics.
+ *
+ * @param {string} logsDir - Directory containing agent-log-*.md files
+ * @returns {Array} Parsed log entries: { filename, task, outcome, advice, content }
+ */
+export function gatherAgentLogs(logsDir) {
+  if (!logsDir || !existsSync(logsDir)) return [];
+  try {
+    const files = readdirSync(logsDir)
+      .filter((f) => f.startsWith("agent-log-") && f.endsWith(".md"))
+      .sort();
+    return files.map((f) => {
+      const content = readFileSync(join(logsDir, f), "utf8");
+      const taskMatch = content.match(/\*\*Task:\*\* (.+)/);
+      const outcomeMatch = content.match(/\*\*Outcome:\*\* (.+)/);
+      const adviceMatch = content.match(/## Completeness Assessment\n([\s\S]*?)(?=\n##|\n---)/);
+      return {
+        filename: f,
+        task: taskMatch ? taskMatch[1].trim() : "unknown",
+        outcome: outcomeMatch ? outcomeMatch[1].trim() : "unknown",
+        advice: adviceMatch ? adviceMatch[1].trim() : "",
+        content,
+      };
+    });
+  } catch { return []; }
+}
+
 /**
  * Build limits status array for activity logging.
  *
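The extraction inside `gatherAgentLogs` is regex-driven, so its behaviour is easiest to see on a sample log body. This sketch applies the same three regular expressions to a hypothetical `agent-log-*.md` content string (the log text is invented):

```javascript
// Applies the same regular expressions gatherAgentLogs uses to a hypothetical log body.
const content = [
  "**Task:** Implement CSV export",
  "**Outcome:** completed",
  "",
  "## Completeness Assessment",
  "Export works but lacks behaviour tests.",
  "",
  "## Next Steps",
].join("\n");

const taskMatch = content.match(/\*\*Task:\*\* (.+)/);
const outcomeMatch = content.match(/\*\*Outcome:\*\* (.+)/);
const adviceMatch = content.match(/## Completeness Assessment\n([\s\S]*?)(?=\n##|\n---)/);

console.log(taskMatch ? taskMatch[1].trim() : "unknown"); // → "Implement CSV export"
console.log(outcomeMatch ? outcomeMatch[1].trim() : "unknown"); // → "completed"
console.log(adviceMatch ? adviceMatch[1].trim() : ""); // → "Export works but lacks behaviour tests."
```

Note that the advice capture is lazy and stops at the next `\n##` heading or `\n---` rule, so a Completeness Assessment section that is the last thing in the file (with neither terminator) would not match and falls back to `""`.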
@@ -17,7 +17,7 @@
   "author": "",
   "license": "MIT",
   "dependencies": {
-    "@xn-intenton-z2a/agentic-lib": "^7.4.12"
+    "@xn-intenton-z2a/agentic-lib": "^7.4.13"
   },
   "devDependencies": {
     "@playwright/test": "^1.58.0",