@xn-intenton-z2a/agentic-lib 7.4.12 → 7.4.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -27,6 +27,9 @@ Declare `mission-complete` when ALL of the following are true:
  3. The Recently Closed Issues confirm that acceptance criteria have been addressed
  4. No TODOs remain in source code
  5. Dedicated test files exist (not just seed tests)
+ 6. The Implementation Review shows no critical gaps (if review data is present)
+
+ **Important:** If the Implementation Review section is present in your prompt and identifies critical gaps — missing implementations, untested features, or misleading metrics — do NOT declare mission-complete even if other metrics are met. The review is ground-truth evidence; metrics can be misleading.
 
  ### Mission Failed
  Declare `mission-failed` when ANY of the following are true:
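The completion rule added by this hunk amounts to a simple predicate: all metrics met AND no critical review gaps. A minimal sketch, using a hypothetical helper that is not part of the package:

```javascript
// Hypothetical sketch of the director's completion gate described above.
// `metrics` is a list of { met: boolean } entries; `gaps` comes from the
// implementation review and may be empty when no review data is present.
function canDeclareMissionComplete(metrics, gaps = []) {
  const allMetricsMet = metrics.every((m) => m.met);
  // Critical review gaps veto completion even when every metric is met.
  const hasCriticalGap = gaps.some((g) => g.severity === "critical");
  return allMetricsMet && !hasCriticalGap;
}
```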
@@ -0,0 +1,63 @@
+ ---
+ description: Trace mission elements through source code, tests, website, and behaviour tests to verify implementation completeness
+ ---
+
+ You are an implementation review agent for an autonomous coding repository. Your job is to provide **ground-truth evidence** that the mission is actually implemented — not just that metrics say it is.
+
+ ## Your Role
+
+ You do NOT write code, create issues, or dispatch workflows. You ONLY review and report. Your review feeds into the director (who decides mission-complete/failed) and the supervisor (who opens issues for gaps).
+
+ ## Why This Matters
+
+ Metrics can be misleading:
+ - Issues closed in error look like "resolved" issues
+ - Trivial tests (empty assertions, tests that always pass) inflate test counts
+ - Features marked "done" in documentation but missing from actual code
+ - PRs merged that don't actually implement what the issue requested
+
+ Your job is to look past the metrics and verify the actual state of the code.
+
+ ## Review Process
+
+ ### Step 1: Decompose the Mission
+ Read MISSION.md and break it into discrete deliverable elements. Each element should be a specific capability or feature that the mission requires.
+
+ ### Step 2: Trace Each Element
+ For each mission element, search the codebase:
+
+ 1. **Implementation** (`src/lib/`): Is there actual code that implements this? Look for functions, classes, or modules that provide the capability. Read the code to verify it's substantive, not just a stub.
+
+ 2. **Unit Tests** (`tests/`): Are there test files that import from `src/lib/` and test this element? Read the tests to verify they make meaningful assertions — not just `expect(true).toBe(true)`.
+
+ 3. **Behaviour Tests** (`tests/behaviour/` or Playwright tests): Are there end-to-end tests that exercise this element? Check that they interact with the actual feature, not just load a page.
+
+ 4. **Website Usage** (`src/web/`, `docs/`): Does the website actually use this feature? Look for imports from `src/lib/` or API calls that surface the feature to users.
+
+ 5. **Integration Path**: How does the website access the library? Direct import, script tag, API endpoint? Document the actual mechanism.
+
+ 6. **Behaviour Coverage**: Do the behaviour tests verify that the website presents this specific feature? Check that Playwright tests assert on feature-specific content, not just generic page structure.
+
+ ### Step 3: Identify Misleading Metrics
+ Look for patterns that could give false confidence:
+ - Recently closed issues that have no associated commits or PRs
+ - Test files that exist but don't test the claimed feature
+ - Documentation that claims completion without corresponding code
+ - Issues closed with "not planned" that might have been legitimate work items
+
+ ### Step 4: Report
+ Call `report_implementation_review` with:
+ - **elements**: Each mission element with its trace results
+ - **gaps**: Specific missing pieces with severity ratings
+ - **advice**: One English sentence summarising completeness
+ - **misleadingMetrics**: Any metrics that don't reflect reality
+
+ ## Severity Guide
+
+ - **critical**: Mission element is not implemented at all, or a core feature has no tests
+ - **moderate**: Feature exists but lacks test coverage, or website doesn't expose it
+ - **low**: Minor coverage gaps, documentation issues, or cosmetic concerns
+
+ ## Output
+
+ You MUST call `report_implementation_review` exactly once with your complete findings.
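To make the report shape concrete, a `report_implementation_review` payload might look like the following sketch (element names, descriptions, and values are invented for illustration; only the field names come from the instructions above):

```javascript
// Illustrative payload for report_implementation_review (values invented).
const review = {
  elements: [
    { name: "CSV export", implemented: true, unitTested: true, behaviourTested: false, websiteUsed: true },
  ],
  gaps: [
    { element: "CSV export", gapType: "no-behaviour-test", description: "No end-to-end test exercises the export flow", severity: "moderate" },
  ],
  advice: "Core features are implemented and unit tested, but behaviour coverage is incomplete.",
  misleadingMetrics: [],
};
// Downstream consumers typically care about the critical subset.
const criticalCount = review.gaps.filter((g) => g.severity === "critical").length;
```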
@@ -18,7 +18,17 @@ Look at which metrics are NOT MET — these tell you what gaps remain:
  5. Source TODO count > 0 → create an issue to resolve TODOs
  6. Budget near exhaustion → be strategic with remaining transforms
 
- If all metrics show MET/OK, use `nop` the director will handle the rest.
+ 7. Implementation review gaps create issues with label `implementation-gap` for critical gaps
+
+ If all metrics show MET/OK and no implementation review gaps exist, use `nop` — the director will handle the rest.
+
+ ### Implementation Review
+
+ If an **Implementation Review** section is present in the prompt, examine it carefully. The review traces each mission element through source code, tests, website, and behaviour tests. It provides ground-truth evidence of what is actually implemented — not just what metrics suggest.
+
+ - **Critical gaps** should result in creating a focused issue (label: `implementation-gap`) that describes exactly what is missing
+ - **Moderate gaps** should be noted but may not need immediate action
+ - **Misleading metrics** should inform your decision-making — don't take actions based on metrics the review has flagged as unreliable
 
  ## Priority Order
 
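The critical-gap rule in this hunk can be sketched as a small mapping from review gaps to issue drafts. This is a hypothetical helper, not package code; only the `implementation-gap` label and the gap fields come from the instructions:

```javascript
// Hypothetical sketch: turn critical review gaps into issue drafts
// carrying the `implementation-gap` label, as the supervisor rules describe.
function draftGapIssues(gaps) {
  return gaps
    .filter((g) => g.severity === "critical")
    .map((g) => ({
      title: `Implementation gap: ${g.element} (${g.gapType})`,
      body: g.description,
      labels: ["implementation-gap"],
    }));
}
```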
@@ -708,11 +708,79 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
+
+ # ─── Implementation Review: traces mission elements through code/tests/website ──
+ implementation-review:
+ needs: [params]
+ if: |
+ !cancelled() &&
+ (needs.params.outputs.mode == 'full' || needs.params.outputs.mode == 'maintain-only') &&
+ needs.params.result == 'success'
+ runs-on: ubuntu-latest
+ outputs:
+ review-advice: ${{ steps.review.outputs.completeness-advice }}
+ review-gaps: ${{ steps.review.outputs.gaps }}
+ steps:
+ - uses: actions/checkout@v6
+ with:
+ fetch-depth: 0
+ ref: ${{ inputs.ref || github.sha }}
+
+ - name: Fetch log and agent logs from log branch
+ env:
+ LOG_FILE: ${{ needs.params.outputs.log-file }}
+ LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
+ SCREENSHOT_FILE: ${{ needs.params.outputs.screenshot-file }}
+ run: |
+ for f in "${LOG_FILE}" "intention.md" "${SCREENSHOT_FILE}"; do
+ git show "origin/${LOG_BRANCH}:${f}" > "$f" 2>/dev/null || true
+ done
+ mkdir -p .agent-logs
+ git fetch origin "${LOG_BRANCH}" 2>/dev/null || true
+ for f in $(git ls-tree --name-only "origin/${LOG_BRANCH}" 2>/dev/null | grep '^agent-log-' || true); do
+ git show "origin/${LOG_BRANCH}:${f}" > ".agent-logs/${f}" 2>/dev/null || true
+ done
+ echo "Fetched $(ls .agent-logs/agent-log-*.md 2>/dev/null | wc -l | tr -d ' ') agent log files"
+
+ - uses: actions/setup-node@v6
+ with:
+ node-version: "24"
+
+ - name: Self-init (agentic-lib dev only)
+ if: hashFiles('scripts/self-init.sh') != '' && hashFiles('.github/agentic-lib/actions/agentic-step/package.json') == ''
+ run: bash scripts/self-init.sh
+
+ - name: Install agentic-step dependencies
+ working-directory: .github/agentic-lib/actions/agentic-step
+ run: |
+ npm ci
+ if [ -d "../../copilot" ]; then
+ ln -sf "$(pwd)/node_modules" ../../copilot/node_modules
+ fi
+
+ - name: Run implementation review
+ id: review
+ if: github.repository != 'xn-intenton-z2a/agentic-lib'
+ uses: ./.github/agentic-lib/actions/agentic-step
+ env:
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+ COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
+ with:
+ task: "implementation-review"
+ config: ${{ needs.params.outputs.config-path }}
+ instructions: ".github/agents/agent-implementation-review.md"
+ model: ${{ needs.params.outputs.model }}
+
+ - name: Push log to log branch
+ if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
+ env:
+ LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  # ─── Director: LLM evaluates mission status (complete/failed/in-progress) ──
  director:
- needs: [params, telemetry, maintain]
+ needs: [params, telemetry, maintain, implementation-review]
  if: |
  !cancelled() &&
  (needs.params.outputs.mode == 'full' || needs.params.outputs.mode == 'dev-only') &&
@@ -760,6 +828,8 @@ jobs:
  env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
+ REVIEW_ADVICE: ${{ needs.implementation-review.outputs.review-advice }}
+ REVIEW_GAPS: ${{ needs.implementation-review.outputs.review-gaps }}
  with:
  task: "direct"
  config: ${{ needs.params.outputs.config-path }}
@@ -770,11 +840,11 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  # ─── Supervisor: LLM decides what to do (after director evaluates) ──
  supervisor:
- needs: [params, pr-cleanup, telemetry, maintain, director]
+ needs: [params, pr-cleanup, telemetry, maintain, implementation-review, director]
  if: |
  !cancelled() &&
  (needs.params.outputs.mode == 'full' || needs.params.outputs.mode == 'dev-only') &&
@@ -821,6 +891,8 @@ jobs:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
  INPUT_MESSAGE: ${{ needs.params.outputs.message }}
+ REVIEW_ADVICE: ${{ needs.implementation-review.outputs.review-advice }}
+ REVIEW_GAPS: ${{ needs.implementation-review.outputs.review-gaps }}
  with:
  task: "supervise"
  config: ${{ needs.params.outputs.config-path }}
@@ -831,7 +903,7 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  # ─── Fix stuck PRs with failing checks ─────────────────────────────
  fix-stuck:
@@ -1403,7 +1475,7 @@ jobs:
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && needs.params.outputs.dry-run != 'true'
  env:
  LOG_BRANCH: ${{ needs.params.outputs.log-branch }}
- run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md"
+ run: bash .github/agentic-lib/scripts/push-to-logs.sh "${{ needs.params.outputs.log-file }}" "intention.md" agent-log-*.md
 
  - name: Create PR and attempt merge
  if: github.repository != 'xn-intenton-z2a/agentic-lib' && steps.issue.outputs.issue-number != '' && needs.params.outputs.dry-run != 'true' && steps.pre-commit-test.outputs.tests-passed == 'true' && steps.pre-commit-behaviour-test.outputs.tests-passed != 'false'
@@ -14,7 +14,7 @@
  // npx @xn-intenton-z2a/agentic-lib maintain-library
  // npx @xn-intenton-z2a/agentic-lib fix-code
 
- import { copyFileSync, existsSync, mkdirSync, rmSync, rmdirSync, readdirSync, readFileSync, writeFileSync } from "fs";
+ import { copyFileSync, existsSync, mkdirSync, rmSync, rmdirSync, readdirSync, readFileSync, writeFileSync, unlinkSync } from "fs";
  import { applyDistTransform } from "../src/dist-transform.js";
  import { resolve, dirname, join } from "path";
  import { fileURLToPath } from "url";
@@ -785,6 +785,19 @@ function initPurge(seedsDir, missionName, initTimestamp) {
  initTransformFile(tomlSource, resolve(target, "agentic-lib.toml"), "SEED: agentic-lib.toml (transformed)");
  }
 
+ // Clear agent log files (written by implementation-review and other tasks)
+ try {
+ const agentLogs = readdirSync(target).filter((f) => f.startsWith("agent-log-") && f.endsWith(".md"));
+ for (const f of agentLogs) {
+ console.log(` DELETE: ${f} (agent log)`);
+ if (!dryRun) {
+ unlinkSync(resolve(target, f));
+ }
+ initChanges++;
+ }
+ if (agentLogs.length > 0) console.log(` Cleared ${agentLogs.length} agent log file(s)`);
+ } catch { /* ignore — directory may not have agent logs */ }
+
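The purge logic added above keys on the `agent-log-` prefix and the `.md` suffix. In isolation, the filename filter behaves like this (sample filenames invented):

```javascript
// Same filename predicate as the initPurge change above, shown standalone.
const isAgentLog = (f) => f.startsWith("agent-log-") && f.endsWith(".md");

// Example inputs: only the first entry matches both prefix and suffix.
const files = ["agent-log-2026-01-01T00-00-00-000Z.md", "MISSION.md", "agent-log.txt"];
const toDelete = files.filter(isAgentLog);
```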
 
  // Copy mission seed file as MISSION.md
  const missionsDir = resolve(seedsDir, "missions");
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@xn-intenton-z2a/agentic-lib",
- "version": "7.4.12",
+ "version": "7.4.13",
  "description": "Agentic-lib Agentic Coding Systems SDK powering automated GitHub workflows.",
  "type": "module",
  "scripts": {
@@ -8,7 +8,7 @@
  import * as core from "@actions/core";
  import * as github from "@actions/github";
  import { loadConfig, getWritablePaths } from "./config-loader.js";
- import { logActivity, generateClosingNotes } from "./logging.js";
+ import { logActivity, generateClosingNotes, writeAgentLog } from "./logging.js";
  import { readFileSync, existsSync } from "fs";
  import {
  buildMissionMetrics, buildMissionReadiness,
@@ -30,12 +30,14 @@ import { reviewIssue } from "./tasks/review-issue.js";
  import { discussions } from "./tasks/discussions.js";
  import { supervise } from "./tasks/supervise.js";
  import { direct } from "./tasks/direct.js";
+ import { implementationReview } from "./tasks/implementation-review.js";
 
  const TASKS = {
  "resolve-issue": resolveIssue, "fix-code": fixCode, "transform": transform,
  "maintain-features": maintainFeatures, "maintain-library": maintainLibrary,
  "enhance-issue": enhanceIssue, "review-issue": reviewIssue,
  "discussions": discussions, "supervise": supervise, "direct": direct,
+ "implementation-review": implementationReview,
  };
 
  async function run() {
@@ -138,6 +140,20 @@ async function run() {
  });
  }
 
+ // Write standalone agent log file (pushed to agentic-lib-logs branch by workflow)
+ try {
+ const agentLogFile = writeAgentLog({
+ task, outcome: result.outcome || "completed",
+ model: result.model || model, durationMs, tokensUsed: result.tokensUsed,
+ narrative: result.narrative, contextNotes: result.contextNotes,
+ reviewTable: result.reviewTable, completenessAdvice: result.completenessAdvice,
+ missionMetrics,
+ });
+ core.info(`Agent log written: ${agentLogFile}`);
+ } catch (err) {
+ core.warning(`Could not write agent log: ${err.message}`);
+ }
+
  core.info(`agentic-step completed: outcome=${result.outcome}`);
  } catch (error) {
  core.setFailed(`agentic-step failed: ${error.message}`);
@@ -5,7 +5,7 @@
  // Appends structured entries to the intentïon.md activity log,
  // including commit URLs and safety-check outcomes.
 
- import { writeFileSync, readFileSync, appendFileSync, existsSync, mkdirSync, copyFileSync } from "fs";
+ import { writeFileSync, readFileSync, appendFileSync, existsSync, mkdirSync, copyFileSync, readdirSync } from "fs";
  import { dirname, basename } from "path";
  import { join } from "path";
  import * as core from "@actions/core";
@@ -172,6 +172,87 @@ export function logActivity({
  }
  }
 
+ /**
+ * Write a standalone agent log file for a single task execution.
+ * Each file is uniquely named with a filesystem-safe datetime stamp.
+ *
+ * @param {Object} options
+ * @param {string} options.task - The task name
+ * @param {string} options.outcome - The task outcome
+ * @param {string} [options.model] - Model used
+ * @param {number} [options.durationMs] - Task duration in milliseconds
+ * @param {string} [options.narrative] - LLM-generated narrative
+ * @param {Array} [options.reviewTable] - Implementation review table rows
+ * @param {string} [options.completenessAdvice] - English completeness assessment
+ * @param {string} [options.contextNotes] - Additional context notes
+ * @param {Array} [options.missionMetrics] - Mission metrics entries
+ * @param {number} [options.tokensUsed] - Total tokens consumed
+ * @returns {string} The filename of the written log file
+ */
+ export function writeAgentLog({
+ task, outcome, model, durationMs, narrative,
+ reviewTable, completenessAdvice, contextNotes,
+ missionMetrics, tokensUsed,
+ }) {
+ const now = new Date();
+ const stamp = now.toISOString().replace(/:/g, "-").replace(/\./g, "-");
+ const filename = `agent-log-${stamp}.md`;
+
+ const parts = [
+ `# Agent Log: ${task} at ${now.toISOString()}`,
+ "",
+ "## Summary",
+ `**Task:** ${task}`,
+ `**Outcome:** ${outcome}`,
+ ];
+
+ if (model) parts.push(`**Model:** ${model}`);
+ if (tokensUsed) parts.push(`**Tokens:** ${tokensUsed}`);
+ if (durationMs) {
+ const secs = Math.round(durationMs / 1000);
+ parts.push(`**Duration:** ${secs}s`);
+ }
+
+ if (reviewTable && reviewTable.length > 0) {
+ parts.push("", "## Implementation Review");
+ parts.push("| Element | Implemented | Unit Tested | Behaviour Tested | Website Used | Notes |");
+ parts.push("|---------|-------------|-------------|------------------|--------------|-------|");
+ for (const row of reviewTable) {
+ parts.push(`| ${row.element || ""} | ${row.implemented || ""} | ${row.unitTested || ""} | ${row.behaviourTested || ""} | ${row.websiteUsed || ""} | ${row.notes || ""} |`);
+ }
+ }
+
+ if (completenessAdvice) {
+ parts.push("", "## Completeness Assessment");
+ parts.push(completenessAdvice);
+ }
+
+ if (missionMetrics && missionMetrics.length > 0) {
+ parts.push("", "## Mission Metrics");
+ parts.push("| Metric | Value | Target | Status |");
+ parts.push("|--------|-------|--------|--------|");
+ for (const m of missionMetrics) {
+ parts.push(`| ${m.metric} | ${m.value} | ${m.target} | ${m.status} |`);
+ }
+ }
+
+ if (narrative) {
+ parts.push("", "## Narrative");
+ parts.push(narrative);
+ }
+
+ if (contextNotes) {
+ parts.push("", "## Context Notes");
+ parts.push(contextNotes);
+ }
+
+ parts.push("", "---");
+ parts.push(`Generated by agentic-step ${task} at ${now.toISOString()}`);
+
+ writeFileSync(filename, parts.join("\n"));
+ return filename;
+ }
+
  /**
  * Generate closing notes from limits status, flagging limits at or approaching capacity.
  *
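The filename stamping in `writeAgentLog` replaces the characters in the ISO timestamp that are unsafe in filenames on some systems. The transform on its own:

```javascript
// Same stamp derivation as writeAgentLog: colons and dots in the
// ISO timestamp become hyphens so the name is filesystem-safe.
const now = new Date("2026-02-03T04:05:06.789Z");
const stamp = now.toISOString().replace(/:/g, "-").replace(/\./g, "-");
const filename = `agent-log-${stamp}.md`;
// filename → "agent-log-2026-02-03T04-05-06-789Z.md"
```

Note the `.md` extension is appended after the substitution, so the only dot left in the name is the one before the extension.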
@@ -71,6 +71,14 @@ function buildMetricAssessment(ctx, config) {
  const minTests = thresholds.minDedicatedTests ?? 1;
  const maxTodos = thresholds.maxSourceTodos ?? 0;
 
+ // Implementation review gaps (passed from workflow via env)
+ let reviewGaps = [];
+ try {
+ const gapsJson = process.env.REVIEW_GAPS;
+ if (gapsJson) reviewGaps = JSON.parse(gapsJson);
+ } catch { /* ignore parse errors */ }
+ const criticalGaps = reviewGaps.filter((g) => g.severity === "critical");
+
  const metrics = [
  { metric: "Open issues", value: ctx.issuesSummary.length, target: 0, met: ctx.issuesSummary.length === 0 },
  { metric: "Open PRs", value: ctx.prsSummary.length, target: 0, met: ctx.prsSummary.length === 0 },
@@ -78,6 +86,7 @@ function buildMetricAssessment(ctx, config) {
  { metric: "Dedicated tests", value: ctx.dedicatedTestCount, target: minTests, met: ctx.dedicatedTestCount >= minTests },
  { metric: "Source TODOs", value: ctx.sourceTodoCount, target: maxTodos, met: ctx.sourceTodoCount <= maxTodos },
  { metric: "Budget", value: ctx.cumulativeTransformationCost, target: ctx.transformationBudget || "unlimited", met: !(ctx.transformationBudget > 0 && ctx.cumulativeTransformationCost >= ctx.transformationBudget) },
+ { metric: "Implementation review", value: criticalGaps.length === 0 ? "No critical gaps" : `${criticalGaps.length} critical gap(s)`, target: "No critical gaps", met: criticalGaps.length === 0 },
  ];
 
  const allMet = metrics.every((m) => m.met);
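The defensive parse of `REVIEW_GAPS` added in this hunk tolerates a missing or malformed value by treating it as "no gaps". The same pattern can be exercised standalone (here with a local argument standing in for the environment variable):

```javascript
// Same parse-and-filter pattern as buildMetricAssessment, with the
// process.env lookup replaced by a parameter for illustration.
function parseCriticalGaps(gapsJson) {
  let reviewGaps = [];
  try {
    if (gapsJson) reviewGaps = JSON.parse(gapsJson);
  } catch { /* ignore parse errors — treat as no gaps */ }
  return reviewGaps.filter((g) => g.severity === "critical");
}
```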
@@ -124,10 +133,29 @@ function buildPrompt(ctx, agentInstructions, metricAssessment) {
  `Source TODOs: ${ctx.sourceTodoCount}`,
  `Transformation budget: ${ctx.cumulativeTransformationCost}/${ctx.transformationBudget || "unlimited"}`,
  "",
+ ...(process.env.REVIEW_ADVICE ? [
+ "## Implementation Review",
+ `**Completeness:** ${process.env.REVIEW_ADVICE}`,
+ ...((() => {
+ try {
+ const gaps = JSON.parse(process.env.REVIEW_GAPS || "[]");
+ if (gaps.length > 0) {
+ return [
+ "",
+ "### Gaps Found",
+ ...gaps.map((g) => `- [${g.severity}] ${g.element}: ${g.description} (${g.gapType})`),
+ ];
+ }
+ } catch { /* ignore */ }
+ return [];
+ })()),
+ "",
+ ] : []),
  "## Your Task",
  "Use list_issues and list_prs to review open work items.",
  "Use read_file to inspect source code and tests for completeness.",
  "Use git_diff or git_status for additional context if needed.",
+ "Consider the implementation review findings — if critical gaps exist, do NOT declare mission-complete.",
  "Then call report_director_decision with your determination.",
  "",
  "**You MUST call report_director_decision exactly once.**",
@@ -0,0 +1,232 @@
1
+ // SPDX-License-Identifier: GPL-3.0-only
2
+ // Copyright (C) 2025-2026 Polycode Limited
3
+ // tasks/implementation-review.js — Trace mission elements through code, tests, website
4
+ //
5
+ // Uses runCopilotSession with read-only tools to decompose the mission and
6
+ // verify each element is implemented, tested, and presented on the website.
7
+ // Produces a structured review with test-result table and completeness advice.
8
+
9
+ import * as core from "@actions/core";
10
+ import { existsSync, readdirSync } from "fs";
11
+ import { readOptionalFile, extractNarrative, NARRATIVE_INSTRUCTION } from "../copilot.js";
12
+ import { runCopilotSession } from "../../../copilot/copilot-session.js";
13
+ import { createGitHubTools, createGitTools } from "../../../copilot/github-tools.js";
14
+
15
+ function buildReviewPrompt(mission, config, agentInstructions, agentLogsSummary) {
16
+ const sourcePath = config.paths?.source?.path || "src/lib/";
17
+ const testsPath = config.paths?.tests?.path || "tests/";
18
+ const webPath = config.paths?.web?.path || "src/web/";
19
+ const behaviourPath = config.paths?.behaviour?.path || "tests/behaviour/";
20
+ const featuresPath = config.paths?.features?.path || "features/";
21
+
22
+ return [
23
+ "## Instructions",
24
+ agentInstructions,
25
+ "",
26
+ "## Mission",
27
+ mission || "(no mission defined)",
28
+ "",
29
+ "## Repository Paths",
30
+ `- Source: \`${sourcePath}\``,
31
+ `- Tests: \`${testsPath}\``,
32
+ `- Web: \`${webPath}\``,
33
+ `- Behaviour tests: \`${behaviourPath}\``,
34
+ `- Features: \`${featuresPath}\``,
35
+ "",
36
+ ...(agentLogsSummary ? ["## Previous Reviews", agentLogsSummary, ""] : []),
37
+ "## Your Task",
38
+ "1. Read MISSION.md and decompose it into discrete deliverable elements.",
39
+ "2. For each element, use list_files and read_file to trace it through:",
40
+ " - Source implementation in src/lib/",
41
+ " - Unit tests in tests/",
42
+ " - Behaviour tests (Playwright)",
43
+ " - Website usage in src/web/ or docs/",
44
+ " - Integration path (how website accesses the library)",
45
+ " - Behaviour test coverage of the website feature",
46
+ "3. Flag misleading patterns:",
47
+ " - Issues closed without corresponding code changes",
48
+ " - Tests that don't assert anything meaningful (empty/trivial)",
49
+ " - Features listed as done in docs but missing from code",
50
+ " - PRs merged without test coverage for the claimed feature",
51
+ "4. Call report_implementation_review with your findings.",
52
+ "",
53
+ "**You MUST call report_implementation_review exactly once.**",
54
+ ].join("\n");
55
+ }
56
+
57
+ /**
58
+ * Implementation review task: decompose mission, trace through code/tests/website.
59
+ *
60
+ * @param {Object} context - Task context from index.js
61
+ * @returns {Promise<Object>} Result with outcome, review data, tokensUsed, model
62
+ */
63
+ export async function implementationReview(context) {
64
+ const { config, instructions, model, octokit, repo, logFilePath, screenshotFilePath } = context;
65
+ const t = config.tuning || {};
66
+
67
+ const mission = readOptionalFile(config.paths.mission.path);
68
+ if (!mission) {
69
+ return { outcome: "nop", details: "No mission defined — skipping implementation review" };
70
+ }
71
+
72
+ if (existsSync("MISSION_COMPLETE.md") && config.supervisor !== "maintenance") {
73
+ return { outcome: "nop", details: "Mission already complete" };
74
+ }
75
+ if (existsSync("MISSION_FAILED.md")) {
76
+ return { outcome: "nop", details: "Mission already failed" };
77
+ }
78
+
79
+ // Check for previous agent logs
80
+ const agentLogsDir = ".agent-logs";
81
+ let agentLogsSummary = "";
82
+ if (existsSync(agentLogsDir)) {
83
+ try {
84
+ const files = readdirSync(agentLogsDir).filter((f) => f.startsWith("agent-log-") && f.endsWith(".md"));
85
+ if (files.length > 0) {
86
+ agentLogsSummary = `${files.length} previous agent log file(s) available. Use list_files and read_file on .agent-logs/ to review them.`;
87
+ }
88
+ } catch { /* ignore */ }
89
+ }
90
+
91
+ const agentInstructions = instructions || "Review the implementation completeness of the mission.";
92
+ const prompt = buildReviewPrompt(mission, config, agentInstructions, agentLogsSummary);
93
+
94
+ const systemPrompt =
95
+ "You are an implementation review agent for an autonomous coding repository. " +
96
+ "Your job is to trace each mission element through the codebase — verifying that it is " +
97
+ "implemented in source code, covered by unit tests, exercised by behaviour tests, " +
98
+ "presented on the website, and that the behaviour tests verify the website presentation. " +
99
+ "Focus on ground-truth evidence, not metrics. Metrics can be misleading — issues closed " +
100
+ "in error, trivial tests, or features marked done without code all create false confidence." +
101
+ NARRATIVE_INSTRUCTION;
102
+
103
+ // Shared mutable state to capture the review
104
+ const reviewResult = { elements: [], gaps: [], advice: "", misleadingMetrics: [] };
105
+
106
+ const createTools = (defineTool, _wp, logger) => {
107
+ const ghTools = createGitHubTools(octokit, repo, defineTool, logger);
108
+ const gitTools = createGitTools(defineTool, logger);
109
+
110
+ const reportReview = defineTool("report_implementation_review", {
111
+ description: "Report the implementation review findings. Call this exactly once with all traced elements, identified gaps, and completeness advice.",
112
+ parameters: {
113
+ type: "object",
114
+ properties: {
115
+ elements: {
116
+ type: "array",
117
+ items: {
118
+ type: "object",
119
+ properties: {
120
+ name: { type: "string", description: "Mission element name" },
121
+ implemented: { type: "boolean", description: "Found in source code" },
122
+ unitTested: { type: "boolean", description: "Has unit test coverage" },
123
+ behaviourTested: { type: "boolean", description: "Has behaviour/Playwright test coverage" },
124
+ websiteUsed: { type: "boolean", description: "Used on the website" },
125
+ integrationPath: { type: "string", description: "How the website accesses this feature" },
126
+ behaviourCoverage: { type: "string", description: "How behaviour tests verify website presentation" },
127
+ notes: { type: "string", description: "Additional observations" },
128
+ },
129
+ required: ["name", "implemented"],
130
+ },
131
+ description: "Mission elements traced through the codebase",
132
+ },
133
+ gaps: {
134
+ type: "array",
135
+ items: {
136
+ type: "object",
137
+ properties: {
138
+ element: { type: "string", description: "Which mission element has the gap" },
139
+ gapType: {
140
+ type: "string",
141
+ enum: ["not-implemented", "not-tested", "not-on-website", "no-behaviour-test", "misleading-metric"],
142
+ description: "Type of gap",
143
+ },
144
+ description: { type: "string", description: "What is missing" },
145
+ severity: {
146
+ type: "string",
147
+ enum: ["critical", "moderate", "low"],
148
+ description: "How important this gap is",
149
+ },
150
+ },
151
+ required: ["element", "gapType", "description", "severity"],
152
+ },
153
+ description: "Identified implementation gaps",
154
+ },
155
+ advice: { type: "string", description: "Single English sentence summarising overall completeness" },
156
+ misleadingMetrics: {
157
+ type: "array",
158
+ items: {
159
+ type: "object",
160
+ properties: {
161
+ metric: { type: "string", description: "Which metric is misleading" },
162
+ reason: { type: "string", description: "Why it is misleading" },
163
+ evidence: { type: "string", description: "What evidence supports this" },
164
+ },
165
+ required: ["metric", "reason"],
166
+ },
167
+ description: "Metrics that may be misleading about actual progress",
168
+ },
169
+ },
170
+ required: ["elements", "gaps", "advice"],
171
+ },
172
+ handler: async ({ elements, gaps, advice, misleadingMetrics }) => {
173
+ reviewResult.elements = elements || [];
174
+ reviewResult.gaps = gaps || [];
175
+ reviewResult.advice = advice || "";
176
+ reviewResult.misleadingMetrics = misleadingMetrics || [];
177
+ return { textResultForLlm: `Review recorded: ${elements?.length || 0} elements traced, ${gaps?.length || 0} gaps found` };
178
+ },
179
+ });
180
+
+    return [...ghTools, ...gitTools, reportReview];
+  };
+
+  const attachments = [];
+  if (logFilePath) attachments.push({ type: "file", path: logFilePath });
+  if (screenshotFilePath) attachments.push({ type: "file", path: screenshotFilePath });
+
+  const result = await runCopilotSession({
+    workspacePath: process.cwd(),
+    model,
+    tuning: t,
+    agentPrompt: systemPrompt,
+    userPrompt: prompt,
+    writablePaths: [],
+    createTools,
+    attachments,
+    excludedTools: ["write_file", "run_command", "run_tests", "dispatch_workflow", "close_issue", "label_issue", "post_discussion_comment", "create_issue", "comment_on_issue"],
+    logger: { info: core.info, warning: core.warning, error: core.error, debug: core.debug },
+  });
+
+  const tokensUsed = result.tokensIn + result.tokensOut;
+
+  // Build review table for logging
+  const reviewTable = reviewResult.elements.map((e) => ({
+    element: e.name,
+    implemented: e.implemented ? "YES" : "NO",
+    unitTested: e.unitTested ? "YES" : "NO",
+    behaviourTested: e.behaviourTested ? "YES" : "NO",
+    websiteUsed: e.websiteUsed ? "YES" : "NO",
+    notes: e.notes || "",
+  }));
+
+  // Set outputs for downstream jobs
+  core.setOutput("completeness-advice", (reviewResult.advice || "").substring(0, 500));
+  core.setOutput("gaps", JSON.stringify((reviewResult.gaps || []).slice(0, 20)));
+  core.setOutput("review-table", JSON.stringify(reviewTable));
+
+  return {
+    outcome: "implementation-reviewed",
+    tokensUsed,
+    inputTokens: result.tokensIn,
+    outputTokens: result.tokensOut,
+    cost: 0,
+    model,
+    narrative: result.narrative || reviewResult.advice,
+    reviewTable,
+    reviewGaps: reviewResult.gaps,
+    completenessAdvice: reviewResult.advice,
+    misleadingMetrics: reviewResult.misleadingMetrics,
+    details: `Traced ${reviewResult.elements.length} element(s), found ${reviewResult.gaps.length} gap(s)`,
+  };
+}
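The `gaps` output is a JSON string (truncated to 20 entries), so a downstream consumer has to parse it before filtering. A minimal sketch of that consumer, using invented sample data:

```javascript
// Sketch of a downstream job consuming the "gaps" output; the gap data is invented.
const gapsOutput = JSON.stringify([
  { element: "CSV export", gapType: "not-tested", description: "No unit tests", severity: "critical" },
  { element: "Login", gapType: "not-on-website", description: "Not linked from the site", severity: "low" },
]);

const gaps = JSON.parse(gapsOutput || "[]");
const criticalGaps = gaps.filter((g) => g.severity === "critical");
console.log(criticalGaps.length); // → 1
```

The `|| "[]"` fallback mirrors the defensive parsing used elsewhere in this diff, so a missing output yields an empty list instead of a parse error.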
@@ -474,6 +474,26 @@ function buildPrompt(ctx, agentInstructions, config) {
     `### Recent Activity`,
     ctx.recentActivity || "none",
     "",
+    ...(process.env.REVIEW_ADVICE ? [
+      "### Implementation Review",
+      `**Completeness:** ${process.env.REVIEW_ADVICE}`,
+      ...((() => {
+        try {
+          const gaps = JSON.parse(process.env.REVIEW_GAPS || "[]");
+          if (gaps.length > 0) {
+            return [
+              "",
+              "**Gaps Found:**",
+              ...gaps.map((g) => `- [${g.severity}] ${g.element}: ${g.description} (${g.gapType})`),
+              "",
+              "Consider creating issues with label 'implementation-gap' for critical gaps.",
+            ];
+          }
+        } catch { /* ignore */ }
+        return [];
+      })()),
+      "",
+    ] : []),
     "## Your Task",
     "Use list_issues, list_prs, read_file, and search_discussions to explore the repository state.",
     "Then call report_supervisor_plan with your chosen actions and reasoning.",
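The Implementation Review block above only renders when `REVIEW_ADVICE` is set. A standalone re-creation of its gap-formatting logic, run against invented sample values, shows what the supervisor actually sees in its prompt:

```javascript
// Re-creates the gap-formatting logic from the prompt block with invented sample data.
process.env.REVIEW_ADVICE = "One element lacks behaviour tests.";
process.env.REVIEW_GAPS = JSON.stringify([
  { severity: "critical", element: "CSV export", description: "No behaviour coverage", gapType: "no-behaviour-test" },
]);

const gaps = JSON.parse(process.env.REVIEW_GAPS || "[]");
const lines = [
  "### Implementation Review",
  `**Completeness:** ${process.env.REVIEW_ADVICE}`,
  ...(gaps.length > 0
    ? ["", "**Gaps Found:**", ...gaps.map((g) => `- [${g.severity}] ${g.element}: ${g.description} (${g.gapType})`)]
    : []),
];
console.log(lines.join("\n"));
```

Each gap becomes one bullet of the form `- [severity] element: description (gapType)`; malformed `REVIEW_GAPS` JSON is swallowed by the `try/catch` in the real code, leaving only the completeness line.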
@@ -147,6 +147,35 @@ export function readCumulativeCost(intentionFilepath) {
   return [...costMatches].reduce((sum, m) => sum + parseInt(m[1], 10), 0);
 }
 
+/**
+ * Gather and parse all agent-log-*.md files from a directory.
+ * Returns structured data from each log file for use in prompts and metrics.
+ *
+ * @param {string} logsDir - Directory containing agent-log-*.md files
+ * @returns {Array} Parsed log entries: { filename, task, outcome, advice, content }
+ */
+export function gatherAgentLogs(logsDir) {
+  if (!logsDir || !existsSync(logsDir)) return [];
+  try {
+    const files = readdirSync(logsDir)
+      .filter((f) => f.startsWith("agent-log-") && f.endsWith(".md"))
+      .sort();
+    return files.map((f) => {
+      const content = readFileSync(join(logsDir, f), "utf8");
+      const taskMatch = content.match(/\*\*Task:\*\* (.+)/);
+      const outcomeMatch = content.match(/\*\*Outcome:\*\* (.+)/);
+      const adviceMatch = content.match(/## Completeness Assessment\n([\s\S]*?)(?=\n##|\n---)/);
+      return {
+        filename: f,
+        task: taskMatch ? taskMatch[1].trim() : "unknown",
+        outcome: outcomeMatch ? outcomeMatch[1].trim() : "unknown",
+        advice: adviceMatch ? adviceMatch[1].trim() : "",
+        content,
+      };
+    });
+  } catch { return []; }
+}
+
 /**
  * Build limits status array for activity logging.
  *
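The extraction inside `gatherAgentLogs` is regex-driven, so its behaviour is easiest to see on a sample log body. This sketch applies the same three regular expressions to a hypothetical `agent-log-*.md` content string (the log text is invented):

```javascript
// Applies the same regular expressions gatherAgentLogs uses to a hypothetical log body.
const content = [
  "**Task:** Implement CSV export",
  "**Outcome:** completed",
  "",
  "## Completeness Assessment",
  "Export works but lacks behaviour tests.",
  "",
  "## Next Steps",
].join("\n");

const taskMatch = content.match(/\*\*Task:\*\* (.+)/);
const outcomeMatch = content.match(/\*\*Outcome:\*\* (.+)/);
const adviceMatch = content.match(/## Completeness Assessment\n([\s\S]*?)(?=\n##|\n---)/);

console.log(taskMatch ? taskMatch[1].trim() : "unknown"); // → "Implement CSV export"
console.log(outcomeMatch ? outcomeMatch[1].trim() : "unknown"); // → "completed"
console.log(adviceMatch ? adviceMatch[1].trim() : ""); // → "Export works but lacks behaviour tests."
```

Note that the advice capture is lazy and stops at the next `\n##` heading or `\n---` rule, so a Completeness Assessment section that is the last thing in the file (with neither terminator) would not match and falls back to `""`.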
@@ -17,7 +17,7 @@
   "author": "",
   "license": "MIT",
   "dependencies": {
-    "@xn-intenton-z2a/agentic-lib": "^7.4.12"
+    "@xn-intenton-z2a/agentic-lib": "^7.4.13"
   },
   "devDependencies": {
     "@playwright/test": "^1.58.0",