@xn-intenton-z2a/agentic-lib 7.4.41 → 7.4.42
This diff shows the changes between publicly released versions of this package as they appear in their public registry, and is provided for informational purposes only.
- package/.github/agents/agent-report.md +100 -0
- package/.github/workflows/agentic-lib-flow.yml +19 -2
- package/.github/workflows/agentic-lib-report.yml +248 -0
- package/bin/agentic-lib.js +7 -1
- package/package.json +1 -1
- package/src/actions/agentic-step/action.yml +11 -1
- package/src/actions/agentic-step/index.js +5 -2
- package/src/actions/agentic-step/tasks/report.js +690 -0
- package/src/seeds/zero-package.json +1 -1
package/.github/agents/agent-report.md
ADDED
@@ -0,0 +1,100 @@
+---
+description: Generate enriched benchmark reports by analysing mechanically gathered pipeline data
+---
+
+You are a benchmark analyst for an autonomous coding pipeline. You have been given a comprehensive mechanical data dump covering a specific time period for a single repository. Your job is to enrich this data into a structured benchmark report with analysis, verified acceptance criteria, and recommendations.
+
+## Available Tools
+
+- `read_file` — Read any file in the repository. Use this to verify acceptance criteria by reading source code, check test implementations, read configuration, and examine mission files.
+- `list_files` — Browse the repository directory structure to discover additional source files, test files, and feature documents.
+- `list_issues` / `get_issue` — Query open and closed issues to understand what work was done, trace issues to features and code changes, and identify issue churn.
+- `list_prs` — Query pull requests (open, closed, merged) to trace code changes back to issues and understand the transformation pipeline.
+- `git_diff` / `git_status` — View the current state of the working tree.
+- `report_analysis` — **Required.** Call this exactly once to record your enriched analysis. Pass a JSON object with all required fields.
+
+> **Use tools aggressively.** The mechanical data gives you the overview — your job is to dig deeper. Read source code to verify acceptance criteria. Read issues and PRs to understand the narrative of what happened. Read commits to trace changes. Don't just trust issue titles — read the bodies and the actual code.
+
+## Context Provided (Mechanical Data)
+
+The task handler has already gathered and included in the prompt:
+
+- **MISSION.md** — full mission text with extracted acceptance criteria
+- **agentic-lib.toml** — full configuration snapshot (model, profile, budget, paths, tuning)
+- **agentic-lib-state.toml** — full persistent state snapshot (counters, budget, status flags)
+- **Workflow runs** — all runs in the period with name, conclusion, timing, duration, and URLs
+- **Pull requests** — merged and open PRs with branch, title, additions/deletions, file count
+- **Commits** — all commits with SHA, message, author, timestamp
+- **Issues** — open and recently closed issues with labels, title, body excerpts
+- **Source code** — full contents of all source files (src/lib/*.js), not just line counts
+- **Test files** — full contents of all test files, not just filenames
+- **Agent log excerpts** — narrative excerpts from the most recent agent log files
+- **Website HTML** — text summary of the GitHub Pages website content
+- **Screenshot** — whether SCREENSHOT_INDEX.png was captured (available as artifact)
+- **README.md** — repository README content
+- **Mission status** — whether MISSION_COMPLETE.md or MISSION_FAILED.md exist, with contents
+
+## Your Task
+
+### 1. Verify Acceptance Criteria (CRITICAL)
+
+For each criterion extracted from MISSION.md:
+- Use `read_file` to check source code for evidence of implementation
+- Use `list_issues` / `get_issue` to find related issues that addressed it
+- Mark each criterion as **PASS**, **FAIL**, or **NOT TESTED** with specific evidence (file path, line number, function name, or issue number)
+- Don't trust issue titles — verify in the actual code
+
+### 2. Build Iteration Narrative
+
+For each workflow run in the period:
+- Was it an init run, a supervisor run, or a manual dispatch?
+- Did it produce a transform? (Check: was a PR merged in the same time window?)
+- What did the supervisor/director decide? (Check agent logs)
+- Map runs to PRs to commits to understand the transformation chain
+
+Write this as the `iteration_narrative` field — a clear prose timeline of what happened.
+
+### 3. Assess Code Quality
+
+Read the source code included in the mechanical data:
+- Is the implementation correct and complete?
+- Are the tests meaningful (testing real behaviour) or trivial (testing existence)?
+- Are there TODO comments or incomplete implementations?
+- Does the code structure match what the mission asked for?
+
+### 4. Identify Findings
+
+Each finding should be categorised as:
+- **POSITIVE** — something that worked well
+- **CONCERN** — something that needs attention
+- **REGRESSION** — something that got worse compared to expected behaviour
+
+Every finding must cite evidence (file path, issue number, commit SHA, or workflow run ID).
+
+### 5. Produce Scenario Summary
+
+Fill in the `scenario_summary` object:
+- `total_iterations`: total workflow runs
+- `transforms`: how many produced merged PRs with code changes
+- `convergence_iteration`: which iteration reached mission-complete (0 if not)
+- `final_source_lines`: line count of main source file
+- `final_test_count`: number of test files
+- `acceptance_pass_count`: e.g. "7/8 PASS"
+- `total_tokens`: from state file counters
+
+### 6. Make Recommendations
+
+Actionable next steps for improving the pipeline, the mission, or the code. Be specific.
+
+### 7. Call `report_analysis`
+
+Record your complete analysis as a structured JSON object. This is mandatory — the report cannot be enriched without it.
+
+## Report Quality Standards
+
+- Every claim must cite evidence
+- Acceptance criteria assessment must read the actual source code
+- Compare state file counters with observed workflow runs for consistency
+- Note any discrepancies between what the pipeline reports and what actually happened
+- Be honest about failures — a clear failure report is more valuable than a vague success report
+- Include the iteration narrative as prose, not just a table
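Taken together, sections 1 to 6 of the prompt above define the shape of the object passed to `report_analysis`. A minimal sketch of such a payload, using only field names that appear in the prompt; every value below is hypothetical:

```javascript
// Hypothetical example of the object an analyst session might pass to
// `report_analysis`. Field names come from the prompt above; all values
// (criteria, evidence strings, counts) are invented for illustration.
const analysis = {
  acceptance_criteria: [
    { criterion: "CLI prints usage on --help", status: "PASS", evidence: "src/lib/main.js usage() function" },
    { criterion: "Results persisted to disk", status: "NOT TESTED", evidence: "no test file covers persistence" },
  ],
  iteration_narrative: "Run 1 (init) scaffolded the repo; run 2 merged a PR adding the parser.",
  findings: [
    { category: "CONCERN", detail: "Tests only assert that functions exist", evidence: "tests/unit/main.test.js" },
  ],
  scenario_summary: {
    total_iterations: 5,
    transforms: 3,
    convergence_iteration: 0,      // mission-complete not reached
    final_source_lines: 412,
    final_test_count: 2,
    acceptance_pass_count: "1/2 PASS",
    total_tokens: 84000,
  },
  recommendations: ["Add behavioural tests for the persistence path."],
};
```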
package/.github/workflows/agentic-lib-flow.yml
CHANGED
@@ -65,6 +65,10 @@ on:
         type: string
         required: false
         default: "true"
+      generate-report:
+        type: string
+        required: false
+        default: "false"
   workflow_dispatch:
     inputs:
       mode:
@@ -157,8 +161,8 @@ on:
         type: string
         required: false
         default: ""
-
-        description: "
+      generate-report:
+        description: "Generate enriched benchmark report after flow"
         type: boolean
         required: false
         default: false
@@ -183,6 +187,7 @@ jobs:
       skipMaintain: ${{ steps.resolve.outputs.skipMaintain }}
       create-seed-issues: ${{ steps.resolve.outputs.create-seed-issues }}
       skip-tests: ${{ steps.resolve.outputs.skip-tests }}
+      generate-report: ${{ steps.resolve.outputs.generate-report }}
     steps:
       - uses: actions/checkout@v6
         with:
@@ -229,6 +234,7 @@ jobs:
           resolve('skipMaintain', '${{ inputs.skipMaintain }}', null, 'false');
           resolve('create-seed-issues', '${{ inputs.create-seed-issues }}', null, 'false');
           resolve('skip-tests', '${{ inputs.skip-tests }}', null, 'true');
+          resolve('generate-report', '${{ inputs.generate-report }}', null, 'false');

  # ── Phase 0: Update agentic-lib package ────────────────────────────
  update:
@@ -1007,3 +1013,14 @@ jobs:
         with:
           name: benchmark-report
           path: BENCHMARK_REPORT_FLOW_*.md
+
+  # ── Enriched benchmark report (opt-in) ──────────────────────────
+  enriched-report:
+    needs: [params, generate-report]
+    if: always() && needs.params.outputs.generate-report == 'true'
+    uses: ./.github/workflows/agentic-lib-report.yml
+    with:
+      model: ${{ inputs.model }}
+      config-path: ${{ inputs.config-path }}
+      dry-run: ${{ inputs.dry-run || 'false' }}
+    secrets: inherit
package/.github/workflows/agentic-lib-report.yml
ADDED
@@ -0,0 +1,248 @@
+# SPDX-License-Identifier: MIT
+# Copyright (C) 2025-2026 Polycode Limited
+# .github/workflows/agentic-lib-report.yml
+#
+# Benchmark report generator: mechanically gathers pipeline data for a time period,
+# optionally enriches via LLM, and commits the report to repo root on main.
+
+name: agentic-lib-report
+run-name: "agentic-lib-report [${{ github.ref_name }}]"
+
+on:
+  workflow_call:
+    inputs:
+      ref:
+        type: string
+        required: false
+        default: ""
+      period-start:
+        type: string
+        required: false
+        default: ""
+      period-end:
+        type: string
+        required: false
+        default: ""
+      model:
+        type: string
+        required: false
+        default: ""
+      config-path:
+        type: string
+        required: false
+        default: ""
+      dry-run:
+        type: string
+        required: false
+        default: "false"
+  workflow_dispatch:
+    inputs:
+      period-start:
+        description: "Report period start (ISO 8601, default: most recent init run)"
+        type: string
+        required: false
+        default: ""
+      period-end:
+        description: "Report period end (ISO 8601, default: now)"
+        type: string
+        required: false
+        default: ""
+      model:
+        description: "Copilot SDK model for LLM enrichment"
+        type: choice
+        required: false
+        default: ""
+        options:
+          - ""
+          - gpt-5-mini
+          - claude-sonnet-4
+          - gpt-4.1
+      dry-run:
+        description: "Skip commit and push"
+        type: boolean
+        required: false
+        default: false
+
+permissions:
+  contents: write
+  issues: read
+  pull-requests: read
+  actions: read
+
+env:
+  configPath: "agentic-lib.toml"
+
+jobs:
+  params:
+    runs-on: ubuntu-latest
+    outputs:
+      period-start: ${{ steps.resolve.outputs.period-start }}
+      period-end: ${{ steps.resolve.outputs.period-end }}
+      model: ${{ steps.resolve.outputs.model }}
+      config-path: ${{ steps.resolve.outputs.config-path }}
+      dry-run: ${{ steps.resolve.outputs.dry-run }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          sparse-checkout: ${{ env.configPath }}
+          sparse-checkout-cone-mode: false
+      - id: resolve
+        uses: actions/github-script@v8
+        with:
+          script: |
+            const fs = require('fs');
+            const configPath = '${{ env.configPath }}';
+            let toml = '';
+            if (fs.existsSync(configPath)) toml = fs.readFileSync(configPath, 'utf8');
+            const readToml = (key) => {
+              const m = toml.match(new RegExp(`^\\s*${key}\\s*=\\s*"([^"]*)"`, 'm'));
+              return m ? m[1] : '';
+            };
+            const resolve = (name, input, configKey, defaultValue) => {
+              let value = input || '';
+              let source = 'input';
+              if (!value && configKey) {
+                value = readToml(configKey);
+                source = value ? 'config' : '';
+              }
+              if (!value) {
+                value = defaultValue;
+                source = 'default';
+              }
+              const configVal = configKey ? readToml(configKey) : 'N/A';
+              core.info(`param ${name}: input='${input}' config='${configVal}' default='${defaultValue}' → '${value}' (source: ${source})`);
+              core.setOutput(name, value);
+              return value;
+            };
+            resolve('period-start', '${{ inputs.period-start }}', null, '');
+            resolve('period-end', '${{ inputs.period-end }}', null, '');
+            resolve('model', '${{ inputs.model }}', 'model', 'gpt-5-mini');
+            resolve('config-path', '${{ inputs.config-path }}', null, '${{ env.configPath }}');
+            resolve('dry-run', '${{ inputs.dry-run }}', null, 'false');
+
+  generate-report:
+    needs: [params]
+    runs-on: ubuntu-latest
+    outputs:
+      report-file: ${{ steps.write.outputs.report-file }}
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          ref: ${{ inputs.ref || 'main' }}
+          fetch-depth: 0
+          token: ${{ secrets.WORKFLOW_TOKEN || github.token }}
+
+      - uses: actions/setup-node@v6
+        with:
+          node-version: "24"
+
+      - name: Self-init (agentic-lib dev only)
+        if: hashFiles('scripts/self-init.sh') != '' && hashFiles('.github/agentic-lib/actions/agentic-step/package.json') == ''
+        run: bash scripts/self-init.sh
+
+      - name: Install agentic-step dependencies
+        working-directory: .github/agentic-lib/actions/agentic-step
+        run: |
+          npm ci
+          if [ -d "../../copilot" ]; then
+            ln -sf "$(pwd)/node_modules" ../../copilot/node_modules
+          fi
+
+      - name: Fetch state and logs from logs branch
+        run: |
+          git fetch origin agentic-lib-logs --depth=1 2>/dev/null || true
+          git show origin/agentic-lib-logs:agentic-lib-state.toml > agentic-lib-state.toml 2>/dev/null || echo "# no state" > agentic-lib-state.toml
+          # Fetch agent log files for context
+          git ls-tree origin/agentic-lib-logs --name-only 2>/dev/null | grep '^agent-log-' | sort > /tmp/log-files.txt || true
+          while read -r logfile; do
+            git show "origin/agentic-lib-logs:${logfile}" > "${logfile}" 2>/dev/null || true
+          done < /tmp/log-files.txt
+          # Fetch screenshot from logs branch
+          git show origin/agentic-lib-logs:SCREENSHOT_INDEX.png > SCREENSHOT_INDEX.png 2>/dev/null || true
+
+      - name: Fetch website from GitHub Pages
+        run: |
+          REPO_NAME="${{ github.event.repository.name }}"
+          OWNER="${{ github.repository_owner }}"
+          curl -sL --max-time 15 "https://${OWNER}.github.io/${REPO_NAME}/" > /tmp/website.html 2>/dev/null || echo "<html><body>No website available</body></html>" > /tmp/website.html
+          echo "Website fetched: $(wc -c < /tmp/website.html) bytes"
+
+      - name: Run report task
+        id: report
+        uses: ./.github/agentic-lib/actions/agentic-step
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
+        with:
+          task: "report"
+          config: ${{ needs.params.outputs.config-path }}
+          instructions: ".github/agents/agent-report.md"
+          model: ${{ needs.params.outputs.model }}
+          period-start: ${{ needs.params.outputs.period-start }}
+          period-end: ${{ needs.params.outputs.period-end }}
+
+      - name: Write report file
+        id: write
+        shell: bash
+        run: |
+          REPORT_NUM=$(printf "%03d" $(( $(ls BENCHMARK_REPORT_*.md 2>/dev/null | wc -l) + 1 )))
+          REPORT_FILE="BENCHMARK_REPORT_${REPORT_NUM}.md"
+          # The report content is in the action output; write it via env var to handle multiline
+          cat > "${REPORT_FILE}" << 'REPORT_EOF'
+          ${{ steps.report.outputs.report-content }}
+          REPORT_EOF
+          echo "report-file=${REPORT_FILE}" >> $GITHUB_OUTPUT
+          echo "Generated ${REPORT_FILE}"
+
+      - name: Commit and push report
+        if: needs.params.outputs.dry-run != 'true' && github.repository != 'xn-intenton-z2a/agentic-lib'
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
+          git add BENCHMARK_REPORT_*.md
+          git diff --staged --quiet && echo "No report to commit" && exit 0
+          git commit -m "report: benchmark report [skip ci]"
+          MAX_RETRIES=3
+          PUSH_SUCCESS=""
+          for attempt in $(seq 1 $MAX_RETRIES); do
+            case $attempt in
+              1)
+                echo "=== Attempt 1: rebase ==="
+                git pull --rebase origin main || {
+                  git rebase --abort 2>/dev/null || true
+                  sleep $((10 + RANDOM % 10))
+                  continue
+                }
+                ;;
+              2)
+                echo "=== Attempt 2: merge ==="
+                git pull --no-rebase origin main || {
+                  git merge --abort 2>/dev/null || true
+                  sleep $((20 + RANDOM % 10))
+                  continue
+                }
+                ;;
+              3)
+                echo "=== Attempt 3: merge -X theirs ==="
+                git pull -X theirs --no-rebase origin main || {
+                  git merge --abort 2>/dev/null || true
+                  sleep $((30 + RANDOM % 10))
+                  continue
+                }
+                ;;
+            esac
+            git push origin main && { PUSH_SUCCESS="true"; break; }
+            echo "Push failed on attempt $attempt"
+            sleep $((attempt * 10 + RANDOM % 10))
+          done
+          if [ "$PUSH_SUCCESS" != "true" ]; then
+            echo "::warning::Failed to push report after $MAX_RETRIES attempts — saved as artifact"
+          fi
+
+      - name: Upload report artifact
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: benchmark-report
+          path: BENCHMARK_REPORT_*.md
+          if-no-files-found: ignore
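The inline `resolve` helper in the `params` job above implements a three-level precedence: an explicit workflow input wins, then a quoted `key = "value"` entry read from `agentic-lib.toml`, then a hard-coded default. A stripped-down sketch of that precedence, runnable outside the Actions runtime (the sample TOML content is invented):

```javascript
// Minimal version of the params job's precedence: input → config → default.
const toml = 'profile = "fast"\nmodel = "claude-sonnet-4"\n'; // invented sample

// Same regex shape as the workflow: a quoted value on its own line.
const readToml = (key) => {
  const m = toml.match(new RegExp(`^\\s*${key}\\s*=\\s*"([^"]*)"`, "m"));
  return m ? m[1] : "";
};

// Explicit input wins; otherwise the config key; otherwise the default.
const resolve = (input, configKey, defaultValue) =>
  input || (configKey && readToml(configKey)) || defaultValue;

console.log(resolve("", "model", "gpt-5-mini"));        // → claude-sonnet-4 (config)
console.log(resolve("gpt-4.1", "model", "gpt-5-mini")); // → gpt-4.1 (input wins)
console.log(resolve("", "missing", "gpt-5-mini"));      // → gpt-5-mini (default)
```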
package/bin/agentic-lib.js
CHANGED
@@ -29,12 +29,13 @@ const command = args[0];
 const flags = args.slice(1);
 
 let initChanges = 0;
-const TASK_COMMANDS = ["transform", "maintain-features", "maintain-library", "fix-code"];
+const TASK_COMMANDS = ["transform", "maintain-features", "maintain-library", "fix-code", "report"];
 const TASK_AGENT_MAP = {
   "transform": "agent-issue-resolution",
   "fix-code": "agent-apply-fix",
   "maintain-features": "agent-maintain-features",
   "maintain-library": "agent-maintain-library",
+  "report": "agent-report",
 };
 const INIT_COMMANDS = ["init", "update", "reset"];
 const ALL_COMMANDS = [...INIT_COMMANDS, ...TASK_COMMANDS, "version", "iterate"];
@@ -55,6 +56,7 @@ Tasks (run Copilot SDK transformations):
   maintain-features   Generate feature files from mission
   maintain-library    Update library docs from SOURCES.md
   fix-code            Fix failing tests
+  report              Generate benchmark report for a time period
 
 Iterator:
   iterate             Single Copilot SDK session — reads, writes, tests, iterates autonomously
@@ -129,6 +131,10 @@ const prIdx = flags.indexOf("--pr");
 const prNumber = prIdx >= 0 ? parseInt(flags[prIdx + 1], 10) : 0;
 const discussionIdx = flags.indexOf("--discussion");
 const discussionUrl = discussionIdx >= 0 ? flags[discussionIdx + 1] : "";
+const periodStartIdx = flags.indexOf("--period-start");
+const periodStart = periodStartIdx >= 0 ? flags[periodStartIdx + 1] : "";
+const periodEndIdx = flags.indexOf("--period-end");
+const periodEnd = periodEndIdx >= 0 ? flags[periodEndIdx + 1] : "";
 
 // ─── Task Commands ───────────────────────────────────────────────────
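The new `--period-start` and `--period-end` flags are parsed with the same `indexOf`-then-next-element pattern already used for `--pr` and `--discussion`. A self-contained sketch of that pattern against a made-up argv:

```javascript
// Same indexOf-based flag parsing as bin/agentic-lib.js, factored into a
// helper and run on an invented argv. args[0] is the command; the rest
// are flags, exactly as in the CLI.
const args = ["report", "--period-start", "2026-01-01T00:00:00Z", "--period-end", "2026-01-02T00:00:00Z"];
const flags = args.slice(1);

// Find a flag and return the element immediately after it, or "" if absent.
const flagValue = (name) => {
  const idx = flags.indexOf(name);
  return idx >= 0 ? flags[idx + 1] : "";
};

const periodStart = flagValue("--period-start"); // → 2026-01-01T00:00:00Z
const periodEnd = flagValue("--period-end");     // → 2026-01-02T00:00:00Z
```

Note that, like the original, this takes whatever element follows the flag, so a flag given without a value would silently consume the next flag as its value.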
package/src/actions/agentic-step/action.yml
CHANGED
@@ -13,7 +13,7 @@ inputs:
   task:
     description: >
       The task to perform. One of: resolve-issue, fix-code, transform, maintain-features,
-      maintain-library, enhance-issue, review-issue, discussions, supervise
+      maintain-library, enhance-issue, review-issue, discussions, supervise, report
     required: true
   config:
     description: "Path to agentic-lib.toml configuration file"
@@ -50,6 +50,14 @@ inputs:
     description: "Copilot SDK model to use"
     required: false
     default: "gpt-5-mini"
+  period-start:
+    description: "Report period start (ISO 8601, default: most recent init run)"
+    required: false
+    default: ""
+  period-end:
+    description: "Report period end (ISO 8601, default: now)"
+    required: false
+    default: ""
 
 outputs:
   result:
@@ -66,6 +74,8 @@ outputs:
     description: "Argument for the chosen action (free text)"
   narrative:
     description: "One-sentence English narrative of what the task did and why"
+  report-content:
+    description: "Generated report content (report task only)"
 
 runs:
   using: "node24"
package/src/actions/agentic-step/index.js
CHANGED
@@ -32,13 +32,14 @@ import { discussions } from "./tasks/discussions.js";
 import { supervise } from "./tasks/supervise.js";
 import { direct } from "./tasks/direct.js";
 import { implementationReview } from "./tasks/implementation-review.js";
+import { report } from "./tasks/report.js";
 
 const TASKS = {
   "resolve-issue": resolveIssue, "fix-code": fixCode, "transform": transform,
   "maintain-features": maintainFeatures, "maintain-library": maintainLibrary,
   "enhance-issue": enhanceIssue, "review-issue": reviewIssue,
   "discussions": discussions, "supervise": supervise, "direct": direct,
-  "implementation-review": implementationReview,
+  "implementation-review": implementationReview, "report": report,
 };
 
 async function run() {
@@ -88,6 +89,8 @@ async function run() {
     discussionUrl: core.getInput("discussion-url"),
     commentNodeId: core.getInput("comment-node-id"),
     commentCreatedAt: core.getInput("comment-created-at"),
+    periodStart: core.getInput("period-start"),
+    periodEnd: core.getInput("period-end"),
     octokit: github.getOctokit(process.env.GITHUB_TOKEN),
     repo: github.context.repo, github: github.context,
     logFilePath, screenshotFilePath,
@@ -102,7 +105,7 @@ async function run() {
 
   // Set outputs
   core.setOutput("result", result.outcome || "completed");
-  for (const [key, field] of [["pr-number", "prNumber"], ["tokens-used", "tokensUsed"], ["model", "model"], ["action", "action"], ["action-arg", "actionArg"], ["narrative", "narrative"]]) {
+  for (const [key, field] of [["pr-number", "prNumber"], ["tokens-used", "tokensUsed"], ["model", "model"], ["action", "action"], ["action-arg", "actionArg"], ["narrative", "narrative"], ["report-content", "reportContent"]]) {
     if (result[field]) core.setOutput(key, String(result[field]));
   }
 
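The rewritten output loop above pairs each kebab-case action output with a camelCase field on the task result, and skips falsy fields, so `report-content` is only set when the report task actually produced content. A small sketch of the same pattern (the sample `result` values are invented):

```javascript
// Sketch of the output-mapping loop in index.js: kebab-case output name
// paired with a camelCase result field; falsy fields are skipped.
const result = { prNumber: 42, narrative: "did a thing", reportContent: "" };
const outputs = {};
const setOutput = (key, value) => { outputs[key] = value; }; // stand-in for core.setOutput

for (const [key, field] of [["pr-number", "prNumber"], ["narrative", "narrative"], ["report-content", "reportContent"]]) {
  if (result[field]) setOutput(key, String(result[field]));
}
// outputs now has "pr-number" and "narrative"; "report-content" was empty
// and therefore skipped.
```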
package/src/actions/agentic-step/tasks/report.js
ADDED
@@ -0,0 +1,690 @@
+// SPDX-License-Identifier: GPL-3.0-only
+// Copyright (C) 2025-2026 Polycode Limited
+// tasks/report.js — Benchmark report: mechanical data gathering + optional LLM enrichment
+//
+// Gathers configuration, state, workflow runs, commits, issues, PRs, source code,
+// test files, agent logs, screenshots, and website HTML for a given time period.
+// Produces a structured markdown report comparable to what a Claude Code session
+// would write following ITERATION_BENCHMARKS_SIMPLE.md.
+
+import * as core from "@actions/core";
+import { existsSync, readFileSync, readdirSync } from "fs";
+import { readOptionalFile, scanDirectory, NARRATIVE_INSTRUCTION } from "../copilot.js";
+import { runCopilotSession } from "../../../copilot/copilot-session.js";
+import { createGitHubTools, createGitTools } from "../../../copilot/github-tools.js";
+
+/**
+ * Discover the most recent init workflow run timestamp via GitHub API.
+ */
+async function findLatestInitRun(octokit, owner, repo) {
+  try {
+    const { data } = await octokit.rest.actions.listWorkflowRunsForRepo({
+      owner, repo, per_page: 20,
+    });
+    const initRuns = data.workflow_runs
+      .filter(r => r.name && r.name.includes("agentic-lib-init"))
+      .sort((a, b) => new Date(b.created_at) - new Date(a.created_at));
+    if (initRuns.length > 0) return initRuns[0].created_at;
+  } catch (err) {
+    core.warning(`Could not find init runs: ${err.message}`);
+  }
+  return new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
+}
+
+/**
+ * List workflow runs in the period.
+ */
+async function listWorkflowRuns(octokit, owner, repo, since, until) {
+  const runs = [];
+  try {
+    const { data } = await octokit.rest.actions.listWorkflowRunsForRepo({
+      owner, repo, per_page: 50, created: `${since}..${until}`,
+    });
+    for (const r of data.workflow_runs) {
+      runs.push({
+        id: r.id, name: r.name, status: r.status, conclusion: r.conclusion,
+        created_at: r.created_at, updated_at: r.updated_at,
+        html_url: r.html_url,
+      });
+    }
+  } catch (err) {
+    core.warning(`Could not list workflow runs: ${err.message}`);
+  }
+  return runs;
+}
+
+/**
+ * List commits in the period.
+ */
+async function listCommits(octokit, owner, repo, since, until) {
+  const commits = [];
+  try {
+    const { data } = await octokit.rest.repos.listCommits({
+      owner, repo, since, until, per_page: 100,
+    });
+    for (const c of data) {
+      commits.push({
+        sha: c.sha.substring(0, 8),
+        message: c.commit.message.split("\n")[0],
+        author: c.commit.author?.name || "unknown",
+        date: c.commit.author?.date || "",
+      });
+    }
+  } catch (err) {
+    core.warning(`Could not list commits: ${err.message}`);
+  }
+  return commits;
+}
+
+/**
+ * List issues (open + recently closed) with bodies for context.
+ */
+async function listIssues(octokit, owner, repo, since) {
+  const issues = [];
+  for (const state of ["open", "closed"]) {
+    try {
+      const { data } = await octokit.rest.issues.listForRepo({
+        owner, repo, state, since, per_page: 50, sort: "created", direction: "desc",
+      });
+      for (const i of data) {
+        if (i.pull_request) continue;
+        issues.push({
+          number: i.number, state: i.state, title: i.title,
+          labels: i.labels.map(l => l.name).join(", "),
+          created_at: i.created_at, closed_at: i.closed_at,
+          body: i.body ? i.body.substring(0, 500) : "",
+        });
+      }
+    } catch { /* ignore */ }
+  }
+  return issues;
+}
+
+/**
+ * List PRs (merged + open) with diff stats.
+ */
+async function listPullRequests(octokit, owner, repo, since) {
+  const prs = [];
+  for (const state of ["closed", "open"]) {
+    try {
+      const { data } = await octokit.rest.pulls.list({
+        owner, repo, state, per_page: 30, sort: "created", direction: "desc",
+      });
+      for (const p of data) {
+        if (state === "closed" && !p.merged_at) continue;
+        if (new Date(p.created_at) < new Date(since)) continue;
+        prs.push({
+          number: p.number, title: p.title, state: p.state,
+          branch: p.head?.ref || "", merged_at: p.merged_at,
+          created_at: p.created_at,
+          additions: p.additions || 0, deletions: p.deletions || 0,
+          changed_files: p.changed_files || 0,
+        });
+      }
+    } catch { /* ignore */ }
+  }
+  return prs;
+}
+
+/**
+ * Read source file contents (with size limit per file).
+ */
+function readSourceFiles(config) {
+  const files = [];
+  const MAX_CHARS = 5000;
+  const srcPath = config.paths?.source?.path || "src/lib/main.js";
+  const srcDir = "src/lib";
+
+  // Collect all source files
+  const filePaths = [];
+  if (existsSync(srcPath)) filePaths.push(srcPath);
+  if (existsSync(srcDir)) {
+    try {
+      for (const f of readdirSync(srcDir)) {
+        const fp = `${srcDir}/${f}`;
+        if (fp === srcPath) continue;
+        if (f.endsWith(".js") || f.endsWith(".ts")) filePaths.push(fp);
+      }
+    } catch { /* ignore */ }
+  }
+
+  for (const fp of filePaths) {
+    try {
+      const content = readFileSync(fp, "utf8");
+      files.push({
+        file: fp,
+        lines: content.split("\n").length,
+        content: content.length > MAX_CHARS
+          ? content.substring(0, MAX_CHARS) + `\n... (truncated at ${MAX_CHARS} chars, total ${content.length})`
+          : content,
+      });
+    } catch { /* ignore */ }
+  }
+  return files;
+}
+
+/**
+ * Read test file contents (with size limit).
+ */
+function readTestFiles() {
+  const files = [];
+  const MAX_CHARS = 3000;
+  for (const dir of ["tests", "tests/unit", "__tests__"]) {
+    if (!existsSync(dir)) continue;
+    try {
+      for (const f of readdirSync(dir)) {
+        if (!f.endsWith(".test.js") && !f.endsWith(".test.ts") && !f.endsWith(".spec.js")) continue;
+        const fp = `${dir}/${f}`;
+        try {
+          const content = readFileSync(fp, "utf8");
+          files.push({
+            file: fp,
+            lines: content.split("\n").length,
+            content: content.length > MAX_CHARS
+              ? content.substring(0, MAX_CHARS) + `\n... (truncated)`
+              : content,
+          });
+        } catch { /* ignore */ }
+      }
+    } catch { /* ignore */ }
+  }
+  return files;
+}
+
+/**
+ * Read agent log file contents (last N logs).
+ */
+function readAgentLogs(logPrefix, maxLogs = 10) {
+  const logDir = logPrefix.includes("/") ? logPrefix.substring(0, logPrefix.lastIndexOf("/")) : ".";
+  const logBase = logPrefix.includes("/") ? logPrefix.substring(logPrefix.lastIndexOf("/") + 1) : logPrefix;
+  const logs = [];
+  try {
+    const allLogs = readdirSync(logDir)
+      .filter(f => f.startsWith(logBase) && f.endsWith(".md"))
+      .sort();
+    // Take the most recent N logs
+    const recent = allLogs.slice(-maxLogs);
+    for (const f of recent) {
+      const fp = logDir === "." ? f : `${logDir}/${f}`;
+      try {
+        const content = readFileSync(fp, "utf8");
+        // Extract key info: first 80 lines
+        const lines = content.split("\n");
+        logs.push({
+          file: f,
+          excerpt: lines.slice(0, 80).join("\n"),
+          totalLines: lines.length,
+        });
+      } catch { /* ignore */ }
+    }
+  } catch { /* ignore */ }
+  return logs;
+}
+
+/**
+ * Extract acceptance criteria from MISSION.md.
+ * Looks for bullet points, numbered lists, or "Acceptance Criteria" sections.
+ */
+function extractAcceptanceCriteria(missionContent) {
+  if (!missionContent) return [];
|
|
230
|
+
const criteria = [];
|
|
231
|
+
const lines = missionContent.split("\n");
|
|
232
|
+
let inCriteria = false;
|
|
233
|
+
for (const line of lines) {
|
|
234
|
+
const lower = line.toLowerCase();
|
|
235
|
+
if (lower.includes("acceptance") || lower.includes("criteria") || lower.includes("requirements")) {
|
|
236
|
+
inCriteria = true;
|
|
237
|
+
continue;
|
|
238
|
+
}
|
|
239
|
+
if (inCriteria && /^#+\s/.test(line) && !lower.includes("criteria")) {
|
|
240
|
+
inCriteria = false;
|
|
241
|
+
}
|
|
242
|
+
if (inCriteria) {
|
|
243
|
+
const match = line.match(/^[\s]*[-*]\s+(.+)/) || line.match(/^[\s]*\d+\.\s+(.+)/);
|
|
244
|
+
if (match) criteria.push(match[1].trim());
|
|
245
|
+
}
|
|
246
|
+
}
|
|
247
|
+
// If no explicit criteria section found, extract all bullet points as potential criteria
|
|
248
|
+
if (criteria.length === 0) {
|
|
249
|
+
for (const line of lines) {
|
|
250
|
+
const match = line.match(/^[\s]*[-*]\s+(.+)/);
|
|
251
|
+
if (match && match[1].length > 10) criteria.push(match[1].trim());
|
|
252
|
+
}
|
|
253
|
+
}
|
|
254
|
+
return criteria;
|
|
255
|
+
}
|
|
256
|
+
|
|
257
|
+
/**
|
|
258
|
+
* Read website HTML if available (placed by workflow).
|
|
259
|
+
*/
|
|
260
|
+
function readWebsiteHtml() {
|
|
261
|
+
const path = "/tmp/website.html";
|
|
262
|
+
if (existsSync(path)) {
|
|
263
|
+
const content = readFileSync(path, "utf8");
|
|
264
|
+
// Extract meaningful text content (strip tags, keep first 2000 chars)
|
|
265
|
+
const textContent = content
|
|
266
|
+
.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
|
|
267
|
+
.replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
|
|
268
|
+
.replace(/<[^>]+>/g, " ")
|
|
269
|
+
.replace(/\s+/g, " ")
|
|
270
|
+
.trim();
|
|
271
|
+
return {
|
|
272
|
+
rawLength: content.length,
|
|
273
|
+
textSummary: textContent.substring(0, 2000),
|
|
274
|
+
hasContent: textContent.length > 100,
|
|
275
|
+
};
|
|
276
|
+
}
|
|
277
|
+
return null;
|
|
278
|
+
}
|
|
279
|
+
|
|
280
|
+
/**
|
|
281
|
+
* Build the mechanical report content.
|
|
282
|
+
*/
|
|
283
|
+
function buildMechanicalReport({
|
|
284
|
+
periodStart, periodEnd, config, stateContent, configContent,
|
|
285
|
+
workflowRuns, commits, issues, prs, sourceFiles, testFiles, agentLogs,
|
|
286
|
+
missionContent, acceptanceCriteria, websiteInfo, hasScreenshot, repo,
|
|
287
|
+
}) {
|
|
288
|
+
const sections = [];
|
|
289
|
+
const now = new Date().toISOString().split("T")[0];
|
|
290
|
+
|
|
291
|
+
sections.push(`# Benchmark Report`);
|
|
292
|
+
sections.push(``);
|
|
293
|
+
sections.push(`**Date**: ${now}`);
|
|
294
|
+
sections.push(`**Repository**: ${repo.owner}/${repo.repo}`);
|
|
295
|
+
sections.push(`**Period**: ${periodStart} → ${periodEnd}`);
|
|
296
|
+
sections.push(`**Generated by**: agentic-lib-report (mechanical + LLM enrichment)`);
|
|
297
|
+
sections.push(``);
|
|
298
|
+
sections.push(`---`);
|
|
299
|
+
|
|
300
|
+
// ── Mission ──
|
|
301
|
+
sections.push(``);
|
|
302
|
+
sections.push(`## Mission`);
|
|
303
|
+
sections.push(``);
|
|
304
|
+
sections.push("```");
|
|
305
|
+
sections.push(missionContent || "(no MISSION.md found)");
|
|
306
|
+
sections.push("```");
|
|
307
|
+
|
|
308
|
+
if (acceptanceCriteria.length > 0) {
|
|
309
|
+
sections.push(``);
|
|
310
|
+
sections.push(`### Extracted Acceptance Criteria`);
|
|
311
|
+
sections.push(``);
|
|
312
|
+
for (let i = 0; i < acceptanceCriteria.length; i++) {
|
|
313
|
+
sections.push(`${i + 1}. ${acceptanceCriteria[i]}`);
|
|
314
|
+
}
|
|
315
|
+
}
|
|
316
|
+
|
|
317
|
+
// ── Configuration snapshot ──
|
|
318
|
+
sections.push(``);
|
|
319
|
+
sections.push(`## Configuration (agentic-lib.toml)`);
|
|
320
|
+
sections.push(``);
|
|
321
|
+
sections.push("```toml");
|
|
322
|
+
sections.push(configContent || "(not found)");
|
|
323
|
+
sections.push("```");
|
|
324
|
+
|
|
325
|
+
// ── State snapshot ──
|
|
326
|
+
sections.push(``);
|
|
327
|
+
sections.push(`## State (agentic-lib-state.toml)`);
|
|
328
|
+
sections.push(``);
|
|
329
|
+
sections.push("```toml");
|
|
330
|
+
sections.push(stateContent || "(not found)");
|
|
331
|
+
sections.push("```");
|
|
332
|
+
|
|
333
|
+
// ── Mission status ──
|
|
334
|
+
sections.push(``);
|
|
335
|
+
sections.push(`## Mission Status`);
|
|
336
|
+
sections.push(``);
|
|
337
|
+
const complete = existsSync("MISSION_COMPLETE.md");
|
|
338
|
+
const failed = existsSync("MISSION_FAILED.md");
|
|
339
|
+
sections.push(`| Signal | Present |`);
|
|
340
|
+
sections.push(`|--------|---------|`);
|
|
341
|
+
sections.push(`| MISSION_COMPLETE.md | ${complete ? "YES" : "NO"} |`);
|
|
342
|
+
sections.push(`| MISSION_FAILED.md | ${failed ? "YES" : "NO"} |`);
|
|
343
|
+
|
|
344
|
+
if (complete) {
|
|
345
|
+
sections.push(``);
|
|
346
|
+
sections.push("```");
|
|
347
|
+
sections.push(readFileSync("MISSION_COMPLETE.md", "utf8").trim());
|
|
348
|
+
sections.push("```");
|
|
349
|
+
}
|
|
350
|
+
if (failed) {
|
|
351
|
+
sections.push(``);
|
|
352
|
+
sections.push("```");
|
|
353
|
+
sections.push(readFileSync("MISSION_FAILED.md", "utf8").trim());
|
|
354
|
+
sections.push("```");
|
|
355
|
+
}
|
|
356
|
+
|
|
357
|
+
// ── Workflow runs (iteration timeline) ──
|
|
358
|
+
sections.push(``);
|
|
359
|
+
sections.push(`## Workflow Runs (${workflowRuns.length})`);
|
|
360
|
+
sections.push(``);
|
|
361
|
+
sections.push(`| # | Name | Conclusion | Started | Duration | URL |`);
|
|
362
|
+
sections.push(`|---|------|------------|---------|----------|-----|`);
|
|
363
|
+
for (let i = 0; i < workflowRuns.length; i++) {
|
|
364
|
+
const r = workflowRuns[i];
|
|
365
|
+
const startMs = new Date(r.created_at).getTime();
|
|
366
|
+
const endMs = new Date(r.updated_at).getTime();
|
|
367
|
+
const durationMin = Math.round((endMs - startMs) / 60000);
|
|
368
|
+
sections.push(`| ${i + 1} | ${r.name} | ${r.conclusion || r.status} | ${r.created_at} | ~${durationMin}min | [${r.id}](${r.html_url}) |`);
|
|
369
|
+
}
|
|
370
|
+
|
|
371
|
+
// ── Pull Requests (transformation evidence) ──
|
|
372
|
+
sections.push(``);
|
|
373
|
+
sections.push(`## Pull Requests (${prs.length})`);
|
|
374
|
+
sections.push(``);
|
|
375
|
+
sections.push(`| # | Branch | Title | Merged | +/- | Files |`);
|
|
376
|
+
sections.push(`|---|--------|-------|--------|-----|-------|`);
|
|
377
|
+
for (const p of prs) {
|
|
378
|
+
sections.push(`| #${p.number} | ${p.branch} | ${p.title} | ${p.merged_at || "open"} | +${p.additions}/-${p.deletions} | ${p.changed_files} |`);
|
|
379
|
+
}
|
|
380
|
+
|
|
381
|
+
// ── Commits timeline ──
|
|
382
|
+
sections.push(``);
|
|
383
|
+
sections.push(`## Commits Timeline (${commits.length})`);
|
|
384
|
+
sections.push(``);
|
|
385
|
+
sections.push(`| SHA | Date | Author | Message |`);
|
|
386
|
+
sections.push(`|-----|------|--------|---------|`);
|
|
387
|
+
for (const c of commits) {
|
|
388
|
+
sections.push(`| ${c.sha} | ${c.date} | ${c.author} | ${c.message} |`);
|
|
389
|
+
}
|
|
390
|
+
|
|
391
|
+
// ── Issues ──
|
|
392
|
+
sections.push(``);
|
|
393
|
+
sections.push(`## Issues (${issues.length})`);
|
|
394
|
+
sections.push(``);
|
|
395
|
+
sections.push(`| # | State | Labels | Title |`);
|
|
396
|
+
sections.push(`|---|-------|--------|-------|`);
|
|
397
|
+
for (const i of issues) {
|
|
398
|
+
sections.push(`| #${i.number} | ${i.state} | ${i.labels || "-"} | ${i.title} |`);
|
|
399
|
+
}
|
|
400
|
+
|
|
401
|
+
// ── Source Code (full contents) ──
|
|
402
|
+
sections.push(``);
|
|
403
|
+
sections.push(`## Source Code (${sourceFiles.length} files)`);
|
|
404
|
+
for (const s of sourceFiles) {
|
|
405
|
+
sections.push(``);
|
|
406
|
+
sections.push(`### ${s.file} (${s.lines} lines)`);
|
|
407
|
+
sections.push(``);
|
|
408
|
+
sections.push("```javascript");
|
|
409
|
+
sections.push(s.content);
|
|
410
|
+
sections.push("```");
|
|
411
|
+
}
|
|
412
|
+
|
|
413
|
+
// ── Test Files (full contents) ──
|
|
414
|
+
sections.push(``);
|
|
415
|
+
sections.push(`## Test Files (${testFiles.length} files)`);
|
|
416
|
+
for (const t of testFiles) {
|
|
417
|
+
sections.push(``);
|
|
418
|
+
sections.push(`### ${t.file} (${t.lines} lines)`);
|
|
419
|
+
sections.push(``);
|
|
420
|
+
sections.push("```javascript");
|
|
421
|
+
sections.push(t.content);
|
|
422
|
+
sections.push("```");
|
|
423
|
+
}
|
|
424
|
+
|
|
425
|
+
// ── Website & Screenshot ──
|
|
426
|
+
sections.push(``);
|
|
427
|
+
sections.push(`## Website & Screenshot`);
|
|
428
|
+
sections.push(``);
|
|
429
|
+
sections.push(`**Screenshot**: ${hasScreenshot ? "SCREENSHOT_INDEX.png captured (see artifacts)" : "not available"}`);
|
|
430
|
+
sections.push(``);
|
|
431
|
+
if (websiteInfo) {
|
|
432
|
+
sections.push(`**Website** (${websiteInfo.rawLength} bytes, ${websiteInfo.hasContent ? "has content" : "minimal content"}):`);
|
|
433
|
+
sections.push(``);
|
|
434
|
+
sections.push("```");
|
|
435
|
+
sections.push(websiteInfo.textSummary);
|
|
436
|
+
sections.push("```");
|
|
437
|
+
} else {
|
|
438
|
+
sections.push(`**Website**: not fetched`);
|
|
439
|
+
}
|
|
440
|
+
|
|
441
|
+
// ── Agent Log Narratives ──
|
|
442
|
+
sections.push(``);
|
|
443
|
+
sections.push(`## Agent Logs (${agentLogs.length} files)`);
|
|
444
|
+
for (const log of agentLogs) {
|
|
445
|
+
sections.push(``);
|
|
446
|
+
sections.push(`### ${log.file} (${log.totalLines} lines)`);
|
|
447
|
+
sections.push(``);
|
|
448
|
+
sections.push("```");
|
|
449
|
+
sections.push(log.excerpt);
|
|
450
|
+
if (log.totalLines > 80) sections.push(`... (${log.totalLines - 80} more lines)`);
|
|
451
|
+
sections.push("```");
|
|
452
|
+
}
|
|
453
|
+
|
|
454
|
+
// ── README ──
|
|
455
|
+
const readmeContent = readOptionalFile("README.md");
|
|
456
|
+
if (readmeContent) {
|
|
457
|
+
sections.push(``);
|
|
458
|
+
sections.push(`## README.md`);
|
|
459
|
+
sections.push(``);
|
|
460
|
+
const readmeLines = readmeContent.split("\n");
|
|
461
|
+
sections.push(readmeLines.slice(0, 60).join("\n"));
|
|
462
|
+
if (readmeLines.length > 60) sections.push(`\n... (${readmeLines.length - 60} more lines)`);
|
|
463
|
+
}
|
|
464
|
+
|
|
465
|
+
sections.push(``);
|
|
466
|
+
return sections.join("\n");
|
|
467
|
+
}
|
|
468
|
+
|
|
469
|
+
/**
|
|
470
|
+
* Report task handler.
|
|
471
|
+
*/
|
|
472
|
+
export async function report(context) {
|
|
473
|
+
const { config, octokit, repo, model } = context;
|
|
474
|
+
const owner = repo.owner;
|
|
475
|
+
const repoName = repo.repo;
|
|
476
|
+
|
|
477
|
+
// Resolve period
|
|
478
|
+
let periodStart = context.periodStart || "";
|
|
479
|
+
let periodEnd = context.periodEnd || new Date().toISOString();
|
|
480
|
+
if (!periodStart) {
|
|
481
|
+
periodStart = await findLatestInitRun(octokit, owner, repoName);
|
|
482
|
+
core.info(`period-start defaulted to latest init run: ${periodStart}`);
|
|
483
|
+
}
|
|
484
|
+
core.info(`Report period: ${periodStart} → ${periodEnd}`);
|
|
485
|
+
|
|
486
|
+
// Read config and state files
|
|
487
|
+
const configPath = config._configPath || "agentic-lib.toml";
|
|
488
|
+
const configContent = readOptionalFile(configPath);
|
|
489
|
+
const stateContent = readOptionalFile("agentic-lib-state.toml");
|
|
490
|
+
const missionContent = readOptionalFile("MISSION.md");
|
|
491
|
+
const acceptanceCriteria = extractAcceptanceCriteria(missionContent || "");
|
|
492
|
+
|
|
493
|
+
// Gather data from GitHub API
|
|
494
|
+
const workflowRuns = await listWorkflowRuns(octokit, owner, repoName, periodStart, periodEnd);
|
|
495
|
+
const commits = await listCommits(octokit, owner, repoName, periodStart, periodEnd);
|
|
496
|
+
const issues = await listIssues(octokit, owner, repoName, periodStart);
|
|
497
|
+
const prs = await listPullRequests(octokit, owner, repoName, periodStart);
|
|
498
|
+
|
|
499
|
+
// Gather local data: full file contents, not just stats
|
|
500
|
+
const sourceFiles = readSourceFiles(config);
|
|
501
|
+
const testFiles = readTestFiles();
|
|
502
|
+
const logPrefix = config.intentionBot?.logPrefix || "agent-log-";
|
|
503
|
+
const agentLogs = readAgentLogs(logPrefix);
|
|
504
|
+
const websiteInfo = readWebsiteHtml();
|
|
505
|
+
const hasScreenshot = existsSync("SCREENSHOT_INDEX.png");
|
|
506
|
+
|
|
507
|
+
core.info(`Gathered: ${workflowRuns.length} runs, ${commits.length} commits, ${issues.length} issues, ${prs.length} PRs, ${sourceFiles.length} source files, ${testFiles.length} test files, ${agentLogs.length} logs`);
|
|
508
|
+
|
|
509
|
+
// Build mechanical report
|
|
510
|
+
const mechanicalReport = buildMechanicalReport({
|
|
511
|
+
periodStart, periodEnd, config, stateContent, configContent,
|
|
512
|
+
workflowRuns, commits, issues, prs, sourceFiles, testFiles, agentLogs,
|
|
513
|
+
missionContent, acceptanceCriteria, websiteInfo, hasScreenshot, repo,
|
|
514
|
+
});
|
|
515
|
+
|
|
516
|
+
// Optional LLM enrichment (if Copilot token available)
|
|
517
|
+
let enrichedAnalysis = null;
|
|
518
|
+
let tokensUsed = 0;
|
|
519
|
+
let resultModel = model;
|
|
520
|
+
if (process.env.COPILOT_GITHUB_TOKEN) {
|
|
521
|
+
try {
|
|
522
|
+
const agentPrompt = context.instructions || "";
|
|
523
|
+
const tools = [
|
|
524
|
+
...createGitHubTools(octokit, owner, repoName),
|
|
525
|
+
...createGitTools(),
|
|
526
|
+
{
|
|
527
|
+
name: "report_analysis",
|
|
528
|
+
description: "Record your enriched analysis of the benchmark data. Call exactly once.",
|
|
529
|
+
parameters: {
|
|
530
|
+
type: "object",
|
|
531
|
+
properties: {
|
|
532
|
+
summary: { type: "string", description: "Executive summary of the benchmark period" },
|
|
533
|
+
findings: {
|
|
534
|
+
type: "array",
|
|
535
|
+
items: {
|
|
536
|
+
type: "object",
|
|
537
|
+
properties: {
|
|
538
|
+
id: { type: "string" },
|
|
539
|
+
title: { type: "string" },
|
|
540
|
+
severity: { type: "string", enum: ["POSITIVE", "CONCERN", "REGRESSION"] },
|
|
541
|
+
description: { type: "string" },
|
|
542
|
+
},
|
|
543
|
+
required: ["id", "title", "severity", "description"],
|
|
544
|
+
},
|
|
545
|
+
},
|
|
546
|
+
recommendations: { type: "array", items: { type: "string" } },
|
|
547
|
+
acceptance_criteria: {
|
|
548
|
+
type: "array",
|
|
549
|
+
items: {
|
|
550
|
+
type: "object",
|
|
551
|
+
properties: {
|
|
552
|
+
criterion: { type: "string" },
|
|
553
|
+
status: { type: "string", enum: ["PASS", "FAIL", "NOT TESTED"] },
|
|
554
|
+
evidence: { type: "string" },
|
|
555
|
+
},
|
|
556
|
+
required: ["criterion", "status", "evidence"],
|
|
557
|
+
},
|
|
558
|
+
},
|
|
559
|
+
iteration_narrative: { type: "string", description: "Narrative describing each iteration: what happened, what changed, what the agent decided" },
|
|
560
|
+
scenario_summary: {
|
|
561
|
+
type: "object",
|
|
562
|
+
properties: {
|
|
563
|
+
total_iterations: { type: "number" },
|
|
564
|
+
transforms: { type: "number" },
|
|
565
|
+
convergence_iteration: { type: "number", description: "Which iteration reached mission-complete (0 if not reached)" },
|
|
566
|
+
final_source_lines: { type: "number" },
|
|
567
|
+
final_test_count: { type: "number" },
|
|
568
|
+
acceptance_pass_count: { type: "string", description: "e.g. '7/8 PASS'" },
|
|
569
|
+
total_tokens: { type: "number" },
|
|
570
|
+
},
|
|
571
|
+
},
|
|
572
|
+
},
|
|
573
|
+
required: ["summary", "findings", "recommendations"],
|
|
574
|
+
},
|
|
575
|
+
handler: async (args) => {
|
|
576
|
+
enrichedAnalysis = args;
|
|
577
|
+
return "Analysis recorded.";
|
|
578
|
+
},
|
|
579
|
+
},
|
|
580
|
+
];
|
|
581
|
+
|
|
582
|
+
const userPrompt = [
|
|
583
|
+
"Analyse the following mechanical benchmark data and produce an enriched report.",
|
|
584
|
+
"",
|
|
585
|
+
"You MUST:",
|
|
586
|
+
"1. Verify each acceptance criterion by reading the actual source code (use read_file).",
|
|
587
|
+
"2. For each workflow run, determine if it produced a transform (check merged PRs in the same time window).",
|
|
588
|
+
"3. Read issue bodies (use get_issue) to understand what work was requested and completed.",
|
|
589
|
+
"4. Produce a narrative timeline: for each iteration, what happened, what changed, what the agent decided.",
|
|
590
|
+
"5. Assess code quality by reading the source — is it clean, correct, well-tested?",
|
|
591
|
+
"",
|
|
592
|
+
"=== MECHANICAL DATA ===",
|
|
593
|
+
mechanicalReport,
|
|
594
|
+
].join("\n");
|
|
595
|
+
|
|
596
|
+
const result = await runCopilotSession({
|
|
597
|
+
workspacePath: ".",
|
|
598
|
+
model: model || config.model || "gpt-5-mini",
|
|
599
|
+
tuning: config.tuning || {},
|
|
600
|
+
timeoutMs: 180000,
|
|
601
|
+
agentPrompt,
|
|
602
|
+
userPrompt,
|
|
603
|
+
tools,
|
|
604
|
+
});
|
|
605
|
+
tokensUsed = result.tokensUsed || 0;
|
|
606
|
+
resultModel = result.model || model;
|
|
607
|
+
} catch (err) {
|
|
608
|
+
core.warning(`LLM enrichment failed (report will be mechanical only): ${err.message}`);
|
|
609
|
+
}
|
|
610
|
+
}
|
|
611
|
+
|
|
612
|
+
// Compose final report
|
|
613
|
+
let finalReport = mechanicalReport;
|
|
614
|
+
if (enrichedAnalysis) {
|
|
615
|
+
const enrichedSections = [];
|
|
616
|
+
enrichedSections.push(``);
|
|
617
|
+
enrichedSections.push(`---`);
|
|
618
|
+
enrichedSections.push(``);
|
|
619
|
+
enrichedSections.push(`## Analysis (LLM-enriched)`);
|
|
620
|
+
enrichedSections.push(``);
|
|
621
|
+
enrichedSections.push(enrichedAnalysis.summary || "");
|
|
622
|
+
enrichedSections.push(``);
|
|
623
|
+
|
|
624
|
+
if (enrichedAnalysis.iteration_narrative) {
|
|
625
|
+
enrichedSections.push(`### Iteration Narrative`);
|
|
626
|
+
enrichedSections.push(``);
|
|
627
|
+
enrichedSections.push(enrichedAnalysis.iteration_narrative);
|
|
628
|
+
enrichedSections.push(``);
|
|
629
|
+
}
|
|
630
|
+
|
|
631
|
+
if (enrichedAnalysis.acceptance_criteria?.length) {
|
|
632
|
+
enrichedSections.push(`### Acceptance Criteria`);
|
|
633
|
+
enrichedSections.push(``);
|
|
634
|
+
enrichedSections.push(`| Criterion | Status | Evidence |`);
|
|
635
|
+
enrichedSections.push(`|-----------|--------|----------|`);
|
|
636
|
+
for (const ac of enrichedAnalysis.acceptance_criteria) {
|
|
637
|
+
enrichedSections.push(`| ${ac.criterion} | ${ac.status} | ${ac.evidence} |`);
|
|
638
|
+
}
|
|
639
|
+
enrichedSections.push(``);
|
|
640
|
+
}
|
|
641
|
+
|
|
642
|
+
if (enrichedAnalysis.scenario_summary) {
|
|
643
|
+
const s = enrichedAnalysis.scenario_summary;
|
|
644
|
+
enrichedSections.push(`### Scenario Summary`);
|
|
645
|
+
enrichedSections.push(``);
|
|
646
|
+
enrichedSections.push(`| Metric | Value |`);
|
|
647
|
+
enrichedSections.push(`|--------|-------|`);
|
|
648
|
+
if (s.total_iterations != null) enrichedSections.push(`| Total iterations | ${s.total_iterations} |`);
|
|
649
|
+
if (s.transforms != null) enrichedSections.push(`| Transforms | ${s.transforms} |`);
|
|
650
|
+
if (s.convergence_iteration) enrichedSections.push(`| Convergence | Iteration ${s.convergence_iteration} |`);
|
|
651
|
+
if (s.final_source_lines) enrichedSections.push(`| Final source lines | ${s.final_source_lines} |`);
|
|
652
|
+
if (s.final_test_count) enrichedSections.push(`| Final test count | ${s.final_test_count} |`);
|
|
653
|
+
if (s.acceptance_pass_count) enrichedSections.push(`| Acceptance criteria | ${s.acceptance_pass_count} |`);
|
|
654
|
+
if (s.total_tokens) enrichedSections.push(`| Total tokens | ${s.total_tokens} |`);
|
|
655
|
+
enrichedSections.push(``);
|
|
656
|
+
}
|
|
657
|
+
|
|
658
|
+
if (enrichedAnalysis.findings?.length) {
|
|
659
|
+
enrichedSections.push(`### Findings`);
|
|
660
|
+
enrichedSections.push(``);
|
|
661
|
+
for (const f of enrichedAnalysis.findings) {
|
|
662
|
+
enrichedSections.push(`#### ${f.id}: ${f.title} (${f.severity})`);
|
|
663
|
+
enrichedSections.push(``);
|
|
664
|
+
enrichedSections.push(f.description);
|
|
665
|
+
enrichedSections.push(``);
|
|
666
|
+
}
|
|
667
|
+
}
|
|
668
|
+
|
|
669
|
+
if (enrichedAnalysis.recommendations?.length) {
|
|
670
|
+
enrichedSections.push(`### Recommendations`);
|
|
671
|
+
enrichedSections.push(``);
|
|
672
|
+
for (let i = 0; i < enrichedAnalysis.recommendations.length; i++) {
|
|
673
|
+
enrichedSections.push(`${i + 1}. ${enrichedAnalysis.recommendations[i]}`);
|
|
674
|
+
}
|
|
675
|
+
enrichedSections.push(``);
|
|
676
|
+
}
|
|
677
|
+
|
|
678
|
+
finalReport += enrichedSections.join("\n");
|
|
679
|
+
}
|
|
680
|
+
|
|
681
|
+
const narrative = `Generated benchmark report for ${repo.owner}/${repo.repo} covering ${periodStart} to ${periodEnd} (${workflowRuns.length} runs, ${commits.length} commits, ${issues.length} issues)`;
|
|
682
|
+
|
|
683
|
+
return {
|
|
684
|
+
outcome: "report-generated",
|
|
685
|
+
narrative,
|
|
686
|
+
reportContent: finalReport,
|
|
687
|
+
tokensUsed,
|
|
688
|
+
model: resultModel,
|
|
689
|
+
};
|
|
690
|
+
}
|
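The criteria-extraction heuristic in `extractAcceptanceCriteria` can be exercised in isolation. The sketch below is a standalone copy of that logic (the fallback bullet scan included) run against an invented MISSION.md; the sample text and the `extractCriteria` name are illustrative, not part of the package.

```javascript
// Standalone mirror of the extractAcceptanceCriteria heuristic, with an invented sample mission.
function extractCriteria(missionContent) {
  if (!missionContent) return [];
  const criteria = [];
  const lines = missionContent.split("\n");
  let inCriteria = false;
  for (const line of lines) {
    const lower = line.toLowerCase();
    // A line mentioning "acceptance"/"criteria"/"requirements" opens the section
    if (lower.includes("acceptance") || lower.includes("criteria") || lower.includes("requirements")) {
      inCriteria = true;
      continue;
    }
    // Any later heading that does not mention "criteria" closes it
    if (inCriteria && /^#+\s/.test(line) && !lower.includes("criteria")) {
      inCriteria = false;
    }
    if (inCriteria) {
      const match = line.match(/^[\s]*[-*]\s+(.+)/) || line.match(/^[\s]*\d+\.\s+(.+)/);
      if (match) criteria.push(match[1].trim());
    }
  }
  // Fallback: with no explicit section, treat all longer bullets as candidate criteria
  if (criteria.length === 0) {
    for (const line of lines) {
      const match = line.match(/^[\s]*[-*]\s+(.+)/);
      if (match && match[1].length > 10) criteria.push(match[1].trim());
    }
  }
  return criteria;
}

const sampleMission = [
  "# Mission",
  "Build a CLI tool.",
  "",
  "## Acceptance Criteria",
  "- CLI prints version with --version",
  "- Exits non-zero on bad input",
  "",
  "## Notes",
  "- This bullet sits outside the criteria section",
].join("\n");

const criteria = extractCriteria(sampleMission);
// criteria → ["CLI prints version with --version", "Exits non-zero on bad input"]
```

Note the section closes at the first heading without "criteria" in it, so the bullet under "## Notes" is ignored.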
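The tag-stripping pipeline inside `readWebsiteHtml` is a regex heuristic, not a real HTML parser; a minimal sketch of the same strip-and-collapse steps applied to an invented page (the `html` sample is not from the package):

```javascript
// Same strip-and-collapse steps used by readWebsiteHtml, on a small invented page.
const html =
  "<html><head><style>body{color:red}</style>" +
  "<script>var tracked = true;</script></head>" +
  "<body><h1>Demo App</h1><p>It works.</p></body></html>";

const text = html
  .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "") // drop script blocks entirely
  .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")   // drop style blocks entirely
  .replace(/<[^>]+>/g, " ")                         // replace remaining tags with spaces
  .replace(/\s+/g, " ")                             // collapse runs of whitespace
  .trim();

// text → "Demo App It works."
```

Replacing tags with spaces (rather than empty strings) keeps adjacent text nodes from fusing; the final collapse then normalises the gaps.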