agent-regression-lab 0.3.0 → 0.4.0

@@ -0,0 +1,502 @@
+ # Phase One Npm Tools Implementation Plan
+
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+ **Goal:** Finish the Phase 1 friction-removal work by adding npm-installable tool support, shipping package-style example tools, and making the `init` and docs flow work cleanly for `npx` users.
+
+ **Architecture:** Keep existing repo-local `modulePath` tool loading intact and add an explicit package-based loading path via `package`. Validate that each tool registration uses exactly one source, resolve file-backed tools from the project root, and resolve package-backed tools through normal Node package resolution from the current project. Ship minimal example packages in-repo to demonstrate the supported authoring shape and update `init` plus docs to make the new flow the default recommendation for installed usage.
+
+ **Tech Stack:** TypeScript, Node.js ESM dynamic import, YAML config, node:test, tsx
+
+ ---
+
+ ## File Map
+
+ **Core types and config**
+ - Modify: `src/types.ts`
+ - Modify: `src/config.ts`
+
+ **Tool loading**
+ - Modify: `src/tools.ts`
+
+ **Init flow**
+ - Modify: `src/init.ts`
+
+ **Docs**
+ - Modify: `README.md`
+ - Modify: `docs/tools.md`
+ - Modify: `docs/troubleshooting.md`
+ - Modify: `.claude/project.md`
+ - Modify: `.claude/active-tasks.md`
+
+ **Examples**
+ - Create: `examples/support-tools/package.json`
+ - Create: `examples/support-tools/index.js`
+ - Create: `examples/support-tools/README.md`
+ - Create: `examples/coding-tools/package.json`
+ - Create: `examples/coding-tools/index.js`
+ - Create: `examples/coding-tools/README.md`
+
+ **Tests**
+ - Modify: `tests/config.tier-one.test.ts`
+ - Create: `tests/packageTools.test.ts`
+ - Modify: `tests/cliPackaging.test.ts`
+ - Modify: `tests/init.test.ts`
+
+ ---
+
+ ### Task 1: Add Explicit Package-Based Tool Registration
+
+ **Files:**
+ - Modify: `src/types.ts`
+ - Modify: `src/config.ts`
+ - Modify: `tests/config.tier-one.test.ts`
+
+ - [ ] **Step 1: Write failing config tests for package-backed tools**
+
+ Add tests covering:
+
+ ```ts
+ test("config accepts a package-backed tool registration", () => {
+   // tool has package + exportName and no modulePath
+ });
+
+ test("config rejects a tool with both modulePath and package", () => {
+   assert.throws(() => loadAgentLabConfig(), /exactly one of 'modulePath' or 'package'/i);
+ });
+
+ test("config rejects a tool with neither modulePath nor package", () => {
+   assert.throws(() => loadAgentLabConfig(), /exactly one of 'modulePath' or 'package'/i);
+ });
+ ```
+
+ - [ ] **Step 2: Run the config tests to verify failure**
+
+ Run: `npx tsx --test tests/config.tier-one.test.ts`
+
+ Expected: FAIL because `ToolRegistration` and `validateToolRegistration()` currently require `modulePath`.
+
+ - [ ] **Step 3: Extend the tool registration type**
+
+ Update `src/types.ts`:
+
+ ```ts
+ export type ToolRegistration = ToolSpec & {
+   modulePath?: string;
+   package?: string;
+   exportName?: string;
+ };
+ ```
+
+ - [ ] **Step 4: Update config validation for explicit source selection**
+
+ Update `validateToolRegistration()` in `src/config.ts` so it enforces:
+
+ ```ts
+ const hasModulePath = typeof value.modulePath === "string" && value.modulePath.length > 0;
+ const hasPackage = typeof value.package === "string" && value.package.length > 0;
+
+ if ((hasModulePath ? 1 : 0) + (hasPackage ? 1 : 0) !== 1) {
+   throw new Error(`Tool '${value.name}' must define exactly one of 'modulePath' or 'package'.`);
+ }
+ ```
+
+ Retain:
+
+ - required `name`
+ - required `exportName`
+ - required `description`
+ - required `inputSchema`
+
+ Enforce the repo-boundary and existence checks only when `modulePath` is used.
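
The source-selection rule in Step 4 can be exercised in isolation. The following is a minimal sketch, not the project's `validateToolRegistration()`: the `ToolSourceFields` type and the `resolveToolSource` helper are hypothetical names introduced here, and only the error message is taken from the plan.

```typescript
// Hypothetical, trimmed-down shape. The real ToolRegistration also carries
// description, inputSchema, and exportName.
type ToolSourceFields = { name: string; modulePath?: unknown; package?: unknown };

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Returns which source a registration uses, or throws when zero or two are set.
function resolveToolSource(tool: ToolSourceFields): "modulePath" | "package" {
  const hasModulePath = isNonEmptyString(tool.modulePath);
  const hasPackage = isNonEmptyString(tool.package);
  // Equal flags mean either both sources or neither source was provided.
  if (hasModulePath === hasPackage) {
    throw new Error(`Tool '${tool.name}' must define exactly one of 'modulePath' or 'package'.`);
  }
  return hasModulePath ? "modulePath" : "package";
}

console.log(resolveToolSource({ name: "my.local_tool", modulePath: "./tools/customTool.ts" }));
console.log(
  resolveToolSource({ name: "support.find_duplicate_charge", package: "@agentlab/example-support-tools" }),
);
```

Returning the chosen source is one way to let the caller decide whether the repo-boundary and existence checks apply, since those are only meaningful for `modulePath`.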
+
+ - [ ] **Step 5: Re-run the config tests**
+
+ Run: `npx tsx --test tests/config.tier-one.test.ts`
+
+ Expected: PASS
+
+ - [ ] **Step 6: Commit**
+
+ ```bash
+ git add src/types.ts src/config.ts tests/config.tier-one.test.ts
+ git commit -m "feat: support package-backed tool registrations"
+ ```
+
+ ---
+
+ ### Task 2: Load Tools From Installed Packages
+
+ **Files:**
+ - Modify: `src/tools.ts`
+ - Create: `tests/packageTools.test.ts`
+
+ - [ ] **Step 1: Write failing loader tests for package-backed tools**
+
+ Create `tests/packageTools.test.ts` with coverage for:
+
+ ```ts
+ test("loadToolRegistry loads a package-backed tool from node_modules", async () => {
+   // temp workspace with package tool registration and a stub package in node_modules
+ });
+
+ test("loadToolRegistry surfaces a clear error when package import fails", async () => {
+   // missing package should mention tool name and package name
+ });
+
+ test("loadToolRegistry still loads repo-local modulePath tools", async () => {
+   // regression guard for existing behavior
+ });
+ ```
+
+ - [ ] **Step 2: Run the new loader tests to verify failure**
+
+ Run: `npx tsx --test tests/packageTools.test.ts`
+
+ Expected: FAIL because `loadConfiguredTool()` only imports `modulePath`.
+
+ - [ ] **Step 3: Refactor tool loading into source-specific helpers**
+
+ In `src/tools.ts`, split the loading path:
+
+ ```ts
+ async function loadConfiguredTool(tool: ToolRegistration): Promise<LoadedTool> {
+   const module = tool.package
+     ? await importConfiguredPackageTool(tool)
+     : await importConfiguredFileTool(tool);
+
+   const candidate = module[tool.exportName!];
+   if (typeof candidate !== "function") {
+     throw new Error(`Tool '${tool.name}' export '${tool.exportName}' is not a function.`);
+   }
+
+   return {
+     spec: {
+       name: tool.name,
+       description: tool.description,
+       inputSchema: tool.inputSchema,
+     },
+     handler: candidate as ToolHandler,
+   };
+ }
+ ```
+
+ Add:
+
+ ```ts
+ async function importConfiguredFileTool(tool: ToolRegistration) {
+   const moduleUrl = pathToFileURL(resolve(tool.modulePath!)).href;
+   return await import(moduleUrl);
+ }
+
+ async function importConfiguredPackageTool(tool: ToolRegistration) {
+   try {
+     return await import(tool.package!);
+   } catch (error) {
+     const message = error instanceof Error ? error.message : String(error);
+     throw new Error(`Tool '${tool.name}' failed to load package '${tool.package}': ${message}`);
+   }
+ }
+ ```
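
One point worth checking against the architecture note ("from the current project"): a dynamic `import()` of a bare package name is resolved relative to the module that contains the call, so a loader running from an `npx` cache may not see packages installed in the user's project. The sketch below shows one possible way to anchor resolution at the project root using Node's `createRequire`. The helper names `resolvePackageFromProject` and `createStubWorkspace`, and the `stub-tools` package, are hypothetical. The fixture mirrors the "temp workspace with a stub package in node_modules" described in Step 1.

```typescript
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { createRequire } from "node:module";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical helper: resolve a package entry point as if the lookup started
// at the project root, and return a file URL suitable for dynamic import().
function resolvePackageFromProject(packageName: string, projectRoot: string = process.cwd()): string {
  // createRequire only needs a path inside the project; the file need not exist.
  const projectRequire = createRequire(join(projectRoot, "package.json"));
  return pathToFileURL(projectRequire.resolve(packageName)).href;
}

// Hypothetical test fixture: a temp workspace containing one stub ESM package.
function createStubWorkspace(packageName: string): string {
  const root = mkdtempSync(join(tmpdir(), "agentlab-pkg-"));
  const packageDir = join(root, "node_modules", packageName);
  mkdirSync(packageDir, { recursive: true });
  writeFileSync(
    join(packageDir, "package.json"),
    JSON.stringify({ name: packageName, type: "module", exports: { ".": "./index.js" } }),
  );
  writeFileSync(join(packageDir, "index.js"), "export async function stubTool() { return { ok: true }; }\n");
  return root;
}

const workspace = createStubWorkspace("stub-tools");
const entryUrl = resolvePackageFromProject("stub-tools", workspace);
console.log(entryUrl); // a file:// URL ending in node_modules/stub-tools/index.js
```

A caller would pass the returned URL to `await import(...)`. One caveat of this approach: `require.resolve` applies the `require` export condition, so a package that exposes only an `import` condition would not resolve this way. The example packages in this plan use a plain string export, which works under both conditions.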
+
+ - [ ] **Step 4: Re-run the loader tests**
+
+ Run: `npx tsx --test tests/packageTools.test.ts`
+
+ Expected: PASS
+
+ - [ ] **Step 5: Run existing tool-related regression tests**
+
+ Run: `npx tsx --test tests/benchmarkExpansion.test.ts tests/cliPackaging.test.ts`
+
+ Expected: PASS
+
+ - [ ] **Step 6: Commit**
+
+ ```bash
+ git add src/tools.ts tests/packageTools.test.ts tests/cliPackaging.test.ts
+ git commit -m "feat: load custom tools from installed packages"
+ ```
+
+ ---
+
+ ### Task 3: Update Init To Teach The Package-Tool Path
+
+ **Files:**
+ - Modify: `src/init.ts`
+ - Modify: `tests/init.test.ts`
+
+ - [ ] **Step 1: Write failing init tests**
+
+ Add or extend tests to assert the generated config includes comments for both styles:
+
+ ```ts
+ assert.match(config, /modulePath: \.\/tools\/customTool\.ts/);
+ assert.match(config, /package: "@agentlab\/example-support-tools"/);
+ ```
+
+ Also verify the next-step guidance mentions package installation:
+
+ ```ts
+ assert.match(output, /npm install @agentlab\/example-support-tools/);
+ ```
+
+ - [ ] **Step 2: Run init tests to verify failure**
+
+ Run: `npx tsx --test tests/init.test.ts`
+
+ Expected: FAIL because the current init template only documents repo-local paths.
+
+ - [ ] **Step 3: Update the generated config and next-step guidance**
+
+ Modify `src/init.ts` so `SAMPLE_CONFIG` shows:
+
+ ```yaml
+ # Tools can come from:
+ # 1. repo-local files
+ # 2. installed npm packages
+
+ # tools:
+ #   - name: my.local_tool
+ #     modulePath: ./tools/customTool.ts
+ #     exportName: customTool
+ #
+ #   - name: support.find_duplicate_charge
+ #     package: "@agentlab/example-support-tools"
+ #     exportName: findDuplicateCharge
+ ```
+
+ And update console output to include:
+
+ ```ts
+ console.log("  npm install @agentlab/example-support-tools");
+ console.log("  # then register package-backed tools in agentlab.config.yaml");
+ ```
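
As a quick sanity check that the Step 1 patterns line up with the Step 3 template, the sketch below runs the same regular expressions against excerpts copied from this plan. The excerpt constants and the `findMissingPatterns` helper are hypothetical. The real tests would read the generated config file and the captured CLI output instead.

```typescript
// Assumed excerpts: the commented tool entries from SAMPLE_CONFIG and the
// next-step lines printed by init, both copied from the plan text.
const sampleConfigExcerpt = [
  "# tools:",
  "#   - name: my.local_tool",
  "#     modulePath: ./tools/customTool.ts",
  "#     exportName: customTool",
  "#",
  "#   - name: support.find_duplicate_charge",
  '#     package: "@agentlab/example-support-tools"',
  "#     exportName: findDuplicateCharge",
].join("\n");

const initOutputExcerpt = [
  "  npm install @agentlab/example-support-tools",
  "  # then register package-backed tools in agentlab.config.yaml",
].join("\n");

// Hypothetical helper: list the patterns that a text fails to match.
function findMissingPatterns(text: string, patterns: RegExp[]): RegExp[] {
  return patterns.filter((pattern) => !pattern.test(text));
}

const missingFromConfig = findMissingPatterns(sampleConfigExcerpt, [
  /modulePath: \.\/tools\/customTool\.ts/,
  /package: "@agentlab\/example-support-tools"/,
]);
const missingFromOutput = findMissingPatterns(initOutputExcerpt, [
  /npm install @agentlab\/example-support-tools/,
]);

console.log(missingFromConfig.length, missingFromOutput.length); // prints "0 0"
```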
+
+ - [ ] **Step 4: Re-run init tests**
+
+ Run: `npx tsx --test tests/init.test.ts`
+
+ Expected: PASS
+
+ - [ ] **Step 5: Commit**
+
+ ```bash
+ git add src/init.ts tests/init.test.ts
+ git commit -m "docs: teach init flow about package-backed tools"
+ ```
+
+ ---
+
+ ### Task 4: Add Minimal Example Tool Packages
+
+ **Files:**
+ - Create: `examples/support-tools/package.json`
+ - Create: `examples/support-tools/index.js`
+ - Create: `examples/support-tools/README.md`
+ - Create: `examples/coding-tools/package.json`
+ - Create: `examples/coding-tools/index.js`
+ - Create: `examples/coding-tools/README.md`
+ - Modify: `tests/packageTools.test.ts` (extend)
+
+ - [ ] **Step 1: Create the support tools example package**
+
+ Create `examples/support-tools/package.json`:
+
+ ```json
+ {
+   "name": "@agentlab/example-support-tools",
+   "private": true,
+   "type": "module",
+   "exports": {
+     ".": "./index.js"
+   }
+ }
+ ```
+
+ Create `examples/support-tools/index.js` with a minimal exported function:
+
+ ```js
+ export async function findDuplicateCharge(input) {
+   const customerId = String(input?.customer_id ?? "");
+   if (!customerId) {
+     throw new Error("customer_id is required");
+   }
+   return { order_id: `dup_${customerId}` };
+ }
+ ```
+
+ - [ ] **Step 2: Create the coding tools example package**
+
+ Create `examples/coding-tools/package.json`:
+
+ ```json
+ {
+   "name": "@agentlab/example-coding-tools",
+   "private": true,
+   "type": "module",
+   "exports": {
+     ".": "./index.js"
+   }
+ }
+ ```
+
+ Create `examples/coding-tools/index.js`:
+
+ ```js
+ export async function readRepoHint(input) {
+   const path = String(input?.path ?? "");
+   if (!path) {
+     throw new Error("path is required");
+   }
+   return { path, hint: "Check the target file before editing." };
+ }
+ ```
+
+ - [ ] **Step 3: Add short READMEs showing intended registration**
+
+ Each README should include an `agentlab.config.yaml` snippet using `package` + `exportName`.
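
Because Task 1 keeps `description` and `inputSchema` required, a README snippet that lists only `name`, `package`, and `exportName` would presumably fail validation. The sketch below spells out a complete package-backed registration as a typed object. The `ToolSpec` shape is inferred from the fields used in this plan, and the JSON-Schema-style `inputSchema` and the description text are assumptions for illustration.

```typescript
// Assumed shapes, reconstructed from the plan. The real definitions live in src/types.ts.
type ToolSpec = {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
};

type ToolRegistration = ToolSpec & {
  modulePath?: string;
  package?: string;
  exportName?: string;
};

// A complete package-backed registration: exactly one source (package),
// plus every field the validator is expected to require.
const supportTool: ToolRegistration = {
  name: "support.find_duplicate_charge",
  description: "Find a duplicate charge for a customer.", // illustrative wording
  inputSchema: {
    // Assumed JSON-Schema-style input description.
    type: "object",
    properties: { customer_id: { type: "string" } },
    required: ["customer_id"],
  },
  package: "@agentlab/example-support-tools",
  exportName: "findDuplicateCharge",
};

console.log(JSON.stringify(supportTool, null, 2));
```

The same fields would map one-to-one onto a YAML entry under `tools:` in `agentlab.config.yaml`.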
+
+ - [ ] **Step 4: Extend package tool tests to verify the example package shape**
+
+ Add a test that imports the example package directly:
+
+ ```ts
+ // pathToFileURL (from node:url) keeps the absolute path importable on Windows.
+ const mod = await import(pathToFileURL(resolve("examples/support-tools/index.js")).href);
+ assert.equal(typeof mod.findDuplicateCharge, "function");
+ ```
+
+ - [ ] **Step 5: Run example-package tests**
+
+ Run: `npx tsx --test tests/packageTools.test.ts`
+
+ Expected: PASS
+
+ - [ ] **Step 6: Commit**
+
+ ```bash
+ git add examples/support-tools examples/coding-tools tests/packageTools.test.ts
+ git commit -m "feat: add package-style example tool packages"
+ ```
+
+ ---
+
+ ### Task 5: Update Docs For Npx And Package-Based Tools
+
+ **Files:**
+ - Modify: `README.md`
+ - Modify: `docs/tools.md`
+ - Modify: `docs/troubleshooting.md`
+ - Modify: `.claude/project.md`
+ - Modify: `.claude/active-tasks.md`
+
+ - [ ] **Step 1: Update public docs to describe both tool sources**
+
+ In `docs/tools.md`, change the core framing from:
+
+ ```md
+ Custom tools are registered in `agentlab.config.yaml` and loaded from repo-local JS or TS modules.
+ ```
+
+ to:
+
+ ```md
+ Custom tools are registered in `agentlab.config.yaml` and can be loaded from either repo-local modules or installed npm packages.
+ ```
+
+ Add example blocks for both `modulePath` and `package`.
+
+ - [ ] **Step 2: Update README npx flow**
+
+ Add a package-tool example under the `npx` / install guidance:
+
+ ```bash
+ npm install @agentlab/example-support-tools
+ ```
+
+ And explain that package-backed tools are the preferred extension path for installed and `npx`-based projects.
+
+ - [ ] **Step 3: Update troubleshooting**
+
+ Add failure cases for:
+
+ - missing installed package
+ - wrong `exportName` from a package
+ - specifying both `modulePath` and `package`
+
+ Use direct messages matching the implementation.
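
For reference when writing the troubleshooting entries, the three failure cases map onto the error strings introduced in Tasks 1 and 2. The sketch below collects them in one place. The `ToolLoadFailure` type and `formatToolLoadFailure` function are hypothetical and only the message templates come from the plan. The `cause` text stands in for whatever message Node reports at runtime.

```typescript
// Hypothetical union describing the three documented failure cases.
type ToolLoadFailure =
  | { kind: "missing-package"; tool: string; packageName: string; cause: string }
  | { kind: "wrong-export"; tool: string; exportName: string }
  | { kind: "ambiguous-source"; tool: string };

// Message templates copied from the loader and validator snippets in this plan.
function formatToolLoadFailure(failure: ToolLoadFailure): string {
  switch (failure.kind) {
    case "missing-package":
      return `Tool '${failure.tool}' failed to load package '${failure.packageName}': ${failure.cause}`;
    case "wrong-export":
      return `Tool '${failure.tool}' export '${failure.exportName}' is not a function.`;
    case "ambiguous-source":
      return `Tool '${failure.tool}' must define exactly one of 'modulePath' or 'package'.`;
  }
}

console.log(
  formatToolLoadFailure({
    kind: "wrong-export",
    tool: "support.find_duplicate_charge",
    exportName: "findDuplicateCharg", // deliberate typo to illustrate the case
  }),
);
```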
+
+ - [ ] **Step 4: Update internal phase tracking**
+
+ In `.claude/project.md` and `.claude/active-tasks.md`:
+
+ - mark npm-installable tools complete
+ - keep Phase 1 open only if example packages / starter-repo / README rewrite still remain
+
+ - [ ] **Step 5: Run a docs-adjacent regression smoke**
+
+ Run: `npm run smoke:cli`
+
+ Expected: PASS
+
+ - [ ] **Step 6: Commit**
+
+ ```bash
+ git add README.md docs/tools.md docs/troubleshooting.md .claude/project.md .claude/active-tasks.md
+ git commit -m "docs: document npm-installable tools for npx users"
+ ```
+
+ ---
+
+ ### Task 6: Full Verification For Phase 1 Tooling
+
+ **Files:**
+ - Modify as needed from previous tasks only
+
+ - [ ] **Step 1: Run focused tests**
+
+ Run:
+
+ ```bash
+ npx tsx --test tests/config.tier-one.test.ts tests/packageTools.test.ts tests/init.test.ts tests/cliPackaging.test.ts
+ ```
+
+ Expected: PASS
+
+ - [ ] **Step 2: Run full suite**
+
+ Run:
+
+ ```bash
+ npm test
+ ```
+
+ Expected: PASS
+
+ - [ ] **Step 3: Run full release checks**
+
+ Run:
+
+ ```bash
+ npm run check
+ npm run build
+ npm run smoke:cli
+ npm_config_cache=/tmp/agentlab-npm-cache npm pack --dry-run
+ ```
+
+ Expected: PASS
+
+ - [ ] **Step 4: Reconcile internal status**
+
+ If verification is green:
+
+ - keep `.claude/project.md` Phase 1 marked in progress only if remaining tasks are truly outstanding
+ - otherwise mark the package-tool portion complete explicitly
+
+ - [ ] **Step 5: Final commit**
+
+ ```bash
+ git add -A
+ git commit -m "feat: finish phase-one npm tool support"
+ ```
+
@@ -0,0 +1,164 @@
+ # Phase 2 Lite And Phase 3 Design
+
+ ## Goal
+
+ Compress the original Phase 2 into a minimal integration-story pass, then move immediately into Phase 3 UI/demo polish.
+
+ The intent is to preserve product legibility for new users without spending weeks on broad framework coverage before the product is visually demo-ready.
+
+ ## Why This Change
+
+ The current product is technically credible, but the main remaining gap is not core capability. It is demonstration quality and onboarding clarity.
+
+ Fully skipping Phase 2 would create a prettier product with weaker adoption paths:
+
+ - users would see polished UI but still ask how it fits their workflow
+ - the product would still be perceived as a support-agent-only tool
+ - the README and launch story would still lack recognizable entry points
+
+ Keeping a trimmed Phase 2 solves that without delaying Phase 3 materially.
+
+ ## Recommended Roadmap Change
+
+ Use this ordering:
+
+ 1. `Phase 2-lite`
+ 2. `Phase 3`
+
+ Do not treat Phase 2-lite as a broad integration campaign. Treat it as the minimum viable integration story required to make Phase 3 polish meaningful.
+
+ ## Phase 2 Lite Scope
+
+ Phase 2-lite should deliver only the pieces that make the product legible to new technical users.
+
+ ### Keep
+
+ - `arl-test` as the canonical HTTP/live-agent example
+ - one CI example using `agentlab run --suite-def pre_merge`
+ - one coding-agent example or guide
+ - 2-3 README entry points such as:
+   - start here for HTTP agents
+   - start here for coding agents
+   - start here for CI/pre-merge regression
+
+ ### Skip For Now
+
+ - broad framework integration coverage
+ - multiple framework-specific guides
+ - large scenario-pack system work
+ - marketplace/community work
+ - many hero examples across every ecosystem
+
+ ## Phase 2 Lite Deliverables
+
+ ### 1. Canonical Integration Paths
+
+ The product should have three obvious ways in:
+
+ - HTTP/live service path
+   - anchored by `arl-test`
+ - coding-agent path
+   - enough to prove ARL is not just for support agents
+ - CI path
+   - GitHub Actions example using suite definitions
+
+ These should be recognizable and copy-pasteable.
+
+ ### 2. README Entry Points
+
+ The README should not just describe the architecture. It should route users by workflow.
+
+ Recommended entry sections:
+
+ - “If your agent runs as an HTTP service”
+ - “If you are validating coding-agent changes”
+ - “If you want pre-merge regression checks in CI”
+
+ Each section should point to one canonical example, not many.
+
+ ### 3. Keep Scope Narrow
+
+ Phase 2-lite should avoid product expansion.
+
+ It should mainly be:
+
+ - examples
+ - README routing
+ - one CI workflow example
+ - one extra concrete use-case path beyond HTTP support
+
+ ## Phase 3 Scope
+
+ After Phase 2-lite, Phase 3 becomes the main workstream.
+
+ ### Primary Goal
+
+ Make the product demoable, screenshotable, and easier to understand visually.
+
+ ### Core Work
+
+ - comparison view redesign
+ - clearer red/green regression presentation
+ - better trace visualization
+ - stronger run history/dashboard view
+ - visual polish that feels intentional rather than debug-console minimal
+ - README screenshots or GIFs that show the regression story quickly
+
+ ### Design Constraint
+
+ Phase 3 should improve clarity, not add ornamental UI.
+
+ Every UI change should help users answer one of these questions faster:
+
+ - what changed?
+ - what failed?
+ - where did it fail?
+ - did the candidate regress?
+ - should I trust this run?
+
+ ## Success Criteria
+
+ ### After Phase 2 Lite
+
+ A new technical user can quickly identify:
+
+ - how to use ARL with an HTTP agent
+ - how to use ARL in CI
+ - that ARL can also support coding-agent regression workflows
+
+ ### After Phase 3
+
+ The product should be visually strong enough that:
+
+ - screenshots are worth sharing
+ - demos feel polished
+ - mentors and early users understand the product faster
+ - the UI helps explain value instead of requiring explanation around it
+
+ ## Non-Goals
+
+ This roadmap change does not mean:
+
+ - hosted platform work
+ - broad plugin/framework ecosystem support
+ - marketplace or virality mechanics
+ - replacing core CLI authoring with UI-first configuration
+
+ Those remain later-phase work.
+
+ ## Recommended Execution Order
+
+ 1. update internal roadmap/task tracking to reflect `Phase 2-lite`
+ 2. implement the minimal integration-story assets
+ 3. switch immediately to Phase 3 UI/demo polish
+
+ ## Decision
+
+ Use a compressed integration phase, not a skipped integration phase.
+
+ That is the best tradeoff between:
+
+ - speed
+ - product clarity
+ - demo quality
+ - launch readiness