@m8i-51/shoal 0.1.4 → 0.1.5

package/README.md CHANGED
@@ -1,19 +1,21 @@
1
1
  [Japanese version](README_JA.md)
2
2
 
3
- # shoal
3
+ <p align="center">
4
+ <img src="assets/logo-lockup.svg" alt="shoal" height="72">
5
+ </p>
4
6
 
5
- [![npm](https://img.shields.io/npm/v/@m8i-51/shoal?color=red)](https://www.npmjs.com/package/@m8i-51/shoal)
6
- [![TypeScript](https://img.shields.io/badge/TypeScript-5-blue?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
7
- [![Playwright](https://img.shields.io/badge/Playwright-browser-45ba4b?logo=playwright&logoColor=white)](https://playwright.dev/)
8
- [![Anthropic](https://img.shields.io/badge/Anthropic-Claude-blueviolet?logo=anthropic&logoColor=white)](https://www.anthropic.com/)
7
+ <p align="center">
8
+ <a href="https://www.npmjs.com/package/@m8i-51/shoal"><img src="https://img.shields.io/npm/v/@m8i-51/shoal?color=red" alt="npm"></a>
9
+ <a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/TypeScript-5-blue?logo=typescript&logoColor=white" alt="TypeScript"></a>
10
+ <a href="https://playwright.dev/"><img src="https://img.shields.io/badge/Playwright-browser-45ba4b?logo=playwright&logoColor=white" alt="Playwright"></a>
11
+ <a href="https://www.anthropic.com/"><img src="https://img.shields.io/badge/Anthropic-Claude-blueviolet?logo=anthropic&logoColor=white" alt="Anthropic"></a>
12
+ </p>
9
13
 
10
- Point it at any web app. Agents explore it and file GitHub Issues.
14
+ **AI agents that experience your app and help it grow.**
11
15
 
12
- shoal drops a swarm of agents onto a web app. Each agent has a distinct persona and evaluation lens accessibility, security, business logic, data integrity, new user experience, and goal alignment. They explore independently via API and real browser, then a triage agent deduplicates findings and files GitHub Issues.
16
+ shoal drops a swarm of AI agents onto a web app. Each agent has a distinct persona and explores the app as a real user would — navigating pages, taking actions, noticing friction. They surface bugs, usability issues, missing features, and gaps between what the app does and what it's meant to achieve.
13
17
 
14
- A **web dashboard** lets you start runs, monitor live progress, review findings by category, and track estimated LLM cost per run.
15
-
16
- No test scripts. No test data. No prior knowledge of the app required.
18
+ No test scripts. No test data. No prior knowledge of the app required. Just a URL.
17
19
 
18
20
  ---
19
21
 
@@ -41,6 +43,21 @@ Target App (any URL)
41
43
  Triage Agent
42
44
  ```
43
45
 
46
+ Each agent carries a distinct perspective — accessibility, security, business logic, UI design, new user experience, and more. They operate on a shared understanding of the app's purpose and goals. Coverage is tracked across runs, so each session naturally focuses on areas that haven't been explored yet.
47
+
48
+ ---
49
+
50
+ ## What it finds
51
+
52
+ At the end of each run:
53
+
54
+ - **Bugs** — broken flows, errors, inconsistent data
55
+ - **UX issues** — confusing interactions, dead ends, unclear states
56
+ - **Feature suggestions** — things that would add real value
57
+ - **Goal gaps** — where the app falls short of what it's trying to achieve
58
+
59
+ Findings are filed as GitHub Issues or saved as a self-contained HTML report. A **web dashboard** lets you start runs, watch live progress, review findings by category, and track estimated LLM cost per run.
60
+
44
61
  ---
45
62
 
46
63
  ## Quick Start
@@ -52,7 +69,7 @@ npm install -g @m8i-51/shoal
52
69
  npx playwright install chromium
53
70
  ```
54
71
 
55
- Move to the project you want to test, then run:
72
+ Move to the project you want to explore, then run:
56
73
 
57
74
  ```bash
58
75
  cd your-project
@@ -63,7 +80,7 @@ Open `.env` and set at minimum:
63
80
 
64
81
  ```env
65
82
  ANTHROPIC_API_KEY=sk-ant-...
66
- BASE_URL=http://localhost:3000 # URL of the app to test
83
+ BASE_URL=http://localhost:3000 # URL of the app to explore
67
84
  ```
68
85
 
69
86
  Then run:
@@ -113,6 +130,7 @@ Opens at `http://localhost:4000`. From there you can:
113
130
  | `ANTHROPIC_API_KEY` | — | Required |
114
131
  | `GITHUB_TOKEN` | — | Optional — enables Issue creation |
115
132
  | `GITHUB_REPO` | — | `owner/repo` format |
133
+ | `REFRESH_SPEC` | — | Set to `1` to re-run product discovery |
116
134
 
117
135
  ---
118
136
 
@@ -158,6 +176,23 @@ Alternatively, copy `targets/example.ts`, register it in `targets/index.ts`, and
158
176
 
159
177
  ---
160
178
 
179
+ ## Scheduled runs
180
+
181
+ To run shoal weekly against a staging environment, add a GitHub Actions workflow to your repo.
182
+
183
+ Run `shoal init` — it will offer to generate `.github/workflows/shoal-weekly.yml` automatically. Or copy the example from this repo:
184
+
185
+ ```bash
186
+ curl -O https://raw.githubusercontent.com/m8i-51/shoal/main/.github/workflows/shoal-weekly.example.yml
187
+ mv shoal-weekly.example.yml .github/workflows/shoal-weekly.yml
188
+ ```
189
+
190
+ Then add `ANTHROPIC_API_KEY` to your repo's **Actions secrets** (`Settings → Secrets and variables → Actions`).
191
+
192
+ The workflow runs every Monday at 09:00 UTC and can also be triggered manually from the Actions tab. Findings are filed as GitHub Issues using the built-in `GITHUB_TOKEN`.
193
+
194
+ ---
195
+
161
196
  ## Account Manager
162
197
 
163
198
  For apps that require login, shoal includes an Account Manager agent that autonomously discovers and tests authentication. It finds login pages, tests credentials from `test-accounts/` (gitignored), and injects session state into explorer agents so they can reach authenticated routes.
package/bin/init.js CHANGED
@@ -1,5 +1,5 @@
1
- import { intro, outro, select, text, isCancel, cancel } from "@clack/prompts";
2
- import { writeFileSync, existsSync } from "fs";
1
+ import { intro, outro, select, text, confirm, isCancel, cancel } from "@clack/prompts";
2
+ import { writeFileSync, existsSync, mkdirSync } from "fs";
3
3
  import { join } from "path";
4
4
 
5
5
  const PROVIDERS = [
@@ -102,5 +102,68 @@ export async function runInit(cwd) {
102
102
  const lines = Object.entries(env).map(([k, v]) => `${k}=${v}`);
103
103
  writeFileSync(envPath, lines.join("\n") + "\n", "utf-8");
104
104
 
105
+ // ── GitHub Actions workflow (optional) ────────────────────────────
106
+ const wantsWorkflow = guard(await confirm({
107
+ message: "Generate a GitHub Actions workflow for weekly scheduled runs?",
108
+ initialValue: false,
109
+ }));
110
+
111
+ if (wantsWorkflow) {
112
+ const stagingUrl = guard(await text({
113
+ message: "Staging URL (used as BASE_URL in the workflow)",
114
+ placeholder: "https://staging.example.com",
115
+ validate: (v) => v?.trim() ? undefined : "Required",
116
+ }));
117
+
118
+ const workflowDir = join(cwd, ".github", "workflows");
119
+ const workflowPath = join(workflowDir, "shoal-weekly.yml");
120
+ mkdirSync(workflowDir, { recursive: true });
121
+ writeFileSync(workflowPath, `# shoal weekly run
122
+ #
123
+ # Required secrets: ANTHROPIC_API_KEY
124
+ # Required variables: STAGING_URL is hardcoded below — update as needed
125
+ #
126
+ # GitHub Issues are filed automatically using the built-in GITHUB_TOKEN.
127
+
128
+ name: shoal weekly run
129
+
130
+ on:
131
+ schedule:
132
+ - cron: '0 9 * * 1' # every Monday at 09:00 UTC
133
+ workflow_dispatch: # also allow manual trigger from the Actions tab
134
+
135
+ jobs:
136
+ shoal:
137
+ runs-on: ubuntu-latest
138
+ timeout-minutes: 60
139
+
140
+ steps:
141
+ - uses: actions/checkout@v4
142
+
143
+ - uses: actions/setup-node@v4
144
+ with:
145
+ node-version: '20'
146
+
147
+ - name: Install shoal
148
+ run: npm install -g @m8i-51/shoal
149
+
150
+ - name: Install Playwright browsers
151
+ run: npx playwright install chromium --with-deps
152
+
153
+ - name: Run shoal
154
+ env:
155
+ ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}
156
+ BASE_URL: ${stagingUrl.trim()}
157
+ GITHUB_TOKEN: \${{ secrets.GITHUB_TOKEN }}
158
+ GITHUB_REPO: \${{ github.repository }}
159
+ MAX_BROWSERS: '2'
160
+ MAX_EXPLORERS: '0'
161
+ run: shoal
162
+ `, "utf-8");
163
+
164
+ console.log(`\n Created ${workflowPath}`);
165
+ console.log(" Next: add ANTHROPIC_API_KEY to your repo's Actions secrets");
166
+ }
167
+
105
168
  outro("Created .env\n\n shoal serve — open the dashboard at http://localhost:4000\n shoal — run agents from the terminal");
106
169
  }
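Note on the generated workflow: the body is written from a JavaScript template literal, so the backslash in `\${{ ... }}` is what keeps the GitHub Actions expressions intact, while the unescaped `${stagingUrl.trim()}` is filled in when `shoal init` runs. The following is a minimal sketch of that distinction only; `renderEnvBlock` is a hypothetical helper name and not part of the package.

```typescript
// Hypothetical helper (not in shoal): renders a few `env:` entries of the
// generated workflow to show which `${...}` is expanded at which stage.
function renderEnvBlock(stagingUrl: string): string {
  return [
    // Escaped: the literal text `${{ ... }}` is written to the YAML file
    // and resolved later by GitHub Actions.
    `ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}`,
    // Unescaped: interpolated immediately, at generation time.
    `BASE_URL: ${stagingUrl.trim()}`,
    `GITHUB_REPO: \${{ github.repository }}`,
  ].join("\n");
}

console.log(renderEnvBlock("  https://staging.example.com  "));
```

Because the staging URL is baked into the file as plain text, changing it later means editing the workflow by hand, which matches the "update as needed" comment in the generated header.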
@@ -1,6 +1,6 @@
1
1
  import * as fs from "fs";
2
2
  import * as path from "path";
3
- import type { Finding, RunLog } from "./types";
3
+ import type { Finding, RunLog, RegressionCheck } from "./types";
4
4
  import type { ProductSpec } from "./product-discovery";
5
5
  import type { TriageResult } from "./triage";
6
6
  import type { Scenario, ScenarioOutcome } from "./scenario-designer";
@@ -51,6 +51,10 @@ export function generateReport(
51
51
  ): string {
52
52
  const reportPath = path.join(process.cwd(), "logs", `report_${runLog.runId}.html`);
53
53
 
54
+ const allRegressionChecks: RegressionCheck[] = runLog.agents.flatMap((a) => a.regressionChecks ?? []);
55
+ const fixedChecks = allRegressionChecks.filter((c) => c.status === "fixed");
56
+ const regressedChecks = allRegressionChecks.filter((c) => c.status === "regressed");
57
+
54
58
  const issuedSet = new Set(triageResult.issued);
55
59
  const skippedSet = new Set(triageResult.skipped);
56
60
 
@@ -242,10 +246,28 @@ export function generateReport(
242
246
  <div class="stat-card"><div class="number">${triageResult.skipped.length}</div><div class="label">skipped</div></div>
243
247
  <div class="stat-card"><div class="number">${triageResult.unprocessed.length}</div><div class="label">pending</div></div>
244
248
  <div class="stat-card"><div class="number">${runLog.agents.length}</div><div class="label">agents</div></div>
249
+ ${allRegressionChecks.length > 0 ? `<div class="stat-card"><div class="number" style="color:#22c55e">${fixedChecks.length}</div><div class="label">still fixed</div></div><div class="stat-card"><div class="number" style="color:${regressedChecks.length > 0 ? "#ef4444" : "#94a3b8"}">${regressedChecks.length}</div><div class="label">regressed</div></div>` : ""}
245
250
  </div>
246
251
  <div class="category-bar">${categoryBar || '<div style="width:100%;display:flex;align-items:center;padding:0 .75rem;font-size:.75rem;color:#94a3b8">no findings</div>'}</div>
247
252
  </section>
248
253
 
254
+ ${allRegressionChecks.length > 0 ? `
255
+ <section>
256
+ <h2>Progress (${allRegressionChecks.length} issues checked)</h2>
257
+ ${regressedChecks.length > 0 ? `<p style="color:#ef4444;font-size:.875rem;margin-bottom:.75rem">⚠ ${regressedChecks.length} regression${regressedChecks.length !== 1 ? "s" : ""} detected</p>` : `<p style="color:#22c55e;font-size:.875rem;margin-bottom:.75rem">✓ All previously fixed issues remain resolved</p>`}
258
+ <table>
259
+ <thead><tr><th>#</th><th>Issue</th><th style="text-align:center">Status</th></tr></thead>
260
+ <tbody>
261
+ ${allRegressionChecks.map((c) => `
262
+ <tr>
263
+ <td style="color:#94a3b8">#${c.issueNumber}</td>
264
+ <td>${esc(c.issueTitle)}</td>
265
+ <td style="text-align:center">${c.status === "fixed" ? '<span class="badge" style="background:#22c55e">✓ fixed</span>' : '<span class="badge" style="background:#ef4444">⚠ regressed</span>'}</td>
266
+ </tr>`).join("")}
267
+ </tbody>
268
+ </table>
269
+ </section>` : ""}
270
+
249
271
  <section>
250
272
  <h2>Findings (${findings.length})</h2>
251
273
  ${sortedFindings.length > 0 ? findingCards : "<p style='color:#94a3b8;font-size:.875rem'>No findings collected.</p>"}
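Note on the report hunks above: the new stat cards and the "Progress" table are all derived from one flattened list of per-agent regression checks. A minimal sketch of that tally is shown below. The `RegressionCheck` shape is an assumption inferred from the fields the template reads (`issueNumber`, `issueTitle`, `status`); the real type lives in `./types` and is not part of this diff. `tallyRegressionChecks` and `AgentLog` are hypothetical names.

```typescript
// Assumed shape, inferred from the template; the real type is in ./types.
interface RegressionCheck {
  issueNumber: number;
  issueTitle: string;
  status: "fixed" | "regressed";
}

// Hypothetical stand-in for an entry of runLog.agents.
interface AgentLog {
  regressionChecks?: RegressionCheck[];
}

// Flattens the per-agent checks and counts each status, as the report does.
function tallyRegressionChecks(agents: AgentLog[]) {
  const all = agents.flatMap((a) => a.regressionChecks ?? []);
  return {
    checked: all.length,
    fixed: all.filter((c) => c.status === "fixed").length,
    regressed: all.filter((c) => c.status === "regressed").length,
  };
}

const tally = tallyRegressionChecks([
  { regressionChecks: [{ issueNumber: 12, issueTitle: "Login loop", status: "fixed" }] },
  {}, // an agent that ran no regression checks
  { regressionChecks: [{ issueNumber: 15, issueTitle: "Broken export", status: "regressed" }] },
]);
console.log(tally);
```

Agents without a `regressionChecks` field contribute nothing, so a run with no checks at all yields `checked === 0`, which is the condition the template uses to hide both the stat cards and the table.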
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@m8i-51/shoal",
3
- "version": "0.1.4",
3
+ "version": "0.1.5",
4
4
  "type": "module",
5
5
  "description": "Multi-agent web exploration framework — finds bugs, UX issues, and missing features by running AI agents against your app",
6
6
  "repository": {
package/run.ts CHANGED
@@ -43,6 +43,7 @@ import { estimateCost, formatCostUSD } from "./framework/cost";
43
43
  const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";
44
44
  const GITHUB_TOKEN = process.env.GITHUB_TOKEN ?? "";
45
45
  const GITHUB_REPO = process.env.GITHUB_REPO ?? "";
46
+ const REFRESH_SPEC = process.env.REFRESH_SPEC === "1";
46
47
  const githubOptions = { token: GITHUB_TOKEN, repo: GITHUB_REPO };
47
48
 
48
49
  const TARGET = process.env.TARGET ?? "none";
@@ -110,7 +111,14 @@ const POST_FEEDBACK_TOOL: Tool = {
110
111
  type: "object",
111
112
  properties: {
112
113
  title: { type: "string" },
113
- body: { type: "string" },
114
+ body: {
115
+ type: "string",
116
+ description: `Describe the finding. Tone varies by category:
117
+ - bug: technical — state what happened, what was expected, and steps to reproduce.
118
+ - ux: experiential — write from the user's perspective ("I tried to...", "It was hard to find...", "I got confused when...").
119
+ - feature-request: aspirational — describe what you wished you could do ("It would have been helpful if...", "I wanted to...").
120
+ - goal-gap: goal-oriented — explain which goal was blocked and why ("I was trying to achieve X, but couldn't because...").`,
121
+ },
114
122
  category: { type: "string", enum: ["ux", "feature-request", "bug", "goal-gap"] },
115
123
  },
116
124
  required: ["title", "body", "category"],
@@ -317,6 +325,12 @@ ${productSpec.appDescription}
317
325
  If you notice anything inconvenient, a missing feature, or bug-like behavior,
318
326
  report it with the post_feedback tool.
319
327
 
328
+ When writing the body, match the tone to the category:
329
+ - bug: technical ("The endpoint returned 500 when...", "Expected X but got Y")
330
+ - ux: experiential ("I tried to find the button but...", "It was unclear what would happen if...")
331
+ - feature-request: aspirational ("It would have been useful if...", "I wished I could...")
332
+ - goal-gap: goal-oriented ("I was trying to X, but couldn't because...")
333
+
320
334
  [Implemented Features]
321
335
  ${productSpec.features}
322
336
  ${productSpec.uiFeatures ? `\n[UI-Only Features]\nThese features exist in the UI but may not be reflected in API responses. Keep them in mind when interpreting API results.\n${productSpec.uiFeatures}\n` : ""}${productSpec.designContext ? `\n[Design Context]\n${productSpec.designContext}\n` : ""}${goalsSection(productSpec)}${assignment.scenario
@@ -882,6 +896,12 @@ ${productSpec.appDescription}
882
896
  4. Move to another page and repeat
883
897
  5. Finish after 8–10 actions
884
898
 
899
+ When writing the body, match the tone to the category:
900
+ - bug: technical ("The endpoint returned 500 when...", "Expected X but got Y")
901
+ - ux: experiential ("I tried to find the button but...", "It was unclear what would happen if...")
902
+ - feature-request: aspirational ("It would have been useful if...", "I wished I could...")
903
+ - goal-gap: goal-oriented ("I was trying to X, but couldn't because...")
904
+
885
905
  [Using Observation Tools]
886
906
  - To verify an action was actually applied, call diff_since_last_action
887
907
  - If data isn't reflected or errors appear, call read_network_errors
@@ -1041,8 +1061,13 @@ async function main() {
1041
1061
  const scenarioOutcomes: ScenarioOutcome[] = [];
1042
1062
  try {
1043
1063
  const cached = loadCachedSpec(BASE_URL);
1044
- if (cached) {
1045
- console.log(`\n[product-discovery] using cache (date: ${cached.discoveredAt?.slice(0, 10) ?? "unknown"}, confidence: ${cached.confidence})`);
1064
+ if (cached && !REFRESH_SPEC) {
1065
+ const ageDays = cached.discoveredAt
1066
+ ? Math.floor((Date.now() - new Date(cached.discoveredAt).getTime()) / 86_400_000)
1067
+ : null;
1068
+ const ageStr = ageDays != null ? `${ageDays} day${ageDays !== 1 ? "s" : ""} old` : "unknown date";
1069
+ const staleHint = ageDays != null && ageDays >= 7 ? " — set REFRESH_SPEC=1 to re-run discovery" : "";
1070
+ console.log(`\n[product-discovery] using cache (${ageStr}, confidence: ${cached.confidence})${staleHint}`);
1046
1071
  productSpec = cached;
1047
1072
  } else {
1048
1073
  const discoveryContext = await browser.newContext({ viewport: { width: 1024, height: 640 } });
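Note on the cache message in `main()`: the age label and the `REFRESH_SPEC` hint come from a whole-day difference between now and `discoveredAt`. The sketch below isolates that arithmetic as a pure function so it can be checked in isolation. `describeCacheAge` is a hypothetical name, and passing the current time as `nowMs` is an assumption made here for reproducibility (the diff uses `Date.now()` directly).

```typescript
const MS_PER_DAY = 86_400_000;

// Hypothetical helper mirroring the log-line logic from the hunk above.
function describeCacheAge(discoveredAt: string | undefined, nowMs: number) {
  const ageDays = discoveredAt
    ? Math.floor((nowMs - new Date(discoveredAt).getTime()) / MS_PER_DAY)
    : null;
  const ageStr =
    ageDays != null ? `${ageDays} day${ageDays !== 1 ? "s" : ""} old` : "unknown date";
  // The hint to set REFRESH_SPEC=1 is shown once the cache is a week old.
  const stale = ageDays != null && ageDays >= 7;
  return { ageStr, stale };
}

const nowMs = Date.parse("2024-01-10T00:00:00Z");
console.log(describeCacheAge("2024-01-02T00:00:00Z", nowMs));
console.log(describeCacheAge(undefined, nowMs));
```

One thing this makes visible: an unparseable `discoveredAt` string would appear to produce `NaN days old`, since `NaN != null` is true. That only matters if the cache file is edited by hand.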
package/server/index.ts CHANGED
@@ -6,6 +6,7 @@ import { dirname, join, resolve } from "path";
6
6
  import { existsSync, readFileSync, writeFileSync } from "fs";
7
7
  import { listRuns, getReportPath } from "./runs.js";
8
8
  import { activeSessions, spawnRun, cancelSession } from "./runner.js";
9
+ import { loadSchedule, saveSchedule, startScheduler, type ScheduleConfig } from "./scheduler.js";
9
10
 
10
11
  function specFilePath(baseUrl: string): string {
11
12
  try {
@@ -248,6 +249,28 @@ process.on("unhandledRejection", (reason) => {
248
249
  console.error("[server] unhandledRejection:", reason);
249
250
  });
250
251
 
252
+ // ----------------------------------------------------------------
253
+ // API: schedule config
254
+ // ----------------------------------------------------------------
255
+ app.get("/api/schedule", (_req, res) => {
256
+ res.json(loadSchedule());
257
+ });
258
+
259
+ app.patch("/api/schedule", (req, res) => {
260
+ const current = loadSchedule();
261
+ const { enabled, dayOfWeek, hour, minute } = req.body as Partial<ScheduleConfig>;
262
+ const updated: ScheduleConfig = {
263
+ ...current,
264
+ ...(enabled != null ? { enabled: Boolean(enabled) } : {}),
265
+ ...(dayOfWeek != null && Number.isInteger(dayOfWeek) && dayOfWeek >= 0 && dayOfWeek <= 6 ? { dayOfWeek } : {}),
266
+ ...(hour != null && Number.isInteger(hour) && hour >= 0 && hour <= 23 ? { hour } : {}),
267
+ ...(minute != null && Number.isInteger(minute) && minute >= 0 && minute <= 59 ? { minute } : {}),
268
+ };
269
+ saveSchedule(updated);
270
+ res.json(updated);
271
+ });
272
+
251
273
  app.listen(PORT, () => {
252
274
  console.log(`\nshoal dashboard → http://localhost:${PORT}\n`);
275
+ startScheduler();
253
276
  });
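Note on `PATCH /api/schedule`: the handler silently ignores any field that fails validation rather than rejecting the request, and it never lets the client set `lastRunDate`. A minimal sketch of that merge step as a pure function follows. `mergeSchedule` and `intInRange` are hypothetical names; the extra `typeof` check is only there for type narrowing and should not change behavior, because `Number.isInteger` already returns false for non-numbers.

```typescript
interface ScheduleConfig {
  enabled: boolean;
  dayOfWeek: number; // 0=Sun ... 6=Sat
  hour: number;
  minute: number;
  lastRunDate: string | null;
}

// Hypothetical helper: true only for integers within [min, max].
function intInRange(v: unknown, min: number, max: number): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= min && v <= max;
}

// Hypothetical pure version of the handler's merge: invalid fields are dropped.
function mergeSchedule(current: ScheduleConfig, patch: Partial<ScheduleConfig>): ScheduleConfig {
  return {
    ...current,
    ...(patch.enabled != null ? { enabled: Boolean(patch.enabled) } : {}),
    ...(intInRange(patch.dayOfWeek, 0, 6) ? { dayOfWeek: patch.dayOfWeek } : {}),
    ...(intInRange(patch.hour, 0, 23) ? { hour: patch.hour } : {}),
    ...(intInRange(patch.minute, 0, 59) ? { minute: patch.minute } : {}),
  };
}

const base: ScheduleConfig = { enabled: false, dayOfWeek: 1, hour: 9, minute: 0, lastRunDate: "2024-01-01" };
console.log(mergeSchedule(base, { enabled: true, hour: 25, minute: 30 }));
```

In the example call, `hour: 25` is out of range and is dropped, so the stored hour stays at 9 while `enabled` and `minute` are updated. A client that sends a bad value gets a 200 response with the old value, so it has to compare the response body to notice.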
package/server/runs.ts CHANGED
@@ -15,6 +15,8 @@ export interface RunSummary {
15
15
  hasReport: boolean;
16
16
  isLive?: boolean;
17
17
  estimatedCostUSD: number | null;
18
+ regressionChecked: number;
19
+ regressionFailed: number;
18
20
  }
19
21
 
20
22
  function countFindings(runId: string): { total: number; byCategory: Record<string, number> } {
@@ -63,6 +65,8 @@ export function listRuns(): RunSummary[] {
63
65
  hasReport: false,
64
66
  isLive: true,
65
67
  estimatedCostUSD: null,
68
+ regressionChecked: 0,
69
+ regressionFailed: 0,
66
70
  });
67
71
  }
68
72
  } catch { /* skip */ }
@@ -90,6 +94,8 @@ export function listRuns(): RunSummary[] {
90
94
  findingsByCategory: byCategory,
91
95
  hasReport: fs.existsSync(reportPath),
92
96
  estimatedCostUSD: log.summary?.cost?.estimatedUSD ?? null,
97
+ regressionChecked: log.summary?.regressionChecked ?? 0,
98
+ regressionFailed: log.summary?.regressionFailed ?? 0,
93
99
  });
94
100
  } catch { /* skip */ }
95
101
  }
@@ -0,0 +1,65 @@
1
+ import { existsSync, readFileSync, writeFileSync } from "fs";
2
+ import { join } from "path";
3
+ import { spawnRun } from "./runner.js";
4
+
5
+ export interface ScheduleConfig {
6
+ enabled: boolean;
7
+ dayOfWeek: number; // 0=Sun 1=Mon ... 6=Sat
8
+ hour: number;
9
+ minute: number;
10
+ lastRunDate: string | null; // YYYY-MM-DD — prevents double-trigger
11
+ }
12
+
13
+ const DEFAULT_CONFIG: ScheduleConfig = {
14
+ enabled: false,
15
+ dayOfWeek: 1,
16
+ hour: 9,
17
+ minute: 0,
18
+ lastRunDate: null,
19
+ };
20
+
21
+ function configPath(): string {
22
+ return join(process.cwd(), "schedule.json");
23
+ }
24
+
25
+ export function loadSchedule(): ScheduleConfig {
26
+ const p = configPath();
27
+ if (!existsSync(p)) return { ...DEFAULT_CONFIG };
28
+ try {
29
+ return { ...DEFAULT_CONFIG, ...JSON.parse(readFileSync(p, "utf-8")) };
30
+ } catch {
31
+ return { ...DEFAULT_CONFIG };
32
+ }
33
+ }
34
+
35
+ export function saveSchedule(config: ScheduleConfig): void {
36
+ writeFileSync(configPath(), JSON.stringify(config, null, 2), "utf-8");
37
+ }
38
+
39
+ export function startScheduler(): void {
40
+ const check = () => {
41
+ const config = loadSchedule();
42
+ if (!config.enabled) return;
43
+
44
+ const now = new Date();
45
+ const today = now.toISOString().slice(0, 10);
46
+
47
+ // Check within a ±1-minute window (absorbs interval drift)
48
+ const nowMin = now.getDay() * 1440 + now.getHours() * 60 + now.getMinutes();
49
+ const targetMin = config.dayOfWeek * 1440 + config.hour * 60 + config.minute;
50
+ const diff = nowMin - targetMin;
51
+
52
+ if (diff >= 0 && diff < 2 && config.lastRunDate !== today) {
53
+ console.log(`[scheduler] triggering scheduled run (${today})`);
54
+ spawnRun({});
55
+ saveSchedule({ ...config, lastRunDate: today });
56
+ }
57
+ };
58
+
59
+ // Align to the start of the next minute, then check every minute
60
+ const msToNextMinute = 60_000 - (Date.now() % 60_000);
61
+ setTimeout(() => {
62
+ check();
63
+ setInterval(check, 60_000);
64
+ }, msToNextMinute);
65
+ }
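Note on the scheduler's trigger condition: it converts both the current time and the configured slot to "minutes since Sunday 00:00" and fires when the current minute is the target minute or the one after it, unless a run was already recorded for today. The sketch below restates that check as a pure function. `isDue` is a hypothetical name, and taking `today` as a parameter is an assumption made here so the example does not depend on the clock.

```typescript
interface ScheduleConfig {
  enabled: boolean;
  dayOfWeek: number; // 0=Sun ... 6=Sat
  hour: number;
  minute: number;
  lastRunDate: string | null; // YYYY-MM-DD
}

// Hypothetical pure version of the check inside startScheduler().
function isDue(config: ScheduleConfig, now: Date, today: string): boolean {
  if (!config.enabled) return false;
  // Minutes elapsed since Sunday 00:00, in local time.
  const nowMin = now.getDay() * 1440 + now.getHours() * 60 + now.getMinutes();
  const targetMin = config.dayOfWeek * 1440 + config.hour * 60 + config.minute;
  const diff = nowMin - targetMin;
  // Fire in the target minute or the minute after, at most once per date.
  return diff >= 0 && diff < 2 && config.lastRunDate !== today;
}

const weekly: ScheduleConfig = { enabled: true, dayOfWeek: 1, hour: 9, minute: 0, lastRunDate: null };
// 1 January 2024 was a Monday; the Date constructor uses local time.
console.log(isDue(weekly, new Date(2024, 0, 1, 9, 1), "2024-01-01"));
```

Two details are worth a second look in the diff itself. The day, hour and minute come from local-time getters, whereas `today` comes from `toISOString()`, which is UTC, so if the two-minute window happens to straddle UTC midnight the `lastRunDate` guard looks like it would not stop a second trigger. Also, a slot at Saturday 23:59 only gets a one-minute window, because the following minute wraps to Sunday and yields a negative `diff`.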