@m8i-51/shoal 0.1.4 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +47 -12
package/bin/init.js +65 -2
package/framework/report.ts +23 -1
package/package.json +1 -1
package/run.ts +28 -3
package/server/index.ts +23 -0
package/server/runs.ts +6 -0
package/server/scheduler.ts +65 -0
package/web/dist/assets/index-ehlX_Hdw.js +68 -0
package/web/dist/index.html +1 -1
package/web/dist/mascot.svg +53 -0
package/web/dist/assets/index-CD6EJ_1O.js +0 -68

package/README.md CHANGED

@@ -1,19 +1,21 @@
 [日本語版はこちら](README_JA.md)
-# shoal
+<p align="center">
+  <img src="assets/logo-lockup.svg" alt="shoal" height="72">
+</p>
-[![npm](https://img.shields.io/npm/v/@m8i-51/shoal?color=red)](https://www.npmjs.com/package/@m8i-51/shoal)
-[![TypeScript](https://img.shields.io/badge/TypeScript-5-blue?logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
-[![Playwright](https://img.shields.io/badge/Playwright-browser-45ba4b?logo=playwright&logoColor=white)](https://playwright.dev/)
-[![Anthropic](https://img.shields.io/badge/Anthropic-Claude-blueviolet?logo=anthropic&logoColor=white)](https://www.anthropic.com/)
+<p align="center">
+  <a href="https://www.npmjs.com/package/@m8i-51/shoal"><img src="https://img.shields.io/npm/v/@m8i-51/shoal?color=red" alt="npm"></a>
+  <a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/TypeScript-5-blue?logo=typescript&logoColor=white" alt="TypeScript"></a>
+  <a href="https://playwright.dev/"><img src="https://img.shields.io/badge/Playwright-browser-45ba4b?logo=playwright&logoColor=white" alt="Playwright"></a>
+  <a href="https://www.anthropic.com/"><img src="https://img.shields.io/badge/Anthropic-Claude-blueviolet?logo=anthropic&logoColor=white" alt="Anthropic"></a>
+</p>
-Point it at any web app. Agents explore it and file GitHub Issues.
+**AI agents that experience your app — and help it grow.**
-shoal drops a swarm of agents onto a web app. Each agent has a distinct persona and evaluation lens — accessibility, security, business logic, data integrity, new user experience, and goal alignment. They explore independently via API and real browser, then a triage agent deduplicates findings and files GitHub Issues.
+shoal drops a swarm of AI agents onto a web app. Each agent has a distinct persona and explores the app as a real user would — navigating pages, taking actions, noticing friction. They surface bugs, usability issues, missing features, and gaps between what the app does and what it's meant to achieve.
-A **web dashboard** lets you start runs, monitor live progress, review findings by category, and track estimated LLM cost per run.
-No test scripts. No test data. No prior knowledge of the app required.
+No test scripts. No test data. No prior knowledge of the app required. Just a URL.
 ---
@@ -41,6 +43,21 @@ Target App (any URL)
                  Triage Agent
 ```
+Each agent carries a distinct perspective — accessibility, security, business logic, UI design, new user experience, and more. They operate on a shared understanding of the app's purpose and goals. Coverage is tracked across runs, so each session naturally focuses on areas that haven't been explored yet.
+---
+## What it finds
+At the end of each run:
+- **Bugs** — broken flows, errors, inconsistent data
+- **UX issues** — confusing interactions, dead ends, unclear states
+- **Feature suggestions** — things that would add real value
+- **Goal gaps** — where the app falls short of what it's trying to achieve
+Findings are filed as GitHub Issues or saved as a self-contained HTML report. A **web dashboard** lets you start runs, watch live progress, review findings by category, and track estimated LLM cost per run.
 ---
 ## Quick Start
@@ -52,7 +69,7 @@ npm install -g @m8i-51/shoal
 npx playwright install chromium
 ```
-Move to the project you want to test, then run:
+Move to the project you want to explore, then run:
 ```bash
 cd your-project
@@ -63,7 +80,7 @@ Open `.env` and set at minimum:
 ```env
 ANTHROPIC_API_KEY=sk-ant-...
-BASE_URL=http://localhost:3000   # URL of the app to test
+BASE_URL=http://localhost:3000   # URL of the app to explore
 ```
 Then run:
@@ -113,6 +130,7 @@ Opens at `http://localhost:4000`. From there you can:
 | `ANTHROPIC_API_KEY` | — | Required |
 | `GITHUB_TOKEN` | — | Optional — enables Issue creation |
 | `GITHUB_REPO` | — | `owner/repo` format |
+| `REFRESH_SPEC` | — | Set to `1` to re-run product discovery |
 ---
@@ -158,6 +176,23 @@ Alternatively, copy `targets/example.ts`, register it in `targets/index.ts`, and
 ---
+## Scheduled runs
+To run shoal weekly against a staging environment, add a GitHub Actions workflow to your repo.
+Run `shoal init` — it will offer to generate `.github/workflows/shoal-weekly.yml` automatically. Or copy the example from this repo:
+```bash
+curl -O https://raw.githubusercontent.com/m8i-51/shoal/main/.github/workflows/shoal-weekly.example.yml
+mv shoal-weekly.example.yml .github/workflows/shoal-weekly.yml
+```
+Then add `ANTHROPIC_API_KEY` to your repo's **Actions secrets** (`Settings → Secrets and variables → Actions`).
+The workflow runs every Monday at 09:00 UTC and can also be triggered manually from the Actions tab. Findings are filed as GitHub Issues using the built-in `GITHUB_TOKEN`.
+---
 ## Account Manager
 For apps that require login, shoal includes an Account Manager agent that autonomously discovers and tests authentication. It finds login pages, tests credentials from `test-accounts/` (gitignored), and injects session state into explorer agents so they can reach authenticated routes.

package/bin/init.js CHANGED

@@ -1,5 +1,5 @@
-import { intro, outro, select, text, isCancel, cancel } from "@clack/prompts";
-import { writeFileSync, existsSync } from "fs";
+import { intro, outro, select, text, confirm, isCancel, cancel } from "@clack/prompts";
+import { writeFileSync, existsSync, mkdirSync } from "fs";
 import { join } from "path";
 const PROVIDERS = [
@@ -102,5 +102,68 @@ export async function runInit(cwd) {
   const lines = Object.entries(env).map(([k, v]) => `${k}=${v}`);
   writeFileSync(envPath, lines.join("\n") + "\n", "utf-8");
+  // ── GitHub Actions workflow (optional) ────────────────────────────
+  const wantsWorkflow = guard(await confirm({
+    message: "Generate a GitHub Actions workflow for weekly scheduled runs?",
+    initialValue: false,
+  }));
+  if (wantsWorkflow) {
+    const stagingUrl = guard(await text({
+      message: "Staging URL (used as BASE_URL in the workflow)",
+      placeholder: "https://staging.example.com",
+      validate: (v) => v?.trim() ? undefined : "Required",
+    }));
+    const workflowDir = join(cwd, ".github", "workflows");
+    const workflowPath = join(workflowDir, "shoal-weekly.yml");
+    mkdirSync(workflowDir, { recursive: true });
+    writeFileSync(workflowPath, `# shoal weekly run
+#
+# Required secrets:  ANTHROPIC_API_KEY
+# Required variables: STAGING_URL is hardcoded below — update as needed
+#
+# GitHub Issues are filed automatically using the built-in GITHUB_TOKEN.
+name: shoal weekly run
+on:
+  schedule:
+    - cron: '0 9 * * 1'   # every Monday at 09:00 UTC
+  workflow_dispatch:        # also allow manual trigger from the Actions tab
+jobs:
+  shoal:
+    runs-on: ubuntu-latest
+    timeout-minutes: 60
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+      - name: Install shoal
+        run: npm install -g @m8i-51/shoal
+      - name: Install Playwright browsers
+        run: npx playwright install chromium --with-deps
+      - name: Run shoal
+        env:
+          ANTHROPIC_API_KEY: \${{ secrets.ANTHROPIC_API_KEY }}
+          BASE_URL: ${stagingUrl.trim()}
+          GITHUB_TOKEN: \${{ secrets.GITHUB_TOKEN }}
+          GITHUB_REPO: \${{ github.repository }}
+          MAX_BROWSERS: '2'
+          MAX_EXPLORERS: '0'
+        run: shoal
+`, "utf-8");
+    console.log(`\n  Created ${workflowPath}`);
+    console.log("  Next: add ANTHROPIC_API_KEY to your repo's Actions secrets");
+  }
   outro("Created .env\n\n  shoal serve   — open the dashboard at http://localhost:4000\n  shoal         — run agents from the terminal");
 }

package/framework/report.ts CHANGED

@@ -1,6 +1,6 @@
 import * as fs from "fs";
 import * as path from "path";
-import type { Finding, RunLog } from "./types";
+import type { Finding, RunLog, RegressionCheck } from "./types";
 import type { ProductSpec } from "./product-discovery";
 import type { TriageResult } from "./triage";
 import type { Scenario, ScenarioOutcome } from "./scenario-designer";
@@ -51,6 +51,10 @@ export function generateReport(
 ): string {
   const reportPath = path.join(process.cwd(), "logs", `report_${runLog.runId}.html`);
+  const allRegressionChecks: RegressionCheck[] = runLog.agents.flatMap((a) => a.regressionChecks ?? []);
+  const fixedChecks = allRegressionChecks.filter((c) => c.status === "fixed");
+  const regressedChecks = allRegressionChecks.filter((c) => c.status === "regressed");
   const issuedSet = new Set(triageResult.issued);
   const skippedSet = new Set(triageResult.skipped);
@@ -242,10 +246,28 @@ export function generateReport(
       <div class="stat-card"><div class="number">${triageResult.skipped.length}</div><div class="label">skipped</div></div>
       <div class="stat-card"><div class="number">${triageResult.unprocessed.length}</div><div class="label">pending</div></div>
       <div class="stat-card"><div class="number">${runLog.agents.length}</div><div class="label">agents</div></div>
+      ${allRegressionChecks.length > 0 ? `<div class="stat-card"><div class="number" style="color:#22c55e">${fixedChecks.length}</div><div class="label">still fixed</div></div><div class="stat-card"><div class="number" style="color:${regressedChecks.length > 0 ? "#ef4444" : "#94a3b8"}">${regressedChecks.length}</div><div class="label">regressed</div></div>` : ""}
     </div>
     <div class="category-bar">${categoryBar || '<div style="width:100%;display:flex;align-items:center;padding:0 .75rem;font-size:.75rem;color:#94a3b8">no findings</div>'}</div>
   </section>
+  ${allRegressionChecks.length > 0 ? `
+  <section>
+    <h2>Progress (${allRegressionChecks.length} issues checked)</h2>
+    ${regressedChecks.length > 0 ? `<p style="color:#ef4444;font-size:.875rem;margin-bottom:.75rem">⚠ ${regressedChecks.length} regression${regressedChecks.length !== 1 ? "s" : ""} detected</p>` : `<p style="color:#22c55e;font-size:.875rem;margin-bottom:.75rem">✓ All previously fixed issues remain resolved</p>`}
+    <table>
+      <thead><tr><th>#</th><th>Issue</th><th style="text-align:center">Status</th></tr></thead>
+      <tbody>
+        ${allRegressionChecks.map((c) => `
+        <tr>
+          <td style="color:#94a3b8">#${c.issueNumber}</td>
+          <td>${esc(c.issueTitle)}</td>
+          <td style="text-align:center">${c.status === "fixed" ? '<span class="badge" style="background:#22c55e">✓ fixed</span>' : '<span class="badge" style="background:#ef4444">⚠ regressed</span>'}</td>
+        </tr>`).join("")}
+      </tbody>
+    </table>
+  </section>` : ""}
   <section>
     <h2>Findings (${findings.length})</h2>
     ${sortedFindings.length > 0 ? findingCards : "<p style='color:#94a3b8;font-size:.875rem'>No findings collected.</p>"}
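
The regression summary added to the report in this hunk can be read as a flatten-then-partition step over each agent's checks. Below is a minimal sketch of that step, assuming `RegressionCheck` has the three fields the template reads (`issueNumber`, `issueTitle`, `status`) and that `status` is either `"fixed"` or `"regressed"`. The field types, the `AgentLog` interface, and the `summarizeRegressions` helper are hypothetical names for illustration and are not part of the package.

```typescript
// Assumed shapes, inferred from the fields the report template reads.
type RegressionStatus = "fixed" | "regressed";

interface RegressionCheck {
  issueNumber: number;
  issueTitle: string;
  status: RegressionStatus;
}

interface AgentLog {
  regressionChecks?: RegressionCheck[];
}

interface RegressionSummary {
  total: number;
  fixed: number;
  regressed: number;
  headline: string;
}

// Hypothetical helper: collect every agent's checks, then count by status.
function summarizeRegressions(agents: AgentLog[]): RegressionSummary {
  const all: RegressionCheck[] = [];
  for (const agent of agents) {
    // Agents without checks contribute nothing, like `?? []` in the diff.
    for (const check of agent.regressionChecks ?? []) {
      all.push(check);
    }
  }
  const fixed = all.filter((c) => c.status === "fixed").length;
  const regressed = all.filter((c) => c.status === "regressed").length;
  const headline =
    regressed > 0
      ? `${regressed} regression${regressed !== 1 ? "s" : ""} detected`
      : "All previously fixed issues remain resolved";
  return { total: all.length, fixed, regressed, headline };
}
```

As in the diff, the whole section would be skipped when `total` is 0, so a run with no regression checks renders the report unchanged.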

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@m8i-51/shoal",
-  "version": "0.1.4",
+  "version": "0.1.5",
   "type": "module",
   "description": "Multi-agent web exploration framework — finds bugs, UX issues, and missing features by running AI agents against your app",
   "repository": {

package/run.ts CHANGED

@@ -43,6 +43,7 @@ import { estimateCost, formatCostUSD } from "./framework/cost";
 const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";
 const GITHUB_TOKEN = process.env.GITHUB_TOKEN ?? "";
 const GITHUB_REPO = process.env.GITHUB_REPO ?? "";
+const REFRESH_SPEC = process.env.REFRESH_SPEC === "1";
 const githubOptions = { token: GITHUB_TOKEN, repo: GITHUB_REPO };
 const TARGET = process.env.TARGET ?? "none";
@@ -110,7 +111,14 @@ const POST_FEEDBACK_TOOL: Tool = {
     type: "object",
     properties: {
       title: { type: "string" },
-      body: { type: "string" },
+      body: {
+        type: "string",
+        description: `Describe the finding. Tone varies by category:
+- bug: technical — state what happened, what was expected, and steps to reproduce.
+- ux: experiential — write from the user's perspective ("I tried to...", "It was hard to find...", "I got confused when...").
+- feature-request: aspirational — describe what you wished you could do ("It would have been helpful if...", "I wanted to...").
+- goal-gap: goal-oriented — explain which goal was blocked and why ("I was trying to achieve X, but couldn't because...").`,
+      },
       category: { type: "string", enum: ["ux", "feature-request", "bug", "goal-gap"] },
     },
     required: ["title", "body", "category"],
@@ -317,6 +325,12 @@ ${productSpec.appDescription}
 If you notice anything inconvenient, a missing feature, or bug-like behavior,
 report it with the post_feedback tool.
+When writing the body, match the tone to the category:
+- bug: technical ("The endpoint returned 500 when...", "Expected X but got Y")
+- ux: experiential ("I tried to find the button but...", "It was unclear what would happen if...")
+- feature-request: aspirational ("It would have been useful if...", "I wished I could...")
+- goal-gap: goal-oriented ("I was trying to X, but couldn't because...")
 [Implemented Features]
 ${productSpec.features}
 ${productSpec.uiFeatures ? `\n[UI-Only Features]\nThese features exist in the UI but may not be reflected in API responses. Keep them in mind when interpreting API results.\n${productSpec.uiFeatures}\n` : ""}${productSpec.designContext ? `\n[Design Context]\n${productSpec.designContext}\n` : ""}${goalsSection(productSpec)}${assignment.scenario
@@ -882,6 +896,12 @@ ${productSpec.appDescription}
 4. Move to another page and repeat
 5. Finish after 8–10 actions
+When writing the body, match the tone to the category:
+- bug: technical ("The endpoint returned 500 when...", "Expected X but got Y")
+- ux: experiential ("I tried to find the button but...", "It was unclear what would happen if...")
+- feature-request: aspirational ("It would have been useful if...", "I wished I could...")
+- goal-gap: goal-oriented ("I was trying to X, but couldn't because...")
 [Using Observation Tools]
 - To verify an action was actually applied, call diff_since_last_action
 - If data isn't reflected or errors appear, call read_network_errors
@@ -1041,8 +1061,13 @@ async function main() {
   const scenarioOutcomes: ScenarioOutcome[] = [];
   try {
     const cached = loadCachedSpec(BASE_URL);
-    if (cached) {
-      console.log(`\n[product-discovery] using cache (date: ${cached.discoveredAt?.slice(0, 10) ?? "unknown"}, confidence: ${cached.confidence})`);
+    if (cached && !REFRESH_SPEC) {
+      const ageDays = cached.discoveredAt
+        ? Math.floor((Date.now() - new Date(cached.discoveredAt).getTime()) / 86_400_000)
+        : null;
+      const ageStr = ageDays != null ? `${ageDays} day${ageDays !== 1 ? "s" : ""} old` : "unknown date";
+      const staleHint = ageDays != null && ageDays >= 7 ? " — set REFRESH_SPEC=1 to re-run discovery" : "";
+      console.log(`\n[product-discovery] using cache (${ageStr}, confidence: ${cached.confidence})${staleHint}`);
       productSpec = cached;
     } else {
       const discoveryContext = await browser.newContext({ viewport: { width: 1024, height: 640 } });
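
The cache-age message in this hunk comes down to a whole-day difference plus a 7-day staleness threshold. Here is a minimal sketch of that arithmetic as a pure function, assuming `discoveredAt` is an ISO 8601 timestamp string. The `describeCacheAge` name, its return shape, and the injected `nowMs` parameter are hypothetical and exist only to make the logic testable in isolation.

```typescript
const MS_PER_DAY = 86_400_000;
const STALE_AFTER_DAYS = 7; // threshold used by the hint in the diff

interface CacheAge {
  label: string;  // e.g. "3 days old" or "unknown date"
  stale: boolean; // true when the REFRESH_SPEC=1 hint would be shown
}

// Hypothetical helper mirroring the age computation in main().
function describeCacheAge(discoveredAt: string | undefined, nowMs: number): CacheAge {
  if (!discoveredAt) {
    // No timestamp recorded: age is unknown and no hint is shown.
    return { label: "unknown date", stale: false };
  }
  const ageDays = Math.floor((nowMs - new Date(discoveredAt).getTime()) / MS_PER_DAY);
  const label = `${ageDays} day${ageDays !== 1 ? "s" : ""} old`;
  return { label, stale: ageDays >= STALE_AFTER_DAYS };
}
```

Passing the current time in as a parameter, rather than calling `Date.now()` inside the function, is a choice made for this sketch; the diff reads the clock directly.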

package/server/index.ts CHANGED

@@ -6,6 +6,7 @@ import { dirname, join, resolve } from "path";
 import { existsSync, readFileSync, writeFileSync } from "fs";
 import { listRuns, getReportPath } from "./runs.js";
 import { activeSessions, spawnRun, cancelSession } from "./runner.js";
+import { loadSchedule, saveSchedule, startScheduler, type ScheduleConfig } from "./scheduler.js";
 function specFilePath(baseUrl: string): string {
   try {
@@ -248,6 +249,28 @@ process.on("unhandledRejection", (reason) => {
   console.error("[server] unhandledRejection:", reason);
 });
+// ----------------------------------------------------------------
+// API: schedule config
+// ----------------------------------------------------------------
+app.get("/api/schedule", (_req, res) => {
+  res.json(loadSchedule());
+});
+app.patch("/api/schedule", (req, res) => {
+  const current = loadSchedule();
+  const { enabled, dayOfWeek, hour, minute } = req.body as Partial<ScheduleConfig>;
+  const updated: ScheduleConfig = {
+    ...current,
+    ...(enabled != null ? { enabled: Boolean(enabled) } : {}),
+    ...(dayOfWeek != null && Number.isInteger(dayOfWeek) && dayOfWeek >= 0 && dayOfWeek <= 6 ? { dayOfWeek } : {}),
+    ...(hour != null && Number.isInteger(hour) && hour >= 0 && hour <= 23 ? { hour } : {}),
+    ...(minute != null && Number.isInteger(minute) && minute >= 0 && minute <= 59 ? { minute } : {}),
+  };
+  saveSchedule(updated);
+  res.json(updated);
+});
 app.listen(PORT, () => {
   console.log(`\nshoal dashboard → http://localhost:${PORT}\n`);
+  startScheduler();
 });
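
The `PATCH /api/schedule` handler in this hunk silently ignores any field that is missing or out of range and keeps the stored value instead. A minimal sketch of that merge rule as a standalone function follows, assuming the `ScheduleConfig` shape from `scheduler.ts`. The `applySchedulePatch` and `intInRange` helpers are hypothetical names and do not appear in the package.

```typescript
interface ScheduleConfig {
  enabled: boolean;
  dayOfWeek: number; // 0=Sun ... 6=Sat
  hour: number;
  minute: number;
  lastRunDate: string | null;
}

// Hypothetical helper: true only for integers within [min, max].
function intInRange(v: unknown, min: number, max: number): v is number {
  return typeof v === "number" && Math.floor(v) === v && v >= min && v <= max;
}

// Hypothetical helper: apply only the valid fields of a partial update.
function applySchedulePatch(
  current: ScheduleConfig,
  patch: Partial<ScheduleConfig>,
): ScheduleConfig {
  return {
    enabled: patch.enabled != null ? Boolean(patch.enabled) : current.enabled,
    dayOfWeek: intInRange(patch.dayOfWeek, 0, 6) ? patch.dayOfWeek : current.dayOfWeek,
    hour: intInRange(patch.hour, 0, 23) ? patch.hour : current.hour,
    minute: intInRange(patch.minute, 0, 59) ? patch.minute : current.minute,
    // lastRunDate is never taken from the request body.
    lastRunDate: current.lastRunDate,
  };
}
```

One consequence of this rule, in the sketch and apparently in the handler too, is that a request with an invalid value such as `hour: 25` still returns a success response with the old value, so a client would need to compare the response against what it sent to detect a rejected field.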

package/server/runs.ts CHANGED

@@ -15,6 +15,8 @@ export interface RunSummary {
   hasReport: boolean;
   isLive?: boolean;
   estimatedCostUSD: number | null;
+  regressionChecked: number;
+  regressionFailed: number;
 }
 function countFindings(runId: string): { total: number; byCategory: Record<string, number> } {
@@ -63,6 +65,8 @@ export function listRuns(): RunSummary[] {
             hasReport: false,
             isLive: true,
             estimatedCostUSD: null,
+            regressionChecked: 0,
+            regressionFailed: 0,
           });
         }
       } catch { /* skip */ }
@@ -90,6 +94,8 @@ export function listRuns(): RunSummary[] {
         findingsByCategory: byCategory,
         hasReport: fs.existsSync(reportPath),
         estimatedCostUSD: log.summary?.cost?.estimatedUSD ?? null,
+        regressionChecked: log.summary?.regressionChecked ?? 0,
+        regressionFailed: log.summary?.regressionFailed ?? 0,
       });
     } catch { /* skip */ }
   }

package/server/scheduler.ts ADDED

@@ -0,0 +1,65 @@
+import { existsSync, readFileSync, writeFileSync } from "fs";
+import { join } from "path";
+import { spawnRun } from "./runner.js";
+export interface ScheduleConfig {
+  enabled: boolean;
+  dayOfWeek: number;   // 0=Sun 1=Mon ... 6=Sat
+  hour: number;
+  minute: number;
+  lastRunDate: string | null;  // YYYY-MM-DD — prevents double-trigger
+}
+const DEFAULT_CONFIG: ScheduleConfig = {
+  enabled: false,
+  dayOfWeek: 1,
+  hour: 9,
+  minute: 0,
+  lastRunDate: null,
+};
+function configPath(): string {
+  return join(process.cwd(), "schedule.json");
+}
+export function loadSchedule(): ScheduleConfig {
+  const p = configPath();
+  if (!existsSync(p)) return { ...DEFAULT_CONFIG };
+  try {
+    return { ...DEFAULT_CONFIG, ...JSON.parse(readFileSync(p, "utf-8")) };
+  } catch {
+    return { ...DEFAULT_CONFIG };
+  }
+}
+export function saveSchedule(config: ScheduleConfig): void {
+  writeFileSync(configPath(), JSON.stringify(config, null, 2), "utf-8");
+}
+export function startScheduler(): void {
+  const check = () => {
+    const config = loadSchedule();
+    if (!config.enabled) return;
+    const now = new Date();
+    const today = now.toISOString().slice(0, 10);
+    // Match within a 2-minute window starting at the target minute (absorbs interval drift)
+    const nowMin = now.getDay() * 1440 + now.getHours() * 60 + now.getMinutes();
+    const targetMin = config.dayOfWeek * 1440 + config.hour * 60 + config.minute;
+    const diff = nowMin - targetMin;
+    if (diff >= 0 && diff < 2 && config.lastRunDate !== today) {
+      console.log(`[scheduler] triggering scheduled run (${today})`);
+      spawnRun({});
+      saveSchedule({ ...config, lastRunDate: today });
+    }
+  };
+  // Align to the start of the next minute, then check every minute
+  const msToNextMinute = 60_000 - (Date.now() % 60_000);
+  setTimeout(() => {
+    check();
+    setInterval(check, 60_000);
+  }, msToNextMinute);
+}
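
The trigger condition in `startScheduler` compares two "minute of the week" values and fires when the current time is 0 or 1 minutes past the target and no run has been recorded for the day. Below is a minimal sketch of that check as a pure function, so it can be exercised without timers. The `Schedule` interface and the `minuteOfWeek`, `localDate`, and `isDue` helpers are hypothetical names, not part of the package.

```typescript
interface Schedule {
  dayOfWeek: number; // 0=Sun ... 6=Sat
  hour: number;
  minute: number;
  lastRunDate: string | null; // YYYY-MM-DD
}

// Minutes elapsed since Sunday 00:00 for a given day/hour/minute.
function minuteOfWeek(day: number, hour: number, minute: number): number {
  return day * 1440 + hour * 60 + minute;
}

// Local calendar date as YYYY-MM-DD (an assumption of this sketch, see note below).
function localDate(d: Date): string {
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// True when `now` falls in the 2-minute window after the target
// and no run has been recorded for today.
function isDue(now: Date, s: Schedule): boolean {
  const diff =
    minuteOfWeek(now.getDay(), now.getHours(), now.getMinutes()) -
    minuteOfWeek(s.dayOfWeek, s.hour, s.minute);
  return diff >= 0 && diff < 2 && s.lastRunDate !== localDate(now);
}
```

The sketch deliberately uses local-time fields for both the window and the date key. The diff derives the window from local time (`getDay`, `getHours`) but the `lastRunDate` key from `toISOString()`, which is UTC, so the two can disagree near midnight in non-UTC time zones. Like the diff, the sketch does not wrap around the week boundary, so a target of Saturday 23:59 would only match during that single minute.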