site-agent-pro 1.0.0
- package/README.md +689 -0
- package/dist/auth/credentialStore.js +62 -0
- package/dist/auth/inbox.js +193 -0
- package/dist/auth/profile.js +379 -0
- package/dist/auth/runner.js +1124 -0
- package/dist/backend/dashboardData.js +194 -0
- package/dist/backend/runArtifacts.js +48 -0
- package/dist/backend/runRepository.js +93 -0
- package/dist/bin.js +2 -0
- package/dist/cli/backfillSiteChecks.js +143 -0
- package/dist/cli/run.js +309 -0
- package/dist/cli/trade.js +69 -0
- package/dist/config.js +199 -0
- package/dist/core/agentProfiles.js +55 -0
- package/dist/core/aggregateReport.js +382 -0
- package/dist/core/audit.js +30 -0
- package/dist/core/customTaskSuite.js +148 -0
- package/dist/core/evaluator.js +217 -0
- package/dist/core/executor.js +788 -0
- package/dist/core/fallbackReport.js +335 -0
- package/dist/core/formHeuristics.js +411 -0
- package/dist/core/gameplaySummary.js +164 -0
- package/dist/core/interaction.js +202 -0
- package/dist/core/pageState.js +201 -0
- package/dist/core/planner.js +1669 -0
- package/dist/core/processSubmissionBatch.js +204 -0
- package/dist/core/runAuditJob.js +170 -0
- package/dist/core/runner.js +2352 -0
- package/dist/core/siteBrief.js +107 -0
- package/dist/core/siteChecks.js +1526 -0
- package/dist/core/taskDirectives.js +279 -0
- package/dist/core/taskHeuristics.js +263 -0
- package/dist/dashboard/client.js +1256 -0
- package/dist/dashboard/contracts.js +95 -0
- package/dist/dashboard/narrative.js +277 -0
- package/dist/dashboard/server.js +458 -0
- package/dist/dashboard/theme.js +888 -0
- package/dist/index.js +84 -0
- package/dist/llm/client.js +188 -0
- package/dist/paystack/account.js +123 -0
- package/dist/paystack/client.js +100 -0
- package/dist/paystack/index.js +13 -0
- package/dist/paystack/test-paystack.js +83 -0
- package/dist/paystack/transfer.js +138 -0
- package/dist/paystack/types.js +74 -0
- package/dist/paystack/webhook.js +121 -0
- package/dist/prompts/browserAgent.js +124 -0
- package/dist/prompts/reviewer.js +71 -0
- package/dist/reporting/clickReplay.js +290 -0
- package/dist/reporting/html.js +930 -0
- package/dist/reporting/markdown.js +238 -0
- package/dist/reporting/template.js +1141 -0
- package/dist/schemas/types.js +361 -0
- package/dist/submissions/customTasks.js +196 -0
- package/dist/submissions/html.js +770 -0
- package/dist/submissions/model.js +56 -0
- package/dist/submissions/publicUrl.js +76 -0
- package/dist/submissions/service.js +74 -0
- package/dist/submissions/store.js +37 -0
- package/dist/submissions/types.js +65 -0
- package/dist/trade/engine.js +241 -0
- package/dist/trade/evm/erc20.js +44 -0
- package/dist/trade/extractor.js +148 -0
- package/dist/trade/policy.js +35 -0
- package/dist/trade/session.js +31 -0
- package/dist/trade/types.js +107 -0
- package/dist/trade/validator.js +148 -0
- package/dist/utils/files.js +59 -0
- package/dist/utils/log.js +24 -0
- package/dist/utils/playwrightCompat.js +14 -0
- package/dist/utils/time.js +3 -0
- package/dist/wallet/provider.js +345 -0
- package/dist/wallet/relay.js +129 -0
- package/dist/wallet/wallet.js +178 -0
- package/docs/01-installation.md +134 -0
- package/docs/02-running-your-first-audit.md +136 -0
- package/docs/03-configuration.md +233 -0
- package/docs/04-how-the-agent-thinks.md +41 -0
- package/docs/05-extending-personas-and-tasks.md +42 -0
- package/docs/06-hardening-for-production.md +92 -0
- package/package.json +60 -0
package/README.md
ADDED
|
@@ -0,0 +1,689 @@
|
|
|
1
|
+
# Site Agent Pro
|
|
2
|
+
|
|
3
|
+
> AI-powered browser agent that executes real user tasks on any website, captures step-by-step evidence, and produces scored, actionable reports.
|
|
4
|
+
|
|
5
|
+
**Playwright** · **OpenAI / Ollama** · **axe-core** · **TypeScript** · **Zod**
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## How It Works
|
|
10
|
+
|
|
11
|
+
```
|
|
12
|
+
User submits URL + tasks
|
|
13
|
+
│
|
|
14
|
+
▼
|
|
15
|
+
┌─────────────────────────────┐
|
|
16
|
+
│ Chromium launches │
|
|
17
|
+
│ (desktop 1440×900 │
|
|
18
|
+
│ or mobile 390×844) │
|
|
19
|
+
└──────────┬──────────────────┘
|
|
20
|
+
▼
|
|
21
|
+
┌─────────────────────────────┐
|
|
22
|
+
│ For each task: │
|
|
23
|
+
│ 1. Capture page state │
|
|
24
|
+
│ 2. LLM plans next action │
|
|
25
|
+
│ 3. Playwright executes it │
|
|
26
|
+
│ 4. Repeat until done │
|
|
27
|
+
└──────────┬──────────────────┘
|
|
28
|
+
▼
|
|
29
|
+
┌─────────────────────────────┐
|
|
30
|
+
│ Site checks: │
|
|
31
|
+
│ SEO · Performance · │
|
|
32
|
+
│ Security · Accessibility · │
|
|
33
|
+
│ Mobile · Content · CRO │
|
|
34
|
+
└──────────┬──────────────────┘
|
|
35
|
+
▼
|
|
36
|
+
┌─────────────────────────────┐
|
|
37
|
+
│ LLM evaluates the run │
|
|
38
|
+
│ → Scored report (1-10) │
|
|
39
|
+
│ → HTML / Markdown / JSON │
|
|
40
|
+
│ → Activity replay animation │
|
|
41
|
+
└─────────────────────────────┘
|
|
42
|
+
```
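The per-task loop in the diagram can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: `PageState`, `Action`, `LoopDeps`, and `runTask` are hypothetical names, the three callbacks stand in for the real page capture, LLM planner, and Playwright executor, and the default step cap simply mirrors the documented `MAX_STEPS_PER_TASK` default of 32.

```ts
// Hypothetical shapes for illustration; the real package types may differ.
interface PageState { url: string; title: string }
type Action =
  | { kind: "click"; target: string }
  | { kind: "done"; reason: string };

interface LoopDeps {
  capture: () => Promise<PageState>; // 1. capture page state
  plan: (task: string, state: PageState, history: Action[]) => Promise<Action>; // 2. LLM plans next action
  execute: (action: Action) => Promise<void>; // 3. browser executes it
}

async function runTask(task: string, deps: LoopDeps, maxSteps = 32): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = await deps.capture();
    const action = await deps.plan(task, state, history);
    history.push(action);
    if (action.kind === "done") break; // 4. repeat until the planner reports completion
    await deps.execute(action);
  }
  return history; // the recorded steps become evidence for the evaluator
}
```

Injecting the three callbacks keeps the loop testable with stubs, and the step cap bounds a run even if the planner never reports completion.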
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Features
|
|
47
|
+
|
|
48
|
+
- **Task-driven execution** — the agent follows only the tasks you provide, nothing more
|
|
49
|
+
- **Step-by-step evidence** — every interaction, page state, relevant console signal, and network failure is logged
|
|
50
|
+
- **Ordered instruction parsing** — pasted instructions, bullet lists, JSON tasks, and uploaded text files are normalized into accepted task lanes
|
|
51
|
+
- **Independent evaluation** — the LLM scores from captured evidence, not from the agent's own impressions
|
|
52
|
+
- **Multi-agent perspectives** — run 1–5 agents with different personas on the same site, merged into one report
|
|
53
|
+
- **Auth-aware** — detects login walls mid-run, fills signup forms, polls IMAP for OTP/verification emails
|
|
54
|
+
- **Supplemental audits** — SEO crawl, security headers, performance timings, accessibility (axe-core), CRO signals, content readability, mobile layout
|
|
55
|
+
- **Activity replay** — compact animated WebP that overlays all recorded agent actions onto the captured click frames
|
|
56
|
+
- **Exchange-flow QA** — safely tests Naira/crypto buy and sell flows with harmless values and stops before real transfers
|
|
57
|
+
- **Paystack Integration** — provision dedicated virtual Naira accounts for agents and execute **autonomous bank transfers** during tasks
|
|
58
|
+
- **Dual LLM support** — OpenAI (GPT-5) for production, Ollama for local/private development
|
|
59
|
+
- **Two deployment modes** — CLI or web dashboard, including Render web service deployment
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Quick Start
|
|
64
|
+
|
|
65
|
+
> **Using site-agent-pro in your own project?** Install it as a devDependency instead of cloning:
|
|
66
|
+
> ```bash
|
|
67
|
+
> npm install --save-dev site-agent-pro
|
|
68
|
+
>
|
|
69
|
+
> # Run a quick audit
|
|
70
|
+
> npx site-agent-pro --url http://localhost:3000 --task "Check the homepage"
|
|
71
|
+
>
|
|
72
|
+
> # OR launch the web UI to enter tasks visually
|
|
73
|
+
> npx site-agent-pro ui
|
|
74
|
+
> ```
|
|
75
|
+
> See [Programmatic API](#programmatic-api) for scripted and CI usage.
|
|
76
|
+
|
|
77
|
+
### 1. Install (self-hosted / contributor setup)
|
|
78
|
+
|
|
79
|
+
```bash
|
|
80
|
+
npm install
|
|
81
|
+
npm run browser:install
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
### 2. Configure
|
|
85
|
+
|
|
86
|
+
```bash
|
|
87
|
+
cp .env.example .env
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
Set your LLM provider in `.env`:
|
|
91
|
+
|
|
92
|
+
```bash
|
|
93
|
+
# Option A: OpenAI (recommended for production)
|
|
94
|
+
LLM_PROVIDER=openai
|
|
95
|
+
OPENAI_API_KEY=your_key_here
|
|
96
|
+
|
|
97
|
+
# Option B: Ollama (local/private development)
LLM_PROVIDER=ollama
|
|
98
|
+
OLLAMA_MODEL=llama3.1:8b
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
#### Sensitive Data & CLI Overrides
|
|
102
|
+
If you prefer not to save sensitive credentials in your `.env` file, you can pass them directly via the CLI:
|
|
103
|
+
|
|
104
|
+
```bash
|
|
105
|
+
npx site-agent-pro --url https://example.com \
|
|
106
|
+
--openai-api-key "sk-..." \
|
|
107
|
+
--imap-password "your-app-password" \
|
|
108
|
+
--auth-password "test-user-password" \
|
|
109
|
+
--private-key "0x..."
|
|
110
|
+
```
|
|
111
|
+
CLI flags always take precedence over values in `.env`.
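The precedence rule can be pictured with a small TypeScript sketch. `resolveSetting` is a hypothetical helper, not part of the package's documented API, and treating an empty string as "unset" is an assumption made for the example.

```ts
// Hypothetical helper: the first defined, non-empty source wins, in priority order
// (CLI flag, then .env value, then built-in default).
function resolveSetting(
  cliValue: string | undefined,
  envValue: string | undefined,
  fallback?: string,
): string | undefined {
  for (const candidate of [cliValue, envValue, fallback]) {
    if (candidate !== undefined && candidate !== "") return candidate;
  }
  return undefined;
}

// Example: a --model flag overrides the model configured in .env.
const model = resolveSetting("llama3.1:8b", "gpt-5", "gpt-5");
```

Here `model` resolves to the CLI value because it is listed first and is non-empty.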
|
|
112
|
+
|
|
113
|
+
### 3. Run
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
# Run the agent against a site
|
|
117
|
+
npm run dev -- --url https://example.com \
|
|
118
|
+
--task "Open pricing and compare the visible plans before signup"
|
|
119
|
+
|
|
120
|
+
# Start the web dashboard
|
|
121
|
+
npm run dashboard
|
|
122
|
+
# → http://localhost:4173
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
### 4. View Results
|
|
126
|
+
|
|
127
|
+
Artifacts are saved to `runs/<run-id>/`:
|
|
128
|
+
|
|
129
|
+
| File | Contents |
|
|
130
|
+
|---|---|
|
|
131
|
+
| `report.html` | Standalone shareable report |
|
|
132
|
+
| `report.json` | Machine-readable scored report |
|
|
133
|
+
| `report.md` | Markdown report |
|
|
134
|
+
| `task-results.json` | Per-task step history and outcomes |
|
|
135
|
+
| `raw-events.json` | Every browser event, console log, and network request |
|
|
136
|
+
| `accessibility.json` | axe-core violation list |
|
|
137
|
+
| `site-checks.json` | SEO, performance, security, CRO, content, mobile checks |
|
|
138
|
+
| `click-replay.webp` | Compact animated activity replay with click screenshots and overlays for all recorded actions |
|
|
139
|
+
| `*.webm` | Full browser session video recording (enabled via `RECORD_VIDEO=true`) |
|
|
140
|
+
| `inputs.json` | Run configuration and timing metadata |
|
|
141
|
+
| `trade-executions.json` | Optional deterministic trade validation/execution records when trade mode is enabled |
|
|
142
|
+
| `trade-instruction.json` | Optional standalone trade CLI input copy when `npm run trade:run` is used |
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## CLI Reference
|
|
147
|
+
|
|
148
|
+
```bash
|
|
149
|
+
# Basic single-task run
|
|
150
|
+
npm run dev -- --url https://example.com --task "Click the pricing tab"
|
|
151
|
+
|
|
152
|
+
# Multiple tasks
|
|
153
|
+
npm run dev -- --url https://example.com \
|
|
154
|
+
--task "Read the visible how-to-play section" \
|
|
155
|
+
--task "Play the game five times and record each win or loss"
|
|
156
|
+
|
|
157
|
+
# Mobile viewport (iPhone 13, 390×844)
|
|
158
|
+
npm run dev -- --url https://example.com --task "Check the mobile nav" --mobile
|
|
159
|
+
|
|
160
|
+
# Headed mode (visible browser)
|
|
161
|
+
npm run dev -- --url https://example.com --task "Open pricing" --headed
|
|
162
|
+
|
|
163
|
+
# Ollama for local sites
|
|
164
|
+
npm run dev -- --url http://127.0.0.1:3000 --task "Check the homepage CTA" \
|
|
165
|
+
--llm-provider ollama --model llama3.1:8b
|
|
166
|
+
|
|
167
|
+
# Allow self-signed HTTPS certificates
|
|
168
|
+
npm run dev -- --url https://localhost:3000 --task "Check the homepage" --ignore-https-errors
|
|
169
|
+
|
|
170
|
+
# Exchange-flow QA without real transfers
|
|
171
|
+
npm run dev -- --url https://example.com \
|
|
172
|
+
--task "Click Buy; enter 50000 NGN; confirm the crypto preview updates; copy the account number if available; stop before making any real payment" \
|
|
173
|
+
--task "Click Sell; enter 0.01 USDT; confirm the Naira payout preview updates; stop before sending any real crypto"
|
|
174
|
+
|
|
175
|
+
# Deterministic onchain validation in dry-run mode
|
|
176
|
+
npm run dev -- --url https://example-dapp.test \
|
|
177
|
+
--task "Sell 0.01 USDC using the visible deposit address" \
|
|
178
|
+
--trade-dry-run --trade-strategy deposit_only
|
|
179
|
+
|
|
180
|
+
# Autonomous Paystack transfer during a task (dry-run validation unless PAYSTACK_TRANSFER_ENABLED=true)
|
|
181
|
+
npm run dev -- --url https://example-shop.com \
|
|
182
|
+
--task "Pay 100 Naira to the bank account shown on the checkout page" \
|
|
183
|
+
--trade-enabled
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
> **Note:** Every CLI run requires at least one `--task` flag, except `--auth-only` runs, which skip the task run. Other runs with no tasks are rejected.
|
|
187
|
+
|
|
188
|
+
### All CLI Options
|
|
189
|
+
|
|
190
|
+
| Flag | Description |
|
|
191
|
+
|---|---|
|
|
192
|
+
| `--url <url>` | **(Required)** Website URL to test |
|
|
193
|
+
| `--task <task>` | **(Required)** Task for the agent. Repeat for multiple tasks |
|
|
194
|
+
| `--headed` | Run browser in headed (visible) mode |
|
|
195
|
+
| `--mobile` | Use iPhone 13 mobile viewport |
|
|
196
|
+
| `--ignore-https-errors` | Allow invalid or self-signed HTTPS certificates |
|
|
197
|
+
| `--llm-provider <name>` | LLM provider: `openai` or `ollama` |
|
|
198
|
+
| `--model <name>` | Override the model name |
|
|
199
|
+
| `--ollama-base-url <url>` | Override the Ollama endpoint |
|
|
200
|
+
| `--storage-state <path>` | Load Playwright storage state JSON before the run |
|
|
201
|
+
| `--save-storage-state <path>` | Save Playwright storage state JSON after the run |
|
|
202
|
+
| `--auth-flow` | Bootstrap a test account (signup/login/OTP), then run tasks |
|
|
203
|
+
| `--auth-only` | Bootstrap a test account and save session — skip task run |
|
|
204
|
+
| `--signup-url <url>` | Signup page URL (absolute or relative) |
|
|
205
|
+
| `--login-url <url>` | Login page URL (absolute or relative) |
|
|
206
|
+
| `--access-url <url>` | Protected page URL to verify after login |
|
|
207
|
+
| `--trade-enabled` | Allow deterministic onchain trade execution for this run |
|
|
208
|
+
| `--trade-dry-run` | Validate extracted trade details without broadcasting a transaction |
|
|
209
|
+
| `--trade-strategy <strategy>` | Trade strategy: `auto`, `dapp_only`, or `deposit_only` |
|
|
210
|
+
| `--trade-confirmations <count>` | Confirmations to wait for before marking a trade confirmed, 0–12 |
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
## Programmatic API
|
|
215
|
+
|
|
216
|
+
Install site-agent-pro as a devDependency in your own project — no cloning required:
|
|
217
|
+
|
|
218
|
+
```bash
|
|
219
|
+
npm install --save-dev site-agent-pro
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
Then run audits from any script, test file, or CI pipeline:
|
|
223
|
+
|
|
224
|
+
```ts
|
|
225
|
+
import { runAudit } from "site-agent-pro";
|
|
226
|
+
|
|
227
|
+
const result = await runAudit({
|
|
228
|
+
url: "http://localhost:3000", // works with localhost dev servers
|
|
229
|
+
tasks: [
|
|
230
|
+
"Open the pricing page and note the visible plans",
|
|
231
|
+
"Click the sign-up button and check the form fields",
|
|
232
|
+
],
|
|
233
|
+
});
|
|
234
|
+
|
|
235
|
+
console.log(`Score: ${result.report.overall_score}/10`);
|
|
236
|
+
console.log(`Summary: ${result.report.summary}`);
|
|
237
|
+
console.log(`Artifacts saved to: ${result.runDir}`);
|
|
238
|
+
|
|
239
|
+
// Gate your CI pipeline on a minimum quality score
|
|
240
|
+
if (result.report.overall_score < 7) {
|
|
241
|
+
process.exit(1);
|
|
242
|
+
}
|
|
243
|
+
```
|
|
244
|
+
|
|
245
|
+
Or add site-agent-pro to your `package.json` scripts and run it from the CLI:
|
|
246
|
+
|
|
247
|
+
```json
|
|
248
|
+
"scripts": {
|
|
249
|
+
"audit": "site-agent-pro --url http://localhost:3000 --task 'Check the homepage CTA'",
|
|
250
|
+
"audit:ui": "site-agent-pro ui",
|
|
251
|
+
"audit:mobile": "site-agent-pro --url http://localhost:3000 --task 'Check mobile nav' --mobile",
|
|
252
|
+
"audit:auth": "site-agent-pro --url http://localhost:3000 --task 'Reach the dashboard' --auth-flow"
|
|
253
|
+
}
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
```bash
|
|
257
|
+
npm run dev & # your app
|
|
258
|
+
npm run audit # site-agent-pro hits it live
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
For full control, import the lower-level API directly:
|
|
262
|
+
|
|
263
|
+
```ts
|
|
264
|
+
import { runAuditJob, buildCustomTaskSuite, normalizeCustomTasks } from "site-agent-pro";
|
|
265
|
+
```
|
|
266
|
+
|
|
267
|
+
> **Note:** site-agent-pro requires Playwright's Chromium browser. Run `npx playwright install chromium` once after installing the package.
|
|
268
|
+
|
|
269
|
+
---
|
|
270
|
+
|
|
271
|
+
## Web Dashboard
|
|
272
|
+
|
|
273
|
+
```bash
|
|
274
|
+
npm run dashboard
|
|
275
|
+
```
|
|
276
|
+
|
|
277
|
+
| URL | Purpose |
|
|
278
|
+
|---|---|
|
|
279
|
+
| `http://localhost:4173/` | Public submission form — enter URL, paste instructions, or upload text/JSON tasks |
|
|
280
|
+
| `http://localhost:4173/dashboard` | Internal run dashboard — inspect all results |
|
|
281
|
+
| `/submissions/<id>` | Submission progress tracking |
|
|
282
|
+
| `/r/<token>` | Public shareable report link (valid 30 days) |
|
|
283
|
+
| `/outputs/<run-id>` | Standalone HTML report for any run |
|
|
284
|
+
| `/api/runs` | REST API — list all runs |
|
|
285
|
+
| `/api/runs/<id>` | REST API — run detail |
|
|
286
|
+
|
|
287
|
+
The dashboard supports:
|
|
288
|
+
- Instruction paste box plus optional `.txt`, `.md`, `.json`, or `.csv` upload
|
|
289
|
+
- Public-hosted or localhost/private target mode for local development
|
|
290
|
+
- 1–5 concurrent agent perspectives per submission
|
|
291
|
+
- Per-submission trade controls for enablement, dry-run, strategy, and confirmation count
|
|
292
|
+
- Aggregate and per-agent report inspection
|
|
293
|
+
- Artifact downloads (JSON, Markdown, HTML, WebP activity replay)
|
|
294
|
+
- Strengths, weaknesses, and top fix recommendations
|
|
295
|
+
|
|
296
|
+
---
|
|
297
|
+
|
|
298
|
+
## Authentication
|
|
299
|
+
|
|
300
|
+
The agent can bootstrap authenticated sessions for sites that require signup/login.
|
|
301
|
+
|
|
302
|
+
### Quick Example
|
|
303
|
+
|
|
304
|
+
```bash
|
|
305
|
+
npm run dev -- --url https://example.com \
|
|
306
|
+
--task "Reach the account dashboard and confirm billing is visible" \
|
|
307
|
+
--auth-flow --signup-url /register --login-url /login --access-url /app
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
### What Auth Bootstrap Does
|
|
311
|
+
|
|
312
|
+
1. Fills visible signup fields with your configured test identity
|
|
313
|
+
2. Polls your IMAP inbox for OTP or verification emails
|
|
314
|
+
3. Submits the OTP code or opens the verification link
|
|
315
|
+
4. Logs in with the same credentials
|
|
316
|
+
5. Verifies a protected page (if `--access-url` is provided)
|
|
317
|
+
6. Saves the authenticated session to `.auth/session.json`
|
|
318
|
+
|
|
319
|
+
Successful auth also caches the working identity in `.auth/credentials.json`, keyed by target origin, so later runs against the same site can reuse the saved username or email plus password.
|
|
320
|
+
|
|
321
|
+
Auth walls detected mid-task are handled automatically when auth credentials are configured or a working identity has already been cached for that target origin.
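To illustrate what "keyed by target origin" and "absolute or relative" auth URLs could mean in practice, here is a short sketch using the standard WHATWG `URL` class. `credentialKey` and `resolveAuthUrl` are hypothetical names; the package's own cache layout and URL handling may differ.

```ts
// Hypothetical cache-key helper: an origin is scheme + host + port, with no path or query.
function credentialKey(targetUrl: string): string {
  return new URL(targetUrl).origin;
}

// Hypothetical resolver: a relative --signup-url or --login-url value is resolved
// against the --url target, while an absolute URL is kept as-is.
function resolveAuthUrl(pathOrUrl: string, baseUrl: string): string {
  return new URL(pathOrUrl, baseUrl).toString();
}

const key = credentialKey("https://example.com/app?tab=billing");
const signup = resolveAuthUrl("/register", "https://example.com/pricing");
```

Keying by origin means two pages on the same site share one cached identity, while a different port or subdomain gets its own entry.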
|
|
322
|
+
|
|
323
|
+
### Auth-Only Mode
|
|
324
|
+
|
|
325
|
+
Save an authenticated session without running tasks:
|
|
326
|
+
|
|
327
|
+
```bash
|
|
328
|
+
npm run dev -- --url https://example.com --auth-only \
|
|
329
|
+
--signup-url /register --login-url /login --access-url /dashboard
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
### Session Reuse
|
|
333
|
+
|
|
334
|
+
For sites behind verified sessions, load a saved Playwright storage state:
|
|
335
|
+
|
|
336
|
+
```bash
|
|
337
|
+
# Via CLI flag
|
|
338
|
+
npm run dev -- --url https://example.com --task "Reach the dashboard" \
|
|
339
|
+
--storage-state .auth/session.json
|
|
340
|
+
|
|
341
|
+
# Via .env (auto-loaded on every run)
|
|
342
|
+
PLAYWRIGHT_STORAGE_STATE_PATH=.auth/session.json
|
|
343
|
+
```
|
|
344
|
+
|
|
345
|
+
> **Important:** This does not bypass CAPTCHA, MFA, or anti-bot controls. It reuses legitimate sessions you've established.
|
|
346
|
+
|
|
347
|
+
### Auth Environment Variables
|
|
348
|
+
|
|
349
|
+
**Required for a fresh auth bootstrap** (not needed when the target origin already has cached credentials):
|
|
350
|
+
|
|
351
|
+
| Variable | Description |
|
|
352
|
+
|---|---|
|
|
353
|
+
| `AUTH_TEST_EMAIL` | Email address for signup/login |
|
|
354
|
+
| `AUTH_TEST_PASSWORD` | Password for signup/login |
|
|
355
|
+
|
|
356
|
+
**Optional login field:**
|
|
357
|
+
|
|
358
|
+
`AUTH_TEST_USERNAME`
|
|
359
|
+
|
|
360
|
+
**IMAP inbox** (for OTP/verification email polling):
|
|
361
|
+
|
|
362
|
+
| Variable | Default | Description |
|
|
363
|
+
|---|---|---|
|
|
364
|
+
| `AUTH_IMAP_HOST` | — | IMAP server hostname |
|
|
365
|
+
| `AUTH_IMAP_PORT` | `993` | IMAP port |
|
|
366
|
+
| `AUTH_IMAP_SECURE` | `true` | Use TLS |
|
|
367
|
+
| `AUTH_IMAP_USER` | — | IMAP username |
|
|
368
|
+
| `AUTH_IMAP_PASSWORD` | — | IMAP password |
|
|
369
|
+
| `AUTH_IMAP_MAILBOX` | `INBOX` | Mailbox to poll |
|
|
370
|
+
|
|
371
|
+
**Optional identity fields:**
|
|
372
|
+
|
|
373
|
+
`AUTH_TEST_FIRST_NAME` · `AUTH_TEST_LAST_NAME` · `AUTH_TEST_PHONE` · `AUTH_TEST_ADDRESS_LINE1` · `AUTH_TEST_ADDRESS_LINE2` · `AUTH_TEST_CITY` · `AUTH_TEST_STATE` · `AUTH_TEST_POSTAL_CODE` · `AUTH_TEST_COUNTRY` · `AUTH_TEST_COMPANY`
|
|
374
|
+
|
|
375
|
+
**Optional tuning:**
|
|
376
|
+
|
|
377
|
+
| Variable | Default | Description |
|
|
378
|
+
|---|---|---|
|
|
379
|
+
| `AUTH_EMAIL_POLL_TIMEOUT_MS` | `180000` | Max wait time for verification email |
|
|
380
|
+
| `AUTH_EMAIL_POLL_INTERVAL_MS` | `5000` | Poll frequency |
|
|
381
|
+
| `AUTH_OTP_LENGTH` | `6` | Expected OTP digit count |
|
|
382
|
+
| `AUTH_EMAIL_FROM_FILTER` | — | Filter emails by sender |
|
|
383
|
+
| `AUTH_EMAIL_SUBJECT_FILTER` | — | Filter emails by subject |
|
|
384
|
+
| `AUTH_GENERATED_IDENTITY_MAX_ATTEMPTS` | `5` | Signup retry count when a generated identity is rejected |
|
|
385
|
+
| `AUTH_EMAIL_DOMAIN` | — | Override the generated plus-address domain |
|
|
386
|
+
| `AUTH_SIGNUP_URL` | — | Default signup URL (instead of CLI flag) |
|
|
387
|
+
| `AUTH_LOGIN_URL` | — | Default login URL |
|
|
388
|
+
| `AUTH_ACCESS_URL` | — | Default protected page URL |
|
|
389
|
+
| `AUTH_SESSION_STATE_PATH` | `.auth/session.json` | Where to save the session |
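The polling settings above can be read as a simple loop: check the inbox every `AUTH_EMAIL_POLL_INTERVAL_MS` until a code of `AUTH_OTP_LENGTH` digits appears or `AUTH_EMAIL_POLL_TIMEOUT_MS` elapses. The sketch below is an assumption about how such a loop might look, not the package's implementation; `extractOtp`, `pollForOtp`, and the `fetchLatest` callback (a stand-in for an IMAP lookup) are hypothetical.

```ts
// Hypothetical helper: find the first standalone run of exactly `length` digits.
function extractOtp(emailText: string, length = 6): string | null {
  const match = emailText.match(new RegExp(`(?<!\\d)\\d{${length}}(?!\\d)`));
  return match ? match[0] : null;
}

// Hypothetical poll loop; defaults mirror the documented 180000 ms / 5000 ms / 6 digits.
async function pollForOtp(
  fetchLatest: () => Promise<string | null>,
  timeoutMs = 180_000,
  intervalMs = 5_000,
  otpLength = 6,
): Promise<string | null> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const body = await fetchLatest();
    const otp = body ? extractOtp(body, otpLength) : null;
    if (otp) return otp;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // timed out without finding a matching code
}
```

The digit-boundary checks keep longer numbers, such as order IDs, from being mistaken for a code.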
|
|
390
|
+
|
|
391
|
+
---
|
|
392
|
+
|
|
393
|
+
## Web3 Wallet Integration
|
|
394
|
+
|
|
395
|
+
The agent has built-in support for interacting with Web3 dApps. It uses a dual-mode architecture:
|
|
396
|
+
|
|
397
|
+
1. **Programmatic Provider (Default):** Injects a headless-compatible `window.ethereum` provider. Transaction signing requests are intercepted and forwarded to a local HTTP relay running inside the Node.js process, so the private key never enters the browser.
|
|
398
|
+
2. **MetaMask Extension Mode (Optional):** Runs a full headed browser with the MetaMask extension loaded, and automatically clicks "Connect", "Confirm", or "Sign" in MetaMask popups.
|
|
399
|
+
|
|
400
|
+
### Quick Setup
|
|
401
|
+
|
|
402
|
+
Configure your wallet in `.env`:
|
|
403
|
+
|
|
404
|
+
```bash
|
|
405
|
+
# Required
|
|
406
|
+
WALLET_PRIVATE_KEY=your_private_key_here
|
|
407
|
+
WALLET_RPC_URL=https://eth-sepolia.g.alchemy.com/v2/...
|
|
408
|
+
WALLET_CHAIN_ID=11155111
|
|
409
|
+
|
|
410
|
+
# Optional: Mnemonic instead of private key
|
|
411
|
+
# WALLET_MNEMONIC="word1 word2 ..."
|
|
412
|
+
```
|
|
413
|
+
|
|
414
|
+
Once configured, the agent will automatically inject the wallet into every page it visits. The LLM planner is also aware of its wallet address and can interact with "Connect Wallet" flows.
|
|
415
|
+
|
|
416
|
+
### Deterministic Trade Execution
|
|
417
|
+
|
|
418
|
+
Trade execution stays off unless `TRADE_ENABLED=true` is set or a run explicitly passes `--trade-enabled` or `--trade-dry-run`. When enabled, the agent attempts a deterministic EVM sell/deposit handoff only when the visible page and task provide enough evidence for the recipient address, token, chain, and amount. Dry-run mode validates the extracted instruction and writes `trade-executions.json` without broadcasting.
|
|
419
|
+
|
|
420
|
+
Useful controls:
|
|
421
|
+
|
|
422
|
+
```bash
|
|
423
|
+
# Validate only; do not broadcast
|
|
424
|
+
--trade-dry-run
|
|
425
|
+
|
|
426
|
+
# Broadcast only if validation passes
|
|
427
|
+
--trade-enabled
|
|
428
|
+
|
|
429
|
+
# Choose how to handle the visible trade path
|
|
430
|
+
--trade-strategy auto|dapp_only|deposit_only
|
|
431
|
+
```
|
|
432
|
+
|
|
433
|
+
For a standalone trade instruction JSON file, use:
|
|
434
|
+
|
|
435
|
+
```bash
|
|
436
|
+
npm run trade:run -- --instruction ./sell-instruction.json --strategy deposit_only
|
|
437
|
+
|
|
438
|
+
# Add --broadcast only when you are ready to send the transaction
|
|
439
|
+
npm run trade:run -- --instruction ./sell-instruction.json --broadcast
|
|
440
|
+
```
|
|
441
|
+
|
|
442
|
+
### Using MetaMask Extension Mode
|
|
443
|
+
|
|
444
|
+
If you specifically need to test how a dApp interacts with the MetaMask UI, you can run the agent with the extension loaded. This requires the unpacked MetaMask extension folder (not a `.crx` file).
|
|
445
|
+
|
|
446
|
+
**How to get the Extension Path (Mac):**
|
|
447
|
+
1. Ensure MetaMask is installed in your normal Google Chrome browser.
|
|
448
|
+
2. The extracted path is typically located at:
|
|
449
|
+
`/Users/<YourUsername>/Library/Application Support/Google/Chrome/Default/Extensions/nkbihfbeogaeaoehlefnkodbefgpgknn/<version_number>`
|
|
450
|
+
3. Set the environment variables in `.env`:
|
|
451
|
+
```bash
|
|
452
|
+
WALLET_METAMASK_EXTENSION_PATH=/Users/YourUsername/Library/.../11.14.2_0
|
|
453
|
+
WALLET_METAMASK_USER_DATA_DIR=/Users/YourUsername/.site-agent-metamask-profile
|
|
454
|
+
```
|
|
455
|
+
*Note: Using this mode forces the agent to run in headed (visible) mode. For real MetaMask signing/confirm popups, point `WALLET_METAMASK_USER_DATA_DIR` at a persistent Chromium profile where MetaMask is already set up and unlocked.*
|
|
456
|
+
|
|
457
|
+
---
|
|
458
|
+
|
|
459
|
+
## Paystack Integration
|
|
460
|
+
|
|
461
|
+
The agent has built-in support for the Paystack API (Nigeria) to handle Naira payments and payouts. This enables "Agent-as-a-Service" monetization flows.
|
|
462
|
+
|
|
463
|
+
### Features
|
|
464
|
+
- **Dedicated Virtual Accounts (DVA):** Automatically provisions a unique bank account number (Wema/GTB) for each agent persona.
|
|
465
|
+
- **Naira Transfers:** Initiates outbound transfers to any Nigerian bank account via the Transfers API.
|
|
466
|
+
- **Webhook Processing:** Securely handles `charge.success` and `transfer.success` events with HMAC-SHA512 verification.
|
|
467
|
+
- **Zero-Dependency Client:** Uses Node 20+ native `fetch` (no `axios` required).
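As a sketch of the HMAC-SHA512 check mentioned under Webhook Processing: Paystack signs the raw request body with the secret key and sends the hex digest in the `x-paystack-signature` header. The function below uses only Node's built-in `node:crypto`; `isValidPaystackSignature` is a hypothetical name and this is not necessarily how `dist/paystack/webhook.js` implements it.

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// `rawBody` must be the exact request bytes as received (before JSON parsing),
// and `signatureHeader` the value of the x-paystack-signature header.
function isValidPaystackSignature(
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
  secretKey: string,
): boolean {
  if (!signatureHeader) return false;
  const expected = createHmac("sha512", secretKey).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Comparing with `timingSafeEqual` rather than `===` avoids leaking how many leading characters of a forged signature were correct.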
|
|
468
|
+
|
|
469
|
+
### Quick Setup
|
|
470
|
+
Configure Paystack in `.env`:
|
|
471
|
+
```bash
|
|
472
|
+
PAYSTACK_SECRET_KEY=sk_test_...
|
|
473
|
+
PAYSTACK_PUBLIC_KEY=pk_test_...
|
|
474
|
+
PAYSTACK_DVA_PROVIDER=wema-bank
|
|
475
|
+
PAYSTACK_AGENT_PHONE=+234... # Required for Live DVAs
|
|
476
|
+
```
|
|
477
|
+
|
|
478
|
+
### Autonomous Transfers & Verification
|
|
479
|
+
The agent can autonomously fulfill payment requests and **verify incoming funds or tokens** using its direct API/on-chain access:
|
|
480
|
+
- **Sending Naira**: When it encounters a task like "Pay the vendor ₦500," it extracts the bank details from the page and initiates a transfer using the `pay` action.
|
|
481
|
+
- **Verifying Naira**: If tasked to "wait for payment before releasing tokens," the agent monitors its own Paystack transaction history. It will only proceed to click "Release" or "Confirm" once it sees the matching successful transaction in its account.
|
|
482
|
+
- **Verifying Tokens**: If tasked to "confirm tokens have arrived before paying Naira," the agent monitors its actual on-chain wallet balance via its RPC connection. It ignores potentially fake website UIs and only pays once the blockchain confirms the receipt of funds.
|
|
483
|
+
|
|
484
|
+
**Safety:** Transfers are broadcast only if `PAYSTACK_TRANSFER_ENABLED=true`. Otherwise, the agent performs a dry-run validation and sends nothing.
|
|
485
|
+
|
|
486
|
+
### Testing the Integration
|
|
487
|
+
Run the standalone smoke test to verify your API keys and DVA provisioning:
|
|
488
|
+
```bash
|
|
489
|
+
npm run paystack:test
|
|
490
|
+
```
|
|
491
|
+
|
|
492
|
+
---
|
|
493
|
+
|
|
494
|
+
|
|
495
|
+
## Configuration
|
|
496
|
+
|
|
497
|
+
All settings are read from environment variables (`.env` file).
|
|
498
|
+
|
|
499
|
+
### Core Settings
|
|
500
|
+
|
|
501
|
+
| Variable | Default | Description |
|
|
502
|
+
|---|---|---|
|
|
503
|
+
| `LLM_PROVIDER` | `openai` | LLM backend: `openai` or `ollama` |
|
|
504
|
+
| `OPENAI_API_KEY` | — | **(Required for OpenAI)** API key |
|
|
505
|
+
| `OPENAI_MODEL` | `gpt-5` | Model for planning and evaluation |
|
|
506
|
+
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama server URL |
|
|
507
|
+
| `OLLAMA_MODEL` | `llama3.1:8b` | Ollama model name |
|
|
508
|
+
|
|
509
|
+
### Execution Limits
|
|
510
|
+
|
|
511
|
+
| Variable | Default | Description |
|
|
512
|
+
|---|---|---|
|
|
513
|
+
| `MAX_SESSION_DURATION_MS` | `600000` | Total run time cap (clamped 60s–600s) |
|
|
514
|
+
| `MAX_STEPS_PER_TASK` | `32` | Max actions per task |
|
|
515
|
+
| `ACTION_DELAY_MS` | `600` | Delay between actions (human-like pacing) |
|
|
516
|
+
| `NAVIGATION_TIMEOUT_MS` | `25000` | Page load timeout |
|
|
517
|
+
| `RECORD_VIDEO` | `false` | Record Playwright video into the run directory |
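The "clamped 60s–600s" note on `MAX_SESSION_DURATION_MS` can be illustrated with a small sketch. `clampMs` is a hypothetical helper, and falling back to the default for missing or non-numeric input is an assumption; only the bounds and the default come from the table.

```ts
// Hypothetical parser: read a millisecond setting and clamp it into [min, max].
function clampMs(raw: string | undefined, fallback: number, min: number, max: number): number {
  const parsed = Number(raw);
  const usable = raw !== undefined && raw.trim() !== "" && Number.isFinite(parsed);
  const value = usable ? parsed : fallback;
  return Math.min(max, Math.max(min, value));
}

// A value below the documented floor is raised to 60 seconds (60000 ms).
const maxSessionMs = clampMs("5000", 600_000, 60_000, 600_000);
```

With this shape, a misconfigured value can shorten or lengthen a run only within the documented bounds.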
|
|
518
|
+
|
|
519
|
+
### Browser
|
|
520
|
+
|
|
521
|
+
| Variable | Default | Description |
|
|
522
|
+
|---|---|---|
|
|
523
|
+
| `HEADLESS` | `true` | Set `false` for headed mode |
|
|
524
|
+
| `PLAYWRIGHT_STORAGE_STATE_PATH` | — | Auto-load session state JSON |
|
|
525
|
+
| `PLAYWRIGHT_EXECUTABLE_PATH` | — | Custom Chromium binary path |
|
|
526
|
+
| `USE_SERVERLESS_CHROMIUM` | — | Force `@sparticuz/chromium` |
|
|
527
|
+
| `SPARTICUZ_CHROMIUM_LOCATION` | — | Chromium binary location hint |
|
|
528
|
+
|
|
529
|
+
### Trade Policy
|
|
530
|
+
|
|
531
|
+
| Variable | Default | Description |
|
|
532
|
+
|---|---|---|
|
|
533
|
+
| `TRADE_ENABLED` | `false` | Enable deterministic trade execution by default |
|
|
534
|
+
| `TRADE_ALLOWLISTED_CHAIN_IDS` | — | Comma-separated chain IDs allowed for trade execution |
|
|
535
|
+
| `TRADE_TOKEN_REGISTRY` | `[]` | JSON array of `{ chainId, symbol, assetKind, contract?, decimals }` entries |
|
|
536
|
+
| `TRADE_MAX_TOKEN_AMOUNT` | — | Maximum token amount allowed by policy |
|
|
537
|
+
| `TRADE_REQUIRE_EXACT_TOKEN_CONTRACT` | `true` | Require ERC-20 contract matches when validating trades |
|
|
538
|
+
| `TRADE_CONFIRMATIONS_REQUIRED` | `1` | Default confirmations to wait for, 0–12 |
|
|
539
|
+
| `TRADE_RECEIPT_TIMEOUT_MS` | `120000` | Max wait time for transaction receipt/confirmation |
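For `TRADE_TOKEN_REGISTRY`, a value and a validator might look like the sketch below. The field names come from the table; everything else is an assumption for illustration, including the `"erc20"` value for `assetKind`, the placeholder contract address, and the `parseTokenRegistry` helper itself.

```ts
// Field names follow the table above; allowed `assetKind` values are not documented here.
interface TokenEntry {
  chainId: number;
  symbol: string;
  assetKind: string;
  contract?: string;
  decimals: number;
}

// Hypothetical validator for a TRADE_TOKEN_REGISTRY-style JSON string.
function parseTokenRegistry(raw: string | undefined): TokenEntry[] {
  const parsed: unknown = JSON.parse(raw && raw.trim() !== "" ? raw : "[]");
  if (!Array.isArray(parsed)) throw new Error("registry must be a JSON array");
  return parsed.map((entry, index) => {
    const e = entry as Partial<TokenEntry>;
    const ok =
      Number.isInteger(e.chainId) &&
      typeof e.symbol === "string" &&
      typeof e.assetKind === "string" &&
      Number.isInteger(e.decimals) &&
      (e.contract === undefined || /^0x[0-9a-fA-F]{40}$/.test(e.contract));
    if (!ok) throw new Error(`invalid registry entry at index ${index}`);
    return e as TokenEntry;
  });
}

// Example value with a placeholder contract address (not a real token contract).
const exampleRegistry = parseTokenRegistry(
  '[{"chainId":11155111,"symbol":"USDC","assetKind":"erc20",' +
    '"contract":"0x0000000000000000000000000000000000000001","decimals":6}]',
);
```

Failing fast on a malformed entry is a deliberate choice in this sketch: a trade policy that silently ignores bad entries would be harder to audit.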
|
|
540
|
+
|
|
541
|
+
### Dashboard & Deployment
|
|
542
|
+
|
|
543
|
+
| Variable | Default | Description |
|
|
544
|
+
|---|---|---|
|
|
545
|
+
| `APP_BASE_URL` | — | Production URL for public report links. On Render, `RENDER_EXTERNAL_URL` is used automatically when this is unset. |
|
|
546
|
+
| `SITE_AGENT_DATA_DIR` | — | Root directory for persisted runs and submissions. Set this to your Render disk mount path for durable storage. |
|
|
547
|
+
| `PORT` | `10000` on Render | Public HTTP port for Render web services |
|
|
548
|
+
| `DASHBOARD_PORT` | `4173` | Dashboard server port |
|
|
549
|
+
| `DASHBOARD_HOST` | `127.0.0.1` locally, `0.0.0.0` on Render | Dashboard server host |
|
|
550
|
+
| `REPORT_TTL_DAYS` | `30` | Public report link expiry |
|
|
551
|
+
| `INTERNAL_JOB_SECRET` | — | Restrict background job invocation |
|
|
552
|
+
|
---

## Writing Effective Tasks

Since the agent follows only your tasks, structure them as focused coverage lanes:

```bash
# Map the main journey
--task "Navigate to pricing and compare the monthly vs yearly plans"

# Inspect discovery paths
--task "Use the site search to find 'refund policy' and read the visible result"

# Follow the conversion path
--task "Click the Sign Up Free tab, fill every visible detail, and submit"

# Probe edge cases
--task "Enter an invalid email in the signup form and check the error message"

# Safely test an exchange flow
--task "Click Buy; enter 50000 NGN; confirm the crypto preview updates; provide a harmless test wallet address; verify the payment account card appears; stop before making any real payment"

# Ask for monitoring evidence
--task "Check exchange-flow monitoring evidence for amount entry, wallet submission, bank submission, displayed account details, copy actions, and transfer attempts"

# Autonomous fulfillment
--task "Buy the token; pay ₦500 to the Zenith Bank account shown on the confirmation screen"
```

**Tips for better results:**
- Write **specific, concrete actions** — not "explore the site"
- Use ordered verbs like **click**, **enter**, **copy**, **scroll**, **wait**, **go back**, and **stop** when the sequence matters
- Include literal values when needed, for example `enter 50000 NGN` or `type "test@example.com" into email`
- Split large journeys into **separate tasks** so early clicks don't consume the entire budget
- Paste multi-line instructions or upload text/JSON files in the dashboard when tasks come from a spec
- A combined Naira/crypto exchange spec that mentions Buy flow, Sell flow, Naira, crypto, and logging/monitoring/events is expanded into separate Buy, Sell, and monitoring tasks
- For slow sites, increase `NAVIGATION_TIMEOUT_MS` before increasing step counts
- Use `--storage-state` for pages behind authentication
- Run **multiple agent perspectives** (2-5) when you want broader coverage
- For game sites, be explicit: "Play 5 rounds and record each win or loss"
- For exchange/payment QA, use harmless test values and explicitly tell the agent to stop before any real payment, crypto transfer, purchase, or payout

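Some of the tips above can be checked mechanically before a run. The sketch below is a hypothetical pre-flight helper, not part of the package: it flags vague tasks and payment-related tasks that lack an explicit "stop before" instruction, and turns a task list into repeated `--task` arguments. The keyword patterns are assumptions chosen for illustration, and a deliberate fulfillment task would still trigger the payment warning and would need to be reviewed by hand.

```typescript
// Illustrative sketch only: the helper names and keyword lists are assumptions.
interface TaskWarning {
  task: string;
  message: string;
}

const VAGUE_PATTERNS: RegExp[] = [/^\s*explore\b/i, /\blook around\b/i, /\btest everything\b/i];
const PAYMENT_PATTERN = /\b(pay|payment|transfer|purchase|payout|buy|sell)\b/i;
const STOP_PATTERN = /\bstop before\b/i;

function lintTasks(tasks: string[]): TaskWarning[] {
  if (tasks.length === 0) {
    throw new Error("every run needs at least one explicit task");
  }
  const warnings: TaskWarning[] = [];
  for (const task of tasks) {
    if (VAGUE_PATTERNS.some((pattern) => pattern.test(task))) {
      warnings.push({ task, message: "task is vague; name concrete actions and values" });
    }
    if (PAYMENT_PATTERN.test(task) && !STOP_PATTERN.test(task)) {
      warnings.push({ task, message: "payment-related task has no explicit stop instruction" });
    }
  }
  return warnings;
}

// Build the repeated `--task "<text>"` argument pairs shown in the examples above.
function toTaskArgs(tasks: string[]): string[] {
  const args: string[] = [];
  for (const task of tasks) {
    args.push("--task", task);
  }
  return args;
}
```

Passing the result of `toTaskArgs` as an argument array, rather than concatenating a shell string, avoids quoting problems when a task contains quotes or semicolons.
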
---

## Render Deployment

This repo now targets a standard Render web service deployment:

- **Dashboard server** (`src/dashboard/server.ts`) — handles the app, submission routes, public reports, and dashboard APIs
- **Local filesystem persistence** — submissions and run artifacts are stored under `SITE_AGENT_DATA_DIR`
- **Render Blueprint** (`render.yaml`) — defines the Render web service, health check, and persistent disk mount
- **Full Playwright runtime** — the build installs Chromium for the dashboard worker process

### Included `render.yaml`

The repo root includes a Render Blueprint with:

- `runtime: node`
- `buildCommand: npm ci && npm run build && npm run browser:install`
- `startCommand: npm run render:start`
- `healthCheckPath: /health`
- a persistent disk mounted at `/opt/render/project/src/data`

### Required Environment Variables

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key when `LLM_PROVIDER=openai` |

### Recommended Environment Variables

| Variable | Description |
|---|---|
| `LLM_PROVIDER` | Use `openai` for a single-service Render deployment unless you are also hosting Ollama separately |
| `APP_BASE_URL` | Optional. If unset on Render, the app falls back to `RENDER_EXTERNAL_URL` |
| `SITE_AGENT_DATA_DIR` | Override only if you change the disk mount path from the default in `render.yaml` |
| `INTERNAL_JOB_SECRET` | Optional hardening for internal job-style routes |

> **Note:** Render web services must bind to `0.0.0.0:$PORT`, and persistent filesystem data survives deploys only when it is written under the attached disk mount path. See the official Render docs for [web services](https://render.com/docs/web-services), [persistent disks](https://render.com/docs/disks), and the [Blueprint spec](https://render.com/docs/blueprint-spec).

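The two requirements in the note, binding to `0.0.0.0:$PORT` and writing under the disk mount, can be expressed as small checks. The sketch below is hypothetical and makes several assumptions that are not confirmed by this README: that Render is detected through a `RENDER=true` environment variable, that `PORT` takes precedence over `DASHBOARD_PORT` on Render, and that paths are POSIX-style and already absolute.

```typescript
// Illustrative sketch only: detection logic and precedence are assumptions.
type Env = Record<string, string | undefined>;

interface BindAddress {
  host: string;
  port: number;
}

function resolveBindAddress(env: Env): BindAddress {
  const onRender = env["RENDER"] === "true"; // assumption: the platform sets RENDER=true
  const host = env["DASHBOARD_HOST"] ?? (onRender ? "0.0.0.0" : "127.0.0.1");
  const rawPort = onRender ? (env["PORT"] ?? "10000") : (env["DASHBOARD_PORT"] ?? "4173");
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${rawPort}`);
  }
  return { host, port };
}

// Simplified prefix check; it does not resolve "..", symlinks, or relative paths.
function isUnderDiskMount(dataDir: string, mountPath: string): boolean {
  const dir = dataDir.replace(/\/+$/, "");
  const mount = mountPath.replace(/\/+$/, "");
  return dir === mount || dir.startsWith(mount + "/");
}
```

A startup check along these lines can warn early when `SITE_AGENT_DATA_DIR` points outside the persistent disk, which would otherwise only show up as missing runs after the next deploy.
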
---

## Architecture

For a detailed technical breakdown of every module, see [**ARCHITECTURE.md**](ARCHITECTURE.md).

High-level summary:

| Layer | Key Files | Purpose |
|---|---|---|
| **Entry points** | `cli/run.ts`, `dashboard/server.ts` | CLI and web UI |
| **Orchestration** | `runAuditJob.ts`, `processSubmissionBatch.ts` | Single-run and multi-agent execution |
| **Agent loop** | `runner.ts` → `planner.ts` → `executor.ts` | Capture state → LLM plans → Playwright acts |
| **Page understanding** | `pageState.ts`, `siteBrief.ts`, `taskDirectives.ts`, `submissions/customTasks.ts` | DOM snapshots, site comprehension, ordered instruction and upload parsing |
| **Authentication** | `auth/profile.ts`, `auth/inbox.ts`, `auth/runner.ts` | Identity management, IMAP OTP polling, login flows |
| **Evaluation** | `evaluator.ts`, `aggregateReport.ts` | LLM scoring, multi-agent result merging |
| **Site checks** | `siteChecks.ts`, `audit.ts` | SEO, performance, security, accessibility |
| **Paystack** | `paystack/*` | Dedicated virtual accounts, Naira transfers, webhooks |
| **Reporting** | `reporting/html.ts`, `reporting/markdown.ts`, `clickReplay.ts` | HTML/MD/JSON reports, activity replay animation |
| **Trade safety** | `trade/*`, `wallet/*` | Wallet injection, deterministic trade extraction, policy validation, dry-run/broadcast records |
| **LLM** | `llm/client.ts`, `prompts/browserAgent.ts`, `prompts/reviewer.ts` | OpenAI + Ollama client, system prompts |

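The agent loop row (capture state, plan, act) can be pictured as a bounded loop over two collaborators. The sketch below is a simplified illustration, not the code in `runner.ts`: the `PageState`, `Action`, `Planner`, and `Executor` types and the `runTask` function are hypothetical, and the real modules carry far more state than shown here.

```typescript
// Illustrative sketch only: these interfaces are assumptions, not the package's types.
interface PageState {
  url: string;
  visibleText: string;
}

type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; value: string }
  | { kind: "stop"; reason: string };

interface Planner {
  // Stands in for the LLM planning step.
  plan(task: string, state: PageState, history: Action[]): Promise<Action>;
}

interface Executor {
  // Stands in for the browser automation layer.
  capture(): Promise<PageState>;
  act(action: Action): Promise<void>;
}

async function runTask(
  task: string,
  planner: Planner,
  executor: Executor,
  maxSteps: number
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const state = await executor.capture();                  // 1. capture state
    const action = await planner.plan(task, state, history); // 2. plan the next action
    history.push(action);
    if (action.kind === "stop") break;                       // planner decided the task is done
    await executor.act(action);                              // 3. act in the browser
  }
  return history;
}
```

The `maxSteps` bound reflects the step budget mentioned in the task-writing tips: a task that spends its steps on early clicks ends when the budget runs out, whether or not the planner has issued a stop.
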
---

## Important Constraints

- **No CAPTCHA/MFA bypass** — the agent does not solve CAPTCHAs, MFA challenges, or anti-bot controls
- **No hidden DOM access** — the agent interacts only with visible elements, like a real user
- **No unsupported claims** — the evaluator scores from evidence only, not from the agent's impressions
- **Task-required** — every run must have at least one explicit task
- **Legitimate sessions only** — storage state reuse is for approved, pre-established sessions
- **Trade-safe by default** — onchain execution is disabled unless explicitly enabled, and exchange-flow QA should stop before real-world transfers

---

## Step-by-Step Guides

| Guide | Topic |
|---|---|
| `docs/01-installation.md` | Installation and setup |
| `docs/02-running-your-first-audit.md` | Your first run |
| `docs/03-configuration.md` | Configuration deep-dive |
| `docs/04-how-the-agent-thinks.md` | Agent planning internals |
| `docs/05-extending-personas-and-tasks.md` | Custom personas and tasks |
| `docs/06-hardening-for-production.md` | Production deployment |

---

## Recommended Rollout

1. **Start local** — run manually on desktop, inspect logs and reports
2. **Tune tasks** — write focused coverage lanes for your product
3. **Add mobile** — include `--mobile` runs
4. **Multi-agent** — use 2-5 perspectives for broader coverage
5. **CI integration** — only after you've validated the scores match your expectations

> Treat the scores as **signals, not ground truth** until you've calibrated them against your own quality bar.

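One simple way to calibrate scores is to rate a handful of runs by hand and measure how far the agent's scores sit from your own. The sketch below is hypothetical: the `CalibrationSample` shape is not a package type, and it assumes that agent and human scores are on the same numeric scale.

```typescript
// Illustrative sketch only: the sample shape and shared score scale are assumptions.
interface CalibrationSample {
  runId: string;
  agentScore: number;
  humanScore: number;
}

// Mean absolute difference between agent scores and manual ratings.
function meanAbsoluteGap(samples: CalibrationSample[]): number {
  if (samples.length === 0) {
    throw new Error("need at least one rated run");
  }
  let total = 0;
  for (const sample of samples) {
    total += Math.abs(sample.agentScore - sample.humanScore);
  }
  return total / samples.length;
}
```

A gap that stays small across several runs is a reasonable signal that the scores can gate CI; a large or erratic gap suggests tuning tasks further first.
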