site-agent-pro 1.0.0
- package/README.md +689 -0
- package/dist/auth/credentialStore.js +62 -0
- package/dist/auth/inbox.js +193 -0
- package/dist/auth/profile.js +379 -0
- package/dist/auth/runner.js +1124 -0
- package/dist/backend/dashboardData.js +194 -0
- package/dist/backend/runArtifacts.js +48 -0
- package/dist/backend/runRepository.js +93 -0
- package/dist/bin.js +2 -0
- package/dist/cli/backfillSiteChecks.js +143 -0
- package/dist/cli/run.js +309 -0
- package/dist/cli/trade.js +69 -0
- package/dist/config.js +199 -0
- package/dist/core/agentProfiles.js +55 -0
- package/dist/core/aggregateReport.js +382 -0
- package/dist/core/audit.js +30 -0
- package/dist/core/customTaskSuite.js +148 -0
- package/dist/core/evaluator.js +217 -0
- package/dist/core/executor.js +788 -0
- package/dist/core/fallbackReport.js +335 -0
- package/dist/core/formHeuristics.js +411 -0
- package/dist/core/gameplaySummary.js +164 -0
- package/dist/core/interaction.js +202 -0
- package/dist/core/pageState.js +201 -0
- package/dist/core/planner.js +1669 -0
- package/dist/core/processSubmissionBatch.js +204 -0
- package/dist/core/runAuditJob.js +170 -0
- package/dist/core/runner.js +2352 -0
- package/dist/core/siteBrief.js +107 -0
- package/dist/core/siteChecks.js +1526 -0
- package/dist/core/taskDirectives.js +279 -0
- package/dist/core/taskHeuristics.js +263 -0
- package/dist/dashboard/client.js +1256 -0
- package/dist/dashboard/contracts.js +95 -0
- package/dist/dashboard/narrative.js +277 -0
- package/dist/dashboard/server.js +458 -0
- package/dist/dashboard/theme.js +888 -0
- package/dist/index.js +84 -0
- package/dist/llm/client.js +188 -0
- package/dist/paystack/account.js +123 -0
- package/dist/paystack/client.js +100 -0
- package/dist/paystack/index.js +13 -0
- package/dist/paystack/test-paystack.js +83 -0
- package/dist/paystack/transfer.js +138 -0
- package/dist/paystack/types.js +74 -0
- package/dist/paystack/webhook.js +121 -0
- package/dist/prompts/browserAgent.js +124 -0
- package/dist/prompts/reviewer.js +71 -0
- package/dist/reporting/clickReplay.js +290 -0
- package/dist/reporting/html.js +930 -0
- package/dist/reporting/markdown.js +238 -0
- package/dist/reporting/template.js +1141 -0
- package/dist/schemas/types.js +361 -0
- package/dist/submissions/customTasks.js +196 -0
- package/dist/submissions/html.js +770 -0
- package/dist/submissions/model.js +56 -0
- package/dist/submissions/publicUrl.js +76 -0
- package/dist/submissions/service.js +74 -0
- package/dist/submissions/store.js +37 -0
- package/dist/submissions/types.js +65 -0
- package/dist/trade/engine.js +241 -0
- package/dist/trade/evm/erc20.js +44 -0
- package/dist/trade/extractor.js +148 -0
- package/dist/trade/policy.js +35 -0
- package/dist/trade/session.js +31 -0
- package/dist/trade/types.js +107 -0
- package/dist/trade/validator.js +148 -0
- package/dist/utils/files.js +59 -0
- package/dist/utils/log.js +24 -0
- package/dist/utils/playwrightCompat.js +14 -0
- package/dist/utils/time.js +3 -0
- package/dist/wallet/provider.js +345 -0
- package/dist/wallet/relay.js +129 -0
- package/dist/wallet/wallet.js +178 -0
- package/docs/01-installation.md +134 -0
- package/docs/02-running-your-first-audit.md +136 -0
- package/docs/03-configuration.md +233 -0
- package/docs/04-how-the-agent-thinks.md +41 -0
- package/docs/05-extending-personas-and-tasks.md +42 -0
- package/docs/06-hardening-for-production.md +92 -0
- package/package.json +60 -0
@@ -0,0 +1,134 @@
# 01 - Installation

There are two ways to use site-agent-pro. Choose the one that fits your workflow.

---

## Track A — devDependency (recommended for most developers)

This is the right choice if you want to audit your own product as you build it, run audits in CI, or call site-agent-pro from scripts and test files.

### 1. Prerequisites

- Node.js 20.10 or newer
- npm 10 or newer

```bash
node -v
npm -v
```

### 2. Install in your project

```bash
npm install --save-dev site-agent-pro
```

### 3. Install the Playwright browser

```bash
npx playwright install chromium
```

This only needs to be done once per machine or CI environment.

### 4. Set your API key

Create a `.env` file (or set environment variables) in your project root:

```bash
OPENAI_API_KEY=your_real_key_here
```

Or use Ollama for local/offline development (no API key needed):

```bash
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
```
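
A wrapper script may want to fail fast when neither provider is configured. The sketch below is a hypothetical pre-flight check, not part of the package: `resolveLlmProvider` is an illustrative name, and it assumes that an unset `LLM_PROVIDER` means OpenAI.

```typescript
// Hypothetical pre-flight check; not part of site-agent-pro.
type LlmProvider = "openai" | "ollama";

interface ProviderChoice {
  provider: LlmProvider;
  model?: string;
}

function resolveLlmProvider(env: Record<string, string | undefined>): ProviderChoice {
  // Assumption: an unset LLM_PROVIDER means OpenAI.
  const provider = (env.LLM_PROVIDER ?? "openai").toLowerCase();

  if (provider === "ollama") {
    // Ollama runs locally, so no API key is required.
    return { provider: "ollama", model: env.OLLAMA_MODEL };
  }
  if (provider === "openai") {
    if (!env.OPENAI_API_KEY) {
      throw new Error("OPENAI_API_KEY is not set; add it to .env or set LLM_PROVIDER=ollama");
    }
    return { provider: "openai", model: env.OPENAI_MODEL };
  }
  throw new Error(`Unsupported LLM_PROVIDER: ${provider}`);
}
```

Calling `resolveLlmProvider(process.env)` before an audit surfaces a missing key immediately instead of partway through a run.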

### 5. Run your first audit

```bash
# Against your running dev server (npx resolves the locally installed binary)
npx site-agent-pro --url http://localhost:3000 --task "Check the homepage CTA"

# Or add it to your package.json scripts
# "audit": "site-agent-pro --url http://localhost:3000 --task 'Check the homepage'"
# npm run audit
```

### 6. Or use it programmatically

```ts
import { runAudit } from "site-agent-pro";

const result = await runAudit({
  url: "http://localhost:3000",
  tasks: ["Check the homepage CTA", "Try the signup flow"],
});

console.log(`Score: ${result.report.overall_score}/10`);

if (result.report.overall_score < 7) {
  process.exit(1); // Fail CI
}
```

---

## Track B — Clone and run (for contributors and self-hosting)

Use this if you want to run the full web dashboard, modify the source, or self-host the submission server.

### 1. Prerequisites

- Node.js 20.10 or newer
- npm 10 or newer
- Git

```bash
node -v
npm -v
```

### 2. Clone the repository

```bash
git clone https://github.com/your-org/site-agent-pro.git
cd site-agent-pro
```

### 3. Install dependencies

```bash
npm install
```

### 4. Install the Playwright browser

```bash
npm run browser:install
```

### 5. Create your environment file

```bash
cp .env.example .env
```

### 6. Add your OpenAI API key

Open `.env` and set:

```bash
OPENAI_API_KEY=your_real_key_here
```

### 7. Confirm TypeScript builds cleanly

```bash
npm run check
```

If this fails, do not keep going and pretend everything is fine. Fix the error first.
@@ -0,0 +1,136 @@
# 02 - Running Your First Audit

## Which command to use?

| Setup | Command |
|---|---|
| Installed as npm devDependency | `npx site-agent-pro --url ... --task "..."` |
| Cloned from source | `npm run dev -- --url ... --task "..."` |

All examples below use the bare `site-agent-pro` command. With a devDependency install, run it through `npx` or an npm script so the locally installed binary is found. If you cloned the repo, replace it with `npm run dev --`.

---

## 1. Start with a simple public site

```bash
site-agent-pro --url https://example.com --task "Open pricing and compare the visible plans before signup"
```

This creates a new run directory in `runs/`.
If you cloned the repo and want the full local product flow, start the app with `npm run dashboard` and submit the URL through `http://localhost:4173/`.

---

## 2. Run against your own localhost dev server

```bash
# Start your app first
npm run dev

# Then in another terminal
site-agent-pro --url http://localhost:3000 --task "Check the homepage CTA"
```

This is the core use case for side-by-side development: catch UX issues as you build, not after.

---

## 3. Inspect the output

Each run produces a timestamped directory in `runs/` containing:

- `inputs.json`
- `raw-events.json`
- `task-results.json`
- `accessibility.json`
- `report.json`
- `report.html`
- `report.md`
- `click-replay.webp` (animated replay)
- `*.webm` (full video recording if `RECORD_VIDEO=true`)

Open `report.html` in your browser for a readable, standalone report.
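
For scripted inspection, `report.json` can be condensed into a few lines. This is a minimal sketch that assumes `report.json` carries the same `overall_score`, `strengths`, and `top_fixes` fields that the programmatic API exposes on `result.report`; `summarizeReport` is an illustrative helper, not something the package ships.

```typescript
// Assumption: report.json mirrors the `result.report` object returned by runAudit.
interface ReportSummaryInput {
  overall_score: number;
  strengths?: string[];
  top_fixes?: string[];
}

function summarizeReport(report: ReportSummaryInput, maxItems = 3): string[] {
  const lines = [`Score: ${report.overall_score}/10`];
  // List fixes before strengths so the actionable items come first.
  for (const fix of (report.top_fixes ?? []).slice(0, maxItems)) {
    lines.push(`Fix: ${fix}`);
  }
  for (const strength of (report.strengths ?? []).slice(0, maxItems)) {
    lines.push(`Strength: ${strength}`);
  }
  return lines;
}

// In-memory sample standing in for the parsed contents of report.json.
const sampleReport: ReportSummaryInput = {
  overall_score: 6,
  strengths: ["Clear headline"],
  top_fixes: ["Pricing link is hard to find", "Signup form hides its errors"],
};
console.log(summarizeReport(sampleReport).join("\n"));
```

In a real script the sample object would be replaced by the parsed JSON from the run directory you want to inspect.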

---

## 4. Run in a visible browser

Use this while debugging interaction issues:

```bash
site-agent-pro --url https://example.com --task "Open pricing" --headed
```

---

## 5. Run as a mobile user

```bash
site-agent-pro --url https://example.com --task "Check the mobile nav" --mobile
```

---

## 6. Bootstrap an authenticated session

When the site requires signup, email verification, OTP, or login before the important content is visible:

```bash
site-agent-pro --url https://example.com \
  --task "Reach the account dashboard and confirm billing is visible" \
  --auth-flow --signup-url /register --login-url /login --access-url /dashboard \
  --headed
```

If you only want the authenticated Playwright session file and not the task run:

```bash
site-agent-pro --url https://example.com \
  --auth-only --signup-url /register --login-url /login --access-url /dashboard
```

This writes `auth-flow.json` into the run directory and saves the authenticated `storageState` so future runs can reuse it directly.
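
Before reusing a saved session, it can help to check whether its cookies are still live. A Playwright `storageState` file holds a `cookies` array (with `expires` in Unix seconds, or `-1` for session cookies) and an `origins` array. The helper below is a sketch with a hypothetical name, and its domain matching is deliberately simplified.

```typescript
// Subset of the Playwright storageState shape needed for this check.
interface StoredCookie {
  name: string;
  domain: string;
  expires: number; // Unix seconds, or -1 for a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
  origins: { origin: string }[];
}

// Returns the names of cookies that apply to `host` and have not expired.
// Simplification: every cookie domain is treated as also matching its subdomains.
function liveCookiesFor(
  state: StorageState,
  host: string,
  nowSeconds: number = Date.now() / 1000,
): string[] {
  return state.cookies
    .filter((cookie) => {
      const domain = cookie.domain.replace(/^\./, "");
      return host === domain || host.endsWith(`.${domain}`);
    })
    .filter((cookie) => cookie.expires === -1 || cookie.expires > nowSeconds)
    .map((cookie) => cookie.name);
}
```

An empty result for the target host suggests the session has expired and the auth bootstrap should be run again.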

---

## 7. Use it programmatically (devDependency users)

Instead of the CLI, call site-agent-pro from a script or test file:

```ts
import { runAudit } from "site-agent-pro";

const result = await runAudit({
  url: "http://localhost:3000",
  tasks: [
    "Open pricing and compare the visible plans before signup",
    "Click the sign-up button and check the form fields",
  ],
});

console.log(`Score: ${result.report.overall_score}/10`);
console.log(`Strengths: ${result.report.strengths.join(", ")}`);
console.log(`Top fixes: ${result.report.top_fixes.join(", ")}`);
```

---

## 8. Read the task output correctly

Do not treat the overall score as objective truth.
Use the output to answer:
- what users could do
- where they got stuck
- what broke trust
- what to fix first

---

## 9. Use the hosted output flow (self-hosted setup only)

When you run the local app server:
- submit a public URL from `/`
- check status at `/submissions/<submission-id>`
- open the unique public task-output link at `/r/<token>`
- download the finished output from `/dashboard`
@@ -0,0 +1,233 @@
# 03 - Configuration

## Environment variables

### `OPENAI_API_KEY`
Required unless you set `LLM_PROVIDER=ollama`. Your OpenAI API key.

### `OPENAI_MODEL`
Default: `gpt-5`

Change this if you want a different compatible model.

### `APP_BASE_URL`
Default: `http://localhost:4173`

Used when building hosted task-output links.

### `HEADLESS`
Default: `true`

Set to `false` if you want the browser visible by default.

### `MAX_SESSION_DURATION_MS`
Default: `600000`

Caps a single audit at 10 minutes in V1.
The code enforces a hard ceiling of 600 seconds even if you set a larger value.
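
The ceiling behaves like a clamp. The sketch below illustrates that arithmetic only: the function name is hypothetical, and falling back to the default for unset or invalid values is an assumption, not confirmed package behavior.

```typescript
const HARD_CEILING_MS = 600_000; // 600 seconds, the ceiling described above
const DEFAULT_SESSION_MS = 600_000; // documented default

function effectiveSessionDurationMs(raw: string | undefined): number {
  // Assumption: unset, blank, non-numeric, or non-positive values use the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_SESSION_MS;
  const parsed = Number(raw);
  if (!isFinite(parsed) || parsed <= 0) return DEFAULT_SESSION_MS;
  // Larger requests are capped at the ceiling instead of being rejected.
  return Math.min(Math.floor(parsed), HARD_CEILING_MS);
}
```

Under these assumptions, `MAX_SESSION_DURATION_MS=900000` still yields a 600000 ms session, while `120000` shortens the run to two minutes.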

### `MAX_STEPS_PER_TASK`
Default: `32`

The default now leans toward a forensic investigation across multiple focused coverage lanes instead of one vague exploration pass.
The runner also preserves time for later tasks and supplemental site checks, so increasing this does not guarantee more useful coverage.
Raise this only when tasks genuinely require even longer flows. Bigger numbers can still make the agent wander if the site has poor signals.

### `ACTION_DELAY_MS`
Default: `600`

Extra delay between actions. Useful when sites animate heavily.

### `NAVIGATION_TIMEOUT_MS`
Default: `25000`

Increase this for painfully slow sites.
This timeout also affects the supplemental site probes that power performance, SEO, security, mobile, and content coverage.

### `REPORT_TTL_DAYS`
Default: `30`

Hosted public task-output links expire after this many days.

### `RECORD_VIDEO`
Default: `false`

When set to `true`, Playwright captures a full video recording of every browser session. These recordings are saved in the run directory and are viewable in the dashboard alongside the animated WebP replays.

### `PLAYWRIGHT_STORAGE_STATE_PATH`
Default: unset

Optional path to a Playwright `storageState` JSON file.
Use this when your approved test lane already has a legitimate verified or authenticated session and you want the CLI or local app to reuse it automatically.

### `DASHBOARD_PORT`
Default: `4173`

Port used by the local app server.

### `DASHBOARD_HOST`
Default: `127.0.0.1`

Host binding used by the local app server.

## Coverage playbook

If you want the fewest possible `blocked` metrics:

- Prefer sites or QA lanes that are reachable without CAPTCHA, Cloudflare challenges, or geo/IP throttling.
- Reuse a legitimate session with `PLAYWRIGHT_STORAGE_STATE_PATH` or `--storage-state` when important paths sit behind login or verification.
- Raise `NAVIGATION_TIMEOUT_MS` for slow sites before raising `MAX_STEPS_PER_TASK`.
- Keep `MAX_SESSION_DURATION_MS` near the 10-minute ceiling for deeper task runs.
- Use multiple agent perspectives in the submission form when you want broader behavioral coverage, not just deeper repetition from one agent.

## Auth bootstrap variables

These are needed when you bootstrap a new auth identity with `--auth-flow` or `--auth-only`.
After a successful auth run, the runner also caches the working credentials in `.auth/credentials.json` keyed by target origin, so later runs against the same site can reuse them automatically.

### `AUTH_TEST_EMAIL`
Required for auth bootstrap.

The base mailbox address the runner uses for signup and login.
On the first signup attempt it uses this exact address.
If the site says the account already exists, the runner now retries with fresh plus-address aliases such as `name+siteagent-...@domain.com` so it can keep registering without manual edits.
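
Plus-addressing works by inserting a tag between the local part and the `@`. The sketch below shows the idea; the exact suffix format the runner uses is not documented here, so `suffix` is a hypothetical caller-supplied value and `plusAlias` is an illustrative name.

```typescript
// Builds a plus-address alias from a base mailbox address.
// Assumption: the suffix format is chosen by the caller; the runner's real format may differ.
function plusAlias(baseEmail: string, suffix: string): string {
  const at = baseEmail.lastIndexOf("@");
  if (at <= 0 || at === baseEmail.length - 1) {
    throw new Error(`Not an email address: ${baseEmail}`);
  }
  // Drop any existing +tag so aliases do not stack.
  const local = baseEmail.slice(0, at).split("+")[0];
  const domain = baseEmail.slice(at + 1);
  return `${local}+siteagent-${suffix}@${domain}`;
}

console.log(plusAlias("qa@example.com", "001")); // qa+siteagent-001@example.com
```

Aliases only help when the mailbox provider delivers plus-addressed mail to the base inbox, so confirm that before relying on retries.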

### `AUTH_TEST_PASSWORD`
Required for auth bootstrap.

The password the runner uses for both signup and login.

### `AUTH_TEST_USERNAME`
Optional.

Use this when the site expects a username field that is different from the email address. If omitted, the runner derives a fallback username from the configured email address.

### `AUTH_TEST_FIRST_NAME` through `AUTH_TEST_COMPANY`
Defaults are provided in `.env.example`.

These values are used to fill visible signup fields such as name, phone, address, city, state, postal code, country, and company.
When the runner has to retry signup with a fresh identity, it also adds small numeric variations to these details so sites that enforce uniqueness beyond email are less likely to reject the retry.

### `AUTH_IMAP_HOST`, `AUTH_IMAP_PORT`, `AUTH_IMAP_SECURE`, `AUTH_IMAP_USER`, `AUTH_IMAP_PASSWORD`, `AUTH_IMAP_MAILBOX`

Configure the real inbox the runner should poll for OTP or verification emails.
The auth bootstrap uses IMAP mailbox access, not a browser-driven webmail tab.

### `AUTH_EMAIL_POLL_TIMEOUT_MS`
Default: `180000`

How long to wait for the verification email before failing the auth bootstrap.

### `AUTH_EMAIL_POLL_INTERVAL_MS`
Default: `5000`

How frequently to poll the inbox for a new message.
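
The timeout and interval interact as a simple polling loop. The following is a generic sketch, not the runner's implementation: `pollUntil` and its `check` callback are hypothetical, and the defaults mirror the two documented values.

```typescript
// Calls `check` repeatedly until it returns a value or the timeout elapses.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  timeoutMs = 180_000, // AUTH_EMAIL_POLL_TIMEOUT_MS default
  intervalMs = 5_000, // AUTH_EMAIL_POLL_INTERVAL_MS default
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = await check();
    if (found !== undefined) return found;
    // Give up when the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`No matching email within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the defaults, the inbox is checked at most about 36 times (180000 / 5000) before the bootstrap fails.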

### `AUTH_OTP_LENGTH`
Default: `6`

Expected OTP length for numeric code extraction.
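
Length-aware extraction matters because emails often contain other digit runs, such as order numbers or years. This sketch shows one way to match exactly `length` digits; it illustrates the idea and is not the runner's actual extraction code.

```typescript
// Returns the first run of exactly `length` digits in the text, if any.
function extractOtp(body: string, length = 6): string | undefined {
  if (Math.floor(length) !== length || length < 1) {
    throw new RangeError("length must be a positive integer");
  }
  // Lookarounds reject digit runs that are longer than `length`.
  const pattern = new RegExp(`(?<!\\d)\\d{${length}}(?!\\d)`);
  return body.match(pattern)?.[0];
}

console.log(extractOtp("Order 12345678. Your code is 482913.")); // 482913
```

The 8-digit order number is skipped because any 6-digit window inside it touches another digit.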

### `AUTH_EMAIL_FROM_FILTER`
Optional.

Use this when the mailbox receives lots of unrelated email and you want to constrain matching to a specific sender.

### `AUTH_EMAIL_SUBJECT_FILTER`
Optional.

Use this when the mailbox receives lots of unrelated email and you want to constrain matching to a specific subject fragment.

### `AUTH_GENERATED_IDENTITY_MAX_ATTEMPTS`
Default: `5`

How many signup identities the runner should try before giving up when the site keeps reporting that the account already exists.

### `AUTH_SIGNUP_URL`, `AUTH_LOGIN_URL`, `AUTH_ACCESS_URL`
Optional.

Default auth flow URLs used by the CLI when you do not pass `--signup-url`, `--login-url`, or `--access-url`.

If auth credentials are configured and a normal task run lands on a real login or registration wall, the runner can also attempt an automatic in-session signup/login recovery. It treats the currently blocked page as the protected destination to re-open afterward.

### `AUTH_SESSION_STATE_PATH`
Default: `.auth/session.json`

Where the authenticated Playwright session is saved if you do not explicitly pass `--save-storage-state`.

## CLI flags

### `--url`
Required website URL.

### `--task`
Required for task runs. Repeat it for each accepted task you want the agent to perform.

Example:

```bash
npm run dev -- --url https://example.com --task "Open pricing and compare the visible plans" --task "Reach the signup page without creating an account"
```

### `--headed`
Shows the browser.

### `--mobile`
Uses a mobile browser profile.

### `--ignore-https-errors`
Allows invalid or self-signed HTTPS certificates.

Useful for local development sites such as:

```bash
npm run dev -- --url https://localhost:3000 --ignore-https-errors
```

### `--storage-state`
Loads a Playwright `storageState` JSON file for a single run.

Example:

```bash
npm run dev -- --url https://example.com --storage-state .auth/session.json
```

### `--save-storage-state`
Saves the Playwright `storageState` JSON after the run finishes.

Example:

```bash
npm run dev -- --url https://example.com --storage-state .auth/session.json --save-storage-state .auth/session.json
```

### `--auth-flow`
Runs the auth bootstrap first, then continues the accepted task run with the authenticated session.

Example:

```bash
npm run dev -- --url https://example.com --auth-flow --signup-url /register --login-url /login --access-url /app
```

### `--auth-only`
Runs only the auth bootstrap and saves the authenticated session without generating a task output.

### `--signup-url`
Optional absolute or relative signup URL for auth bootstrap.

### `--login-url`
Optional absolute or relative login URL for auth bootstrap.

### `--access-url`
Optional absolute or relative protected URL used to confirm the session can reach authenticated content after login.
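
Relative values such as `/register` need a base to resolve against. The standard `URL` constructor does this, as sketched below. That the CLI resolves against `--url` in exactly this way is an assumption, and `resolveAuthUrl` is an illustrative name.

```typescript
// Resolves an absolute or relative auth URL against the site URL.
// Assumption: relative values are resolved against the value passed to --url.
function resolveAuthUrl(value: string, siteUrl: string): string {
  return new URL(value, siteUrl).toString();
}

console.log(resolveAuthUrl("/register", "https://example.com"));
// https://example.com/register
console.log(resolveAuthUrl("https://auth.example.com/login", "https://example.com"));
// https://auth.example.com/login
```

Absolute values pass through unchanged, which is why both forms can share one flag.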

## Local app routes

After running `npm run dashboard`:
- `/` is the public submission form
- `/dashboard` is the internal run dashboard
- `/submissions/<id>` is the submission status page
- `/r/<token>` is the public task-output link
- `/outputs/<run-id>` is the standalone HTML output route
@@ -0,0 +1,41 @@
# 04 - How the Agent Thinks

## The execution loop

For each task, the system does this:

1. capture visible page state
2. ask the model for the next realistic user action
3. execute the action with guarded locators
4. log what happened
5. repeat until the task ends or the step limit is hit
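
The five steps above can be sketched as a small loop. Everything here is illustrative: the `PageState`, `Action`, and `StepLog` shapes and the `runTask` signature are assumptions made for the example, not the package's internal types.

```typescript
interface PageState {
  url: string;
  title: string;
}

type Action =
  | { kind: "click"; target: string }
  | { kind: "done"; reason: string };

interface StepLog {
  step: number;
  url: string;
  action: Action;
}

async function runTask(
  capture: () => Promise<PageState>,
  plan: (state: PageState, history: StepLog[]) => Promise<Action>,
  execute: (action: Action) => Promise<void>,
  maxSteps = 32, // mirrors the MAX_STEPS_PER_TASK default
): Promise<StepLog[]> {
  const history: StepLog[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const state = await capture(); // 1. capture visible page state
    const action = await plan(state, history); // 2. ask for the next action
    if (action.kind !== "done") await execute(action); // 3. execute it
    history.push({ step, url: state.url, action }); // 4. log what happened
    if (action.kind === "done") break; // 5. stop when the task ends
  }
  return history;
}
```

Passing `capture`, `plan`, and `execute` as functions keeps the loop testable with stubs and makes the step limit the only hard stop.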

## Why the planner and evaluator are separate

If one model both acts and judges, it will flatter itself and invent success.
That is weak design.

This project separates:
- **planner**: chooses the next action
- **evaluator**: reviews the evidence afterward

## What the planner sees

The planner gets:
- page title and URL
- visible body text excerpt
- visible interactive elements
- headings
- modal hints
- previous action history

## What the planner does not get

It does not get:
- hidden DOM content
- fake claims that something succeeded
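
The two lists above amount to a filtering step between the browser and the model. The sketch below shows one way to express it; the field names and the `buildPlannerInput` helper are hypothetical, and the excerpt length is an arbitrary choice.

```typescript
// Hypothetical shapes; the real planner payload may differ.
interface ObservedElement {
  role: string;
  label: string;
  visible: boolean;
}

interface ObservedPage {
  title: string;
  url: string;
  visibleText: string;
  elements: ObservedElement[];
  headings: string[];
  modalHints: string[];
}

interface PlannerInput {
  title: string;
  url: string;
  bodyExcerpt: string;
  interactiveElements: { role: string; label: string }[];
  headings: string[];
  modalHints: string[];
  history: string[];
}

function buildPlannerInput(page: ObservedPage, history: string[], excerptChars = 1500): PlannerInput {
  return {
    title: page.title,
    url: page.url,
    bodyExcerpt: page.visibleText.slice(0, excerptChars), // excerpt only, not the full text
    interactiveElements: page.elements
      .filter((element) => element.visible) // hidden DOM content never reaches the planner
      .map(({ role, label }) => ({ role, label })),
    headings: page.headings,
    modalHints: page.modalHints,
    history: [...history], // copy so the planner cannot mutate the log
  };
}
```

Note that the payload carries no success flag: outcomes are left to the evaluator.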

## Why this matters

The goal is a system that behaves like a regular user.
Regular users do not inspect invisible elements or parse the entire DOM perfectly.
@@ -0,0 +1,42 @@
# 05 - Extending Accepted Tasks

## Define tasks from explicit input

There are no built-in personas or default task files anymore.
Each run is driven by accepted tasks submitted from the dashboard or passed to the CLI with repeated `--task` flags.

Example CLI input:

```bash
npm run dev -- --url https://example.com \
  --task "Find the pricing page and compare the visible plans" \
  --task "Open the contact path and confirm whether support is easy to reach"
```

For game-oriented runs, write the requested behavior directly into the accepted tasks. Example: read the visible how-to-play section, reach a playable state, and play five rounds while recording wins and losses.

## Good task design

A good task is:
- concrete
- time-bounded
- observable
- easy to judge from evidence
- complementary with the other tasks in the suite

Good task sets usually split coverage into a few lanes such as:
- main journey and orientation
- discovery and information architecture
- conversion and trust
- suspicious interactions and recovery states

That gives the runner broader coverage without asking one task to explain the whole site alone.
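
One way to keep lanes explicit is to store tasks keyed by lane and flatten them into repeated `--task` flags. This is a sketch of that bookkeeping; the `TaskSuite` shape and `toCliArgs` helper are hypothetical and not part of the package.

```typescript
// Tasks grouped by coverage lane; lane names are free-form labels.
type TaskSuite = Record<string, string[]>;

// Flattens a suite into CLI arguments, skipping blank and duplicate tasks.
function toCliArgs(url: string, suite: TaskSuite): string[] {
  const args = ["--url", url];
  const seen = new Set<string>();
  for (const lane of Object.keys(suite)) {
    for (const task of suite[lane]) {
      const key = task.trim().toLowerCase();
      if (key === "" || seen.has(key)) continue;
      seen.add(key);
      args.push("--task", task.trim());
    }
  }
  return args;
}

const suite: TaskSuite = {
  journey: ["Find the pricing page and compare the visible plans"],
  trust: ["Open the contact path and confirm whether support is easy to reach"],
};
console.log(toCliArgs("https://example.com", suite).join(" "));
```

The resulting array can be handed to a process spawner, which avoids shell quoting problems with task text.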

## Bad task design

Trash tasks look like this:
- “Explore the site”
- “See if it is good”
- “Understand everything”

Those are vague, hard to score, and guaranteed to produce mush.
@@ -0,0 +1,92 @@
# 06 - Hardening for Production

## 0. CI integration (devDependency users)

This is the natural endpoint of side-by-side development: once you trust the audit scores on your own product, automate them.

### Run site-agent-pro in CI against a preview URL

```yaml
# Example: GitHub Actions
- name: Run site-agent-pro audit
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  run: |
    npx site-agent-pro --url ${{ env.PREVIEW_URL }} \
      --task "Check the homepage CTA" \
      --task "Open pricing and compare the visible plans"
```

> [!TIP]
> You can also pass secrets directly via CLI flags (e.g., `--openai-api-key ${{ secrets.OPENAI_API_KEY }}`) if you prefer not to map them as environment variables.

### Gate on a minimum score using the programmatic API

Create an `audit.mjs` script in your project:

```ts
import { runAudit } from "site-agent-pro";

const result = await runAudit({
  url: process.env.PREVIEW_URL ?? "http://localhost:3000",
  tasks: [
    "Check the homepage CTA",
    "Open pricing and compare the visible plans",
  ],
});

console.log(`Score: ${result.report.overall_score}/10`);
result.report.top_fixes.forEach((fix) => console.log(`Fix: ${fix}`));

if (result.report.overall_score < 7) {
  console.error("Audit score below threshold. Failing build.");
  process.exit(1);
}
```

```yaml
- name: Run audit gate
  run: node audit.mjs
```

> **Important:** Do not add CI gating until you have manually reviewed enough runs on your own product to understand the score range. Arbitrary thresholds will create false failures and destroy trust in the tool.

---

## 1. Add retries carefully

Retrying every failed action blindly is lazy and dangerous.
Add retries only for:
- network hiccups
- delayed rendering
- slow client-side routing
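
A retry helper can enforce that rule by requiring the caller to classify each error. The sketch below is a generic pattern, not code from the package; `retryTransient` and `looksLikeTimeout` are hypothetical names, and the timeout classifier assumes the error message mentions "timeout".

```typescript
// Retries `action` only when `isTransient` approves the error, with exponential backoff.
async function retryTransient<T>(
  action: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await action();
    } catch (error) {
      // Anything not explicitly classified as transient fails immediately.
      if (!isTransient(error) || attempt >= maxAttempts) throw error;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Example classifier (assumption: timeouts surface as Errors whose message says "timeout").
const looksLikeTimeout = (error: unknown): boolean =>
  error instanceof Error && /timeout/i.test(error.message);
```

A missing element or a failed assertion is not retried, so real product bugs still surface on the first attempt.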

## 2. Improve task completion checks

The current completion logic is conservative but still heuristic.
Production systems should add explicit validators per task.

Examples:
- pricing check should detect real price patterns
- contact check should verify actual support/contact details
- signup check should verify the next-step form is real and usable
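
As an illustration of the first example, a pricing validator can look for currency patterns in the visible text. This is a rough sketch: the regular expression covers only a few currencies and formats, and requiring two distinct prices is an assumed rule for "compare the plans" style tasks.

```typescript
// Matches amounts such as "$9", "$1,299.00", "€49", or "49 EUR".
const PRICE_PATTERN =
  /[$€£]\s?\d+(?:,\d{3})*(?:\.\d{1,2})?|\d+(?:,\d{3})*(?:\.\d{1,2})?\s?(?:USD|EUR|GBP)\b/g;

function findPrices(visibleText: string): string[] {
  return visibleText.match(PRICE_PATTERN) ?? [];
}

// Assumption: comparing plans needs at least two distinct prices on the page.
function pricingTaskPassed(visibleText: string, minDistinctPrices = 2): boolean {
  return new Set(findPrices(visibleText)).size >= minDistinctPrices;
}

console.log(findPrices("Starter $9/mo, Pro $1,299.00/yr")); // [ '$9', '$1,299.00' ]
```

A page that only says "Contact us for pricing" fails this check, which is the point: the validator judges evidence, not the planner's own claim of success.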

## 3. Improve event-aware evaluation

Right now the evaluator relies on interaction logs, task outcomes, and accessibility findings.
If you want better judgment later, enrich the structured events instead of adding guesswork.

## 4. Add category-specific task sets

Use different task sets for:
- SaaS marketing sites
- ecommerce stores
- docs portals
- recruiting pages
- local business websites

## 5. Add CI only after manual trust is earned

Do not turn this into a pipeline gate until you have manually reviewed enough runs to understand its failure modes.