claudeye 1.0.9 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +17 -2643
- package/package.json +10 -10
package/README.md
CHANGED
|
@@ -1,2656 +1,30 @@
|
|
|
1
|
-
|
|
2
|
-
____ _ _
|
|
3
|
-
/ ___| | __ _ _ _ __| | ___ _ _ ___
|
|
4
|
-
| | | |/ _` | | | |/ _` |/ _ \ | | |/ _ \
|
|
5
|
-
| |___| | (_| | |_| | (_| | __/ |_| | __/
|
|
6
|
-
\____|_|\__,_|\__,_|\__,_|\___|\__, |\___|
|
|
7
|
-
|___/
|
|
8
|
-
```
|
|
1
|
+
# Claudeye is now FailproofAI — and we're open source!
|
|
9
2
|
|
|
10
|
-
|
|
3
|
+
Hey there, fellow agent wranglers!
|
|
11
4
|
|
|
12
|
-
**
|
|
13
|
-
**Understand** where they struggle.
|
|
14
|
-
**Utilize** insights to improve.
|
|
5
|
+
We've got some big news: **Claudeye has evolved.** New name. New home. Same mission of keeping your AI agents honest — but now with the doors wide open.
|
|
15
6
|
|
|
16
|
-
[
|
|
17
|
-
[](https://www.npmjs.com/package/claudeye)
|
|
18
|
-
[](https://nodejs.org)
|
|
19
|
-
[](https://www.typescriptlang.org/)
|
|
20
|
-
[](https://discord.com/invite/zT92CAgvkj)
|
|
21
|
-
[](https://claudeye.exosphere.host)
|
|
7
|
+
**Claudeye is now [FailproofAI](https://befailproof.ai)**, and it's fully open source.
|
|
22
8
|
|
|
23
|
-
|
|
9
|
+
That's right — all the oversight, observability, session replay, custom evals, and hook policies you know and love? Free. Open. Yours to fork, extend, and contribute to.
|
|
24
10
|
|
|
25
|
-
|
|
26
|
-
- [Quick Start](#quick-start)
|
|
27
|
-
- [Hook Policies](#protect-hook-policies)
|
|
28
|
-
- [Enterprise](#enterprise)
|
|
29
|
-
- [Why Claudeye?](#why-claudeye)
|
|
30
|
-
- [Features](#features)
|
|
31
|
-
- [CLI Reference](#cli-reference)
|
|
32
|
-
- [Custom Evals & Enrichments](#custom-evals--enrichments)
|
|
33
|
-
- [API Reference](#api-reference)
|
|
34
|
-
- [`createApp()`](#createapp)
|
|
35
|
-
- [`app.condition(fn)`](#appconditionfn)
|
|
36
|
-
- [`app.queueCondition(fn, options?)`](#appqueueconditionfn-options)
|
|
37
|
-
- [`app.cacheInvalidation(fn)`](#appcacheinvalidationfn)
|
|
38
|
-
- [`app.eval(name, fn, options?)`](#appevalname-fn-options)
|
|
39
|
-
- [`app.enrich(name, fn, options?)`](#appenrichname-fn-options)
|
|
40
|
-
- [`app.action(name, fn, options?)`](#appactionname-fn-options)
|
|
41
|
-
- [`app.alert(name, fn, options?)`](#appalertname-fn-options)
|
|
42
|
-
- [`app.dashboard.view(name, options?)`](#appdashboardviewname-options)
|
|
43
|
-
- [`app.dashboard.filter(name, fn, options?)`](#appdashboardfiltername-fn-options)
|
|
44
|
-
- [`app.dashboard.filter({ preBuilt })`](#appdashboardfilter-prebuilt-)
|
|
45
|
-
- [`app.dashboard.aggregate(name, definition, options?)`](#appdashboardaggregatename-definition-options)
|
|
46
|
-
- [`app.auth(options)`](#appauthoptions)
|
|
47
|
-
- [`app.listen(port?, options?)`](#applistenport-options)
|
|
48
|
-
- [Subagent Scope](#subagent-scope)
|
|
49
|
-
- [Evaluation Order](#evaluation-order)
|
|
50
|
-
- [UI Behavior](#ui-behavior)
|
|
51
|
-
- [Types](#types)
|
|
52
|
-
- [Examples](#examples)
|
|
53
|
-
- [Basic Evals & Enrichments](#example-basic-evals--enrichments)
|
|
54
|
-
- [Dashboard Filters](#example-dashboard-filters)
|
|
55
|
-
- [Multi-View Dashboard](#example-multi-view-dashboard)
|
|
56
|
-
- [Eval Score Filters (cachedOnly)](#example-eval-score-filters-cachedonly)
|
|
57
|
-
- [Actions](#example-actions)
|
|
58
|
-
- [Alerts](#example-alerts)
|
|
59
|
-
- [Minimal Filters Only](#example-minimal-filters-only)
|
|
60
|
-
- [Background Queue Processing](#background-queue-processing)
|
|
61
|
-
- [Caching](#caching)
|
|
62
|
-
- [Authentication](#authentication)
|
|
63
|
-
- [Deployment with PM2](#deployment-with-pm2)
|
|
64
|
-
- [Telemetry](#telemetry)
|
|
65
|
-
- [How It Works](#how-it-works)
|
|
66
|
-
- [Community](#community)
|
|
67
|
-
- [License](#license)
|
|
11
|
+
## Where to find us
|
|
68
12
|
|
|
69
|
-
|
|
13
|
+
| | |
|
|
14
|
+
|---|---|
|
|
15
|
+
| **GitHub** | [github.com/exospherehost/failproofai](https://github.com/exospherehost/failproofai) |
|
|
16
|
+
| **Website** | [befailproof.ai](https://befailproof.ai) |
|
|
70
17
|
|
|
71
|
-
##
|
|
18
|
+
## Why the rename?
|
|
72
19
|
|
|
73
|
-
|
|
20
|
+
Because "keeping an eye on Claude" was just the beginning. FailproofAI is about making **all** your AI agents failproof — not just watching, but actively protecting, evaluating, and improving every session.
|
|
74
21
|
|
|
75
|
-
|
|
22
|
+
Plus, let's be honest — *failproofai* just rolls off the tongue better at conferences.
|
|
76
23
|
|
|
77
|
-
##
|
|
24
|
+
## What's next?
|
|
78
25
|
|
|
79
|
-
|
|
80
|
-
npm install -g claudeye && claudeye
|
|
81
|
-
# or: bun install -g claudeye && claudeye
|
|
82
|
-
```
|
|
26
|
+
Head over to the new repo, give it a star, and join the fun. Whether you're building evals, writing hook policies, or just want to replay that one session where your agent tried to `rm -rf /` — we've got you covered.
|
|
83
27
|
|
|
84
|
-
|
|
28
|
+
See you on the other side!
|
|
85
29
|
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
> **Note:** Claudeye is distributed as a compiled native binary per platform. When you install `claudeye`, npm automatically pulls the correct platform-specific package (`@claudeye/linux-x64`, `@claudeye/darwin-arm64`, etc.) as an optional dependency. Node.js >= 20.9.0 must be available on PATH for the dashboard server.
|
|
89
|
-
|
|
90
|
-
> **Important — Session retention:** Claude Code automatically deletes
|
|
91
|
-
> sessions older than 30 days by default. To keep sessions visible in Claudeye,
|
|
92
|
-
> set the retention period in `~/.claude/settings.json`:
|
|
93
|
-
>
|
|
94
|
-
> ```json
|
|
95
|
-
> { "cleanupPeriodDays": 9999 }
|
|
96
|
-
> ```
|
|
97
|
-
>
|
|
98
|
-
> Set `cleanupPeriodDays` to the number of days you want to retain session
|
|
99
|
-
> history. **Do NOT set it to `0`** — this wipes all stored sessions
|
|
100
|
-
> immediately. See [Claude Code settings](https://code.claude.com/docs/en/settings)
|
|
101
|
-
> for details.
|
|
102
|
-
|
|
103
|
-
## Protect: Hook Policies
|
|
104
|
-
|
|
105
|
-
AI agents can execute arbitrary shell commands. Without guardrails, Claude can accidentally `rm -rf /`, read your `.env` secrets, `sudo` install packages, force-push to main, or pipe a curl download to `sh`. Claudeye's hook policies sit in Claude Code's hook pipeline and block dangerous actions in real time — before they execute.
|
|
106
|
-
|
|
107
|
-
### Quick Start
|
|
108
|
-
|
|
109
|
-
```bash
|
|
110
|
-
# Install with interactive policy selector (Enter to toggle, S to submit)
|
|
111
|
-
claudeye --install-hooks
|
|
112
|
-
|
|
113
|
-
# Or install all 27 policies at once
|
|
114
|
-
claudeye --install-hooks all
|
|
115
|
-
|
|
116
|
-
# Install specific policies
|
|
117
|
-
claudeye --install-hooks block-sudo block-env-files sanitize-jwt
|
|
118
|
-
|
|
119
|
-
# See what's enabled
|
|
120
|
-
claudeye --list-hooks
|
|
121
|
-
|
|
122
|
-
# Disable a specific policy
|
|
123
|
-
claudeye --remove-hooks block-sudo
|
|
124
|
-
|
|
125
|
-
# Remove all hooks from Claude Code
|
|
126
|
-
claudeye --remove-hooks
|
|
127
|
-
```
|
|
128
|
-
|
|
129
|
-
### Available Policies
|
|
130
|
-
|
|
131
|
-
| Policy | Default | What it blocks / warns |
|
|
132
|
-
|--------|:-------:|----------------|
|
|
133
|
-
| `sanitize-jwt` | On | Scrubs JWT tokens (`eyJ...`) from tool output before Claude sees them |
|
|
134
|
-
| `sanitize-api-keys` | On | Scrubs API keys from tool output (OpenAI, Anthropic, GitHub, AWS, Stripe, Google) |
|
|
135
|
-
| `sanitize-connection-strings` | On | Scrubs database connection strings with embedded credentials (`postgresql://user:pass@host`) from tool output |
|
|
136
|
-
| `sanitize-private-key-content` | On | Scrubs PEM private key blocks (`-----BEGIN PRIVATE KEY-----`) from tool output |
|
|
137
|
-
| `sanitize-bearer-tokens` | On | Scrubs `Authorization: Bearer <token>` headers from tool output |
|
|
138
|
-
| `protect-env-vars` | On | `env`, `printenv`, `echo $SECRET`, `export API_KEY=...` |
|
|
139
|
-
| `block-env-files` | On | Reading or writing `.env`, `.env.local`, `.env.production` |
|
|
140
|
-
| `block-sudo` | On | Any `sudo` command |
|
|
141
|
-
| `block-curl-pipe-sh` | On | `curl ... \| sh`, `wget ... \| bash` — remote code execution |
|
|
142
|
-
| `block-push-master` | On | `git push origin main`, `git push origin master` |
|
|
143
|
-
| `block-claudeye-commands` | On | Direct `claudeye` CLI invocations and `npm/bun/yarn/pnpm uninstall claudeye` |
|
|
144
|
-
| `block-rm-rf` | Off | `rm -rf /`, `rm -rf ~`, `rm -rf /*` — catastrophic deletions |
|
|
145
|
-
| `block-force-push` | Off | `git push --force`, `git push -f` |
|
|
146
|
-
| `block-secrets-write` | Off | Writing to `.pem`, `.key`, `id_rsa`, `credentials` files |
|
|
147
|
-
| `block-read-outside-cwd` | Off | Read/Glob/Grep/Bash read commands targeting files outside session CWD (`~/.claude/` allowed except `settings*.json` files) |
|
|
148
|
-
| `block-work-on-main` | Off | `git commit`, `git merge`, `git rebase`, `git cherry-pick` when current branch is `main` or `master` |
|
|
149
|
-
| `warn-destructive-sql` | Off | Warns before `DROP TABLE/DATABASE`, `TRUNCATE`, or `DELETE FROM` (without `WHERE`) via `psql`, `mysql`, `sqlite3`, etc. |
|
|
150
|
-
| `warn-large-file-write` | Off | Warns when a Write tool payload exceeds 100KB (catches runaway generation loops) |
|
|
151
|
-
| `warn-package-publish` | Off | Warns before `npm publish`, `cargo publish`, `gem push`, `poetry publish`, and similar |
|
|
152
|
-
| `warn-repeated-tool-calls` | Off | Detects 3+ identical tool calls in session transcript; warns Claude to try a different approach |
|
|
153
|
-
| `warn-git-amend` | Off | Warns before `git commit --amend`, which rewrites history and can diverge shared branches |
|
|
154
|
-
| `warn-git-stash-drop` | Off | Warns before `git stash drop` or `git stash clear`, which permanently deletes stashed changes |
|
|
155
|
-
| `warn-all-files-staged` | Off | Warns before `git add -A`, `git add --all`, or `git add .`, which may stage unintended files |
|
|
156
|
-
| `warn-schema-alteration` | Off | Warns before `ALTER TABLE` with column or rename operations via `psql`, `mysql`, `sqlite3`, etc. |
|
|
157
|
-
| `warn-global-package-install` | Off | Warns before `npm install -g`, `cargo install`, `yarn global add`, and similar global installs |
|
|
158
|
-
| `warn-background-process` | Off | Warns before `nohup`, `screen -d`, `tmux new -d`, or commands backgrounded with `&` |
|
|
159
|
-
| `verify-intent` | ⚗ Beta | LLM-powered: verifies all user intents were addressed before Claude stops. Retries up to 3 times. Requires `--configure-llm` setup. |
|
|
160
|
-
|
|
161
|
-
Selection is saved to `~/.claudeye/hooks-config.json`. In non-TTY environments (CI/piped), the 11 default policies are used automatically. Re-running `--install-hooks` re-opens the selector with your current choices pre-loaded.
|
|
162
|
-
|
|
163
|
-
> **Web UI** — You can also toggle individual policies on/off from the **Policies** page in the Claudeye dashboard without re-running the CLI. Changes take effect immediately for the next hook invocation.
|
|
164
|
-
|
|
165
|
-
### Scoped Installation
|
|
166
|
-
|
|
167
|
-
By default, hooks are installed at **user** scope (`~/.claude/settings.json`). Use `--scope` to target a different settings file:
|
|
168
|
-
|
|
169
|
-
| Scope | File | Use case |
|
|
170
|
-
|-------|------|----------|
|
|
171
|
-
| `user` (default) | `~/.claude/settings.json` | Machine-wide — applies to all projects |
|
|
172
|
-
| `project` | `{cwd}/.claude/settings.json` | Committed to git — shared across the team |
|
|
173
|
-
| `local` | `{cwd}/.claude/settings.local.json` | Per-developer, gitignored |
|
|
174
|
-
|
|
175
|
-
```bash
|
|
176
|
-
# Install to project scope (committed, shared with team)
|
|
177
|
-
claudeye --install-hooks all --scope project
|
|
178
|
-
|
|
179
|
-
# Install to local scope (gitignored, personal)
|
|
180
|
-
claudeye --install-hooks --scope local
|
|
181
|
-
|
|
182
|
-
# Install to a specific project directory (without cd-ing into it)
|
|
183
|
-
claudeye --install-hooks all --scope project --cwd /path/to/project
|
|
184
|
-
|
|
185
|
-
# Remove from all scopes (default)
|
|
186
|
-
claudeye --remove-hooks
|
|
187
|
-
|
|
188
|
-
# Remove from a specific scope only
|
|
189
|
-
claudeye --remove-hooks --scope project
|
|
190
|
-
|
|
191
|
-
# See which scopes have hooks installed
|
|
192
|
-
claudeye --list-hooks
|
|
193
|
-
```
|
|
194
|
-
|
|
195
|
-
#### The `--cwd` flag
|
|
196
|
-
|
|
197
|
-
`--cwd` overrides `process.cwd()` when resolving settings paths for `--scope project` and `--scope local`. It has **no effect** with `--scope user` because user-scope settings always live at `~/.claude/settings.json` regardless of the working directory.
|
|
198
|
-
|
|
199
|
-
This is intentional: the `user` scope is machine-wide and has no concept of a project directory, while `project` and `local` scopes resolve relative to a project root. `--cwd` lets you target a project without `cd`-ing into it — useful in scripts, CI pipelines, or when managing hooks across multiple repositories.
|
|
200
|
-
|
|
201
|
-
```bash
|
|
202
|
-
# These two are equivalent:
|
|
203
|
-
cd /path/to/project && claudeye --install-hooks all --scope project
|
|
204
|
-
claudeye --install-hooks all --scope project --cwd /path/to/project
|
|
205
|
-
|
|
206
|
-
# --cwd is silently ignored here (user scope doesn't need a project directory)
|
|
207
|
-
claudeye --install-hooks all --cwd /path/to/project
|
|
208
|
-
# ↑ installs to ~/.claude/settings.json, NOT /path/to/project/.claude/settings.json
|
|
209
|
-
|
|
210
|
-
# Always pair --cwd with --scope project or --scope local:
|
|
211
|
-
claudeye --install-hooks all --scope project --cwd /path/to/project
|
|
212
|
-
claudeye --remove-hooks --scope local --cwd /path/to/project
|
|
213
|
-
claudeye --list-hooks --cwd /path/to/project
|
|
214
|
-
```
|
|
215
|
-
|
|
216
|
-
> **Why `--cwd` instead of `--project-dir`?** The flag was originally named `--project-dir`, but that was confusingly similar to `--projects-path` (which points to `~/.claude/projects` for the dashboard). `--cwd` is a widely understood convention (npm, pnpm, turbo all use it) and precisely describes the behavior: it sets the working directory for path resolution.
|
|
217
|
-
|
|
218
|
-
**Avoid installing hooks in multiple scopes for the same project.** Claude Code merges settings from all scopes, so duplicate hooks may cause the same policy to evaluate twice. If `--list-hooks` shows hooks in multiple scopes, remove the extra installation with `--remove-hooks --scope <scope>`.
|
|
219
|
-
|
|
220
|
-
### Policy Params
|
|
221
|
-
|
|
222
|
-
Tune builtin policies without replacing them. Add a `policyParams` key to any `.claudeye/hooks-config.json` file:
|
|
223
|
-
|
|
224
|
-
```json
|
|
225
|
-
{
|
|
226
|
-
"enabledPolicies": ["block-sudo", "block-push-master", "warn-large-file-write"],
|
|
227
|
-
"policyParams": {
|
|
228
|
-
"block-sudo": {
|
|
229
|
-
"allowPatterns": ["sudo systemctl status *", "sudo journalctl *"]
|
|
230
|
-
},
|
|
231
|
-
"block-push-master": {
|
|
232
|
-
"protectedBranches": ["main", "master", "release"]
|
|
233
|
-
},
|
|
234
|
-
"warn-large-file-write": {
|
|
235
|
-
"thresholdKb": 512
|
|
236
|
-
}
|
|
237
|
-
}
|
|
238
|
-
}
|
|
239
|
-
```
|
|
240
|
-
|
|
241
|
-
Config files are read from three scopes in priority order (first wins for params):
|
|
242
|
-
1. `{cwd}/.claudeye/hooks-config.json` — project (can be committed)
|
|
243
|
-
2. `{cwd}/.claudeye/hooks-config.local.json` — local (gitignore this)
|
|
244
|
-
3. `~/.claudeye/hooks-config.json` — global (managed by `--install-hooks`)
|
|
245
|
-
|
|
246
|
-
`enabledPolicies` are unioned across all three scopes.
|
|
247
|
-
|
|
248
|
-
| Policy | Param | Type | Default | Description |
|
|
249
|
-
|---|---|---|---|---|
|
|
250
|
-
| `block-sudo` | `allowPatterns` | `string[]` | `[]` | Token-matched patterns to allow (e.g. `"sudo systemctl *"`) |
|
|
251
|
-
| `block-rm-rf` | `allowPaths` | `string[]` | `[]` | Paths exempt from catastrophic-deletion blocking |
|
|
252
|
-
| `block-read-outside-cwd` | `allowPaths` | `string[]` | `[]` | Paths outside cwd allowed for reading |
|
|
253
|
-
| `block-push-master` | `protectedBranches` | `string[]` | `["main","master"]` | Branch names blocked from `git push` |
|
|
254
|
-
| `block-work-on-main` | `protectedBranches` | `string[]` | `["main","master"]` | Branch names blocked for commits/merges |
|
|
255
|
-
| `sanitize-api-keys` | `additionalPatterns` | `{regex,label}[]` | `[]` | Extra credential patterns to redact from output |
|
|
256
|
-
| `block-secrets-write` | `additionalPatterns` | `string[]` | `[]` | Extra filename substrings to block writing |
|
|
257
|
-
| `warn-large-file-write` | `thresholdKb` | `number` | `1024` | Threshold in KB above which file writes warn |
|
|
258
|
-
|
|
259
|
-
> **`allowPatterns` safety** — `block-sudo` matches patterns against **parsed argv tokens**, not the raw command string. This prevents shell injection bypass like `sudo systemctl status; rm -rf /` from matching the pattern `sudo systemctl status *`.
|
|
260
|
-
|
|
261
|
-
> **Web UI** — Policy params can also be edited directly from the **Policies tab** in the dashboard. Click the gear icon next to any policy with configurable parameters to open the editor.
|
|
262
|
-
|
|
263
|
-
### Custom Hooks
|
|
264
|
-
|
|
265
|
-
Register arbitrary hook logic in a JavaScript file using the `custom` keyword with `--install-hooks`:
|
|
266
|
-
|
|
267
|
-
```bash
|
|
268
|
-
claudeye --install-hooks custom ./my-hooks.js
|
|
269
|
-
```
|
|
270
|
-
|
|
271
|
-
**`my-hooks.js`**:
|
|
272
|
-
```js
|
|
273
|
-
import { customPolicies, allow, deny, instruct } from "claudeye";
|
|
274
|
-
|
|
275
|
-
customPolicies.add({
|
|
276
|
-
name: "block-production-writes",
|
|
277
|
-
description: "Prevent writes to production config files",
|
|
278
|
-
match: { events: ["PreToolUse"] },
|
|
279
|
-
fn: async (ctx) => {
|
|
280
|
-
if (ctx.toolName === "Write") {
|
|
281
|
-
const path = ctx.toolInput?.file_path ?? "";
|
|
282
|
-
if (path.includes("production") || path.includes("prod.")) {
|
|
283
|
-
return deny("Writing to production config is blocked");
|
|
284
|
-
}
|
|
285
|
-
}
|
|
286
|
-
return allow();
|
|
287
|
-
},
|
|
288
|
-
});
|
|
289
|
-
```
|
|
290
|
-
|
|
291
|
-
- **`ctx`** fields: `eventType`, `toolName`, `toolInput`, `payload`, `session`, `params`
|
|
292
|
-
- **Return values**: `allow()`, `deny(message)`, `instruct(message)` — `deny(message)` is surfaced to Claude as `"Blocked by claudeye: <message>"`, consistent with builtin policy output
|
|
293
|
-
- **Transitive imports**: files imported by your hooks entry point are automatically rewritten
|
|
294
|
-
- **Fail-open**: any error or 10-second timeout returns `allow` and logs a warning — Claude is never blocked by a broken hook
|
|
295
|
-
- **View loaded hooks**: `claudeye --list-hooks` shows a Custom Hooks section when a path is configured; the Policies tab in the dashboard shows each custom policy as a rich row with its name, description, and event scope (where defined), aligned with the built-in policy layout — edit the JS file to add, remove, or reorder them
|
|
296
|
-
- **Remove**: `claudeye --remove-hooks custom` clears the path from global config
|
|
297
|
-
- **Validation**: the file is loaded and validated at install time — if it has syntax errors, import failures, or registers no hooks, the install fails with an error and config is left unchanged
|
|
298
|
-
|
|
299
|
-
**TypeScript types** (exported from `claudeye`):
|
|
300
|
-
|
|
301
|
-
```ts
|
|
302
|
-
interface PolicyContext {
|
|
303
|
-
eventType: string;
|
|
304
|
-
toolName?: string;
|
|
305
|
-
toolInput?: Record<string, unknown>;
|
|
306
|
-
payload: Record<string, unknown>;
|
|
307
|
-
session?: {
|
|
308
|
-
sessionId?: string;
|
|
309
|
-
transcriptPath?: string;
|
|
310
|
-
cwd?: string;
|
|
311
|
-
permissionMode?: string;
|
|
312
|
-
hookEventName?: string;
|
|
313
|
-
};
|
|
314
|
-
params?: Record<string, unknown>;
|
|
315
|
-
}
|
|
316
|
-
|
|
317
|
-
type PolicyDecision = "allow" | "deny" | "instruct";
|
|
318
|
-
|
|
319
|
-
interface PolicyResult {
|
|
320
|
-
decision: PolicyDecision;
|
|
321
|
-
reason?: string;
|
|
322
|
-
message?: string;
|
|
323
|
-
}
|
|
324
|
-
|
|
325
|
-
interface CustomHook {
|
|
326
|
-
name: string;
|
|
327
|
-
description?: string;
|
|
328
|
-
match?: { events?: HookEventType[] };
|
|
329
|
-
fn: (ctx: PolicyContext) => PolicyResult | Promise<PolicyResult>;
|
|
330
|
-
}
|
|
331
|
-
```
|
|
332
|
-
|
|
333
|
-
### LLM-Powered Policies
|
|
334
|
-
|
|
335
|
-
Some policies (like `verify-intent`) use an external LLM to make intelligent decisions. These require a one-time configuration of an OpenAI-compatible API provider.
|
|
336
|
-
|
|
337
|
-
#### Configure LLM Provider
|
|
338
|
-
|
|
339
|
-
```bash
|
|
340
|
-
# Interactive setup (prompts for API key, base URL, model)
|
|
341
|
-
claudeye --configure-llm
|
|
342
|
-
|
|
343
|
-
# Non-interactive (flags)
|
|
344
|
-
claudeye --configure-llm --llm-api-key sk-xxx
|
|
345
|
-
|
|
346
|
-
# Full configuration
|
|
347
|
-
claudeye --configure-llm --llm-api-key sk-xxx --llm-base-url https://api.groq.com/openai/v1 --llm-model llama-3-70b
|
|
348
|
-
```
|
|
349
|
-
|
|
350
|
-
Any OpenAI-compatible API works — OpenAI, Groq, Together, Ollama (`http://localhost:11434/v1`), etc. Configuration is saved to `~/.claudeye/hooks-config.json`:
|
|
351
|
-
|
|
352
|
-
```json
|
|
353
|
-
{
|
|
354
|
-
"enabledPolicies": ["verify-intent"],
|
|
355
|
-
"llm": {
|
|
356
|
-
"apiKey": "sk-...",
|
|
357
|
-
"baseUrl": "https://api.openai.com/v1",
|
|
358
|
-
"model": "gpt-4o-mini"
|
|
359
|
-
}
|
|
360
|
-
}
|
|
361
|
-
```
|
|
362
|
-
|
|
363
|
-
You can also override at runtime with environment variables (useful for CI):
|
|
364
|
-
|
|
365
|
-
| Variable | Description | Default |
|
|
366
|
-
|----------|-------------|---------|
|
|
367
|
-
| `CLAUDEYE_LLM_API_KEY` | API key | - |
|
|
368
|
-
| `CLAUDEYE_LLM_BASE_URL` | Base URL | `https://api.openai.com/v1` |
|
|
369
|
-
| `CLAUDEYE_LLM_MODEL` | Model name | `gpt-4o-mini` |
|
|
370
|
-
|
|
371
|
-
#### verify-intent Policy
|
|
372
|
-
|
|
373
|
-
The `verify-intent` policy fires when Claude says it's done (Stop event) and uses two LLM calls to check whether all user requests were actually completed:
|
|
374
|
-
|
|
375
|
-
1. **Extract** — Reads the session transcript and asks the LLM to list all actionable user intents
|
|
376
|
-
2. **Verify** — Sends the intents and transcript to the LLM, which checks if each was satisfied
|
|
377
|
-
|
|
378
|
-
If any intents are unsatisfied, Claude is instructed to continue working. This repeats up to 3 times before allowing the stop.
|
|
379
|
-
|
|
380
|
-
```bash
|
|
381
|
-
# Enable after configuring LLM
|
|
382
|
-
claudeye --install-hooks verify-intent
|
|
383
|
-
```
|
|
384
|
-
|
|
385
|
-
### Hook Activity Dashboard
|
|
386
|
-
|
|
387
|
-
Every policy evaluation is logged to `~/.claudeye/cache/hook-activity/` and displayed at `/policies` in the dashboard. You can filter by decision (allow/deny), event type, and policy name — with pagination and auto-refresh.
|
|
388
|
-
|
|
389
|
-
Deny events are also reported to anonymous telemetry (if enabled) so you can track how often policies fire across machines.
|
|
390
|
-
|
|
391
|
-
### Hook Logging
|
|
392
|
-
|
|
393
|
-
Hook processes write structured log output to **stderr**, controlled by two environment variables:
|
|
394
|
-
|
|
395
|
-
| Variable | Purpose | Values | Default |
|
|
396
|
-
|----------|---------|--------|---------|
|
|
397
|
-
| `CLAUDEYE_LOG_LEVEL` | Log level threshold for both stderr and file output | `info`, `warn`, `error` | `warn` |
|
|
398
|
-
| `CLAUDEYE_HOOK_LOG_FILE` | Enable file logging | unset/empty = off, `1` or `true` = `~/.claudeye/logs/`, or a custom directory path | disabled |
|
|
399
|
-
|
|
400
|
-
#### What gets logged
|
|
401
|
-
|
|
402
|
-
| Level | What is logged |
|
|
403
|
-
|-------|---------------|
|
|
404
|
-
| `info` | Event type, enabled policy count, matched policies, evaluation result (decision, policy name, duration), deny/instruct reasons |
|
|
405
|
-
| `warn` | Stdin read failures, JSON payload parse failures, activity persistence failures |
|
|
406
|
-
| `error` | Critical failures |
|
|
407
|
-
|
|
408
|
-
Stderr output format: `[claudeye:hook] LEVEL message`
|
|
409
|
-
|
|
410
|
-
#### Stderr-only logging (default)
|
|
411
|
-
|
|
412
|
-
Set `CLAUDEYE_LOG_LEVEL` in your shell environment before launching Claude Code:
|
|
413
|
-
|
|
414
|
-
```bash
|
|
415
|
-
# See all hook activity in Claude Code's stderr
|
|
416
|
-
CLAUDEYE_LOG_LEVEL=info claude
|
|
417
|
-
|
|
418
|
-
# Only see warnings and errors (default behavior)
|
|
419
|
-
CLAUDEYE_LOG_LEVEL=warn claude
|
|
420
|
-
```
|
|
421
|
-
|
|
422
|
-
#### File logging (opt-in)
|
|
423
|
-
|
|
424
|
-
Enable persistent log files via `CLAUDEYE_HOOK_LOG_FILE`:
|
|
425
|
-
|
|
426
|
-
```bash
|
|
427
|
-
# Enable file logging to default directory (~/.claudeye/logs/)
|
|
428
|
-
CLAUDEYE_HOOK_LOG_FILE=1 CLAUDEYE_LOG_LEVEL=info claude
|
|
429
|
-
|
|
430
|
-
# Enable file logging to a custom directory
|
|
431
|
-
CLAUDEYE_HOOK_LOG_FILE=/var/log/claudeye CLAUDEYE_LOG_LEVEL=info claude
|
|
432
|
-
|
|
433
|
-
# In .bashrc or .zshrc for persistent config
|
|
434
|
-
export CLAUDEYE_LOG_LEVEL=info
|
|
435
|
-
export CLAUDEYE_HOOK_LOG_FILE=1
|
|
436
|
-
```
|
|
437
|
-
|
|
438
|
-
File logging details:
|
|
439
|
-
- **Active log file:** `hooks.log` in the log directory
|
|
440
|
-
- **Rotation:** Size-based at 500 KB — when `hooks.log` exceeds this, it is renamed to `hooks-{timestamp}.log` and a fresh `hooks.log` is created
|
|
441
|
-
- **Format:** `[ISO timestamp] LEVEL message` (plain text, human-readable)
|
|
442
|
-
- **Write mode:** Synchronous (hook processes are short-lived)
|
|
443
|
-
- **Failure handling:** File logging is best-effort — if writes fail, the hook still completes normally
|
|
444
|
-
|
|
445
|
-
### Using Hook Policies with Claude Agents SDK
|
|
446
|
-
|
|
447
|
-
Claudeye hook policies work with both **Claude Code** and the **Claude Agents SDK** (Python and TypeScript). The same `claudeye --install-hooks` installation works for both — the difference is how the SDK discovers the hooks.
|
|
448
|
-
|
|
449
|
-
#### Why it matters
|
|
450
|
-
|
|
451
|
-
When you run `claudeye --install-hooks`, hook entries are written to `~/.claude/settings.json`. Claude Code reads this file automatically. The Claude Agents SDK also supports these shell-command hooks, but **only if you explicitly tell it to load settings**.
|
|
452
|
-
|
|
453
|
-
Without this, your Agent SDK apps run with **zero hook protection** — even if you've installed hooks for Claude Code on the same machine.
|
|
454
|
-
|
|
455
|
-
#### Python (Claude Agents SDK)
|
|
456
|
-
|
|
457
|
-
Add `setting_sources=["user"]` to load hooks from `~/.claude/settings.json`:
|
|
458
|
-
|
|
459
|
-
```python
|
|
460
|
-
from claude_agent_sdk import query, ClaudeAgentOptions
|
|
461
|
-
|
|
462
|
-
options = ClaudeAgentOptions(
|
|
463
|
-
setting_sources=["user"], # Loads ~/.claude/settings.json (includes claudeye hooks)
|
|
464
|
-
)
|
|
465
|
-
|
|
466
|
-
for message in query(prompt="Refactor the auth module", options=options):
|
|
467
|
-
print(message)
|
|
468
|
-
```
|
|
469
|
-
|
|
470
|
-
To also load project-level settings (`.claude/settings.json` in the working directory):
|
|
471
|
-
|
|
472
|
-
```python
|
|
473
|
-
options = ClaudeAgentOptions(
|
|
474
|
-
setting_sources=["user", "project"],
|
|
475
|
-
)
|
|
476
|
-
```
|
|
477
|
-
|
|
478
|
-
#### TypeScript (Claude Agents SDK)
|
|
479
|
-
|
|
480
|
-
```typescript
|
|
481
|
-
import { query } from "@anthropic-ai/claude-agent-sdk";
|
|
482
|
-
|
|
483
|
-
for await (const message of query({
|
|
484
|
-
prompt: "Refactor the auth module",
|
|
485
|
-
options: {
|
|
486
|
-
settingSources: ["user"], // Loads ~/.claude/settings.json (includes claudeye hooks)
|
|
487
|
-
},
|
|
488
|
-
})) {
|
|
489
|
-
console.log(message);
|
|
490
|
-
}
|
|
491
|
-
```
|
|
492
|
-
|
|
493
|
-
#### What happens under the hood
|
|
494
|
-
|
|
495
|
-
The flow is identical to Claude Code:
|
|
496
|
-
|
|
497
|
-
1. Agent SDK reads `~/.claude/settings.json` and finds claudeye hook entries
|
|
498
|
-
2. Before a tool executes, the SDK invokes `claudeye --hook PreToolUse` with the tool call payload on stdin
|
|
499
|
-
3. Claudeye evaluates enabled policies and returns allow/deny via stdout
|
|
500
|
-
4. The SDK blocks the tool call if the policy denies it
|
|
501
|
-
|
|
502
|
-
The hook response format (`hookSpecificOutput` with `permissionDecision` and `permissionDecisionReason`) is the same protocol used by Claude Code — claudeye doesn't need to know which client is calling it.
|
|
503
|
-
|
|
504
|
-
#### Verification
|
|
505
|
-
|
|
506
|
-
After installing hooks and configuring `setting_sources`, you can verify by checking the `/policies` activity dashboard. Both Claude Code and Agent SDK hook events appear in the same activity log.
|
|
507
|
-
|
|
508
|
-
#### Common Pitfall
|
|
509
|
-
|
|
510
|
-
If you installed hooks with `claudeye --install-hooks` but your Agent SDK app isn't blocking anything, check that you've set `setting_sources`. Without it, the SDK ignores `~/.claude/settings.json` entirely — hooks, permissions, and all other settings.
|
|
511
|
-
|
|
512
|
-
### CLI Examples
|
|
513
|
-
|
|
514
|
-
```bash
|
|
515
|
-
# Custom projects path
|
|
516
|
-
claudeye --projects-path /path/to/projects
|
|
517
|
-
|
|
518
|
-
# Different port, no browser
|
|
519
|
-
claudeye --port 3000 --no-open
|
|
520
|
-
|
|
521
|
-
# LAN access
|
|
522
|
-
claudeye --host 0.0.0.0
|
|
523
|
-
|
|
524
|
-
# Load custom evals and enrichments
|
|
525
|
-
claudeye --evals ./my-evals.js
|
|
526
|
-
|
|
527
|
-
# Password-protect the dashboard
|
|
528
|
-
claudeye --auth-user admin:secret
|
|
529
|
-
|
|
530
|
-
# Multiple auth users
|
|
531
|
-
claudeye --auth-user admin:secret --auth-user viewer:readonly
|
|
532
|
-
|
|
533
|
-
# Clear cached results
|
|
534
|
-
claudeye --cache-clear
|
|
535
|
-
|
|
536
|
-
# Enable background queue processing (scan every 60 seconds)
|
|
537
|
-
claudeye --evals ./my-evals.js --queue-interval 60
|
|
538
|
-
|
|
539
|
-
# Background processing with higher concurrency
|
|
540
|
-
claudeye --evals ./my-evals.js --queue-interval 30 --queue-concurrency 5
|
|
541
|
-
```
|
|
542
|
-
|
|
543
|
-
## Enterprise
|
|
544
|
-
|
|
545
|
-
The free tier ships 13 built-in policies with more added in every update. For teams that need deeper control, Claudeye Enterprise provides:
|
|
546
|
-
|
|
547
|
-
- **Custom hook policies** — define organization-specific rules beyond the built-in set. Block access to internal endpoints, enforce branch naming conventions, restrict tool usage by project, or implement any policy your security team requires.
|
|
548
|
-
- **Failure pattern knowledge base** — an intelligent agent that learns from your agents' past failures. It checks every session against a growing library of known failure patterns — broken tool call sequences, retry loops, permission escalations, misapplied fixes — and flags issues before they compound. The knowledge base improves continuously as your agents encounter new edge cases.
|
|
549
|
-
|
|
550
|
-
[Contact us](https://discord.com/invite/zT92CAgvkj) to learn more.
|
|
551
|
-
|
|
552
|
-
## Why Claudeye?
|
|
553
|
-
|
|
554
|
-
| Feature | Claudeye | Langfuse | Dev-Agent-Lens | ccusage | Raw JSONL |
|
|
555
|
-
|---------|:--------:|:--------:|:--------------:|:-------:|:---------:|
|
|
556
|
-
| Local-first (no cloud) | **Yes** | Self-host option | Proxy required | Yes | Yes |
|
|
557
|
-
| Session replay | **Yes** | Traces only | Traces only | No | Manual |
|
|
558
|
-
| Custom evals | **Yes** | Limited | No | No | No |
|
|
559
|
-
| Subagent expansion | **Yes** | No | No | No | No |
|
|
560
|
-
| Zero config | **Yes** | Setup required | Proxy setup | Yes | N/A |
|
|
561
|
-
| Visual dashboard | **Yes** | Yes | Yes (Phoenix) | CLI only | No |
|
|
562
|
-
| Hook security policies | **Yes** | No | No | No | No |
|
|
563
|
-
|
|
564
|
-
## Features
|
|
565
|
-
|
|
566
|
-
### Uncover
|
|
567
|
-
|
|
568
|
-
- **Projects & sessions browser** - filter by date range or keyword, paginated and sorted newest-first
|
|
569
|
-
- **Full execution trace viewer** - every message, tool call, thinking block, and system event
|
|
570
|
-
- **Nested subagent logs** - expand to see subagent executions inline, pre-loaded with the session
|
|
571
|
-
- **Virtual scrolling** - handles sessions with thousands of entries without performance issues
|
|
572
|
-
|
|
573
|
-
### Understand
|
|
574
|
-
|
|
575
|
-
- **Session stats bar** - turns, tool calls, subagents, duration, and models at a glance
|
|
576
|
-
- **Custom evals** - grade sessions with pass/fail results and 0-1 scores
|
|
577
|
-
- **Per-eval recompute** - re-run a single eval without reprocessing all others
|
|
578
|
-
- **Conditional evals** - gate evals globally or per-item, with session/subagent scope control
|
|
579
|
-
- **Cache invalidation hook** - register a custom function via `app.cacheInvalidation()` to invalidate stale cached results based on age, score, or any custom logic
|
|
580
|
-
|
|
581
|
-
### Utilize
|
|
582
|
-
|
|
583
|
-
- **Hook security policies** - 13 built-in policies (free, with more every update) that block dangerous commands (`sudo`, `rm -rf /`, `.env` reads, `curl | sh`, force-pushes, commits on main, and more) in real time via Claude Code's hook system — before they execute
|
|
584
|
-
- **Hook activity dashboard** - see every policy check at `/hooks`: what was allowed, what was blocked, with filters by decision, event type, and policy name
|
|
585
|
-
- **Hook logging** - structured stderr output for every hook invocation (event type, policy count, decision, duration), plus opt-in file logging with automatic rotation via `CLAUDEYE_HOOK_LOG_FILE`
|
|
586
|
-
- **Custom enrichments** - compute metadata (token counts, quality signals, labels) as key-value pairs
|
|
587
|
-
- **Custom actions** - on-demand tasks triggered from the dashboard via `app.action()` — generate summaries, export metrics, or run side-effects with full access to eval and enrichment results
|
|
588
|
-
- **Alerts** - register callbacks via `app.alert()` that fire after all evals and enrichments complete (Slack webhooks, CI notifications, logging)
|
|
589
|
-
- **Dashboard views & filters** - organize filters into named views, each with focused filter tiles (boolean toggles, range sliders, multi-select dropdowns) and a filterable sessions table
|
|
590
|
-
- **Dashboard aggregates** - define cross-session summary tables with `app.dashboard.aggregate()`, using `{ collect, reduce }` for full control over output
|
|
591
|
-
- **Unified queue** - all evals and enrichments (session, subagent, UI, background) go through a single priority queue with bounded concurrency, live tracking at `/queue`
|
|
592
|
-
- **JSONL export** - download raw session logs
|
|
593
|
-
- **Auto-refresh** - monitor live sessions at 5s, 10s, or 30s intervals
|
|
594
|
-
- **Light/dark theme** - with system preference detection
|
|
595
|
-
|
|
596
|
-
## CLI Reference
|
|
597
|
-
|
|
598
|
-
| Flag | Description | Default |
|
|
599
|
-
|------|-------------|---------|
|
|
600
|
-
| `--projects-path, -p <path>` | Path to Claude projects directory | `~/.claude/projects` |
|
|
601
|
-
| `--port <number>` | Port to bind | `8020` |
|
|
602
|
-
| `--host <address>` | Host to bind (`0.0.0.0` for LAN) | `localhost` |
|
|
603
|
-
| `--evals <path>` | Path to evals/enrichments file | - |
|
|
604
|
-
| `--auth-user <user:pass>` | Add an auth user (repeatable) | - |
|
|
605
|
-
| `--cache-path <path>` | Custom cache directory | `~/.claudeye/cache` |
|
|
606
|
-
| `--cache-clear` | Clear all cached results and exit | - |
|
|
607
|
-
| `--no-open` | Don't auto-open the browser | - |
|
|
608
|
-
| `--queue-interval <secs>` | Background scan interval in seconds | disabled |
|
|
609
|
-
| `--queue-concurrency <num>` | Max parallel items per batch | `2` |
|
|
610
|
-
| `--queue-history-ttl <secs>` | Seconds to keep completed items | `3600` |
|
|
611
|
-
| `--max-queue-items <num>` | Max items to enqueue per scan (0=unlimited) | `500` |
|
|
612
|
-
| `--logging <level>` | Log level: `info`, `warn`, `error` (applies to dashboard server; hooks read `CLAUDEYE_LOG_LEVEL` env var) | `warn` |
|
|
613
|
-
| `--disable-telemetry` | Disable anonymous usage analytics | enabled |
|
|
614
|
-
| `--install-hooks [policies\|custom <path>]` | Install hooks (interactive, or: `all`, `name1 name2 ...`, or `custom <path>` to register a custom hooks JS file) | - |
|
|
615
|
-
| `--remove-hooks [policies\|custom]` | Remove hooks (all, or: `name1 name2 ...` to disable specific, or `custom` to clear the custom hooks path) | - |
|
|
616
|
-
| `--list-hooks` | Show hook policies in a table with enabled status, descriptions, and a separate Beta section | - |
|
|
617
|
-
| `--configure-llm` | Configure LLM provider for smart policies (interactive, or pass flags below) | - |
|
|
618
|
-
| `--llm-api-key <key>` | API key for the LLM provider | - |
|
|
619
|
-
| `--llm-base-url <url>` | Base URL for OpenAI-compatible API | `https://api.openai.com/v1` |
|
|
620
|
-
| `--llm-model <model>` | Model name | `gpt-4o-mini` |
|
|
621
|
-
| `--scope <scope>` | Scope for `--install-hooks` (default: `user`) / `--remove-hooks` (default: `all`): `user`, `project`, `local`, or `all` (remove only) | - |
|
|
622
|
-
| `--cwd <path>` | Working directory for `--scope project`/`local` (default: cwd). No effect with `--scope user`. | cwd |
|
|
623
|
-
| `-h, --help` | Show help | - |
|
|
624
|
-
|
|
625
|
-
## Custom Evals & Enrichments
|
|
626
|
-
|
|
627
|
-
Define evals and enrichments in a single JS file and load with `--evals`:
|
|
628
|
-
|
|
629
|
-
```js
|
|
630
|
-
import { createApp } from 'claudeye';
|
|
631
|
-
|
|
632
|
-
const app = createApp();
|
|
633
|
-
|
|
634
|
-
// Evals: grade your sessions
|
|
635
|
-
app.eval('under-50-turns', ({ stats }) => ({
|
|
636
|
-
pass: stats.turnCount <= 50,
|
|
637
|
-
score: Math.max(0, 1 - stats.turnCount / 100),
|
|
638
|
-
message: `${stats.turnCount} turn(s)`,
|
|
639
|
-
}));
|
|
640
|
-
|
|
641
|
-
app.eval('tool-success', ({ entries }) => {
|
|
642
|
-
const results = entries.filter(e => e.type === 'tool_result');
|
|
643
|
-
const errors = results.filter(e => e.is_error);
|
|
644
|
-
const rate = results.length ? 1 - errors.length / results.length : 1;
|
|
645
|
-
return { pass: rate >= 0.9, score: rate };
|
|
646
|
-
});
|
|
647
|
-
|
|
648
|
-
// Enrichments: add metadata to sessions
|
|
649
|
-
app.enrich('session-summary', ({ entries, stats }) => ({
|
|
650
|
-
'Total Tokens': entries.reduce((s, e) => s + (e.usage?.total_tokens || 0), 0),
|
|
651
|
-
'Primary Model': stats.models[0] || 'unknown',
|
|
652
|
-
'Tool Calls': stats.toolCallCount,
|
|
653
|
-
}));
|
|
654
|
-
```
|
|
655
|
-
|
|
656
|
-
```bash
|
|
657
|
-
claudeye --evals ./my-evals.js
|
|
658
|
-
```
|
|
659
|
-
|
|
660
|
-
---
|
|
661
|
-
|
|
662
|
-
## API Reference
|
|
663
|
-
|
|
664
|
-
> **Distribution note (v1.0.0+):** Claudeye is distributed as a compiled native binary. The `claudeye` npm package is a thin JS wrapper that proxies `createApp()` calls to the binary via JSON IPC. User-defined functions (eval closures, enrich callbacks, etc.) remain in your Node.js process — they are never serialized into the binary. The public API surface is identical to previous versions; no code changes are required.
|
|
665
|
-
|
|
666
|
-
### `createApp()`
|
|
667
|
-
|
|
668
|
-
Returns a `ClaudeyeApp` instance. All methods are chainable.
|
|
669
|
-
|
|
670
|
-
```ts
|
|
671
|
-
import { createApp } from 'claudeye';
|
|
672
|
-
const app = createApp();
|
|
673
|
-
```
|
|
674
|
-
|
|
675
|
-
---
|
|
676
|
-
|
|
677
|
-
### `app.condition(fn)`
|
|
678
|
-
|
|
679
|
-
Set a global condition that gates all evals, enrichments, and actions. Calling this multiple times replaces the previous condition.
|
|
680
|
-
|
|
681
|
-
```ts
|
|
682
|
-
app.condition(({ entries, stats, projectName, sessionId }) => boolean | Promise<boolean>);
|
|
683
|
-
```
|
|
684
|
-
|
|
685
|
-
If the global condition returns `false` (or throws), every registered eval, enrichment, and action is skipped.
|
|
686
|
-
|
|
687
|
-
#### Examples
|
|
688
|
-
|
|
689
|
-
```js
|
|
690
|
-
// Only run for sessions with actual content
|
|
691
|
-
app.condition(({ entries }) => entries.length > 0);
|
|
692
|
-
|
|
693
|
-
// Only run for non-test projects
|
|
694
|
-
app.condition(({ projectName }) => !projectName.includes('test'));
|
|
695
|
-
|
|
696
|
-
// Only run for sessions longer than 5 turns
|
|
697
|
-
app.condition(({ stats }) => stats.turnCount >= 5);
|
|
698
|
-
|
|
699
|
-
// Async condition
|
|
700
|
-
app.condition(async ({ sessionId }) => {
|
|
701
|
-
// You could check an external service, database, etc.
|
|
702
|
-
return sessionId !== 'skip-this-one';
|
|
703
|
-
});
|
|
704
|
-
```
|
|
705
|
-
|
|
706
|
-
#### Combining Global and Per-Item Conditions
|
|
707
|
-
|
|
708
|
-
Global and per-item conditions stack. The global condition runs first; if it passes, per-item conditions are checked individually:
|
|
709
|
-
|
|
710
|
-
```js
|
|
711
|
-
const app = createApp();
|
|
712
|
-
|
|
713
|
-
// Global: skip everything for empty sessions
|
|
714
|
-
app.condition(({ entries }) => entries.length > 0);
|
|
715
|
-
|
|
716
|
-
// Per-eval: only check turn count for sessions with tool calls
|
|
717
|
-
app.eval('efficient-tools',
|
|
718
|
-
({ stats }) => ({
|
|
719
|
-
pass: stats.toolCallCount <= stats.turnCount * 2,
|
|
720
|
-
score: Math.max(0, 1 - (stats.toolCallCount / (stats.turnCount * 4))),
|
|
721
|
-
}),
|
|
722
|
-
{ condition: ({ stats }) => stats.toolCallCount > 0 }
|
|
723
|
-
);
|
|
724
|
-
|
|
725
|
-
// Per-enrichment: only compute model info for sessions that used a model
|
|
726
|
-
app.enrich('model-info',
|
|
727
|
-
({ stats }) => ({
|
|
728
|
-
'Primary Model': stats.models[0] || 'unknown',
|
|
729
|
-
'Model Count': stats.models.length,
|
|
730
|
-
}),
|
|
731
|
-
{ condition: ({ stats }) => stats.models.length > 0 }
|
|
732
|
-
);
|
|
733
|
-
```
|
|
734
|
-
|
|
735
|
-
> **Note:** Calling `app.condition()` multiple times replaces the previous condition. Only the last one is active. The global condition applies to both evals and enrichments; there's no way to set separate global conditions for each.
|
|
736
|
-
|
|
737
|
-
---
|
|
738
|
-
|
|
739
|
-
### `app.queueCondition(fn, options?)`
|
|
740
|
-
|
|
741
|
-
Set a condition that gates **background queue processing**. This is separate from `app.condition()` — it only affects the background scanner (`scanAndEnqueue`), not UI-triggered runs.
|
|
742
|
-
|
|
743
|
-
```ts
|
|
744
|
-
app.queueCondition(fn: ConditionFunction, options?: QueueConditionOptions): ClaudeyeApp;
|
|
745
|
-
```
|
|
746
|
-
|
|
747
|
-
If the queue condition returns `false` (or throws), the session is skipped entirely — no evals or enrichments are enqueued for that session during background scanning.
|
|
748
|
-
|
|
749
|
-
#### Options
|
|
750
|
-
|
|
751
|
-
| Option | Type | Default | Description |
|
|
752
|
-
|--------|------|---------|-------------|
|
|
753
|
-
| `cacheable` | `boolean` | `false` | When `true`, condition results are cached per-session with hash-based invalidation. When `false` (default), the condition is evaluated fresh every scan cycle. |
|
|
754
|
-
|
|
755
|
-
#### Caching
|
|
756
|
-
|
|
757
|
-
By default (`cacheable: false`), the condition is re-evaluated every scan cycle. This is safe for time-dependent or external-state-dependent conditions (e.g., business hours, feature flags).
|
|
758
|
-
|
|
759
|
-
When `cacheable: true`, condition results are cached per-session. The cache auto-invalidates when:
|
|
760
|
-
- The **session file changes** (new content → different `contentHash`)
|
|
761
|
-
- The **condition function is edited** (different `conditionCodeHash` via `fn.toString()` → SHA-256)
|
|
762
|
-
|
|
763
|
-
Use `cacheable: true` only when the condition depends solely on session data.
|
|
764
|
-
|
|
765
|
-
Log parsing is **lazy** — only triggered when the condition is actually evaluated (every cycle when non-cacheable, or on cache miss when cacheable). The parsed session data comes from `getCachedSessionLog()` which is itself LRU-cached in memory.
|
|
766
|
-
|
|
767
|
-
#### Examples
|
|
768
|
-
|
|
769
|
-
```js
|
|
770
|
-
// Only process sessions with actual content (evaluated fresh every cycle)
|
|
771
|
-
app.queueCondition(({ entries }) => entries.length > 5);
|
|
772
|
-
|
|
773
|
-
// Only process sessions from specific projects (safe to cache — depends only on session data)
|
|
774
|
-
app.queueCondition(({ projectName }) => projectName.startsWith('production'), { cacheable: true });
|
|
775
|
-
|
|
776
|
-
// Time-dependent condition — must not be cached
|
|
777
|
-
app.queueCondition(() => {
|
|
778
|
-
const hour = new Date().getHours();
|
|
779
|
-
return hour >= 9 && hour < 17; // business hours only
|
|
780
|
-
});
|
|
781
|
-
|
|
782
|
-
// Skip sessions that are too short to be meaningful (safe to cache)
|
|
783
|
-
app.queueCondition(({ stats }) => stats.turnCount >= 3, { cacheable: true });
|
|
784
|
-
```
|
|
785
|
-
|
|
786
|
-
#### Difference from `app.condition()`
|
|
787
|
-
|
|
788
|
-
| | `app.condition()` | `app.queueCondition()` |
|
|
789
|
-
|---|---|---|
|
|
790
|
-
| **Scope** | Gates all evals/enrichments/actions | Gates background queue only |
|
|
791
|
-
| **Affects UI runs** | Yes | No |
|
|
792
|
-
| **Affects background scanner** | No | Yes |
|
|
793
|
-
| **Caching** | No (re-evaluated each run) | Opt-in via `{ cacheable: true }` |
|
|
794
|
-
|
|
795
|
-
Both can be set simultaneously. `app.queueCondition()` acts as a pre-filter for the background scanner; sessions that pass then go through the normal `app.condition()` check when individual items execute.
|
|
796
|
-
|
|
797
|
-
---
|
|
798
|
-
|
|
799
|
-
### `app.cacheInvalidation(fn)`
|
|
800
|
-
|
|
801
|
-
Register a cache invalidation hook that runs before serving any cached eval or enrichment result. If the hook returns `true`, the cached entry is discarded and the item re-runs. Only affects evals and enrichments — actions, alerts, and conditions pass through unchanged.
|
|
802
|
-
|
|
803
|
-
```ts
|
|
804
|
-
app.cacheInvalidation((ctx) => boolean | Promise<boolean>);
|
|
805
|
-
```
|
|
806
|
-
|
|
807
|
-
The `ctx` object includes `itemName`, `itemKind` (`"evals"` or `"enrichments"`), `cachedAt` (ISO timestamp), `cachedValue` (the full cached result), `projectName`, `sessionId`, `contentHash`, `itemCodeHash` (hash from the cached entry), and `currentItemCodeHash` (hash of the current function code).
|
|
808
|
-
|
|
809
|
-
```js
|
|
810
|
-
// Invalidate cache entries older than 24 hours
|
|
811
|
-
app.cacheInvalidation(({ cachedAt }) => {
|
|
812
|
-
const age = Date.now() - new Date(cachedAt).getTime();
|
|
813
|
-
return age > 24 * 60 * 60 * 1000;
|
|
814
|
-
});
|
|
815
|
-
|
|
816
|
-
// Re-run a specific eval when cached score is 0
|
|
817
|
-
app.cacheInvalidation((ctx) => {
|
|
818
|
-
if (ctx.itemKind === 'evals' && ctx.itemName === 'no-hallucination') {
|
|
819
|
-
return ctx.cachedValue.score === 0;
|
|
820
|
-
}
|
|
821
|
-
return false;
|
|
822
|
-
});
|
|
823
|
-
|
|
824
|
-
// Re-run when the eval/enrichment function code has changed
|
|
825
|
-
app.cacheInvalidation(({ itemCodeHash, currentItemCodeHash }) => {
|
|
826
|
-
return itemCodeHash !== currentItemCodeHash;
|
|
827
|
-
});
|
|
828
|
-
```
|
|
829
|
-
|
|
830
|
-
---
|
|
831
|
-
|
|
832
|
-
### `app.eval(name, fn, options?)`
|
|
833
|
-
|
|
834
|
-
Register an eval function. Evals grade sessions with a pass/fail result and an optional 0-1 score.
|
|
835
|
-
|
|
836
|
-
- **`name`** - unique string identifier for the eval
|
|
837
|
-
- **`fn`** - function receiving an `EvalContext` and returning an `EvalResult`
|
|
838
|
-
- **`options.condition`** - optional condition function to gate this eval
|
|
839
|
-
- **`options.scope`** - `'session'` (default), `'subagent'`, or `'both'`
|
|
840
|
-
- **`options.subagentType`** - only run for subagents of this type (e.g. `'Explore'`)
|
|
841
|
-
|
|
842
|
-
If a per-eval condition returns `false`, the eval is marked as **skipped** in the results panel. If the condition throws, the eval is marked as **errored** with the message `Condition error: <message>`.
|
|
843
|
-
|
|
844
|
-
#### `EvalResult`
|
|
845
|
-
|
|
846
|
-
```ts
|
|
847
|
-
interface EvalResult {
|
|
848
|
-
pass: boolean; // Did the eval pass?
|
|
849
|
-
score?: number; // 0-1, clamped automatically (default: 1.0)
|
|
850
|
-
message?: string; // Shown in the UI
|
|
851
|
-
metadata?: Record<string, unknown>; // Arbitrary data
|
|
852
|
-
}
|
|
853
|
-
```
|
|
854
|
-
|
|
855
|
-
#### Examples
|
|
856
|
-
|
|
857
|
-
```js
|
|
858
|
-
// Simple: check if session stayed under a turn budget
|
|
859
|
-
app.eval('under-50-turns', ({ stats }) => ({
|
|
860
|
-
pass: stats.turnCount <= 50,
|
|
861
|
-
score: Math.max(0, 1 - stats.turnCount / 100),
|
|
862
|
-
message: `${stats.turnCount} turn(s)`,
|
|
863
|
-
}));
|
|
864
|
-
|
|
865
|
-
// Check tool success rate
|
|
866
|
-
app.eval('tool-success-rate', ({ entries }) => {
|
|
867
|
-
const toolResults = entries.filter(e =>
|
|
868
|
-
e.type === 'user' &&
|
|
869
|
-
Array.isArray(e.message?.content) &&
|
|
870
|
-
e.message.content.some(b => b.type === 'tool_result')
|
|
871
|
-
);
|
|
872
|
-
const errors = toolResults.filter(e =>
|
|
873
|
-
e.message?.content?.some(b => b.is_error === true)
|
|
874
|
-
);
|
|
875
|
-
const rate = toolResults.length > 0
|
|
876
|
-
? 1 - (errors.length / toolResults.length)
|
|
877
|
-
: 1;
|
|
878
|
-
return {
|
|
879
|
-
pass: rate >= 0.9,
|
|
880
|
-
score: rate,
|
|
881
|
-
message: `${errors.length}/${toolResults.length} tool errors`,
|
|
882
|
-
};
|
|
883
|
-
});
|
|
884
|
-
|
|
885
|
-
// Check that the session ended with a text response
|
|
886
|
-
app.eval('has-completion', ({ entries }) => {
|
|
887
|
-
const lastAssistant = [...entries].reverse().find(e => e.type === 'assistant');
|
|
888
|
-
const hasText = lastAssistant?.message?.content?.some?.(b => b.type === 'text');
|
|
889
|
-
return {
|
|
890
|
-
pass: !!hasText,
|
|
891
|
-
score: hasText ? 1.0 : 0,
|
|
892
|
-
message: hasText ? 'Session completed with text response' : 'No final text response',
|
|
893
|
-
};
|
|
894
|
-
});
|
|
895
|
-
|
|
896
|
-
// With a per-eval condition: only run for longer sessions
|
|
897
|
-
app.eval('under-budget',
|
|
898
|
-
({ stats }) => ({
|
|
899
|
-
pass: stats.turnCount <= 30,
|
|
900
|
-
score: Math.max(0, 1 - stats.turnCount / 60),
|
|
901
|
-
message: `${stats.turnCount} turns`,
|
|
902
|
-
}),
|
|
903
|
-
{ condition: ({ stats }) => stats.turnCount >= 5 }
|
|
904
|
-
);
|
|
905
|
-
|
|
906
|
-
// Subagent-scoped eval
|
|
907
|
-
app.eval('explore-depth', ({ entries, source }) => {
|
|
908
|
-
const myEntries = entries.filter(e => e._source === source);
|
|
909
|
-
return {
|
|
910
|
-
pass: myEntries.length > 5,
|
|
911
|
-
score: Math.min(myEntries.length / 20, 1),
|
|
912
|
-
};
|
|
913
|
-
}, { scope: 'subagent', subagentType: 'Explore' });
|
|
914
|
-
```
|
|
915
|
-
|
|
916
|
-
---
|
|
917
|
-
|
|
918
|
-
### `app.enrich(name, fn, options?)`
|
|
919
|
-
|
|
920
|
-
Register an enricher function. Enrichments compute key-value metadata displayed in the dashboard.
|
|
921
|
-
|
|
922
|
-
- **`name`** - unique string identifier for the enricher
|
|
923
|
-
- **`fn`** - function receiving an `EvalContext` and returning a `Record<string, string | number | boolean>`
|
|
924
|
-
- **`options.condition`** - optional condition function to gate this enricher
|
|
925
|
-
- **`options.scope`** - `'session'` (default), `'subagent'`, or `'both'`
|
|
926
|
-
- **`options.subagentType`** - only run for subagents of this type (e.g. `'Explore'`)
|
|
927
|
-
|
|
928
|
-
#### `EnrichmentResult`
|
|
929
|
-
|
|
930
|
-
```ts
|
|
931
|
-
// Enrichers return a flat key-value map
|
|
932
|
-
type EnrichmentResult = Record<string, string | number | boolean>;
|
|
933
|
-
```
|
|
934
|
-
|
|
935
|
-
#### Examples
|
|
936
|
-
|
|
937
|
-
```js
|
|
938
|
-
// Session overview
|
|
939
|
-
app.enrich('overview', ({ stats }) => ({
|
|
940
|
-
'Turns': stats.turnCount,
|
|
941
|
-
'Tool Calls': stats.toolCallCount,
|
|
942
|
-
'Duration': stats.duration,
|
|
943
|
-
'Models': stats.models.join(', ') || 'none',
|
|
944
|
-
}));
|
|
945
|
-
|
|
946
|
-
// Token and cost breakdown
|
|
947
|
-
app.enrich('token-usage', ({ entries }) => {
|
|
948
|
-
const inputTokens = entries.reduce((s, e) => s + (e.usage?.input_tokens || 0), 0);
|
|
949
|
-
const outputTokens = entries.reduce((s, e) => s + (e.usage?.output_tokens || 0), 0);
|
|
950
|
-
return {
|
|
951
|
-
'Input Tokens': inputTokens,
|
|
952
|
-
'Output Tokens': outputTokens,
|
|
953
|
-
'Total Tokens': inputTokens + outputTokens,
|
|
954
|
-
'Est. Cost': `$${((inputTokens * 0.003 + outputTokens * 0.015) / 1000).toFixed(4)}`,
|
|
955
|
-
};
|
|
956
|
-
});
|
|
957
|
-
|
|
958
|
-
// Error analysis (only when errors exist)
|
|
959
|
-
app.enrich('error-analysis',
|
|
960
|
-
({ entries }) => {
|
|
961
|
-
const errors = entries.filter(e => e.is_error === true);
|
|
962
|
-
return {
|
|
963
|
-
'Total Errors': errors.length,
|
|
964
|
-
'Error Rate': `${((errors.length / entries.length) * 100).toFixed(1)}%`,
|
|
965
|
-
};
|
|
966
|
-
},
|
|
967
|
-
{ condition: ({ entries }) => entries.some(e => e.is_error === true) }
|
|
968
|
-
);
|
|
969
|
-
|
|
970
|
-
// Subagent info (only when subagents were spawned)
|
|
971
|
-
app.enrich('subagent-info',
|
|
972
|
-
({ entries, stats }) => {
|
|
973
|
-
const subagentEntries = entries.filter(e => e.type === 'assistant' && e.parentUuid);
|
|
974
|
-
return {
|
|
975
|
-
'Subagent Count': stats.subagentCount,
|
|
976
|
-
'Subagent Entries': subagentEntries.length,
|
|
977
|
-
};
|
|
978
|
-
},
|
|
979
|
-
{ condition: ({ stats }) => stats.subagentCount > 0 }
|
|
980
|
-
);
|
|
981
|
-
|
|
982
|
-
// Advanced metrics (async condition)
|
|
983
|
-
app.enrich('advanced-metrics',
|
|
984
|
-
({ entries }) => ({
|
|
985
|
-
'Entry Count': entries.length,
|
|
986
|
-
'Avg Entry Size': Math.round(
|
|
987
|
-
entries.reduce((s, e) => s + JSON.stringify(e).length, 0) / entries.length
|
|
988
|
-
),
|
|
989
|
-
}),
|
|
990
|
-
{
|
|
991
|
-
condition: async ({ entries }) => {
|
|
992
|
-
return entries.length > 10;
|
|
993
|
-
},
|
|
994
|
-
}
|
|
995
|
-
);
|
|
996
|
-
```
|
|
997
|
-
|
|
998
|
-
---
|
|
999
|
-
|
|
1000
|
-
### `app.action(name, fn, options?)`
|
|
1001
|
-
|
|
1002
|
-
Register a user-defined action. Actions are a flexible primitive for on-demand tasks — generating summaries, exporting metrics, running side-effects, or anything else that doesn't fit the eval (pass/fail) or enrichment (key-value) model. Actions are never auto-run; they are triggered manually from the dashboard.
|
|
1003
|
-
|
|
1004
|
-
- **`name`** - unique string identifier for the action
|
|
1005
|
-
- **`fn`** - function receiving an `ActionContext` and returning an `ActionResult`
|
|
1006
|
-
- **`options.condition`** - optional condition function to gate this action
|
|
1007
|
-
- **`options.scope`** - `'session'` (default), `'subagent'`, or `'both'`
|
|
1008
|
-
- **`options.subagentType`** - only run for subagents of this type (e.g. `'Explore'`)
|
|
1009
|
-
- **`options.cache`** - cache results (default: `true`). Set to `false` for side-effect actions that should always re-run.
|
|
1010
|
-
- **`options.inputs`** - optional array of `ActionInputField` definitions. When provided, the dashboard renders an inline form before running the action. Collected values are available via `context.inputs`. Actions without inputs work exactly as before.
|
|
1011
|
-
|
|
1012
|
-
#### `ActionContext`
|
|
1013
|
-
|
|
1014
|
-
Actions receive an extended context that includes cached eval and enrichment results:
|
|
1015
|
-
|
|
1016
|
-
```ts
|
|
1017
|
-
interface ActionContext extends EvalContext {
|
|
1018
|
-
evalResults: Record<string, EvalRunResult>; // Cached eval results for the session
|
|
1019
|
-
enrichmentResults: Record<string, EnrichRunResult>; // Cached enrichment results
|
|
1020
|
-
inputs: ActionInputValues; // Values collected from the input form (empty object if no inputs defined)
|
|
1021
|
-
}
|
|
1022
|
-
```
|
|
1023
|
-
|
|
1024
|
-
This means actions can build on prior analysis — check which evals passed, read enrichment data, and combine it with raw session data. When `options.inputs` is defined, `context.inputs` contains the user-provided values keyed by field name.
|
|
1025
|
-
|
|
1026
|
-
#### `ActionResult`
|
|
1027
|
-
|
|
1028
|
-
```ts
|
|
1029
|
-
interface ActionResult {
|
|
1030
|
-
output?: string; // Free-form text (rendered in monospace block with copy button)
|
|
1031
|
-
status: 'success' | 'error'; // Action outcome
|
|
1032
|
-
message?: string; // Short summary shown in the UI
|
|
1033
|
-
}
|
|
1034
|
-
```
|
|
1035
|
-
|
|
1036
|
-
Return `output` for text. The output is rendered in a scrollable monospace block with a copy-to-clipboard button. The `status` field determines the icon in the results panel.
|
|
1037
|
-
|
|
1038
|
-
#### Examples
|
|
1039
|
-
|
|
1040
|
-
```js
|
|
1041
|
-
// Session summary: combine stats with eval results
|
|
1042
|
-
app.action('session-summary', ({ stats, evalResults }) => {
|
|
1043
|
-
const evalNames = Object.keys(evalResults);
|
|
1044
|
-
const passCount = evalNames.filter(n => evalResults[n]?.pass).length;
|
|
1045
|
-
return {
|
|
1046
|
-
output: [
|
|
1047
|
-
`Session: ${stats.turnCount} turns, ${stats.toolCallCount} tool calls`,
|
|
1048
|
-
`Duration: ${stats.duration}`,
|
|
1049
|
-
`Models: ${stats.models.join(', ') || 'unknown'}`,
|
|
1050
|
-
`Evals: ${passCount}/${evalNames.length} passed`,
|
|
1051
|
-
].join('\n'),
|
|
1052
|
-
status: 'success',
|
|
1053
|
-
message: 'Summary generated',
|
|
1054
|
-
};
|
|
1055
|
-
});
|
|
1056
|
-
|
|
1057
|
-
// Export metrics: gather enrichment data into a text report
|
|
1058
|
-
app.action('export-metrics', ({ stats, enrichmentResults }) => {
|
|
1059
|
-
const enrichData = {};
|
|
1060
|
-
for (const [name, result] of Object.entries(enrichmentResults)) {
|
|
1061
|
-
if (result.data) Object.assign(enrichData, result.data);
|
|
1062
|
-
}
|
|
1063
|
-
const lines = [
|
|
1064
|
-
...Object.entries(enrichData).map(([k, v]) => `${k}: ${v}`),
|
|
1065
|
-
`turnCount: ${stats.turnCount}`,
|
|
1066
|
-
`toolCallCount: ${stats.toolCallCount}`,
|
|
1067
|
-
];
|
|
1068
|
-
return {
|
|
1069
|
-
output: lines.join('\n'),
|
|
1070
|
-
status: 'success',
|
|
1071
|
-
message: `Exported ${Object.keys(enrichData).length + 2} metrics`,
|
|
1072
|
-
};
|
|
1073
|
-
});
|
|
1074
|
-
|
|
1075
|
-
// Action with user-defined inputs: prompt for parameters before running
|
|
1076
|
-
app.action('custom-summary', ({ stats, evalResults, inputs }) => {
|
|
1077
|
-
const maxLines = inputs.maxLines ?? 20;
|
|
1078
|
-
const includeEvals = inputs.includeEvals ?? true;
|
|
1079
|
-
const lines = [`Session: ${stats.turnCount} turns, ${stats.toolCallCount} tool calls`];
|
|
1080
|
-
if (includeEvals) {
|
|
1081
|
-
const evalNames = Object.keys(evalResults);
|
|
1082
|
-
const passCount = evalNames.filter(n => evalResults[n]?.pass).length;
|
|
1083
|
-
lines.push(`Evals: ${passCount}/${evalNames.length} passed`);
|
|
1084
|
-
}
|
|
1085
|
-
return { output: lines.slice(0, maxLines).join('\n'), status: 'success' };
|
|
1086
|
-
}, {
|
|
1087
|
-
inputs: [
|
|
1088
|
-
{ name: 'maxLines', type: 'number', label: 'Max lines', default: 20 },
|
|
1089
|
-
{ name: 'includeEvals', type: 'boolean', label: 'Include eval results', default: true },
|
|
1090
|
-
{ name: 'format', type: 'select', label: 'Output format', options: [
|
|
1091
|
-
{ label: 'Plain text', value: 'text' },
|
|
1092
|
-
{ label: 'Markdown', value: 'markdown' },
|
|
1093
|
-
]},
|
|
1094
|
-
],
|
|
1095
|
-
});
|
|
1096
|
-
|
|
1097
|
-
// Side-effect action: write results to a file (disable caching)
|
|
1098
|
-
app.action('write-report', async ({ projectName, sessionId, stats, evalResults }) => {
|
|
1099
|
-
const fs = await import('fs/promises');
|
|
1100
|
-
const report = {
|
|
1101
|
-
projectName, sessionId,
|
|
1102
|
-
turns: stats.turnCount,
|
|
1103
|
-
evals: Object.fromEntries(
|
|
1104
|
-
Object.entries(evalResults).map(([name, r]) => [name, { pass: r.pass, score: r.score }])
|
|
1105
|
-
),
|
|
1106
|
-
timestamp: new Date().toISOString(),
|
|
1107
|
-
};
|
|
1108
|
-
await fs.appendFile('session-reports.jsonl', JSON.stringify(report) + '\n');
|
|
1109
|
-
return {
|
|
1110
|
-
status: 'success',
|
|
1111
|
-
message: `Report appended to session-reports.jsonl`,
|
|
1112
|
-
output: 'Written to session-reports.jsonl',
|
|
1113
|
-
};
|
|
1114
|
-
}, { cache: false });
|
|
1115
|
-
|
|
1116
|
-
// Conditional action: only available for sessions with tool usage
|
|
1117
|
-
app.action('tool-analysis', ({ entries, source }) => {
|
|
1118
|
-
const toolUses = entries.filter(e =>
|
|
1119
|
-
e._source === (source || 'session') &&
|
|
1120
|
-
e.type === 'assistant' &&
|
|
1121
|
-
Array.isArray(e.message?.content) &&
|
|
1122
|
-
e.message.content.some(b => b.type === 'tool_use')
|
|
1123
|
-
);
|
|
1124
|
-
const toolNames = [...new Set(toolUses.flatMap(e =>
|
|
1125
|
-
(e.message?.content || []).filter(b => b.type === 'tool_use').map(b => b.name)
|
|
1126
|
-
))];
|
|
1127
|
-
return {
|
|
1128
|
-
output: toolNames.length > 0
|
|
1129
|
-
? `Tools used: ${toolNames.join(', ')}`
|
|
1130
|
-
: 'No tools used',
|
|
1131
|
-
status: 'success',
|
|
1132
|
-
};
|
|
1133
|
-
}, { condition: ({ stats }) => stats.toolCallCount > 0 });
|
|
1134
|
-
|
|
1135
|
-
// Subagent-scoped action
|
|
1136
|
-
app.action('agent-report', ({ entries, source, stats }) => {
|
|
1137
|
-
const myEntries = entries.filter(e => e._source === source);
|
|
1138
|
-
return {
|
|
1139
|
-
output: [
|
|
1140
|
-
`Source: ${source}`,
|
|
1141
|
-
`Entries: ${myEntries.length}`,
|
|
1142
|
-
`Turns: ${stats.turnCount}`,
|
|
1143
|
-
`Tool calls: ${stats.toolCallCount}`,
|
|
1144
|
-
].join('\n'),
|
|
1145
|
-
status: 'success',
|
|
1146
|
-
message: `Agent ${source}: ${myEntries.length} entries`,
|
|
1147
|
-
};
|
|
1148
|
-
}, { scope: 'subagent' });
|
|
1149
|
-
```
|
|
1150
|
-
|
|
1151
|
-
#### Action UI Behavior
|
|
1152
|
-
|
|
1153
|
-
The Actions panel appears on session pages (and in expanded subagent cards for subagent-scoped actions) when matching actions are registered. The panel:
|
|
1154
|
-
|
|
1155
|
-
- Shows all registered actions as idle (not run) on initial load, with cached results displayed immediately
|
|
1156
|
-
- Provides a **Run** button per action and a **Run All** button in the header
|
|
1157
|
-
- For actions with `inputs`, clicking **Run** (or **Run All**) opens an inline form to collect values before executing. When re-running a completed action, the form is pre-filled with the previous values so the user can review or modify them.
|
|
1158
|
-
- Displays `output` in a scrollable monospace block with a copy-to-clipboard button
|
|
1159
|
-
- Shows a "cached" badge for results served from cache
|
|
1160
|
-
- Collapses by default (click the header to expand)
|
|
1161
|
-
- Error count, success count, and loading spinners are visible in the header even when collapsed
|
|
1162
|
-
|
|
1163
|
-
#### Action Types
|
|
1164
|
-
|
|
1165
|
-
```ts
|
|
1166
|
-
type ActionFunction = (context: ActionContext) => ActionResult | Promise<ActionResult>;
|
|
1167
|
-
|
|
1168
|
-
interface ActionInputField {
|
|
1169
|
-
name: string;
|
|
1170
|
-
type: 'string' | 'number' | 'boolean' | 'select';
|
|
1171
|
-
label?: string; // Display label (defaults to name)
|
|
1172
|
-
required?: boolean;
|
|
1173
|
-
default?: string | number | boolean;
|
|
1174
|
-
options?: Array<{ label: string; value: string }>; // For 'select' type only
|
|
1175
|
-
}
|
|
1176
|
-
|
|
1177
|
-
type ActionInputValues = Record<string, string | number | boolean>;
|
|
1178
|
-
|
|
1179
|
-
interface RegisteredAction {
|
|
1180
|
-
name: string;
|
|
1181
|
-
fn: ActionFunction;
|
|
1182
|
-
condition?: ConditionFunction;
|
|
1183
|
-
scope: EvalScope; // 'session' | 'subagent' | 'both'
|
|
1184
|
-
subagentType?: string;
|
|
1185
|
-
cache: boolean; // default: true
|
|
1186
|
-
inputs?: ActionInputField[];
|
|
1187
|
-
}
|
|
1188
|
-
|
|
1189
|
-
interface ActionRunResult {
|
|
1190
|
-
name: string;
|
|
1191
|
-
output?: string;
|
|
1192
|
-
status: 'success' | 'error';
|
|
1193
|
-
message?: string;
|
|
1194
|
-
durationMs: number;
|
|
1195
|
-
error?: string; // Present when the action threw or its condition threw
|
|
1196
|
-
skipped?: boolean; // Present when the global/per-action condition returned false
|
|
1197
|
-
inputs?: ActionInputValues; // Snapshot of input values used for this run
|
|
1198
|
-
}
|
|
1199
|
-
|
|
1200
|
-
interface ActionRunSummary {
|
|
1201
|
-
results: ActionRunResult[];
|
|
1202
|
-
totalDurationMs: number;
|
|
1203
|
-
errorCount: number;
|
|
1204
|
-
skippedCount: number;
|
|
1205
|
-
}
|
|
1206
|
-
```
|
|
1207
|
-
|
|
1208
|
-
---
|
|
1209
|
-
|
|
1210
|
-
### `app.alert(name, fn, options?)`
|
|
1211
|
-
|
|
1212
|
-
Register an alert callback that fires after all evals and enrichments complete for a session. Alerts are the hook point for Slack webhooks, CI notifications, logging, or any post-processing logic.
|
|
1213
|
-
|
|
1214
|
-
- **`name`** - unique string identifier for the alert (re-registering replaces the previous callback)
|
|
1215
|
-
- **`fn`** - function receiving an `AlertContext` and returning `void | Promise<void>`
|
|
1216
|
-
- **`options`** - optional `AlertOptions` object:
|
|
1217
|
-
- **`suppressOnRecompute`** (default: `true`) — when `true`, this alert is suppressed during recomputes triggered by code changes (e.g., modifying an eval function). Alerts still fire when new session data arrives. Set to `false` for alerts that should always fire regardless of the trigger.
|
|
1218
|
-
|
|
1219
|
-
#### When Alerts Fire
|
|
1220
|
-
|
|
1221
|
-
Alerts fire **once per session content version** — the unified queue checks after each item completes whether all eval/enrichment work for that session is done (no pending or processing items remain). When the last item completes, a debounced check fires alerts if the dedup marker allows it.
|
|
1222
|
-
|
|
1223
|
-
| Trigger | Behavior |
|
|
1224
|
-
|---------|----------|
|
|
1225
|
-
| **Initial page load** | `queuePerItem()` per eval/enrichment → alerts fire when last item completes |
|
|
1226
|
-
| **Background processing** | `scanAndEnqueue()` → individual items at LOW priority → alerts fire when last item completes |
|
|
1227
|
-
| **Page reload (all cached)** | No items enter the queue → no alerts fire |
|
|
1228
|
-
| **Re-run All** | Clears dedup marker first, then parallel `queuePerItem()` calls → alerts fire exactly once |
|
|
1229
|
-
| **Re-run single** | `queuePerItem()` for one item → alerts do NOT re-fire (dedup marker still valid) |
|
|
1230
|
-
| **Session content changes** | New `contentHash` invalidates the dedup marker → alerts fire again |
|
|
1231
|
-
| **Alert registrations change** | New `alertsHash` invalidates the dedup marker → alerts fire again |
|
|
1232
|
-
|
|
1233
|
-
Each alert callback is individually try/caught via `Promise.allSettled`. A throwing alert never blocks other alerts or eval processing. Errors are logged to console.
|
|
1234
|
-
|
|
1235
|
-
#### `AlertContext`
|
|
1236
|
-
|
|
1237
|
-
```ts
|
|
1238
|
-
interface AlertContext {
|
|
1239
|
-
projectName: string; // Encoded project folder name
|
|
1240
|
-
sessionId: string; // Session UUID
|
|
1241
|
-
evalSummary?: EvalRunSummary; // Present when evals registered & ran
|
|
1242
|
-
enrichSummary?: EnrichRunSummary; // Present when enrichments registered & ran
|
|
1243
|
-
}
|
|
1244
|
-
```
|
|
1245
|
-
|
|
1246
|
-
`evalSummary` and `enrichSummary` contain **all** results for the session — both cached and freshly computed. The alert always sees the complete picture.
|
|
1247
|
-
|
|
1248
|
-
#### Examples
|
|
1249
|
-
|
|
1250
|
-
```js
|
|
1251
|
-
// Slack webhook on eval failure
|
|
1252
|
-
app.alert('slack-on-failure', async ({ projectName, sessionId, evalSummary }) => {
|
|
1253
|
-
if (evalSummary && evalSummary.failCount > 0) {
|
|
1254
|
-
await fetch('https://hooks.slack.com/services/T.../B.../xxx', {
|
|
1255
|
-
method: 'POST',
|
|
1256
|
-
headers: { 'Content-Type': 'application/json' },
|
|
1257
|
-
body: JSON.stringify({
|
|
1258
|
-
text: `${evalSummary.failCount} evals failed for ${projectName}/${sessionId}`,
|
|
1259
|
-
}),
|
|
1260
|
-
});
|
|
1261
|
-
}
|
|
1262
|
-
});
|
|
1263
|
-
|
|
1264
|
-
// Console logging
|
|
1265
|
-
app.alert('log-results', ({ projectName, sessionId, evalSummary, enrichSummary }) => {
|
|
1266
|
-
const evals = evalSummary
|
|
1267
|
-
? `${evalSummary.passCount} pass, ${evalSummary.failCount} fail, ${evalSummary.errorCount} error`
|
|
1268
|
-
: 'no evals';
|
|
1269
|
-
const enrichments = enrichSummary
|
|
1270
|
-
? `${enrichSummary.results.length} enrichments (${enrichSummary.errorCount} errors)`
|
|
1271
|
-
: 'no enrichments';
|
|
1272
|
-
console.log(`[ALERT] ${projectName}/${sessionId}: ${evals} | ${enrichments}`);
|
|
1273
|
-
});
|
|
1274
|
-
|
|
1275
|
-
// Write results to a file for CI
|
|
1276
|
-
app.alert('ci-report', async ({ projectName, sessionId, evalSummary }) => {
|
|
1277
|
-
if (!evalSummary) return;
|
|
1278
|
-
const fs = await import('fs/promises');
|
|
1279
|
-
await fs.appendFile('eval-results.jsonl', JSON.stringify({
|
|
1280
|
-
projectName,
|
|
1281
|
-
sessionId,
|
|
1282
|
-
passCount: evalSummary.passCount,
|
|
1283
|
-
failCount: evalSummary.failCount,
|
|
1284
|
-
results: evalSummary.results.map(r => ({ name: r.name, pass: r.pass, score: r.score })),
|
|
1285
|
-
timestamp: new Date().toISOString(),
|
|
1286
|
-
}) + '\n');
|
|
1287
|
-
});
|
|
1288
|
-
|
|
1289
|
-
// Alert that fires even on code-change recomputes (e.g., audit logging)
|
|
1290
|
-
app.alert('audit-log', ({ projectName, sessionId }) => {
|
|
1291
|
-
console.log(`[AUDIT] Processed ${projectName}/${sessionId}`);
|
|
1292
|
-
}, { suppressOnRecompute: false });
|
|
1293
|
-
```
|
|
1294
|
-
|
|
1295
|
-
#### Alert Types
|
|
1296
|
-
|
|
1297
|
-
```ts
|
|
1298
|
-
type AlertFunction = (context: AlertContext) => void | Promise<void>;
|
|
1299
|
-
|
|
1300
|
-
interface AlertOptions {
|
|
1301
|
-
suppressOnRecompute?: boolean; // Default: true
|
|
1302
|
-
}
|
|
1303
|
-
|
|
1304
|
-
interface RegisteredAlert {
|
|
1305
|
-
name: string;
|
|
1306
|
-
fn: AlertFunction;
|
|
1307
|
-
suppressOnRecompute: boolean;
|
|
1308
|
-
}
|
|
1309
|
-
|
|
1310
|
-
interface EvalRunSummary {
|
|
1311
|
-
results: EvalRunResult[];
|
|
1312
|
-
totalDurationMs: number;
|
|
1313
|
-
passCount: number;
|
|
1314
|
-
failCount: number;
|
|
1315
|
-
errorCount: number;
|
|
1316
|
-
skippedCount: number;
|
|
1317
|
-
}
|
|
1318
|
-
|
|
1319
|
-
interface EnrichRunSummary {
|
|
1320
|
-
results: EnrichRunResult[];
|
|
1321
|
-
totalDurationMs: number;
|
|
1322
|
-
errorCount: number;
|
|
1323
|
-
skippedCount: number;
|
|
1324
|
-
}
|
|
1325
|
-
```
|
|
1326
|
-
|
|
1327
|
-
---
|
|
1328
|
-
|
|
1329
|
-
### `app.dashboard.view(name, options?)`
|
|
1330
|
-
|
|
1331
|
-
Create a named dashboard view. Views group related filters into focused sets. Each view appears as a card on `/dashboard` and has its own route at `/dashboard/[viewName]`.
|
|
1332
|
-
|
|
1333
|
-
- **`name`** - unique string identifier for the view
|
|
1334
|
-
- **`options.label`** - human-readable label displayed on the card (defaults to the name)
|
|
1335
|
-
- **`options.cachedOnly`** - only show sessions with complete cache entries (default: `false`)
|
|
1336
|
-
|
|
1337
|
-
Returns a `DashboardViewBuilder` with chainable `.filter()` and `.aggregate()` methods.
|
|
1338
|
-
|
|
1339
|
-
#### `DashboardViewBuilder`
|
|
1340
|
-
|
|
1341
|
-
```ts
|
|
1342
|
-
interface DashboardViewBuilder {
|
|
1343
|
-
filter(name: string, fn: FilterFunction, options?: FilterOptions): DashboardViewBuilder;
|
|
1344
|
-
filter(options: { preBuilt: string[] }): DashboardViewBuilder;
|
|
1345
|
-
aggregate(name: string, definition: AggregateDefinition, options?: AggregateOptions): DashboardViewBuilder;
|
|
1346
|
-
}
|
|
1347
|
-
```
|
|
1348
|
-
|
|
1349
|
-
The view builder's `.filter()` returns the view builder (not the app), so you can chain multiple filters within a view:
|
|
1350
|
-
|
|
1351
|
-
```js
|
|
1352
|
-
app.dashboard.view('performance', { label: 'Performance Metrics' })
|
|
1353
|
-
.filter('turn-count', ({ stats }) => stats.turnCount, { label: 'Turn Count' })
|
|
1354
|
-
.filter('tool-calls', ({ stats }) => stats.toolCallCount, { label: 'Tool Calls' });
|
|
1355
|
-
```
|
|
1356
|
-
|
|
1357
|
-
#### `ViewOptions`
|
|
1358
|
-
|
|
1359
|
-
```ts
|
|
1360
|
-
interface ViewOptions {
|
|
1361
|
-
label?: string; // Human-readable label (defaults to name)
|
|
1362
|
-
cachedOnly?: boolean; // Only show sessions with complete cache entries (default: false)
|
|
1363
|
-
}
|
|
1364
|
-
```
|
|
1365
|
-
|
|
1366
|
-
#### `cachedOnly` Views
|
|
1367
|
-
|
|
1368
|
-
When `cachedOnly: true`, uncached sessions are **pre-filtered** before any filter computation runs — sessions without complete cache entries are skipped entirely. This avoids expensive log parsing, eval cache reads, and filter function execution for uncached sessions. Sessions appear in the view as the background queue processes them.
|
|
1369
|
-
|
|
1370
|
-
Filter functions in `cachedOnly` views receive `ctx.evalResults` — a record of cached eval results keyed by eval name. Each entry has `pass`, `score`, and optional `error`/`message` fields. This lets you create per-eval score filters (range sliders) and composite boolean filters without re-running evals.
|
|
1371
|
-
|
|
1372
|
-
```js
|
|
1373
|
-
// Only show sessions that have been fully processed
|
|
1374
|
-
app.dashboard.view('quality', { cachedOnly: true })
|
|
1375
|
-
.filter('has-errors', ({ entries }) =>
|
|
1376
|
-
entries.some(e => e.type === 'assistant' &&
|
|
1377
|
-
Array.isArray(e.message?.content) &&
|
|
1378
|
-
e.message.content.some(b => b.type === 'tool_use' && b.is_error)),
|
|
1379
|
-
{ label: 'Has Errors' });
|
|
1380
|
-
|
|
1381
|
-
// Per-eval score filters using evalResults
|
|
1382
|
-
app.dashboard.view('eval-results', { cachedOnly: true, label: 'Eval Score Filters' })
|
|
1383
|
-
.filter('quality-score', (ctx) => ctx.evalResults?.['quality']?.score ?? 0,
|
|
1384
|
-
{ label: 'Quality Score' })
|
|
1385
|
-
.filter('all-passing', (ctx) => {
|
|
1386
|
-
if (!ctx.evalResults) return false;
|
|
1387
|
-
return Object.values(ctx.evalResults).every(r => r.pass);
|
|
1388
|
-
}, { label: 'All Evals Passing' });
|
|
1389
|
-
```
|
|
1390
|
-
|
|
1391
|
-
#### Routing
|
|
1392
|
-
|
|
1393
|
-
| URL | Behavior |
|
|
1394
|
-
|-----|----------|
|
|
1395
|
-
| `/dashboard` | If named views exist, shows a view index (card grid). If only default filters, shows them directly. If nothing registered, shows an empty state. |
|
|
1396
|
-
| `/dashboard/[viewName]` | Specific named view with its filters and sessions table. |
|
|
1397
|
-
|
|
1398
|
-
#### Examples
|
|
1399
|
-
|
|
1400
|
-
```js
|
|
1401
|
-
// Two focused views
|
|
1402
|
-
app.dashboard.view('performance', { label: 'Performance Metrics' })
|
|
1403
|
-
.filter('turn-count', ({ stats }) => stats.turnCount, { label: 'Turn Count' })
|
|
1404
|
-
.filter('tool-calls', ({ stats }) => stats.toolCallCount, { label: 'Tool Calls' });
|
|
1405
|
-
|
|
1406
|
-
app.dashboard.view('quality', { label: 'Quality Checks' })
|
|
1407
|
-
.filter('has-errors', ({ entries }) =>
|
|
1408
|
-
entries.some(e => e.type === 'assistant' &&
|
|
1409
|
-
Array.isArray(e.message?.content) &&
|
|
1410
|
-
e.message.content.some(b => b.type === 'tool_use' && b.is_error)),
|
|
1411
|
-
{ label: 'Has Errors' })
|
|
1412
|
-
.filter('primary-model', ({ stats }) => stats.models[0] || 'unknown',
|
|
1413
|
-
{ label: 'Primary Model' });
|
|
1414
|
-
|
|
1415
|
-
// Backward-compat: app.dashboard.filter() still works (goes to "default" view)
|
|
1416
|
-
app.dashboard.filter('uses-subagents', ({ stats }) => stats.subagentCount > 0,
|
|
1417
|
-
{ label: 'Uses Subagents' }
|
|
1418
|
-
);
|
|
1419
|
-
```
|
|
1420
|
-
|
|
1421
|
-
---
|
|
1422
|
-
|
|
1423
|
-
### `app.dashboard.filter(name, fn, options?)`
|
|
1424
|
-
|
|
1425
|
-
Register a dashboard filter on the **default** view. For organizing filters into named views, see `app.dashboard.view()` above.
|
|
1426
|
-
|
|
1427
|
-
- **`name`** - unique string identifier for the filter
|
|
1428
|
-
- **`fn`** - function receiving an `EvalContext` and returning a `FilterValue` (`boolean`, `number`, or `string`)
|
|
1429
|
-
- **`options.label`** - human-readable label for the filter tile (defaults to the name)
|
|
1430
|
-
- **`options.condition`** - optional condition function to gate this filter
|
|
1431
|
-
|
|
1432
|
-
The return type auto-determines the UI control:
|
|
1433
|
-
|
|
1434
|
-
| Return type | UI control | Behavior |
|
|
1435
|
-
|-------------|-----------|----------|
|
|
1436
|
-
| `boolean` | Three-state toggle | Cycle: All → Yes → No → All |
|
|
1437
|
-
| `number` | Range slider | Dual-handle slider with min/max inputs. Step auto-adapts: integer data uses steps of 1/5/10; float data uses 0.01 (range≤1), 0.1 (range≤10), or 1 |
|
|
1438
|
-
| `string` | Multi-select dropdown | Checkboxes with Select All / Clear |
|
|
1439
|
-
|
|
1440
|
-
Filter values are computed server-side with an incremental index (only new/changed sessions are reprocessed). Filtering and pagination happen server-side, returning only the matching page of results.
|
|
1441
|
-
|
|
1442
|
-
#### Examples
|
|
1443
|
-
|
|
1444
|
-
```js
|
|
1445
|
-
// Boolean filter: toggle sessions that have tool errors
|
|
1446
|
-
app.dashboard.filter('has-errors', ({ entries }) =>
|
|
1447
|
-
entries.some(e =>
|
|
1448
|
-
e.type === 'assistant' &&
|
|
1449
|
-
Array.isArray(e.message?.content) &&
|
|
1450
|
-
e.message.content.some(b => b.type === 'tool_use' && b.is_error)
|
|
1451
|
-
),
|
|
1452
|
-
{ label: 'Has Errors' }
|
|
1453
|
-
);
|
|
1454
|
-
|
|
1455
|
-
// Number filter: range slider for turn count
|
|
1456
|
-
app.dashboard.filter('turn-count', ({ stats }) => stats.turnCount,
|
|
1457
|
-
{ label: 'Turn Count' }
|
|
1458
|
-
);
|
|
1459
|
-
|
|
1460
|
-
// String filter: multi-select for primary model
|
|
1461
|
-
app.dashboard.filter('primary-model', ({ stats }) => stats.models[0] || 'unknown',
|
|
1462
|
-
{ label: 'Primary Model' }
|
|
1463
|
-
);
|
|
1464
|
-
|
|
1465
|
-
// Number filter: range slider for tool call count
|
|
1466
|
-
app.dashboard.filter('tool-calls', ({ stats }) => stats.toolCallCount,
|
|
1467
|
-
{ label: 'Tool Calls' }
|
|
1468
|
-
);
|
|
1469
|
-
|
|
1470
|
-
// Boolean filter: sessions with subagents
|
|
1471
|
-
app.dashboard.filter('uses-subagents', ({ stats }) => stats.subagentCount > 0,
|
|
1472
|
-
{ label: 'Uses Subagents' }
|
|
1473
|
-
);
|
|
1474
|
-
|
|
1475
|
-
// String filter: session duration bucket
|
|
1476
|
-
app.dashboard.filter('duration-bucket', ({ stats }) => {
|
|
1477
|
-
const ms = parseInt(stats.duration) || 0;
|
|
1478
|
-
if (ms < 60000) return 'Under 1m';
|
|
1479
|
-
if (ms < 300000) return '1-5m';
|
|
1480
|
-
if (ms < 900000) return '5-15m';
|
|
1481
|
-
return 'Over 15m';
|
|
1482
|
-
}, { label: 'Duration' });
|
|
1483
|
-
|
|
1484
|
-
// With a per-filter condition: only compute for non-empty sessions
|
|
1485
|
-
app.dashboard.filter('avg-tools-per-turn',
|
|
1486
|
-
({ stats }) => stats.turnCount > 0
|
|
1487
|
-
? Math.round(stats.toolCallCount / stats.turnCount * 10) / 10
|
|
1488
|
-
: 0,
|
|
1489
|
-
{
|
|
1490
|
-
label: 'Avg Tools/Turn',
|
|
1491
|
-
condition: ({ entries }) => entries.length > 0,
|
|
1492
|
-
}
|
|
1493
|
-
);
|
|
1494
|
-
```
|
|
1495
|
-
|
|
1496
|
-
#### How Filters Work
|
|
1497
|
-
|
|
1498
|
-
1. When the `/dashboard` page loads, the server action discovers all projects and sessions
|
|
1499
|
-
2. An incremental `DashboardIndex` diffs the discovered sessions against previously computed rows — only new or changed sessions are processed (unchanged sessions are skipped entirely)
|
|
1500
|
-
3. For new/changed sessions, it checks the per-session disk cache first, then falls back to parsing the JSONL log and running filters
|
|
1501
|
-
4. Filter metadata (min/max for numbers, unique values for strings) is rebuilt from accumulators only when the session set changes
|
|
1502
|
-
5. Server-side filtering and pagination are applied — only the matching page of results is sent to the client
|
|
1503
|
-
6. User interactions (toggle, slider, dropdown) trigger a debounced (300ms) server re-fetch with the new filter state
|
|
1504
|
-
|
|
1505
|
-
#### Global Condition
|
|
1506
|
-
|
|
1507
|
-
Dashboard filters respect the global condition set via `app.condition()`. If the global condition returns `false` for a session, all filters are skipped for that session.
|
|
1508
|
-
|
|
1509
|
-
```js
|
|
1510
|
-
// Skip empty sessions across evals, enrichments, AND dashboard filters
|
|
1511
|
-
app.condition(({ entries }) => entries.length > 0);
|
|
1512
|
-
```
|
|
1513
|
-
|
|
1514
|
-
---
|
|
1515
|
-
|
|
1516
|
-
### `app.dashboard.filter({ preBuilt })`
|
|
1517
|
-
|
|
1518
|
-
Enable pre-built filters on the dashboard. Pre-built filters provide common filtering functionality with optimized UX — no custom filter function needed.
|
|
1519
|
-
|
|
1520
|
-
- **`preBuilt`** - array of pre-built filter names to enable
|
|
1521
|
-
|
|
1522
|
-
#### Available Pre-Built Filters
|
|
1523
|
-
|
|
1524
|
-
| Name | UI control | Description |
|
|
1525
|
-
|------|-----------|-------------|
|
|
1526
|
-
| `'lastModified'` | Date range picker (from/to) | Filters sessions by their last modified date |
|
|
1527
|
-
|
|
1528
|
-
#### Examples
|
|
1529
|
-
|
|
1530
|
-
```js
|
|
1531
|
-
// Enable on the default view
|
|
1532
|
-
app.dashboard.filter({ preBuilt: ['lastModified'] });
|
|
1533
|
-
|
|
1534
|
-
// Combine with custom filters
|
|
1535
|
-
app.dashboard.filter({ preBuilt: ['lastModified'] });
|
|
1536
|
-
app.dashboard.filter('model', ({ stats }) => stats.models[0] || 'unknown',
|
|
1537
|
-
{ label: 'Model' });
|
|
1538
|
-
|
|
1539
|
-
// Enable on a named view
|
|
1540
|
-
app.dashboard.view('overview', { label: 'Overview' })
|
|
1541
|
-
.filter({ preBuilt: ['lastModified'] })
|
|
1542
|
-
.filter('turns', ({ stats }) => stats.turnCount, { label: 'Turns' });
|
|
1543
|
-
```
|
|
1544
|
-
|
|
1545
|
-
Pre-built date filters operate directly on the `DashboardSessionRow.lastModified` field — they don't require parsing session logs or running a filter function. The date range (min/max) is computed from the observed session data. Filtering is done server-side using normalized server-local calendar dates.
|
|
1546
|
-
|
|
1547
|
-
---
|
|
1548
|
-
|
|
1549
|
-
### `app.dashboard.aggregate(name, definition, options?)`
|
|
1550
|
-
|
|
1551
|
-
Register a cross-session aggregate on the **default** view. For organizing aggregates into named views, use `app.dashboard.view().aggregate()`.
|
|
1552
|
-
|
|
1553
|
-
- **`name`** - unique string identifier for the aggregate
|
|
1554
|
-
- **`definition`** - a `{ collect, reduce }` object
|
|
1555
|
-
- **`options.label`** - human-readable label for the aggregate section (defaults to the name)
|
|
1556
|
-
- **`options.condition`** - optional condition function to gate this aggregate per session
|
|
1557
|
-
|
|
1558
|
-
Provide a `{ collect, reduce }` object. The `collect` function runs per session, and `reduce` transforms all collected values into your output table.
|
|
1559
|
-
|
|
1560
|
-
#### Examples
|
|
1561
|
-
|
|
1562
|
-
```js
|
|
1563
|
-
// Eval pass rate summary table
|
|
1564
|
-
app.dashboard.aggregate('eval-summary', {
|
|
1565
|
-
collect: ({ evalResults }) => {
|
|
1566
|
-
const result = {};
|
|
1567
|
-
for (const [name, r] of Object.entries(evalResults)) {
|
|
1568
|
-
result[`${name}_pass`] = r.pass;
|
|
1569
|
-
result[`${name}_score`] = r.score;
|
|
1570
|
-
}
|
|
1571
|
-
return result;
|
|
1572
|
-
},
|
|
1573
|
-
reduce: (collected) => {
|
|
1574
|
-
const evalNames = new Set();
|
|
1575
|
-
for (const s of collected) {
|
|
1576
|
-
for (const key of Object.keys(s.values)) {
|
|
1577
|
-
if (key.endsWith('_pass')) evalNames.add(key.replace('_pass', ''));
|
|
1578
|
-
}
|
|
1579
|
-
}
|
|
1580
|
-
return Array.from(evalNames).map(name => ({
|
|
1581
|
-
'Eval': name,
|
|
1582
|
-
'Pass Rate': collected.filter(s => s.values[`${name}_pass`]).length / collected.length,
|
|
1583
|
-
'Avg Score': collected.reduce((sum, s) => {
|
|
1584
|
-
const v = s.values[`${name}_score`];
|
|
1585
|
-
return sum + (typeof v === 'number' ? v : 0);
|
|
1586
|
-
}, 0) / collected.length,
|
|
1587
|
-
}));
|
|
1588
|
-
},
|
|
1589
|
-
});
|
|
1590
|
-
|
|
1591
|
-
// Aggregates on named views alongside filters
|
|
1592
|
-
app.dashboard.view('quality', { label: 'Quality' })
|
|
1593
|
-
.aggregate('session-metrics', {
|
|
1594
|
-
collect: ({ stats }) => ({
|
|
1595
|
-
turnCount: stats.turnCount,
|
|
1596
|
-
toolCalls: stats.toolCallCount,
|
|
1597
|
-
}),
|
|
1598
|
-
reduce: (collected) => {
|
|
1599
|
-
const n = collected.length || 1;
|
|
1600
|
-
let turns = 0, tools = 0;
|
|
1601
|
-
for (const s of collected) {
|
|
1602
|
-
turns += typeof s.values.turnCount === 'number' ? s.values.turnCount : 0;
|
|
1603
|
-
tools += typeof s.values.toolCalls === 'number' ? s.values.toolCalls : 0;
|
|
1604
|
-
}
|
|
1605
|
-
return [
|
|
1606
|
-
{ Metric: 'Avg Turns', Value: +(turns / n).toFixed(1) },
|
|
1607
|
-
{ Metric: 'Avg Tool Calls', Value: +(tools / n).toFixed(1) },
|
|
1608
|
-
];
|
|
1609
|
-
},
|
|
1610
|
-
})
|
|
1611
|
-
.filter('turns', ({ stats }) => stats.turnCount, { label: 'Turns' });
|
|
1612
|
-
```
|
|
1613
|
-
|
|
1614
|
-
#### `AggregateContext`
|
|
1615
|
-
|
|
1616
|
-
The collect function receives an extended context:
|
|
1617
|
-
|
|
1618
|
-
```ts
|
|
1619
|
-
interface AggregateContext {
|
|
1620
|
-
entries: Record<string, unknown>[]; // Raw JSONL lines
|
|
1621
|
-
stats: EvalLogStats; // Computed stats
|
|
1622
|
-
projectName: string;
|
|
1623
|
-
sessionId: string;
|
|
1624
|
-
source: string;
|
|
1625
|
-
evalResults: Record<string, { pass: boolean; score: number; error?: string; message?: string }>;
|
|
1626
|
-
enrichResults: Record<string, Record<string, EnrichmentValue>>;
|
|
1627
|
-
filterValues: Record<string, FilterValue>;
|
|
1628
|
-
}
|
|
1629
|
-
```
|
|
1630
|
-
|
|
1631
|
-
#### Aggregate Types
|
|
1632
|
-
|
|
1633
|
-
```ts
|
|
1634
|
-
type AggregateValue = boolean | number | string;
|
|
1635
|
-
|
|
1636
|
-
type AggregateCollectFunction = (
|
|
1637
|
-
context: AggregateContext,
|
|
1638
|
-
) => Record<string, AggregateValue> | Promise<Record<string, AggregateValue>>;
|
|
1639
|
-
|
|
1640
|
-
type AggregateReduceFunction = (
|
|
1641
|
-
collected: CollectedSession[],
|
|
1642
|
-
) => AggregateTableRow[] | Promise<AggregateTableRow[]>;
|
|
1643
|
-
|
|
1644
|
-
type AggregateDefinition = {
|
|
1645
|
-
collect: AggregateCollectFunction;
|
|
1646
|
-
reduce: AggregateReduceFunction;
|
|
1647
|
-
};
|
|
1648
|
-
|
|
1649
|
-
interface AggregateOptions {
|
|
1650
|
-
label?: string;
|
|
1651
|
-
condition?: ConditionFunction;
|
|
1652
|
-
}
|
|
1653
|
-
|
|
1654
|
-
interface CollectedSession {
|
|
1655
|
-
projectName: string;
|
|
1656
|
-
sessionId: string;
|
|
1657
|
-
values: Record<string, AggregateValue>;
|
|
1658
|
-
}
|
|
1659
|
-
|
|
1660
|
-
type AggregateTableRow = Record<string, AggregateValue>;
|
|
1661
|
-
|
|
1662
|
-
interface AggregatePayload {
|
|
1663
|
-
aggregates: {
|
|
1664
|
-
name: string;
|
|
1665
|
-
label: string;
|
|
1666
|
-
rows: AggregateTableRow[];
|
|
1667
|
-
columns: string[];
|
|
1668
|
-
}[];
|
|
1669
|
-
totalSessions: number;
|
|
1670
|
-
totalDurationMs: number;
|
|
1671
|
-
}
|
|
1672
|
-
```
|
|
1673
|
-
|
|
1674
|
-
---
|
|
1675
|
-
|
|
1676
|
-
### `app.auth(options)`
|
|
1677
|
-
|
|
1678
|
-
Configure username/password authentication. When at least one user is configured (via `app.auth()`, `--auth-user`, or `CLAUDEYE_AUTH_USERS` env var), all UI routes are protected by a login page. Users from all sources are merged.
|
|
1679
|
-
|
|
1680
|
-
- **`options.users`** - array of `{ username: string; password: string }` objects
|
|
1681
|
-
|
|
1682
|
-
```ts
|
|
1683
|
-
app.auth({ users: [
|
|
1684
|
-
{ username: 'admin', password: 'secret' },
|
|
1685
|
-
{ username: 'viewer', password: 'readonly' },
|
|
1686
|
-
] });
|
|
1687
|
-
```
|
|
1688
|
-
|
|
1689
|
-
Chainable — returns the app instance:
|
|
1690
|
-
|
|
1691
|
-
```js
|
|
1692
|
-
app
|
|
1693
|
-
.auth({ users: [{ username: 'admin', password: 'secret' }] })
|
|
1694
|
-
.eval('my-eval', fn)
|
|
1695
|
-
.listen();
|
|
1696
|
-
```
|
|
1697
|
-
|
|
1698
|
-
When auth is active:
|
|
1699
|
-
- All UI routes redirect to `/login` for unauthenticated users
|
|
1700
|
-
- A signed HMAC-SHA256 session cookie (`claudeye_session`) is set on login, with 24h expiry
|
|
1701
|
-
- The navbar shows a **Sign out** button
|
|
1702
|
-
- If no users are configured, auth is completely disabled (no login page, no blocking)
|
|
1703
|
-
|
|
1704
|
-
#### Multiple Sources
|
|
1705
|
-
|
|
1706
|
-
Users from CLI, environment, and API are merged:
|
|
1707
|
-
|
|
1708
|
-
```bash
|
|
1709
|
-
# CLI
|
|
1710
|
-
claudeye --evals ./my-evals.js --auth-user ops:pass123
|
|
1711
|
-
|
|
1712
|
-
# Environment (comma-separated user:password pairs)
|
|
1713
|
-
CLAUDEYE_AUTH_USERS=admin:secret claudeye --evals ./my-evals.js
|
|
1714
|
-
|
|
1715
|
-
# API (in my-evals.js)
|
|
1716
|
-
app.auth({ users: [{ username: 'dev', password: 'devpass' }] });
|
|
1717
|
-
```
|
|
1718
|
-
|
|
1719
|
-
All three users (`ops`, `admin`, `dev`) would be valid.
|
|
1720
|
-
|
|
1721
|
-
---
|
|
1722
|
-
|
|
1723
|
-
### `app.listen(port?, options?)`
|
|
1724
|
-
|
|
1725
|
-
Start the Claudeye dashboard server.
|
|
1726
|
-
|
|
1727
|
-
- **`port`** - port number (default: 8020)
|
|
1728
|
-
- **`options.host`** - bind address (default: `"localhost"`, use `"0.0.0.0"` for LAN)
|
|
1729
|
-
- **`options.open`** - auto-open browser (default: `true`)
|
|
1730
|
-
|
|
1731
|
-
When the file is loaded via `--evals` or `CLAUDEYE_EVALS_MODULE`, `listen()` is a no-op. It won't spawn a duplicate server.
|
|
1732
|
-
|
|
1733
|
-
```js
|
|
1734
|
-
const app = createApp();
|
|
1735
|
-
|
|
1736
|
-
app.eval('my-eval', fn);
|
|
1737
|
-
app.enrich('my-enricher', fn);
|
|
1738
|
-
|
|
1739
|
-
// Only starts a server when run directly with `node` or `bun`
|
|
1740
|
-
app.listen(3000, { host: '0.0.0.0', open: false });
|
|
1741
|
-
```
|
|
1742
|
-
|
|
1743
|
-
You can also run your evals file directly with `bun my-evals.js` (or `node my-evals.js`) if you include `app.listen()`. This spawns the dashboard as a child process.
|
|
1744
|
-
|
|
1745
|
-
---
|
|
1746
|
-
|
|
1747
|
-
## Subagent Scope
|
|
1748
|
-
|
|
1749
|
-
Evals, enrichments, and actions run at the session level by default. Use the `scope` option to target subagent logs.
|
|
1750
|
-
|
|
1751
|
-
### Scope Options
|
|
1752
|
-
|
|
1753
|
-
| Scope | Runs at session level | Runs at subagent level |
|
|
1754
|
-
|-------|:---:|:---:|
|
|
1755
|
-
| `'session'` (default) | Yes | No |
|
|
1756
|
-
| `'subagent'` | No | Yes |
|
|
1757
|
-
| `'both'` | Yes | Yes |
|
|
1758
|
-
|
|
1759
|
-
### Subagent Context
|
|
1760
|
-
|
|
1761
|
-
When running at subagent level, the `EvalContext` includes additional metadata:
|
|
1762
|
-
|
|
1763
|
-
```js
|
|
1764
|
-
app.eval('adaptive-eval', (ctx) => {
|
|
1765
|
-
if (ctx.source !== 'session') {
|
|
1766
|
-
// Running at subagent level — source is "agent-{id}"
|
|
1767
|
-
console.log(ctx.source); // e.g. 'agent-a1b2c3'
|
|
1768
|
-
console.log(ctx.subagentType); // e.g. 'Explore'
|
|
1769
|
-
console.log(ctx.subagentDescription); // e.g. 'Search for auth code'
|
|
1770
|
-
console.log(ctx.parentSessionId); // parent session ID
|
|
1771
|
-
}
|
|
1772
|
-
return { pass: true };
|
|
1773
|
-
}, { scope: 'both' });
|
|
1774
|
-
```
|
|
1775
|
-
|
|
1776
|
-
### Combined Data in Subagent Scope
|
|
1777
|
-
|
|
1778
|
-
Subagent-scoped evals and enrichments receive the full combined data (session + all subagents), not just the subagent's own entries. The `source` field in `EvalContext` directly matches the `_source` value on entries, so you can filter easily:
|
|
1779
|
-
|
|
1780
|
-
```js
|
|
1781
|
-
// Subagent-scoped eval that filters to its own entries
|
|
1782
|
-
app.eval('explore-thoroughness', ({ entries, source }) => {
|
|
1783
|
-
const myEntries = entries.filter(e => e._source === source);
|
|
1784
|
-
return {
|
|
1785
|
-
pass: myEntries.length > 5,
|
|
1786
|
-
score: Math.min(myEntries.length / 20, 1),
|
|
1787
|
-
};
|
|
1788
|
-
}, { scope: 'subagent', subagentType: 'Explore' });
|
|
1789
|
-
```
|
|
1790
|
-
|
|
1791
|
-
### SubagentType Filtering
|
|
1792
|
-
|
|
1793
|
-
When you specify `subagentType`, the eval/enrichment only runs for subagents of that type. Subagents of other types will not see the eval panel at all.
|
|
1794
|
-
|
|
1795
|
-
```js
|
|
1796
|
-
// Only runs for Explore subagents
|
|
1797
|
-
app.eval('explore-thoroughness', ({ entries, source }) => {
|
|
1798
|
-
const myEntries = entries.filter(e => e._source === source);
|
|
1799
|
-
return {
|
|
1800
|
-
pass: myEntries.length > 5,
|
|
1801
|
-
score: Math.min(myEntries.length / 20, 1),
|
|
1802
|
-
message: `${myEntries.length} entries explored`,
|
|
1803
|
-
};
|
|
1804
|
-
}, { scope: 'subagent', subagentType: 'Explore' });
|
|
1805
|
-
|
|
1806
|
-
// Runs for all subagent types
|
|
1807
|
-
app.eval('agent-efficiency', ({ stats }) => ({
|
|
1808
|
-
pass: stats.turnCount <= 10,
|
|
1809
|
-
score: Math.max(0, 1 - stats.turnCount / 20),
|
|
1810
|
-
message: `${stats.turnCount} turns`,
|
|
1811
|
-
}), { scope: 'subagent' });
|
|
1812
|
-
|
|
1813
|
-
// Subagent-scoped enrichment
|
|
1814
|
-
app.enrich('agent-summary',
|
|
1815
|
-
({ stats, entries }) => ({
|
|
1816
|
-
'Agent Turns': stats.turnCount,
|
|
1817
|
-
'Agent Tool Calls': stats.toolCallCount,
|
|
1818
|
-
'Agent Entries': entries.length,
|
|
1819
|
-
}),
|
|
1820
|
-
{ scope: 'subagent' }
|
|
1821
|
-
);
|
|
1822
|
-
|
|
1823
|
-
// Scoped to both session and subagent level
|
|
1824
|
-
app.eval('quality-check', ({ stats, source }) => ({
|
|
1825
|
-
pass: stats.toolCallCount <= 20,
|
|
1826
|
-
score: Math.max(0, 1 - stats.toolCallCount / 40),
|
|
1827
|
-
message: `${source}: ${stats.toolCallCount} tool calls`,
|
|
1828
|
-
}), { scope: 'both' });
|
|
1829
|
-
```
|
|
1830
|
-
|
|
1831
|
-
---
|
|
1832
|
-
|
|
1833
|
-
## Evaluation Order
|
|
1834
|
-
|
|
1835
|
-
When a session is loaded, conditions are evaluated in this order:
|
|
1836
|
-
|
|
1837
|
-
```
|
|
1838
|
-
1. Global condition checked
|
|
1839
|
-
|-- Returns false or throws -> ALL evals/enrichments marked "skipped"
|
|
1840
|
-
\-- Returns true -> proceed to step 2
|
|
1841
|
-
|
|
1842
|
-
2. For each eval/enrichment:
|
|
1843
|
-
|-- Has per-item condition?
|
|
1844
|
-
| |-- Returns false -> that item marked "skipped"
|
|
1845
|
-
| |-- Throws -> that item marked "errored" (not skipped)
|
|
1846
|
-
| \-- Returns true -> run the function
|
|
1847
|
-
\-- No condition -> run the function
|
|
1848
|
-
|
|
1849
|
-
3. Function executes
|
|
1850
|
-
|-- Returns result -> recorded normally
|
|
1851
|
-
\-- Throws -> marked "errored", other items still run
|
|
1852
|
-
```
|
|
1853
|
-
|
|
1854
|
-
---
|
|
1855
|
-
|
|
1856
|
-
## UI Behavior
|
|
1857
|
-
|
|
1858
|
-
In the dashboard, conditional results appear as follows:
|
|
1859
|
-
|
|
1860
|
-
| Status | Evals Panel | Enrichments Panel |
|
|
1861
|
-
|--------|-------------|-------------------|
|
|
1862
|
-
| **Skipped** | Grayed-out row with "skipped" label | Grayed-out row with "skipped" label |
|
|
1863
|
-
| **Condition error** | Row with warning icon and error message | Row with warning icon and error message |
|
|
1864
|
-
| **Passed / Data** | Green check with score bar | Key-value pairs grouped by enricher |
|
|
1865
|
-
| **Failed** | Red X with score bar | N/A |
|
|
1866
|
-
|
|
1867
|
-
Skipped items are counted separately in the summary bar (e.g. "2 passed, 1 skipped").
|
|
1868
|
-
|
|
1869
|
-
---
|
|
1870
|
-
|
|
1871
|
-
## Types
|
|
1872
|
-
|
|
1873
|
-
All TypeScript types exported from `claudeye`:
|
|
1874
|
-
|
|
1875
|
-
### `EvalContext`
|
|
1876
|
-
|
|
1877
|
-
Both evals and enrichers receive the same context object:
|
|
1878
|
-
|
|
1879
|
-
```ts
|
|
1880
|
-
interface EvalContext {
|
|
1881
|
-
entries: Record<string, unknown>[]; // Combined session + subagent JSONL lines, each tagged with `_source`
|
|
1882
|
-
stats: EvalLogStats; // Computed stats across all entries (session + subagent)
|
|
1883
|
-
projectName: string; // Encoded project folder name
|
|
1884
|
-
sessionId: string; // Session UUID
|
|
1885
|
-
source: string; // "session" or "agent-{id}" — matches entry._source directly
|
|
1886
|
-
subagentType?: string; // e.g. 'Explore', 'Bash' (subagent scope only)
|
|
1887
|
-
subagentDescription?: string; // Short description (subagent scope only)
|
|
1888
|
-
parentSessionId?: string; // Parent session ID (subagent scope only)
|
|
1889
|
-
evalResults?: Record<string, { // Cached eval results (cachedOnly views only)
|
|
1890
|
-
pass: boolean;
|
|
1891
|
-
score: number;
|
|
1892
|
-
error?: string;
|
|
1893
|
-
message?: string;
|
|
1894
|
-
}>;
|
|
1895
|
-
}
|
|
1896
|
-
```
|
|
1897
|
-
|
|
1898
|
-
`entries` contains the **raw JSONL data** from the session and all its subagents combined. Every line from the session log file and its subagent log files is parsed as JSON and included. Each entry has a `_source` field: `"session"` for main session entries, or `"agent-{id}"` for subagent entries. This means:
|
|
1899
|
-
|
|
1900
|
-
- Tool-result lines (which the display view merges into tool_use blocks) are present as separate entries
|
|
1901
|
-
- All entry types are included: `user`, `assistant`, `system`, `tool_result`, `queue-operation`, etc.
|
|
1902
|
-
- Properties are accessed directly (e.g. `e.usage?.total_tokens`) rather than through a `.raw` wrapper
|
|
1903
|
-
- Filter by `e._source === "session"` to get only main session data
|
|
1904
|
-
- Filter by `e._source` starting with `"agent-"` to get subagent data
|
|
1905
|
-
|
|
1906
|
-
### `EvalLogEntry` (helper type)
|
|
1907
|
-
|
|
1908
|
-
`EvalLogEntry` is exported as a convenience type for describing the display-oriented parsed entries, but it is **not** the type of `EvalContext.entries`. The entries passed to evals and enrichments are raw JSONL objects (`Record<string, unknown>[]`).
|
|
1909
|
-
|
|
1910
|
-
```ts
|
|
1911
|
-
interface EvalLogEntry {
|
|
1912
|
-
type: string;
|
|
1913
|
-
_source?: string; // "session" or "agent-{id}"
|
|
1914
|
-
uuid: string;
|
|
1915
|
-
parentUuid: string | null;
|
|
1916
|
-
timestamp: string;
|
|
1917
|
-
timestampMs: number;
|
|
1918
|
-
timestampFormatted: string;
|
|
1919
|
-
message?: {
|
|
1920
|
-
role: string;
|
|
1921
|
-
content: string | EvalContentBlock[];
|
|
1922
|
-
model?: string;
|
|
1923
|
-
};
|
|
1924
|
-
raw?: Record<string, unknown>;
|
|
1925
|
-
label?: string;
|
|
1926
|
-
}
|
|
1927
|
-
```
|
|
1928
|
-
|
|
1929
|
-
### `EvalLogStats`
|
|
1930
|
-
|
|
1931
|
-
> Stats are computed across all entries (session + subagent combined). Use `_source` filtering on entries before computing custom scoped metrics if needed.
|
|
1932
|
-
|
|
1933
|
-
```ts
|
|
1934
|
-
interface EvalLogStats {
|
|
1935
|
-
turnCount: number; // Number of conversation turns
|
|
1936
|
-
userCount: number; // Number of user messages
|
|
1937
|
-
assistantCount: number; // Number of assistant responses
|
|
1938
|
-
toolCallCount: number; // Total tool invocations
|
|
1939
|
-
subagentCount: number; // Number of subagent spawns
|
|
1940
|
-
duration: string; // Formatted duration (e.g. "2m 15s")
|
|
1941
|
-
models: string[]; // Distinct model IDs used
|
|
1942
|
-
}
|
|
1943
|
-
```
|
|
1944
|
-
|
|
1945
|
-
### `EvalResult`
|
|
1946
|
-
|
|
1947
|
-
```ts
|
|
1948
|
-
interface EvalResult {
|
|
1949
|
-
pass: boolean; // Did the eval pass?
|
|
1950
|
-
score?: number; // 0-1, clamped automatically (default: 1.0)
|
|
1951
|
-
message?: string; // Shown in the UI
|
|
1952
|
-
metadata?: Record<string, unknown>; // Arbitrary data
|
|
1953
|
-
}
|
|
1954
|
-
```
|
|
1955
|
-
|
|
1956
|
-
### `EnrichmentResult`
|
|
1957
|
-
|
|
1958
|
-
```ts
|
|
1959
|
-
type EnrichmentResult = Record<string, string | number | boolean>;
|
|
1960
|
-
```
|
|
1961
|
-
|
|
1962
|
-
### `ConditionFunction`
|
|
1963
|
-
|
|
1964
|
-
```ts
|
|
1965
|
-
type ConditionFunction = (context: EvalContext) => boolean | Promise<boolean>;
|
|
1966
|
-
```
|
|
1967
|
-
|
|
1968
|
-
### `FilterValue`
|
|
1969
|
-
|
|
1970
|
-
```ts
|
|
1971
|
-
type FilterValue = boolean | number | string;
|
|
1972
|
-
```
|
|
1973
|
-
|
|
1974
|
-
### `FilterFunction`
|
|
1975
|
-
|
|
1976
|
-
```ts
|
|
1977
|
-
type FilterFunction = (context: EvalContext) => FilterValue | Promise<FilterValue>;
|
|
1978
|
-
```
|
|
1979
|
-
|
|
1980
|
-
### `FilterOptions`
|
|
1981
|
-
|
|
1982
|
-
```ts
|
|
1983
|
-
interface FilterOptions {
|
|
1984
|
-
label?: string; // Human-readable tile label (defaults to name)
|
|
1985
|
-
condition?: ConditionFunction; // Per-filter gate
|
|
1986
|
-
}
|
|
1987
|
-
```
|
|
1988
|
-
|
|
1989
|
-
### `FilterMeta`
|
|
1990
|
-
|
|
1991
|
-
Metadata auto-derived from computed filter values. Discriminated union by `type`:
|
|
1992
|
-
|
|
1993
|
-
```ts
|
|
1994
|
-
type FilterMeta =
|
|
1995
|
-
| { type: 'boolean'; name: string; label: string }
|
|
1996
|
-
| { type: 'number'; name: string; label: string; min: number; max: number }
|
|
1997
|
-
| { type: 'string'; name: string; label: string; values: string[] }
|
|
1998
|
-
| { type: 'date'; name: string; label: string; min: string; max: string };
|
|
1999
|
-
```
|
|
2000
|
-
|
|
2001
|
-
### `DashboardPayload`
|
|
2002
|
-
|
|
2003
|
-
```ts
|
|
2004
|
-
interface DashboardPayload {
|
|
2005
|
-
sessions: DashboardSessionRow[]; // One page of matching sessions
|
|
2006
|
-
filterMeta: FilterMeta[]; // One per registered filter
|
|
2007
|
-
totalDurationMs: number; // Server-side computation time
|
|
2008
|
-
totalCount: number; // Total sessions before filtering
|
|
2009
|
-
matchingCount: number; // Total sessions after filtering
|
|
2010
|
-
page: number; // Current page (1-based)
|
|
2011
|
-
pageSize: number; // Items per page
|
|
2012
|
-
}
|
|
2013
|
-
|
|
2014
|
-
interface DashboardSessionRow {
|
|
2015
|
-
projectName: string;
|
|
2016
|
-
sessionId: string;
|
|
2017
|
-
lastModified: string; // ISO 8601
|
|
2018
|
-
lastModifiedFormatted: string; // Human-readable
|
|
2019
|
-
filterValues: Record<string, FilterValue>;
|
|
2020
|
-
}
|
|
2021
|
-
```
|
|
2022
|
-
|
|
2023
|
-
### `EvalScope`
|
|
2024
|
-
|
|
2025
|
-
```ts
|
|
2026
|
-
type EvalScope = 'session' | 'subagent' | 'both';
|
|
2027
|
-
```
|
|
2028
|
-
|
|
2029
|
-
---
|
|
2030
|
-
|
|
2031
|
-
## Examples
|
|
2032
|
-
|
|
2033
|
-
Complete, runnable example files. Save any of these as a `.js` file and run with `claudeye --evals ./your-file.js`.
|
|
2034
|
-
|
|
2035
|
-
### Example: Basic Evals & Enrichments
|
|
2036
|
-
|
|
2037
|
-
The quickstart example — define evals and enrichments in one file:
|
|
2038
|
-
|
|
2039
|
-
```js
|
|
2040
|
-
import { createApp } from 'claudeye';
|
|
2041
|
-
|
|
2042
|
-
const app = createApp();
|
|
2043
|
-
|
|
2044
|
-
// ── Global condition ────────────────────────────────────────────
|
|
2045
|
-
// Skip empty sessions across evals, enrichments, AND dashboard filters.
|
|
2046
|
-
app.condition(({ entries }) => entries.length > 0);
|
|
2047
|
-
|
|
2048
|
-
// ── Evals ───────────────────────────────────────────────────────
|
|
2049
|
-
|
|
2050
|
-
app.eval('under-50-turns', ({ stats }) => ({
|
|
2051
|
-
pass: stats.turnCount <= 50,
|
|
2052
|
-
score: Math.max(0, 1 - stats.turnCount / 100),
|
|
2053
|
-
message: `${stats.turnCount} turn(s)`,
|
|
2054
|
-
}));
|
|
2055
|
-
|
|
2056
|
-
app.eval('has-completion', ({ entries }) => {
|
|
2057
|
-
const last = [...entries].reverse().find(e => e.type === 'assistant');
|
|
2058
|
-
const hasText = last?.message?.content?.some?.(b => b.type === 'text');
|
|
2059
|
-
return {
|
|
2060
|
-
pass: !!hasText,
|
|
2061
|
-
score: hasText ? 1.0 : 0,
|
|
2062
|
-
message: hasText ? 'Ended with text' : 'No final text response',
|
|
2063
|
-
};
|
|
2064
|
-
});
|
|
2065
|
-
|
|
2066
|
-
app.eval('session-tool-count', ({ entries }) => {
|
|
2067
|
-
const sessionTools = entries
|
|
2068
|
-
.filter(e => e._source === 'session' && e.type === 'assistant')
|
|
2069
|
-
.flatMap(e => (e.message?.content || []).filter(b => b.type === 'tool_use'));
|
|
2070
|
-
return {
|
|
2071
|
-
pass: sessionTools.length <= 100,
|
|
2072
|
-
score: Math.max(0, 1 - sessionTools.length / 200),
|
|
2073
|
-
message: `${sessionTools.length} session-level tool calls`,
|
|
2074
|
-
};
|
|
2075
|
-
});
|
|
2076
|
-
|
|
2077
|
-
// ── Enrichments ─────────────────────────────────────────────────
|
|
2078
|
-
|
|
2079
|
-
app.enrich('session-overview', ({ stats }) => ({
|
|
2080
|
-
'Turns': stats.turnCount,
|
|
2081
|
-
'Tool Calls': stats.toolCallCount,
|
|
2082
|
-
'Subagents': stats.subagentCount,
|
|
2083
|
-
'Duration': stats.duration,
|
|
2084
|
-
'Models': stats.models.join(', ') || 'none',
|
|
2085
|
-
}));
|
|
2086
|
-
|
|
2087
|
-
app.listen();
|
|
2088
|
-
```
|
|
2089
|
-
|
|
2090
|
-
### Example: Dashboard Filters
|
|
2091
|
-
|
|
2092
|
-
Named dashboard views with focused filter sets, evals, and enrichments:
|
|
2093
|
-
|
|
2094
|
-
```js
|
|
2095
|
-
import { createApp } from 'claudeye';
|
|
2096
|
-
|
|
2097
|
-
const app = createApp();
|
|
2098
|
-
|
|
2099
|
-
// ── Global condition ────────────────────────────────────────────
|
|
2100
|
-
app.condition(({ entries }) => entries.length > 0);
|
|
2101
|
-
|
|
2102
|
-
// ── Performance view ────────────────────────────────────────────
|
|
2103
|
-
app.dashboard.view('performance', { label: 'Performance Metrics' })
|
|
2104
|
-
.filter({ preBuilt: ['lastModified'] })
|
|
2105
|
-
.filter('turn-count', ({ stats }) => stats.turnCount, { label: 'Turn Count' })
|
|
2106
|
-
.filter('tool-calls', ({ stats }) => stats.toolCallCount, { label: 'Tool Calls' })
|
|
2107
|
-
.filter('avg-tools-per-turn',
|
|
2108
|
-
({ stats }) => stats.turnCount > 0
|
|
2109
|
-
? Math.round(stats.toolCallCount / stats.turnCount * 10) / 10
|
|
2110
|
-
: 0,
|
|
2111
|
-
{
|
|
2112
|
-
label: 'Avg Tools/Turn',
|
|
2113
|
-
condition: ({ stats }) => stats.toolCallCount > 0,
|
|
2114
|
-
}
|
|
2115
|
-
);
|
|
2116
|
-
|
|
2117
|
-
// ── Quality view ────────────────────────────────────────────────
|
|
2118
|
-
app.dashboard.view('quality', { label: 'Quality Checks' })
|
|
2119
|
-
.filter('has-errors', ({ entries }) =>
|
|
2120
|
-
entries.some(e =>
|
|
2121
|
-
e.type === 'assistant' &&
|
|
2122
|
-
Array.isArray(e.message?.content) &&
|
|
2123
|
-
e.message.content.some(b => b.type === 'tool_use' && b.is_error)
|
|
2124
|
-
),
|
|
2125
|
-
{ label: 'Has Errors' }
|
|
2126
|
-
)
|
|
2127
|
-
.filter('primary-model', ({ stats }) => stats.models[0] || 'unknown',
|
|
2128
|
-
{ label: 'Primary Model' }
|
|
2129
|
-
)
|
|
2130
|
-
.filter('uses-subagents', ({ stats }) => stats.subagentCount > 0,
|
|
2131
|
-
{ label: 'Uses Subagents' }
|
|
2132
|
-
);
|
|
2133
|
-
|
|
2134
|
-
// ── Evals ───────────────────────────────────────────────────────
|
|
2135
|
-
|
|
2136
|
-
app.eval('under-50-turns', ({ stats }) => ({
|
|
2137
|
-
pass: stats.turnCount <= 50,
|
|
2138
|
-
score: Math.max(0, 1 - stats.turnCount / 100),
|
|
2139
|
-
message: `${stats.turnCount} turn(s)`,
|
|
2140
|
-
}));
|
|
2141
|
-
|
|
2142
|
-
app.eval('has-completion', ({ entries }) => {
|
|
2143
|
-
const last = [...entries].reverse().find(e => e.type === 'assistant');
|
|
2144
|
-
const hasText = last?.message?.content?.some?.(b => b.type === 'text');
|
|
2145
|
-
return {
|
|
2146
|
-
pass: !!hasText,
|
|
2147
|
-
score: hasText ? 1.0 : 0,
|
|
2148
|
-
message: hasText ? 'Ended with text' : 'No final text response',
|
|
2149
|
-
};
|
|
2150
|
-
});
|
|
2151
|
-
|
|
2152
|
-
app.eval('session-tool-count', ({ entries }) => {
|
|
2153
|
-
const sessionTools = entries
|
|
2154
|
-
.filter(e => e._source === 'session' && e.type === 'assistant')
|
|
2155
|
-
.flatMap(e => (e.message?.content || []).filter(b => b.type === 'tool_use'));
|
|
2156
|
-
return {
|
|
2157
|
-
pass: sessionTools.length <= 100,
|
|
2158
|
-
score: Math.max(0, 1 - sessionTools.length / 200),
|
|
2159
|
-
message: `${sessionTools.length} session-level tool calls`,
|
|
2160
|
-
};
|
|
2161
|
-
});
|
|
2162
|
-
|
|
2163
|
-
// ── Enrichments ─────────────────────────────────────────────────
|
|
2164
|
-
|
|
2165
|
-
app.enrich('session-overview', ({ stats }) => ({
|
|
2166
|
-
'Turns': stats.turnCount,
|
|
2167
|
-
'Tool Calls': stats.toolCallCount,
|
|
2168
|
-
'Subagents': stats.subagentCount,
|
|
2169
|
-
'Duration': stats.duration,
|
|
2170
|
-
'Models': stats.models.join(', ') || 'none',
|
|
2171
|
-
}));
|
|
2172
|
-
|
|
2173
|
-
app.listen();
|
|
2174
|
-
```
|
|
2175
|
-
|
|
2176
|
-
### Example: Multi-View Dashboard
|
|
2177
|
-
|
|
2178
|
-
Multiple named views with focused filter sets:
|
|
2179
|
-
|
|
2180
|
-
```js
|
|
2181
|
-
import { createApp } from 'claudeye';
|
|
2182
|
-
|
|
2183
|
-
const app = createApp();
|
|
2184
|
-
|
|
2185
|
-
// ── Performance view ────────────────────────────────────────────
|
|
2186
|
-
// Metrics about session length and tool usage.
|
|
2187
|
-
app.dashboard.view('performance', { label: 'Performance Metrics' })
|
|
2188
|
-
.filter('turn-count', ({ stats }) => stats.turnCount,
|
|
2189
|
-
{ label: 'Turn Count' })
|
|
2190
|
-
.filter('tool-calls', ({ stats }) => stats.toolCallCount,
|
|
2191
|
-
{ label: 'Tool Calls' })
|
|
2192
|
-
.filter('uses-subagents', ({ stats }) => stats.subagentCount > 0,
|
|
2193
|
-
{ label: 'Uses Subagents' });
|
|
2194
|
-
|
|
2195
|
-
// ── Quality view ────────────────────────────────────────────────
|
|
2196
|
-
// Error and model analysis.
|
|
2197
|
-
app.dashboard.view('quality', { label: 'Quality Checks' })
|
|
2198
|
-
.filter('has-errors', ({ entries }) =>
|
|
2199
|
-
entries.some(e =>
|
|
2200
|
-
e.type === 'assistant' &&
|
|
2201
|
-
Array.isArray(e.message?.content) &&
|
|
2202
|
-
e.message.content.some(b => b.type === 'tool_use' && b.is_error)
|
|
2203
|
-
),
|
|
2204
|
-
{ label: 'Has Errors' })
|
|
2205
|
-
.filter('primary-model', ({ stats }) => stats.models[0] || 'unknown',
|
|
2206
|
-
{ label: 'Primary Model' });
|
|
2207
|
-
|
|
2208
|
-
// ── Backward-compatible default filter ──────────────────────────
|
|
2209
|
-
// app.dashboard.filter() still works — goes to the "default" view.
|
|
2210
|
-
// Default filters show below the view cards on /dashboard.
|
|
2211
|
-
app.dashboard.filter('model', ({ stats }) => stats.models[0] || 'unknown',
|
|
2212
|
-
{ label: 'Model' }
|
|
2213
|
-
);
|
|
2214
|
-
|
|
2215
|
-
app.listen();
|
|
2216
|
-
```
|
|
2217
|
-
|
|
2218
|
-
### Example: Eval Score Filters (cachedOnly)
|
|
2219
|
-
|
|
2220
|
-
Per-eval score filters on a `cachedOnly` dashboard view. When a view uses `cachedOnly: true`, filter functions receive `ctx.evalResults` containing cached eval results:
|
|
2221
|
-
|
|
2222
|
-
```js
|
|
2223
|
-
import { createApp } from 'claudeye';
|
|
2224
|
-
|
|
2225
|
-
const app = createApp();
|
|
2226
|
-
|
|
2227
|
-
// ── Register evals that produce scores ──────────────────────────
|
|
2228
|
-
app.eval('quality', ({ entries }) => {
|
|
2229
|
-
const assistantMsgs = entries.filter(e => e.type === 'assistant');
|
|
2230
|
-
const avgLength = assistantMsgs.reduce((sum, e) => {
|
|
2231
|
-
const content = typeof e.message?.content === 'string' ? e.message.content : '';
|
|
2232
|
-
return sum + content.length;
|
|
2233
|
-
}, 0) / (assistantMsgs.length || 1);
|
|
2234
|
-
const score = Math.min(avgLength / 500, 1);
|
|
2235
|
-
return { pass: score > 0.5, score };
|
|
2236
|
-
});
|
|
2237
|
-
|
|
2238
|
-
app.eval('speed', ({ stats }) => {
|
|
2239
|
-
const durationSec = parseFloat(stats.duration) || 60;
|
|
2240
|
-
const score = Math.max(0, 1 - durationSec / 120);
|
|
2241
|
-
return { pass: score > 0.3, score };
|
|
2242
|
-
});
|
|
2243
|
-
|
|
2244
|
-
// ── cachedOnly view with per-eval score filters ─────────────────
|
|
2245
|
-
app.dashboard.view('eval-results', { cachedOnly: true, label: 'Eval Score Filters' })
|
|
2246
|
-
.filter('quality-score',
|
|
2247
|
-
(ctx) => ctx.evalResults?.['quality']?.score ?? 0,
|
|
2248
|
-
{ label: 'Quality Score' })
|
|
2249
|
-
.filter('speed-score',
|
|
2250
|
-
(ctx) => ctx.evalResults?.['speed']?.score ?? 0,
|
|
2251
|
-
{ label: 'Speed Score' })
|
|
2252
|
-
.filter('all-passing', (ctx) => {
|
|
2253
|
-
if (!ctx.evalResults) return false;
|
|
2254
|
-
return Object.values(ctx.evalResults).every(r => r.pass);
|
|
2255
|
-
}, { label: 'All Evals Passing' });
|
|
2256
|
-
|
|
2257
|
-
app.listen();
|
|
2258
|
-
```
|
|
2259
|
-
|
|
2260
|
-
### Example: Actions
|
|
2261
|
-
|
|
2262
|
-
On-demand tasks triggered manually from the dashboard. Actions receive the full session context plus cached eval and enrichment results:
|
|
2263
|
-
|
|
2264
|
-
```js
|
|
2265
|
-
import { createApp } from 'claudeye';
|
|
2266
|
-
|
|
2267
|
-
const app = createApp();
|
|
2268
|
-
|
|
2269
|
-
// ── Evals (actions can read these results) ──────────────────────
|
|
2270
|
-
|
|
2271
|
-
app.eval('under-50-turns', ({ stats }) => ({
|
|
2272
|
-
pass: stats.turnCount <= 50,
|
|
2273
|
-
score: Math.max(0, 1 - stats.turnCount / 100),
|
|
2274
|
-
message: `${stats.turnCount} turn(s)`,
|
|
2275
|
-
}));
|
|
2276
|
-
|
|
2277
|
-
app.eval('has-completion', ({ entries }) => {
|
|
2278
|
-
const last = [...entries].reverse().find(e => e.type === 'assistant');
|
|
2279
|
-
const hasText = last?.message?.content?.some?.(b => b.type === 'text');
|
|
2280
|
-
return {
|
|
2281
|
-
pass: !!hasText,
|
|
2282
|
-
score: hasText ? 1.0 : 0,
|
|
2283
|
-
message: hasText ? 'Ended with text' : 'No final text response',
|
|
2284
|
-
};
|
|
2285
|
-
});
|
|
2286
|
-
|
|
2287
|
-
// ── Enrichments (actions can read these results too) ────────────
|
|
2288
|
-
|
|
2289
|
-
app.enrich('overview', ({ stats }) => ({
|
|
2290
|
-
'Turns': stats.turnCount,
|
|
2291
|
-
'Tool Calls': stats.toolCallCount,
|
|
2292
|
-
'Models': stats.models.join(', ') || 'none',
|
|
2293
|
-
}));
|
|
2294
|
-
|
|
2295
|
-
// ── Actions ─────────────────────────────────────────────────────
|
|
2296
|
-
|
|
2297
|
-
// Session summary: combines stats with eval pass counts
|
|
2298
|
-
app.action('session-summary', ({ stats, evalResults }) => {
|
|
2299
|
-
const evalNames = Object.keys(evalResults);
|
|
2300
|
-
const passCount = evalNames.filter(n => evalResults[n]?.pass).length;
|
|
2301
|
-
return {
|
|
2302
|
-
output: [
|
|
2303
|
-
`Session: ${stats.turnCount} turns, ${stats.toolCallCount} tool calls`,
|
|
2304
|
-
`Duration: ${stats.duration}`,
|
|
2305
|
-
`Models: ${stats.models.join(', ') || 'unknown'}`,
|
|
2306
|
-
`Evals: ${passCount}/${evalNames.length} passed`,
|
|
2307
|
-
].join('\n'),
|
|
2308
|
-
status: 'success',
|
|
2309
|
-
message: 'Summary generated',
|
|
2310
|
-
};
|
|
2311
|
-
});
|
|
2312
|
-
|
|
2313
|
-
// Export metrics: gathers enrichment data into a text report
|
|
2314
|
-
app.action('export-metrics', ({ stats, enrichmentResults }) => {
|
|
2315
|
-
const enrichData = {};
|
|
2316
|
-
for (const [name, result] of Object.entries(enrichmentResults)) {
|
|
2317
|
-
if (result.data) Object.assign(enrichData, result.data);
|
|
2318
|
-
}
|
|
2319
|
-
const lines = [
|
|
2320
|
-
...Object.entries(enrichData).map(([k, v]) => `${k}: ${v}`),
|
|
2321
|
-
`turnCount: ${stats.turnCount}`,
|
|
2322
|
-
`toolCallCount: ${stats.toolCallCount}`,
|
|
2323
|
-
];
|
|
2324
|
-
return {
|
|
2325
|
-
output: lines.join('\n'),
|
|
2326
|
-
status: 'success',
|
|
2327
|
-
message: `Exported ${Object.keys(enrichData).length + 2} metrics`,
|
|
2328
|
-
};
|
|
2329
|
-
});
|
|
2330
|
-
|
|
2331
|
-
// Tool inventory: lists unique tools used in the session
|
|
2332
|
-
app.action('tool-inventory', ({ entries }) => {
|
|
2333
|
-
const toolUses = entries.filter(e =>
|
|
2334
|
-
e.type === 'assistant' &&
|
|
2335
|
-
Array.isArray(e.message?.content) &&
|
|
2336
|
-
e.message.content.some(b => b.type === 'tool_use')
|
|
2337
|
-
);
|
|
2338
|
-
const toolNames = [...new Set(toolUses.flatMap(e =>
|
|
2339
|
-
(e.message?.content || []).filter(b => b.type === 'tool_use').map(b => b.name)
|
|
2340
|
-
))];
|
|
2341
|
-
return {
|
|
2342
|
-
output: toolNames.length > 0
|
|
2343
|
-
? `Tools used:\n${toolNames.map(t => ` - ${t}`).join('\n')}`
|
|
2344
|
-
: 'No tools used in this session',
|
|
2345
|
-
status: 'success',
|
|
2346
|
-
};
|
|
2347
|
-
}, { condition: ({ stats }) => stats.toolCallCount > 0 });
|
|
2348
|
-
|
|
2349
|
-
// Side-effect action: always re-runs (cache: false)
|
|
2350
|
-
app.action('write-report', async ({ projectName, sessionId, stats }) => {
|
|
2351
|
-
const fs = await import('fs/promises');
|
|
2352
|
-
await fs.appendFile('session-reports.jsonl', JSON.stringify({
|
|
2353
|
-
projectName, sessionId, turns: stats.turnCount,
|
|
2354
|
-
timestamp: new Date().toISOString(),
|
|
2355
|
-
}) + '\n');
|
|
2356
|
-
return { status: 'success', message: 'Report appended' };
|
|
2357
|
-
}, { cache: false });
|
|
2358
|
-
|
|
2359
|
-
app.listen();
|
|
2360
|
-
```
|
|
2361
|
-
|
|
2362
|
-
### Example: Alerts
|
|
2363
|
-
|
|
2364
|
-
Callbacks that fire after all evals and enrichments complete for a session:
|
|
2365
|
-
|
|
2366
|
-
```js
|
|
2367
|
-
import { createApp } from 'claudeye';
|
|
2368
|
-
|
|
2369
|
-
const app = createApp();
|
|
2370
|
-
|
|
2371
|
-
// ── Evals ───────────────────────────────────────────────────────
|
|
2372
|
-
|
|
2373
|
-
app.eval('under-50-turns', ({ stats }) => ({
|
|
2374
|
-
pass: stats.turnCount <= 50,
|
|
2375
|
-
score: Math.max(0, 1 - stats.turnCount / 100),
|
|
2376
|
-
message: `${stats.turnCount} turn(s)`,
|
|
2377
|
-
}));
|
|
2378
|
-
|
|
2379
|
-
app.eval('has-completion', ({ entries }) => {
|
|
2380
|
-
const last = [...entries].reverse().find(e => e.type === 'assistant');
|
|
2381
|
-
const hasText = last?.message?.content?.some?.(b => b.type === 'text');
|
|
2382
|
-
return {
|
|
2383
|
-
pass: !!hasText,
|
|
2384
|
-
score: hasText ? 1.0 : 0,
|
|
2385
|
-
message: hasText ? 'Ended with text' : 'No final text response',
|
|
2386
|
-
};
|
|
2387
|
-
});
|
|
2388
|
-
|
|
2389
|
-
// ── Enrichments ─────────────────────────────────────────────────
|
|
2390
|
-
|
|
2391
|
-
app.enrich('overview', ({ stats }) => ({
|
|
2392
|
-
'Turns': stats.turnCount,
|
|
2393
|
-
'Tool Calls': stats.toolCallCount,
|
|
2394
|
-
'Models': stats.models.join(', ') || 'none',
|
|
2395
|
-
}));
|
|
2396
|
-
|
|
2397
|
-
// ── Alerts ──────────────────────────────────────────────────────
|
|
2398
|
-
|
|
2399
|
-
// Console log: always fires, logs a summary line
|
|
2400
|
-
app.alert('log-results', ({ projectName, sessionId, evalSummary, enrichSummary }) => {
|
|
2401
|
-
const evals = evalSummary
|
|
2402
|
-
? `${evalSummary.passCount} pass, ${evalSummary.failCount} fail, ${evalSummary.errorCount} error`
|
|
2403
|
-
: 'no evals';
|
|
2404
|
-
const enrichments = enrichSummary
|
|
2405
|
-
? `${enrichSummary.results.length} enrichments`
|
|
2406
|
-
: 'no enrichments';
|
|
2407
|
-
console.log(`[ALERT] ${projectName}/${sessionId}: ${evals} | ${enrichments}`);
|
|
2408
|
-
});
|
|
2409
|
-
|
|
2410
|
-
// Failure alert: only logs when evals fail
|
|
2411
|
-
app.alert('warn-on-failure', ({ projectName, sessionId, evalSummary }) => {
|
|
2412
|
-
if (evalSummary && evalSummary.failCount > 0) {
|
|
2413
|
-
const failedNames = evalSummary.results
|
|
2414
|
-
.filter(r => !r.error && !r.skipped && !r.pass)
|
|
2415
|
-
.map(r => r.name);
|
|
2416
|
-
console.warn(
|
|
2417
|
-
`[FAILURE] ${projectName}/${sessionId}: ${failedNames.join(', ')} failed`
|
|
2418
|
-
);
|
|
2419
|
-
}
|
|
2420
|
-
});
|
|
2421
|
-
|
|
2422
|
-
// Slack webhook example (uncomment and replace the URL to enable):
|
|
2423
|
-
// app.alert('slack-on-failure', async ({ projectName, sessionId, evalSummary }) => {
|
|
2424
|
-
// if (evalSummary && evalSummary.failCount > 0) {
|
|
2425
|
-
// await fetch('https://hooks.slack.com/services/T.../B.../xxx', {
|
|
2426
|
-
// method: 'POST',
|
|
2427
|
-
// headers: { 'Content-Type': 'application/json' },
|
|
2428
|
-
// body: JSON.stringify({
|
|
2429
|
-
// text: `${evalSummary.failCount} evals failed for ${projectName}/${sessionId}`,
|
|
2430
|
-
// }),
|
|
2431
|
-
// });
|
|
2432
|
-
// }
|
|
2433
|
-
// });
|
|
2434
|
-
|
|
2435
|
-
app.listen();
|
|
2436
|
-
```
|
|
2437
|
-
|
|
2438
|
-
### Example: Minimal Filters Only
|
|
2439
|
-
|
|
2440
|
-
A minimal example — a single named view with filters, no evals or enrichments:
|
|
2441
|
-
|
|
2442
|
-
```js
|
|
2443
|
-
import { createApp } from 'claudeye';
|
|
2444
|
-
|
|
2445
|
-
const app = createApp();
|
|
2446
|
-
|
|
2447
|
-
app.dashboard.view('overview', { label: 'Session Overview' })
|
|
2448
|
-
.filter({ preBuilt: ['lastModified'] })
|
|
2449
|
-
.filter('model', ({ stats }) => stats.models[0] || 'unknown',
|
|
2450
|
-
{ label: 'Model' })
|
|
2451
|
-
.filter('turns', ({ stats }) => stats.turnCount,
|
|
2452
|
-
{ label: 'Turns' })
|
|
2453
|
-
.filter('used-tools', ({ stats }) => stats.toolCallCount > 0,
|
|
2454
|
-
{ label: 'Used Tools' });
|
|
2455
|
-
|
|
2456
|
-
app.listen();
|
|
2457
|
-
```
|
|
2458
|
-
|
|
2459
|
-
---
|
|
2460
|
-
|
|
2461
|
-
## Background Queue Processing
|
|
2462
|
-
|
|
2463
|
-
Enable background processing to automatically scan and evaluate all sessions on a timer:
|
|
2464
|
-
|
|
2465
|
-
```bash
|
|
2466
|
-
claudeye --evals ./my-evals.js --queue-interval 60
|
|
2467
|
-
```
|
|
2468
|
-
|
|
2469
|
-
Use `app.queueCondition()` to gate which sessions the background queue processes:
|
|
2470
|
-
|
|
2471
|
-
```js
|
|
2472
|
-
// Only process sessions with more than 5 entries
|
|
2473
|
-
app.queueCondition(({ entries }) => entries.length > 5);
|
|
2474
|
-
```
|
|
2475
|
-
|
|
2476
|
-
The condition receives the full `EvalContext` and returns a boolean. If false, the session is skipped entirely. Results are cached per-session and auto-invalidate when the session file or condition function changes.
|
|
2477
|
-
|
|
2478
|
-
### How the Queue Works
|
|
2479
|
-
|
|
2480
|
-
The queue is **unified** — every individual eval and enrichment (session-scoped, subagent-scoped, UI-triggered, or background-scanned) passes through a single priority queue with bounded concurrency.
|
|
2481
|
-
|
|
2482
|
-
1. **Foreground (always active):** When a session page loads or a re-run is triggered, each uncached eval/enrichment is enqueued at HIGH priority via `queuePerItem()` and processed immediately (up to the concurrency limit)
|
|
2483
|
-
2. **Background (opt-in):** When `CLAUDEYE_QUEUE_INTERVAL` is set, a timer scans all projects for uncached sessions and enqueues individual uncached evals/enrichments at LOW priority
|
|
2484
|
-
3. **Priority:** HIGH (foreground/UI) items are always processed before LOW (background) items
|
|
2485
|
-
4. **Dedup:** If the same item is enqueued twice, the existing entry is upgraded to the higher priority
|
|
2486
|
-
5. **Subagent support:** Subagent evals/enrichments go through the same queue. The session ID is encoded as `sessionId/agent-agentId` for tracking purposes
|
|
2487
|
-
6. **Alerts:** After each successful item completes, the queue checks if any pending/processing work remains for that session. When no work remains, alerts fire once per content version
|
|
2488
|
-
|
|
2489
|
-
### Queue Status UI
|
|
2490
|
-
|
|
2491
|
-
The queue status is visible in two places:
|
|
2492
|
-
|
|
2493
|
-
**Navbar dropdown** — shows current processing items (max 7) and pending items with priority badges. Badge count = pending + processing.
|
|
2494
|
-
|
|
2495
|
-
**`/queue` details page** — three tabs:
|
|
2496
|
-
- **In Queue** — pending items with type badge, item name, session link, priority, queued time
|
|
2497
|
-
- **Processing** — active items with spinner, type badge, item name, session link, started time
|
|
2498
|
-
- **Processed** — completed items with type badge, item name, session link, duration, success/fail icon, completed time. Data is loaded from disk via paginated JSONL files (25 entries per page), so history survives process restarts
|
|
2499
|
-
|
|
2500
|
-
**Dashboard panel** — collapsible panel showing queue state with processing and pending tables, background processor indicator, and error list.
|
|
2501
|
-
|
|
2502
|
-
All views auto-refresh and self-hide when there's no queue activity.
|
|
2503
|
-
|
|
2504
|
-
### Environment Variables
|
|
2505
|
-
|
|
2506
|
-
| Variable | Description | Default |
|
|
2507
|
-
|----------|-------------|---------|
|
|
2508
|
-
| `CLAUDEYE_QUEUE_INTERVAL` | Background scan interval in seconds | disabled |
|
|
2509
|
-
| `CLAUDEYE_QUEUE_CONCURRENCY` | Max parallel items per batch | `2` |
|
|
2510
|
-
| `CLAUDEYE_QUEUE_HISTORY_TTL` | Seconds to keep completed items | `3600` |
|
|
2511
|
-
| `CLAUDEYE_QUEUE_MAX_ITEMS` | Max items to enqueue per scan (0=unlimited) | `500` |
|
|
2512
|
-
| `CLAUDEYE_LOG_LEVEL` | Log verbosity for both dashboard server and hook processes: `info`, `warn`, `error` | `warn` |
|
|
2513
|
-
| `CLAUDEYE_HOOK_LOG_FILE` | Enable hook file logging: `1` or `true` for default dir (`~/.claudeye/logs/`), or an absolute path for a custom directory | disabled |
|
|
2514
|
-
| `CLAUDEYE_DISABLE_PAGES` | Comma-separated pages to hide from nav and block direct access: `policies`, `dashboard`, `projects`. The first non-disabled page (policies → projects → dashboard) becomes the root `/` landing page. | unset |
|
|
2515
|
-
|
|
2516
|
-
At `info` level, all log lines (including `ACTIVITY` lines for user actions) are emitted. At `warn` (default), only warnings and errors appear. At `error`, only errors are shown.
|
|
2517
|
-
|
|
2518
|
-
**Hook logging:** `CLAUDEYE_LOG_LEVEL` controls the verbosity of hook stderr output (event type, policy count, evaluation result at `info`; failures at `warn`). When `CLAUDEYE_HOOK_LOG_FILE` is additionally set, hooks write to persistent log files with automatic size-based rotation at 500 KB. See the [Hook Logging](#hook-logging) section for details and examples.
|
|
2519
|
-
|
|
2520
|
-
---
|
|
2521
|
-
|
|
2522
|
-
## Caching
|
|
2523
|
-
|
|
2524
|
-
Caching is **always on**. Results are cached to `~/.claudeye/cache/` and automatically invalidated when session logs or eval definitions change. Click **Re-run** in the dashboard to bypass the cache.
|
|
2525
|
-
|
|
2526
|
-
```bash
|
|
2527
|
-
claudeye --cache-path /tmp/cc # Custom cache location
|
|
2528
|
-
claudeye --cache-clear # Clear cache and exit
|
|
2529
|
-
```
|
|
2530
|
-
|
|
2531
|
-
---
|
|
2532
|
-
|
|
2533
|
-
## Authentication
|
|
2534
|
-
|
|
2535
|
-
Claudeye ships with **opt-in** username/password auth. When no users are configured, everything works exactly as before — no login page, no blocking.
|
|
2536
|
-
|
|
2537
|
-
### Enable via CLI
|
|
2538
|
-
|
|
2539
|
-
```bash
|
|
2540
|
-
# Single user
|
|
2541
|
-
claudeye --auth-user admin:secret
|
|
2542
|
-
|
|
2543
|
-
# Multiple users
|
|
2544
|
-
claudeye --auth-user admin:secret --auth-user viewer:readonly
|
|
2545
|
-
```
|
|
2546
|
-
|
|
2547
|
-
### Enable via environment variable
|
|
2548
|
-
|
|
2549
|
-
```bash
|
|
2550
|
-
CLAUDEYE_AUTH_USERS=admin:secret claudeye
|
|
2551
|
-
CLAUDEYE_AUTH_USERS=admin:secret,viewer:readonly claudeye
|
|
2552
|
-
```
|
|
2553
|
-
|
|
2554
|
-
### Enable via the programmatic API
|
|
2555
|
-
|
|
2556
|
-
```js
|
|
2557
|
-
import { createApp } from 'claudeye';
|
|
2558
|
-
|
|
2559
|
-
const app = createApp();
|
|
2560
|
-
|
|
2561
|
-
app.auth({ users: [
|
|
2562
|
-
{ username: 'admin', password: 'secret' },
|
|
2563
|
-
{ username: 'viewer', password: 'readonly' },
|
|
2564
|
-
] });
|
|
2565
|
-
|
|
2566
|
-
app.listen();
|
|
2567
|
-
```
|
|
2568
|
-
|
|
2569
|
-
All three methods can be combined — users from CLI flags, the env var, and `app.auth()` are merged together.
|
|
2570
|
-
|
|
2571
|
-
When auth is active, all UI routes redirect to `/login`. After signing in, a signed session cookie (24h expiry) grants access. A **Sign out** button appears in the navbar.
|
|
2572
|
-
|
|
2573
|
-
---
|
|
2574
|
-
|
|
2575
|
-
## Deployment with PM2
|
|
2576
|
-
|
|
2577
|
-
For production deployments, use PM2 with Bun as the interpreter:
|
|
2578
|
-
|
|
2579
|
-
```js
|
|
2580
|
-
// ecosystem.config.cjs
|
|
2581
|
-
module.exports = {
|
|
2582
|
-
apps: [{
|
|
2583
|
-
name: 'claudeye',
|
|
2584
|
-
script: 'node_modules/.bin/next',
|
|
2585
|
-
args: 'start',
|
|
2586
|
-
interpreter: 'bun',
|
|
2587
|
-
cwd: '/path/to/claudeye',
|
|
2588
|
-
env: {
|
|
2589
|
-
PORT: 8020,
|
|
2590
|
-
HOSTNAME: '0.0.0.0',
|
|
2591
|
-
CLAUDE_PROJECTS_PATH: '/home/user/.claude/projects',
|
|
2592
|
-
CLAUDEYE_EVALS_MODULE: './my-evals.js',
|
|
2593
|
-
CLAUDEYE_QUEUE_INTERVAL: '60',
|
|
2594
|
-
},
|
|
2595
|
-
}],
|
|
2596
|
-
};
|
|
2597
|
-
```
|
|
2598
|
-
|
|
2599
|
-
```bash
|
|
2600
|
-
# Start
|
|
2601
|
-
pm2 start ecosystem.config.cjs
|
|
2602
|
-
|
|
2603
|
-
# Monitor
|
|
2604
|
-
pm2 monit
|
|
2605
|
-
|
|
2606
|
-
# Auto-restart on reboot
|
|
2607
|
-
pm2 startup
|
|
2608
|
-
pm2 save
|
|
2609
|
-
```
|
|
2610
|
-
|
|
2611
|
-
---
|
|
2612
|
-
|
|
2613
|
-
## Telemetry
|
|
2614
|
-
|
|
2615
|
-
Claudeye collects anonymous, non-PII usage analytics (e.g. `app_started`, `queue_scan_completed`) to understand feature adoption. Events are keyed by a random instance UUID — no project names, session IDs, eval names, or log content are ever sent.
|
|
2616
|
-
|
|
2617
|
-
**Opt out:**
|
|
2618
|
-
|
|
2619
|
-
```bash
|
|
2620
|
-
# CLI flag
|
|
2621
|
-
claudeye --disable-telemetry
|
|
2622
|
-
|
|
2623
|
-
# Or environment variable
|
|
2624
|
-
CLAUDEYE_TELEMETRY_DISABLED=1 claudeye
|
|
2625
|
-
```
|
|
2626
|
-
|
|
2627
|
-
When disabled, all telemetry code is zero-cost no-op (no network requests, no dynamic imports).
|
|
2628
|
-
|
|
2629
|
-
---
|
|
2630
|
-
|
|
2631
|
-
## How It Works
|
|
2632
|
-
|
|
2633
|
-
1. `createApp()` + `app.eval()` / `app.enrich()` / `app.action()` / `app.alert()` / `app.condition()` / `app.queueCondition()` / `app.dashboard.view()` / `app.dashboard.filter()` / `app.dashboard.aggregate()` register functions in global registries
|
|
2634
|
-
2. When you run `claudeye --evals ./my-file.js`, the server dynamically imports your file, populating the registries
|
|
2635
|
-
3. All eval/enrichment execution routes through a unified priority queue. Each individual eval and enrichment is a separate queue item. UI requests use HIGH priority; background scanning uses LOW priority
|
|
2636
|
-
4. Each item runs through: cache check → execute if uncached → cache result → check if session complete → fire alerts if complete
|
|
2637
|
-
5. The global condition is checked first. If it fails, everything is skipped
|
|
2638
|
-
6. Per-item conditions are checked individually. Skipped items don't block others
|
|
2639
|
-
7. Each function is individually error-isolated. If one throws, the others still run
|
|
2640
|
-
8. After all evals and enrichments complete, registered alerts fire with the complete `AlertContext` (eval summary + enrichment summary)
|
|
2641
|
-
9. Results are serialized and displayed in separate panels in the dashboard UI
|
|
2642
|
-
10. Named dashboard views (`/dashboard`) show a view index; each view (`/dashboard/[viewName]`) computes filter values incrementally (only new/changed sessions are processed), then filters and paginates server-side for efficiency
|
|
2643
|
-
11. Dashboard aggregates run a separate server action that collects per-session values (with eval/enrichment/filter results) and reduces them via user-defined reduce functions into sortable summary tables
|
|
2644
|
-
12. When `CLAUDEYE_QUEUE_INTERVAL` is set, a background processor scans for uncached items on a timer. Track queue state at `/queue` or via the navbar dropdown
|
|
2645
|
-
|
|
2646
|
-
---
|
|
2647
|
-
|
|
2648
|
-
## Community
|
|
2649
|
-
|
|
2650
|
-
- [Website & Docs](https://claudeye.exosphere.host) - documentation, guides, and examples
|
|
2651
|
-
- [Discord](https://discord.com/invite/zT92CAgvkj) - get help and connect with other developers
|
|
2652
|
-
- [Issues](https://github.com/exospherehost/claudeye/issues) - bug reports and feature requests
|
|
2653
|
-
|
|
2654
|
-
## License
|
|
2655
|
-
|
|
2656
|
-
MIT + Commons Clause. See [LICENSE](./LICENSE).
|
|
30
|
+
**[github.com/exospherehost/failproofai](https://github.com/exospherehost/failproofai)** | **[befailproof.ai](https://befailproof.ai)**
|