npm - claudeye - Versions diffs - 0.2.2 → 0.3.0 - Mend

Potentially problematic release.

This version of claudeye might be problematic.

Files changed (191)

package/README.md CHANGED

@@ -23,11 +23,16 @@ This starts a local server on port 8020 and opens the dashboard in your browser.
 ### Options
 ```
---projects-path <path>  Path to Claude projects directory (default: ~/.claude/projects)
---port <number>         Preferred port (default: 8020)
---evals <path>          Path to an evals file (see Custom Evals below)
---no-open               Don't auto-open the browser
--h, --help              Show help
+--projects-path, -p <path>  Path to Claude projects directory (default: ~/.claude/projects)
+--port <number>             Preferred port (default: 8020)
+--host <address>            Host to bind to (default: localhost)
+                            Use 0.0.0.0 for LAN access
+--evals <path>              Path to an evals/enrichments file (see Custom Evals below)
+--cache <on|off>            Enable/disable result caching (default: on)
+--cache-path <path>         Custom cache directory (default: ~/.claudeye/cache)
+--cache-clear               Clear all cached results and exit
+--no-open                   Don't auto-open the browser
+-h, --help                  Show help
 ```
 ### Examples
@@ -38,6 +43,21 @@ claudeye --projects-path /path/to/projects
 # Run on a different port without opening the browser
 claudeye --port 3000 --no-open
+# Bind to all interfaces for LAN access
+claudeye --host 0.0.0.0
+# Load custom evals and enrichments
+claudeye --evals ./my-evals.js
+# Disable caching
+claudeye --cache off
+# Use a custom cache directory
+claudeye --cache-path /tmp/claudeye-cache
+# Clear cached results
+claudeye --cache-clear
 ```
 ## Features
@@ -60,6 +80,7 @@ Open a session to get a full breakdown of the agent execution:
 - **Subagent logs** — nested subagent executions load on demand
 - **Thinking blocks** — collapsible reasoning traces
 - **Virtual scrolling** — handles large logs without performance issues
+- **Download** — export session logs as JSONL files
 ### Filtering
@@ -67,10 +88,73 @@ Open a session to get a full breakdown of the agent execution:
 - **Keyword search**: add multiple keywords as chips, AND logic, case-insensitive
 - Filters combine — both time and keyword conditions must match
+### Auto-Refresh
+The navbar includes a refresh button with an auto-refresh dropdown (Off, 5s, 10s, 30s). It works globally across all pages — projects, sessions, and log views all refresh when triggered.
 ### Theme Support
 Light and dark mode with system preference detection. Toggle from the navbar. Preference is saved to `localStorage`.
+---
+## Caching
+Claudeye caches eval and enrichment results to avoid recomputing them on every page load. Caching is **on by default** and requires zero configuration.
+### How It Works
+- Results are cached to `~/.claudeye/cache/` as JSON files, keyed by project and session
+- Cache invalidation is **content-hash-based**, not TTL-based:
+  - The session file's modification time and size are hashed (fast, no full file read)
+  - The evals module file content is hashed
+  - The set of registered eval/enricher names is tracked
+- A cache hit is only valid when **all three** match — if the session log grows, the evals file changes, or evals are added/removed, the cache is automatically invalidated
+- Clicking **Re-run** in the dashboard always bypasses the cache and runs fresh computation
+- Cached results show a subtle "cached" badge next to the duration in the results panel
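The three-part validity check described in the added "How It Works" bullets can be sketched as follows. This is an illustration only, not Claudeye's implementation: the function names, the key layout, and the choice of SHA-256 are all assumptions. The session fingerprint is built from stat metadata alone (modification time and size), matching the README's "no full file read" note.

```js
const { createHash } = require('node:crypto');

// SHA-256 hex digest of a string (the hash algorithm is an assumption).
const sha256 = (text) => createHash('sha256').update(text).digest('hex');

// Fingerprint a session file from stat metadata only; the file is never read.
// `stat` mirrors the relevant fields of fs.statSync(): { mtimeMs, size }.
function sessionFingerprint(stat) {
  return sha256(`${stat.mtimeMs}:${stat.size}`);
}

// Build the three-part key: session fingerprint, evals module hash, name set.
function buildCacheKey({ sessionStat, evalsSource, registeredNames }) {
  return {
    session: sessionFingerprint(sessionStat),
    evalsModule: sha256(evalsSource),
    names: [...registeredNames].sort().join(','), // sorted, so order is irrelevant
  };
}

// A cached entry is reusable only when all three parts match.
function isCacheHit(stored, current) {
  return (
    stored.session === current.session &&
    stored.evalsModule === current.evalsModule &&
    stored.names === current.names
  );
}

const before = buildCacheKey({
  sessionStat: { mtimeMs: 1700000000000, size: 2048 },
  evalsSource: "app.eval('no-errors', fn);",
  registeredNames: ['no-errors', 'overview'],
});
const afterGrowth = buildCacheKey({
  sessionStat: { mtimeMs: 1700000005000, size: 4096 }, // the session log grew
  evalsSource: "app.eval('no-errors', fn);",
  registeredNames: ['overview', 'no-errors'],
});

console.log(isCacheHit(before, before));      // true
console.log(isCacheHit(before, afterGrowth)); // false
```

Note that a fingerprint built from mtime and size is cheaper than hashing the session file's content, at the cost of missing an edit that preserves both values.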
+### Running Without Cache
+```bash
+# Disable caching entirely
+claudeye --cache off
+# Or during development
+npm run dev -- --cache off
+```
+When caching is disabled, no cache files are created or read. The startup log will show `Cache: disabled`.
+### Custom Cache Location
+```bash
+claudeye --cache-path /path/to/custom/cache
+```
+### Clearing the Cache
+```bash
+# Clear the default cache directory and exit
+claudeye --cache-clear
+# Clear a custom cache directory
+claudeye --cache-clear --cache-path /path/to/custom/cache
+```
+### Startup Log
+When the server starts, the cache status is shown alongside other configuration:
+```
+Starting Claudeye dashboard...
+  Projects: /home/user/.claude/projects
+  Evals:    /home/user/my-evals.js
+  Cache:    local (/home/user/.claudeye/cache/)
+  URL:      http://localhost:8020
+```
+---
 ## Custom Evals & Enrichments
 Claudeye supports two types of custom extensions you can define in a single JS/TS file and load with the `--evals` flag:
@@ -165,107 +249,510 @@ node my-evals.js
 This spawns the Claudeye dashboard as a child process. When the same file is loaded via `--evals`, `listen()` automatically becomes a no-op so you won't get a duplicate server.
-### API Reference
+---
+## Conditional Evals & Enrichments
+Conditions let you control when evals and enrichments run. There are two levels:
+1. **Global condition** — gates _all_ evals and enrichments at once
+2. **Per-eval / per-enrichment conditions** — gate individual functions
-#### `createApp()`
+### Global Condition
-Returns a `ClaudeyeApp` instance.
+Use `app.condition()` to set a global condition. If it returns `false` (or throws), every registered eval and enrichment is skipped.
+```js
+import { createApp } from 'claudeye';
+const app = createApp();
-#### `app.eval(name, fn)`
+// Only run evals/enrichments for sessions with actual content
+app.condition(({ entries }) => entries.length > 0);
-Register an eval function. Chainable.
+// Only run for non-test projects
+app.condition(({ projectName }) => !projectName.includes('test'));
+```
+The global condition receives the same `EvalContext` as evals and enrichments, so you can inspect entries, stats, project name, and session ID.
+> **Note:** Calling `app.condition()` multiple times replaces the previous condition — only the last one is active.
+### Per-Eval Conditions
+Pass a `condition` function in the options object (third argument) to `app.eval()`:
+```js
+app.eval('budget-check',
+  ({ stats }) => ({
+    pass: stats.toolCallCount <= 50,
+    score: 1 - Math.min(stats.toolCallCount / 100, 1),
+    message: `${stats.toolCallCount} tool call(s)`,
+  }),
+  {
+    condition: ({ stats }) => stats.toolCallCount > 0,
+  }
+);
+```
+If the condition returns `false`, the eval is marked as **skipped** in the results panel. If the condition throws, the eval is marked as **errored** with the message `Condition error: <message>`.
+### Per-Enrichment Conditions
+Same pattern — pass a `condition` in the options to `app.enrich()`:
+```js
+app.enrich('error-analysis',
+  ({ entries }) => {
+    const errors = entries.filter(e => e.raw?.is_error === true);
+    return {
+      'Total Errors': errors.length,
+      'Error Rate': `${((errors.length / entries.length) * 100).toFixed(1)}%`,
+    };
+  },
+  {
+    condition: ({ entries }) => entries.some(e => e.raw?.is_error === true),
+  }
+);
+```
+### Evaluation Order
+When a session is loaded, conditions are evaluated in this order:
+```
+1. Global condition checked
+   ├─ Returns false or throws → ALL evals/enrichments marked "skipped"
+   └─ Returns true → proceed to step 2
+2. For each eval/enrichment:
+   ├─ Has per-item condition?
+   │   ├─ Returns false → that item marked "skipped"
+   │   ├─ Throws → that item marked "errored" (not skipped)
+   │   └─ Returns true → run the function
+   └─ No condition → run the function
+3. Function executes
+   ├─ Returns result → recorded normally
+   └─ Throws → marked "errored", other items still run
+```
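The evaluation order in the added diagram can be expressed as a small runner. The sketch below is a model of the documented behavior, not the package's code: `runItem`, `runAll`, the status strings, and the result shape are hypothetical, and running items in parallel with `Promise.all` is an assumption (the README does not say whether items run concurrently).

```js
// Runs one registered item (eval or enrichment) through its optional condition.
async function runItem(item, ctx) {
  if (item.condition) {
    let ok;
    try {
      ok = await item.condition(ctx); // conditions may be async
    } catch (err) {
      // A throwing per-item condition is an error, not a skip.
      return { name: item.name, status: 'errored', message: `Condition error: ${err.message}` };
    }
    if (!ok) return { name: item.name, status: 'skipped' };
  }
  try {
    return { name: item.name, status: 'ok', result: await item.fn(ctx) };
  } catch (err) {
    // Error isolation: one failing item does not stop the others.
    return { name: item.name, status: 'errored', message: err.message };
  }
}

// Checks the global condition first, then each item independently.
async function runAll(globalCondition, items, ctx) {
  if (globalCondition) {
    let pass = false;
    try {
      pass = await globalCondition(ctx);
    } catch {
      pass = false; // a throwing global condition skips everything
    }
    if (!pass) return items.map((item) => ({ name: item.name, status: 'skipped' }));
  }
  return Promise.all(items.map((item) => runItem(item, ctx)));
}

// Usage with a hand-built context, trimmed to the fields the sketch needs.
const ctx = { entries: [{ type: 'user' }], stats: { toolCallCount: 0 } };
const items = [
  { name: 'always', fn: () => ({ pass: true }) },
  { name: 'needs-tools', fn: () => ({ pass: true }), condition: ({ stats }) => stats.toolCallCount > 0 },
  { name: 'bad-condition', fn: () => ({ pass: true }), condition: () => { throw new Error('boom'); } },
];

const demo = runAll(({ entries }) => entries.length > 0, items, ctx);
demo.then((results) => console.log(results.map((r) => `${r.name}: ${r.status}`)));
// logs [ 'always: ok', 'needs-tools: skipped', 'bad-condition: errored' ]
```

The asymmetry is the point worth remembering: a throwing global condition skips everything, while a throwing per-item condition surfaces as an error on that item.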
+### Condition Function Signature
+Both global and per-item conditions use the same type:
+```ts
+type ConditionFunction = (context: EvalContext) => boolean | Promise<boolean>;
+```
+Conditions can be async — the runner awaits them.
+### Combining Global and Per-Item Conditions
+Global and per-item conditions stack. The global condition runs first; if it passes, per-item conditions are checked individually:
+```js
+const app = createApp();
+// Global: skip everything for empty sessions
+app.condition(({ entries }) => entries.length > 0);
+// Per-eval: only check turn count for sessions with tool calls
+app.eval('efficient-tools',
+  ({ stats }) => ({
+    pass: stats.toolCallCount <= stats.turnCount * 2,
+    score: Math.max(0, 1 - (stats.toolCallCount / (stats.turnCount * 4))),
+  }),
+  { condition: ({ stats }) => stats.toolCallCount > 0 }
+);
+// Per-enrichment: only compute cost for sessions that used a model
+app.enrich('cost',
+  ({ entries }) => {
+    const tokens = entries.reduce((s, e) => s + (e.raw?.usage?.total_tokens || 0), 0);
+    return { 'Total Tokens': tokens, 'Est. Cost': `$${(tokens * 0.00001).toFixed(4)}` };
+  },
+  { condition: ({ stats }) => stats.models.length > 0 }
+);
+```
+### UI Behavior
+In the dashboard, conditional results appear as follows:
+| Status | Evals Panel | Enrichments Panel |
+|--------|-------------|-------------------|
+| **Skipped** | Grayed-out row with "skipped" label | Grayed-out row with "skipped" label |
+| **Condition error** | Row with warning icon and error message | Row with warning icon and error message |
+| **Passed / Data** | Green check with score bar | Key-value pairs grouped by enricher |
+| **Failed** | Red X with score bar | N/A |
+Skipped items are counted separately in the summary bar (e.g. "2 passed, 1 skipped").
+### Full Example with Conditions
+```js
+import { createApp } from 'claudeye';
+const app = createApp();
+// ── Global condition: require at least one user message ──
+app.condition(({ entries }) =>
+  entries.some(e => e.type === 'user')
+);
+// ── Evals ──
+// Always runs (when global condition passes)
+app.eval('has-completion', ({ entries }) => {
+  const lastAssistant = [...entries].reverse().find(e => e.type === 'assistant');
+  const hasText = lastAssistant?.message?.content?.some?.(b => b.type === 'text');
+  return {
+    pass: !!hasText,
+    score: hasText ? 1.0 : 0,
+    message: hasText ? 'Session completed with text response' : 'No final text response',
+  };
+});
+// Only runs for sessions with tool calls
+app.eval('tool-success-rate',
+  ({ entries }) => {
+    const toolResults = entries.filter(e => e.type === 'tool');
+    const errors = toolResults.filter(e => e.raw?.is_error === true);
+    const rate = toolResults.length > 0 ? 1 - (errors.length / toolResults.length) : 1;
+    return {
+      pass: rate >= 0.9,
+      score: rate,
+      message: `${errors.length}/${toolResults.length} tool errors`,
+    };
+  },
+  { condition: ({ stats }) => stats.toolCallCount > 0 }
+);
+// Only runs for longer sessions
+app.eval('under-budget',
+  ({ stats }) => ({
+    pass: stats.turnCount <= 30,
+    score: Math.max(0, 1 - stats.turnCount / 60),
+    message: `${stats.turnCount} turns`,
+  }),
+  { condition: ({ stats }) => stats.turnCount >= 5 }
+);
+// ── Enrichments ──
+// Always runs
+app.enrich('overview', ({ stats }) => ({
+  'Turns': stats.turnCount,
+  'Tool Calls': stats.toolCallCount,
+  'Duration': stats.duration,
+  'Models': stats.models.join(', ') || 'none',
+}));
+// Only when subagents were spawned
+app.enrich('subagent-info',
+  ({ entries, stats }) => {
+    const subagentEntries = entries.filter(e => e.type === 'assistant' && e.parentUuid);
+    return {
+      'Subagent Count': stats.subagentCount,
+      'Subagent Messages': subagentEntries.length,
+    };
+  },
+  { condition: ({ stats }) => stats.subagentCount > 0 }
+);
+// Async condition example
+app.enrich('advanced-metrics',
+  ({ entries }) => ({
+    'Entry Count': entries.length,
+    'Avg Entry Size': Math.round(
+      entries.reduce((s, e) => s + JSON.stringify(e).length, 0) / entries.length
+    ),
+  }),
+  {
+    condition: async ({ entries }) => {
+      // Conditions can be async
+      return entries.length > 10;
+    },
+  }
+);
+```
+---
+## Subagent-Scoped Evals & Enrichments
+By default, evals and enrichments run at the session level. You can also register evals and enrichments that run against individual subagent logs using the `scope` option.
+### Scope Options
+The `scope` option controls when an eval or enrichment runs:
+| Scope | Runs at session level | Runs at subagent level |
+|-------|----------------------|----------------------|
+| `'session'` (default) | Yes | No |
+| `'subagent'` | No | Yes |
+| `'both'` | Yes | Yes |
+### Basic Usage
+```js
+import { createApp } from 'claudeye';
+const app = createApp();
+// Session-only (default, backward-compatible)
+app.eval('no-errors', fn);
+// Subagent-only, runs for all subagent types
+app.eval('agent-quality', fn, { scope: 'subagent' });
+// Subagent-only, only runs for Explore subagents
+app.eval('explore-depth', fn, { scope: 'subagent', subagentType: 'Explore' });
+// Runs at both session and subagent level
+app.eval('general-check', fn, { scope: 'both' });
+// Enrichments work the same way
+app.enrich('agent-cost', fn, { scope: 'subagent' });
+app.enrich('agent-tokens', fn, { scope: 'subagent', subagentType: 'Explore' });
+```
+### Subagent Context
+When running at subagent level, the `EvalContext` includes additional metadata:
+```js
+app.eval('adaptive-eval', (ctx) => {
+  if (ctx.scope === 'subagent') {
+    // Running against a subagent's log
+    console.log(ctx.subagentId);          // e.g. 'a1b2c3'
+    console.log(ctx.subagentType);        // e.g. 'Explore'
+    console.log(ctx.subagentDescription); // e.g. 'Search for auth code'
+    console.log(ctx.parentSessionId);     // parent session ID
+  }
+  return { pass: true };
+}, { scope: 'both' });
+```
+### SubagentType Filtering
+When you specify `subagentType`, the eval/enrichment only runs for subagents of that type. For subagents of other types it is neither run nor shown, and if nothing else matches, the panel is not rendered at all.
+```js
+// Only runs for Explore subagents
+app.eval('explore-thoroughness', ({ entries }) => ({
+  pass: entries.length > 5,
+  score: Math.min(entries.length / 20, 1),
+}), { scope: 'subagent', subagentType: 'Explore' });
+// Runs for all subagent types
+app.eval('agent-efficiency', ({ stats }) => ({
+  pass: stats.turnCount <= 10,
+  score: Math.max(0, 1 - stats.turnCount / 20),
+}), { scope: 'subagent' });
+```
+### UI Behavior
+When a subagent is expanded in the log viewer, eval and enrichment panels appear below the stats bar (only if matching subagent-scoped evals/enrichments are registered). Panels use a compact layout to fit within the nested subagent view. If no subagent-scoped evals match the subagent's type, the panels are not rendered.
+### Caching
+Subagent eval/enrichment results are cached separately from session results:
+```
+~/.claudeye/cache/evals/{project}/{sessionId}.json              # session-level
+~/.claudeye/cache/evals/{project}/{sessionId}/agent-{id}.json   # subagent-level
+```
+Cache invalidation works the same way — based on the subagent log file's mtime+size and the evals module content hash.
+### Edge Cases
+- **No subagents in session**: Subagent-scoped evals never run — panels only mount inside expanded subagent cards
+- **`scope: 'both'` with `subagentType`**: At session level, the `subagentType` filter is ignored. At subagent level, it applies.
+- **Conditions + scope**: Scope filtering happens first (registry level), then conditions run with the full `EvalContext` including scope metadata
+- **Backward compatibility**: Existing evals with no `scope` option default to `'session'` — behavior is unchanged
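The scope table and the edge cases above reduce to one predicate. The following is a minimal sketch under stated assumptions: `appliesTo` and the `run` descriptor are hypothetical names, and only the `scope` and `subagentType` option names come from the README.

```js
// Decides whether a registered item applies to the current run.
// `options` mirrors the documented { scope, subagentType } registration options;
// `run` describes what is being evaluated: { scope, subagentType? }.
function appliesTo(options = {}, run) {
  const scope = options.scope ?? 'session'; // documented default
  if (run.scope === 'session') {
    // The subagentType filter is ignored at session level.
    return scope === 'session' || scope === 'both';
  }
  // Subagent run: the scope must allow it, and the type filter (if any) must match.
  if (scope !== 'subagent' && scope !== 'both') return false;
  return options.subagentType === undefined || options.subagentType === run.subagentType;
}

const sessionRun = { scope: 'session' };
const exploreRun = { scope: 'subagent', subagentType: 'Explore' };
const bashRun = { scope: 'subagent', subagentType: 'Bash' };

console.log(appliesTo({}, sessionRun));                                          // true
console.log(appliesTo({}, exploreRun));                                          // false
console.log(appliesTo({ scope: 'subagent', subagentType: 'Explore' }, bashRun)); // false
console.log(appliesTo({ scope: 'both', subagentType: 'Explore' }, sessionRun));  // true
console.log(appliesTo({ scope: 'both', subagentType: 'Explore' }, bashRun));     // false
```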
+---
+## API Reference
+### `createApp()`
+Returns a `ClaudeyeApp` instance. All methods are chainable.
+```ts
+import { createApp } from 'claudeye';
+const app = createApp();
+```
+### `app.condition(fn)`
+Set a global condition that gates all evals and enrichments. Calling this multiple times replaces the previous condition.
+```ts
+// fn: (context: EvalContext) => boolean | Promise<boolean>
+app.condition(({ entries, stats, projectName, sessionId }) => entries.length > 0);
+```
+### `app.eval(name, fn, options?)`
+Register an eval function.
 - **`name`** — unique string identifier for the eval
 - **`fn`** — function receiving an `EvalContext` and returning an `EvalResult`
+- **`options.condition`** — optional condition function to gate this eval
+- **`options.scope`** — `'session'` (default), `'subagent'`, or `'both'`
+- **`options.subagentType`** — only run for subagents of this type (e.g. `'Explore'`)
 ```ts
 app.eval('my-eval', (ctx) => ({ pass: true, score: 1.0 }));
+app.eval('conditional-eval', evalFn, {
+  condition: (ctx) => ctx.stats.turnCount > 0,
+});
+app.eval('subagent-eval', evalFn, {
+  scope: 'subagent',
+  subagentType: 'Explore',
+});
 ```
-#### `app.enrich(name, fn)`
+### `app.enrich(name, fn, options?)`
-Register an enricher function. Chainable.
+Register an enricher function.
 - **`name`** — unique string identifier for the enricher
 - **`fn`** — function receiving an `EvalContext` and returning a `Record<string, string | number | boolean>`
+- **`options.condition`** — optional condition function to gate this enricher
+- **`options.scope`** — `'session'` (default), `'subagent'`, or `'both'`
+- **`options.subagentType`** — only run for subagents of this type (e.g. `'Explore'`)
 ```ts
 app.enrich('metrics', (ctx) => ({
   'Turn Count': ctx.stats.turnCount,
   'Models Used': ctx.stats.models.join(', '),
 }));
+app.enrich('conditional-enricher', enrichFn, {
+  condition: (ctx) => ctx.entries.length > 0,
+});
+app.enrich('agent-cost', enrichFn, {
+  scope: 'subagent',
+});
 ```
-#### `app.listen(port?, options?)`
+### `app.listen(port?, options?)`
 Start the Claudeye dashboard server.
 - **`port`** — port number (default: 8020)
+- **`options.host`** — bind address (default: "localhost", use "0.0.0.0" for LAN)
 - **`options.open`** — auto-open browser (default: true)
 When the file is loaded via `--evals` or `CLAUDEYE_EVALS_MODULE`, `listen()` is a no-op.
-#### `EvalContext`
+### `EvalContext`
 Both evals and enrichers receive the same context object:
 ```ts
 interface EvalContext {
-  entries: EvalLogEntry[];  // Parsed log entries for the session
-  stats: EvalLogStats;      // Computed stats (turn count, tool calls, etc.)
-  projectName: string;      // Encoded project folder name
-  sessionId: string;        // Session UUID
+  entries: EvalLogEntry[];        // Parsed log entries for the session or subagent
+  stats: EvalLogStats;            // Computed stats (turn count, tool calls, etc.)
+  projectName: string;            // Encoded project folder name
+  sessionId: string;              // Session UUID
+  scope: 'session' | 'subagent'; // Whether running at session or subagent level
+  subagentId?: string;            // Hex ID of the subagent (subagent scope only)
+  subagentType?: string;          // e.g. 'Explore', 'Bash' (subagent scope only)
+  subagentDescription?: string;   // Short description (subagent scope only)
+  parentSessionId?: string;       // Parent session ID (subagent scope only)
 }
 ```
-#### `EvalLogStats`
+### `EvalLogEntry`
+```ts
+interface EvalLogEntry {
+  type: string;               // 'user' | 'assistant' | 'tool' | 'queue-operation' | ...
+  uuid: string;               // Unique entry ID
+  parentUuid: string | null;  // Parent ID for nested entries (subagents)
+  timestamp: string;          // ISO timestamp
+  timestampMs: number;        // Epoch milliseconds
+  timestampFormatted: string; // Human-readable timestamp
+  message?: {
+    role: string;                          // 'user' | 'assistant'
+    content: string | EvalContentBlock[];  // Text or structured content blocks
+    model?: string;                        // Model ID (assistant messages only)
+  };
+  raw?: Record<string, unknown>;  // Raw JSONL data (usage, errors, etc.)
+  label?: string;                 // Custom label
+}
+```
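Because `message.content` is typed as `string | EvalContentBlock[]`, an eval that reads message text has to handle both forms. A minimal sketch of such a helper follows. `entryText` is a hypothetical name, and the block shapes used here (`{ type: 'text', text }` and the `tool_use` block in the usage lines) are assumptions, since `EvalContentBlock` is not defined in this diff.

```js
// Returns the plain text of an entry's message, whichever form `content` takes.
// Assumes text blocks look like { type: 'text', text: '...' }; all other
// block types are ignored.
function entryText(entry) {
  const content = entry.message?.content;
  if (typeof content === 'string') return content;
  if (!Array.isArray(content)) return ''; // no message, or an unexpected shape
  return content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text)
    .join('\n');
}

console.log(entryText({ message: { role: 'user', content: 'hello' } })); // hello
console.log(entryText({
  message: {
    role: 'assistant',
    content: [{ type: 'text', text: 'Done.' }, { type: 'tool_use', name: 'Read' }],
  },
})); // Done.
console.log(JSON.stringify(entryText({ type: 'queue-operation' }))); // ""
```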
+### `EvalLogStats`
 ```ts
 interface EvalLogStats {
-  turnCount: number;
-  userCount: number;
-  assistantCount: number;
-  toolCallCount: number;
-  subagentCount: number;
-  duration: string;        // Formatted duration (e.g. "2m 15s")
-  models: string[];        // Model IDs used in the session
+  turnCount: number;      // Number of conversation turns
+  userCount: number;      // Number of user messages
+  assistantCount: number; // Number of assistant responses
+  toolCallCount: number;  // Total tool invocations
+  subagentCount: number;  // Number of subagent spawns
+  duration: string;       // Formatted duration (e.g. "2m 15s")
+  models: string[];       // Distinct model IDs used
 }
 ```
-#### `EvalResult`
+### `EvalResult`
 ```ts
 interface EvalResult {
-  pass: boolean;            // Required: did the eval pass?
-  score?: number;           // Optional: 0-1, clamped automatically
-  message?: string;         // Optional: shown in the UI
-  metadata?: Record<string, unknown>;  // Optional: arbitrary data
+  pass: boolean;                          // Did the eval pass?
+  score?: number;                         // 0-1, clamped automatically (default: 1.0)
+  message?: string;                       // Shown in the UI
+  metadata?: Record<string, unknown>;     // Arbitrary data
 }
 ```
-#### `EnrichmentResult`
+### `EnrichmentResult`
 ```ts
 // Enrichers return a flat key-value map
 type EnrichmentResult = Record<string, string | number | boolean>;
 ```
-### How It Works
+### `ConditionFunction`
+```ts
+type ConditionFunction = (context: EvalContext) => boolean | Promise<boolean>;
+```
+---
+## How It Works
-1. `createApp()` + `app.eval()` / `app.enrich()` register functions in global registries
-2. When you run `claudeye --evals ./my-file.js`, the server dynamically imports your file, populating both the eval and enricher registries
+1. `createApp()` + `app.eval()` / `app.enrich()` / `app.condition()` register functions in global registries
+2. When you run `claudeye --evals ./my-file.js`, the server dynamically imports your file, populating the eval, enricher, and condition registries
 3. When the dashboard loads a session, server actions run all registered evals and enrichers against the session's parsed log entries
-4. Each function is individually error-isolated — if one throws, the others still run
-5. Results are serialized and displayed in separate panels in the dashboard UI
+4. The global condition is checked first — if it fails, everything is skipped
+5. Per-item conditions are checked individually — skipped items don't block others
+6. Each function is individually error-isolated — if one throws, the others still run
+7. Results are serialized and displayed in separate panels in the dashboard UI
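The registration semantics in step 1 (chainable calls, same-name replacement, a single global condition) can be modeled with a few lines. This is a stand-in for illustration: `createSketchApp` and `snapshot` are hypothetical, and the sketch keeps its registries in a closure rather than in the global registries the README mentions.

```js
// Minimal stand-in for the registration surface described in the README.
function createSketchApp() {
  const evals = new Map();     // name -> { fn, options }
  const enrichers = new Map(); // name -> { fn, options }
  let globalCondition = null;  // only the most recent condition is kept

  const app = {
    eval(name, fn, options = {}) {
      evals.set(name, { fn, options }); // same name replaces the earlier entry
      return app;                       // chainable
    },
    enrich(name, fn, options = {}) {
      enrichers.set(name, { fn, options });
      return app;
    },
    condition(fn) {
      globalCondition = fn;
      return app;
    },
    // Inspection helper for this sketch only; not part of the documented API.
    snapshot() {
      return { evals: [...evals.keys()], enrichers: [...enrichers.keys()], globalCondition };
    },
  };
  return app;
}

const first = () => true;
const second = () => false;

const app = createSketchApp()
  .condition(first)
  .condition(second) // replaces `first`
  .eval('no-errors', () => ({ pass: false }))
  .eval('no-errors', () => ({ pass: true })) // replaces the earlier 'no-errors'
  .enrich('overview', () => ({ Turns: 3 }));

const state = app.snapshot();
console.log(state.evals);                      // [ 'no-errors' ]
console.log(state.enrichers);                  // [ 'overview' ]
console.log(state.globalCondition === second); // true
```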
-### Tips
+## Tips
 - Each eval and enricher is wrapped in a try/catch. If one throws, the others still run and the error is shown in the UI.
-- Eval scores are clamped to 0-1. If you don't provide a score, passing evals default to 1.0 and failing evals default to 0.
+- Eval scores are clamped to 0-1. If you don't provide a score, it defaults to 1.0.
 - Both eval and enricher functions can be async.
+- Condition functions can also be async.
 - Re-registering with the same name replaces the previous function.
-- Click "Re-run" in either panel to re-execute against the current session.
-- You can mix `app.eval()` and `app.enrich()` calls freely in the same file — they're independent systems that share the same context.
+- Click "Re-run" in either panel to re-execute against the current session (always bypasses cache).
+- You can mix `app.eval()`, `app.enrich()`, and `app.condition()` calls freely in the same file.
+- The global condition applies to both evals and enrichments — there's no way to set separate global conditions for each.
+- Per-item condition errors are treated as eval/enrichment errors (not skips), so you'll see the error message in the UI.
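The score tip changes behavior between the two versions: in 0.2.2 a failing eval without a score defaulted to 0, whereas the 0.3.0 text says a missing score defaults to 1.0 regardless of `pass`. A sketch of the 0.3.0 rule as the README states it follows. `normalizeScore` is a hypothetical name, and treating `NaN` like a missing score is an assumption.

```js
// Normalizes an eval result's score as the 0.3.0 README describes:
// a missing score defaults to 1.0, and a provided score is clamped to [0, 1].
function normalizeScore(result) {
  const provided = typeof result.score === 'number' && !Number.isNaN(result.score);
  const raw = provided ? result.score : 1.0;
  return { ...result, score: Math.min(1, Math.max(0, raw)) };
}

console.log(normalizeScore({ pass: true }).score);               // 1
console.log(normalizeScore({ pass: false }).score);              // 1
console.log(normalizeScore({ pass: true, score: 1.7 }).score);   // 1
console.log(normalizeScore({ pass: false, score: -0.2 }).score); // 0
console.log(normalizeScore({ pass: true, score: 0.25 }).score);  // 0.25
```

If the documented default is accurate, evals that relied on failing results scoring 0 would need to return `score: 0` explicitly.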
 ## License