npm - @percepta/kaizen - Versions diffs - 0.6.0 → 0.7.0 - Mend

@percepta/kaizen 0.6.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (137)

package/README.md CHANGED

@@ -1,176 +1,104 @@
 # Kaizen
-Automated AI researcher that improves AI systems. Kaizen investigates production traces, builds evaluation datasets, records scored runs, and helps prepare improvements -- driven by Claude Code with a live dashboard.
+Kaizen is an agentic eval platform for AI systems. It helps a coding agent create a system definition, curate a Langfuse-backed dataset, write an eval script, run a baseline, and iterate on variants while Kaizen records scored runs under `kaizen/.kaizen/runs/`.
 ## Install In A Target Repo
-For a persistent local `kaizen` command:
 ```bash
 npm install -g @percepta/kaizen
 kaizen init
-kaizen create system <system-id> # add --eval-language ts for TypeScript
-kaizen run --system <system-id> --variant baseline --hypothesis "starting baseline"
+kaizen guide
+kaizen create system <system-id>
+kaizen create view <system-id> --type trace
+kaizen create view <system-id> --type dataset-item
+kaizen run --system <system-id> --variant baseline --diagnostic --hypothesis "starting baseline"
 kaizen studio
 ```
-For one-off use without a global install:
+For one-off use:
 ```bash
 npx @percepta/kaizen init
 ```
-For a repo-local dev dependency:
+Kaizen is installed inside the customer repo. The customer-owned footprint is intentionally small:
-```bash
-pnpm add -D @percepta/kaizen
-pnpm exec kaizen studio
-```
+- `kaizen/config.ts`
+- `kaizen/systems/<system-id>/system.md`
+- `kaizen/systems/<system-id>/eval.py|ts`
+- optional `kaizen/systems/<system-id>/trace.tsx`
+- optional `kaizen/systems/<system-id>/dataset-item.tsx`
+- optional `kaizen/systems/<system-id>/rubric.md`
+- `kaizen/.kaizen/runs/`
-Kaizen is installed inside the customer repo. System definitions, eval scripts, custom views, and `.kaizen/runs/` all live there; the CLI, runner, dashboard shell, and recipes come from the package.
-## Developing This Repo
-```bash
-pnpm install
-pnpm --filter @percepta/kaizen dev:studio
-```
+Package-owned agent guidance is printed with `kaizen guide`. Customer-specific durable notes belong in `kaizen/systems/<system-id>/system.md`; Kaizen does not create repo-level agent markdown such as `KAIZEN.md`, `AGENTS.md`, or `CLAUDE.md`.
-This starts the dashboard at http://localhost:6789 against `examples/legacy-workspace`, a transitional fixture with historical customer and system definitions. The CLI and dashboard both live in `packages/kaizen` (CLI in `src/`, Next.js dashboard in `dashboard/`).
+## Lifecycle
-Other dev scripts:
+1. Run `kaizen init` once in the target repo.
+2. Run `kaizen create system <system-id>` and fill in `kaizen/systems/<system-id>/system.md`.
+3. Use Studio Data to create or select a Langfuse dataset, add useful source traces, and label dataset items.
+4. Replace `kaizen/systems/<system-id>/eval.py|ts` with a real eval that reads the dataset named by `dataset_version`.
+5. Run a diagnostic baseline, then a full baseline.
+6. Run variants with `kaizen run`, inspect `kaizen log`, and use Studio to compare runs and failures.
-| Script                                      | What it does                      |
-| ------------------------------------------- | --------------------------------- |
-| `pnpm --filter @percepta/kaizen dev:studio` | Start the Studio dashboard        |
-| `pnpm --filter @percepta/kaizen dev:next`   | Start only the Next.js dev server |
-| `pnpm typecheck`                            | Typecheck all packages            |
-| `pnpm test`                                 | Run package tests                 |
+The eval script emits NDJSON events to `--out-fd`; the runner owns process supervision, `kaizen/.kaizen/runs/`, crash recording, and automatic promotion. For Langfuse-backed evals, the eval should also link each dataset item to the fresh trace generated by that run and write the primary metric as a trace score.
-## Publishing
+## Custom Views
-Publishing `@percepta/kaizen` to npm is automated with Changesets. For changes
-that affect the published package, add a changeset:
+Custom views are plain React components co-located with the system:
 ```bash
-pnpm changeset
+kaizen create view <system-id> --type trace
+kaizen create view <system-id> --type dataset-item
 ```
-Merging to `main` runs `.github/workflows/build-and-publish.yml`, which builds
-the CLI and bundled Studio, then either opens a version PR or publishes to npm
-using the `NPM_TOKEN` repository secret.
-## How It Works
-Kaizen closes the eval loop for AI systems:
-1. **Investigate** -- pull production traces from Langfuse, analyze failure patterns
-2. **Build dataset** -- create versioned eval datasets from traces with ground truth
-3. **Annotate** -- label ground truth via the dashboard's inline annotation view
-4. **Record runs** -- test system variants against ground truth, scored automatically
-5. **Improve** -- prepare a PR from the latest promoted baseline when a human asks
-The `/kaizen` slash command in Claude Code orchestrates this workflow. Variant-builder agents can execute in parallel worktrees, but they pass the main checkout's `.kaizen` path via `KAIZEN_STATE_DIR` or `--state-dir` so the dashboard always reads one canonical state tree.
-## Dashboard
-The web app (Next.js, pages router) provides:
+`trace.tsx` receives the full Langfuse trace payload plus actions for writing scores. `dataset-item.tsx` receives the dataset item, the linked source trace when available, and actions for updating the dataset item or linking run items. Browser-side credentials are not required; Studio proxies the write actions through local API routes.
-- **Data** -- inspect Langfuse datasets, dataset items, and source traces
-- **Experiments** -- inspect local Kaizen runs from the customer repo's `.kaizen/runs/` store
-- **Ideas** -- inspect Linear issues scoped to the system's configured project and the shared `Kaizen` label
-- **Source indicators** -- show whether a field is sourced from repo code, Langfuse, Linear, or the local filesystem
+Run `kaizen guide views` for the exact prop and action interfaces.
-### Keyboard Shortcuts
-| Shortcut | Action         |
-| -------- | -------------- |
-| `Cmd+[`  | Toggle sidebar |
-| `Cmd+/`  | Show shortcuts |
-## System Definitions
-In real use, each target/customer repo owns its own `customers/`, `systems/`, `rubrics/`, `eval/`, and optional `views/` directories. Each system is defined in `systems/*.md` with YAML frontmatter:
+## Developing This Repo
-```yaml
-run_eval: eval/<system>.ts # or .py
-eval_version: 1
-dataset_version: v1
-eval_style: ground-truth
-primary_metric: score
+```bash
+pnpm install
+pnpm --filter @percepta/kaizen dev:studio
 ```
-The eval script emits NDJSON events to `--out-fd`; the runner owns `.kaizen/runs/`.
-Kaizen runs Python evals with `python3`, JavaScript evals with `node`, and
-TypeScript evals with the package's bundled `tsx` loader. New system scaffolds
-default to Python; pass `--eval-language ts` to create a TypeScript eval.
-For Langfuse-backed production evals, the same script should also link each
-dataset item to the fresh trace produced by that run in a Langfuse dataset run
-and write the primary metric as a trace score. Those writes are for durable
-trace inspection; the NDJSON `complete.score` remains Kaizen's required result
-contract.
-This repo keeps historical definitions under `examples/legacy-workspace/` only as sample data for local Studio development:
-| Customer         | System                       | Primary Metric          |
-| ---------------- | ---------------------------- | ----------------------- |
-| Transcarent      | EMO HIE Processing           | F2                      |
-| Transcarent      | EMO Facility Processing      | F2                      |
-| Transcarent      | EMO Cost Savings Agent       | Classification Accuracy |
-| Transcarent      | EMO Summarization            | --                      |
-| Transcarent      | Orbit Call Summarization     | Judge Quality           |
-| Cityblock Health | BOI Chaselist Impact         | Calibration Error       |
-| Cityblock Health | Concurrent Review Agent      | --                      |
-| Cityblock Health | Contract Exclusion Detection | --                      |
-| Cityblock Health | Quality Gap Modeling         | Calibration Error       |
-| Janus Henderson  | Portfolio Analytics          | --                      |
-| Summa Health     | Agentic BI (SLCC)            | --                      |
-## Repository Structure
+This starts Studio at `http://localhost:6789` against `examples/demo-workspace`, a local fixture for package development. The CLI lives in `src/`; the bundled Next.js Studio lives in `dashboard/`.
-```
-kaizen/
-├── packages/kaizen/                 # Published @percepta/kaizen package
-│   ├── src/                         # CLI source
-│   ├── dashboard/                   # Next.js Studio (built into the published bundle)
-│   └── examples/legacy-workspace/   # Transitional customer/system fixture for local dev
-```
-## Tech Stack
+Useful scripts:
-- **Frontend**: Next.js (pages router), TypeScript, CSS modules, dark theme
-- **AI**: Claude Code with `/kaizen` skill + variant-builder agents
-- **Observability**: Langfuse for traces, datasets, and annotation state; `.kaizen/runs/` for local run truth
-- **Package manager**: pnpm
+| Script                                      | What it does                       |
+| ------------------------------------------- | ---------------------------------- |
+| `pnpm --filter @percepta/kaizen dev:studio` | Start Studio with the demo fixture |
+| `pnpm --filter @percepta/kaizen dev:next`   | Start only the Next.js dev server  |
+| `pnpm --filter @percepta/kaizen typecheck`  | Typecheck the package              |
+| `pnpm --filter @percepta/kaizen test`       | Run package tests                  |
-## Environment Setup
+## Environment
-Create `.env.local` in the workspace repo root with Kaizen credentials:
+Create `.env.local` in the workspace repo root:
-```
+```text
 LANGFUSE_HOST=https://...
 LANGFUSE_PUBLIC_KEY=pk-lf-...
 LANGFUSE_SECRET_KEY=sk-lf-...
 LINEAR_API_KEY=lin_api_...
+LINEAR_TEAM_KEY=ENG
 ```
-Kaizen Studio reads Langfuse and Linear credentials from the workspace root
-`.env.local`. Put the values there instead of app package env files so stale
-package-level placeholders cannot shadow the credentials Studio needs.
-Langfuse credentials power the Data surface. `LINEAR_API_KEY` powers the Ideas
-surface and the `kaizen ideas --system <id>` CLI command.
+Langfuse credentials power the Data surface and custom view actions. `LINEAR_API_KEY` and `LINEAR_TEAM_KEY` power `kaizen ideas --system <id>`.
-System Ideas configuration should use a stable Linear project URL or ID:
+System Ideas configuration should use a stable Linear project URL or ID in `system.md`:
 ```yaml
-linear_project: https://linear.app/aitco/project/kaizen-v0-555399b53e23
+linear_project: https://linear.app/<workspace>/project/<project-slug>
 ```
-Project names are intentionally not used for the connection because they can
-change in Linear without changing project identity.
+## Publishing
-## Docs
+Publishing `@percepta/kaizen` to npm is automated with Changesets. For changes that affect the published package, add a changeset:
-- [Langfuse Standards](docs/langfuse-standards.md) -- how we structure Langfuse across all customers
-- [Eval Framework](docs/eval-framework.md) -- evaluation philosophy and requirements
+```bash
+pnpm changeset
+```
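
The new README states that the eval writes NDJSON events to `--out-fd` while the runner owns process supervision. The following is a minimal sketch of that file-descriptor handoff, not Kaizen's actual runner: `run_and_collect` and `final_score` are hypothetical names, the descriptor number is whatever the pipe yields rather than the `3` shown in the eval guide, and `pass_fds` is POSIX-only.

```python
import json
import os
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_and_collect(argv: Sequence[str]) -> Tuple[int, List[dict]]:
    """Run argv with an extra `--out-fd <fd>` flag and return (exit code, events)."""
    read_fd, write_fd = os.pipe()
    try:
        # pass_fds keeps write_fd open in the child under the same number.
        proc = subprocess.Popen([*argv, "--out-fd", str(write_fd)], pass_fds=(write_fd,))
    finally:
        os.close(write_fd)  # the parent only reads
    with os.fdopen(read_fd, "r") as stream:
        # One JSON object per non-empty line (NDJSON).
        events = [json.loads(line) for line in stream if line.strip()]
    return proc.wait(), events


def final_score(events: List[dict]) -> float:
    """Return the terminal complete.score, enforcing the documented [0, 1] range."""
    if not events or events[-1].get("type") != "complete":
        raise ValueError("eval did not end with a complete event")
    score = events[-1].get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 1:
        raise ValueError(f"complete.score must be a number in [0, 1], got {score!r}")
    return float(score)
```

Keeping the event stream on its own descriptor leaves the child's stdout and stderr free for ordinary logging, which may be why the guide warns against writing events to stdout.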

package/agent/claude-command.md ADDED

@@ -0,0 +1,23 @@
+# Kaizen Claude Command Guide
+Use this as a Claude Code command body when Claude should drive the Kaizen lifecycle. It is package-owned; do not copy it into customer repos as durable markdown unless the user explicitly asks.
+## Rules
+- Never commit PHI or credentials.
+- Run `kaizen guide` first when guidance is not already in context.
+- Put customer-specific notes in `kaizen/systems/<system-id>/system.md`.
+- Use Studio for dataset curation and custom dataset item views for labeling workflows.
+## Workflow
+1. Select a system. If none is given, list `kaizen/systems/*/system.md` and ask which one.
+2. Read `system.md`, relevant application code, and current `kaizen log --system <system-id> --json`.
+3. If the system is new, run `kaizen create system <system-id> --eval-language py|ts` and fill in the scaffold.
+4. Use Studio Data to create/select a dataset, add traces, and label expected outputs.
+5. Replace the starter eval with real code that loads `--dataset`, runs the candidate, emits NDJSON events, and persists Langfuse links/scores when available.
+6. Run a diagnostic baseline, then the full baseline.
+7. Iterate variants with `kaizen run`.
+8. Create `trace.tsx` or `dataset-item.tsx` only when the default views are insufficient.
+For exact eval and view contracts, run `kaizen guide evals` and `kaizen guide views`.

package/agent/evals.md ADDED

@@ -0,0 +1,41 @@
+# Kaizen Eval Guide
+Eval scripts are customer-owned executable code stored at `kaizen/systems/<system-id>/eval.py|ts`. `kaizen run` invokes the path named by `run_eval` in `kaizen/systems/<system-id>/system.md`:
+```bash
+<run_eval> --variant <variant-id> --dataset <dataset_version> --out-fd 3 [--max-items <n>]
+```
+The eval must write NDJSON events to `--out-fd`. Do not write these events to normal stdout.
+```json
+{"type":"start","n":10,"eval_version":1,"dataset_version":"v1"}
+{"type":"item","id":"item-1","score":0.8,"breakdown":{"score":0.8},"trace_id":"trace-id-or-null"}
+{"type":"complete","score":0.82,"n":10,"breakdown":{"score":0.82},"worst_traces":[{"id":"item-1","score":0.8,"trace_id":"trace-id-or-null"}]}
+```
+The terminal `complete.score` is Kaizen's authoritative result. It must be a number in `[0, 1]`.
+## Langfuse Persistence
+For Langfuse-backed evals:
+- Treat `--dataset` as the Langfuse dataset name unless `system.md` says otherwise.
+- Load dataset items from that dataset.
+- Run the candidate system for each item.
+- Capture the fresh Langfuse trace id for that item.
+- Link the dataset item to the fresh trace in a Langfuse dataset run.
+- Write the primary metric as a Langfuse score on the fresh trace.
+- Emit the same item score through Kaizen's NDJSON stream.
+Langfuse stores trace inspection, dataset-run history, and score metadata. `kaizen/.kaizen/runs/` remains the source of truth for promotion and run state.
+## Baseline
+Run a diagnostic baseline first:
+```bash
+kaizen run --system <system-id> --variant baseline --diagnostic --hypothesis "starting baseline"
+```
+If setup, credentials, dataset access, and event schema are valid, run the full baseline without `--diagnostic`.
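
The eval contract added in this file can be sketched in Python as follows. This is an illustration under stated assumptions, not the starter eval that `kaizen create system` scaffolds: `load_items` and `score_item` are hypothetical placeholders (an in-memory list and exact-match scoring) standing in for the Langfuse dataset load and the candidate system, and the cutoff of five `worst_traces` is arbitrary. Only the flags and event fields are taken from the guide.

```python
import argparse
import json
import os
from typing import Optional, Sequence, TextIO


def load_items(dataset: str) -> list:
    # Hypothetical placeholder: a real eval would load items from the
    # Langfuse dataset named by --dataset.
    return [
        {"id": "item-1", "expected": "yes", "output": "yes"},
        {"id": "item-2", "expected": "no", "output": "yes"},
    ]


def score_item(item: dict) -> float:
    # Hypothetical exact-match metric; always 0.0 or 1.0.
    return 1.0 if item["output"] == item["expected"] else 0.0


def run_eval(items: Sequence[dict], dataset: str, out: TextIO, eval_version: int = 1) -> float:
    """Write start, item, and complete events to `out`, one JSON object per line."""

    def emit(event: dict) -> None:
        out.write(json.dumps(event) + "\n")
        out.flush()

    emit({"type": "start", "n": len(items), "eval_version": eval_version,
          "dataset_version": dataset})
    scored = []
    for item in items:
        score = score_item(item)
        # trace_id stays None here; a Langfuse-backed eval would record the fresh trace id.
        scored.append({"id": item["id"], "score": score, "trace_id": None})
        emit({"type": "item", "id": item["id"], "score": score,
              "breakdown": {"score": score}, "trace_id": None})
    mean = sum(s["score"] for s in scored) / len(scored) if scored else 0.0
    worst = sorted(scored, key=lambda s: s["score"])[:5]  # lowest scores first
    emit({"type": "complete", "score": mean, "n": len(scored),
          "breakdown": {"score": mean}, "worst_traces": worst})
    return mean


def main(argv: Optional[Sequence[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--variant", required=True)  # a real eval would select the candidate with this
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--out-fd", type=int, required=True)
    parser.add_argument("--max-items", type=int, default=None)
    args = parser.parse_args(argv)
    items = load_items(args.dataset)
    if args.max_items is not None:
        items = items[: args.max_items]
    # Events go to the descriptor named by --out-fd, never to stdout.
    with os.fdopen(args.out_fd, "w") as out:
        run_eval(items, args.dataset, out)
```

In a script file, `main()` would be called under an `if __name__ == "__main__":` guard. Separating `run_eval` from argument parsing lets the event stream be checked against an in-memory buffer before wiring up credentials.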

package/agent/overview.md ADDED

@@ -0,0 +1,53 @@
+# Kaizen Agent Guide
+Kaizen helps a coding agent define, evaluate, inspect, and improve an AI system inside the customer repo. This guide is package-owned; rerun `kaizen guide` after package upgrades.
+Do not create extra long-lived agent markdown files. Customer-specific notes belong in `kaizen/systems/<system-id>/system.md`; repo-owned code belongs beside it in `kaizen/systems/<system-id>/`.
+## Commands
+Run commands from the repo root:
+- `kaizen init` - scaffold Kaizen once.
+- `kaizen guide topics` - list focused guide topics.
+- `kaizen create system <system-id> --eval-language py|ts` - create `kaizen/systems/<system-id>/system.md` and `kaizen/systems/<system-id>/eval.py|ts`.
+- `kaizen create view <system-id> --type trace` - create `kaizen/systems/<system-id>/trace.tsx`.
+- `kaizen create view <system-id> --type dataset-item` - create `kaizen/systems/<system-id>/dataset-item.tsx`.
+- `kaizen studio` - open Studio for dataset curation, trace inspection, and run review.
+- `kaizen run --system <system-id> --variant <variant-id> --hypothesis "<why>"` - record one eval run.
+- `kaizen run --system <system-id> --variant <variant-id> --diagnostic --hypothesis "<why>"` - run a small diagnostic sample first.
+- `kaizen log --system <system-id> --json` - inspect the promoted baseline and recent runs.
+Run state is written to `kaizen/.kaizen/`. When evaluating from a Git linked worktree, Kaizen automatically stores run state in the primary checkout's `kaizen/.kaizen/`.
+## Files
+- `kaizen/systems/<system-id>/system.md` is the durable system definition. It should explain the workflow, key files, setup, dataset, metric, known failures, and variant ideas.
+- `kaizen/systems/<system-id>/eval.py|ts` is the eval entrypoint named by `run_eval`.
+- `kaizen/systems/<system-id>/trace.tsx` is an optional custom trace view.
+- `kaizen/systems/<system-id>/dataset-item.tsx` is an optional custom dataset labeling view.
+- `kaizen/systems/<system-id>/rubric.md` is optional and only needed for LLM-as-judge or hybrid evals.
+Each `system.md` must include:
+```yaml
+run_eval: kaizen/systems/<system-id>/eval.py
+eval_version: 1
+dataset_version: <langfuse-dataset-name>
+eval_style: ground-truth
+primary_metric: score
+target: 0.90
+```
+## Lifecycle
+1. Run `kaizen create system <system-id>` unless the system already exists.
+2. Read the codebase and fill in `system.md` with real key files, setup, data sources, dataset, and metric.
+3. Use Studio Data to create/select a dataset, add representative traces, and label expected outputs.
+4. Replace the starter eval with real code that loads the dataset named by `--dataset`.
+5. Run a diagnostic baseline.
+6. Run the full baseline.
+7. Iterate on variants with `kaizen run`; read `kaizen log` and Studio failures between attempts.
+8. Create custom views only when the default JSON views are not enough for trace inspection or dataset labeling.
+For eval details, run `kaizen guide evals`. For view props and actions, run `kaizen guide views`.
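
The six fields that this guide says every `system.md` must include can be checked with a short script before running a baseline. The sketch below assumes the fields live in a leading frontmatter block delimited by `---` lines and written as flat `key: value` pairs; that layout, and the helper names `read_frontmatter` and `missing_keys`, are assumptions for illustration rather than Kaizen's own parser.

```python
REQUIRED_KEYS = (
    "run_eval",
    "eval_version",
    "dataset_version",
    "eval_style",
    "primary_metric",
    "target",
)


def read_frontmatter(text: str) -> dict:
    """Return top-level `key: value` pairs from a leading `---` block (assumed layout)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Skip nested values, comments, and lines without a key.
        if line.startswith((" ", "\t", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def missing_keys(text: str) -> list:
    """List the required keys that are absent or empty in the document."""
    fields = read_frontmatter(text)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]
```

For example, calling `missing_keys` on the contents of `kaizen/systems/<system-id>/system.md` returns an empty list when all six fields are present.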

package/agent/variant-builder.md ADDED

@@ -0,0 +1,22 @@
+# Kaizen Variant Builder Guide
+You implement and evaluate one variant, record one run with `kaizen run`, then stop.
+## Setup
+1. Work in the assigned worktree, not the main checkout.
+2. Let Kaizen auto-detect the primary checkout for run state. Runs from linked worktrees are recorded under the primary checkout's `kaizen/.kaizen/`.
+3. Read `kaizen/systems/<system-id>/system.md`, the parent run manifest when present, and the parent failures.
+4. Install or start only what the system setup section requires.
+## Run
+```bash
+kaizen run \
+  --system <system-id> \
+  --variant <variant-id> \
+  --parent <parent-run-id> \
+  --hypothesis "<what changed and why>"
+```
+The runner owns process supervision, `kaizen/.kaizen/runs/`, crash recording, and promotion. Read the single summary line it prints and include the run id and score in your handoff.

package/agent/views.md ADDED

@@ -0,0 +1,51 @@
+# Kaizen Custom Views Guide
+Custom views are customer-owned React components co-located with the system.
+```bash
+kaizen create view <system-id> --type trace
+kaizen create view <system-id> --type dataset-item
+```
+Studio loads:
+- `kaizen/systems/<system-id>/trace.tsx`
+- `kaizen/systems/<system-id>/dataset-item.tsx`
+No `system.md` frontmatter field is required.
+## Trace View
+```tsx
+import type { TraceRendererProps } from "@percepta/kaizen";
+export default function TraceView({ trace, actions }: TraceRendererProps) {
+  return <pre>{JSON.stringify(trace, null, 2)}</pre>;
+}
+```
+Trace views receive `{ trace, context, actions }`. `actions.createScore(...)` writes a Langfuse score for the current or supplied trace id.
+## Dataset Item View
+```tsx
+import type { DatasetItemRendererProps } from "@percepta/kaizen";
+export default function DatasetItemView({
+  datasetItem,
+  trace,
+  actions,
+}: DatasetItemRendererProps) {
+  return <pre>{JSON.stringify({ datasetItem, trace }, null, 2)}</pre>;
+}
+```
+Dataset item views receive `{ datasetItem, trace, context, actions }`. Use them for labeling expected output, metadata, review status, and scoring workflows.
+Available dataset actions:
+- `actions.updateDatasetItem({ expectedOutput?, metadata?, input?, sourceTraceId?, status? })`
+- `actions.createDatasetRunItem({ runName, datasetItemId?, traceId?, runDescription?, metadata? })`
+- `actions.createScore({ name, value, traceId?, comment?, metadata? })`
+When omitted, `datasetName`, `itemId`, and `traceId` default to the current Studio selection where Studio can infer them.

package/dashboard/.next/standalone/packages/kaizen/dashboard/.next/BUILD_ID CHANGED

@@ -1 +1 @@
-YpQ-I4VL-aEdQrM5uN7_3
+SCF0o7YxElB9rzWaOohsA

package/dashboard/.next/standalone/packages/kaizen/dashboard/.next/build-manifest.json CHANGED

@@ -4,8 +4,8 @@
   ],
   "devFiles": [],
   "lowPriorityFiles": [
-    "static/YpQ-I4VL-aEdQrM5uN7_3/_buildManifest.js",
-    "static/YpQ-I4VL-aEdQrM5uN7_3/_ssgManifest.js"
+    "static/SCF0o7YxElB9rzWaOohsA/_buildManifest.js",
+    "static/SCF0o7YxElB9rzWaOohsA/_ssgManifest.js"
   ],
   "rootMainFiles": [],
   "rootMainFilesTree": {},
@@ -15,9 +15,9 @@
       "static/chunks/framework-7089c270fe56b51f.js",
       "static/chunks/main-7ac7f96d288497aa.js",
       "static/chunks/431-43358ce3c29e5e1b.js",
-      "static/css/b18a6732b96168e1.css",
-      "static/chunks/673-ed4be46027ae7a37.js",
-      "static/chunks/pages/index-1d8b6719f49e4ae0.js"
+      "static/css/cd3873236eb77caa.css",
+      "static/chunks/253-85c76c34f33c9604.js",
+      "static/chunks/pages/index-d3306bb6f5d7d235.js"
     ],
     "/[system]": [
       "static/chunks/webpack-8c7966d82a2912f0.js",
@@ -30,45 +30,45 @@
       "static/chunks/framework-7089c270fe56b51f.js",
       "static/chunks/main-7ac7f96d288497aa.js",
       "static/chunks/431-43358ce3c29e5e1b.js",
-      "static/css/b18a6732b96168e1.css",
-      "static/chunks/673-ed4be46027ae7a37.js",
-      "static/chunks/pages/[system]/benchmarks-559dc9df52db3af4.js"
+      "static/css/cd3873236eb77caa.css",
+      "static/chunks/253-85c76c34f33c9604.js",
+      "static/chunks/pages/[system]/benchmarks-30a17b7659010b8c.js"
     ],
-    "/[system]/data": [
+    "/[system]/data/[[...path]]": [
       "static/chunks/webpack-8c7966d82a2912f0.js",
       "static/chunks/framework-7089c270fe56b51f.js",
       "static/chunks/main-7ac7f96d288497aa.js",
       "static/chunks/431-43358ce3c29e5e1b.js",
-      "static/css/b18a6732b96168e1.css",
-      "static/chunks/673-ed4be46027ae7a37.js",
-      "static/chunks/pages/[system]/data-644e4280b4c86fe0.js"
+      "static/css/cd3873236eb77caa.css",
+      "static/chunks/253-85c76c34f33c9604.js",
+      "static/chunks/pages/[system]/data/[[...path]]-e5f4083fe9ffe429.js"
     ],
     "/[system]/eval": [
       "static/chunks/webpack-8c7966d82a2912f0.js",
       "static/chunks/framework-7089c270fe56b51f.js",
       "static/chunks/main-7ac7f96d288497aa.js",
       "static/chunks/431-43358ce3c29e5e1b.js",
-      "static/css/b18a6732b96168e1.css",
-      "static/chunks/673-ed4be46027ae7a37.js",
-      "static/chunks/pages/[system]/eval-3c911ea8744631fd.js"
+      "static/css/cd3873236eb77caa.css",
+      "static/chunks/253-85c76c34f33c9604.js",
+      "static/chunks/pages/[system]/eval-160237a604b47416.js"
     ],
-    "/[system]/experiments": [
+    "/[system]/experiments/[[...path]]": [
       "static/chunks/webpack-8c7966d82a2912f0.js",
       "static/chunks/framework-7089c270fe56b51f.js",
       "static/chunks/main-7ac7f96d288497aa.js",
       "static/chunks/431-43358ce3c29e5e1b.js",
-      "static/css/b18a6732b96168e1.css",
-      "static/chunks/673-ed4be46027ae7a37.js",
-      "static/chunks/pages/[system]/experiments-42f31600c2bb47ad.js"
+      "static/css/cd3873236eb77caa.css",
+      "static/chunks/253-85c76c34f33c9604.js",
+      "static/chunks/pages/[system]/experiments/[[...path]]-91e47a4893093600.js"
     ],
     "/[system]/ideas": [
       "static/chunks/webpack-8c7966d82a2912f0.js",
       "static/chunks/framework-7089c270fe56b51f.js",
       "static/chunks/main-7ac7f96d288497aa.js",
       "static/chunks/431-43358ce3c29e5e1b.js",
-      "static/css/b18a6732b96168e1.css",
-      "static/chunks/673-ed4be46027ae7a37.js",
-      "static/chunks/pages/[system]/ideas-6829a271003150a9.js"
+      "static/css/cd3873236eb77caa.css",
+      "static/chunks/253-85c76c34f33c9604.js",
+      "static/chunks/pages/[system]/ideas-96e58e4624952e26.js"
     ],
     "/_app": [
       "static/chunks/webpack-8c7966d82a2912f0.js",

package/dashboard/.next/standalone/packages/kaizen/dashboard/.next/prerender-manifest.json CHANGED

@@ -3,9 +3,9 @@
   "routes": {},
   "dynamicRoutes": {},
   "preview": {
-    "previewModeId": "02bf50b6d1c114ab64891ff63b9ae67b",
-    "previewModeSigningKey": "fe223618cfb9bb61306a9dea8261c44ddd7141789a6543970e870a61bf19da51",
-    "previewModeEncryptionKey": "a64d9df7a36d787a5996f6fb7171afb1fc8b7d35ad4962f94f482a0b687146c9"
+    "previewModeId": "0ba1834cfc7d7c8ea8708ad29269e503",
+    "previewModeSigningKey": "4cdb34c8c9deccef53be3f014ab0ccdd422a7b57e71290f1ce447fd0a2e2a138",
+    "previewModeEncryptionKey": "c48cc1326503410c45e8e0baf97d801d067e317fa759e378825440098271163f"
   },
   "notFoundRoutes": []
 }

package/dashboard/.next/standalone/packages/kaizen/dashboard/.next/routes-manifest.json CHANGED

@@ -38,12 +38,13 @@
       "namedRegex": "^/(?<nxtPsystem>[^/]+?)/benchmarks(?:/)?$"
     },
     {
-      "page": "/[system]/data",
-      "regex": "^/([^/]+?)/data(?:/)?$",
+      "page": "/[system]/data/[[...path]]",
+      "regex": "^/([^/]+?)/data(?:/(.+?))?(?:/)?$",
       "routeKeys": {
-        "nxtPsystem": "nxtPsystem"
+        "nxtPsystem": "nxtPsystem",
+        "nxtPpath": "nxtPpath"
       },
-      "namedRegex": "^/(?<nxtPsystem>[^/]+?)/data(?:/)?$"
+      "namedRegex": "^/(?<nxtPsystem>[^/]+?)/data(?:/(?<nxtPpath>.+?))?(?:/)?$"
     },
     {
       "page": "/[system]/eval",
@@ -54,12 +55,13 @@
       "namedRegex": "^/(?<nxtPsystem>[^/]+?)/eval(?:/)?$"
     },
     {
-      "page": "/[system]/experiments",
-      "regex": "^/([^/]+?)/experiments(?:/)?$",
+      "page": "/[system]/experiments/[[...path]]",
+      "regex": "^/([^/]+?)/experiments(?:/(.+?))?(?:/)?$",
       "routeKeys": {
-        "nxtPsystem": "nxtPsystem"
+        "nxtPsystem": "nxtPsystem",
+        "nxtPpath": "nxtPpath"
       },
-      "namedRegex": "^/(?<nxtPsystem>[^/]+?)/experiments(?:/)?$"
+      "namedRegex": "^/(?<nxtPsystem>[^/]+?)/experiments(?:/(?<nxtPpath>.+?))?(?:/)?$"
     },
     {
       "page": "/[system]/ideas",
@@ -77,6 +79,12 @@
       "routeKeys": {},
       "namedRegex": "^/(?:/)?$"
     },
+    {
+      "page": "/api/langfuse-action",
+      "regex": "^/api/langfuse\\-action(?:/)?$",
+      "routeKeys": {},
+      "namedRegex": "^/api/langfuse\\-action(?:/)?$"
+    },
     {
       "page": "/api/langfuse-dataset",
       "regex": "^/api/langfuse\\-dataset(?:/)?$",
@@ -89,6 +97,12 @@
       "routeKeys": {},
       "namedRegex": "^/api/langfuse\\-dataset\\-item(?:/)?$"
     },
+    {
+      "page": "/api/langfuse-dataset-mutation",
+      "regex": "^/api/langfuse\\-dataset\\-mutation(?:/)?$",
+      "routeKeys": {},
+      "namedRegex": "^/api/langfuse\\-dataset\\-mutation(?:/)?$"
+    },
     {
       "page": "/api/langfuse-datasets",
       "regex": "^/api/langfuse\\-datasets(?:/)?$",
@@ -101,6 +115,12 @@
       "routeKeys": {},
       "namedRegex": "^/api/langfuse\\-trace(?:/)?$"
     },
+    {
+      "page": "/api/langfuse-traces",
+      "regex": "^/api/langfuse\\-traces(?:/)?$",
+      "routeKeys": {},
+      "namedRegex": "^/api/langfuse\\-traces(?:/)?$"
+    },
     {
       "page": "/api/linear-ideas",
       "regex": "^/api/linear\\-ideas(?:/)?$",
@@ -150,8 +170,8 @@
       "routeKeys": {
         "nxtPsystem": "nxtPsystem"
       },
-      "dataRouteRegex": "^/_next/data/YpQ\\-I4VL\\-aEdQrM5uN7_3/([^/]+?)\\.json$",
-      "namedDataRouteRegex": "^/_next/data/YpQ\\-I4VL\\-aEdQrM5uN7_3/(?<nxtPsystem>[^/]+?)\\.json$"
+      "dataRouteRegex": "^/_next/data/SCF0o7YxElB9rzWaOohsA/([^/]+?)\\.json$",
+      "namedDataRouteRegex": "^/_next/data/SCF0o7YxElB9rzWaOohsA/(?<nxtPsystem>[^/]+?)\\.json$"
     }
   ],
   "rsc": {