npm - elasticdash-test - Versions diffs - 0.1.26 → 0.1.27 - Mend

elasticdash-test 0.1.26 → 0.1.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (85)

package/README.md +100 -0
package/dist/cli.js +175 -0
package/dist/cli.js.map +1 -1
package/dist/index.cjs +62 -1
package/dist/index.d.ts +2 -0
package/dist/index.d.ts.map +1 -1
package/dist/index.js +2 -0
package/dist/index.js.map +1 -1
package/dist/tool-registry.d.ts +31 -0
package/dist/tool-registry.d.ts.map +1 -0
package/dist/tool-registry.js +73 -0
package/dist/tool-registry.js.map +1 -0
package/dist/tool-runner-worker.js +19 -2
package/dist/tool-runner-worker.js.map +1 -1
package/dist/utils/debug.d.ts +1 -1
package/dist/utils/debug.d.ts.map +1 -1
package/dist/utils/debug.js +2 -2
package/dist/utils/debug.js.map +1 -1
package/docs/observability_contract.md +192 -0
package/package.json +2 -2
package/src/cli.ts +184 -0
package/src/index.ts +4 -0
package/src/tool-registry.ts +94 -0
package/src/tool-runner-worker.ts +17 -2
package/src/utils/debug.ts +2 -2
package/dist/cloud-client.d.ts +0 -34
package/dist/cloud-client.d.ts.map +0 -1
package/dist/cloud-client.js +0 -103
package/dist/cloud-client.js.map +0 -1
package/dist/evaluators/determinism.d.ts +0 -3
package/dist/evaluators/determinism.d.ts.map +0 -1
package/dist/evaluators/determinism.js +0 -116
package/dist/evaluators/determinism.js.map +0 -1
package/dist/evaluators/index.d.ts +0 -4
package/dist/evaluators/index.d.ts.map +0 -1
package/dist/evaluators/index.js +0 -61
package/dist/evaluators/index.js.map +0 -1
package/dist/evaluators/latency-budget.d.ts +0 -3
package/dist/evaluators/latency-budget.d.ts.map +0 -1
package/dist/evaluators/latency-budget.js +0 -45
package/dist/evaluators/latency-budget.js.map +0 -1
package/dist/evaluators/llm-judge.d.ts +0 -3
package/dist/evaluators/llm-judge.d.ts.map +0 -1
package/dist/evaluators/llm-judge.js +0 -125
package/dist/evaluators/llm-judge.js.map +0 -1
package/dist/evaluators/output-contains.d.ts +0 -3
package/dist/evaluators/output-contains.d.ts.map +0 -1
package/dist/evaluators/output-contains.js +0 -52
package/dist/evaluators/output-contains.js.map +0 -1
package/dist/evaluators/output-schema.d.ts +0 -3
package/dist/evaluators/output-schema.d.ts.map +0 -1
package/dist/evaluators/output-schema.js +0 -58
package/dist/evaluators/output-schema.js.map +0 -1
package/dist/evaluators/token-budget.d.ts +0 -3
package/dist/evaluators/token-budget.d.ts.map +0 -1
package/dist/evaluators/token-budget.js +0 -45
package/dist/evaluators/token-budget.js.map +0 -1
package/dist/evaluators/types.d.ts +0 -104
package/dist/evaluators/types.d.ts.map +0 -1
package/dist/evaluators/types.js +0 -6
package/dist/evaluators/types.js.map +0 -1
package/dist/test-group/cli.d.ts +0 -8
package/dist/test-group/cli.d.ts.map +0 -1
package/dist/test-group/cli.js +0 -162
package/dist/test-group/cli.js.map +0 -1
package/dist/test-group/git-context.d.ts +0 -3
package/dist/test-group/git-context.d.ts.map +0 -1
package/dist/test-group/git-context.js +0 -59
package/dist/test-group/git-context.js.map +0 -1
package/dist/test-group/reporter.d.ts +0 -4
package/dist/test-group/reporter.d.ts.map +0 -1
package/dist/test-group/reporter.js +0 -54
package/dist/test-group/reporter.js.map +0 -1
package/dist/test-group/runner.d.ts +0 -18
package/dist/test-group/runner.d.ts.map +0 -1
package/dist/test-group/runner.js +0 -234
package/dist/test-group/runner.js.map +0 -1
package/dist/tracing-universal.d.ts +0 -13
package/dist/tracing-universal.d.ts.map +0 -1
package/dist/tracing-universal.js +0 -33
package/dist/tracing-universal.js.map +0 -1
package/docs/backend_rerun_alignment.md +0 -291
package/docs/backend_traceid_update.md +0 -141
package/docs/observability_backend_contract.md +0 -577
package/docs/observability_rerun_backend_plan.md +0 -596

package/docs/observability_backend_contract.md DELETED

@@ -1,577 +0,0 @@
-# Observability Backend & Dashboard Contract
-This document specifies exactly what the backend and dashboard frontend need to implement to support the SDK's new observability mode. The SDK implementation is complete — it sends events to the endpoints described below.
----
-## Part 1: Backend API Contract
-### Authentication
-All observability endpoints require a project-scoped API key passed as:
-```
-Authorization: Bearer <ELASTICDASH_API_KEY>
-```
-Return `401 Unauthorized` if the key is missing or invalid. The key should resolve to a `projectId` used for multi-tenant scoping.
----
-### Endpoint 1: `POST /api/observability/events`
-**Purpose:** Batch ingest of trace events from the SDK's `TelemetryBatcher`.
-**Request:**
-```json
-{
-  "sessionId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
-  "serviceId": "my-ai-app",
-  "events": [
-    {
-      "id": 1,
-      "type": "ai",
-      "name": "gpt-4o",
-      "input": { "messages": [{ "role": "user", "content": "Hello" }] },
-      "output": { "choices": [{ "message": { "role": "assistant", "content": "Hi!" } }] },
-      "timestamp": 1712851200000,
-      "durationMs": 1230,
-      "usage": { "inputTokens": 5, "outputTokens": 3, "totalTokens": 8 },
-      "streamed": false,
-      "schemaVersion": 1,
-      "traceId": "req-abc-123"
-    },
-    {
-      "id": 2,
-      "type": "tool",
-      "name": "searchDB",
-      "input": { "query": "pikachu" },
-      "output": { "results": ["..."] },
-      "timestamp": 1712851201230,
-      "durationMs": 45,
-      "schemaVersion": 1,
-      "traceId": "req-abc-123"
-    },
-    {
-      "id": 3,
-      "type": "side_effect",
-      "name": "__heartbeat__",
-      "input": { "sessionId": "f47ac10b-...", "serviceId": "my-ai-app" },
-      "output": { "uptime": 3600.5 },
-      "timestamp": 1712851230000,
-      "durationMs": 0,
-      "schemaVersion": 1
-    }
-  ]
-}
-```
-**Response:**
-```
-202 Accepted
-{ "ok": true, "ingested": 3 }
-```
-**Error responses:**
-| Status | Body | Meaning |
-|--------|------|---------|
-| `400` | `{ "ok": false, "error": "..." }` | Malformed payload (missing sessionId, invalid events array) |
-| `401` | `{ "ok": false, "error": "unauthorized" }` | Missing or invalid API key |
-| `429` | `{ "ok": false, "error": "rate_limited" }` | Per-serviceId rate limit exceeded — SDK retries with backoff |
-| `500+` | `{ "ok": false, "error": "..." }` | Server error — SDK retries with backoff |
-**SDK retry behavior:** On `429` or `5xx`, the SDK retries up to 3 times with exponential backoff (1s, 2s, 4s). After 3 failures, events are dropped. The backend should return `202` quickly and process events asynchronously.
-**Rate limiting recommendation:** 1000 events/min per `serviceId`.
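Review note: the retry behavior described in the removed contract can be sketched as below. This is an illustration, not the SDK's code. The helper names (`isRetryable`, `backoffDelaysMs`, `sendWithRetry`) and the injected `send`/`sleep` functions are hypothetical. The sketch also assumes that "retries up to 3 times" means one initial attempt followed by at most three retries.

```typescript
// Hypothetical helpers; only the status codes, retry count, and delays
// (1s, 2s, 4s) are taken from the removed contract.
type SendFn = () => Promise<{ status: number }>;
type SleepFn = (ms: number) => Promise<void>;

// 429 and any 5xx status are treated as retryable; everything else is final.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

// Delay before retry 1, 2, 3: baseMs, 2 * baseMs, 4 * baseMs.
function backoffDelaysMs(maxRetries = 3, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

// Resolves to true if the batch was accepted (2xx), false if it was dropped.
// Assumption: one initial attempt plus up to three retries.
async function sendWithRetry(send: SendFn, sleep: SleepFn): Promise<boolean> {
  let res = await send();
  for (const delay of backoffDelaysMs()) {
    if (!isRetryable(res.status)) break;
    await sleep(delay);
    res = await send();
  }
  return res.status >= 200 && res.status < 300;
}
```

Passing `send` and `sleep` in as parameters keeps the sketch independent of any particular HTTP client or timer.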
----
-### Endpoint 2: `POST /api/observability/sessions`
-**Purpose:** Register a new observability session. Called by the SDK on `initObservability()` (not yet implemented in SDK — backend should auto-create sessions from the first event batch as a fallback).
-**Request:**
-```json
-{
-  "sessionId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
-  "serviceId": "my-ai-app",
-  "metadata": {
-    "nodeVersion": "20.11.0",
-    "platform": "linux"
-  },
-  "startedAt": 1712851200000
-}
-```
-**Response:**
-```
-201 Created
-{ "ok": true, "sessionId": "f47ac10b-..." }
-```
----
-### Endpoint 3: `GET /api/observability/sessions`
-**Purpose:** List active and recent sessions for a service.
-**Query params:**
-| Param | Required | Description |
-|-------|----------|-------------|
-| `serviceId` | No | Filter by service |
-| `status` | No | `active` or `ended` (default: all) |
-| `limit` | No | Max results (default 50) |
-| `offset` | No | Pagination offset |
-**Response:**
-```json
-{
-  "sessions": [
-    {
-      "sessionId": "f47ac10b-...",
-      "serviceId": "my-ai-app",
-      "startedAt": 1712851200000,
-      "lastHeartbeat": 1712854800000,
-      "endedAt": null,
-      "eventCount": 1247,
-      "metadata": {}
-    }
-  ],
-  "total": 1
-}
-```
-**Session status inference:**
-- `active`: `endedAt` is null AND `lastHeartbeat` was within the last 90 seconds (3x heartbeat interval)
-- `ended`: `endedAt` is set OR `lastHeartbeat` is older than 90 seconds
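Review note: the status inference rule above reduces to a small pure function. The sketch below is illustrative. The type and function names are hypothetical, and it assumes that a heartbeat exactly 90 seconds old still counts as "within the last 90 seconds".

```typescript
// Hypothetical shape; field names follow the session objects in the response example.
type SessionRow = { endedAt: number | null; lastHeartbeat: number };

const HEARTBEAT_INTERVAL_MS = 30000; // default heartbeat interval stated in the contract
const LIVENESS_WINDOW_MS = 3 * HEARTBEAT_INTERVAL_MS; // 90 seconds

// A session is active only if it has not ended and its last heartbeat is recent.
function inferStatus(session: SessionRow, nowMs: number): "active" | "ended" {
  const heartbeatIsFresh = nowMs - session.lastHeartbeat <= LIVENESS_WINDOW_MS;
  return session.endedAt === null && heartbeatIsFresh ? "active" : "ended";
}
```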
----
-### Endpoint 4: `GET /api/observability/events`
-**Purpose:** Query stored events with filters.
-**Query params:**
-| Param | Required | Description |
-|-------|----------|-------------|
-| `sessionId` | Yes* | Filter by session |
-| `serviceId` | Yes* | Filter by service (*at least one required) |
-| `type` | No | `ai`, `tool`, `http`, `db`, `side_effect` |
-| `name` | No | Event name (tool/model name) |
-| `traceId` | No | Request-level trace ID |
-| `from` | No | Start timestamp (ms) |
-| `to` | No | End timestamp (ms) |
-| `limit` | No | Max results (default 100, max 1000) |
-| `offset` | No | Pagination offset |
-| `sort` | No | `asc` or `desc` by timestamp (default `desc`) |
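Review note: a backend implementing this query could normalize the parameters roughly as follows. This is a sketch under assumptions. `normalizeEventQuery` is a hypothetical name, and clamping an over-limit value to 1000 (rather than rejecting it) is a choice the contract does not specify.

```typescript
// Hypothetical query shape covering only the parameters with defaults or constraints.
type EventQuery = {
  sessionId?: string;
  serviceId?: string;
  limit?: number;
  sort?: "asc" | "desc";
};

function normalizeEventQuery(q: EventQuery) {
  // At least one of sessionId / serviceId is required by the contract.
  if (!q.sessionId && !q.serviceId) {
    throw new Error("sessionId or serviceId is required");
  }
  // Default 100, max 1000. Clamping is an assumption, not stated in the contract.
  const limit = Math.min(Math.max(q.limit ?? 100, 1), 1000);
  return { ...q, limit, sort: q.sort ?? "desc" };
}
```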
-**Response:**
-```json
-{
-  "events": [ /* WorkflowEvent[] */ ],
-  "total": 1247,
-  "hasMore": true
-}
-```
----
-### Endpoint 5: `GET /api/observability/stats`
-**Purpose:** Aggregated metrics for a service over a time window.
-**Query params:**
-| Param | Required | Description |
-|-------|----------|-------------|
-| `serviceId` | Yes | Service to aggregate |
-| `window` | No | `1h`, `6h`, `24h`, `7d` (default `1h`) |
-| `from` | No | Custom window start (overrides `window`) |
-| `to` | No | Custom window end |
-**Response:**
-```json
-{
-  "serviceId": "my-ai-app",
-  "window": { "from": 1712847600000, "to": 1712851200000 },
-  "summary": {
-    "totalEvents": 1247,
-    "aiCalls": 340,
-    "toolCalls": 890,
-    "errors": 17,
-    "activeSessions": 2
-  },
-  "ai": {
-    "byModel": {
-      "gpt-4o": {
-        "count": 200,
-        "avgDurationMs": 1230,
-        "p95DurationMs": 3400,
-        "p99DurationMs": 5200,
-        "totalInputTokens": 150000,
-        "totalOutputTokens": 45000,
-        "errorCount": 5
-      },
-      "claude-sonnet-4-6": {
-        "count": 140,
-        "avgDurationMs": 980,
-        "p95DurationMs": 2100,
-        "p99DurationMs": 3800,
-        "totalInputTokens": 120000,
-        "totalOutputTokens": 38000,
-        "errorCount": 2
-      }
-    }
-  },
-  "tools": {
-    "byName": {
-      "searchDB": { "count": 450, "avgDurationMs": 45, "p95DurationMs": 120, "errorCount": 3 },
-      "sendEmail": { "count": 440, "avgDurationMs": 200, "p95DurationMs": 500, "errorCount": 7 }
-    }
-  }
-}
-```
----
-### Storage Schema
-#### Table: `observability_events`
-| Column | Type | Index | Description |
-|--------|------|-------|-------------|
-| `id` | UUID / BIGSERIAL | PK | Internal row ID |
-| `project_id` | UUID | YES | Resolved from API key |
-| `session_id` | UUID | YES | From payload |
-| `service_id` | VARCHAR(255) | YES | From payload |
-| `event_id` | INT | — | The SDK-assigned sequential ID |
-| `event_type` | VARCHAR(20) | YES | `ai`, `tool`, `http`, `db`, `side_effect` |
-| `event_name` | VARCHAR(500) | YES | Tool/model name |
-| `trace_id` | VARCHAR(255) | YES | Request-level grouping |
-| `input` | JSONB | — | Event input payload |
-| `output` | JSONB | — | Event output payload |
-| `timestamp` | BIGINT | YES | Unix ms from SDK |
-| `duration_ms` | INT | — | Execution time |
-| `usage_input_tokens` | INT | — | LLM input tokens |
-| `usage_output_tokens` | INT | — | LLM output tokens |
-| `usage_total_tokens` | INT | — | LLM total tokens |
-| `streamed` | BOOLEAN | — | Was response streamed |
-| `stream_raw` | TEXT | — | Buffered stream text |
-| `schema_version` | SMALLINT | — | Default 1 |
-| `created_at` | TIMESTAMPTZ | — | Row insert time |
-**Composite indexes:**
-- `(project_id, session_id, timestamp)`
-- `(project_id, service_id, timestamp)`
-- `(project_id, service_id, event_type, timestamp)`
-#### Table: `observability_sessions`
-| Column | Type | Index | Description |
-|--------|------|-------|-------------|
-| `session_id` | UUID | PK | From SDK |
-| `project_id` | UUID | YES | Resolved from API key |
-| `service_id` | VARCHAR(255) | YES | From SDK |
-| `started_at` | BIGINT | YES | Unix ms |
-| `last_heartbeat` | BIGINT | — | Updated on `__heartbeat__` events |
-| `ended_at` | BIGINT | — | Set on `__session_end__` event |
-| `event_count` | INT | — | Running counter |
-| `metadata` | JSONB | — | Session metadata |
-| `created_at` | TIMESTAMPTZ | — | Row insert time |
----
-### Event Processing Logic
-When receiving a batch at `POST /api/observability/events`:
-1. Validate payload schema
-2. Resolve `projectId` from API key
-3. Upsert session in `observability_sessions` (create if first batch for this sessionId)
-4. For each event:
-   - If `name === '__heartbeat__'`: update `last_heartbeat` on the session row, do not store as event
-   - If `name === '__session_end__'`: set `ended_at` on the session row, do not store as event
-   - Otherwise: insert into `observability_events`
-5. Increment `event_count` on the session row
-6. Return `202 Accepted`
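Review note: steps 3 to 5 of the processing logic above can be sketched with in-memory maps standing in for the two tables. All names here are hypothetical. The sketch makes two assumptions the contract leaves open: `event_count` counts only stored events (not heartbeat or session-end markers), and the session timestamps are taken from the event's own `timestamp` rather than server time.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
type IncomingEvent = { id: number; type: string; name: string; timestamp: number };
type Session = {
  sessionId: string;
  lastHeartbeat: number | null;
  endedAt: number | null;
  eventCount: number;
};

// In-memory stand-ins for observability_sessions and observability_events.
const sessions = new Map<string, Session>();
const storedEvents: IncomingEvent[] = [];

// Returns the number of events actually stored.
function processBatch(sessionId: string, events: IncomingEvent[]): number {
  // Step 3: upsert the session row.
  let session = sessions.get(sessionId);
  if (!session) {
    session = { sessionId, lastHeartbeat: null, endedAt: null, eventCount: 0 };
    sessions.set(sessionId, session);
  }

  // Step 4: route each event by name.
  let stored = 0;
  for (const ev of events) {
    if (ev.name === "__heartbeat__") {
      session.lastHeartbeat = ev.timestamp; // liveness only, not stored
    } else if (ev.name === "__session_end__") {
      session.endedAt = ev.timestamp; // marks the session ended, not stored
    } else {
      storedEvents.push(ev);
      stored += 1;
    }
  }

  // Step 5: increment the running counter (assumption: stored events only).
  session.eventCount += stored;
  return stored;
}
```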
----
-## Part 2: Dashboard Frontend Contract
-### Navigation
-Add a top-level **Observability** nav item alongside the existing test/workflow views.
-### Page 1: Services Overview (`/observability`)
-**Data source:** `GET /api/observability/sessions?status=active`
-**Display:**
-- Table of services with columns: Service ID, Active Sessions, Last Heartbeat (relative time), Total Events (24h)
-- Each row links to the service detail page
-- Auto-refresh every 10 seconds
-### Page 2: Service Detail (`/observability/:serviceId`)
-**Data sources:**
-- `GET /api/observability/sessions?serviceId=X`
-- `GET /api/observability/stats?serviceId=X&window=1h`
-**Display:**
-- **Header:** Service name, active session count, status badge (active/idle)
-- **Metrics cards:** Total events, AI calls, tool calls, error rate (from stats endpoint)
-- **Charts:**
-  - AI call volume over time (line chart, grouped by model)
-  - Average latency over time per model (line chart)
-  - Token usage breakdown (stacked bar: input vs output, by model)
-  - Error rate trend (line chart)
-- **Sessions table:** List of recent sessions with sessionId, startedAt, duration, eventCount, status
-- Time window selector: 1h / 6h / 24h / 7d
-### Page 3: Session Detail (`/observability/:serviceId/:sessionId`)
-**Data source:** `GET /api/observability/events?sessionId=X&limit=100`
-**Display:**
-- **Timeline/waterfall:** All events displayed chronologically as a vertical timeline
-  - Each event shows: type icon (AI/tool/http/db), name, duration bar, timestamp
-  - Color-coded by type
-  - Errors highlighted in red
-- **Event detail panel** (click to expand):
-  - Input viewer (syntax-highlighted JSON, collapsible)
-  - Output viewer (syntax-highlighted JSON, collapsible)
-  - Duration, token usage (for AI events)
-  - Stream raw content viewer (for streamed responses)
-- **Filters:** Type dropdown, name search, time range
-- **Grouping:** Option to group by `traceId` (shows request-level waterfall)
-- **Live mode:** SSE/WebSocket connection to stream new events in real-time for active sessions
-### Page 4: Live Trace View (`/observability/:serviceId/:sessionId/live`)
-**Data source:** SSE endpoint `GET /api/observability/events/stream?sessionId=X`
-**Display:**
-- Real-time event stream (newest at top or bottom, configurable)
-- Auto-scroll with pause button
-- Same event cards as session detail but streaming in
-- Connection status indicator (connected/reconnecting/disconnected)
----
-## Part 3: SDK → Backend Event Reference
-### Event Types the SDK Sends
-| `type` | `name` | When | Key fields |
-|--------|--------|------|------------|
-| `ai` | Model name (e.g. `gpt-4o`) | Every `wrapAI` call | `input`, `output`, `usage`, `durationMs`, `streamed` |
-| `tool` | Tool name (e.g. `searchDB`) | Every `wrapTool` call | `input`, `output`, `durationMs`, `streamed` |
-| `side_effect` | `__heartbeat__` | Every 30s (configurable) | `input.sessionId`, `output.uptime` |
-| `side_effect` | `__session_end__` | On `shutdownObservability()` | `input.sessionId`, `output.uptime` |
-### Special Events (do not display in trace UI)
-- `__heartbeat__` — update session liveness, do not store as event
-- `__session_end__` — mark session as ended, do not store as event
-### Streamed Events
-When `streamed === true`:
-- `output` is `null`
-- `streamRaw` contains the full buffered text of the stream
-- Display `streamRaw` as the output in the UI
-### Error Events
-When a tool or AI call throws:
-- `output` is `{ "error": "Error message string" }`
-- `durationMs` reflects time until failure
-- Display with error styling in the UI
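Review note: the streamed and error rules above suggest a small amount of display logic on the dashboard side. The sketch below is illustrative only. `displayOutput` and `isErrorOutput` are hypothetical names, and the event type is trimmed to the three fields the rules mention.

```typescript
// Hypothetical, trimmed-down event shape.
type TraceEvent = { output: unknown; streamed?: boolean; streamRaw?: string };

// For streamed events, output is null and streamRaw carries the buffered text.
function displayOutput(ev: TraceEvent): unknown {
  return ev.streamed === true ? ev.streamRaw ?? null : ev.output;
}

// Error events carry an output of the form { "error": "<message>" }.
function isErrorOutput(output: unknown): output is { error: string } {
  return (
    typeof output === "object" &&
    output !== null &&
    typeof (output as { error?: unknown }).error === "string"
  );
}
```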
----
-## Part 4: Portal (Remote Rerun Queue) Contract
-The SDK's `elasticdash portal` command starts an HTTP server that the backend can push rerun tasks to. The backend also needs endpoints to receive results.
-### SDK Portal Endpoints (hosted on user's machine, default port 4574)
-These endpoints are served by the SDK. The backend calls them.
-#### `POST /api/portal/tasks` — Push a single rerun task
-**Request:**
-```json
-{
-  "taskId": "task-uuid-from-backend",
-  "type": "tool",
-  "name": "searchDB",
-  "input": { "query": "pikachu" },
-  "metadata": { "testGroupId": 42, "expectationIds": [1, 2, 3] }
-}
-```
-For AI tasks:
-```json
-{
-  "taskId": "task-uuid-from-backend",
-  "type": "ai",
-  "name": "gpt-4o",
-  "input": { "messages": [{ "role": "user", "content": "Hello" }] },
-  "model": "gpt-4o",
-  "provider": "openai",
-  "modelParameters": { "temperature": 0.7, "max_tokens": 512 },
-  "metadata": { "testGroupId": 42 }
-}
-```
-**Response:** `202 Accepted`
-```json
-{ "ok": true, "taskId": "task-uuid-from-backend", "position": 3 }
-```
-**Auth:** `Authorization: Bearer <api_key>` (validated if portal was started with `--api-key`)
-#### `POST /api/portal/tasks/batch` — Push multiple tasks
-**Request:**
-```json
-{ "tasks": [ /* PortalTask[] */ ] }
-```
-**Response:** `202 Accepted`
-```json
-{ "ok": true, "tasks": [{ "taskId": "...", "position": 1 }, { "taskId": "...", "position": 2 }] }
-```
-#### `GET /api/portal/status` — Health check
-**Response:**
-```json
-{
-  "ok": true,
-  "queueLength": 5,
-  "processing": "task-uuid-123",
-  "completed": 12,
-  "failed": 1
-}
-```
-#### `DELETE /api/portal/tasks/:taskId` — Cancel a pending task
-**Response:** `200` if removed, `404` if not found or already processing.
----
-### Backend Endpoints (needed for portal to work)
-These endpoints must be implemented on the backend. The SDK calls them.
-#### `POST /api/portal/register` — Portal registration
-Called by the SDK when `elasticdash portal` starts.
-**Request:**
-```json
-{
-  "portalUrl": "http://localhost:4574"
-}
-```
-**Auth:** `Authorization: Bearer <api_key>`
-**Response:** `200 OK`
-```json
-{ "ok": true }
-```
-The backend should store this portal URL and use it to push tasks. The registration should be scoped to the project resolved from the API key.
-#### `POST /api/portal/results/:taskId` — Receive task result
-Called by the SDK after each task completes (success or failure).
-**Request:**
-```json
-{
-  "taskId": "task-uuid-from-backend",
-  "ok": true,
-  "output": "The search returned 3 results for pikachu...",
-  "durationMs": 245,
-  "usage": {
-    "inputTokens": 150,
-    "outputTokens": 45,
-    "totalTokens": 195
-  },
-  "metadata": { "testGroupId": 42, "expectationIds": [1, 2, 3] }
-}
-```
-For failed tasks:
-```json
-{
-  "taskId": "task-uuid-from-backend",
-  "ok": false,
-  "output": null,
-  "error": "Tool not found: \"searchDB\". Available tools: fetchData, sendEmail",
-  "durationMs": 0,
-  "metadata": { "testGroupId": 42 }
-}
-```
-**Auth:** `Authorization: Bearer <api_key>`
-**Response:** `200 OK`
-```json
-{ "ok": true }
-```
-**Backend responsibilities on receiving a result:**
-1. Store the result (output, durationMs, usage, error)
-2. Run evaluations (llm-judge, output-schema, token-budget, latency-budget, etc.)
-3. Update the test group run status
-4. If all tasks for a test group are complete, mark the run as finished
----
-### Error Results Reference
-The SDK sends these error patterns — the backend should handle them gracefully:
-| Error pattern | Meaning |
-|--------------|---------|
-| `Tool not found: "<name>". Available tools: ...` | Tool doesn't exist in `ed_tools.ts` |
-| `Cannot find ed_tools.ts/js in workspace root.` | No tools module in the project |
-| `Unsupported AI provider: "<name>"` | Unknown provider string |
-| `Missing API key for provider "<name>". Expected environment variable: <VAR>` | LLM API key not configured |
-| `AI task input is empty; cannot execute.` | No prompt could be extracted from input |
-| `AI execution failed: <message>` | LLM API call failed (rate limit, network, invalid model) |
-| `Tool subprocess produced no output.` | Subprocess exited without result |
-| `Failed to spawn tool subprocess: <message>` | Could not start subprocess |
-| `Missing tool name on task.` | Task had no `name` field |
-| `Unknown task type: <type>` | Task type was neither `tool` nor `ai` |
----
-## Part 5: Existing Endpoint Compatibility
-The existing test-run endpoints remain unchanged:
-| Endpoint | Used by | Change |
-|----------|---------|--------|
-| `POST /api/trace-events` | HTTP workflow test mode | No change — still fire-and-forget per event |
-| `GET /api/run-configs/:runId` | HTTP workflow test mode | No change |
-| `POST /api/validate-workflow` | Dashboard test runs | No change |
-Observability mode uses a completely separate set of endpoints (`/api/observability/*`). There is no overlap or conflict with test-run endpoints.