npm - elasticdash-sdk - Versions diffs - 0.2.8 → 0.2.9 - Mend

elasticdash-sdk 0.2.8 → 0.2.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/README.md +15 -8
package/dist/index.cjs +445 -50
package/dist/interceptors/ai-interceptor.d.ts +28 -0
package/dist/interceptors/ai-interceptor.d.ts.map +1 -1
package/dist/interceptors/ai-interceptor.js +535 -48
package/dist/interceptors/ai-interceptor.js.map +1 -1
package/dist/interceptors/http.d.ts.map +1 -1
package/dist/interceptors/http.js +14 -9
package/dist/interceptors/http.js.map +1 -1
package/dist/interceptors/workflow-ai.d.ts.map +1 -1
package/dist/interceptors/workflow-ai.js +5 -1
package/dist/interceptors/workflow-ai.js.map +1 -1
package/docs/agent-integration-guide.md +9 -5
package/docs/matchers.md +1 -1
package/docs/quickstart.md +8 -0
package/docs/security-compliance.md +1 -1
package/docs/test-writing-guidelines.md +23 -0
package/package.json +1 -1
package/src/interceptors/ai-interceptor.ts +528 -46
package/src/interceptors/http.ts +17 -11
package/src/interceptors/workflow-ai.ts +5 -1

package/README.md CHANGED

@@ -28,7 +28,7 @@ An AI-native test runner for ElasticDash workflow testing. Built for async AI pi
 ## Features
 - 🎯 **Trace-first testing** — every test gets a `trace` context to record and assert on LLM calls and tool invocations
-- 🔍 **Automatic AI interception** — captures OpenAI, Gemini, and Grok calls without code changes
+- 🔍 **Automatic AI interception** — captures OpenAI, Anthropic, Gemini, Grok, Kimi, and AWS Bedrock calls without code changes
 - 🧪 **AI-specific matchers** — semantic output matching, LLM-judged evaluations, prompt assertions
 - 🛠️ **Tool & LLM recording & replay** — automatically trace tool and AI calls with checkpoint-based replay and mock support
 - 📊 **Interactive dashboard** — browse workflows, debug traces, validate fixes visually
@@ -201,7 +201,7 @@ Duration: 3.4s
 ### Recording Trace Data
-**Automatic (recommended):** Workflow code making real API calls to OpenAI, Gemini, or Grok is automatically intercepted and recorded.
+**Automatic (recommended):** Workflow code making real API calls to OpenAI, Anthropic, Gemini, Grok, Kimi, or AWS Bedrock is automatically intercepted and recorded.
 **Manual (for custom providers or mocks):**
@@ -255,9 +255,13 @@ The runner automatically intercepts and records calls to:
 - OpenAI (`api.openai.com`)
 - Gemini (`generativelanguage.googleapis.com`)
 - Grok/xAI (`api.x.ai`)
+- Kimi/Moonshot (`api.moonshot.ai`)
+- AWS Bedrock (`bedrock-runtime.<region>.amazonaws.com`) — both `InvokeModel`/`InvokeModelWithResponseStream` and `Converse`/`ConverseStream`
 No code changes needed — just run your workflow and assertions work automatically. Because these providers are auto-captured, most workflows do **not** need to wrap LLM calls with `wrapAI`. See [Picking a wrapper](#picking-a-wrapper) below.
+> **Note on Bedrock:** The interceptor sits on `globalThis.fetch`, so any code that reaches Bedrock through `fetch` is auto-captured (browsers, Workers, Deno, thin REST wrappers, and SDKs that use undici/fetch under the hood). `@aws-sdk/client-bedrock-runtime` on Node uses its own HTTP signer and bypasses `globalThis.fetch` — wrap those calls with `wrapAI({ provider: 'bedrock', model })` so events still get tagged and mocked rerun can match them. See [AWS Bedrock](#aws-bedrock) below.
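Because the interceptor sits on `globalThis.fetch`, it can only see traffic that goes through that function. Here is a minimal sketch of the mechanism, assuming Node 18+ or another runtime with global `fetch` and `Response`. `PROVIDER_HOSTS`, `classifyProvider`, and `installFetchInterceptor` are hypothetical names, not elasticdash-sdk exports. The host patterns come from the list above. The `api.anthropic.com` entry is an assumption, because this hunk does not show Anthropic's host.

```typescript
// Hypothetical sketch of a fetch-level AI interceptor; not elasticdash-sdk's actual code.
type AIEvent = { provider: string; url: string; status: number }

// Host patterns taken from the README list; the Anthropic host is an assumption.
const PROVIDER_HOSTS: Array<[string, RegExp]> = [
  ['openai', /^api\.openai\.com$/],
  ['anthropic', /^api\.anthropic\.com$/],
  ['gemini', /^generativelanguage\.googleapis\.com$/],
  ['grok', /^api\.x\.ai$/],
  ['kimi', /^api\.moonshot\.ai$/],
  ['bedrock', /^bedrock-runtime\.[a-z0-9-]+\.amazonaws\.com$/],
]

// Returns the provider tag for a URL, or undefined if the host is not a known AI provider.
function classifyProvider(url: string): string | undefined {
  const host = new URL(url).hostname
  return PROVIDER_HOSTS.find(([, pattern]) => pattern.test(host))?.[0]
}

// Replaces globalThis.fetch with a wrapper that reports matching calls once the
// response arrives; returns a function that restores the original fetch.
function installFetchInterceptor(record: (event: AIEvent) => void): () => void {
  const g = globalThis as { fetch: typeof fetch }
  const original = g.fetch
  g.fetch = async (input, init) => {
    const url =
      typeof input === 'string' ? input : input instanceof URL ? input.href : input.url
    const response = await original(input, init)
    const provider = classifyProvider(url)
    if (provider) record({ provider, url, status: response.status })
    return response
  }
  return () => {
    g.fetch = original
  }
}
```

A client with its own HTTP stack, such as `@aws-sdk/client-bedrock-runtime` on Node, never calls the patched function, so its requests never reach `classifyProvider`. That is the gap `wrapAI` covers.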
 ### Picking a wrapper
 The SDK exposes three wrappers that look similar but solve different problems. Pick by what your function actually does:
@@ -267,7 +271,7 @@ The SDK exposes three wrappers that look similar but solve different problems. P
 | Deterministic (REST call, DB query, file IO — no LLM inside) | **`edTool`** | Records as a `tool` event AND registers in the global tool registry so CLI `run-tool`, MCP `run_tool`, and dashboard rerun can find it by name. |
 | Exactly one LLM round-trip, AND you need prompt mocks, AI output mocks by name, OR the provider isn't auto-intercepted | **`wrapAI`** | Records as an `ai` event with token usage. Only `wrapAI` supports prompt rewriting (`resolvePromptMock` / `resolveUserPromptMock`) and named AI output mocks. |
 | An agent loop (LLM + inner tools, multiple round-trips) | **`edTool`** on the outer boundary | The inner LLM calls are auto-captured by the AI interceptor. Wrapping the outer agent with `wrapAI` would hide the inner detail. |
-| A direct single call to an auto-intercepted provider SDK (Anthropic / OpenAI / Gemini / Grok) | **No wrapper** | The AI interceptor already records it as an `ai` event with token usage. |
+| A direct single call to an auto-intercepted provider SDK (Anthropic / OpenAI / Gemini / Grok / Kimi / Bedrock via `fetch`) | **No wrapper** | The AI interceptor already records it as an `ai` event with token usage. |
 > **`wrapTool`** is the primitive that `edTool` builds on. Use `wrapTool` directly only when you specifically do not want registry registration — for example, wrapping an inline closure inside another function.
@@ -378,7 +382,11 @@ export const callClaude = wrapAI('claude-sonnet-4-5', async (messages: Anthropic
 #### AWS Bedrock
-Bedrock calls go through the AWS SDK (which uses Node's HTTP stack, not `globalThis.fetch`), so they are **not auto-intercepted**. Wrap them with `wrapAI` using the unified **Converse API** — its `{ usage: { inputTokens, outputTokens } }` response shape is auto-extracted, and tagging `provider` with the underlying vendor (e.g. `'claude'` for `anthropic.*` model IDs) means existing matchers like `expect(trace).toHaveLLMStep({ provider: 'claude' })` match Bedrock-served calls with no change:
+Bedrock is recognised by URL pattern (`bedrock-runtime.<region>.amazonaws.com`) and supports both API families: `InvokeModel` / `InvokeModelWithResponseStream` (including streaming via the binary `application/vnd.amazon.eventstream` format) and the unified `Converse` / `ConverseStream`. Model IDs, prompts, completions, and token usage are extracted automatically — including for cross-region inference profiles like `us.anthropic.…` or `au.anthropic.…`.
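The URL-based recognition described above can be sketched as a small parser over the Bedrock Runtime REST paths: `/model/{modelId}/invoke`, `/model/{modelId}/invoke-with-response-stream`, `/model/{modelId}/converse`, and `/model/{modelId}/converse-stream`. This is an illustrative sketch, not the interceptor's code. `parseBedrockUrl` and `KNOWN_VENDORS` are hypothetical. Treating the first known vendor segment as the vendor is an assumption about how cross-region profile prefixes could be handled.

```typescript
// Hypothetical Bedrock URL parser; operation names follow the Bedrock Runtime REST paths.
type BedrockOperation = 'invoke' | 'invoke-with-response-stream' | 'converse' | 'converse-stream'

interface BedrockCall {
  region: string
  modelId: string
  vendor: string | undefined
  operation: BedrockOperation
}

// Assumed vendor prefixes for the model families named in the README.
const KNOWN_VENDORS = new Set(['anthropic', 'meta', 'amazon', 'mistral', 'cohere', 'ai21'])

function parseBedrockUrl(url: string): BedrockCall | undefined {
  const { hostname, pathname } = new URL(url)
  const host = /^bedrock-runtime\.([a-z0-9-]+)\.amazonaws\.com$/.exec(hostname)
  if (!host) return undefined
  // Paths look like /model/{modelId}/{operation}; the model ID may be percent-encoded.
  const path =
    /^\/model\/([^/]+)\/(invoke|invoke-with-response-stream|converse|converse-stream)$/.exec(pathname)
  if (!path) return undefined
  const modelId = decodeURIComponent(path[1])
  // Cross-region profiles prepend a geography segment (e.g. "us.", "au."),
  // so scan the dot-separated segments for the first known vendor.
  const vendor = modelId.split('.').find((segment) => KNOWN_VENDORS.has(segment))
  return { region: host[1], modelId, vendor, operation: path[2] as BedrockOperation }
}
```

Decoding the path segment matters because HTTP clients commonly percent-encode model IDs, so a suffix like `v2:0` can arrive as `v2%3A0`.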
+**If your code reaches Bedrock through `globalThis.fetch`** (browsers, Cloudflare Workers, Deno, undici-based clients, or a thin REST wrapper), nothing else is required. The interceptor captures the call, records it as an `ai` event with token usage, freezes it during `rerun_step`, and replays it during `rerun_workflow_mocked`.
+**If your code uses `@aws-sdk/client-bedrock-runtime` on Node**, the AWS SDK runs through its own HTTP signer and bypasses `globalThis.fetch`. Wrap the call with `wrapAI` so events still get tagged and mocked rerun can match them — the Converse response's `{ usage: { inputTokens, outputTokens } }` shape is auto-extracted, and tagging `provider` with the underlying vendor (e.g. `'claude'` for `anthropic.*` model IDs) means existing matchers like `expect(trace).toHaveLLMStep({ provider: 'claude' })` work unchanged:
 ```ts
 import { wrapAI } from 'elasticdash-sdk'
@@ -399,21 +407,20 @@ export const callClaudeOnBedrock = wrapAI(
       inferenceConfig: { maxTokens: 1024 },
     }))
   },
-  { provider: 'claude', model: MODEL_ID },
+  { provider: 'bedrock', model: MODEL_ID },
 )
 ```
 Notes:
 - **Credentials** come from the standard AWS provider chain (env vars, shared credentials file, IAM role) — the SDK does not manage them.
-- **Other vendors on Bedrock** (Llama, Titan, Mistral, Cohere, AI21) use the same pattern — change `modelId` and tag `provider` with the underlying vendor name (e.g. `'meta'`, `'amazon'`).
-- **Dashboard reruns of Bedrock events are not supported.** Re-run the workflow that contains the call instead of clicking rerun on the individual step.
+- **Other vendors on Bedrock** (Llama, Titan, Mistral, Cohere, AI21) use the same pattern. For Converse the response shape is identical across vendors. For raw `InvokeModel`, Anthropic gets first-class extraction; other vendors fall back to a best-effort `outputText` / `generation` / `choices` lookup.
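The extraction order in the last note can be pictured as an ordered lookup. The sketch below is hypothetical: `extractBedrockResponse` is not an SDK export. It assumes these commonly documented body shapes:

- Converse returns `output.message.content[]` with camelCase `usage`.
- Anthropic's `InvokeModel` body returns `content[]` blocks with snake_case `usage`.
- Titan nests `outputText` under `results[]`.
- Llama returns `generation`.
- Chat-style vendors return `choices[]`.

```typescript
// Hypothetical best-effort extraction of completion text and token usage from Bedrock bodies.
interface Extracted {
  text: string | undefined
  inputTokens: number | undefined
  outputTokens: number | undefined
}

type Json = Record<string, any>

function extractBedrockResponse(body: Json): Extracted {
  // Converse response shape: output.message.content[] plus usage.inputTokens / outputTokens.
  if (body.output?.message) {
    const parts: Json[] = body.output.message.content ?? []
    return {
      text: parts.map((p) => p.text ?? '').join(''),
      inputTokens: body.usage?.inputTokens,
      outputTokens: body.usage?.outputTokens,
    }
  }
  // Anthropic Messages shape from InvokeModel: text blocks in content[], snake_case usage.
  if (Array.isArray(body.content)) {
    return {
      text: body.content.filter((b: Json) => b.type === 'text').map((b: Json) => b.text).join(''),
      inputTokens: body.usage?.input_tokens,
      outputTokens: body.usage?.output_tokens,
    }
  }
  // Best-effort fallbacks for other vendors; token usage is not extracted on this path.
  const text =
    body.results?.[0]?.outputText ?? // Amazon Titan text models
    body.generation ?? // Meta Llama
    body.choices?.[0]?.message?.content ?? // chat-completions-style bodies
    body.choices?.[0]?.text
  return { text, inputTokens: undefined, outputTokens: undefined }
}
```

In this sketch, only the Converse and Anthropic branches report token counts. Vendor-specific usage fields would each need their own lookup.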
 #### Use `wrapAI` when
 The function body is essentially one LLM round-trip, AND at least one of the following applies:
-- The provider is **not auto-intercepted** (anything outside Anthropic / OpenAI / Gemini / Grok — e.g., Mistral, Cohere, local Ollama, Bedrock).
+- The provider is **not auto-intercepted** (anything outside Anthropic / OpenAI / Gemini / Grok / Kimi / Bedrock — e.g., Mistral direct, Cohere direct, local Ollama), or the SDK bypasses `globalThis.fetch` (notably `@aws-sdk/client-bedrock-runtime` on Node).
 - You want **prompt mocks** — system or user prompt rewriting via `resolvePromptMock` / `resolveUserPromptMock` keyed by the name you pass to `wrapAI`. This is exclusive to `wrapAI`.
 - You want **AI output mocks keyed by a named step** — e.g., mock the `"router"` call without mocking every call to the same model. `resolveAIMock` keys off the name argument.
 - You want **one labelled boundary per logical step** in the trace (e.g., `"router"`, `"summarizer"`) with token usage attributed to that label, distinct from the raw provider-level event.