elasticdash-sdk 0.2.8 → 0.2.9
- package/README.md +15 -8
- package/dist/index.cjs +445 -50
- package/dist/interceptors/ai-interceptor.d.ts +28 -0
- package/dist/interceptors/ai-interceptor.d.ts.map +1 -1
- package/dist/interceptors/ai-interceptor.js +535 -48
- package/dist/interceptors/ai-interceptor.js.map +1 -1
- package/dist/interceptors/http.d.ts.map +1 -1
- package/dist/interceptors/http.js +14 -9
- package/dist/interceptors/http.js.map +1 -1
- package/dist/interceptors/workflow-ai.d.ts.map +1 -1
- package/dist/interceptors/workflow-ai.js +5 -1
- package/dist/interceptors/workflow-ai.js.map +1 -1
- package/docs/agent-integration-guide.md +9 -5
- package/docs/matchers.md +1 -1
- package/docs/quickstart.md +8 -0
- package/docs/security-compliance.md +1 -1
- package/docs/test-writing-guidelines.md +23 -0
- package/package.json +1 -1
- package/src/interceptors/ai-interceptor.ts +528 -46
- package/src/interceptors/http.ts +17 -11
- package/src/interceptors/workflow-ai.ts +5 -1
package/README.md
CHANGED
|
@@ -28,7 +28,7 @@ An AI-native test runner for ElasticDash workflow testing. Built for async AI pi
|
|
|
28
28
|
## Features
|
|
29
29
|
|
|
30
30
|
- 🎯 **Trace-first testing** - every test gets a `trace` context to record and assert on LLM calls and tool invocations
|
|
31
|
-
- **Automatic AI interception** - captures OpenAI, Gemini, and
|
|
31
|
+
- **Automatic AI interception** - captures OpenAI, Anthropic, Gemini, Grok, Kimi, and AWS Bedrock calls without code changes
|
|
32
32
|
- 🧪 **AI-specific matchers** - semantic output matching, LLM-judged evaluations, prompt assertions
|
|
33
33
|
- 🛠️ **Tool & LLM recording & replay** - automatically trace tool and AI calls with checkpoint-based replay and mock support
|
|
34
34
|
- **Interactive dashboard** - browse workflows, debug traces, validate fixes visually
|
|
@@ -201,7 +201,7 @@ Duration: 3.4s
|
|
|
201
201
|
|
|
202
202
|
### Recording Trace Data
|
|
203
203
|
|
|
204
|
-
**Automatic (recommended):** Workflow code making real API calls to OpenAI, Gemini, or
|
|
204
|
+
**Automatic (recommended):** Workflow code making real API calls to OpenAI, Anthropic, Gemini, Grok, Kimi, or AWS Bedrock is automatically intercepted and recorded.
|
|
205
205
|
|
|
206
206
|
**Manual (for custom providers or mocks):**
|
|
207
207
|
|
|
@@ -255,9 +255,13 @@ The runner automatically intercepts and records calls to:
|
|
|
255
255
|
- OpenAI (`api.openai.com`)
|
|
256
256
|
- Gemini (`generativelanguage.googleapis.com`)
|
|
257
257
|
- Grok/xAI (`api.x.ai`)
|
|
258
|
+
- Kimi/Moonshot (`api.moonshot.ai`)
|
|
259
|
+
- AWS Bedrock (`bedrock-runtime.<region>.amazonaws.com`) - both `InvokeModel`/`InvokeModelWithResponseStream` and `Converse`/`ConverseStream`
|
|
258
260
|
|
|
259
261
|
No code changes needed - just run your workflow and assertions work automatically. Because these providers are auto-captured, most workflows do **not** need to wrap LLM calls with `wrapAI`. See [Picking a wrapper](#picking-a-wrapper) below.
|
|
260
262
|
|
|
263
|
+
> **Note on Bedrock:** The interceptor sits on `globalThis.fetch`, so any code that reaches Bedrock through `fetch` is auto-captured (browsers, Workers, Deno, thin REST wrappers, and SDKs that use undici/fetch under the hood). `@aws-sdk/client-bedrock-runtime` on Node uses its own HTTP signer and bypasses `globalThis.fetch` - wrap those calls with `wrapAI({ provider: 'bedrock', model })` so events still get tagged and mocked rerun can match them. See [AWS Bedrock](#aws-bedrock) below.
|
|
264
|
+
|
|
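The URL-based recognition described in the note above can be illustrated with a small host classifier. This is a minimal sketch, not the SDK's implementation: the `Provider` type, `FIXED_HOSTS`, `BEDROCK_HOST`, and `classifyHost` are hypothetical names, and only the hostnames visible in this hunk are included.

```ts
// Hypothetical names throughout; only the hostnames come from the README list.
type Provider = 'openai' | 'gemini' | 'grok' | 'kimi' | 'bedrock'

const FIXED_HOSTS = new Map<string, Provider>([
  ['api.openai.com', 'openai'],
  ['generativelanguage.googleapis.com', 'gemini'],
  ['api.x.ai', 'grok'],
  ['api.moonshot.ai', 'kimi'],
])

// Bedrock hosts carry a region segment: bedrock-runtime.<region>.amazonaws.com
const BEDROCK_HOST = /^bedrock-runtime\.[a-z0-9-]+\.amazonaws\.com$/

// Returns the provider for a request URL, or null when the host is not recognised.
function classifyHost(url: string): Provider | null {
  // Simplified host extraction (no userinfo handling); enough for a sketch.
  const match = /^https?:\/\/([^/:?#]+)/i.exec(url)
  if (!match) return null
  const host = match[1].toLowerCase()
  const fixed = FIXED_HOSTS.get(host)
  if (fixed) return fixed
  return BEDROCK_HOST.test(host) ? 'bedrock' : null
}
```

A classifier like this only ever sees requests that pass through the patched `fetch`, which is why a client with its own HTTP stack never reaches it and needs an explicit wrapper instead.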
261
265
|
### Picking a wrapper
|
|
262
266
|
|
|
263
267
|
The SDK exposes three wrappers that look similar but solve different problems. Pick by what your function actually does:
|
|
@@ -267,7 +271,7 @@ The SDK exposes three wrappers that look similar but solve different problems. P
|
|
|
267
271
|
| Deterministic (REST call, DB query, file IO - no LLM inside) | **`edTool`** | Records as a `tool` event AND registers in the global tool registry so CLI `run-tool`, MCP `run_tool`, and dashboard rerun can find it by name. |
|
|
268
272
|
| Exactly one LLM round-trip, AND you need prompt mocks, AI output mocks by name, OR the provider isn't auto-intercepted | **`wrapAI`** | Records as an `ai` event with token usage. Only `wrapAI` supports prompt rewriting (`resolvePromptMock` / `resolveUserPromptMock`) and named AI output mocks. |
|
|
269
273
|
| An agent loop (LLM + inner tools, multiple round-trips) | **`edTool`** on the outer boundary | The inner LLM calls are auto-captured by the AI interceptor. Wrapping the outer agent with `wrapAI` would hide the inner detail. |
|
|
270
|
-
| A direct single call to an auto-intercepted provider SDK (Anthropic / OpenAI / Gemini / Grok) | **No wrapper** | The AI interceptor already records it as an `ai` event with token usage. |
|
|
274
|
+
| A direct single call to an auto-intercepted provider SDK (Anthropic / OpenAI / Gemini / Grok / Kimi / Bedrock via `fetch`) | **No wrapper** | The AI interceptor already records it as an `ai` event with token usage. |
|
|
271
275
|
|
|
272
276
|
> **`wrapTool`** is the primitive that `edTool` builds on. Use `wrapTool` directly only when you specifically do not want registry registration - for example, wrapping an inline closure inside another function.
|
|
273
277
|
|
|
@@ -378,7 +382,11 @@ export const callClaude = wrapAI('claude-sonnet-4-5', async (messages: Anthropic
|
|
|
378
382
|
|
|
379
383
|
#### AWS Bedrock
|
|
380
384
|
|
|
381
|
-
Bedrock
|
|
385
|
+
Bedrock is recognised by URL pattern (`bedrock-runtime.<region>.amazonaws.com`) and supports both API families: `InvokeModel` / `InvokeModelWithResponseStream` (including streaming via the binary `application/vnd.amazon.eventstream` format) and the unified `Converse` / `ConverseStream`. Model IDs, prompts, completions, and token usage are extracted automatically - including for cross-region inference profiles like `us.anthropic.…` or `au.anthropic.…`.
|
|
386
|
+
|
|
387
|
+
**If your code reaches Bedrock through `globalThis.fetch`** (browsers, Cloudflare Workers, Deno, undici-based clients, or a thin REST wrapper), nothing else is required. The interceptor captures the call, records it as an `ai` event with token usage, freezes it during `rerun_step`, and replays it during `rerun_workflow_mocked`.
|
|
388
|
+
|
|
389
|
+
**If your code uses `@aws-sdk/client-bedrock-runtime` on Node**, the AWS SDK runs through its own HTTP signer and bypasses `globalThis.fetch`. Wrap the call with `wrapAI` so events still get tagged and mocked rerun can match them - the Converse response's `{ usage: { inputTokens, outputTokens } }` shape is auto-extracted, and tagging `provider` with the underlying vendor (e.g. `'claude'` for `anthropic.*` model IDs) means existing matchers like `expect(trace).toHaveLLMStep({ provider: 'claude' })` work unchanged:
|
|
382
390
|
|
|
383
391
|
```ts
|
|
384
392
|
import { wrapAI } from 'elasticdash-sdk'
|
|
@@ -399,21 +407,20 @@ export const callClaudeOnBedrock = wrapAI(
|
|
|
399
407
|
inferenceConfig: { maxTokens: 1024 },
|
|
400
408
|
}))
|
|
401
409
|
},
|
|
402
|
-
{ provider: '
|
|
410
|
+
{ provider: 'bedrock', model: MODEL_ID },
|
|
403
411
|
)
|
|
404
412
|
```
|
|
405
413
|
|
|
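The vendor tagging mentioned above (`'claude'` for `anthropic.*` model IDs, including cross-region profiles with a geography prefix) could be derived from the model ID along these lines. This is a hedged sketch: `KNOWN_VENDORS`, `bedrockVendor`, and `providerTag` are hypothetical helpers, the vendor prefixes are assumed from Bedrock's usual `vendor.model` naming, and falling back to `'bedrock'` for non-Anthropic vendors is an assumption rather than documented SDK behaviour.

```ts
// Hypothetical helpers; not part of elasticdash-sdk.
const KNOWN_VENDORS = ['anthropic', 'meta', 'amazon', 'mistral', 'cohere', 'ai21'] as const
type Vendor = (typeof KNOWN_VENDORS)[number]

// Looks for a vendor in the first two dot-separated segments, so both
// "anthropic.<model>" and a geography-prefixed "us.anthropic.<model>" match.
function bedrockVendor(modelId: string): Vendor | null {
  const segments = modelId.toLowerCase().split('.').slice(0, 2)
  for (const candidate of segments) {
    const hit = KNOWN_VENDORS.find((v) => v === candidate)
    if (hit) return hit
  }
  return null
}

// Assumption: only Anthropic models map to a vendor-specific tag here.
function providerTag(modelId: string): string {
  return bedrockVendor(modelId) === 'anthropic' ? 'claude' : 'bedrock'
}
```

The resulting string is the kind of value that would go in the `provider` field of the options object passed to `wrapAI`; the model IDs used to exercise it are illustrative.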
406
414
|
Notes:
|
|
407
415
|
|
|
408
416
|
- **Credentials** come from the standard AWS provider chain (env vars, shared credentials file, IAM role) - the SDK does not manage them.
|
|
409
|
-
- **Other vendors on Bedrock** (Llama, Titan, Mistral, Cohere, AI21) use the same pattern
|
|
410
|
-
- **Dashboard reruns of Bedrock events are not supported.** Re-run the workflow that contains the call instead of clicking rerun on the individual step.
|
|
417
|
+
- **Other vendors on Bedrock** (Llama, Titan, Mistral, Cohere, AI21) use the same pattern. For Converse the response shape is identical across vendors. For raw `InvokeModel`, Anthropic gets first-class extraction; other vendors fall back to a best-effort `outputText` / `generation` / `choices` lookup.
|
|
411
418
|
|
|
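To make the extraction notes above concrete, here is a minimal sketch of reading token usage from a Converse-style body and falling back through `outputText` / `generation` / `choices` for raw `InvokeModel` bodies. The function names are hypothetical, and the exact nesting (top-level keys, `choices[0].text`) is an assumption for illustration, not the SDK's documented lookup.

```ts
type Json = Record<string, unknown>

function isRecord(value: unknown): value is Json {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Reads the { usage: { inputTokens, outputTokens } } shape named in the README.
function extractUsage(body: unknown): { inputTokens: number; outputTokens: number } | null {
  if (!isRecord(body) || !isRecord(body.usage)) return null
  const { inputTokens, outputTokens } = body.usage
  return typeof inputTokens === 'number' && typeof outputTokens === 'number'
    ? { inputTokens, outputTokens }
    : null
}

// Best-effort text lookup; the key order and nesting are assumptions.
function extractText(body: unknown): string | null {
  if (!isRecord(body)) return null
  if (typeof body.outputText === 'string') return body.outputText
  if (typeof body.generation === 'string') return body.generation
  const first: unknown = Array.isArray(body.choices) ? body.choices[0] : undefined
  if (isRecord(first) && typeof first.text === 'string') return first.text
  return null
}
```

Returning `null` instead of throwing keeps a recorder tolerant of vendor bodies it does not recognise, which matches the "best-effort" wording in the note.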
412
419
|
#### Use `wrapAI` when
|
|
413
420
|
|
|
414
421
|
The function body is essentially one LLM round-trip, AND at least one of the following applies:
|
|
415
422
|
|
|
416
|
-
- The provider is **not auto-intercepted** (anything outside Anthropic / OpenAI / Gemini / Grok - e.g., Mistral, Cohere, local Ollama,
|
|
423
|
+
- The provider is **not auto-intercepted** (anything outside Anthropic / OpenAI / Gemini / Grok / Kimi / Bedrock - e.g., Mistral direct, Cohere direct, local Ollama), or the SDK bypasses `globalThis.fetch` (notably `@aws-sdk/client-bedrock-runtime` on Node).
|
|
417
424
|
- You want **prompt mocks** - system or user prompt rewriting via `resolvePromptMock` / `resolveUserPromptMock` keyed by the name you pass to `wrapAI`. This is exclusive to `wrapAI`.
|
|
418
425
|
- You want **AI output mocks keyed by a named step** - e.g., mock the `"router"` call without mocking every call to the same model. `resolveAIMock` keys off the name argument.
|
|
419
426
|
- You want **one labelled boundary per logical step** in the trace (e.g., `"router"`, `"summarizer"`) with token usage attributed to that label, distinct from the raw provider-level event.
|