npm - elasticdash-sdk - Versions diffs - 0.2.6 → 0.2.7-beta-2 - Mend

elasticdash-sdk 0.2.6 → 0.2.7-beta-2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/README.md +133 -26
package/dist/dashboard-server.d.ts.map +1 -1
package/dist/dashboard-server.js +76 -3
package/dist/dashboard-server.js.map +1 -1
package/dist/execution/tool-runner.d.ts.map +1 -1
package/dist/execution/tool-runner.js +66 -5
package/dist/execution/tool-runner.js.map +1 -1
package/dist/index.cjs +57 -6
package/dist/tool-runner-worker.js +27 -2
package/dist/tool-runner-worker.js.map +1 -1
package/dist/trigger-executor.d.ts.map +1 -1
package/dist/trigger-executor.js +3 -1
package/dist/trigger-executor.js.map +1 -1
package/dist/workflow-runner-worker.js +24 -0
package/dist/workflow-runner-worker.js.map +1 -1
package/docs/agent-coding-instructions.md +8 -5
package/docs/agent-integration-guide.md +158 -16
package/docs/partial-mocking.md +10 -4
package/docs/workflow-modes.md +6 -7
package/package.json +1 -1
package/src/dashboard-server.ts +71 -3
package/src/execution/tool-runner.ts +62 -5
package/src/tool-runner-worker.ts +22 -2
package/src/trigger-executor.ts +3 -1
package/src/workflow-runner-worker.ts +23 -0

package/README.md CHANGED

@@ -46,28 +46,44 @@ npm install elasticdash-sdk
 **Requirements:** Node 20+. For Deno projects, see [Using elasticdash-sdk in Deno](docs/deno.md).
-### Setup with a Coding Agent
+### Setup with a Coding Agent (required after install)
-If you use a coding agent (Claude Code, Cursor, Copilot, Codex, Windsurf, etc.), tell your agent:
+`npm install` alone wires nothing — the SDK only starts capturing traces once `ed_tools.ts` and `ed_workflows.ts` exist and the init call runs. Use a coding agent to do that wiring. The smoothest path is two extra steps after install:
+**Step 2 — bake the integration guide into your project so the coding agent always sees it:**
+```bash
+npx elasticdash init-guide --target CLAUDE.md           # Claude Code
+npx elasticdash init-guide --target AGENTS.md           # default — works for Codex, Windsurf, others
+npx elasticdash init-guide --target .cursor/rules/elasticdash.md           # Cursor
+npx elasticdash init-guide --target .github/copilot-instructions.md        # Copilot
 ```
-Integrate elasticdash-sdk into this project.
-Read node_modules/elasticdash-sdk/docs/agent-coding-instructions.md for how to proceed,
-and node_modules/elasticdash-sdk/docs/agent-integration-guide.md for technical reference.
+If the target file already exists, the guide is appended (not overwritten). Use `--force` to replace the file entirely. Pick the target your agent actually reads; one file is enough.
+**Step 3 — tell your coding agent:**
+```
+Complete the elasticdash-sdk integration following the guide that was just added to this project.
 ```
-Your agent will read both docs and handle the full setup — creating `ed_tools.ts`, `ed_workflows.ts`, updating source files, and validating the connection.
+That's it. The agent reads the baked-in guide (which transcludes the same content as `node_modules/elasticdash-sdk/docs/agent-coding-instructions.md` and `agent-integration-guide.md`), then creates `ed_tools.ts`, `ed_workflows.ts`, calls `edInitObservability` from the entry point, updates source files to route tool calls through `ed_tools`, and validates the connection.
-**Optional:** To copy the agent instructions into your project for easier access:
+> **Do not shortcut this step.** Without `ed_tools.ts` and `ed_workflows.ts` plus the init call, the SDK does not intercept tool or AI calls — your project will run without errors and produce zero traces. A vague prompt like "install elasticdash-sdk" lets the agent stop at `npm install`; the prompt above is explicit about completing integration.
-```bash
-npx elasticdash init-guide                              # creates AGENTS.md
-npx elasticdash init-guide --target CLAUDE.md            # for Claude Code
-npx elasticdash init-guide --target .cursor/rules/elasticdash.md  # for Cursor
-npx elasticdash init-guide --target .github/copilot-instructions.md  # for Copilot
+> **Init must go through `edInitObservability` (the helper inside `ed_workflows.ts`), not `import { initObservability } from 'elasticdash-sdk'` in your entry file.** Both files in the integration share one CJS module instance via `createRequire(import.meta.url)`; importing `initObservability` directly hits a *different* ESM instance, leaving `_ed.startTrace` reading from an empty store. The symptom is `[elasticdash] startTrace: observability not initialised` at runtime. The integration guide's Step 3 explains why; the `edInitObservability` helper is the only correct path. For CLI scripts, also call `edShutdownObservability()` from a `finally` block at process exit — the SDK's auto-registered exit hooks are async and short-lived processes can terminate before the final batch flushes.
+> **Important: do not use `eval('require')` to load the SDK in `ed_tools.ts`.** The `eval('require')(...)` trick that older versions of this guide recommended works only in CJS — in any project with `"type": "module"` in `package.json`, it throws "require is not defined", the catch silently swallows the error, and the entire integration no-ops with zero logs and zero traces. Use `createRequire(import.meta.url)` from `node:module` instead; it works in both ESM and CJS.
+**Fallback** — if you don't want to add a file to your repo, you can skip `init-guide` and use this prompt instead, which directs the agent at the docs inside `node_modules/`:
+```
+Integrate elasticdash-sdk into this project.
+Read node_modules/elasticdash-sdk/docs/agent-coding-instructions.md for how to proceed,
+and node_modules/elasticdash-sdk/docs/agent-integration-guide.md for technical reference.
 ```
-If the target file already exists, the guide is appended (not overwritten). Use `--force` to replace the file entirely.
+This works but is more fragile: it relies on the agent following the doc-reading instruction literally, and it breaks if a different agent picks up the project later without the same prompt.
 ### Cloud Setup
@@ -235,20 +251,46 @@ expect(ctx.trace).toHaveCustomStep({ kind: 'rag', name: 'pokemon-search' })
 ### AI Interception
 The runner automatically intercepts and records calls to:
+- Anthropic (`api.anthropic.com`)
 - OpenAI (`api.openai.com`)
 - Gemini (`generativelanguage.googleapis.com`)
 - Grok/xAI (`api.x.ai`)
-No code changes needed — just run your workflow and assertions work automatically.
+No code changes needed — just run your workflow and assertions work automatically. Because these providers are auto-captured, most workflows do **not** need to wrap LLM calls with `wrapAI`. See [Picking a wrapper](#picking-a-wrapper) below.
+### Picking a wrapper
+The SDK exposes three wrappers that look similar but solve different problems. Pick by what your function actually does:
+| Your function is… | Use | Why |
+|---|---|---|
+| Deterministic (REST call, DB query, file IO — no LLM inside) | **`edTool`** | Records as a `tool` event AND registers in the global tool registry so CLI `run-tool`, MCP `run_tool`, and dashboard rerun can find it by name. |
+| Exactly one LLM round-trip, AND you need prompt mocks, AI output mocks by name, OR the provider isn't auto-intercepted | **`wrapAI`** | Records as an `ai` event with token usage. Only `wrapAI` supports prompt rewriting (`resolvePromptMock` / `resolveUserPromptMock`) and named AI output mocks. |
+| An agent loop (LLM + inner tools, multiple round-trips) | **`edTool`** on the outer boundary | The inner LLM calls are auto-captured by the AI interceptor. Wrapping the outer agent with `wrapAI` would hide the inner detail. |
+| A direct single call to an auto-intercepted provider SDK (Anthropic / OpenAI / Gemini / Grok) | **No wrapper** | The AI interceptor already records it as an `ai` event with token usage. |
+> **`wrapTool`** is the primitive that `edTool` builds on. Use `wrapTool` directly only when you specifically do not want registry registration — for example, wrapping an inline closure inside another function.
 ### Tool Recording
-**Recommended: `wrapTool`** wraps a tool function and automatically records its name, input, output, duration, and any streaming output. Works in both subprocess mode and HTTP mode:
+**Recommended: `edTool`** wraps a tool function (recording its name, input, output, duration, and any streaming output) *and* registers it in a global tool registry so it can be invoked by name from the CLI (`npx elasticdash run-tool <name>`), the MCP `run_tool`, and dashboard rerun:
 ```ts
-import { wrapTool } from 'elasticdash-sdk'
+import { edTool } from 'elasticdash-sdk'
 import { runSelectQuery } from './services/dataService'
+export const dataService = edTool('dataService', async (input: { query: string }) => {
+  return await runSelectQuery(input.query)
+})
+```
+Same event shape as `wrapTool` (`type: 'tool'`), so the existing tool-mock pipeline (`snapshot_mock_profile`, `mocked_tools_overrides`, strict mode) works unchanged. `defineTool` is an exported alias of `edTool`.
+**Lower-level: `wrapTool`** — same tracing behavior without the registry registration. Use this only when you have a specific reason to keep the tool unregistered (e.g., a closure created inside another function):
+```ts
+import { wrapTool } from 'elasticdash-sdk'
 export const dataService = wrapTool('dataService', async (input: { query: string }) => {
   return await runSelectQuery(input.query)
 })
@@ -287,9 +329,37 @@ In manual mode, always isolate tracing in a separate `try/catch` so trace loggin
 **→ See [Tool Recording & Replay](docs/tools.md) for checkpoint-based replay and freezing**
+### Agent-loop pattern
+If your "tool" is actually an agent — a function that calls an LLM and may iterate through tool-use blocks — wrap the outer boundary with `edTool`, **not** `wrapAI`. The AI interceptor will auto-record each inner LLM call as a separate `ai` event nested under the trace:
+```ts
+import { edTool } from 'elasticdash-sdk'
+import Anthropic from '@anthropic-ai/sdk'
+const client = new Anthropic()
+async function runSearchAgent(input: { query: string }) {
+  // Agent loop: each iteration produces its own auto-recorded `ai` event
+  while (true) {
+    const res = await client.messages.create({
+      model: 'claude-sonnet-4-5-20250929',
+      max_tokens: 1024,
+      messages: [/* ... */],
+    })
+    if (res.stop_reason === 'end_turn') return res
+    // ... handle tool_use blocks, append tool_result, loop
+  }
+}
+export const search = edTool('search', runSearchAgent)
+```
+Wrapping `runSearchAgent` with `wrapAI` instead would record one `ai` event covering the whole loop and hide the per-iteration calls. `edTool` keeps the agent visible as a single named, rerunnable, mockable boundary while leaving inner LLM detail intact for assertions and replay.
 ### AI Call Recording
-**`wrapAI`** wraps any AI call function and records its name, input, output, duration, and token usage (auto-detected for Anthropic, OpenAI, and Gemini SDK responses):
+**`wrapAI`** wraps a **single** LLM call and records it as a `type: 'ai'` event with name, input, output, duration, and token usage (auto-detected for Anthropic, OpenAI, and Gemini SDK responses):
 ```ts
 import { wrapAI } from 'elasticdash-sdk'
@@ -306,7 +376,20 @@ export const callClaude = wrapAI('claude-sonnet-4-5', async (messages: Anthropic
 })
 ```
-Use `wrapAI` when you have a custom AI wrapper or a provider not covered by automatic interception. For direct OpenAI/Anthropic/Gemini SDK calls inside a subprocess workflow, automatic interception via `installAIInterceptor` already handles recording without any code changes.
+#### Use `wrapAI` when
+The function body is essentially one LLM round-trip, AND at least one of the following applies:
+- The provider is **not auto-intercepted** (anything outside Anthropic / OpenAI / Gemini / Grok — e.g., Mistral, Cohere, local Ollama, Bedrock).
+- You want **prompt mocks** — system or user prompt rewriting via `resolvePromptMock` / `resolveUserPromptMock` keyed by the name you pass to `wrapAI`. This is exclusive to `wrapAI`.
+- You want **AI output mocks keyed by a named step** — e.g., mock the `"router"` call without mocking every call to the same model. `resolveAIMock` keys off the name argument.
+- You want **one labelled boundary per logical step** in the trace (e.g., `"router"`, `"summarizer"`) with token usage attributed to that label, distinct from the raw provider-level event.
+#### Do NOT use `wrapAI` when
+- The function is an **agent loop** (LLM + inner tool calls, multiple round-trips). Use `edTool` on the outer boundary and let the AI interceptor record each inner LLM call. See [Agent-loop pattern](#agent-loop-pattern) above.
+- The function is a **direct single-call use** of an auto-intercepted provider's SDK. The interceptor already records it as a `type: 'ai'` event with token usage — adding `wrapAI` only adds a redundant labelled wrapper.
+- The function **does not call an LLM**. Use `edTool`.
 **AI mocking (subprocess / test runner mode):** `wrapAI` also checks `resolveAIMock` at call time, so the dashboard can mock LLM responses the same way it mocks tool calls — without modifying your server code. Configure an `AIMockConfig` in the dashboard UI or pass it programmatically via the `aiMockConfig` option when running a workflow.
@@ -396,21 +479,25 @@ This file loads the SDK, shares the module instance with `ed_workflows.ts`, and
 ```ts
 // ed_tools.ts
+import { createRequire } from 'node:module';
 import { setElasticDashModule } from './ed_workflows';
-let wrapTool: <T extends (...args: any[]) => any>(name: string, fn: T) => T = (_name, fn) => fn;
+let edTool: <T extends (...args: any[]) => any>(name: string, fn: T) => T = (_name, fn) => fn;
+// `createRequire(import.meta.url)` works in BOTH ESM (`"type": "module"`)
+// and CJS projects. Do NOT use `eval('require')` — it silently throws in
+// ESM and the whole integration produces zero traces with zero logs.
+const nodeRequire = createRequire(import.meta.url);
 try {
-  // For Next.js / Turbopack: eval('require') bypasses static analysis.
-  // For plain Node.js projects you can use a normal require() or import.
-  const _edModule = (eval('require') as (id: string) => any)('elasticdash-sdk');
-  wrapTool = _edModule.wrapTool ?? wrapTool;
+  const _edModule = nodeRequire('elasticdash-sdk');
+  edTool = _edModule.edTool ?? _edModule.wrapTool ?? edTool;
   setElasticDashModule(_edModule);
-} catch {
-  // elasticdash-sdk not available — tools run without tracing
+} catch (err) {
+  console.error('[ed_tools] failed to load elasticdash-sdk:', err);
 }
-export const myTool = wrapTool('myTool', async (input: { query: string }) => {
+export const myTool = edTool('myTool', async (input: { query: string }) => {
   // ... your tool logic
 });
 ```
@@ -711,6 +798,26 @@ npx elasticdash dashboard --port 5000
 Optional project file: `ed_workers.ts` can be used by your app architecture (for example, exporting worker handlers), but it is not required or discovered by the ElasticDash CLI/dashboard.
+### Debugging reruns
+Workflow and tool reruns each run in an isolated subprocess. When a rerun hangs, runs unexpectedly slowly, or fails with an opaque error, set these environment variables to surface what the parent and the worker are doing:
+| Variable | Default | Effect |
+|---|---|---|
+| `ELASTICDASH_DEBUG` | unset | When `1`, parent and worker emit stage breadcrumbs to stderr (`stage=spawned`, `stage=payload-written`, `stage=first-stdout`, `stage=workflow-call-start/end`, `stage=closed`, etc.) with `pid` and `elapsedMs`. |
+| `ELASTICDASH_HEARTBEAT_MS` | `5000` | Interval (ms) for the parent to log `still running pid=… elapsedMs=…` while a subprocess is alive. Set `0` to disable. Only emitted when `ELASTICDASH_DEBUG=1`. |
+| `ELASTICDASH_TOOL_TIMEOUT_MS` | unset (no timeout) | When set, the parent kills the **tool** subprocess after N ms (`SIGTERM`, then `SIGKILL` after a 2s grace) and surfaces `Tool subprocess timed out after Nms` with the child's exit code, signal, and last stderr. |
+| `ELASTICDASH_WORKFLOW_TIMEOUT_MS` | unset (no timeout) | Same as above for the **workflow** subprocess. |
+On failure, the parent's `error` string now always includes `[exit=… signal=… elapsedMs=… pid=… stderrBytes=…]` plus the last 1 KB of stderr — so an empty-output failure is no longer indistinguishable from a crash or signal kill.
+Example:
+```bash
+ELASTICDASH_DEBUG=1 ELASTICDASH_HEARTBEAT_MS=2000 ELASTICDASH_TOOL_TIMEOUT_MS=30000 \
+  npx elasticdash dashboard
+```
 ## TypeScript Setup
 For typed globals and matchers, extend your test directory's `tsconfig.json`:
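
The README changes above distinguish `edTool` from `wrapTool`: both record the same `tool` event, but only `edTool` registers the function by name. A small in-memory model may make that easier to picture. This is an illustrative sketch only, not the SDK's implementation: `sketchWrapTool`, `sketchEdTool`, `sketchRegistry`, and `sketchEvents` are hypothetical names, and the event fields are assumed from the README's description.

```typescript
type AnyFn = (...args: any[]) => any;

// Assumed event shape, based on the README's "name, input, output, duration".
interface SketchToolEvent {
  type: 'tool';
  name: string;
  input: unknown;
  output: unknown;
  durationMs: number;
}

const sketchEvents: SketchToolEvent[] = [];
const sketchRegistry = new Map<string, AnyFn>();

// Tracing only: records an event per call, but the function is not findable by name.
function sketchWrapTool<T extends AnyFn>(name: string, fn: T): T {
  const wrapped = async (...args: any[]) => {
    const start = Date.now();
    const output = await fn(...args);
    sketchEvents.push({ type: 'tool', name, input: args[0], output, durationMs: Date.now() - start });
    return output;
  };
  return wrapped as unknown as T;
}

// Tracing plus registration: the same wrapper, additionally stored under its name.
function sketchEdTool<T extends AnyFn>(name: string, fn: T): T {
  const wrapped = sketchWrapTool(name, fn);
  sketchRegistry.set(name, wrapped);
  return wrapped;
}

const dataService = sketchEdTool('dataService', async (input: { query: string }) => `rows for ${input.query}`);
const inlineHelper = sketchWrapTool('inlineHelper', async (n: number) => n * 2);
```

In this model a "run by name" feature could look up `dataService` in `sketchRegistry`, while `inlineHelper` is traced when called but cannot be found by name. That matches the README's stated reason for preferring `edTool` unless a tool should stay unregistered.
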

package/dist/dashboard-server.d.ts.map CHANGED

	@@ -1 +1 @@
1	- {"version":3,"file":"dashboard-server.d.ts","sourceRoot":"","sources":["../src/dashboard-server.ts"],"names":[],"mappings":"AAeA,MAAM,WAAW,YAAY;IAC3B,IAAI,EAAE,MAAM,CAAA;IACZ,OAAO,EAAE,OAAO,CAAA;IAChB,SAAS,EAAE,MAAM,CAAA;IACjB,QAAQ,EAAE,MAAM,CAAA;IAChB,UAAU,CAAC,EAAE,MAAM,CAAA;IACnB,UAAU,CAAC,EAAE,MAAM,CAAA;IACnB,YAAY,CAAC,EAAE,MAAM,CAAA;IACrB,UAAU,CAAC,EAAE,MAAM,CAAA;CACpB;AAED,MAAM,WAAW,QAAQ;IACvB,IAAI,EAAE,MAAM,CAAA;IACZ,OAAO,EAAE,OAAO,CAAA;IAChB,SAAS,EAAE,MAAM,CAAA;IACjB,QAAQ,EAAE,MAAM,CAAA;IAChB,UAAU,CAAC,EAAE,MAAM,CAAA;IACnB,UAAU,CAAC,EAAE,MAAM,CAAA;CACpB;AAED,MAAM,WAAW,SAAS;IACxB,SAAS,EAAE,YAAY,EAAE,CAAA;IACzB,KAAK,EAAE,QAAQ,EAAE,CAAA;CAClB;AAED,MAAM,WAAW,sBAAsB;IACrC,IAAI,CAAC,EAAE,MAAM,CAAA;IACb,QAAQ,CAAC,EAAE,OAAO,CAAA;CACnB;AAED,MAAM,WAAW,eAAe;IAC9B,GAAG,EAAE,MAAM,CAAA;IACX,KAAK,IAAI,OAAO,CAAC,IAAI,CAAC,CAAA;CACvB;AA2CD,6DAA6D;AAC7D,MAAM,WAAW,aAAa;IAC5B,oHAAoH;IACpH,IAAI,EAAE,MAAM,GAAG,UAAU,GAAG,eAAe,CAAA;IAC3C,uEAAuE;IACvE,WAAW,CAAC,EAAE,MAAM,EAAE,CAAA;IACtB,wEAAwE;IACxE,QAAQ,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAA;CACnC;AAED,MAAM,WAAW,cAAc;IAC7B,CAAC,QAAQ,EAAE,MAAM,GAAG,aAAa,CAAA;CAClC;AAED,iEAAiE;AACjE,MAAM,WAAW,WAAW;IAC1B,IAAI,EAAE,MAAM,GAAG,UAAU,GAAG,eAAe,CAAA;IAC3C,WAAW,CAAC,EAAE,MAAM,EAAE,CAAA;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAA;CACnC;AAED,MAAM,WAAW,YAAY;IAC3B,CAAC,SAAS,EAAE,MAAM,GAAG,WAAW,CAAA;CACjC;~~AA42GD~~,MAAM,WAAW,kBAAkB;IACjC,IAAI,EAAE,MAAM,CAAA;IACZ,GAAG,EAAE,MAAM,CAAA;IACX,MAAM,CAAC,EAAE,MAAM,CAAA;IACf,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IAChC,YAAY,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAA;IACtC,cAAc,CAAC,EAAE,kBAAkB,GAAG,MAAM,CAAA;CAC7C;AA+ID;;GAEG;AACH,wBAAsB,oBAAoB,CACxC,GAAG,EAAE,MAAM,EACX,OAAO,GAAE,sBAA2B,GACnC,OAAO,CAAC,eAAe,CAAC,CA2d1B;AAiFD,eAAO,MAAM,aAAa,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAa,CAAC"}
1	+ {"version":3,"file":"dashboard-server.d.ts","sourceRoot":"","sources":["../src/dashboard-server.ts"],"names":[],"mappings":"AAeA,MAAM,WAAW,YAAY;IAC3B,IAAI,EAAE,MAAM,CAAA;IACZ,OAAO,EAAE,OAAO,CAAA;IAChB,SAAS,EAAE,MAAM,CAAA;IACjB,QAAQ,EAAE,MAAM,CAAA;IAChB,UAAU,CAAC,EAAE,MAAM,CAAA;IACnB,UAAU,CAAC,EAAE,MAAM,CAAA;IACnB,YAAY,CAAC,EAAE,MAAM,CAAA;IACrB,UAAU,CAAC,EAAE,MAAM,CAAA;CACpB;AAED,MAAM,WAAW,QAAQ;IACvB,IAAI,EAAE,MAAM,CAAA;IACZ,OAAO,EAAE,OAAO,CAAA;IAChB,SAAS,EAAE,MAAM,CAAA;IACjB,QAAQ,EAAE,MAAM,CAAA;IAChB,UAAU,CAAC,EAAE,MAAM,CAAA;IACnB,UAAU,CAAC,EAAE,MAAM,CAAA;CACpB;AAED,MAAM,WAAW,SAAS;IACxB,SAAS,EAAE,YAAY,EAAE,CAAA;IACzB,KAAK,EAAE,QAAQ,EAAE,CAAA;CAClB;AAED,MAAM,WAAW,sBAAsB;IACrC,IAAI,CAAC,EAAE,MAAM,CAAA;IACb,QAAQ,CAAC,EAAE,OAAO,CAAA;CACnB;AAED,MAAM,WAAW,eAAe;IAC9B,GAAG,EAAE,MAAM,CAAA;IACX,KAAK,IAAI,OAAO,CAAC,IAAI,CAAC,CAAA;CACvB;AA2CD,6DAA6D;AAC7D,MAAM,WAAW,aAAa;IAC5B,oHAAoH;IACpH,IAAI,EAAE,MAAM,GAAG,UAAU,GAAG,eAAe,CAAA;IAC3C,uEAAuE;IACvE,WAAW,CAAC,EAAE,MAAM,EAAE,CAAA;IACtB,wEAAwE;IACxE,QAAQ,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAA;CACnC;AAED,MAAM,WAAW,cAAc;IAC7B,CAAC,QAAQ,EAAE,MAAM,GAAG,aAAa,CAAA;CAClC;AAED,iEAAiE;AACjE,MAAM,WAAW,WAAW;IAC1B,IAAI,EAAE,MAAM,GAAG,UAAU,GAAG,eAAe,CAAA;IAC3C,WAAW,CAAC,EAAE,MAAM,EAAE,CAAA;IACtB,QAAQ,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAA;CACnC;AAED,MAAM,WAAW,YAAY;IAC3B,CAAC,SAAS,EAAE,MAAM,GAAG,WAAW,CAAA;CACjC;AAg7GD,MAAM,WAAW,kBAAkB;IACjC,IAAI,EAAE,MAAM,CAAA;IACZ,GAAG,EAAE,MAAM,CAAA;IACX,MAAM,CAAC,EAAE,MAAM,CAAA;IACf,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAA;IAChC,YAAY,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,OAAO,CAAC,CAAA;IACtC,cAAc,CAAC,EAAE,kBAAkB,GAAG,MAAM,CAAA;CAC7C;AA+ID;;GAEG;AACH,wBAAsB,oBAAoB,CACxC,GAAG,EAAE,MAAM,EACX,OAAO,GAAE,sBAA2B,GACnC,OAAO,CAAC,eAAe,CAAC,CA2d1B;AAiFD,eAAO,MAAM,aAAa,EAAE,GAAG,CAAC,MAAM,EAAE,MAAM,CAAa,CAAC"}

package/dist/dashboard-server.js CHANGED

@@ -288,6 +288,12 @@ function runToolInSubprocess(toolsModulePath, toolName, args) {
 }
 function runWorkflowInSubprocess(workflowsModulePath, toolsModulePath, workflowName, args, input, options) {
     return new Promise((resolve) => {
+        const startMs = Date.now();
+        const elapsed = () => Date.now() - startMs;
+        const debug = (...a) => {
+            if (process.env.ELASTICDASH_DEBUG === '1')
+                console.error(...a);
+        };
         const workerScript = new URL('./workflow-runner-worker.js', import.meta.url).pathname;
         const projectDir = path.dirname(workflowsModulePath);
         const denoProject = isDenoProject(projectDir);
@@ -304,13 +310,55 @@ function runWorkflowInSubprocess(workflowsModulePath, toolsModulePath, workflowN
             cwd: projectDir,
             stdio: ['pipe', 'pipe', 'pipe', 'pipe'],
         });
+        const pid = child.pid ?? -1;
+        debug(`[elasticdash dashboard] workflow subprocess stage=spawned pid=${pid} elapsedMs=${elapsed()} workflow=${workflowName}`);
+        // Heartbeat — workflows can be long; without this the dashboard is blind.
+        // 0 disables. Default 5s.
+        const heartbeatMs = Number(process.env.ELASTICDASH_HEARTBEAT_MS ?? 5000);
+        const heartbeat = heartbeatMs > 0
+            ? setInterval(() => {
+                debug(`[elasticdash dashboard] workflow subprocess heartbeat pid=${pid} elapsedMs=${elapsed()} workflow=${workflowName}`);
+            }, heartbeatMs)
+            : null;
+        // Optional kill switch. Default unset = no timeout (preserves prior behavior).
+        let timedOut = false;
+        const timeoutMs = Number(process.env.ELASTICDASH_WORKFLOW_TIMEOUT_MS ?? 0);
+        const timeout = timeoutMs > 0
+            ? setTimeout(() => {
+                timedOut = true;
+                debug(`[elasticdash dashboard] workflow subprocess TIMEOUT pid=${pid} after ${timeoutMs}ms — sending SIGTERM`);
+                try {
+                    child.kill('SIGTERM');
+                }
+                catch { /* already dead */ }
+                setTimeout(() => {
+                    try {
+                        child.kill('SIGKILL');
+                    }
+                    catch { /* already dead */ }
+                }, 2000);
+            }, timeoutMs)
+            : null;
+        const cleanup = () => {
+            if (heartbeat)
+                clearInterval(heartbeat);
+            if (timeout)
+                clearTimeout(timeout);
+        };
         let fd3Data = '';
         let stderr = '';
+        let sawFd3 = false;
+        let sawStdout = false;
+        let sawStderr = false;
         // Line-buffer stdout so that large result JSON lines split across multiple
         // data events are reassembled before processing.
         const WORKFLOW_RESULT_PREFIX = '__ELASTICDASH_RESULT__:';
         let stdoutBuf = '';
         child.stdout.on('data', (chunk) => {
+            if (!sawStdout) {
+                sawStdout = true;
+                debug(`[elasticdash dashboard] workflow subprocess stage=first-stdout pid=${pid} elapsedMs=${elapsed()}`);
+            }
             stdoutBuf += chunk.toString();
             const lines = stdoutBuf.split('\n');
             stdoutBuf = lines.pop() ?? ''; // keep last (possibly incomplete) line
@@ -325,14 +373,25 @@ function runWorkflowInSubprocess(workflowsModulePath, toolsModulePath, workflowN
             }
         });
         child.stderr.on('data', (chunk) => {
+            if (!sawStderr) {
+                sawStderr = true;
+                debug(`[elasticdash dashboard] workflow subprocess stage=first-stderr pid=${pid} elapsedMs=${elapsed()}`);
+            }
             stderr += chunk.toString();
             process.stderr.write(chunk);
         });
         const fd3 = child.stdio[3];
         fd3?.on('data', (chunk) => {
+            if (!sawFd3) {
+                sawFd3 = true;
+                debug(`[elasticdash dashboard] workflow subprocess stage=first-fd3 pid=${pid} elapsedMs=${elapsed()}`);
+            }
             fd3Data += chunk.toString();
         });
-        child.on('close', () => {
+        child.on('close', (code, signal) => {
+            cleanup();
+            const elapsedMs = elapsed();
+            debug(`[elasticdash dashboard] workflow subprocess stage=closed pid=${pid} code=${code} signal=${signal ?? 'none'} elapsedMs=${elapsedMs} stderrBytes=${stderr.length} fd3Bytes=${fd3Data.length}`);
             // Flush any remaining buffered stdout line (e.g. result with no trailing newline)
             if (stdoutBuf.startsWith(WORKFLOW_RESULT_PREFIX)) {
                 fd3Data += stdoutBuf.slice(WORKFLOW_RESULT_PREFIX.length);
@@ -345,11 +404,24 @@ function runWorkflowInSubprocess(workflowsModulePath, toolsModulePath, workflowN
                     resolve(JSON.parse(fd3Data));
                     return;
                 }
-                catch { /* fall through */ }
+                catch (parseErr) {
+                    const detail = `[exit=${code} signal=${signal ?? 'none'} elapsedMs=${elapsedMs} pid=${pid}] fd3 payload failed to parse: ${parseErr.message}`;
+                    resolve({ ok: false, error: detail });
+                    return;
+                }
             }
-            resolve({ ok: false, error: stderr.trim() || 'Workflow subprocess produced no output.' });
+            const stderrExcerpt = stderr.length > 1024 ? `…${stderr.slice(-1024)}` : stderr;
+            const detail = `[exit=${code} signal=${signal ?? 'none'} elapsedMs=${elapsedMs} pid=${pid} stderrBytes=${stderr.length}]`;
+            const baseError = timedOut
+                ? `Workflow subprocess timed out after ${timeoutMs}ms`
+                : (stderr.trim() || 'Workflow subprocess produced no output.');
+            const errorMsg = stderr.trim()
+                ? `${baseError} ${detail}`
+                : `${baseError} ${detail}${stderrExcerpt ? `\nLast stderr: ${stderrExcerpt}` : ''}`;
+            resolve({ ok: false, error: errorMsg });
         });
         child.on('error', (err) => {
+            cleanup();
             const hint = denoProject && err.code === 'ENOENT'
                 ? ' (Deno project detected — ensure "deno" is installed and available in PATH)'
                 : '';
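
To exercise the new failure-message branches in isolation, the string-building logic from the hunk above can be transcribed into a pure function. This is a sketch under assumptions: `describeWorkflowFailure` and `CloseInfo` are hypothetical names, not exports of the package, and the inputs are passed explicitly instead of being captured from the enclosing closure.

```typescript
interface CloseInfo {
  code: number | null;
  signal: string | null;
  elapsedMs: number;
  pid: number;
  stderr: string;
  timedOut: boolean;
  timeoutMs: number;
}

// Mirrors the branch structure of the 'close' handler's fallback path in the diff.
function describeWorkflowFailure(info: CloseInfo): string {
  const { code, signal, elapsedMs, pid, stderr, timedOut, timeoutMs } = info;
  // Keep at most the last 1024 characters of stderr.
  const stderrExcerpt = stderr.length > 1024 ? `…${stderr.slice(-1024)}` : stderr;
  const detail = `[exit=${code} signal=${signal ?? 'none'} elapsedMs=${elapsedMs} pid=${pid} stderrBytes=${stderr.length}]`;
  const baseError = timedOut
    ? `Workflow subprocess timed out after ${timeoutMs}ms`
    : (stderr.trim() || 'Workflow subprocess produced no output.');
  return stderr.trim()
    ? `${baseError} ${detail}`
    : `${baseError} ${detail}${stderrExcerpt ? `\nLast stderr: ${stderrExcerpt}` : ''}`;
}

const crashMessage = describeWorkflowFailure({
  code: 1, signal: null, elapsedMs: 42, pid: 123, stderr: 'boom\n', timedOut: false, timeoutMs: 0,
});
const timeoutMessage = describeWorkflowFailure({
  code: null, signal: 'SIGTERM', elapsedMs: 30012, pid: 123, stderr: '', timedOut: true, timeoutMs: 30000,
});
```

In this transcription, a crash with stderr yields the trimmed stderr followed by the bracketed detail, and a timeout with empty stderr yields the timeout text followed by the detail. The `Last stderr:` suffix is only reached when stderr is non-empty but consists entirely of whitespace, so a timed-out run that did write to stderr reports `stderrBytes` but not the stderr text itself.
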
@@ -373,6 +445,7 @@ function runWorkflowInSubprocess(workflowsModulePath, toolsModulePath, workflowN
         });
         child.stdin.write(payload);
         child.stdin.end(); // Always close stdin to avoid subprocess hang
+        debug(`[elasticdash dashboard] workflow subprocess stage=payload-written pid=${pid} elapsedMs=${elapsed()} payloadBytes=${payload.length}`);
     });
 }
 async function runToolObservation(cwd, observation, tools) {
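
The heartbeat and timeout settings in this file are parsed with `Number(process.env.… ?? default)` and then gated on `> 0`. The sketch below isolates that parsing and the SIGTERM-then-SIGKILL escalation so the edge cases can be checked without spawning a process. It is a minimal model under assumptions: `parseMsSetting`, `armKillSwitch`, and the injected `kill` and `schedule` callbacks are hypothetical names introduced for illustration, and the scheduler is injected so the demo can run synchronously.

```typescript
type KillFn = (signal: 'SIGTERM' | 'SIGKILL') => void;
type ScheduleFn = (fn: () => void, ms: number) => void;

// Same expression shape as the diff: Number(raw ?? fallback).
// An unset variable uses the fallback; "" parses to 0; non-numeric text parses to NaN.
function parseMsSetting(raw: string | undefined, fallback: number): number {
  return Number(raw ?? fallback);
}

// Arms SIGTERM after timeoutMs, then SIGKILL after graceMs (2000 ms in the diff).
// Returns false when nothing was armed: 0, negative values, and NaN all fail `> 0`.
function armKillSwitch(timeoutMs: number, kill: KillFn, schedule: ScheduleFn, graceMs = 2000): boolean {
  if (!(timeoutMs > 0)) return false;
  schedule(() => {
    try { kill('SIGTERM'); } catch { /* already dead */ }
    schedule(() => {
      try { kill('SIGKILL'); } catch { /* already dead */ }
    }, graceMs);
  }, timeoutMs);
  return true;
}

// Deterministic demo: a fake scheduler that records each delay and fires immediately.
const sentSignals: string[] = [];
const recordedDelays: number[] = [];
const fireNow: ScheduleFn = (fn, ms) => { recordedDelays.push(ms); fn(); };
armKillSwitch(parseMsSetting('30000', 0), (s) => sentSignals.push(s), fireNow);
```

With a real child process, `schedule` would presumably be `(fn, ms) => { setTimeout(fn, ms); }` and `kill` would forward to `child.kill(signal)`. One consequence of the parsing shown here is that a value such as `ELASTICDASH_HEARTBEAT_MS=abc` disables the feature silently, because `NaN > 0` is false.
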