@diabolicallabs/llm-client 0.2.0 → 0.4.0

package/README.md CHANGED
@@ -6,7 +6,7 @@ Unified LLM API across Anthropic, OpenAI, Google Gemini, DeepSeek, and Perplexit
 
 ## Status
 
- **Published — v0.2.0.** All five providers are implemented. Perplexity adds web-grounded responses with citation extraction and search filters.
+ **Published — v0.4.0.** All five providers are implemented. v0.4.0 adds native strict structured outputs (OpenAI json_schema, Anthropic tool-use, Gemini responseSchema), triggered automatically when a Zod 4 schema is passed.
 
 ## Install
 
@@ -42,13 +42,74 @@ for await (const chunk of client.stream([{ role: 'user', content: 'Hello' }])) {
   process.stdout.write(chunk.token);
 }
 
- // Structured output (Zod schema)
+ // Structured output: a Zod 4 schema triggers strict native mode automatically
 import { z } from 'zod';
 const schema = z.object({ name: z.string(), score: z.number() });
 const result = await client.structured(messages, schema);
 // result.data is typed as { name: string; score: number }
+ // result.model and result.id are populated (v0.4.0+)
 ```
 
+ ## Strict structured outputs (v0.4.0)
+
+ Pass a **Zod 4** schema to `structured()` and the toolkit automatically routes to the strictest native path available for each provider. No opt-in flag required.
+
+ ```typescript
+ import { z } from 'zod';
+ const schema = z.object({
+   topic: z.string(),
+   bullets: z.array(z.string()),
+ });
+
+ const result = await client.structured(messages, schema);
+ // result.data — typed and Zod-validated
+ // result.model — model ID used (always present, v0.4.0+)
+ // result.id — provider request ID for tracing (OpenAI + Anthropic)
+ // result.citations — Perplexity citations if any
+ ```
+
+ ### How detection works
+
+ The toolkit checks for Zod 4's internal `_zod` marker at runtime. If the schema is a Zod 4 instance, it converts it to JSON Schema with Zod 4's built-in `z.toJSONSchema()` and routes to the native path. Any other plain `{ parse }` object falls back to the v0.3.0 system-prompt path; Zod 3 schemas are the one exception and are rejected outright (see "Zod 3 schemas" below).
+
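+ A minimal sketch of the runtime check (the helper names here are illustrative, not the library's internals):
+
+ ```typescript
+ import { z } from 'zod'; // Zod 4
+
+ // The narrow schema interface structured() accepts: anything with parse().
+ interface ParseLike<T> { parse: (data: unknown) => T }
+
+ // Zod 4 instances carry the internal `_zod` marker; Zod 3 instances do not.
+ function isZod4Schema(schema: ParseLike<unknown>): schema is z.ZodType {
+   return typeof schema === 'object' && schema !== null && '_zod' in schema;
+ }
+
+ function toNativeJsonSchema(schema: ParseLike<unknown>): Record<string, unknown> | undefined {
+   if (!isZod4Schema(schema)) return undefined; // prompt path or Zod 3 rejection, handled elsewhere
+   return z.toJSONSchema(schema) as Record<string, unknown>; // Zod 4's built-in converter
+ }
+ ```
+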
+ ### Schema-feature support matrix
+
+ | Provider | Native mode | What's enforced | Known limits |
+ |---|---|---|---|
+ | OpenAI (`gpt-5.x`) | `response_format: { type: 'json_schema', strict: true }` | Schema structure guaranteed; model cannot produce off-schema output | No `format`, `pattern`, or recursive schemas (`z.lazy()`). Throws at conversion time with a clear message. |
+ | Anthropic | Tool-use with forced `tool_choice: { type: 'tool', name: 'extract' }` | Model must call the tool; `input` is pre-parsed JSON | Defense-in-depth `schema.parse()` still runs |
+ | Gemini | `responseSchema` (OpenAPI 3.0) + `responseMimeType: 'application/json'` | Schema communicated to the model; belt-and-braces fence-strip retained | Tested via mocks only — file an issue if Gemini's API rejects the schema shape |
+ | DeepSeek | None (prompt-only, API limitation) | System-prompt nudge + `schema.parse()` | Same as v0.3.0 |
+ | Perplexity | None (prompt-only, API limitation) | System-prompt nudge + `<think>` strip + `schema.parse()` | Same as v0.3.0; `citations` propagated to structured response |
+
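+ To make the "Native mode" column concrete, the native request shapes look roughly like this (field layouts follow each provider's public API; `jsonSchema` is the conversion output from "How detection works" above, and `extract` is the tool name noted in the table):
+
+ ```typescript
+ declare const jsonSchema: Record<string, unknown>; // z.toJSONSchema(schema)
+
+ // OpenAI (Chat Completions): strict json_schema response format.
+ const openaiParams = {
+   response_format: {
+     type: 'json_schema' as const,
+     json_schema: { name: 'extract', strict: true, schema: jsonSchema },
+   },
+ };
+
+ // Anthropic: one tool plus forced tool_choice; the model must emit tool input.
+ const anthropicParams = {
+   tools: [{ name: 'extract', description: 'Return the structured result.', input_schema: jsonSchema }],
+   tool_choice: { type: 'tool' as const, name: 'extract' },
+ };
+
+ // Gemini (@google/genai): schema + MIME type in the generation config.
+ const geminiConfig = {
+   responseMimeType: 'application/json',
+   responseSchema: jsonSchema, // OpenAPI-3.0-style subset
+ };
+ ```
+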
+ ### Prompt-mode escape hatch
+
+ If your schema uses a feature unsupported in strict mode (e.g. `z.function()`, `z.lazy()`) and you need to keep using it, pass the escape hatch:
+
+ ```typescript
+ const result = await client.structured(messages, schema, {
+   providerOptions: { structuredMode: 'prompt' },
+ });
+ // Forces the v0.3.0 prompt-only path regardless of schema type
+ ```
+
+ Alternatively, catch the `LlmError` thrown during schema conversion and inform the user:
+
+ ```typescript
+ try {
+   const result = await client.structured(messages, schema);
+ } catch (err) {
+   if (err instanceof LlmError && err.kind === 'unknown') {
+     // Schema contains an unrepresentable feature — the message names it
+     console.error(err.message);
+   }
+ }
+ ```
+
+ ### Zod 3 schemas
+
+ If a Zod 3 schema is passed, the toolkit throws `LlmError` with a clear "upgrade to Zod 4" message rather than silently falling through to prompt mode. Pass `providerOptions.structuredMode = 'prompt'` if you cannot upgrade immediately.
+
 ## Provider universe
 
 | Provider | Status | Env var |
@@ -159,10 +220,95 @@ interface LlmCallOptions {
   model?: string;
   maxTokens?: number;
   temperature?: number;
+   timeoutMs?: number; // Per-call timeout (ms). Overrides config.timeoutMs.
+   signal?: AbortSignal; // Caller-supplied cancel signal. Never retried.
+   streamStallTimeoutMs?: number; // Per-chunk silence timeout for stream(). Default 30000.
   providerOptions?: Record<string, unknown>; // Perplexity search filters, etc.
 }
 ```
 
+ ## Cancellation, timeouts, stall detection
+
+ ### Per-call timeout override
+
+ The default timeout is set at client construction via `config.timeoutMs` (default 30,000 ms). Override it per call:
+
+ ```typescript
+ const client = createClient({
+   provider: 'anthropic',
+   model: 'claude-sonnet-4-6',
+   apiKey: process.env.ANTHROPIC_API_KEY!,
+   timeoutMs: 30_000, // client default
+ });
+
+ // This call gets 90 seconds — useful for sonar-deep-research or long reasoning
+ const response = await client.complete(messages, { timeoutMs: 90_000 });
+ ```
+
+ On timeout, `LlmError.kind === 'timeout'` and `retryable === true`. Each retry attempt gets a fresh deadline — the timeout resets per attempt, not across the full retry sequence.
+
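+ To picture the per-attempt deadline, here is a simplified sketch (not the library's actual retry loop; `attemptOnce` is an illustrative stand-in for one SDK call):
+
+ ```typescript
+ // Each attempt gets its own AbortSignal.timeout(timeoutMs), so a retry is never
+ // charged for time spent by earlier attempts.
+ async function withFreshDeadline<T>(
+   attemptOnce: (signal: AbortSignal) => Promise<T>,
+   timeoutMs: number,
+   maxAttempts: number,
+ ): Promise<T> {
+   let lastErr: unknown;
+   for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+     try {
+       return await attemptOnce(AbortSignal.timeout(timeoutMs)); // fresh deadline per attempt
+     } catch (err) {
+       lastErr = err; // a real implementation would rethrow non-retryable errors here
+     }
+   }
+   throw lastErr;
+ }
+ ```
+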
+ ### Caller AbortSignal
+
+ Pass any `AbortSignal` to cancel an in-flight call immediately:
+
+ ```typescript
+ const ac = new AbortController();
+
+ // Cancel on user navigation, a superseding request, shutdown, etc.
+ const responsePromise = client.complete(messages, { signal: ac.signal });
+
+ // Cancel before the call returns
+ ac.abort('user navigated away');
+
+ try {
+   await responsePromise;
+ } catch (err) {
+   if (err instanceof LlmError && err.kind === 'cancelled') {
+     // Gracefully handle the cancellation
+   }
+ }
+ ```
+
+ - A signal already aborted at call time throws immediately — no SDK call is made, no retry.
+ - A mid-call abort propagates to the SDK (Anthropic, OpenAI, DeepSeek, Perplexity) or wins a `Promise.race` (Gemini). `kind === 'cancelled'`, `retryable === false`. Never retried.
+
+ ### Stream stall detection
+
+ Without stall detection, a stream that emits a first chunk and then silently hangs would block the consumer indefinitely. `streamStallTimeoutMs` arms a timer between chunks — if no chunk arrives within the window, the stream is aborted and a `kind: 'stream_stall'` error surfaces:
+
+ ```typescript
+ try {
+   for await (const chunk of client.stream(messages, { streamStallTimeoutMs: 10_000 })) {
+     process.stdout.write(chunk.token);
+   }
+ } catch (err) {
+   if (err instanceof LlmError && err.kind === 'stream_stall') {
+     console.error('stream stalled — retry or fallback');
+   }
+ }
+ ```
+
+ - Default `streamStallTimeoutMs`: 30,000 ms (set independently of `timeoutMs` — tolerant of reasoning-model think-pauses).
+ - The stall timer resets after each chunk arrives, so slow-but-not-stalled streams complete normally (see the sketch below).
+ - Stall errors are **not retried** — partial output is unsafe to re-issue. The error surfaces to the caller.
+
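+ The watchdog pattern itself is small; a simplified sketch (illustrative, not the library's internal implementation):
+
+ ```typescript
+ // Wrap any token stream so that a silent gap longer than stallMs aborts iteration.
+ async function* withStallWatchdog<T>(source: AsyncIterable<T>, stallMs: number): AsyncGenerator<T> {
+   const it = source[Symbol.asyncIterator]();
+   while (true) {
+     let timer!: ReturnType<typeof setTimeout>;
+     // Re-armed for every chunk: the timer races against the next read.
+     const stalled = new Promise<never>((_, reject) => {
+       timer = setTimeout(() => reject(new Error('stream_stall')), stallMs);
+     });
+     try {
+       const next = await Promise.race([it.next(), stalled]);
+       if (next.done) return;
+       yield next.value;
+     } finally {
+       clearTimeout(timer);
+     }
+   }
+ }
+ ```
+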
+ ### `LlmError.kind` discriminator
+
+ ```typescript
+ type LlmErrorKind = 'cancelled' | 'timeout' | 'stream_stall' | 'http' | 'network' | 'unknown';
+
+ class LlmError extends Error {
+   readonly provider: string;
+   readonly statusCode?: number;
+   readonly retryable: boolean;
+   readonly kind: LlmErrorKind | undefined; // undefined on errors from older paths
+ }
+ ```
+
+ ### Gemini cancellation caveat
+
+ `@google/genai` does not accept a per-call `AbortSignal`. Cancellation uses `Promise.race` — when the internal controller aborts, we stop awaiting, but the SDK's HTTP request continues in the background until the SDK-level timeout fires. The SDK client is constructed with `httpOptions.timeout = configTimeoutMs * 2` as a backstop, which bounds the leaked request to at most 2× the configured timeout. Native signal support will be added when the SDK provides it.
+
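+ In outline, the race looks like this (a sketch under the constraint above; `raceWithAbort` and `sdkCall` are illustrative names, with `sdkCall` standing in for the Gemini SDK invocation):
+
+ ```typescript
+ // The caller's signal wins the race; the orphaned SDK promise is bounded by the
+ // httpOptions.timeout backstop (configTimeoutMs * 2).
+ function raceWithAbort<T>(sdkCall: Promise<T>, signal: AbortSignal): Promise<T> {
+   if (signal.aborted) return Promise.reject(new Error('cancelled'));
+   return Promise.race([
+     sdkCall,
+     new Promise<never>((_, reject) => {
+       signal.addEventListener('abort', () => reject(new Error('cancelled')), { once: true });
+     }),
+   ]);
+ }
+ ```
+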
 ## Error handling
 
 All provider errors are normalized into `LlmError`:
@@ -174,12 +320,12 @@ try {
   const response = await client.complete(messages);
 } catch (err) {
   if (err instanceof LlmError) {
-     console.error(err.provider, err.statusCode, err.retryable);
+     console.error(err.provider, err.statusCode, err.retryable, err.kind);
   }
 }
 ```
 
- Retryable errors (429, 5xx, network failures) are retried automatically with exponential backoff and full jitter before throwing.
+ Retryable errors (429, 5xx, network failures, timeout) are retried automatically with exponential backoff and full jitter before throwing. Cancelled and stream-stall errors are never retried.
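+
+ "Full jitter" here is the standard pattern: sleep for a uniform-random duration up to an exponential cap. A sketch with illustrative constants:
+
+ ```typescript
+ // Full-jitter backoff: delay is uniform in [0, min(cap, base * 2^attempt)].
+ function backoffDelayMs(attempt: number, baseMs = 250, capMs = 10_000): number {
+   const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
+   return Math.random() * ceiling;
+ }
+ ```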
 
 ## Token normalization
 
package/dist/index.d.ts CHANGED
@@ -6,6 +6,19 @@
  * Week 5 additions:
  *   LlmResponse.citations — populated by the Perplexity provider; undefined for all others.
  *   LlmCallOptions — per-call options type extracted for reuse; adds providerOptions escape hatch.
+ *
+ * Week 6 additions (v0.3.0 — abort/timeout/stall):
+ *   LlmCallOptions.timeoutMs — per-call timeout override (ms); overrides config.timeoutMs.
+ *   LlmCallOptions.signal — caller-supplied AbortSignal; aborts in-flight call.
+ *   LlmCallOptions.streamStallTimeoutMs — per-stream stall detection (ms); default 30000.
+ *   LlmClientConfig.streamStallTimeoutMs — config-level stall default.
+ *   LlmError.kind — discriminator for error classification.
+ *
+ * v0.4.0 additions (strict structured outputs):
+ *   LlmStructuredResponse.model — model ID actually used (always populated).
+ *   LlmStructuredResponse.id — provider request ID where available (debugging).
+ *   LlmStructuredResponse.citations — web citations from Perplexity structured responses.
+ *   LlmClient.structured JSDoc — Zod 4 trigger and structuredMode escape hatch.
 */
 interface LlmMessage {
   role: 'system' | 'user' | 'assistant';
@@ -20,6 +33,12 @@ interface LlmClientConfig {
   maxTokens?: number;
   temperature?: number;
   timeoutMs?: number;
+   /**
+    * Default stall timeout for stream() calls (ms). Fires when no chunk is received
+    * for this duration. Independent of timeoutMs — tolerant of reasoning-model think-pauses.
+    * Default: 30000.
+    */
+   streamStallTimeoutMs?: number;
 }
 interface LlmUsage {
   inputTokens: number;
@@ -47,41 +66,116 @@ interface LlmResponse {
 /**
  * Per-call options shared across complete(), stream(), and structured().
  * Extends the standard model/maxTokens/temperature overrides with:
- *   providerOptions generic escape hatch for provider-specific parameters.
- *     The Perplexity provider reads search_domain_filter and
- *     search_recency_filter from this field; other providers ignore it.
- *     Unknown fields are passed through unchanged.
+ *   timeoutMs — per-call timeout override; overrides config.timeoutMs for this call only.
+ *   signal — caller-supplied AbortSignal; aborts the in-flight call immediately.
+ *     A pre-aborted signal throws without making an SDK call (no retry).
+ *     A mid-call abort throws kind:'cancelled', retryable:false (no retry).
+ *   streamStallTimeoutMs — per-call stall detection for stream(); overrides config default.
+ *   providerOptions — generic escape hatch for provider-specific parameters.
+ *     The Perplexity provider reads search_domain_filter and
+ *     search_recency_filter from this field; other providers ignore it.
+ *     Unknown fields are passed through unchanged.
  */
- interface LlmCallOptions extends Partial<Pick<LlmClientConfig, 'model' | 'maxTokens' | 'temperature'>> {
+ interface LlmCallOptions extends Partial<Pick<LlmClientConfig, 'model' | 'maxTokens' | 'temperature' | 'timeoutMs'>> {
+   /** Caller-supplied AbortSignal. Cancels the in-flight call. Never retried. */
+   signal?: AbortSignal;
+   /**
+    * Per-call stall timeout for stream() in ms. Overrides config.streamStallTimeoutMs.
+    * Fires when no chunk arrives within this window. Default: config.streamStallTimeoutMs ?? 30000.
+    */
+   streamStallTimeoutMs?: number;
   providerOptions?: Record<string, unknown>;
 }
 interface LlmStreamChunk {
   token: string;
   usage?: LlmUsage;
 }
+ /**
+  * Discriminator for LlmError — lets callers branch on error class without
+  * parsing message strings.
+  *
+  * cancelled    — AbortSignal fired (caller-initiated). Never retried.
+  * timeout      — Per-call timeoutMs deadline exceeded. Retried by withRetry.
+  * stream_stall — No chunk received within streamStallTimeoutMs. Not retried
+  *                (partial stream output is unsafe to re-issue).
+  * http         — Non-retryable HTTP error (4xx excluding 429).
+  * network      — Retryable network-layer error (ECONNRESET, ETIMEDOUT, etc.).
+  * unknown      — Unclassified error.
+  */
+ type LlmErrorKind = 'cancelled' | 'timeout' | 'stream_stall' | 'http' | 'network' | 'unknown';
 declare class LlmError extends Error {
   readonly name = "LlmError";
   readonly provider: string;
   readonly statusCode: number | undefined;
   readonly retryable: boolean;
+   /**
+    * Optional error kind discriminator. Present on errors produced by the abort/timeout/stall
+    * machinery (v0.3.0+). May be undefined on errors from providers that pre-date the kind field
+    * or on errors that fall through to the generic normalization path.
+    * Typed as LlmErrorKind | undefined to satisfy exactOptionalPropertyTypes.
+    */
+   readonly kind: LlmErrorKind | undefined;
   readonly cause: unknown;
   constructor(opts: {
     message: string;
     provider: string;
     statusCode?: number;
     retryable: boolean;
+     kind?: LlmErrorKind;
     cause?: unknown;
   });
 }
+ /**
+  * Structured output response.
+  *
+  * v0.4.0 — additive fields:
+  *   model — model ID reported by the provider (always present).
+  *   id — provider request / message ID for tracing and debugging.
+  *     Populated by OpenAI (response.id) and Anthropic (response.id).
+  *     Undefined for Gemini, DeepSeek, and Perplexity.
+  *   citations — web citations propagated from Perplexity structured calls.
+  *     Undefined for all other providers.
+  */
 type LlmStructuredResponse<T> = {
   data: T;
+   model: string;
+   id?: string;
   usage: LlmUsage;
   latencyMs: number;
+   citations?: Array<{
+     url: string;
+     title?: string;
+   }>;
 };
 interface LlmClient {
   readonly config: Readonly<LlmClientConfig>;
   complete(messages: LlmMessage[], options?: LlmCallOptions): Promise<LlmResponse>;
   stream(messages: LlmMessage[], options?: LlmCallOptions): AsyncGenerator<LlmStreamChunk>;
+   /**
+    * Structured output — parses and validates the response against a schema.
+    *
+    * **Strict native mode (v0.4.0+):**
+    * Pass a Zod 4 schema to automatically opt into the provider's strictest native
+    * structured-output path:
+    *   - OpenAI: `response_format: { type: 'json_schema', strict: true }` (gpt-5.x family)
+    *   - Anthropic: forced tool-use with `tool_choice: { type: 'tool', name: 'extract' }`
+    *   - Gemini: `responseSchema` populated in GenerateContentConfig
+    *
+    * **Prompt-only fallback:**
+    * If the schema is not a Zod 4 instance (note: Zod 3 schemas are rejected with an
+    * LlmError rather than falling back), or if
+    * `options.providerOptions.structuredMode === 'prompt'` is set, the v0.3.0
+    * system-prompt + parse path is used instead. This is the escape hatch for:
+    *   - Zod 4 schemas that use unrepresentable features (z.function(), z.lazy(), etc.)
+    *   - Non-Zod schema objects that satisfy the narrow `{ parse }` interface
+    *   - DeepSeek and Perplexity (no native schema mode — always prompt-only)
+    *
+    * **Defense-in-depth:** schema.parse() is called on the parsed result even
+    * after a native strict-mode call, to catch truncation or partial outputs.
+    *
+    * @param schema - A Zod 4 schema (triggers strict mode) or any `{ parse }` interface.
+    *   Using a narrower interface than ZodType avoids a hard zod dependency at
+    *   the types level.
+    */
   structured<T>(messages: LlmMessage[], schema: {
     parse: (data: unknown) => T;
   }, options?: LlmCallOptions): Promise<LlmStructuredResponse<T>>;