combined-ai 0.1.0

# combined-ai

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)
[![TypeScript](https://img.shields.io/badge/TypeScript-strict-3178c6.svg)](https://www.typescriptlang.org/)
[![Node](https://img.shields.io/badge/Node-%E2%89%A520-339933.svg)](https://nodejs.org/)

**Multi-model consensus, pipeline, ensemble, and broadcast for TypeScript.**

Most AI libraries hand you one model at a time. combined-ai makes several models
**work together on a single prompt** — consensus, sequential refinement, a vote
on structured output, or a plain fan-out that returns every model's answer —
behind one tiny interface. Single-provider calls (`complete`/`stream`) are
included too.

```ts
import { ProviderRegistry } from "combined-ai";

const registry = new ProviderRegistry({
  anthropic: { apiKey: process.env.ANTHROPIC_API_KEY! },
  openai: { apiKey: process.env.OPENAI_API_KEY! },
  google: { apiKey: process.env.GEMINI_API_KEY! },
});

// Three models draft, critique each other, and one synthesizes the best answer.
const result = await registry.combine({
  messages: [{ role: "user", content: "Design a rate limiter." }],
  participants: ["anthropic", "openai", "google"],
});

console.log(result.text);
```

## Contents

- [Why combine?](#why-combine)
- [Requirements](#requirements)
- [Installation](#installation)
- [Combining providers](#combining-providers)
  - [Consensus](#consensus)
  - [Pipeline](#pipeline)
  - [Ensemble](#ensemble)
  - [Broadcast](#broadcast)
  - [Per-participant models](#per-participant-models)
  - [Reading the result](#reading-the-result)
  - [Progress events](#progress-events)
- [Single-provider usage](#single-provider-usage)
  - [Provider configuration](#provider-configuration)
  - [Custom & gateway providers](#custom--gateway-providers)
  - [Request options](#request-options)
  - [Result fields](#result-fields)
  - [Structured output](#structured-output)
  - [Tool calling](#tool-calling)
  - [Multimodal input](#multimodal-input)
  - [Error handling](#error-handling)
  - [Retries & cancellation](#retries--cancellation)
- [Public API](#public-api)
- [Development](#development)
- [Changelog](#changelog)
- [License](#license)

## Why combine?

A single model gives you one answer with no second opinion. combined-ai runs
several models on the same prompt, with four strategies for four shapes of
problem:

| Strategy | Shape | Use it when… |
| --- | --- | --- |
| `"consensus"` | draft → critique → synthesize | you want one well-reasoned answer that survived peer review. |
| `"pipeline"` | sequential refinement (a conveyor belt) | each model should improve the previous one's answer in turn. |
| `"ensemble"` | parallel structured answers → field-wise vote | you need extraction/classification **with a confidence score**. |
| `"broadcast"` | parallel fan-out, every raw answer returned | you want each model's answer side by side, with no combining. |

All four share one interface: configure a `ProviderRegistry`, then call
`registry.combine({ participants, messages, strategy })`. Participants can be
different providers, or the **same provider with different models**.

## Requirements

- Node.js ≥ 20 (uses the global `fetch`, `ReadableStream`, `TextDecoder`).

## Installation

```bash
npm install combined-ai
# or: yarn add combined-ai / pnpm add combined-ai
```

The published package is dual ESM + CJS with TypeScript types, so consumers can
install it with any package manager. (This repo uses Yarn 4 with Plug'n'Play for
development only; see [Development](#development).)

The library never reads environment variables. You always pass API keys in
explicitly via the registry config.

## Combining providers

```ts
const registry = new ProviderRegistry({
  anthropic: { apiKey: process.env.ANTHROPIC_API_KEY! },
  openai: { apiKey: process.env.OPENAI_API_KEY! },
  google: { apiKey: process.env.GEMINI_API_KEY! },
});

const result = await registry.combine({
  messages: [{ role: "user", content: "Design a rate limiter." }],
  participants: ["anthropic", "openai", "google"],
  strategy: "consensus", // optional; this is the default
});
```

`combine()` accepts the same request fields as `complete()` (`messages`,
`system`, `model`, `maxTokens`, `signal`). These apply to every participant
unless a participant overrides them. It also accepts:

| Field | Type | Notes |
| --- | --- | --- |
| `participants` | `ParticipantSpec[]` | Required, non-empty. A bare `ProviderName`, or `{ provider, model?, maxTokens?, label? }`. |
| `strategy` | `"consensus"` \| `"pipeline"` \| `"ensemble"` \| `"broadcast"` | Optional. Defaults to `"consensus"`. |
| `synthesizer` | `string` (participant id) | _Consensus only._ Who writes the final answer. Defaults to the first participant. |
| `attribution` | `"attributed"` \| `"anonymized"` | _Consensus only._ Defaults to `"anonymized"` (Answer A/B/C), which reduces bias. |
| `minParticipants` | `number` | _Consensus only._ Minimum drafts required to proceed (default 2). |
| `responseFormat` | `ResponseFormat` | _Ensemble only, and required there._ The shared JSON Schema every model answers under. |

**Two ways to call it.** When you know the strategy at the call site, prefer the
per-strategy method: `registry.consensus(req)`, `.pipeline(req)`,
`.ensemble(req)`, or `.broadcast(req)`. Each takes that strategy's request type and
returns its **concrete** result (`ConsensusResult`, `PipelineResult`, …), so you
never narrow a union. `registry.combine(request)` is the dispatcher, and it is
generic over the strategy. Pass a literal `strategy` and it returns that
strategy's concrete result. Pass a `strategy` known only at runtime and it
returns the full `CombineResult` union, which you then narrow. Both paths share
one engine and the same validation. See
[Reading the result](#reading-the-result).

### Consensus

The default. Best when you want a single, well-reasoned answer that has been
checked by other models.

1. **Draft** — every participant answers the prompt in parallel.
2. **Critique** — every participant sees all drafts and critiques them, arguing
   for the best one and ending with a structured verdict.
3. **Synthesize** — the _synthesizer_ reads the drafts and critiques and writes
   the single final answer.

```ts
const result = await registry.combine({
  messages: [{ role: "user", content: "Design a rate limiter." }],
  participants: ["anthropic", "openai", "google"],
  synthesizer: "anthropic", // optional; defaults to the first participant
});

console.log(result.text); // the final synthesized answer
```

Behavior worth knowing:

- **Anonymized by default.** Critics and the synthesizer see `Answer A`/`B`/`C`
  rather than model names, to neutralize brand and self-preference bias (pass
  `attribution: "attributed"` to opt out). The result still records each
  outcome's `id` and `provider`.
- **Correctness over popularity.** The synthesizer is told to adopt a lone
  correct answer rather than average it away, not to favor its own (anonymized)
  draft, and to flag genuine disagreement instead of papering over it. The final
  answer is written fresh — it never alludes to the drafts, critiques, or
  internal labels (a final sanitizing pass strips any leftover meta-commentary).
- **Lean inter-model messages.** The draft and critique text passed between
  models drops greetings and filler but keeps reasoning and caveats, so critics
  can check the _why_. The user-facing synthesis is unconstrained.
- **A single participant** with a successful draft degrades to a plain completion
  (no critique/synthesis); if that lone draft fails or is empty, the run throws.
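
The anonymized labeling in the first bullet can be pictured as a small mapping step. What follows is a minimal sketch, not the library's implementation. The `Draft` shape and the `anonymize` helper are hypothetical, and this README does not show the real prompt wording. The sketch assigns `Answer A`, `Answer B`, … in participant order. It also keeps a private map from each label back to the participant `id`, which is how a result can still record who wrote what.

```typescript
// Hypothetical draft shape for illustration; not the library's type.
interface Draft {
  id: string; // participant id, e.g. "openai-gpt-4.1"
  text: string;
}

// Labels drafts "Answer A", "Answer B", … in participant order. Returns the
// text a critic would read plus a private map from each label back to its id.
function anonymize(drafts: Draft[]): { criticText: string; labelToId: Map<string, string> } {
  const labelToId = new Map<string, string>();
  const sections = drafts.map((draft, index) => {
    const label = `Answer ${String.fromCharCode(65 + index)}`; // 65 is "A"
    labelToId.set(label, draft.id);
    return `${label}:\n${draft.text}`;
  });
  return { criticText: sections.join("\n\n"), labelToId };
}

const { criticText, labelToId } = anonymize([
  { id: "anthropic", text: "Use a token bucket." },
  { id: "openai", text: "Use a sliding window." },
]);
console.log(criticText); // model names never appear in the critic-facing text
console.log(labelToId.get("Answer B")); // openai
```

The map stays on the orchestrator side. Critics only ever see the labels, while the recorded outcomes can still be keyed by `id`.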

### Pipeline

A conveyor belt: each participant refines the previous one's answer, in
**participant order**. The first writes an initial answer; each subsequent stage
gets the question plus the running answer and improves it; the **last stage to
produce an answer wins**.

```ts
const result = await registry.combine({
  messages: [{ role: "user", content: "Design a rate limiter." }],
  participants: ["anthropic", "openai", "google"], // the conveyor order
  strategy: "pipeline",
});

console.log(result.text); // the final, refined answer
console.log(result.finalParticipant); // id of the last stage that produced one
```

- **Refiners preserve rather than rewrite.** Each stage treats the current answer as a
  strong baseline — fix errors, fill gaps, sharpen wording, but keep what's
  correct (there's no downstream synthesizer to catch a regression).
- **The final answer is sanitized** when a refining stage actually changed it, to
  strip "I improved the previous answer…" narration. A first-stage answer or an
  unchanged passthrough is returned as-is (no wasted call).
- `synthesizer`, `attribution`, and `minParticipants` are consensus-specific and
  ignored here.
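
The conveyor semantics above can be sketched with plain async functions standing in for model calls. Those semantics are: stages run in participant order, a failed or empty stage is skipped, and the last producer wins. The `Stage` type and `runPipeline` function are hypothetical illustrations, and the sanitizing pass is left out.

```typescript
// Each stage stands in for one participant's model call. It receives the
// question and the running answer (undefined until some stage succeeds).
type Stage = {
  id: string;
  run: (question: string, current: string | undefined) => Promise<string>;
};

type StageOutcome = { id: string; status: "ok" | "failed" };

async function runPipeline(question: string, stages: Stage[]) {
  let current: string | undefined;
  let finalParticipant: string | undefined;
  const outcomes: StageOutcome[] = [];

  for (const stage of stages) {
    try {
      const text = (await stage.run(question, current)).trim();
      if (text === "") throw new Error("empty answer"); // empty output counts as a failure
      current = text;
      finalParticipant = stage.id; // the last stage to produce an answer wins
      outcomes.push({ id: stage.id, status: "ok" });
    } catch {
      // Record the failure and move on; the running answer carries over unchanged.
      outcomes.push({ id: stage.id, status: "failed" });
    }
  }

  if (current === undefined) throw new Error("no stage produced an answer");
  return { text: current, finalParticipant, stages: outcomes };
}
```

A failed stage leaves `current` untouched, so the next stage simply refines the last good answer. This is why `finalParticipant` can differ from the last entry in `participants`.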

### Ensemble

A multi-model vote on **structured output** — the thing one provider can't give
you. Every participant answers the prompt independently under the same JSON
Schema, the typed objects are merged **mechanically** (no model adjudicates), and
you get an **agreement score** telling you how strongly the models concurred.

```ts
const result = await registry.combine({
  messages: [{ role: "user", content: "Extract the city and country: ..." }],
  participants: ["anthropic", "openai", "google"],
  strategy: "ensemble",
  responseFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: { city: { type: "string" }, country: { type: "string" } },
      required: ["city", "country"],
      additionalProperties: false,
    },
  },
});

console.log(result.merged); // e.g. { city: "Paris", country: "France" }
console.log(result.agreement.overall); // 0–1: how much the models agreed
console.log(result.agreement.byField); // e.g. { city: 1, country: 0.67 }
```

How the merge works (field-wise over the union of top-level keys):

- **Every field is a majority vote** — the most common value by deep equality,
  ties broken by participant order. The merged value is always one a model
  actually returned (never synthesized or averaged), so it stays within the
  schema's types.
- **Agreement** per field is the share of **all** valid responses that voted for
  the merged value; `overall` is the mean across fields. A field most models
  omitted scores low — a low score flags it for review.
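
The two rules above are mechanical enough to sketch directly. The version below is minimal and rests on two assumptions. First, responses are already parsed into plain JSON objects. Second, "ties broken by participant order" means the value first returned by the earliest participant wins. `deepEqual` and `mergeEnsemble` are illustrative names, not library exports.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Structural equality for JSON values; object key order does not matter.
function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => deepEqual(item, b[i]));
  }
  const objA = a as JsonObject;
  const objB = b as JsonObject;
  const keysA = Object.keys(objA);
  return (
    keysA.length === Object.keys(objB).length &&
    keysA.every((key) => Object.prototype.hasOwnProperty.call(objB, key) && deepEqual(objA[key], objB[key]))
  );
}

// Field-wise majority vote over the union of top-level keys.
function mergeEnsemble(responses: JsonObject[]) {
  const keys = [...new Set(responses.flatMap((r) => Object.keys(r)))];
  const merged: JsonObject = {};
  const byField: Record<string, number> = {};

  for (const key of keys) {
    // Tally distinct values in first-seen (participant) order.
    const tally: { value: Json; votes: number }[] = [];
    for (const response of responses) {
      if (!(key in response)) continue; // an omitted field casts no vote
      const value = response[key];
      const entry = tally.find((t) => deepEqual(t.value, value));
      if (entry) entry.votes += 1;
      else tally.push({ value, votes: 1 });
    }
    // Strict ">" keeps the earliest-seen value on a tie.
    const winner = tally.reduce((best, t) => (t.votes > best.votes ? t : best));
    merged[key] = winner.value; // always a value some model actually returned
    byField[key] = winner.votes / responses.length; // share of ALL valid responses
  }

  const scores = Object.values(byField);
  const overall = scores.length > 0 ? scores.reduce((sum, s) => sum + s, 0) / scores.length : 0;
  return { merged, agreement: { overall, byField } };
}

const example = mergeEnsemble([
  { city: "Paris", country: "France" },
  { city: "Paris", country: "France" },
  { city: "Paris", country: "FR" },
]);
console.log(example.merged); // { city: 'Paris', country: 'France' }
console.log(example.agreement.byField); // city agreement 1, country agreement 2/3
```

Dividing by the total response count, rather than by the number of voters for that field, is what makes a mostly omitted field score low.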

Notes:

- **`responseFormat` is required** for ensemble and **rejected** for the other
  strategies. Its schema must have an **object** root (the field-wise vote needs
  named fields).
- **The merge is shallow** — nested objects/arrays are voted on as whole values.
  Keep schemas to flat fields for the most useful per-field agreement.

### Broadcast

The simplest strategy: send the prompt to every participant **in parallel** and
get **all** of their answers back, unchanged. There is no critique, synthesis, or
vote — broadcast deliberately does **not** combine. Use it to compare models side
by side, or to drive your own selection/UI over the raw outputs.

```ts
const result = await registry.combine({
  messages: [{ role: "user", content: "Name a good book on databases." }],
  participants: ["anthropic", "openai", "google"],
  strategy: "broadcast",
});

for (const response of result.responses) {
  if (response.status === "ok") {
    console.log(`${response.id}: ${response.result.text}`);
  } else {
    console.log(`${response.id} failed: ${response.error.message}`);
  }
}
```

- **No single answer**, so `BroadcastResult` has **no `text`** field — read
  `result.responses` (one outcome per participant, in participant order).
- **Each model answers the raw prompt** (no shaped framing) — you get the
  unmodified per-model reply.
- **Fails only when every participant fails**; individual failures are recorded
  in `responses` and the run still returns the successes. An empty-text answer
  still counts as a success (broadcast returns what each model gave back).
- **No structured output:** `responseFormat` is rejected (it's the
  [ensemble](#ensemble) strategy's job); `synthesizer`, `attribution`, and
  `minParticipants` are consensus-specific and ignored.
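
These failure semantics map naturally onto `Promise.allSettled`, which waits for every promise and preserves input order. The sketch reduces each participant to a hypothetical `ask` function. Its outcomes carry bare `text`, whereas the real `responses` entries wrap a full `result`. `broadcastAll` is an illustrative name.

```typescript
type Outcome =
  | { id: string; status: "ok"; text: string }
  | { id: string; status: "failed"; error: Error };

type Participant = { id: string; ask: (prompt: string) => Promise<string> };

async function broadcastAll(prompt: string, participants: Participant[]): Promise<Outcome[]> {
  // allSettled never rejects and keeps results in input (participant) order.
  const settled = await Promise.allSettled(participants.map((p) => p.ask(prompt)));
  const outcomes = settled.map((s, i): Outcome => {
    const id = participants[i].id;
    if (s.status === "fulfilled") return { id, status: "ok", text: s.value }; // "" still counts as ok
    const error = s.reason instanceof Error ? s.reason : new Error(String(s.reason));
    return { id, status: "failed", error };
  });
  // The run fails only when nobody succeeded.
  if (!outcomes.some((o) => o.status === "ok")) {
    throw new Error("every participant failed");
  }
  return outcomes;
}
```

Using `allSettled` rather than `Promise.all` matters here. A single rejection must not discard the answers that did arrive.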

### Per-participant models

Each participant is identified by an **id** (its label). A bare provider name has
an id equal to the provider name. The object form uses `label` if you set one;
otherwise it derives `<provider>-<model>` when you set a model, or just the
provider name when you don't. This lets one combine mix cheap
drafters with a strong synthesizer — and even run the **same provider twice**
with different models:

```ts
await registry.combine({
  messages,
  participants: [
    { provider: "google", model: "gemini-2.5-flash" }, // id "google-gemini-2.5-flash"
    { provider: "openai", model: "gpt-4.1-mini" }, // id "openai-gpt-4.1-mini"
    { provider: "openai", model: "gpt-4.1" }, // id "openai-gpt-4.1" (same provider, different model)
    { provider: "anthropic" }, // id "anthropic" (default model)
  ],
  synthesizer: "anthropic", // a strong model adjudicates the cheap drafts
});
```

Two participants that resolve to the same id are rejected unless you give one an
explicit `label`. A participant's `model`/`maxTokens` take precedence over the
request-wide values.
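
The id rules can be summarized as a small function. This is a sketch under one assumption that the README implies but does not spell out: an explicit `label` takes precedence over the derived `<provider>-<model>` form. `ParticipantLike`, `participantId`, and `resolveIds` are hypothetical names.

```typescript
// Simplified stand-in for a participant spec: a bare provider name or an object.
type ParticipantLike = string | { provider: string; model?: string; label?: string };

// A bare name is its own id; an object uses its label if set, otherwise
// "<provider>-<model>" when a model is set, otherwise just the provider name.
function participantId(spec: ParticipantLike): string {
  if (typeof spec === "string") return spec;
  if (spec.label !== undefined) return spec.label;
  return spec.model !== undefined ? `${spec.provider}-${spec.model}` : spec.provider;
}

// Resolves every id and rejects the request if two participants collide.
function resolveIds(specs: ParticipantLike[]): string[] {
  const seen = new Set<string>();
  return specs.map((spec) => {
    const id = participantId(spec);
    if (seen.has(id)) {
      throw new Error(`Duplicate participant id "${id}"; give one an explicit label`);
    }
    seen.add(id);
    return id;
  });
}

console.log(
  resolveIds([
    { provider: "google", model: "gemini-2.5-flash" },
    { provider: "openai", model: "gpt-4.1" },
    { provider: "anthropic" },
  ]),
); // [ 'google-gemini-2.5-flash', 'openai-gpt-4.1', 'anthropic' ]
```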

### Reading the result

Every outcome carries both an `id` (the participant label) and `provider` (the
actual provider it ran on); `usage` is aggregated across **every** model call the
run made (the true multi-call cost), keyed by `id`.

If you call a per-strategy method, the result type is already concrete, so no
narrowing is needed:

```ts
const result = await registry.pipeline({ messages, participants });
result.finalParticipant; // typed PipelineResult — `stages`, `text`, … all in scope
```

`combine()` with a **literal** `strategy` is just as precise (it's generic over
the strategy, inferring the result type from `strategy`). You only narrow when the
strategy is dynamic, in which case `combine()` returns the `CombineResult` union
discriminated on `strategy`:

```ts
const strategy = pickStrategyAtRuntime(); // : StrategyName
const result = await registry.combine({ messages, participants, strategy });

result.usage; // { total, byParticipant } — aggregated token usage, or undefined

if (result.strategy === "consensus") {
  result.text; // the final synthesized answer
  result.synthesizer; // id of the participant that wrote the final answer
  result.drafts; // each participant's first-pass answer (has .id, .provider)
  result.critiques; // each participant's critique
} else if (result.strategy === "pipeline") {
  result.text; // the final, refined answer
  result.finalParticipant; // id of the last stage that produced an answer
  result.stages; // each stage in conveyor order (ok/failed)
} else if (result.strategy === "ensemble") {
  result.text; // the merged object serialized as JSON
  result.merged; // the voted object
  result.agreement; // { overall, byField }
  result.responses; // each participant's structured answer (ok/failed)
} else if (result.strategy === "broadcast") {
  // No `text` — broadcast returns every raw answer, not one combined answer.
  result.responses; // each participant's raw answer in order (ok/failed)
}
```

`text` is present on every strategy **except** `broadcast` (which has no single
answer), so narrow on `result.strategy` before reading it.

**Partial failures are tolerated.** A participant that errors, or that succeeds
but returns empty/invalid output (outside broadcast, where an empty answer still
counts as a success), is recorded in the result and dropped from the rest of the
round; the run proceeds with the survivors. It throws only when too few survive:
consensus needs `minParticipants` drafts, pipeline needs at least one advancing
stage, ensemble needs at least one valid object, and broadcast needs at least one
participant to succeed. `combine()` also validates the request up front and
throws on bad input: no participants, duplicate ids, empty `messages`, an
out-of-range `minParticipants`, a `synthesizer` that isn't a participant id, an
unknown `strategy`, or a missing or non-object `responseFormat` for ensemble.
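
The up-front checks listed above can be sketched as one function that reports the first problem it finds. It works on a simplified, hypothetical `RequestLike` shape whose participant ids are already resolved. It also assumes "out-of-range" means an integer outside 1 through the participant count. The library's exact bounds, ordering, and messages may differ.

```typescript
// Simplified request shape for illustration; participant ids are already resolved.
interface RequestLike {
  ids: string[];
  messages: unknown[];
  strategy?: string;
  synthesizer?: string;
  minParticipants?: number;
  responseFormat?: { type: "json_schema"; schema: { type?: unknown } };
}

const STRATEGIES = ["consensus", "pipeline", "ensemble", "broadcast"];

// Returns the first problem found, or undefined when the request passes.
function findRequestProblem(req: RequestLike): string | undefined {
  const strategy = req.strategy ?? "consensus";
  if (!STRATEGIES.includes(strategy)) return `unknown strategy "${strategy}"`;
  if (req.ids.length === 0) return "no participants";
  if (new Set(req.ids).size !== req.ids.length) return "duplicate participant ids";
  if (req.messages.length === 0) return "empty messages";

  const min = req.minParticipants;
  // Assumed range: an integer from 1 up to the number of participants.
  if (min !== undefined && (!Number.isInteger(min) || min < 1 || min > req.ids.length)) {
    return "minParticipants out of range";
  }
  if (req.synthesizer !== undefined && !req.ids.includes(req.synthesizer)) {
    return `synthesizer "${req.synthesizer}" is not a participant id`;
  }
  if (strategy === "ensemble") {
    // The field-wise vote needs named fields, so the schema root must be an object.
    if (req.responseFormat?.schema.type !== "object") {
      return "ensemble needs a responseFormat with an object root";
    }
  } else if (req.responseFormat !== undefined) {
    return "responseFormat is only accepted by ensemble";
  }
  return undefined;
}
```

Validating before any network call means a typo in `synthesizer` costs nothing, instead of surfacing after every draft has already been paid for.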

### Progress events

`combine()` takes an optional second argument with an `onEvent` callback that
fires as the run progresses — handy for a status display. Events are status only
(no token streaming); the answer is still the resolved result.

```ts
await registry.combine(
  { messages, participants: ["anthropic", "openai"] },
  {
    onEvent: (event) => {
      switch (event.type) {
        case "phase":
          console.log(`→ ${event.phase}`); // consensus phase boundary
          break;
        case "draft":
        case "critique": // consensus
        case "stage": // pipeline (has .index)
        case "response": // ensemble, broadcast
          console.log(`  ${event.provider}: ${event.status}`); // "ok" | "failed"
          break;
      }
    },
  },
);
```

Errors thrown from `onEvent` are swallowed so a listener can't break the run, and
there is no terminal event (the result is the return value).
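
Swallowing listener errors usually comes down to a try/catch around the callback. The sketch below uses a hypothetical `ProgressEventLike` type and `safeListener` wrapper. It covers synchronous throws only; a rejected promise from an `async` listener would need separate handling.

```typescript
// Hypothetical event shape: a `type` tag plus arbitrary fields.
type ProgressEventLike = { type: string; [field: string]: unknown };

// Wraps an optional listener so that anything it throws is swallowed and the
// run that emits events is never interrupted by a faulty status display.
function safeListener(onEvent?: (event: ProgressEventLike) => void) {
  return (event: ProgressEventLike): void => {
    if (onEvent === undefined) return;
    try {
      onEvent(event);
    } catch {
      // Deliberately ignored: listener bugs must not break the run.
    }
  };
}
```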

## Single-provider usage

The same registry talks to one provider at a time. Every provider implements one
contract, so the calling code is identical whichever you pick. The concrete
provider classes are intentionally not exported — you never construct them
yourself.

```ts
const provider = registry.select("anthropic"); // throws if not configured

// Non-streaming: get the full response text.
const result = await provider.complete({
  messages: [{ role: "user", content: "Say hello in one short sentence." }],
});
console.log(result.text, result.model);

// Streaming: consume text deltas as they arrive.
for await (const delta of provider.stream({
  messages: [{ role: "user", content: "Count to five." }],
})) {
  process.stdout.write(delta);
}
```

You can also inspect what's configured:

```ts
registry.has("openai"); // -> false if not configured
registry.names(); // -> the configured provider names
registry.select("openai"); // -> throws: No provider "openai" configured. Configured: anthropic
```

### Provider configuration

Pass an entry for each provider you want; omit one to leave it out.

```ts
new ProviderRegistry({
  anthropic: {
    apiKey: "sk-ant-...", // required
    model: "claude-opus-4-8", // optional; default
    baseUrl: "https://api.anthropic.com", // optional; default
    retry: { maxRetries: 2, baseDelayMs: 500 }, // optional; defaults
  },
  openai: {
    apiKey: "sk-...",
    model: "gpt-4.1", // optional; default
    headers: { "x-trace": "..." }, // optional; merged into every request
  },
  google: {
    apiKey: "...",
    model: "gemini-2.5-pro", // optional; default
  },
});
```

### Custom & gateway providers

Beyond the three built-ins you can register extra providers under names you
choose, via a `custom` map. Two forms:

- **`openai-compatible`** — point the OpenAI provider at any Chat Completions
  endpoint (OpenRouter, Together, Groq, Ollama, a local server, …). `baseUrl`
  (excluding the `/v1/chat/completions` path) and `model` are required; `headers`
  and `retry` are optional.
- **`provider`** — bring your own object implementing the `Provider` interface,
  for an API the library doesn't speak natively or to wrap one with
  instrumentation.

```ts
const registry = new ProviderRegistry({
  anthropic: { apiKey: "sk-ant-..." },
  custom: {
    groq: {
      kind: "openai-compatible",
      apiKey: process.env.GROQ_API_KEY!,
      baseUrl: "https://api.groq.com/openai",
      model: "llama-3.3-70b-versatile",
    },
    mine: { kind: "provider", provider: myProviderInstance }, // your own Provider object
  },
});

registry.select("groq"); // a normal Provider
await registry.combine({ participants: ["anthropic", "groq"], messages }); // mix freely
```

A custom name that collides with a built-in (`anthropic`/`openai`/`google`)
throws at construction. Custom providers work everywhere a built-in does:
`select()`, `combine()` participants, and results.

### Request options

`complete()` and `stream()` take a `CompletionRequest`. `combine()` accepts its
`messages`, `system`, `model`, `maxTokens`, and `signal` fields as well.

| Field | Type | Notes |
| --- | --- | --- |
| `messages` | `Message[]` | Required. `{ role: "user" \| "assistant"; content: string \| ContentPart[] }` |
| `system` | `string` | Optional system prompt. |
| `model` | `string` | Optional per-request model override. |
| `maxTokens` | `number` | Optional output cap (defaults: 16000 for `complete()`, 64000 for `stream()`). |
| `responseFormat` | `ResponseFormat` | Optional. Constrains the output to a JSON Schema; see [Structured output](#structured-output). |
| `tools` | `ToolDefinition[]` | Optional. Tools the model may call; see [Tool calling](#tool-calling). |
| `toolChoice` | `ToolChoice` | Optional. `"auto" \| "any" \| "none" \| { name }`. |
| `signal` | `AbortSignal` | Optional. Aborts the request (and an in-flight `stream()` read) when it fires. |

> **Gemini note:** Gemini 2.5 models are _thinking_ models, and their internal
> thinking tokens count against `maxTokens`. A very small cap can be consumed
> entirely by thinking, leaving the visible answer empty or truncated. Give
> Gemini ample headroom (`gemini-2.5-pro` can't fully disable thinking).

### Result fields

`complete()` resolves to a `CompletionResult`:

| Field | Type | Notes |
| --- | --- | --- |
| `text` | `string` | The full answer. |
| `model` | `string` | The model that actually produced the response. |
| `finishReason` | `FinishReason` | Normalized stop reason: `"stop"` \| `"length"` \| `"content_filter"` \| `"tool_use"` \| `"other"`. |
| `rawFinishReason` | `string` | The provider's exact stop-reason string. |
| `refusal` | `string` | The refusal message when the model declined. |
| `usage` | `Usage` | Token usage (`inputTokens`/`outputTokens`/`totalTokens`), or `undefined` if none reported. |
| `parsed` | `unknown` | The parsed structured output when `responseFormat` was given. |
| `toolCalls` | `ToolCall[]` | The tool calls the model requested, when it called any. |

`finishReason` lets you tell a truncated or refused answer apart from a genuinely
empty one, instead of just seeing `text: ""`. A `"length"` reason with empty
`text` on Gemini usually means the cap was spent on thinking tokens. `refusal` is
populated by OpenAI and Anthropic, and a set `refusal` always pairs with
`"content_filter"`. Gemini has no refusal-text field, so it signals a refusal via
`finishReason: "content_filter"` alone (the block reason lands in
`rawFinishReason`).

```ts
const { text, finishReason, rawFinishReason, refusal } = await provider.complete({ messages });
if (finishReason === "length") {
  // Truncated (on Gemini, possibly by thinking tokens): raise maxTokens and retry.
} else if (finishReason === "content_filter") {
  // Gemini sets no `refusal` text, so fall back to the raw block reason.
  console.warn(`Model declined: ${refusal ?? rawFinishReason}`);
} else {
  console.log(text);
}
```
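
Normalization amounts to a per-provider lookup from raw stop reasons to the five `FinishReason` values. The sketch uses raw values documented by the Anthropic Messages, OpenAI Chat Completions, and Gemini APIs. The table itself is a non-exhaustive assumption, not combined-ai's actual mapping. Anything unmapped falls back to `"other"` while the raw string is kept, mirroring `rawFinishReason`. `BuiltIn`, `RAW_REASONS`, and `normalizeFinishReason` are hypothetical names.

```typescript
type FinishReason = "stop" | "length" | "content_filter" | "tool_use" | "other";
type BuiltIn = "anthropic" | "openai" | "google";

// Documented raw stop reasons per API (a non-exhaustive selection).
// Maps avoid accidental hits on Object.prototype keys such as "constructor".
const RAW_REASONS: Record<BuiltIn, Map<string, FinishReason>> = {
  anthropic: new Map<string, FinishReason>([
    ["end_turn", "stop"],
    ["stop_sequence", "stop"],
    ["max_tokens", "length"],
    ["tool_use", "tool_use"],
    ["refusal", "content_filter"],
  ]),
  openai: new Map<string, FinishReason>([
    ["stop", "stop"],
    ["length", "length"],
    ["content_filter", "content_filter"],
    ["tool_calls", "tool_use"],
  ]),
  google: new Map<string, FinishReason>([
    ["STOP", "stop"],
    ["MAX_TOKENS", "length"],
    ["SAFETY", "content_filter"],
  ]),
};

// Unmapped or missing reasons collapse to "other"; the raw string is preserved.
function normalizeFinishReason(provider: BuiltIn, raw: string | undefined) {
  const mapped = raw !== undefined ? RAW_REASONS[provider].get(raw) : undefined;
  const finishReason: FinishReason = mapped ?? "other";
  return { finishReason, rawFinishReason: raw };
}
```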

### Structured output

Pass `responseFormat` with a **plain JSON Schema** (no Zod, no runtime
dependency) to constrain a single provider's output. The model returns JSON in
`text`, and `complete()` also gives you the parsed value on `result.parsed`:

```ts
const result = await registry.select("openai").complete({
  messages: [{ role: "user", content: "Where is the Eiffel Tower?" }],
  responseFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: { city: { type: "string" }, country: { type: "string" } },
      required: ["city", "country"],
      additionalProperties: false,
    },
  },
});

// `parsed` is undefined if the output wasn't valid JSON; the raw output stays in `result.text`.
if (result.parsed !== undefined) {
  const place = result.parsed as { city: string; country: string };
  console.log(place.city, place.country);
}
```

Each provider maps the schema to its native mechanism. For one schema to work
across all three, keep it simple: every object sets `additionalProperties: false`
and every property is `required` with a single non-null `type`. Avoid
optional/nullable fields, recursive schemas, `$ref`, and numeric/length
constraints. (The [ensemble](#ensemble) strategy uses this same field across
multiple models.)
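
The portability rules above are easy to check before sending a schema to three providers. The hypothetical `portabilityProblems` helper below walks a schema and lists every violation it finds. It is a sketch of the stated rules, not a validator shipped with the library. The keyword list treated as numeric/length constraints is an assumption.

```typescript
// Loose JSON Schema shape for illustration; unknown keywords are allowed.
interface SchemaLike {
  type?: string | string[];
  properties?: Record<string, SchemaLike>;
  required?: string[];
  additionalProperties?: boolean | SchemaLike;
  items?: SchemaLike;
  $ref?: string;
  [keyword: string]: unknown;
}

// Keywords treated here as numeric/length constraints (an assumed list).
const CONSTRAINT_KEYWORDS = [
  "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum", "multipleOf",
  "minLength", "maxLength", "minItems", "maxItems",
];

// Lists every departure from the portable subset, with a path-like location.
function portabilityProblems(schema: SchemaLike, path = "$"): string[] {
  const problems: string[] = [];
  if (schema.$ref !== undefined) problems.push(`${path}: uses $ref`);
  // Rejects union types such as ["string", "null"] and a bare "null".
  if (typeof schema.type !== "string" || schema.type === "null") {
    problems.push(`${path}: needs a single non-null "type"`);
  }
  for (const keyword of CONSTRAINT_KEYWORDS) {
    if (keyword in schema) problems.push(`${path}: has a "${keyword}" constraint`);
  }
  if (schema.type === "object") {
    if (schema.additionalProperties !== false) {
      problems.push(`${path}: set additionalProperties: false`);
    }
    const required = new Set(schema.required ?? []);
    for (const [key, child] of Object.entries(schema.properties ?? {})) {
      if (!required.has(key)) problems.push(`${path}.${key}: not listed in "required"`);
      problems.push(...portabilityProblems(child, `${path}.${key}`));
    }
  }
  if (schema.type === "array" && schema.items !== undefined) {
    problems.push(...portabilityProblems(schema.items, `${path}[]`));
  }
  return problems;
}
```

Running it in a unit test against every schema you pass to `complete()` or to the ensemble strategy catches a non-portable schema before any provider rejects it.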

### Tool calling

Declare `tools` and the model can ask to call them. When it does, `complete()`
returns `result.toolCalls` (and `finishReason === "tool_use"`). You run the tools
and feed the results back by appending the call and its result to the
conversation, then call again. You own the loop.

```ts
import type { Message, ToolDefinition } from "combined-ai";

const provider = registry.select("anthropic");
const tools: ToolDefinition[] = [
  {
    name: "get_weather",
    description: "Get the current weather for a city.",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
      additionalProperties: false,
    },
  },
];

// Typed as Message[] so the assistant and tool-result turns below can be pushed.
const messages: Message[] = [{ role: "user", content: "What's the weather in Paris?" }];
const first = await provider.complete({ messages, tools });

if (first.toolCalls) {
  messages.push({
    role: "assistant",
    content: first.toolCalls.map((call) => ({ type: "tool_use" as const, ...call })),
  });
  messages.push({
    role: "user",
    content: first.toolCalls.map((call) => ({
      type: "tool_result" as const,
      toolUseId: call.id, // OpenAI matches results by id
      name: call.name, // Gemini matches results by name
      content: runTool(call.name, call.input), // your code; returns a string
    })),
  });

  const final = await provider.complete({ messages, tools });
  console.log(final.text);
}
```

- **`input` is always a parsed object** (OpenAI's JSON-string arguments are
  parsed for you).
- **Set both `toolUseId` and `name`** on a tool result for portability — OpenAI
  matches by id, Gemini by name (each throws if its key is missing).
- **`complete()`-only**, and intentionally **not** part of `combine()` (a
  multi-model tool loop has no coherent shared state) — use `select()` for it.
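
Since you own the loop, a common shape is to repeat the single round above until the model stops requesting tools, with a cap on rounds. The sketch takes the model call and tool runner as plain functions so it can run without network access. `MessageLike`, `ReplyLike`, `ToolCallLike`, and `runToolLoop` are hypothetical stand-ins for the library's `Message`, `CompletionResult`, and `ToolCall` types. The five-round cap is an arbitrary choice.

```typescript
// Hypothetical minimal stand-ins; in real code use the library's Message,
// ToolCall, and CompletionResult types instead.
type ToolCallLike = { id: string; name: string; input: Record<string, unknown> };
type ReplyLike = { text: string; toolCalls?: ToolCallLike[] };
type MessageLike = { role: "user" | "assistant"; content: unknown };

async function runToolLoop(
  complete: (messages: MessageLike[]) => Promise<ReplyLike>,
  runTool: (name: string, input: Record<string, unknown>) => Promise<string>,
  messages: MessageLike[],
  maxRounds = 5,
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    const reply = await complete(messages);
    const calls = reply.toolCalls ?? [];
    if (calls.length === 0) return reply.text; // a plain answer ends the loop

    // Echo the model's tool calls back as an assistant turn...
    messages.push({ role: "assistant", content: calls.map((call) => ({ type: "tool_use", ...call })) });
    // ...then answer each call (run in parallel) with both id and name set.
    const results = await Promise.all(
      calls.map(async (call) => ({
        type: "tool_result",
        toolUseId: call.id,
        name: call.name,
        content: await runTool(call.name, call.input),
      })),
    );
    messages.push({ role: "user", content: results });
  }
  throw new Error(`no final answer after ${maxRounds} rounds`);
}
```

The round cap guards against a model that keeps requesting tools indefinitely. Throwing is a choice; returning the last partial text would be an equally reasonable policy.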

### Multimodal input

A message's `content` can be a `ContentPart[]` carrying images and documents
(PDFs) alongside text, as base64 bytes or a URL:

```ts
await registry.select("anthropic").complete({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image",
          source: { kind: "base64", mediaType: "image/png", data: pngBase64 },
        },
      ],
    },
  ],
});
```

A `ContentPart` is a `TextPart`, `ImagePart`, or `FilePart`; `source` is either
`{ kind: "base64"; mediaType; data }` or `{ kind: "url"; url; mediaType? }`.
Provider support varies — OpenAI's Chat Completions has no URL file source, and
Gemini resolves a URL only from a Files API / `gs://` URI — so prefer base64 for
portability. The mapper throws on an unsupported combination.
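
Because base64 is the portable choice, it can help to convert sources before sending them. This sketch assumes Node ≥ 20 (global `fetch` and `Buffer`). The helper names and the `application/octet-stream` fallback are illustrative, not part of combined-ai.

```typescript
type Base64Source = { kind: "base64"; mediaType: string; data: string };
type UrlSource = { kind: "url"; url: string; mediaType?: string };

// Wraps raw bytes (e.g. from fs.readFileSync) in a portable base64 image part.
function imagePartFromBytes(bytes: Uint8Array, mediaType: string) {
  const source: Base64Source = { kind: "base64", mediaType, data: Buffer.from(bytes).toString("base64") };
  return { type: "image" as const, source };
}

// Turns a URL source into a base64 one by downloading it with the global fetch,
// so the part works with every provider. Base64 sources pass through unchanged.
async function toBase64Source(source: Base64Source | UrlSource): Promise<Base64Source> {
  if (source.kind === "base64") return source;
  const response = await fetch(source.url);
  if (!response.ok) throw new Error(`GET ${source.url} failed with HTTP ${response.status}`);
  // Prefer the caller's media type, else the server's (minus any "; charset=…").
  const mediaType =
    source.mediaType ?? response.headers.get("content-type")?.split(";")[0].trim() ?? "application/octet-stream";
  const data = Buffer.from(await response.arrayBuffer()).toString("base64");
  return { kind: "base64", mediaType, data };
}
```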

### Error handling

A failed call rejects (`complete()`) or throws on the first iteration
(`stream()`) with a `ProviderError` — branch on its fields rather than the
message string:

| Field | Type | Notes |
| --- | --- | --- |
| `kind` | `"api"` \| `"transport"` | `"api"` = the provider returned an error; `"transport"` = the request never landed. |
| `provider` | `ProviderName` | Which provider failed. |
| `status` | `number \| undefined` | HTTP status for `kind: "api"`; `undefined` for transport failures. |
| `code` | `string \| undefined` | Machine code from the body, where the provider sends one. |
| `type` | `string \| undefined` | Error category from the body. |
| `body` | `string \| undefined` | The raw error body, for `kind: "api"`. |
| `cause` | `unknown` | The underlying `fetch` rejection, for `kind: "transport"`. |

```ts
import { ProviderError } from "combined-ai";

try {
  const result = await provider.complete({ messages });
} catch (err) {
  if (err instanceof ProviderError) {
    if (err.status === 401) throw err; // bad key — unrecoverable
    if (err.kind === "transport") {
      /* never reached the provider */
    }
  }
  throw err;
}
```

`complete()` also throws (`kind: "api"`, `status: 200`) if a provider or proxy
returns HTTP 200 with an `{ error }` body, rather than yielding a silently empty
result. For `combine()`, individual provider failures are recorded rather than
thrown — see [Reading the result](#reading-the-result).

### Retries & cancellation

Each provider automatically retries the routine retryable statuses — **429**,
**503**, and **529** — with bounded exponential backoff (honoring `Retry-After`),
for both `complete()` and `stream()`. Transport failures are **not** retried.
Configure per provider with `retry` (defaults: 2 retries, 500ms base); set
`maxRetries: 0` to disable.
693
+
694
+ ```ts
695
+ new ProviderRegistry({
696
+ anthropic: { apiKey: key, retry: { maxRetries: 4, baseDelayMs: 1000 } },
697
+ openai: { apiKey: key, retry: { maxRetries: 0 } }, // no retry
698
+ });
699
+ ```
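To picture the retry schedule, here is a sketch under stated assumptions: the delay doubles per attempt starting from `baseDelayMs`, a `Retry-After` value (in seconds or as an HTTP date) takes precedence when present, and the result is capped by a hypothetical `maxDelayMs`. The library's exact formula, including any jitter or cap, isn't documented here, so treat `backoffDelayMs` as illustrative only.

```typescript
interface BackoffOptions {
  baseDelayMs?: number; // documented default base: 500
  maxDelayMs?: number; // hypothetical cap ("bounded" backoff)
  retryAfter?: string | null; // raw Retry-After header value, if any
  now?: number; // injectable clock for HTTP-date values
}

// Hypothetical helper: delay before retry number `attempt` (0-based).
function backoffDelayMs(attempt: number, opts: BackoffOptions = {}): number {
  const { baseDelayMs = 500, maxDelayMs = 30_000, retryAfter, now = Date.now() } = opts;
  if (retryAfter) {
    // Retry-After may be a number of seconds...
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxDelayMs);
    }
    // ...or an HTTP date.
    const at = Date.parse(retryAfter);
    if (!Number.isNaN(at)) {
      return Math.min(Math.max(0, at - now), maxDelayMs);
    }
  }
  // Exponential: base, 2 * base, 4 * base, ...
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}
```

Under these assumptions, the defaults with no `Retry-After` header would wait 500 ms before the first retry and 1000 ms before the second.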
700
+
701
+ Pass a `signal` to bound or cancel a call. For a timeout use
702
+ `AbortSignal.timeout(ms)`; to cancel manually use an `AbortController`. An aborted
703
+ call rejects with a transport `ProviderError` whose `cause` is the abort reason.
704
+ The backoff respects the signal too, and `combine()` threads one signal into
705
+ every participant call, so aborting it cancels the whole run.
706
+
707
+ ```ts
708
+ await provider.complete({ messages, signal: AbortSignal.timeout(30_000) });
709
+ ```
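To see how a single signal can cancel a multi-participant run while individual failures are still recorded rather than thrown, consider this sketch. `Call`, `Outcome`, and `fanOut` are hypothetical names; `Outcome` is not combined-ai's `ParticipantOutcome`. Catching each participant's rejection gives the same information as `Promise.allSettled`.

```typescript
// Hypothetical participant call: any async function that honors `signal`.
type Call = (signal: AbortSignal) => Promise<string>;

// Hypothetical per-participant record.
type Outcome =
  | { name: string; ok: true; text: string }
  | { name: string; ok: false; error: unknown };

async function fanOut(calls: Record<string, Call>, signal: AbortSignal): Promise<Outcome[]> {
  return Promise.all(
    Object.entries(calls).map(async ([name, call]): Promise<Outcome> => {
      try {
        // Every participant receives the same signal, so one abort cancels all.
        return { name, ok: true, text: await call(signal) };
      } catch (error) {
        // Record the failure instead of rejecting the whole run.
        return { name, ok: false, error };
      }
    }),
  );
}
```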
710
+
711
+ ## Public API
712
+
713
+ Exported from the package entry point:
714
+
715
+ - `ProviderRegistry` — the single entry point: `select()`, the strategy
716
+ dispatcher `combine()`, and the per-strategy methods `consensus()`,
717
+ `pipeline()`, `ensemble()`, `broadcast()`.
718
+ - Config types: `ProviderRegistryConfig`, `ProviderName`, `BuiltInProviderName`,
719
+ `CustomProviderConfig`, `CustomProviderInstance`, `OpenAICompatibleConfig`,
720
+ `AnthropicProviderOptions`, `OpenAIProviderOptions`, `GoogleProviderOptions`,
721
+ `RetryOptions`.
722
+ - Contract types: `Provider`, `Message`, `Role`, `ContentPart`, `TextPart`,
723
+ `ImagePart`, `FilePart`, `MediaSource`, `ToolUsePart`, `ToolResultPart`,
724
+ `CompletionRequest`, `CompletionResult`, `ResponseFormat`, `ToolDefinition`,
725
+ `ToolChoice`, `ToolCall`, `FinishReason`, `Usage`.
726
+ - Combine request types: `CombineRequest` (the dispatcher's broad type),
727
+ `CombineRequestBase`, and the per-strategy `ConsensusRequest`,
728
+ `PipelineRequest`, `EnsembleRequest` (`responseFormat` required),
729
+ `BroadcastRequest`; plus `ParticipantSpec`.
730
+ - Combine result types: `CombineResult` (= `ConsensusResult` | `PipelineResult` |
731
+ `EnsembleResult` | `BroadcastResult`), `EnsembleAgreement`, `CombineUsage`,
732
+ `ParticipantOutcome`, `StrategyName`, `CombineOptions`, `CombineEvent`, and the
733
+ strategy-generic utilities `StrategyRequest<S>` / `ResultFor<S>`.
734
+ - `ProviderError` (a value — usable with `instanceof`) and `ProviderErrorKind`.
735
+
736
+ The concrete provider classes (`AnthropicProvider`, `OpenAIProvider`,
737
+ `GoogleProvider`) are **not** exported — the registry constructs them internally.
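The strategy-generic utilities `StrategyRequest<S>` and `ResultFor<S>` let generic code tie a strategy name to its request and result types. Their actual definitions aren't shown in this README. The sketch below illustrates one common way to build such utilities, using a hypothetical lookup table (`StrategyTable`) with deliberately simplified shapes and a hypothetical `runStrategy` dispatcher. It mirrors the idea only, not the library's types.

```typescript
// Simplified, hypothetical shapes; the real request/result types carry more fields.
interface StrategyTable {
  consensus: { request: { prompt: string }; result: { text: string } };
  broadcast: { request: { prompt: string }; result: { texts: string[] } };
}

type StrategyName = keyof StrategyTable;
type StrategyRequest<S extends StrategyName> = StrategyTable[S]["request"];
type ResultFor<S extends StrategyName> = StrategyTable[S]["result"];

// One handler per strategy; the mapped type keeps each handler's input and
// output in sync with the table.
type Handlers = { [S in StrategyName]: (req: StrategyRequest<S>) => ResultFor<S> };

const handlers: Handlers = {
  consensus: (req) => ({ text: `agreed: ${req.prompt}` }),
  broadcast: (req) => ({ texts: [`a: ${req.prompt}`, `b: ${req.prompt}`] }),
};

// The return type follows the literal strategy name passed in.
function runStrategy<S extends StrategyName>(strategy: S, req: StrategyRequest<S>): ResultFor<S> {
  const handler = handlers[strategy] as (r: StrategyRequest<S>) => ResultFor<S>;
  return handler(req);
}
```

With this pattern, `runStrategy("broadcast", …)` is typed as returning `{ texts: string[] }` without a cast at the call site.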
738
+
739
+ ## Development
740
+
741
+ Uses **Yarn 4 (Plug'n'Play)** — always use `yarn`, never `npm`, for local work.
742
+
743
+ ```bash
744
+ yarn build # bundle to dist/ (ESM + CJS + types) via tsup
745
+ yarn typecheck # tsc --noEmit
746
+ yarn test # Jest (mocked unit tests; never makes network calls)
747
+ yarn test:integration # live API tests — see below
748
+ yarn lint # ESLint
749
+ yarn format # Prettier --write
750
+ ```
751
+
752
+ ### Live integration tests
753
+
754
+ `yarn test:integration` runs tests against the real provider APIs. They are
755
+ double-gated and skipped by default — each provider's suite runs only when both
756
+ `RUN_LIVE_TESTS=1` (set by the script) and that provider's key are present
757
+ (`ANTHROPIC_API_KEY` / `OPENAI_API_KEY` / `GEMINI_API_KEY`). To enable them:
758
+
759
+ ```bash
760
+ cp .env.example .env
761
+ # edit .env and set your key(s)
762
+ yarn test:integration # all integration tests
763
+ yarn test:integration openai.integration # just one provider's suite
764
+ yarn test:integration consensus.integration # a combine suite (needs all three keys)
765
+ ```
766
+
767
+ The combine suites (`consensus.integration`, `pipeline.integration`,
768
+ `ensemble.integration`, `broadcast.integration`) are **triple-gated** on all
769
+ three keys, since they exercise the full multi-model flow. Live tests use cheap models and a small token
770
+ cap, so cost is negligible. `.env` is gitignored and loaded automatically.
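The gating rule amounts to a small predicate. Here is a sketch, assuming the suites read environment variables directly. `shouldRunLive` is a hypothetical helper, not something the package exports, and the Jest wiring in the trailing comment is likewise illustrative.

```typescript
// Hypothetical gate: run only when RUN_LIVE_TESTS=1 AND every required key
// is set to a non-empty value.
function shouldRunLive(
  env: Record<string, string | undefined>,
  requiredKeys: readonly string[],
): boolean {
  if (env["RUN_LIVE_TESTS"] !== "1") return false;
  return requiredKeys.every((key) => (env[key] ?? "").trim() !== "");
}

// Per-provider suite (double-gated):
//   shouldRunLive(process.env, ["OPENAI_API_KEY"])
// Combine suite (all three keys):
//   shouldRunLive(process.env, ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY"])
// In a Jest file, for example:
//   (runLive ? describe : describe.skip)("consensus (live)", () => { /* ... */ });
```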
771
+
772
+ ## Changelog
773
+
774
+ Notable changes are recorded in [CHANGELOG.md](./CHANGELOG.md), following the
775
+ [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) format.
776
+
777
+ ## License
778
+
779
+ [MIT](./LICENSE) © Anders Jansson