phantomllm 0.1.0

package/README.md ADDED
<p align="center">
  <img src="logo.svg" alt="phantomllm" width="200" />
</p>

<h1 align="center">phantomllm</h1>

<p align="center">
  Dockerized mock server for OpenAI-compatible APIs.<br/>
  Test your LLM integrations against a real HTTP server instead of patching <code>fetch</code>.
</p>

```typescript
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();
mock.given.chatCompletion.willReturn('Hello from the mock!');

// mock.apiBaseUrl => "http://localhost:55123/v1" — ready to plug into any client
await mock.stop();
```

## Table of Contents

- [Why phantomllm?](#why-phantomllm)
- [Prerequisites](#prerequisites)
- [Setup](#setup)
- [Getting the Server URL](#getting-the-server-url)
- [API Reference](#api-reference)
  - [MockLLM](#mockllm)
  - [Chat Completions](#chat-completions)
  - [Streaming](#streaming)
  - [Embeddings](#embeddings)
  - [Error Simulation](#error-simulation)
  - [Stub Matching](#stub-matching)
- [Integration Examples](#integration-examples)
  - [OpenAI Node.js SDK](#openai-nodejs-sdk)
  - [Vercel AI SDK](#vercel-ai-sdk)
  - [opencode](#opencode)
  - [LangChain](#langchain)
  - [Python openai package](#python-openai-package)
  - [Plain fetch](#plain-fetch)
  - [curl](#curl)
- [Test Framework Integration](#test-framework-integration)
  - [Vitest](#vitest)
  - [Jest](#jest)
  - [Shared Fixture for Multi-File Suites](#shared-fixture-for-multi-file-suites)
- [Performance](#performance)
- [Configuration](#configuration)
- [Troubleshooting](#troubleshooting)
- [License](#license)
## Why phantomllm?

- **Real HTTP server** — no monkey-patching `fetch` or `http`. Your SDK makes actual network calls through a real TCP connection.
- **Works with any client** — OpenAI SDK, Vercel AI SDK, LangChain, opencode, Python, curl — anything that speaks the OpenAI API protocol.
- **Streaming support** — SSE chunked responses work exactly like the real OpenAI API.
- **Fast** — ~1s container cold start, sub-millisecond response latency, 4,000+ req/s throughput.
- **Simple API** — fluent `given/when` pattern: `mock.given.chatCompletion.forModel('gpt-4').willReturn('Hello')`.

## Prerequisites

| Requirement | Version |
|-------------|---------|
| Node.js | 18+ |
| Docker | 20.10+ |

Docker must be running before you call `mock.start()`. Verify with:

```bash
docker info
```
## Setup

### 1. Install the package

```bash
npm install phantomllm --save-dev
```

### 2. Build the Docker image

The mock server runs inside Docker. Build the image once before running tests:

```bash
# From a clone of the repo (if using from source)
npm run docker:build
```

This builds a ~170MB Alpine-based image tagged `phantomllm-server:latest`. The image contains only the compiled Fastify server — no dev dependencies.

### 3. Verify it works

```typescript
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();
console.log('Mock server running at:', mock.apiBaseUrl);
// => "Mock server running at: http://localhost:55123/v1"
await mock.stop();
```
## Getting the Server URL

`MockLLM` provides two URL getters. Use whichever fits your client:

```typescript
const mock = new MockLLM();
await mock.start();

mock.baseUrl    // "http://localhost:55123" — raw host:port
mock.apiBaseUrl // "http://localhost:55123/v1" — includes the /v1 prefix
```

**Which one to use:**

| Client / Tool | Property | Example |
|---------------|----------|---------|
| OpenAI SDK (`baseURL`) | `mock.apiBaseUrl` | `new OpenAI({ baseURL: mock.apiBaseUrl })` |
| Vercel AI SDK (`baseURL`) | `mock.apiBaseUrl` | `createOpenAI({ baseURL: mock.apiBaseUrl })` |
| LangChain (`configuration.baseURL`) | `mock.apiBaseUrl` | `new ChatOpenAI({ configuration: { baseURL: mock.apiBaseUrl } })` |
| opencode config | `mock.apiBaseUrl` | `"baseURL": "http://localhost:55123/v1"` |
| Python openai (`base_url`) | `mock.apiBaseUrl` | `OpenAI(base_url=mock.apiBaseUrl)` |
| Plain fetch | `mock.baseUrl` | ``fetch(`${mock.baseUrl}/v1/chat/completions`)`` |
| Admin API | `mock.baseUrl` | ``fetch(`${mock.baseUrl}/_admin/health`)`` |

Most SDK clients expect the URL to end with `/v1`. Use `mock.apiBaseUrl` and you won't need to think about it.
## API Reference

### MockLLM

The main class. Creates and manages a Docker container running the mock OpenAI server.

```typescript
import { MockLLM } from 'phantomllm';

const mock = new MockLLM({
  image: 'phantomllm-server:latest', // Docker image name (default)
  containerPort: 8080,               // Port inside the container (default)
  reuse: true,                       // Reuse container across runs (default)
  startupTimeout: 30_000,            // Max ms to wait for startup (default)
});
```

| Method / Property | Returns | Description |
|---|---|---|
| `await mock.start()` | `void` | Start the Docker container. Idempotent — safe to call twice. |
| `await mock.stop()` | `void` | Stop and remove the container. Idempotent. |
| `mock.baseUrl` | `string` | Server URL without `/v1`, e.g. `http://localhost:55123`. |
| `mock.apiBaseUrl` | `string` | Server URL with `/v1`, e.g. `http://localhost:55123/v1`. Pass this to SDK clients. |
| `mock.given` | `GivenStubs` | Entry point for the fluent stubbing API. |
| `await mock.clear()` | `void` | Remove all registered stubs. Call between tests. |

`MockLLM` implements `Symbol.asyncDispose` for automatic cleanup:

```typescript
{
  await using mock = new MockLLM();
  await mock.start();
  // ... use mock ...
} // mock.stop() called automatically
```
### Chat Completions

Stub `POST /v1/chat/completions` responses.

```typescript
// Any request returns this content
mock.given.chatCompletion.willReturn('Hello!');

// Match by model
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willReturn('I am GPT-4o.');

// Match by message content (case-insensitive substring)
mock.given.chatCompletion
  .withMessageContaining('weather')
  .willReturn('Sunny, 72F.');

// Combine matchers — both must match
mock.given.chatCompletion
  .forModel('gpt-4o')
  .withMessageContaining('translate')
  .willReturn('Bonjour!');
```

| Method | Description |
|---|---|
| `.forModel(model)` | Only match requests with this exact model name. |
| `.withMessageContaining(text)` | Only match when any user message contains this substring (case-insensitive). |
| `.willReturn(content)` | Return a `chat.completion` response with this content. |
### Streaming

Return SSE-streamed responses, matching the real OpenAI streaming format.

```typescript
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willStream(['Hello', ', ', 'world', '!']);
```

Each string becomes a separate `chat.completion.chunk` SSE event with `delta.content`. The stream ends with a chunk containing `finish_reason: "stop"` followed by `data: [DONE]`.

| Method | Description |
|---|---|
| `.willStream(chunks)` | Return a stream of SSE events, one per string in the array. |

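To see what that looks like on the wire, the sketch below hand-parses a stream shaped like the one `.willStream(['Hello', ', ', 'world', '!'])` produces. In real tests the OpenAI SDK does this parsing for you; the sample payloads are illustrative, trimmed to the fields that matter.

```typescript
// Hand-rolled SSE parsing, shown only to illustrate the wire format.

/** Concatenate the delta.content fields from a raw SSE body. */
function collectStreamedContent(raw: string): string {
  let text = '';
  for (const event of raw.split('\n\n')) {
    const line = event.trim();
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break; // terminal sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}

// A stream stubbed with .willStream(['Hello', ', ', 'world', '!'])
// arrives looking roughly like this (non-essential fields omitted):
const sample = [
  'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}',
  'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":", "}}]}',
  'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"world"}}]}',
  'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}',
  'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
  'data: [DONE]',
].join('\n\n');

console.log(collectStreamedContent(sample)); // => "Hello, world!"
```
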
### Embeddings

Stub `POST /v1/embeddings` responses.

```typescript
// Single embedding
mock.given.embedding
  .forModel('text-embedding-3-small')
  .willReturn([0.1, 0.2, 0.3]);

// Batch — multiple vectors for multiple inputs
mock.given.embedding
  .willReturn([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
  ]);
```

| Method | Description |
|---|---|
| `.forModel(model)` | Only match requests with this model. |
| `.willReturn(vector)` | Return a single vector (`number[]`) or a batch of vectors (`number[][]`). |

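Fixed vectors make downstream math deterministic. For example, if the code under test computes cosine similarity over the stubbed batch above, the expected score is known before the test runs (`cosineSimilarity` here is plain linear algebra, not part of phantomllm):

```typescript
// Ordinary cosine similarity, used here to show why deterministic
// embedding stubs are useful: expected scores can be asserted exactly.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// With the batch stub above, your code under test receives exactly
// [0.1, 0.2, 0.3] and [0.4, 0.5, 0.6], so the score is known up front.
const score = cosineSimilarity([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]);
console.log(score.toFixed(4)); // => "0.9746"
```
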
### Error Simulation

Force error responses to test retry logic, error handling, and fallbacks.

```typescript
mock.given.chatCompletion.willError(429, 'Rate limit exceeded');
mock.given.chatCompletion.willError(500, 'Internal server error');
mock.given.embedding.willError(400, 'Invalid input');

// Scoped to a specific model
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willError(403, 'Model access denied');
```

Error responses follow the OpenAI error format:

```json
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "api_error",
    "param": null,
    "code": null
  }
}
```

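Error stubs pair naturally with retry logic. The helper below is an illustrative sketch (not part of phantomllm) of the kind of code you would exercise by stubbing a 429; the simulated transport stands in for a client hitting the mock:

```typescript
// Illustrative retry helper: retries while the server answers 429, which is
// exactly the condition `willError(429, '...')` lets you provoke in a test.
async function withRetry<T>(
  attempt: () => Promise<{ status: number; value?: T }>,
  maxAttempts = 3,
): Promise<T> {
  for (let i = 1; i <= maxAttempts; i++) {
    const res = await attempt();
    if (res.status === 200 && res.value !== undefined) return res.value;
    if (res.status !== 429 || i === maxAttempts) {
      throw new Error(`request failed with status ${res.status}`);
    }
    // real code would back off here; omitted to keep the sketch short
  }
  throw new Error('unreachable');
}

// Simulated transport: fails twice with 429, then succeeds — the same
// sequence you could drive against the mock server with error stubs.
let calls = 0;
const flaky = async () =>
  ++calls < 3 ? { status: 429 } : { status: 200, value: 'ok' };

const result = await withRetry(flaky);
console.log(result, 'after', calls, 'attempts'); // => "ok after 3 attempts"
```
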
### Stub Matching

When multiple stubs are registered, the most specific match wins:

1. **Specificity** — a stub matching both model and content (specificity 2) beats one matching only model (specificity 1), which beats a catch-all (specificity 0).
2. **Registration order** — among stubs with equal specificity, the first registered wins.

```typescript
// Catch-all (specificity 0)
mock.given.chatCompletion.willReturn('Default response');

// Model-specific (specificity 1) — wins over the catch-all for gpt-4o
mock.given.chatCompletion
  .forModel('gpt-4o')
  .willReturn('GPT-4o response');

// Model + content (specificity 2) — wins over model-only for matching messages
mock.given.chatCompletion
  .forModel('gpt-4o')
  .withMessageContaining('weather')
  .willReturn('Weather-specific GPT-4o response');
```

When no stub matches a request, the server returns HTTP 418 with a descriptive error message showing what was requested.

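The resolution rules above can be restated as a few lines of code. This is an illustrative re-implementation of the documented algorithm, not phantomllm's actual source:

```typescript
// Sketch of the documented matching rules: highest specificity wins,
// and among equally specific stubs the first registered wins.
interface Stub {
  model?: string;             // set by .forModel(...)
  messageContaining?: string; // set by .withMessageContaining(...)
  content: string;            // set by .willReturn(...)
}

interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
}

function pickStub(stubs: Stub[], req: ChatRequest): Stub | undefined {
  let best: { stub: Stub; specificity: number } | undefined;
  for (const stub of stubs) { // registration order: earlier wins ties
    if (stub.model !== undefined && stub.model !== req.model) continue;
    if (
      stub.messageContaining !== undefined &&
      !req.messages.some(
        (m) =>
          m.role === 'user' &&
          m.content.toLowerCase().includes(stub.messageContaining!.toLowerCase()),
      )
    ) continue;
    const specificity =
      (stub.model !== undefined ? 1 : 0) +
      (stub.messageContaining !== undefined ? 1 : 0);
    // strict > keeps the earlier stub on ties
    if (!best || specificity > best.specificity) best = { stub, specificity };
  }
  return best?.stub;
}

const stubs: Stub[] = [
  { content: 'Default response' },
  { model: 'gpt-4o', content: 'GPT-4o response' },
  { model: 'gpt-4o', messageContaining: 'weather', content: 'Weather-specific GPT-4o response' },
];

const req: ChatRequest = {
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather like?' }],
};

console.log(pickStub(stubs, req)?.content); // => "Weather-specific GPT-4o response"
```
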
## Integration Examples

### OpenAI Node.js SDK

```typescript
import OpenAI from 'openai';
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();

const openai = new OpenAI({
  baseURL: mock.apiBaseUrl,
  apiKey: 'test-key', // any string — the mock doesn't validate keys
});

// Non-streaming
mock.given.chatCompletion.willReturn('The capital of France is Paris.');

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
});
console.log(response.choices[0].message.content);
// => "The capital of France is Paris."

// Streaming
mock.given.chatCompletion.willStream(['The capital', ' of France', ' is Paris.']);

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Capital of France?' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}

// Embeddings
mock.given.embedding
  .forModel('text-embedding-3-small')
  .willReturn([0.1, 0.2, 0.3]);

const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});
console.log(embedding.data[0].embedding); // => [0.1, 0.2, 0.3]

await mock.stop();
```
### Vercel AI SDK

```typescript
import { generateText, streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();

const provider = createOpenAI({
  baseURL: mock.apiBaseUrl,
  apiKey: 'test-key',
});

// generateText
mock.given.chatCompletion.willReturn('Paris');

const { text } = await generateText({
  model: provider.chat('gpt-4o'),
  prompt: 'Capital of France?',
});
console.log(text); // => "Paris"

// streamText
mock.given.chatCompletion.willStream(['Par', 'is']);

const result = streamText({
  model: provider.chat('gpt-4o'),
  prompt: 'Capital of France?',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

await mock.stop();
```

> **Note:** Use `provider.chat('model')` instead of `provider('model')` to ensure requests go through `/v1/chat/completions`. The default `provider('model')` in `@ai-sdk/openai` v3+ uses the Responses API.
### opencode

Add a provider entry to your `opencode.json` pointing at the mock:

```jsonc
{
  "provider": {
    "mock": {
      "api": "openai",
      "baseURL": "http://localhost:PORT/v1",
      "apiKey": "test-key",
      "models": {
        "gpt-4o": { "id": "gpt-4o" }
      }
    }
  }
}
```

Start the mock and print the URL to use:

```typescript
const mock = new MockLLM();
await mock.start();
console.log(`Set baseURL to: ${mock.apiBaseUrl}`);
// Update the port in opencode.json to match
```
### LangChain

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { MockLLM } from 'phantomllm';

const mock = new MockLLM();
await mock.start();

mock.given.chatCompletion.willReturn('Hello from LangChain!');

const model = new ChatOpenAI({
  modelName: 'gpt-4o',
  configuration: {
    baseURL: mock.apiBaseUrl,
    apiKey: 'test-key',
  },
});

const response = await model.invoke('Say hello');
console.log(response.content); // => "Hello from LangChain!"

await mock.stop();
```
### Python openai package

The mock server is a real HTTP server — any language can use it. Start the mock from Node.js, then connect from Python:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:55123/v1",  # use mock.apiBaseUrl
    api_key="test-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
### Plain fetch

```typescript
const response = await fetch(`${mock.baseUrl}/v1/chat/completions`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```
### curl

```bash
curl http://localhost:55123/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
## Test Framework Integration

### Vitest

```typescript
import { describe, it, expect, beforeAll, afterAll, beforeEach } from 'vitest';
import { MockLLM } from 'phantomllm';
import OpenAI from 'openai';

describe('my LLM feature', () => {
  const mock = new MockLLM();
  let openai: OpenAI;

  beforeAll(async () => {
    await mock.start();
    openai = new OpenAI({ baseURL: mock.apiBaseUrl, apiKey: 'test' });
  }, 30_000);

  afterAll(async () => {
    await mock.stop();
  });

  beforeEach(async () => {
    await mock.clear(); // reset stubs between tests
  });

  it('should summarize text', async () => {
    mock.given.chatCompletion
      .withMessageContaining('summarize')
      .willReturn('This is a summary.');

    const res = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Please summarize this article.' }],
    });

    expect(res.choices[0].message.content).toBe('This is a summary.');
  });

  it('should handle rate limits', async () => {
    mock.given.chatCompletion.willError(429, 'Rate limit exceeded');

    await expect(
      openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: 'Hello' }],
      }),
    ).rejects.toThrow();
  });

  it('should stream responses', async () => {
    mock.given.chatCompletion.willStream(['Hello', ' World']);

    const stream = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Hi' }],
      stream: true,
    });

    const chunks: string[] = [];
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) chunks.push(content);
    }
    expect(chunks).toEqual(['Hello', ' World']);
  });
});
```
### Jest

```typescript
import { MockLLM } from 'phantomllm';
import OpenAI from 'openai';

describe('my LLM feature', () => {
  const mock = new MockLLM();
  let openai: OpenAI;

  beforeAll(async () => {
    await mock.start();
    openai = new OpenAI({ baseURL: mock.apiBaseUrl, apiKey: 'test' });
  }, 30_000); // container startup timeout

  afterAll(() => mock.stop());
  beforeEach(() => mock.clear());

  test('returns stubbed response', async () => {
    mock.given.chatCompletion.willReturn('Mocked!');

    const res = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Hi' }],
    });

    expect(res.choices[0].message.content).toBe('Mocked!');
  });
});
```
### Shared Fixture for Multi-File Suites

Start one container for your entire test suite. Each test file attaches to the shared mock and clears stubs between tests.

**`tests/support/mock.ts`**

```typescript
import { MockLLM } from 'phantomllm';

export const mock = new MockLLM();

export async function setup() {
  await mock.start();
  // Make the URL available to other processes if needed
  process.env.PHANTOMLLM_URL = mock.apiBaseUrl;
}

export async function teardown() {
  await mock.stop();
}
```

**`vitest.config.ts`**

```typescript
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globalSetup: ['./tests/support/mock.ts'],
  },
});
```

**Individual test files**

Vitest runs `globalSetup` in a separate process, so the imported `mock` instance has not been started in the test worker. Call `start()` in each file as well; with `reuse: true` (the default) this attaches to the already-running container rather than spawning a new one:

```typescript
import { mock } from '../support/mock.js';

beforeAll(() => mock.start()); // attaches to the running container (reuse: true)
beforeEach(() => mock.clear());

it('works', async () => {
  mock.given.chatCompletion.willReturn('Hello!');
  // ...
});
```
## Performance

Benchmarks on Apple Silicon (Docker via OrbStack):

### Container Lifecycle

| Metric | Time |
|--------|------|
| Cold start (`mock.start()`) | ~1.1s |
| Stop (`mock.stop()`) | ~130ms |
| Full lifecycle (start, stub, request, stop) | ~1.2s |

### Request Latency (through Docker network)

| Metric | Median | p95 |
|--------|--------|-----|
| Chat completion (non-streaming) | 0.6ms | 1.8ms |
| Streaming TTFB | 0.7ms | 0.9ms |
| Streaming total (8 chunks) | 0.7ms | 1.0ms |
| Embedding (1536-dim) | 0.7ms | 1.6ms |
| Embedding batch (10x1536) | 1.9ms | 2.7ms |
| Stub registration | 0.5ms | 0.8ms |
| Clear stubs | 0.2ms | 0.4ms |

### Throughput

| Metric | Requests/sec |
|--------|-------------|
| Sequential chat completions | ~4,300 |
| Concurrent (10 workers) | ~6,400 |
| Health endpoint | ~5,900 |

### Tips

- **Don't restart between tests.** Call `mock.clear()` (sub-millisecond) instead of `stop()`/`start()` (~1.2s).
- **Use global setup.** Start one container for your entire suite. See [Shared Fixture](#shared-fixture-for-multi-file-suites).
- **Cache the Docker image in CI.** Restore the cache before building so earlier layers are reused:

```yaml
# GitHub Actions
- name: Cache Docker layers
  uses: actions/cache@v4
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-docker-phantomllm-${{ hashFiles('Dockerfile') }}

- name: Build mock server image
  run: npm run docker:build
```
## Configuration

### Constructor Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `image` | `string` | `'phantomllm-server:latest'` | Docker image to run. |
| `containerPort` | `number` | `8080` | Port the server listens on inside the container. |
| `reuse` | `boolean` | `true` | Reuse a running container across test runs. |
| `startupTimeout` | `number` | `30000` | Max milliseconds to wait for the container to become healthy. |

### Environment Variables

| Variable | Description |
|----------|-------------|
| `PHANTOMLLM_IMAGE` | Override the Docker image name without changing code. Takes precedence over the default but not over the constructor `image` option. |

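That precedence (constructor `image` over `PHANTOMLLM_IMAGE` over the built-in default) boils down to, in sketch form:

```typescript
// Sketch of the documented precedence for the Docker image name:
// constructor `image` option > PHANTOMLLM_IMAGE env var > built-in default.
// (resolveImage is illustrative, not an exported phantomllm function.)
function resolveImage(
  option?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  return option ?? env.PHANTOMLLM_IMAGE ?? 'phantomllm-server:latest';
}

console.log(resolveImage(undefined, {}));                              // built-in default
console.log(resolveImage(undefined, { PHANTOMLLM_IMAGE: 'mock:ci' })); // env var wins over default
console.log(resolveImage('custom:dev', { PHANTOMLLM_IMAGE: 'mock:ci' })); // constructor wins
```
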
### OpenAI-Compatible Endpoints

The mock server implements:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/chat/completions` | POST | Chat completions (streaming and non-streaming) |
| `/v1/embeddings` | POST | Text embeddings |
| `/v1/models` | GET | List available models |
| `/_admin/stubs` | POST | Register a stub |
| `/_admin/stubs` | DELETE | Clear all stubs |
| `/_admin/health` | GET | Health check |

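You rarely need to poll `/_admin/health` yourself, since `mock.start()` already waits for the container to become healthy. If you drive the admin API directly, though, a readiness loop looks roughly like this (`fetchFn` is injected so the sketch stays unit-testable):

```typescript
// Illustrative readiness poll against GET /_admin/health.
// `fetchFn` is injected (instead of using the global fetch) so this
// sketch can be tested without a running server.
async function waitForHealthy(
  baseUrl: string,
  fetchFn: (url: string) => Promise<{ ok: boolean }>,
  attempts = 20,
  intervalMs = 100,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/_admin/health`);
      if (res.ok) return true;
    } catch {
      // server not accepting connections yet
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false;
}
```

In real code you would pass the global `fetch` and `mock.baseUrl` (note: the admin routes live under `baseUrl`, not `apiBaseUrl`).
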
## Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| `ContainerNotStartedError` | Using `baseUrl`, `given`, or `clear()` before `start()`. | Call `await mock.start()` first. |
| Container startup timeout | Docker not running or image not built. | Run `docker info` to verify Docker. Run `npm run docker:build` to build the image. |
| Connection refused | Wrong URL or container not ready. | Use `mock.apiBaseUrl` for SDK clients. Ensure `start()` has resolved. |
| Stubs leaking between tests | Stubs persist until cleared. | Call `await mock.clear()` in `beforeEach`. |
| 418 response | No stub matches the request. | Register a stub matching the model/content, or add a catch-all: `mock.given.chatCompletion.willReturn('...')`. |
| Need a custom image | The default `phantomllm-server:latest` doesn't fit your setup. | Set `PHANTOMLLM_IMAGE=my-registry/image:tag` in your environment. |
| Slow CI | Image rebuilt every run. | Cache Docker layers and enable container reuse. |
| AI SDK uses wrong endpoint | `provider('model')` defaults to the Responses API in v3+. | Use `provider.chat('model')` to target `/v1/chat/completions`. |

## License

MIT