@ljoukov/llm 5.0.4 → 7.0.0

package/README.md CHANGED
@@ -46,14 +46,20 @@ Use one backend:
 
 - `GEMINI_API_KEY` or `GOOGLE_API_KEY` for the Gemini Developer API
 - `GOOGLE_SERVICE_ACCOUNT_JSON` for Vertex AI (the contents of a service account JSON key file, not a file path)
+- `LLM_FILES_GCS_BUCKET` for canonical file storage used by `files.create()` and automatic large-attachment offload
+- `LLM_FILES_GCS_PREFIX` (optional object-name prefix inside `LLM_FILES_GCS_BUCKET`)
 - `VERTEX_GCS_BUCKET` for Vertex-backed Gemini file attachments / `file_id` inputs
 - `VERTEX_GCS_PREFIX` (optional object-name prefix inside `VERTEX_GCS_BUCKET`)
 
 If a Gemini API key is present, the library uses the Gemini Developer API. Otherwise it falls back to Vertex AI.
 
-For Vertex-backed Gemini file inputs, the library mirrors OpenAI-backed canonical files into GCS and then passes the
-resulting `gs://...` URI to Vertex. Configure a lifecycle rule on that bucket to delete objects after 2 days if you
-want hard 48-hour cleanup for mirrored objects.
+Canonical files are stored in GCS with a default `48h` TTL. OpenAI and ChatGPT consume those files via signed HTTPS
+URLs. Gemini still mirrors canonical files lazily into provider-native storage when needed:
+
+- Gemini Developer API mirrors into Gemini Files
+- Vertex-backed Gemini mirrors into `VERTEX_GCS_BUCKET` and uses `gs://...` URIs
+
+Configure lifecycle rules on those buckets if you want hard 48-hour cleanup for mirrored objects.
 
 #### Vertex AI service account setup
 
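Editor's note: the backend-selection rule stated in this hunk ("If a Gemini API key is present…") can be sketched as a small helper. The function name and label strings below are invented for illustration; they are not exported by `@ljoukov/llm`.

```typescript
// Illustrative sketch of the documented backend-selection rule; names are
// examples only, not part of the package's API.
type GeminiBackend = "gemini-developer-api" | "vertex-ai";

function pickGeminiBackend(env: Record<string, string | undefined>): GeminiBackend {
  // Either Gemini API key variable selects the Developer API; otherwise Vertex AI.
  return env.GEMINI_API_KEY || env.GOOGLE_API_KEY ? "gemini-developer-api" : "vertex-ai";
}
```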
@@ -137,7 +143,7 @@ configureModelConcurrency({
     fireworks: 8,
   },
   modelCaps: {
-    "gpt-5.2": 24,
+    "gpt-5.4-mini": 24,
  },
   providerModelCaps: {
     google: {
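Editor's note: one plausible reading of how the three cap maps in this hunk combine is most-specific-wins precedence (provider+model, then model, then provider). The field names and the precedence order below are guesses for illustration, not the library's actual resolution logic.

```typescript
// Hypothetical cap-resolution sketch. Precedence is an assumption; consult the
// package docs for the real behavior.
interface ConcurrencyConfig {
  providerCaps?: Record<string, number>;
  modelCaps?: Record<string, number>;
  providerModelCaps?: Record<string, Record<string, number>>;
}

function effectiveCap(provider: string, model: string, cfg: ConcurrencyConfig): number | undefined {
  return (
    cfg.providerModelCaps?.[provider]?.[model] ?? // most specific
    cfg.modelCaps?.[model] ??
    cfg.providerCaps?.[provider] // least specific
  );
}
```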
@@ -168,7 +174,7 @@ Use OpenAI-style request fields:
 import { generateText } from "@ljoukov/llm";
 
 const result = await generateText({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Write one sentence about TypeScript.",
 });
 
@@ -182,7 +188,7 @@ console.log(result.usage, result.costUsd);
 import { streamText } from "@ljoukov/llm";
 
 const call = streamText({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Explain what a hash function is in one paragraph.",
 });
 
@@ -228,7 +234,7 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
@@ -256,13 +262,13 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
-Canonical storage defaults to OpenAI Files with `purpose: "user_data"` and a `48h` TTL.
+Canonical storage now uses GCS-backed objects with a `48h` TTL.
 
-- OpenAI models use that `file_id` directly.
+- OpenAI and ChatGPT models resolve that `file_id` to a signed HTTPS URL.
 - Gemini Developer API mirrors the file lazily into Gemini Files when needed.
 - Vertex-backed Gemini mirrors the file lazily into `VERTEX_GCS_BUCKET` and uses `gs://...` URIs.
 
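Editor's note: the per-provider fan-out in this hunk can be summarized as a lookup. The backend labels and return strings below are invented for illustration; they are not package exports.

```typescript
// Illustrative mapping of how a canonical file is presented to each backend,
// mirroring the bullets above. Names are examples only.
type Backend = "openai" | "chatgpt" | "gemini-developer-api" | "vertex-ai";

function canonicalFileReferenceKind(backend: Backend): "signed-https-url" | "gemini-file" | "gcs-uri" {
  switch (backend) {
    case "openai":
    case "chatgpt":
      return "signed-https-url"; // signed HTTPS URL to the GCS-backed object
    case "gemini-developer-api":
      return "gemini-file"; // lazily mirrored into Gemini Files
    case "vertex-ai":
      return "gcs-uri"; // lazily mirrored into VERTEX_GCS_BUCKET, sent as gs://...
  }
}
```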
@@ -294,7 +300,7 @@ When the combined inline attachment payload in a single request would exceed abo
 the library automatically uploads those attachments to the canonical files store first and swaps the prompt to file
 references:
 
-- OpenAI: uses canonical OpenAI `file_id`s directly
+- OpenAI / ChatGPT: use signed HTTPS URLs for canonical files
 - Gemini Developer API: mirrors to Gemini Files and sends `fileData.fileUri`
 - Vertex AI: mirrors to `VERTEX_GCS_BUCKET` and sends `gs://...` URIs
 
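Editor's note: the overflow check this hunk describes amounts to summing inline payload sizes against a threshold. The helper below is a minimal sketch; the real threshold value and internals belong to the library.

```typescript
// Minimal sketch of the combined-size check: upgrade to canonical file references
// when the aggregate inline payload is too large, even if each part is small.
function needsCanonicalUpload(partSizesBytes: number[], thresholdBytes: number): boolean {
  const total = partSizesBytes.reduce((sum, n) => sum + n, 0);
  return total > thresholdBytes;
}
```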
@@ -314,14 +320,28 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
 You can mix direct `file_id` parts with `inlineData`. Small attachments stay inline; oversized turns are upgraded to
 canonical files automatically. Tool loops do the same for large tool outputs, and they also re-check the combined size
-after parallel tool calls so a batch of individually-small images/files still gets upgraded to `file_id` references
-before the next model request if the aggregate payload is too large.
+after parallel tool calls so a batch of individually-small images/files still gets upgraded to canonical-file
+references before the next model request if the aggregate payload is too large.
+
+You can also control image analysis fidelity with request-level `mediaResolution`:
+
+- `low`, `medium`, `high`, `original`, `auto`
+- OpenAI / ChatGPT map this onto image `detail`
+- Gemini maps this onto media resolution/tokenization settings
+
+```ts
+const result = await generateText({
+  model: "gpt-5.4",
+  mediaResolution: "original",
+  input,
+});
+```
 
 OpenAI-style direct file-id example:
 
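Editor's note: the OpenAI/ChatGPT side of the `mediaResolution` mapping mentioned above might look roughly like this. OpenAI image `detail` only has `low`/`high`/`auto`, so intermediate values must collapse somehow; how `medium` and `original` actually collapse is an assumption in this sketch.

```typescript
// Assumed mapping from request-level mediaResolution onto OpenAI image `detail`.
// The collapse of medium/original to "high" is a guess, not the library's code.
type MediaResolution = "low" | "medium" | "high" | "original" | "auto";

function toOpenAiImageDetail(res: MediaResolution): "low" | "high" | "auto" {
  switch (res) {
    case "low":
      return "low";
    case "medium":
    case "high":
    case "original":
      return "high"; // assumption: anything above low requests full detail
    case "auto":
      return "auto";
  }
}
```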
@@ -364,7 +384,7 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
@@ -390,7 +410,7 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
@@ -439,6 +459,11 @@ console.log(result.text);
 
 `chatgpt-gpt-5.4-fast` is also supported as a convenience alias for ChatGPT-authenticated `gpt-5.4` with priority processing enabled (`service_tier="priority"`), matching Codex `/fast` semantics.
 
+Supported OpenAI text model ids are fixed literal unions in code, not arbitrary strings:
+
+- OpenAI API: `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`
+- ChatGPT auth: `chatgpt-gpt-5.4`, `chatgpt-gpt-5.4-fast`, `chatgpt-gpt-5.4-mini`, `chatgpt-gpt-5.3-codex-spark`
+
 ## JSON outputs
 
 `generateJson()` validates the output with Zod and returns the parsed value.
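Editor's note: the fixed literal unions added in the previous hunk can be expressed as `as const` tuples with a runtime guard. The constant and type names below are invented for this sketch; the package defines its own.

```typescript
// Illustrative literal unions for the documented model ids (names invented).
const OPENAI_TEXT_MODEL_IDS = ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano"] as const;
const CHATGPT_TEXT_MODEL_IDS = [
  "chatgpt-gpt-5.4",
  "chatgpt-gpt-5.4-fast",
  "chatgpt-gpt-5.4-mini",
  "chatgpt-gpt-5.3-codex-spark",
] as const;

type OpenAiTextModelId = (typeof OPENAI_TEXT_MODEL_IDS)[number];

// Narrow an arbitrary string to the literal union at runtime.
function isOpenAiTextModelId(id: string): id is OpenAiTextModelId {
  return (OPENAI_TEXT_MODEL_IDS as readonly string[]).includes(id);
}
```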
@@ -458,7 +483,7 @@ const schema = z.object({
 });
 
 const { value } = await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
 });
@@ -481,7 +506,7 @@ const schema = z.object({
 });
 
 const call = streamJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
 });
@@ -503,7 +528,7 @@ If you only want thought deltas (no partial JSON), set `streamMode: "final"`.
 
 ```ts
 const call = streamJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
   streamMode: "final",
@@ -514,7 +539,7 @@ If you want to keep `generateJson()` but still stream thoughts, pass an `onEvent
 
 ```ts
 const { value } = await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
   onEvent: (event) => {
@@ -551,13 +576,13 @@ configureTelemetry({
 });
 
 const { value } = await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return { ok: true }.",
   schema: z.object({ ok: z.boolean() }),
 });
 
 await runAgentLoop({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Inspect the repo and update the file.",
   filesystemTool: true,
 });
@@ -567,7 +592,7 @@ Per-call opt-out:
 
 ```ts
 await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return { ok: true }.",
   schema: z.object({ ok: z.boolean() }),
   telemetry: false,
@@ -598,7 +623,7 @@ Use this when the model provider executes the tool remotely (for example search/
 import { generateText } from "@ljoukov/llm";
 
 const result = await generateText({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Find 3 relevant sources about X and summarize them.",
   tools: [{ type: "web-search", mode: "live" }, { type: "code-execution" }],
 });
@@ -615,7 +640,7 @@ import { runToolLoop, tool } from "@ljoukov/llm";
 import { z } from "zod";
 
 const result = await runToolLoop({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "What is 12 * 9? Use the tool.",
   tools: {
     multiply: tool({
@@ -641,7 +666,7 @@ import { streamToolLoop, tool } from "@ljoukov/llm";
 import { z } from "zod";
 
 const call = streamToolLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Start implementing the feature.",
   tools: {
     echo: tool({
@@ -665,7 +690,7 @@ import { createToolLoopSteeringChannel, runAgentLoop } from "@ljoukov/llm";
 
 const steering = createToolLoopSteeringChannel();
 const run = runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Implement the task.",
   filesystemTool: true,
   steering,
@@ -683,13 +708,15 @@ const result = await run;
 - built-in subagent orchestration (delegate work across spawned agents),
 - your own custom runtime tools.
 
+Subagents always inherit the parent run model. The subagent control tools do not expose a model override.
+
 For interactive runs where you want to stream events and inject steering mid-run, use `streamAgentLoop()`:
 
 ```ts
 import { streamAgentLoop } from "@ljoukov/llm";
 
 const call = streamAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Start implementation.",
   filesystemTool: true,
 });
@@ -704,7 +731,7 @@ console.log(result.text);
 For read/search/write tasks in a workspace, enable `filesystemTool`. The library auto-selects a tool profile by model
 when `profile: "auto"`:
 
-- Codex-like models (`gpt-5.4`, `chatgpt-gpt-5.4`, `chatgpt-gpt-5.4-fast`, and `*codex*` variants): Codex-compatible filesystem tool shape.
+- Codex-like models (`gpt-5.4`, `chatgpt-gpt-5.4`, `chatgpt-gpt-5.4-fast`, and `chatgpt-gpt-5.3-codex-spark`): Codex-compatible filesystem tool shape.
 - Gemini models: Gemini-compatible filesystem tool shape.
 - Other models: model-agnostic profile (currently Gemini-style).
 
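Editor's note: the `profile: "auto"` selection in this hunk can be sketched as a predicate. The matching logic below (an exact-id set plus a `gemini` prefix check) is an illustrative guess, not the library's implementation.

```typescript
// Hypothetical sketch of "auto" filesystem-tool profile selection, per the bullets above.
type FsToolProfile = "codex" | "gemini" | "generic";

const CODEX_LIKE = new Set([
  "gpt-5.4",
  "chatgpt-gpt-5.4",
  "chatgpt-gpt-5.4-fast",
  "chatgpt-gpt-5.3-codex-spark",
]);

function pickFilesystemProfile(model: string): FsToolProfile {
  if (CODEX_LIKE.has(model)) return "codex"; // Codex-compatible tool shape
  if (model.startsWith("gemini")) return "gemini"; // Gemini-compatible tool shape
  return "generic"; // model-agnostic (currently Gemini-style per the docs)
}
```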
@@ -714,6 +741,7 @@ Confinement/policy is set through `filesystemTool.options`:
 - `fs`: backend (`createNodeAgentFilesystem()` or `createInMemoryAgentFilesystem()`).
 - `checkAccess`: hook for allow/deny policy + audit.
 - `allowOutsideCwd`: opt-out confinement (default is false).
+- `mediaResolution`: default image fidelity for built-in `view_image` outputs.
 
 Detailed reference: `docs/agent-filesystem-tools.md`.
 
@@ -727,7 +755,7 @@ const fs = createInMemoryAgentFilesystem({
 });
 
 const result = await runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Change value from 1 to 2 using filesystem tools.",
   filesystemTool: {
     profile: "auto",
753
781
  import { runAgentLoop } from "@ljoukov/llm";
754
782
 
755
783
  const result = await runAgentLoop({
756
- model: "chatgpt-gpt-5.3-codex",
784
+ model: "chatgpt-gpt-5.3-codex-spark",
757
785
  input: "Plan the work, delegate in parallel where useful, and return a final merged result.",
758
786
  subagentTool: {
759
787
  enabled: true,
@@ -775,7 +803,7 @@ const fs = createInMemoryAgentFilesystem({
 });
 
 const result = await runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Change value from 1 to 2 using filesystem tools.",
   filesystemTool: {
     profile: "auto",
@@ -819,7 +847,7 @@ import path from "node:path";
 import { runAgentLoop } from "@ljoukov/llm";
 
 await runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Do the task",
   filesystemTool: true,
   logging: {
@@ -848,13 +876,13 @@ import {
 } from "@ljoukov/llm";
 
 const fs = createInMemoryAgentFilesystem({ "/repo/a.ts": "export const n = 1;\n" });
-const tools = createFilesystemToolSetForModel("chatgpt-gpt-5.3-codex", {
+const tools = createFilesystemToolSetForModel("chatgpt-gpt-5.3-codex-spark", {
   cwd: "/repo",
   fs,
 });
 
 const result = await runToolLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Update n to 2.",
   tools,
 });
@@ -905,15 +933,17 @@ Standard integration suite:
 npm run test:integration
 ```
 
-Large-file live integration tests are opt-in because they upload multi-megabyte fixtures to real provider file stores:
+Large-file live integration tests are opt-in because they upload multi-megabyte fixtures to real canonical/provider
+file stores:
 
 ```bash
 LLM_INTEGRATION_LARGE_FILES=1 npm run test:integration
 ```
 
-Those tests generate valid PDFs programmatically so the canonical upload path, `file_id` reuse, and automatic large
+Those tests generate valid PDFs programmatically so the canonical upload path, signed-URL reuse, and automatic large
 attachment offload all exercise real provider APIs. The unit suite also covers direct-call upload logging plus
-`runAgentLoop()` upload telemetry/logging for combined-image overflow.
+`runAgentLoop()` upload telemetry/logging for combined-image overflow, and the integration suite includes provider
+format coverage for common document and image attachments.
 
 ## License