@ljoukov/llm 5.0.4 → 7.0.0

package/README.md CHANGED
@@ -46,14 +46,20 @@ Use one backend:
 
 - `GEMINI_API_KEY` or `GOOGLE_API_KEY` for the Gemini Developer API
 - `GOOGLE_SERVICE_ACCOUNT_JSON` for Vertex AI (the contents of a service account JSON key file, not a file path)
+- `LLM_FILES_GCS_BUCKET` for canonical file storage used by `files.create()` and automatic large-attachment offload
+- `LLM_FILES_GCS_PREFIX` (optional object-name prefix inside `LLM_FILES_GCS_BUCKET`)
 - `VERTEX_GCS_BUCKET` for Vertex-backed Gemini file attachments / `file_id` inputs
 - `VERTEX_GCS_PREFIX` (optional object-name prefix inside `VERTEX_GCS_BUCKET`)
 
 If a Gemini API key is present, the library uses the Gemini Developer API. Otherwise it falls back to Vertex AI.
 
-For Vertex-backed Gemini file inputs, the library mirrors OpenAI-backed canonical files into GCS and then passes the
-resulting `gs://...` URI to Vertex. Configure a lifecycle rule on that bucket to delete objects after 2 days if you
-want hard 48-hour cleanup for mirrored objects.
+Canonical files are stored in GCS with a default `48h` TTL. OpenAI and ChatGPT consume those files via signed HTTPS
+URLs. Gemini still mirrors canonical files lazily into provider-native storage when needed:
+
+- Gemini Developer API mirrors into Gemini Files
+- Vertex-backed Gemini mirrors into `VERTEX_GCS_BUCKET` and uses `gs://...` URIs
+
+Configure lifecycle rules on those buckets if you want hard 48-hour cleanup for mirrored objects.
 
 #### Vertex AI service account setup
 
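Editor's note: the backend-selection rule stated in this hunk ("If a Gemini API key is present…") can be sketched as a small helper. The function name and label strings below are invented for illustration; they are not exported by `@ljoukov/llm`.

```typescript
// Illustrative sketch of the documented backend-selection rule; names are
// examples only, not part of the package's API.
type GeminiBackend = "gemini-developer-api" | "vertex-ai";

function pickGeminiBackend(env: Record<string, string | undefined>): GeminiBackend {
  // Either Gemini API key variable selects the Developer API; otherwise Vertex AI.
  return env.GEMINI_API_KEY || env.GOOGLE_API_KEY ? "gemini-developer-api" : "vertex-ai";
}
```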
@@ -137,7 +143,7 @@ configureModelConcurrency({
     fireworks: 8,
   },
   modelCaps: {
-    "gpt-5.2": 24,
+    "gpt-5.4-mini": 24,
  },
   providerModelCaps: {
     google: {
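Editor's note: one plausible reading of how the three cap maps in this hunk combine is most-specific-wins precedence (provider+model, then model, then provider). The field names and the precedence order below are guesses for illustration, not the library's actual resolution logic.

```typescript
// Hypothetical cap-resolution sketch. Precedence is an assumption; consult the
// package docs for the real behavior.
interface ConcurrencyConfig {
  providerCaps?: Record<string, number>;
  modelCaps?: Record<string, number>;
  providerModelCaps?: Record<string, Record<string, number>>;
}

function effectiveCap(provider: string, model: string, cfg: ConcurrencyConfig): number | undefined {
  return (
    cfg.providerModelCaps?.[provider]?.[model] ?? // most specific
    cfg.modelCaps?.[model] ??
    cfg.providerCaps?.[provider] // least specific
  );
}
```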
@@ -168,7 +174,7 @@ Use OpenAI-style request fields:
 import { generateText } from "@ljoukov/llm";
 
 const result = await generateText({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Write one sentence about TypeScript.",
 });
 
@@ -182,7 +188,7 @@ console.log(result.usage, result.costUsd);
 import { streamText } from "@ljoukov/llm";
 
 const call = streamText({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Explain what a hash function is in one paragraph.",
 });
 
@@ -228,7 +234,7 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
@@ -256,13 +262,13 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
-Canonical storage defaults to OpenAI Files with `purpose: "user_data"` and a `48h` TTL.
+Canonical storage now uses GCS-backed objects with a `48h` TTL.
 
-- OpenAI models use that `file_id` directly.
+- OpenAI and ChatGPT models resolve that `file_id` to a signed HTTPS URL.
 - Gemini Developer API mirrors the file lazily into Gemini Files when needed.
 - Vertex-backed Gemini mirrors the file lazily into `VERTEX_GCS_BUCKET` and uses `gs://...` URIs.
 
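Editor's note: the per-provider fan-out in this hunk can be summarized as a lookup. The backend labels and return strings below are invented for illustration; they are not package exports.

```typescript
// Illustrative mapping of how a canonical file is presented to each backend,
// mirroring the bullets above. Names are examples only.
type Backend = "openai" | "chatgpt" | "gemini-developer-api" | "vertex-ai";

function canonicalFileReferenceKind(backend: Backend): "signed-https-url" | "gemini-file" | "gcs-uri" {
  switch (backend) {
    case "openai":
    case "chatgpt":
      return "signed-https-url"; // signed HTTPS URL to the GCS-backed object
    case "gemini-developer-api":
      return "gemini-file"; // lazily mirrored into Gemini Files
    case "vertex-ai":
      return "gcs-uri"; // lazily mirrored into VERTEX_GCS_BUCKET, sent as gs://...
  }
}
```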
@@ -294,7 +300,7 @@ When the combined inline attachment payload in a single request would exceed abo
 the library automatically uploads those attachments to the canonical files store first and swaps the prompt to file
 references:
 
-- OpenAI: uses canonical OpenAI `file_id`s directly
+- OpenAI / ChatGPT: use signed HTTPS URLs for canonical files
 - Gemini Developer API: mirrors to Gemini Files and sends `fileData.fileUri`
 - Vertex AI: mirrors to `VERTEX_GCS_BUCKET` and sends `gs://...` URIs
 
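Editor's note: the overflow check this hunk describes amounts to summing inline payload sizes against a threshold. The helper below is a minimal sketch; the real threshold value and internals belong to the library.

```typescript
// Minimal sketch of the combined-size check: upgrade to canonical file references
// when the aggregate inline payload is too large, even if each part is small.
function needsCanonicalUpload(partSizesBytes: number[], thresholdBytes: number): boolean {
  const total = partSizesBytes.reduce((sum, n) => sum + n, 0);
  return total > thresholdBytes;
}
```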
@@ -314,14 +320,28 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
 You can mix direct `file_id` parts with `inlineData`. Small attachments stay inline; oversized turns are upgraded to
 canonical files automatically. Tool loops do the same for large tool outputs, and they also re-check the combined size
-after parallel tool calls so a batch of individually-small images/files still gets upgraded to `file_id` references
-before the next model request if the aggregate payload is too large.
+after parallel tool calls so a batch of individually-small images/files still gets upgraded to canonical-file
+references before the next model request if the aggregate payload is too large.
+
+You can also control image analysis fidelity with request-level `mediaResolution`:
+
+- `low`, `medium`, `high`, `original`, `auto`
+- OpenAI / ChatGPT map this onto image `detail`
+- Gemini maps this onto media resolution/tokenization settings
+
+```ts
+const result = await generateText({
+  model: "gpt-5.4",
+  mediaResolution: "original",
+  input,
+});
+```
 
 OpenAI-style direct file-id example:
 
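Editor's note: the OpenAI/ChatGPT side of the `mediaResolution` mapping mentioned above might look roughly like this. OpenAI image `detail` only has `low`/`high`/`auto`, so intermediate values must collapse somehow; how `medium` and `original` actually collapse is an assumption in this sketch.

```typescript
// Assumed mapping from request-level mediaResolution onto OpenAI image `detail`.
// The collapse of medium/original to "high" is a guess, not the library's code.
type MediaResolution = "low" | "medium" | "high" | "original" | "auto";

function toOpenAiImageDetail(res: MediaResolution): "low" | "high" | "auto" {
  switch (res) {
    case "low":
      return "low";
    case "medium":
    case "high":
    case "original":
      return "high"; // assumption: anything above low requests full detail
    case "auto":
      return "auto";
  }
}
```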
@@ -364,7 +384,7 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
@@ -390,7 +410,7 @@ const input: LlmInputMessage[] = [
   },
 ];
 
-const result = await generateText({ model: "gpt-5.2", input });
+const result = await generateText({ model: "gpt-5.4-mini", input });
 console.log(result.text);
 ```
 
@@ -439,6 +459,11 @@ console.log(result.text);
 
 `chatgpt-gpt-5.4-fast` is also supported as a convenience alias for ChatGPT-authenticated `gpt-5.4` with priority processing enabled (`service_tier="priority"`), matching Codex `/fast` semantics.
 
+Supported OpenAI text model ids are fixed literal unions in code, not arbitrary strings:
+
+- OpenAI API: `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`
+- ChatGPT auth: `chatgpt-gpt-5.4`, `chatgpt-gpt-5.4-fast`, `chatgpt-gpt-5.4-mini`, `chatgpt-gpt-5.3-codex-spark`
+
 ## JSON outputs
 
 `generateJson()` validates the output with Zod and returns the parsed value.
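Editor's note: the fixed literal unions added in the previous hunk can be expressed as `as const` tuples with a runtime guard. The constant and type names below are invented for this sketch; the package defines its own.

```typescript
// Illustrative literal unions for the documented model ids (names invented).
const OPENAI_TEXT_MODEL_IDS = ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano"] as const;
const CHATGPT_TEXT_MODEL_IDS = [
  "chatgpt-gpt-5.4",
  "chatgpt-gpt-5.4-fast",
  "chatgpt-gpt-5.4-mini",
  "chatgpt-gpt-5.3-codex-spark",
] as const;

type OpenAiTextModelId = (typeof OPENAI_TEXT_MODEL_IDS)[number];

// Narrow an arbitrary string to the literal union at runtime.
function isOpenAiTextModelId(id: string): id is OpenAiTextModelId {
  return (OPENAI_TEXT_MODEL_IDS as readonly string[]).includes(id);
}
```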
@@ -458,7 +483,7 @@ const schema = z.object({
 });
 
 const { value } = await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
 });
@@ -481,7 +506,7 @@ const schema = z.object({
 });
 
 const call = streamJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
 });
@@ -503,7 +528,7 @@ If you only want thought deltas (no partial JSON), set `streamMode: "final"`.
 
 ```ts
 const call = streamJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
   streamMode: "final",
@@ -514,7 +539,7 @@ If you want to keep `generateJson()` but still stream thoughts, pass an `onEvent
 
 ```ts
 const { value } = await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return a JSON object with ok=true and message='hello'.",
   schema,
   onEvent: (event) => {
@@ -551,13 +576,13 @@ configureTelemetry({
 });
 
 const { value } = await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return { ok: true }.",
   schema: z.object({ ok: z.boolean() }),
 });
 
 await runAgentLoop({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Inspect the repo and update the file.",
   filesystemTool: true,
 });
@@ -567,7 +592,7 @@ Per-call opt-out:
 
 ```ts
 await generateJson({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Return { ok: true }.",
   schema: z.object({ ok: z.boolean() }),
   telemetry: false,
@@ -598,7 +623,7 @@ Use this when the model provider executes the tool remotely (for example search/
 import { generateText } from "@ljoukov/llm";
 
 const result = await generateText({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "Find 3 relevant sources about X and summarize them.",
   tools: [{ type: "web-search", mode: "live" }, { type: "code-execution" }],
 });
@@ -615,7 +640,7 @@ import { runToolLoop, tool } from "@ljoukov/llm";
 import { z } from "zod";
 
 const result = await runToolLoop({
-  model: "gpt-5.2",
+  model: "gpt-5.4-mini",
   input: "What is 12 * 9? Use the tool.",
   tools: {
     multiply: tool({
@@ -641,7 +666,7 @@ import { streamToolLoop, tool } from "@ljoukov/llm";
 import { z } from "zod";
 
 const call = streamToolLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Start implementing the feature.",
   tools: {
     echo: tool({
@@ -665,7 +690,7 @@ import { createToolLoopSteeringChannel, runAgentLoop } from "@ljoukov/llm";
 
 const steering = createToolLoopSteeringChannel();
 const run = runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Implement the task.",
   filesystemTool: true,
   steering,
@@ -683,13 +708,15 @@ const result = await run;
 - built-in subagent orchestration (delegate work across spawned agents),
 - your own custom runtime tools.
 
+Subagents always inherit the parent run model. The subagent control tools do not expose a model override.
+
 For interactive runs where you want to stream events and inject steering mid-run, use `streamAgentLoop()`:
 
 ```ts
 import { streamAgentLoop } from "@ljoukov/llm";
 
 const call = streamAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Start implementation.",
   filesystemTool: true,
 });
@@ -704,7 +731,7 @@ console.log(result.text);
 For read/search/write tasks in a workspace, enable `filesystemTool`. The library auto-selects a tool profile by model
 when `profile: "auto"`:
 
-- Codex-like models (`gpt-5.4`, `chatgpt-gpt-5.4`, `chatgpt-gpt-5.4-fast`, and `*codex*` variants): Codex-compatible filesystem tool shape.
+- Codex-like models (`gpt-5.4`, `chatgpt-gpt-5.4`, `chatgpt-gpt-5.4-fast`, and `chatgpt-gpt-5.3-codex-spark`): Codex-compatible filesystem tool shape.
 - Gemini models: Gemini-compatible filesystem tool shape.
 - Other models: model-agnostic profile (currently Gemini-style).
 
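Editor's note: the `profile: "auto"` selection in this hunk can be sketched as a predicate. The matching logic below (an exact-id set plus a `gemini` prefix check) is an illustrative guess, not the library's implementation.

```typescript
// Hypothetical sketch of "auto" filesystem-tool profile selection, per the bullets above.
type FsToolProfile = "codex" | "gemini" | "generic";

const CODEX_LIKE = new Set([
  "gpt-5.4",
  "chatgpt-gpt-5.4",
  "chatgpt-gpt-5.4-fast",
  "chatgpt-gpt-5.3-codex-spark",
]);

function pickFilesystemProfile(model: string): FsToolProfile {
  if (CODEX_LIKE.has(model)) return "codex"; // Codex-compatible tool shape
  if (model.startsWith("gemini")) return "gemini"; // Gemini-compatible tool shape
  return "generic"; // model-agnostic (currently Gemini-style per the docs)
}
```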
@@ -714,6 +741,7 @@ Confinement/policy is set through `filesystemTool.options`:
 - `fs`: backend (`createNodeAgentFilesystem()` or `createInMemoryAgentFilesystem()`).
 - `checkAccess`: hook for allow/deny policy + audit.
 - `allowOutsideCwd`: opt-out confinement (default is false).
+- `mediaResolution`: default image fidelity for built-in `view_image` outputs.
 
 Detailed reference: `docs/agent-filesystem-tools.md`.
 
@@ -727,7 +755,7 @@ const fs = createInMemoryAgentFilesystem({
 });
 
 const result = await runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Change value from 1 to 2 using filesystem tools.",
   filesystemTool: {
     profile: "auto",
753
781
  import { runAgentLoop } from "@ljoukov/llm";
754
782
 
755
783
  const result = await runAgentLoop({
756
- model: "chatgpt-gpt-5.3-codex",
784
+ model: "chatgpt-gpt-5.3-codex-spark",
757
785
  input: "Plan the work, delegate in parallel where useful, and return a final merged result.",
758
786
  subagentTool: {
759
787
  enabled: true,
@@ -775,7 +803,7 @@ const fs = createInMemoryAgentFilesystem({
 });
 
 const result = await runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Change value from 1 to 2 using filesystem tools.",
   filesystemTool: {
     profile: "auto",
@@ -819,7 +847,7 @@ import path from "node:path";
 import { runAgentLoop } from "@ljoukov/llm";
 
 await runAgentLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Do the task",
   filesystemTool: true,
   logging: {
@@ -848,13 +876,13 @@ import {
 } from "@ljoukov/llm";
 
 const fs = createInMemoryAgentFilesystem({ "/repo/a.ts": "export const n = 1;\n" });
-const tools = createFilesystemToolSetForModel("chatgpt-gpt-5.3-codex", {
+const tools = createFilesystemToolSetForModel("chatgpt-gpt-5.3-codex-spark", {
   cwd: "/repo",
   fs,
 });
 
 const result = await runToolLoop({
-  model: "chatgpt-gpt-5.3-codex",
+  model: "chatgpt-gpt-5.3-codex-spark",
   input: "Update n to 2.",
   tools,
 });
@@ -905,15 +933,17 @@ Standard integration suite:
 npm run test:integration
 ```
 
-Large-file live integration tests are opt-in because they upload multi-megabyte fixtures to real provider file stores:
+Large-file live integration tests are opt-in because they upload multi-megabyte fixtures to real canonical/provider
+file stores:
 
 ```bash
 LLM_INTEGRATION_LARGE_FILES=1 npm run test:integration
 ```
 
-Those tests generate valid PDFs programmatically so the canonical upload path, `file_id` reuse, and automatic large
+Those tests generate valid PDFs programmatically so the canonical upload path, signed-URL reuse, and automatic large
 attachment offload all exercise real provider APIs. The unit suite also covers direct-call upload logging plus
-`runAgentLoop()` upload telemetry/logging for combined-image overflow.
+`runAgentLoop()` upload telemetry/logging for combined-image overflow, and the integration suite includes provider
+format coverage for common document and image attachments.
 
 ## License