@ljoukov/llm 4.0.12 → 4.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -8,7 +8,7 @@
  Unified TypeScript wrapper over:
 
  - **OpenAI Responses API** (`openai`)
- - **Google Gemini via Vertex AI** (`@google/genai`)
+ - **Google Gemini** via **Vertex AI** or the **Gemini Developer API** (`@google/genai`)
  - **Fireworks chat-completions models** (`kimi-k2.5`, `glm-5`, `minimax-m2.1`, `gpt-oss-120b`)
  - **ChatGPT subscription models** via `chatgpt-*` model ids (reuses Codex auth store, or a token provider)
  - **Agentic orchestration with subagents** via `runAgentLoop()` + built-in delegation control tools
@@ -38,11 +38,22 @@ See Node.js docs on environment variables and dotenv files: https://nodejs.org/a
  - `OPENAI_RESPONSES_WEBSOCKET_MODE` (`auto` | `off` | `only`, default: `auto`)
  - `OPENAI_BASE_URL` (optional; defaults to `https://api.openai.com/v1`)
 
- ### Gemini (Vertex AI)
+ ### Gemini
+
+ Use one backend:
+
+ - `GEMINI_API_KEY` or `GOOGLE_API_KEY` for the Gemini Developer API
+ - `GOOGLE_SERVICE_ACCOUNT_JSON` for Vertex AI (the contents of a service account JSON key file, not a file path)
+ - `VERTEX_GCS_BUCKET` for Vertex-backed Gemini file attachments / `file_id` inputs
+ - `VERTEX_GCS_PREFIX` (optional object-name prefix inside `VERTEX_GCS_BUCKET`)
+
+ If a Gemini API key is present, the library uses the Gemini Developer API. Otherwise it falls back to Vertex AI.
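For local development, the backend choice above can be captured in a `.env` file. The variable names are the ones listed above; the values here are placeholders:

```bash
# Gemini Developer API (used when a key is present)
GEMINI_API_KEY=your-api-key

# ...or Vertex AI instead (paste the JSON key contents, not a file path)
# GOOGLE_SERVICE_ACCOUNT_JSON={"type":"service_account",...}
# VERTEX_GCS_BUCKET=your-mirror-bucket
# VERTEX_GCS_PREFIX=llm-files/
```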
 
- - `GOOGLE_SERVICE_ACCOUNT_JSON` (the contents of a service account JSON key file, not a file path)
+ For Vertex-backed Gemini file inputs, the library mirrors OpenAI-backed canonical files into GCS and then passes the
+ resulting `gs://...` URI to Vertex. Configure a lifecycle rule on that bucket to delete objects after 2 days if you
+ want hard 48-hour cleanup for mirrored objects.
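The 2-day lifecycle rule suggested above can be applied with standard GCS tooling, for example (the bucket name is a placeholder):

```bash
# lifecycle.json — delete mirrored objects once they are older than 2 days
cat > lifecycle.json <<'EOF'
{
  "rule": [{ "action": { "type": "Delete" }, "condition": { "age": 2 } }]
}
EOF

gcloud storage buckets update gs://your-vertex-bucket --lifecycle-file=lifecycle.json
```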
 
- #### Get a service account key JSON
+ #### Vertex AI service account setup
 
  You need a **Google service account key JSON** for your Firebase / GCP project (this is what you put into
  `GOOGLE_SERVICE_ACCOUNT_JSON`).
@@ -219,16 +230,72 @@ const result = await generateText({ model: "gpt-5.2", input });
  console.log(result.text);
  ```
 
+ ### Files API
+
+ The library now exposes an OpenAI-like canonical files API:
+
+ ```ts
+ import fs from "node:fs";
+ import { files, generateText, type LlmInputMessage } from "@ljoukov/llm";
+
+ const stored = await files.create({
+   data: fs.readFileSync("report.pdf"),
+   filename: "report.pdf",
+   mimeType: "application/pdf",
+ });
+
+ const input: LlmInputMessage[] = [
+   {
+     role: "user",
+     content: [
+       { type: "text", text: "Summarize the PDF in 5 bullets." },
+       { type: "input_file", file_id: stored.id, filename: stored.filename },
+     ],
+   },
+ ];
+
+ const result = await generateText({ model: "gpt-5.2", input });
+ console.log(result.text);
+ ```
+
+ Canonical storage defaults to OpenAI Files with `purpose: "user_data"` and a `48h` TTL.
+
+ - OpenAI models use that `file_id` directly.
+ - Gemini Developer API mirrors the file lazily into Gemini Files when needed.
+ - Vertex-backed Gemini mirrors the file lazily into `VERTEX_GCS_BUCKET` and uses `gs://...` URIs.
+
+ Available methods:
+
+ - `files.create({ path | data, filename?, mimeType? })`
+ - `files.retrieve(fileId)`
+ - `files.delete(fileId)`
+ - `files.content(fileId)`
+
  ### Attachments (files / images)
 
  Use `inlineData` parts to attach base64-encoded bytes (intermixed with text). `inlineData.data` is base64 (not a data
  URL).
 
+ Optional: set `filename` on `inlineData` to preserve the original file name when the provider supports it.
+
  Note: `inlineData` is mapped based on `mimeType`.
 
  - `image/*` -> image input (`input_image`)
  - otherwise -> file input (`input_file`, e.g. `application/pdf`)
 
+ You can also pass OpenAI-style file/image parts directly:
+
+ - `input_file` with `file_id`
+ - `input_image` with `file_id`
+
+ When the combined inline attachment payload in a single request would exceed about `20 MiB` of base64/data-URL text,
+ the library automatically uploads those attachments to the canonical files store first and swaps the prompt to file
+ references:
+
+ - OpenAI: uses canonical OpenAI `file_id`s directly
+ - Gemini Developer API: mirrors to Gemini Files and sends `fileData.fileUri`
+ - Vertex AI: mirrors to `VERTEX_GCS_BUCKET` and sends `gs://...` URIs
+
  ```ts
  import fs from "node:fs";
  import { generateText, type LlmInputMessage } from "@ljoukov/llm";
@@ -249,6 +316,34 @@ const result = await generateText({ model: "gpt-5.2", input });
  console.log(result.text);
  ```
 
+ You can mix direct `file_id` parts with `inlineData`. Small attachments stay inline; oversized turns are upgraded to
+ canonical files automatically. Tool loops do the same for large tool outputs, and they also re-check the combined size
+ after parallel tool calls, so a batch of individually small images/files still gets upgraded to `file_id` references
+ before the next model request if the aggregate payload is too large.
+
+ OpenAI-style direct file-id example:
+
+ ```ts
+ import { files, generateText, type LlmInputMessage } from "@ljoukov/llm";
+
+ const stored = await files.create({
+   path: "doc.pdf",
+ });
+
+ const input: LlmInputMessage[] = [
+   {
+     role: "user",
+     content: [
+       { type: "text", text: "Summarize the attachment." },
+       { type: "input_file", file_id: stored.id, filename: stored.filename },
+     ],
+   },
+ ];
+
+ const result = await generateText({ model: "gemini-2.5-pro", input });
+ console.log(result.text);
+ ```
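The ~`20 MiB` ceiling above applies to the base64/data-URL text, not the raw bytes, and base64 inflates data by roughly a third. A quick way to estimate whether a batch of attachments will stay inline (an illustrative helper, not part of the library; the limit constant is an assumption based on the figure above):

```typescript
// Base64 encodes every 3 raw bytes as 4 output characters (with padding).
function base64Length(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// Assumed inline ceiling, from the ~20 MiB figure documented above.
const INLINE_LIMIT = 20 * 1024 * 1024;

// Estimate whether a request's combined attachments would stay inline.
function staysInline(attachmentSizes: number[]): boolean {
  const total = attachmentSizes.reduce((sum, n) => sum + base64Length(n), 0);
  return total <= INLINE_LIMIT;
}

console.log(staysInline([5 * 1024 * 1024])); // one 5 MiB PDF -> true (stays inline)
console.log(staysInline([8 * 1024 * 1024, 8 * 1024 * 1024])); // two 8 MiB images -> false (offloaded)
```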
+
  PDF attachment example:
 
  ```ts
@@ -664,6 +759,7 @@ const result = await runAgentLoop({
  emit: (event) => {
    // Forward to your backend (Cloud Logging, OpenTelemetry, Datadog, etc.)
    // event.type: "agent.run.started" | "agent.run.stream" | "agent.run.completed"
+   // agent.run.completed also includes uploadCount, uploadBytes, and uploadLatencyMs
    // event carries runId, parentRunId, depth, model, timestamp + payload
  },
  flush: async () => {
@@ -693,6 +789,9 @@ Each LLM call writes:
  - `error.txt` plus `response.metadata.json` on failure.
 
  `image_url` data URLs are redacted in text/metadata logs (`data:...,...`) so base64 payloads are not printed inline.
+ Every canonical upload or provider mirror is also appended to `agent.log` as a `[upload] ...` line with source,
+ backend, bytes, and latency. Direct `generateText()` / `streamText()` calls inherit the same upload logging when you
+ run them inside an agent logging session, and their `response.metadata.json` includes an `uploads` summary.
 
  ```ts
  import path from "node:path";
@@ -771,6 +870,30 @@ the current directory, subagents enabled, and `Esc` interrupt support:
  npm run example:cli-chat
  ```
 
+ ## Testing
+
+ Unit tests:
+
+ ```bash
+ npm run test:unit
+ ```
+
+ Standard integration suite:
+
+ ```bash
+ npm run test:integration
+ ```
+
+ Large-file live integration tests are opt-in because they upload multi-megabyte fixtures to real provider file stores:
+
+ ```bash
+ LLM_INTEGRATION_LARGE_FILES=1 npm run test:integration
+ ```
+
+ Those tests generate valid PDFs programmatically so the canonical upload path, `file_id` reuse, and automatic large
+ attachment offload all exercise real provider APIs. The unit suite also covers direct-call upload logging plus
+ `runAgentLoop()` upload telemetry/logging for combined-image overflow.
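Generating a valid PDF fixture programmatically, as the large-file tests describe, can be sketched by hand (an illustrative minimal generator, not the library's actual fixture code):

```typescript
// Build a minimal single-page PDF containing one line of Helvetica text.
function minimalPdf(text: string): Buffer {
  // Escape characters that are special inside PDF literal strings.
  const escaped = text.replace(/[\\()]/g, (c) => "\\" + c);
  const stream = `BT /F1 12 Tf 72 720 Td (${escaped}) Tj ET`;
  const objects = [
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
    "3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] " +
      "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>\nendobj\n",
    `4 0 obj\n<< /Length ${stream.length} >>\nstream\n${stream}\nendstream\nendobj\n`,
    "5 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n",
  ];

  let body = "%PDF-1.4\n";
  const offsets: number[] = [];
  for (const obj of objects) {
    offsets.push(body.length); // byte offset of each object, for the xref table
    body += obj;
  }
  const xrefStart = body.length;
  body += `xref\n0 ${objects.length + 1}\n0000000000 65535 f \n`;
  for (const off of offsets) {
    body += `${String(off).padStart(10, "0")} 00000 n \n`;
  }
  body += `trailer\n<< /Size ${objects.length + 1} /Root 1 0 R >>\nstartxref\n${xrefStart}\n%%EOF\n`;
  return Buffer.from(body, "latin1");
}

const fixture = minimalPdf("Large-file upload fixture");
console.log(fixture.length); // fixture size in bytes
```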
896
+
774
897
  ## License
775
898
 
776
899
  MIT