@ljoukov/llm 3.0.4 → 3.0.6

package/README.md CHANGED
@@ -11,6 +11,7 @@ Unified TypeScript wrapper over:
 - **Google Gemini via Vertex AI** (`@google/genai`)
 - **Fireworks chat-completions models** (`kimi-k2.5`, `glm-5`, `minimax-m2.1`, `gpt-oss-120b`)
 - **ChatGPT subscription models** via `chatgpt-*` model ids (reuses Codex auth store, or a token provider)
+ - **Agentic orchestration with subagents** via `runAgentLoop()` + built-in delegation control tools
 
 Designed around a single streaming API that yields:
 
@@ -109,21 +110,40 @@ SSE for the rest of the process to avoid repeated failing upgrade attempts.
 
 ### Adaptive per-model concurrency
 
- Provider calls use adaptive, overload-aware concurrency (with retry/backoff where supported). Configure hard caps per
- model/per binary with env vars (clamped to `1..64`, default `3`):
+ Provider calls use adaptive, overload-aware concurrency (with retry/backoff where supported). Configure hard caps in
+ code (clamped to `1..64`):
 
- - global cap: `LLM_MAX_PARALLEL_REQUESTS_PER_MODEL`
- - provider caps: `OPENAI_MAX_PARALLEL_REQUESTS_PER_MODEL`, `GOOGLE_MAX_PARALLEL_REQUESTS_PER_MODEL`,
-   `FIREWORKS_MAX_PARALLEL_REQUESTS_PER_MODEL`
- - model overrides:
-   - `LLM_MAX_PARALLEL_REQUESTS_MODEL_<MODEL>`
-   - `<PROVIDER>_MAX_PARALLEL_REQUESTS_MODEL_<MODEL>`
+ ```ts
+ import { configureModelConcurrency } from "@ljoukov/llm";
+
+ configureModelConcurrency({
+   globalCap: 8,
+   providerCaps: {
+     openai: 16,
+     google: 3,
+     fireworks: 8,
+   },
+   modelCaps: {
+     "gpt-5.2": 24,
+   },
+   providerModelCaps: {
+     google: {
+       "gemini-3.1-pro-preview": 2,
+     },
+   },
+ });
+ ```
+
+ Default caps (without configuration):
 
- `<MODEL>` is uppercased and non-alphanumeric characters become `_` (for example `gpt-5.2` -> `GPT_5_2`).
+ - OpenAI: `12`
+ - Google preview models (`*preview*`): `2`
+ - Other Google models: `4`
+ - Fireworks: `6`
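The clamping range and per-provider defaults listed above can be sketched as a small pure function. This is a hypothetical illustration of the documented rules, not the library's actual implementation; the names `clampCap` and `defaultCap` are invented here:

```ts
// Hypothetical sketch of the documented defaults; not the library's real code.
type Provider = "openai" | "google" | "fireworks";

// Caps are clamped to the documented 1..64 range.
function clampCap(cap: number): number {
  return Math.min(64, Math.max(1, Math.floor(cap)));
}

// Defaults per the README: OpenAI 12, Google preview models 2,
// other Google models 4, Fireworks 6.
function defaultCap(provider: Provider, model: string): number {
  switch (provider) {
    case "openai":
      return 12;
    case "google":
      return model.includes("preview") ? 2 : 4;
    case "fireworks":
      return 6;
  }
}

console.log(defaultCap("google", "gemini-3.1-pro-preview")); // 2
console.log(clampCap(100)); // 64
```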
 
 ## Usage
 
- `v2` uses OpenAI-style request fields:
+ Use OpenAI-style request fields:
 
 - `input`: string or message array
 - `instructions`: optional top-level system instructions
@@ -326,8 +346,8 @@ console.log(result.text);
 
 - OpenAI API models use structured outputs (`json_schema`) when possible.
 - Gemini uses `responseJsonSchema`.
- - `chatgpt-*` models try to use structured outputs too; if rejected by the endpoint/model, it falls back to best-effort
-   JSON parsing.
+ - `chatgpt-*` models try to use structured outputs too; if the endpoint/account/model rejects `json_schema`, the call
+   retries with best-effort JSON parsing.
 
 ```ts
 import { generateJson } from "@ljoukov/llm";
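The "best-effort JSON parsing" fallback mentioned above can be illustrated with a sketch: try a strict parse first, then extract the outermost `{...}` span. This is a hypothetical stand-in for illustration only, not the library's actual parser:

```ts
// Hypothetical best-effort JSON fallback (illustrative, not the library's code):
// strict JSON.parse first, then the outermost {...} span.
function bestEffortJson(text: string): unknown | undefined {
  try {
    return JSON.parse(text);
  } catch {
    // Fall through to the looser extraction below.
  }
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Still not valid JSON.
    }
  }
  return undefined;
}

console.log(bestEffortJson('Here you go: {"ok": true}'));
```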
@@ -412,12 +432,12 @@ There are three tool-enabled call patterns:
 
 1. `generateText()` for provider-native/server-side tools (for example web search).
 2. `runToolLoop()` for your runtime JS/TS tools (function tools executed in your process).
- 3. `runAgentLoop()` for filesystem tasks (a convenience wrapper around `runToolLoop()`).
+ 3. `runAgentLoop()` for full agentic loops (a convenience wrapper around `runToolLoop()` with built-in subagent orchestration and optional filesystem tools).
 
 Architecture note:
 
- - Filesystem tools are not a separate execution system.
- - `runAgentLoop()` constructs a filesystem toolset, merges your optional custom tools, then calls the same `runToolLoop()` engine.
+ - Built-in filesystem tools are not a separate execution system.
+ - `runAgentLoop()` can construct a filesystem toolset, merge your optional custom tools, and call the same `runToolLoop()` engine.
 - This behavior is model-agnostic at the API level; profile selection only adapts tool shape for model compatibility.
 
 ### Provider-Native Tools (`generateText()`)
@@ -461,9 +481,18 @@ console.log(result.text);
 
 Use `customTool()` only when you need freeform/non-JSON tool input grammar.
 
- ### Filesystem Tasks (`runAgentLoop()`)
+ ### Agentic Loop (`runAgentLoop()`)
 
- Use this for read/search/write tasks in a workspace. The library auto-selects filesystem tool profile by model when `profile: "auto"`:
+ `runAgentLoop()` is the high-level agentic API. It supports:
+
+ - optional filesystem workspace tools,
+ - built-in subagent orchestration (delegating work across spawned agents),
+ - your own custom runtime tools.
+
+ #### 1) Filesystem agent loop
+
+ For read/search/write tasks in a workspace, enable `filesystemTool`. The library auto-selects a tool profile by model
+ when `profile: "auto"`:
 
 - Codex-like models: Codex-compatible filesystem tool shape.
 - Gemini models: Gemini-compatible filesystem tool shape.
@@ -478,10 +507,54 @@ Confinement/policy is set through `filesystemTool.options`:
 
 Detailed reference: `docs/agent-filesystem-tools.md`.
 
- Subagent delegation can be enabled via `subagentTool` (Codex-style control tools):
+ Filesystem-only example:
+
+ ```ts
+ import { createInMemoryAgentFilesystem, runAgentLoop } from "@ljoukov/llm";
+
+ const fs = createInMemoryAgentFilesystem({
+   "/repo/src/a.ts": "export const value = 1;\n",
+ });
+
+ const result = await runAgentLoop({
+   model: "chatgpt-gpt-5.3-codex",
+   input: "Change value from 1 to 2 using filesystem tools.",
+   filesystemTool: {
+     profile: "auto",
+     options: {
+       cwd: "/repo",
+       fs,
+     },
+   },
+ });
+
+ console.log(result.text);
+ ```
+
+ #### 2) Add subagent orchestration
+
+ Enable `subagentTool` to allow delegation via Codex-style control tools:
 
 - `spawn_agent`, `send_input`, `resume_agent`, `wait`, `close_agent`
- - Optional limits: `maxAgents`, `maxDepth`, wait timeouts.
+ - optional limits: `maxAgents`, `maxDepth`, wait timeouts
+
+ ```ts
+ import { runAgentLoop } from "@ljoukov/llm";
+
+ const result = await runAgentLoop({
+   model: "chatgpt-gpt-5.3-codex",
+   input: "Plan the work, delegate in parallel where useful, and return a final merged result.",
+   subagentTool: {
+     enabled: true,
+     maxAgents: 4,
+     maxDepth: 2,
+   },
+ });
+
+ console.log(result.text);
+ ```
+
+ #### 3) Combine filesystem + subagents
 
 ```ts
 import { createInMemoryAgentFilesystem, runAgentLoop } from "@ljoukov/llm";
@@ -510,6 +583,37 @@ const result = await runAgentLoop({
 console.log(result.text);
 ```
 
+ ### Agent Telemetry (Pluggable Backends)
+
+ `runAgentLoop()` supports optional telemetry hooks that keep default behavior unchanged.
+ You can attach any backend by implementing a sink with `emit(event)` and an optional `flush()`.
+
+ ```ts
+ import { runAgentLoop } from "@ljoukov/llm";
+
+ const result = await runAgentLoop({
+   model: "chatgpt-gpt-5.3-codex",
+   input: "Summarize the report and update output JSON files.",
+   filesystemTool: true,
+   telemetry: {
+     includeLlmStreamEvents: false, // enable only if you need token/delta event fan-out
+     sink: {
+       emit: (event) => {
+         // Forward to your backend (Cloud Logging, OpenTelemetry, Datadog, etc.)
+         // event.type: "agent.run.started" | "agent.run.stream" | "agent.run.completed"
+         // event carries runId, parentRunId, depth, model, timestamp + payload
+       },
+       flush: async () => {
+         // Optional: flush buffered telemetry on run completion.
+       },
+     },
+   },
+ });
+ ```
+
+ Telemetry emits parent/child run correlation (`runId` + `parentRunId`) for subagents.
+ See `docs/agent-telemetry.md` for the event schema, design rationale, and backend adapter guidance.
+
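For illustration, a minimal buffering sink matching the `emit(event)`/`flush()` shape described above might look like the sketch below. The `TelemetryEvent` type here is a simplified stand-in, not the library's actual event type, and `BufferingSink` is a hypothetical adapter:

```ts
// Simplified stand-in for the library's telemetry event type (illustrative only).
type TelemetryEvent = {
  type: "agent.run.started" | "agent.run.stream" | "agent.run.completed";
  runId: string;
  parentRunId?: string;
  timestamp: number;
};

// A minimal sink that buffers events and sends them in one batch on flush().
class BufferingSink {
  private buffer: TelemetryEvent[] = [];

  get size(): number {
    return this.buffer.length;
  }

  emit(event: TelemetryEvent): void {
    this.buffer.push(event);
  }

  async flush(): Promise<void> {
    const batch = this.buffer.splice(0);
    if (batch.length > 0) {
      // Replace with a real backend call (Cloud Logging, OTel exporter, etc.).
      console.log(`flushing ${batch.length} telemetry event(s)`);
    }
  }
}

const demo = new BufferingSink();
demo.emit({ type: "agent.run.started", runId: "run-1", timestamp: Date.now() });
void demo.flush(); // logs "flushing 1 telemetry event(s)"
```

Because the sink interface is just `emit` plus an optional `flush`, the same pattern adapts to any batching or streaming backend.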
 If you need exact control over tool definitions, build the filesystem toolset yourself and call `runToolLoop()` directly.
 
 ```ts