@ljoukov/llm 3.0.4 → 3.0.6

package/README.md CHANGED
@@ -11,6 +11,7 @@ Unified TypeScript wrapper over:
 - **Google Gemini via Vertex AI** (`@google/genai`)
 - **Fireworks chat-completions models** (`kimi-k2.5`, `glm-5`, `minimax-m2.1`, `gpt-oss-120b`)
 - **ChatGPT subscription models** via `chatgpt-*` model ids (reuses Codex auth store, or a token provider)
+ - **Agentic orchestration with subagents** via `runAgentLoop()` + built-in delegation control tools
 
 Designed around a single streaming API that yields:
 
@@ -109,21 +110,40 @@ SSE for the rest of the process to avoid repeated failing upgrade attempts.
 
 ### Adaptive per-model concurrency
 
- Provider calls use adaptive, overload-aware concurrency (with retry/backoff where supported). Configure hard caps per
- model/per binary with env vars (clamped to `1..64`, default `3`):
+ Provider calls use adaptive, overload-aware concurrency (with retry/backoff where supported). Configure hard caps in
+ code (clamped to `1..64`):
 
- - global cap: `LLM_MAX_PARALLEL_REQUESTS_PER_MODEL`
- - provider caps: `OPENAI_MAX_PARALLEL_REQUESTS_PER_MODEL`, `GOOGLE_MAX_PARALLEL_REQUESTS_PER_MODEL`,
-   `FIREWORKS_MAX_PARALLEL_REQUESTS_PER_MODEL`
- - model overrides:
-   - `LLM_MAX_PARALLEL_REQUESTS_MODEL_<MODEL>`
-   - `<PROVIDER>_MAX_PARALLEL_REQUESTS_MODEL_<MODEL>`
+ ```ts
+ import { configureModelConcurrency } from "@ljoukov/llm";
+
+ configureModelConcurrency({
+   globalCap: 8,
+   providerCaps: {
+     openai: 16,
+     google: 3,
+     fireworks: 8,
+   },
+   modelCaps: {
+     "gpt-5.2": 24,
+   },
+   providerModelCaps: {
+     google: {
+       "gemini-3.1-pro-preview": 2,
+     },
+   },
+ });
+ ```
+
+ Default caps (without configuration):
 
- `<MODEL>` is uppercased and non-alphanumeric characters become `_` (for example `gpt-5.2` -> `GPT_5_2`).
+ - OpenAI: `12`
+ - Google preview models (`*preview*`): `2`
+ - Other Google models: `4`
+ - Fireworks: `6`
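The clamping range and per-provider defaults listed above can be sketched as a small pure function. This is a hypothetical illustration of the documented rules, not the library's actual implementation; the names `clampCap` and `defaultCap` are invented here:

```ts
// Hypothetical sketch of the documented defaults; not the library's real code.
type Provider = "openai" | "google" | "fireworks";

// Caps are clamped to the documented 1..64 range.
function clampCap(cap: number): number {
  return Math.min(64, Math.max(1, Math.floor(cap)));
}

// Defaults per the README: OpenAI 12, Google preview models 2,
// other Google models 4, Fireworks 6.
function defaultCap(provider: Provider, model: string): number {
  switch (provider) {
    case "openai":
      return 12;
    case "google":
      return model.includes("preview") ? 2 : 4;
    case "fireworks":
      return 6;
  }
}

console.log(defaultCap("google", "gemini-3.1-pro-preview")); // 2
console.log(clampCap(100)); // 64
```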
 
 ## Usage
 
- `v2` uses OpenAI-style request fields:
+ Use OpenAI-style request fields:
 
 - `input`: string or message array
 - `instructions`: optional top-level system instructions
@@ -326,8 +346,8 @@ console.log(result.text);
 
 - OpenAI API models use structured outputs (`json_schema`) when possible.
 - Gemini uses `responseJsonSchema`.
- - `chatgpt-*` models try to use structured outputs too; if rejected by the endpoint/model, it falls back to best-effort
-   JSON parsing.
+ - `chatgpt-*` models try to use structured outputs too; if the endpoint/account/model rejects `json_schema`, the call
+   retries with best-effort JSON parsing.
 
 ```ts
 import { generateJson } from "@ljoukov/llm";
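The "best-effort JSON parsing" fallback mentioned above can be illustrated with a sketch: try a strict parse first, then extract the outermost `{...}` span. This is a hypothetical stand-in for illustration only, not the library's actual parser:

```ts
// Hypothetical best-effort JSON fallback (illustrative, not the library's code):
// strict JSON.parse first, then the outermost {...} span.
function bestEffortJson(text: string): unknown | undefined {
  try {
    return JSON.parse(text);
  } catch {
    // Fall through to the looser extraction below.
  }
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Still not valid JSON.
    }
  }
  return undefined;
}

console.log(bestEffortJson('Here you go: {"ok": true}'));
```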
@@ -412,12 +432,12 @@ There are three tool-enabled call patterns:
 
 1. `generateText()` for provider-native/server-side tools (for example web search).
 2. `runToolLoop()` for your runtime JS/TS tools (function tools executed in your process).
- 3. `runAgentLoop()` for filesystem tasks (a convenience wrapper around `runToolLoop()`).
+ 3. `runAgentLoop()` for full agentic loops (a convenience wrapper around `runToolLoop()` with built-in subagent orchestration and optional filesystem tools).
 
 Architecture note:
 
- - Filesystem tools are not a separate execution system.
- - `runAgentLoop()` constructs a filesystem toolset, merges your optional custom tools, then calls the same `runToolLoop()` engine.
+ - Built-in filesystem tools are not a separate execution system.
+ - `runAgentLoop()` can construct a filesystem toolset, merge your optional custom tools, and call the same `runToolLoop()` engine.
 - This behavior is model-agnostic at the API level; profile selection only adapts tool shape for model compatibility.
 
 ### Provider-Native Tools (`generateText()`)
@@ -461,9 +481,18 @@ console.log(result.text);
 
 Use `customTool()` only when you need freeform/non-JSON tool input grammar.
 
- ### Filesystem Tasks (`runAgentLoop()`)
+ ### Agentic Loop (`runAgentLoop()`)
 
- Use this for read/search/write tasks in a workspace. The library auto-selects filesystem tool profile by model when `profile: "auto"`:
+ `runAgentLoop()` is the high-level agentic API. It supports:
+
+ - optional filesystem workspace tools,
+ - built-in subagent orchestration (delegating work across spawned agents),
+ - your own custom runtime tools.
+
+ #### 1) Filesystem agent loop
+
+ For read/search/write tasks in a workspace, enable `filesystemTool`. The library auto-selects a tool profile by model
+ when `profile: "auto"`:
 
 - Codex-like models: Codex-compatible filesystem tool shape.
 - Gemini models: Gemini-compatible filesystem tool shape.
@@ -478,10 +507,54 @@ Confinement/policy is set through `filesystemTool.options`:
 
 Detailed reference: `docs/agent-filesystem-tools.md`.
 
- Subagent delegation can be enabled via `subagentTool` (Codex-style control tools):
+ Filesystem-only example:
+
+ ```ts
+ import { createInMemoryAgentFilesystem, runAgentLoop } from "@ljoukov/llm";
+
+ const fs = createInMemoryAgentFilesystem({
+   "/repo/src/a.ts": "export const value = 1;\n",
+ });
+
+ const result = await runAgentLoop({
+   model: "chatgpt-gpt-5.3-codex",
+   input: "Change value from 1 to 2 using filesystem tools.",
+   filesystemTool: {
+     profile: "auto",
+     options: {
+       cwd: "/repo",
+       fs,
+     },
+   },
+ });
+
+ console.log(result.text);
+ ```
+
+ #### 2) Add subagent orchestration
+
+ Enable `subagentTool` to allow delegation via Codex-style control tools:
 
 - `spawn_agent`, `send_input`, `resume_agent`, `wait`, `close_agent`
- - Optional limits: `maxAgents`, `maxDepth`, wait timeouts.
+ - optional limits: `maxAgents`, `maxDepth`, wait timeouts
+
+ ```ts
+ import { runAgentLoop } from "@ljoukov/llm";
+
+ const result = await runAgentLoop({
+   model: "chatgpt-gpt-5.3-codex",
+   input: "Plan the work, delegate in parallel where useful, and return a final merged result.",
+   subagentTool: {
+     enabled: true,
+     maxAgents: 4,
+     maxDepth: 2,
+   },
+ });
+
+ console.log(result.text);
+ ```
+
+ #### 3) Combine filesystem + subagents
 
 ```ts
 import { createInMemoryAgentFilesystem, runAgentLoop } from "@ljoukov/llm";
@@ -510,6 +583,37 @@ const result = await runAgentLoop({
 console.log(result.text);
 ```
 
+ ### Agent Telemetry (Pluggable Backends)
+
+ `runAgentLoop()` supports optional telemetry hooks that keep default behavior unchanged.
+ You can attach any backend by implementing a sink with `emit(event)` and an optional `flush()`.
+
+ ```ts
+ import { runAgentLoop } from "@ljoukov/llm";
+
+ const result = await runAgentLoop({
+   model: "chatgpt-gpt-5.3-codex",
+   input: "Summarize the report and update output JSON files.",
+   filesystemTool: true,
+   telemetry: {
+     includeLlmStreamEvents: false, // enable only if you need token/delta event fan-out
+     sink: {
+       emit: (event) => {
+         // Forward to your backend (Cloud Logging, OpenTelemetry, Datadog, etc.)
+         // event.type: "agent.run.started" | "agent.run.stream" | "agent.run.completed"
+         // event carries runId, parentRunId, depth, model, timestamp + payload
+       },
+       flush: async () => {
+         // Optional: flush buffered telemetry on run completion.
+       },
+     },
+   },
+ });
+ ```
+
+ Telemetry emits parent/child run correlation (`runId` + `parentRunId`) for subagents.
+ See `docs/agent-telemetry.md` for the event schema, design rationale, and backend adapter guidance.
+
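For illustration, a minimal buffering sink matching the `emit(event)`/`flush()` shape described above might look like the sketch below. The `TelemetryEvent` type here is a simplified stand-in, not the library's actual event type, and `BufferingSink` is a hypothetical adapter:

```ts
// Simplified stand-in for the library's telemetry event type (illustrative only).
type TelemetryEvent = {
  type: "agent.run.started" | "agent.run.stream" | "agent.run.completed";
  runId: string;
  parentRunId?: string;
  timestamp: number;
};

// A minimal sink that buffers events and sends them in one batch on flush().
class BufferingSink {
  private buffer: TelemetryEvent[] = [];

  get size(): number {
    return this.buffer.length;
  }

  emit(event: TelemetryEvent): void {
    this.buffer.push(event);
  }

  async flush(): Promise<void> {
    const batch = this.buffer.splice(0);
    if (batch.length > 0) {
      // Replace with a real backend call (Cloud Logging, OTel exporter, etc.).
      console.log(`flushing ${batch.length} telemetry event(s)`);
    }
  }
}

const demo = new BufferingSink();
demo.emit({ type: "agent.run.started", runId: "run-1", timestamp: Date.now() });
void demo.flush(); // logs "flushing 1 telemetry event(s)"
```

Because the sink interface is just `emit` plus an optional `flush`, the same pattern adapts to any batching or streaming backend.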
 If you need exact control over tool definitions, build the filesystem toolset yourself and call `runToolLoop()` directly.
 
 ```ts