@codilore/llm 1.15.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (145)
  1. package/AGENTS.md +321 -0
  2. package/README.md +131 -0
  3. package/example/call-sites.md +591 -0
  4. package/example/tutorial.ts +255 -0
  5. package/package.json +50 -0
  6. package/script/recording-cost-report.ts +250 -0
  7. package/script/setup-recording-env.ts +542 -0
  8. package/src/cache-policy.ts +111 -0
  9. package/src/index.ts +32 -0
  10. package/src/llm.ts +186 -0
  11. package/src/protocols/anthropic-messages.ts +841 -0
  12. package/src/protocols/bedrock-converse.ts +649 -0
  13. package/src/protocols/bedrock-event-stream.ts +87 -0
  14. package/src/protocols/gemini.ts +465 -0
  15. package/src/protocols/index.ts +6 -0
  16. package/src/protocols/openai-chat.ts +431 -0
  17. package/src/protocols/openai-compatible-chat.ts +24 -0
  18. package/src/protocols/openai-responses.ts +987 -0
  19. package/src/protocols/shared.ts +283 -0
  20. package/src/protocols/utils/bedrock-auth.ts +70 -0
  21. package/src/protocols/utils/bedrock-cache.ts +37 -0
  22. package/src/protocols/utils/bedrock-media.ts +80 -0
  23. package/src/protocols/utils/cache.ts +16 -0
  24. package/src/protocols/utils/gemini-tool-schema.ts +101 -0
  25. package/src/protocols/utils/lifecycle.ts +102 -0
  26. package/src/protocols/utils/openai-options.ts +84 -0
  27. package/src/protocols/utils/tool-stream.ts +218 -0
  28. package/src/provider.ts +37 -0
  29. package/src/providers/amazon-bedrock.ts +43 -0
  30. package/src/providers/anthropic.ts +35 -0
  31. package/src/providers/azure.ts +110 -0
  32. package/src/providers/cloudflare.ts +127 -0
  33. package/src/providers/github-copilot.ts +66 -0
  34. package/src/providers/google.ts +35 -0
  35. package/src/providers/index.ts +11 -0
  36. package/src/providers/openai-compatible-profile.ts +20 -0
  37. package/src/providers/openai-compatible.ts +65 -0
  38. package/src/providers/openai-options.ts +81 -0
  39. package/src/providers/openai.ts +63 -0
  40. package/src/providers/openrouter.ts +98 -0
  41. package/src/providers/xai.ts +56 -0
  42. package/src/route/auth-options.ts +57 -0
  43. package/src/route/auth.ts +156 -0
  44. package/src/route/client.ts +434 -0
  45. package/src/route/endpoint.ts +53 -0
  46. package/src/route/executor.ts +374 -0
  47. package/src/route/framing.ts +27 -0
  48. package/src/route/index.ts +25 -0
  49. package/src/route/protocol.ts +84 -0
  50. package/src/route/transport/http.ts +108 -0
  51. package/src/route/transport/index.ts +33 -0
  52. package/src/route/transport/websocket.ts +280 -0
  53. package/src/schema/errors.ts +203 -0
  54. package/src/schema/events.ts +370 -0
  55. package/src/schema/ids.ts +43 -0
  56. package/src/schema/index.ts +5 -0
  57. package/src/schema/messages.ts +404 -0
  58. package/src/schema/options.ts +221 -0
  59. package/src/tool-runtime.ts +78 -0
  60. package/src/tool.ts +241 -0
  61. package/src/utils/record.ts +3 -0
  62. package/sst-env.d.ts +10 -0
  63. package/test/adapter.test.ts +164 -0
  64. package/test/auth-options.types.ts +168 -0
  65. package/test/auth.test.ts +103 -0
  66. package/test/cache-policy.test.ts +262 -0
  67. package/test/continuation-scenarios.ts +104 -0
  68. package/test/endpoint.test.ts +58 -0
  69. package/test/executor.test.ts +418 -0
  70. package/test/exports.test.ts +62 -0
  71. package/test/fixtures/media/restroom.png +0 -0
  72. package/test/fixtures/recordings/anthropic-messages/accepts-malformed-assistant-tool-order-with-default-patch.json +29 -0
  73. package/test/fixtures/recordings/anthropic-messages/anthropic-opus-4-7-image-tool-result.json +43 -0
  74. package/test/fixtures/recordings/anthropic-messages/claude-opus-4-7-drives-a-tool-loop.json +56 -0
  75. package/test/fixtures/recordings/anthropic-messages/rejects-malformed-assistant-tool-order-without-patch.json +29 -0
  76. package/test/fixtures/recordings/anthropic-messages/streams-text.json +29 -0
  77. package/test/fixtures/recordings/anthropic-messages/streams-tool-call.json +29 -0
  78. package/test/fixtures/recordings/anthropic-messages-cache/writes-then-reads-cache-control-on-identical-second-call.json +48 -0
  79. package/test/fixtures/recordings/bedrock-converse/drives-a-tool-loop.json +55 -0
  80. package/test/fixtures/recordings/bedrock-converse/streams-a-tool-call.json +29 -0
  81. package/test/fixtures/recordings/bedrock-converse/streams-text.json +29 -0
  82. package/test/fixtures/recordings/cloudflare-ai-gateway/cloudflare-ai-gateway-workers-ai-gpt-oss-20b-tools-tool-call.json +32 -0
  83. package/test/fixtures/recordings/cloudflare-ai-gateway/cloudflare-ai-gateway-workers-ai-llama-3-1-8b-text.json +32 -0
  84. package/test/fixtures/recordings/cloudflare-workers-ai/cloudflare-workers-ai-gpt-oss-20b-tools-tool-call.json +32 -0
  85. package/test/fixtures/recordings/cloudflare-workers-ai/cloudflare-workers-ai-llama-3-1-8b-text.json +32 -0
  86. package/test/fixtures/recordings/gemini/gemini-2-5-flash-image.json +32 -0
  87. package/test/fixtures/recordings/gemini/streams-text.json +28 -0
  88. package/test/fixtures/recordings/gemini/streams-tool-call.json +28 -0
  89. package/test/fixtures/recordings/gemini-cache/reports-cachedcontenttokencount-on-identical-second-call.json +46 -0
  90. package/test/fixtures/recordings/openai-chat/continues-after-tool-result.json +28 -0
  91. package/test/fixtures/recordings/openai-chat/drives-a-tool-loop-end-to-end.json +46 -0
  92. package/test/fixtures/recordings/openai-chat/streams-text.json +28 -0
  93. package/test/fixtures/recordings/openai-chat/streams-tool-call.json +28 -0
  94. package/test/fixtures/recordings/openai-compatible-chat/deepseek-streams-text.json +28 -0
  95. package/test/fixtures/recordings/openai-compatible-chat/groq-llama-3-3-70b-drives-a-tool-loop.json +53 -0
  96. package/test/fixtures/recordings/openai-compatible-chat/groq-streams-text.json +28 -0
  97. package/test/fixtures/recordings/openai-compatible-chat/groq-streams-tool-call.json +28 -0
  98. package/test/fixtures/recordings/openai-compatible-chat/openrouter-claude-opus-4-7-drives-a-tool-loop.json +54 -0
  99. package/test/fixtures/recordings/openai-compatible-chat/openrouter-gpt-4o-mini-drives-a-tool-loop.json +53 -0
  100. package/test/fixtures/recordings/openai-compatible-chat/openrouter-gpt-5-5-drives-a-tool-loop.json +54 -0
  101. package/test/fixtures/recordings/openai-compatible-chat/openrouter-streams-text.json +28 -0
  102. package/test/fixtures/recordings/openai-compatible-chat/openrouter-streams-tool-call.json +28 -0
  103. package/test/fixtures/recordings/openai-compatible-chat/togetherai-streams-text.json +28 -0
  104. package/test/fixtures/recordings/openai-compatible-chat/togetherai-streams-tool-call.json +28 -0
  105. package/test/fixtures/recordings/openai-responses/gpt-5-5-drives-a-tool-loop.json +54 -0
  106. package/test/fixtures/recordings/openai-responses/gpt-5-5-streams-text.json +28 -0
  107. package/test/fixtures/recordings/openai-responses/gpt-5-5-streams-tool-call.json +28 -0
  108. package/test/fixtures/recordings/openai-responses/openai-responses-gpt-5-5-image-tool-result.json +42 -0
  109. package/test/fixtures/recordings/openai-responses/openai-responses-gpt-5-5-reasoning-continuation.json +58 -0
  110. package/test/fixtures/recordings/openai-responses/openai-responses-gpt-5-5-reasoning.json +32 -0
  111. package/test/fixtures/recordings/openai-responses-cache/reports-cached-tokens-on-identical-second-call.json +46 -0
  112. package/test/generate-object.test.ts +184 -0
  113. package/test/lib/effect.ts +50 -0
  114. package/test/lib/http.ts +98 -0
  115. package/test/lib/openai-chunks.ts +27 -0
  116. package/test/lib/sse.ts +17 -0
  117. package/test/lib/tool-runtime.ts +146 -0
  118. package/test/llm.test.ts +167 -0
  119. package/test/provider/anthropic-messages-cache.recorded.test.ts +54 -0
  120. package/test/provider/anthropic-messages.recorded.test.ts +46 -0
  121. package/test/provider/anthropic-messages.test.ts +829 -0
  122. package/test/provider/bedrock-converse-cache.recorded.test.ts +54 -0
  123. package/test/provider/bedrock-converse.test.ts +707 -0
  124. package/test/provider/cloudflare.test.ts +230 -0
  125. package/test/provider/gemini-cache.recorded.test.ts +48 -0
  126. package/test/provider/gemini.test.ts +476 -0
  127. package/test/provider/golden.recorded.test.ts +219 -0
  128. package/test/provider/openai-chat.test.ts +446 -0
  129. package/test/provider/openai-compatible-chat.test.ts +238 -0
  130. package/test/provider/openai-responses-cache.recorded.test.ts +46 -0
  131. package/test/provider/openai-responses.test.ts +1322 -0
  132. package/test/provider/openrouter.test.ts +56 -0
  133. package/test/provider.types.ts +41 -0
  134. package/test/recorded-golden.ts +97 -0
  135. package/test/recorded-runner.ts +100 -0
  136. package/test/recorded-scenarios.ts +531 -0
  137. package/test/recorded-test.ts +74 -0
  138. package/test/recorded-utils.ts +56 -0
  139. package/test/recorded-websocket.ts +26 -0
  140. package/test/route.test.ts +43 -0
  141. package/test/schema.test.ts +97 -0
  142. package/test/tool-runtime.test.ts +802 -0
  143. package/test/tool-stream.test.ts +99 -0
  144. package/test/tool.types.ts +40 -0
  145. package/tsconfig.json +15 -0
package/AGENTS.md ADDED
@@ -0,0 +1,321 @@
# LLM Package Guide

## Effect

- Prefer `HttpClient.HttpClient` / `HttpClientResponse.HttpClientResponse` over web `fetch` / `Response` at package boundaries.
- Use `Stream.Stream` for streaming data flow. Avoid ad hoc async generators or manual web reader loops unless an Effect `Stream` API cannot model the behavior.
- Use Effect Schema codecs for JSON encode/decode (`Schema.fromJsonString(...)`) instead of direct `JSON.parse` / `JSON.stringify` in implementation code.
- In `Effect.gen`, yield yieldable errors directly (`return yield* new MyError(...)`) instead of `Effect.fail(new MyError(...))`.
- Use `Effect.void` instead of `Effect.succeed(undefined)` when the successful value is intentionally void.

## Conventions

Per-type constructors live on the type, not as top-level re-exports. Use `Message.system(...)`, `Message.user(...)`, `Message.assistant(...)`, `Message.tool(...)`, `Model.make(...)`, `ToolDefinition.make(...)`, `ToolCallPart.make(...)`, `ToolResultPart.make(...)`, `ToolChoice.make(...)`, `ToolChoice.named(...)`, `SystemPart.make(...)`, and `GenerationOptions.make(...)` directly. The top-level `LLM` namespace is reserved for request-shaped call APIs: `LLM.request`, `LLM.generate`, `LLM.stream`, `LLM.updateRequest`, and `LLM.generateObject`. Two ways to construct the same thing is one too many.

## Tests

- Use `testEffect(...)` from `test/lib/effect.ts` for tests requiring Effect layers.
- Keep provider tests fixture-first. Live provider calls must stay behind `RECORD=true` and required API-key checks.

## Architecture

This package is an Effect Schema-first LLM core. The Schema classes in `src/schema/` are the canonical runtime data model. Convenience functions in `src/llm.ts` are thin constructors that return those same Schema class instances; they should improve callsites without creating a second model.

Primary in-repo integration points:

- `packages/codilore/src/session/llm.ts` is the session-owned orchestration layer that decides whether a request uses the AI SDK or this package's native route runtime.
- `packages/codilore/src/session/llm/native-request.ts` is the lowering adapter from Codilore's session/AI SDK-shaped data into this package's `LLMRequest` model.
- `packages/codilore/src/session/llm/native-runtime.ts` is the execution adapter that calls the raw `LLMClient.stream(request)` and bridges one provider turn of Codilore tool calls through this package's typed dispatcher.
- `packages/codilore/src/session/llm/ai-sdk.ts` keeps the default AI SDK path compatible by converting AI SDK stream parts into this package's shared `LLMEvent`s.

Keep this package independent of session concerns. Session auth, permissions, plugins, telemetry headers, and runtime selection belong in `packages/codilore/src/session/llm.ts` and its local adapters.

### Request Flow

The intended callsite is:

```ts
import { Effect } from "effect"

const program = Effect.gen(function* () {
  const request = LLM.request({
    model: OpenAI.configure({ apiKey }).responses("gpt-4o-mini"),
    system: "You are concise.",
    prompt: "Say hello.",
  })

  // Runs one provider turn and collects the events into an LLMResponse.
  const response = yield* LLMClient.generate(request)
  return response
})
```

`LLM.request(...)` builds an `LLMRequest`. `LLMClient.generate(...)` then reads the executable route carried by `request.model.route`, builds the provider-native body, asks the route's transport for a real `HttpClientRequest.HttpClientRequest`, sends it through `RequestExecutor.Service`, parses the provider stream into common `LLMEvent`s, and finally returns an `LLMResponse`.

Use `LLMClient.stream(request)` when callers want incremental `LLMEvent`s. Use `LLMClient.generate(request)` when callers want those same events collected into an `LLMResponse`. Use `LLMClient.prepare<Body>(request)` to compile a request through the route pipeline without sending it. The optional `Body` type argument narrows `.body` to the route's native shape (e.g. `prepare<OpenAIChatBody>(...)` returns a `PreparedRequestOf<OpenAIChatBody>`). The runtime body is identical either way; the generic is only a type-level assertion.
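
Below is a minimal plain-TypeScript sketch of why `prepare<Body>` changes only the static type. `RouteSketch`, `ChatBodySketch`, and this `prepare` are hypothetical stand-ins, not the package's API. The point is that the generic is an unchecked cast, so the runtime body is the same with or without it.

```ts
// Hypothetical stand-ins; not the package's real types.
interface PreparedRequestOf<Body> {
  readonly url: string
  readonly body: Body
}

interface RouteSketch {
  readonly url: string
  readonly buildBody: (prompt: string) => unknown
}

function prepare<Body = unknown>(route: RouteSketch, prompt: string): PreparedRequestOf<Body> {
  // The cast is the whole mechanism: Body is asserted by the caller, never checked at runtime.
  return { url: route.url, body: route.buildBody(prompt) as Body }
}

interface ChatBodySketch {
  model: string
  messages: Array<{ role: string; content: string }>
}

const chatRoute: RouteSketch = {
  url: "https://api.openai.com/v1/chat/completions",
  buildBody: (prompt) => ({ model: "gpt-4o-mini", messages: [{ role: "user", content: prompt }] }),
}

const loose = prepare(chatRoute, "Say hello.") // loose.body: unknown
const narrowed = prepare<ChatBodySketch>(chatRoute, "Say hello.") // narrowed.body: ChatBodySketch
console.log(narrowed.body.messages[0].content) // prints: Say hello.
```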

Filter or narrow `LLMEvent` streams with `LLMEvent.is.*` (camelCase guards, e.g. `events.filter(LLMEvent.is.toolCall)`). The kebab-case `LLMEvent.guards["tool-call"]` form also works, but prefer `is.*` in new code.
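
The sketch below shows how the two guard spellings relate. `EventSketch`, `guard`, `guards`, and `is` are hypothetical and cover only three event variants; the real `LLMEvent` union is larger. Each guard is a type predicate, so `filter(is.toolCall)` narrows the array element type.

```ts
// Hypothetical, trimmed-down event union; the real LLMEvent has more variants.
type EventSketch =
  | { type: "text-delta"; text: string }
  | { type: "tool-call"; id: string; name: string; input: unknown }
  | { type: "finish"; reason: string }

type Of<T extends EventSketch["type"]> = Extract<EventSketch, { type: T }>

// Builds a type-predicate guard for one event tag.
const guard =
  <T extends EventSketch["type"]>(type: T) =>
  (event: EventSketch): event is Of<T> =>
    event.type === type

// Kebab-case lookup keyed by the wire tag...
const guards = {
  "text-delta": guard("text-delta"),
  "tool-call": guard("tool-call"),
  finish: guard("finish"),
}

// ...and camelCase aliases that read naturally at call sites.
const is = {
  textDelta: guards["text-delta"],
  toolCall: guards["tool-call"],
  finish: guards.finish,
}

const events: EventSketch[] = [
  { type: "text-delta", text: "Checking" },
  { type: "tool-call", id: "call_1", name: "lookup", input: { query: "weather" } },
  { type: "finish", reason: "tool-calls" },
]

// `calls` is typed as the tool-call variant only, so `.name` is accessible.
const calls = events.filter(is.toolCall)
console.log(calls.map((c) => c.name))
```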

### Routes

A route is the registered, runnable composition of four orthogonal pieces:

- **`Protocol`** (`src/route/protocol.ts`): the semantic API contract. It owns request body construction (`body.from`), the body schema (`body.schema`), the streaming-event schema (`stream.event`), and the event-to-`LLMEvent` state machine (`stream.step`). `Route.make(...)` validates and JSON-encodes the body with `body.schema` and decodes frames with `stream.event`. Examples: `OpenAIChat.protocol`, `OpenAIResponses.protocol`, `AnthropicMessages.protocol`, `Gemini.protocol`, `BedrockConverse.protocol`.
- **`Endpoint`** (`src/route/endpoint.ts`): URL construction. The host, path, and route query live on the endpoint. `Endpoint.path("/chat/completions", { baseURL })` is the common case; pass a function for paths that embed the model id or a body field (e.g. `` Endpoint.path(({ body }) => `/model/${body.modelId}/converse-stream`) ``).
- **`Auth`** (`src/route/auth.ts`): per-request transport authentication. Provider facades configure credentials onto the route before model selection, usually via `Auth.bearer(apiKey)` or `Auth.header(name, apiKey)`. Routes that need per-request signing (Bedrock SigV4, future Vertex IAM, Azure AAD) implement `Auth` as a function that signs the body and merges the signed headers into its result.
- **`Framing`** (`src/route/framing.ts`): bytes → frames. SSE (`Framing.sse`) is shared; Bedrock keeps its AWS event-stream framing as a typed `Framing<object>` value alongside its protocol.

Compose them via `Route.make(...)`:

```ts
export const route = Route.make({
  id: "openai-chat",
  provider: "openai",
  protocol: OpenAIChat.protocol,
  endpoint: Endpoint.path("/chat/completions", {
    baseURL: "https://api.openai.com/v1",
  }),
  auth: Auth.bearer(),
  framing: Framing.sse,
})
```

Route defaults are request-shaping defaults such as `headers`, `limits`, `generation`, `providerOptions`, and `http`. Endpoint host/query belongs on the route endpoint. Selected `Model` values carry only the model id, provider id, and configured route value. Model capability/catalog metadata lives outside this package; protocol support is enforced by request lowering and typed `LLMError`s.

The four-axis decomposition is why DeepSeek, TogetherAI, Cerebras, Baseten, Fireworks, and DeepInfra all reuse `OpenAIChat.protocol` verbatim: each provider deployment is a 5- to 15-line `Route.make(...)` call instead of a 300- to 400-line route clone. A bug fix in one protocol reaches every consumer of that protocol in a single commit.

When a provider ships a non-HTTP transport (OpenAI's WebSocket Responses backend, hypothetical bidirectional streaming APIs), the seam is `Transport`. `WebSocketTransport.jsonTransport.with(...)` constructs an IO template: its `prepare` receives the route endpoint/auth at compile time and builds a WebSocket URL and message, and its `frames` yields decoded text from the socket. Same protocol and endpoint source, different transport.

### URL Construction

`Endpoint` owns `{ baseURL, path, query }`. Each protocol route includes a canonical endpoint when the provider has one (e.g. `https://api.openai.com/v1`); provider helpers override endpoint fields by configuring the route before selecting a model. Routes that have no canonical URL (OpenAI-compatible Chat, GitHub Copilot) require configuration before execution.
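
This sketch shows one way an `{ baseURL, path, query }` endpoint could resolve to a URL. It assumes `path` is either a string or a function of the request body. `EndpointSketch` and `resolveURL` are hypothetical. The Bedrock host follows the public `bedrock-runtime.<region>.amazonaws.com` pattern, while the model id and the gateway host are made up.

```ts
// Hypothetical endpoint shape mirroring `{ baseURL, path, query }`; not the package's type.
interface EndpointSketch<Body> {
  readonly baseURL?: string
  readonly path: string | ((input: { body: Body }) => string)
  readonly query?: Record<string, string>
}

// Resolves a concrete URL for one request body. Throws when no baseURL is configured,
// mirroring routes (e.g. OpenAI-compatible Chat) that have no canonical URL.
function resolveURL<Body>(endpoint: EndpointSketch<Body>, body: Body): string {
  if (endpoint.baseURL === undefined) throw new Error("endpoint has no baseURL; configure the route first")
  const path = typeof endpoint.path === "function" ? endpoint.path({ body }) : endpoint.path
  // Append to the base path instead of `new URL(path, baseURL)`, which would drop "/v1".
  const url = new URL(endpoint.baseURL.replace(/\/+$/, "") + path)
  for (const [key, value] of Object.entries(endpoint.query ?? {})) url.searchParams.set(key, value)
  return url.toString()
}

const chat: EndpointSketch<unknown> = { baseURL: "https://api.openai.com/v1", path: "/chat/completions" }

// Path derived from a body field, as in the Bedrock example above.
const converse: EndpointSketch<{ modelId: string }> = {
  baseURL: "https://bedrock-runtime.us-east-1.amazonaws.com",
  path: ({ body }) => `/model/${encodeURIComponent(body.modelId)}/converse-stream`,
}

// Hypothetical host with a route query parameter.
const gateway: EndpointSketch<unknown> = {
  baseURL: "https://gateway.example/v1",
  path: "/chat",
  query: { "api-version": "v1" },
}

console.log(resolveURL(chat, {}))
console.log(resolveURL(converse, { modelId: "anthropic.claude-example" }))
console.log(resolveURL(gateway, {}))
```

Concatenating onto `baseURL` keeps a base path such as `/v1`. By contrast, passing a leading-slash path to `new URL(path, base)` replaces the base path.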

For providers where the URL is derived from typed inputs (Azure resource name, Bedrock region), the provider helper configures the route endpoint before calling `.model(...)`. Use `AtLeastOne<T>` from `route/auth-options.ts` for inputs that accept either of two derivation paths (Azure: `resourceName` or `baseURL`).
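
Here is a common TypeScript way to express "at least one of these keys", applied to Azure-style inputs. This `AtLeastOne` definition is an assumption and may differ from the one in `route/auth-options.ts`. The `openai.azure.com/openai/v1` URL shape is likewise only illustrative.

```ts
// A common "at least one key" helper: a union with one member per key, each making
// that key required. The package's own AtLeastOne<T> may be written differently.
type AtLeastOne<T, K extends keyof T = keyof T> = K extends keyof T ? Partial<T> & Required<Pick<T, K>> : never

// Hypothetical Azure-style URL inputs: a resource name, an explicit base URL, or both.
type AzureURLInput = AtLeastOne<{ resourceName: string; baseURL: string }>

function azureBaseURL(input: AzureURLInput): string {
  // An explicit baseURL wins; otherwise derive one from the resource name.
  if (input.baseURL !== undefined) return input.baseURL
  if (input.resourceName !== undefined) return `https://${input.resourceName}.openai.azure.com/openai/v1`
  throw new Error("unreachable: AtLeastOne guarantees one of the keys")
}

console.log(azureBaseURL({ resourceName: "contoso" }))
console.log(azureBaseURL({ baseURL: "https://proxy.example/v1" }))
// azureBaseURL({}) // compile-time error: neither key provided
```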

### Provider Facades

Provider-facing APIs are configured facades over route values. Endpoint, auth, resource, and API-version setup happens before model selection, and model selectors accept only a model or deployment id:

```ts
const openai = OpenAI.configure({ apiKey, baseURL })
const model = openai.responses("gpt-4o-mini")

const azure = Azure.configure({ resourceName, apiKey, apiVersion: "v1" })
const deployment = azure.responses("my-deployment")

const gateway = CloudflareAIGateway.configure({ accountId, gatewayId, gatewayApiKey, apiKey })
const proxied = gateway.model("openai/gpt-4o-mini")
```

Keep provider facades small and explicit:

- Use branded `ProviderID.make(...)` and `ModelID.make(...)` where ids are constructed directly.
- Use `model` for the default API path and named methods for provider-native alternatives such as OpenAI `responses`, `responsesWebSocket`, and `chat`.
- Put provider-specific setup on `.configure(...)`; do not add `model(id, overrides)` as a duplicate construction path.
- Export lower-level `routes` arrays separately only when advanced internal wiring needs them.
- Prefer `apiKey` as provider-specific sugar and `auth` as the explicit override; keep them mutually exclusive in provider option types with `ProviderAuthOption`.
- Resolve `apiKey` to `Auth` with `AuthOptions.bearer(options, "<PROVIDER>_API_KEY")`. It honors an explicit `auth` override and falls back to `Auth.config(envVar)`, so missing keys surface as a typed `Authentication` error rather than a runtime crash.
- Use separate top-level facades for products with different required setup, such as `CloudflareAIGateway` and `CloudflareWorkersAI`.
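
The `apiKey`/`auth` rules above can be sketched as a mutually exclusive option type plus a resolver. Everything here is hypothetical. `AuthSketch` stands in for `Auth`, and reading `env` eagerly only approximates the `Auth.config(envVar)` fallback. In the package, that fallback surfaces a typed `Authentication` error instead of the `missing` value used here.

```ts
// Hypothetical auth value; the package's `Auth` is a richer route-level type.
type AuthSketch = { kind: "bearer"; token: string } | { kind: "missing"; envVar: string }

// `apiKey` and `auth` are mutually exclusive: supplying one forbids the other at compile time.
type ProviderAuthOptionSketch = { apiKey?: string; auth?: never } | { auth: AuthSketch; apiKey?: never }

function resolveAuth(
  options: ProviderAuthOptionSketch,
  envVar: string,
  env: Record<string, string | undefined>,
): AuthSketch {
  if (options.auth !== undefined) return options.auth // explicit override wins
  const key = options.apiKey ?? env[envVar] // sugar first, then the environment
  // A missing key becomes a value the caller can report, not an exception.
  return key === undefined ? { kind: "missing", envVar } : { kind: "bearer", token: key }
}

const env = { OPENAI_API_KEY: "sk-env" }
console.log(resolveAuth({}, "OPENAI_API_KEY", env))
console.log(resolveAuth({ apiKey: "sk-inline" }, "OPENAI_API_KEY", env))
// resolveAuth({ apiKey: "a", auth: { kind: "bearer", token: "b" } }, "X", env) // type error
```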

`Provider.make(...)` remains available for simple static provider definitions, but new built-in providers should prefer plain configured facades unless a helper removes real duplication without adding runtime behavior.

### Folder layout

```
packages/llm/src/
  schema/                         canonical Schema model, split by concern
    ids.ts                        branded IDs, literal types, ProviderMetadata
    options.ts                    Generation/Provider/Http options, Limits, Model, cache policy
    messages.ts                   content parts, Message, ToolDefinition, LLMRequest
    events.ts                     Usage, individual events, LLMEvent, PreparedRequest, LLMResponse
    errors.ts                     error reasons, LLMError, ToolFailure
    index.ts                      barrel
  llm.ts                          request constructors and convenience helpers
  route/
    index.ts                      @codilore/llm/route advanced barrel
    client.ts                     Route.make + LLMClient.prepare/stream/generate
    executor.ts                   RequestExecutor service + transport error mapping
    protocol.ts                   Protocol type + Protocol.make
    endpoint.ts                   Endpoint type + Endpoint.path
    auth.ts                       Auth type + Auth.bearer / Auth.apiKeyHeader / Auth.passthrough
    auth-options.ts               ProviderAuthOption shape, AuthOptions.bearer, AtLeastOne helper
    framing.ts                    Framing type + Framing.sse
    transport/                    transport implementations
      index.ts                    Transport type + HttpTransport / WebSocketTransport namespaces
      http.ts                     HttpTransport.httpJson: POST + framing
      websocket.ts                WebSocketTransport.json + WebSocketExecutor service
  protocols/
    shared.ts                     ProviderShared toolkit used inside protocol impls
    openai-chat.ts                protocol + route (compose OpenAIChat.protocol)
    openai-responses.ts
    anthropic-messages.ts
    gemini.ts
    bedrock-converse.ts
    bedrock-event-stream.ts       framing for AWS event-stream binary frames
    openai-compatible-chat.ts     route that reuses OpenAIChat.protocol, no canonical URL
    utils/                        per-protocol helpers (auth, cache, media, tool-stream, ...)
  providers/
    openai-compatible.ts          generic compatible helper + family model helpers
    openai-compatible-profile.ts  family defaults (deepseek, togetherai, ...)
    azure.ts / amazon-bedrock.ts / cloudflare.ts / github-copilot.ts / google.ts / xai.ts / openai.ts / anthropic.ts / openrouter.ts
  tool.ts                         typed tool() helper
  tool-runtime.ts                 narrow one-call typed tool dispatcher
```

The dependency arrow points down: `providers/*.ts` files import protocol routes and auth-option utilities; protocol modules import `endpoint`, `auth`, `framing`, and transport pieces. Protocols do not import provider facades. Lower-level modules know nothing about provider catalog metadata.

### Shared protocol helpers

`ProviderShared` exports a small toolkit used inside protocol implementations to keep them focused on provider-native shapes:

- `joinText(parts)`: joins an array of `TextPart` (or anything with a `.text`) with newlines. Use it anywhere a protocol flattens text content into a single string for a provider field.
- `parseToolInput(route, name, raw)`: Schema-decodes a tool-call argument string with the canonical "Invalid JSON input for `<route>` tool call `<name>`" error message. Treats empty input as `{}`.
- `parseJson(route, raw, message)`: generic JSON-via-Schema decode for non-tool bodies.
- `eventError(route, message, ...)`: typed `InvalidProviderOutput` constructor for stream-time decode failures.
- `validateWith(decoder)`: maps Schema decode errors to `InvalidRequest`. `Route.make(...)` uses this for body validation; lower-level routes can reuse it.
- `matchToolChoice(provider, choice, branches)`: branches over `LLMRequest["toolChoice"]` for provider-specific lowering.

If you find yourself copying a 3-to-5-line snippet between two protocols, lift it into `ProviderShared` next to these helpers rather than duplicating it.
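
The following hypothetical re-implementations of `joinText` and the empty-input rule of `parseToolInput` put the described behavior in one place. The package decodes with Effect Schema and fails with typed errors. This sketch instead uses `JSON.parse` and a plain `Error`, and its error text only approximates the canonical message.

```ts
// Hypothetical re-implementations for illustration only.
interface TextLike {
  readonly text: string
}

// Flattens text parts into one provider field, separated by newlines.
const joinText = (parts: ReadonlyArray<TextLike>): string => parts.map((part) => part.text).join("\n")

function parseToolInput(route: string, name: string, raw: string): unknown {
  // Zero-argument tool calls may arrive with no argument bytes at all.
  if (raw === "") return {}
  try {
    return JSON.parse(raw)
  } catch {
    throw new Error(`Invalid JSON input for ${route} tool call ${name}`)
  }
}

console.log(joinText([{ text: "Hello" }, { text: "world" }]))
console.log(parseToolInput("openai-chat", "lookup", ""))
console.log(parseToolInput("openai-chat", "lookup", '{"query":"weather"}'))
```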

### Chronological System Updates

`LLMRequest.system` is the initial privileged prompt that applies ahead of the conversation. `Message.system(...)` is a separate, provider-neutral chronological operator update inside `LLMRequest.messages`; it applies only from its position in history onward and accepts text content only.

Native chronological system messages are route- and model-specific. Anthropic Messages lowers them natively for Claude Opus 4.8 (`claude-opus-4-8`). Other routes and models intentionally lower the update in place into ordinary user-compatible text using this stable escaped representation:

```text
<system-update>
...
</system-update>
```

The wrapped-user fallback preserves ordering while visibly lowering authority. Never silently pass a raw chronological `role: "system"` through a route that might reject it. Do not insert raw retrieved documents, tool output, or web content into privileged chronological system updates; keep untrusted content in ordinary user/tool channels.
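
This sketch shows the in-place fallback, assuming a simple role/content message shape. `ChatMessage`, `nativeSystemUpdates`, and `lowerSystemUpdate` are hypothetical. The native branch is simplified to a `system`-role message, and the sketch does not reproduce the package's escaping of the wrapped text.

```ts
// Hypothetical message shape; the package lowers its own Message classes per route.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// Models assumed here to accept chronological system messages natively.
const nativeSystemUpdates = new Set(["claude-opus-4-8"])

function lowerSystemUpdate(model: string, text: string): ChatMessage {
  // Native path, simplified to a system-role message.
  if (nativeSystemUpdates.has(model)) return { role: "system", content: text }
  // Fallback: same position in history, but as ordinary user text with visibly lower authority.
  return { role: "user", content: `<system-update>\n${text}\n</system-update>` }
}

const conversation: ChatMessage[] = [
  { role: "user", content: "Summarize the report." },
  { role: "assistant", content: "Here is a summary..." },
]

// Appended in place: the update applies only from this point onward.
const next = [...conversation, lowerSystemUpdate("gpt-4o-mini", "Answer in French from now on.")]
console.log(next[2].content)
```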

### Tools

Tool loops are represented in common messages and events:

```ts
const call = ToolCallPart.make({ id: "call_1", name: "lookup", input: { query: "weather" } })
// The result reuses the call's id so it pairs with the call above.
const result = Message.tool({ id: "call_1", name: "lookup", result: { forecast: "sunny" } })

const followUp = LLM.request({
  model,
  messages: [Message.user("Weather?"), Message.assistant([call]), result],
})
```

Routes lower these into provider-native assistant tool-call messages and tool-result messages. Streaming providers should emit `tool-input-delta` events while arguments arrive, then a final `tool-call` event with parsed input.
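
A small per-call accumulator illustrates the delta-then-final pattern. `StreamEvent` and `ToolInputAccumulator` are hypothetical and much simpler than `ToolStream`. They assume the provider tells the route when a call's arguments are complete.

```ts
// Hypothetical event shapes modeled on the names in this guide; not the package's types.
type StreamEvent =
  | { type: "tool-input-delta"; id: string; name: string; delta: string }
  | { type: "tool-call"; id: string; name: string; input: unknown }

// Accumulates argument fragments per call id and emits one parsed tool-call per id.
class ToolInputAccumulator {
  private readonly pending = new Map<string, { name: string; raw: string }>()

  // Each fragment is forwarded immediately as a tool-input-delta event.
  onDelta(id: string, name: string, delta: string): StreamEvent {
    const entry = this.pending.get(id) ?? { name, raw: "" }
    entry.raw += delta
    this.pending.set(id, entry)
    return { type: "tool-input-delta", id, name, delta }
  }

  // Called once the provider signals the call is complete; parses the joined arguments.
  finish(id: string): StreamEvent {
    const entry = this.pending.get(id)
    if (entry === undefined) throw new Error(`no pending tool call ${id}`)
    this.pending.delete(id)
    const input = entry.raw === "" ? {} : JSON.parse(entry.raw) // empty arguments decode to {}
    return { type: "tool-call", id, name: entry.name, input }
  }
}

const acc = new ToolInputAccumulator()
const streamed: StreamEvent[] = [
  acc.onDelta("call_1", "lookup", '{"query":'),
  acc.onDelta("call_1", "lookup", '"weather"}'),
  acc.finish("call_1"),
]
console.log(streamed[2])
```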

### Tool dispatch

`LLM.stream(request)` and `LLM.generate(request)` each run exactly one provider turn. Add tool schemas to `request.tools` with `Tool.toDefinitions(tools)`. When a caller wants the package's typed one-call execution behavior, pass each canonical local `tool-call` event to `ToolRuntime.dispatch(tools, call)`.

```ts
const get_weather = tool({
  description: "Get current weather for a city",
  parameters: Schema.Struct({ city: Schema.String }),
  success: Schema.Struct({ temperature: Schema.Number, condition: Schema.String }),
  execute: ({ city }) =>
    Effect.gen(function* () {
      // city: string, typed from the parameters Schema
      const data = yield* WeatherApi.fetch(city)
      // The return value is checked against the success Schema.
      return { temperature: data.temp, condition: data.cond }
    }),
})

const tools = { get_weather, get_time, ... }
const events = yield* LLM.stream(
  LLM.updateRequest(request, { tools: Tool.toDefinitions(tools) }),
).pipe(Stream.runCollect)

const call = Array.from(events).find(LLMEvent.is.toolCall)
if (call && !call.providerExecuted) {
  const dispatched = yield* ToolRuntime.dispatch(tools, call)
  // Persist call + dispatched.result, then construct the next request explicitly.
}
```

The dispatcher:

- On `tool-call`: looks up the named tool, decodes the input against the `parameters` Schema, dispatches to the typed `execute`, encodes the result against the `success` Schema, and returns canonical `tool-result` events.
- Does not stream providers, construct Session events, schedule fibers, append history, count steps, or continue model rounds.
- Leaves persistence and continuation to the enclosing product flow.

Handler dependencies (services, permissions, plugin hooks, abort handling) are closed over by the consumer at tool-construction time. Build the tools record inside an `Effect.gen` once and reuse it across many dispatches.

Handler errors must be expressed as `ToolFailure`. The runtime catches a `ToolFailure` and emits a `tool-error` event, then a `tool-result` of `type: "error"`, so the model can self-correct on the next step. Anything that is not a `ToolFailure` is treated as a defect and fails the stream. Three recoverable error paths produce `tool-error` events:

- The model called an unknown tool name.
- Input failed the `parameters` Schema.
- The handler returned a `ToolFailure`.
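
The three recoverable paths and the defect path can be sketched synchronously. `ToolSketch`, `dispatch`, and this `ToolFailure` class are hypothetical stand-ins. The real `ToolRuntime.dispatch` is Effect-based and validates with the `parameters` Schema. It also receives failures through the Effect error channel rather than through `throw`.

```ts
// Hypothetical stand-in for the package's ToolFailure (thrown here instead of failing an Effect).
class ToolFailure {
  constructor(readonly message: string) {}
}

interface ToolSketch<I, O> {
  readonly isInput: (value: unknown) => value is I // stands in for the parameters Schema
  readonly execute: (input: I) => O // may throw ToolFailure
}

type Call = { id: string; name: string; input: unknown }
type DispatchEvent =
  | { type: "tool-error"; id: string; message: string }
  | { type: "tool-result"; id: string; result: { type: "json"; value: unknown } | { type: "error"; value: string } }

function dispatch(tools: Record<string, ToolSketch<any, unknown>>, call: Call): DispatchEvent[] {
  // Recoverable failures: a tool-error event, then an error-typed tool-result.
  const fail = (message: string): DispatchEvent[] => [
    { type: "tool-error", id: call.id, message },
    { type: "tool-result", id: call.id, result: { type: "error", value: message } },
  ]
  const known = Object.prototype.hasOwnProperty.call(tools, call.name)
  if (!known) return fail(`Unknown tool: ${call.name}`) // path 1
  const tool = tools[call.name]
  if (!tool.isInput(call.input)) return fail(`Invalid input for ${call.name}`) // path 2
  try {
    return [{ type: "tool-result", id: call.id, result: { type: "json", value: tool.execute(call.input) } }]
  } catch (error) {
    if (error instanceof ToolFailure) return fail(error.message) // path 3
    throw error // anything else is a defect and propagates
  }
}

const tools = {
  get_weather: {
    isInput: (value: unknown): value is { city: string } =>
      typeof value === "object" && value !== null && typeof (value as { city?: unknown }).city === "string",
    execute: ({ city }: { city: string }) => {
      if (city === "Atlantis") throw new ToolFailure("City not found")
      return { temperature: 21, condition: "sunny" }
    },
  },
}

console.log(dispatch(tools, { id: "c1", name: "get_weather", input: { city: "Oslo" } }))
```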

Provider-defined / hosted tools (Anthropic `web_search` / `code_execution` / `web_fetch`; OpenAI Responses `web_search_call` / `file_search_call` / `code_interpreter_call` / `mcp_call` / `local_shell_call` / `image_generation_call` / `computer_use_call`) pass through the runtime untouched:

- Routes surface the model's call as a `tool-call` event with `providerExecuted: true`, and the provider's result as a matching `tool-result` event with `providerExecuted: true`.
- Callers detect `providerExecuted` on `tool-call` and **skip local dispatch**: no handler is invoked and no "unknown tool" `tool-error` is raised. The provider has already executed the tool.
- Callers that continue the conversation should retain both events in explicit history when the protocol requires it. Anthropic encodes them back as `server_tool_use` + `web_search_tool_result` (or `code_execution_tool_result` / `web_fetch_tool_result`) blocks; OpenAI Responses callers typically use `previous_response_id` instead of resending hosted-tool items.
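
Separating local work from hosted results can take as little as two filters. The event shapes below are hypothetical and keep only the fields the filters read.

```ts
// Hypothetical event shapes; only the fields this example needs.
type ToolCallEvent = { type: "tool-call"; id: string; name: string; input: unknown; providerExecuted?: boolean }
type ToolResultEvent = { type: "tool-result"; id: string; result: unknown; providerExecuted?: boolean }
type StreamEventSketch = ToolCallEvent | ToolResultEvent | { type: "text-delta"; text: string }

// Local dispatch candidates: tool calls the provider did not already execute.
const localToolCalls = (events: ReadonlyArray<StreamEventSketch>): ToolCallEvent[] =>
  events.filter((e): e is ToolCallEvent => e.type === "tool-call" && e.providerExecuted !== true)

// Hosted call/result pairs to keep in history when the protocol needs them replayed.
const hostedToolEvents = (events: ReadonlyArray<StreamEventSketch>): Array<ToolCallEvent | ToolResultEvent> =>
  events.filter(
    (e): e is ToolCallEvent | ToolResultEvent =>
      (e.type === "tool-call" || e.type === "tool-result") && e.providerExecuted === true,
  )

const sample: StreamEventSketch[] = [
  { type: "text-delta", text: "Let me check." },
  { type: "tool-call", id: "c1", name: "get_weather", input: { city: "Oslo" } },
  { type: "tool-call", id: "ws1", name: "web_search", input: { query: "news" }, providerExecuted: true },
  { type: "tool-result", id: "ws1", result: { hits: 3 }, providerExecuted: true },
]
console.log(localToolCalls(sample).map((c) => c.name))
```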

Add provider-defined tools to `request.tools`; they need no runtime entry. The matching route must know how to lower the tool definition into the provider-native shape. Currently Anthropic accepts `web_search` / `code_execution` / `web_fetch`, and OpenAI Responses accepts the hosted tool names listed above.

## Protocol File Style

Protocol files should look self-similar. Provider quirks belong behind named helpers so a new route can be reviewed by comparing the same sections across files.

### Section order

Use this order for every protocol module:

1. Public model input
2. Request body schema
3. Streaming event schema
4. Parser state
5. Request body construction (`fromRequest`)
6. Stream parsing (`step` and per-event handlers)
7. Protocol and route
8. Protocol route export

### Rules

- Keep protocol files focused on the protocol. Move provider-specific projection, signing, media normalization, or other bulky transformations into `src/protocols/utils/*`.
- Use `Effect.fn("Provider.fromRequest")` for request body construction entrypoints. Use `Effect.fn(...)` for event handlers that yield effects; keep purely synchronous handlers as plain functions returning a `StepResult` that the dispatcher lifts via `Effect.succeed(...)`.
- Parser state owns terminal information. The state machine records the finish reason, usage, and pending tool calls. If a provider splits reason and usage across events, merge them in parser state before flushing.
- Emit exactly one terminal event per completed response: a `finish` (normally after a matching `step-finish`) or a `provider-error`. Use `stream.terminal` to stop reading when the provider has a completion sentinel; use `stream.onHalt` when the final event must be flushed after the framed stream ends.
- Use shared helpers for repeated protocol policy such as text joining, usage totals, JSON parsing, and tool-call accumulation. `ToolStream` (`protocols/utils/tool-stream.ts`) accumulates streamed tool-call arguments uniformly.
- Make intentional provider differences explicit in helper names or comments. If two protocol files differ visually, the reason should be obvious from the names.
- Prefer dispatched per-event handlers (`onMessageStart`, `onContentBlockDelta`, ...) called from a small top-level `step` switch over a long if-chain. The dispatcher keeps the event surface visible at a glance.
- Keep tests in the same conceptual order as the protocol: basic prepare, tools prepare, unsupported lowering, text/usage parsing, tool streaming, finish reasons, provider errors.
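
The parser-state rule can be sketched for a provider that sends the stop reason and usage in separate events. `ProviderEvent`, `ParserState`, `step`, and `onHalt` are hypothetical and simplified. They show recording in state plus a single flush, not the package's `Protocol` types.

```ts
// Hypothetical parser-state sketch; event names follow this guide, shapes are simplified.
type Usage = { inputTokens: number; outputTokens: number }
type OutEvent = { type: "text-delta"; text: string } | { type: "step-finish" } | { type: "finish"; reason: string; usage?: Usage }

// Assumed provider events: the stop reason and usage arrive separately.
type ProviderEvent = { kind: "text"; text: string } | { kind: "stop"; reason: string } | { kind: "usage"; usage: Usage }

interface ParserState {
  reason?: string
  usage?: Usage
  finished: boolean
}

// step: consume one provider event, update state, emit zero or more common events.
function step(state: ParserState, event: ProviderEvent): OutEvent[] {
  switch (event.kind) {
    case "text":
      return [{ type: "text-delta", text: event.text }]
    case "stop":
      state.reason = event.reason
      break
    case "usage":
      state.usage = event.usage
      break
  }
  // Reason and usage are recorded, not emitted, until the stream halts.
  return []
}

// onHalt: flush exactly one terminal finish after the framed stream ends.
function onHalt(state: ParserState): OutEvent[] {
  if (state.finished) return []
  state.finished = true
  return [{ type: "step-finish" }, { type: "finish", reason: state.reason ?? "unknown", usage: state.usage }]
}

const state: ParserState = { finished: false }
const providerEvents: ProviderEvent[] = [
  { kind: "text", text: "Hi" },
  { kind: "stop", reason: "stop" },
  { kind: "usage", usage: { inputTokens: 5, outputTokens: 1 } },
]
const out: OutEvent[] = []
for (const e of providerEvents) out.push(...step(state, e))
out.push(...onHalt(state), ...onHalt(state)) // the second halt emits nothing
console.log(out)
```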

### Review checklist

- Can the file be skimmed side-by-side with `openai-chat.ts` without hunting for equivalent sections?
- Are provider quirks named, isolated, and covered by focused tests?
- Does request body construction validate unsupported common content at the protocol boundary?
- Does stream parsing emit stable common events without leaking provider event order to callers?
- Does `toolChoice: "none"` behavior read as intentional?

## Recording Tests

Recorded tests use one cassette file per scenario. A cassette holds an ordered array of `{ request, response }` interactions, so multi-step flows (tool loops, retries, polling) record into a single file. Use `recordedTests({ prefix, requires })` and let the helper derive cassette names from test names:

```ts
const recorded = recordedTests({ prefix: "openai-chat", requires: ["OPENAI_API_KEY"] })

recorded.effect("streams text", () =>
  Effect.gen(function* () {
    // test body
  }),
)
```

Replay is the default. `RECORD=true` records fresh cassettes and requires the listed env vars. Cassettes are written as pretty-printed JSON so multi-interaction diffs stay reviewable.

Pass `provider`, `protocol`, and optional `tags` to `recordedTests(...)` / `recorded.effect.with(...)` so cassettes carry searchable metadata. Use recorded-test filters to replay or record a narrow subset without rewriting a whole file:

- `RECORDED_PROVIDER=openai` matches tests tagged with `provider:openai`; comma-separated values are allowed.
- `RECORDED_PREFIX=openai-chat` matches cassette groups by `recordedTests({ prefix })`; comma-separated values are allowed.
- `RECORDED_TAGS=tool` requires all listed tags to be present, e.g. `RECORDED_TAGS=provider:togetherai,tool`.
- `RECORDED_TEST="streams text"` matches by test name, kebab-case test id, or cassette path.

Filters apply in replay and record mode. Combine them with `RECORD=true` when refreshing only one provider or scenario.
316
+
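A minimal sketch of how the four filters could combine. It assumes exact string matches (the text does not say whether matching is exact or by substring) and a simple kebab-case rule; `RecordedCase`, `matchesFilters`, `splitList`, and `kebab` are hypothetical names, not the helper's API.

```typescript
type RecordedCase = {
  readonly name: string
  readonly prefix: string
  readonly tags: ReadonlyArray<string>
  readonly cassettePath: string
}
type FilterEnv = Readonly<Record<string, string | undefined>>

// "a,b , c" -> ["a", "b", "c"]; unset or empty -> [] (no filter).
const splitList = (value: string | undefined): string[] =>
  (value ?? "").split(",").map((s) => s.trim()).filter((s) => s.length > 0)

// "streams text" -> "streams-text" (assumed kebab-case rule).
const kebab = (name: string): string =>
  name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "")

function matchesFilters(recorded: RecordedCase, env: FilterEnv): boolean {
  // Any listed provider may match.
  const providers = splitList(env["RECORDED_PROVIDER"])
  if (providers.length > 0 && !providers.some((p) => recorded.tags.includes(`provider:${p}`))) return false

  // Any listed prefix may match.
  const prefixes = splitList(env["RECORDED_PREFIX"])
  if (prefixes.length > 0 && !prefixes.includes(recorded.prefix)) return false

  // Every listed tag must be present.
  const tags = splitList(env["RECORDED_TAGS"])
  if (!tags.every((tag) => recorded.tags.includes(tag))) return false

  // Test name, kebab-case id, or cassette path.
  const wanted = (env["RECORDED_TEST"] ?? "").trim()
  if (wanted !== "" && ![recorded.name, kebab(recorded.name), recorded.cassettePath].includes(wanted)) return false

  return true
}
```

Each filter narrows independently, so setting several at once intersects them.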
317
+ **Binary response bodies.** Most providers stream text (SSE, JSON). AWS Bedrock streams binary AWS event-stream frames whose CRC32 fields would be mangled by a UTF-8 round-trip — those bodies are stored as base64 with `bodyEncoding: "base64"` on the response snapshot. Detection is by `Content-Type` in `@codilore/http-recorder` (currently `application/vnd.amazon.eventstream` and `application/octet-stream`); cassettes for SSE/JSON routes omit the field and decode as text.
318
+
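Content-Type detection for binary bodies might look like the sketch below. It assumes media-type parameters (such as `; charset=utf-8`) are ignored and comparison is case-insensitive; `bodyEncodingFor` and `encodingField` are hypothetical names, not the recorder's API. Base64 is needed because bytes such as `0xff` are not valid UTF-8, so a lenient text decode replaces them with U+FFFD and the frame bytes no longer match their CRC32 checksums.

```typescript
// Media types whose recorded bodies are stored as base64 (the two listed above).
const BINARY_MEDIA_TYPES: ReadonlyArray<string> = ["application/vnd.amazon.eventstream", "application/octet-stream"]

// Returns "base64" for binary bodies and undefined for SSE/JSON text bodies.
function bodyEncodingFor(contentType: string | null | undefined): "base64" | undefined {
  if (!contentType) return undefined
  // Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase()
  return BINARY_MEDIA_TYPES.includes(mediaType) ? "base64" : undefined
}

// Text snapshots omit the field entirely instead of writing bodyEncoding: undefined.
function encodingField(contentType: string | null | undefined): { bodyEncoding?: "base64" } {
  const encoding = bodyEncodingFor(contentType)
  return encoding === undefined ? {} : { bodyEncoding: encoding }
}
```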
319
+ **Matching strategy.** Replay walks the cassette in record order via an internal cursor: the Nth runtime request is served by the Nth recorded interaction, and each one is validated by comparing method, URL, allow-listed headers, and the canonical JSON body. This handles tool loops (each round's request differs as history grows) and retry/polling scenarios (successive byte-identical requests with different responses) uniformly. If a test reorders its requests, re-record the cassette. `scriptedResponses` (in `test/lib/http.ts`) is the deterministic counterpart for tests that don't need a live provider; it scripts response bodies in order without reading from disk.
320
+
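The cursor-based matching described above can be sketched as follows. This is an illustration, not the package's implementation: the `Interaction` shape, the assumption that header names are already lower-cased, and the error messages are hypothetical.

```typescript
type RecordedRequest = {
  readonly method: string
  readonly url: string
  readonly headers: Readonly<Record<string, string>> // assumed lower-cased names
  readonly body: unknown // parsed JSON body
}
type RecordedResponse = { readonly status: number; readonly body: string }
type Interaction = { readonly request: RecordedRequest; readonly response: RecordedResponse }

// Canonical JSON: object keys sorted recursively, so key order never causes a mismatch.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, v]) => `${JSON.stringify(key)}:${canonicalJson(v)}`)
    return `{${entries.join(",")}}`
  }
  return JSON.stringify(value)
}

// The Nth runtime request is served by the Nth recorded interaction.
function makeReplayer(cassette: ReadonlyArray<Interaction>, allowedHeaders: ReadonlyArray<string>) {
  let cursor = 0
  return (request: RecordedRequest): RecordedResponse => {
    const recorded = cassette[cursor]
    if (recorded === undefined) throw new Error(`cassette has no interaction #${cursor + 1}`)
    const expected = recorded.request
    const matches =
      expected.method === request.method &&
      expected.url === request.url &&
      allowedHeaders.every((name) => expected.headers[name] === request.headers[name]) &&
      canonicalJson(expected.body) === canonicalJson(request.body)
    if (!matches) throw new Error(`request #${cursor + 1} differs from the cassette; re-record it`)
    cursor += 1
    return recorded.response
  }
}
```

Byte-identical polling requests stay unambiguous because position, not content, selects the interaction; content is only used to validate.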
321
+ Do not blanket re-record an entire test file when adding one cassette. `RECORD=true` rewrites every recorded case that runs, and provider streams contain volatile IDs, timestamps, fingerprints, and obfuscation fields. Prefer deleting the one cassette you intend to refresh, or run a focused test pattern that only registers the scenario you want to record. Keep stable existing cassettes unchanged unless their request shape or expected behavior changed.
package/README.md ADDED
@@ -0,0 +1,131 @@
1
+ # @codilore/llm
2
+
3
+ Schema-first LLM core for Codilore. One typed request, response, event, and tool language; provider quirks live in adapters, not in calling code.
4
+
5
+ ```ts
6
+ import { Effect } from "effect"
7
+ import { LLM, LLMClient } from "@codilore/llm"
8
+ import { OpenAI } from "@codilore/llm/providers"
9
+
10
+ const model = OpenAI.configure({ apiKey: process.env.OPENAI_API_KEY }).responses("gpt-4o-mini")
11
+
12
+ const request = LLM.request({
13
+ model,
14
+ system: "You are concise.",
15
+ prompt: "Say hello in one short sentence.",
16
+ generation: { maxTokens: 40 },
17
+ })
18
+
19
+ const program = Effect.gen(function* () {
20
+ const response = yield* LLMClient.generate(request)
21
+ console.log(response.text)
22
+ })
23
+ ```
24
+
25
+ Call `LLMClient.stream(request)` instead of `LLMClient.generate` to receive incremental `LLMEvent`s. The event stream is provider-neutral: it has the same shape across OpenAI Chat, OpenAI Responses, Anthropic Messages, Gemini, Bedrock Converse, and any OpenAI-compatible deployment.
26
+
27
+ ## Public API
28
+
29
+ - **`LLM.request({...})`** — build a provider-neutral `LLMRequest`. Accepts ergonomic inputs (`system: string`, `prompt: string`) that normalize into the canonical Schema classes.
30
+ - **`LLM.generate` / `LLM.stream`** — re-exported from `LLMClient` for one-import use.
31
+ - **`Message.user(...)` / `Message.assistant(...)` / `Message.tool(...)`** — message constructors from the canonical schema model.
32
+ - **`Model.make(...)` / `ToolCallPart.make(...)` / `ToolResultPart.make(...)` / `ToolDefinition.make(...)`** — model and tool-related constructors from the canonical schema model.
33
+ - **`LLMClient.prepare(request)`** — compile a request through protocol body construction, validation, and HTTP preparation without sending. Useful for inspection and testing.
34
+ - **`LLMEvent.is.*`** — typed guards (`is.textDelta`, `is.toolCall`, `is.finish`, …) for filtering streams.
35
+
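To illustrate how typed guards in the style of `LLMEvent.is.textDelta` narrow a provider-neutral event union, here is a self-contained sketch over a hypothetical, simplified event type. `DemoEvent`, its field names, and the local `is` object are assumptions, not the package's schema, and the sketch consumes an array rather than an Effect `Stream`.

```typescript
// Hypothetical, simplified event union; the real LLMEvent schema has more variants.
type DemoEvent =
  | { readonly type: "text-delta"; readonly text: string }
  | { readonly type: "tool-call"; readonly name: string; readonly input: unknown }
  | { readonly type: "finish"; readonly reason: string }

// Each guard narrows DemoEvent to a single variant for the caller.
const is = {
  textDelta: (e: DemoEvent): e is Extract<DemoEvent, { type: "text-delta" }> => e.type === "text-delta",
  toolCall: (e: DemoEvent): e is Extract<DemoEvent, { type: "tool-call" }> => e.type === "tool-call",
  finish: (e: DemoEvent): e is Extract<DemoEvent, { type: "finish" }> => e.type === "finish",
}

// One consumer works for every provider because the event shapes are shared.
function collectText(events: ReadonlyArray<DemoEvent>): string {
  let text = ""
  for (const event of events) {
    if (is.textDelta(event)) text += event.text
  }
  return text
}
```

Passing a guard to `Array.prototype.filter`, as in `events.filter(is.toolCall)`, also yields an array typed as the narrowed variant.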
36
+ ## Caching
37
+
38
+ Prompt caching is **on by default**. Every `LLMRequest` resolves to `cache: "auto"` unless the caller opts out with `cache: "none"`. Each protocol translates `CacheHint`s to its wire format: `cache_control` on Anthropic, `cachePoint` on Bedrock. OpenAI and Gemini cache implicitly on the server and take no inline markers, so `"auto"` is a no-op there.
39
+
40
+ ### Auto placement
41
+
42
+ `"auto"` places three breakpoints — last tool definition, last system part, latest user message. The last-user-message boundary is the load-bearing detail: in a tool-use loop, a single user turn expands into many assistant/tool round-trips, all sharing that prefix. Caching at that boundary lets every intra-turn API call hit.
43
+
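A sketch of the three-breakpoint placement over a hypothetical, simplified request shape. `CacheableRequest`, `applyAutoCache`, and the in-place mutation are assumptions for illustration. The sketch also assumes that manual hints count toward the 4-breakpoint cap listed in the provider behavior table and that an existing hint on a target part is left untouched.

```typescript
// Hypothetical, simplified request shape: only what auto placement needs.
type CacheHint = { readonly type: "ephemeral" }
type Markable = { cache?: CacheHint }
type ToolDef = Markable & { readonly name: string }
type TextPart = Markable & { readonly text: string }
type ChatMessage = { readonly role: "user" | "assistant" | "tool"; readonly parts: TextPart[] }
type CacheableRequest = { tools: ToolDef[]; system: TextPart[]; messages: ChatMessage[] }

const MAX_BREAKPOINTS = 4

function countHints(request: CacheableRequest): number {
  const all: Markable[] = [...request.tools, ...request.system]
  for (const message of request.messages) all.push(...message.parts)
  return all.filter((part) => part.cache !== undefined).length
}

// Marks the last tool, the last system part, and the last part of the latest user
// message, in place. Existing (manual) hints are kept and count toward the cap.
function applyAutoCache(request: CacheableRequest): void {
  let used = countHints(request)
  const latestUser = [...request.messages].reverse().find((m) => m.role === "user")
  const targets: Array<Markable | undefined> = [
    request.tools[request.tools.length - 1],
    request.system[request.system.length - 1],
    latestUser?.parts[latestUser.parts.length - 1],
  ]
  for (const target of targets) {
    if (used >= MAX_BREAKPOINTS) break
    if (target === undefined || target.cache !== undefined) continue // manual hint wins
    target.cache = { type: "ephemeral" }
    used += 1
  }
}
```

In a tool loop, every message after the latest user turn is an assistant or tool message, so the marked user part stays fixed for every round-trip in that turn.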
44
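+ The math justifies the default: Anthropic's 5-minute cache write costs 1.25× the base input price and a cache read costs 0.1×, so a single reuse within 5 minutes already wins (1.25 + 0.1 = 1.35 versus 2.0 for two uncached sends). Prompts below the per-model minimum cacheable length are sent without caching and without error, so short one-shot calls pay nothing extra; a long one-shot prompt pays the 25% write premium once.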
45
+
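The break-even arithmetic can be checked in a few lines. Costs are in units of one uncached send of the cached prefix, using the multipliers quoted above; output tokens and the uncached suffix of each request are ignored, and the function names are illustrative.

```typescript
const WRITE_5M = 1.25 // first send writes the cache
const READ = 0.1 // each later send within 5 minutes reads it

// Cost of sending the same prefix `sends` times inside one 5-minute window.
const uncachedCost = (sends: number): number => sends
const cachedCost = (sends: number): number => (sends === 0 ? 0 : WRITE_5M + READ * (sends - 1))

// Smallest number of sends for which caching is strictly cheaper.
function breakEvenSends(): number {
  let n = 1
  while (cachedCost(n) >= uncachedCost(n)) n += 1
  return n
}
```

With one send, the write premium makes caching 25% more expensive (1.25 vs 1). From the second send on it is cheaper (1.35 vs 2), and ten sends cost 2.15 units instead of 10.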
46
+ ### Opting out
47
+
48
+ ```ts
49
+ LLM.request({
50
+ model,
51
+ system,
52
+ prompt: "one-off question",
53
+ cache: "none",
54
+ })
55
+ ```
56
+
57
+ ### Granular policy
58
+
59
+ ```ts
60
+ cache: {
61
+ tools?: boolean,
62
+ system?: boolean,
63
+ messages?: "latest-user-message" | "latest-assistant" | { tail: number },
64
+ ttlSeconds?: number, // ≥ 3600 → 1h on Anthropic/Bedrock; else 5m
65
+ }
66
+ ```
67
+
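A sketch of how `ttlSeconds` could map onto Anthropic's `cache_control`, whose `ttl` field accepts `"5m"` or `"1h"`. Omitting `ttl` when `ttlSeconds` is unset is an assumption about this package, as is the function name; the Bedrock mapping is not shown.

```typescript
// Anthropic's cache_control marker, limited to the fields used here.
type AnthropicCacheControl = { readonly type: "ephemeral"; readonly ttl?: "5m" | "1h" }

// Threshold from the policy comment above: 3600 s or more selects the 1-hour tier.
function anthropicCacheControl(ttlSeconds?: number): AnthropicCacheControl {
  if (ttlSeconds === undefined) return { type: "ephemeral" } // provider default (5 minutes)
  return { type: "ephemeral", ttl: ttlSeconds >= 3600 ? "1h" : "5m" }
}
```

Because only two tiers exist, any value below 3600 (for example 1800) collapses to the 5-minute tier rather than being rounded up.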
68
+ ### Manual hints
69
+
70
+ Inline `CacheHint` on any text / system / tool / tool-result part overrides automatic placement. The auto policy preserves manual hints; it only fills gaps.
71
+
72
+ ```ts
73
+ LLM.request({
74
+ model,
75
+ system: [
76
+ { type: "text", text: "stable system prompt", cache: { type: "ephemeral" } },
77
+ ],
78
+ // ...
79
+ })
80
+ ```
81
+
82
+ ### Provider behavior table
83
+
84
+ | Protocol | `cache: "auto"` |
85
+ | ----------------------- | ------------------------------------------------------------------------- |
86
+ | Anthropic Messages | emits up to 3 `cache_control` markers (4-breakpoint cap enforced) |
87
+ | Bedrock Converse | emits up to 3 `cachePoint` blocks (4-breakpoint cap enforced) |
88
+ | OpenAI Chat / Responses | no-op (implicit caching above 1024 tokens) |
89
+ | Gemini | no-op (implicit caching on 2.5+; explicit `CachedContent` is out-of-band) |
90
+
91
+ Normalized cache usage is read back into `response.usage.cacheReadInputTokens` and `cacheWriteInputTokens` across every provider.
92
+
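The normalization into `cacheReadInputTokens` and `cacheWriteInputTokens` can be sketched from each API's documented usage fields. The `WireUsage` union and `normalizeCacheUsage` are hypothetical, and reporting `undefined` for providers that expose no cache-write count is an assumption.

```typescript
type CacheUsage = {
  readonly cacheReadInputTokens: number | undefined
  readonly cacheWriteInputTokens: number | undefined
}

// Wire-level usage payloads (cache-related fields only) for each protocol family.
type WireUsage =
  | { readonly protocol: "anthropic-messages"; readonly usage: { cache_read_input_tokens?: number; cache_creation_input_tokens?: number } }
  | { readonly protocol: "bedrock-converse"; readonly usage: { cacheReadInputTokens?: number; cacheWriteInputTokens?: number } }
  | { readonly protocol: "openai-chat"; readonly usage: { prompt_tokens_details?: { cached_tokens?: number } } }
  | { readonly protocol: "openai-responses"; readonly usage: { input_tokens_details?: { cached_tokens?: number } } }
  | { readonly protocol: "gemini"; readonly usageMetadata: { cachedContentTokenCount?: number } }

function normalizeCacheUsage(wire: WireUsage): CacheUsage {
  switch (wire.protocol) {
    case "anthropic-messages":
      return {
        cacheReadInputTokens: wire.usage.cache_read_input_tokens,
        cacheWriteInputTokens: wire.usage.cache_creation_input_tokens,
      }
    case "bedrock-converse":
      return { cacheReadInputTokens: wire.usage.cacheReadInputTokens, cacheWriteInputTokens: wire.usage.cacheWriteInputTokens }
    // Implicit caching reports reads only; there is no separate write count.
    case "openai-chat":
      return { cacheReadInputTokens: wire.usage.prompt_tokens_details?.cached_tokens, cacheWriteInputTokens: undefined }
    case "openai-responses":
      return { cacheReadInputTokens: wire.usage.input_tokens_details?.cached_tokens, cacheWriteInputTokens: undefined }
    case "gemini":
      return { cacheReadInputTokens: wire.usageMetadata.cachedContentTokenCount, cacheWriteInputTokens: undefined }
  }
}
```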
93
+ ## Providers
94
+
95
+ Provider facades configure endpoint/auth/deployment details first, then expose model selectors that take only a model or deployment id. The selected model carries the executable route value used at runtime.
96
+
97
+ ```ts
98
+ import { OpenAI, CloudflareAIGateway } from "@codilore/llm/providers"
99
+
100
+ const openai = OpenAI.configure({ apiKey: process.env.OPENAI_API_KEY }).responses("gpt-4o-mini")
101
+ const gateway = CloudflareAIGateway.configure({
102
+ accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
103
+ gatewayApiKey: process.env.CLOUDFLARE_API_TOKEN,
104
+ }).model("workers-ai/@cf/meta/llama-3.1-8b-instruct")
105
+ ```
106
+
107
+ Included providers: OpenAI, Anthropic, Google (Gemini), Amazon Bedrock, Azure OpenAI, Cloudflare AI Gateway, Cloudflare Workers AI, GitHub Copilot, OpenRouter, xAI, plus generic OpenAI-compatible helpers for DeepSeek, Cerebras, Groq, Fireworks, Together, etc.
108
+
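The two-stage facade pattern (configure once, then select by id) can be sketched with closures. `configureDemoProvider`, `DemoModel`, the default base URL, and the `/chat/completions` path are all hypothetical; real facades such as `OpenAI.configure(...)` build richer route values.

```typescript
type DemoRoute = { readonly url: string; readonly headers: Readonly<Record<string, string>> }
type DemoModel = { readonly id: string; readonly route: DemoRoute }
type DemoProviderConfig = { readonly apiKey: string; readonly baseURL?: string }

// Stage 1 captures endpoint and auth once; stage 2 only needs a model id.
function configureDemoProvider(config: DemoProviderConfig) {
  const baseURL = config.baseURL ?? "https://api.example.test/v1" // hypothetical default
  const headers = { authorization: `Bearer ${config.apiKey}` }
  return {
    model: (id: string): DemoModel => ({
      id,
      // The selected model carries the route it will be executed with.
      route: { url: `${baseURL}/chat/completions`, headers },
    }),
  }
}
```

Every model selected from one configured facade shares the same endpoint and auth, so call sites only ever pass an id.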
109
+ ## Provider options & HTTP overlays
110
+
111
+ Three escape hatches in order of stability:
112
+
113
+ 1. **`generation`** — portable knobs (`maxTokens`, `temperature`, `topP`, `topK`, penalties, seed, stop).
114
+ 2. **`providerOptions: { <provider>: {...} }`** — typed-at-the-facade provider-specific knobs (OpenAI `promptCacheKey`, Anthropic `thinking`, Gemini `thinkingConfig`, OpenRouter routing).
115
+ 3. **`http: { body, headers, query }`** — last-resort serializable overlays merged into the final HTTP request. Reach for this only when a stable typed path doesn't yet exist.
116
+
117
+ For each of these three axes, request-level values override route/provider defaults.
118
+
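Per-axis precedence can be sketched as a key-by-key override in which request values win and keys the request leaves `undefined` fall back to the defaults. Whether the package merges shallowly like this or deeply (for example inside `providerOptions`) is not stated here, so treat `override`, `resolveHttp`, and the axis shapes as hypothetical.

```typescript
// Hypothetical axis shapes (subsets of the knobs listed above).
type Generation = { readonly maxTokens?: number; readonly temperature?: number; readonly topP?: number }
type HttpOverlay = {
  readonly body?: Readonly<Record<string, unknown>>
  readonly headers?: Readonly<Record<string, string>>
  readonly query?: Readonly<Record<string, string>>
}

// Defaults first, then request values; undefined values never overwrite anything.
function override<T extends object>(defaults: T | undefined, request: T | undefined): T {
  const out: { [key: string]: unknown } = {}
  for (const source of [defaults, request]) {
    for (const [key, value] of Object.entries(source ?? {})) {
      if (value !== undefined) out[key] = value
    }
  }
  return out as T
}

// The same rule applied to each part of the HTTP overlay.
function resolveHttp(defaults: HttpOverlay, request: HttpOverlay): HttpOverlay {
  return {
    body: override(defaults.body, request.body),
    headers: override(defaults.headers, request.headers),
    query: override(defaults.query, request.query),
  }
}
```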
119
+ ## Routes
120
+
121
+ Adding a new model or deployment is usually 5-15 lines using `Route.make({ protocol, endpoint, auth, framing, ... })`. The route owns endpoint/auth/framing and the protocol owns body construction plus stream parsing. Transports are reusable IO templates that receive route endpoint/auth at compile time. Capability/catalog metadata lives outside this low-level package; unsupported request shapes fail during protocol lowering. See `AGENTS.md` for the architectural detail.
122
+
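The split between a route (endpoint, auth, framing) and a protocol (body construction, plus stream parsing in the real package) can be sketched with plain values. `SketchRoute`, `SketchProtocol`, `toyChat`, and `compile` are simplified stand-ins, not the signatures of `Route.make`, and stream parsing is omitted.

```typescript
// Hypothetical common request, trimmed to two content kinds.
type ContentPart =
  | { readonly type: "text"; readonly text: string }
  | { readonly type: "image"; readonly mediaType: string; readonly data: string }
type CommonRequest = {
  readonly model: string
  readonly messages: ReadonlyArray<{ readonly role: "user" | "assistant"; readonly content: ReadonlyArray<ContentPart> }>
}

// Protocol: owns body construction.
type SketchProtocol<Body> = { readonly name: string; readonly toBody: (request: CommonRequest) => Body }

// Route: owns endpoint, auth, and framing, and reuses a protocol unchanged.
type SketchRoute<Body> = {
  readonly protocol: SketchProtocol<Body>
  readonly endpoint: string
  readonly auth: (apiKey: string) => Record<string, string>
  readonly framing: "sse" | "aws-event-stream"
}

class UnsupportedContentError extends Error {}

type ToyBody = { model: string; messages: Array<{ role: string; content: string }> }

// A text-only toy protocol: unsupported shapes fail during lowering, before any IO.
const toyChat: SketchProtocol<ToyBody> = {
  name: "toy-chat",
  toBody: (request) => ({
    model: request.model,
    messages: request.messages.map((message) => ({
      role: message.role,
      content: message.content
        .map((part) => {
          if (part.type !== "text") throw new UnsupportedContentError(`toy-chat cannot send ${part.type} parts`)
          return part.text
        })
        .join(""),
    })),
  }),
}

// A new deployment is mostly a small route value around an existing protocol.
const toyRoute: SketchRoute<ToyBody> = {
  protocol: toyChat,
  endpoint: "https://api.example.test/v1/chat",
  auth: (apiKey) => ({ authorization: `Bearer ${apiKey}` }),
  framing: "sse",
}

function compile<Body>(route: SketchRoute<Body>, request: CommonRequest, apiKey: string) {
  return { url: route.endpoint, headers: route.auth(apiKey), framing: route.framing, body: route.protocol.toBody(request) }
}
```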
123
+ ## Effect
124
+
125
+ This package is built on Effect. Public methods return `Effect` or `Stream`; provide `LLMClient.layer` for runtime dispatch and import the provider/protocol modules for the routes you use. The example at `example/tutorial.ts` is a runnable walkthrough.
126
+
127
+ ## See also
128
+
129
+ - `AGENTS.md` — architecture, route construction, contributor guide
130
+ - `example/tutorial.ts` — runnable end-to-end walkthrough
131
+ - `test/provider/*.test.ts` — fixture-first protocol tests; `*.recorded.test.ts` files cover live cassettes