ollama-agent-router 0.1.6 → 0.1.7

package/README.md CHANGED
@@ -1,9 +1,18 @@
  # ollama-agent-router
 
- `ollama-agent-router` is a local HTTP and CLI gateway for Ollama. It exposes an OpenAI-compatible chat completion endpoint and routes each request to the best configured local model based on task type, queue depth, loaded model state, GPU/VRAM headroom, priority, and sync/async policy.
+ Ollama Agent Router is a local LLM router for Ollama. It provides an OpenAI-compatible API gateway that routes agent and chat requests to the best local Ollama model based on task type, GPU/VRAM headroom, queue depth, loaded model state, and sync/async policy.
 
  It is designed for machines that run several Ollama models with different strengths, for example a small triage model, one or more code models, and a larger exclusive reasoning model.
 
+ ## Why use Ollama Agent Router?
+
+ Use `ollama-agent-router` when you need:
+
+ - an Ollama router for multiple local models
+ - an Ollama agent router for coding agents and autonomous workflows
+ - an OpenAI-compatible local LLM router
+ - GPU-aware routing, queues, async jobs, and model selection
+
  ## Architecture
 
  Request flow:
@@ -97,6 +106,7 @@ Lookup order:
  Top-level sections:
 
  - `server`: host, port, base path, HTTPS certificates, and JSON body limit.
+ - `access`: optional access control for standalone, runtime-agent, and admin planes.
  - `ollama`: base URL, OpenAI-compatible path, native API path, keep-alive, timeout.
  - `gpu`: provider, VRAM limits, GPU-only default, NVIDIA monitor command.
  - `router`: default mode, heavy-load thresholds, classifier config.
@@ -146,6 +156,143 @@ server:
      caPath:
  ```
 
+ ## Access Planes
+
+ The router can expose three separate access planes:
+
+ - **Standalone plane**: the full local OpenAI-compatible router API, including `POST /v1/chat/completions` and job endpoints.
+ - **Runtime agent plane**: machine-local endpoints used by Kong or another gateway, including `/v1/router/*` and selected-model execution.
+ - **Admin plane**: access-management endpoints under `/v1/admin/access/*`.
+
+ Access control is backward-compatible. If `access` is not configured, the standalone and runtime-agent planes stay enabled without API key requirements, matching earlier releases. The admin plane is disabled by default.
+
+ API keys are sent with:
+
+ ```text
+ Authorization: Bearer <api-key>
+ ```
+
+ `x-api-key` is also accepted for clients that cannot set bearer tokens.
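
The two header forms described above could be read by a gateway roughly as follows. This is a minimal sketch, not the router's actual middleware: the helper name `extractApiKey` is hypothetical, and giving `Authorization` precedence over `x-api-key` is an assumption the README does not state.

```javascript
// Hypothetical helper: returns the presented API key, or null if none was sent.
// `headers` is a plain object with lower-cased names, as Node's http module provides.
function extractApiKey(headers) {
  const auth = headers['authorization'];
  if (typeof auth === 'string') {
    // Accept "Bearer <api-key>" with a case-insensitive scheme name.
    const match = /^Bearer\s+(.+)$/i.exec(auth.trim());
    if (match) return match[1].trim();
  }
  // Fallback for clients that cannot set bearer tokens.
  const xApiKey = headers['x-api-key'];
  if (typeof xApiKey === 'string' && xApiKey.trim() !== '') return xApiKey.trim();
  return null;
}
```

A caller would pass `req.headers` from a Node HTTP request and treat `null` as an anonymous request.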
+
+ Create SHA-256 key hashes before putting keys in config:
+
+ ```bash
+ node -e "const crypto=require('crypto'); console.log('sha256:'+crypto.createHash('sha256').update(process.argv[1]).digest('hex'))" 'secret-value'
+ ```
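
The same `sha256:<hex>` format can be produced and checked from a script. The sketch below reuses the hashing call from the one-liner above; the function names are hypothetical, and the constant-time comparison is an illustrative choice, not a statement about how the router compares hashes internally.

```javascript
const crypto = require('crypto');

// Produces the same "sha256:<hex>" string as the one-liner above.
function hashApiKey(secret) {
  return 'sha256:' + crypto.createHash('sha256').update(secret).digest('hex');
}

// Hypothetical check: hash the presented key and compare it to a stored hash.
// timingSafeEqual requires equal-length buffers, so lengths are checked first.
function matchesKeyHash(presentedKey, storedHash) {
  const a = Buffer.from(hashApiKey(presentedKey));
  const b = Buffer.from(storedHash);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Only the hash belongs in the config file; the plain secret stays with the client.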
+
+ Example access configuration:
+
+ ```yaml
+ access:
+   managedConfigPath: ./ollama-agent-router.access.yaml
+   bootstrapIfMissing: true
+
+   admin:
+     enabled: true
+     allowedIps: [127.0.0.1, "::1", 10.0.0.0/8]
+     trustedProxy: false
+     apiKeyHashes:
+       - sha256:replace-with-admin-key-hash
+     clientCert:
+       required: false
+       allowedFingerprints: []
+       allowedSubjects: []
+     auditLog: true
+
+   managed:
+     version: 1
+     planes:
+       standalone:
+         enabled: true
+         auth:
+           requireApiKey: true
+           anonymous: reject
+         defaultLimit:
+           requests: 60
+           windowSeconds: 60
+       runtimeAgent:
+         enabled: true
+         auth:
+           requireApiKey: true
+           anonymous: reject
+         defaultLimit:
+           requests: 600
+           windowSeconds: 60
+     apiKeys:
+       - id: local-client
+         name: Local standalone client
+         keyHash: sha256:replace-with-client-key-hash
+         enabled: true
+         scopes: [standalone]
+         limits:
+           standalone:
+             requests: 120
+             windowSeconds: 60
+       - id: kong-runtime
+         name: Kong runtime caller
+         keyHash: sha256:replace-with-kong-key-hash
+         enabled: true
+         scopes: [runtimeAgent]
+         limits:
+           runtimeAgent:
+             requests: 2000
+             windowSeconds: 60
+ ```
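
The `requests` / `windowSeconds` pairs in the configuration above describe a request budget per time window. The README does not say how the router counts requests, so the sketch below assumes a simple fixed-window counter and assumes that a per-key `limits` entry takes precedence over the plane's `defaultLimit`. Both the class and the function names are hypothetical.

```javascript
// Assumed precedence: a per-key limit for the plane wins over the plane default.
function effectiveLimit(managed, apiKey, plane) {
  const perKey = apiKey && apiKey.limits && apiKey.limits[plane];
  return perKey || managed.planes[plane].defaultLimit;
}

// Fixed-window counter keyed by an arbitrary string such as `${plane}:${keyId}`.
class FixedWindowLimiter {
  constructor() {
    this.windows = new Map();
  }

  // Returns true if the request fits in the current window, false otherwise.
  allow(id, limit, nowMs = Date.now()) {
    const windowMs = limit.windowSeconds * 1000;
    const windowStart = Math.floor(nowMs / windowMs) * windowMs;
    const entry = this.windows.get(id);
    if (!entry || entry.windowStart !== windowStart) {
      // First request in a new window resets the counter.
      this.windows.set(id, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= limit.requests) return false;
    entry.count += 1;
    return true;
  }
}
```

With the example configuration, `local-client` would get 120 requests per 60 seconds on the standalone plane under this reading, while a key without its own `limits` entry would fall back to 60.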
+
+ `access.managed` is the initial access policy. When `access.managedConfigPath` is set, the router loads that file at startup. If the file is missing and `bootstrapIfMissing: true`, it writes the initial policy to that path. Admin API changes are then written atomically to this managed YAML file and survive restarts.
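
The load, bootstrap, and atomic-write behavior described above can be sketched with a temp file plus rename, a common way to get atomic replacement on the same filesystem. This is an illustration only: the function names are hypothetical, the policy is treated as already-serialized YAML text, and returning the initial policy without writing when `bootstrapIfMissing` is false is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Write to a temp file in the same directory, then rename over the target.
// Readers see either the old file or the new one, never a partial write.
function writeFileAtomic(targetPath, contents) {
  const tmpPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmpPath, contents, { mode: 0o600 });
  fs.renameSync(tmpPath, targetPath);
}

// Load the managed policy text, bootstrapping it from the initial policy if allowed.
function loadOrBootstrap(managedConfigPath, initialContents, bootstrapIfMissing) {
  if (fs.existsSync(managedConfigPath)) {
    return fs.readFileSync(managedConfigPath, 'utf8');
  }
  if (bootstrapIfMissing) {
    writeFileAtomic(managedConfigPath, initialContents);
  }
  return initialContents;
}
```

Keeping the temp file next to the target matters because a rename across filesystems is not atomic.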
+
+ The admin plane security settings are boot-only. They are intentionally not managed through the admin API:
+
+ - `access.admin.allowedIps`
+ - `access.admin.trustedProxy`
+ - `access.admin.apiKeyHashes`
+ - `access.admin.clientCert`
+
+ This prevents the admin API from changing the rules that protect itself. When `access.admin.enabled: true`, `access.managedConfigPath` and at least one admin API key hash are required.
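
An `allowedIps` list that mixes single addresses and CIDR ranges, like the one in the example configuration, can be matched with Node's built-in `net.BlockList`. The sketch below is one possible approach, not the router's implementation; the helper names are hypothetical, and `remoteAddress` is assumed to be the socket peer address (the `trustedProxy: false` case).

```javascript
const net = require('net');

// Build a matcher from entries such as '127.0.0.1', '::1', or '10.0.0.0/8'.
function buildAllowList(allowedIps) {
  const list = new net.BlockList();
  for (const entry of allowedIps) {
    const [address, prefix] = entry.split('/');
    const family = net.isIP(address) === 6 ? 'ipv6' : 'ipv4';
    if (prefix === undefined) {
      list.addAddress(address, family);
    } else {
      list.addSubnet(address, Number(prefix), family);
    }
  }
  return list;
}

// Returns true when the address is a valid IP covered by the list.
function isAllowed(list, remoteAddress) {
  const version = net.isIP(remoteAddress);
  if (version === 0) return false;
  return list.check(remoteAddress, version === 6 ? 'ipv6' : 'ipv4');
}
```

Despite its name, `BlockList` is only a set of address rules, so it works equally well as an allowlist.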
+
+ Admin API:
+
+ ```bash
+ curl http://127.0.0.1:11435/v1/admin/access/config \
+   -H 'authorization: Bearer admin-secret'
+
+ curl -X PUT http://127.0.0.1:11435/v1/admin/access/config \
+   -H 'authorization: Bearer admin-secret' \
+   -H 'content-type: application/json' \
+   -d '{
+     "expectedVersion": 1,
+     "config": {
+       "version": 1,
+       "planes": {
+         "standalone": {
+           "enabled": true,
+           "auth": {"requireApiKey": true, "anonymous": "reject"},
+           "defaultLimit": {"requests": 60, "windowSeconds": 60}
+         },
+         "runtimeAgent": {
+           "enabled": true,
+           "auth": {"requireApiKey": true, "anonymous": "reject"},
+           "defaultLimit": {"requests": 600, "windowSeconds": 60}
+         }
+       },
+       "apiKeys": []
+     }
+   }'
+ ```
+
+ The admin `PUT` supports optimistic concurrency. If `expectedVersion` is present and does not match the active managed access config, the router returns `409`.
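
The optimistic-concurrency rule above amounts to a version comparison before the update is applied. The sketch below shows only that check; the function name and the shape of the returned object are hypothetical, and whether the router increments `version` on a successful write is not stated in the README, so the sketch stores the submitted config unchanged.

```javascript
// `active` is the currently loaded managed config; `body` is the parsed PUT payload.
function applyAccessConfigUpdate(active, body) {
  const hasExpected = body.expectedVersion !== undefined;
  if (hasExpected && body.expectedVersion !== active.version) {
    // Someone else changed the config since the caller last read it.
    return { status: 409, config: active };
  }
  // No expectedVersion, or it matches: accept the submitted config.
  return { status: 200, config: body.config };
}
```

A client that receives `409` would typically re-read the config with `GET`, reapply its change, and retry with the new version.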
+
+ For admin client certificate checks, enable HTTPS, configure `server.https.caPath`, and set:
+
+ ```yaml
+ access:
+   admin:
+     clientCert:
+       required: true
+ ```
+
+ The HTTPS server requests a client certificate and the admin middleware verifies that it is trusted. Optional fingerprint and subject allowlists can narrow trust further.
+
  ## API Examples
 
  Sync-preferred request:
@@ -197,7 +344,7 @@ curl http://127.0.0.1:11435/v1/router/gpu
 
  ## Kong Runtime Agent API
 
- When used with `kong-ollama-router`, this process acts as a local runtime agent. Kong owns public request validation, classification, model selection, and response enrichment. The node-router supplies machine-local state and executes the model selected by Kong.
+ When used with [`kong-ollama-agent-router`](https://github.com/ExeconOne/kong-ollama-agent-router), this process acts as a local runtime agent. Kong owns public request validation, classification, model selection, and response enrichment. The node-router supplies machine-local state and executes the model selected by Kong.
 
  Kong-facing endpoints: