ollama-agent-router 0.1.6 → 0.1.8
- package/README.md +200 -2
- package/dist/cli.js +645 -147
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +1579 -3
- package/dist/index.js +665 -154
- package/dist/index.js.map +1 -1
- package/docs/kong-runtime-contract-plan.md +3 -3
- package/examples/gex44-secured.yaml +181 -0
- package/package.json +20 -9
package/README.md
CHANGED
|
@@ -1,9 +1,18 @@
|
|
|
1
1
|
# ollama-agent-router
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
Ollama Agent Router is a local LLM router for Ollama. It provides an OpenAI-compatible API gateway that routes agent and chat requests to the best local Ollama model based on task type, GPU/VRAM headroom, queue depth, loaded model state, and sync/async policy.
|
|
4
4
|
|
|
5
5
|
It is designed for machines that run several Ollama models with different strengths, for example a small triage model, one or more code models, and a larger exclusive reasoning model.
|
|
6
6
|
|
|
7
|
+
## Why use Ollama Agent Router?
|
|
8
|
+
|
|
9
|
+
Use `ollama-agent-router` when you need:
|
|
10
|
+
|
|
11
|
+
- an Ollama router for multiple local models
|
|
12
|
+
- an Ollama agent router for coding agents and autonomous workflows
|
|
13
|
+
- an OpenAI-compatible local LLM router
|
|
14
|
+
- GPU-aware routing, queues, async jobs, and model selection
|
|
15
|
+
|
|
7
16
|
## Architecture
|
|
8
17
|
|
|
9
18
|
Request flow:
|
|
@@ -85,6 +94,8 @@ Start with:
|
|
|
85
94
|
ollama-agent-router serve --config examples/gex44.yaml
|
|
86
95
|
```
|
|
87
96
|
|
|
97
|
+
`examples/gex44-secured.yaml` is the same hardware profile with the standalone plane locked down: API key required, anonymous access rejected, per-key rate limits, and the admin plane enabled on localhost. Use it as a starting point when the router is exposed beyond a single user or process.
|
|
98
|
+
|
|
88
99
|
## Config Reference
|
|
89
100
|
|
|
90
101
|
Lookup order:
|
|
@@ -97,6 +108,7 @@ Lookup order:
|
|
|
97
108
|
Top-level sections:
|
|
98
109
|
|
|
99
110
|
- `server`: host, port, base path, HTTPS certificates, and JSON body limit.
|
|
111
|
+
- `access`: optional access control for standalone, runtime-agent, and admin planes.
|
|
100
112
|
- `ollama`: base URL, OpenAI-compatible path, native API path, keep-alive, timeout.
|
|
101
113
|
- `gpu`: provider, VRAM limits, GPU-only default, NVIDIA monitor command.
|
|
102
114
|
- `router`: default mode, heavy-load thresholds, classifier config.
|
|
@@ -146,6 +158,192 @@ server:
|
|
|
146
158
|
caPath:
|
|
147
159
|
```
|
|
148
160
|
|
|
161
|
+
## Access Planes
|
|
162
|
+
|
|
163
|
+
The router can expose three separate access planes:
|
|
164
|
+
|
|
165
|
+
- **Standalone plane**: the full local OpenAI-compatible router API, including `POST /v1/chat/completions` and job endpoints.
|
|
166
|
+
- **Runtime agent plane**: machine-local endpoints used by Kong or another gateway, including `/v1/router/*` and selected-model execution.
|
|
167
|
+
- **Admin plane**: access-management endpoints under `/v1/admin/access/*`.
|
|
168
|
+
|
|
169
|
+
Access control is backward-compatible. If `access` is not configured, the standalone and runtime-agent planes stay enabled without API key requirements, matching earlier releases. The admin plane is disabled by default.
|
|
170
|
+
|
|
171
|
+
API keys are sent with:
|
|
172
|
+
|
|
173
|
+
```text
|
|
174
|
+
Authorization: Bearer <api-key>
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
`x-api-key` is also accepted for clients that cannot set bearer tokens.
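The two header forms above could be read on the server side roughly as follows. This is a minimal sketch, not the router's actual code: `extractApiKey` is a hypothetical helper, and preferring `Authorization` over `x-api-key` when both are present is an assumption.

```javascript
// Hypothetical helper: not taken from ollama-agent-router's source.
// Reads an API key from Node-style (lowercased) request headers.
function extractApiKey(headers) {
  const auth = headers['authorization'];
  if (typeof auth === 'string') {
    // Accept "Bearer <api-key>" with a case-insensitive scheme name.
    const match = /^Bearer\s+(\S+)\s*$/i.exec(auth);
    if (match) return match[1];
  }
  // Assumption: x-api-key is only consulted when no bearer token was found.
  const fallback = headers['x-api-key'];
  if (typeof fallback === 'string' && fallback.trim() !== '') {
    return fallback.trim();
  }
  return null;
}
```

For example, `extractApiKey({ authorization: 'Bearer abc' })` yields `'abc'`, and a request with neither header yields `null`.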
|
|
178
|
+
|
|
179
|
+
Create SHA-256 key hashes before putting keys in config:
|
|
180
|
+
|
|
181
|
+
```bash
|
|
182
|
+
node -e "const crypto=require('crypto'); console.log('sha256:'+crypto.createHash('sha256').update(process.argv[1]).digest('hex'))" 'secret-value'
|
|
183
|
+
```
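The same `sha256:<hex>` format can be produced and checked from a script. The sketch below assumes that a presented key is verified by hashing it and comparing against the stored hash; the function names are hypothetical, and the router's real comparison logic is not shown in this README.

```javascript
const crypto = require('crypto');

// Produces the same "sha256:<hex>" string as the one-liner above.
function hashApiKey(key) {
  return 'sha256:' + crypto.createHash('sha256').update(key).digest('hex');
}

// Hypothetical check: hash the presented key and compare it to the stored
// hash in constant time. timingSafeEqual requires equal-length buffers,
// so the lengths are compared first.
function matchesHash(presentedKey, storedHash) {
  const a = Buffer.from(hashApiKey(presentedKey));
  const b = Buffer.from(storedHash);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Storing only the hash means the config file never contains a usable secret; the plain key is needed only by the client.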
|
|
184
|
+
|
|
185
|
+
Example access configuration:
|
|
186
|
+
|
|
187
|
+
```yaml
access:
  managedConfigPath: ./ollama-agent-router.access.yaml
  bootstrapIfMissing: true

  admin:
    enabled: true
    allowedIps: [127.0.0.1, "::1", 10.0.0.0/8]
    trustedProxy: false
    apiKeyHashes:
      - sha256:replace-with-admin-key-hash
    clientCert:
      required: false
      allowedFingerprints: []
      allowedSubjects: []
    auditLog: true

  managed:
    version: 1
    planes:
      standalone:
        enabled: true
        auth:
          requireApiKey: true
          anonymous: reject
        defaultLimit:
          requests: 60
          windowSeconds: 60
      runtimeAgent:
        enabled: true
        auth:
          requireApiKey: true
          anonymous: reject
        defaultLimit:
          requests: 600
          windowSeconds: 60
    apiKeys:
      - id: local-client
        name: Local standalone client
        keyHash: sha256:replace-with-client-key-hash
        enabled: true
        scopes: [standalone]
        limits:
          standalone:
            requests: 120
            windowSeconds: 60
      - id: kong-runtime
        name: Kong runtime caller
        keyHash: sha256:replace-with-kong-key-hash
        enabled: true
        scopes: [runtimeAgent]
        limits:
          runtimeAgent:
            requests: 2000
            windowSeconds: 60
```
|
|
243
|
+
|
|
244
|
+
`access.managed` is the initial access policy. When `access.managedConfigPath` is set, the router loads that file at startup. If the file is missing and `bootstrapIfMissing: true`, it writes the initial policy to that path. Admin API changes are then written atomically to this managed YAML file and survive restarts.
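The load-or-bootstrap flow and the atomic write can be illustrated with a small sketch. It assumes "atomically" means the common write-to-temp-file-then-rename pattern, which the README does not confirm; `writeFileAtomic` and `loadOrBootstrap` are hypothetical names, and the file contents are treated as opaque text to avoid depending on a YAML library.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed technique: write to a temp file in the same directory, then rename.
// A rename within one filesystem replaces the target in a single step, so
// readers see either the old file or the new one, never a partial write.
function writeFileAtomic(targetPath, contents) {
  const dir = path.dirname(targetPath);
  const tmpPath = path.join(dir, `.${path.basename(targetPath)}.${process.pid}.tmp`);
  fs.writeFileSync(tmpPath, contents, { mode: 0o600 });
  fs.renameSync(tmpPath, targetPath);
}

// Hypothetical startup step: prefer the managed file, otherwise fall back to
// the initial policy and optionally persist it.
function loadOrBootstrap(targetPath, initialPolicyText, bootstrapIfMissing) {
  if (fs.existsSync(targetPath)) {
    return fs.readFileSync(targetPath, 'utf8');
  }
  if (bootstrapIfMissing) {
    writeFileAtomic(targetPath, initialPolicyText);
  }
  // Assumption: without bootstrapping, the initial policy is used in memory only.
  return initialPolicyText;
}
```

Once the managed file exists, it takes precedence over the inline `access.managed` block in this sketch, which is why later admin changes survive a restart.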
|
|
245
|
+
|
|
246
|
+
The admin plane security settings are boot-only. They are intentionally not managed through the admin API:
|
|
247
|
+
|
|
248
|
+
- `access.admin.allowedIps`
|
|
249
|
+
- `access.admin.trustedProxy`
|
|
250
|
+
- `access.admin.apiKeyHashes`
|
|
251
|
+
- `access.admin.clientCert`
|
|
252
|
+
|
|
253
|
+
This prevents the admin API from changing the rules that protect itself. When `access.admin.enabled: true`, `access.managedConfigPath` and at least one admin API key hash are required.
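The startup requirement stated above can be expressed as a small validation step. This is an illustrative sketch only: `validateAdminAccess` and its error messages are hypothetical and do not reflect the router's actual validation output.

```javascript
// Hypothetical validator for the rule described above: enabling the admin
// plane requires a managed config path and at least one admin key hash.
function validateAdminAccess(access) {
  const errors = [];
  const admin = (access && access.admin) || {};
  if (!admin.enabled) return errors; // admin plane is disabled by default

  if (!access.managedConfigPath) {
    errors.push('access.managedConfigPath is required when access.admin.enabled is true');
  }
  if (!Array.isArray(admin.apiKeyHashes) || admin.apiKeyHashes.length === 0) {
    errors.push('access.admin.apiKeyHashes needs at least one entry when access.admin.enabled is true');
  }
  return errors;
}
```

An empty result means the admin settings satisfy the rule; a config with `admin.enabled: true` and nothing else would produce both errors.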
|
|
254
|
+
|
|
255
|
+
Admin API:
|
|
256
|
+
|
|
257
|
+
**Read the current managed access config**
|
|
258
|
+
|
|
259
|
+
```bash
|
|
260
|
+
curl http://127.0.0.1:11435/v1/admin/access/config \
|
|
261
|
+
-H 'authorization: Bearer admin-secret'
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
**Replace the entire managed access config** (planes + all keys at once)
|
|
265
|
+
|
|
266
|
+
```bash
curl -X PUT http://127.0.0.1:11435/v1/admin/access/config \
  -H 'authorization: Bearer admin-secret' \
  -H 'content-type: application/json' \
  -d '{
    "expectedVersion": 1,
    "config": {
      "planes": {
        "standalone": {
          "enabled": true,
          "auth": {"requireApiKey": true, "anonymous": "reject"},
          "defaultLimit": {"requests": 60, "windowSeconds": 60}
        },
        "runtimeAgent": {
          "enabled": true,
          "auth": {"requireApiKey": true, "anonymous": "reject"},
          "defaultLimit": {"requests": 600, "windowSeconds": 60}
        }
      },
      "apiKeys": []
    }
  }'
```
|
|
289
|
+
|
|
290
|
+
`expectedVersion` enables optimistic concurrency. If present and the value does not match the active managed config version, the router returns `409`.
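The version check can be sketched as a pure function over an in-memory config. The `409` behavior comes from the text above; the function name, the `200` success status, and the idea that an accepted write increments `version` by one are assumptions made for illustration.

```javascript
// Hypothetical model of the optimistic-concurrency check on PUT.
// `active` is the currently loaded managed config; `request` is the PUT body.
function replaceManagedConfig(active, request) {
  const { expectedVersion, config } = request;

  // The check only applies when the caller supplies expectedVersion.
  if (expectedVersion !== undefined && expectedVersion !== active.version) {
    return { status: 409, config: active };
  }

  // Assumption: each accepted replacement bumps the version by one.
  return { status: 200, config: { ...config, version: active.version + 1 } };
}
```

A client that read version 1, then lost a race with another writer, would send `expectedVersion: 1` against an active version 2 and get `409` instead of silently overwriting the other change.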
|
|
291
|
+
|
|
292
|
+
**Add an API key**
|
|
293
|
+
|
|
294
|
+
Generate a key and its SHA-256 hash first:
|
|
295
|
+
|
|
296
|
+
```bash
node -e "
  const c = require('crypto'), k = 'onr-' + c.randomBytes(20).toString('hex');
  console.log('key:  ', k);
  console.log('hash: sha256:' + c.createHash('sha256').update(k).digest('hex'));
"
```
|
|
303
|
+
|
|
304
|
+
Then add the key:
|
|
305
|
+
|
|
306
|
+
```bash
curl -X POST http://127.0.0.1:11435/v1/admin/access/keys \
  -H 'authorization: Bearer admin-secret' \
  -H 'content-type: application/json' \
  -d '{
    "id": "user-alice",
    "name": "Alice",
    "keyHash": "sha256:<hash>",
    "scopes": ["standalone"],
    "limits": {
      "standalone": {"requests": 100, "windowSeconds": 60}
    }
  }'
```
|
|
320
|
+
|
|
321
|
+
Returns `201` with the created key entry. Returns `409` if the `id` is already in use.
|
|
322
|
+
|
|
323
|
+
`scopes` controls which planes accept the key. Valid values are `standalone`, `runtimeAgent`, or both. `limits` is optional; when omitted, the plane's `defaultLimit` applies.
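The scope and limit rules above can be combined into one lookup. This is a sketch under the assumption that the managed config and key entries have the shapes shown in the examples; `resolveLimit` is a hypothetical name, and returning `null` for an out-of-scope key is an illustrative choice.

```javascript
// Hypothetical lookup: which rate limit applies to `key` on `plane`?
// Returns null when the key's scopes do not include the plane.
function resolveLimit(managed, key, plane) {
  if (!key.scopes.includes(plane)) return null;

  // A per-key limit for this plane wins; otherwise use the plane default.
  const own = key.limits && key.limits[plane];
  return own || managed.planes[plane].defaultLimit;
}
```

With the example configuration, `local-client` gets 120 requests per 60 seconds on the standalone plane, while a standalone key without `limits` falls back to the plane's 60 requests per 60 seconds.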
|
|
324
|
+
|
|
325
|
+
**Revoke an API key**
|
|
326
|
+
|
|
327
|
+
```bash
|
|
328
|
+
curl -X DELETE http://127.0.0.1:11435/v1/admin/access/keys/user-alice \
|
|
329
|
+
-H 'authorization: Bearer admin-secret'
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
Returns `200 { "revoked": { ... } }` with the removed key entry. Returns `404` if the id is not found. The change takes effect immediately without a restart.
|
|
333
|
+
|
|
334
|
+
All admin operations are written atomically to `access.managedConfigPath` and appended to the audit log when `auditLog: true`.
|
|
335
|
+
|
|
336
|
+
For admin client certificate checks, enable HTTPS, configure `server.https.caPath`, and set:
|
|
337
|
+
|
|
338
|
+
```yaml
access:
  admin:
    clientCert:
      required: true
```
|
|
344
|
+
|
|
345
|
+
The HTTPS server requests a client certificate and the admin middleware verifies that it is trusted. Optional fingerprint and subject allowlists can narrow trust further.
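How the allowlists might narrow trust can be sketched as a pure function. Everything here is an assumption for illustration: the `peer` object shape (`authorized`, `fingerprint256`, `subjectCN`), matching subjects by common name, and treating an empty allowlist as "no extra restriction" are not confirmed by this README.

```javascript
// Hypothetical check. `peer` is a plain object summarizing the TLS client
// certificate; `policy` mirrors access.admin.clientCert.
function isClientCertAllowed(peer, policy) {
  if (!policy.required) return true;
  if (!peer || !peer.authorized) return false; // not signed by the trusted CA

  // Compare fingerprints without colons and case differences.
  const normalize = (fp) => fp.replace(/:/g, '').toLowerCase();
  const fingerprints = (policy.allowedFingerprints || []).map(normalize);
  const subjects = policy.allowedSubjects || [];

  // Assumption: an empty allowlist imposes no additional restriction.
  if (fingerprints.length > 0) {
    if (typeof peer.fingerprint256 !== 'string') return false;
    if (!fingerprints.includes(normalize(peer.fingerprint256))) return false;
  }
  if (subjects.length > 0 && !subjects.includes(peer.subjectCN)) return false;
  return true;
}
```

In this sketch a certificate must first pass CA verification; the allowlists can only reject further, never admit an untrusted certificate.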
|
|
346
|
+
|
|
149
347
|
## API Examples
|
|
150
348
|
|
|
151
349
|
Sync-preferred request:
|
|
@@ -197,7 +395,7 @@ curl http://127.0.0.1:11435/v1/router/gpu
|
|
|
197
395
|
|
|
198
396
|
## Kong Runtime Agent API
|
|
199
397
|
|
|
200
|
-
When used with `kong-ollama-router
|
|
398
|
+
When used with [`kong-ollama-agent-router`](https://github.com/ExeconOne/kong-ollama-agent-router), this process acts as a local runtime agent. Kong owns public request validation, classification, model selection, and response enrichment. The node-router supplies machine-local state and executes the model selected by Kong.
|
|
201
399
|
|
|
202
400
|
Kong-facing endpoints:
|
|
203
401
|
|