ollama-agent-router 0.1.7 → 0.1.9
- package/README.md +266 -2
- package/dist/cli.js +80 -35
- package/dist/cli.js.map +1 -1
- package/dist/index.d.ts +83 -11
- package/dist/index.js +81 -35
- package/dist/index.js.map +1 -1
- package/examples/gex44-secured.yaml +187 -0
- package/examples/gex44.yaml +1 -5
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -94,6 +94,221 @@ Start with:
|
|
|
94
94
|
ollama-agent-router serve --config examples/gex44.yaml
|
|
95
95
|
```
|
|
96
96
|
|
|
97
|
+
`examples/gex44-secured.yaml` is the same hardware profile with the standalone plane locked down: API key required, anonymous access rejected, per-key rate limits, and the admin plane enabled on localhost. Use it as a starting point when the router is exposed beyond a single user or process.
|
|
98
|
+
|
|
99
|
+
## Routing Algorithm
|
|
100
|
+
|
|
101
|
+
### Candidate selection
|
|
102
|
+
|
|
103
|
+
For every request the router builds a candidate list from three sources, merged in order:
|
|
104
|
+
|
|
105
|
+
1. `router.preferredModels` from the request — added first, regardless of `routes`.
|
|
106
|
+
2. `routes[taskType]` — the ordered list for the classified task type.
|
|
107
|
+
3. Any model whose `purpose` array contains the task type. This acts as a catch-all fallback.
|
|
108
|
+
|
|
109
|
+
Models listed in `router.forbiddenModels` are dropped from the candidate list entirely.
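
The merge order above can be sketched in a few lines. This is a minimal illustration, not the router's implementation: `buildCandidates` and the shape of its arguments are hypothetical, and it checks only `purpose` in the third step, matching the `RoutingEngine` code in this release. Removing forbidden names last is also an assumption about where that filter runs.

```javascript
// Sketch of the candidate merge; function name and data shapes are assumptions.
function buildCandidates(models, routes, taskType, router = {}) {
  // A Set keeps first-insertion order and silently drops duplicates.
  const names = new Set();
  for (const name of router.preferredModels ?? []) names.add(name); // 1. request preference
  for (const name of routes[taskType] ?? []) names.add(name);       // 2. configured route
  for (const model of models) {                                     // 3. purpose fallback
    if (model.purpose.includes(taskType)) names.add(model.name);
  }
  // Forbidden models never reach scoring.
  for (const name of router.forbiddenModels ?? []) names.delete(name);
  return [...names];
}

const models = [
  { name: "gpt-oss:20b", purpose: ["agentic_reasoning", "planning"] },
  { name: "qwen2.5-coder:7b", purpose: ["code_generate", "agentic_reasoning"] },
  { name: "llama3.2:3b", purpose: ["triage", "simple_chat"] },
];
const routes = { agentic_reasoning: ["gpt-oss:20b"] };

const candidates = buildCandidates(models, routes, "agentic_reasoning", {
  preferredModels: ["llama3.2:3b"],
  forbiddenModels: ["qwen2.5-coder:7b"],
});
console.log(candidates); // [ 'llama3.2:3b', 'gpt-oss:20b' ]
```

`qwen2.5-coder:7b` would have entered through its `purpose` entry, but the request forbids it, so only the preferred model and the routed model remain.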
|
|
110
|
+
|
|
111
|
+
### Blocking checks
|
|
112
|
+
|
|
113
|
+
Before scoring, each candidate is checked for hard blocks:
|
|
114
|
+
|
|
115
|
+
- **`gpu_only`** — `requireGpuOnly` is set (globally or per-request) and the model is not fully on GPU, has a CPU/GPU split in `ollama ps`, or there is not enough free VRAM to load it.
|
|
116
|
+
- **`busy`** — the model has `exclusive: true` and is already running, or `allowWhenBusy: false` and has reached `maxConcurrent`.
|
|
117
|
+
|
|
118
|
+
Blocked models are excluded from sync selection but can still be picked for async jobs.
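
As a rough sketch of the two checks, assuming a per-model `state` snapshot whose field names (`loaded`, `fullyOnGpu`, `running`, `freeMb`, `vramSafetyReserveMb`) are invented here for illustration. How the router actually combines the GPU conditions is not shown in this diff, so treat the ordering below as one plausible reading of the bullets.

```javascript
// Returns "gpu_only", "busy", or null. All field names on `state` are hypothetical.
function blockReason(model, state, requireGpuOnly) {
  if (requireGpuOnly) {
    // Loaded but partly on CPU: a CPU/GPU split as reported by `ollama ps`.
    if (state.loaded && !state.fullyOnGpu) return "gpu_only";
    // Not loaded yet: it must fit into free VRAM together with the safety reserve.
    const requiredMb = model.sizeGb * 1024 + state.vramSafetyReserveMb;
    if (!state.loaded && requiredMb > state.freeMb) return "gpu_only";
  }
  if (model.exclusive && state.running > 0) return "busy";
  if (!model.allowWhenBusy && state.running >= model.maxConcurrent) return "busy";
  return null;
}

const heavy = { name: "gpt-oss:20b", sizeGb: 14.0, maxConcurrent: 1, exclusive: true, allowWhenBusy: false };
const idle = { loaded: false, fullyOnGpu: false, running: 0, freeMb: 16000, vramSafetyReserveMb: 1024 };

const whenIdle = blockReason(heavy, idle, true);                         // null: 15360 MB fits in 16000 MB
const whenRunning = blockReason(heavy, { ...idle, running: 1 }, false);  // "busy": exclusive and in use
const whenTight = blockReason(heavy, { ...idle, freeMb: 12000 }, true);  // "gpu_only": not enough VRAM
console.log(whenIdle, whenRunning, whenTight);
```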
|
|
119
|
+
|
|
120
|
+
### Scoring
|
|
121
|
+
|
|
122
|
+
Every non-blocked candidate receives a numeric score. Higher score wins. Starting value: **100**.
|
|
123
|
+
|
|
124
|
+
| Component | Delta | Notes |
|
|
125
|
+
|---|---|---|
|
|
126
|
+
| Route position | `+50` for index 0, `−8` per step, never below `0` | First entry in `routes[taskType]` gets the full bonus |
|
|
127
|
+
| `model.priority` | `+priority` | Set per model, 1–100 |
|
|
128
|
+
| `purpose` match | `+25` | Model's `purpose` array contains the task type |
|
|
129
|
+
| `preferredModels` | `+80` for index 0, `−10` per step | Request-level override |
|
|
130
|
+
| Already loaded in Ollama | **`+20`** | Model appears in `ollama ps` output |
|
|
131
|
+
| Heavy complexity + `costClass: high` | `+20` | Classifier returned `heavy`; rewards large models |
|
|
132
|
+
| Light complexity + `costClass: low` | `+15` | Classifier returned `light`; rewards small models |
|
|
133
|
+
| Free VRAM headroom | `+0..+25` | Scales with `(freeMb − requiredMb) / 512`, capped at 25 |
|
|
134
|
+
| Insufficient VRAM | **`−60`** | `model.sizeGb × 1024 + vramSafetyReserveMb > freeMb` |
|
|
135
|
+
| Queue depth | `−18 × queueDepth` | Per-model queue length |
|
|
136
|
+
| Running count | `−25 × running` | Per-model active executions |
|
|
137
|
+
| Exclusive + running | `−80 × running` additional | `exclusive: true` models penalised heavily while in use |
|
|
138
|
+
|
|
139
|
+
The candidate with the highest score is selected. The others appear in `fallbackModels` in the response.
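
The table can be turned into a small scoring function. The sketch below follows the deltas listed above; the `snapshot` argument and its field names are assumptions, as is the choice to include `vramSafetyReserveMb` in the headroom term. The sample inputs are one combination that happens to produce 290, the score shown in the sample response later in this README; the inputs behind that sample are not known.

```javascript
// Hypothetical re-implementation of the scoring table for experimentation.
function scoreCandidate(model, snapshot) {
  let score = 100;
  if (snapshot.routeIndex >= 0) score += Math.max(0, 50 - snapshot.routeIndex * 8);
  score += model.priority;
  if (model.purpose.includes(snapshot.taskType)) score += 25;
  if (snapshot.preferredIndex >= 0) score += 80 - snapshot.preferredIndex * 10;
  if (snapshot.loaded) score += 20;
  if (snapshot.complexity === "heavy" && model.costClass === "high") score += 20;
  if (snapshot.complexity === "light" && model.costClass === "low") score += 15;

  // VRAM: a flat penalty when the model does not fit, otherwise a capped headroom bonus.
  const requiredMb = model.sizeGb * 1024 + snapshot.vramSafetyReserveMb;
  if (requiredMb > snapshot.freeMb) score -= 60;
  else score += Math.min(25, (snapshot.freeMb - requiredMb) / 512);

  score -= 18 * snapshot.queueDepth;
  score -= 25 * snapshot.running;
  if (model.exclusive) score -= 80 * snapshot.running;
  return score;
}

const gptOss = {
  name: "gpt-oss:20b", sizeGb: 14.0, priority: 95, costClass: "high", exclusive: true,
  purpose: ["agentic_reasoning", "large_context", "planning"],
};
const base = {
  taskType: "agentic_reasoning", complexity: "heavy", routeIndex: 0, preferredIndex: -1,
  loaded: false, freeMb: 15360, vramSafetyReserveMb: 1024, queueDepth: 0, running: 0,
};

const idleScore = scoreCandidate(gptOss, base);                    // 100 + 50 + 95 + 25 + 20 = 290
const busyScore = scoreCandidate(gptOss, { ...base, running: 1 }); // 290 - 25 - 80 = 185
console.log(idleScore, busyScore);
```

The second call shows how quickly an `exclusive` model falls behind once it is running: a single active execution costs 105 points.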
|
|
140
|
+
|
|
141
|
+
### Model config fields that affect routing
|
|
142
|
+
|
|
143
|
+
```yaml
|
|
144
|
+
models:
|
|
145
|
+
  - name: gpt-oss:20b
|
|
146
|
+
    sizeGb: 14.0 # used for VRAM headroom calculation
|
|
147
|
+
    purpose: [agentic_reasoning, large_context, planning, tool_use, complex_debugging]
|
|
148
|
+
    # +25 score when task type matches; also adds model to the candidate list
|
|
149
|
+
    priority: 95 # added directly to score; use to rank models of similar capability
|
|
150
|
+
    maxConcurrent: 1 # hard cap on parallel executions
|
|
151
|
+
    costClass: high # low | medium | high; matched against request complexity for a score bonus
|
|
152
|
+
    exclusive: true # if running, gets −80 extra penalty per execution; only one at a time
|
|
153
|
+
    allowWhenBusy: false # if false and maxConcurrent reached → blocked entirely
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
**`purpose`**: declares what the model can do. When the array contains the request's task type, the model gets `+25` and becomes a candidate even if it is not listed in `routes[taskType]`. List every task type the model handles well, including secondary ones (for example, add `agentic_reasoning` to a coder model that works as a capable fallback).
|
|
157
|
+
|
|
158
|
+
**`costClass`** — signals the relative weight of the model:
|
|
159
|
+
- `high`: gets `+20` when the classifier decides the request is complex (`heavy`). Intended for large reasoning models.
|
|
160
|
+
- `low`: gets `+15` when the request is simple (`light`). Intended for small triage/chat models.
|
|
161
|
+
- `medium`: no complexity bonus in either direction.
|
|
162
|
+
|
|
163
|
+
**`exclusive`** — intended for large models that cannot safely share GPU memory with another concurrent execution. While one request is running, the model accumulates `−80` per running job on top of the standard `−25`, making it effectively unselectable for sync requests until free.
|
|
164
|
+
|
|
165
|
+
### `routes` config and its relation to scoring
|
|
166
|
+
|
|
167
|
+
```yaml
|
|
168
|
+
routes:
|
|
169
|
+
  agentic_reasoning: [gpt-oss:20b, qwen2.5-coder:7b]
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
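Order matters: `gpt-oss:20b` at index 0 gets `+50` and `qwen2.5-coder:7b` at index 1 gets `+42`. Each further position is worth 8 points less, and the bonus never drops below 0, so entries at index 7 and beyond get no route bonus.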
|
|
173
|
+
|
|
174
|
+
A model does not need to be in `routes` to be selected: if it declares the task type in `purpose`, it still enters the candidate list (with a route-position score of 0).
|
|
175
|
+
|
|
176
|
+
### Sync vs async decision
|
|
177
|
+
|
|
178
|
+
After scoring, the router checks whether to run synchronously or push to the async queue:
|
|
179
|
+
|
|
180
|
+
1. If `router.mode: async` — always async.
|
|
181
|
+
2. If heavy load is detected (total queue depth ≥ `router.heavyLoadQueueDepth` **or** free VRAM < `router.heavyLoadGpuFreeMbThreshold`) and `allowAsync: true` — async.
|
|
182
|
+
3. If the top-scored model is busy and `allowAsync: true` — async on that model.
|
|
183
|
+
4. Otherwise — sync on the top-scored model.
|
|
184
|
+
|
|
185
|
+
`allowAsync` defaults to `true`. Set `"router": {"mode": "sync"}` in the request to force synchronous execution regardless of load.
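
The four steps read naturally as a chain of early returns. The function below is a sketch under assumptions: `decideMode`, the `load` snapshot, and the threshold values are hypothetical, and placing the `mode: "sync"` check directly after the `async` check is an interpretation of "regardless of load", not something this diff confirms.

```javascript
// request: the request-level `router` object; config: the `router` section of the config file.
function decideMode(request, config, load, topModelBusy) {
  const allowAsync = request.allowAsync ?? true; // README: defaults to true
  if (request.mode === "async") return "async";
  if (request.mode === "sync") return "sync";    // assumed to short-circuit the load checks
  const heavyLoad =
    load.totalQueueDepth >= config.heavyLoadQueueDepth ||
    load.freeMb < config.heavyLoadGpuFreeMbThreshold;
  if (heavyLoad && allowAsync) return "async";
  if (topModelBusy && allowAsync) return "async";
  return "sync";
}

const config = { heavyLoadQueueDepth: 4, heavyLoadGpuFreeMbThreshold: 2048 }; // example thresholds
const underLoad = { totalQueueDepth: 5, freeMb: 8000 };
const quiet = { totalQueueDepth: 0, freeMb: 8000 };

const modes = [
  decideMode({}, config, underLoad, false),                    // "async": queue depth over threshold
  decideMode({ allowAsync: false }, config, underLoad, false), // "sync": async not allowed
  decideMode({ mode: "sync" }, config, underLoad, true),       // "sync": forced by the request
  decideMode({}, config, quiet, true),                         // "async": top model busy
  decideMode({}, config, quiet, false),                        // "sync": nothing in the way
];
console.log(modes.join(","));
```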
|
|
186
|
+
|
|
187
|
+
### Forcing a specific model
|
|
188
|
+
|
|
189
|
+
`preferredModels` adds `+80` to the first entry, which is usually enough to make it win unless it is blocked by VRAM or busy constraints. `forbiddenModels` removes models from the candidate list entirely, which is useful when testing a specific model in isolation.
|
|
190
|
+
|
|
191
|
+
### Request examples
|
|
192
|
+
|
|
193
|
+
**Explicit task type — let the router pick the best model for the task:**
|
|
194
|
+
|
|
195
|
+
```bash
|
|
196
|
+
curl -s http://127.0.0.1:11435/v1/chat/completions \
|
|
197
|
+
-H 'content-type: application/json' \
|
|
198
|
+
-H 'authorization: Bearer <api-key>' \
|
|
199
|
+
-d '{
|
|
200
|
+
"model": "auto",
|
|
201
|
+
"messages": [{"role": "user", "content": "Plan a multi-service refactor"}],
|
|
202
|
+
"router": {
|
|
203
|
+
"taskType": "agentic_reasoning"
|
|
204
|
+
}
|
|
205
|
+
}'
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
**Explicit task type with async fallback on heavy load:**
|
|
209
|
+
|
|
210
|
+
```bash
|
|
211
|
+
curl -s http://127.0.0.1:11435/v1/chat/completions \
|
|
212
|
+
-H 'content-type: application/json' \
|
|
213
|
+
-H 'authorization: Bearer <api-key>' \
|
|
214
|
+
-d '{
|
|
215
|
+
"model": "auto",
|
|
216
|
+
"messages": [{"role": "user", "content": "Plan a multi-service refactor"}],
|
|
217
|
+
"router": {
|
|
218
|
+
"taskType": "agentic_reasoning",
|
|
219
|
+
"allowAsync": true
|
|
220
|
+
}
|
|
221
|
+
}'
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
Returns `202` with a job id when load is high; `200` with the result when run synchronously.
|
|
225
|
+
|
|
226
|
+
**Force a specific model, block all others:**
|
|
227
|
+
|
|
228
|
+
```bash
|
|
229
|
+
curl -s http://127.0.0.1:11435/v1/chat/completions \
|
|
230
|
+
-H 'content-type: application/json' \
|
|
231
|
+
-H 'authorization: Bearer <api-key>' \
|
|
232
|
+
-d '{
|
|
233
|
+
"model": "auto",
|
|
234
|
+
"messages": [{"role": "user", "content": "Review this PR diff"}],
|
|
235
|
+
"router": {
|
|
236
|
+
"taskType": "code_review",
|
|
237
|
+
"preferredModels": ["gpt-oss:20b"],
|
|
238
|
+
"forbiddenModels": ["qwen2.5-coder:7b", "deepseek-coder:6.7b"]
|
|
239
|
+
}
|
|
240
|
+
}'
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
**Force sync, no async fallback even under load:**
|
|
244
|
+
|
|
245
|
+
```bash
|
|
246
|
+
curl -s http://127.0.0.1:11435/v1/chat/completions \
|
|
247
|
+
-H 'content-type: application/json' \
|
|
248
|
+
-H 'authorization: Bearer <api-key>' \
|
|
249
|
+
-d '{
|
|
250
|
+
"model": "auto",
|
|
251
|
+
"messages": [{"role": "user", "content": "Fix the off-by-one error"}],
|
|
252
|
+
"router": {
|
|
253
|
+
"taskType": "code_fix",
|
|
254
|
+
"mode": "sync",
|
|
255
|
+
"allowAsync": false
|
|
256
|
+
}
|
|
257
|
+
}'
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
**High priority request — jumps ahead in the queue:**
|
|
261
|
+
|
|
262
|
+
```bash
|
|
263
|
+
curl -s http://127.0.0.1:11435/v1/chat/completions \
|
|
264
|
+
-H 'content-type: application/json' \
|
|
265
|
+
-H 'authorization: Bearer <api-key>' \
|
|
266
|
+
-d '{
|
|
267
|
+
"model": "auto",
|
|
268
|
+
"messages": [{"role": "user", "content": "Summarize this log"}],
|
|
269
|
+
"router": {
|
|
270
|
+
"taskType": "summarize",
|
|
271
|
+
"priority": "high"
|
|
272
|
+
}
|
|
273
|
+
}'
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
**GPU-only — reject if model would run on CPU or with a CPU/GPU split:**
|
|
277
|
+
|
|
278
|
+
```bash
|
|
279
|
+
curl -s http://127.0.0.1:11435/v1/chat/completions \
|
|
280
|
+
-H 'content-type: application/json' \
|
|
281
|
+
-H 'authorization: Bearer <api-key>' \
|
|
282
|
+
-d '{
|
|
283
|
+
"model": "auto",
|
|
284
|
+
"messages": [{"role": "user", "content": "Generate a REST API scaffold"}],
|
|
285
|
+
"router": {
|
|
286
|
+
"taskType": "code_generate",
|
|
287
|
+
"requireGpuOnly": true
|
|
288
|
+
}
|
|
289
|
+
}'
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
Returns `503` if no GPU-only candidate is available.
|
|
293
|
+
|
|
294
|
+
**Check what the router decided** — every `200` response includes a `router` object:
|
|
295
|
+
|
|
296
|
+
```json
|
|
297
|
+
{
|
|
298
|
+
"router": {
|
|
299
|
+
"mode": "sync",
|
|
300
|
+
"taskType": "agentic_reasoning",
|
|
301
|
+
"selectedModel": "gpt-oss:20b",
|
|
302
|
+
"fallbackModels": ["gpt-oss:20b", "qwen2.5-coder:7b"],
|
|
303
|
+
"queueTimeMs": 3,
|
|
304
|
+
"executionTimeMs": 8420,
|
|
305
|
+
"decisionReason": "Selected gpt-oss:20b for agentic_reasoning with score 290.0"
|
|
306
|
+
}
|
|
307
|
+
}
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
`decisionReason` includes the winning score, which helps diagnose unexpected model selection — compare it against the scoring table above to see which component tipped the balance.
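
For scripted diagnostics, the score can be pulled out of `decisionReason` with a regular expression. This sketch assumes the wording seen in the sample response ("... with score 290.0"); the string is meant for humans, so the format may differ in other situations, and `parseDecisionScore` is a hypothetical helper.

```javascript
// Extracts the numeric score from a decisionReason string, or returns null if absent.
function parseDecisionScore(decisionReason) {
  const match = /with score (-?\d+(?:\.\d+)?)/.exec(decisionReason);
  return match ? Number(match[1]) : null;
}

const reason = "Selected gpt-oss:20b for agentic_reasoning with score 290.0";
const parsedScore = parseDecisionScore(reason);
const missingScore = parseDecisionScore("no score in this message");
console.log(parsedScore, missingScore); // 290 null
```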
|
|
311
|
+
|
|
97
312
|
## Config Reference
|
|
98
313
|
|
|
99
314
|
Lookup order:
|
|
@@ -252,17 +467,22 @@ This prevents the admin API from changing the rules that protect itself. When `a
|
|
|
252
467
|
|
|
253
468
|
Admin API:
|
|
254
469
|
|
|
470
|
+
**Read the current managed access config**
|
|
471
|
+
|
|
255
472
|
```bash
|
|
256
473
|
curl http://127.0.0.1:11435/v1/admin/access/config \
|
|
257
474
|
-H 'authorization: Bearer admin-secret'
|
|
475
|
+
```
|
|
476
|
+
|
|
477
|
+
**Replace the entire managed access config** (planes + all keys at once)
|
|
258
478
|
|
|
479
|
+
```bash
|
|
259
480
|
curl -X PUT http://127.0.0.1:11435/v1/admin/access/config \
|
|
260
481
|
-H 'authorization: Bearer admin-secret' \
|
|
261
482
|
-H 'content-type: application/json' \
|
|
262
483
|
-d '{
|
|
263
484
|
"expectedVersion": 1,
|
|
264
485
|
"config": {
|
|
265
|
-
"version": 1,
|
|
266
486
|
"planes": {
|
|
267
487
|
"standalone": {
|
|
268
488
|
"enabled": true,
|
|
@@ -280,7 +500,51 @@ curl -X PUT http://127.0.0.1:11435/v1/admin/access/config \
|
|
|
280
500
|
}'
|
|
281
501
|
```
|
|
282
502
|
|
|
283
|
-
|
|
503
|
+
`expectedVersion` enables optimistic concurrency. If present and the value does not match the active managed config version, the router returns `409`.
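
The version check can be pictured as follows. This is a sketch of the general optimistic-concurrency pattern, not the router's code: `replaceManagedConfig` and `VersionConflictError` are hypothetical names, and bumping the version by one on every accepted write is assumed from the `addApiKey` and `revokeApiKey` methods elsewhere in this diff.

```javascript
class VersionConflictError extends Error {
  constructor(expected, actual) {
    super(`expectedVersion ${expected} does not match active version ${actual}`);
    this.status = 409; // the status the README says the router returns
  }
}

// active: the config currently in force; body: the parsed PUT request body.
function replaceManagedConfig(active, body) {
  if (body.expectedVersion !== undefined && body.expectedVersion !== active.version) {
    throw new VersionConflictError(body.expectedVersion, active.version);
  }
  return { ...body.config, version: active.version + 1, updatedAt: new Date().toISOString() };
}

const active = { version: 1, apiKeys: [] };
const next = replaceManagedConfig(active, { expectedVersion: 1, config: { apiKeys: [] } });

// A second writer that still believes the version is 1 is now rejected.
let conflictStatus = null;
try {
  replaceManagedConfig(next, { expectedVersion: 1, config: { apiKeys: [] } });
} catch (error) {
  conflictStatus = error.status;
}
console.log(next.version, conflictStatus); // 2 409
```

A client that receives `409` would typically re-read the config with `GET`, reapply its change, and retry with the new version.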
|
|
504
|
+
|
|
505
|
+
**Add an API key**
|
|
506
|
+
|
|
507
|
+
Generate a key and its SHA-256 hash first:
|
|
508
|
+
|
|
509
|
+
```bash
|
|
510
|
+
node -e "
|
|
511
|
+
const c = require('crypto'), k = 'onr-' + c.randomBytes(20).toString('hex');
|
|
512
|
+
console.log('key: ', k);
|
|
513
|
+
console.log('hash: sha256:' + c.createHash('sha256').update(k).digest('hex'));
|
|
514
|
+
"
|
|
515
|
+
```
|
|
516
|
+
|
|
517
|
+
Then add the key:
|
|
518
|
+
|
|
519
|
+
```bash
|
|
520
|
+
curl -X POST http://127.0.0.1:11435/v1/admin/access/keys \
|
|
521
|
+
-H 'authorization: Bearer admin-secret' \
|
|
522
|
+
-H 'content-type: application/json' \
|
|
523
|
+
-d '{
|
|
524
|
+
"id": "user-alice",
|
|
525
|
+
"name": "Alice",
|
|
526
|
+
"keyHash": "sha256:<hash>",
|
|
527
|
+
"scopes": ["standalone"],
|
|
528
|
+
"limits": {
|
|
529
|
+
"standalone": {"requests": 100, "windowSeconds": 60}
|
|
530
|
+
}
|
|
531
|
+
}'
|
|
532
|
+
```
|
|
533
|
+
|
|
534
|
+
Returns `201` with the created key entry. Returns `409` if the `id` is already in use.
|
|
535
|
+
|
|
536
|
+
`scopes` controls which planes accept the key. Valid values are `standalone`, `runtimeAgent`, or both. `limits` is optional; when omitted, the plane's `defaultLimit` applies.
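
Before sending the request, a client can pre-check the payload against the same patterns that `apiKeySchema` uses in `dist/cli.js`. The two regular expressions and the scope names below are copied from that schema; `validateKeyPayload` itself is a hypothetical helper and covers only `id`, `keyHash`, and `scopes`.

```javascript
const ID_PATTERN = /^[a-zA-Z0-9._:-]+$/;
const KEY_HASH_PATTERN = /^sha256:[a-fA-F0-9]{64}$/;
const VALID_SCOPES = ["standalone", "runtimeAgent"];

// Returns the names of the fields that would fail validation (empty array when all pass).
function validateKeyPayload(payload) {
  const errors = [];
  if (typeof payload.id !== "string" || !ID_PATTERN.test(payload.id)) errors.push("id");
  if (typeof payload.keyHash !== "string" || !KEY_HASH_PATTERN.test(payload.keyHash)) errors.push("keyHash");
  const scopes = payload.scopes;
  const scopesOk =
    Array.isArray(scopes) && scopes.length > 0 && scopes.every((s) => VALID_SCOPES.includes(s));
  if (!scopesOk) errors.push("scopes");
  return errors;
}

const good = validateKeyPayload({
  id: "user-alice",
  keyHash: "sha256:" + "a".repeat(64), // placeholder digest with the right shape
  scopes: ["standalone"],
});
const bad = validateKeyPayload({ id: "user alice", keyHash: "sha256:<hash>", scopes: [] });
console.log(good, bad); // [] [ 'id', 'keyHash', 'scopes' ]
```

The second payload fails on all three fields: the id contains a space, the `<hash>` placeholder was never replaced, and at least one scope is required.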
|
|
537
|
+
|
|
538
|
+
**Revoke an API key**
|
|
539
|
+
|
|
540
|
+
```bash
|
|
541
|
+
curl -X DELETE http://127.0.0.1:11435/v1/admin/access/keys/user-alice \
|
|
542
|
+
-H 'authorization: Bearer admin-secret'
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
Returns `200 { "revoked": { ... } }` with the removed key entry. Returns `404` if the id is not found. The change takes effect immediately without a restart.
|
|
546
|
+
|
|
547
|
+
Every change made through the admin API is written atomically to `access.managedConfigPath` and appended to the audit log when `auditLog: true`.
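
One common way to get an atomic file update is to write a temporary file next to the target and rename it into place. The sketch below shows that technique only; the body of the router's `writeManagedAccessConfig` is not part of this diff, JSON is used purely for illustration, and `writeJsonAtomically` is a hypothetical name.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Write to a sibling temp file, then rename over the target.
// A rename within one filesystem replaces the file in a single step,
// so readers see either the old content or the new content, never a partial write.
function writeJsonAtomically(targetPath, value) {
  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(value, null, 2) + "\n", { mode: 0o600 });
  fs.renameSync(tmpPath, targetPath);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "managed-access-"));
const target = path.join(dir, "managed-access.json");
writeJsonAtomically(target, { version: 1, apiKeys: [] });
writeJsonAtomically(target, { version: 2, apiKeys: [] });

const roundTrip = JSON.parse(fs.readFileSync(target, "utf8"));
const leftovers = fs.readdirSync(dir);
console.log(roundTrip.version, leftovers); // 2 [ 'managed-access.json' ]
```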
|
|
284
548
|
|
|
285
549
|
For admin client certificate checks, enable HTTPS, configure `server.https.caPath`, and set:
|
|
286
550
|
|
package/dist/cli.js
CHANGED
|
@@ -44,6 +44,17 @@ var protectedPlaneConfigSchema = z.object({
|
|
|
44
44
|
}).default({ requireApiKey: false, anonymous: "allow" }),
|
|
45
45
|
defaultLimit: trafficLimitSchema.optional()
|
|
46
46
|
});
|
|
47
|
+
var apiKeySchema = z.object({
|
|
48
|
+
id: z.string().min(1).regex(/^[a-zA-Z0-9._:-]+$/, "api key id may contain only letters, numbers, dots, underscores, colons, and dashes"),
|
|
49
|
+
name: z.string().min(1).optional(),
|
|
50
|
+
keyHash: z.string().regex(/^sha256:[a-fA-F0-9]{64}$/, "keyHash must use sha256:<64 hex chars>"),
|
|
51
|
+
enabled: z.boolean().default(true),
|
|
52
|
+
scopes: z.array(z.enum(["standalone", "runtimeAgent"])).min(1),
|
|
53
|
+
limits: z.object({
|
|
54
|
+
standalone: trafficLimitSchema.optional(),
|
|
55
|
+
runtimeAgent: trafficLimitSchema.optional()
|
|
56
|
+
}).optional()
|
|
57
|
+
});
|
|
47
58
|
var managedAccessConfigSchema = z.object({
|
|
48
59
|
version: z.number().int().nonnegative().default(1),
|
|
49
60
|
updatedAt: z.string().datetime().optional(),
|
|
@@ -60,19 +71,7 @@ var managedAccessConfigSchema = z.object({
|
|
|
60
71
|
standalone: { enabled: true, auth: { requireApiKey: false, anonymous: "allow" } },
|
|
61
72
|
runtimeAgent: { enabled: true, auth: { requireApiKey: false, anonymous: "allow" } }
|
|
62
73
|
}),
|
|
63
|
-
apiKeys: z.array(
|
|
64
|
-
z.object({
|
|
65
|
-
id: z.string().min(1).regex(/^[a-zA-Z0-9._:-]+$/, "api key id may contain only letters, numbers, dots, underscores, colons, and dashes"),
|
|
66
|
-
name: z.string().min(1).optional(),
|
|
67
|
-
keyHash: z.string().regex(/^sha256:[a-fA-F0-9]{64}$/, "keyHash must use sha256:<64 hex chars>"),
|
|
68
|
-
enabled: z.boolean().default(true),
|
|
69
|
-
scopes: z.array(z.enum(["standalone", "runtimeAgent"])).min(1),
|
|
70
|
-
limits: z.object({
|
|
71
|
-
standalone: trafficLimitSchema.optional(),
|
|
72
|
-
runtimeAgent: trafficLimitSchema.optional()
|
|
73
|
-
}).optional()
|
|
74
|
-
})
|
|
75
|
-
).default([])
|
|
74
|
+
apiKeys: z.array(apiKeySchema).default([])
|
|
76
75
|
});
|
|
77
76
|
var defaultManagedAccessConfig = managedAccessConfigSchema.parse({});
|
|
78
77
|
var adminPlaneConfigSchema = z.object({
|
|
@@ -159,8 +158,7 @@ var modelSpecSchema = z2.object({
|
|
|
159
158
|
timeoutMs: z2.number().int().positive(),
|
|
160
159
|
costClass: z2.enum(["low", "medium", "high"]).default("medium"),
|
|
161
160
|
exclusive: z2.boolean().default(false),
|
|
162
|
-
allowWhenBusy: z2.boolean().default(false),
|
|
163
|
-
tags: z2.array(z2.string()).default([])
|
|
161
|
+
allowWhenBusy: z2.boolean().default(false)
|
|
164
162
|
});
|
|
165
163
|
var appConfigSchema = z2.object({
|
|
166
164
|
server: z2.object({
|
|
@@ -364,7 +362,6 @@ models:
|
|
|
364
362
|
costClass: low
|
|
365
363
|
exclusive: false
|
|
366
364
|
allowWhenBusy: true
|
|
367
|
-
tags: [general]
|
|
368
365
|
routes:
|
|
369
366
|
triage: [llama3.2:3b]
|
|
370
367
|
simple_chat: [llama3.2:3b]
|
|
@@ -962,8 +959,7 @@ function buildModelSpec(name, role, sizeGb, cpuOnly) {
|
|
|
962
959
|
timeoutMs: heavy ? 3e5 : code ? 18e4 : 9e4,
|
|
963
960
|
costClass: heavy ? "high" : code ? "medium" : "low",
|
|
964
961
|
exclusive: heavy,
|
|
965
|
-
allowWhenBusy: !heavy,
|
|
966
|
-
tags: tagsForRole(role)
|
|
962
|
+
allowWhenBusy: !heavy
|
|
967
963
|
};
|
|
968
964
|
}
|
|
969
965
|
function purposesForRole(role) {
|
|
@@ -981,21 +977,6 @@ function purposesForRole(role) {
|
|
|
981
977
|
return ["triage", "simple_chat", "summarize"];
|
|
982
978
|
}
|
|
983
979
|
}
|
|
984
|
-
function tagsForRole(role) {
|
|
985
|
-
switch (role) {
|
|
986
|
-
case "code":
|
|
987
|
-
return ["code", "fallback"];
|
|
988
|
-
case "review":
|
|
989
|
-
return ["code", "review"];
|
|
990
|
-
case "heavy":
|
|
991
|
-
return ["reasoning", "large_context"];
|
|
992
|
-
case "tool":
|
|
993
|
-
return ["tool_use"];
|
|
994
|
-
case "fast":
|
|
995
|
-
default:
|
|
996
|
-
return ["fast", "chat"];
|
|
997
|
-
}
|
|
998
|
-
}
|
|
999
980
|
function generateRoutes(models) {
|
|
1000
981
|
const fast = models.filter((model) => model.costClass === "low").map((model) => model.name);
|
|
1001
982
|
const code = models.filter((model) => model.purpose.includes("code_generate")).map((model) => model.name);
|
|
@@ -1551,7 +1532,6 @@ var RoutingEngine = class {
|
|
|
1551
1532
|
score += Math.max(0, 50 - routeIndex * 8);
|
|
1552
1533
|
score += model.priority;
|
|
1553
1534
|
if (model.purpose.includes(context.classification.taskType)) score += 25;
|
|
1554
|
-
if (model.tags.includes(context.classification.taskType)) score += 15;
|
|
1555
1535
|
if (preferredIndex >= 0) score += 80 - preferredIndex * 10;
|
|
1556
1536
|
if (loaded) score += 20;
|
|
1557
1537
|
if (context.classification.complexity === "heavy" && model.costClass === "high") score += 20;
|
|
@@ -1573,7 +1553,7 @@ var RoutingEngine = class {
|
|
|
1573
1553
|
for (const name of context.router.preferredModels) names.add(name);
|
|
1574
1554
|
for (const name of routeNames) names.add(name);
|
|
1575
1555
|
for (const model of this.config.models) {
|
|
1576
|
-
if (model.purpose.includes(context.classification.taskType)
|
|
1556
|
+
if (model.purpose.includes(context.classification.taskType)) {
|
|
1577
1557
|
names.add(model.name);
|
|
1578
1558
|
}
|
|
1579
1559
|
}
|
|
@@ -1652,6 +1632,51 @@ var AccessControlStore = class {
|
|
|
1652
1632
|
});
|
|
1653
1633
|
return structuredClone(updated);
|
|
1654
1634
|
}
|
|
1635
|
+
async addApiKey(input2) {
|
|
1636
|
+
const key = apiKeySchema.parse(input2);
|
|
1637
|
+
await this.enqueueWrite(async () => {
|
|
1638
|
+
if (this.managed.apiKeys.some((k) => k.id === key.id)) {
|
|
1639
|
+
throw new AccessHttpError(409, `API key with id '${key.id}' already exists`);
|
|
1640
|
+
}
|
|
1641
|
+
if (!this.access.managedConfigPath) {
|
|
1642
|
+
throw new AccessHttpError(500, "access.managedConfigPath is not configured");
|
|
1643
|
+
}
|
|
1644
|
+
const next = {
|
|
1645
|
+
...this.managed,
|
|
1646
|
+
version: this.managed.version + 1,
|
|
1647
|
+
updatedAt: (/* @__PURE__ */ new Date()).toISOString(),
|
|
1648
|
+
apiKeys: [...this.managed.apiKeys, key]
|
|
1649
|
+
};
|
|
1650
|
+
await writeManagedAccessConfig(this.access.managedConfigPath, next);
|
|
1651
|
+
this.managed = next;
|
|
1652
|
+
this.access.managed = next;
|
|
1653
|
+
});
|
|
1654
|
+
return structuredClone(key);
|
|
1655
|
+
}
|
|
1656
|
+
async revokeApiKey(id) {
|
|
1657
|
+
let removed;
|
|
1658
|
+
await this.enqueueWrite(async () => {
|
|
1659
|
+
const idx = this.managed.apiKeys.findIndex((k) => k.id === id);
|
|
1660
|
+
if (idx === -1) {
|
|
1661
|
+
throw new AccessHttpError(404, `API key '${id}' not found`);
|
|
1662
|
+
}
|
|
1663
|
+
if (!this.access.managedConfigPath) {
|
|
1664
|
+
throw new AccessHttpError(500, "access.managedConfigPath is not configured");
|
|
1665
|
+
}
|
|
1666
|
+
removed = this.managed.apiKeys[idx];
|
|
1667
|
+
const next = {
|
|
1668
|
+
...this.managed,
|
|
1669
|
+
version: this.managed.version + 1,
|
|
1670
|
+
updatedAt: (/* @__PURE__ */ new Date()).toISOString(),
|
|
1671
|
+
apiKeys: this.managed.apiKeys.filter((_, i) => i !== idx)
|
|
1672
|
+
};
|
|
1673
|
+
await writeManagedAccessConfig(this.access.managedConfigPath, next);
|
|
1674
|
+
this.managed = next;
|
|
1675
|
+
this.access.managed = next;
|
|
1676
|
+
this.limiter.clear();
|
|
1677
|
+
});
|
|
1678
|
+
return structuredClone(removed);
|
|
1679
|
+
}
|
|
1655
1680
|
publicMiddleware(planeOrPlanes) {
|
|
1656
1681
|
return (req, res, next) => {
|
|
1657
1682
|
const planes = Array.isArray(planeOrPlanes) ? planeOrPlanes : [planeOrPlanes];
|
|
@@ -1962,6 +1987,26 @@ function createApp(config, deps) {
|
|
|
1962
1987
|
next(error);
|
|
1963
1988
|
}
|
|
1964
1989
|
});
|
|
1990
|
+
api.post("/v1/admin/access/keys", adminAccess, async (req, res, next) => {
|
|
1991
|
+
try {
|
|
1992
|
+
const key = await access3.addApiKey(req.body);
|
|
1993
|
+
auditAdmin(config.access.admin, req, "success", "key_added", res.locals.admin?.remoteIp, key.id);
|
|
1994
|
+
res.status(201).json(key);
|
|
1995
|
+
} catch (error) {
|
|
1996
|
+
auditAdmin(config.access.admin, req, "failure", error instanceof Error ? error.message : String(error), res.locals.admin?.remoteIp);
|
|
1997
|
+
next(error);
|
|
1998
|
+
}
|
|
1999
|
+
});
|
|
2000
|
+
api.delete("/v1/admin/access/keys/:id", adminAccess, async (req, res, next) => {
|
|
2001
|
+
try {
|
|
2002
|
+
const revoked = await access3.revokeApiKey(req.params.id);
|
|
2003
|
+
auditAdmin(config.access.admin, req, "success", "key_revoked", res.locals.admin?.remoteIp, revoked.id);
|
|
2004
|
+
res.json({ revoked });
|
|
2005
|
+
} catch (error) {
|
|
2006
|
+
auditAdmin(config.access.admin, req, "failure", error instanceof Error ? error.message : String(error), res.locals.admin?.remoteIp);
|
|
2007
|
+
next(error);
|
|
2008
|
+
}
|
|
2009
|
+
});
|
|
1965
2010
|
api.get("/v1/router/capabilities", runtimeAgentAccess, (_req, res) => {
|
|
1966
2011
|
res.json(buildCapabilities(config));
|
|
1967
2012
|
});
|