omnius 1.0.267 → 1.0.269
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/index.js +581 -304
- package/docs/operations/delay-analysis.md +230 -0
- package/npm-shrinkwrap.json +2 -2
- package/package.json +1 -1
@@ -0,0 +1,230 @@
# Unnecessary Causes of Delays Between Agent Actions

Analysis of `packages/orchestrator/src/` (95 files): delay sources identified and ranked by impact.

---

## HIGH IMPACT

### 1. `streaming-executor.ts:171-192`: `waitAll()` busy-poll loop

```typescript
async waitAll(): Promise<void> {
  while (true) {
    const pending = Array.from(this.tools.values()).filter(
      e => e.state === "queued" || e.state === "executing"
    );
    if (pending.length === 0) break;
    const executing = pending.filter(e => e.promise);
    if (executing.length > 0) {
      await Promise.allSettled(executing.map(e => e.promise!));
      this.processQueue();
    } else {
      this.processQueue();
      await new Promise(r => setTimeout(r, 1)); // ← 1ms spin loop
    }
  }
}
```

**Problem:** When tools are queued but none has started (no `promise`), the loop spins, sleeping on a 1ms `setTimeout` between `processQueue()` calls. Each iteration costs at least 1ms of wall-clock time, and the loop keeps waking up for as long as the queued tools stay blocked.

**Fix:** Replace the timer with a Promise-based notification (an event emitter or an `AbortController`-based wait) so the loop yields until a tool completes or a new tool is enqueued.
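
A minimal sketch of the notification approach, assuming the executor can call a `notify()` hook whenever a tool is enqueued or finishes. `ChangeSignal` and `waitUntilIdle` are hypothetical names for illustration and are not part of the codebase.

```typescript
// Hypothetical helper: lets a loop sleep until something changes.
class ChangeSignal {
  private waiters: Array<() => void> = [];

  /** Resolves the next time notify() is called. */
  wait(): Promise<void> {
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  /** Wakes every current waiter; call on enqueue and on tool completion. */
  notify(): void {
    const toWake = this.waiters;
    this.waiters = [];
    for (const wake of toWake) wake();
  }
}

// Waits until nothing is pending, without any timer-based polling.
// The check and the wait() call run synchronously back to back, so a
// notify() cannot slip in between them and be missed.
async function waitUntilIdle(
  pendingCount: () => number,
  signal: ChangeSignal,
): Promise<void> {
  while (pendingCount() > 0) {
    await signal.wait();
  }
}
```

With this shape the loop performs one wake-up per state change rather than one per millisecond.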

---

### 2. `ollama-pool.ts:850-869`: Instance readiness probe loop

```typescript
await new Promise((r) => setTimeout(r, 500)); // ← 500ms between probes
```

**Problem:** When spawning a new Ollama instance, the code polls `/api/version` every **500ms** with a 2-second abort timeout. This adds **500-2000ms** of delay per new model instance. If the pool is empty and a model isn't loaded, every first request pays this cost.

**Fix:** Use exponential backoff (100ms → 200ms → 400ms) or a readiness event emitter instead of fixed polling.
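
The backoff variant could look roughly like the sketch below. `waitUntilReady` and its option names are assumptions made for illustration; the `probe` callback stands in for whatever request the pool sends to `/api/version`.

```typescript
// Hypothetical helper: polls `probe` with exponentially growing delays.
// Probes immediately first, so an already-ready instance costs no sleep.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  { initialDelayMs = 100, maxDelayMs = 1_000, deadlineMs = 10_000 } = {},
): Promise<boolean> {
  const deadline = Date.now() + deadlineMs;
  let delay = initialDelayMs;
  while (true) {
    // A probe that throws (e.g. connection refused) counts as "not ready".
    if (await probe().catch(() => false)) return true;
    if (Date.now() + delay > deadline) return false;
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxDelayMs); // 100 → 200 → 400 → ...
  }
}
```

The cap on `delay` keeps the worst-case detection lag bounded once the instance has been starting for a while.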

---

### 3. `cascadeBackend.ts:105-156`: Sequential fallback chain

```typescript
const result = await backend.chatCompletion(request);
// On failure → try newBackend.chatCompletion(request);
// On failure → try primaryBackend.chatCompletion({ timeoutMs: 10_000 });
```

**Problem:** The cascade backend tries multiple backends in sequence. Each failure triggers a retry on the next backend. If the first backend is slow to fail (rather than crashing outright), the full timeout (10s+) is paid before the fallback activates.

**Fix:** Start all backends concurrently and take the first success with `Promise.any()`, cancelling the losers. (`Promise.race()` is not suitable here: it settles on the first result of any kind, so a backend that fails fast would win the race.)
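
A sketch of the concurrent variant, assuming each backend call can accept an `AbortSignal`. The `Backend` type and `firstSuccessful` are hypothetical names. The sketch uses `Promise.any()`, which resolves with the first fulfilled promise and ignores earlier rejections.

```typescript
// Hypothetical shape: a backend call that can be cancelled via AbortSignal.
type Backend<T> = (signal: AbortSignal) => Promise<T>;

// Starts every backend at once and returns the first successful result.
// Rejects with an AggregateError only if all backends fail.
async function firstSuccessful<T>(backends: Backend<T>[]): Promise<T> {
  const controller = new AbortController();
  try {
    return await Promise.any(backends.map((run) => run(controller.signal)));
  } finally {
    // Cancel whatever is still in flight once a winner (or total failure) is known.
    controller.abort();
  }
}
```

The trade-off is load: every request is sent to every backend. Starting the fallbacks after a short delay instead of immediately is a common compromise.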

---

## MEDIUM IMPACT

### 4. `ollama-pool.ts:1190`: Unbounded slot waiter queue

```typescript
await new Promise<void>((resolve) => this.slotWaiters.push(resolve));
```

**Problem:** When all GPU slots are occupied, new requests queue as promises in `slotWaiters`. They resolve only when a slot frees up, and there is no timeout or progress feedback. Under heavy load, a request can stall indefinitely.

**Fix:** Add a configurable timeout (e.g., 30s) and reject with a clear "no GPU available" error.
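
One way to bound the wait, sketched with a hypothetical `SlotQueue` class (the real pool keeps its waiters in `slotWaiters`; everything else here is an assumption). A waiter that times out removes itself, so a later `release()` is not wasted on it.

```typescript
// Hypothetical stand-in for the pool's waiter list.
class SlotQueue {
  private waiters: Array<() => void> = [];

  /** Waits for release(); rejects if no slot frees up within timeoutMs. */
  acquire(timeoutMs = 30_000): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      const wake = () => {
        clearTimeout(timer);
        resolve();
      };
      const timer = setTimeout(() => {
        // Drop this waiter so release() never hands a slot to a dead request.
        this.waiters = this.waiters.filter((w) => w !== wake);
        reject(new Error(`no GPU slot available after ${timeoutMs}ms`));
      }, timeoutMs);
      this.waiters.push(wake);
    });
  }

  /** Hands the freed slot to the oldest waiter; returns false if none is waiting. */
  release(): boolean {
    const next = this.waiters.shift();
    if (next) next();
    return next !== undefined;
  }
}
```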

---

### 5. `ollama-pool.ts:1336`: GPU detection on every acquire

```typescript
const rawGpus = await this.gpuDetector();
```

**Problem:** GPU detection runs on every placement decision. If `gpuDetector()` calls `nvidia-smi` or similar system commands, this adds **10-100ms** per request.

**Fix:** Cache GPU detection results with a TTL (e.g., 30s). GPU topology rarely changes at runtime.
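
A TTL cache for an async loader can be sketched as below. `cacheWithTtl` is a hypothetical helper; the injectable `now` parameter exists only to make the expiry testable. Caching the promise rather than the value also collapses concurrent callers into a single detection run.

```typescript
// Hypothetical helper: wraps an async loader with a time-to-live cache.
function cacheWithTtl<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let cached: { value: Promise<T>; expiresAt: number } | undefined;
  return () => {
    if (cached && now() < cached.expiresAt) return cached.value;
    const value = load();
    const entry = { value, expiresAt: now() + ttlMs };
    cached = entry;
    // Do not keep a failed detection around for the whole TTL.
    value.catch(() => {
      if (cached === entry) cached = undefined;
    });
    return value;
  };
}
```

The same pattern, keyed by model name in a `Map`, would apply to per-model lookups.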

---

### 6. `tool-batching.ts:247-253`: Forced serial execution of non-concurrent tools

```typescript
for (const call of batch.calls) {
  results.push(await executeFn(call));
}
```

**Problem:** Write/shell tools are forced to run serially. If the agent produces 3+ write tools in one turn, they queue sequentially. No parallelism for independent writes.

**Fix:** Add a write-conflict analyzer (e.g., file-level locking) to allow parallel writes to different files.
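
File-level locking can be approximated with a keyed promise chain, sketched below. `KeyedLock` is a hypothetical name. The sketch assumes each write call can name the single path it touches, which does not hold for arbitrary shell commands.

```typescript
// Hypothetical helper: tasks sharing a key run one at a time, in order;
// tasks with different keys run concurrently.
class KeyedLock {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    const result = prev.then(() => task());
    // The stored tail never rejects, so one failed write does not block the next.
    const tail = result.catch(() => undefined);
    this.tails.set(key, tail);
    // Drop the entry once the chain is idle to keep the map from growing.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}
```

Keys should be normalized absolute paths; otherwise `./a.txt` and `a.txt` would be treated as different files.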

---

### 7. `steeringIntake.ts:97`: 15s steering timeout

```typescript
timeoutMs = 15_000,
```

**Problem:** Steering intake has a 15-second timeout. If the model is slow or the prompt is large, this adds a full 15s delay before the steering response arrives.

**Fix:** Reduce to 5-8s for steering (which is advisory, not critical) or make it non-blocking with a best-effort result.
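
The best-effort option might be sketched as a race against a timer that resolves with a fallback instead of rejecting. `withBestEffort` is a hypothetical helper, and the fallback value is whatever "no steering" means to the caller.

```typescript
// Hypothetical helper: returns `fallback` if `work` has not settled in time.
// Note that this does not cancel `work`; it only stops waiting for it.
async function withBestEffort<T>(
  work: Promise<T>,
  timeoutMs: number,
  fallback: T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the event loop alive after an early win
  }
}
```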

---

### 8. `verifierRunner.ts:161`: 60s test timeout

```typescript
timeout: 60_000,
```

**Problem:** Test execution has a 60-second timeout. If tests hang or are slow, this blocks the entire agent loop.

**Fix:** Add a per-test timeout and a total timeout with early termination on repeated failures.

---

## LOW-MEDIUM IMPACT

### 9. `ollama-pool.ts:1236,1257`: VRAM estimation per model

```typescript
const vramNeededMB = await this.estimateModelVramMB(model);
```

**Problem:** VRAM estimation runs before every spawn/placement. If it involves model metadata lookups or API calls, this adds latency.

**Fix:** Cache VRAM estimates keyed by model name with a TTL.

---

### 10. `streaming-executor.ts:21-26`: `stableValueKey` deep serialization

```typescript
function stableValueKey(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableValueKey).join(",")}]`;
  const record = value as Record<string, unknown>;
  return `{${Object.keys(record).sort().map((key) => ...).join(",")}}`;
}
```

**Problem:** Tool arguments are deeply serialized for deduplication. For large args (e.g., file contents), this adds O(n) serialization cost per tool call.

**Fix:** Add a size cap (e.g., 10KB). If the value exceeds it, use a hash instead of the full serialized string. Hashing still reads every byte, so the saving is mainly in key size and comparison cost rather than in the traversal itself.
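
A sketch of the size cap, applied after serialization for simplicity. `compactKey` and `MAX_INLINE_CHARS` are hypothetical names; the hash comes from Node's built-in `node:crypto` module. The cap here counts characters, which approximates the 10KB figure for mostly-ASCII input.

```typescript
import { createHash } from "node:crypto";

// Assumed threshold, mirroring the 10KB example above.
const MAX_INLINE_CHARS = 10_240;

// Hypothetical helper: keeps small keys readable and collapses large ones
// into a fixed-length digest (prefixed with the original length).
function compactKey(serialized: string): string {
  if (serialized.length <= MAX_INLINE_CHARS) return serialized;
  const digest = createHash("sha256").update(serialized).digest("hex");
  return `sha256:${serialized.length}:${digest}`;
}
```

To avoid building the large string at all, the traversal itself would have to feed the hash incrementally, which this sketch does not attempt.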

---

### 11. `streaming-executor.ts:263-291`: Duplicate detection overhead

```typescript
private findPriorEquivalent(entry: StreamingToolEntry): StreamingToolEntry | null
private cloneDuplicateResult(entry: StreamingToolEntry): boolean
private mirrorPriorEquivalent(entry: StreamingToolEntry): boolean
```

**Problem:** Every tool call is checked against all prior calls for equivalence. This is O(n²) in the number of tools per turn.

**Fix:** Use a hash-based index (tool name + arg hash) for O(1) lookup instead of linear scan.
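
The index could be as small as a `Map` keyed by tool name plus the argument key. `DedupIndex` is a hypothetical class and is generic over the entry type, since the real `StreamingToolEntry` shape is not shown here.

```typescript
// Hypothetical helper: constant-time lookup of an earlier equivalent call.
class DedupIndex<E> {
  private byKey = new Map<string, E>();

  /** Returns the earlier equivalent entry, or registers this one and returns null. */
  findOrRegister(toolName: string, argsKey: string, entry: E): E | null {
    // NUL separator so that ("ab", "c") and ("a", "bc") cannot collide.
    const key = `${toolName}\u0000${argsKey}`;
    const prior = this.byKey.get(key);
    if (prior !== undefined) return prior;
    this.byKey.set(key, entry);
    return null;
  }
}
```

Each call then costs one map lookup, so a turn with n tool calls does O(n) work in total instead of O(n²).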

---

### 12. `ollama-pool.ts:1697`: Stale process cleanup timer

```typescript
const handle = setTimeout(async () => {
  const { cleanupStaleOllamaProcesses } = await import("./ollama-pool-cleanup.js");
  const report = await cleanupStaleOllamaProcesses({ ... });
}, ...);
```

**Problem:** Cleanup runs as a deferred `setTimeout`, which adds latency before stale processes are actually cleaned up.

**Fix:** Run cleanup eagerly when a slot is freed, not on a timer.

---

### 13. `ollama-pool.ts:298,304`: `execSync` with 3s timeout

```typescript
{ encoding: "utf8", timeout: 3_000 },
```

**Problem:** Synchronous child processes block the event loop. If the command takes the full 3s, the entire process stalls.

**Fix:** Use `execFile` with a `Promise` wrapper, or `spawn` with a timeout signal.
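
A minimal sketch of the non-blocking variant using Node's `execFile` wrapped with `util.promisify`. `runCommand` is a hypothetical name; which command and arguments the pool actually runs is not shown here.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: runs a command without blocking the event loop.
// Rejects if the command fails or is killed by the timeout.
async function runCommand(
  file: string,
  args: string[],
  timeoutMs = 3_000,
): Promise<string> {
  const { stdout } = await execFileAsync(file, args, {
    encoding: "utf8",
    timeout: timeoutMs,
  });
  return stdout;
}
```

Unlike `execSync`, a slow command here delays only the caller awaiting it; timers and other requests keep running.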

---

### 14. `preflightSnapshot.ts:284,311`: Multiple 1.5s probe timeouts

```typescript
timeout: 1500
```

**Problem:** Preflight checks run multiple probes, each with a 1.5s timeout. If multiple probes fail, this adds 3-6s of startup delay.

**Fix:** Run probes in parallel with `Promise.allSettled()` so the total wait is bounded by the slowest probe (1.5s) rather than by the sum. If a single success is enough, `Promise.any()` returns as soon as one probe passes.
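
A sketch of the parallel version with `Promise.allSettled()`. The `Probe` type and `runProbes` are hypothetical; each probe is modelled as an async function that throws on failure.

```typescript
// Hypothetical shape: a named check that rejects when the probe fails.
type Probe = { name: string; run: () => Promise<void> };

// Runs all probes at once; total wait is at most one timeout, not the sum.
async function runProbes(
  probes: Probe[],
  timeoutMs = 1_500,
): Promise<Record<string, boolean>> {
  const withTimeout = (p: Probe) =>
    new Promise<void>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`${p.name} timed out`)),
        timeoutMs,
      );
      p.run().then(resolve, reject).finally(() => clearTimeout(timer));
    });
  const settled = await Promise.allSettled(probes.map(withTimeout));
  // Map each probe name to whether it passed.
  return Object.fromEntries(
    probes.map((p, i) => [p.name, settled[i].status === "fulfilled"]),
  );
}
```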

---

### 15. `tool-batching.ts:212-240`: Concurrency limit on reads

```typescript
export async function withConcurrencyLimit<T>(...) {
  // Uses a limiter to cap parallel reads
}
```

**Problem:** Concurrent-safe tools (reads) are limited by a concurrency cap. If the limit is too low, parallel reads are serialized unnecessarily.

**Fix:** Make the concurrency limit configurable and increase the default (e.g., from 2 to 8).
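
For reference, a configurable limiter can be sketched as a small worker pool. `mapWithLimit` is a hypothetical function and is not the signature of the existing `withConcurrencyLimit`, whose parameters are elided above.

```typescript
// Hypothetical helper: runs `fn` over `items` with at most `limit` calls in flight.
// Results keep the input order. A rejection from `fn` rejects the whole call.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is exhausted.
  const worker = async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because `limit` is a plain parameter, the default can be raised or read from configuration without touching the scheduling logic.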

---

## Summary: Top 3 Fixes for Immediate Impact

| Priority | File | Change | Expected Savings |
|----------|------|--------|------------------|
| 1 | `streaming-executor.ts` | Replace 1ms spin loop with Promise notification | Eliminates busy-poll latency entirely |
| 2 | `ollama-pool.ts` | Cache GPU detection with 30s TTL | Eliminates 10-100ms per request |
| 3 | `cascadeBackend.ts` | Parallelize backend fallbacks with `Promise.any()` (first success wins) | Eliminates cascading timeout accumulation |

package/npm-shrinkwrap.json
CHANGED
@@ -1,12 +1,12 @@
 {
   "name": "omnius",
-  "version": "1.0.
+  "version": "1.0.269",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "omnius",
-      "version": "1.0.
+      "version": "1.0.269",
       "bundleDependencies": [
         "image-to-ascii"
       ],
package/package.json
CHANGED