omnius 1.0.267 → 1.0.269

@@ -0,0 +1,230 @@
+ # Unnecessary Causes of Delays Between Agent Actions
+
+ Analysis of `packages/orchestrator/src/` (95 files): delay sources, ranked by impact.
+
+ ---
+
+ ## HIGH IMPACT
+
+ ### 1. `streaming-executor.ts:171-192`: `waitAll()` busy-poll loop
+
+ ```typescript
+ async waitAll(): Promise<void> {
+   while (true) {
+     const pending = Array.from(this.tools.values()).filter(
+       e => e.state === "queued" || e.state === "executing"
+     );
+     if (pending.length === 0) break;
+     const executing = pending.filter(e => e.promise);
+     if (executing.length > 0) {
+       await Promise.allSettled(executing.map(e => e.promise!));
+       this.processQueue();
+     } else {
+       this.processQueue();
+       await new Promise(r => setTimeout(r, 1)); // ← 1ms spin loop
+     }
+   }
+ }
+ ```
+
+ **Problem:** When tools are queued but none have started (no `promise`), the loop spins, sleeping 1ms via `setTimeout` between `processQueue()` calls. Each iteration costs at least 1ms of wall-clock time, and the loop keeps waking the event loop for as long as the queued tools stay blocked.
+
+ **Fix:** Replace the sleep with a Promise-based notification (an event emitter or an `AbortController`-based wait) so the loop yields until a tool completes or a new tool is enqueued.
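A minimal sketch of the notification idea, assuming the executor can call a hook whenever a tool finishes or is enqueued. `WakeSignal` is a hypothetical helper, not something that exists in the package.

```typescript
// Hypothetical helper (not part of the package): a reusable wake-up signal.
class WakeSignal {
  private waiters: Array<() => void> = [];

  /** Resolves the next time notify() is called. */
  wait(): Promise<void> {
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  /** Wakes every pending waiter; call this on tool completion or enqueue. */
  notify(): void {
    const toWake = this.waiters;
    this.waiters = [];
    for (const wake of toWake) wake();
  }
}
```

In `waitAll()`, the `else` branch would then `await` the signal instead of a 1ms timer. Because the loop re-checks tool state after every wake-up, a spurious `notify()` is harmless; the thing to guard against is a state change that never calls `notify()`.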
+
+ ---
+
+ ### 2. `ollama-pool.ts:850-869`: Instance readiness probe loop
+
+ ```typescript
+ await new Promise((r) => setTimeout(r, 500)); // ← 500ms between probes
+ ```
+
+ **Problem:** When spawning a new Ollama instance, the code polls `/api/version` every **500ms**, with a 2-second abort timeout per probe. An instance that becomes ready just after a probe still waits out the rest of the interval, and a hung probe can hold the loop for up to 2s, so each new model instance pays roughly **500-2000ms** of avoidable delay. If the pool is empty and the model isn't loaded, every first request pays this cost.
+
+ **Fix:** Use exponential backoff starting from a short interval (100ms → 200ms → 400ms), or a readiness event emitter, instead of fixed-interval polling.
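The backoff variant might look roughly like the sketch below. `waitUntilReady`, its option names, and the default values are assumptions for illustration; the `probe` callback stands in for whatever request the pool sends to `/api/version`.

```typescript
// Hypothetical helper: poll `probe` with exponential backoff until it reports ready.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  opts: { initialDelayMs?: number; maxDelayMs?: number; deadlineMs?: number } = {},
): Promise<boolean> {
  // Defaults are illustrative, not taken from the package.
  const { initialDelayMs = 100, maxDelayMs = 1_000, deadlineMs = 30_000 } = opts;
  const startedAt = Date.now();
  let delayMs = initialDelayMs;

  while (Date.now() - startedAt < deadlineMs) {
    // Probe first, then sleep; a thrown error (e.g. connection refused) means "not ready yet".
    if (await probe().catch(() => false)) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, maxDelayMs);
  }
  return false; // deadline passed without a successful probe
}
```

Probing before the first sleep means an instance that is already up costs no added delay, and the cap keeps slow startups from being polled too rarely.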
+
+ ---
+
+ ### 3. `cascadeBackend.ts:105-156`: Sequential fallback chain
+
+ ```typescript
+ const result = await backend.chatCompletion(request);
+ // On failure → try newBackend.chatCompletion(request);
+ // On failure → try primaryBackend.chatCompletion({ timeoutMs: 10_000 });
+ ```
+
+ **Problem:** The cascade backend tries its backends one after another, moving to the next only when the current one fails. If the first backend is slow to fail (it hangs rather than crashing), the full timeout (10s+) is paid before the fallback activates.
+
+ **Fix:** Start all backends simultaneously, take the first success, and cancel the losers. Use `Promise.any()` rather than `Promise.race()`, which would also settle on the first rejection.
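A minimal sketch of taking the first successful backend. It uses `Promise.any()`, which ignores rejections until every attempt has failed, whereas `Promise.race()` would settle on the first failure as well. The `Backend` signature is a hypothetical stand-in for `chatCompletion`, assumed to accept an `AbortSignal`.

```typescript
// Hypothetical backend shape for illustration; the real interface may differ.
type Backend<Req, Res> = (request: Req, signal: AbortSignal) => Promise<Res>;

async function firstSuccessful<Req, Res>(
  backends: Backend<Req, Res>[],
  request: Req,
): Promise<Res> {
  const controllers: AbortController[] = [];
  const attempts = backends.map((backend) => {
    const controller = new AbortController();
    controllers.push(controller);
    return backend(request, controller.signal);
  });
  try {
    // Fulfills with the first success; rejects with an AggregateError only if all attempts fail.
    return await Promise.any(attempts);
  } finally {
    // Cancel the losers. Aborting an attempt that has already settled has no effect.
    for (const controller of controllers) controller.abort();
  }
}
```

The trade-off is that every backend receives the request, which raises load and cost. A staggered variant that starts the next backend only after a short delay would sit between the sequential and fully parallel extremes.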
+
+ ---
+
+ ## MEDIUM IMPACT
+
+ ### 4. `ollama-pool.ts:1190`: Unbounded slot waiter queue
+
+ ```typescript
+ await new Promise<void>((resolve) => this.slotWaiters.push(resolve));
+ ```
+
+ **Problem:** When all GPU slots are occupied, new requests queue as promises in `slotWaiters`. They resolve only when a slot frees up, with no timeout and no progress feedback. Under heavy load, a request can stall indefinitely.
+
+ **Fix:** Add a configurable timeout (e.g., 30s) and reject with a clear "no GPU available" error.
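One way to bound the wait is sketched below. `SlotQueue` is a hypothetical stand-in for the pool's `slotWaiters` handling, and the 30s default is the example value from the fix, not a measured one.

```typescript
// Hypothetical stand-in for the pool's waiter queue.
class SlotQueue {
  private waiters: Array<() => void> = [];

  /** Waits for release(); rejects if no slot is handed over within timeoutMs. */
  waitForSlot(timeoutMs = 30_000): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      const onSlot = () => {
        clearTimeout(timer);
        resolve();
      };
      const timer = setTimeout(() => {
        // Remove the stale waiter so a later release() is not spent on it.
        this.waiters = this.waiters.filter((waiter) => waiter !== onSlot);
        reject(new Error(`No GPU slot became available within ${timeoutMs}ms`));
      }, timeoutMs);
      this.waiters.push(onSlot);
    });
  }

  /** Hands the freed slot to the oldest waiter; returns false if nobody is waiting. */
  release(): boolean {
    const next = this.waiters.shift();
    if (!next) return false;
    next();
    return true;
  }
}
```

Removing the timed-out waiter matters: otherwise a freed slot could be handed to a request that has already given up.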
+
+ ---
+
+ ### 5. `ollama-pool.ts:1336`: GPU detection on every acquire
+
+ ```typescript
+ const rawGpus = await this.gpuDetector();
+ ```
+
+ **Problem:** GPU detection runs on every placement decision. If `gpuDetector()` shells out to `nvidia-smi` or a similar system command, this adds **10-100ms** per request.
+
+ **Fix:** Cache GPU detection results with a TTL (e.g., 30s). GPU topology rarely changes at runtime.
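A TTL cache around an async detector could be sketched as follows. `cachedWithTtl` is a hypothetical wrapper, and the injectable `now` clock is an assumption added only to make the behaviour easy to test.

```typescript
// Hypothetical wrapper: memoize an async function's result for ttlMs milliseconds.
function cachedWithTtl<T>(
  load: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let cached: { value: Promise<T>; expiresAt: number } | undefined;

  return () => {
    const t = now();
    if (cached && t < cached.expiresAt) return cached.value;

    // Cache the promise itself so concurrent callers share one in-flight detection.
    const value = load();
    const entry = { value, expiresAt: t + ttlMs };
    cached = entry;

    // Do not keep a failed detection around for the whole TTL.
    value.catch(() => {
      if (cached === entry) cached = undefined;
    });
    return value;
  };
}
```

Usage would be along the lines of `const detectGpus = cachedWithTtl(() => this.gpuDetector(), 30_000)`, with placement code calling `detectGpus()` instead of the detector directly.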
+
+ ---
+
+ ### 6. `tool-batching.ts:247-253`: Forced serial execution of non-concurrent tools
+
+ ```typescript
+ for (const call of batch.calls) {
+   results.push(await executeFn(call));
+ }
+ ```
+
+ **Problem:** Write and shell tools are forced to run serially. If the agent emits three or more write tools in one turn, they run one after another even when they touch unrelated files.
+
+ **Fix:** Add a write-conflict analyzer (e.g., file-level locking) to allow parallel writes to different files.
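File-level locking can be approximated with a per-key promise chain, as in this sketch. `KeyedLock` is hypothetical, and it assumes each write tool call exposes the path it targets, which would need to be normalized before use as a key.

```typescript
// Hypothetical per-key lock: tasks sharing a key run in order, different keys run in parallel.
class KeyedLock {
  private tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // `previous` never rejects (see `tail` below), so the task always gets its turn.
    const result = previous.then(() => task());
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    // Drop the map entry once the queue for this key has drained.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}
```

With this in place the serial loop could become `Promise.all(calls.map((call) => lock.run(pathOf(call), () => executeFn(call))))`, where `pathOf` is another assumed helper. Shell tools have no single target file and would still need to run serially.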
+
+ ---
+
+ ### 7. `steeringIntake.ts:97`: 15s steering timeout
+
+ ```typescript
+ timeoutMs = 15_000,
+ ```
+
+ **Problem:** Steering intake has a 15-second timeout. If the model is slow or the prompt is large, the agent can wait up to the full 15s for the steering response.
+
+ **Fix:** Reduce the timeout to 5-8s (steering is advisory, not critical), or make the call non-blocking and use a best-effort result.
+
+ ---
+
+ ### 8. `verifierRunner.ts:161`: 60s test timeout
+
+ ```typescript
+ timeout: 60_000,
+ ```
+
+ **Problem:** Test execution has a 60-second timeout. If tests hang or run slowly, the entire agent loop is blocked for up to 60s.
+
+ **Fix:** Add a per-test timeout alongside the total timeout, and terminate early on repeated failures.
+
+ ---
+
+ ## LOW-MEDIUM IMPACT
+
+ ### 9. `ollama-pool.ts:1236,1257`: VRAM estimation per model
+
+ ```typescript
+ const vramNeededMB = await this.estimateModelVramMB(model);
+ ```
+
+ **Problem:** VRAM estimation runs before every spawn and placement. If it involves model metadata lookups or API calls, each run adds latency.
+
+ **Fix:** Cache VRAM estimates keyed by model name with a TTL.
+
+ ---
+
+ ### 10. `streaming-executor.ts:21-26`: `stableValueKey` deep serialization
+
+ ```typescript
+ function stableValueKey(value: unknown): string {
+   if (value === null || typeof value !== "object") return JSON.stringify(value);
+   if (Array.isArray(value)) return `[${value.map(stableValueKey).join(",")}]`;
+   const record = value as Record<string, unknown>;
+   return `{${Object.keys(record).sort().map((key) => ...).join(",")}}`;
+ }
+ ```
+
+ **Problem:** Tool arguments are deep-serialized for deduplication. For large arguments (e.g., file contents), this adds an O(n) serialization cost per tool call, where n is the argument size.
+
+ **Fix:** Add a size cap (e.g., 10KB). If the value exceeds it, use a hash instead of deep serialization.
+
+ ---
+
+ ### 11. `streaming-executor.ts:263-291`: Duplicate detection overhead
+
+ ```typescript
+ private findPriorEquivalent(entry: StreamingToolEntry): StreamingToolEntry | null
+ private cloneDuplicateResult(entry: StreamingToolEntry): boolean
+ private mirrorPriorEquivalent(entry: StreamingToolEntry): boolean
+ ```
+
+ **Problem:** Every tool call is checked against all prior calls for equivalence, which is O(n²) in the number of tools per turn.
+
+ **Fix:** Use a hash-based index (tool name + argument hash) for O(1) lookup instead of a linear scan.
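A sketch of an index keyed by tool name plus a canonical form of the arguments. `ToolCall`, `canonicalKey`, and `PriorCallIndex` are illustrative names; the real `StreamingToolEntry` carries more fields, and this version uses the canonical string directly as the `Map` key rather than a separate digest.

```typescript
// Hypothetical entry shape; the real StreamingToolEntry has more fields.
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

// Canonical JSON with sorted object keys, so {a, b} and {b, a} yield the same key.
function canonicalKey(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value) ?? "undefined";
  if (Array.isArray(value)) return `[${value.map(canonicalKey).join(",")}]`;
  const record = value as Record<string, unknown>;
  const fields = Object.keys(record)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${canonicalKey(record[key])}`);
  return `{${fields.join(",")}}`;
}

class PriorCallIndex {
  private byKey = new Map<string, ToolCall>();

  /** Returns the earlier equivalent call, or records this one and returns null. */
  findOrRegister(call: ToolCall): ToolCall | null {
    // The NUL separator keeps a tool name from running into the argument string.
    const key = `${call.name}\u0000${canonicalKey(call.args)}`;
    const prior = this.byKey.get(key);
    if (prior) return prior;
    this.byKey.set(key, call);
    return null;
  }
}
```

Lookup no longer depends on how many calls came before, though building the key is still linear in the size of the arguments.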
+
+ ---
+
+ ### 12. `ollama-pool.ts:1697`: Stale process cleanup timer
+
+ ```typescript
+ const handle = setTimeout(async () => {
+   const { cleanupStaleOllamaProcesses } = await import("./ollama-pool-cleanup.js");
+   const report = await cleanupStaleOllamaProcesses({ ... });
+ }, ...);
+ ```
+
+ **Problem:** Cleanup runs inside a deferred `setTimeout`, so stale processes linger until the timer fires.
+
+ **Fix:** Run cleanup eagerly when a slot is freed, not on a timer.
+
+ ---
+
+ ### 13. `ollama-pool.ts:298,304`: `execSync` with 3s timeout
+
+ ```typescript
+ { encoding: "utf8", timeout: 3_000 },
+ ```
+
+ **Problem:** Synchronous child processes block the event loop. If the command takes the full 3s, the entire process stalls for that long.
+
+ **Fix:** Use `execFile` with a `Promise` wrapper, or `spawn` with a timeout signal.
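The `Promise` wrapper can come from Node's own `util.promisify`, as in this sketch. `runCommand` is a hypothetical helper name; the 3s default mirrors the existing timeout.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: run a command without blocking the event loop.
// If the timeout elapses, Node kills the child and the returned promise rejects.
async function runCommand(file: string, args: string[], timeoutMs = 3_000): Promise<string> {
  const { stdout } = await execFileAsync(file, args, { encoding: "utf8", timeout: timeoutMs });
  return stdout;
}
```

Unlike `execSync`, `execFile` does not go through a shell by default, so a command string that relies on pipes or redirection would have to be split into a file and an argument list, or run via `exec` instead.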
+
+ ---
+
+ ### 14. `preflightSnapshot.ts:284,311`: Multiple 1.5s probe timeouts
+
+ ```typescript
+ timeout: 1500
+ ```
+
+ **Problem:** Preflight checks run multiple probes, each with a 1.5s timeout. If several probes fail, this adds 3-6s of startup delay.
+
+ **Fix:** Run the probes in parallel with `Promise.allSettled()` so the worst case is a single 1.5s timeout rather than the sum. If the probes are alternatives for the same information, `Promise.any()` returns as soon as one succeeds.
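A sketch of running probes concurrently so the total wait is bounded by the slowest probe rather than the sum. `Probe`, `withTimeout`, and `runProbes` are hypothetical names, and the sketch assumes probes are independent of one another.

```typescript
// Hypothetical probe shape: resolves with a result, or rejects on failure.
type Probe<T> = () => Promise<T>;

// Rejects if `work` has not settled within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
    work.then(resolve, reject).finally(() => clearTimeout(timer));
  });
}

// Runs every probe at once; failed or timed-out probes map to null.
async function runProbes<T>(
  probes: Record<string, Probe<T>>,
  timeoutMs = 1_500,
): Promise<Map<string, T | null>> {
  const entries = Object.entries(probes);
  const settled = await Promise.allSettled(
    entries.map(([, probe]) => withTimeout(probe(), timeoutMs)),
  );
  const results = new Map<string, T | null>();
  entries.forEach(([name], i) => {
    const outcome = settled[i];
    results.set(name, outcome.status === "fulfilled" ? outcome.value : null);
  });
  return results;
}
```

Note that `withTimeout` only stops waiting; it does not cancel the underlying work, so a probe that spawns a process or opens a socket still needs its own timeout or abort signal.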
+
+ ---
+
+ ### 15. `tool-batching.ts:212-240`: Concurrency limit on reads
+
+ ```typescript
+ export async function withConcurrencyLimit<T>(...) {
+   // Uses a limiter to cap parallel reads
+ }
+ ```
+
+ **Problem:** Concurrent-safe tools (reads) are subject to a concurrency cap. If the limit is too low, reads that could run in parallel are queued unnecessarily.
+
+ **Fix:** Make the concurrency limit configurable and increase the default (e.g., from 2 to 8).
+
+ ---
+
+ ## Summary: Top 3 Fixes for Immediate Impact
+
+ | Priority | File | Change | Expected Savings |
+ |----------|------|--------|------------------|
+ | 1 | `streaming-executor.ts` | Replace 1ms spin loop with Promise notification | Eliminates busy-poll latency entirely |
+ | 2 | `ollama-pool.ts` | Cache GPU detection with 30s TTL | Eliminates 10-100ms per request |
+ | 3 | `cascadeBackend.ts` | Parallelize backend fallbacks with `Promise.any()` | Eliminates cascading timeout accumulation |
@@ -1,12 +1,12 @@
  {
  "name": "omnius",
- "version": "1.0.267",
+ "version": "1.0.269",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
  "": {
  "name": "omnius",
- "version": "1.0.267",
+ "version": "1.0.269",
  "bundleDependencies": [
  "image-to-ascii"
  ],
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "omnius",
- "version": "1.0.267",
+ "version": "1.0.269",
  "description": "AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop",
  "type": "module",
  "main": "./dist/index.js",