@aion0/forge 0.10.22 → 0.10.25

@@ -0,0 +1,482 @@
1
+ # TP Automation API — 3-Step Workflow for Dev/QA
2
+
3
+ Minimum HTTP contract for driving a `build → upgrade → run tests →
4
+ collect results` cycle against an automation testbed on TP. All
5
+ endpoints are stateless except for the durable `PytestExecution` /
6
+ `SingleDevice` rows the celery workers update — callers just POST then
7
+ poll; no WebSockets, SSE, or callbacks are involved.
8
+
9
+ ```
10
+ <TP-base-url>/<endpoint>
11
+ ```
12
+
13
+ `<TP-base-url>`:
14
+ - Production: `https://nac-tp.fortinet-us.com`
15
+ - Test: `http://10.15.33.25:8000`
16
+ - Dev (.11): `http://10.15.33.11:8000`
17
+
18
+ ## Authentication
19
+
20
+ Every endpoint requires a JWT in the `Authorization` header:
21
+
22
+ ```
23
+ Authorization: JWT <token>
24
+ ```
25
+
26
+ Mint a token:
27
+
28
+ ```bash
29
+ T=$(curl -s -X POST <TP-base-url>/token-auth/ \
30
+ -H 'Content-Type: application/json' \
31
+ --data-binary @- <<JSON | jq -r .token
32
+ {"username":"<user>","password":"<pw>"}
33
+ JSON
34
+ )
35
+ ```
36
+
37
+ In the examples below, `$T` stands for the JWT and `$TP` for `<TP-base-url>`.
38
+
39
+ > **Note on prod:** `/token-auth/` is **not** exposed at the public
40
+ > reverse proxy on prod (SSO-only). Automation scripts on prod-facing
41
+ > endpoints need to be run from a host that can reach the internal
42
+ > port, or they need an SSO-issued token. On the dev `.11` server,
43
+ > `/token-auth/` works directly with username + password.
44
+
45
+ ---
46
+
47
+ ## Step 1 — Upgrade the testbed
48
+
49
+ ```
50
+ POST /adc/automation/upgrade/ — kick off (HTTP 202)
51
+ GET /adc/automation/upgrade/<testbed>/ — poll status
52
+ ```
53
+
54
+ The dev team has three modes available; pick whichever matches your
55
+ build pipeline:
56
+
57
+ | `mode` | When to use it | Required extra fields |
58
+ |---|---|---|
59
+ | `command` | You have a Jenkins-built image and Jenkins already produces the full `execute restore image scp ...` command string. | `command` |
60
+ | `build` | You have a build number from the official build server. | `build_number` |
61
+ | `ga` | You want the latest GA build for a specific FortiNAC version. The system resolves the build number via `BuildHistory.GA=True`. | `version` |
62
+
63
+ ### Request
64
+
65
+ Content-Type: `application/json`.
66
+
67
+ ```json
68
+ {
69
+ "testbed": "AT16_Combined_FSW",
70
+ "mode": "command",
71
+ "command": "execute restore image scp /var/lib/jenkins/jobs/fortinac-build-7.6-1/builds/3983/archive/nacos/FNAC_ESX-v7-build6956-FORTINET_job1_3983.out 10.15.33.5 jenkins fortinet"
72
+ }
73
+ ```
74
+
75
+ ```json
76
+ {
77
+ "testbed": "AT16_Combined_FSW",
78
+ "mode": "build",
79
+ "build_number": "0815"
80
+ }
81
+ ```
82
+
83
+ ```json
84
+ {
85
+ "testbed": "AT16_Combined_FSW",
86
+ "mode": "ga",
87
+ "version": "7.6.5"
88
+ }
89
+ ```
90
+
91
+ Internal behavior:
92
+
93
+ 1. Looks up the testbed's `deployinfo`, extracts every device whose
94
+ name contains `nac` or `ncm`. A multi-NAC testbed gets one celery
95
+ task per device, running in parallel.
96
+ 2. `mode=ga` resolves `version` → `build_number` via `BuildHistory`,
97
+ then dispatches the same path as `mode=build`.
98
+ 3. Concurrency guard: HTTP **409** if any target IP is already
99
+ `PROGRESS`.
100
+ 4. `mode=command` validates the command starts with
101
+ `execute restore image` (security guard — keeps arbitrary CLI from
102
+ riding through this endpoint).
103
+
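Reviewer sketch: the mode table and the prefix guard in item 4 could be mirrored on the client before the POST, so that a malformed body is caught locally. This is only an illustration under assumptions; `checkUpgradeRequest` and its types are hypothetical names, not part of the TP API, and the server's own validation remains authoritative.

```typescript
// Hypothetical client-side pre-flight check; not part of the TP API itself.
type UpgradeMode = "command" | "build" | "ga";

interface UpgradeRequestBody {
  testbed: string;
  mode: UpgradeMode;
  command?: string;
  build_number?: string;
  version?: string;
}

// Extra field each mode requires, taken from the mode table.
const REQUIRED_FIELD: Record<UpgradeMode, "command" | "build_number" | "version"> = {
  command: "command",
  build: "build_number",
  ga: "version",
};

// Documented prefix that mode=command must start with.
const COMMAND_PREFIX = "execute restore image";

// Returns a list of problems; an empty list means the body looks well-formed.
function checkUpgradeRequest(body: UpgradeRequestBody): string[] {
  const problems: string[] = [];
  if (!body.testbed) problems.push("testbed is required");
  const field = REQUIRED_FIELD[body.mode];
  const value = body[field];
  if (!value) problems.push(`mode=${body.mode} requires "${field}"`);
  // Mirrors the documented server-side guard for mode=command.
  if (body.mode === "command" && value && value.indexOf(COMMAND_PREFIX) !== 0) {
    problems.push(`command must start with "${COMMAND_PREFIX}"`);
  }
  return problems;
}
```

A caller would send the request only when the returned list is empty, and would still need to handle HTTP 409 from the concurrency guard.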
104
+ ### Response (HTTP 202)
105
+
106
+ ```json
107
+ {
108
+ "testbed": "AT16_Combined_FSW",
109
+ "mode": "command",
110
+ "build_number": "",
111
+ "target_ips": ["10.15.52.152"],
112
+ "tasks": [
113
+ {"ip": "10.15.52.152", "celery_id": "f62a4cb7-d435-41de-adb6-12cca405d507"}
114
+ ]
115
+ }
116
+ ```
117
+
118
+ Returns in <500 ms. The actual upgrade work runs asynchronously in
119
+ celery workers — the HTTP call returns the moment the task is queued.
120
+
121
+ ### Poll for status
122
+
123
+ ```
124
+ GET /adc/automation/upgrade/<testbed>/
125
+ ```
126
+
127
+ ```json
128
+ {
129
+ "testbed": "AT16_Combined_FSW",
130
+ "status": "PROGRESS",
131
+ "target_ips": ["10.15.52.152"],
132
+ "per_device": [
133
+ {
134
+ "ip": "10.15.52.152",
135
+ "status": "PROGRESS",
136
+ "last_task_type": "command",
137
+ "last_build_number": null,
138
+ "updated_at": "2026-05-28T18:14:19+00:00",
139
+ "log_tail": "...last 2 KB of the NAC's SSH output..."
140
+ }
141
+ ]
142
+ }
143
+ ```
144
+
145
+ Aggregate `status` values:
146
+
147
+ | Value | Meaning |
148
+ |---|---|
149
+ | `UNKNOWN` | No upgrade has been dispatched against this testbed (yet). |
150
+ | `PROGRESS` | At least one device is still mid-upgrade. |
151
+ | `SUCCESS` | Every device finished with `SUCCESS`. |
152
+ | `FAILURE` | At least one device finished with `FAILURE` / `TIMEOUT` (and no device is still in flight). |
153
+
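Reviewer sketch: one way to read the aggregate table is as a fold over the per-device statuses. The function below is an assumption about how those rules compose, not the server's actual code; `aggregateUpgradeState` and both type names are hypothetical, and an empty device list is taken to mean that nothing has been dispatched.

```typescript
// Per-device terminal and in-flight states named in the table.
type DeviceUpgradeState = "PROGRESS" | "SUCCESS" | "FAILURE" | "TIMEOUT";
type AggregateUpgradeState = "UNKNOWN" | "PROGRESS" | "SUCCESS" | "FAILURE";

// Hypothetical reduction of per-device states to the aggregate status.
function aggregateUpgradeState(devices: DeviceUpgradeState[]): AggregateUpgradeState {
  // Assumption: no rows at all means no upgrade was ever dispatched.
  if (devices.length === 0) return "UNKNOWN";
  // Any device still in flight keeps the whole testbed in PROGRESS.
  if (devices.some((s) => s === "PROGRESS")) return "PROGRESS";
  // Nothing in flight: one FAILURE or TIMEOUT fails the testbed.
  if (devices.some((s) => s === "FAILURE" || s === "TIMEOUT")) return "FAILURE";
  return "SUCCESS";
}
```

Note the ordering: `PROGRESS` is checked before `FAILURE`, which matches the table's "and no device is still in flight" condition.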
154
+ `per_device[].log_tail` is the last ~2 KB of the device's SSH session
155
+ output, updated every ~3 s by the worker. Useful for live debugging
156
+ without flooding the response.
157
+
158
+ ### Concrete polling loop
159
+
160
+ ```bash
161
+ while :; do
162
+ R=$(curl -sH "Authorization: JWT $T" $TP/adc/automation/upgrade/AT16_Combined_FSW/)
163
+ STATUS=$(echo "$R" | jq -r .status)
164
+ echo "$(date +%H:%M:%S) $STATUS"
165
+ case "$STATUS" in
166
+ PROGRESS) sleep 15 ;;
167
+ SUCCESS) echo "upgrade complete"; break ;;
168
+ FAILURE) echo "upgrade FAILED"; echo "$R" | jq .per_device; exit 1 ;;
169
+ *) echo "$R"; exit 1 ;;
170
+ esac
171
+ done
172
+ ```
173
+
174
+ A real upgrade takes 2-15 minutes depending on the image and how many
175
+ devices the testbed has.
176
+
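Reviewer sketch: the shell loop above has no upper bound on how long it waits. A caller might add a deadline; the decision step could look like the following. `decideUpgradePoll`, `PollDecision`, and `maxWaitMs` are hypothetical names, and the cap is a caller-side choice rather than something the API enforces. Treating `UNKNOWN` as a failure follows the `*)` branch of the shell loop.

```typescript
type PollDecision = "wait" | "done" | "failed" | "timed_out";

// Decide what a polling client should do after one GET of the upgrade status.
// aggregate: the top-level `status` field from the response.
// elapsedMs: time since the upgrade was kicked off.
// maxWaitMs: caller-chosen cap (an assumption, not an API feature).
function decideUpgradePoll(aggregate: string, elapsedMs: number, maxWaitMs: number): PollDecision {
  if (aggregate === "SUCCESS") return "done";
  if (aggregate === "FAILURE") return "failed";
  if (elapsedMs >= maxWaitMs) return "timed_out";
  if (aggregate === "PROGRESS") return "wait";
  // UNKNOWN or any unexpected value: stop rather than spin.
  return "failed";
}
```

Given the stated 2-15 minute range, a cap somewhat above 15 minutes would be a plausible starting point; the right value depends on the testbed.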
177
+ ---
178
+
179
+ ## Step 2 — Run pytest cases on that testbed
180
+
181
+ ```
182
+ POST /adc/automation/pytest/ — kick off (HTTP 202)
183
+ GET /adc/automation/pytest/<exec_id>/ — poll status + results
184
+ ```
185
+
186
+ ### Request
187
+
188
+ Content-Type: `application/json`.
189
+
190
+ ```json
191
+ {
192
+ "user": "<TP username>",
193
+ "lab": "AT16_Combined_FSW",
194
+ "testcase": [
195
+ "/Tests_CLI/test_cli_sanity.py::TestGetBasic::test_get_system_status"
196
+ ],
197
+ "argument": "-vv --tb=short",
198
+ "extra_pytest_options": ""
199
+ }
200
+ ```
201
+
202
+ | Field | Type | Notes |
203
+ |---|---|---|
204
+ | `user` | string | TP username of the caller. Must own the lab (be in `users` or `usedby` on the `AutomationTBUser` row). |
205
+ | `lab` | string | AT lab name. Same `<testbed>` value as Step 1. |
206
+ | `testcase` | **list of strings** | Each entry is a pytest test-id path **relative to the tests repo root** (`/root/fnac_auto/tests` on the controller). Leading slash is OK; both `/Tests_CLI/...` and `Tests_CLI/...` work. The handler iterates the list, joins with spaces. |
207
+ | `argument` | string | Raw pytest CLI args injected between the testcase paths and the framework's `--html`/`--rack-file` flags. `-k`, `-m`, `-vv`, `--tb=short`, `--maxfail=N`, etc. |
208
+ | `extra_pytest_options` | string | (optional) Extra options appended *after* `--html`/`--rack-file` on the real run. Use for env-specific flags that shouldn't apply to the `--collect-only` dry run. |
209
+
210
+ ### Response (HTTP 202)
211
+
212
+ ```json
213
+ {
214
+ "exec_id": 446,
215
+ "status": "Initiating",
216
+ "lab": "AT16_Combined_FSW",
217
+ "testcase_count": 1
218
+ }
219
+ ```
220
+
221
+ Returns in ~300 ms. Save `exec_id` — it's the handle for Step 3.
222
+
223
+ ### What runs on the controller
224
+
225
+ The celery worker SFTP-uploads this script to the controller VM and
226
+ launches it under `nohup setsid`:
227
+
228
+ ```bash
229
+ #!/bin/bash
230
+ set +e
231
+ cd /root/fnac_auto/tests && git pull origin main || true
232
+ cd /root/fnac_auto/test-framework && git pull origin main || true
233
+ cd /root/fnac_auto
234
+ source venv/bin/activate
235
+ export PYTHONPATH=/root/fnac_auto/test-framework:/root/fnac_auto/tests
236
+ export DISPLAY=:99
237
+ pytest <expanded testcase paths> <argument> --collect-only
238
+ pytest <expanded testcase paths> <argument> --html=<report> --rack-file <rack> <extra_pytest_options>
239
+ echo $? > /tmp/pytest_<exec_id>.exit
240
+ ```
241
+
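Reviewer sketch: the two `pytest` lines in the script can be read as a composition of the request fields. The code below illustrates that ordering under assumptions: `buildPytestCommands`, `reportPath`, and `rackFile` are hypothetical names, the way a leading slash is handled and the prefixing with the tests root are guesses at what "expanded" means, and no shell quoting is attempted.

```typescript
interface PytestLaunchFields {
  testcase: string[];
  argument: string;
  extra_pytest_options?: string;
}

// Assumption: "expanded" means joined onto the tests repo root,
// with an optional leading slash dropped first.
function expandTestcase(path: string, testsRoot: string): string {
  const relative = path.charAt(0) === "/" ? path.slice(1) : path;
  return `${testsRoot}/${relative}`;
}

// Build the dry-run and real-run command lines in the documented order:
// paths, then `argument`, then framework flags, then extra options.
function buildPytestCommands(
  fields: PytestLaunchFields,
  reportPath: string,
  rackFile: string,
  testsRoot: string = "/root/fnac_auto/tests",
): { dryRun: string; realRun: string } {
  const paths = fields.testcase.map((p) => expandTestcase(p, testsRoot)).join(" ");
  const base = ["pytest", paths, fields.argument].filter((p) => p !== "").join(" ");
  const extra = fields.extra_pytest_options ?? "";
  return {
    // extra_pytest_options is deliberately absent from the dry run.
    dryRun: `${base} --collect-only`,
    realRun: [base, `--html=${reportPath}`, "--rack-file", rackFile, extra]
      .filter((p) => p !== "")
      .join(" "),
  };
}
```

The point of the sketch is the placement: `argument` applies to both runs, while `extra_pytest_options` reaches only the real run.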
242
+ Key properties:
243
+
244
+ - **Pytest is fully detached** from the launching ssh session via
245
+ `nohup setsid`. The ssh session can drop, the network can blip, the
246
+ TP backend can restart — pytest keeps running on the controller.
247
+ - **TP polls via short, separate ssh round-trips** to read the log
248
+ file and check for the terminal `/tmp/pytest_<exec_id>.exit` marker.
249
+ - **Surviving a TP backend restart** is automatic — the celery worker
250
+ is a separate process and just keeps polling. Verified end-to-end on
251
+ `.11` (exec_id 447: killed runserver mid-run, exec completed
252
+ normally and report was fetched back).
253
+
254
+ ### Concurrency guard
255
+
256
+ If another `PytestExecution` on the same lab has `status` =
257
+ `Running` or `Initiating`, the POST returns HTTP **409**:
258
+
259
+ ```json
260
+ {"error": "another execution is already in flight on 'AT16_Combined_FSW'"}
261
+ ```
262
+
263
+ ---
264
+
265
+ ## Step 3 — Read per-test results
266
+
267
+ ```
268
+ GET /adc/automation/pytest/<exec_id>/
269
+ ```
270
+
271
+ Same endpoint as the Step-2 poll — there's nothing distinct to call.
272
+
273
+ ### Response
274
+
275
+ ```json
276
+ {
277
+ "exec_id": 446,
278
+ "status": "Done",
279
+ "lab": "AT16_Combined_FSW",
280
+ "controller": "10.15.52.159",
281
+ "remote_pid": "177854",
282
+ "pass_count": 1,
283
+ "fail_count": 0,
284
+ "skip_count": 0,
285
+ "error_count": 0,
286
+ "total_count": 1,
287
+ "start_timestamp": "1780090011",
288
+ "end_timestamp": "1780090041",
289
+ "report_file_path": "data/AutomationTest/test_446_report_2026-05-29_21:26:50.html",
290
+ "log_file_path": "/tmp/pytest_446.log",
291
+ "log_size": 25338,
292
+ "log_tail": "...last 4 KB of pytest stdout..."
293
+ }
294
+ ```
295
+
296
+ | Field | Meaning |
297
+ |---|---|
298
+ | `status` | One of `Initiating`, `Running`, `Done`, `Failed`, `Cancelled`. **Treat `Done` as terminal — success/failure is inferred from counts.** |
299
+ | `pass_count`, `fail_count`, `skip_count`, `error_count`, `total_count` | Parsed from pytest's own summary line. Populated **mid-run** (updated every 15 s), so you can see partial progress without waiting for the run to finish. |
300
+ | `start_timestamp` / `end_timestamp` | Unix epoch seconds, written by the worker. |
301
+ | `report_file_path` | Server-relative path to the pytest-html report. Fetch via `<TP-base-url>/<report_file_path>` with the JWT for per-test rows and tracebacks. |
302
+ | `log_file_path` | On-controller path to the live log. Mostly informational — use `log_tail` to read the last 4 KB without an extra round-trip. |
303
+ | `log_tail` | Last 4 KB of pytest stdout. |
304
+ | `remote_pid` | The detached pytest tree's pid on the controller. Useful if you ever need to ssh to the controller and kill it manually. |
305
+
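Reviewer sketch: because `Done` only means the run finished, a client has to combine `status` with the counts to get a verdict. The helper below is a minimal illustration of that rule; `classifyExecution`, `PytestExecutionView`, and `ExecutionVerdict` are hypothetical names, and mapping any unrecognized status to `aborted` is an assumption.

```typescript
// Subset of the GET response needed for a verdict.
interface PytestExecutionView {
  status: string;
  fail_count: number;
  error_count: number;
}

type ExecutionVerdict = "pending" | "passed" | "failed" | "aborted";

// Combine `status` with the counts, as the field table prescribes.
function classifyExecution(view: PytestExecutionView): ExecutionVerdict {
  switch (view.status) {
    case "Initiating":
    case "Running":
      return "pending";
    case "Done":
      // Done is terminal; success is inferred from the counts.
      return view.fail_count > 0 || view.error_count > 0 ? "failed" : "passed";
    default:
      // Failed, Cancelled, or an unrecognized status (assumption).
      return "aborted";
  }
}
```

A polling client would keep sleeping on `pending` and stop on any other verdict.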
306
+ ### Concrete polling loop
307
+
308
+ ```bash
309
+ EXEC=$(curl -sX POST -H "Authorization: JWT $T" \
310
+ -H 'Content-Type: application/json' \
311
+ -d '{"user":"alice","lab":"AT16_Combined_FSW","testcase":["/Tests_CLI/test_cli_sanity.py::TestGetBasic::test_get_system_status"],"argument":"-vv"}' \
312
+ $TP/adc/automation/pytest/ | jq -r .exec_id)
313
+ echo "started exec $EXEC"
314
+
315
+ while :; do
316
+ R=$(curl -sH "Authorization: JWT $T" $TP/adc/automation/pytest/$EXEC/)
317
+ STATUS=$(echo "$R" | jq -r .status)
318
+ COUNTS=$(echo "$R" | jq -r '"P\(.pass_count)/F\(.fail_count)/S\(.skip_count)/E\(.error_count) of T\(.total_count)"')
319
+ echo "$(date +%H:%M:%S) $STATUS $COUNTS"
320
+ case "$STATUS" in
321
+ Initiating|Running) sleep 15 ;;
322
+ Done)
323
+ FAIL=$(echo "$R" | jq -r .fail_count)
324
+ ERR=$(echo "$R" | jq -r .error_count)
325
+ if [ "$FAIL" -gt 0 ] || [ "$ERR" -gt 0 ]; then
326
+ echo "tests had failures — fetch report at $TP/$(echo "$R" | jq -r .report_file_path)"
327
+ exit 1
328
+ fi
329
+ echo "all pass"; break ;;
330
+ Failed|Cancelled)
331
+ echo "execution terminated: $STATUS"
332
+ echo "$R" | jq -r .log_tail; exit 1 ;;
333
+ *) echo "unexpected status: $STATUS"; exit 1 ;;
+ esac
334
+ done
335
+ ```
336
+
337
+ ---
338
+
339
+ ## End-to-end script
340
+
341
+ ```bash
342
+ #!/usr/bin/env bash
343
+ set -euo pipefail
344
+ TP=http://10.15.33.11:8000
345
+ USER=admin
346
+ TESTBED=AT16_Combined_FSW
347
+
348
+ T=$(curl -s -X POST $TP/token-auth/ -H 'Content-Type: application/json' \
349
+ --data-binary @- <<JSON | jq -r .token
350
+ {"username":"$USER","password":"<your-pw>"}
351
+ JSON
352
+ )
353
+
354
+ # ---- Step 1: upgrade testbed ----
355
+ COMMAND='execute restore image scp /var/lib/jenkins/jobs/fortinac-build-7.6-1/builds/3983/archive/nacos/FNAC_ESX-v7-build6956-FORTINET_job1_3983.out 10.15.33.5 jenkins fortinet'
356
+
357
+ curl -s -X POST -H "Authorization: JWT $T" \
358
+ -H 'Content-Type: application/json' \
359
+ -d "$(jq -n --arg cmd "$COMMAND" --arg tb "$TESTBED" \
360
+ '{testbed:$tb, mode:"command", command:$cmd}')" \
361
+ $TP/adc/automation/upgrade/ > /dev/null
362
+
363
+ while :; do
364
+ R=$(curl -sH "Authorization: JWT $T" $TP/adc/automation/upgrade/$TESTBED/)
365
+ S=$(echo "$R" | jq -r .status)
366
+ echo "upgrade: $S"
367
+ [ "$S" = "SUCCESS" ] && break
368
+ [ "$S" = "FAILURE" ] && { echo "$R" | jq .per_device; exit 1; }
369
+ sleep 30
370
+ done
371
+
372
+ # ---- Step 2 + 3: run tests, collect results ----
373
+ EXEC=$(curl -s -X POST -H "Authorization: JWT $T" \
374
+ -H 'Content-Type: application/json' \
375
+ -d "$(jq -n --arg tb "$TESTBED" --arg user "$USER" \
376
+ '{user:$user, lab:$tb,
377
+ testcase:["/Tests_CLI/test_cli_sanity.py::TestGetBasic::test_get_system_status"],
378
+ argument:"-vv --tb=short"}')" \
379
+ $TP/adc/automation/pytest/ | jq -r .exec_id)
380
+ echo "started exec $EXEC"
381
+
382
+ while :; do
383
+ R=$(curl -sH "Authorization: JWT $T" $TP/adc/automation/pytest/$EXEC/)
384
+ S=$(echo "$R" | jq -r .status)
385
+ echo "pytest: $S $(echo "$R" | jq -r '"P\(.pass_count)/F\(.fail_count) of T\(.total_count)"')"
386
+ case "$S" in
387
+ Done)
388
+ F=$(echo "$R" | jq -r .fail_count)
389
+ E=$(echo "$R" | jq -r .error_count)
390
+ [ "$F" -gt 0 ] || [ "$E" -gt 0 ] && {
391
+ echo "report: $TP/$(echo "$R" | jq -r .report_file_path)"
392
+ exit 1
393
+ }
394
+ echo "PASS"; break ;;
395
+ Failed|Cancelled)
396
+ echo "exec terminated: $S"; echo "$R" | jq -r .log_tail; exit 1 ;;
397
+ *) sleep 15 ;;
398
+ esac
399
+ done
400
+ ```
401
+
402
+ ---
403
+
404
+ ## Behaviors worth knowing
405
+
406
+ ### Concurrency
407
+
408
+ For each `<testbed>`:
409
+ - **Upgrade**: HTTP 409 if any target IP currently has
410
+ `SingleDevice.last_task_status='PROGRESS'`.
411
+ - **Pytest**: HTTP 409 if any `PytestExecution` with this `lab` has
412
+ `status` in `{Running, Initiating}`.
413
+
414
+ You can have an upgrade AND a pytest running simultaneously on
415
+ *different* testbeds. The guards are per-testbed.
416
+
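Reviewer sketch: the two guards can be expressed as predicates over the rows they inspect. This is an illustration of the documented conditions only, not the server's implementation; `DeviceRow`, `ExecutionRow`, and both function names are hypothetical, and only the fields the guards mention are modeled.

```typescript
// Minimal stand-ins for the SingleDevice and PytestExecution rows.
interface DeviceRow {
  ip: string;
  last_task_status: string;
}

interface ExecutionRow {
  lab: string;
  status: string;
}

// Upgrade guard: conflict if any target IP is currently in PROGRESS.
function upgradeWouldConflict(devices: DeviceRow[], targetIps: string[]): boolean {
  return devices.some(
    (d) => targetIps.indexOf(d.ip) !== -1 && d.last_task_status === "PROGRESS",
  );
}

// Pytest guard: conflict if the same lab has a Running or Initiating execution.
function pytestWouldConflict(executions: ExecutionRow[], lab: string): boolean {
  return executions.some(
    (e) => e.lab === lab && (e.status === "Running" || e.status === "Initiating"),
  );
}
```

Both predicates are scoped to one testbed's rows, which is why work on a different testbed is never blocked.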
417
+ ### Survival semantics
418
+
419
+ | Failure | Upgrade | Pytest v2 |
420
+ |---|---|---|
421
+ | ssh session drops mid-run | N/A — TP only ssh's briefly | Pytest keeps running (`nohup setsid` on controller) |
422
+ | TP backend restart mid-run | Celery worker keeps going, completes the run, writes terminal state | Same — celery `pytest_poll` keeps short-polling the controller |
423
+ | Controller VM reboot mid-run | Upgrade reboots the NAC; this is expected/desired. | Pytest is lost (controller died); polling will eventually fail with the max-runtime safety net (6 h cap). |
424
+ | Celery worker restart | In-flight task may be lost (Celery's `acks_late` defaults to `False`, so a task is acknowledged before it runs and is not redelivered). Worth a follow-up. | Same. |
425
+
426
+ ### Per-test pass/fail visibility
427
+
428
+ The `passed_nodes` / `failed_nodes` fields on the legacy `pytest_run`
429
+ response shape don't apply to v2. v2 surfaces:
430
+
431
+ - **aggregate counts** (`pass_count`, `fail_count`, etc.) in the GET
432
+ response — populated mid-run as pytest emits them
433
+ - **the HTML report** at `report_file_path` — the standard
434
+ pytest-html document with per-test rows, durations, tracebacks, and
435
+ captured logs. Fetch via `<TP-base-url>/<report_file_path>` with the JWT.
436
+
437
+ If you need per-test JSON instead of the HTML report, ask — it's a
438
+ trivial extra endpoint that parses the HTML and returns
439
+ `{passed: [...], failed: [...]}`.
440
+
441
+ ### Test-case path format
442
+
443
+ Whatever you'd type after `pytest` on the command line, prefixed (or
444
+ not) with `/`. Relative to `/root/fnac_auto/tests/` on the controller.
445
+
446
+ | Want to | Use |
447
+ |---|---|
448
+ | Run one test | `["/Tests_CLI/test_cli_sanity.py::TestGetBasic::test_get_system_status"]` |
449
+ | Run several | `["/Tests_CLI/test_cli_sanity.py::TestGetBasic::test_get_system_status", "/Tests_CLI/test_cli_sanity.py::TestShowFullConfig::test_nacos_show_config"]` |
450
+ | Run a whole file | `["/Tests_CLI/test_cli_sanity.py"]` |
451
+ | Run by marker | `["/Tests_CLI"]` + `argument: "-m smoke"` |
452
+ | Dry-run / list | `["/Tests_CLI"]` + `argument: "--collect-only"` |
453
+
454
+ Tip: discover paths via `GET /adc/get_testcases` (legacy endpoint;
455
+ returns the full test tree). Note: this endpoint does a `git pull` on
456
+ TP's local mirror of the tests repo as a side effect — fine in normal
457
+ use, just be aware.
458
+
459
+ ---
460
+
461
+ ## Source files
462
+
463
+ | Concern | File |
464
+ |---|---|
465
+ | URL routes | `backend/adc/urls.py` |
466
+ | Upgrade API | `backend/adc/views/automation/upgrade_api.py` |
467
+ | Upgrade tasks | `backend/adc/tasks.py` (`upgrade_nac_command`, `upgrade_nac_build`) |
468
+ | Pytest v2 API | `backend/adc/views/automation/pytest_api.py` |
469
+ | Pytest v2 tasks | `backend/adc/tasks.py` (`pytest_launch`, `pytest_poll`) |
470
+ | Per-device state | `backend/adc/models.py` (`SingleDevice`, `PytestExecution`) |
471
+ | GA-build resolution | `backend/adc/models.py` (`BuildHistory.GA=True`) |
472
+
473
+ Legacy endpoints (still live, used by the `/automation` UI page, kept
474
+ for backward compatibility):
475
+
476
+ - `POST /adc/nac-upgrade-testbed/`
477
+ - `POST /adc/pytest_run`
478
+ - `POST /adc/get_test_execution_by_id`
479
+
480
+ Avoid these for new automation — they're synchronous-in-the-request,
481
+ don't survive TP restarts, and the upgrade variant blocks the HTTP
482
+ request for the full restore duration.
@@ -26,6 +26,7 @@ import {
26
26
  import { getMemoryStore } from './memory-store';
27
27
  import { buildMemoryContext } from './build-memory-context';
28
28
  import { buildMemoryTools } from './memory-tools';
29
+ import { buildStartWatchTool } from '../watch/start-watch-tool';
29
30
  import { estimateTokens } from '../memory/token-estimate';
30
31
  import {
31
32
  listInstalledConnectors,
@@ -48,10 +49,25 @@ const MAX_TOKENS = 16000;
48
49
  // and recalled via buildMemoryContext as compact blocks instead.
49
50
  const HISTORY_MSG_BUDGET = 60;
50
51
  const HISTORY_TOKEN_BUDGET = 8000;
52
+ // Hard cap on a single tool_result stored into the conversation (chars).
53
+ // A giant result (e.g. a connector returning a full test tree) would
54
+ // otherwise blow the whole HISTORY_TOKEN_BUDGET, push its paired
55
+ // assistant tool_use out of the window, and leave an orphan tool_result
56
+ // that trimOrphanToolResults strips — yielding an empty history and an
57
+ // "messages must not be empty" provider error. ~16k chars ≈ 4k tokens,
58
+ // half the budget, so a complete tool_use+result pair always survives.
59
+ const MAX_TOOL_RESULT_CHARS = 16000;
51
60
 
52
61
  // After clipping to last N, the first kept message may be a tool_result
53
62
  // whose tool_use was cut. Anthropic/OpenAI both reject that, so drop
54
63
  // leading tool_result-bearing user messages until the slice starts clean.
64
+ function truncateToolResult(s: string): string {
65
+ if (s.length <= MAX_TOOL_RESULT_CHARS) return s;
66
+ return s.slice(0, MAX_TOOL_RESULT_CHARS) +
67
+ `\n\n[… tool result truncated: ${s.length} chars total, showing first ${MAX_TOOL_RESULT_CHARS}. ` +
68
+ `Refine the call (filter / paginate / flatten) to get a smaller, complete result.]`;
69
+ }
70
+
55
71
  function trimOrphanToolResults(history: Message[]): Message[] {
56
72
  let i = 0;
57
73
  while (i < history.length) {
@@ -73,6 +89,7 @@ export interface AgentEvent {
73
89
  | 'message_saved' // a full message persisted (assistant or tool-results carrier)
74
90
  | 'memory_status' // pinned/blocks/hits snapshot from Temper for the UI strip
75
91
  | 'turn_done' // loop finished
92
+ | 'watch_status' // ambient background-watch progress (status chip, NOT a message)
76
93
  | 'error'; // unrecoverable
77
94
  message_id?: string;
78
95
  data?: any;
@@ -392,6 +409,11 @@ export async function runTurn(args: RunTurnArgs): Promise<{ ok: boolean; error?:
392
409
  const memHandlers: Record<string, BuiltinHandler> = {};
393
410
  for (const t of memTools) memHandlers[t.def.name] = t.handle;
394
411
 
412
+ // start_watch — LLM-driven background watch (always available). Bound
413
+ // to this session so completion reports back here.
414
+ const watchTool = buildStartWatchTool(args.sessionId);
415
+ memHandlers[watchTool.def.name] = watchTool.handle;
416
+
395
417
  if (memStore.enabled) {
396
418
  // Inspector strip (memory_status event) wants the full inventory —
397
419
  // keep its own listBlocks call. The prompt-injection text comes
@@ -466,6 +488,7 @@ export async function runTurn(args: RunTurnArgs): Promise<{ ok: boolean; error?:
466
488
  const builtinDefsAll = [
467
489
  ...BUILTIN_TOOL_DEFS,
468
490
  ...memTools.map((m) => m.def),
491
+ watchTool.def,
469
492
  ];
470
493
  const allTools: LlmTool[] = [
471
494
  ...builtinDefsAll.map((t) => ({
@@ -500,6 +523,13 @@ export async function runTurn(args: RunTurnArgs): Promise<{ ok: boolean; error?:
500
523
  const history = trimOrphanToolResults(
501
524
  listMessagesCapped(args.sessionId, HISTORY_MSG_BUDGET, HISTORY_TOKEN_BUDGET, estimateTokens),
502
525
  );
526
+ // Belt-and-suspenders: tool_result truncation should keep a complete
527
+ // pair in-window, but if history is somehow empty, fail clearly
528
+ // instead of letting the provider throw "messages must not be empty".
529
+ if (history.length === 0) {
530
+ cb({ type: 'error', data: { error: 'Conversation context is empty after trimming an oversized result. Clear the chat or retry with a narrower query.' } });
531
+ return { ok: false, error: 'empty history' };
532
+ }
503
533
 
504
534
  assistantBlocksAccum = [];
505
535
  let currentTextBuf = '';
@@ -543,11 +573,11 @@ export async function runTurn(args: RunTurnArgs): Promise<{ ok: boolean; error?:
543
573
  const toolUses = result.content.filter((b): b is ToolUseBlock => b.type === 'tool_use');
544
574
  const toolResults: ToolResultBlock[] = [];
545
575
  for (const t of toolUses) {
546
- const r = await dispatchTool({ id: t.id, name: t.name, input: t.input }, memHandlers);
576
+ const r = await dispatchTool({ id: t.id, name: t.name, input: t.input }, { extraBuiltins: memHandlers, sessionId: args.sessionId });
547
577
  const block: ToolResultBlock = {
548
578
  type: 'tool_result',
549
579
  tool_use_id: t.id,
550
- content: r.content,
580
+ content: truncateToolResult(r.content),
551
581
  is_error: r.is_error,
552
582
  };
553
583
  toolResults.push(block);
@@ -483,6 +483,12 @@ export interface DispatchOptions {
483
483
  * therefore don't need an LLM-friendly truncation.
484
484
  */
485
485
  noTruncation?: boolean;
486
+ /** Chat session that triggered this call — used to register a watch
487
+ * (async tools) bound to the right session for completion callbacks. */
488
+ sessionId?: string;
489
+ /** Remaining chain budget for async watch callbacks. Defaults to the
490
+ * full depth at the top level; decremented when a watch chains a tool. */
491
+ chainDepth?: number;
486
492
  }
487
493
 
488
494
  export async function dispatchTool(
@@ -567,30 +573,64 @@ export async function dispatchTool(
567
573
  }
568
574
 
569
575
  try {
576
+ let result: ToolResult;
570
577
  switch (protocol) {
571
578
  case 'http':
572
- return await runHttp({ tool: located.tool, settings: effectiveSettings, args: argInput, connectorAuth: def.auth, noTruncation: opts.noTruncation });
579
+ result = await runHttp({ tool: located.tool, settings: effectiveSettings, args: argInput, connectorAuth: def.auth, noTruncation: opts.noTruncation });
580
+ break;
573
581
  case 'shell':
574
- return await runShell({ tool: located.tool, settings: effectiveSettings, args: argInput });
582
+ result = await runShell({ tool: located.tool, settings: effectiveSettings, args: argInput });
583
+ break;
575
584
  case 'ssh':
576
- return await runSsh({ tool: located.tool, settings: effectiveSettings, args: argInput });
585
+ result = await runSsh({ tool: located.tool, settings: effectiveSettings, args: argInput });
586
+ break;
577
587
  case 'browser': {
578
588
  // Hand the whole connector + tool spec + input + settings to the
579
589
  // extension's runner.ts via the bridge. The extension keeps owning
580
590
  // the runner logic (tab acquire, navigate, executeScript).
581
591
  const connector = buildConnectorPayload(def, located.entry, effectiveSettings);
582
- const result = (await bridgeRpc('connector.run', {
592
+ const r = (await bridgeRpc('connector.run', {
583
593
  pluginId: located.connectorId, // wire-name kept for extension
584
594
  toolName: located.toolName,
585
595
  input: argInput,
586
596
  connector,
587
597
  settings: effectiveSettings,
588
598
  }, located.tool.timeout_ms)) as { content?: string; is_error?: boolean } | null;
589
- return { content: result?.content ?? '(no content returned)', is_error: !!result?.is_error };
599
+ result = { content: r?.content ?? '(no content returned)', is_error: !!r?.is_error };
600
+ break;
590
601
  }
591
602
  default:
592
603
  return { content: `unknown protocol "${protocol}" on tool ${call.name}`, is_error: true };
593
604
  }
605
+
606
+ // Async (long-task watch): if the tool declared an `async` block and
607
+ // it ran without error, register a background watch that polls to
608
+ // completion and reports back to the originating chat session. The
609
+ // tool's own result is returned to the caller immediately (detach).
610
+ if (located.tool.async && !result.is_error) {
611
+ try {
612
+ const { registerWatch, DEFAULT_CHAIN_DEPTH } = await import('../watch/register');
613
+ let parsed: unknown = result.content;
614
+ try { parsed = JSON.parse(result.content); } catch { /* keep string */ }
615
+ const reg = registerWatch({
616
+ spec: located.tool.async,
617
+ connectorId: located.connectorId,
618
+ toolName: located.toolName,
619
+ args: argInput,
620
+ result: parsed,
621
+ settings: effectiveSettings,
622
+ sessionId: opts.sessionId ?? null,
623
+ chainDepth: opts.chainDepth ?? DEFAULT_CHAIN_DEPTH,
624
+ });
625
+ const note = reg.ok
626
+ ? `\n\n[watch ${reg.watch_id} registered — polling in the background; you'll get a chat update on completion.]`
627
+ : `\n\n[watch not registered: ${reg.reason}]`;
628
+ result = { ...result, content: result.content + note };
629
+ } catch (e) {
630
+ console.warn('[dispatch] registerWatch failed', (e as Error).message);
631
+ }
632
+ }
633
+ return result;
594
634
  } catch (e) {
595
635
  return { content: `connector tool failed: ${(e as Error).message}`, is_error: true };
596
636
  }
@@ -37,6 +37,7 @@ import {
37
37
  } from './chat/session-store';
38
38
  import { runTurn, type AgentEvent } from './chat/agent-loop';
39
39
  import { bridgePush } from './chat/bridge-client';
40
+ import { startWatchRunner } from './watch/watch-runner';
40
41
 
41
42
  const PORT = Number(process.env.CHAT_PORT) || 8408;
42
43
  const startTime = Date.now();
@@ -302,6 +303,17 @@ httpServer.listen(PORT, '127.0.0.1', () => {
302
303
  const main = ensureMainSession();
303
304
  console.log(`[chat] Main session: ${main.id.slice(0, 8)} "${main.title}"`);
304
305
  } catch (e) { console.warn('[chat] ensureMainSession failed:', (e as Error).message); }
306
+
307
+ // Background long-task watches: poll to completion, then feed the
308
+ // result back into the originating session (assistant replies) or push
309
+ // ambient progress (status chip, not a message).
310
+ startWatchRunner({
311
+ onProgress: (sessionId, payload) => fanoutEvent(sessionId, { type: 'watch_status', data: payload }),
312
+ runChat: (sessionId, text) => {
313
+ void runTurn({ sessionId, userText: text, callbacks: { onEvent: (e) => fanoutEvent(sessionId, e) } })
314
+ .catch((err) => console.error('[watch] runChat failed', (err as Error).message));
315
+ },
316
+ });
305
317
  });
306
318
 
307
319
  function shutdown(): void {