npm - @newsails/veil-cli - Versions diffs - 1.0.1 - Mend

@newsails/veil-cli 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (199)

package/.veil/agents/analyst/AGENT.md +21 -0
package/.veil/agents/analyst/agent.json +23 -0
package/.veil/agents/assistant/AGENT.md +15 -0
package/.veil/agents/assistant/agent.json +19 -0
package/.veil/agents/coder/AGENT.md +18 -0
package/.veil/agents/coder/agent.json +19 -0
package/.veil/agents/hello/AGENT.md +5 -0
package/.veil/agents/hello/agent.json +13 -0
package/.veil/agents/writer/AGENT.md +12 -0
package/.veil/agents/writer/agent.json +17 -0
package/.veil/memory/MEMORY.md +343 -0
package/.veil/memory/agents/analyst/MEMORY.md +55 -0
package/.veil/memory/agents/hello/MEMORY.md +12 -0
package/.veil/runtime.pid +1 -0
package/.veil/settings.json +10 -0
package/.veil-studio/studio.db +0 -0
package/.veil-studio/studio.db-shm +0 -0
package/.veil-studio/studio.db-wal +0 -0
package/PLAN/01-vision.md +26 -0
package/PLAN/02-tech-stack.md +94 -0
package/PLAN/03-agents.md +232 -0
package/PLAN/04-runtime.md +171 -0
package/PLAN/05-tools.md +211 -0
package/PLAN/06-communication.md +243 -0
package/PLAN/07-storage.md +218 -0
package/PLAN/08-api-cli.md +153 -0
package/PLAN/09-permissions.md +108 -0
package/PLAN/10-ably.md +105 -0
package/PLAN/11-file-formats.md +442 -0
package/PLAN/12-folder-structure.md +205 -0
package/PLAN/13-operations.md +212 -0
package/PLAN/README.md +23 -0
package/README.md +128 -0
package/REPORT.md +174 -0
package/TODO.md +45 -0
package/ai-tests/FRONTEND_PROMPT.md +220 -0
package/ai-tests/Research & Planning.md +814 -0
package/ai-tests/prompt-001-basic-api.md +230 -0
package/ai-tests/prompt-002-basic-flows.md +230 -0
package/ai-tests/prompt-003-agent-behaviors.md +220 -0
package/api/middleware.js +60 -0
package/api/routes/agents.js +193 -0
package/api/routes/chat.js +93 -0
package/api/routes/completions.js +122 -0
package/api/routes/daemons.js +80 -0
package/api/routes/memory.js +169 -0
package/api/routes/models.js +40 -0
package/api/routes/remote-methods.js +74 -0
package/api/routes/sessions.js +208 -0
package/api/routes/settings.js +108 -0
package/api/routes/system.js +50 -0
package/api/routes/tasks.js +270 -0
package/api/server.js +120 -0
package/cli/formatter.js +70 -0
package/cli/index.js +443 -0
package/cli/parser.js +113 -0
package/config/config.json +10 -0
package/config/models.json +6826 -0
package/core/agent.js +329 -0
package/core/cancel.js +38 -0
package/core/compaction.js +176 -0
package/core/events.js +13 -0
package/core/loop.js +564 -0
package/core/memory.js +51 -0
package/core/prompt.js +185 -0
package/core/queue.js +96 -0
package/core/registry.js +291 -0
package/core/remote-methods.js +124 -0
package/core/router.js +386 -0
package/core/running-sessions.js +18 -0
package/docs/api/01-system.md +84 -0
package/docs/api/02-agents.md +374 -0
package/docs/api/03-chat.md +269 -0
package/docs/api/04-tasks.md +470 -0
package/docs/api/05-sessions.md +444 -0
package/docs/api/06-daemons.md +142 -0
package/docs/api/07-memory.md +186 -0
package/docs/api/08-settings.md +133 -0
package/docs/api/09-models.md +119 -0
package/docs/api/09-websocket.md +350 -0
package/docs/api/10-completions.md +134 -0
package/docs/api/README.md +116 -0
package/docs/guide/01-quickstart.md +220 -0
package/docs/guide/02-folder-structure.md +185 -0
package/docs/guide/03-configuration.md +252 -0
package/docs/guide/04-agents.md +267 -0
package/docs/guide/05-cli.md +290 -0
package/docs/guide/06-tools.md +643 -0
package/docs/guide/07-permissions.md +236 -0
package/docs/guide/08-memory.md +139 -0
package/docs/guide/09-multi-agent.md +271 -0
package/docs/guide/10-daemons.md +226 -0
package/docs/guide/README.md +53 -0
package/docs/index.html +623 -0
package/examples/README.md +151 -0
package/examples/agents/assistant/AGENT.md +31 -0
package/examples/agents/assistant/SOUL.md +9 -0
package/examples/agents/assistant/agent.json +74 -0
package/examples/agents/hello/AGENT.md +15 -0
package/examples/agents/hello/agent.json +14 -0
package/examples/agents/monitor/AGENT.md +51 -0
package/examples/agents/monitor/agent.json +33 -0
package/examples/agents/monitor/heartbeats/monitor.md +24 -0
package/examples/agents/orchestrator/AGENT.md +70 -0
package/examples/agents/orchestrator/agent.json +30 -0
package/examples/agents/researcher/AGENT.md +52 -0
package/examples/agents/researcher/agent.json +49 -0
package/examples/agents/researcher/skills/web-research.md +28 -0
package/examples/skills/code-review.md +72 -0
package/examples/skills/summarise.md +59 -0
package/examples/skills/web-research.md +42 -0
package/examples/tools/word-count/index.js +27 -0
package/examples/tools/word-count/tool.json +18 -0
package/infrastructure/database.js +563 -0
package/infrastructure/scheduler.js +122 -0
package/llm/client.js +206 -0
package/migrations/001-initial.sql +121 -0
package/migrations/002-debuggability.sql +13 -0
package/migrations/003-drop-orphaned-columns.sql +72 -0
package/migrations/004-session-message-token-fields.sql +78 -0
package/migrations/005-session-thinking.sql +5 -0
package/package.json +30 -0
package/schemas/agent.json +143 -0
package/schemas/settings.json +111 -0
package/scripts/fetch-models.js +93 -0
package/session-debug-scenario.md +248 -0
package/settings/fields.js +52 -0
package/system-prompts/base-core.md +7 -0
package/system-prompts/environment.md +13 -0
package/system-prompts/reminders/anti-drift.md +6 -0
package/system-prompts/reminders/stall-recovery.md +10 -0
package/system-prompts/safety-rules.md +25 -0
package/system-prompts/task-heuristics.md +27 -0
package/test/client.js +71 -0
package/test/integration/01-health.test.js +25 -0
package/test/integration/02-agents.test.js +80 -0
package/test/integration/03-chat-hello.test.js +48 -0
package/test/integration/04-chat-multiturn.test.js +61 -0
package/test/integration/05-chat-writer.test.js +48 -0
package/test/integration/06-task-basic.test.js +68 -0
package/test/integration/07-task-tools.test.js +74 -0
package/test/integration/08-task-code-analysis.test.js +69 -0
package/test/integration/09-memory-analyst.test.js +63 -0
package/test/integration/10-task-advanced.test.js +85 -0
package/test/integration/11-sessions-advanced.test.js +84 -0
package/test/integration/12-assistant-chat-tools.test.js +75 -0
package/test/integration/13-edge-cases.test.js +99 -0
package/test/integration/14-cancel.test.js +62 -0
package/test/integration/15-debug.test.js +106 -0
package/test/integration/16-memory-api.test.js +83 -0
package/test/integration/17-settings-api.test.js +41 -0
package/test/integration/18-tool-search-activation.test.js +119 -0
package/test/results/.gitkeep +0 -0
package/test/runner.js +206 -0
package/test/smoke.js +216 -0
package/tools/agent_message.js +85 -0
package/tools/agent_send.js +80 -0
package/tools/agent_spawn.js +44 -0
package/tools/bash.js +49 -0
package/tools/edit_file.js +41 -0
package/tools/glob.js +64 -0
package/tools/grep.js +82 -0
package/tools/list_dir.js +63 -0
package/tools/log_write.js +31 -0
package/tools/memory_read.js +38 -0
package/tools/memory_search.js +65 -0
package/tools/memory_write.js +42 -0
package/tools/read_file.js +48 -0
package/tools/sleep.js +22 -0
package/tools/task_create.js +41 -0
package/tools/task_respond.js +37 -0
package/tools/task_spawn.js +64 -0
package/tools/task_status.js +39 -0
package/tools/task_subscribe.js +37 -0
package/tools/todo_read.js +26 -0
package/tools/todo_write.js +38 -0
package/tools/tool_activate.js +24 -0
package/tools/tool_search.js +24 -0
package/tools/web_fetch.js +50 -0
package/tools/web_search.js +52 -0
package/tools/write_file.js +28 -0
package/ui/api.js +190 -0
package/ui/app.js +281 -0
package/ui/index.html +382 -0
package/ui/views/agents.js +377 -0
package/ui/views/chat.js +610 -0
package/ui/views/connection.js +96 -0
package/ui/views/daemons.js +129 -0
package/ui/views/feed.js +194 -0
package/ui/views/memory.js +263 -0
package/ui/views/models.js +146 -0
package/ui/views/sessions.js +314 -0
package/ui/views/settings.js +142 -0
package/ui/views/tasks.js +415 -0
package/utils/context.js +49 -0
package/utils/id.js +16 -0
package/utils/models.js +88 -0
package/utils/paths.js +213 -0
package/utils/settings.js +172 -0

package/ai-tests/prompt-001-basic-api.md ADDED

@@ -0,0 +1,230 @@
+# Test Prompt 001 — Group 1: Basic API (HTTP Contract Tests)
+**Phase:** 1
+**Focus:** Pure HTTP contract verification — API surface, field shapes, auth enforcement, CRUD persistence
+**Agents running tasks:** No — this phase does not ask agents to do intelligent work
+---
+## Your Role
+You are a test engineer executing Phase 1 of the VeilCLI test suite. Set up a clean workspace, start the server, and run HTTP contract tests against the live API. You decide how to implement the tests — curl, scripts, Node.js, a mix — whatever gets the job done cleanly. What matters is the quality of verification, not the implementation method.
+Do NOT just check HTTP status codes and call it a day. Each test below specifies what a **genuine pass** looks like. A test that returns HTTP 200 but doesn't verify the actual field values is not a passing test — it's a false positive.
+---
+## Environment
+- **VeilCLI source:** `/home/ixi/khacloud/drive/Plugins/VeilCli`
+- **How to invoke:** Try `veil` first (it is globally linked)
+- **Reference auth.json** (contains real API keys — you must copy this): `/home/ixi/khacloud/drive/Plugins/VeilCli/.veil/auth.json`
+- **API DOCS:** `/home/ixi/khacloud/drive/Plugins/VeilCli/docs/api/` (check them to ensure your tests are right)
+---
+## ⚠️ Non-Blocking Execution Rule
+**Any command that takes time must be run in the background and monitored by polling — never block waiting for it.**
+This applies to:
+- Starting the server (`veil start` → run in background, poll `GET /health` in a loop until ready)
+- Running test scripts — if a script can hang, run it with a timeout or in background and tail its output
+If a command blocks and gets stuck, you get stuck too. Background + poll is the pattern for everything time-sensitive in this phase.
+---
+## Step 1 — Create the Test Workspace
+Create the workspace at:
+```
+/home/ixi/khacloud/drive/Plugins/VeilCli_TESTS/workspace-test-001
+```
+Inside it, create `.veil/settings.json` with the following intent:
+- Port **5151** (never 5050 — that's the default and may conflict with a running instance)
+- A secret value of your choice (you'll need it for the auth tests — write it down)
+- Reasonable iteration and duration limits for tests
+- Permissive tool permissions (allow all)
+Copy the reference `auth.json` into `.veil/auth.json`. Do **not** copy agents, memory, or anything else — the workspace must be clean.
+---
+## Step 2 — Start the Server
+From **inside** `/home/ixi/khacloud/drive/Plugins/VeilCli_TESTS/workspace-test-001`, run `veil start` in the background. The server reads workspace config from the current directory.
+Confirm the server is up by polling `GET http://localhost:5151/health`. Do not proceed until it responds. If it doesn't respond within 20 seconds, treat this as a fatal failure, capture the startup output, and stop.
+---
+## Step 3 — Run the Tests
+Run all tests below and record results. All requests go to `http://localhost:5151`. Include the `X-Veil-Secret: <your-secret>` header on all requests **unless the test explicitly says not to** (the auth tests need to test the absence of it).
+---
+### Test 01 — Health & Status
+**What this catches:** `/health` is a liveness probe — it must be fast and require no database access. `/status` provides a runtime snapshot. If either is missing fields or counts don't change with state changes, observability is broken.
+**Verify:**
+- `GET /health` responds with HTTP 200. No auth header needed on this endpoint. Should respond immediately (under 1 second).
+- `GET /status` responds with a JSON body. Verify it contains all of: `uptime`, `cwd`, and numeric counts for agents, sessions, and tasks. All counts should be 0 or reflect actual state — not `null` or `undefined`.
+- Create one agent via the API (any minimal valid config). Then call `GET /status` again and verify the agent count increased by exactly 1. This confirms the status snapshot reflects live state, not cached data.
+---
+### Test 02 — Auth Enforcement
+**What this catches:** If the secret middleware has a bug (wrong header name, wrong comparison, only applied to some routes), requests that should be rejected get through silently.
+**Verify:**
+- Call `GET /agents` with **no auth header** → must return HTTP 401. Not 200, not 403, not 500.
+- Call `GET /agents` with the header `X-Veil-Secret: wrong-value` → must return HTTP 401.
+- Call `GET /agents` with the correct `X-Veil-Secret: <your-secret>` → must return HTTP 200.
+- Repeat the unauthorized test on a **different endpoint** (e.g. `GET /sessions`) to confirm the middleware applies globally, not just to the agents route.
+- `GET /health` should return 200 even without an auth header — confirm this endpoint is intentionally unprotected.
+---
+### Test 03 — Agent CRUD
+**What this catches:** Agent creation, persistence, update, and deletion. The common failure mode: create returns 201, but the agent was never actually written to disk/DB — a subsequent GET returns nothing.
+**Verify:**
+*Create:*
+- `POST /agents` with a minimal valid agent config (name, model, at least one mode enabled). Expect HTTP 201. The response must contain a `name` field matching what you sent. If you receive 201 but no `name` field, that is a FAIL.
+*Read after create:*
+- `GET /agents` → response must be an array containing the agent you just created (match by name). An empty array here after a successful POST is a FAIL.
+- `GET /agents/:name` → response must include key config fields that match what you sent in the POST body. Verify at least `name` and `model` match exactly.
+*Update:*
+- `PUT /agents/:name` changing a specific field (e.g. `temperature` or `description`). Expect HTTP 200.
+- Immediately `GET /agents/:name` again → verify the field you changed now has the new value. If it still has the old value, the update was not persisted.
+*Delete:*
+- `DELETE /agents/:name` → expect HTTP 200 or 204.
+- `GET /agents/:name` → must return HTTP 404. If it returns 200 with data, the deletion failed silently.
+- `GET /agents` → the deleted agent must not appear in the list.
+---
+### Test 04 — Settings CRUD
+**What this catches:** Settings have a layered merge system (global → project → local → CLI flags). Read/write must work correctly, API keys must be redacted, and live updates must apply without a server restart.
+**Verify:**
+*Read:*
+- `GET /settings` → response must include top-level fields like `port`, `maxIterations`, `permissions`. Verify `port` is `5151` (matching what you set).
+- The response must NOT contain raw API key values. Any field representing an API key (e.g. inside `models.main.api_key`) must be redacted (e.g. `"***"` or omitted entirely). If a real key string appears in the response, that is a security FAIL.
+- `GET /settings?level=project` → should return only what is in your project-level `settings.json`, not merged defaults.
+- `GET /settings?level=merged` → should include defaults. The merged result should have more fields than the project-level result.
+*Live update:*
+- `PUT /settings` with a change to a safe field like `maxIterations` (use a recognizable value like `42`).
+- Immediately `GET /settings` and verify the change is reflected — no server restart required. If it still shows the old value, live reload is broken.
+---
+### Test 05 — Session CRUD
+**What this catches:** Sessions are the backbone of chat continuity. Pre-creating sessions (before any chat), reset, and deletion must all work correctly. Broken session management means broken history, broken resumption, and broken context.
+**Verify:**
+*Create:*
+- `POST /sessions` with at least an `agent` field pointing to an existing agent. Expect a response containing a session `id`. A response without an `id` field is a FAIL.
+*Read:*
+- `GET /sessions` → list must contain the created session.
+- `GET /sessions/:id` → verify the response contains: `id`, `agent`, `status`, and a message count field. Verify the agent name matches what you used to create it.
+*Reset:*
+- `POST /sessions/:id/reset` → expect HTTP 200.
+- After reset, `GET /sessions/:id` → verify message count is 0 and the session still exists (was not deleted). The session being gone after a reset is a FAIL.
+*Delete (soft):*
+- `DELETE /sessions/:id` → expect HTTP 200 or 204.
+- `GET /sessions/:id` → if soft delete is supported, status should reflect it. Note the behavior you observe.
+*Delete (hard):*
+- Create a second session, then `DELETE /sessions/:id?hard=true` → expect HTTP 200 or 204.
+- `GET /sessions/:id` must return HTTP 404. If the session is still retrievable, hard delete is broken.
+---
+### Test 06 — Models Endpoint
+**What this catches:** The models index is used internally for context window size limits and cost calculation. If the endpoint is broken or returns empty, token tracking and cost estimation silently fail.
+**Verify:**
+*List:*
+- `GET /models` → response must contain a non-empty array of models. An empty array is a FAIL — it means the local model index was never populated.
+- Pick any model from the list and verify it has at minimum these fields with non-null values: `id`, `name`, `context_length`, and a `pricing` object.
+- The response should also have an `updated_at` or similar timestamp indicating when the index was last fetched.
+*Single model lookup:*
+- From the list above, take a model and identify its provider and name from its `id` (typically `provider/model-name` format).
+- `GET /models/:provider/:name` → should return that specific model's details. A 404 here when the model exists in the list is a FAIL.
+---
+## Step 4 — Stop the Server
+After all tests are complete, stop the server cleanly (`veil stop` from the workspace directory or send SIGTERM).
+---
+## Step 5 — Write the Summary
+Create a file `summary.md` in `/home/ixi/khacloud/drive/Plugins/VeilCli_TESTS/workspace-test-001/` with the following structure:
+```markdown
+# Test Run Summary — workspace-test-001
+**Date:** <date>
+**Port:** 5151
+**Phase:** Group 1 — Basic API (HTTP Contract Tests)
+## Results
+| Test | Status | Notes |
+|------|--------|-------|
+| Test 01: Health & Status | PASS / FAIL | |
+| Test 02: Auth Enforcement | PASS / FAIL | |
+| Test 03: Agent CRUD | PASS / FAIL | |
+| Test 04: Settings CRUD | PASS / FAIL | |
+| Test 05: Session CRUD | PASS / FAIL | |
+| Test 06: Models Endpoint | PASS / FAIL | |
+## Failures
+For each FAIL, write:
+- Which test and which specific assertion failed
+- What value was expected
+- What value was actually returned
+- Whether this looks like a VeilCLI bug or a test setup issue
+## Observations
+Anything unexpected you noticed that wasn't a hard FAIL but seems worth flagging —
+odd field names, inconsistent behavior, missing documentation cues, things that were
+confusing to test against, etc.
+```
+---
+## Notes on Avoiding False Positives
+- **Never accept HTTP status code alone as a pass.** Always read the response body and verify the fields that matter.
+- **Mutations must be confirmed.** After every PUT or DELETE, do a GET to verify the state actually changed.
+- **Counts must be exact.** If you create 1 agent and the count goes from 0 to 2, that's a bug.
+- **404 must be a real 404.** After deletion, if the endpoint returns 200 with an empty body or `{}`, that is NOT the same as 404 — it is a fail.
+- **Settings redaction must be verified.** Don't assume keys are redacted — actively check the response for strings that look like real API keys.
+- If something fails, **do not work around it** — record it as a failure and move on to the next test. The goal is an accurate picture of what works and what doesn't.
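
The redaction and "real 404" notes that close this prompt can be sketched as two small checks. This is a minimal sketch, not code from the package: `findLeakedSecrets` and `isReal404` are hypothetical helper names, the `fallback` model entry and the key `sk-test-123` are made-up sample data, and the assumption is that the tester already holds the real key strings from `auth.json` plus a parsed `GET /settings` body.

```javascript
// Hypothetical helpers, not part of VeilCLI. They assume the caller already
// has the parsed JSON body of GET /settings and the real key strings that
// were copied from auth.json.

// Walk a parsed JSON value and return the path of every string that
// contains one of the known secrets.
function findLeakedSecrets(value, secrets, path = '$') {
  if (typeof value === 'string') {
    return secrets.some((s) => s.length > 0 && value.includes(s)) ? [path] : [];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findLeakedSecrets(item, secrets, `${path}[${i}]`));
  }
  if (value !== null && typeof value === 'object') {
    return Object.entries(value).flatMap(([key, child]) =>
      findLeakedSecrets(child, secrets, `${path}.${key}`));
  }
  return []; // numbers, booleans and null cannot carry a key string
}

// A deletion only counts as confirmed when the follow-up GET returns 404;
// a 200 with an empty body or {} is still a failure.
function isReal404(status) {
  return status === 404;
}

// Example with made-up data: one redacted key, one leaked key.
const settingsBody = {
  port: 5151,
  models: {
    main: { api_key: '***' },
    fallback: { api_key: 'sk-test-123' },
  },
};
const leaks = findLeakedSecrets(settingsBody, ['sk-test-123']);
console.log(leaks); // [ '$.models.fallback.api_key' ]
```

Searching for the known key strings themselves, rather than for "key-looking" patterns, keeps the check free of guesswork about key formats. A pattern scan could be layered on top if the real keys are not available to the test.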

package/ai-tests/prompt-002-basic-flows.md ADDED

@@ -0,0 +1,230 @@
+# Test Prompt 002 — Group 2: Basic Flows (End-to-End Happy Paths)
+**Phase:** 2
+**Focus:** End-to-end flows that simulate real user behavior — chat lifecycle, async task execution, SSE streaming, session resumption
+**Agents running tasks:** Yes — agents will be doing real work in this phase
+---
+## Your Role
+You are a test engineer executing Phase 2 of the VeilCLI test suite. This phase goes beyond HTTP contract tests — agents must actually run, produce output, and you verify that the full pipeline (request → agentic loop → tool execution → response) works correctly end-to-end.
+These are the tests that catch runtime failures that the HTTP surface hides. A 200 response means nothing if the agent returned an empty output, the session wasn't persisted, the SSE stream never sent any chunks, or the tool pipeline silently dropped the result.
+---
+## Environment
+- **VeilCLI source:** `/home/ixi/khacloud/drive/Plugins/VeilCli`
+- **How to invoke:** `veil` (globally linked)
+- **Reference auth.json:** `/home/ixi/khacloud/drive/Plugins/VeilCli/.veil/auth.json`
+- **API DOCS:** `/home/ixi/khacloud/drive/Plugins/VeilCli/docs/api/` (check them to ensure your tests are right)
+- **Agent schema & examples:** `/home/ixi/khacloud/drive/Plugins/VeilCli/schemas/agent.json` and `/home/ixi/khacloud/drive/Plugins/VeilCli/examples/`
+---
+## ⚠️ Non-Blocking Execution Rule
+**Any command that takes time must be run in the background and monitored by polling — never block waiting for it.**
+This applies to:
+- Starting the server (`veil start` → run in background, poll `GET /health` in a loop until ready)
+- Running test scripts — if a script can hang, run it with a timeout or in background and tail its output
+If a command blocks and gets stuck, you get stuck too. Background + poll is the pattern for everything time-sensitive in this phase.
+---
+## Step 1 — Create the Test Workspace
+Create the workspace at:
+```
+/home/ixi/khacloud/drive/Plugins/VeilCli_TESTS/workspace-test-002
+```
+Inside it, create `.veil/settings.json`:
+- Port **5252**
+- A test secret of your choice
+- Reasonable iteration and duration limits (agents need enough room to complete short tasks)
+- Permissive tool permissions
+Copy the reference `auth.json` into `.veil/auth.json`.
+---
+## Step 2 — Create the Test Agents
+You need two agents for this phase. Create them as agent folders inside `.veil/agents/`. Check the schema and examples for valid `agent.json` structure.
+**Agent 1 — `chat-basic`**
+- Chat mode enabled
+- No tools needed (this agent just talks, doesn't use tools)
+- Memory disabled (keep it simple)
+- Use the default/main model from auth.json
+**Agent 2 — `task-runner`**
+- Task mode enabled
+- Must have file I/O tools available: at minimum `write_file`, `read_file`, `list_dir`
+- Memory disabled
+- Use the default/main model
+Give each agent a brief `AGENT.md` so it has a clear identity and doesn't confuse itself. Keep the prompts minimal — just name and role.
+---
+## Step 3 — Start the Server
+From inside `workspace-test-002`, run `veil start` in the background. Confirm `GET http://localhost:5252/health` responds before proceeding. If it doesn't come up within 20 seconds, that's a fatal failure — capture the server output and stop.
+---
+## Step 4 — Run the Tests
+All requests go to `http://localhost:5252`. Include the `X-Veil-Secret` header on all requests.
+---
+### Test 01 — Chat Happy Path (Multi-Turn)
+**What this catches:** The most critical regression in VeilCLI. If a new user creates an agent and sends a message and gets nothing back — or can't continue a conversation — the product is broken. Also catches the silent failure where the session isn't actually persisted after a successful response.
+**Setup:** Use the `chat-basic` agent.
+**Verify:**
+*First message:*
+- `POST /agents/chat-basic/chat` with a simple conversational message. Expect HTTP 200.
+- The response must contain a non-empty `message` field. An empty string is a FAIL.
+- The response must contain a `sessionId`. Missing sessionId means the conversation cannot be continued.
+- Verify token counts on the response — both input and output tokens must be non-zero.
+*Second message (session continuation):*
+- Send a second message using the `sessionId` from the first response. Ask something that only makes sense if the agent read the first message (e.g. reference something specific you said in message 1).
+- Response must be non-empty.
+- `GET /sessions/:id/messages` — the message history must contain exactly 2 user turns and 2 assistant turns, in the correct order (user → assistant → user → assistant). If the second message created a new session instead of continuing, message count will be wrong.
+- Verify each assistant message has non-zero `output_tokens`. Verify each user message has non-zero `input_tokens`.
+---
+### Test 02 — Async Task Full Lifecycle
+**What this catches:** Task mode is async — the status must transition correctly from pending → processing → finished, output must be non-empty, and the event log must capture the full execution trace. Previous test approaches would check the 202 response and stop there, missing silent failures in the execution loop.
+**Setup:** Use the `task-runner` agent.
+**The task to give the agent:** Ask it to write a specific, recognizable string (e.g. `"VEIL_TASK_MARKER_002"`) into a file named `task-output.txt` inside the workspace directory, then confirm what it wrote.
+This task is deliberately designed to:
+1. Force tool use (`write_file`)
+2. Leave a verifiable side effect on disk
+3. Be short enough to complete quickly
+**Verify:**
+*Creation:*
+- `POST /agents/task-runner/task` → expect HTTP 202. Response must contain a `taskId`. The initial status must be `pending` (not already `finished` — that would mean it ran synchronously which is wrong).
+*Polling to completion:*
+- Poll `GET /tasks/:id` at a reasonable interval until status reaches a terminal state (`finished`, `failed`, or `canceled`). Set a polling timeout of 90 seconds. If it times out, record the last known status and mark as FAIL.
+- Final status must be `finished`. If it's `failed`, the output/error field should explain why — record it.
+*Output verification:*
+- The task's `output` field must be non-empty. An empty output with `finished` status is a silent failure — the agent "completed" but produced nothing.
+- Verify `token_input` and `token_output` on the task record are both non-zero.
+- Verify `iterations` on the task record is at least 1.
+*Event trace:*
+- `GET /tasks/:id/events` — the event list must not be empty.
+- Verify at least one `status.change` event exists showing the transition from `pending` to `processing`.
+- Verify at least one `tool.start` event for `write_file` exists — this confirms the agent actually attempted to use the tool, not just described what it would do.
+- Verify the corresponding `tool.end` event for `write_file` exists and does not contain an error result.
+*Filesystem side effect:*
+- Check that `task-output.txt` actually exists in the workspace. Its content must contain the marker string you specified. This is the definitive verification that the tool pipeline worked end-to-end — not just that the agent said it ran the tool.
+---
+### Test 03 — Chat SSE Streaming
+**What this catches:** SSE (Server-Sent Events) is an entirely separate code path from the regular JSON chat response. It's the primary interface for any streaming UI client. If it's broken, the UI shows nothing while non-streaming works fine — and a simple HTTP status check would miss this.
+**Setup:** Use the `chat-basic` agent.
+**Verify:**
+*Response type:*
+- `POST /agents/chat-basic/chat` with `"sse": true` in the request body. The response must have `Content-Type: text/event-stream`. If it returns `application/json`, the SSE code path is not executing.
+*Stream content:*
+- Consume the event stream until a `done` event is received (or a timeout, 60 seconds max).
+- There must be at least one `chunk` event (or equivalent token/delta event) received before the `done`. A stream that goes straight to `done` with no chunks means content was never streamed — it was buffered and sent all at once (or not at all).
+- The `done` event must contain: a final `message` (non-empty) and token usage data (`tokenUsage` or equivalent). A `done` event with an empty message is a FAIL.
+*Functional equivalence:*
+- The final assembled content from all chunks should be meaningfully similar to what a non-SSE chat response would return for the same prompt. You don't need to compare them exactly — but if the SSE version returns a completely empty or single-character response while non-SSE returns a full paragraph, something is wrong.
+---
+### Test 04 — Session Resumption
+**What this catches:** A user closes their client, comes back later, and resumes. The session must be loadable, the history must be intact, and after a reset the agent must start fresh. This catches two distinct failure modes: history not persisting (messages lost on retrieval) and reset not working (messages still present after clear).
+**Setup:** Use the `chat-basic` agent.
+**Verify:**
+*History persistence:*
+- Start a conversation with 3 message exchanges (3 user turns, 3 assistant responses). Use the same `sessionId` throughout.
+- After the third exchange, `GET /sessions/:id/messages` → the history must contain all 6 messages (3 user + 3 assistant) in the correct chronological order.
+- Verify the `role` field on each message is either `user` or `assistant`. No message should have a null or missing role.
+*Cold resumption:*
+- Leave the session idle for a few seconds (to simulate a reconnect), then send a new message using the same `sessionId`.
+- This must work as a continuation — no error about session not found, no empty response.
+- After this 4th exchange, message count must be 8 (4 user + 4 assistant). If it resets to 2, the session lookup on resume is creating a new session.
+*Reset:*
+- `POST /sessions/:id/reset` → expect HTTP 200.
+- `GET /sessions/:id/messages` → message count must be 0. The session itself must still exist (status is not `deleted`).
+- Send one more message on the same session after reset. It must work normally as a fresh start. The agent should not reference anything from the prior conversation — if it does, the reset didn't actually clear the context used in the prompt.
+---
+## Step 5 — Stop the Server
+Stop the server cleanly from the workspace directory after all tests complete.
+---
+## Step 6 — Write the Summary
+Create `summary.md` in `workspace-test-002/`:
+```markdown
+# Test Run Summary — workspace-test-002
+**Date:** <date>
+**Port:** 5252
+**Phase:** Group 2 — Basic Flows
+## Results
+| Test | Status | Notes |
+|------|--------|-------|
+| Test 01: Chat Happy Path | PASS / FAIL | |
+| Test 02: Async Task Lifecycle | PASS / FAIL | |
+| Test 03: SSE Streaming | PASS / FAIL | |
+| Test 04: Session Resumption | PASS / FAIL | |
+## Failures
+For each FAIL:
+- Which test and specific assertion
+- Expected vs actual
+- Whether it looks like a VeilCLI bug or test setup issue
+## Observations
+Anything unexpected — agent behavior oddities, timing issues, fields that behaved differently
+than the docs described, things that were harder to verify than expected.
+```
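
The SSE assertions in Test 03 of this prompt (at least one `chunk` before `done`, and a non-empty final `message`) can be sketched against captured stream text. This is a minimal sketch under stated assumptions, not the package's client code: `parseSseEvents` and `checkStream` are hypothetical names, the parser is a simplified reading of the `text/event-stream` format, and the `done` payload is assumed to be JSON with a `message` field. The `text` field on chunks and the sample stream are invented for illustration.

```javascript
// Hypothetical helpers, not part of VeilCLI. The event names `chunk` and
// `done` come from the test description; the payload shapes are assumptions.

// Split raw text/event-stream text into { event, data } records.
function parseSseEvents(raw) {
  return raw
    .replace(/\r\n/g, '\n')
    .split('\n\n')
    .map((block) => {
      let event = 'message'; // SSE default when no event: field is present
      const dataLines = [];
      for (const line of block.split('\n')) {
        if (line.startsWith(':')) continue; // comment or keep-alive line
        if (line.startsWith('event:')) event = line.slice(6).trim();
        else if (line.startsWith('data:')) dataLines.push(line.slice(5).replace(/^ /, ''));
      }
      // Blocks without any data: line are not dispatched as events.
      return dataLines.length > 0 ? { event, data: dataLines.join('\n') } : null;
    })
    .filter((record) => record !== null);
}

// Apply the stream checks: a done event exists, at least one chunk arrived
// before it, and the done payload carries a non-empty message.
function checkStream(events) {
  const doneIndex = events.findIndex((e) => e.event === 'done');
  if (doneIndex === -1) return { pass: false, reason: 'no done event' };

  const chunksBeforeDone = events
    .slice(0, doneIndex)
    .filter((e) => e.event === 'chunk').length;
  if (chunksBeforeDone === 0) return { pass: false, reason: 'no chunk before done' };

  let payload;
  try {
    payload = JSON.parse(events[doneIndex].data);
  } catch {
    return { pass: false, reason: 'done payload is not JSON' };
  }
  if (!payload || typeof payload.message !== 'string' || payload.message.length === 0) {
    return { pass: false, reason: 'empty final message' };
  }
  return { pass: true, reason: 'ok' };
}

// Made-up sample stream: two chunks followed by done.
const sample =
  'event: chunk\ndata: {"text":"Hel"}\n\n' +
  'event: chunk\ndata: {"text":"lo"}\n\n' +
  'event: done\ndata: {"message":"Hello","tokenUsage":{"input":5,"output":2}}\n\n';
console.log(checkStream(parseSseEvents(sample))); // { pass: true, reason: 'ok' }
```

The token-usage check is left out because the prompt allows `tokenUsage` "or equivalent"; once the real field name is confirmed from the API docs, it can be added next to the `message` check.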

package/ai-tests/prompt-003-agent-behaviors.md ADDED

@@ -0,0 +1,220 @@
+# Test Prompt 003 — Group 3: Agent Behaviors (Config Enforcement)
+**Phase:** 3
+**Focus:** Verify that agent configuration is actually enforced at runtime — tool restrictions, mode gating, iteration limits, and live config reload
+**Agents running tasks:** Yes
+---
+## Your Role
+You are a test engineer executing Phase 3 of the VeilCLI test suite. This phase tests whether VeilCLI actually honors agent configuration. This is silent-failure territory: the HTTP response is 200 and the task status is `finished`, but the agent did something it wasn't supposed to, such as using a forbidden tool, running in a disabled mode, or ignoring iteration limits. A basic status-code check would pass all of these while the system is broken.
+The primary verification mechanism in this phase is **task event inspection** — `GET /tasks/:id/events` — not the agent's text output.
+---
+## Environment
+- **VeilCLI source:** `/home/ixi/khacloud/drive/Plugins/VeilCli`
+- **How to invoke:** `veil` (globally linked)
+- **Reference auth.json:** `/home/ixi/khacloud/drive/Plugins/VeilCli/.veil/auth.json`
+- **API DOCS:** `/home/ixi/khacloud/drive/Plugins/VeilCli/docs/api/` — check them to ensure your tests are right
+- **Agent schema & examples:** `/home/ixi/khacloud/drive/Plugins/VeilCli/schemas/agent.json` and `/home/ixi/khacloud/drive/Plugins/VeilCli/examples/`
+---
+## ⚠️ Non-Blocking Execution Rule
+**Any command that takes time must be run in the background and monitored by polling — never block waiting for it.**
+This applies to:
+- Starting the server (`veil start` → run in background, poll `GET /health` in a loop until ready)
+- Running test scripts — if a script can hang, run it with a timeout or in background and tail its output
+- Waiting for tasks to finish → always poll `GET /tasks/:id` in a loop with a sleep interval, never a blocking wait
+If a command blocks and gets stuck, you get stuck too. Background + poll is the pattern for everything time-sensitive in this phase.
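The background-and-poll pattern described above can be sketched as a small helper. This is a minimal illustration, not part of VeilCLI: `poll_until` and `health_ok` are hypothetical names, and treating any HTTP 200 from `/health` as "ready" is an assumption, since the prompt only says to poll until the endpoint responds.

```python
import time
import urllib.request
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    timeout_s: float = 20.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() until it returns a non-None value or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        try:
            result = check()
        except OSError:
            # Connection refused, timeouts, and urllib errors all derive from
            # OSError; treat them as "not ready yet" and keep polling.
            result = None
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        sleep(interval_s)


def health_ok(base_url: str = "http://localhost:5353") -> Optional[bool]:
    """Return True once GET /health answers with HTTP 200 (assumed readiness signal)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
        return True if resp.status == 200 else None
```

With the server started in the background, `poll_until(health_ok, timeout_s=20)` either returns once the server is up or raises `TimeoutError`, which maps to the "fatal failure" case. The same helper can wrap a task-status check by passing a different `check` callable.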
+---
+## Step 1 — Create the Test Workspace
+Create the workspace at:
+```
+/home/ixi/khacloud/drive/Plugins/VeilCli_TESTS/workspace-test-003
+```
+`.veil/settings.json`:
+- Port **5353**
+- A test secret of your choice
+- Permissive execution-level permissions (allow all) — the restrictions being tested are at the agent config level, not the global settings level
+- Reasonable iteration and duration limits for a test environment
+Copy the reference `auth.json` into `.veil/auth.json`.
+---
+## Step 2 — Create the Test Agents
+You need four agents. Check the schema and docs for exact valid fields. Pay close attention to the difference between:
+- `tools` — a whitelist of tool names the LLM is shown (if set, LLM only sees these tools)
+- `disallowedTools` — a blacklist that hides specific tools from the LLM
+- `permissions.allow/deny` — execution-level gatekeeping (different layer from LLM visibility)
+Read the docs and guide on permissions and tool configuration before creating agents — the distinction between these fields matters for writing correct test configs.
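To make the visibility layer concrete, here is a toy model of how a whitelist and a blacklist could combine. It only mirrors the descriptions in the list above; it is not VeilCLI's implementation, `visible_tools` is a hypothetical name, and the precedence when both fields are set (deny wins) is an assumption to confirm against the docs.

```python
from typing import Iterable, List, Optional


def visible_tools(
    available: Iterable[str],
    tools: Optional[List[str]] = None,
    disallowed_tools: Optional[List[str]] = None,
) -> List[str]:
    """Model which tool names the LLM would be shown.

    tools            -- whitelist; if set, only these names are visible
    disallowed_tools -- blacklist; these names are always hidden (assumed
                        to win over the whitelist when both are set)
    """
    allowed = set(tools) if tools is not None else None
    denied = set(disallowed_tools or [])
    return [
        name
        for name in available
        if (allowed is None or name in allowed) and name not in denied
    ]
```

Under this model, `restricted-agent` (blacklisting `bash`) still sees the file I/O tools, while `readonly-agent` (whitelisting `read_file`) sees nothing else. Execution-level `permissions.allow/deny` is a separate layer and is deliberately not modeled here.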
+**Agent 1 — `restricted-agent`**
+- Task mode enabled
+- Configure it to explicitly deny/hide `bash` from the LLM (use the appropriate field)
+- Has other tools available (file I/O tools) so it can still do useful work
+- Keep memory disabled
+**Agent 2 — `readonly-agent`**
+- Task mode enabled
+- Whitelist only `read_file` as an available tool (LLM sees nothing else)
+- Keep memory disabled
+**Agent 3 — `chat-disabled-agent`**
+- Task mode enabled
+- Chat mode explicitly disabled (`modes.chat.enabled: false`)
+- Keep memory disabled
+**Agent 4 — `iteration-agent`**
+- Task mode enabled
+- Set `maxIterations` very low (3) at the agent's task mode level
+- Has file I/O tools so it can attempt multi-step work
+- Keep memory disabled
+---
+## Step 3 — Start the Server
+From inside `workspace-test-003`, start `veil` in the background. Poll `GET http://localhost:5353/health` until it responds (20 second timeout). Fatal failure if it doesn't come up.
+Use `veil start` for this, as described in the Non-Blocking Execution Rule.
+---
+## Step 4 — Run the Tests
+---
+### Test 01 — Deny List Enforcement (`disallowedTools`)
+**What this catches:** If the deny/hidden tool list isn't applied when loading the agent's tool set, the LLM gets access to tools it shouldn't have. A correctly hidden tool can never be called, because the LLM won't call a tool it can't see, so any appearance of the forbidden tool in task events proves the restriction failed. A passing HTTP status would never reveal this.
+**Setup:** Use `restricted-agent`. Give it a task that explicitly requires bash — something like "run the shell command `echo BASH_WAS_CALLED` and report what it output."
+**Verify:**
+- Poll the task to a terminal state.
+- `GET /tasks/:id/events` — scan all events. There must be **zero** `tool.start` events with the denied tool's name. If even one appears, the restriction is broken.
+- The task outcome (finished or failed) is secondary — what matters is that the forbidden tool was never invoked. The agent should either work around it or explain it can't do the task without that tool.
+- Also check `GET /agents/restricted-agent/skills` — the denied tool must not appear in the skills/tools list returned for this agent.
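The event scan in the checks above might look like the sketch below. The event shape is an assumption: the prompt only names the `tool.start` event type, so the field holding the tool name (`tool`, `name`, or `toolName`, either top-level or under `data`) is hypothetical and should be replaced with whatever the API docs specify.

```python
from typing import Iterable, List

# Hypothetical field names; adjust to the documented event schema.
NAME_KEYS = ("tool", "name", "toolName")


def started_tools(events: Iterable[dict]) -> List[str]:
    """Collect the tool name from every `tool.start` event, in order."""
    names = []
    for event in events:
        if event.get("type") != "tool.start":
            continue
        data = event.get("data")
        payload = data if isinstance(data, dict) else event
        name = next(
            (payload[k] for k in NAME_KEYS if isinstance(payload.get(k), str)),
            None,
        )
        if name is None:
            # Fail loudly: silently skipping an unreadable event could turn
            # a real violation into a false PASS.
            raise ValueError(f"tool.start event without a tool name: {event!r}")
        names.append(name)
    return names


def forbidden_calls(events: Iterable[dict], forbidden: Iterable[str]) -> List[str]:
    """Return every started tool that is on the forbidden list."""
    banned = set(forbidden)
    return [name for name in started_tools(events) if name in banned]
```

For Test 01 the assertion is that `forbidden_calls(events, ["bash"])` is empty. The same function covers Test 02 with `["write_file"]`.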
+---
+### Test 02 — Whitelist Enforcement (`tools` allow list)
+**What this catches:** Mirror of Test 01. If the tools whitelist doesn't actually restrict what the LLM sees, the agent can use any available tool regardless of config.
+**Setup:** Use `readonly-agent` (only `read_file` whitelisted). First, place a file in the workspace with known content. Then give the agent a task that requires both reading AND writing: "Read the file at [path], then write its content in reverse to a new file called reversed.txt."
+**Verify:**
+- Poll to terminal state.
+- `GET /tasks/:id/events` — `write_file` must never appear in any `tool.start` event. The agent may read the file successfully, but it cannot write.
+- Check that `reversed.txt` does **not** exist on disk. If it does, the whitelist failed.
+- `GET /agents/readonly-agent/skills` — only `read_file` (and possibly system tools) should appear.
+---
+### Test 03 — Mode Enforcement
+**What this catches:** If `modes.chat.enabled: false` doesn't actually gate the chat endpoint, disabled modes are meaningless. This is a simple but important contract.
+**Setup:** Use `chat-disabled-agent`.
+**Verify:**
+- `POST /agents/chat-disabled-agent/chat` — must return an error response (not HTTP 200). Check the docs for what status code and error shape VeilCLI returns for disabled modes.
+- `POST /agents/chat-disabled-agent/task` — must still work (task mode is enabled). Create a simple task and verify it gets a 202 with a taskId. The agent isn't broken — just its chat endpoint is gated.
+---
+### Test 04 — `maxIterations` and `onExhausted` Enforcement
+**What this catches:** Without iteration limits, a broken or confused agent loop runs forever and burns tokens. `onExhausted` determines the final behavior — if the wrong state is reached, downstream systems (callers polling for `waiting` or `failed`) behave incorrectly.
+**Setup:** Use `iteration-agent` (maxIterations: 3). Give it a task designed to require many steps — something like: "Research and write a comprehensive 10-section report on Node.js best practices, writing each section to its own file." This is intentionally too big to finish in 3 iterations.
+**Test onExhausted: "fail" behavior:**
+- Submit the task with the agent configured for `onExhausted: "fail"` (set this at test time, either via agent config or task parameters — check docs for how to override per-task).
+- Poll to terminal state.
+- Final status must be `failed`. Not `finished`, not `waiting` — `failed`.
+- The `iterations` field on the task record must be ≤ 3.
+**Test onExhausted: "wait" behavior:**
+- Reconfigure or create a variant of the agent with `onExhausted: "wait"`.
+- Submit the same large task.
+- Poll until status is `waiting` (not `finished` or `failed`). This confirms the agent paused instead of failing.
+- `POST /tasks/:id/respond` with a message like "Continue where you left off." — verify the task resumes (status goes back to `processing` then eventually terminates).
+- Check docs for the exact `respond` endpoint payload format.
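The two `onExhausted` expectations can be expressed as one check over the task record. This is a sketch under assumptions: `check_exhaustion` is a hypothetical helper, the record is assumed to expose `status` and `iterations` as top-level fields, and applying the iteration bound to the `wait` variant is an extrapolation (the prompt states it only for `fail`).

```python
from typing import List

# Expected task status once the iteration budget runs out, per the text above.
EXPECTED_STATUS = {"fail": "failed", "wait": "waiting"}


def check_exhaustion(task: dict, max_iterations: int, on_exhausted: str) -> List[str]:
    """Return a list of problems; an empty list means the record looks right."""
    problems = []
    expected = EXPECTED_STATUS[on_exhausted]
    status = task.get("status")
    if status != expected:
        problems.append(f"status is {status!r}, expected {expected!r}")
    iterations = task.get("iterations")
    if not isinstance(iterations, int):
        problems.append("task record has no integer `iterations` field")
    elif iterations > max_iterations:
        problems.append(f"iterations={iterations} exceeds limit {max_iterations}")
    return problems
```

Returning a list instead of raising on the first mismatch makes it easier to fill in the "Expected vs actual" notes in the summary when more than one assertion fails.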
+---
+### Test 05 — Hot-Reload from Disk
+**What this catches:** `POST /agents/:name/reload` must actually re-read the agent's `agent.json` from disk. If it's a no-op, live config changes never take effect without a server restart.
+**Setup:** Use any of the agents created above (e.g. `restricted-agent`). Note its current `description` and `temperature` values.
+**Verify:**
+*Change takes effect after reload:*
+- Directly edit `restricted-agent`'s `agent.json` on disk — change `description` to something recognizable like `"HOT_RELOAD_TEST_VALUE"` and change `temperature` to an unusual value like `0.42`.
+- `GET /agents/restricted-agent` — verify the OLD values are still returned (confirms the API serves cached config, not live disk reads).
+- `POST /agents/restricted-agent/reload` — expect HTTP 200.
+- `GET /agents/restricted-agent` — now the description and temperature must match what you wrote to disk. If they don't, reload is broken.
+*Change does NOT take effect without reload:*
+- Make another disk edit (change description again to a different recognizable value).
+- Do NOT call reload.
+- `GET /agents/restricted-agent` — must still show the previous value (the one from the first reload). This confirms the server doesn't auto-watch for file changes — reload is required.
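The reload sequence above can be written as an ordered check with the I/O injected, so the logic is testable without a running server. This is an illustrative sketch: `check_hot_reload` and its three callables are hypothetical, and it assumes `GET /agents/:name` returns JSON with a top-level `description` field. Only `description` is exercised; `temperature` would follow the same pattern.

```python
from typing import Callable, List


def check_hot_reload(
    get_agent: Callable[[], dict],             # wraps GET /agents/<name>
    reload_agent: Callable[[], None],          # wraps POST /agents/<name>/reload
    write_description: Callable[[str], None],  # edits agent.json on disk
    first: str = "HOT_RELOAD_TEST_VALUE",
    second: str = "HOT_RELOAD_TEST_VALUE_2",
) -> List[str]:
    """Run the hot-reload sequence and return a list of problems found."""
    problems = []
    before = get_agent()["description"]

    # 1. A disk edit alone must not change what the API serves.
    write_description(first)
    if get_agent()["description"] != before:
        problems.append("API reflected the disk edit before reload")

    # 2. After reload, the API must serve the value written to disk.
    reload_agent()
    after_reload = get_agent()["description"]
    if after_reload != first:
        problems.append("reload did not pick up the disk edit")

    # 3. A second edit without reload must leave the served value unchanged.
    write_description(second)
    if get_agent()["description"] != after_reload:
        problems.append("config changed without a reload call")
    return problems
```

Step 3 compares against the value observed after reload instead of against `first`, so a broken reload is reported once and not a second time as a spurious auto-watch failure.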
+---
+## Step 5 — Stop the Server
+Stop the server cleanly from the workspace directory.
+---
+## Step 6 — Write the Summary
+Create `summary.md` in `workspace-test-003/`:
+```markdown
+# Test Run Summary — workspace-test-003
+**Date:** <date>
+**Port:** 5353
+**Phase:** Group 3 — Agent Behaviors (Config Enforcement)
+**Test Path:** /home/ixi/khacloud/drive/Plugins/VeilCli_TESTS/workspace-test-003/
+## Results
+| Test | Status | Notes |
+|------|--------|-------|
+| Test 01: Deny List Enforcement | PASS / FAIL | |
+| Test 02: Whitelist Enforcement | PASS / FAIL | |
+| Test 03: Mode Enforcement | PASS / FAIL | |
+| Test 04a: maxIterations + onExhausted fail | PASS / FAIL | |
+| Test 04b: maxIterations + onExhausted wait | PASS / FAIL | |
+| Test 05: Hot-Reload from Disk | PASS / FAIL | |
+## Failures
+For each FAIL:
+- Which test and specific assertion
+- Expected vs actual
+- Whether it looks like a VeilCLI bug or test setup issue
+## Observations
+Unexpected behavior, config fields that didn't behave as documented, or anything worth flagging.
+```