@checkstack/ai-backend 0.1.3 → 0.1.4

Files changed (15)

package/CHANGELOG.md +61 -0
package/package.json +4 -4
package/src/agent-runner.test.ts +50 -0
package/src/agent-runner.ts +13 -3
package/src/chat/chat-handler.ts +6 -0
package/src/chat/chat-service.ts +13 -18
package/src/chat/classifier.logic.test.ts +11 -0
package/src/chat/classifier.logic.ts +16 -9
package/src/chat/model-schema.test.ts +264 -0
package/src/chat/model-schema.ts +334 -0
package/src/chat/sdk-tools.ts +32 -35
package/src/chat/system-prompt.test.ts +113 -0
package/src/chat/system-prompt.ts +146 -0
package/src/generated/docs-index.ts +4 -3
package/src/serializer.test.ts +22 -0

package/src/generated/docs-index.ts CHANGED

@@ -82,10 +82,11 @@ export const DOCS_INDEX: readonly DocsIndexEntry[] = [
       "Per-integration model selection",
       "Off-topic guard",
       "Per-integration LLM spend cap",
+      "Dates and timezones",
       "No secret leaves the backend",
       "Related"
     ],
-    "content": "The in-app AI assistant is a server-side agent loop built on the Vercel AI SDK. It runs entirely on the backend, uses the same tool registry as the MCP server, persists conversations in shared Postgres, and never lets the model silently change state. Read tools auto-run; mutating and destructive tools surface a confirm card that a human must approve.\n\n## The agent loop runs on the backend\n\nThe chat turn is a raw HTTP handler at `/api/ai/chat` (server-sent events, because streaming needs a raw handler). The handler authenticates the request through the platform auth strategy, requires a logged-in real user (applications and services use MCP, not chat), and hands the turn to the agent loop. The model provider is created on the backend from the selected integration's credentials, so the API key never crosses to the browser. The browser only ever receives streamed tokens and tool events.\n\n```ts\n// Provider-agnostic via base-URL override (OpenAI, Azure, OpenRouter, Ollama, ...).\nconst model = buildLanguageModel({ connection, model: conversation.model });\nconst result = streamText({ model, system, messages, tools, stopWhen: stepCountIs(8) });\nreturn result.toUIMessageStreamResponse();\n```\n\n## Tools come from the same registry\n\nThe loop offers the model exactly the tools the resolver allows for the logged-in principal, no more. The model is treated as an untrusted caller: it picks arguments, but it can never reach a tool the principal cannot use, and every tool call is gated server-side.\n\n- `read` tools auto-run. Their execution re-enters the live router as the logged-in user (the chat request's own auth is forwarded), so handler-side authorization runs exactly as for any other caller. 
Each successful read writes an `ai_tool_calls` row with `transport: \"chat\"` (and an args hash, never the raw args), so chat reads appear in the audit log AND count toward the per-principal rate-limit budget, exactly like MCP reads.\n- `mutate` and `destructive` tools never run inline. Their executor runs the `propose` dry-run and returns a confirm card carrying the single-use proposal token and the validated payload. Nothing is committed until the operator clicks Apply, which calls `applyTool` with the token.\n\n```ts\n// Disposition of a model-requested tool (the single server-side gate):\ndisposeAgentTool({ toolName, principal, resolver, getTool });\n// -> { kind: \"run\" } | { kind: \"confirm\" } | { kind: \"refused\" }\n```\n\nThe same per-principal rate-limit budget that protects MCP is enforced before every tool call in the loop. See [Propose and apply](/checkstack/developer-guide/ai/propose-apply/) for the budget and the confirm-card token lifecycle.\n\n## The model acknowledges a confirm-card decision\n\nA confirm card ends the model's turn: the model has said what it will do and is now waiting on the operator. When the operator clicks Apply (or Decline), the actual apply still runs through the unchanged `applyTool` propose/apply path, and then a short follow-up turn makes the model react so the conversation does not dead-end on \"waiting for your confirmation\".\n\nThat follow-up is a second mode of the same `/api/ai/chat` handler. 
Instead of a `message`, the POST body carries a `decision` (the proposal token plus `apply` or `decline`); the handler routes it to `streamDecision`, which streams the model's acknowledgment over the same SSE path as a normal turn.\n\n```ts\n// POST /api/ai/chat — a normal turn OR a confirm-card decision turn:\n{ conversationId, connectionId, model?, message: \"...\" }                 // user turn\n{ conversationId, connectionId, model?, decision: { token, kind } }       // apply | decline\n```\n\nThe decision note handed to the model is derived SERVER-SIDE from the stored proposal (its tool name and the one-line summary captured at propose time), so no client-supplied text ever reaches the model. The note is EPHEMERAL: it is appended to that turn's history only and never persisted; the assistant's streamed reply is what gets saved, and it carries the outcome forward so later turns know the change is live. `streamDecision` re-checks ownership, that the proposal belongs to THIS conversation, and — for an `apply` — that the proposal is actually in the `applied` state (the apply ran first), refusing with a 409 otherwise so the model can never falsely claim a change took effect.\n\n```ts\nproposeApply.describeProposal({ token }); // read-only: tool name, summary, status, conversation (no consume)\nbuildDecisionNote({ decision, toolName, summary }); // the ephemeral, server-derived note\n```\n\n## Conversations are durable and pod-independent\n\nConversations and messages live in `ai_conversations` and `ai_messages` in shared Postgres. The user message is persisted before streaming begins, and the assistant message on completion, so a mid-stream pod restart still leaves a complete, resumable transcript. Any pod can list, open, or continue a chat; nothing about a conversation is pod-local. 
All reads are owner-scoped, so a user can only ever see their own conversations.\n\nWhen a turn is resumed, the model must see its prior TOOL interactions, not just the text it eventually said. On completion the loop persists the canonical AI-SDK `ResponseMessage[]` for the turn (assistant tool-call parts plus tool-result parts) into the additive `ai_messages.model_messages` column. On the next turn `toModelMessages` replays those messages verbatim, so a multi-turn conversation reconstructs the full tool-call history for the model. Rows written before this column existed (or plain user/system rows) fall back to text-only replay. Because the replay history lives in shared Postgres, replay is identical on whichever pod handles the next turn.\n\nThe conversation contract (RPC):\n\n```ts\nai.listChatIntegrations(); // selectable providers + model UX (no secrets)\nai.listConversations();    // the user's conversations, newest first\nai.createConversation({ integrationId, model });\nai.getConversation({ id }); // conversation + full transcript\nai.updateConversation({ id, title, model });\nai.archiveConversation({ id }); // soft delete: the user-facing \"Delete\" action\nai.deleteConversation({ id }); // hard delete (cascades); not the user action\n```\n\n## Deleting a chat is a soft archive\n\nThe user-facing \"Delete\" action in the sidebar does NOT remove the row. It calls `archiveConversation`, which stamps an `archived_at` timestamp on the `ai_conversations` row. `listConversations` filters `archived_at IS NULL`, so an archived chat disappears from the sidebar, but the conversation and its messages are RETAINED in Postgres for later abuse introspection. 
The archive is owner-scoped, so a user can only archive their own chats, and a repeat archive of an already-archived row is a no-op.\n\nThe frontend confirms the action through a modal (labeled \"Delete\"), then archives, lets the owning plugin's query invalidation refresh the list, and clears the view back to the empty state if the archived chat was the open one. The hard `deleteConversation` method is retained for non-user callers (such as a retention sweep) but is never wired to the sidebar, so nothing the user clicks ever hard-deletes a transcript.\n\n```ts\narchiveConversation({ id, userId }); // stamps archived_at = now, owner-scoped\n// listConversations filters archived_at IS NULL\n```\n\n## Starting a new chat\n\nThe \"New chat\" button creates a fresh conversation, makes it the active (highlighted) one, and clears the message view. Because `createConversation` is an oRPC mutation, it auto-invalidates the plugin's conversation list on success, so the new chat appears in the sidebar immediately. To avoid spawning a pile of empty \"Untitled chat\" rows, the click is deduplicated: if the conversation already open is itself an empty untitled draft (no title and no messages), the button reuses it instead of creating another row. The decision is a pure helper so it is unit-testable without rendering the page.\n\n```ts\ndecideNewChatAction({ current, messages }); // -> { kind: \"reuse\" } | { kind: \"create\" }\n```\n\n## Conversations are auto-titled\n\nA new conversation starts untitled, so the sidebar would otherwise show \"Untitled chat\". After the first user message of a still-untitled conversation, the backend derives a concise title (at most six words, no quotes, markdown, or trailing punctuation) and persists it with `updateConversation({ title })`. 
The title is produced by a cheap `generateText` call that reuses the turn's resolved connection and model, then sanitized with `sanitizeGeneratedTitle`.\n\nTitling is fire-and-forget: it runs detached from the streamed turn, so it never delays or crashes the response. On any model or sanitize failure it falls back to a deterministic heuristic from the first message (`deriveHeuristicTitle`: collapse whitespace, first six words, capped at sixty characters). The title lives in the shared `ai_conversations` table, so it is readable on every pod. The chat page invalidates the conversation list when a turn completes to pick up the new title, because the streaming turn is a raw SSE fetch rather than an oRPC mutation and so does not auto-invalidate.\n\n```ts\nderiveHeuristicTitle(firstMessage); // fallback when the model errors\nsanitizeGeneratedTitle(raw);        // strip quotes/markdown/punctuation, cap length\n```\n\n## Per-integration model selection\n\nModel choice is a property of the credential and provider, so it lives on the OpenAI-compatible integration connection: `defaultModel` is required and `availableModels` is an optional allowlist. The chat model picker always renders a `Select` whose options are `[defaultModel, ...availableModels]`, de-duplicated with the default first, so the connection's own default is always selectable. With no `availableModels` the picker contains just the default; it is never a free-text field. The model id is untrusted wire input, so it is revalidated server-side at two points: `createConversation` / `updateConversation` coerce a stored model against the integration's allowlist, and `buildLanguageModel` always runs the requested (or stored) model id through `resolveModelId` before handing it to the provider, so an out-of-allowlist id is coerced to `defaultModel` and never reaches the provider. 
An empty allowlist allows any model from the picker's default (free-text providers like Ollama still configure a `defaultModel`).\n\n```ts\nresolveModelId({ connection, requested }); // requested, or defaultModel if out of allowlist\n```\n\n## Off-topic guard\n\nThe assistant helps with operating Checkstack (incidents, health checks, anomalies, automations, monitoring, and on-call) AND with questions about the assistant itself or how to use Checkstack. Two layers keep clearly unrelated requests (general coding help, creative writing, general trivia) from spending tokens on the expensive tool loop.\n\nFirst, the system prompt instructs the assistant to decline clearly unrelated requests with a one-line redirect, so even a request that slips past the classifier is steered back.\n\nSecond, a cheap topical pre-classifier runs BEFORE the agent/tool loop. It is a small `generateText` call (injectable like the title generator, defaulting to the turn's resolved model) with a tight prompt that returns a single token: `ON_TOPIC` or `OFF_TOPIC`. 
The following are always `ON_TOPIC`:\n\n- Checkstack operations: incidents, health checks, anomalies, automations, monitoring, on-call, the platform's data and configuration.\n- Meta/capability questions about the assistant itself: \"what can you do?\", \"who are you?\", \"help\", \"what features do you have?\".\n- Greetings and conversational openers: \"hi\", \"hello\", \"hey\".\n- How-to and conceptual questions about using Checkstack features or workflows: \"how do health checks work?\", \"how do I create an automation?\".\n\nOnly CLEARLY unrelated requests are `OFF_TOPIC`: general coding help unrelated to Checkstack, creative writing, and general trivia or knowledge questions.\n\nThe reply is parsed by a pure function that leans toward `ON_TOPIC` on anything ambiguous or unrecognized, because a false refusal of a real ops question is worse than letting one off-topic request slide.\n\n- On `OFF_TOPIC` the turn short-circuits: the expensive tool loop never runs. A canned, concise refusal is streamed back over the same SSE path the normal turn uses (so the frontend renders it identically) and persisted as the assistant message. The refusal nudges the user toward supported topics rather than just declining.\n- The classifier is fail-open: if the classifier model call throws, the turn proceeds normally. A classifier hiccup must never block legitimate use.\n- The classifier's own small token usage is recorded against the shared `ai_spend` ledger, exactly like any other model call, so it is accounted toward the spend cap.\n\n```ts\nbuildClassifierPrompt({ userText });   // { system, prompt } for the cheap call\nparseClassifierVerdict(raw);           // \"ON_TOPIC\" | \"OFF_TOPIC\" (ambiguous -> ON_TOPIC)\n```\n\n## Per-integration LLM spend cap\n\nEach OpenAI-compatible connection may carry an optional `spendCap`. It is OFF by default: no cap is enforced unless you configure one in the connection's settings form. 
The cap is a token-count budget, not a USD budget, because token counts are deterministic and provider-agnostic. Every OpenAI-compatible provider (OpenAI, Azure, OpenRouter, Ollama, vLLM, LM Studio) reports token usage through the AI SDK, but only some publish a price table and self-hosted models have none, so a USD cap would need a per-model pricing table that drifts and is meaningless for local models.\n\n```ts\nspendCap: { tokenBudget: 200000, windowMinutes: 60 } // optional; omit for no cap\n```\n\nWhen a cap is set, the loop refuses a new turn once the principal's token usage against that integration in the trailing `windowMinutes` reaches `tokenBudget`, returning a clear spend-exceeded error (HTTP 429). Spend is a rolling-window SUM over the shared `ai_spend` ledger: every completed turn appends one row with the AI SDK's reported input and output tokens, keyed by integration and principal. Because the sum is read from the same shared table every pod writes to, the cap holds across all pods, exactly like the per-principal tool rate-limit budget. An in-memory per-pod token counter would let N pods each allow the cap, which a single-process test could never catch, so the ledger is durable Postgres and the cross-pod count is verified in `core/ai-backend/src/rate-limit/spend-ledger.it.test.ts`.\n\n## No secret leaves the backend\n\nThe integration API key is stored in the Secrets Vault and read only on the backend when building the model provider. The chat RPCs expose only non-secret model UX metadata (`listChatIntegrations` returns connection id, name, default model, and the allowlist). The streamed response carries tokens, tool calls, and tool results (already redacted by their source procedures), never the credential. 
The no-secret-leak guarantee is regression-guarded across every AI DTO in `core/ai-backend/src/hardening/no-secret-leak.test.ts`.\n\nThe free-form `ai_messages.content` and `model_messages` bags are an exception that could, in principle, carry a credential if a buggy or malicious tool result smuggled one in. That guarantee is no longer merely architectural: `appendMessage` runs `scrubContent` on every message write, redacting any credential-shaped key (`apiKey`, `authorization`, `password`, `x-secret`, and similar) and any high-confidence credential value (an `sk-...` key, a `Bearer` token) before the row reaches Postgres. The scrub is conservative, so ordinary chat prose that merely mentions the word \"token\" or \"password\" is preserved; only credentials are stripped. The canary regression test injects a secret into message content and asserts it is stripped on write in `core/ai-backend/src/chat/scrub-content.test.ts` and `core/ai-backend/src/hardening/no-secret-leak.test.ts`.\n\n## Related\n\nChat shares the [tool registry](/checkstack/developer-guide/ai/tool-registry/) and resolver with the [MCP server](/checkstack/developer-guide/ai/mcp-server/), and gates mutating tools through [propose and apply](/checkstack/developer-guide/ai/propose-apply/). A model that picks a tool the principal cannot use is refused server-side (guarded in `core/ai-backend/src/chat/agent-loop.test.ts` and `core/ai-backend/src/hardening/handler-authz.test.ts`), and cross-pod conversation readback is verified in `core/ai-backend/src/chat/conversation-store.it.test.ts`. See the [AI platform overview](/checkstack/developer-guide/ai/) for the full security model.",
+    "content": "The in-app AI assistant is a server-side agent loop built on the Vercel AI SDK. It runs entirely on the backend, uses the same tool registry as the MCP server, persists conversations in shared Postgres, and never lets the model silently change state. Read tools auto-run; mutating and destructive tools surface a confirm card that a human must approve.\n\n## The agent loop runs on the backend\n\nThe chat turn is a raw HTTP handler at `/api/ai/chat` (server-sent events, because streaming needs a raw handler). The handler authenticates the request through the platform auth strategy, requires a logged-in real user (applications and services use MCP, not chat), and hands the turn to the agent loop. The model provider is created on the backend from the selected integration's credentials, so the API key never crosses to the browser. The browser only ever receives streamed tokens and tool events.\n\n```ts\n// Provider-agnostic via base-URL override (OpenAI, Azure, OpenRouter, Ollama, ...).\nconst model = buildLanguageModel({ connection, model: conversation.model });\nconst result = streamText({ model, system, messages, tools, stopWhen: stepCountIs(8) });\nreturn result.toUIMessageStreamResponse();\n```\n\n## Tools come from the same registry\n\nThe loop offers the model exactly the tools the resolver allows for the logged-in principal, no more. The model is treated as an untrusted caller: it picks arguments, but it can never reach a tool the principal cannot use, and every tool call is gated server-side.\n\n- `read` tools auto-run. Their execution re-enters the live router as the logged-in user (the chat request's own auth is forwarded), so handler-side authorization runs exactly as for any other caller. 
Each successful read writes an `ai_tool_calls` row with `transport: \"chat\"` (and an args hash, never the raw args), so chat reads appear in the audit log AND count toward the per-principal rate-limit budget, exactly like MCP reads.\n- `mutate` and `destructive` tools never run inline. Their executor runs the `propose` dry-run and returns a confirm card carrying the single-use proposal token and the validated payload. Nothing is committed until the operator clicks Apply, which calls `applyTool` with the token.\n\n```ts\n// Disposition of a model-requested tool (the single server-side gate):\ndisposeAgentTool({ toolName, principal, resolver, getTool });\n// -> { kind: \"run\" } | { kind: \"confirm\" } | { kind: \"refused\" }\n```\n\nThe same per-principal rate-limit budget that protects MCP is enforced before every tool call in the loop. See [Propose and apply](/checkstack/developer-guide/ai/propose-apply/) for the budget and the confirm-card token lifecycle.\n\n## The model acknowledges a confirm-card decision\n\nA confirm card ends the model's turn: the model has said what it will do and is now waiting on the operator. When the operator clicks Apply (or Decline), the actual apply still runs through the unchanged `applyTool` propose/apply path, and then a short follow-up turn makes the model react so the conversation does not dead-end on \"waiting for your confirmation\".\n\nThat follow-up is a second mode of the same `/api/ai/chat` handler. 
Instead of a `message`, the POST body carries a `decision` (the proposal token plus `apply` or `decline`); the handler routes it to `streamDecision`, which streams the model's acknowledgment over the same SSE path as a normal turn.\n\n```ts\n// POST /api/ai/chat — a normal turn OR a confirm-card decision turn:\n{ conversationId, connectionId, model?, message: \"...\" }                 // user turn\n{ conversationId, connectionId, model?, decision: { token, kind } }       // apply | decline\n```\n\nThe decision note handed to the model is derived SERVER-SIDE from the stored proposal (its tool name and the one-line summary captured at propose time), so no client-supplied text ever reaches the model. The note is EPHEMERAL: it is appended to that turn's history only and never persisted; the assistant's streamed reply is what gets saved, and it carries the outcome forward so later turns know the change is live. `streamDecision` re-checks ownership, that the proposal belongs to THIS conversation, and — for an `apply` — that the proposal is actually in the `applied` state (the apply ran first), refusing with a 409 otherwise so the model can never falsely claim a change took effect.\n\n```ts\nproposeApply.describeProposal({ token }); // read-only: tool name, summary, status, conversation (no consume)\nbuildDecisionNote({ decision, toolName, summary }); // the ephemeral, server-derived note\n```\n\n## Conversations are durable and pod-independent\n\nConversations and messages live in `ai_conversations` and `ai_messages` in shared Postgres. The user message is persisted before streaming begins, and the assistant message on completion, so a mid-stream pod restart still leaves a complete, resumable transcript. Any pod can list, open, or continue a chat; nothing about a conversation is pod-local. 
All reads are owner-scoped, so a user can only ever see their own conversations.\n\nWhen a turn is resumed, the model must see its prior TOOL interactions, not just the text it eventually said. On completion the loop persists the canonical AI-SDK `ResponseMessage[]` for the turn (assistant tool-call parts plus tool-result parts) into the additive `ai_messages.model_messages` column. On the next turn `toModelMessages` replays those messages verbatim, so a multi-turn conversation reconstructs the full tool-call history for the model. Rows written before this column existed (or plain user/system rows) fall back to text-only replay. Because the replay history lives in shared Postgres, replay is identical on whichever pod handles the next turn.\n\nThe conversation contract (RPC):\n\n```ts\nai.listChatIntegrations(); // selectable providers + model UX (no secrets)\nai.listConversations();    // the user's conversations, newest first\nai.createConversation({ integrationId, model });\nai.getConversation({ id }); // conversation + full transcript\nai.updateConversation({ id, title, model });\nai.archiveConversation({ id }); // soft delete: the user-facing \"Delete\" action\nai.deleteConversation({ id }); // hard delete (cascades); not the user action\n```\n\n## Deleting a chat is a soft archive\n\nThe user-facing \"Delete\" action in the sidebar does NOT remove the row. It calls `archiveConversation`, which stamps an `archived_at` timestamp on the `ai_conversations` row. `listConversations` filters `archived_at IS NULL`, so an archived chat disappears from the sidebar, but the conversation and its messages are RETAINED in Postgres for later abuse introspection. 
The archive is owner-scoped, so a user can only archive their own chats, and a repeat archive of an already-archived row is a no-op.\n\nThe frontend confirms the action through a modal (labeled \"Delete\"), then archives, lets the owning plugin's query invalidation refresh the list, and clears the view back to the empty state if the archived chat was the open one. The hard `deleteConversation` method is retained for non-user callers (such as a retention sweep) but is never wired to the sidebar, so nothing the user clicks ever hard-deletes a transcript.\n\n```ts\narchiveConversation({ id, userId }); // stamps archived_at = now, owner-scoped\n// listConversations filters archived_at IS NULL\n```\n\n## Starting a new chat\n\nThe \"New chat\" button creates a fresh conversation, makes it the active (highlighted) one, and clears the message view. Because `createConversation` is an oRPC mutation, it auto-invalidates the plugin's conversation list on success, so the new chat appears in the sidebar immediately. To avoid spawning a pile of empty \"Untitled chat\" rows, the click is deduplicated: if the conversation already open is itself an empty untitled draft (no title and no messages), the button reuses it instead of creating another row. The decision is a pure helper so it is unit-testable without rendering the page.\n\n```ts\ndecideNewChatAction({ current, messages }); // -> { kind: \"reuse\" } | { kind: \"create\" }\n```\n\n## Conversations are auto-titled\n\nA new conversation starts untitled, so the sidebar would otherwise show \"Untitled chat\". After the first user message of a still-untitled conversation, the backend derives a concise title (at most six words, no quotes, markdown, or trailing punctuation) and persists it with `updateConversation({ title })`. 
The title is produced by a cheap `generateText` call that reuses the turn's resolved connection and model, then sanitized with `sanitizeGeneratedTitle`.\n\nTitling is fire-and-forget: it runs detached from the streamed turn, so it never delays or crashes the response. On any model or sanitize failure it falls back to a deterministic heuristic from the first message (`deriveHeuristicTitle`: collapse whitespace, first six words, capped at sixty characters). The title lives in the shared `ai_conversations` table, so it is readable on every pod. The chat page invalidates the conversation list when a turn completes to pick up the new title, because the streaming turn is a raw SSE fetch rather than an oRPC mutation and so does not auto-invalidate.\n\n```ts\nderiveHeuristicTitle(firstMessage); // fallback when the model errors\nsanitizeGeneratedTitle(raw);        // strip quotes/markdown/punctuation, cap length\n```\n\n## Per-integration model selection\n\nModel choice is a property of the credential and provider, so it lives on the OpenAI-compatible integration connection: `defaultModel` is required and `availableModels` is an optional allowlist. The chat model picker always renders a `Select` whose options are `[defaultModel, ...availableModels]`, de-duplicated with the default first, so the connection's own default is always selectable. With no `availableModels` the picker contains just the default; it is never a free-text field. The model id is untrusted wire input, so it is revalidated server-side at two points: `createConversation` / `updateConversation` coerce a stored model against the integration's allowlist, and `buildLanguageModel` always runs the requested (or stored) model id through `resolveModelId` before handing it to the provider, so an out-of-allowlist id is coerced to `defaultModel` and never reaches the provider. 
An empty allowlist allows any model from the picker's default (free-text providers like Ollama still configure a `defaultModel`).\n\n```ts\nresolveModelId({ connection, requested }); // requested, or defaultModel if out of allowlist\n```\n\n## Off-topic guard\n\nThe assistant helps with operating Checkstack (incidents, health checks, anomalies, automations, monitoring, and on-call) AND with questions about the assistant itself or how to use Checkstack. Two layers keep clearly unrelated requests (general coding help, creative writing, general trivia) from spending tokens on the expensive tool loop.\n\nFirst, the system prompt instructs the assistant to decline clearly unrelated requests with a one-line redirect, so even a request that slips past the classifier is steered back.\n\nSecond, a cheap topical pre-classifier runs BEFORE the agent/tool loop. It is a small `generateText` call (injectable like the title generator, defaulting to the turn's resolved model) with a tight prompt that returns a single token: `ON_TOPIC` or `OFF_TOPIC`. 
The following are always `ON_TOPIC`:\n\n- Checkstack operations: incidents, health checks, anomalies, automations, monitoring, on-call, the platform's data and configuration.\n- Meta/capability questions about the assistant itself: \"what can you do?\", \"who are you?\", \"help\", \"what features do you have?\".\n- Greetings and conversational openers: \"hi\", \"hello\", \"hey\".\n- How-to and conceptual questions about using Checkstack features or workflows: \"how do health checks work?\", \"how do I create an automation?\".\n\nOnly CLEARLY unrelated requests are `OFF_TOPIC`: general coding help unrelated to Checkstack, creative writing, and general trivia or knowledge questions.\n\nThe reply is parsed by a pure function that leans toward `ON_TOPIC` on anything ambiguous or unrecognized, because a false refusal of a real ops question is worse than letting one off-topic request slide.\n\n- On `OFF_TOPIC` the turn short-circuits: the expensive tool loop never runs. A canned, concise refusal is streamed back over the same SSE path the normal turn uses (so the frontend renders it identically) and persisted as the assistant message. The refusal nudges the user toward supported topics rather than just declining.\n- The classifier is fail-open: if the classifier model call throws, the turn proceeds normally. A classifier hiccup must never block legitimate use.\n- The classifier's own small token usage is recorded against the shared `ai_spend` ledger, exactly like any other model call, so it is accounted toward the spend cap.\n\n```ts\nbuildClassifierPrompt({ userText });   // { system, prompt } for the cheap call\nparseClassifierVerdict(raw);           // \"ON_TOPIC\" | \"OFF_TOPIC\" (ambiguous -> ON_TOPIC)\n```\n\n## Per-integration LLM spend cap\n\nEach OpenAI-compatible connection may carry an optional `spendCap`. It is OFF by default: no cap is enforced unless you configure one in the connection's settings form. 
The cap is a token-count budget, not a USD budget, because token counts are deterministic and provider-agnostic. Every OpenAI-compatible provider (OpenAI, Azure, OpenRouter, Ollama, vLLM, LM Studio) reports token usage through the AI SDK, but only some publish a price table and self-hosted models have none, so a USD cap would need a per-model pricing table that drifts and is meaningless for local models.\n\n```ts\nspendCap: { tokenBudget: 200000, windowMinutes: 60 } // optional; omit for no cap\n```\n\nWhen a cap is set, the loop refuses a new turn once the principal's token usage against that integration in the trailing `windowMinutes` reaches `tokenBudget`, returning a clear spend-exceeded error (HTTP 429). Spend is a rolling-window SUM over the shared `ai_spend` ledger: every completed turn appends one row with the AI SDK's reported input and output tokens, keyed by integration and principal. Because the sum is read from the same shared table every pod writes to, the cap holds across all pods, exactly like the per-principal tool rate-limit budget. An in-memory per-pod token counter would let N pods each allow the cap, which a single-process test could never catch, so the ledger is durable Postgres and the cross-pod count is verified in `core/ai-backend/src/rate-limit/spend-ledger.it.test.ts`.\n\n## Dates and timezones\n\nThe model produces dates as text, so the chat enforces an unambiguous wire contract: every date-time a tool receives must be RFC 3339 with an EXPLICIT timezone offset (`2026-07-01T22:00:00Z` or `2026-07-01T22:00:00+02:00`). Zone-less values (`2026-07-01T22:00:00`) and date-only values (`2026-07-01`) are rejected, because feeding a zone-less string to `new Date()` would interpret it in the pod's local zone and the same string could then resolve to different instants on different pods. A rejected value comes back to the model as a tool-input error naming the field and the requirement, so the model repairs the call itself. 
The contract is enforced centrally for every tool input and structured output, gated to date fields, in `core/ai-backend/src/chat/model-schema.ts`.\n\nTo turn an operator's bare \"22:00\" into an offset, the model needs a reference timezone. The browser sends its IANA zone (`Intl.DateTimeFormat().resolvedOptions().timeZone`) with every turn, and that zone is folded into the system prompt. So by default each operator's times are interpreted in their own browser timezone, with no configuration.\n\nWhen no browser zone is available (a headless automation \"AI Action\", or a client without `Intl`), the reference zone falls back to the host/container timezone, NOT to UTC. Operators override it by setting the container's `TZ`:\n\n```env\n# Reference timezone for AI date interpretation when no browser zone is sent\n# (e.g. automation AI Actions). Any IANA zone id.\nTZ=Europe/Berlin\n```\n\nThis only affects how a bare time is interpreted into an offset; storage is always an absolute instant. The regular (non-AI) UI is unaffected: its date pickers produce real `Date` objects, which serialize as absolute instants and render back in each viewer's own browser zone.\n\n## No secret leaves the backend\n\nThe integration API key is stored in the Secrets Vault and read only on the backend when building the model provider. The chat RPCs expose only non-secret model UX metadata (`listChatIntegrations` returns connection id, name, default model, and the allowlist). The streamed response carries tokens, tool calls, and tool results (already redacted by their source procedures), never the credential. The no-secret-leak guarantee is regression-guarded across every AI DTO in `core/ai-backend/src/hardening/no-secret-leak.test.ts`.\n\nThe free-form `ai_messages.content` and `model_messages` bags are an exception that could, in principle, carry a credential if a buggy or malicious tool result smuggled one in. 
That guarantee is no longer merely architectural: `appendMessage` runs `scrubContent` on every message write, redacting any credential-shaped key (`apiKey`, `authorization`, `password`, `x-secret`, and similar) and any high-confidence credential value (an `sk-...` key, a `Bearer` token) before the row reaches Postgres. The scrub is conservative, so ordinary chat prose that merely mentions the word \"token\" or \"password\" is preserved; only credentials are stripped. The canary regression test injects a secret into message content and asserts it is stripped on write in `core/ai-backend/src/chat/scrub-content.test.ts` and `core/ai-backend/src/hardening/no-secret-leak.test.ts`.\n\n## Related\n\nChat shares the [tool registry](/checkstack/developer-guide/ai/tool-registry/) and resolver with the [MCP server](/checkstack/developer-guide/ai/mcp-server/), and gates mutating tools through [propose and apply](/checkstack/developer-guide/ai/propose-apply/). A model that picks a tool the principal cannot use is refused server-side (guarded in `core/ai-backend/src/chat/agent-loop.test.ts` and `core/ai-backend/src/hardening/handler-authz.test.ts`), and cross-pod conversation readback is verified in `core/ai-backend/src/chat/conversation-store.it.test.ts`. See the [AI platform overview](/checkstack/developer-guide/ai/) for the full security model.",
     "truncated": false
   },
   {
@@ -2574,7 +2575,7 @@ export const DOCS_INDEX: readonly DocsIndexEntry[] = [
       "Database connection errors",
       "Next Steps"
     ],
-    "content": "This guide walks you through deploying Checkstack using Docker.\n\n## Prerequisites\n\n- Docker installed and running\n- PostgreSQL database (or use a managed service like Supabase, Neon, etc.)\n\n## Required Environment Variables\n\nCheckstack requires four environment variables to run:\n\n| Variable | Description | Requirements |\n|----------|-------------|--------------|\n| `DATABASE_URL` | PostgreSQL connection string | Valid Postgres URI |\n| `ENCRYPTION_MASTER_KEY` | Encrypts secrets in the database | 64 hex characters (32 bytes) |\n| `BETTER_AUTH_SECRET` | Signs session cookies and OAuth states | Minimum 32 characters |\n| `BASE_URL` | Exact URL used to access Checkstack in the browser | e.g. `http://192.168.1.123:3000` or `https://status.example.com` |\n\n## Generating Secrets\n\n### ENCRYPTION_MASTER_KEY\n\nGenerate a secure 32-byte key:\n\n```bash\n# Using Node.js\nnode -e \"console.log(require('crypto').randomBytes(32).toString('hex'))\"\n\n# Using OpenSSL\nopenssl rand -hex 32\n```\n\nThis produces a 64-character hexadecimal string (e.g., `a1b2c3d4e5f6...`).\n\n### BETTER_AUTH_SECRET\n\nGenerate a secure random string (minimum 32 characters):\n\n```bash\n# Using Node.js\nnode -e \"console.log(require('crypto').randomBytes(32).toString('base64'))\"\n\n# Using OpenSSL\nopenssl rand -base64 32\n```\n\n## Quick Start\n\n```bash\n# Pull the latest image\ndocker pull ghcr.io/enyineer/checkstack:latest\n\n# Run with required environment variables\ndocker run -d \\\n  --name checkstack \\\n  -e DATABASE_URL=\"postgresql://user:password@host:5432/checkstack\" \\\n  -e ENCRYPTION_MASTER_KEY=\"<your-64-char-hex-key>\" \\\n  -e BETTER_AUTH_SECRET=\"<your-32-char-secret>\" \\\n  -e BASE_URL=\"http://192.168.1.123:3000\" \\\n  -p 3000:3000 \\\n  ghcr.io/enyineer/checkstack:latest\n```\n\n## Docker Compose (Recommended)\n\nThe Checkstack repository includes a ready-to-use `docker-compose.yml` in the project root that runs both Checkstack and 
PostgreSQL:\n\n```bash\n# Clone the repository (or download just the docker-compose.yml)\ngit clone https://github.com/enyineer/checkstack.git\ncd checkstack\n\n# Create your .env file with required secrets\ncat > .env << EOF\nPOSTGRES_USER=checkstack\nPOSTGRES_PASSWORD=checkstack\nPOSTGRES_DB=checkstack\nENCRYPTION_MASTER_KEY=$(openssl rand -hex 32)\nBETTER_AUTH_SECRET=$(openssl rand -base64 32)\nBASE_URL=\"http://192.168.1.123:3000\" # Must match exactly how you access Checkstack!\nEOF\n\n# Start everything\ndocker compose up -d\n```\n\n### Updating the Checkstack Image\n\nTo update to a newer version:\n\n```bash\n# Pull the latest image\ndocker compose pull\n\n# Recreate containers with the new image\ndocker compose up -d\n```\n\n> [!TIP]\n> You can also pin to a specific version by editing the `image:` line in `docker-compose.yml`:\n> ```yaml\n> image: ghcr.io/enyineer/checkstack:<version>\n> ```\n\n## Quick Start (Single Container)\n\nIf you already have a PostgreSQL database, you can run Checkstack as a single container:\n\n```bash\ndocker run -d \\\n  --name checkstack \\\n  -e DATABASE_URL=\"postgresql://user:password@host:5432/checkstack\" \\\n  -e ENCRYPTION_MASTER_KEY=\"<your-64-char-hex-key>\" \\\n  -e BETTER_AUTH_SECRET=\"<your-32-char-secret>\" \\\n  -e BASE_URL=\"http://192.168.1.123:3000\" \\\n  -p 3000:3000 \\\n  ghcr.io/enyineer/checkstack:latest\n```\n\n## Optional Environment Variables\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `LOG_LEVEL` | `info` | Logging level (debug, info, warn, error) |\n| `INTERNAL_URL` | (falls back to `BASE_URL`) | Internal RPC URL for backend-to-backend calls. Set to K8s service name (e.g., `http://checkstack-service:3000`) for multi-pod load balancing. |\n\n## Onboarding flow\n\n> [!TIP]\n> After first start, you'll have to create your first admin user.\n>\n> Upon opening the page eg. 
at `http://localhost:3000` you'll be greeted with a signup form.\n\n## Health Check\n\nVerify Checkstack is running:\n\n```bash\ncurl http://localhost:3000/api/health\n```\n\n## Troubleshooting\n\n### \"ENCRYPTION_MASTER_KEY must be 32 bytes (64 hex characters)\"\n\nYour encryption key is not the correct length. Generate a new one using the commands above.\n\n### \"BETTER_AUTH_SECRET must be at least 32 characters\"\n\nYour auth secret is too short. Generate a longer one using the commands above.\n\n### Onboarding screen does not appear / app loads empty or very slowly\n\nThis is almost always caused by a wrong `BASE_URL`. When `BASE_URL` points to an incorrect or unreachable address, the frontend cannot reach the backend for session and onboarding checks, which causes it to silently show empty state.\n\n**Make sure `BASE_URL` matches the EXACT URL YOU put into the browser address bar (including possible LAN IPs or the domain):**\n\n```bash\n# Example (Docker on LAN):\nBASE_URL=http://192.168.1.123:3000\n\n# Example (Production):\nBASE_URL=https://status.example.com\n```\n\nYou can verify the value your container is using by checking the config endpoint:\n\n```bash\ncurl http://localhost:3000/api/config\n# Expected: {\"baseUrl\":\"http://localhost:3000\"}\n```\n\nIf `baseUrl` in the response points to port `5173` or any other wrong address, update `BASE_URL` in your `.env` file and recreate the container:\n\n```bash\ndocker compose up -d --force-recreate\n```\n\n### Database connection errors\n\n- Verify your `DATABASE_URL` is correct and the database is reachable\n- Ensure PostgreSQL is running and accepting connections\n- Check firewall rules allow connections between containers\n\n## Next Steps\n\n- [Configure authentication strategies](/checkstack/user-guide/reference/authentication-strategies/)\n- [Set up notification channels](/checkstack/developer-guide/backend/notifications/strategies/)\n- [Create your first health 
check](/checkstack/developer-guide/backend/healthchecks/strategies/)",
+    "content": "This guide walks you through deploying Checkstack using Docker.\n\n## Prerequisites\n\n- Docker installed and running\n- PostgreSQL database (or use a managed service like Supabase, Neon, etc.)\n\n## Required Environment Variables\n\nCheckstack requires four environment variables to run:\n\n| Variable | Description | Requirements |\n|----------|-------------|--------------|\n| `DATABASE_URL` | PostgreSQL connection string | Valid Postgres URI |\n| `ENCRYPTION_MASTER_KEY` | Encrypts secrets in the database | 64 hex characters (32 bytes) |\n| `BETTER_AUTH_SECRET` | Signs session cookies and OAuth states | Minimum 32 characters |\n| `BASE_URL` | Exact URL used to access Checkstack in the browser | e.g. `http://192.168.1.123:3000` or `https://status.example.com` |\n\n## Generating Secrets\n\n### ENCRYPTION_MASTER_KEY\n\nGenerate a secure 32-byte key:\n\n```bash\n# Using Node.js\nnode -e \"console.log(require('crypto').randomBytes(32).toString('hex'))\"\n\n# Using OpenSSL\nopenssl rand -hex 32\n```\n\nThis produces a 64-character hexadecimal string (e.g., `a1b2c3d4e5f6...`).\n\n### BETTER_AUTH_SECRET\n\nGenerate a secure random string (minimum 32 characters):\n\n```bash\n# Using Node.js\nnode -e \"console.log(require('crypto').randomBytes(32).toString('base64'))\"\n\n# Using OpenSSL\nopenssl rand -base64 32\n```\n\n## Quick Start\n\n```bash\n# Pull the latest image\ndocker pull ghcr.io/enyineer/checkstack:latest\n\n# Run with required environment variables\ndocker run -d \\\n  --name checkstack \\\n  -e DATABASE_URL=\"postgresql://user:password@host:5432/checkstack\" \\\n  -e ENCRYPTION_MASTER_KEY=\"<your-64-char-hex-key>\" \\\n  -e BETTER_AUTH_SECRET=\"<your-32-char-secret>\" \\\n  -e BASE_URL=\"http://192.168.1.123:3000\" \\\n  -p 3000:3000 \\\n  ghcr.io/enyineer/checkstack:latest\n```\n\n## Docker Compose (Recommended)\n\nThe Checkstack repository includes a ready-to-use `docker-compose.yml` in the project root that runs both Checkstack and 
PostgreSQL:\n\n```bash\n# Clone the repository (or download just the docker-compose.yml)\ngit clone https://github.com/enyineer/checkstack.git\ncd checkstack\n\n# Create your .env file with required secrets\ncat > .env << EOF\nPOSTGRES_USER=checkstack\nPOSTGRES_PASSWORD=checkstack\nPOSTGRES_DB=checkstack\nENCRYPTION_MASTER_KEY=$(openssl rand -hex 32)\nBETTER_AUTH_SECRET=$(openssl rand -base64 32)\nBASE_URL=\"http://192.168.1.123:3000\" # Must match exactly how you access Checkstack!\nEOF\n\n# Start everything\ndocker compose up -d\n```\n\n### Updating the Checkstack Image\n\nTo update to a newer version:\n\n```bash\n# Pull the latest image\ndocker compose pull\n\n# Recreate containers with the new image\ndocker compose up -d\n```\n\n> [!TIP]\n> You can also pin to a specific version by editing the `image:` line in `docker-compose.yml`:\n> ```yaml\n> image: ghcr.io/enyineer/checkstack:<version>\n> ```\n\n## Quick Start (Single Container)\n\nIf you already have a PostgreSQL database, you can run Checkstack as a single container:\n\n```bash\ndocker run -d \\\n  --name checkstack \\\n  -e DATABASE_URL=\"postgresql://user:password@host:5432/checkstack\" \\\n  -e ENCRYPTION_MASTER_KEY=\"<your-64-char-hex-key>\" \\\n  -e BETTER_AUTH_SECRET=\"<your-32-char-secret>\" \\\n  -e BASE_URL=\"http://192.168.1.123:3000\" \\\n  -p 3000:3000 \\\n  ghcr.io/enyineer/checkstack:latest\n```\n\n## Optional Environment Variables\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `LOG_LEVEL` | `info` | Logging level (debug, info, warn, error) |\n| `INTERNAL_URL` | (falls back to `BASE_URL`) | Internal RPC URL for backend-to-backend calls. Set to K8s service name (e.g., `http://checkstack-service:3000`) for multi-pod load balancing. |\n| `TZ` | (host default) | IANA timezone the AI assistant uses to interpret bare times (e.g. \"22:00\") when no browser timezone is available, such as automation AI Actions. 
In-app chat already uses each operator's own browser timezone. Set this to your deployment's local zone (e.g. `Europe/Berlin`) to override the fallback. See [Internal chat](/checkstack/developer-guide/ai/chat/#dates-and-timezones). |\n\n## Onboarding flow\n\n> [!TIP]\n> After first start, you'll have to create your first admin user.\n>\n> Upon opening the page eg. at `http://localhost:3000` you'll be greeted with a signup form.\n\n## Health Check\n\nVerify Checkstack is running:\n\n```bash\ncurl http://localhost:3000/api/health\n```\n\n## Troubleshooting\n\n### \"ENCRYPTION_MASTER_KEY must be 32 bytes (64 hex characters)\"\n\nYour encryption key is not the correct length. Generate a new one using the commands above.\n\n### \"BETTER_AUTH_SECRET must be at least 32 characters\"\n\nYour auth secret is too short. Generate a longer one using the commands above.\n\n### Onboarding screen does not appear / app loads empty or very slowly\n\nThis is almost always caused by a wrong `BASE_URL`. When `BASE_URL` points to an incorrect or unreachable address, the frontend cannot reach the backend for session and onboarding checks, which causes it to silently show empty state.\n\n**Make sure `BASE_URL` matches the EXACT URL YOU put into the browser address bar (including possible LAN IPs or the domain):**\n\n```bash\n# Example (Docker on LAN):\nBASE_URL=http://192.168.1.123:3000\n\n# Example (Production):\nBASE_URL=https://status.example.com\n```\n\nYou can verify the value your container is using by checking the config endpoint:\n\n```bash\ncurl http://localhost:3000/api/config\n# Expected: {\"baseUrl\":\"http://localhost:3000\"}\n```\n\nIf `baseUrl` in the response points to port `5173` or any other wrong address, update `BASE_URL` in your `.env` file and recreate the container:\n\n```bash\ndocker compose up -d --force-recreate\n```\n\n### Database connection errors\n\n- Verify your `DATABASE_URL` is correct and the database is reachable\n- Ensure PostgreSQL is running and 
accepting connections\n- Check firewall rules allow connections between containers\n\n## Next Steps\n\n- [Configure authentication strategies](/checkstack/user-guide/reference/authentication-strategies/)\n- [Set up notification channels](/checkstack/developer-guide/backend/notifications/strategies/)\n- [Create your first health check](/checkstack/developer-guide/backend/healthchecks/strategies/)",
     "truncated": false
   },
   {
@@ -3018,4 +3019,4 @@ export const DOCS_INDEX: readonly DocsIndexEntry[] = [
 ];
 /** A content hash of the source tree, so a CI check can detect drift. */
-export const DOCS_INDEX_HASH = "1ceed8a6cbc326a222af17a8edc94219488f1a4b835deb360acf954ffcf30a25";
+export const DOCS_INDEX_HASH = "24a01723a26c2389adda98d7fde1e96d0b0e5e44ba42c04eb6c149bc060d6ddc";
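
The "Dates and timezones" section added to the docs index above describes a wire contract: every date-time must be RFC 3339 with an explicit timezone offset, and zone-less or date-only values are rejected. A minimal sketch of such a check is shown below. The helper `hasExplicitOffset` and its regular expression are hypothetical illustrations and are not taken from `model-schema.ts`; the sketch is also slightly stricter than RFC 3339, which permits lowercase `t` and `z`.

```typescript
// Hypothetical helper (an assumption, not the package's code): accepts only
// date-times that carry an explicit offset, either "Z" or "+hh:mm" / "-hh:mm".
const RFC3339_WITH_OFFSET =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function hasExplicitOffset(value: string): boolean {
  // Check the shape first, then confirm the string parses to a real instant.
  return RFC3339_WITH_OFFSET.test(value) && !Number.isNaN(Date.parse(value));
}
```

Under this sketch, the two accepted examples from the docs (`2026-07-01T22:00:00Z` and `2026-07-01T22:00:00+02:00`) pass, while the zone-less `2026-07-01T22:00:00` and the date-only `2026-07-01` fail, so the same string can never resolve to different instants on different pods.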

package/src/serializer.test.ts CHANGED

@@ -41,6 +41,28 @@ describe("serializeTool", () => {
     );
   });
+  test("serializes a tool whose schema has Date fields (no throw)", () => {
+    // Regression for "The assistant hit an error: Date cannot be represented in
+    // JSON Schema": tools that read timestamped resources (incidents, health
+    // checks, anomalies) carry `z.date()` fields. Projecting the tool list must
+    // not throw - it runs on every chat turn before the model is even called.
+    const dateTool: RegisteredAiTool = {
+      name: "incident.get",
+      description: "Get an incident.",
+      effect: "read",
+      input: z.object({ id: z.string() }),
+      output: z.object({ id: z.string(), createdAt: z.date() }),
+      requiredAccessRules: ["incident.incident.read"],
+      execute: () => Promise.resolve({}),
+    };
+    const descriptor = serializeTool({ tool: dateTool });
+    const out = descriptor.outputSchema as {
+      properties: Record<string, Record<string, unknown>>;
+    };
+    expect(out.properties.createdAt?.type).toBe("string");
+    expect(out.properties.createdAt?.format).toBe("date-time");
+  });
   test("never emits a secret VALUE into the descriptor", () => {
     // A tool whose input has an x-secret field: the schema describes the field
     // but the descriptor must never contain a concrete secret value.