durar-ai 2026.4.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (571)
  1. package/CHANGELOG.md +5497 -0
  2. package/LICENSE +21 -0
  3. package/README.md +614 -0
  4. package/assets/avatar-placeholder.svg +19 -0
  5. package/assets/chrome-extension/icons/icon128.png +0 -0
  6. package/assets/chrome-extension/icons/icon16.png +0 -0
  7. package/assets/chrome-extension/icons/icon32.png +0 -0
  8. package/assets/chrome-extension/icons/icon48.png +0 -0
  9. package/assets/dmg-background-small.png +0 -0
  10. package/assets/dmg-background.png +0 -0
  11. package/docs/.i18n/README.md +72 -0
  12. package/docs/.i18n/ar-navigation.json +18 -0
  13. package/docs/.i18n/de-navigation.json +18 -0
  14. package/docs/.i18n/es-navigation.json +18 -0
  15. package/docs/.i18n/fr-navigation.json +18 -0
  16. package/docs/.i18n/glossary.ar.json +5 -0
  17. package/docs/.i18n/glossary.de.json +5 -0
  18. package/docs/.i18n/glossary.es.json +5 -0
  19. package/docs/.i18n/glossary.fr.json +5 -0
  20. package/docs/.i18n/glossary.id.json +5 -0
  21. package/docs/.i18n/glossary.it.json +5 -0
  22. package/docs/.i18n/glossary.ja-JP.json +14 -0
  23. package/docs/.i18n/glossary.ko.json +5 -0
  24. package/docs/.i18n/glossary.pl.json +5 -0
  25. package/docs/.i18n/glossary.pt-BR.json +5 -0
  26. package/docs/.i18n/glossary.tr.json +5 -0
  27. package/docs/.i18n/glossary.zh-CN.json +358 -0
  28. package/docs/.i18n/id-navigation.json +18 -0
  29. package/docs/.i18n/it-navigation.json +18 -0
  30. package/docs/.i18n/ja-navigation.json +18 -0
  31. package/docs/.i18n/ko-navigation.json +18 -0
  32. package/docs/.i18n/pl-navigation.json +18 -0
  33. package/docs/.i18n/pt-BR-navigation.json +18 -0
  34. package/docs/.i18n/tr-navigation.json +18 -0
  35. package/docs/.i18n/zh-Hans-navigation.json +544 -0
  36. package/docs/assets/install-script.svg +1 -0
  37. package/docs/assets/macos-onboarding/01-macos-warning.jpeg +0 -0
  38. package/docs/assets/macos-onboarding/02-local-networks.jpeg +0 -0
  39. package/docs/assets/macos-onboarding/03-security-notice.png +0 -0
  40. package/docs/assets/macos-onboarding/04-choose-gateway.png +0 -0
  41. package/docs/assets/macos-onboarding/05-permissions.png +0 -0
  42. package/docs/assets/openclaw-logo-text-dark.png +0 -0
  43. package/docs/assets/openclaw-logo-text-dark.svg +418 -0
  44. package/docs/assets/openclaw-logo-text.png +0 -0
  45. package/docs/assets/openclaw-logo-text.svg +418 -0
  46. package/docs/assets/pixel-lobster.svg +60 -0
  47. package/docs/assets/showcase/agents-ui.jpg +0 -0
  48. package/docs/assets/showcase/bambu-cli.png +0 -0
  49. package/docs/assets/showcase/codexmonitor.png +0 -0
  50. package/docs/assets/showcase/gohome-grafana.png +0 -0
  51. package/docs/assets/showcase/ios-testflight.jpg +0 -0
  52. package/docs/assets/showcase/oura-health.png +0 -0
  53. package/docs/assets/showcase/padel-cli.svg +11 -0
  54. package/docs/assets/showcase/padel-screenshot.jpg +0 -0
  55. package/docs/assets/showcase/papla-tts.jpg +0 -0
  56. package/docs/assets/showcase/pr-review-telegram.jpg +0 -0
  57. package/docs/assets/showcase/roborock-screenshot.jpg +0 -0
  58. package/docs/assets/showcase/roborock-status.svg +13 -0
  59. package/docs/assets/showcase/roof-camera-sky.jpg +0 -0
  60. package/docs/assets/showcase/snag.png +0 -0
  61. package/docs/assets/showcase/tesco-shop.jpg +0 -0
  62. package/docs/assets/showcase/wienerlinien.png +0 -0
  63. package/docs/assets/showcase/wine-cellar-skill.jpg +0 -0
  64. package/docs/assets/showcase/winix-air-purifier.jpg +0 -0
  65. package/docs/assets/showcase/xuezh-pronunciation.jpeg +0 -0
  66. package/docs/assets/sponsors/blacksmith-light.svg +14 -0
  67. package/docs/assets/sponsors/blacksmith.svg +14 -0
  68. package/docs/assets/sponsors/convex-light.svg +16 -0
  69. package/docs/assets/sponsors/convex.svg +16 -0
  70. package/docs/assets/sponsors/github-light.svg +3 -0
  71. package/docs/assets/sponsors/github.svg +3 -0
  72. package/docs/assets/sponsors/nvidia-dark.svg +9 -0
  73. package/docs/assets/sponsors/nvidia.svg +9 -0
  74. package/docs/assets/sponsors/openai-light.svg +3 -0
  75. package/docs/assets/sponsors/openai.svg +3 -0
  76. package/docs/assets/sponsors/vercel-light.svg +5 -0
  77. package/docs/assets/sponsors/vercel.svg +5 -0
  78. package/docs/auth-credential-semantics.md +80 -0
  79. package/docs/automation/auth-monitoring.md +8 -0
  80. package/docs/automation/clawflow.md +8 -0
  81. package/docs/automation/cron-jobs.md +410 -0
  82. package/docs/automation/cron-vs-heartbeat.md +8 -0
  83. package/docs/automation/gmail-pubsub.md +8 -0
  84. package/docs/automation/hooks.md +303 -0
  85. package/docs/automation/index.md +115 -0
  86. package/docs/automation/poll.md +8 -0
  87. package/docs/automation/standing-orders.md +254 -0
  88. package/docs/automation/taskflow.md +82 -0
  89. package/docs/automation/tasks.md +323 -0
  90. package/docs/automation/troubleshooting.md +8 -0
  91. package/docs/automation/webhook.md +8 -0
  92. package/docs/brave-search.md +103 -0
  93. package/docs/channels/bluebubbles.md +435 -0
  94. package/docs/channels/broadcast-groups.md +442 -0
  95. package/docs/channels/channel-routing.md +139 -0
  96. package/docs/channels/discord.md +1254 -0
  97. package/docs/channels/feishu.md +793 -0
  98. package/docs/channels/googlechat.md +270 -0
  99. package/docs/channels/group-messages.md +84 -0
  100. package/docs/channels/groups.md +410 -0
  101. package/docs/channels/imessage.md +427 -0
  102. package/docs/channels/index.md +50 -0
  103. package/docs/channels/irc.md +252 -0
  104. package/docs/channels/line.md +225 -0
  105. package/docs/channels/location.md +56 -0
  106. package/docs/channels/matrix.md +869 -0
  107. package/docs/channels/mattermost.md +472 -0
  108. package/docs/channels/msteams.md +805 -0
  109. package/docs/channels/nextcloud-talk.md +149 -0
  110. package/docs/channels/nostr.md +252 -0
  111. package/docs/channels/pairing.md +129 -0
  112. package/docs/channels/qqbot.md +193 -0
  113. package/docs/channels/signal.md +337 -0
  114. package/docs/channels/slack.md +681 -0
  115. package/docs/channels/synology-chat.md +185 -0
  116. package/docs/channels/telegram.md +1072 -0
  117. package/docs/channels/tlon.md +290 -0
  118. package/docs/channels/troubleshooting.md +133 -0
  119. package/docs/channels/twitch.md +394 -0
  120. package/docs/channels/whatsapp.md +488 -0
  121. package/docs/channels/zalo.md +254 -0
  122. package/docs/channels/zalouser.md +195 -0
  123. package/docs/ci.md +66 -0
  124. package/docs/cli/acp.md +316 -0
  125. package/docs/cli/agent.md +57 -0
  126. package/docs/cli/agents.md +220 -0
  127. package/docs/cli/approvals.md +136 -0
  128. package/docs/cli/backup.md +84 -0
  129. package/docs/cli/browser.md +233 -0
  130. package/docs/cli/channels.md +131 -0
  131. package/docs/cli/clawbot.md +21 -0
  132. package/docs/cli/completion.md +35 -0
  133. package/docs/cli/config.md +353 -0
  134. package/docs/cli/configure.md +70 -0
  135. package/docs/cli/cron.md +167 -0
  136. package/docs/cli/daemon.md +57 -0
  137. package/docs/cli/dashboard.md +22 -0
  138. package/docs/cli/devices.md +171 -0
  139. package/docs/cli/directory.md +63 -0
  140. package/docs/cli/dns.md +48 -0
  141. package/docs/cli/docs.md +28 -0
  142. package/docs/cli/doctor.md +63 -0
  143. package/docs/cli/flows.md +18 -0
  144. package/docs/cli/gateway.md +307 -0
  145. package/docs/cli/health.md +36 -0
  146. package/docs/cli/hooks.md +337 -0
  147. package/docs/cli/index.md +1836 -0
  148. package/docs/cli/logs.md +59 -0
  149. package/docs/cli/mcp.md +505 -0
  150. package/docs/cli/memory.md +139 -0
  151. package/docs/cli/message.md +300 -0
  152. package/docs/cli/models.md +136 -0
  153. package/docs/cli/node.md +137 -0
  154. package/docs/cli/nodes.md +66 -0
  155. package/docs/cli/onboard.md +171 -0
  156. package/docs/cli/pairing.md +65 -0
  157. package/docs/cli/plugins.md +305 -0
  158. package/docs/cli/qr.md +52 -0
  159. package/docs/cli/reset.md +35 -0
  160. package/docs/cli/sandbox.md +197 -0
  161. package/docs/cli/secrets.md +197 -0
  162. package/docs/cli/security.md +86 -0
  163. package/docs/cli/sessions.md +113 -0
  164. package/docs/cli/setup.md +45 -0
  165. package/docs/cli/skills.md +59 -0
  166. package/docs/cli/status.md +35 -0
  167. package/docs/cli/system.md +71 -0
  168. package/docs/cli/tui.md +30 -0
  169. package/docs/cli/uninstall.md +39 -0
  170. package/docs/cli/update.md +113 -0
  171. package/docs/cli/voicecall.md +34 -0
  172. package/docs/cli/webhooks.md +91 -0
  173. package/docs/concepts/agent-loop.md +168 -0
  174. package/docs/concepts/agent-workspace.md +246 -0
  175. package/docs/concepts/agent.md +129 -0
  176. package/docs/concepts/architecture.md +156 -0
  177. package/docs/concepts/compaction.md +122 -0
  178. package/docs/concepts/context-engine.md +274 -0
  179. package/docs/concepts/context.md +179 -0
  180. package/docs/concepts/delegate-architecture.md +307 -0
  181. package/docs/concepts/dreaming.md +173 -0
  182. package/docs/concepts/features.md +76 -0
  183. package/docs/concepts/markdown-formatting.md +130 -0
  184. package/docs/concepts/memory-builtin.md +105 -0
  185. package/docs/concepts/memory-honcho.md +140 -0
  186. package/docs/concepts/memory-qmd.md +163 -0
  187. package/docs/concepts/memory-search.md +141 -0
  188. package/docs/concepts/memory.md +121 -0
  189. package/docs/concepts/messages.md +161 -0
  190. package/docs/concepts/model-failover.md +349 -0
  191. package/docs/concepts/model-providers.md +799 -0
  192. package/docs/concepts/models.md +255 -0
  193. package/docs/concepts/multi-agent.md +615 -0
  194. package/docs/concepts/oauth.md +225 -0
  195. package/docs/concepts/presence.md +102 -0
  196. package/docs/concepts/queue.md +89 -0
  197. package/docs/concepts/retry.md +69 -0
  198. package/docs/concepts/session-pruning.md +92 -0
  199. package/docs/concepts/session-tool.md +141 -0
  200. package/docs/concepts/session.md +116 -0
  201. package/docs/concepts/soul.md +110 -0
  202. package/docs/concepts/streaming.md +161 -0
  203. package/docs/concepts/system-prompt.md +182 -0
  204. package/docs/concepts/timezone.md +97 -0
  205. package/docs/concepts/typebox.md +307 -0
  206. package/docs/concepts/typing-indicators.md +69 -0
  207. package/docs/concepts/usage-tracking.md +59 -0
  208. package/docs/date-time.md +128 -0
  209. package/docs/debug/node-issue.md +85 -0
  210. package/docs/diagnostics/flags.md +91 -0
  211. package/docs/docs.json +1601 -0
  212. package/docs/gateway/authentication.md +218 -0
  213. package/docs/gateway/background-process.md +131 -0
  214. package/docs/gateway/bonjour.md +179 -0
  215. package/docs/gateway/bridge-protocol.md +89 -0
  216. package/docs/gateway/cli-backends.md +310 -0
  217. package/docs/gateway/configuration-examples.md +631 -0
  218. package/docs/gateway/configuration-reference.md +3618 -0
  219. package/docs/gateway/configuration.md +698 -0
  220. package/docs/gateway/discovery.md +141 -0
  221. package/docs/gateway/doctor.md +494 -0
  222. package/docs/gateway/gateway-lock.md +37 -0
  223. package/docs/gateway/health.md +61 -0
  224. package/docs/gateway/heartbeat.md +443 -0
  225. package/docs/gateway/index.md +367 -0
  226. package/docs/gateway/local-models.md +163 -0
  227. package/docs/gateway/logging.md +113 -0
  228. package/docs/gateway/multiple-gateways.md +120 -0
  229. package/docs/gateway/network-model.md +25 -0
  230. package/docs/gateway/openai-http-api.md +280 -0
  231. package/docs/gateway/openresponses-http-api.md +340 -0
  232. package/docs/gateway/openshell.md +307 -0
  233. package/docs/gateway/pairing.md +138 -0
  234. package/docs/gateway/protocol.md +588 -0
  235. package/docs/gateway/remote-gateway-readme.md +164 -0
  236. package/docs/gateway/remote.md +251 -0
  237. package/docs/gateway/sandbox-vs-tool-policy-vs-elevated.md +141 -0
  238. package/docs/gateway/sandboxing.md +473 -0
  239. package/docs/gateway/secrets-plan-contract.md +116 -0
  240. package/docs/gateway/secrets.md +541 -0
  241. package/docs/gateway/security/index.md +1362 -0
  242. package/docs/gateway/tailscale.md +136 -0
  243. package/docs/gateway/tools-invoke-http-api.md +161 -0
  244. package/docs/gateway/troubleshooting.md +451 -0
  245. package/docs/gateway/trusted-proxy-auth.md +399 -0
  246. package/docs/help/debugging.md +168 -0
  247. package/docs/help/environment.md +165 -0
  248. package/docs/help/faq.md +3244 -0
  249. package/docs/help/index.md +28 -0
  250. package/docs/help/scripts.md +27 -0
  251. package/docs/help/testing.md +640 -0
  252. package/docs/help/troubleshooting.md +372 -0
  253. package/docs/images/configure-model-picker-unsearchable.png +0 -0
  254. package/docs/images/feishu-step2-create-app.png +0 -0
  255. package/docs/images/feishu-step3-credentials.png +0 -0
  256. package/docs/images/feishu-step4-permissions.png +0 -0
  257. package/docs/images/feishu-step5-bot-capability.png +0 -0
  258. package/docs/images/feishu-step6-event-subscription.png +0 -0
  259. package/docs/images/feishu-verification-token.png +0 -0
  260. package/docs/images/groups-flow.svg +52 -0
  261. package/docs/images/mobile-ui-screenshot.png +0 -0
  262. package/docs/index.md +196 -0
  263. package/docs/install/ansible.md +230 -0
  264. package/docs/install/azure.md +311 -0
  265. package/docs/install/bun.md +55 -0
  266. package/docs/install/clawdock.md +106 -0
  267. package/docs/install/development-channels.md +131 -0
  268. package/docs/install/digitalocean.md +129 -0
  269. package/docs/install/docker-vm-runtime.md +142 -0
  270. package/docs/install/docker.md +412 -0
  271. package/docs/install/exe-dev.md +133 -0
  272. package/docs/install/fly.md +504 -0
  273. package/docs/install/gcp.md +412 -0
  274. package/docs/install/hetzner.md +259 -0
  275. package/docs/install/index.md +212 -0
  276. package/docs/install/installer.md +443 -0
  277. package/docs/install/kubernetes.md +192 -0
  278. package/docs/install/macos-vm.md +281 -0
  279. package/docs/install/migrating-matrix.md +349 -0
  280. package/docs/install/migrating.md +112 -0
  281. package/docs/install/nix.md +89 -0
  282. package/docs/install/node.md +144 -0
  283. package/docs/install/northflank.mdx +42 -0
  284. package/docs/install/oracle.md +158 -0
  285. package/docs/install/podman.md +210 -0
  286. package/docs/install/railway.mdx +90 -0
  287. package/docs/install/raspberry-pi.md +159 -0
  288. package/docs/install/render.mdx +165 -0
  289. package/docs/install/uninstall.md +128 -0
  290. package/docs/install/updating.md +142 -0
  291. package/docs/logging.md +389 -0
  292. package/docs/nav-tabs-underline.js +100 -0
  293. package/docs/network.md +69 -0
  294. package/docs/nodes/audio.md +191 -0
  295. package/docs/nodes/camera.md +162 -0
  296. package/docs/nodes/images.md +73 -0
  297. package/docs/nodes/index.md +408 -0
  298. package/docs/nodes/location-command.md +98 -0
  299. package/docs/nodes/media-understanding.md +432 -0
  300. package/docs/nodes/talk.md +92 -0
  301. package/docs/nodes/troubleshooting.md +123 -0
  302. package/docs/nodes/voicewake.md +66 -0
  303. package/docs/perplexity.md +181 -0
  304. package/docs/pi-dev.md +80 -0
  305. package/docs/pi.md +570 -0
  306. package/docs/platforms/android.md +244 -0
  307. package/docs/platforms/digitalocean.md +266 -0
  308. package/docs/platforms/index.md +55 -0
  309. package/docs/platforms/ios.md +223 -0
  310. package/docs/platforms/linux.md +100 -0
  311. package/docs/platforms/mac/bundled-gateway.md +75 -0
  312. package/docs/platforms/mac/canvas.md +125 -0
  313. package/docs/platforms/mac/child-process.md +69 -0
  314. package/docs/platforms/mac/dev-setup.md +107 -0
  315. package/docs/platforms/mac/health.md +34 -0
  316. package/docs/platforms/mac/icon.md +31 -0
  317. package/docs/platforms/mac/logging.md +57 -0
  318. package/docs/platforms/mac/menu-bar.md +81 -0
  319. package/docs/platforms/mac/peekaboo.md +65 -0
  320. package/docs/platforms/mac/permissions.md +50 -0
  321. package/docs/platforms/mac/remote.md +84 -0
  322. package/docs/platforms/mac/signing.md +47 -0
  323. package/docs/platforms/mac/skills.md +40 -0
  324. package/docs/platforms/mac/voice-overlay.md +60 -0
  325. package/docs/platforms/mac/voicewake.md +67 -0
  326. package/docs/platforms/mac/webchat.md +51 -0
  327. package/docs/platforms/mac/xpc.md +61 -0
  328. package/docs/platforms/macos.md +229 -0
  329. package/docs/platforms/oracle.md +305 -0
  330. package/docs/platforms/raspberry-pi.md +420 -0
  331. package/docs/platforms/windows.md +241 -0
  332. package/docs/plugins/agent-tools.md +10 -0
  333. package/docs/plugins/architecture.md +1609 -0
  334. package/docs/plugins/building-extensions.md +10 -0
  335. package/docs/plugins/building-plugins.md +319 -0
  336. package/docs/plugins/bundles.md +292 -0
  337. package/docs/plugins/community.md +149 -0
  338. package/docs/plugins/manifest.md +412 -0
  339. package/docs/plugins/sdk-channel-plugins.md +508 -0
  340. package/docs/plugins/sdk-entrypoints.md +210 -0
  341. package/docs/plugins/sdk-migration.md +359 -0
  342. package/docs/plugins/sdk-overview.md +475 -0
  343. package/docs/plugins/sdk-provider-plugins.md +712 -0
  344. package/docs/plugins/sdk-runtime.md +381 -0
  345. package/docs/plugins/sdk-setup.md +516 -0
  346. package/docs/plugins/sdk-testing.md +263 -0
  347. package/docs/plugins/voice-call.md +466 -0
  348. package/docs/plugins/zalouser.md +78 -0
  349. package/docs/prose.md +134 -0
  350. package/docs/providers/anthropic.md +402 -0
  351. package/docs/providers/bedrock-mantle.md +91 -0
  352. package/docs/providers/bedrock.md +273 -0
  353. package/docs/providers/chutes.md +103 -0
  354. package/docs/providers/claude-max-api-proxy.md +163 -0
  355. package/docs/providers/cloudflare-ai-gateway.md +71 -0
  356. package/docs/providers/deepgram.md +93 -0
  357. package/docs/providers/deepseek.md +53 -0
  358. package/docs/providers/fireworks.md +69 -0
  359. package/docs/providers/github-copilot.md +80 -0
  360. package/docs/providers/glm.md +68 -0
  361. package/docs/providers/google.md +149 -0
  362. package/docs/providers/groq.md +105 -0
  363. package/docs/providers/huggingface.md +193 -0
  364. package/docs/providers/index.md +81 -0
  365. package/docs/providers/kilocode.md +89 -0
  366. package/docs/providers/litellm.md +159 -0
  367. package/docs/providers/minimax.md +281 -0
  368. package/docs/providers/mistral.md +68 -0
  369. package/docs/providers/models.md +56 -0
  370. package/docs/providers/moonshot.md +224 -0
  371. package/docs/providers/nvidia.md +58 -0
  372. package/docs/providers/ollama.md +379 -0
  373. package/docs/providers/openai.md +472 -0
  374. package/docs/providers/opencode-go.md +45 -0
  375. package/docs/providers/opencode.md +68 -0
  376. package/docs/providers/openrouter.md +59 -0
  377. package/docs/providers/perplexity-provider.md +62 -0
  378. package/docs/providers/qianfan.md +90 -0
  379. package/docs/providers/qwen.md +128 -0
  380. package/docs/providers/qwen_modelstudio.md +137 -0
  381. package/docs/providers/sglang.md +115 -0
  382. package/docs/providers/stepfun.md +152 -0
  383. package/docs/providers/synthetic.md +101 -0
  384. package/docs/providers/together.md +70 -0
  385. package/docs/providers/venice.md +282 -0
  386. package/docs/providers/vercel-ai-gateway.md +60 -0
  387. package/docs/providers/vllm.md +103 -0
  388. package/docs/providers/volcengine.md +94 -0
  389. package/docs/providers/xai.md +94 -0
  390. package/docs/providers/xiaomi.md +89 -0
  391. package/docs/providers/zai.md +75 -0
  392. package/docs/reference/AGENTS.default.md +126 -0
  393. package/docs/reference/RELEASING.md +138 -0
  394. package/docs/reference/api-usage-costs.md +198 -0
  395. package/docs/reference/credits.md +30 -0
  396. package/docs/reference/device-models.md +47 -0
  397. package/docs/reference/memory-config.md +421 -0
  398. package/docs/reference/prompt-caching.md +344 -0
  399. package/docs/reference/rpc.md +43 -0
  400. package/docs/reference/secretref-credential-surface.md +148 -0
  401. package/docs/reference/secretref-user-supplied-credentials-matrix.json +607 -0
  402. package/docs/reference/session-management-compaction.md +352 -0
  403. package/docs/reference/templates/AGENTS.dev.md +84 -0
  404. package/docs/reference/templates/AGENTS.md +219 -0
  405. package/docs/reference/templates/BOOT.md +12 -0
  406. package/docs/reference/templates/BOOTSTRAP.md +62 -0
  407. package/docs/reference/templates/CLAUDE.md +1 -0
  408. package/docs/reference/templates/HEARTBEAT.md +14 -0
  409. package/docs/reference/templates/IDENTITY.dev.md +48 -0
  410. package/docs/reference/templates/IDENTITY.md +30 -0
  411. package/docs/reference/templates/SOUL.dev.md +77 -0
  412. package/docs/reference/templates/SOUL.md +45 -0
  413. package/docs/reference/templates/TOOLS.dev.md +25 -0
  414. package/docs/reference/templates/TOOLS.md +47 -0
  415. package/docs/reference/templates/USER.dev.md +19 -0
  416. package/docs/reference/templates/USER.md +24 -0
  417. package/docs/reference/test.md +119 -0
  418. package/docs/reference/token-use.md +197 -0
  419. package/docs/reference/transcript-hygiene.md +151 -0
  420. package/docs/reference/wizard.md +245 -0
  421. package/docs/security/CONTRIBUTING-THREAT-MODEL.md +98 -0
  422. package/docs/security/THREAT-MODEL-ATLAS.md +608 -0
  423. package/docs/security/formal-verification.md +167 -0
  424. package/docs/snippets/plugin-publish/minimal-openclaw.plugin.json +9 -0
  425. package/docs/snippets/plugin-publish/minimal-package.json +16 -0
  426. package/docs/start/bootstrapping.md +41 -0
  427. package/docs/start/docs-directory.md +67 -0
  428. package/docs/start/getting-started.md +148 -0
  429. package/docs/start/hubs.md +199 -0
  430. package/docs/start/lore.md +219 -0
  431. package/docs/start/onboarding-overview.md +69 -0
  432. package/docs/start/onboarding.md +92 -0
  433. package/docs/start/openclaw.md +225 -0
  434. package/docs/start/quickstart.md +22 -0
  435. package/docs/start/setup.md +172 -0
  436. package/docs/start/showcase.md +418 -0
  437. package/docs/start/wizard-cli-automation.md +233 -0
  438. package/docs/start/wizard-cli-reference.md +324 -0
  439. package/docs/start/wizard.md +127 -0
  440. package/docs/style.css +37 -0
  441. package/docs/tools/acp-agents.md +837 -0
  442. package/docs/tools/agent-send.md +100 -0
  443. package/docs/tools/apply-patch.md +52 -0
  444. package/docs/tools/brave-search.md +107 -0
  445. package/docs/tools/browser-linux-troubleshooting.md +145 -0
  446. package/docs/tools/browser-login.md +73 -0
  447. package/docs/tools/browser-wsl2-windows-remote-cdp-troubleshooting.md +221 -0
  448. package/docs/tools/browser.md +890 -0
  449. package/docs/tools/btw.md +142 -0
  450. package/docs/tools/capability-cookbook.md +119 -0
  451. package/docs/tools/clawhub.md +348 -0
  452. package/docs/tools/code-execution.md +90 -0
  453. package/docs/tools/creating-skills.md +119 -0
  454. package/docs/tools/diffs.md +434 -0
  455. package/docs/tools/duckduckgo-search.md +102 -0
  456. package/docs/tools/elevated.md +116 -0
  457. package/docs/tools/exa-search.md +127 -0
  458. package/docs/tools/exec-approvals.md +635 -0
  459. package/docs/tools/exec.md +237 -0
  460. package/docs/tools/firecrawl.md +147 -0
  461. package/docs/tools/gemini-search.md +98 -0
  462. package/docs/tools/grok-search.md +102 -0
  463. package/docs/tools/image-generation.md +139 -0
  464. package/docs/tools/index.md +174 -0
  465. package/docs/tools/kimi-search.md +98 -0
  466. package/docs/tools/llm-task.md +119 -0
  467. package/docs/tools/lobster.md +348 -0
  468. package/docs/tools/loop-detection.md +100 -0
  469. package/docs/tools/minimax-search.md +99 -0
  470. package/docs/tools/multi-agent-sandbox-tools.md +373 -0
  471. package/docs/tools/ollama-search.md +100 -0
  472. package/docs/tools/pdf.md +176 -0
  473. package/docs/tools/perplexity-search.md +185 -0
  474. package/docs/tools/plugin.md +348 -0
  475. package/docs/tools/reactions.md +78 -0
  476. package/docs/tools/searxng-search.md +132 -0
  477. package/docs/tools/skills-config.md +133 -0
  478. package/docs/tools/skills.md +377 -0
  479. package/docs/tools/slash-commands.md +322 -0
  480. package/docs/tools/subagents.md +341 -0
  481. package/docs/tools/tavily.md +129 -0
  482. package/docs/tools/thinking.md +102 -0
  483. package/docs/tools/tts.md +452 -0
  484. package/docs/tools/web-fetch.md +159 -0
  485. package/docs/tools/web.md +417 -0
  486. package/docs/tts.md +452 -0
  487. package/docs/vps.md +115 -0
  488. package/docs/web/control-ui.md +318 -0
  489. package/docs/web/dashboard.md +93 -0
  490. package/docs/web/index.md +126 -0
  491. package/docs/web/tui.md +176 -0
  492. package/docs/web/webchat.md +77 -0
  493. package/docs/whatsapp-openclaw-ai-zh.jpg +0 -0
  494. package/docs/whatsapp-openclaw.jpg +0 -0
  495. package/durar.mjs +180 -0
  496. package/package.json +1259 -0
  497. package/scripts/npm-runner.mjs +111 -0
  498. package/scripts/postinstall-bundled-plugins.mjs +188 -0
  499. package/skills/1password/SKILL.md +70 -0
  500. package/skills/1password/references/cli-examples.md +29 -0
  501. package/skills/1password/references/get-started.md +17 -0
  502. package/skills/apple-notes/SKILL.md +77 -0
  503. package/skills/apple-reminders/SKILL.md +118 -0
  504. package/skills/bear-notes/SKILL.md +107 -0
  505. package/skills/blogwatcher/SKILL.md +69 -0
  506. package/skills/blucli/SKILL.md +47 -0
  507. package/skills/bluebubbles/SKILL.md +131 -0
  508. package/skills/camsnap/SKILL.md +45 -0
  509. package/skills/canvas/SKILL.md +199 -0
  510. package/skills/clawhub/SKILL.md +77 -0
  511. package/skills/coding-agent/SKILL.md +316 -0
  512. package/skills/discord/SKILL.md +197 -0
  513. package/skills/eightctl/SKILL.md +50 -0
  514. package/skills/gemini/SKILL.md +43 -0
  515. package/skills/gh-issues/SKILL.md +885 -0
  516. package/skills/gifgrep/SKILL.md +79 -0
  517. package/skills/github/SKILL.md +163 -0
  518. package/skills/gog/SKILL.md +116 -0
  519. package/skills/goplaces/SKILL.md +52 -0
  520. package/skills/healthcheck/SKILL.md +245 -0
  521. package/skills/himalaya/SKILL.md +257 -0
  522. package/skills/himalaya/references/configuration.md +184 -0
  523. package/skills/himalaya/references/message-composition.md +199 -0
  524. package/skills/imsg/SKILL.md +122 -0
  525. package/skills/mcporter/SKILL.md +61 -0
  526. package/skills/model-usage/SKILL.md +69 -0
  527. package/skills/model-usage/references/codexbar-cli.md +33 -0
  528. package/skills/model-usage/scripts/model_usage.py +320 -0
  529. package/skills/model-usage/scripts/test_model_usage.py +40 -0
  530. package/skills/nano-pdf/SKILL.md +38 -0
  531. package/skills/node-connect/SKILL.md +142 -0
  532. package/skills/notion/SKILL.md +174 -0
  533. package/skills/obsidian/SKILL.md +81 -0
  534. package/skills/openai-whisper/SKILL.md +38 -0
  535. package/skills/openai-whisper-api/SKILL.md +62 -0
  536. package/skills/openai-whisper-api/scripts/transcribe.sh +88 -0
  537. package/skills/openhue/SKILL.md +112 -0
  538. package/skills/oracle/SKILL.md +125 -0
  539. package/skills/ordercli/SKILL.md +78 -0
  540. package/skills/peekaboo/SKILL.md +190 -0
  541. package/skills/sag/SKILL.md +87 -0
  542. package/skills/session-logs/SKILL.md +151 -0
  543. package/skills/sherpa-onnx-tts/SKILL.md +109 -0
  544. package/skills/sherpa-onnx-tts/bin/sherpa-onnx-tts +178 -0
  545. package/skills/skill-creator/SKILL.md +372 -0
  546. package/skills/skill-creator/license.txt +202 -0
  547. package/skills/skill-creator/scripts/init_skill.py +378 -0
  548. package/skills/skill-creator/scripts/package_skill.py +139 -0
  549. package/skills/skill-creator/scripts/quick_validate.py +159 -0
  550. package/skills/skill-creator/scripts/test_package_skill.py +160 -0
  551. package/skills/skill-creator/scripts/test_quick_validate.py +72 -0
  552. package/skills/slack/SKILL.md +144 -0
  553. package/skills/songsee/SKILL.md +49 -0
  554. package/skills/sonoscli/SKILL.md +65 -0
  555. package/skills/spotify-player/SKILL.md +64 -0
  556. package/skills/summarize/SKILL.md +87 -0
  557. package/skills/taskflow/SKILL.md +149 -0
  558. package/skills/taskflow/examples/inbox-triage.lobster +33 -0
  559. package/skills/taskflow/examples/pr-intake.lobster +32 -0
  560. package/skills/taskflow-inbox-triage/SKILL.md +119 -0
  561. package/skills/things-mac/SKILL.md +86 -0
  562. package/skills/tmux/SKILL.md +170 -0
  563. package/skills/tmux/scripts/find-sessions.sh +112 -0
  564. package/skills/tmux/scripts/wait-for-text.sh +83 -0
  565. package/skills/trello/SKILL.md +108 -0
  566. package/skills/video-frames/SKILL.md +46 -0
  567. package/skills/video-frames/scripts/frame.sh +81 -0
  568. package/skills/voice-call/SKILL.md +45 -0
  569. package/skills/wacli/SKILL.md +72 -0
  570. package/skills/weather/SKILL.md +129 -0
  571. package/skills/xurl/SKILL.md +461 -0
@@ -0,0 +1,640 @@
+ ---
+ summary: "Testing kit: unit/e2e/live suites, Docker runners, and what each test covers"
+ read_when:
+   - Running tests locally or in CI
+   - Adding regressions for model/provider bugs
+   - Debugging gateway + agent behavior
+ title: "Testing"
+ ---
+
+ # Testing
+
+ Durar has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners.
+
+ This doc is a “how we test” guide:
+
+ - What each suite covers (and what it deliberately does _not_ cover)
+ - Which commands to run for common workflows (local, pre-push, debugging)
+ - How live tests discover credentials and select models/providers
+ - How to add regressions for real-world model/provider issues
+
+ ## Quick start
+
+ Most days:
+
+ - Full gate (expected before push): `pnpm build && pnpm check && pnpm test`
+ - Faster local full-suite run on a roomy machine: `pnpm test:max`
+ - Direct Vitest watch loop (modern projects config): `pnpm test:watch`
+ - Direct file targeting now routes extension/channel paths too: `pnpm test extensions/discord/src/monitor/message-handler.preflight.test.ts`
+
+ When you touch tests or want extra confidence:
+
+ - Coverage gate: `pnpm test:coverage`
+ - E2E suite: `pnpm test:e2e`
+
+ When debugging real providers/models (requires real creds):
+
+ - Live suite (models + gateway tool/image probes): `pnpm test:live`
+ - Target one live file quietly: `pnpm test:live -- src/agents/models.profiles.live.test.ts`
+
+ Tip: when you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.
+
+ ## Test suites (what runs where)
+
+ Think of the suites as “increasing realism” (and increasing flakiness/cost):
+
+ ### Unit / integration (default)
+
+ - Command: `pnpm test`
+ - Config: native Vitest `projects` via `vitest.config.ts`
+ - Files: core/unit inventories under `src/**/*.test.ts`, `packages/**/*.test.ts`, `test/**/*.test.ts`, and the whitelisted `ui` node tests covered by `vitest.unit.config.ts`
+ - Scope:
+   - Pure unit tests
+   - In-process integration tests (gateway auth, routing, tooling, parsing, config)
+   - Deterministic regressions for known bugs
+ - Expectations:
+   - Runs in CI
+   - No real keys required
+   - Should be fast and stable
+ - Projects note:
+   - `pnpm test`, `pnpm test:watch`, and `pnpm test:changed` all use the same native Vitest root `projects` config now.
+   - Direct file filters route natively through the root project graph, so `pnpm test extensions/discord/src/monitor/message-handler.preflight.test.ts` works without a custom wrapper.
+ - Embedded runner note:
+   - When you change message-tool discovery inputs or compaction runtime context,
+     keep both levels of coverage.
+   - Add focused helper regressions for pure routing/normalization boundaries.
+   - Also keep the embedded runner integration suites healthy:
+     `src/agents/pi-embedded-runner/compact.hooks.test.ts`,
+     `src/agents/pi-embedded-runner/run.overflow-compaction.test.ts`, and
+     `src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts`.
+   - Those suites verify that scoped ids and compaction behavior still flow
+     through the real `run.ts` / `compact.ts` paths; helper-only tests are not a
+     sufficient substitute for those integration paths.
73
+ - Pool note:
74
+ - Base Vitest config now defaults to `threads`.
75
+ - The shared Vitest config also fixes `isolate: false` and uses the non-isolated runner across the root projects, e2e, and live configs.
76
+ - The root UI lane keeps its `jsdom` setup and optimizer, but now runs on the shared non-isolated runner too.
77
+ - `pnpm test` inherits the same `threads` + `isolate: false` defaults from the root `vitest.config.ts` projects config.
78
+ - The shared `scripts/run-vitest.mjs` launcher now also adds `--no-maglev` for Vitest child Node processes by default to reduce V8 compile churn during big local runs. Set `Durar_VITEST_ENABLE_MAGLEV=1` if you need to compare against stock V8 behavior.
79
+ - Fast-local iteration note:
80
+ - `pnpm test:changed` runs the native projects config with `--changed origin/main`.
81
+ - `pnpm test:max` and `pnpm test:changed:max` keep the same native projects config, just with a higher worker cap.
82
+ - Local worker auto-scaling is intentionally conservative now and also backs off when the host load average is already high, so multiple concurrent Vitest runs do less damage by default.
83
+ - The base Vitest config marks the projects/config files as `forceRerunTriggers` so changed-mode reruns stay correct when test wiring changes.
84
+ - The config keeps `Durar_VITEST_FS_MODULE_CACHE` enabled on supported hosts; set `Durar_VITEST_FS_MODULE_CACHE_PATH=/abs/path` if you want one explicit cache location for direct profiling.
85
+ - Perf-debug note:
86
+ - `pnpm test:perf:imports` enables Vitest import-duration reporting plus import-breakdown output.
87
+ - `pnpm test:perf:imports:changed` scopes the same profiling view to files changed since `origin/main`.
88
+ - `pnpm test:perf:profile:main` writes a main-thread CPU profile for Vitest/Vite startup and transform overhead.
89
+ - `pnpm test:perf:profile:runner` writes runner CPU+heap profiles for the unit suite with file parallelism disabled.
90
+
91
+ ### E2E (gateway smoke)
92
+
93
+ - Command: `pnpm test:e2e`
94
+ - Config: `vitest.e2e.config.ts`
95
+ - Files: `src/**/*.e2e.test.ts`, `test/**/*.e2e.test.ts`
96
+ - Runtime defaults:
97
+ - Uses Vitest `threads` with `isolate: false`, matching the rest of the repo.
98
+ - Uses adaptive workers (CI: up to 2, local: 1 by default).
99
+ - Runs in silent mode by default to reduce console I/O overhead.
100
+ - Useful overrides:
101
+ - `Durar_E2E_WORKERS=<n>` to force worker count (capped at 16).
102
+ - `Durar_E2E_VERBOSE=1` to re-enable verbose console output.
103
+ - Scope:
104
+ - Multi-instance gateway end-to-end behavior
105
+ - WebSocket/HTTP surfaces, node pairing, and heavier networking
106
+ - Expectations:
107
+ - Runs in CI (when enabled in the pipeline)
108
+ - No real keys required
109
+ - More moving parts than unit tests (can be slower)
110
+
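The worker defaults and override above can be sketched as a small shell function. This is only an illustration of the documented behavior, not the logic in `vitest.e2e.config.ts`: the function name `resolve_e2e_workers` is hypothetical, the adaptive "up to 2" CI behavior is simplified to a fixed 2, and detecting CI through a non-empty `CI` variable is an assumption.

```bash
# Sketch only: mirrors the documented defaults, not the real config code.
# The body runs in a subshell "( ... )" so helper variables do not leak.
resolve_e2e_workers() (
  requested="${Durar_E2E_WORKERS:-}"
  case "$requested" in
    ''|*[!0-9]*) requested="" ;;   # ignore empty or non-numeric overrides
  esac

  if [ -n "$requested" ] && [ "$requested" -gt 0 ]; then
    workers=$requested             # explicit override wins
  elif [ -n "${CI:-}" ]; then
    workers=2                      # CI: "up to 2", simplified to 2 here
  else
    workers=1                      # local default
  fi

  # Documented hard cap for forced worker counts.
  if [ "$workers" -gt 16 ]; then workers=16; fi
  echo "$workers"
)
```

Calling `resolve_e2e_workers` with `Durar_E2E_WORKERS=64` exported prints the capped value, which is a quick way to reason about what an override will actually do.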
111
+ ### E2E: OpenShell backend smoke
112
+
113
+ - Command: `pnpm test:e2e:openshell`
114
+ - File: `test/openshell-sandbox.e2e.test.ts`
115
+ - Scope:
116
+ - Starts an isolated OpenShell gateway on the host via Docker
117
+ - Creates a sandbox from a temporary local Dockerfile
118
+ - Exercises Durar's OpenShell backend over real `sandbox ssh-config` + SSH exec
119
+ - Verifies remote-canonical filesystem behavior through the sandbox fs bridge
120
+ - Expectations:
121
+ - Opt-in only; not part of the default `pnpm test:e2e` run
122
+ - Requires a local `openshell` CLI plus a working Docker daemon
123
+ - Uses isolated `HOME` / `XDG_CONFIG_HOME`, then destroys the test gateway and sandbox
124
+ - Useful overrides:
125
+ - `Durar_E2E_OPENSHELL=1` to enable the test when running the broader e2e suite manually
126
+ - `Durar_E2E_OPENSHELL_COMMAND=/path/to/openshell` to point at a non-default CLI binary or wrapper script
127
+
128
+ ### Live (real providers + real models)
129
+
130
+ - Command: `pnpm test:live`
131
+ - Config: `vitest.live.config.ts`
132
+ - Files: `src/**/*.live.test.ts`
133
+ - Default: **enabled** by `pnpm test:live` (sets `Durar_LIVE_TEST=1`)
134
+ - Scope:
135
+ - “Does this provider/model actually work _today_ with real creds?”
136
+ - Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
137
+ - Expectations:
138
+ - Not CI-stable by design (real networks, real provider policies, quotas, outages)
139
+ - Costs money / uses rate limits
140
+ - Prefer running narrowed subsets instead of “everything”
141
+ - Live runs source `~/.profile` to pick up missing API keys.
142
+ - By default, live runs still isolate `HOME` and copy config/auth material into a temp test home so unit fixtures cannot mutate your real `~/.Durar`.
143
+ - Set `Durar_LIVE_USE_REAL_HOME=1` only when you intentionally need live tests to use your real home directory.
144
+ - `pnpm test:live` now defaults to a quieter mode: it keeps `[live] ...` progress output, but suppresses the extra `~/.profile` notice and mutes gateway bootstrap logs/Bonjour chatter. Set `Durar_LIVE_TEST_QUIET=0` if you want the full startup logs back.
145
+ - API key rotation (provider-specific): set `*_API_KEYS` to a comma- or semicolon-separated list (for example `OPENAI_API_KEYS`, `ANTHROPIC_API_KEYS`, `GEMINI_API_KEYS`), or set numbered variables such as `*_API_KEY_1` and `*_API_KEY_2`, or use a per-live override via `Durar_LIVE_*_KEY`. Tests retry on rate-limit responses.
146
+ - Progress/heartbeat output:
147
+ - Live suites now emit progress lines to stderr so long provider calls are visibly active even when Vitest console capture is quiet.
148
+ - `vitest.live.config.ts` disables Vitest console interception so provider/gateway progress lines stream immediately during live runs.
149
+ - Tune direct-model heartbeats with `Durar_LIVE_HEARTBEAT_MS`.
150
+ - Tune gateway/probe heartbeats with `Durar_LIVE_GATEWAY_HEARTBEAT_MS`.
151
+
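To make the comma/semicolon key-list format concrete, here is a minimal sketch of how such a value could be split into individual keys. The helper name `split_api_keys` is hypothetical, and trimming whitespace and dropping empty entries are assumptions; the live tests' own parser may behave differently.

```bash
# Hypothetical helper: print one key per line from a comma/semicolon list.
split_api_keys() {
  printf '%s\n' "$1" \
    | tr ',;' '\n\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Placeholder values only; never commit real keys.
split_api_keys "key-a, key-b;key-c"
```

Listing the keys this way is a quick check that a rotation list contains the number of entries you expect before starting a paid live run.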
152
+ ## Which suite should I run?
153
+
154
+ Use this decision guide:
155
+
156
+ - Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot)
157
+ - Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e`
158
+ - Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live`
159
+
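The same guide can be written down as a tiny lookup, which is handy in a local pre-push script. This is a sketch: the function name `suggest_suites` and the change-kind labels (`logic`, `gateway`, `provider`) are made up for illustration, and the function only prints the commands instead of running them.

```bash
# Hypothetical helper: map a kind of change to the suites suggested above.
suggest_suites() {
  case "$1" in
    logic)    echo "pnpm test" ;;                    # editing logic/tests
    gateway)  echo "pnpm test && pnpm test:e2e" ;;   # networking / WS / pairing
    provider) echo "pnpm test:live" ;;               # narrow it with the allowlist env vars
    *)        echo "unknown change kind: $1" >&2; return 1 ;;
  esac
}

suggest_suites gateway
```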
160
+ ## Live: Android node capability sweep
161
+
162
+ - Test: `src/gateway/android-node.capabilities.live.test.ts`
163
+ - Script: `pnpm android:test:integration`
164
+ - Goal: invoke **every command currently advertised** by a connected Android node and assert command contract behavior.
165
+ - Scope:
166
+ - Preconditioned/manual setup (the suite does not install/run/pair the app).
167
+ - Command-by-command gateway `node.invoke` validation for the selected Android node.
168
+ - Required pre-setup:
169
+ - Android app already connected + paired to the gateway.
170
+ - App kept in foreground.
171
+ - Permissions/capture consent granted for capabilities you expect to pass.
172
+ - Optional target overrides:
173
+ - `Durar_ANDROID_NODE_ID` or `Durar_ANDROID_NODE_NAME`.
174
+ - `Durar_ANDROID_GATEWAY_URL` / `Durar_ANDROID_GATEWAY_TOKEN` / `Durar_ANDROID_GATEWAY_PASSWORD`.
175
+ - Full Android setup details: [Android App](/platforms/android)
176
+
177
+ ## Live: model smoke (profile keys)
178
+
179
+ Live tests are split into two layers so we can isolate failures:
180
+
181
+ - “Direct model” tells us the provider/model can answer at all with the given key.
182
+ - “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).
183
+
184
+ ### Layer 1: Direct model completion (no gateway)
185
+
186
+ - Test: `src/agents/models.profiles.live.test.ts`
187
+ - Goal:
188
+ - Enumerate discovered models
189
+ - Use `getApiKeyForModel` to select models you have creds for
190
+ - Run a small completion per model (and targeted regressions where needed)
191
+ - How to enable:
192
+ - `pnpm test:live` (or `Durar_LIVE_TEST=1` if invoking Vitest directly)
193
+ - Set `Durar_LIVE_MODELS=modern` (or `all`, alias for modern) to actually run this suite; otherwise it skips to keep `pnpm test:live` focused on gateway smoke
194
+ - How to select models:
195
+ - `Durar_LIVE_MODELS=modern` to run the modern allowlist (Opus/Sonnet 4.6+, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.7, Grok 4)
196
+ - `Durar_LIVE_MODELS=all` is an alias for the modern allowlist
197
+ - or `Durar_LIVE_MODELS="openai/gpt-5.4,anthropic/claude-opus-4-6,..."` (comma allowlist)
198
+ - How to select providers:
199
+ - `Durar_LIVE_PROVIDERS="google,google-antigravity,google-gemini-cli"` (comma allowlist)
200
+ - Where keys come from:
201
+ - By default: profile store and env fallbacks
202
+ - Set `Durar_LIVE_REQUIRE_PROFILE_KEYS=1` to enforce **profile store** only
203
+ - Why this exists:
204
+ - Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”
205
+ - Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)
206
+
207
+ ### Layer 2: Gateway + dev agent smoke (what "@Durar" actually does)
208
+
209
+ - Test: `src/gateway/gateway-models.profiles.live.test.ts`
210
+ - Goal:
211
+ - Spin up an in-process gateway
212
+ - Create/patch an `agent:dev:*` session (model override per run)
213
+ - Iterate models-with-keys and assert:
214
+ - “meaningful” response (no tools)
215
+ - a real tool invocation works (read probe)
216
+ - optional extra tool probes (exec+read probe)
217
+ - OpenAI regression paths (tool-call-only → follow-up) keep working
218
+ - Probe details (so you can explain failures quickly):
219
+ - `read` probe: the test writes a nonce file in the workspace and asks the agent to `read` it and echo the nonce back.
220
+ - `exec+read` probe: the test asks the agent to `exec`-write a nonce into a temp file, then `read` it back.
221
+ - image probe: the test attaches a generated PNG (cat + randomized code) and expects the model to return `cat <CODE>`.
222
+ - Implementation reference: `src/gateway/gateway-models.profiles.live.test.ts` and `src/gateway/live-image-probe.ts`.
223
+ - How to enable:
224
+ - `pnpm test:live` (or `Durar_LIVE_TEST=1` if invoking Vitest directly)
225
+ - How to select models:
226
+ - Default: modern allowlist (Opus/Sonnet 4.6+, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.7, Grok 4)
227
+ - `Durar_LIVE_GATEWAY_MODELS=all` is an alias for the modern allowlist
228
+ - Or set `Durar_LIVE_GATEWAY_MODELS="provider/model"` (or comma list) to narrow
229
+ - How to select providers (avoid “OpenRouter everything”):
230
+ - `Durar_LIVE_GATEWAY_PROVIDERS="google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax"` (comma allowlist)
231
+ - Tool + image probes are always on in this live test:
232
+ - `read` probe + `exec+read` probe (tool stress)
233
+ - image probe runs when the model advertises image input support
234
+ - Flow (high level):
235
+ - Test generates a tiny PNG with “CAT” + random code (`src/gateway/live-image-probe.ts`)
236
+ - Sends it via `agent` `attachments: [{ mimeType: "image/png", content: "<base64>" }]`
237
+ - Gateway parses attachments into `images[]` (`src/gateway/server-methods/agent.ts` + `src/gateway/chat-attachments.ts`)
238
+ - Embedded agent forwards a multimodal user message to the model
239
+ - Assertion: reply contains `cat` + the code (OCR tolerance: minor mistakes allowed)
240
+
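For debugging the image probe by hand, the attachment shape described in the flow above can be reproduced from any PNG on disk. This is a minimal sketch: the helper name `build_image_attachment` is hypothetical, and it only builds the `attachments` array; how that array is wrapped into the full `agent` request is not shown in this guide and is left out here.

```bash
# Hypothetical helper: print a one-element attachments array for a PNG file.
# The body runs in a subshell "( ... )" so the variable does not leak.
build_image_attachment() (
  # Some base64 tools wrap long output; strip newlines to get a single line.
  encoded=$(base64 < "$1" | tr -d '\n')
  # Standard base64 output contains no characters that need JSON escaping.
  printf '[{"mimeType":"image/png","content":"%s"}]\n' "$encoded"
)
```

Running `build_image_attachment ./probe.png` (with any local PNG) prints JSON you can compare against what the test sends when a model fails the `cat <CODE>` assertion.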
241
+ Tip: to see what you can test on your machine (and the exact `provider/model` ids), run:
242
+
243
+ ```bash
244
+ Durar models list
245
+ Durar models list --json
246
+ ```
247
+
248
+ ## Live: CLI backend smoke (Claude CLI or other local CLIs)
249
+
250
+ - Test: `src/gateway/gateway-cli-backend.live.test.ts`
251
+ - Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.
252
+ - Enable:
253
+ - `pnpm test:live` (or `Durar_LIVE_TEST=1` if invoking Vitest directly)
254
+ - `Durar_LIVE_CLI_BACKEND=1`
255
+ - Defaults:
256
+ - Model: `claude-cli/claude-sonnet-4-6`
257
+ - Command: `claude`
258
+ - Args: `["-p","--output-format","stream-json","--include-partial-messages","--verbose","--permission-mode","bypassPermissions"]`
259
+ - Overrides (optional):
260
+ - `Durar_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-opus-4-6"`
261
+ - `Durar_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.4"`
262
+ - `Durar_LIVE_CLI_BACKEND_COMMAND="/full/path/to/claude"`
263
+ - `Durar_LIVE_CLI_BACKEND_ARGS='["-p","--output-format","stream-json","--include-partial-messages","--verbose","--permission-mode","bypassPermissions"]'`
264
+ - `Durar_LIVE_CLI_BACKEND_CLEAR_ENV='["ANTHROPIC_API_KEY","ANTHROPIC_API_KEY_OLD"]'`
265
+ - `Durar_LIVE_CLI_BACKEND_IMAGE_PROBE=1` to send a real image attachment (paths are injected into the prompt).
266
+ - `Durar_LIVE_CLI_BACKEND_IMAGE_ARG="--image"` to pass image file paths as CLI args instead of prompt injection.
267
+ - `Durar_LIVE_CLI_BACKEND_IMAGE_MODE="repeat"` (or `"list"`) to control how image args are passed when `IMAGE_ARG` is set.
268
+ - `Durar_LIVE_CLI_BACKEND_RESUME_PROBE=1` to send a second turn and validate resume flow.
269
+ - `Durar_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0` to keep Claude CLI MCP config enabled (default injects a temporary strict empty `--mcp-config` so ambient/global MCP servers stay disabled during the smoke).
270
+
271
+ Example:
272
+
273
+ ```bash
274
+ Durar_LIVE_CLI_BACKEND=1 \
275
+ Durar_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-sonnet-4-6" \
276
+ pnpm test:live src/gateway/gateway-cli-backend.live.test.ts
277
+ ```
278
+
279
+ Docker recipe:
280
+
281
+ ```bash
282
+ pnpm test:docker:live-cli-backend
283
+ ```
284
+
285
+ Notes:
286
+
287
+ - The Docker runner lives at `scripts/test-live-cli-backend-docker.sh`.
288
+ - It runs the live CLI-backend smoke inside the repo Docker image as the non-root `node` user, because Claude CLI rejects `bypassPermissions` when invoked as root.
289
+ - For `claude-cli`, it installs the Linux `@anthropic-ai/claude-code` package into a cached writable prefix at `Durar_DOCKER_CLI_TOOLS_DIR` (default: `~/.cache/Durar/docker-cli-tools`).
290
+ - For `claude-cli`, the live smoke injects a strict empty MCP config unless you set `Durar_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0`.
291
+ - It copies `~/.claude` into the container when available, but on machines where Claude auth is backed by `ANTHROPIC_API_KEY`, it also preserves `ANTHROPIC_API_KEY` / `ANTHROPIC_API_KEY_OLD` for the child Claude CLI via `Durar_LIVE_CLI_BACKEND_PRESERVE_ENV`.
292
+
293
+ ## Live: ACP bind smoke (`/acp spawn ... --bind here`)
294
+
295
+ - Test: `src/gateway/gateway-acp-bind.live.test.ts`
296
+ - Goal: validate the real ACP conversation-bind flow with a live ACP agent:
297
+ - send `/acp spawn <agent> --bind here`
298
+ - bind a synthetic message-channel conversation in place
299
+ - send a normal follow-up on that same conversation
300
+ - verify the follow-up lands in the bound ACP session transcript
301
+ - Enable:
302
+ - `pnpm test:live src/gateway/gateway-acp-bind.live.test.ts`
303
+ - `Durar_LIVE_ACP_BIND=1`
304
+ - Defaults:
305
+ - ACP agent: `claude`
306
+ - Synthetic channel: Slack DM-style conversation context
307
+ - ACP backend: `acpx`
308
+ - Overrides:
309
+ - `Durar_LIVE_ACP_BIND_AGENT=claude`
310
+ - `Durar_LIVE_ACP_BIND_AGENT=codex`
311
+ - `Durar_LIVE_ACP_BIND_AGENT_COMMAND='npx -y @agentclientprotocol/claude-agent-acp@<version>'`
312
+ - Notes:
313
+ - This lane uses the gateway `chat.send` surface with admin-only synthetic originating-route fields so tests can attach message-channel context without pretending to deliver externally.
314
+ - When `Durar_LIVE_ACP_BIND_AGENT_COMMAND` is unset, the test uses the embedded `acpx` plugin's built-in agent registry for the selected ACP harness agent.
315
+
316
+ Example:
317
+
318
+ ```bash
319
+ Durar_LIVE_ACP_BIND=1 \
320
+ Durar_LIVE_ACP_BIND_AGENT=claude \
321
+ pnpm test:live src/gateway/gateway-acp-bind.live.test.ts
322
+ ```
323
+
324
+ Docker recipe:
325
+
326
+ ```bash
327
+ pnpm test:docker:live-acp-bind
328
+ ```
329
+
330
+ Docker notes:
331
+
332
+ - The Docker runner lives at `scripts/test-live-acp-bind-docker.sh`.
333
+ - It sources `~/.profile`, copies the matching CLI auth home (`~/.claude` or `~/.codex`) into the container, then installs the requested live CLI (`@anthropic-ai/claude-code` or `@openai/codex`) if missing.
334
+ - Inside Docker, the runner relies on the embedded `acpx` plugin's built-in agent registry and the sourced profile env; use `Durar_LIVE_ACP_BIND_AGENT_COMMAND` only when you need a custom harness command.
335
+
336
+ ### Recommended live recipes
337
+
338
+ Narrow, explicit allowlists are fastest and least flaky:
339
+
340
+ - Single model, direct (no gateway):
341
+ - `Durar_LIVE_MODELS="openai/gpt-5.4" pnpm test:live src/agents/models.profiles.live.test.ts`
342
+
343
+ - Single model, gateway smoke:
344
+ - `Durar_LIVE_GATEWAY_MODELS="openai/gpt-5.4" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
345
+
346
+ - Tool calling across several providers:
347
+ - `Durar_LIVE_GATEWAY_MODELS="openai/gpt-5.4,anthropic/claude-opus-4-6,google/gemini-3-flash-preview,zai/glm-4.7,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
348
+
349
+ - Google focus (Gemini API key + Antigravity):
350
+ - Gemini (API key): `Durar_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
351
+ - Antigravity (OAuth): `Durar_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
352
+
353
+ Notes:
354
+
355
+ - `google/...` uses the Gemini API (API key).
356
+ - `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
357
+ - `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).
358
+ - Gemini API vs Gemini CLI:
359
+ - API: Durar calls Google’s hosted Gemini API over HTTP (API key / profile auth); this is what most users mean by “Gemini”.
360
+ - CLI: Durar shells out to a local `gemini` binary; it has its own auth and can behave differently (streaming/tool support/version skew).
361
+
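When the allowlists get long, it can help to build the comma-separated value from separate arguments. The sketch below assumes nothing beyond the env var and test path used in the recipes above; the helper names `join_allowlist` and `print_gateway_live_cmd` are hypothetical, and the command is printed rather than executed so you can review it first.

```bash
# Hypothetical helper: join all arguments with commas (a b c -> a,b,c).
# The subshell body keeps the IFS change local to this function.
join_allowlist() (
  IFS=,
  printf '%s\n' "$*"
)

# Print (do not run) a narrowed gateway smoke command for the given models.
print_gateway_live_cmd() {
  printf 'Durar_LIVE_GATEWAY_MODELS="%s" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts\n' \
    "$(join_allowlist "$@")"
}

print_gateway_live_cmd openai/gpt-5.4 anthropic/claude-opus-4-6
```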
362
+ ## Live: model matrix (what we cover)
363
+
364
+ There is no fixed “CI model list” (live is opt-in), but these are the **recommended** models to cover regularly on a dev machine with keys.
365
+
366
+ ### Modern smoke set (tool calling + image)
367
+
368
+ This is the “common models” run we expect to keep working:
369
+
370
+ - OpenAI (non-Codex): `openai/gpt-5.4` (optional: `openai/gpt-5.4-mini`)
371
+ - OpenAI Codex: `openai-codex/gpt-5.4`
372
+ - Anthropic: `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-6`)
373
+ - Google (Gemini API): `google/gemini-3.1-pro-preview` and `google/gemini-3-flash-preview` (avoid older Gemini 2.x models)
374
+ - Google (Antigravity): `google-antigravity/claude-opus-4-6-thinking` and `google-antigravity/gemini-3-flash`
375
+ - Z.AI (GLM): `zai/glm-4.7`
376
+ - MiniMax: `minimax/MiniMax-M2.7`
377
+
378
+ Run gateway smoke with tools + image:
379
+ `Durar_LIVE_GATEWAY_MODELS="openai/gpt-5.4,openai-codex/gpt-5.4,anthropic/claude-opus-4-6,google/gemini-3.1-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-flash,zai/glm-4.7,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
380
+
381
+ ### Baseline: tool calling (Read + optional Exec)
382
+
383
+ Pick at least one per provider family:
384
+
385
+ - OpenAI: `openai/gpt-5.4` (or `openai/gpt-5.4-mini`)
386
+ - Anthropic: `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-6`)
387
+ - Google: `google/gemini-3-flash-preview` (or `google/gemini-3.1-pro-preview`)
388
+ - Z.AI (GLM): `zai/glm-4.7`
389
+ - MiniMax: `minimax/MiniMax-M2.7`
390
+
391
+ Optional additional coverage (nice to have):
392
+
393
+ - xAI: `xai/grok-4` (or latest available)
394
+ - Mistral: `mistral/`… (pick one “tools” capable model you have enabled)
395
+ - Cerebras: `cerebras/`… (if you have access)
396
+ - LM Studio: `lmstudio/`… (local; tool calling depends on API mode)
397
+
398
+ ### Vision: image send (attachment → multimodal message)
399
+
400
+ Include at least one image-capable model in `Durar_LIVE_GATEWAY_MODELS` (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.
401
+
402
+ ### Aggregators / alternate gateways
403
+
404
+ If you have keys enabled, we also support testing via:
405
+
406
+ - OpenRouter: `openrouter/...` (hundreds of models; use `Durar models scan` to find tool+image capable candidates)
407
+ - OpenCode: `opencode/...` for Zen and `opencode-go/...` for Go (auth via `OPENCODE_API_KEY` / `OPENCODE_ZEN_API_KEY`)
408
+
409
+ More providers you can include in the live matrix (if you have creds/config):
410
+
411
+ - Built-in: `openai`, `openai-codex`, `anthropic`, `google`, `google-vertex`, `google-antigravity`, `google-gemini-cli`, `zai`, `openrouter`, `opencode`, `opencode-go`, `xai`, `groq`, `cerebras`, `mistral`, `github-copilot`
412
+ - Via `models.providers` (custom endpoints): `minimax` (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)
413
+
414
+ Tip: don’t try to hardcode “all models” in docs. The authoritative list is whatever `discoverModels(...)` returns on your machine + whatever keys are available.
415
+
416
+ ## Credentials (never commit)
417
+
418
+ Live tests discover credentials the same way the CLI does. Practical implications:
419
+
420
+ - If the CLI works, live tests should find the same keys.
421
+ - If a live test says “no creds”, debug the same way you’d debug `Durar models list` / model selection.
422
+
423
+ - Per-agent auth profiles: `~/.Durar/agents/<agentId>/agent/auth-profiles.json` (this is what “profile keys” means in the live tests)
424
+ - Config: `~/.Durar/Durar.json` (or `Durar_CONFIG_PATH`)
425
+ - Legacy state dir: `~/.Durar/credentials/` (copied into the staged live home when present, but not the main profile-key store)
426
+ - Live local runs copy the active config, per-agent `auth-profiles.json` files, legacy `credentials/`, and supported external CLI auth dirs into a temp test home by default; `agents.*.workspace` / `agentDir` path overrides are stripped in that staged config so probes stay off your real host workspace.
427
+
428
+ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local tests after `source ~/.profile`, or use the Docker runners below (they can mount `~/.profile` into the container).
429
+
430
+ ## Deepgram live (audio transcription)
431
+
432
+ - Test: `src/media-understanding/providers/deepgram/audio.live.test.ts`
433
+ - Enable: `DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live src/media-understanding/providers/deepgram/audio.live.test.ts`
434
+
435
+ ## BytePlus coding plan live
436
+
437
+ - Test: `src/agents/byteplus.live.test.ts`
438
+ - Enable: `BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live src/agents/byteplus.live.test.ts`
439
+ - Optional model override: `BYTEPLUS_CODING_MODEL=ark-code-latest`
440
+
441
+ ## Image generation live
442
+
443
+ - Test: `src/image-generation/runtime.live.test.ts`
444
+ - Command: `pnpm test:live src/image-generation/runtime.live.test.ts`
445
+ - Scope:
446
+ - Enumerates every registered image-generation provider plugin
447
+ - Loads missing provider env vars from your login shell (`~/.profile`) before probing
448
+ - Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
449
+ - Skips providers with no usable auth/profile/model
450
+ - Runs the stock image-generation variants through the shared runtime capability:
451
+ - `google:flash-generate`
452
+ - `google:pro-generate`
453
+ - `google:pro-edit`
454
+ - `openai:default-generate`
455
+ - Current bundled providers covered:
456
+ - `openai`
457
+ - `google`
458
+ - Optional narrowing:
459
+ - `Durar_LIVE_IMAGE_GENERATION_PROVIDERS="openai,google"`
460
+ - `Durar_LIVE_IMAGE_GENERATION_MODELS="openai/gpt-image-1,google/gemini-3.1-flash-image-preview"`
461
+ - `Durar_LIVE_IMAGE_GENERATION_CASES="google:flash-generate,google:pro-edit"`
462
+ - Optional auth behavior:
463
+ - `Durar_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
464
+
465
+ ## Docker runners (optional "works in Linux" checks)
466
+
467
+ These Docker runners split into two buckets:
468
+
469
+ - Live-model runners: `test:docker:live-models` and `test:docker:live-gateway` run only their matching profile-key live file inside the repo Docker image (`src/agents/models.profiles.live.test.ts` and `src/gateway/gateway-models.profiles.live.test.ts`), mounting your local config dir and workspace (and sourcing `~/.profile` if mounted). The matching local entrypoints are `test:live:models-profiles` and `test:live:gateway-profiles`.
470
+ - Docker live runners default to a smaller smoke cap so a full Docker sweep stays practical:
471
+ `test:docker:live-models` defaults to `Durar_LIVE_MAX_MODELS=12`, and
472
+ `test:docker:live-gateway` defaults to `Durar_LIVE_GATEWAY_SMOKE=1`,
473
+ `Durar_LIVE_GATEWAY_MAX_MODELS=8`,
474
+ `Durar_LIVE_GATEWAY_STEP_TIMEOUT_MS=45000`, and
475
+ `Durar_LIVE_GATEWAY_MODEL_TIMEOUT_MS=90000`. Override those env vars when you
476
+ explicitly want the larger exhaustive scan.
477
+ - `test:docker:all` builds the live Docker image once via `test:docker:live-build`, then reuses it for the two live Docker lanes.
478
+ - Container smoke runners: `test:docker:openwebui`, `test:docker:onboard`, `test:docker:gateway-network`, `test:docker:mcp-channels`, and `test:docker:plugins` boot one or more real containers and verify higher-level integration paths.
479
+
480
+ The live-model Docker runners also bind-mount only the needed CLI auth homes (or all supported ones when the run is not narrowed), then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:
481
+
482
+ - Direct models: `pnpm test:docker:live-models` (script: `scripts/test-live-models-docker.sh`)
483
+ - ACP bind smoke: `pnpm test:docker:live-acp-bind` (script: `scripts/test-live-acp-bind-docker.sh`)
484
+ - CLI backend smoke: `pnpm test:docker:live-cli-backend` (script: `scripts/test-live-cli-backend-docker.sh`)
485
+ - Gateway + dev agent: `pnpm test:docker:live-gateway` (script: `scripts/test-live-gateway-models-docker.sh`)
486
+ - Open WebUI live smoke: `pnpm test:docker:openwebui` (script: `scripts/e2e/openwebui-docker.sh`)
487
+ - Onboarding wizard (TTY, full scaffolding): `pnpm test:docker:onboard` (script: `scripts/e2e/onboard-docker.sh`)
488
+ - Gateway networking (two containers, WS auth + health): `pnpm test:docker:gateway-network` (script: `scripts/e2e/gateway-network-docker.sh`)
489
+ - MCP channel bridge (seeded Gateway + stdio bridge + raw Claude notification-frame smoke): `pnpm test:docker:mcp-channels` (script: `scripts/e2e/mcp-channels-docker.sh`)
490
+ - Plugins (install smoke + `/plugin` alias + Claude-bundle restart semantics): `pnpm test:docker:plugins` (script: `scripts/e2e/plugins-docker.sh`)
491
+
492
+ The live-model Docker runners also bind-mount the current checkout read-only and
493
+ stage it into a temporary workdir inside the container. This keeps the runtime
494
+ image slim while still running Vitest against your exact local source/config.
495
+ They also set `Durar_SKIP_CHANNELS=1` so gateway live probes do not start
496
+ real Telegram/Discord/etc. channel workers inside the container.
497
+ `test:docker:live-models` still runs `pnpm test:live`, so pass through
498
+ `Durar_LIVE_GATEWAY_*` as well when you need to narrow or exclude gateway
499
+ live coverage from that Docker lane.
500
+ `test:docker:openwebui` is a higher-level compatibility smoke: it starts a
501
+ Durar gateway container with the OpenAI-compatible HTTP endpoints enabled,
502
+ starts a pinned Open WebUI container against that gateway, signs in through
503
+ Open WebUI, verifies `/api/models` exposes `Durar/default`, then sends a
504
+ real chat request through Open WebUI's `/api/chat/completions` proxy.
505
+ The first run can be noticeably slower because Docker may need to pull the
506
+ Open WebUI image and Open WebUI may need to finish its own cold-start setup.
507
+ This lane expects a usable live model key, and `Durar_PROFILE_FILE`
508
+ (`~/.profile` by default) is the primary way to provide it in Dockerized runs.
509
+ Successful runs print a small JSON payload like `{ "ok": true, "model":
510
+ "Durar/default", ... }`.
511
+ `test:docker:mcp-channels` is intentionally deterministic and does not need a
512
+ real Telegram, Discord, or iMessage account. It boots a seeded Gateway
513
+ container, starts a second container that spawns `Durar mcp serve`, then
514
+ verifies routed conversation discovery, transcript reads, attachment metadata,
515
+ live event queue behavior, outbound send routing, and Claude-style channel +
516
+ permission notifications over the real stdio MCP bridge. The notification check
517
+ inspects the raw stdio MCP frames directly so the smoke validates what the
518
+ bridge actually emits, not just what a specific client SDK happens to surface.
519
+
520
+ Manual ACP plain-language thread smoke (not CI):
521
+
522
+ - `bun scripts/dev/discord-acp-plain-language-smoke.ts --channel <discord-channel-id> ...`
523
+ - Keep this script for regression/debug workflows. It may be needed again for ACP thread routing validation, so do not delete it.
524
+
525
+ Useful env vars:
526
+
527
+ - `Durar_CONFIG_DIR=...` (default: `~/.Durar`) mounted to `/home/node/.Durar`
528
+ - `Durar_WORKSPACE_DIR=...` (default: `~/.Durar/workspace`) mounted to `/home/node/.Durar/workspace`
529
+ - `Durar_PROFILE_FILE=...` (default: `~/.profile`) mounted to `/home/node/.profile` and sourced before running tests
530
+ - `Durar_DOCKER_CLI_TOOLS_DIR=...` (default: `~/.cache/Durar/docker-cli-tools`) mounted to `/home/node/.npm-global` for cached CLI installs inside Docker
531
+ - External CLI auth dirs under `$HOME` are mounted read-only under `/host-auth/...`, then copied into `/home/node/...` before tests start
532
+ - Default: mount all supported dirs (`.codex`, `.claude`, `.minimax`)
533
+ - Narrowed provider runs mount only the needed dirs inferred from `Durar_LIVE_PROVIDERS` / `Durar_LIVE_GATEWAY_PROVIDERS`
534
+ - Override manually with `Durar_DOCKER_AUTH_DIRS=all`, `Durar_DOCKER_AUTH_DIRS=none`, or a comma list like `Durar_DOCKER_AUTH_DIRS=.claude,.codex`
535
+ - `Durar_LIVE_GATEWAY_MODELS=...` / `Durar_LIVE_MODELS=...` to narrow the run
536
+ - `Durar_LIVE_GATEWAY_PROVIDERS=...` / `Durar_LIVE_PROVIDERS=...` to filter providers in-container
537
+ - `Durar_LIVE_REQUIRE_PROFILE_KEYS=1` to ensure creds come from the profile store (not env)
538
+ - `Durar_OPENWEBUI_MODEL=...` to choose the model exposed by the gateway for the Open WebUI smoke
539
+ - `Durar_OPENWEBUI_PROMPT=...` to override the nonce-check prompt used by the Open WebUI smoke
540
+ - `OPENWEBUI_IMAGE=...` to override the pinned Open WebUI image tag
541
+
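To see which host paths a Docker run would mount with your current environment, the documented defaults can be resolved with plain parameter expansion. This is a sketch of the defaults listed above, not an excerpt from the runner scripts; the function name `print_docker_mounts` is hypothetical.

```bash
# Hypothetical helper: print "host path -> container path" for each mount,
# applying the documented defaults when an env var is unset or empty.
print_docker_mounts() (
  config_dir="${Durar_CONFIG_DIR:-$HOME/.Durar}"
  workspace_dir="${Durar_WORKSPACE_DIR:-$HOME/.Durar/workspace}"
  profile_file="${Durar_PROFILE_FILE:-$HOME/.profile}"
  tools_dir="${Durar_DOCKER_CLI_TOOLS_DIR:-$HOME/.cache/Durar/docker-cli-tools}"

  # printf reuses the format string for each pair of arguments.
  printf '%s -> %s\n' \
    "$config_dir"    /home/node/.Durar \
    "$workspace_dir" /home/node/.Durar/workspace \
    "$profile_file"  /home/node/.profile \
    "$tools_dir"     /home/node/.npm-global
)
```

Note that the workspace default is spelled out independently, as in the list above; this sketch does not derive it from `Durar_CONFIG_DIR`.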
542
+ ## Docs sanity
543
+
544
+ Run docs checks after doc edits: `pnpm check:docs`.
545
+ Run full Mintlify anchor validation when you need in-page heading checks too: `pnpm docs:check-links:anchors`.
546
+
547
+ ## Offline regression (CI-safe)
548
+
549
+ These are “real pipeline” regressions without real providers:
550
+
551
+ - Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.test.ts` (case: "runs a mock OpenAI tool call end-to-end via gateway agent loop")
552
+ - Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.test.ts` (case: "runs wizard over ws and writes auth token config")
553
+
554
+ ## Agent reliability evals (skills)
555
+
556
+ We already have a few CI-safe tests that behave like “agent reliability evals”:
557
+
558
+ - Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.test.ts`).
559
+ - End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.test.ts`).
560
+
561
+ What’s still missing for skills (see [Skills](/tools/skills)):
562
+
563
+ - **Decisioning:** when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
564
+ - **Compliance:** does the agent read `SKILL.md` before use and follow required steps/args?
565
+ - **Workflow contracts:** multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.
566
+
567
+ Future evals should stay deterministic first:
568
+
569
+ - A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.
570
+ - A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).
571
+ - Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.
572
+
573
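One way the tool-order assertion of such a scenario runner might look is sketched below. The `RecordedToolCall` shape, the `calledInOrder` helper, and the tool names in the sample transcript are all assumptions made for illustration; they are not existing project APIs.

```typescript
// Hypothetical record of a tool call captured from a mock-provider run.
interface RecordedToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Returns true when `expected` appears in `calls` as an ordered subsequence,
// so unrelated calls in between do not fail the assertion.
function calledInOrder(calls: RecordedToolCall[], expected: string[]): boolean {
  let next = 0;
  for (const call of calls) {
    if (next < expected.length && call.name === expected[next]) next += 1;
  }
  return next === expected.length;
}

// Sample transcript (tool names are illustrative): the skill file is read before use.
const transcript: RecordedToolCall[] = [
  { name: "read", args: { path: "SKILL.md" } },
  { name: "exec", args: { command: "echo ok" } },
];
```

Matching on an ordered subsequence rather than exact equality keeps the check deterministic while tolerating extra, harmless tool calls.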
+ ## Contract tests (plugin and channel shape)
574
+
575
+ Contract tests verify that every registered plugin and channel conforms to its
576
+ interface contract. They iterate over all discovered plugins and run a suite of
577
+ shape and behavior assertions. The default `pnpm test` unit lane intentionally
578
+ skips these shared seam and smoke files; run the contract commands explicitly
579
+ when you touch shared channel or provider surfaces.
580
+
581
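As a rough picture of what "iterate over all discovered plugins and run shape assertions" can mean, the sketch below checks the basic plugin shape (id, name, capabilities). `PluginLike`, `shapeViolations`, and `checkAllPlugins` are hypothetical names; the real contract suites live in the test files listed below and may check far more.

```typescript
// Hypothetical, loosely typed plugin record; fields mirror the "plugin" contract (id, name, capabilities).
interface PluginLike {
  id?: unknown;
  name?: unknown;
  capabilities?: unknown;
}

// Collects human-readable shape problems for one plugin; an empty array means it conforms.
function shapeViolations(plugin: PluginLike): string[] {
  const { id, name, capabilities } = plugin;
  const problems: string[] = [];
  if (typeof id !== "string" || id.length === 0) problems.push("id must be a non-empty string");
  if (typeof name !== "string" || name.length === 0) problems.push("name must be a non-empty string");
  if (typeof capabilities !== "object" || capabilities === null) {
    problems.push("capabilities must be an object");
  }
  return problems;
}

// Runs the same assertions over every discovered plugin and reports only the failures.
function checkAllPlugins(plugins: PluginLike[]): { index: number; problems: string[] }[] {
  return plugins
    .map((plugin, index) => ({ index, problems: shapeViolations(plugin) }))
    .filter((result) => result.problems.length > 0);
}
```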
+ ### Commands
582
+
583
+ - All contracts: `pnpm test:contracts`
584
+ - Channel contracts only: `pnpm test:contracts:channels`
585
+ - Provider contracts only: `pnpm test:contracts:plugins`
586
+
587
+ ### Channel contracts
588
+
589
+ Located in `src/channels/plugins/contracts/*.contract.test.ts`:
590
+
591
+ - **plugin** - Basic plugin shape (id, name, capabilities)
592
+ - **setup** - Setup wizard contract
593
+ - **session-binding** - Session binding behavior
594
+ - **outbound-payload** - Message payload structure
595
+ - **inbound** - Inbound message handling
596
+ - **actions** - Channel action handlers
597
+ - **threading** - Thread ID handling
598
+ - **directory** - Directory/roster API
599
+ - **group-policy** - Group policy enforcement
600
+
601
+ ### Provider status contracts
602
+
603
+ Located in `src/plugins/contracts/*.contract.test.ts`:
604
+
605
+ - **status** - Channel status probes
606
+ - **registry** - Plugin registry shape
607
+
608
+ ### Provider contracts
609
+
610
+ Located in `src/plugins/contracts/*.contract.test.ts`:
611
+
612
+ - **auth** - Auth flow contract
613
+ - **auth-choice** - Auth choice/selection
614
+ - **catalog** - Model catalog API
615
+ - **discovery** - Plugin discovery
616
+ - **loader** - Plugin loading
617
+ - **runtime** - Provider runtime
618
+ - **shape** - Plugin shape/interface
619
+ - **wizard** - Setup wizard
620
+
621
+ ### When to run
622
+
623
+ - After changing plugin-sdk exports or subpaths
624
+ - After adding or modifying a channel or provider plugin
625
+ - After refactoring plugin registration or discovery
626
+
627
+ Contract tests run in CI and do not require real API keys.
628
+
629
+ ## Adding regressions (guidance)
630
+
631
+ When you fix a provider/model issue discovered in live testing:
632
+
633
+ - Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
634
+ - If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
635
+ - Prefer targeting the smallest layer that catches the bug:
636
+   - provider request conversion/replay bug → direct models test
637
+   - gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test
638
+ - SecretRef traversal guardrail:
639
+   - `src/secrets/exec-secret-ref-id-parity.test.ts` derives one sampled target per SecretRef class from registry metadata (`listSecretTargetRegistryEntries()`), then asserts traversal-segment exec ids are rejected.
640
+   - If you add a new `includeInPlan` SecretRef target family in `src/secrets/target-registry-data.ts`, update `classifyTargetClass` in that test. The test intentionally fails on unclassified target ids so new classes cannot be skipped silently.
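
The two ideas behind this guardrail (reject ids with traversal segments, and fail loudly on unclassified targets) can be sketched as follows. Everything here is illustrative: `hasTraversalSegment`, `classifyTargetClassSketch`, the class names, and the id prefixes are assumptions, not the contents of the actual test or registry.

```typescript
// Hypothetical check: an id is rejected when any path segment is "." or "..".
function hasTraversalSegment(id: string): boolean {
  return id.split(/[\\\/]/).some((segment) => segment === "." || segment === "..");
}

// Hypothetical classes and id prefixes, invented for this sketch.
type TargetClassSketch = "model-provider" | "channel";

// Throws on unknown ids so a newly added target family cannot be skipped silently.
function classifyTargetClassSketch(targetId: string): TargetClassSketch {
  if (targetId.startsWith("models.providers.")) return "model-provider";
  if (targetId.startsWith("channels.")) return "channel";
  throw new Error(`unclassified SecretRef target id: ${targetId}`);
}
```

Throwing in the fallthrough branch, rather than returning a default class, is what turns a forgotten classifier update into a test failure.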