@rudderjs/ai 1.17.2 → 1.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (377)
  1. package/README.md +19 -1274
  2. package/dist/budget-orm/index.d.ts +1 -95
  3. package/dist/budget-orm/index.d.ts.map +1 -1
  4. package/dist/budget-orm/index.js +4 -176
  5. package/dist/budget-orm/index.js.map +1 -1
  6. package/dist/chat-mentions.d.ts +1 -58
  7. package/dist/chat-mentions.d.ts.map +1 -1
  8. package/dist/chat-mentions.js +4 -80
  9. package/dist/chat-mentions.js.map +1 -1
  10. package/dist/commands/ai-eval.d.ts +1 -92
  11. package/dist/commands/ai-eval.d.ts.map +1 -1
  12. package/dist/commands/ai-eval.js +4 -377
  13. package/dist/commands/ai-eval.js.map +1 -1
  14. package/dist/commands/make-agent.d.ts +1 -2
  15. package/dist/commands/make-agent.d.ts.map +1 -1
  16. package/dist/commands/make-agent.js +4 -22
  17. package/dist/commands/make-agent.js.map +1 -1
  18. package/dist/computer-use/index.d.ts +1 -52
  19. package/dist/computer-use/index.d.ts.map +1 -1
  20. package/dist/computer-use/index.js +4 -50
  21. package/dist/computer-use/index.js.map +1 -1
  22. package/dist/conversation-orm/index.d.ts +1 -108
  23. package/dist/conversation-orm/index.d.ts.map +1 -1
  24. package/dist/conversation-orm/index.js +4 -214
  25. package/dist/conversation-orm/index.js.map +1 -1
  26. package/dist/doctor.d.ts +1 -1
  27. package/dist/doctor.d.ts.map +1 -1
  28. package/dist/doctor.js +4 -65
  29. package/dist/doctor.js.map +1 -1
  30. package/dist/eval/index.d.ts +1 -270
  31. package/dist/eval/index.d.ts.map +1 -1
  32. package/dist/eval/index.js +4 -509
  33. package/dist/eval/index.js.map +1 -1
  34. package/dist/gateway/index.d.ts +1 -10
  35. package/dist/gateway/index.d.ts.map +1 -1
  36. package/dist/gateway/index.js +4 -10
  37. package/dist/gateway/index.js.map +1 -1
  38. package/dist/index.d.ts +1 -66
  39. package/dist/index.d.ts.map +1 -1
  40. package/dist/index.js +4 -78
  41. package/dist/index.js.map +1 -1
  42. package/dist/mcp/index.d.ts +1 -15
  43. package/dist/mcp/index.d.ts.map +1 -1
  44. package/dist/mcp/index.js +4 -14
  45. package/dist/mcp/index.js.map +1 -1
  46. package/dist/memory-embedding/index.d.ts +1 -120
  47. package/dist/memory-embedding/index.d.ts.map +1 -1
  48. package/dist/memory-embedding/index.js +4 -228
  49. package/dist/memory-embedding/index.js.map +1 -1
  50. package/dist/memory-orm/index.d.ts +1 -117
  51. package/dist/memory-orm/index.d.ts.map +1 -1
  52. package/dist/memory-orm/index.js +4 -186
  53. package/dist/memory-orm/index.js.map +1 -1
  54. package/dist/node/index.d.ts +1 -2
  55. package/dist/node/index.d.ts.map +1 -1
  56. package/dist/node/index.js +4 -2
  57. package/dist/node/index.js.map +1 -1
  58. package/dist/observers.d.ts +1 -129
  59. package/dist/observers.d.ts.map +1 -1
  60. package/dist/observers.js +4 -39
  61. package/dist/observers.js.map +1 -1
  62. package/dist/react/index.d.ts +1 -15
  63. package/dist/react/index.d.ts.map +1 -1
  64. package/dist/react/index.js +4 -15
  65. package/dist/react/index.js.map +1 -1
  66. package/dist/server/index.d.ts +1 -1
  67. package/dist/server/index.d.ts.map +1 -1
  68. package/dist/server/index.js +4 -1
  69. package/dist/server/index.js.map +1 -1
  70. package/package.json +9 -13
  71. package/boost/guidelines.md +0 -260
  72. package/boost/skills/ai-agents/SKILL.md +0 -240
  73. package/boost/skills/ai-tools/SKILL.md +0 -260
  74. package/dist/agent-run-store.d.ts +0 -161
  75. package/dist/agent-run-store.d.ts.map +0 -1
  76. package/dist/agent-run-store.js +0 -98
  77. package/dist/agent-run-store.js.map +0 -1
  78. package/dist/agent-sse.d.ts +0 -153
  79. package/dist/agent-sse.d.ts.map +0 -1
  80. package/dist/agent-sse.js +0 -282
  81. package/dist/agent-sse.js.map +0 -1
  82. package/dist/agent.d.ts +0 -508
  83. package/dist/agent.d.ts.map +0 -1
  84. package/dist/agent.js +0 -1538
  85. package/dist/agent.js.map +0 -1
  86. package/dist/attachment.d.ts +0 -31
  87. package/dist/attachment.d.ts.map +0 -1
  88. package/dist/attachment.js +0 -89
  89. package/dist/attachment.js.map +0 -1
  90. package/dist/audio.d.ts +0 -45
  91. package/dist/audio.d.ts.map +0 -1
  92. package/dist/audio.js +0 -93
  93. package/dist/audio.js.map +0 -1
  94. package/dist/base64.d.ts +0 -7
  95. package/dist/base64.d.ts.map +0 -1
  96. package/dist/base64.js +0 -39
  97. package/dist/base64.js.map +0 -1
  98. package/dist/budget/pricing.d.ts +0 -124
  99. package/dist/budget/pricing.d.ts.map +0 -1
  100. package/dist/budget/pricing.js +0 -175
  101. package/dist/budget/pricing.js.map +0 -1
  102. package/dist/budget/storage.d.ts +0 -104
  103. package/dist/budget/storage.d.ts.map +0 -1
  104. package/dist/budget/storage.js +0 -0
  105. package/dist/budget/storage.js.map +0 -1
  106. package/dist/budget/with-budget.d.ts +0 -119
  107. package/dist/budget/with-budget.d.ts.map +0 -1
  108. package/dist/budget/with-budget.js +0 -175
  109. package/dist/budget/with-budget.js.map +0 -1
  110. package/dist/cached-embedding.d.ts +0 -14
  111. package/dist/cached-embedding.d.ts.map +0 -1
  112. package/dist/cached-embedding.js +0 -44
  113. package/dist/cached-embedding.js.map +0 -1
  114. package/dist/computer-use/actions.d.ts +0 -214
  115. package/dist/computer-use/actions.d.ts.map +0 -1
  116. package/dist/computer-use/actions.js +0 -48
  117. package/dist/computer-use/actions.js.map +0 -1
  118. package/dist/computer-use/errors.d.ts +0 -57
  119. package/dist/computer-use/errors.d.ts.map +0 -1
  120. package/dist/computer-use/errors.js +0 -76
  121. package/dist/computer-use/errors.js.map +0 -1
  122. package/dist/computer-use/playwright.d.ts +0 -76
  123. package/dist/computer-use/playwright.d.ts.map +0 -1
  124. package/dist/computer-use/playwright.js +0 -270
  125. package/dist/computer-use/playwright.js.map +0 -1
  126. package/dist/computer-use/tool.d.ts +0 -154
  127. package/dist/computer-use/tool.d.ts.map +0 -1
  128. package/dist/computer-use/tool.js +0 -210
  129. package/dist/computer-use/tool.js.map +0 -1
  130. package/dist/continuation-validation.d.ts +0 -85
  131. package/dist/continuation-validation.d.ts.map +0 -1
  132. package/dist/continuation-validation.js +0 -166
  133. package/dist/continuation-validation.js.map +0 -1
  134. package/dist/conversation-persistence.d.ts +0 -46
  135. package/dist/conversation-persistence.d.ts.map +0 -1
  136. package/dist/conversation-persistence.js +0 -176
  137. package/dist/conversation-persistence.js.map +0 -1
  138. package/dist/conversation.d.ts +0 -11
  139. package/dist/conversation.d.ts.map +0 -1
  140. package/dist/conversation.js +0 -55
  141. package/dist/conversation.js.map +0 -1
  142. package/dist/eval/fixtures.d.ts +0 -65
  143. package/dist/eval/fixtures.d.ts.map +0 -1
  144. package/dist/eval/fixtures.js +0 -110
  145. package/dist/eval/fixtures.js.map +0 -1
  146. package/dist/eval/html-reporter.d.ts +0 -25
  147. package/dist/eval/html-reporter.d.ts.map +0 -1
  148. package/dist/eval/html-reporter.js +0 -209
  149. package/dist/eval/html-reporter.js.map +0 -1
  150. package/dist/eval/json-reporter.d.ts +0 -43
  151. package/dist/eval/json-reporter.d.ts.map +0 -1
  152. package/dist/eval/json-reporter.js +0 -40
  153. package/dist/eval/json-reporter.js.map +0 -1
  154. package/dist/facade.d.ts +0 -96
  155. package/dist/facade.d.ts.map +0 -1
  156. package/dist/facade.js +0 -146
  157. package/dist/facade.js.map +0 -1
  158. package/dist/fake.d.ts +0 -201
  159. package/dist/fake.d.ts.map +0 -1
  160. package/dist/fake.js +0 -428
  161. package/dist/fake.js.map +0 -1
  162. package/dist/file-search.d.ts +0 -168
  163. package/dist/file-search.d.ts.map +0 -1
  164. package/dist/file-search.js +0 -158
  165. package/dist/file-search.js.map +0 -1
  166. package/dist/files.d.ts +0 -27
  167. package/dist/files.d.ts.map +0 -1
  168. package/dist/files.js +0 -44
  169. package/dist/files.js.map +0 -1
  170. package/dist/gateway/http-gateway-adapter.d.ts +0 -94
  171. package/dist/gateway/http-gateway-adapter.d.ts.map +0 -1
  172. package/dist/gateway/http-gateway-adapter.js +0 -106
  173. package/dist/gateway/http-gateway-adapter.js.map +0 -1
  174. package/dist/gateway/sse.d.ts +0 -28
  175. package/dist/gateway/sse.d.ts.map +0 -1
  176. package/dist/gateway/sse.js +0 -78
  177. package/dist/gateway/sse.js.map +0 -1
  178. package/dist/handoff.d.ts +0 -95
  179. package/dist/handoff.d.ts.map +0 -1
  180. package/dist/handoff.js +0 -78
  181. package/dist/handoff.js.map +0 -1
  182. package/dist/handoffs-driver.d.ts +0 -58
  183. package/dist/handoffs-driver.d.ts.map +0 -1
  184. package/dist/handoffs-driver.js +0 -103
  185. package/dist/handoffs-driver.js.map +0 -1
  186. package/dist/image.d.ts +0 -40
  187. package/dist/image.d.ts.map +0 -1
  188. package/dist/image.js +0 -109
  189. package/dist/image.js.map +0 -1
  190. package/dist/mcp/client-tools.d.ts +0 -39
  191. package/dist/mcp/client-tools.d.ts.map +0 -1
  192. package/dist/mcp/client-tools.js +0 -147
  193. package/dist/mcp/client-tools.js.map +0 -1
  194. package/dist/mcp/server-from-agent.d.ts +0 -24
  195. package/dist/mcp/server-from-agent.d.ts.map +0 -1
  196. package/dist/mcp/server-from-agent.js +0 -113
  197. package/dist/mcp/server-from-agent.js.map +0 -1
  198. package/dist/mcp/types.d.ts +0 -64
  199. package/dist/mcp/types.d.ts.map +0 -1
  200. package/dist/mcp/types.js +0 -6
  201. package/dist/mcp/types.js.map +0 -1
  202. package/dist/memory-extract.d.ts +0 -60
  203. package/dist/memory-extract.d.ts.map +0 -1
  204. package/dist/memory-extract.js +0 -163
  205. package/dist/memory-extract.js.map +0 -1
  206. package/dist/memory-inject.d.ts +0 -39
  207. package/dist/memory-inject.d.ts.map +0 -1
  208. package/dist/memory-inject.js +0 -135
  209. package/dist/memory-inject.js.map +0 -1
  210. package/dist/memory.d.ts +0 -55
  211. package/dist/memory.d.ts.map +0 -1
  212. package/dist/memory.js +0 -132
  213. package/dist/memory.js.map +0 -1
  214. package/dist/middleware.d.ts +0 -18
  215. package/dist/middleware.d.ts.map +0 -1
  216. package/dist/middleware.js +0 -72
  217. package/dist/middleware.js.map +0 -1
  218. package/dist/node/attachment.d.ts +0 -6
  219. package/dist/node/attachment.d.ts.map +0 -1
  220. package/dist/node/attachment.js +0 -35
  221. package/dist/node/attachment.js.map +0 -1
  222. package/dist/node/transcription.d.ts +0 -4
  223. package/dist/node/transcription.d.ts.map +0 -1
  224. package/dist/node/transcription.js +0 -8
  225. package/dist/node/transcription.js.map +0 -1
  226. package/dist/output.d.ts +0 -22
  227. package/dist/output.d.ts.map +0 -1
  228. package/dist/output.js +0 -60
  229. package/dist/output.js.map +0 -1
  230. package/dist/provider-tools.d.ts +0 -87
  231. package/dist/provider-tools.d.ts.map +0 -1
  232. package/dist/provider-tools.js +0 -189
  233. package/dist/provider-tools.js.map +0 -1
  234. package/dist/providers/anthropic.d.ts +0 -24
  235. package/dist/providers/anthropic.d.ts.map +0 -1
  236. package/dist/providers/anthropic.js +0 -405
  237. package/dist/providers/anthropic.js.map +0 -1
  238. package/dist/providers/azure.d.ts +0 -13
  239. package/dist/providers/azure.d.ts.map +0 -1
  240. package/dist/providers/azure.js +0 -15
  241. package/dist/providers/azure.js.map +0 -1
  242. package/dist/providers/bedrock.d.ts +0 -75
  243. package/dist/providers/bedrock.d.ts.map +0 -1
  244. package/dist/providers/bedrock.js +0 -181
  245. package/dist/providers/bedrock.js.map +0 -1
  246. package/dist/providers/cohere.d.ts +0 -13
  247. package/dist/providers/cohere.d.ts.map +0 -1
  248. package/dist/providers/cohere.js +0 -87
  249. package/dist/providers/cohere.js.map +0 -1
  250. package/dist/providers/deepseek.d.ts +0 -12
  251. package/dist/providers/deepseek.d.ts.map +0 -1
  252. package/dist/providers/deepseek.js +0 -15
  253. package/dist/providers/deepseek.js.map +0 -1
  254. package/dist/providers/elevenlabs.d.ts +0 -98
  255. package/dist/providers/elevenlabs.d.ts.map +0 -1
  256. package/dist/providers/elevenlabs.js +0 -229
  257. package/dist/providers/elevenlabs.js.map +0 -1
  258. package/dist/providers/google-cache-registry.d.ts +0 -132
  259. package/dist/providers/google-cache-registry.d.ts.map +0 -1
  260. package/dist/providers/google-cache-registry.js +0 -209
  261. package/dist/providers/google-cache-registry.js.map +0 -1
  262. package/dist/providers/google.d.ts +0 -38
  263. package/dist/providers/google.d.ts.map +0 -1
  264. package/dist/providers/google.js +0 -903
  265. package/dist/providers/google.js.map +0 -1
  266. package/dist/providers/groq.d.ts +0 -12
  267. package/dist/providers/groq.d.ts.map +0 -1
  268. package/dist/providers/groq.js +0 -15
  269. package/dist/providers/groq.js.map +0 -1
  270. package/dist/providers/jina.d.ts +0 -13
  271. package/dist/providers/jina.d.ts.map +0 -1
  272. package/dist/providers/jina.js +0 -90
  273. package/dist/providers/jina.js.map +0 -1
  274. package/dist/providers/mistral.d.ts +0 -13
  275. package/dist/providers/mistral.d.ts.map +0 -1
  276. package/dist/providers/mistral.js +0 -46
  277. package/dist/providers/mistral.js.map +0 -1
  278. package/dist/providers/ollama.d.ts +0 -11
  279. package/dist/providers/ollama.d.ts.map +0 -1
  280. package/dist/providers/ollama.js +0 -15
  281. package/dist/providers/ollama.js.map +0 -1
  282. package/dist/providers/openai.d.ts +0 -79
  283. package/dist/providers/openai.d.ts.map +0 -1
  284. package/dist/providers/openai.js +0 -792
  285. package/dist/providers/openai.js.map +0 -1
  286. package/dist/providers/openrouter.d.ts +0 -43
  287. package/dist/providers/openrouter.d.ts.map +0 -1
  288. package/dist/providers/openrouter.js +0 -21
  289. package/dist/providers/openrouter.js.map +0 -1
  290. package/dist/providers/voyage.d.ts +0 -91
  291. package/dist/providers/voyage.d.ts.map +0 -1
  292. package/dist/providers/voyage.js +0 -166
  293. package/dist/providers/voyage.js.map +0 -1
  294. package/dist/providers/xai.d.ts +0 -12
  295. package/dist/providers/xai.d.ts.map +0 -1
  296. package/dist/providers/xai.js +0 -15
  297. package/dist/providers/xai.js.map +0 -1
  298. package/dist/queue-job.d.ts +0 -100
  299. package/dist/queue-job.d.ts.map +0 -1
  300. package/dist/queue-job.js +0 -185
  301. package/dist/queue-job.js.map +0 -1
  302. package/dist/react/agent-run.d.ts +0 -111
  303. package/dist/react/agent-run.d.ts.map +0 -1
  304. package/dist/react/agent-run.js +0 -107
  305. package/dist/react/agent-run.js.map +0 -1
  306. package/dist/react/useAgentRun.d.ts +0 -68
  307. package/dist/react/useAgentRun.d.ts.map +0 -1
  308. package/dist/react/useAgentRun.js +0 -125
  309. package/dist/react/useAgentRun.js.map +0 -1
  310. package/dist/registry.d.ts +0 -45
  311. package/dist/registry.d.ts.map +0 -1
  312. package/dist/registry.js +0 -131
  313. package/dist/registry.js.map +0 -1
  314. package/dist/rerank.d.ts +0 -20
  315. package/dist/rerank.d.ts.map +0 -1
  316. package/dist/rerank.js +0 -40
  317. package/dist/rerank.js.map +0 -1
  318. package/dist/resume-approval.d.ts +0 -30
  319. package/dist/resume-approval.d.ts.map +0 -1
  320. package/dist/resume-approval.js +0 -147
  321. package/dist/resume-approval.js.map +0 -1
  322. package/dist/sanitize-conversation.d.ts +0 -43
  323. package/dist/sanitize-conversation.d.ts.map +0 -1
  324. package/dist/sanitize-conversation.js +0 -85
  325. package/dist/sanitize-conversation.js.map +0 -1
  326. package/dist/scoped-tool.d.ts +0 -98
  327. package/dist/scoped-tool.d.ts.map +0 -1
  328. package/dist/scoped-tool.js +0 -174
  329. package/dist/scoped-tool.js.map +0 -1
  330. package/dist/server/provider.d.ts +0 -22
  331. package/dist/server/provider.d.ts.map +0 -1
  332. package/dist/server/provider.js +0 -194
  333. package/dist/server/provider.js.map +0 -1
  334. package/dist/similarity-search.d.ts +0 -163
  335. package/dist/similarity-search.d.ts.map +0 -1
  336. package/dist/similarity-search.js +0 -147
  337. package/dist/similarity-search.js.map +0 -1
  338. package/dist/sub-agent-run-store.d.ts +0 -157
  339. package/dist/sub-agent-run-store.d.ts.map +0 -1
  340. package/dist/sub-agent-run-store.js +0 -87
  341. package/dist/sub-agent-run-store.js.map +0 -1
  342. package/dist/tool-execution.d.ts +0 -16
  343. package/dist/tool-execution.d.ts.map +0 -1
  344. package/dist/tool-execution.js +0 -498
  345. package/dist/tool-execution.js.map +0 -1
  346. package/dist/tool-helpers.d.ts +0 -77
  347. package/dist/tool-helpers.d.ts.map +0 -1
  348. package/dist/tool-helpers.js +0 -117
  349. package/dist/tool-helpers.js.map +0 -1
  350. package/dist/tool.d.ts +0 -216
  351. package/dist/tool.d.ts.map +0 -1
  352. package/dist/tool.js +0 -175
  353. package/dist/tool.js.map +0 -1
  354. package/dist/transcription.d.ts +0 -42
  355. package/dist/transcription.d.ts.map +0 -1
  356. package/dist/transcription.js +0 -77
  357. package/dist/transcription.js.map +0 -1
  358. package/dist/types.d.ts +0 -1020
  359. package/dist/types.d.ts.map +0 -1
  360. package/dist/types.js +0 -2
  361. package/dist/types.js.map +0 -1
  362. package/dist/util/hash.d.ts +0 -11
  363. package/dist/util/hash.d.ts.map +0 -1
  364. package/dist/util/hash.js +0 -23
  365. package/dist/util/hash.js.map +0 -1
  366. package/dist/vector-stores/index.d.ts +0 -96
  367. package/dist/vector-stores/index.d.ts.map +0 -1
  368. package/dist/vector-stores/index.js +0 -153
  369. package/dist/vector-stores/index.js.map +0 -1
  370. package/dist/vercel-protocol.d.ts +0 -18
  371. package/dist/vercel-protocol.d.ts.map +0 -1
  372. package/dist/vercel-protocol.js +0 -75
  373. package/dist/vercel-protocol.js.map +0 -1
  374. package/dist/zod-to-json-schema.d.ts +0 -16
  375. package/dist/zod-to-json-schema.d.ts.map +0 -1
  376. package/dist/zod-to-json-schema.js +0 -17
  377. package/dist/zod-to-json-schema.js.map +0 -1
package/README.md CHANGED
@@ -1,1284 +1,29 @@
  # @rudderjs/ai
 
- AI engine for RudderJS providers, agents, tools, streaming, middleware, structured output, conversation memory, and testing fakes.
+ > Deprecated. The AI engine moved to [`@gemstack/ai-sdk`](https://www.npmjs.com/package/@gemstack/ai-sdk).
 
- ## Installation
+ This package is now a thin compatibility shim that re-exports `@gemstack/ai-sdk` (and every one of its subpaths) so existing Rudder apps and the internal dependents (`telescope`, `orm-prisma`, `orm-drizzle`) keep working unchanged.
 
- ```bash
- pnpm add @rudderjs/ai
- ```
-
- Install the provider SDK(s) you need:
-
- ```bash
- pnpm add @anthropic-ai/sdk # Anthropic (Claude)
- pnpm add openai # OpenAI (GPT) — also used for OpenRouter / Mistral / DeepSeek / Groq / xAI / Ollama
- pnpm add @google/genai # Google (Gemini)
- pnpm add cohere-ai # Cohere (reranking + embeddings)
- pnpm add @aws-sdk/client-bedrock-runtime # AWS Bedrock
- # ElevenLabs (premium TTS + STT) — no extra package needed (direct HTTP)
- # VoyageAI (embeddings + reranking) — no extra package needed (direct HTTP)
- # Jina — no extra package needed (direct HTTP)
- ```
-
- ## Runtime Compatibility
-
- `@rudderjs/ai` is runtime-agnostic via subpath exports:
-
- | Entry | Runtimes | Use for |
- |---|---|---|
- | `@rudderjs/ai` | Node, browser, Electron main+renderer, React Native | Agents, tools, streaming, providers — any `fetch`-capable JS runtime |
- | `@rudderjs/ai/node` | Node only | `documentFromPath()`, `imageFromPath()`, `transcribeFromPath()` (filesystem helpers) |
- | `@rudderjs/ai/server` | Node only | `AiProvider` (the RudderJS `ServiceProvider` — auto-discovered, you rarely import it) |
- | `@rudderjs/ai/mcp` | Node only (in practice) | `mcpClientTools()` + `mcpServerFromAgent()` — requires `@modelcontextprotocol/sdk` |
- | `@rudderjs/ai/memory-orm` | Node only | `OrmUserMemory` + `UserMemoryRecord` — ORM-backed `UserMemory` |
- | `@rudderjs/ai/memory-embedding` | Node only | `EmbeddingUserMemory` — semantic recall via the registered embedding model |
- | `@rudderjs/ai/budget-orm` | Node only | `OrmBudgetStorage` + `BudgetUsageRecord` — ORM-backed `BudgetStorage` |
- | `@rudderjs/ai/eval` | Any `fetch`-capable runtime | `evalSuite()` + `runSuite()` + metrics for testing agents against real models |
- | `@rudderjs/ai/computer-use` | Node only (in practice) | `executeComputerAction(page, action, state)` — lower-level Playwright dispatcher |
-
- The main entry has zero `node:*` static imports, so you can call agents and tools directly from a React Native screen, an Electron renderer, or a browser. `@rudderjs/core` is an optional peer — only `/server` consumers pull it in.
-
- **Security:** Calling LLM providers directly from a client leaks your API key. Use a server-side proxy in production. The main client-side use case is BYOK desktop apps (Electron) where the user supplies their own key.
-
- ## Setup
-
- ```ts
- // config/ai.ts
- export default {
-   default: 'anthropic/claude-sonnet-4-5',
-   providers: {
-     anthropic: { driver: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY! },
-     openai: { driver: 'openai', apiKey: process.env.OPENAI_API_KEY! },
-     google: { driver: 'google', apiKey: process.env.GOOGLE_API_KEY! },
-     ollama: { driver: 'ollama', baseUrl: 'http://localhost:11434' },
-     cohere: { driver: 'cohere', apiKey: process.env.COHERE_API_KEY! },
-     jina: { driver: 'jina', apiKey: process.env.JINA_API_KEY! },
-     voyage: { driver: 'voyage', apiKey: process.env.VOYAGE_API_KEY! }, // embeddings + reranking
-     elevenlabs: { driver: 'elevenlabs', apiKey: process.env.ELEVENLABS_API_KEY! }, // premium TTS + STT
-     openrouter: {
-       driver: 'openrouter',
-       apiKey: process.env.OPENROUTER_API_KEY!,
-       siteUrl: process.env.APP_URL, // optional — sent as HTTP-Referer
-       siteName: 'My App', // optional — sent as X-Title
-     },
-     bedrock: {
-       driver: 'bedrock',
-       region: process.env.AWS_REGION ?? 'us-east-1',
-       // credentials are read from the AWS chain (env, IAM, ~/.aws/credentials)
-     },
-   },
- }
-
- ```
-
- `AiProvider` is picked up by [auto-discovery](https://github.com/rudderjs/rudder/blob/main/docs/guide/service-providers.md#auto-discovery) — `pnpm rudder providers:discover` is all that's needed. The class lives at `@rudderjs/ai/server` (the main entry is runtime-agnostic); auto-discovery reads `rudderjs.providerSubpath` and loads it for you.
-
- ## Usage
-
- ### Agent Class
-
- ```ts
- import { Agent, toolDefinition, stepCountIs } from '@rudderjs/ai'
- import type { HasTools } from '@rudderjs/ai'
- import { z } from 'zod'
-
- const searchTool = toolDefinition({
-   name: 'search_users',
-   description: 'Search users by name',
-   inputSchema: z.object({ query: z.string() }),
- }).server(async ({ query }) => {
-   return db.users.findMany({ where: { name: { contains: query } } })
- })
-
- class SearchAgent extends Agent implements HasTools {
-   instructions() { return 'You help find users in the system.' }
-   model() { return 'anthropic/claude-sonnet-4-5' }
-   tools() { return [searchTool] }
-   stopWhen() { return stepCountIs(5) }
- }
-
- const response = await new SearchAgent().prompt('Find all admins')
- console.log(response.text)
- ```
-
- ### Anonymous Agent
-
- ```ts
- import { agent, AI } from '@rudderjs/ai'
-
- const response = await agent('You summarize text.').prompt('Summarize this...')
-
- // Or via facade
- const response = await AI.prompt('Hello world')
- ```
-
- ### Tools (Server + Client)
-
- A `Tool` is just `{ definition, execute? }`. The presence or absence of
- `execute` is the only discriminator: with it, the tool runs server-side;
- without it, it's a client tool that the browser executes via
- `@rudderjs/panels`'s `clientTools` registry.
-
- ```ts
- import { toolDefinition, dynamicTool } from '@rudderjs/ai'
- import { z } from 'zod'
-
- // Server tool — executes on backend
- const weatherTool = toolDefinition({
-   name: 'get_weather',
-   description: 'Get weather for a location',
-   inputSchema: z.object({ location: z.string() }),
-   needsApproval: true, // pauses the agent loop until the user approves
-   lazy: true, // not sent to LLM upfront
- }).server(async ({ location }) => ({ temp: 72, unit: 'F' }))
-
- // Client tool — no `.server()`, so the browser executes it
- const readFormState = toolDefinition({
-   name: 'read_form_state',
-   description: 'Read the user\'s current local form values',
-   inputSchema: z.object({ fields: z.array(z.string()).optional() }),
- })
-
- // Dynamic tool — schemas built at runtime from user data
- const customTool = dynamicTool({
-   name: 'custom_op',
-   description: 'Built at runtime',
-   inputSchema: z.object({ q: z.string() }),
- }).server(async (input) => JSON.stringify(input))
- ```
-
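A minimal TypeScript sketch of the `{ definition, execute? }` discriminator described above, under simplified assumptions: `ToolDefinition`, `Tool`, `isServerTool`, and `partitionTools` are hypothetical names for illustration, not the package's exports.

```typescript
// Illustrative types only; these are not the package's exported Tool types.
interface ToolDefinition {
  name: string
  description: string
}

interface Tool {
  definition: ToolDefinition
  // Present: the tool runs server-side. Absent: the client must run it.
  execute?: (input: unknown) => Promise<unknown>
}

// The presence of `execute` is the only thing checked.
function isServerTool(tool: Tool): boolean {
  return typeof tool.execute === 'function'
}

// Group tool names by where their calls would run.
function partitionTools(tools: Tool[]): { server: string[]; client: string[] } {
  const server: string[] = []
  const client: string[] = []
  for (const tool of tools) {
    if (isServerTool(tool)) server.push(tool.definition.name)
    else client.push(tool.definition.name)
  }
  return { server, client }
}

const tools: Tool[] = [
  {
    definition: { name: 'get_weather', description: 'Get weather for a location' },
    execute: async () => ({ temp: 72, unit: 'F' }),
  },
  {
    definition: { name: 'read_form_state', description: "Read the user's current local form values" },
  },
]

console.log(partitionTools(tools))
// { server: [ 'get_weather' ], client: [ 'read_form_state' ] }
```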
- ### Client tool round-trip and approval gates
-
- When the model calls a client tool (no `execute`) or a tool with
- `needsApproval: true`, the agent loop **stops** instead of failing — and
- exposes the pending state on `AgentResponse`:
-
- ```ts
- const result = await agent({ tools: [readFormState, weatherTool] })
-   .prompt('what is in the form?', {
-     toolCallStreamingMode: 'stop-on-client-tool',
-   })
-
- if (result.finishReason === 'client_tool_calls') {
-   // result.pendingClientToolCalls — execute these in the browser, then
-   // re-POST with `messages: [...history, assistantMsg, ...toolResultMsgs]`
- }
- if (result.finishReason === 'tool_approval_required') {
-   // result.pendingApprovalToolCall — show approval UI, then re-POST with
-   // `approvedToolCallIds: [id]` or `rejectedToolCallIds: [id]`
- }
- ```
-
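The stop-instead-of-fail behavior can be sketched as a pure function that inspects the tool calls from one model step. The types, the `classifyStep` helper, the `'continue'` decision, and the rule that approval gates are checked before client tools are all assumptions made for this illustration; only the two `finishReason` strings come from the text above.

```typescript
// Illustrative shapes only; the real AgentResponse carries more fields.
type StepDecision = 'continue' | 'client_tool_calls' | 'tool_approval_required'

interface ToolCall {
  id: string
  name: string
}

interface ToolInfo {
  hasExecute: boolean
  needsApproval: boolean
}

interface StepOutcome {
  decision: StepDecision
  pendingClientToolCalls: ToolCall[]
  pendingApprovalToolCall?: ToolCall
}

// Decide what a loop does with the tool calls the model just requested.
function classifyStep(calls: ToolCall[], tools: Record<string, ToolInfo>): StepOutcome {
  // Assumed ordering: approval gates are checked first, so nothing runs until the user decides.
  const gated = calls.find((c) => tools[c.name]?.needsApproval === true)
  if (gated) {
    return { decision: 'tool_approval_required', pendingClientToolCalls: [], pendingApprovalToolCall: gated }
  }
  // Calls to tools without `execute` are handed back for the client to run.
  const clientCalls = calls.filter((c) => tools[c.name]?.hasExecute !== true)
  if (clientCalls.length > 0) {
    return { decision: 'client_tool_calls', pendingClientToolCalls: clientCalls }
  }
  // Every call can run server-side, so the loop keeps going.
  return { decision: 'continue', pendingClientToolCalls: [] }
}

const registry: Record<string, ToolInfo> = {
  read_form_state: { hasExecute: false, needsApproval: false },
  get_weather: { hasExecute: true, needsApproval: true },
  search_users: { hasExecute: true, needsApproval: false },
}

console.log(classifyStep([{ id: 'tc_1', name: 'read_form_state' }], registry).decision) // client_tool_calls
```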
- The **continuation** uses `options.messages` instead of `history` + `input`:
-
- ```ts
- await agent({ tools: [...] }).prompt('', {
-   messages: [...priorConversation, assistantWithToolCalls, toolResult],
-   approvedToolCallIds: ['tc_id'], // or rejectedToolCallIds
- })
- ```
-
- When continuing after an approval round-trip, the loop transparently
- **resumes the pending tool call server-side** before re-entering the model
- loop — the resulting `tool` messages are exposed via
- `result.resumedToolMessages` so callers can persist them. This guarantees
- the conversation store never holds an unfulfilled `tool_use` block.
-
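The guarantee that the store never holds an unfulfilled `tool_use` block amounts to an invariant: every assistant tool call has a matching `tool` message. A minimal checker, assuming hypothetical message shapes (`Message`, `toolCalls`, `toolCallId`) rather than the package's real types:

```typescript
// Illustrative message shapes; the package's real message types may differ.
type Message =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string; toolCalls?: { id: string; name: string }[] }
  | { role: 'tool'; toolCallId: string; content: string }

// Ids of assistant tool calls that have no matching `tool` message yet.
function unfulfilledToolCalls(messages: Message[]): string[] {
  const answered = new Set<string>()
  for (const m of messages) {
    if (m.role === 'tool') answered.add(m.toolCallId)
  }
  const missing: string[] = []
  for (const m of messages) {
    if (m.role !== 'assistant') continue
    for (const call of m.toolCalls ?? []) {
      if (!answered.has(call.id)) missing.push(call.id)
    }
  }
  return missing
}

const conversation: Message[] = [
  { role: 'user', content: 'What is the weather in Paris?' },
  { role: 'assistant', content: '', toolCalls: [{ id: 'tc_1', name: 'get_weather' }] },
]

console.log(unfulfilledToolCalls(conversation)) // [ 'tc_1' ]

// Once the approved call is resumed server-side, its tool message is appended.
const resumed: Message = { role: 'tool', toolCallId: 'tc_1', content: '{"temp":72,"unit":"F"}' }
console.log(unfulfilledToolCalls([...conversation, resumed])) // []
```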
- `@rudderjs/panels` does all the wiring (validating message prefixes against
- the persisted store, executing client tools via the `clientTools` registry,
- showing the inline approval card) — see its README for the end-to-end flow.
-
- ### Tailoring what the model sees with `.modelOutput()`
-
- A server tool returns its full structured result to the **UI** (via telemetry, stream chunks, observers). By default the model sees that same JSON on its next step — but big JSON eats context for no reason when the model only needs a summary. Use `.modelOutput(fn)` to map result → model-facing string while leaving the UI's view untouched:
-
- ```ts
- const searchTool = toolDefinition({
-   name: 'search_docs',
-   description: 'Full-text search across the docs',
-   inputSchema: z.object({ query: z.string() }),
- })
-   .server(async ({ query }) => ({
-     results: await docs.search(query), // [{ title, url, snippet }, ...]
-     total: await docs.count(query),
-   }))
-   .modelOutput((r) => `Found ${r.total} results. Top: ${r.results.slice(0, 3).map(x => x.title).join(', ')}`)
- ```
-
- The UI still receives `{ results, total }` in the tool-result chunk — useful for rendering a rich results card — but the model only sees the summary string on its next step. Smaller context, same UX.
-
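A sketch of the split between the UI-facing result and the model-facing string, assuming a hypothetical `ServerTool` shape and `runTool` helper (not the package's API). It mirrors the JSON-by-default behavior described above; the stubbed `execute` stands in for a real search backend.

```typescript
// Illustrative tool shape with an optional model-facing projection.
interface ServerTool<I, O> {
  name: string
  execute: (input: I) => Promise<O>
  modelOutput?: (result: O) => string
}

interface ToolRun<O> {
  uiResult: O // full structured result for stream chunks, observers, and the UI
  modelContent: string // what the model reads on its next step
}

async function runTool<I, O>(tool: ServerTool<I, O>, input: I): Promise<ToolRun<O>> {
  const result = await tool.execute(input)
  // Default: the model sees the same result, serialized as JSON.
  const modelContent = tool.modelOutput ? tool.modelOutput(result) : JSON.stringify(result)
  return { uiResult: result, modelContent }
}

type SearchResult = { results: { title: string }[]; total: number }

const searchDocs: ServerTool<{ query: string }, SearchResult> = {
  name: 'search_docs',
  // Stubbed backend: always returns the same four hits.
  execute: async () => ({
    results: [{ title: 'Install' }, { title: 'Agents' }, { title: 'Tools' }, { title: 'Memory' }],
    total: 4,
  }),
  modelOutput: (r) => `Found ${r.total} results. Top: ${r.results.slice(0, 3).map((x) => x.title).join(', ')}`,
}

runTool(searchDocs, { query: 'agents' }).then((run) => {
  console.log(run.modelContent) // Found 4 results. Top: Install, Agents, Tools
  console.log(run.uiResult.results.length) // 4
})
```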
- ### Subagents — `agent.asTool()`
-
- Wrap one agent as a tool another agent can call. The parent delegates work; the subagent runs its own loop end-to-end (its own model, tools, middleware) and returns a single result.
-
- ```ts
- class Researcher extends Agent implements HasTools {
-   instructions() { return 'You research topics and return concise summaries.' }
-   model() { return 'anthropic/claude-sonnet-4-6' }
-   tools() { return [searchTool, readUrlTool] }
- }
-
- class Planner extends Agent implements HasTools {
-   instructions() { return 'You break work into steps. Use `research` for facts.' }
-   model() { return 'anthropic/claude-opus-4-7' }
-   tools() {
-     return [
-       new Researcher().asTool({
-         name: 'research',
-         description: 'Research a topic in depth and return a summary.',
-       }),
-     ]
-   }
- }
-
- await new Planner().prompt('Plan a launch for our new ORM feature.')
- ```
-
- Defaults are tuned for the zero-config case:
-
- - `inputSchema` defaults to `{ prompt: string }` and the subagent runs with `input.prompt`.
- - The parent model only sees `response.text` on its next step (override with `modelOutput`); the UI still receives the full `AgentResponse` via the `tool-result` chunk.
-
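The two defaults above can be illustrated with a hypothetical `wrapAgentAsTool` helper over a minimal `PromptableAgent` interface. This is a sketch of the idea, not the package's `asTool()` implementation: the input is `{ prompt: string }`, the full response is kept for the UI, and the parent model sees only `response.text` unless a `modelOutput` override is given.

```typescript
// Illustrative minimal shapes; not the package's Agent or Tool classes.
interface AgentResponse {
  text: string
  steps: unknown[]
}

interface PromptableAgent {
  prompt(input: string): Promise<AgentResponse>
}

interface WrappedTool<I> {
  name: string
  description: string
  execute: (input: I) => Promise<AgentResponse>
  modelOutput: (response: AgentResponse) => string
}

function wrapAgentAsTool(
  agent: PromptableAgent,
  opts: { name: string; description: string; modelOutput?: (r: AgentResponse) => string },
): WrappedTool<{ prompt: string }> {
  return {
    name: opts.name,
    description: opts.description,
    // The subagent runs its own loop end-to-end and returns one full response.
    execute: (input) => agent.prompt(input.prompt),
    // Default projection for the parent model: just the final text.
    modelOutput: opts.modelOutput ?? ((r) => r.text),
  }
}

// A stand-in subagent for demonstration.
const researcher: PromptableAgent = {
  prompt: async (input) => ({ text: `Summary of: ${input}`, steps: [{}, {}] }),
}

const research = wrapAgentAsTool(researcher, {
  name: 'research',
  description: 'Research a topic in depth and return a summary.',
})

research.execute({ prompt: 'ORM launch' }).then((response) => {
  console.log(research.modelOutput(response)) // Summary of: ORM launch
})
```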
- For a typed input schema, pass an explicit `inputSchema` and a `prompt` mapper:
-
- ```ts
- new Researcher().asTool({
-   name: 'research',
-   description: 'Research a topic in depth.',
-   inputSchema: z.object({ topic: z.string(), depth: z.enum(['quick', 'deep']) }),
-   prompt: ({ topic, depth }) => `Research ${topic} at ${depth} depth.`,
-   modelOutput: (r) => `${r.steps.length} step(s); ${r.text.slice(0, 280)}…`,
- })
- ```
-
- The wrapped subagent runs via `prompt()` (non-streaming) by default — to surface inner-agent progress as `tool-update` chunks in the parent stream, pass `streaming: true` (or a custom `(chunk) => SubAgentUpdate | null` projector). Pass `suspendable: { runStore }` to opt into the propagation protocol when the sub-agent pauses on a **client tool call** (`finishReason: 'client_tool_calls'`) or an **approval gate** (`finishReason: 'tool_approval_required'`) — the parent loop halts, the snapshot persists in the run store with a `pauseKind: 'client_tool' | 'approval'` discriminator, and the host resumes via `Agent.resumeAsTool(subRunId, results, { runStore, agent, approvedToolCallIds? })`. See `docs/guide/ai.md` for the full flow. `InMemorySubAgentRunStore` works for tests; `CachedSubAgentRunStore` plugs into `@rudderjs/cache` for cross-process persistence. Suspend without streaming throws at builder time.
-
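The suspend-and-resume bookkeeping relies on a snapshot carrying a `pauseKind` discriminator. The following in-memory sketch assumes hypothetical names (`SubRunSnapshot`, `MemorySubRunStore`, `describeResume`) and does not reproduce the API of `InMemorySubAgentRunStore` or `Agent.resumeAsTool()`; the consume-once behavior of `take()` is also an assumption.

```typescript
// Illustrative shapes only; not the package's sub-agent run store interface.
type PauseKind = 'client_tool' | 'approval'

interface SubRunSnapshot {
  subRunId: string
  pauseKind: PauseKind
  pendingToolCallIds: string[]
}

// In-memory store keyed by sub-run id.
class MemorySubRunStore {
  private readonly runs = new Map<string, SubRunSnapshot>()

  save(snapshot: SubRunSnapshot): void {
    this.runs.set(snapshot.subRunId, snapshot)
  }

  // Assumption: a snapshot is consumed on resume, so it can be resumed only once.
  take(subRunId: string): SubRunSnapshot {
    const snapshot = this.runs.get(subRunId)
    if (!snapshot) throw new Error(`No paused sub-run with id ${subRunId}`)
    this.runs.delete(subRunId)
    return snapshot
  }
}

// The pause kind decides what the host must supply to resume.
function describeResume(snapshot: SubRunSnapshot, approvedToolCallIds: string[] = []): string {
  if (snapshot.pauseKind === 'client_tool') {
    return `needs ${snapshot.pendingToolCallIds.length} client tool result(s)`
  }
  const approved = snapshot.pendingToolCallIds.filter((id) => approvedToolCallIds.includes(id))
  return `approved ${approved.length} of ${snapshot.pendingToolCallIds.length} call(s)`
}

const store = new MemorySubRunStore()
store.save({ subRunId: 'sr_1', pauseKind: 'approval', pendingToolCallIds: ['tc_1', 'tc_2'] })
console.log(describeResume(store.take('sr_1'), ['tc_2'])) // approved 1 of 2 call(s)
```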
257
- ### Handoffs — `handoff()`
258
-
259
- Sometimes a parent agent shouldn't *call* a specialist and incorporate its result — it should *step out* and let the specialist own the rest of the conversation. That's a handoff.
260
-
261
- ```ts
262
- import { Agent, handoff } from '@rudderjs/ai'
263
-
264
- class SalesAgent extends Agent {
265
- instructions() { return 'You handle pricing, plans, and upgrades.' }
266
- }
267
- class SupportAgent extends Agent {
268
- instructions() { return 'You triage bugs and walk users through fixes.' }
269
- }
270
-
271
- class TriageAgent extends Agent {
272
- instructions() { return 'Greet the user, then route them to the right specialist.' }
273
- tools() {
274
- return [
275
- handoff(SalesAgent, { when: 'pricing or sales questions' }),
276
- handoff(SupportAgent, { when: 'bug reports or technical issues' }),
277
- ]
278
- }
279
- }
280
-
281
- const r = await new TriageAgent().prompt('What does the Pro plan cost?')
282
- console.log(r.text) // "The Pro plan is $49/month..." (from SalesAgent)
283
- console.log(r.handoffPath) // ['TriageAgent', 'SalesAgent']
284
- ```
285
-
286
- How it differs from `asTool`:
287
-
288
- | | `asTool` (call-and-return) | `handoff` (control transfer) |
289
- |---|---|---|
290
- | Parent loop | continues after subagent finishes | ends |
291
- | Conversation owner | parent | child |
292
- | Final `text` | parent's | last child in the chain |
293
- | `r.steps` | parent steps + a single tool-result step for the subagent | parent steps + each agent's steps merged in order |
294
- | Use case | "look something up and use it" | "transfer to the right specialist" |
295
-
296
- By default, the model writes a transition message (`{ message: string }`) that becomes the child's first user message. The full prior conversation flows through to the child, but the child uses its own `instructions()` as the system message. Multi-hop is supported (Triage → Sales → Billing); cycles are bounded by `MAX_HANDOFFS = 5`, and exceeding that limit surfaces a clear error.
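The handoff mechanics described above can be pictured as a bounded loop. This is a minimal, self-contained sketch and not the `@rudderjs/ai` implementation: each hypothetical `AgentFn` either answers or names the next agent, only the transition message is passed along (the real loop forwards the full conversation), and counting `MAX_HANDOFFS` as the number of hops is an assumption.

```typescript
// Hypothetical sketch: each agent either answers or hands off to another agent by name.
type Step =
  | { kind: 'answer'; text: string }
  | { kind: 'handoff'; to: string; message: string }

type AgentFn = (input: string) => Step

const MAX_HANDOFFS = 5

function runWithHandoffs(
  agents: Record<string, AgentFn>,
  start: string,
  input: string,
): { text: string; handoffPath: string[] } {
  const handoffPath = [start]
  let current = start
  let message = input
  let hops = 0
  for (;;) {
    const agent = agents[current]
    if (!agent) throw new Error(`Unknown agent: ${current}`)
    const step = agent(message)
    // The final text comes from whichever agent answers last.
    if (step.kind === 'answer') return { text: step.text, handoffPath }
    hops++
    // Bounds cycles such as A -> B -> A -> ...
    if (hops > MAX_HANDOFFS) {
      throw new Error(`Exceeded MAX_HANDOFFS (${MAX_HANDOFFS}): ${handoffPath.join(' -> ')} -> ${step.to}`)
    }
    handoffPath.push(step.to)
    current = step.to
    // The transition message becomes the next agent's input.
    message = step.message
  }
}
```

With a `TriageAgent` that hands off to a `SalesAgent` that answers, `handoffPath` comes back as `['TriageAgent', 'SalesAgent']`, matching the `r.handoffPath` shape in the example above.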
297
-
298
- ```ts
299
- // Custom name + payload
300
- handoff(SalesAgent, {
301
- name: 'pivotToSales',
302
- description: 'Transfer the user to a sales specialist.',
303
- inputSchema: z.object({ urgency: z.enum(['low', 'high']), context: z.string() }),
304
- })
305
- ```
306
-
307
- In `agent.stream()`, a `'handoff'` `StreamChunk` is emitted right before control transfers, with `{ from, to, message? }` for UIs to render a transition indicator before the next agent's chunks arrive.
308
-
309
- ### Tool execution context
310
-
311
- Server-tool executes can optionally accept a second `ctx: ToolCallContext`
312
- argument carrying loop-level metadata — currently `{ toolCallId }`. The
313
- parameter is optional, so existing one-arg tools keep working unchanged.
314
-
315
- ```ts
316
- import { toolDefinition, type ToolCallContext } from '@rudderjs/ai'
317
-
318
- const myTool = toolDefinition({
319
- name: 'my_tool',
320
- description: '...',
321
- inputSchema: z.object({ q: z.string() }),
322
- }).server(async (input, ctx?: ToolCallContext) => {
323
- console.log('this call id:', ctx?.toolCallId)
324
- return { ok: true }
325
- })
326
- ```
327
-
328
- The primary consumer is `@pilotiq-pro/ai`'s `runAgentTool`, which uses
329
- `ctx.toolCallId` to correlate sub-agent suspensions with the parent's
330
- `run_agent` call (see "Pausing the loop from a server tool" below).
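Because `ctx` is an optional second parameter, one-argument executes still type-check and run when the loop passes a context object. A simplified, synchronous sketch of that compatibility argument follows; `ServerExecute` and `invoke` are hypothetical names, and only `toolCallId` is modelled on the `ToolCallContext` shape described above.

```typescript
// Simplified, synchronous stand-in for a tool's server execute.
interface ToolCallContext {
  toolCallId: string
}

type ServerExecute<I, O> = (input: I, ctx?: ToolCallContext) => O

// Written before ctx existed: ignores the second argument entirely.
const legacyTool: ServerExecute<{ q: string }, string> = (input) => `searched ${input.q}`

// Newer tool: reads the call id when the loop provides one.
const ctxAwareTool: ServerExecute<{ q: string }, string> = (input, ctx) =>
  `${ctx?.toolCallId ?? 'no-id'}:${input.q}`

// The loop can always pass ctx; the extra argument is harmless to one-arg functions.
function invoke<I, O>(execute: ServerExecute<I, O>, input: I, toolCallId: string): O {
  return execute(input, { toolCallId })
}
```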
331
-
332
- ### Pausing the loop from a server tool
333
-
334
- A server tool's async-generator execute can `yield` a `pauseForClientTools`
335
- control chunk to halt the enclosing agent loop and surface a set of
336
- **client** tool calls to the caller — as if the model itself had emitted
337
- them. The yielding tool's own call stays orphaned in the message history
338
- until the caller resolves it on continuation.
339
-
340
- ```ts
341
- import { toolDefinition, pauseForClientTools } from '@rudderjs/ai'
342
-
343
- const runNestedTool = toolDefinition({
344
- name: 'run_nested',
345
- description: 'Runs a nested workflow that may need browser interaction',
346
- inputSchema: z.object({ task: z.string() }),
347
- }).server(async function* (input, ctx) {
348
- // ...do some server-side work, maybe yield progress chunks...
349
-
350
- if (needsBrowserAction) {
351
- // Persist whatever state you need to resume later, keyed by an
352
- // opaque `resumeHandle` your continuation logic understands.
353
- const handle = await persistMyResumeState({
354
- parentToolCallId: ctx?.toolCallId,
355
- task: input.task,
356
- // ...
357
- })
358
-
359
- // Yielding the control chunk halts iteration. The agent loop
360
- // appends the toolCalls to its own pendingClientToolCalls,
361
- // sets stop-for-client-tools, and emits 'pending-client-tools'
362
- // upward. The browser executes the calls and POSTs back, your
363
- // continuation handler picks up `handle` and resumes.
364
- yield pauseForClientTools(
365
- [{ id: 'call_xyz', name: 'update_form_state', arguments: { /* field updates */ } }],
366
- handle,
367
- )
368
- // Unreachable — the loop halts iteration after the pause chunk.
369
- return null as never
370
- }
371
-
372
- return { result: 'done' }
373
- })
374
- ```
375
-
376
- **Why a yield instead of a throw:**
377
-
378
- - Symmetry with the existing `tool-update` yield protocol (no parallel
379
- catch-based control path)
380
- - Middleware can observe pauses through `runOnChunk`; throws would route
381
- through `onError` and muddle telemetry
382
- - Exceptions signal "something went wrong"; this is not an error
383
- - Any server tool can yield this — not just nested agent runners. E.g., a
384
- tool that wants the browser's geolocation, clipboard, or a user file
385
- upload.
386
-
387
- **Recognizing the chunk:** the loop uses `isPauseForClientToolsChunk(value)`
388
- internally. Tool authors should construct chunks via the
389
- `pauseForClientTools()` factory rather than by hand so future shape
390
- changes stay source-compatible.
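To illustrate the factory-plus-guard pattern and how a loop might stop at the pause chunk, here is a hypothetical sketch. The chunk shape, the `'pause-for-client-tools'` tag, `pauseChunk`, `isPauseChunk`, and `drainTool` are all invented for illustration (construct real chunks with `pauseForClientTools()`), and the tool is simplified to a synchronous generator.

```typescript
// Hypothetical chunk shapes; the real layout is internal to @rudderjs/ai.
interface ClientToolCall {
  id: string
  name: string
  arguments: Record<string, unknown>
}

const PAUSE_TYPE = 'pause-for-client-tools' as const

interface PauseChunk {
  type: typeof PAUSE_TYPE
  toolCalls: ClientToolCall[]
  resumeHandle: string
}

interface UpdateChunk {
  type: 'tool-update'
  data: unknown
}

// Factory: callers never hand-build the object, so its shape can change later.
function pauseChunk(toolCalls: ClientToolCall[], resumeHandle: string): PauseChunk {
  return { type: PAUSE_TYPE, toolCalls, resumeHandle }
}

// Guard: the loop recognizes the chunk by its tag instead of catching an error.
function isPauseChunk(value: unknown): value is PauseChunk {
  return typeof value === 'object' && value !== null && (value as { type?: unknown }).type === PAUSE_TYPE
}

interface DrainResult {
  updates: UpdateChunk[]
  paused?: PauseChunk
  result?: unknown
}

// Collect progress updates; stop at the first pause chunk without resuming the tool.
function drainTool(gen: Generator<UpdateChunk | PauseChunk, unknown>): DrainResult {
  const updates: UpdateChunk[] = []
  for (;;) {
    const next = gen.next()
    if (next.done) return { updates, result: next.value }
    const chunk = next.value
    if (isPauseChunk(chunk)) {
      gen.return(undefined) // close the generator; code after the yield never runs
      return { updates, paused: chunk }
    }
    updates.push(chunk)
  }
}
```

Closing the generator at the pause is what makes the code after the `yield` unreachable, mirroring the `return null as never` comment in the tool example.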
391
-
392
- **Approval pauses:** the sibling `pauseForApproval(toolCall, isClientTool, resumeHandle?)`
393
- chunk halts the parent loop when a sub-agent's inner approval gate fires
394
- (inner `finishReason === 'tool_approval_required'`). The parent's loop
395
- sets `loopFinishReason = 'tool_approval_required'` and surfaces the
396
- gated call on `pendingApprovalToolCall`. The wrapping `asTool({ suspendable })`
397
- generator persists a snapshot with `pauseKind: 'approval'` and yields
398
- this chunk automatically — hand-rolled tools that wrap their own
399
- approval-gated sub-agents can yield it directly. Resume with
400
- `Agent.resumeAsTool(subRunId, [], { runStore, agent, approvedToolCallIds: [...] })`
401
- (or `rejectedToolCallIds`).
7
+ ## Migrate
402
8
 
403
- **Resuming:** that's caller territory; `@rudderjs/ai` knows nothing about
404
- the resume protocol. The canonical implementation is in
405
- `@rudderjs/panels`'s `subAgentResume.ts`, which uses a runStore to persist
406
- sub-agent state and re-invokes the tool's enclosing agent on the
407
- continuation request.
9
+ Replace the import specifier; the API is identical.
408
10
 
409
- ### Structured Output
410
-
411
- ```ts
412
- import { agent, Output } from '@rudderjs/ai'
413
- import { z } from 'zod'
414
-
415
- const output = Output.object({
416
- schema: z.object({
417
- people: z.array(z.string()),
418
- companies: z.array(z.string()),
419
- }),
420
- })
421
-
422
- // Use with agent (append output instructions to system prompt)
423
- ```
424
-
425
- ### Prompt caching
426
-
427
- Mark stable parts of the prompt as cacheable. Provider adapters translate the markers to native primitives — Anthropic adds `cache_control: { type: 'ephemeral' }` to the last content block of each marked region. Cache hits typically save 50–90% on input tokens for long system prompts and tool definitions.
428
-
429
- ```ts
430
- class SupportAgent extends Agent {
431
- instructions() { return LONG_SYSTEM_PROMPT } // 50k tokens of policy
432
- tools() { return [...bigToolList] } // 30k tokens of tool defs
433
-
434
- cacheable() {
435
- return { instructions: true, tools: true }
436
- // ^ both eligible — Anthropic caches up to the last marked block
437
- }
438
- }
439
-
440
- await new SupportAgent().prompt('How do I reset my password?')
441
- // ↑ first call: cache miss; subsequent calls within 5 minutes: cache hit
442
- ```
443
-
444
- Cache the first N messages of a multi-turn conversation:
445
-
446
- ```ts
447
- class ChatAgent extends Agent {
448
- cacheable() { return { messages: 4 } } // cache up to message[3]
449
- }
11
+ ```diff
12
+ - import { Agent } from '@rudderjs/ai'
13
+ + import { Agent } from '@gemstack/ai-sdk'
450
14
  ```
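On Anthropic, the `messages: 4` setting in the `ChatAgent` example amounts to placing a cache breakpoint on the last content block of `message[3]`. The sketch below shows that translation over simplified message types modelled on Anthropic's content blocks; `markMessagePrefix` is a hypothetical helper, not the adapter's code.

```typescript
// Simplified types modelled on Anthropic Messages API text blocks.
interface TextBlock {
  type: 'text'
  text: string
  cache_control?: { type: 'ephemeral' }
}

interface Message {
  role: 'user' | 'assistant'
  content: TextBlock[]
}

// Mark the first `n` messages as a cacheable prefix: tag the last content block
// of message[n - 1]. Returns new objects; the input array is not mutated.
function markMessagePrefix(messages: Message[], n: number): Message[] {
  if (n <= 0 || messages.length === 0) return messages
  const target = Math.min(n, messages.length) - 1
  return messages.map((message, i) => {
    if (i !== target || message.content.length === 0) return message
    const last = message.content.length - 1
    return {
      ...message,
      content: message.content.map((block, j) =>
        j === last ? { ...block, cache_control: { type: 'ephemeral' as const } } : block,
      ),
    }
  })
}
```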
451
15
 
452
- Per-call override:
16
+ Subpaths map one to one:
453
17
 
454
- ```ts
455
- await agent.prompt('one-off', { cache: false }) // disable for this call
456
- await agent.prompt('different', { cache: { tools: true } }) // replace agent default
457
- ```
458
-
459
- Google's `cachedContent` is the only provider with a stateful cache resource — its TTL is configurable via the `ttl` field (default `'1h'`):
460
-
461
- ```ts
462
- class SupportAgent extends Agent {
463
- cacheable() {
464
- return { instructions: true, tools: true, ttl: '6h' }
465
- // ^ Google-only; Anthropic/OpenAI ignore it
466
- }
467
- }
468
- ```
469
-
470
- When `@rudderjs/cache` is installed and registered, the Google cache registry uses it for cross-process / cross-restart persistence so multi-worker deployments don't create duplicate cache resources. Without it, the registry falls back to in-memory storage and warns once on first use.
471
-
472
- **Provider support:**
473
-
474
- | Provider | Status |
18
+ | Old | New |
475
19
  |---|---|
476
- | Anthropic | ✓ — `cache_control` on system, tools, and Nth message |
477
- | OpenAI | ✓ — `prompt_cache_key` for routing affinity (caching is automatic above 1024 tokens) |
478
- | Google | ✓ — `cachedContent` resource translation, with TTL refresh and 404 recovery |
479
-
480
- Other adapters ignore the markers — the request runs uncached.
481
-
482
- ### Failover
483
-
484
- Try multiple providers in order — if the primary fails, fall through to the next:
485
-
486
- ```ts
487
- class ResilientAgent extends Agent {
488
- instructions() { return 'You are helpful.' }
489
- model() { return 'anthropic/claude-sonnet-4-5' }
490
- failover() { return ['openai/gpt-4o', 'google/gemini-2.5-pro'] }
491
- }
492
-
493
- // If Anthropic is down, tries OpenAI, then Google
494
- const response = await new ResilientAgent().prompt('Hello')
495
- ```
496
-
497
- Works with both `prompt()` and `stream()`.
498
-
499
- The same pattern is available on the media generators (Image, Audio, Transcription) — pass extra provider/model strings to `.failover(...)`:
500
-
501
- ```ts
502
- await ImageGenerator.of('A donut')
503
- .model('openai/dall-e-3')
504
- .failover('google/imagen-3', 'azure/dall-e-3')
505
- .generate()
506
-
507
- await AudioGenerator.of('Hello').model('openai/tts-1-hd').failover('elevenlabs/eleven_multilingual_v2').generate()
508
- await Transcription.fromBytes(bytes).model('openai/whisper-1').failover('google/gemini-2.0-flash-exp').generate()
509
- ```
510
-
511
- Tried in order. If the primary fails (provider error, capability missing, etc.), the next candidate runs. Only the last error surfaces if every candidate fails.
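The failover order described above reduces to a small loop. A hypothetical, synchronous sketch (the real agents and generators are async, and `withFailover` is not part of `@rudderjs/ai`):

```typescript
// Try each candidate model in order; return the first success.
// If every candidate fails, rethrow only the most recent error.
function withFailover<T>(candidates: string[], run: (model: string) => T): T {
  if (candidates.length === 0) throw new Error('No candidates provided')
  let lastError: unknown
  for (const model of candidates) {
    try {
      return run(model)
    } catch (err) {
      lastError = err // earlier failures are discarded
    }
  }
  throw lastError
}
```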
512
-
513
- ### Image Generation
514
-
515
- ```ts
516
- import { AI } from '@rudderjs/ai'
517
-
518
- const result = await AI.image('A mountain at sunset')
519
- .model('openai/dall-e-3')
520
- .size('landscape')
521
- .quality('hd')
522
- .generate()
523
-
524
- // result.images[0].base64 or result.images[0].url
525
- await AI.image('Logo design').model('openai/dall-e-3').store('images/logo.png')
526
- ```
527
-
528
- ### Text-to-Speech
529
-
530
- ```ts
531
- import { AI } from '@rudderjs/ai'
532
-
533
- const result = await AI.audio('Hello world')
534
- .model('openai/tts-1')
535
- .voice('nova')
536
- .format('mp3')
537
- .generate()
538
-
539
- // result.audio → Buffer
540
- await AI.audio('Welcome').model('openai/tts-1').store('audio/welcome.mp3')
541
- ```
542
-
543
- ### Speech-to-Text
544
-
545
- ```ts
546
- import { AI } from '@rudderjs/ai'
547
-
548
- const bytes = new Uint8Array(/* recorded audio */)
549
-
550
- const result = await AI.transcribe(bytes)
551
- .model('openai/whisper-1')
552
- .language('en')
553
- .generate()
554
-
555
- // result.text → transcribed text
556
- ```
557
-
558
- In Node, load the file with the `/node` helper:
559
-
560
- ```ts
561
- import { transcribeFromPath } from '@rudderjs/ai/node'
562
-
563
- const result = await (await transcribeFromPath('./meeting.mp3'))
564
- .model('openai/whisper-1')
565
- .language('en')
566
- .generate()
567
- ```
568
-
569
- ### Provider Tools (WebSearch, WebFetch)
570
-
571
- Built-in tools that leverage provider capabilities:
572
-
573
- ```ts
574
- import { AI, WebSearch, WebFetch } from '@rudderjs/ai'
575
-
576
- const agent = AI.agent({
577
- instructions: 'Research assistant',
578
- tools: [
579
- WebSearch.make().domains(['docs.rudderjs.dev']).toTool(),
580
- WebFetch.make().maxLength(5000).toTool(),
581
- ],
582
- })
583
- ```
584
-
585
- ### Computer Use (Anthropic)
586
-
587
- `computerUseTool({ page })` exposes a Playwright `Page` to an Anthropic Claude model via the native `computer_20250124` tool block — the model takes screenshots, moves the cursor, clicks, types, scrolls, presses keys. Anthropic-only (`anthropic/*` and `bedrock/anthropic.*`); OpenRouter-routed Anthropic models throw `ComputerUseProviderError`.
588
-
589
- ```ts
590
- import { chromium } from 'playwright'
591
- import { Agent, computerUseTool } from '@rudderjs/ai'
592
-
593
- const browser = await chromium.launch()
594
- const page = await browser.newPage({ viewport: { width: 1280, height: 800 } })
595
-
596
- class BrowserAgent extends Agent {
597
- model() { return 'anthropic/claude-sonnet-4-5' }
598
- tools() {
599
- return [
600
- computerUseTool({
601
- page,
602
- viewport: { width: 1280, height: 800 },
603
- needsApproval: true, // default — pauses the loop before every action
604
- maxActions: 50, // per-instance safety cap; throws ComputerUseLimitError when exceeded
605
- }),
606
- ]
607
- }
608
- }
609
-
610
- await new BrowserAgent().prompt('Find the cheapest flight from SFO to JFK next Tuesday.')
611
- ```
612
-
613
- Playwright is **not** a dep of `@rudderjs/ai` — install it in your app. The tool accepts a structural `PageLike` subset so types check without the 300MB Playwright bundle. Action failures surface as `is_error: true` tool-results so the model can retry. `needsApproval: true` (the default) routes every action through the standard approval gate — review what the model wants to click before it clicks it.
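The `maxActions` cap in the example can be thought of as a per-instance counter checked before each action. A hypothetical sketch: `ActionBudget` and `ActionLimitError` stand in for whatever the tool uses internally (the documented error is `ComputerUseLimitError`), and checking before rather than after each action is an assumption.

```typescript
class ActionLimitError extends Error {
  readonly limit: number
  constructor(limit: number) {
    super(`Action limit of ${limit} exceeded`)
    this.name = 'ActionLimitError'
    this.limit = limit
  }
}

// One budget per tool instance, shared by every action that instance performs.
class ActionBudget {
  private used = 0
  private readonly maxActions: number

  constructor(maxActions: number) {
    this.maxActions = maxActions
  }

  // Call before performing each action; throws once the cap is reached.
  consume(): number {
    if (this.used >= this.maxActions) throw new ActionLimitError(this.maxActions)
    this.used += 1
    return this.used
  }

  get remaining(): number {
    return this.maxActions - this.used
  }
}
```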
614
-
615
- ### Hosted vector stores + `fileSearch`
616
-
617
- `VectorStores` is a CRUD façade over the provider's hosted vector store; `fileSearch({ stores })` is the agent tool that queries them. The provider runs ingestion, chunking, embedding, and retrieval server-side — no embedding pipeline, no pgvector setup, no `execute` to write. Supported on **OpenAI** (`vectorStores.*`) and **Gemini** (`fileSearchStores`). Same façade, same agent surface.
618
-
619
- ```ts
620
- import { Agent, VectorStores, fileSearch } from '@rudderjs/ai'
621
-
622
- // 1. Manage the store
623
- const kb = await VectorStores.create('Knowledge Base') // OpenAI default
624
- await kb.add({ filePath: './report.pdf', attributes: { author: 'Alice', year: 2026 } })
625
-
626
- // 2. Use it as an agent tool
627
- class SupportAgent extends Agent {
628
- model() { return 'openai/gpt-4o' } // or 'google/gemini-2.5-flash'
629
- tools() {
630
- return [
631
- fileSearch({
632
- stores: [kb.id],
633
- where: { author: 'Alice', year: 2026 }, // server-side metadata filter
634
- maxResults: 10,
635
- }),
636
- ]
637
- }
638
- }
639
- ```
640
-
641
- **Provider override:** pass `{ provider: 'google' }` to `VectorStores.create(...)` for Gemini. Store ids are full resource paths (`fileSearchStores/foo-bar`) on Gemini, opaque (`vs_abc123`) on OpenAI — apps pass them back verbatim through the same `VectorStores` API.
642
-
643
- **Self-hosted RAG fallback.** `fileSearch({ ..., fallback: { model, column, embedWith } })` lifts a `similaritySearch` `execute` onto the tool. Providers that recognize the file-search hint (OpenAI, Gemini) still emit their native block; other providers serialize the tool as a function-call and run the fallback against a local pgvector model. Same agent prompt across hosted and self-hosted RAG.
644
-
645
- Full surface (provider-differences table, `where`/filter shapes, testing with `AiFake`): the framework's [Vector Stores guide](https://github.com/rudderjs/rudder/blob/main/docs/guide/vector-stores.md).
646
-
647
- ### Reranking
648
-
649
- Reorder documents by relevance to a query — useful for RAG pipelines:
650
-
651
- ```ts
652
- import { AI } from '@rudderjs/ai'
653
-
654
- // One-shot
655
- const result = await AI.rerank('search query', documents, {
656
- model: 'cohere/rerank-v3.5',
657
- topK: 5,
658
- })
659
- // result.results → [{ index, relevanceScore, document }, ...]
660
-
661
- // Fluent builder
662
- const ranked = await AI.rerank('how to deploy', docs)
663
- .model('jina/jina-reranker-v2-base-multilingual')
664
- .topK(10)
665
- .rank()
666
- ```
667
-
668
- Supported providers: **Cohere** (`cohere-ai` SDK) and **Jina** (direct HTTP, no SDK).
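Whatever scoring model a reranker uses, the final `topK` step is a sort and slice over `{ index, relevanceScore, document }` triples. A hypothetical sketch that takes precomputed scores as input (real scores come from Cohere or Jina; `topKByScore` is not a library function):

```typescript
interface RerankResult<D> {
  index: number // position in the original documents array
  relevanceScore: number
  document: D
}

// Pair each document with its score, sort descending, keep the best topK.
function topKByScore<D>(documents: D[], scores: number[], topK: number): RerankResult<D>[] {
  if (scores.length !== documents.length) throw new Error('scores and documents must align')
  return documents
    .map((document, index) => ({ index, relevanceScore: scores[index]!, document }))
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, Math.max(0, topK))
}
```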
669
-
670
- ### File Management
671
-
672
- Upload, list, and delete files on provider platforms — needed for large document context and assistant APIs:
673
-
674
- ```ts
675
- import { AI } from '@rudderjs/ai'
676
-
677
- const files = AI.files('openai')
678
-
679
- // Upload
680
- const uploaded = await files.upload('./report.pdf', { purpose: 'assistants' })
681
- // uploaded → { id, filename, bytes, purpose }
682
-
683
- // List
684
- const { files: allFiles } = await files.list()
685
-
686
- // Delete
687
- await files.delete(uploaded.id)
688
-
689
- // Retrieve content (OpenAI, Anthropic)
690
- const content = await files.retrieve(uploaded.id)
691
- // content → { data: Buffer, mimeType }
692
- ```
693
-
694
- Supported providers: **OpenAI** (full CRUD + retrieve), **Anthropic** (full CRUD + retrieve), **Google** (upload, list, delete — no retrieve).
695
-
696
- ### Embeddings
697
-
698
- ```ts
699
- import { AI } from '@rudderjs/ai'
700
-
701
- // Single text
702
- const single = await AI.embed('Hello world')
703
-
704
- // Batch (auto-chunks arrays > 100 items)
705
- const batch = await AI.embed(['text one', 'text two'])
706
-
707
- // With caching
708
- const cached = await AI.embed('text', { cache: true })
709
-
710
- // Specific model
711
- const specific = await AI.embed('text', { model: 'openai/text-embedding-3-small' })
712
- ```
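The batch form auto-chunks arrays larger than 100 items. A minimal sketch of that chunking, with a hypothetical synchronous `embedBatch` callback standing in for the provider call; `chunk` and `embedAll` are illustrative names, not library exports.

```typescript
// Split items into consecutive batches of at most `size`.
function chunk<T>(items: T[], size = 100): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError('size must be a positive integer')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size))
  return batches
}

// Embed every batch and flatten the vectors back into input order.
function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => number[][],
  size = 100,
): number[][] {
  return chunk(texts, size).flatMap((batch) => embedBatch(batch))
}
```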
713
-
714
- ### Vercel AI Protocol
715
-
716
- Stream to frontend frameworks (Next.js, Nuxt, SvelteKit):
717
-
718
- ```ts
719
- import { toVercelResponse } from '@rudderjs/ai'
720
-
721
- // In a route handler
722
- const { stream } = agent('You are helpful.').stream(input)
723
- return toVercelResponse(stream)
724
- ```
725
-
726
- ### Streaming
727
-
728
- ```ts
729
- const { stream, response } = agent('You are helpful.').stream('Tell me a story')
730
-
731
- for await (const chunk of stream) {
732
- if (chunk.type === 'text-delta') process.stdout.write(chunk.text!)
733
- }
734
-
735
- const final = await response // full AgentResponse when stream completes
736
- ```
737
-
738
- ### Queued prompts (`agent.queue()`)
739
-
740
- Push the agent run onto the queue for background execution. Returns a builder so you can configure the queue, attach success/failure callbacks, and (optionally) stream progress to a broadcast channel as it runs.
741
-
742
- Requires `@rudderjs/queue` (and `@rudderjs/broadcast` if you call `.broadcast()`).
743
-
744
- ```ts
745
- // Fire-and-forget background run
746
- await new SupportAgent()
747
- .queue('Help with refund request')
748
- .onQueue('ai')
749
- .send()
750
-
751
- // With success/failure callbacks
752
- await new ResearchAgent()
753
- .queue('Research GPT-5 architecture')
754
- .then(response => console.log('Done:', response.text))
755
- .catch(error => console.error('Failed:', error))
756
- .send()
757
- ```
758
-
759
- #### Stream progress to a broadcast channel — `.broadcast(channel)`
760
-
761
- Background AI work + live UI without polling. Each stream chunk is broadcast to the channel as the job runs; the final response is broadcast as a `done` event:
762
-
763
- ```ts
764
- await new SupportAgent()
765
- .queue('Help with refund request')
766
- .broadcast(`user.${userId}.support`)
767
- .send()
768
-
769
- // Subscribers on `user.${userId}.support` receive:
770
- // { event: 'chunk', data: <StreamChunk> } // one per stream chunk (text-delta, tool-call, ...)
771
- // { event: 'done', data: <AgentResponse> } // final result, after the loop ends
772
- // { event: 'error', data: { message } } // on failure
773
- ```
774
-
775
- The wire shape matches the framework's normal `StreamChunk` types — the same `text-delta` / `tool-call` / `tool-result` shapes you'd iterate from `agent.stream()`. Frontends can subscribe to the channel and reuse their existing chunk-handling code.
776
-
777
- Pass `eventPrefix` to namespace events when the channel carries other unrelated messages:
778
-
779
- ```ts
780
- .broadcast('shared-channel', { eventPrefix: 'agent.' })
781
- // emits 'agent.chunk', 'agent.done', 'agent.error'
782
- ```
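The event naming above can be sketched as a pure mapping from chunks to broadcast payloads. `toBroadcastEvents` and `errorEvent` are hypothetical helpers written for illustration; the real job sends each event as it happens rather than building an array.

```typescript
interface BroadcastEvent {
  event: string
  data: unknown
}

// One 'chunk' event per stream chunk, then a single 'done' event with the final response.
function toBroadcastEvents(chunks: unknown[], finalResponse: unknown, eventPrefix = ''): BroadcastEvent[] {
  return [
    ...chunks.map((data) => ({ event: `${eventPrefix}chunk`, data })),
    { event: `${eventPrefix}done`, data: finalResponse },
  ]
}

// On failure, an 'error' event carrying only the message.
function errorEvent(err: unknown, eventPrefix = ''): BroadcastEvent {
  return {
    event: `${eventPrefix}error`,
    data: { message: err instanceof Error ? err.message : String(err) },
  }
}
```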
783
-
784
- **Process model:** `@rudderjs/broadcast`'s `broadcast()` writes to the WS server in the same process. In the typical RudderJS dev setup (single process running both web + `queue:work`) this works out of the box. Production deployments that run the queue worker as a separate process from the broadcast WS server will need a pub/sub bridge (Redis, Reverb, etc.) — outside the scope of v1.
785
-
786
- ### Conversation History
787
-
788
- Pass message history to maintain context across turns:
789
-
790
- ```ts
791
- const response = await agent('You are helpful.').prompt('Follow up question', {
792
- history: [
793
- { role: 'user', content: 'What is TypeScript?' },
794
- { role: 'assistant', content: 'TypeScript is a typed superset of JavaScript...' },
795
- ],
796
- })
797
- ```
798
-
799
- Works with both `.prompt()` and `.stream()`. History messages are inserted after the system prompt and before the current user message.
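The ordering rule for history (system prompt, then prior turns, then the current user message) as a minimal sketch; `ChatMessage` and `buildMessages` are hypothetical names, not the library's internal types.

```typescript
type Role = 'system' | 'user' | 'assistant'

interface ChatMessage {
  role: Role
  content: string
}

// System prompt first, prior turns (oldest first) next, current input last.
function buildMessages(system: string, history: ChatMessage[], input: string): ChatMessage[] {
  return [{ role: 'system', content: system }, ...history, { role: 'user', content: input }]
}
```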
800
-
801
- ### Auto-persist conversations
802
-
803
- Override `conversational()` on an agent class to auto-load and auto-save threads without threading user ids through every call site:
804
-
805
- ```ts
806
- class ChatAgent extends Agent {
807
- conversational() { return { user: Auth.user()?.id } }
808
- }
809
-
810
- await new ChatAgent().prompt('Hi') // auto-loads + auto-saves
811
- await new ChatAgent().prompt('Continue?') // resumes same thread (per user + class)
812
- ```
813
-
814
- Returning `false` (the default) keeps the agent stateless. Async returns are awaited; an optional `historyLimit` caps loaded messages. Per-call escape hatches: `prompt(input, { conversation: false })` or `agent.forUser(id).prompt()` / `agent.continue(id).prompt()` — explicit always wins. See `docs/guide/ai.md` for the full precedence chain.
815
-
816
- ### User memory beyond conversation history (Mem0-style)
817
-
818
- Conversation history persists messages; user memory persists **facts** that should travel across conversations. Useful when the agent needs to remember "Alice's project is named Foo" in a brand-new thread without replaying the entire prior session.
819
-
820
- ```ts
821
- import type { AiConfig, UserMemory } from '@rudderjs/ai'
822
- import { MemoryUserMemory } from '@rudderjs/ai'
823
-
824
- // config/ai.ts — wire a backend
825
- export default {
826
- default: 'anthropic/claude-sonnet-4-5',
827
- providers: { /* ... */ },
828
- memory: new MemoryUserMemory(), // in-process; swap for an ORM- or embedding-backed store in production
829
- } satisfies AiConfig
830
-
831
- // Use it directly
832
- const memory = app().make<UserMemory>('ai.memory')
833
- await memory.remember('user_123', 'Project name is Foo', { tags: ['project'] })
834
- const facts = await memory.recall('user_123', 'project')
835
- //=> [{ fact: 'Project name is Foo', tags: ['project'], ... }]
836
- ```
837
-
838
- Or declare on an agent class to opt into auto-inject — relevant facts get prepended to the system prompt before each turn, with no plumbing on the caller's side:
839
-
840
- ```ts
841
- class SupportAgent extends Agent {
842
- remembers() {
843
- return {
844
- user: ctx.user.id,
845
- inject: 'auto', // recall + prepend matching facts before each model call
846
- tags: ['support'], // recall scope
847
- injectLimit: 5, // cap facts per turn
848
- injectTokenBudget: 400, // hard token cap; lowest-score facts drop first
849
- }
850
- }
851
- }
852
-
853
- await new SupportAgent().prompt('Where is my project deployed?')
854
- // system prompt sent to the model:
855
- // "You are a support agent.\n\n
856
- // <user-memory>\n
857
- // - Project Foo deploys to fly.io us-east\n
858
- // - …\n
859
- // </user-memory>"
860
- ```
861
-
862
- The auto-cascade runs in `Agent.prompt` / `Agent.stream`, before conversation persistence. `withMemoryInject(spec)` is also exported so you can drop it into `agent.middleware()` manually if you want full control.
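The `injectLimit` and `injectTokenBudget` rules can be sketched as: sort recalled facts by score, keep the top `injectLimit`, then drop the lowest-scoring facts until the block fits the budget. This is a hypothetical sketch. The `text.length / 4` estimator is borrowed from the budget middleware's documented default, counting only the fact text against the budget is an assumption, and the output format mirrors the example system prompt above.

```typescript
interface Fact {
  fact: string
  score: number
}

// Assumed estimator: roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

function selectFacts(facts: Fact[], injectLimit: number, injectTokenBudget: number): Fact[] {
  const chosen = [...facts].sort((a, b) => b.score - a.score).slice(0, injectLimit)
  const total = () => chosen.reduce((n, f) => n + estimateTokens(f.fact), 0)
  // Lowest-score facts sit at the end, so they are dropped first.
  while (chosen.length > 0 && total() > injectTokenBudget) chosen.pop()
  return chosen
}

// Append a <user-memory> block to the agent's instructions.
function injectMemory(instructions: string, facts: Fact[]): string {
  if (facts.length === 0) return instructions
  const lines = facts.map((f) => `- ${f.fact}`).join('\n')
  return `${instructions}\n\n<user-memory>\n${lines}\n</user-memory>`
}
```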
863
-
864
- **Continuation note:** when you pass `options.messages` (e.g. resuming after a client-tool round-trip), both auto-inject and auto-extract are skipped — the system prompt was already augmented on the original turn, and re-extracting would write the same facts twice.
865
-
866
- #### Auto-extract — distill facts from each turn
867
-
868
- Set `extract: 'auto'` together with an `extractWith` model, and after each successful turn that small model is asked to pull durable facts out of it:
869
-
870
- ```ts
871
- class SupportAgent extends Agent {
872
- remembers() {
873
- return {
874
- user: ctx.user.id,
875
- inject: 'auto',
876
- extract: 'auto',
877
- extractWith: 'anthropic/claude-haiku-4-5', // small model for fact distillation
878
- tags: ['support'],
879
- }
880
- }
881
- }
882
-
883
- await new SupportAgent().prompt('hey, my project is named Foo and lives at /var/www/foo')
884
- // On success, the small model is asked to distill durable facts. Survivors above
885
- // the confidence threshold (default 0.7) get written via `mem.remember()`:
886
- // - "Project name is Foo" (score ~0.95, tags: ['support', 'project'])
887
- ```
888
-
889
- Failures (network, JSON parse, schema mismatch, store write) route through `MemoryExtractOptions.onError` and never break the parent run. Failed parent runs do NOT trigger extract.
890
-
891
- **Poisoning mitigation** — auto-extraction trusts the user's own conversation as input. The default 0.7 confidence threshold is the v1 defense against adversarial "facts." Pair with `MemoryExtractOptions.onExtracted` for an audit log when shipping to production, and tighten the threshold for high-risk domains.
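The confidence threshold and audit hook combine into a simple filter. A hypothetical sketch: `filterExtracted` and this `onExtracted` signature are invented for illustration, and treating a score equal to the threshold as a survivor is an assumption.

```typescript
interface ExtractedFact {
  fact: string
  score: number
  tags?: string[]
}

interface ExtractFilterOptions {
  threshold?: number
  // Audit hook: sees both kept and dropped candidates.
  onExtracted?: (kept: ExtractedFact[], dropped: ExtractedFact[]) => void
}

function filterExtracted(candidates: ExtractedFact[], opts: ExtractFilterOptions = {}): ExtractedFact[] {
  const threshold = opts.threshold ?? 0.7
  const kept = candidates.filter((c) => c.score >= threshold)
  const dropped = candidates.filter((c) => c.score < threshold)
  opts.onExtracted?.(kept, dropped)
  return kept
}
```

Raising `threshold` is the knob for high-risk domains; the audit hook gives a record of what was rejected as well as what was stored.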
892
-
893
- #### Production backend — `OrmUserMemory`
894
-
895
- For production, swap `MemoryUserMemory` for `OrmUserMemory` (subpath `@rudderjs/ai/memory-orm`) — persists rows via your registered `@rudderjs/orm` adapter (Prisma today; Drizzle once you wire the tables):
896
-
897
- ```ts
898
- // config/ai.ts
899
- import type { AiConfig } from '@rudderjs/ai'
900
- import { OrmUserMemory } from '@rudderjs/ai/memory-orm'
901
-
902
- export default {
903
- default: 'anthropic/claude-sonnet-4-5',
904
- providers: { /* ... */ },
905
- memory: new OrmUserMemory(),
906
- } satisfies AiConfig
907
- ```
908
-
909
- Add the schema to your Prisma file (or import the reference string `userMemoryPrismaSchema` from `@rudderjs/ai/memory-orm`):
910
-
911
- ```prisma
912
- model UserMemory {
913
- id String @id @default(cuid())
914
- userId String
915
- fact String
916
- /// JSON-encoded `string[]` of tags, or null
917
- tags String?
918
- /// Confidence score in [0, 1] — extract sets this from the model's self-rating
919
- score Float?
920
- /// Phase 5 — vector embedding for cosine recall (nullable so Phase 4 ignores it)
921
- embedding Bytes?
922
- createdAt DateTime @default(now())
923
- updatedAt DateTime @updatedAt
924
-
925
- @@index([userId])
926
- }
927
- ```
928
-
929
- Then run `pnpm exec prisma db push` (quick dev sync) or `pnpm exec prisma migrate dev` (to create a migration), and apply migrations in production with `pnpm exec prisma migrate deploy`. The `embedding Bytes?` column is intentionally nullable: Phase 5's `EmbeddingUserMemory` populates it without forcing a follow-up migration.
930
-
931
- `OrmUserMemory.recall()` uses **OR-of-LIKE token overlap** on the `fact` column — same semantic as `MemoryUserMemory`. Tag-array filtering happens JS-side after fetch (pushing tags into the WHERE is adapter-specific; that lands in a follow-up).
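OR-of-LIKE token overlap means a fact matches when it contains any query token as a substring. A hypothetical in-memory sketch of the same semantics; the ASCII-only tokenizer, case-insensitive matching (real `LIKE` case sensitivity depends on the database), and any-shared-tag filtering are all assumptions.

```typescript
interface StoredFact {
  fact: string
  tags: string[]
}

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
}

// In-memory analogue of: WHERE fact LIKE '%tok1%' OR fact LIKE '%tok2%' ...
function recallByOverlap(facts: StoredFact[], query: string, tags: string[] = []): StoredFact[] {
  const tokens = tokenize(query)
  if (tokens.length === 0) return []
  const matches = facts.filter((f) => {
    const haystack = f.fact.toLowerCase()
    return tokens.some((t) => haystack.includes(t))
  })
  // Tag filtering runs after the fetch, in JS.
  return tags.length === 0 ? matches : matches.filter((f) => f.tags.some((t) => tags.includes(t)))
}
```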
932
-
933
- #### Embedding backend — `EmbeddingUserMemory` (Phase 5)
934
-
935
- For semantic recall ("Where do I deploy?" matching "Project Foo lives at fly.io"), wrap `OrmUserMemory` with `EmbeddingUserMemory` from `@rudderjs/ai/memory-embedding`:
936
-
937
- ```ts
938
- import { OrmUserMemory } from '@rudderjs/ai/memory-orm'
939
- import { EmbeddingUserMemory } from '@rudderjs/ai/memory-embedding'
940
-
941
- export default {
942
- default: 'anthropic/claude-sonnet-4-5',
943
- providers: { /* ... */ },
944
- memory: new EmbeddingUserMemory({
945
- inner: new OrmUserMemory(),
946
- model: 'openai/text-embedding-3-small',
947
- threshold: 0.5, // cosine floor; matches below get dropped
948
- }),
949
- } satisfies AiConfig
950
- ```
951
-
952
- `remember()` embeds the fact via `AI.embed()` and writes the Float32-packed vector into the row's `embedding` column. `recall()` embeds the query and ranks all of the user's facts by **pure-JS cosine similarity** (acceptable up to a few thousand facts/user; for larger workloads, B7 lands a pgvector-backed variant).
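Packing vectors into a `Bytes` column and ranking by cosine similarity can be sketched in plain TypeScript. `packFloat32`, `unpackFloat32`, and `rankByCosine` are hypothetical helpers; byte order follows the host platform (little-endian on common hardware), and the `threshold` default mirrors the `0.5` in the config example.

```typescript
// Float32-pack a vector into raw bytes suitable for a Bytes column.
function packFloat32(vector: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(vector).buffer)
}

// Copy into a fresh, aligned buffer before viewing the bytes as Float32 values.
function unpackFloat32(bytes: Uint8Array): Float32Array {
  const copy = bytes.slice()
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4)
}

function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i]! * b[i]!
    normA += a[i]! * a[i]!
    normB += b[i]! * b[i]!
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Score every row against the query, drop matches below the floor, best first.
function rankByCosine<T extends { embedding: Uint8Array }>(rows: T[], query: number[], threshold = 0.5) {
  return rows
    .map((row) => ({ row, score: cosine(unpackFloat32(row.embedding), query) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
}
```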
953
-
954
- **GDPR right-to-be-forgotten cascades automatically** — the embedding lives in the same row as the fact, so `forget()` / `forgetAll()` delete both. No second store to keep in sync.
955
-
956
- **Backward compat with Phase 4:** rows persisted before `EmbeddingUserMemory` was wired in have `embedding === null`. The default `nullEmbeddingFallback: 'token-overlap'` falls back to the same token-overlap matching `MemoryUserMemory` uses, so upgrading from `OrmUserMemory` doesn't lose recall on existing rows. New `remember()` calls populate the embedding column going forward. Set `nullEmbeddingFallback: 'skip'` to drop pre-embedding rows entirely.
957
-
958
- `embed()` failures (network down, missing peer SDK) are swallowed: `remember()` still persists the entry with `embedding === null`, and `recall()` falls back to token-overlap. The parent prompt never breaks because of memory work.
959
-
960
- **A4 status (all phases shipped):** interface, in-process backend, per-call/class declaration, auto-inject, auto-extract, ORM-backed `OrmUserMemory`, and embedding-backed `EmbeddingUserMemory` all ship today. The roadmap item is complete.
961
-
962
- ### Model Selection
963
-
964
- Configure available models for user selection (used by `@rudderjs/panels` chat UI):
965
-
966
- ```ts
967
- // config/ai.ts
968
- export default {
969
- default: 'anthropic/claude-sonnet-4-5',
970
- providers: { /* ... */ },
971
- models: [
972
- { id: 'anthropic/claude-sonnet-4-5', label: 'Claude Sonnet 4.5', default: true },
973
- { id: 'anthropic/claude-opus-4-5', label: 'Claude Opus 4.5' },
974
- { id: 'openai/gpt-4o', label: 'GPT-4o' },
975
- { id: 'google/gemini-2.5-pro', label: 'Gemini 2.5 Pro' },
976
- ],
977
- }
978
- ```
979
-
980
- The model registry is available via `AiRegistry.getModels()` / `AiRegistry.getDefault()`.
981
-
982
- ### Middleware
983
-
984
- ```ts
985
- import type { AiMiddleware } from '@rudderjs/ai'
986
-
987
- const loggingMiddleware: AiMiddleware = {
988
- name: 'logger',
989
- onStart(ctx) { console.log(`[AI] Request ${ctx.requestId} started`) },
990
- onFinish(ctx) { console.log(`[AI] Request ${ctx.requestId} finished`) },
991
- onBeforeToolCall(ctx, toolName, args) {
992
- console.log(`[AI] Calling tool: ${toolName}`, args)
993
- },
994
- }
995
- ```
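How a list of middleware might be driven is sketched below, assuming hooks fire in registration order. The runner functions are hypothetical; only the hook names and their arguments come from the example above.

```typescript
interface Ctx {
  requestId: string
}

interface Middleware {
  name: string
  onStart?(ctx: Ctx): void
  onFinish?(ctx: Ctx): void
  onBeforeToolCall?(ctx: Ctx, toolName: string, args: unknown): void
}

// Lifecycle hooks: every middleware that defines the hook runs, in array order.
function runLifecycle(mws: Middleware[], hook: 'onStart' | 'onFinish', ctx: Ctx): void {
  for (const mw of mws) mw[hook]?.(ctx)
}

// Tool hook: called once per tool call, before the tool executes.
function runBeforeToolCall(mws: Middleware[], ctx: Ctx, toolName: string, args: unknown): void {
  for (const mw of mws) mw.onBeforeToolCall?.(ctx, toolName, args)
}
```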
996
-
997
- ### Per-user budgets — `withBudget(...)`
998
-
999
- Cap daily or monthly spend per user. The middleware pre-debits the estimated input cost on every iteration (refusing with `BudgetExceededError` when the cap would be exceeded) and trues up the actual delta after each step's usage report:
1000
-
1001
- ```ts
1002
- import { withBudget, memoryBudgetStorage, BudgetExceededError } from '@rudderjs/ai'
1003
-
1004
- class ChatAgent extends Agent {
1005
- model() { return 'anthropic/claude-sonnet-4-5' }
1006
- middleware() {
- return [
- withBudget({
- user: () => req.user?.id ?? null, // null bypasses enforcement (unauth)
- budget: { period: 'monthly', cap: 5.00 }, // USD; also 'daily'
- storage: memoryBudgetStorage(), // swap for ormBudgetStorage in production
- // timezone: 'America/Los_Angeles', // optional — period rollover boundary
- // onExceeded: ({ spent, cap }) => log.warn({ spent, cap }, 'budget hit'),
- // pricing: { ...ModelPricing, 'custom/model': { ... } }, // overrides
- }),
- ]
- }
- }
- ```
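The pre-debit and true-up cycle can be sketched with an in-memory ledger. Everything here is hypothetical (`Ledger`, `periodKey`, and the local `BudgetExceeded` class) and only mirrors the behavior described above. The ledger refuses a call when the estimated cost would push spend past the cap, then adjusts by the difference once actual usage is known. Period keys use UTC here, whereas the real middleware accepts a `timezone` option.

```ts
type Period = 'daily' | 'monthly'

// Hypothetical stand-in for the package's BudgetExceededError.
class BudgetExceeded extends Error {
  readonly spent: number
  readonly cap: number
  constructor(spent: number, cap: number) {
    super(`budget exceeded: spent ${spent} of ${cap} USD`)
    this.spent = spent
    this.cap = cap
  }
}

// Bucket key for the current period in UTC, e.g. "2026-05-10" (daily) or "2026-05" (monthly).
function periodKey(period: Period, now: Date): string {
  const iso = now.toISOString()
  return period === 'daily' ? iso.slice(0, 10) : iso.slice(0, 7)
}

// Hypothetical in-memory ledger keyed by user + period bucket.
class Ledger {
  private readonly spend = new Map<string, number>()
  private readonly period: Period
  private readonly cap: number

  constructor(period: Period, cap: number) {
    this.period = period
    this.cap = cap
  }

  private key(userId: string, now: Date): string {
    return `${userId}:${this.period}:${periodKey(this.period, now)}`
  }

  // Reserve the estimated cost before the call; refuse if it would cross the cap.
  preDebit(userId: string, estimatedUsd: number, now: Date = new Date()): void {
    const k = this.key(userId, now)
    const current = this.spend.get(k) ?? 0
    if (current + estimatedUsd > this.cap) throw new BudgetExceeded(current, this.cap)
    this.spend.set(k, current + estimatedUsd)
  }

  // Once usage is reported, apply the (possibly negative) actual-minus-estimate delta.
  trueUp(userId: string, estimatedUsd: number, actualUsd: number, now: Date = new Date()): void {
    const k = this.key(userId, now)
    this.spend.set(k, (this.spend.get(k) ?? 0) + (actualUsd - estimatedUsd))
  }

  spent(userId: string, now: Date = new Date()): number {
    return this.spend.get(this.key(userId, now)) ?? 0
  }
}

// Usage: a $5 monthly cap.
const ledger = new Ledger('monthly', 5)
const may10 = new Date('2026-05-10T12:00:00Z')
ledger.preDebit('u1', 1, may10)    // reserve a $1 estimate
ledger.trueUp('u1', 1, 0.5, may10) // actual cost was $0.50, so recorded spend drops to 0.5
```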
-
- **Production storage — `OrmBudgetStorage`** persists spend rows via your registered ORM adapter:
-
- ```ts
- import { OrmBudgetStorage, BudgetUsageRecord } from '@rudderjs/ai/budget-orm'
-
- withBudget({
- user: () => req.user.id,
- budget: { period: 'monthly', cap: 25 },
- storage: new OrmBudgetStorage(),
- })
- ```
-
- Schema reference is exported as `budgetUsagePrismaSchema` from `@rudderjs/ai/budget-orm` (it also lives at `playground/prisma/schema/ai.prisma`). The `@@unique([userId, period, periodKey])` index is load-bearing: it provides first-write race protection. Caveats: refunds on errors are **not** issued; cache token deltas (Anthropic ephemeral, OpenAI prefix) aren't yet exposed on `TokenUsage`, so cached requests are billed at the full input rate; the default token estimator is `text.length / 4` (override it via `estimateTokens`, e.g. to use `tiktoken`). Under high single-user concurrency, total spend can briefly exceed `cap` by up to `costUsd × concurrency`, because of a read-modify-write race in the cap check. `BudgetExceededError` bubbles up, so catch it at the route boundary and return a friendly 402.
-
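Two of the budget caveats come down to simple arithmetic. The sketch below shows the default `text.length / 4` token heuristic, with rounding up as an assumption of the sketch. It also shows the worst-case spend when several requests for one user pass the cap check before any of them records its cost. Both function names are hypothetical.

```ts
// Default heuristic described above: roughly four characters per token.
// Rounding up is an assumption of this sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Upper bound under the read-modify-write race: each of `concurrency` in-flight
// requests read the same pre-cap "spent" value and each adds up to costUsd.
function worstCaseSpend(cap: number, costUsd: number, concurrency: number): number {
  return cap + costUsd * concurrency
}

console.log(estimateTokens('How do I reset my password?')) // 7 (27 characters / 4, rounded up)
const overshoot = worstCaseSpend(5, 0.02, 10) // about $5.20 against a $5 cap
```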
- ### Testing
-
- ```ts
- import assert from 'node:assert'
- import { AiFake, AI } from '@rudderjs/ai'
-
- const fake = AiFake.fake()
- fake.respondWith('Mocked response')
-
- const response = await AI.prompt('Hello')
- assert.strictEqual(response.text, 'Mocked response')
-
- fake.assertPrompted(input => input.includes('Hello'))
- fake.restore()
- ```
-
- Fakes cover every modality:
-
- ```ts
- fake.respondWith('text') // text generation
- fake.respondWithImage('base64...') // image generation
- fake.respondWithAudio(Buffer.from('')) // TTS
- fake.respondWithTranscription('text') // STT
- fake.respondWithEmbedding([[0.1, 0.2]]) // embeddings
- fake.respondWithRanking([ // reranking
- { index: 0, relevanceScore: 0.95, document: 'most relevant' },
- ])
- fake.respondWithFileUpload({ // file upload
- id: 'file-123', filename: 'report.pdf', bytes: 1024,
- })
-
- // Assertions
- fake.assertPrompted()
- fake.assertImageGenerated()
- fake.assertAudioGenerated()
- fake.assertTranscribed()
- fake.assertEmbedded()
- fake.assertReranked()
- fake.assertFileUploaded()
- ```
-
- **Strict mode (`preventStrayPrompts`).** Without it, an unscripted prompt silently falls back to the ambient `respondWith` default — which means a test that forgets to assert anything still passes. Strict mode flips that around: any prompt without a matching scripted response throws.
-
- ```ts
- const fake = AiFake.fake().preventStrayPrompts()
- fake.respondWithSequence([{ text: 'expected reply' }])
-
- await new ChatAgent().prompt('hello') // OK — consumes step 0
- await new ChatAgent().prompt('again') // throws "Stray prompt: no scripted response at step 1"
- ```
-
- Under strict mode, only `respondWithSequence` entries count as valid responses; ambient `respondWith` is ignored. For a test that expects exactly one prompt with specific content, script a single step via `respondWithSequence([{ text: '...' }])`.
-
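The strict-mode contract amounts to a cursor over a scripted list that throws once the script runs out, with no ambient fallback. The minimal stand-in below reproduces the error message quoted above. `ScriptedResponder` is hypothetical and is not how `AiFake` is implemented.

```ts
interface ScriptedStep {
  text: string
}

// Hypothetical stand-in for a strict fake: each prompt consumes the next scripted step,
// and an unscripted prompt is an error rather than a silent default.
class ScriptedResponder {
  readonly prompts: string[] = []
  private readonly script: ScriptedStep[]
  private step = 0

  constructor(script: ScriptedStep[]) {
    this.script = script
  }

  respond(prompt: string): string {
    this.prompts.push(prompt) // record every prompt, like an assertPrompted-style log
    const next = this.script[this.step]
    if (next === undefined) {
      throw new Error(`Stray prompt: no scripted response at step ${this.step}`)
    }
    this.step++
    return next.text
  }
}

const responder = new ScriptedResponder([{ text: 'expected reply' }])
const first = responder.respond('hello') // "expected reply", consumes step 0
// responder.respond('again') would throw "Stray prompt: no scripted response at step 1"
```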
- ### Evals — `@rudderjs/ai/eval`
-
- `AiFake` proves the agent's wiring works; **evals** prove the agent does the right thing on real models. Define a suite of input cases and assertions, run it against any `Agent`, and get a console report with pass/fail status, cost, and token counts:
-
- ```ts
- // evals/support-agent.eval.ts
- import { evalSuite, llmJudge, exactMatch, regex } from '@rudderjs/ai/eval'
- import { SupportAgent } from '../app/Agents/SupportAgent.js'
-
- export default evalSuite('SupportAgent', {
- agent: () => new SupportAgent(),
- cases: [
- { name: 'password reset',
- input: 'How do I reset my password?',
- assert: llmJudge('mentions a password reset link') },
- { name: 'price',
- input: 'How much does this cost?',
- assert: exactMatch('$99/month') },
- { name: 'support email',
- input: 'How do I contact support?',
- assert: regex(/support@example\.com/) },
- ],
- })
- ```
-
- Run via the CLI (Phase 2):
-
- ```bash
- pnpm rudder ai:eval # all suites under evals/**/*.eval.ts
- pnpm rudder ai:eval support # only suites whose name includes "support"
- pnpm rudder ai:eval --bail # stop on first failing suite
- pnpm rudder ai:eval --json # machine-readable envelope to stdout
- ```
-
- ```text
- SupportAgent (3 cases, 2.3s, $0.014)
- ✓ password reset 1.2s $0.003 tokens: 487
- ✓ price 0.8s $0.002 tokens: 312
- ✗ support email 1.1s $0.002 tokens: 425
- pattern /support@example\.com/ did not match "Reach us at hello@…"
-
- 2 passed, 1 failed
- total: $0.007 • cumulative tokens: 1,224
- ```
-
- Exits 0 when every case passes, 1 on any failure. `--json` emits `{ suites: [{ suite, passed, failed, cases: [{ name, status, pass, score?, reason?, tokens, cost, duration }] }] }` to stdout — pipe directly into `jq` for CI gates.
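`--json` writes a single envelope to stdout, so a CI script can summarize it in a few lines of TypeScript as an alternative to `jq`. The sketch below assumes the envelope shape documented above. The value types and the status strings are assumptions, and `failureLines` and `totalCost` are hypothetical helpers.

```ts
// Field names come from the documented --json envelope; the value types are assumptions.
interface CaseResult {
  name: string
  status: string
  pass: boolean
  score?: number
  reason?: string
  tokens: number
  cost: number
  duration: number
}

interface SuiteResult {
  suite: string
  passed: number
  failed: number
  cases: CaseResult[]
}

interface EvalEnvelope {
  suites: SuiteResult[]
}

// One line per failing case, e.g. "SupportAgent > support email: <reason>".
function failureLines(env: EvalEnvelope): string[] {
  return env.suites.flatMap(s =>
    s.cases.filter(c => !c.pass).map(c => `${s.suite} > ${c.name}: ${c.reason ?? 'no reason given'}`),
  )
}

// Sum of per-case cost across all suites.
function totalCost(env: EvalEnvelope): number {
  return env.suites.reduce((sum, s) => sum + s.cases.reduce((acc, c) => acc + c.cost, 0), 0)
}

// Illustrative envelope modeled on the console report above; status strings are assumed.
const envelope: EvalEnvelope = {
  suites: [{
    suite: 'SupportAgent',
    passed: 2,
    failed: 1,
    cases: [
      { name: 'password reset', status: 'passed', pass: true, tokens: 487, cost: 0.003, duration: 1.2 },
      { name: 'price', status: 'passed', pass: true, tokens: 312, cost: 0.002, duration: 0.8 },
      { name: 'support email', status: 'failed', pass: false, reason: 'pattern did not match', tokens: 425, cost: 0.002, duration: 1.1 },
    ],
  }],
}
```

The CLI already exits 1 on any failure, so a wrapper like this is mainly useful for custom summaries or cost gates.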
-
- Override the discovery pattern via `config('ai').eval.pattern` (`'evals/**/*.eval.ts'` by default; supports `<dir>/**/*<suffix>` and `*<suffix>` shapes).
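The two supported pattern shapes are narrow enough to match without a glob library. The sketch below shows one possible interpretation. `matchesPattern` is hypothetical, and the CLI's actual matcher may differ, for example in whether `*<suffix>` also matches nested paths.

```ts
// Hypothetical matcher for the two documented shapes:
//   "<dir>/**/*<suffix>"  any file under <dir>, at any depth, ending in <suffix>
//   "*<suffix>"           any file ending in <suffix> (assumed here to match at any depth)
function matchesPattern(pattern: string, path: string): boolean {
  const marker = '/**/*'
  const at = pattern.indexOf(marker)
  if (at !== -1) {
    const dir = pattern.slice(0, at)
    const suffix = pattern.slice(at + marker.length)
    return path.startsWith(`${dir}/`) && path.endsWith(suffix)
  }
  if (pattern.startsWith('*')) return path.endsWith(pattern.slice(1))
  throw new Error(`Unsupported pattern shape: ${pattern}`)
}

const defaultPattern = 'evals/**/*.eval.ts'
const found = ['evals/support-agent.eval.ts', 'evals/billing/refunds.eval.ts', 'src/foo.eval.ts']
  .filter(p => matchesPattern(defaultPattern, p))
// found keeps only the two files under evals/
```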
-
- Or run programmatically:
-
- ```ts
- import { runSuite, reportConsole, reportJson } from '@rudderjs/ai/eval'
- import suite from './evals/support-agent.eval.ts'
-
- reportConsole(await runSuite(suite))
- // reportJson(await runSuite(suite)) // structured envelope for CI scripts
- ```
-
- **Built-in metrics:**
-
- | Metric | Behavior |
- |---|---|
- | `exactMatch(string)` | `response.text === expected` |
- | `regex(RegExp)` | `pattern.test(response.text)` |
- | `llmJudge(criterion, opts?)` | Asks a small model whether the response satisfies a natural-language criterion. Returns the judge's reasoning in `reason` so failures are debuggable. |
- | `jsonShape(zodSchema)` | Strips ```` ``` ```` fences, parses, runs zod `safeParse`. Surfaces the zod issue path on failure. Pairs with `Output.object({ schema })` on the agent. |
- | `semanticMatch(reference, opts?)` | Embeds reference + response via `AI.embed()`, cosine similarity vs `opts.threshold` (default `0.85`). Embed cost rolls into the case's cost rollup. Requires a provider with `createEmbedding()` (openai/google/mistral/cohere/jina). |
- | `tokenCost(threshold)` | Passes when `response.usage.totalTokens <= threshold`. Detects prompt-size regressions before they show up as a billing surprise. |
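Two of the metrics in the table do some preprocessing before they judge. The sketch below works under stated assumptions:

- `cosineSimilarity` and `passesSemantic` mirror the described `semanticMatch` check against a threshold (default `0.85`).
- `stripFences` approximates the fence stripping that `jsonShape` performs before parsing.
- All three are hypothetical helpers, not exports of `@rudderjs/ai/eval`.
- Embeddings are passed in directly rather than fetched via `AI.embed()`.

```ts
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((x, i) => {
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  })
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// semanticMatch-style check: pass when similarity meets the threshold (0.85 by default).
function passesSemantic(reference: number[], response: number[], threshold = 0.85): boolean {
  return cosineSimilarity(reference, response) >= threshold
}

// jsonShape-style preprocessing: drop a surrounding ``` or ```json fence, if present.
function stripFences(text: string): string {
  const m = text.trim().match(/^```[a-zA-Z]*\s*\n([\s\S]*?)\n?```$/)
  return m?.[1] ?? text.trim()
}

const parsed = JSON.parse(stripFences('```json\n{"title":"Q3 summary"}\n```'))
```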
-
- `compose(...metrics)` runs the metrics in order, short-circuits on the first failure, and surfaces that failure's reason. This is useful for "must be valid JSON AND under budget" assertions:
-
- ```ts
- { input: '…',
- assert: compose(jsonShape(SummarySchema), tokenCost(800)) }
- ```
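A metric is just a function, so short-circuiting composition takes only a few lines. The sketch below uses hypothetical local types. `MetricResult` is modeled on the `pass`/`score`/`reason` fields of the JSON envelope, and `EvalResponse` is a simplified response. The documented `ctx` argument is omitted and the metrics here are synchronous. `containsText` and `underTokens` are illustrative user-defined metrics, not the package's `exactMatch` or `tokenCost`.

```ts
// Hypothetical, simplified shapes (the documented signature also passes a ctx argument).
interface EvalResponse {
  text: string
  usage: { totalTokens: number }
}

interface MetricResult {
  pass: boolean
  score?: number
  reason?: string
}

type Metric = (response: EvalResponse) => MetricResult

// Run metrics in order and return the first failure unchanged, so its reason surfaces.
function compose(...metrics: Metric[]): Metric {
  return response => {
    for (const metric of metrics) {
      const result = metric(response)
      if (!result.pass) return result
    }
    return { pass: true }
  }
}

// Illustrative user-defined metrics: plain closures, no base class or decorators.
const containsText = (needle: string): Metric => response =>
  response.text.includes(needle)
    ? { pass: true }
    : { pass: false, reason: `expected text to contain "${needle}"` }

const underTokens = (limit: number): Metric => response =>
  response.usage.totalTokens <= limit
    ? { pass: true }
    : { pass: false, reason: `${response.usage.totalTokens} tokens exceeds ${limit}` }

const check = compose(containsText('reset'), underTokens(800))
```

Returning the failing result unchanged keeps the failing metric's own `reason` (and `score`, if it has one). A caller therefore sees why the check failed, not just that it failed.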
-
- User-defined metrics implement `(response, ctx) => MetricResult` — no inheritance, no decorators. The catalog is just a starting set.
-
- **Failure semantics:** the runner never throws upward. Agent errors AND assertion throws become `failed` rows with the message in `reason`. Per-case `timeout` (ms) caps long runs. Per-case `agent` factory overrides the suite default — useful for stress-testing one case against a different model.
-
- **Record + replay:**
-
- ```bash
- pnpm rudder ai:eval --record support # run live, save fixtures
- pnpm rudder ai:eval --replay support # zero API calls, deterministic
- ```
-
- `--record` runs each matching case against the real provider and writes assistant turns (text + tool calls) to `evals/__fixtures__/<suite>/<case>.json` (commit these alongside the suite for diffable regression history). `--replay` swaps the runtime with `AiFake` and feeds each case its recorded fixture — same agent code path, scripted responses. Cases without a fixture fall through to a normal run with a stderr warning. The two modes are mutually exclusive.
-
- **Telescope hook:** `aiObservers` emits an `agent.eval.completed` event after every case (passing, failing, skipped). Telescope's AI collector aggregates pass-rate per `(suite, case)` over time.
-
- **HTML report:**
-
- ```bash
- pnpm rudder ai:eval --html report.html # write a self-contained HTML report
- ```
-
- The report is self-contained HTML (inline CSS and vanilla JS, no external assets), so it can be shared in PR comments or Slack threads and opened offline. It coexists with `--json` (JSON goes to stdout, HTML to disk). Click any case row to expand its prompt and response.
-
- Annotate suites with optional metadata:
-
- ```ts
- export default evalSuite('SupportAgent', {
- agent: () => new SupportAgent(),
- cases: [/* … */],
- metadata: {
- owner: '@billing-team',
- lastReviewed: '2026-05-01',
- ticket: 'AI-42',
- },
- })
- ```
-
- ### MCP integration
-
- `@rudderjs/ai/mcp` bridges agents and Model Context Protocol servers in both directions. Optional peer: `@modelcontextprotocol/sdk`.
-
- ```ts
- import { mcpClientTools, mcpServerFromAgent } from '@rudderjs/ai/mcp'
- ```
-
- #### Consume MCP tools in an Agent — `mcpClientTools(transport, opts?)`
-
- Connect to a remote MCP server and surface its tools to an agent.
-
- ```ts
- // HTTP transport
- const tools = await mcpClientTools('https://api.example.com/mcp')
-
- // Or, a local subprocess (stdio):
- // const tools = await mcpClientTools({ command: 'npx', args: ['some-mcp-server'] })
-
- // Or, an already-connected SDK Client (caller owns lifecycle):
- // const tools = await mcpClientTools(myClient)
-
- class ResearchAgent extends Agent {
- instructions() { return 'You have access to remote tools via MCP.' }
- tools() { return tools }
- }
- ```
-
- The remote server's JSON Schema flows directly to providers via the `jsonSchema` passthrough field on `ToolDefinitionOptions` — no zod round-trip. When this connector owns the underlying client (URL or stdio transport), the returned array exposes a non-enumerable `close()` for shutdown:
-
- ```ts
- const tools = await mcpClientTools('https://api.example.com/mcp')
- // ... use tools in agent ...
- await tools.close?.()
- ```
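A non-enumerable `close()` keeps the returned value behaving like a plain array for iteration, spreading, and JSON, while still carrying a shutdown handle. The sketch below shows that technique with `Object.defineProperty`. `ToolList`, `withClose`, and the tool shape are hypothetical, not the connector's internals.

```ts
// Hypothetical minimal tool shape.
interface Tool {
  name: string
}

type ToolList = Tool[] & { close?: () => Promise<void> }

// Attach close() so it does not appear in Object.keys(), for...in, or JSON.stringify().
function withClose(tools: Tool[], close: () => Promise<void>): ToolList {
  Object.defineProperty(tools, 'close', {
    value: close,
    enumerable: false,
    writable: false,
    configurable: true,
  })
  return tools as ToolList
}

const state = { closed: false }
const tools = withClose([{ name: 'search' }, { name: 'fetch' }], async () => {
  state.closed = true // a real connector would close its transport here
})

console.log(Object.keys(tools)) // [ '0', '1' ]
console.log(JSON.stringify(tools)) // [{"name":"search"},{"name":"fetch"}]
```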
-
- Options: `filter` (drop tools by name), `namePrefix` (avoid collisions across multiple servers), `streaming` (forward MCP `notifications/progress` as `tool-update` chunks; default `true`).
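Applying `filter` and `namePrefix` amounts to a filter-then-rename pass over the remote tool list. The option names come from the text above. The sketch also makes several assumptions, all hypothetical:

- `applyToolOptions` is an illustrative helper, not a package function.
- `filter` takes a keep-predicate form.
- Filtering runs on the original names, before prefixing.
- The prefix is concatenated as given.

```ts
interface RemoteTool {
  name: string
  description: string
}

// Hypothetical option shapes: filter returns true to keep a tool.
interface ToolOptions {
  filter?: (name: string) => boolean
  namePrefix?: string
}

function applyToolOptions(tools: RemoteTool[], opts: ToolOptions = {}): RemoteTool[] {
  const keep = opts.filter ?? (() => true)
  const prefix = opts.namePrefix ?? ''
  return tools
    .filter(t => keep(t.name)) // filter on the server's original names
    .map(t => ({ ...t, name: `${prefix}${t.name}` }))
}

// Two servers that both expose "search" no longer collide once prefixed.
const github = applyToolOptions(
  [
    { name: 'search', description: 'Search issues' },
    { name: 'delete_repo', description: 'Dangerous' },
  ],
  { filter: name => name !== 'delete_repo', namePrefix: 'github_' },
)
const docs = applyToolOptions([{ name: 'search', description: 'Search docs' }], { namePrefix: 'docs_' })

console.log([...github, ...docs].map(t => t.name)) // [ 'github_search', 'docs_search' ]
```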
-
- #### Expose an Agent as an MCP server — `mcpServerFromAgent(AgentClass, opts?)`
-
- Wrap an `Agent` so external MCP clients (Claude Desktop, Cursor, etc.) can call it. Returns an `McpServer` from `@modelcontextprotocol/sdk`; connect it with any SDK transport.
-
- ```ts
- import { mcpServerFromAgent } from '@rudderjs/ai/mcp'
- import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
-
- const server = await mcpServerFromAgent(ResearchAgent)
- await server.connect(new StdioServerTransport())
- ```
-
- Three exposure modes via `opts.expose`:
- - `'tools'` *(default)* — one MCP tool per `agent.tools()` entry; the wrapping agent isn't called, individual tools execute directly
- - `'agent'` — one MCP tool that runs the whole agent (`prompt(text) → response.text`); this is the headline feature, since it lets you ship an agent that any MCP-aware client can call
- - `'both'` — individual tools and the agent prompt-tool side by side
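The three `expose` modes differ only in which MCP tools get registered. The sketch below shows that selection. `toolsToRegister` is hypothetical, and `'prompt'` as the default name of the agent tool is an assumption; this section does not state the real default for `agentToolName`.

```ts
type ExposeMode = 'tools' | 'agent' | 'both'

// Returns the MCP tool names a server would register for a given mode.
function toolsToRegister(
  agentToolNames: string[],
  expose: ExposeMode = 'tools',
  agentToolName = 'prompt', // assumed default; configurable via agentToolName
): string[] {
  switch (expose) {
    case 'tools':
      return [...agentToolNames] // each agent tool executes directly
    case 'agent':
      return [agentToolName] // one tool that runs the whole agent
    case 'both':
      return [...agentToolNames, agentToolName]
  }
}

const exposed = toolsToRegister(['search', 'fetch'], 'both', 'research')
// exposed is ['search', 'fetch', 'research']
```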
-
- Other options: `name`, `version`, `instructions` (defaults to `agent.instructions()`), `agentToolName` (renames the prompt-tool when `expose: 'agent' | 'both'`).
-
- Approval gates (`needsApproval: true`) are dropped on the MCP side — there's no MCP-protocol way to forward "this tool needs human approval" to a remote client. The gate fires only inside the wrapping agent, not for external MCP callers.
-
- ## Providers
-
- | Provider | SDK | Model String | Text | Embeddings | Images | TTS/STT | Reranking | Files |
- |---|---|---|:---:|:---:|:---:|:---:|:---:|:---:|
- | Anthropic | `@anthropic-ai/sdk` | `anthropic/claude-sonnet-4-5` | ✓ | | | | | ✓ |
- | OpenAI | `openai` | `openai/gpt-4o` | ✓ | ✓ | ✓ | ✓ | | ✓ |
- | Google | `@google/genai` | `google/gemini-2.5-pro` | ✓ | ✓ | ✓ | | | ✓ |
- | Cohere | `cohere-ai` | `cohere/rerank-v3.5` | | ✓ | | | ✓ | |
- | Jina | *(none)* | `jina/jina-reranker-v2-base-multilingual` | | ✓ | | | ✓ | |
- | VoyageAI | *(none)* | `voyage/voyage-3-large` | | ✓ | | | ✓ | |
- | ElevenLabs | *(none)* | `elevenlabs/eleven_multilingual_v2` | | | | ✓ | | |
- | Ollama | *(none)* | `ollama/llama3` | ✓ | | | | | |
- | Groq | *(none)* | `groq/llama-3.3-70b` | ✓ | | | | | |
- | DeepSeek | *(none)* | `deepseek/deepseek-chat` | ✓ | | | | | |
- | xAI | *(none)* | `xai/grok-3` | ✓ | | | | | |
- | Mistral | *(none)* | `mistral/mistral-large` | ✓ | ✓ | | | | |
- | Azure OpenAI | `openai` | `azure/gpt-4o` | ✓ | | | | | |
- | OpenRouter | `openai` | `openrouter/anthropic/claude-3.5-sonnet` | ✓ | | | | | |
- | AWS Bedrock | `@aws-sdk/client-bedrock-runtime` | `bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0` | ✓ | | | | | |
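The model strings in the table follow a `provider/model` convention, but some model IDs contain slashes of their own (`openrouter/anthropic/claude-3.5-sonnet`). The sketch below parses them by splitting on the first slash only. `parseModelString` is hypothetical and reflects only the strings shown in the table.

```ts
interface ParsedModel {
  provider: string
  model: string
}

// Split on the first "/" only, so nested IDs like OpenRouter's stay intact.
function parseModelString(id: string): ParsedModel {
  const slash = id.indexOf('/')
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`)
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) }
}

const routed = parseModelString('openrouter/anthropic/claude-3.5-sonnet')
// routed.provider is 'openrouter'; routed.model is 'anthropic/claude-3.5-sonnet'
```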
-
- ## Notes
-
- - Provider SDKs are optional dependencies — install only what you use
- - `exactOptionalPropertyTypes` compatible
- - All adapters lazy-load their SDK on first use
- - Ollama, Groq, DeepSeek, xAI, Mistral, OpenRouter reuse the OpenAI adapter (OpenAI-compatible API)
- - Cohere requires the `cohere-ai` SDK; Jina uses direct HTTP (no SDK needed)
- - Bedrock uses the AWS credential chain (env vars / IAM roles / `~/.aws/credentials`); v1 supports Anthropic Claude models on Bedrock
+ | `@rudderjs/ai` | `@gemstack/ai-sdk` |
+ | `@rudderjs/ai/server` | `@gemstack/ai-sdk/server` |
+ | `@rudderjs/ai/node` | `@gemstack/ai-sdk/node` |
+ | `@rudderjs/ai/mcp` | `@gemstack/ai-sdk/mcp` |
+ | `@rudderjs/ai/eval` | `@gemstack/ai-sdk/eval` |
+ | `@rudderjs/ai/computer-use` | `@gemstack/ai-sdk/computer-use` |
+ | `@rudderjs/ai/react` | `@gemstack/ai-sdk/react` |
+ | `@rudderjs/ai/*` | `@gemstack/ai-sdk/*` |
+
+ See the `@gemstack/ai-sdk` README for full documentation.