@dbx-tools/appkit-mastra 0.1.5 → 0.1.12

package/README.md ADDED
@@ -0,0 +1,728 @@
1
+ # @dbx-tools/appkit-mastra
2
+
3
+ An AppKit plugin that hosts [Mastra](https://mastra.ai) agents inside a
4
+ Databricks App with user-scoped workspace auth (OBO), optional
5
+ Lakebase-backed memory, and an AI SDK chat route the React client can
6
+ consume with `useChat()`.
7
+
8
+ The plugin is designed so that wiring it up looks the same as the
9
+ AppKit
10
+ [`agents`](https://developers.databricks.com/docs/appkit/v0/plugins/agents)
11
+ plugin - same `createAgent` / `tool` helpers, same `tools(plugins)`
12
+ callback shape, same `ToolkitOptions`. Switching between the two for a
13
+ given agent is a one-line import change.
14
+
15
+ ## Quick start
16
+
17
+ The pattern below is the direct counterpart of AppKit's `agents` plugin
18
+ example - swap `agents` for `mastra` and the imports stay structurally
19
+ identical:
20
+
21
+ ```ts
22
+ import { analytics, createApp, files, lakebase, server } from "@databricks/appkit";
23
+ import { createAgent, mastra, tool } from "@dbx-tools/appkit-mastra";
24
+ import { z } from "zod";
25
+
26
+ const support = createAgent({
27
+ instructions: "You help customers with data and files.",
28
+ tools(plugins) {
29
+ return {
30
+ ...plugins.analytics.toolkit(), // every analytics tool
31
+ ...plugins.files.toolkit({ only: ["uploads.read"] }), // filtered subset
32
+ get_weather: tool({
33
+ description: "Weather",
34
+ schema: z.object({ city: z.string() }),
35
+ execute: async ({ city }) => `Sunny in ${city}`,
36
+ }),
37
+ };
38
+ },
39
+ });
40
+
41
+ await createApp({
42
+ plugins: [
43
+ server(),
44
+ analytics(),
45
+ files(),
46
+ // Drop `lakebase()` in and `mastra` auto-enables per-agent thread
47
+ // storage (`schemaName: "mastra_<agentId>"`) plus a shared
48
+ // semantic-recall vector index. Skip it for a stateless agent.
49
+ lakebase(),
50
+ mastra({ agents: support }),
51
+ ],
52
+ });
53
+ ```
54
+
55
+ `createAgent` is a no-op identity helper that anchors type inference.
56
+ `tool` is the AppKit-shaped factory (`{ description, schema, execute }`)
57
+ that auto-adapts to Mastra's `createTool` under the hood.
58
+
59
+ Memory + storage cascades:
60
+
61
+ - **No `lakebase()` registered** ▸ agent is fully stateless. No threads,
62
+ no recall. Same as `mastra()` alone.
63
+ - **`lakebase()` registered, no `storage` / `memory` config** ▸ both
64
+ turn on automatically. Each agent gets its own `PostgresStore` schema; every
65
+ agent shares one `PgVector` recall index.
66
+ - **Per-agent opt-out** ▸ `createAgent({ ..., memory: false, storage: false })`
67
+ for routing / one-shot agents that don't need history.
68
+ - **Per-agent override** ▸ pass a `PgVectorConfig` / `PostgresStoreConfig` object on the agent for a private index or a shared external schema.
69
+
70
+ See [Memory + storage](#memory--storage) for the full cascade and worked
71
+ examples.
72
+
73
+ On the React side, never hardcode `/api/mastra/...`. Pull the published
74
+ paths from `usePluginClientConfig` and use the `chatUrl` helper. Import
75
+ them from the dependency-free `@dbx-tools/appkit-mastra-shared` package
76
+ so your browser bundle doesn't pull in `pg`, `fastembed`, or Mastra:
77
+
78
+ ```tsx
79
+ import { usePluginClientConfig } from "@databricks/appkit-ui/react";
80
+ import { chatUrl, type MastraClientConfig } from "@dbx-tools/appkit-mastra-shared";
81
+ import { useChat } from "@ai-sdk/react";
82
+ import { DefaultChatTransport } from "ai";
83
+ import { useMemo } from "react";
84
+
85
+ function Chat() {
86
+ const config = usePluginClientConfig<MastraClientConfig>("mastra");
87
+ const transport = useMemo(
88
+ () => new DefaultChatTransport({ api: chatUrl(config) }),
89
+ [config],
90
+ );
91
+ const { messages, sendMessage } = useChat({ transport });
92
+ // ...
93
+ }
94
+ ```
95
+
96
+ See [Client wiring](#client-wiring) for the full `MastraClientConfig`
97
+ shape and per-agent selection.
98
+
99
+ ## Sensible defaults
100
+
101
+ The plugin is opinionated about what "no config" should mean. Everything
102
+ below can be overridden, but the bare path works:
103
+
104
+
105
+ | Scenario | What you get |
106
+ | --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
107
+ | `mastra()` | One built-in `default` analyst agent, model from `/serving-endpoints`; memory + storage auto-on if the `lakebase` plugin is registered |
108
+ | `mastra({ agents: def })` | Single-agent shorthand - `def` is registered and marked as default |
109
+ | `mastra({ agents: [def1, def2] })` | Array shorthand - keys come from each `def.name` (or `agent_<i>`); first one is default |
110
+ | `mastra({ agents: { x: def, y: def }})` | Record - keys are the registered ids; first key is the default |
111
+ | No `defaultAgent` set | First registered agent wins |
112
+ | No `model` on an agent | Falls back to `config.defaultModel`, then `DATABRICKS_SERVING_ENDPOINT_NAME`, then walks `defaultModelFallbacks` (Thinking → Balanced → Fast tiers, Claude ↔ GPT ↔ Gemini interleaved within each, then open-weights) and picks the first endpoint actually present in the workspace |
113
+ | No `name` on a definition | Uses the registry key as `Agent.name` |
114
+ | No `tools` on an agent | Inherits plugin-level `config.tools` ambient set (if any) |
115
+ | No `storage` / `memory` and `lakebase()` registered | Both auto-default to `true`. Pass `false` (or a custom config object) on the plugin or an agent to opt out / override. |
116
+ | `storage` / `memory` on an agent | Cascades from plugin: storage namespaces **per-agent** (`schemaName: "mastra_<agentId>"`), vector recall is **shared** across agents on one `PgVector` singleton. |
117
+
118
+
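The three `agents` shapes in the table above can be normalized into one registry. The following is a minimal sketch, not the plugin's actual code: `AgentDef`, `normalizeAgents`, the `"default"` key used for the single-agent shorthand, and the zero-based `agent_<i>` index are all assumptions made for illustration.

```ts
// Hypothetical minimal definition shape; the real MastraAgentDefinition has more fields.
interface AgentDef {
  name?: string;
  instructions: string;
}

type AgentsInput = AgentDef | AgentDef[] | Record<string, AgentDef>;

interface Registry {
  agents: Record<string, AgentDef>;
  defaultAgent: string | undefined;
}

// Assumption: a single definition is recognised by its string `instructions` field.
function isSingleDef(input: AgentsInput): input is AgentDef {
  return !Array.isArray(input) && typeof (input as AgentDef).instructions === "string";
}

function normalizeAgents(input: AgentsInput, defaultAgent?: string): Registry {
  const agents: Record<string, AgentDef> = {};
  if (Array.isArray(input)) {
    // Array shorthand: key is `def.name`, else `agent_<i>` (index base assumed to be 0).
    input.forEach((def, i) => {
      agents[def.name ?? `agent_${i}`] = def;
    });
  } else if (isSingleDef(input)) {
    // Single-agent shorthand: the "default" key is an assumption.
    agents[input.name ?? "default"] = input;
  } else {
    // Record: keys are the registered ids, in insertion order.
    Object.assign(agents, input);
  }
  // No `defaultAgent` set: the first registered agent wins.
  return { agents, defaultAgent: defaultAgent ?? Object.keys(agents)[0] };
}
```

Passing an explicit `defaultAgent` as the second argument overrides the first-registered rule, mirroring the `defaultAgent` plugin field.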
119
+ Every field on `MastraAgentDefinition`:
120
+
121
+
122
+ | Field | Description |
123
+ | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
124
+ | `name` | Display name. Defaults to the registry key. |
125
+ | `description` | Long-form description, surfaced as `Agent.description`. |
126
+ | `instructions` | System prompt body. Required. |
127
+ | `model` | Per-agent model override. String = `modelId` sugar; otherwise a Mastra `DynamicArgument`. |
128
+ | `tools` | Plain record OR `(plugins) => tools` callback (see below). |
129
+ | `memory` | `false`, `true`, or a `PgVectorConfig`. Cascades from `config.memory`. **Default: shared singleton `PgVector` across every agent** - object override switches to a dedicated index. |
130
+ | `storage` | `false`, `true`, or a `PostgresStoreConfig`. Cascades from `config.storage`. **Default: per-agent namespace** via `schemaName: "mastra_<agentId>"` so threads stay isolated. |
131
+
132
+
133
+ Plugin-level fields:
134
+
135
+
136
+ | Field | Description |
137
+ | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
138
+ | `agents` | The registry. Omit for a built-in `default` analyst. |
139
+ | `defaultAgent` | Id that `chatRoute` binds to when no `:agentId` is supplied. Defaults to first registered. |
140
+ | `defaultModel` | Fallback for any agent that omits `model`. Same shape (string sugar or `DynamicArgument`). |
141
+ | `defaultModelFallbacks` | Priority-ordered list walked when no `model` / env / override is set. First entry whose endpoint exists in the workspace wins. Default chains the three `ModelTier`s (Thinking → Balanced → Fast); within each tier providers are interleaved Claude ↔ GPT ↔ Gemini with open-weights appended. Compose your own with `modelsForTier(ModelTier.Fast)` or read straight from `MODEL_CATALOG`. |
142
+ | `tools` | Ambient tools merged into every agent (per-agent tools win on collisions). |
143
+ | `storage` | `undefined` (default) auto-enables when `lakebase()` is registered; `true` does the same explicitly; `false` opts out; an object opens a dedicated `PostgresStore`. Per-agent default is `schemaName: "mastra_<agentId>"`. |
144
+ | `memory` | `undefined` (default) auto-enables when `lakebase()` is registered; `true` does the same explicitly; `false` opts out; an object opens a dedicated `PgVector`. Default behavior: one shared `PgVector` singleton across every agent. |
145
+ | `modelFuzzyMatch` | `false` to disable fuzzy snapping of model ids against the workspace's Model Serving catalogue. Defaults to `true`. |
146
+ | `modelFuzzyThreshold` | Fuse.js score threshold (`0` exact, `1` anything). Defaults to `0.4`. |
147
+ | `modelCacheTtlMs` | TTL for the cached endpoint list, per workspace host. Defaults to 5 minutes. Concurrent callers share one in-flight fetch. |
148
+ | `modelOverride` | `false` to disable per-request `X-Mastra-Model` / `?model=` / body overrides. Defaults to `true`. |
149
+ | `styleInstructions` | Style guardrails appended to every agent's `instructions` to curb LLM-isms (em dashes, emojis, sycophantic openers, throwaway closers). `undefined` (default) uses the built-in `DEFAULT_STYLE_INSTRUCTIONS`; a string replaces it; `false` disables. |
150
+
151
+
152
+ ## The `tools(plugins)` callback
153
+
154
+ Each registered agent can supply either a static `tools: { ... }` record
155
+ or a `tools(plugins)` callback. The returned record accepts **any**
156
+ tool shape Mastra understands:
157
+
158
+ - Mastra tools built with `createTool` or `new Tool(...)`
159
+ - AppKit-shaped tools built with the `tool()` wrapper (below)
160
+ - Vercel AI SDK tools (`tool({...})` from `ai`)
161
+ - Provider-defined tools (e.g. `openai.tools.webSearch(...)`)
162
+ - Toolkits returned from `plugins.<name>.toolkit(...)`
163
+
164
+ ```ts
165
+ tools(plugins) {
166
+ return {
167
+ // Sibling plugin toolkits.
168
+ ...plugins.analytics.toolkit(), // every analytics tool
169
+ ...plugins.files.toolkit({ only: ["uploads.read"] }), // filtered subset
170
+
171
+ // AppKit-shaped inline tool.
172
+ get_weather: tool({
173
+ description: "Weather",
174
+ schema: z.object({ city: z.string() }),
175
+ execute: async ({ city }) => `Sunny in ${city}`,
176
+ }),
177
+
178
+ // Existing Mastra tool dropped in unchanged.
179
+ save_doc: existingMastraTool,
180
+ };
181
+ }
182
+ ```
183
+
184
+ `plugins` is a typed `Record<string, { toolkit(opts?): tools }>` matching
185
+ AppKit's `Plugins` type. It's backed by a runtime Proxy that
186
+ **auto-discovers any registered AppKit `ToolProvider` plugin** -
187
+ `analytics`, `files`, `lakebase`, `genie`, plus any third-party plugin
188
+ that implements the standard `getAgentTools()` + `executeAgentTool()` +
189
+ `toolkit()` interface. Tool calls dispatch through the plugin's
190
+ `executeAgentTool`, so OBO auth (`asUser`) and telemetry spans stay
191
+ intact.
192
+
193
+ `plugins.genie` is special-cased: it swaps the generic AppKit toolkit
194
+ (which only emits a single final result chunk per call) for a
195
+ streaming-aware tools record built on top of the plugin's
196
+ `exports().sendMessage` AsyncGenerator. Each Genie wire event
197
+ (`FETCHING_METADATA`, `ASKING_AI`, `EXECUTING_QUERY`, attached SQL,
198
+ errors) is normalised into a `GenieProgress` payload and pushed
199
+ mid-flight through Mastra's `ctx.writer`, surfacing as
200
+ `tool-output` chunks the React client can render as inline status
201
+ pills and SQL blocks while the LLM is still waiting on the final
202
+ `tool-result`. Tool ids are stable: `genie` for the default alias,
203
+ `genie-<alias>` for additional aliases, and one shared
204
+ `genie_get_conversation`.
205
+
206
+ Genie tool-result shape (LLM-bound):
207
+
208
+ ```
209
+ {
210
+ conversationId?: string, // pass back to continue the same Genie thread
211
+ genieAnswer?: string, // Genie's prose answer; pass through verbatim
212
+ datasets?: [{ // metadata only, one per executed SQL statement
213
+ chartId: string, // embed [[chart:<chartId>]] to render inline
214
+ title?: string,
215
+ description?: string,
216
+ columns: string[],
217
+ rowCount: number,
218
+ sql?: string,
219
+ }],
220
+ suggestedFollowUps?: string[], // UI shows as buttons; don't list in reply
221
+ error?: string,
222
+ }
223
+ ```
224
+
225
+ `datasets[]` is metadata only - column names, row count, the SQL
226
+ Genie ran. The actual rows ride a separate `kind: "chart"` writer
227
+ event so the LLM never has them in context (token cost stays flat
228
+ regardless of dataset size). The model references each dataset by
229
+ its `chartId` via the `[[chart:<chartId>]]` marker to display the
230
+ chart inline; see the `render_data` section below for how the
231
+ client resolves those markers.
232
+
233
+ Genie data flow:
234
+
235
+ - The writer emits `kind: "started" | "status" | "sql" |
236
+ "suggested" | "error"` events for the live loading pill. SQL
237
+ text is shown via a small Shiki-highlighted block.
238
+ - The writer **also** emits two `kind: "chart"` events per
239
+ executed SQL statement, sharing the same `chartId`: the first
240
+ carries `{chartId, title, description?, data}` (rows converted
241
+ to objects keyed by column name with best-effort numeric
242
+ coercion); the second, when the chart-planner agent finishes,
243
+ carries `{chartId, option}` (the resolved ECharts spec). The
244
+ chat client's `<ChartSlot>` merges them by `chartId` and
245
+ renders inline at the matching `[[chart:<chartId>]]` marker.
246
+ This is the exact same wire format as the `render_data` tool,
247
+ so Genie and hand-built charts are indistinguishable on the
248
+ client.
249
+ - After a hard reload, `synthesizeToolEventsFromHistory` rebuilds
250
+ `suggested` events from the persisted tool-result; the SQL pill
251
+ and chart events are live-only and don't replay.
252
+
253
+ ### `render_data` (system-default ambient tool)
254
+
255
+ `buildAgents` registers a system-level `render_data` tool on
256
+ every agent so the model can submit any tabular dataset for
257
+ inline charting. Users can shadow it by including a same-named
258
+ tool in `config.tools` or in a per-agent `tools` map; otherwise
259
+ it's just there.
260
+
261
+ The tool is generic - not coupled to Genie or any particular
262
+ upstream. Input is `{ title, description?, data: Row[] }` where
263
+ `data` is an array of objects keyed by column name (a SQL row
264
+ set, an API response, a hand-built array, etc.).
265
+
266
+ #### How it works (shared pipeline with Genie)
267
+
268
+ Both `render_data` and Genie's `drainGenieStream` route through
269
+ one helper, `emitChartWithPlanning`, so the wire format and
270
+ trace shape are consistent everywhere:
271
+
272
+ 1. Mint a short `chartId` (8 hex chars).
273
+ 2. Push a `{ kind: "chart", chartId, title, description?, data }`
274
+ event to `ctx.writer` immediately. The chat client mounts a
275
+ `<ChartSlot>` showing a "Rendering chart" skeleton.
276
+ 3. Kick off the chart-planner agent
277
+ (`modelForTier(ModelTier.Fast)`) with the dataset in the
278
+ background. The agent's `structuredOutput` schema is a compact
279
+ "chart plan" (chart type + axis labels + categories +
280
+ series); the helper expands the plan into an ECharts
281
+ `EChartsOption` JSON.
282
+ 4. When the planner resolves, push a follow-up
283
+ `{ kind: "chart", chartId, option }` event. The client's
284
+ `<ChartSlot>` merges it with the dataset event (last write
285
+ wins per field) and swaps in `<ReactECharts>`.
286
+ 5. If the planner fails, no follow-up event fires. Once the
287
+ parent tool finishes, the slot transitions to a "couldn't
288
+ render chart" fallback frame.
289
+
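The two-event flow in the steps above (dataset first, option later, merged by `chartId`) can be sketched as follows. This is an illustration under stated assumptions, not the package's `emitChartWithPlanning` or `<ChartSlot>` code: `ChartEvent`, `mintChartId`, and `mergeChartEvents` are hypothetical names, and `Math.random` is used only for brevity.

```ts
type Row = Record<string, unknown>;

// Simplified event shape based on the two `kind: "chart"` payloads described above.
interface ChartEvent {
  kind: "chart";
  chartId: string;
  title?: string;
  description?: string;
  data?: Row[];
  option?: Record<string, unknown>; // stands in for an EChartsOption
}

// Mint a short id of 8 hex characters.
function mintChartId(): string {
  let id = "";
  for (let i = 0; i < 8; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Merge events per chartId; fields on a later event overwrite earlier ones
// (last write wins per field).
function mergeChartEvents(events: ChartEvent[]): Map<string, ChartEvent> {
  const slots = new Map<string, ChartEvent>();
  for (const event of events) {
    const previous = slots.get(event.chartId);
    slots.set(event.chartId, { ...previous, ...event });
  }
  return slots;
}
```

A slot that holds `data` but no `option` corresponds to the "Rendering chart" skeleton; once the follow-up event is merged in, the slot has everything needed to render.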
290
+ The parent tool (`render_data` or Genie) `await`s the planner
291
+ promise(s) before its `execute` returns, so chart latency shows
292
+ up under the tool's trace span. The dataset event fires
293
+ **immediately**, though, so the calling LLM gets back the
294
+ `chartId` synchronously and the chat client has a layout slot
295
+ ready before the planner resolves.
296
+
297
+ The LLM-bound payload is just `{ chartId }` (for `render_data`)
298
+ or `datasets[]` metadata (for Genie); row data and option JSON
299
+ never reach the LLM, so token cost stays flat regardless of
300
+ dataset size.
301
+
302
+ #### Inline placement contract
303
+
304
+ The model embeds `[[chart:<chartId>]]` on its own line in its
305
+ markdown reply at the position where the chart should appear:
306
+
307
+ ```markdown
308
+ ## Audit Score
309
+
310
+ Audit Score is stable at ~94%, hovering between 93.5 and 95.0.
311
+
312
+ [[chart:a3f9c1d2]]
313
+
314
+ ## Service Time
315
+
316
+ Service time is the outlier at 162.5s, up from a target of 150s.
317
+
318
+ [[chart:b7e2d4f1]]
319
+ ```
320
+
321
+ The chat client splits the assistant text on these markers and
322
+ drops a `<ChartSlot>` in at each spot. A marker that arrives
323
+ before its `chart` event (rare; only during fast streaming) shows
324
+ a "Queueing chart" skeleton; a chart whose marker the model
325
+ forgot to place falls through to the end of the reply as a
326
+ fallback. All three states (queueing, rendering, rendered) share
327
+ the same fixed-height frame so the layout doesn't jump as
328
+ charts resolve.
329
+
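Splitting the assistant text on `[[chart:<chartId>]]` markers could look like the sketch below. It assumes chart ids are always 8 lowercase hex characters; `Segment` and `splitOnChartMarkers` are hypothetical names, and the real chat client may handle whitespace and the trailing fallback differently.

```ts
type Segment =
  | { type: "text"; text: string }
  | { type: "chart"; chartId: string };

// Assumes chart ids are 8 hex characters, as minted by the shared pipeline.
const CHART_MARKER = /\[\[chart:([0-9a-f]{8})\]\]/g;

function splitOnChartMarkers(reply: string): Segment[] {
  const segments: Segment[] = [];
  let cursor = 0;
  CHART_MARKER.lastIndex = 0; // reset because the regex is global and shared
  let match: RegExpExecArray | null;
  while ((match = CHART_MARKER.exec(reply)) !== null) {
    // Text between the previous marker (or the start) and this marker.
    if (match.index > cursor) {
      segments.push({ type: "text", text: reply.slice(cursor, match.index) });
    }
    segments.push({ type: "chart", chartId: match[1] });
    cursor = match.index + match[0].length;
  }
  // Trailing text after the last marker.
  if (cursor < reply.length) {
    segments.push({ type: "text", text: reply.slice(cursor) });
  }
  return segments;
}
```

Each `chart` segment is where a slot would be mounted; a chart event whose id never appears in any segment is the case that falls through to the end of the reply.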
330
+ #### Trade-offs
331
+
332
+ - Both events ride the writer, not the persisted tool-result, so
333
+ charts don't re-render after a hard reload. The model can call
334
+ `render_data` again (or re-ask Genie) on the next turn if the
335
+ user wants the chart back.
336
+ - The chart-planner is a separate model call per dataset (fast
337
+ tier, but still ~1-3s each). For an N-chart turn, latency is
338
+ `Genie + max(planners)` since the planners run concurrently
339
+ with the rest of Genie's stream and with each other.
340
+
341
+ Plugins that aren't registered (or don't implement the toolkit
342
+ interface) resolve to `undefined` at runtime, so guard with `?.` /
343
+ `?? {}` when a backing plugin is optional in some environments:
344
+
345
+ ```ts
346
+ tools(plugins) {
347
+ return {
348
+ ...(plugins.analytics?.toolkit() ?? {}),
349
+ ...(plugins.genie?.toolkit({ prefix: "g_" }) ?? {}),
350
+ };
351
+ }
352
+ ```
353
+
354
+ `plugins.<name>.toolkit(opts)` accepts the same `ToolkitOptions` shape
355
+ AppKit's own toolkits expose (passed through verbatim):
356
+
357
+ - `prefix?: string` - prepended to every key (AppKit default: `${pluginName}.`)
358
+ - `only?: string[]` / `except?: string[]` - allow/deny list against the
359
+ local tool name
360
+ - `rename?: Record<string, string>` - remap individual keys
361
+
362
+ ### `tool()` vs `createTool()`
363
+
364
+ The `tool()` factory mirrors `@databricks/appkit/beta`'s shape so
365
+ sharing tool code between the AppKit `agents` plugin and this one is a
366
+ single-line import change:
367
+
368
+ ```ts
369
+ import { tool, createTool } from "@dbx-tools/appkit-mastra";
370
+
371
+ // AppKit-shaped (description / schema / flat-arg execute).
372
+ const weather = tool({
373
+ description: "Weather",
374
+ schema: z.object({ city: z.string() }),
375
+ execute: async ({ city }) => `Sunny in ${city}`,
376
+ });
377
+
378
+ // Full Mastra `createTool` (id required, inputSchema, advanced fields).
379
+ const saveDoc = createTool({
380
+ id: "save-doc",
381
+ description: "Persist a document",
382
+ inputSchema: z.object({ key: z.string(), value: z.any() }),
383
+ outputSchema: z.object({ saved: z.boolean() }),
384
+ requireApproval: true,
385
+ execute: async (input, ctx) => {
386
+ await ctx.mastra?.getStorage()?.set(input.key, input.value);
387
+ return { saved: true };
388
+ },
389
+ });
390
+ ```
391
+
392
+ When `tool()`'s `id` is omitted it's auto-derived from a slugified
393
+ description plus a 6-char FNV-1a base-32 suffix - stable across runs
394
+ so traces stay readable. Pass an explicit `id` when you want to pin
395
+ one.
396
+
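The auto-derived id can be approximated as below. This is a sketch of the idea only: the slug rules, the choice to hash the description itself, and the way the hash is trimmed to 6 base-32 characters are assumptions, so ids produced here will not necessarily match the ones the package generates.

```ts
// 32-bit FNV-1a over the string's UTF-16 code units (identical to bytes for ASCII input).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, wrapping at 32 bits
  }
  return hash >>> 0; // force an unsigned result
}

// Lowercase, collapse non-alphanumerics to "-", trim leading/trailing dashes.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Assumption: the suffix is the last 6 digits of the zero-padded base-32 hash.
function deriveToolId(description: string): string {
  const base32 = "0000000" + fnv1a32(description).toString(32);
  return `${slugify(description)}-${base32.slice(-6)}`;
}
```

Because the hash depends only on the input string, the same description yields the same id on every run, which is the property that keeps traces readable.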
397
+ Reach for `createTool` when you need Mastra-only fields (`outputSchema`,
398
+ `suspendSchema`, `requireApproval`, `mcp`, etc.).
399
+
400
+ ## Model resolution
401
+
402
+ Each agent call resolves a `MastraModelConfig` lazily so concurrent
403
+ requests get distinct user identities. There are two paths through
404
+ the resolver depending on whether the caller asked for a specific
405
+ model.
406
+
407
+ ### Explicit ask (override / agent / plugin / env)
408
+
409
+ When any of these is set the resolver fuzzy-matches that single id
410
+ against the live `/serving-endpoints` list, in priority order:
411
+
412
+ 1. Per-request override (`X-Mastra-Model` header, `?model=` query,
413
+ or `model` / `modelId` body field; see below)
414
+ 2. Per-agent `def.model` (string sugar or `DynamicArgument`)
415
+ 3. Plugin-level `config.defaultModel`
416
+ 4. `DATABRICKS_SERVING_ENDPOINT_NAME` env var
417
+
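The priority order above can be sketched as a first-defined-wins lookup. The input shape and the name `pickExplicitModel` are hypothetical, the sketch only covers string ids (not `DynamicArgument` values), and the relative order of the body `model` and `modelId` fields is an assumption.

```ts
interface OverrideSources {
  header?: string; // X-Mastra-Model
  query?: string; // ?model=
  bodyModel?: string; // body `model`
  bodyModelId?: string; // body `modelId`
}

interface ResolveInput {
  request?: OverrideSources;
  agentModel?: string; // def.model (string sugar only)
  defaultModel?: string; // config.defaultModel
  envModel?: string; // DATABRICKS_SERVING_ENDPOINT_NAME
  modelOverride?: boolean; // plugin-level switch, defaults to true
}

// Returns the highest-priority explicit ask, or undefined when nothing is set
// (in which case the tier-aware fallback list would take over).
function pickExplicitModel(input: ResolveInput): string | undefined {
  const candidates: Array<string | undefined> = [];
  if (input.modelOverride !== false && input.request) {
    const r = input.request;
    candidates.push(r.header, r.query, r.bodyModel, r.bodyModelId);
  }
  candidates.push(input.agentModel, input.defaultModel, input.envModel);
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") return candidate.trim();
  }
  return undefined;
}
```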
418
+ The matcher is `fuse.js` extended search with tokens split on
419
+ non-word characters and AND-joined. Exact matches win immediately;
420
+ loose tokens like `"claude sonnet"` snap to
421
+ `databricks-claude-sonnet-4-6`, `"llama 70b"` to
422
+ `databricks-meta-llama-3-3-70b-instruct`, `"DBRX"` to
423
+ `databricks-dbrx-instruct`, and so on. If no candidate scores below
424
+ `modelFuzzyThreshold` (default `0.4`) the input is returned verbatim
425
+ and Databricks surfaces the canonical 404.
426
+
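To make the token behaviour concrete, here is a simplified stand-in that uses plain substring checks instead of `fuse.js` scoring. It is not the plugin's matcher: there is no score or threshold, and the shortest-name tie-break is an assumption. It only illustrates "split on non-word characters, AND-join the tokens, fall back to the verbatim input".

```ts
// Split on non-word characters and lowercase, dropping empty tokens.
function tokenize(query: string): string[] {
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((token) => token.length > 0);
}

function snapToEndpoint(query: string, endpoints: string[]): string {
  // Exact matches win immediately.
  if (endpoints.indexOf(query) !== -1) return query;

  const tokens = tokenize(query);
  if (tokens.length === 0) return query;

  // AND-join: every token must appear somewhere in the endpoint name.
  const hits = endpoints.filter((name) => {
    const lower = name.toLowerCase();
    return tokens.every((token) => lower.indexOf(token) !== -1);
  });

  // Assumed tie-break: prefer the shortest matching name.
  hits.sort((a, b) => a.length - b.length);

  // No candidate: return the input verbatim and let the backend report the error.
  return hits.length > 0 ? hits[0] : query;
}
```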
427
+ ### No explicit ask (tier-aware fallback list)
428
+
429
+ When nothing is set, the resolver walks an opinionated
430
+ priority-ordered list and returns **the first id that is actually
431
+ present in the workspace's endpoint listing**. This is how a workspace
432
+ without Claude Opus still gets a sensible default automatically -
433
+ the resolver skips ahead to whichever Sonnet / GPT-5 / Gemini / Llama
434
+ variant is wired up.
435
+
436
+ The catalogue is grouped two ways:
437
+
438
+ - By **capability tier** via the `ModelTier` enum:
439
+ `ModelTier.Thinking` (deepest reasoning), `ModelTier.Balanced`
440
+ (cost/latency sweet spot), `ModelTier.Fast` (cheap & quick for
441
+ classification / routing / simple summarisation).
442
+ - By **provider** within each tier: `claude`, `gpt`, `gemini`,
443
+ `openSource`.
444
+
445
+ Both views live on `MODEL_CATALOG[tier][provider]`. The walked
446
+ `FALLBACK_MODEL_IDS` chains the three tiers in descending power
447
+ (Thinking → Balanced → Fast); within each tier providers are
448
+ round-robin-zipped (Claude ↔ GPT ↔ Gemini) before the open-weights
449
+ tail is appended as the universal floor.
450
+
451
+
452
+ | Tier (most powerful first) | Claude | GPT | Gemini | Open weights |
453
+ | -------------------------- | -------------------------------- | ------------------------------------- | ------------------------------- | ---------------------------------------------- |
454
+ | `ModelTier.Thinking` | Opus 4.8 → 4.7 → 4.6 → 4.5 → 4.1 | 5.5 Pro | 3.1 Pro → 3 Pro → 2.5 Pro | Llama 4 Maverick, GPT-OSS 120B, Llama 3.1 405B |
455
+ | `ModelTier.Balanced` | Sonnet 4.6 → 4.5 → 4 | 5.5 → 5.4 → 5.2 → 5.1 → 5 | 3.5 Flash → 3 Flash → 2.5 Flash | Llama 3.3 70B, Qwen3-Next 80B, Qwen35 122B |
456
+ | `ModelTier.Fast` | Haiku 4.5 | 5.4 mini → 5.4 nano → 5 mini → 5 nano | 3.1 Flash Lite | GPT-OSS 20B, Gemma 3 12B, Llama 3.1 8B |
457
+
458
+
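The way the fallback list is assembled and walked can be sketched as follows. The placeholder ids in the usage note are made up, `TierCatalog`, `buildFallbacks`, and `pickFirstPresent` are hypothetical names, and appending the open-weights tail per tier (rather than once at the very end) is an assumption about how the description above should be read.

```ts
type Provider = "claude" | "gpt" | "gemini" | "openSource";
type TierCatalog = Record<Provider, string[]>;

// Interleave lists: first item of each, then second item of each, and so on.
function roundRobin(lists: string[][]): string[] {
  const out: string[] = [];
  const longest = Math.max(0, ...lists.map((list) => list.length));
  for (let i = 0; i < longest; i++) {
    for (const list of lists) {
      if (i < list.length) out.push(list[i]);
    }
  }
  return out;
}

// Chain tiers in the order given (most powerful first); within each tier,
// zip Claude / GPT / Gemini and then append the open-weights tail.
function buildFallbacks(tiers: TierCatalog[]): string[] {
  const out: string[] = [];
  for (const tier of tiers) {
    out.push(...roundRobin([tier.claude, tier.gpt, tier.gemini]), ...tier.openSource);
  }
  return out;
}

// First id present in the workspace wins; if none is present, return the
// first entry and let the backend surface the error.
function pickFirstPresent(fallbacks: string[], available: string[]): string | undefined {
  for (const id of fallbacks) {
    if (available.indexOf(id) !== -1) return id;
  }
  return fallbacks[0];
}
```

For example, with a Thinking tier of `claude: ["c1", "c2"]`, `gpt: ["g1"]`, `gemini: ["m1", "m2"]`, `openSource: ["o1"]`, the walked order for that tier is `c1, g1, m1, c2, m2, o1`.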
459
+ #### Pick a tier-appropriate model for one agent
460
+
461
+ Use `modelForTier(tier)` to grab the top of a tier as a string; the
462
+ agent-step resolver fuzzy-matches it against the live catalogue at
463
+ call time so it still works when the literal top pick isn't deployed.
464
+
465
+ ```ts
466
+ import { createAgent, ModelTier, modelForTier } from "@dbx-tools/appkit-mastra";
467
+
468
+ const classifier = createAgent({
469
+ instructions: "Classify this email into one of: billing, support, spam.",
470
+ model: modelForTier(ModelTier.Fast),
471
+ });
472
+
473
+ const planner = createAgent({
474
+ instructions: "Plan a multi-step data migration.",
475
+ model: modelForTier(ModelTier.Thinking),
476
+ });
477
+ ```
478
+
479
+ #### Bias the plugin-level fallback toward a tier
480
+
481
+ `modelsForTier(tier)` returns the priority-ordered list for one tier;
482
+ pass it to `defaultModelFallbacks` to scope the auto-resolver:
483
+
484
+ ```ts
485
+ import { mastra, ModelTier, modelsForTier } from "@dbx-tools/appkit-mastra";
486
+
487
+ mastra({
488
+ // All agents that omit `model` will land on a Fast-tier endpoint.
489
+ defaultModelFallbacks: modelsForTier(ModelTier.Fast),
490
+ });
491
+ ```
492
+
493
+ #### Pin a custom approved subset
494
+
495
+ Mix in your own endpoint names (internal fine-tunes, regulated
496
+ allowlists, etc.) in front of the catalogue:
497
+
498
+ ```ts
499
+ mastra({
500
+ defaultModelFallbacks: [
501
+ "my-org-finetune-v2", // try internal endpoint first
502
+ "databricks-claude-sonnet-4-6", // approved fallback
503
+ ],
504
+ });
505
+ ```
506
+
507
+ If the workspace has none of the listed ids, the first entry in the list is
508
+ returned and Databricks surfaces the canonical error.
509
+
510
+ The endpoint list is cached per workspace host through AppKit's
511
+ built-in `CacheManager` (`CacheManager.getInstanceSync().getOrExecute`),
512
+ which is the TypeScript counterpart of Python's `cachetools.TTLCache`
513
+ plus `cachetools-async` rolled into one: per-entry TTL (default 5
514
+ minutes via `modelCacheTtlMs`), bounded size, in-flight request
515
+ coalescing (the manager's internal `inFlightRequests` map shares one
516
+ fetch across every concurrent caller), telemetry spans, and optional
517
+ Lakebase persistence when the `lakebase` plugin is wired up. No extra
518
+ dependency lives in this package; the catalogue piggybacks on whatever
519
+ storage backend AppKit picked at boot.
520
+
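The two behaviours that matter most here, per-entry TTL and in-flight coalescing, can be illustrated with a small standalone cache. This is not AppKit's `CacheManager`: `createTtlCache` is a hypothetical helper, and bounded size, telemetry, and Lakebase persistence are deliberately left out.

```ts
interface TtlCache<V> {
  getOrExecute(key: string, fetcher: () => Promise<V>): Promise<V>;
  clear(): void;
}

function createTtlCache<V>(ttlMs: number, now: () => number = Date.now): TtlCache<V> {
  const values = new Map<string, { value: V; expiresAt: number }>();
  const inFlight = new Map<string, Promise<V>>();

  return {
    getOrExecute(key, fetcher) {
      // Fresh cached value: no fetch at all.
      const hit = values.get(key);
      if (hit && hit.expiresAt > now()) return Promise.resolve(hit.value);

      // A fetch for this key is already running: share its promise.
      const pending = inFlight.get(key);
      if (pending) return pending;

      const request = fetcher().then(
        (value) => {
          inFlight.delete(key);
          values.set(key, { value, expiresAt: now() + ttlMs });
          return value;
        },
        (error) => {
          // Failures are not cached; the next caller retries.
          inFlight.delete(key);
          throw error;
        },
      );
      inFlight.set(key, request);
      return request;
    },
    clear() {
      values.clear();
      inFlight.clear();
    },
  };
}
```

Keying the cache by workspace host, as the paragraph above describes, would mean every concurrent request against the same host awaits a single endpoint listing.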
521
+ String values (`"databricks-claude-sonnet-4-6"`) are `modelId` sugar
522
+ layered on top of the auto-resolver - workspace URL, provider, and OBO
523
+ auth stay default. Pass a `DynamicArgument<MastraModelConfig>` on
524
+ `def.model` / `config.defaultModel` when you need full control over
525
+ auth, provider, or URL; that path bypasses the fuzzy matcher and
526
+ per-request override.
527
+
528
+ ### `GET /api/mastra/models`
529
+
530
+ The plugin exposes the cached endpoint catalogue at `/models` (mounted
531
+ under the plugin prefix, default `/api/mastra`) so clients can populate
532
+ model pickers and validate `?model=` choices without a separate
533
+ Databricks SDK round-trip:
534
+
535
+ ```bash
536
+ curl -s http://localhost:8000/api/mastra/models | jq
537
+ # {
538
+ # "endpoints": [
539
+ # { "name": "databricks-claude-sonnet-4-6", "task": "llm/v1/chat", "state": "READY", ... },
540
+ # { "name": "databricks-meta-llama-3-3-70b-instruct", ... },
541
+ # ...
542
+ # ]
543
+ # }
544
+ ```
545
+
546
+ Same payload from a sibling plugin or script (no HTTP round-trip):
547
+
548
+ ```ts
549
+ import { pluginUtils } from "@dbx-tools/appkit-shared";
550
+ import { mastra } from "@dbx-tools/appkit-mastra";
551
+
552
+ const m = pluginUtils.require(this.context, mastra).exports();
553
+ const endpoints = await m.asUser(req).listModels(); // user-scoped
554
+ m.clearModelsCache(); // force the next call to re-fetch
555
+ ```
556
+
557
+ ### Per-request model override
558
+
559
+ Any in-flight request can pick a different backing endpoint without
560
+ redeploying. Sources, checked in priority order:
561
+
562
+
563
+ | Source | Example |
564
+ | ------------------------- | ------------------------------------------------ |
565
+ | `X-Mastra-Model` header | `curl -H 'X-Mastra-Model: claude-haiku' ...` |
566
+ | `?model=` query parameter | `POST /api/mastra/route/chat?model=llama-70b` |
567
+ | Body `model` or `modelId` | `{ "messages": [...], "model": "claude-haiku" }` |
568
+
569
+
570
+ The override flows through the same fuzzy matcher as static ids, so
571
+ `X-Mastra-Model: claude sonnet` still snaps to
572
+ `databricks-claude-sonnet-4-6`. Set `modelOverride: false` on the
573
+ plugin config to disable the override path entirely (e.g. for a
574
+ multi-tenant deployment where untrusted clients shouldn't pick the
575
+ endpoint).
576
+
577
+ ## Memory + storage
578
+
579
+ Memory and storage are two independent knobs, and both turn on automatically
580
+ the moment the `lakebase` plugin is registered. Bare `mastra()` next to
581
+ `lakebase()` already gets you per-agent threads + shared semantic recall;
582
+ zero extra config required.
583
+
584
+
585
+ | Knob | Default when `lakebase()` is registered | What it backs |
586
+ | --------- | ----------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
587
+ | `storage` | **Per-agent** `PostgresStore` namespaced by `schemaName: "mastra_<agentId>"` so threads + messages stay isolated. | Mastra threads, messages, working memory. |
588
+ | `memory` | **Shared singleton** `PgVector` across every agent (cross-agent semantic recall on one index). | RAG-style recall over past messages via FastEmbed vectors. |
589
+
590
+
591
+ Override either at the plugin level, the agent level, or both. The agent
592
+ value wins when set; otherwise the plugin value cascades.
593
+
594
+ ```ts
595
+ mastra({
596
+ // Plugin defaults. Either field becomes the cascading baseline.
597
+ // Omit entirely to inherit "auto-on when lakebase is present".
598
+ storage: true, // (default behavior when lakebase is registered)
599
+ memory: true, // (default behavior when lakebase is registered)
600
+
601
+ agents: {
602
+ analyst: createAgent({
603
+ instructions: "...",
604
+ // No overrides: inherits the auto-on defaults above.
605
+ // - threads stored under schema "mastra_analyst"
606
+ // - recalls from the shared vector index
607
+ }),
608
+
609
+ router: createAgent({
610
+ instructions: "Stateless routing agent.",
611
+ // Opt out of both for a fully stateless agent.
612
+ storage: false,
613
+ memory: false,
614
+ }),
615
+
616
+ legal: createAgent({
617
+ instructions: "Compliance-bounded assistant.",
618
+ // Private vector index so legal's recall doesn't bleed into
619
+ // analyst's. Threads still get their own per-agent schema.
620
+ memory: { connectionString: process.env.LEGAL_PG_URL!, /* ... */ },
621
+ }),
622
+
623
+ archive: createAgent({
624
+ instructions: "Read-only archive viewer.",
625
+ // Pin to a specific schema (e.g. shared with another service).
626
+ storage: {
627
+ schemaName: "shared_history",
628
+ pool: archivePool,
629
+ },
630
+ }),
631
+ },
632
+ });
633
+ ```
634
+
635
+ Notes:
636
+
637
+ - `PostgresStore` runs `CREATE SCHEMA IF NOT EXISTS` on `init()`, so
638
+ per-agent schemas spring into existence the first time an agent saves
639
+ a message. No bundle / migration step required.
640
+ - Removing `lakebase()` from your plugin list while leaving `storage` /
641
+ `memory` truthy fails fast at setup with a clear "lakebase plugin not
642
+ registered" error.
643
+ - The `lakebase` plugin is declared as a **required** resource only when
644
+ `storage` / `memory` is explicitly truthy at registration time. Auto-on
645
+ defaults activate inside `setup:complete`, after lakebase is already
646
+ proven to be present.
647
+
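The cascade and the notes above can be condensed into one resolution function. This is a sketch under assumptions: `resolveKnob` and `defaultSchemaName` are hypothetical names, and treating a custom config object as "truthy, therefore requires lakebase" follows the wording of the notes literally rather than any confirmed behaviour.

```ts
// `undefined` means "not set at this level"; an object is a custom config.
type Knob<C> = boolean | C | undefined;

function resolveKnob<C extends object>(
  agentValue: Knob<C>,
  pluginValue: Knob<C>,
  lakebaseRegistered: boolean,
): boolean | C {
  // The agent value wins when set; otherwise the plugin value cascades.
  const explicit = agentValue !== undefined ? agentValue : pluginValue;

  // Nothing set anywhere: auto-on only when lakebase is present.
  if (explicit === undefined) return lakebaseRegistered;

  // Explicit opt-out always works.
  if (explicit === false) return false;

  // Truthy value without lakebase: fail fast.
  if (!lakebaseRegistered) throw new Error("lakebase plugin not registered");
  return explicit;
}

// Per-agent storage namespace used when no custom schema is given.
function defaultSchemaName(agentId: string): string {
  return `mastra_${agentId}`;
}
```

With the example registry above, `router` resolves to `false` for both knobs, while `analyst` resolves to `true` and stores threads under `mastra_analyst`.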
648
+ ## Runtime exports
649
+
650
+ Other plugins / route handlers can introspect the registry via the
651
+ `exports()` surface, modeled on AppKit's:
652
+
653
+ ```ts
654
+ import { pluginUtils } from "@dbx-tools/appkit-shared";
655
+ import { mastra } from "@dbx-tools/appkit-mastra";
656
+
657
+ const m = pluginUtils.require(this.context, mastra).exports();
658
+ m.list(); // ["analyst", "helper"]
659
+ m.get("analyst"); // Agent | null
660
+ m.getDefault(); // Agent | null
661
+ m.getMastra(); // underlying Mastra instance (advanced)
662
+ m.listModels(); // Promise<ServingEndpointSummary[]> - cached + OBO when wrapped with asUser(req)
663
+ m.clearModelsCache(); // force the next listModels() to re-fetch
664
+ ```
665
+
666
+ ## Client wiring
667
+
668
+ `clientConfig()` publishes the mount paths, default agent id, and the
669
+ full registry to `usePluginClientConfig("mastra")` so the React client
670
+ never has to hardcode `/api/mastra` or rely on `DEFAULT_AGENT_ID`
671
+ constants. A tiny URL helper (`chatUrl`) and the `MastraClientConfig`
672
+ type ship from the standalone `@dbx-tools/appkit-mastra-shared`
673
+ package; that package is pure (no `pg` / `fastembed` / Mastra
674
+ dependencies) so it imports cleanly into Vite / Webpack / esbuild
675
+ builds.
676
+
677
+ ```tsx
678
+ import { usePluginClientConfig } from "@databricks/appkit-ui/react";
679
+ import { chatUrl, type MastraClientConfig } from "@dbx-tools/appkit-mastra-shared";
680
+ import { useChat } from "@ai-sdk/react";
681
+ import { DefaultChatTransport } from "ai";
682
+ import { useMemo, useState } from "react";
683
+
684
+ function Chat() {
685
+ const config = usePluginClientConfig<MastraClientConfig>("mastra");
686
+ const [selected, setSelected] = useState<string>();
687
+ const api = chatUrl(config, selected); // defaults to config.defaultAgent
688
+
689
+ const transport = useMemo(() => new DefaultChatTransport({ api }), [api]);
690
+ const { messages, sendMessage } = useChat({ transport });
691
+
692
+ return (
693
+ <>
694
+ <select value={selected ?? config.defaultAgent} onChange={(e) => setSelected(e.target.value)}>
695
+ {config.agents.map((id) => (
696
+ <option key={id} value={id}>
697
+ {id}
698
+ </option>
699
+ ))}
700
+ </select>
701
+ {/* render messages, etc. */}
702
+ </>
703
+ );
704
+ }
705
+ ```
706
+
707
+ `MastraClientConfig` fields (all derived from the server-side plugin
708
+ mount, so a custom `mastra({ name: "myMastra" })` rewrites every path):
709
+
710
+
711
+ | Field | Example | Description |
712
+ | ------------------ | ----------------------------------- | -------------------------------------------------------- |
713
+ | `basePath` | `"/api/mastra"` | Plugin mount path. |
714
+ | `chatPath` | `"/api/mastra/route/chat"` | Default-agent chat URL. Use `chatUrl(config)` to get it. |
715
+ | `chatPathTemplate` | `"/api/mastra/route/chat/:agentId"` | OpenAPI-style template for tools / docs. |
716
+ | `modelsPath` | `"/api/mastra/models"` | `GET` cached endpoint catalogue. |
717
+ | `defaultAgent` | `"analyst"` | Agent id `chatRoute` binds to when none is supplied. |
718
+ | `agents` | `["analyst", "helper"]` | Every registered agent id in order. |
719
+
720
+
721
+ `chatUrl(config, agentId?)` returns `config.chatPath` for the default
722
+ agent (the registered `chatRoute` mount that omits `:agentId`), and
723
+ `${config.chatPath}/${encodeURIComponent(agentId)}` otherwise. Pure
724
+ function: no React, no hooks, safe in service workers and SSR.
725
+
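For reference, the behaviour described for `chatUrl` fits in a few lines. The sketch below is a re-implementation for illustration, not the code shipped in `@dbx-tools/appkit-mastra-shared`; `ClientConfigLike` and `chatUrlSketch` are hypothetical names, and treating an `agentId` equal to `config.defaultAgent` the same as an omitted one is an assumption.

```ts
// Only the fields this sketch needs from MastraClientConfig.
interface ClientConfigLike {
  chatPath: string;
  defaultAgent: string;
}

function chatUrlSketch(config: ClientConfigLike, agentId?: string): string {
  // Default agent: use the mount that omits `:agentId`.
  if (agentId === undefined || agentId === config.defaultAgent) {
    return config.chatPath;
  }
  // Any other agent: append the URL-encoded id.
  return `${config.chatPath}/${encodeURIComponent(agentId)}`;
}
```

Because the function touches nothing but its arguments, it behaves the same in a component, a service worker, or a server render.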
726
+ ## License
727
+
728
+ Apache-2.0