@mastra/mcp-docs-server 1.1.0 → 1.1.1-alpha.0
- package/.docs/docs/memory/observational-memory.md +166 -0
- package/.docs/docs/memory/overview.md +4 -2
- package/.docs/reference/index.md +1 -0
- package/.docs/reference/memory/memory-class.md +3 -0
- package/.docs/reference/memory/observational-memory.md +217 -0
- package/CHANGELOG.md +7 -0
- package/package.json +5 -6

package/.docs/docs/memory/observational-memory.md
ADDED
@@ -0,0 +1,166 @@
# Observational Memory

Observational Memory (OM) is Mastra's memory system for long-context agentic memory. Two background agents — an **Observer** and a **Reflector** — watch your agent's conversations and maintain a dense observation log that replaces raw message history as it grows.

## Quick Start

```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: true,
    },
  }),
});
```

That's it. The agent now has humanlike long-term memory that persists across conversations.
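
To see persistence in action, pass a thread and resource ID when calling the agent, so memory is keyed to a conversation and a user. A minimal sketch; the IDs are placeholder values, and it assumes the `memory` call options from Mastra's agent API:

```typescript
// First conversation: the agent learns a fact.
await agent.generate("My app is called Acme Dashboard.", {
  memory: { thread: "thread-1", resource: "user-123" },
});

// Later in the same thread, after raw history has been compressed
// into observations, the fact is still available to the agent.
const reply = await agent.generate("What is my app called?", {
  memory: { thread: "thread-1", resource: "user-123" },
});
console.log(reply.text);
```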

See [configuration options](https://mastra.ai/reference/memory/observational-memory) for full API details.

> **Note:** OM currently supports only the `@mastra/pg`, `@mastra/libsql`, and `@mastra/mongodb` storage adapters. It runs background agents to manage memory; the default model (configurable) is `google/gemini-2.5-flash`, the one we've tested most.

## Benefits

- **Prompt caching**: OM's context is stable — observations append over time rather than being dynamically retrieved each turn. This keeps the prompt prefix cacheable, which reduces costs.
- **Compression**: Raw message history and tool results get compressed into a dense observation log. Smaller context means faster responses and longer coherent conversations.
- **Zero context rot**: The agent sees relevant information instead of noisy tool calls and irrelevant tokens, so it stays on task over long sessions.

## How It Works

You don't remember every word of every conversation you've ever had. You observe what happened subconsciously, then your brain reflects — reorganizing, combining, and condensing into long-term memory. OM works the same way.

Every time an agent responds, it sees a context window containing its system prompt, recent message history, and any injected context. The context window is finite — even models with large token limits perform worse when the window is full. This causes two problems:

- **Context rot**: the more raw message history an agent carries, the worse it performs.
- **Context waste**: most of that history contains tokens no longer needed to keep the agent on task.

OM solves both problems by compressing old context into dense observations.

### Observations

When message history tokens exceed a threshold (default: 30,000), the Observer creates observations — concise notes about what happened:

```text
Date: 2026-01-15
- 🔴 12:10 User is building a Next.js app with Supabase auth, due in 1 week (meaning January 22nd 2026)
- 🔴 12:10 App uses server components with client-side hydration
- 🟡 12:12 User asked about middleware configuration for protected routes
- 🔴 12:15 User stated the app name is "Acme Dashboard"
```

The compression is typically 5–40×. The Observer also tracks a **current task** and **suggested response** so the agent picks up where it left off.

Example: an agent using Playwright MCP might see 50,000+ tokens per page snapshot. With OM, the Observer watches the interaction and creates a few hundred tokens of observations about what was on the page and what actions were taken. The agent stays on task without carrying every raw snapshot.

### Reflections

When observations exceed their threshold (default: 40,000 tokens), the Reflector condenses them — combining related items and reflecting on patterns.

The result is a three-tier system:

1. **Recent messages** — exact conversation history for the current task
2. **Observations** — a log of what the Observer has seen
3. **Reflections** — condensed observations when memory becomes too long

## Models

The Observer and Reflector run in the background. Any model that works with Mastra's model routing (e.g. `openai/...`, `google/...`, `deepseek/...`) can be used.

The default is `google/gemini-2.5-flash` — it works well for both observation and reflection, and its 1M token context window gives the Reflector headroom.

We've also tested `deepseek`, `qwen3`, and `glm-4.7` for the Observer. For the Reflector, make sure the model's context window can fit all observations. Note that Claude 4.5 models currently don't work well as either the Observer or the Reflector.

```typescript
const memory = new Memory({
  options: {
    observationalMemory: {
      model: "deepseek/deepseek-reasoner",
    },
  },
});
```

See [model configuration](https://mastra.ai/reference/memory/observational-memory) for using different models per agent.

## Scopes

### Thread scope (default)

Each thread has its own observations.

```typescript
observationalMemory: {
  scope: "thread",
}
```

### Resource scope

Observations are shared across all threads for a resource (typically a user), enabling cross-conversation memory.

```typescript
observationalMemory: {
  scope: "resource",
}
```

> **Warning:** In resource scope, unobserved messages across _all_ threads are processed together. For users with many existing threads, this can be slow. Use thread scope for existing apps.
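
With resource scope, two threads for the same user share one observation log. A sketch under the same assumptions as the Quick Start example, with placeholder IDs:

```typescript
// Thread A: the user states a preference.
await agent.generate("I prefer TypeScript over JavaScript.", {
  memory: { thread: "thread-a", resource: "user-123" },
});

// Thread B is a brand-new conversation for the same user.
// Resource-scoped observations carry the preference across threads.
const reply = await agent.generate("Which language should my next project use?", {
  memory: { thread: "thread-b", resource: "user-123" },
});
```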

## Token Budgets

OM uses token thresholds to decide when to observe and reflect. See [token budget configuration](https://mastra.ai/reference/memory/observational-memory) for details.

```typescript
const memory = new Memory({
  options: {
    observationalMemory: {
      observation: {
        // when to run the Observer (default: 30,000)
        messageTokens: 30_000,
      },
      reflection: {
        // when to run the Reflector (default: 40,000)
        observationTokens: 40_000,
      },
      // set true to let messages and observations share one combined budget
      shareTokenBudget: false,
    },
  },
});
```

## Migrating existing threads

No manual migration needed. OM reads existing messages and observes them lazily when thresholds are exceeded.

- **Thread scope**: The first time a thread exceeds `observation.messageTokens`, the Observer processes the backlog.
- **Resource scope**: All unobserved messages across all threads for a resource are processed together. For users with many existing threads, this could take significant time.

## Viewing in Mastra Studio

Mastra Studio shows OM status in real time in the memory tab: token usage, which model is running, current observations, and reflection history.

## Comparing OM with other memory features

- **[Message history](https://mastra.ai/docs/memory/message-history)** — high-fidelity record of the current conversation
- **[Working memory](https://mastra.ai/docs/memory/working-memory)** — small, structured state (JSON or markdown) for user preferences, names, goals
- **[Semantic Recall](https://mastra.ai/docs/memory/semantic-recall)** — RAG-based retrieval of relevant past messages
- **Observational Memory** — long-context agentic memory that compresses extended sessions

If you're using working memory to store conversation summaries or ongoing state that grows over time, OM is a better fit. Working memory is for small, structured data; OM is for long-running event logs. OM also manages message history automatically — the `messageTokens` setting controls how much raw history remains before observation runs.
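
If you keep working memory for its intended job of small structured facts, it can run alongside OM. A minimal sketch, assuming both options use their defaults otherwise:

```typescript
import { Memory } from "@mastra/memory";

const memory = new Memory({
  options: {
    // OM compresses the long-running event log.
    observationalMemory: true,
    // Working memory stays limited to small, structured state.
    workingMemory: { enabled: true },
  },
});
```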

That said, in practical terms OM can replace both working memory and message history outright, and offers greater accuracy (and lower cost) than Semantic Recall.

## Related

- [Observational Memory Reference](https://mastra.ai/reference/memory/observational-memory) — configuration options and API
- [Memory Overview](https://mastra.ai/docs/memory/overview)
- [Message History](https://mastra.ai/docs/memory/message-history)
- [Memory Processors](https://mastra.ai/docs/memory/memory-processors)

package/.docs/docs/memory/overview.md
CHANGED
@@ -2,11 +2,12 @@

Memory enables your agent to remember user messages, agent replies, and tool results across interactions, giving it the context it needs to stay consistent, maintain conversation flow, and produce better answers over time.

- Mastra supports
+ Mastra supports four complementary memory types:

- [**Message history**](https://mastra.ai/docs/memory/message-history) - keeps recent messages from the current conversation so they can be rendered in the UI and used to maintain short-term continuity within the exchange.
- [**Working memory**](https://mastra.ai/docs/memory/working-memory) - stores persistent, structured user data such as names, preferences, and goals.
- [**Semantic recall**](https://mastra.ai/docs/memory/semantic-recall) - retrieves relevant messages from older conversations based on semantic meaning rather than exact keywords, mirroring how humans recall information by association. Requires a [vector database](https://mastra.ai/docs/memory/semantic-recall) and an [embedding model](https://mastra.ai/docs/memory/semantic-recall).
+ - [**Observational memory**](https://mastra.ai/docs/memory/observational-memory) - uses background Observer and Reflector agents to maintain a dense observation log that replaces raw message history as it grows, keeping the context window small while preserving long-term memory across conversations.

If the combined memory exceeds the model's context limit, [memory processors](https://mastra.ai/docs/memory/memory-processors) can filter, trim, or prioritize content so the most relevant information is preserved.

@@ -17,6 +18,7 @@ Choose a memory option to get started:

- [Message history](https://mastra.ai/docs/memory/message-history)
- [Working memory](https://mastra.ai/docs/memory/working-memory)
- [Semantic recall](https://mastra.ai/docs/memory/semantic-recall)
+ - [Observational memory](https://mastra.ai/docs/memory/observational-memory)

## Storage

@@ -39,5 +41,5 @@ This visibility helps you understand why an agent made specific decisions and ve

## Next steps

- Learn more about [Storage](https://mastra.ai/docs/memory/storage) providers and configuration options
- - Add [Message history](https://mastra.ai/docs/memory/message-history), [Working memory](https://mastra.ai/docs/memory/working-memory),
+ - Add [Message history](https://mastra.ai/docs/memory/message-history), [Working memory](https://mastra.ai/docs/memory/working-memory), [Semantic recall](https://mastra.ai/docs/memory/semantic-recall), or [Observational memory](https://mastra.ai/docs/memory/observational-memory)
- Visit [Memory configuration reference](https://mastra.ai/reference/memory/memory-class) for all available options
package/.docs/reference/index.md
CHANGED
@@ -104,6 +104,7 @@ The Reference section provides documentation of Mastra's API, including paramete

- [Tool Call Accuracy Scorers](https://mastra.ai/reference/evals/tool-call-accuracy)
- [Toxicity](https://mastra.ai/reference/evals/toxicity)
- [Memory Class](https://mastra.ai/reference/memory/memory-class)
+ - [Observational Memory](https://mastra.ai/reference/memory/observational-memory)
- [.createThread()](https://mastra.ai/reference/memory/createThread)
- [.deleteMessages()](https://mastra.ai/reference/memory/deleteMessages)
- [.getThreadById()](https://mastra.ai/reference/memory/getThreadById)

package/.docs/reference/memory/memory-class.md
CHANGED
@@ -44,6 +44,8 @@ export const agent = new Agent({

**workingMemory?:** (`WorkingMemory`): Configuration for working memory feature. Can be `{ enabled: boolean; template?: string; schema?: ZodObject<any> | JSONSchema7; scope?: 'thread' | 'resource' }` or `{ enabled: boolean }` to disable. (Default: `{ enabled: false, template: '# User Information\n- **First Name**:\n- **Last Name**:\n...' }`)

+ **observationalMemory?:** (`boolean | ObservationalMemoryOptions`): Enable Observational Memory for long-context agentic memory. Set to `true` for defaults, or pass a config object to customize token budgets, models, and scope. See [Observational Memory reference](https://mastra.ai/reference/memory/observational-memory) for configuration details. (Default: `false`)

**generateTitle?:** (`boolean | { model: DynamicArgument<MastraLanguageModel>; instructions?: DynamicArgument<string> }`): Controls automatic thread title generation from the user's first message. Can be a boolean or an object with custom model and instructions. (Default: `false`)

## Returns

@@ -134,6 +136,7 @@ export const agent = new Agent({

- [Getting Started with Memory](https://mastra.ai/docs/memory/overview)
- [Semantic Recall](https://mastra.ai/docs/memory/semantic-recall)
- [Working Memory](https://mastra.ai/docs/memory/working-memory)
+ - [Observational Memory](https://mastra.ai/docs/memory/observational-memory)
- [Memory Processors](https://mastra.ai/docs/memory/memory-processors)
- [createThread](https://mastra.ai/reference/memory/createThread)
- [recall](https://mastra.ai/reference/memory/recall)

package/.docs/reference/memory/observational-memory.md
ADDED
@@ -0,0 +1,217 @@

# Observational Memory

Observational Memory (OM) is Mastra's memory system for long-context agentic memory. Two background agents — an **Observer** that watches conversations and creates observations, and a **Reflector** that restructures observations by combining related items, reflecting on overarching patterns, and condensing where possible — maintain an observation log that replaces raw message history as it grows.

## Usage

```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: true,
    },
  }),
});
```

## Configuration

The `observationalMemory` option accepts `true`, `false`, or a configuration object.

Setting `observationalMemory: true` enables it with all defaults. Setting `observationalMemory: false` or omitting it disables it.
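
For example, here is a sketch of the config-object form, using only options documented below:

```typescript
observationalMemory: {
  // `enabled` defaults to true when omitted from a config object;
  // only `enabled: false` switches OM off.
  enabled: true,
  scope: "thread",
}
```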

**enabled?:** (`boolean`): Enable or disable Observational Memory. When omitted from a config object, defaults to `true`. Only `enabled: false` explicitly disables it. (Default: `true`)

**model?:** (`string | LanguageModel | DynamicModel | ModelWithRetries[]`): Model for both the Observer and Reflector agents. Sets the model for both at once. Cannot be used together with `observation.model` or `reflection.model` — an error will be thrown if both are set. (Default: `'google/gemini-2.5-flash'`)

**scope?:** (`'resource' | 'thread'`): Memory scope for observations. `'thread'` keeps observations per-thread. `'resource'` shares observations across all threads for a resource, enabling cross-conversation memory. (Default: `'thread'`)

**shareTokenBudget?:** (`boolean`): Share the token budget between messages and observations. When enabled, the total budget is `observation.messageTokens + reflection.observationTokens`. Messages can use more space when observations are small, and vice versa. This maximizes context usage through flexible allocation. (Default: `false`)

**observation?:** (`ObservationalMemoryObservationConfig`): Configuration for the observation step. Controls when the Observer agent runs and how it behaves.

**reflection?:** (`ObservationalMemoryReflectionConfig`): Configuration for the reflection step. Controls when the Reflector agent runs and how it behaves.

### Observation config

**model?:** (`string | LanguageModel | DynamicModel | ModelWithRetries[]`): Model for the Observer agent. Cannot be set if a top-level `model` is also provided. (Default: `'google/gemini-2.5-flash'`)

**messageTokens?:** (`number`): Token count of unobserved messages that triggers observation. When unobserved message tokens exceed this threshold, the Observer agent is called. (Default: `30000`)

**maxTokensPerBatch?:** (`number`): Maximum tokens per batch when observing multiple threads in resource scope. Threads are chunked into batches of this size and processed in parallel. Lower values mean more parallelism but more API calls. (Default: `10000`)

**modelSettings?:** (`ObservationalMemoryModelSettings`): Model settings for the Observer agent. (Default: `{ temperature: 0.3, maxOutputTokens: 100_000 }`)

### Reflection config

**model?:** (`string | LanguageModel | DynamicModel | ModelWithRetries[]`): Model for the Reflector agent. Cannot be set if a top-level `model` is also provided. (Default: `'google/gemini-2.5-flash'`)

**observationTokens?:** (`number`): Token count of observations that triggers reflection. When observation tokens exceed this threshold, the Reflector agent is called to condense them. (Default: `40000`)

**modelSettings?:** (`ObservationalMemoryModelSettings`): Model settings for the Reflector agent. (Default: `{ temperature: 0, maxOutputTokens: 100_000 }`)

### Model settings

**temperature?:** (`number`): Temperature for generation. Lower values produce more consistent output. (Default: `0.3`)

**maxOutputTokens?:** (`number`): Maximum output tokens. Set high to prevent truncation of observations. (Default: `100000`)

## Examples

### Resource scope with custom thresholds

```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        scope: "resource",
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 60_000,
        },
      },
    },
  }),
});
```

### Shared token budget

```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        shareTokenBudget: true,
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 80_000,
        },
      },
    },
  }),
});
```

When `shareTokenBudget` is enabled, the total budget is `observation.messageTokens + reflection.observationTokens` (100k in this example). If observations only use 30k tokens, messages can expand to use up to 70k. If messages are short, observations have more room before triggering reflection.
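
Concretely, the arithmetic for the example above works out as follows (a sketch of the math, not an API):

```typescript
// Values from the shared-budget example above.
const messageTokens = 20_000;
const observationTokens = 80_000;
const totalBudget = messageTokens + observationTokens; // 100_000

// If observations currently hold 30k tokens, messages may grow
// into the rest of the shared budget before observation triggers.
const observationsUsed = 30_000;
const messageHeadroom = totalBudget - observationsUsed; // 70_000
```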

### Custom model

```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        model: "openai/gpt-4o-mini",
      },
    },
  }),
});
```

### Different models per agent

```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        observation: {
          model: "google/gemini-2.5-flash",
        },
        reflection: {
          model: "openai/gpt-4o-mini",
        },
      },
    },
  }),
});
```

## Standalone usage

Most users should use the `Memory` class above. Using `ObservationalMemory` directly is mainly useful for benchmarking, experimentation, or when you need to control processor ordering with other processors (like [guardrails](https://mastra.ai/docs/agents/guardrails)).

```typescript
import { ObservationalMemory } from "@mastra/memory/processors";
import { Agent } from "@mastra/core/agent";
import { LibSQLStore } from "@mastra/libsql";

const storage = new LibSQLStore({
  id: "my-storage",
  url: "file:./memory.db",
});

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: "google/gemini-2.5-flash",
  scope: "resource",
  observation: {
    messageTokens: 20_000,
  },
  reflection: {
    observationTokens: 60_000,
  },
});

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  inputProcessors: [om],
  outputProcessors: [om],
});
```

### Standalone config

The standalone `ObservationalMemory` class accepts all the same options as the `observationalMemory` config object above, plus the following:

**storage:** (`MemoryStorage`): Storage adapter for persisting observations. Must be a MemoryStorage instance (from `MastraStorage.stores.memory`).

**onDebugEvent?:** (`(event: ObservationDebugEvent) => void`): Debug callback for observation events. Called whenever observation-related events occur. Useful for debugging and understanding the observation flow.

**obscureThreadIds?:** (`boolean`): When enabled, thread IDs are hashed before being included in observation context. This prevents the LLM from recognizing patterns in thread identifiers. Automatically enabled when using resource scope through the Memory class. (Default: `false`)
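
For example, `onDebugEvent` can be wired to a logger while tuning thresholds. A sketch reusing the `storage` instance from the example above; the event shape beyond its type name isn't documented here, so it simply logs the whole event:

```typescript
const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: "google/gemini-2.5-flash",
  // Log each observation-related event to see when the
  // Observer and Reflector actually run.
  onDebugEvent: (event) => {
    console.log("[OM debug]", JSON.stringify(event));
  },
});
```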

### Related

- [Observational Memory](https://mastra.ai/docs/memory/observational-memory)
- [Memory Overview](https://mastra.ai/docs/memory/overview)
- [Memory Class](https://mastra.ai/reference/memory/memory-class)
- [Memory Processors](https://mastra.ai/docs/memory/memory-processors)
- [Processors](https://mastra.ai/docs/agents/processors)

package/CHANGELOG.md
CHANGED
@@ -1,5 +1,12 @@

# @mastra/mcp-docs-server

+ ## 1.1.1-alpha.0
+
+ ### Patch Changes
+
+ - Updated dependencies [[`90f7894`](https://github.com/mastra-ai/mastra/commit/90f7894568dc9481f40a4d29672234fae23090bb), [`8109aee`](https://github.com/mastra-ai/mastra/commit/8109aeeab758e16cd4255a6c36f044b70eefc6a6)]:
+   - @mastra/core@1.2.1-alpha.0
+
## 1.1.0

### Minor Changes
package/package.json
CHANGED
@@ -1,6 +1,6 @@

{
  "name": "@mastra/mcp-docs-server",
- "version": "1.1.0",
+ "version": "1.1.1-alpha.0",
  "description": "MCP server for accessing Mastra.ai documentation, changelogs, and news.",
  "type": "module",
  "main": "dist/index.js",

@@ -29,7 +29,7 @@

  "jsdom": "^26.1.0",
  "local-pkg": "^1.1.2",
  "zod": "^3.25.76",
- "@mastra/core": "1.2.0",
+ "@mastra/core": "1.2.1-alpha.0",
  "@mastra/mcp": "^1.0.0"
},
"devDependencies": {

@@ -46,9 +46,9 @@

  "tsx": "^4.21.0",
  "typescript": "^5.9.3",
  "vitest": "4.0.16",
- "@internal/
- "@mastra/core": "1.2.0",
- "@internal/
+ "@internal/lint": "0.0.57",
+ "@mastra/core": "1.2.1-alpha.0",
+ "@internal/types-builder": "0.0.32"
},
"homepage": "https://mastra.ai",
"repository": {

@@ -63,7 +63,6 @@

  "node": ">=22.13.0"
},
"scripts": {
- "build:docs": "cd ../../docs && pnpm build",
  "prepare-docs": "tsx ./scripts/prepare-docs.ts",
  "build:cli": "tsup --silent --config tsup.config.ts",
  "pretest": "pnpm turbo build --filter @mastra/mcp-docs-server",