npm - @mastra/memory - Versions diffs - 1.4.0 → 1.5.0 - Mend

@mastra/memory 1.4.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

package/dist/{chunk-QRKB5I2S.cjs → chunk-LLTHE64H.cjs} RENAMED

@@ -18,54 +18,7 @@ var o200k_base__default = /*#__PURE__*/_interopDefault(o200k_base);
 // src/processors/observational-memory/observational-memory.ts
 // src/processors/observational-memory/observer-agent.ts
-var USE_CONDENSED_PROMPT = process.env.OM_USE_CONDENSED_PROMPT === "1" || process.env.OM_USE_CONDENSED_PROMPT === "true";
-var CONDENSED_OBSERVER_EXTRACTION_INSTRUCTIONS = `You are the memory consciousness of an AI assistant. Your observations will be the ONLY information the assistant has about past interactions with this user.
-CORE PRINCIPLES:
-1. BE SPECIFIC - Vague observations are useless. Capture details that distinguish and identify.
-2. ANCHOR IN TIME - Note when things happened and when they were said.
-3. TRACK STATE CHANGES - When information updates or supersedes previous info, make it explicit.
-4. USE COMMON SENSE - If it would help the assistant remember later, observe it.
-ASSERTIONS VS QUESTIONS:
-- User TELLS you something \u2192 \u{1F534} "User stated [fact]"
-- User ASKS something \u2192 \u{1F7E1} "User asked [question]"
-- User assertions are authoritative. They are the source of truth about their own life.
-TEMPORAL ANCHORING:
-- Always include message time at the start: (14:30) User stated...
-- Add estimated date at the END only for relative time references:
-  "User will visit parents this weekend. (meaning Jan 18-19)"
-- Don't add end dates for present-moment statements or vague terms like "recently"
-- Split multi-event statements into separate observations, each with its own date
-DETAILS TO ALWAYS PRESERVE:
-- Names, handles, usernames, titles (@username, "Dr. Smith")
-- Numbers, counts, quantities (4 items, 3 sessions, 27th in list)
-- Measurements, percentages, statistics (5kg, 20% improvement, 85% accuracy)
-- Sequences and orderings (steps 1-5, chord progression, lucky numbers)
-- Prices, dates, times, durations ($50, March 15, 2 hours)
-- Locations and distinguishing attributes (near X, based in Y, specializes in Z)
-- User's specific role (presenter, volunteer, organizer - not just "attended")
-- Exact phrasing when unusual ("movement session" for exercise)
-- Verbatim text being collaborated on (code, formatted text, ASCII art)
-WHEN ASSISTANT PROVIDES LISTS/RECOMMENDATIONS:
-Don't just say "Assistant recommended 5 hotels." Capture what distinguishes each:
-"Assistant recommended: Hotel A (near station), Hotel B (pet-friendly), Hotel C (has pool)..."
-STATE CHANGES:
-When user updates information, note what changed:
-"User will use the new method (replacing the old approach)"
-WHO/WHAT/WHERE/WHEN:
-Capture all dimensions. Not just "User went on a trip" but who with, where, when, and what happened.
-Don't repeat observations that have already been captured in previous sessions.
-REMEMBER: These observations are your ENTIRE memory. Any detail you fail to observe is permanently forgotten. Use common sense - if something seems like it might be important to remember, it probably is. When in doubt, observe it.`;
-var CURRENT_OBSERVER_EXTRACTION_INSTRUCTIONS = `CRITICAL: DISTINGUISH USER ASSERTIONS FROM QUESTIONS
+var OBSERVER_EXTRACTION_INSTRUCTIONS = `CRITICAL: DISTINGUISH USER ASSERTIONS FROM QUESTIONS
 When the user TELLS you something about themselves, mark it as an assertion:
 - "I have two kids" \u2192 \u{1F534} (14:30) User stated has two kids
@@ -73,8 +26,8 @@ When the user TELLS you something about themselves, mark it as an assertion:
 - "I graduated in 2019" \u2192 \u{1F534} (14:32) User stated graduated in 2019
 When the user ASKS about something, mark it as a question/request:
-- "Can you help me with X?" \u2192 \u{1F7E1} (15:00) User asked help with X
-- "What's the best way to do Y?" \u2192 \u{1F7E1} (15:01) User asked best way to do Y
+- "Can you help me with X?" \u2192 \u{1F534} (15:00) User asked help with X
+- "What's the best way to do Y?" \u2192 \u{1F534} (15:01) User asked best way to do Y
 Distinguish between QUESTIONS and STATEMENTS OF INTENT:
 - "Can you recommend..." \u2192 Question (extract as "User asked...")
@@ -256,60 +209,39 @@ CONVERSATION CONTEXT:
 - When who/what/where/when is mentioned, note that in the observation. Example: if the user received went on a trip with someone, observe who that someone was, where the trip was, when it happened, and what happened, not just that the user went on the trip.
 - For any described entity (like a person, place, thing, etc), preserve the attributes that would help identify or describe the specific entity later: location ("near X"), specialty ("focuses on Y"), unique feature ("has Z"), relationship ("owned by W"), or other details. The entity's name is important, but so are any additional details that distinguish it. If there are a list of entities, preserve these details for each of them.
-ACTIONABLE INSIGHTS:
-- What worked well in explanations
-- What needs follow-up or clarification
-- User's stated goals or next steps (note if the user tells you not to do a next step, or asks for something specific, other next steps besides the users request should be marked as "waiting for user", unless the user explicitly says to continue all next steps)`;
-var OBSERVER_EXTRACTION_INSTRUCTIONS = USE_CONDENSED_PROMPT ? CONDENSED_OBSERVER_EXTRACTION_INSTRUCTIONS : CURRENT_OBSERVER_EXTRACTION_INSTRUCTIONS;
-var CONDENSED_OBSERVER_OUTPUT_FORMAT = `Use priority levels:
-- \u{1F534} High: explicit user facts, preferences, goals achieved, critical context
-- \u{1F7E1} Medium: project details, learned information, tool results
-- \u{1F7E2} Low: minor details, uncertain observations
+USER MESSAGE CAPTURE:
+- Short and medium-length user messages should be captured nearly verbatim in your own words.
+- For very long user messages, summarize but quote key phrases that carry specific intent or meaning.
+- This is critical for continuity: when the conversation window shrinks, the observations are the only record of what the user said.
-Group observations by date, then list each with 24-hour time.
-Group related observations (like tool sequences) by indenting.
+AVOIDING REPETITIVE OBSERVATIONS:
+- Do NOT repeat the same observation across multiple turns if there is no new information.
+- When the agent performs repeated similar actions (e.g., browsing files, running the same tool type multiple times), group them into a single parent observation with sub-bullets for each new result.
-<observations>
-Date: Dec 4, 2025
-* \u{1F534} (09:15) User stated they have 3 kids: Emma (12), Jake (9), and Lily (5)
-* \u{1F534} (09:16) User's anniversary is March 15
-* \u{1F7E1} (09:20) User asked how to optimize database queries
-* \u{1F7E1} (10:30) User working on auth refactor - targeting 50% latency reduction
-* \u{1F7E1} (10:45) Assistant recommended hotels: Grand Plaza (downtown, $180/night), Seaside Inn (near beach, pet-friendly), Mountain Lodge (has pool, free breakfast)
-* \u{1F534} (11:00) User's friend @maria_dev recommended using Redis for caching
-* \u{1F7E1} (11:15) User attended the tech conference as a speaker (presented on microservices)
-* \u{1F534} (11:30) User will visit parents this weekend (meaning Dec 7-8, 2025)
-* \u{1F7E1} (14:00) Agent debugging auth issue
-  * -> ran git status, found 3 modified files
-  * -> viewed auth.ts:45-60, found missing null check
-  * -> applied fix, tests now pass
-* \u{1F7E1} (14:30) Assistant provided dataset stats: 7,342 samples, 89.6% accuracy, 23ms inference time
-* \u{1F534} (15:00) User's lucky numbers from fortune cookie: 7, 14, 23, 38, 42, 49
+Example \u2014 BAD (repetitive):
+* \u{1F7E1} (14:30) Agent used view tool on src/auth.ts
+* \u{1F7E1} (14:31) Agent used view tool on src/users.ts
+* \u{1F7E1} (14:32) Agent used view tool on src/routes.ts
-Date: Dec 5, 2025
-* \u{1F534} (09:00) User switched from Python to TypeScript for the project (no longer using Python)
-* \u{1F7E1} (09:30) User bought running shoes for $120 at SportMart (downtown location)
-* \u{1F534} (10:00) User prefers morning meetings, not afternoon (updating previous preference)
-* \u{1F7E1} (10:30) User went to Italy with their sister last summer (meaning July 2025), visited Rome and Florence for 2 weeks
-* \u{1F534} (10:45) User's dentist appointment is next Tuesday (meaning Dec 10, 2025)
-* \u{1F7E2} (11:00) User mentioned they might try the new coffee shop
-</observations>
+Example \u2014 GOOD (grouped):
+* \u{1F7E1} (14:30) Agent browsed source files for auth flow
+  * -> viewed src/auth.ts \u2014 found token validation logic
+  * -> viewed src/users.ts \u2014 found user lookup by email
+  * -> viewed src/routes.ts \u2014 found middleware chain
-<current-task>
-Primary: Implementing OAuth2 flow for the auth refactor
-Secondary: Waiting for user to confirm database schema changes
-</current-task>
+Only add a new observation for a repeated action if the NEW result changes the picture.
-<suggested-response>
-The OAuth2 implementation is ready for testing. Would you like me to walk through the flow?
-</suggested-response>`;
+ACTIONABLE INSIGHTS:
+- What worked well in explanations
+- What needs follow-up or clarification
+- User's stated goals or next steps (note if the user tells you not to do a next step, or asks for something specific, other next steps besides the users request should be marked as "waiting for user", unless the user explicitly says to continue all next steps)`;
 var OBSERVER_OUTPUT_FORMAT_BASE = `Use priority levels:
 - \u{1F534} High: explicit user facts, preferences, goals achieved, critical context
 - \u{1F7E1} Medium: project details, learned information, tool results
 - \u{1F7E2} Low: minor details, uncertain observations
 Group related observations (like tool sequences) by indenting:
-* \u{1F7E1} (14:33) Agent debugging auth issue
+* \u{1F534} (14:33) Agent debugging auth issue
   * -> ran git status, found 3 modified files
   * -> viewed auth.ts:45-60, found missing null check
   * -> applied fix, tests now pass
@@ -319,11 +251,11 @@ Group observations by date, then list each with 24-hour time.
 <observations>
 Date: Dec 4, 2025
 * \u{1F534} (14:30) User prefers direct answers
-* \u{1F7E1} (14:31) Working on feature X
-* \u{1F7E2} (14:32) User might prefer dark mode
+* \u{1F534} (14:31) Working on feature X
+* \u{1F7E1} (14:32) User might prefer dark mode
 Date: Dec 5, 2025
-* \u{1F7E1} (09:15) Continued work on feature X
+* \u{1F534} (09:15) Continued work on feature X
 </observations>
 <current-task>
@@ -340,29 +272,21 @@ Hint for the agent's immediate next message. Examples:
 - "The assistant should wait for the user to respond before continuing."
 - Call the view tool on src/example.ts to continue debugging.
 </suggested-response>`;
-var CONDENSED_OBSERVER_GUIDELINES = `- Be specific: "User prefers short answers without lengthy explanations" not "User stated a preference"
-- Use terse language - dense sentences without unnecessary words
-- Don't repeat observations that have already been captured
-- When the agent calls tools, observe what was called, why, and what was learned
-- Include line numbers when observing code files
-- If the agent provides a detailed response, observe the key points so it could be repeated
-- Start each observation with a priority emoji (\u{1F534}, \u{1F7E1}, \u{1F7E2})
-- Observe WHAT happened and WHAT it means, not HOW well it was done
-- If the user provides detailed messages or code snippets, observe all important details`;
-var OBSERVER_GUIDELINES = USE_CONDENSED_PROMPT ? CONDENSED_OBSERVER_GUIDELINES : `- Be specific enough for the assistant to act on
+var OBSERVER_GUIDELINES = `- Be specific enough for the assistant to act on
 - Good: "User prefers short, direct answers without lengthy explanations"
 - Bad: "User stated a preference" (too vague)
 - Add 1 to 5 observations per exchange
-- Use terse language to save tokens. Sentences should be dense without unnecessary words.
-- Do not add repetitive observations that have already been observed.
-- If the agent calls tools, observe what was called, why, and what was learned.
-- When observing files with line numbers, include the line number if useful.
-- If the agent provides a detailed response, observe the contents so it could be repeated.
+- Use terse language to save tokens. Sentences should be dense without unnecessary words
+- Do not add repetitive observations that have already been observed. Group repeated similar actions (tool calls, file browsing) under a single parent with sub-bullets for new results
+- If the agent calls tools, observe what was called, why, and what was learned
+- When observing files with line numbers, include the line number if useful
+- If the agent provides a detailed response, observe the contents so it could be repeated
 - Make sure you start each observation with a priority emoji (\u{1F534}, \u{1F7E1}, \u{1F7E2})
-- Observe WHAT the agent did and WHAT it means, not HOW well it did it.
-- If the user provides detailed messages or code snippets, observe all important details.`;
+- User messages are always \u{1F534} priority, so are the completions of tasks. Capture the user's words closely \u2014 short/medium messages near-verbatim, long messages summarized with key quotes
+- Observe WHAT the agent did and WHAT it means
+- If the user provides detailed messages or code snippets, observe all important details`;
 function buildObserverSystemPrompt(multiThread = false, instruction) {
-  const outputFormat = USE_CONDENSED_PROMPT ? CONDENSED_OBSERVER_OUTPUT_FORMAT : OBSERVER_OUTPUT_FORMAT_BASE;
+  const outputFormat = OBSERVER_OUTPUT_FORMAT_BASE;
   if (multiThread) {
     return `You are the memory consciousness of an AI assistant. Your observations will be the ONLY information the assistant has about past interactions with this user.
@@ -383,7 +307,7 @@ Your output MUST use XML tags to structure the response. Each thread's observati
 <thread id="thread_id_1">
 Date: Dec 4, 2025
 * \u{1F534} (14:30) User prefers direct answers
-* \u{1F7E1} (14:31) Working on feature X
+* \u{1F534} (14:31) Working on feature X
 <current-task>
 What the agent is currently working on in this thread
@@ -396,7 +320,7 @@ Hint for the agent's next message in this thread
 <thread id="thread_id_2">
 Date: Dec 5, 2025
-* \u{1F7E1} (09:15) User asked about deployment
+* \u{1F534} (09:15) User asked about deployment
 <current-task>
 Current task for this thread
@@ -409,7 +333,7 @@ Suggested response for this thread
 </observations>
 Use priority levels:
-- \u{1F534} High: explicit user facts, preferences, goals achieved, critical context
+- \u{1F534} High: explicit user facts, preferences, goals achieved, critical context, user messages
 - \u{1F7E1} Medium: project details, learned information, tool results
 - \u{1F7E2} Low: minor details, uncertain observations
@@ -563,7 +487,7 @@ ${formattedMessages}
 `;
   prompt += `Date: Dec 5, 2025
 `;
-  prompt += `* \u{1F7E1} (09:15) User asked about deployment
+  prompt += `* \u{1F534} (09:15) User asked about deployment
 `;
   prompt += `<current-task>Discussing deployment options</current-task>
 `;
@@ -576,6 +500,9 @@ ${formattedMessages}
 }
 function parseMultiThreadObserverOutput(output) {
   const threads = /* @__PURE__ */ new Map();
+  if (detectDegenerateRepetition(output)) {
+    return { threads, rawOutput: output, degenerate: true };
+  }
   const observationsMatch = output.match(/^[ \t]*<observations>([\s\S]*?)^[ \t]*<\/observations>/im);
   const observationsContent = observationsMatch?.[1] ?? output;
   const threadRegex = /<thread\s+id="([^"]+)">([\s\S]*?)<\/thread>/gi;
@@ -597,7 +524,7 @@ function parseMultiThreadObserverOutput(output) {
       suggestedContinuation = suggestedMatch[1].trim();
       observations = observations.replace(/<suggested-response>[\s\S]*?<\/suggested-response>/i, "");
     }
-    observations = observations.trim();
+    observations = sanitizeObservationLines(observations.trim());
     threads.set(threadId, {
       observations,
       currentTask,
@@ -642,8 +569,15 @@ IMPORTANT: Do NOT include <current-task> or <suggested-response> sections in you
   return prompt;
 }
 function parseObserverOutput(output) {
+  if (detectDegenerateRepetition(output)) {
+    return {
+      observations: "",
+      rawOutput: output,
+      degenerate: true
+    };
+  }
   const parsed = parseMemorySectionXml(output);
-  const observations = parsed.observations || "";
+  const observations = sanitizeObservationLines(parsed.observations || "");
   return {
     observations,
     currentTask: parsed.currentTask || void 0,
@@ -684,6 +618,42 @@ function extractListItemsOnly(content) {
   }
   return listLines.join("\n").trim();
 }
+var MAX_OBSERVATION_LINE_CHARS = 1e4;
+function sanitizeObservationLines(observations) {
+  if (!observations) return observations;
+  const lines = observations.split("\n");
+  let changed = false;
+  for (let i = 0; i < lines.length; i++) {
+    if (lines[i].length > MAX_OBSERVATION_LINE_CHARS) {
+      lines[i] = lines[i].slice(0, MAX_OBSERVATION_LINE_CHARS) + " \u2026 [truncated]";
+      changed = true;
+    }
+  }
+  return changed ? lines.join("\n") : observations;
+}
+function detectDegenerateRepetition(text) {
+  if (!text || text.length < 2e3) return false;
+  const windowSize = 200;
+  const step = Math.max(1, Math.floor(text.length / 50));
+  const seen = /* @__PURE__ */ new Map();
+  let duplicateWindows = 0;
+  let totalWindows = 0;
+  for (let i = 0; i + windowSize <= text.length; i += step) {
+    const window = text.slice(i, i + windowSize);
+    totalWindows++;
+    const count = (seen.get(window) ?? 0) + 1;
+    seen.set(window, count);
+    if (count > 1) duplicateWindows++;
+  }
+  if (totalWindows > 5 && duplicateWindows / totalWindows > 0.4) {
+    return true;
+  }
+  const lines = text.split("\n");
+  for (const line of lines) {
+    if (line.length > 5e4) return true;
+  }
+  return false;
+}
 function hasCurrentTaskSection(observations) {
   if (/<current-task>/i.test(observations)) {
     return true;
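The hunk above adds two output guards. The standalone sketch below exercises them on synthetic input. It copies sanitizeObservationLines and detectDegenerateRepetition from the bundle, and the sample strings (loopLine, looped, varied, cut) are made up for illustration.

The repetition check compares exact 200-character windows sampled at a stride of about one fiftieth of the text. Whether a loop crosses the 40% duplicate threshold therefore depends on how the loop's period relates to that stride. The example pads each repeated line to 50 characters, so the 5,000-character text has a stride of exactly 100 and every sampled window is identical.

```javascript
// Copied from the 1.5.0 bundle so the guards can run on their own.
const MAX_OBSERVATION_LINE_CHARS = 1e4;

function sanitizeObservationLines(observations) {
  if (!observations) return observations;
  const lines = observations.split("\n");
  let changed = false;
  for (let i = 0; i < lines.length; i++) {
    // Cap each line at 10,000 chars and mark the cut.
    if (lines[i].length > MAX_OBSERVATION_LINE_CHARS) {
      lines[i] = lines[i].slice(0, MAX_OBSERVATION_LINE_CHARS) + " \u2026 [truncated]";
      changed = true;
    }
  }
  return changed ? lines.join("\n") : observations; // untouched input is returned as-is
}

function detectDegenerateRepetition(text) {
  if (!text || text.length < 2e3) return false; // outputs under 2,000 chars are never flagged
  const windowSize = 200;
  const step = Math.max(1, Math.floor(text.length / 50)); // roughly 50 samples per text
  const seen = new Map();
  let duplicateWindows = 0;
  let totalWindows = 0;
  for (let i = 0; i + windowSize <= text.length; i += step) {
    const window = text.slice(i, i + windowSize);
    totalWindows++;
    const count = (seen.get(window) ?? 0) + 1;
    seen.set(window, count);
    if (count > 1) duplicateWindows++; // exact repeat of an earlier sampled window
  }
  if (totalWindows > 5 && duplicateWindows / totalWindows > 0.4) return true;
  // Separately, any single line over 50,000 chars counts as degenerate.
  return text.split("\n").some((line) => line.length > 5e4);
}

// Hypothetical inputs: one 50-char line looped 100 times, and 100 distinct lines.
const loopLine = "* (14:30) Agent used view tool on src/auth.ts".padEnd(49) + "\n";
const looped = loopLine.repeat(100); // 5,000 chars, so the stride is 100
const varied = Array.from({ length: 100 }, (_, i) => `* (10:00) observation number ${i}`).join("\n");

console.log(detectDegenerateRepetition(looped)); // true
console.log(detectDegenerateRepetition(varied)); // false

const cut = sanitizeObservationLines("ok\n" + "x".repeat(10005));
console.log(cut.split("\n")[1].length); // 10014 (10,000 kept plus " … [truncated]")
```

Both guards work purely on string shape. A degenerate result short-circuits parsing, while an over-long but otherwise healthy line is kept in truncated form.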
@@ -895,8 +865,14 @@ IMPORTANT: Do NOT include <current-task> or <suggested-response> sections in you
   return prompt;
 }
 function parseReflectorOutput(output) {
+  if (detectDegenerateRepetition(output)) {
+    return {
+      observations: "",
+      degenerate: true
+    };
+  }
   const parsed = parseReflectorSectionXml(output);
-  const observations = parsed.observations || "";
+  const observations = sanitizeObservationLines(parsed.observations || "");
   return {
     observations,
     suggestedContinuation: parsed.suggestedResponse || void 0
@@ -1276,7 +1252,9 @@ var OBSERVATION_CONTEXT_INSTRUCTIONS = `IMPORTANT: When responding, reference sp
 KNOWLEDGE UPDATES: When asked about current state (e.g., "where do I currently...", "what is my current..."), always prefer the MOST RECENT information. Observations include dates - if you see conflicting information, the newer observation supersedes the older one. Look for phrases like "will start", "is switching", "changed to", "moved to" as indicators that previous information has been updated.
-PLANNED ACTIONS: If the user stated they planned to do something (e.g., "I'm going to...", "I'm looking forward to...", "I will...") and the date they planned to do it is now in the past (check the relative time like "3 weeks ago"), assume they completed the action unless there's evidence they didn't. For example, if someone said "I'll start my new diet on Monday" and that was 2 weeks ago, assume they started the diet.`;
+PLANNED ACTIONS: If the user stated they planned to do something (e.g., "I'm going to...", "I'm looking forward to...", "I will...") and the date they planned to do it is now in the past (check the relative time like "3 weeks ago"), assume they completed the action unless there's evidence they didn't. For example, if someone said "I'll start my new diet on Monday" and that was 2 weeks ago, assume they started the diet.
+MOST RECENT USER INPUT: Treat the most recent user message as the highest-priority signal for what to do next. Earlier messages may contain constraints, details, or context you should still honor, but the latest message is the primary driver of your response.`;
 var ObservationalMemory = class _ObservationalMemory {
   id = "observational-memory";
   name = "Observational Memory";
@@ -1446,41 +1424,65 @@ var ObservationalMemory = class _ObservationalMemory {
     }
     return [];
   }
+  /**
+   * Resolve bufferActivation config into an absolute retention floor (tokens to keep).
+   * - Value in (0, 1]: ratio → retentionFloor = threshold * (1 - value)
+   * - Value >= 1000: absolute token count → retentionFloor = value
+   */
+  resolveRetentionFloor(bufferActivation, messageTokensThreshold) {
+    if (bufferActivation >= 1e3) return bufferActivation;
+    return messageTokensThreshold * (1 - bufferActivation);
+  }
+  /**
+   * Convert bufferActivation to the equivalent ratio (0-1) for the storage layer.
+   * When bufferActivation >= 1000, it's an absolute retention target, so we compute
+   * the equivalent ratio: 1 - (bufferActivation / threshold).
+   */
+  resolveActivationRatio(bufferActivation, messageTokensThreshold) {
+    if (bufferActivation >= 1e3) {
+      return Math.max(0, Math.min(1, 1 - bufferActivation / messageTokensThreshold));
+    }
+    return bufferActivation;
+  }
   /**
    * Calculate the projected message tokens that would be removed if activation happened now.
    * This replicates the chunk boundary logic in swapBufferedToActive without actually activating.
    */
-  calculateProjectedMessageRemoval(chunks, activationRatio, messageTokensThreshold, currentPendingTokens) {
+  calculateProjectedMessageRemoval(chunks, bufferActivation, messageTokensThreshold, currentPendingTokens) {
     if (chunks.length === 0) return 0;
-    const retentionFloor = messageTokensThreshold * (1 - activationRatio);
+    const retentionFloor = this.resolveRetentionFloor(bufferActivation, messageTokensThreshold);
     const targetMessageTokens = Math.max(0, currentPendingTokens - retentionFloor);
     let cumulativeMessageTokens = 0;
-    let bestBoundary = 0;
-    let bestBoundaryMessageTokens = 0;
+    let bestOverBoundary = 0;
+    let bestOverTokens = 0;
+    let bestUnderBoundary = 0;
+    let bestUnderTokens = 0;
     for (let i = 0; i < chunks.length; i++) {
       cumulativeMessageTokens += chunks[i].messageTokens ?? 0;
       const boundary = i + 1;
-      const isUnder = cumulativeMessageTokens <= targetMessageTokens;
-      const bestIsUnder = bestBoundaryMessageTokens <= targetMessageTokens;
-      if (bestBoundary === 0) {
-        bestBoundary = boundary;
-        bestBoundaryMessageTokens = cumulativeMessageTokens;
-      } else if (isUnder && !bestIsUnder) {
-        bestBoundary = boundary;
-        bestBoundaryMessageTokens = cumulativeMessageTokens;
-      } else if (isUnder && bestIsUnder) {
-        if (cumulativeMessageTokens > bestBoundaryMessageTokens) {
-          bestBoundary = boundary;
-          bestBoundaryMessageTokens = cumulativeMessageTokens;
+      if (cumulativeMessageTokens >= targetMessageTokens) {
+        if (bestOverBoundary === 0 || cumulativeMessageTokens < bestOverTokens) {
+          bestOverBoundary = boundary;
+          bestOverTokens = cumulativeMessageTokens;
         }
-      } else if (!isUnder && !bestIsUnder) {
-        if (cumulativeMessageTokens < bestBoundaryMessageTokens) {
-          bestBoundary = boundary;
-          bestBoundaryMessageTokens = cumulativeMessageTokens;
+      } else {
+        if (cumulativeMessageTokens > bestUnderTokens) {
+          bestUnderBoundary = boundary;
+          bestUnderTokens = cumulativeMessageTokens;
         }
       }
     }
-    if (bestBoundary === 0) {
+    const maxOvershoot = retentionFloor * 0.95;
+    const overshoot = bestOverTokens - targetMessageTokens;
+    const remainingAfterOver = currentPendingTokens - bestOverTokens;
+    let bestBoundaryMessageTokens;
+    if (bestOverBoundary > 0 && overshoot <= maxOvershoot && (remainingAfterOver >= 1e3 || retentionFloor === 0)) {
+      bestBoundaryMessageTokens = bestOverTokens;
+    } else if (bestUnderBoundary > 0) {
+      bestBoundaryMessageTokens = bestUnderTokens;
+    } else if (bestOverBoundary > 0) {
+      bestBoundaryMessageTokens = bestOverTokens;
+    } else {
       return chunks[0]?.messageTokens ?? 0;
     }
     return bestBoundaryMessageTokens;
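The sketch below works through the new retention-floor semantics and the over/under boundary choice. It restates resolveRetentionFloor, resolveActivationRatio and calculateProjectedMessageRemoval as free functions and runs them on made-up chunk sizes. projectedMessageRemoval is a hypothetical name for the method body.

- **First call.** bufferActivation is 5000, an absolute floor, so with 18,000 pending tokens the target removal is 13,000. The smallest prefix reaching that target (16,000) overshoots by 3,000, which is within the 95%-of-floor cap, and it leaves 2,000 tokens, so it is chosen.
- **Second call.** bufferActivation is a 0.75 ratio, giving the same 5,000 floor. Here the overshooting prefix would leave only 500 tokens, so the largest prefix under the target (2,000) is used instead.

```javascript
// Free-function restatement of the 1.5.0 boundary logic (method bodies from the hunk above).
function resolveRetentionFloor(bufferActivation, messageTokensThreshold) {
  if (bufferActivation >= 1e3) return bufferActivation; // absolute: tokens to keep
  return messageTokensThreshold * (1 - bufferActivation); // ratio of the threshold
}

function resolveActivationRatio(bufferActivation, messageTokensThreshold) {
  if (bufferActivation >= 1e3) {
    // Absolute floor expressed as the equivalent ratio, clamped to [0, 1].
    return Math.max(0, Math.min(1, 1 - bufferActivation / messageTokensThreshold));
  }
  return bufferActivation;
}

// Hypothetical name for calculateProjectedMessageRemoval; chunks are { messageTokens }.
function projectedMessageRemoval(chunks, bufferActivation, threshold, pendingTokens) {
  if (chunks.length === 0) return 0;
  const retentionFloor = resolveRetentionFloor(bufferActivation, threshold);
  const target = Math.max(0, pendingTokens - retentionFloor);
  let cumulative = 0;
  let overBoundary = 0, overTokens = 0;   // smallest prefix that reaches the target
  let underBoundary = 0, underTokens = 0; // largest prefix that stays below it
  for (let i = 0; i < chunks.length; i++) {
    cumulative += chunks[i].messageTokens ?? 0;
    if (cumulative >= target) {
      if (overBoundary === 0 || cumulative < overTokens) {
        overBoundary = i + 1;
        overTokens = cumulative;
      }
    } else if (cumulative > underTokens) {
      underBoundary = i + 1;
      underTokens = cumulative;
    }
  }
  // Accept the overshooting prefix only if the overshoot is within 95% of the
  // floor and at least 1,000 tokens would remain (or the floor is zero).
  const overshootOk = overTokens - target <= retentionFloor * 0.95;
  const leavesEnough = pendingTokens - overTokens >= 1e3 || retentionFloor === 0;
  if (overBoundary > 0 && overshootOk && leavesEnough) return overTokens;
  if (underBoundary > 0) return underTokens;
  if (overBoundary > 0) return overTokens;
  return chunks[0]?.messageTokens ?? 0;
}

const even = [4000, 4000, 4000, 4000].map((messageTokens) => ({ messageTokens }));
const skewed = [2000, 16000].map((messageTokens) => ({ messageTokens }));

console.log(resolveRetentionFloor(0.75, 20000)); // 5000
console.log(resolveActivationRatio(5000, 30000).toFixed(3)); // 0.833
console.log(projectedMessageRemoval(even, 5000, 20000, 18000)); // 16000
console.log(projectedMessageRemoval(skewed, 0.75, 20000, 18500)); // 2000
```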
@@ -1488,8 +1490,12 @@ var ObservationalMemory = class _ObservationalMemory {
   /**
    * Check if we've crossed a new bufferTokens interval boundary.
    * Returns true if async buffering should be triggered.
+   *
+   * When pending tokens are within ~1 bufferTokens of the observation threshold,
+   * the buffer interval is halved to produce finer-grained chunks right before
+   * activation. This improves chunk boundary selection, reducing overshoot.
    */
-  shouldTriggerAsyncObservation(currentTokens, lockKey, record) {
+  shouldTriggerAsyncObservation(currentTokens, lockKey, record, messageTokensThreshold) {
     if (!this.isAsyncObservationEnabled()) return false;
     if (record.isBufferingObservation) {
       if (isOpActiveInProcess(record.id, "bufferingObservation")) return false;
@@ -1503,11 +1509,13 @@ var ObservationalMemory = class _ObservationalMemory {
     const dbBoundary = record.lastBufferedAtTokens ?? 0;
     const memBoundary = _ObservationalMemory.lastBufferedBoundary.get(bufferKey) ?? 0;
     const lastBoundary = Math.max(dbBoundary, memBoundary);
-    const currentInterval = Math.floor(currentTokens / bufferTokens);
-    const lastInterval = Math.floor(lastBoundary / bufferTokens);
+    const rampPoint = messageTokensThreshold ? messageTokensThreshold - bufferTokens * 1.1 : Infinity;
+    const effectiveBufferTokens = currentTokens >= rampPoint ? bufferTokens / 2 : bufferTokens;
+    const currentInterval = Math.floor(currentTokens / effectiveBufferTokens);
+    const lastInterval = Math.floor(lastBoundary / effectiveBufferTokens);
     const shouldTrigger = currentInterval > lastInterval;
     omDebug(
-      `[OM:shouldTriggerAsyncObs] tokens=${currentTokens}, bufferTokens=${bufferTokens}, currentInterval=${currentInterval}, lastInterval=${lastInterval}, lastBoundary=${lastBoundary} (db=${dbBoundary}, mem=${memBoundary}), shouldTrigger=${shouldTrigger}`
+      `[OM:shouldTriggerAsyncObs] tokens=${currentTokens}, bufferTokens=${bufferTokens}, effectiveBufferTokens=${effectiveBufferTokens}, rampPoint=${rampPoint}, currentInterval=${currentInterval}, lastInterval=${lastInterval}, lastBoundary=${lastBoundary} (db=${dbBoundary}, mem=${memBoundary}), shouldTrigger=${shouldTrigger}`
     );
     return shouldTrigger;
   }
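The ramp in shouldTriggerAsyncObservation is easiest to see with concrete numbers. This sketch keeps only the interval arithmetic and leaves out the lock, enable and database checks. intervalCrossed is a hypothetical name, and lastBoundary stands for the max of the db and in-memory boundaries.

With bufferTokens = 6000 and a 30,000-token threshold, the ramp starts at 30000 - 6000 × 1.1 = 23400. Above that point, buffering fires every 3,000 tokens instead of every 6,000.

```javascript
// Interval check from the 1.5.0 shouldTriggerAsyncObservation, minus locks and DB state.
function intervalCrossed(currentTokens, lastBoundary, bufferTokens, messageTokensThreshold) {
  // No threshold supplied: never ramp (matches the Infinity fallback in the bundle).
  const rampPoint = messageTokensThreshold
    ? messageTokensThreshold - bufferTokens * 1.1
    : Infinity;
  // Near the threshold, halve the interval to get finer-grained chunks.
  const effective = currentTokens >= rampPoint ? bufferTokens / 2 : bufferTokens;
  return Math.floor(currentTokens / effective) > Math.floor(lastBoundary / effective);
}

console.log(intervalCrossed(27500, 24000, 6000, 30000)); // true: halved interval, 9 > 8
console.log(intervalCrossed(27500, 24000, 6000));        // false: full interval, 4 === 4
console.log(intervalCrossed(21000, 18000, 6000, 30000)); // false: below the ramp, 3 === 3
```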
@@ -1768,9 +1776,17 @@ Async buffering is enabled by default \u2014 this opt-out is only needed when us
       }
     }
     if (this.observationConfig.bufferActivation !== void 0) {
-      if (this.observationConfig.bufferActivation <= 0 || this.observationConfig.bufferActivation > 1) {
+      if (this.observationConfig.bufferActivation <= 0) {
+        throw new Error(`observation.bufferActivation must be > 0, got ${this.observationConfig.bufferActivation}`);
+      }
+      if (this.observationConfig.bufferActivation > 1 && this.observationConfig.bufferActivation < 1e3) {
+        throw new Error(
+          `observation.bufferActivation must be <= 1 (ratio) or >= 1000 (absolute token retention), got ${this.observationConfig.bufferActivation}`
+        );
+      }
+      if (this.observationConfig.bufferActivation >= 1e3 && this.observationConfig.bufferActivation >= observationThreshold) {
         throw new Error(
-          `observation.bufferActivation must be in range (0, 1], got ${this.observationConfig.bufferActivation}`
+          `observation.bufferActivation as absolute retention (${this.observationConfig.bufferActivation}) must be less than messageTokens (${observationThreshold})`
         );
       }
     }
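After this change, bufferActivation accepts two disjoint ranges: a ratio in (0, 1], or an absolute token count of at least 1000 that stays below the observation threshold. This standalone validator restates the three checks so the accepted values can be tabulated.

The name validateBufferActivation is hypothetical. Its messageTokens parameter plays the role of observationThreshold in the bundle.

```javascript
// Standalone restatement of the 1.5.0 bufferActivation checks.
function validateBufferActivation(bufferActivation, messageTokens) {
  if (bufferActivation === undefined) return; // option not set: nothing to check
  if (bufferActivation <= 0) {
    throw new Error(`observation.bufferActivation must be > 0, got ${bufferActivation}`);
  }
  // Values in (1, 1000) are neither a ratio nor an absolute retention target.
  if (bufferActivation > 1 && bufferActivation < 1e3) {
    throw new Error(
      `observation.bufferActivation must be <= 1 (ratio) or >= 1000 (absolute token retention), got ${bufferActivation}`
    );
  }
  // An absolute floor must be smaller than the threshold it retains against.
  if (bufferActivation >= 1e3 && bufferActivation >= messageTokens) {
    throw new Error(
      `observation.bufferActivation as absolute retention (${bufferActivation}) must be less than messageTokens (${messageTokens})`
    );
  }
}

const isValid = (value, messageTokens) => {
  try {
    validateBufferActivation(value, messageTokens);
    return true;
  } catch {
    return false;
  }
};

for (const value of [0.8, 1, 5000, 0, 1.5, 999, 30000]) {
  console.log(value, isValid(value, 30000));
}
// 0.8 true, 1 true, 5000 true, 0 false, 1.5 false, 999 false, 30000 false
```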
@@ -2385,18 +2401,31 @@ Async buffering is enabled by default \u2014 this opt-out is only needed when us
   async callObserver(existingObservations, messagesToObserve, abortSignal, options) {
     const agent = this.getObserverAgent();
     const prompt = buildObserverPrompt(existingObservations, messagesToObserve, options);
-    const result = await this.withAbortCheck(
-      () => agent.generate(prompt, {
-        modelSettings: {
-          ...this.observationConfig.modelSettings
-        },
-        providerOptions: this.observationConfig.providerOptions,
-        ...abortSignal ? { abortSignal } : {},
-        ...options?.requestContext ? { requestContext: options.requestContext } : {}
-      }),
-      abortSignal
-    );
-    const parsed = parseObserverOutput(result.text);
+    const doGenerate = async () => {
+      const result2 = await this.withAbortCheck(
+        () => agent.generate(prompt, {
+          modelSettings: {
+            ...this.observationConfig.modelSettings
+          },
+          providerOptions: this.observationConfig.providerOptions,
+          ...abortSignal ? { abortSignal } : {},
+          ...options?.requestContext ? { requestContext: options.requestContext } : {}
+        }),
+        abortSignal
+      );
+      return result2;
+    };
+    let result = await doGenerate();
+    let parsed = parseObserverOutput(result.text);
+    if (parsed.degenerate) {
+      omDebug(`[OM:callObserver] degenerate repetition detected, retrying once`);
+      result = await doGenerate();
+      parsed = parseObserverOutput(result.text);
+      if (parsed.degenerate) {
+        omDebug(`[OM:callObserver] degenerate repetition on retry, failing`);
+        throw new Error("Observer produced degenerate output after retry");
+      }
+    }
     const usage = result.totalUsage ?? result.usage;
     return {
       observations: parsed.observations,
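callObserver and the multi-thread observer now share the same retry-once shape around generation. The sketch below abstracts the agent away and makes three assumptions:

- makeFakeGenerate is a hypothetical stand-in for agent.generate() that replays canned outputs in order.
- parse is a toy parser that flags the literal string "LOOP" as degenerate. The real parsers use detectDegenerateRepetition instead.
- generateWithOneRetry is likewise an illustrative name, not a package export.

```javascript
// Retry-once shape from the 1.5.0 observer calls, with the agent abstracted away.
async function generateWithOneRetry(generate, parse) {
  let result = await generate();
  let parsed = parse(result.text);
  if (parsed.degenerate) {
    // One more attempt; a second degenerate output is a hard failure.
    result = await generate();
    parsed = parse(result.text);
    if (parsed.degenerate) {
      throw new Error("Observer produced degenerate output after retry");
    }
  }
  return { result, parsed };
}

// Hypothetical stand-in for agent.generate(): replays canned outputs in order.
function makeFakeGenerate(outputs) {
  const state = { calls: 0 };
  const generate = async () => ({ text: outputs[state.calls++] });
  return { generate, state };
}

// Toy parser: "LOOP" stands in for output that detectDegenerateRepetition would flag.
const parse = (text) => ({ degenerate: text === "LOOP", observations: text });

async function demo() {
  const flaky = makeFakeGenerate(["LOOP", "* (09:15) User asked about deployment"]);
  const { parsed } = await generateWithOneRetry(flaky.generate, parse);
  console.log(flaky.state.calls, parsed.observations); // 2 * (09:15) User asked about deployment

  const stuck = makeFakeGenerate(["LOOP", "LOOP", "never reached"]);
  const message = await generateWithOneRetry(stuck.generate, parse).then(
    () => "no error",
    (err) => err.message
  );
  console.log(stuck.state.calls, message); // 2 Observer produced degenerate output after retry
  return { flakyCalls: flaky.state.calls, parsed, stuckCalls: stuck.state.calls, message };
}

demo();
```

The third canned output is never requested, which shows the retry budget is exactly one extra call.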
@@ -2430,18 +2459,30 @@ Async buffering is enabled by default \u2014 this opt-out is only needed when us
     for (const msg of allMessages) {
       this.observedMessageIds.add(msg.id);
     }
-    const result = await this.withAbortCheck(
-      () => agent$1.generate(prompt, {
-        modelSettings: {
-          ...this.observationConfig.modelSettings
-        },
-        providerOptions: this.observationConfig.providerOptions,
-        ...abortSignal ? { abortSignal } : {},
-        ...requestContext ? { requestContext } : {}
-      }),
-      abortSignal
-    );
-    const parsed = parseMultiThreadObserverOutput(result.text);
+    const doGenerate = async () => {
+      return this.withAbortCheck(
+        () => agent$1.generate(prompt, {
+          modelSettings: {
+            ...this.observationConfig.modelSettings
+          },
+          providerOptions: this.observationConfig.providerOptions,
+          ...abortSignal ? { abortSignal } : {},
+          ...requestContext ? { requestContext } : {}
+        }),
+        abortSignal
+      );
+    };
+    let result = await doGenerate();
+    let parsed = parseMultiThreadObserverOutput(result.text);
+    if (parsed.degenerate) {
+      omDebug(`[OM:callMultiThreadObserver] degenerate repetition detected, retrying once`);
+      result = await doGenerate();
+      parsed = parseMultiThreadObserverOutput(result.text);
+      if (parsed.degenerate) {
+        omDebug(`[OM:callMultiThreadObserver] degenerate repetition on retry, failing`);
+        throw new Error("Multi-thread observer produced degenerate output after retry");
+      }
+    }
     const results = /* @__PURE__ */ new Map();
     for (const [threadId, threadResult] of parsed.threads) {
       results.set(threadId, {
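Both observer paths touched above (`callObserver` and `callMultiThreadObserver`) now share the same control flow: generate, parse, and if the parser flags the output as `degenerate`, regenerate exactly once before throwing. Below is a minimal sketch of that pattern. It assumes a `generate` function that resolves to an object with a `text` field and a `parse` function that returns an object with a boolean `degenerate` field. The helper name `generateWithSingleRetry` is hypothetical and is not part of `@mastra/memory`.

```javascript
// Hypothetical helper illustrating the retry-once-on-degenerate pattern
// that this diff adds to callObserver and callMultiThreadObserver.
async function generateWithSingleRetry(generate, parse, label = "Observer") {
  // First attempt.
  let result = await generate();
  let parsed = parse(result.text);
  if (parsed.degenerate) {
    // Degenerate repetition detected: regenerate exactly one more time.
    result = await generate();
    parsed = parse(result.text);
    if (parsed.degenerate) {
      // Still degenerate after the retry, so fail loudly instead of
      // returning unusable text to the caller.
      throw new Error(`${label} produced degenerate output after retry`);
    }
  }
  return { result, parsed };
}
```

Capping the retry at one bounds the extra model cost to a single additional call. A persistent failure becomes an error the caller can handle, rather than degenerate text being stored as observations.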
@@ -2528,11 +2569,22 @@ Async buffering is enabled by default \u2014 this opt-out is only needed when us
         totalUsage.totalTokens += usage.totalTokens ?? 0;
       }
       parsed = parseReflectorOutput(result.text);
-      reflectedTokens = this.tokenCounter.countObservations(parsed.observations);
+      if (parsed.degenerate) {
+        omDebug(
+          `[OM:callReflector] attempt #${attemptNumber}: degenerate repetition detected, treating as compression failure`
+        );
+        reflectedTokens = originalTokens;
+      } else {
+        reflectedTokens = this.tokenCounter.countObservations(parsed.observations);
+      }
       omDebug(
-        `[OM:callReflector] attempt #${attemptNumber} parsed: reflectedTokens=${reflectedTokens}, targetThreshold=${targetThreshold}, compressionValid=${validateCompression(reflectedTokens, targetThreshold)}, parsedObsLen=${parsed.observations?.length}`
+        `[OM:callReflector] attempt #${attemptNumber} parsed: reflectedTokens=${reflectedTokens}, targetThreshold=${targetThreshold}, compressionValid=${validateCompression(reflectedTokens, targetThreshold)}, parsedObsLen=${parsed.observations?.length}, degenerate=${parsed.degenerate ?? false}`
       );
-      if (validateCompression(reflectedTokens, targetThreshold) || currentLevel >= maxLevel) {
+      if (!parsed.degenerate && (validateCompression(reflectedTokens, targetThreshold) || currentLevel >= maxLevel)) {
+        break;
+      }
+      if (parsed.degenerate && currentLevel >= maxLevel) {
+        omDebug(`[OM:callReflector] degenerate output persists at maxLevel=${maxLevel}, breaking`);
         break;
       }
       if (streamContext?.writer) {
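The two `break` conditions now in `callReflector` reduce to one rule: stop once `currentLevel >= maxLevel` regardless of output quality, and otherwise stop only on a non-degenerate result that passes compression validation. The sketch below transcribes the diff's conditions literally and checks them against that simplified form. The function names and the parameter object are illustrative. `compressionValid` is assumed to be the boolean returned by `validateCompression(reflectedTokens, targetThreshold)`.

```javascript
// Hypothetical literal transcription of the two loop-exit checks added to
// callReflector. For degenerate output the diff sets
// reflectedTokens = originalTokens, and the !degenerate gate below means
// such output is never accepted as a successful compression.
function shouldStopReflecting({ degenerate, compressionValid, currentLevel, maxLevel }) {
  const breakOnGoodOutput = !degenerate && (compressionValid || currentLevel >= maxLevel);
  const breakOnDegenerateAtMax = degenerate && currentLevel >= maxLevel;
  return breakOnGoodOutput || breakOnDegenerateAtMax;
}

// Equivalent single-expression form of the same rule.
function shouldStopReflectingSimplified({ degenerate, compressionValid, currentLevel, maxLevel }) {
  return currentLevel >= maxLevel || (!degenerate && compressionValid);
}
```

In practice, a degenerate reflection below `maxLevel` always falls through to another attempt at the next compression level, even if its token count would otherwise have looked acceptable.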
@@ -2840,6 +2892,20 @@ ${suggestedResponse}
           omDebug(
             `[OM:threshold] activation succeeded, obsTokens=${updatedRecord.observationTokenCount}, activeObsLen=${updatedRecord.activeObservations?.length}`
           );
+          if (activationResult.suggestedContinuation || activationResult.currentTask) {
+            const thread = await this.storage.getThreadById({ threadId });
+            if (thread) {
+              const newMetadata = memory.setThreadOMMetadata(thread.metadata, {
+                suggestedResponse: activationResult.suggestedContinuation,
+                currentTask: activationResult.currentTask
+              });
+              await this.storage.updateThread({
+                id: threadId,
+                title: thread.title ?? "",
+                metadata: newMetadata
+              });
+            }
+          }
           await this.maybeAsyncReflect(
             updatedRecord,
             updatedRecord.observationTokenCount ?? 0,
@@ -3178,6 +3244,20 @@ ${suggestedResponse}
             _ObservationalMemory.lastBufferedBoundary.set(bufKey, 0);
             this.storage.setBufferingObservationFlag(record.id, false, 0).catch(() => {
             });
+            if (activationResult.suggestedContinuation || activationResult.currentTask) {
+              const thread = await this.storage.getThreadById({ threadId });
+              if (thread) {
+                const newMetadata = memory.setThreadOMMetadata(thread.metadata, {
+                  suggestedResponse: activationResult.suggestedContinuation,
+                  currentTask: activationResult.currentTask
+                });
+                await this.storage.updateThread({
+                  id: threadId,
+                  title: thread.title ?? "",
+                  metadata: newMetadata
+                });
+              }
+            }
             await this.maybeReflect({
               record,
               observationTokens: record.observationTokenCount ?? 0,
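The same read-modify-write block appears in both activation paths above. It loads the thread, merges `suggestedContinuation` (stored under the key `suggestedResponse`) and `currentTask` into its metadata, and writes it back with `title` defaulted to `""`. Below is a minimal sketch of that block factored into one helper. It assumes a `storage` object exposing `getThreadById({ threadId })` and `updateThread({ id, title, metadata })` with the shapes used in the diff. The name `persistContinuationHints` is hypothetical, and `setMetadata` is a stand-in for `memory.setThreadOMMetadata`, whose real merge behavior is not shown here.

```javascript
// Hypothetical helper factoring out the thread-metadata update that the diff
// adds after buffered activation. Returns true if a write happened.
async function persistContinuationHints(storage, setMetadata, threadId, activationResult) {
  const { suggestedContinuation, currentTask } = activationResult;
  // Nothing to persist: skip the storage round-trip entirely.
  if (!suggestedContinuation && !currentTask) return false;
  const thread = await storage.getThreadById({ threadId });
  // Thread not found: silently do nothing, as the diff does.
  if (!thread) return false;
  const metadata = setMetadata(thread.metadata, {
    // The activation result calls it suggestedContinuation, but it is
    // stored in thread metadata under suggestedResponse.
    suggestedResponse: suggestedContinuation,
    currentTask,
  });
  await storage.updateThread({ id: threadId, title: thread.title ?? "", metadata });
  return true;
}
```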
@@ -3236,7 +3316,7 @@ ${suggestedResponse}
       state.sealedIds = sealedIds;
       const lockKey = this.getLockKey(threadId, resourceId);
       if (this.isAsyncObservationEnabled() && totalPendingTokens < threshold) {
-        const shouldTrigger = this.shouldTriggerAsyncObservation(unbufferedPendingTokens, lockKey, record);
+        const shouldTrigger = this.shouldTriggerAsyncObservation(unbufferedPendingTokens, lockKey, record, threshold);
         omDebug(
           `[OM:async-obs] belowThreshold: pending=${totalPendingTokens}, unbuffered=${unbufferedPendingTokens}, threshold=${threshold}, shouldTrigger=${shouldTrigger}, isBufferingObs=${record.isBufferingObservation}, lastBufferedAt=${record.lastBufferedAtTokens}`
         );
@@ -3252,7 +3332,7 @@ ${suggestedResponse}
           );
         }
       } else if (this.isAsyncObservationEnabled()) {
-        const shouldTrigger = this.shouldTriggerAsyncObservation(unbufferedPendingTokens, lockKey, record);
+        const shouldTrigger = this.shouldTriggerAsyncObservation(unbufferedPendingTokens, lockKey, record, threshold);
         omDebug(
           `[OM:async-obs] atOrAboveThreshold: pending=${totalPendingTokens}, unbuffered=${unbufferedPendingTokens}, threshold=${threshold}, step=${stepNumber}, shouldTrigger=${shouldTrigger}`
         );
@@ -3934,8 +4014,12 @@ ${result.observations}` : result.observations;
       messagesToBuffer,
       void 0,
       // No abort signal for background ops
-      { skipContinuationHints: true, requestContext }
+      { requestContext }
     );
+    if (!result.observations) {
+      omDebug(`[OM:doAsyncBufferedObservation] empty observations returned, skipping buffer storage`);
+      return;
+    }
     let newObservations;
     if (this.scope === "resource") {
       newObservations = await this.wrapWithThreadTag(threadId, result.observations);
@@ -3955,7 +4039,9 @@ ${result.observations}` : result.observations;
         tokenCount: newTokenCount,
         messageIds: newMessageIds,
         messageTokens,
-        lastObservedAt
+        lastObservedAt,
+        suggestedContinuation: result.suggestedContinuation,
+        currentTask: result.currentTask
       },
       lastBufferedAtTime: lastObservedAt
     });
@@ -4034,24 +4120,28 @@ ${bufferedObservations}`;
       return { success: false };
     }
     const messageTokensThreshold = this.getMaxThreshold(this.observationConfig.messageTokens);
+    let effectivePendingTokens = currentPendingTokens;
     if (messageList) {
-      const freshPendingTokens = this.tokenCounter.countMessages(messageList.get.all.db());
-      if (freshPendingTokens < messageTokensThreshold) {
+      effectivePendingTokens = this.tokenCounter.countMessages(messageList.get.all.db());
+      if (effectivePendingTokens < messageTokensThreshold) {
         omDebug(
-          `[OM:tryActivate] skipping activation: freshPendingTokens=${freshPendingTokens} < threshold=${messageTokensThreshold}`
+          `[OM:tryActivate] skipping activation: freshPendingTokens=${effectivePendingTokens} < threshold=${messageTokensThreshold}`
         );
         return { success: false };
       }
     }
-    const activationRatio = this.observationConfig.bufferActivation ?? 0.7;
+    const bufferActivation = this.observationConfig.bufferActivation ?? 0.7;
+    const activationRatio = this.resolveActivationRatio(bufferActivation, messageTokensThreshold);
+    const forceMaxActivation = !!(this.observationConfig.blockAfter && effectivePendingTokens >= this.observationConfig.blockAfter);
     omDebug(
-      `[OM:tryActivate] swapping: freshChunks=${freshChunks.length}, activationRatio=${activationRatio}, totalChunkTokens=${freshChunks.reduce((s, c) => s + (c.tokenCount ?? 0), 0)}`
+      `[OM:tryActivate] swapping: freshChunks=${freshChunks.length}, bufferActivation=${bufferActivation}, activationRatio=${activationRatio}, forceMax=${forceMaxActivation}, totalChunkTokens=${freshChunks.reduce((s, c) => s + (c.tokenCount ?? 0), 0)}`
     );
     const activationResult = await this.storage.swapBufferedToActive({
       id: freshRecord.id,
       activationRatio,
       messageTokensThreshold,
-      currentPendingTokens
+      currentPendingTokens: effectivePendingTokens,
+      forceMaxActivation
     });
     omDebug(
       `[OM:tryActivate] swapResult: chunksActivated=${activationResult.chunksActivated}, tokensActivated=${activationResult.messageTokensActivated}, obsTokensActivated=${activationResult.observationTokensActivated}, activatedCycleIds=${activationResult.activatedCycleIds.join(",")}`
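In 1.4.0, `tryActivate` recounted tokens from `messageList` only for the early-exit check and still passed the caller's `currentPendingTokens` to `swapBufferedToActive`. In 1.5.0 the recount, when available, becomes `effectivePendingTokens` and is used both for the swap and for the new `forceMaxActivation` flag, which is set once pending tokens reach a configured `blockAfter`. Below is a minimal sketch of that derivation as a pure function. The name `resolvePendingAndForce` and the `freshCount` parameter (the recount, or `undefined` when no `messageList` is passed) are hypothetical. `resolveActivationRatio` is internal to the package and is not modeled here.

```javascript
// Hypothetical restatement of how tryActivate derives the pending-token count
// and the forceMaxActivation flag before calling swapBufferedToActive.
function resolvePendingAndForce({ currentPendingTokens, freshCount, messageTokensThreshold, blockAfter }) {
  let effectivePendingTokens = currentPendingTokens;
  if (freshCount !== undefined) {
    // A fresh recount replaces the caller-supplied value...
    effectivePendingTokens = freshCount;
    // ...and mirrors the early return: too few tokens, no activation.
    if (effectivePendingTokens < messageTokensThreshold) return { skip: true };
  }
  // Only a truthy blockAfter can force maximal activation, so an unset or
  // zero blockAfter never triggers it.
  const forceMaxActivation = !!(blockAfter && effectivePendingTokens >= blockAfter);
  return { skip: false, effectivePendingTokens, forceMaxActivation };
}
```

Note that without a `messageList` the threshold check is skipped entirely, matching the diff, so the caller's count is trusted as-is.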
@@ -4090,7 +4180,9 @@ ${bufferedObservations}`;
       success: true,
       updatedRecord: updatedRecord ?? void 0,
       messageTokensActivated: activationResult.messageTokensActivated,
-      activatedMessageIds: activationResult.activatedMessageIds
+      activatedMessageIds: activationResult.activatedMessageIds,
+      suggestedContinuation: activationResult.suggestedContinuation,
+      currentTask: activationResult.currentTask
     };
   }
   /**
@@ -4976,5 +5068,5 @@ exports.formatMessagesForObserver = formatMessagesForObserver;
 exports.hasCurrentTaskSection = hasCurrentTaskSection;
 exports.optimizeObservationsForContext = optimizeObservationsForContext;
 exports.parseObserverOutput = parseObserverOutput;
-//# sourceMappingURL=chunk-QRKB5I2S.cjs.map
-//# sourceMappingURL=chunk-QRKB5I2S.cjs.map
+//# sourceMappingURL=chunk-LLTHE64H.cjs.map
+//# sourceMappingURL=chunk-LLTHE64H.cjs.map