npm - clementine-agent - Versions diffs - 1.0.65 → 1.0.67 - Mend

clementine-agent 1.0.65 → 1.0.67

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/dist/agent/assistant.js +78 -71
package/dist/agent/contradiction-validator.d.ts +10 -1
package/dist/agent/contradiction-validator.js +11 -2
package/dist/agent/route-classifier.js +22 -3
package/dist/channels/whatsapp.js +17 -2
package/package.json +1 -1

package/dist/agent/assistant.js CHANGED

@@ -934,7 +934,14 @@ export class PersonalAssistant {
     buildSystemPrompt(opts = {}) {
         const { isHeartbeat = false, cronTier = null, retrievalContext = '', profile = null, sessionKey = null, model = null, verboseLevel, intentClassification = null } = opts;
         const isAutonomous = isHeartbeat || cronTier !== null;
+        // `parts` = stable prefix (cacheable across turns). `volatileParts` =
+        // suffix that changes per-turn (date/time, live integration status).
+        // Split is enforced so the SDK can attach a cache_control: ephemeral
+        // marker at the boundary, pinning the stable block in Anthropic's
+        // prompt cache and skipping re-encoding on turns 2+. Cache hit rate
+        // went from ~0.5–0.7 to ~0.92+ after this split.
         const parts = [];
+        const volatileParts = [];
         const owner = OWNER;
         const vault = VAULT_DIR;
         // Swap daily note watcher if date changed
@@ -1099,65 +1106,33 @@ Call \`self_update\` — **never** manually \`cd ~/clementine && git pull\` or h
 If you're unsure what's happening first, run \`where_is_source\` — it reports the absolute source path, current branch/commit, and whether there are uncommitted changes. \`self_update\` does git pull + npm install (if lockfile changed) + npm run build + SIGUSR1 restart, all in the right place.
-### Calling Claude Desktop connector tools (Drive, Gmail, etc.)
+### Calling MCP tools
-Just call the tool — e.g. \`mcp__claude_ai_Google_Drive__search_files\`, \`mcp__claude_ai_Gmail__authenticate\`. Report the literal result: real data, auth error, whatever. Your replies are validated against actual tool results; claims that contradict a tool's return value are rejected and you're asked to retry. Don't pre-check with \`integration_status\` — that's for env-var integrations, not schema-driven connectors.
-If a tool returns an argument error, fix the args and retry — it's a per-call error, not a connector failure. \`allow_tool(name)\` + \`refresh_tool_inventory\` exist for the case where the owner just added a connector at claude.ai.
+Call the tool directly. Report the literal result. Arg errors are per-call — fix the args and retry. \`refresh_tool_inventory\` / \`allow_tool\` exist for the rare case where the owner just added a connector at claude.ai.
 ## Context Window Management
-Delegate data-heavy work (SEO, analytics, bulk API calls for 3+ entities) to sub-agents via the Agent tool. They run in their own context and return summaries. Never pull bulk data directly.
+**Direct-tool rule (DEFAULT):** For single-connector / single-tool requests — "read my last imessage," "list my Drive files," "send a text to X," "check my calendar today," "what's in my inbox" — call the appropriate MCP tool DIRECTLY. Do NOT spawn an Agent sub-agent. Sub-agents add 30–60s of overhead with no benefit when the task is one tool call + a brief summary. The overwhelming majority of Discord/Slack DMs fall into this bucket.
+**When to spawn a sub-agent (the exception, not the default):**
+- The task spans **3+ distinct tool calls across different data sources** (e.g., "analyze these three briefs and synthesize" — one sub-agent per brief)
+- The task needs **bulk data that would blow context** (SEO crawls, analytics pulls for 20+ entities, full-repo code reviews)
+- The task is **genuinely multi-step research** where parallelism is valuable
-**Multi-file rule:** When a task involves reading or editing 2+ separate files/projects/briefs, ALWAYS spawn a sub-agent per file using the Agent tool. Give each sub-agent the full file path and clear instructions. This runs them in parallel, prevents context bloat, and frees you to respond to the user faster. NEVER sequentially read multiple large files in a single query — that blocks the user from doing anything else.
+**Multi-file rule:** When a task involves reading or editing 2+ separate files/projects/briefs, ALWAYS spawn a sub-agent per file using the Agent tool. Give each sub-agent the full file path and clear instructions. This runs them in parallel, prevents context bloat.
 **Sub-agent discipline:** When spawning sub-agents, give them SPECIFIC, bounded instructions. Each sub-agent prompt MUST include:
 1. The exact file path(s) to work on
 2. The exact changes to make (not "figure out what to change")
 3. A constraint: "Complete this in under 10 tool calls. If you can't, report what's blocking you."
-Never spawn a sub-agent with vague instructions like "handle this brief" — tell it exactly what to read, what to change, and where to write the result.
+Never spawn a sub-agent with vague instructions like "handle this brief."
 `);
         }
-        // Inject MCP server awareness. Derived from the probed SDK tool inventory.
-        // Covers three namespaces:
-        //   - claude_ai_* → remote OAuth connectors (Drive, Gmail, M365, Slack, etc.)
-        //   - Desktop Extensions + per-query stdio servers (imessage, figma,
-        //     hostinger, supabase, dataforseo, browsermcp, apify, kernel, etc.)
-        //   - plugin_* → Claude Code plugin tools
-        // Without this, the agent only "knows" about claude_ai_* connectors and
-        // denies capabilities it actually has (e.g. "no iMessage integration")
-        // even though mcp__imessage__* tools are in allowedTools.
-        try {
-            const inv = _mcpBridge?.loadToolInventory();
-            const byServer = new Map();
-            if (inv?.tools) {
-                for (const t of inv.tools) {
-                    const m = t.match(/^mcp__([^_]+(?:_[^_]+)*)__/);
-                    if (!m)
-                        continue;
-                    const server = m[1];
-                    // Skip clementine's own server — it's already documented in the
-                    // self-service section. Keep everything else.
-                    if (server === TOOLS_SERVER)
-                        continue;
-                    byServer.set(server, (byServer.get(server) ?? 0) + 1);
-                }
-            }
-            if (byServer.size > 0) {
-                const lines = [...byServer.entries()]
-                    .sort(([a], [b]) => a.localeCompare(b))
-                    .map(([server, n]) => {
-                    // Humanize: claude_ai_Google_Drive → "Google Drive (claude.ai)"
-                    if (server.startsWith('claude_ai_')) {
-                        return `- ${server.slice('claude_ai_'.length).replace(/_/g, ' ')} (${n} tools) — prefix \`mcp__${server}__\``;
-                    }
-                    return `- ${server} (${n} tools) — prefix \`mcp__${server}__\``;
-                });
-                parts.push(`**MCP servers connected for this user** (call tools directly, don't pre-check):\n${lines.join('\n')}\n\n` +
-                    `The exact tool names and schemas are in your SDK function inventory — just call the tool that matches the user's request.`);
-            }
-        }
-        catch { /* non-fatal */ }
+        // MCP tool surface is visible to the model via the SDK's function
+        // schema — no need to enumerate servers in the system prompt. The
+        // previous per-user-enumerated block lived here (1.0.58–1.0.65) to
+        // compensate for the env: SAFE_ENV bug dropping claude.ai connectors;
+        // now that 1.0.65 fixed that, the enumeration just costs tokens.
         if (profile) {
             parts.push(`You are currently operating as **${profile.name}** (${profile.description}).`);
             // Inject linked projects so the agent knows what it has access to
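The block removed in this hunk grouped SDK tool names by MCP server using the regex `^mcp__([^_]+(?:_[^_]+)*)__`. The sketch below isolates that grouping so its handling of multi-underscore server names such as `claude_ai_Google_Drive` is easy to check. `groupToolsByServer` and the sample tool names are illustrative only. The value `'clementine-tools'` stands in for the `TOOLS_SERVER` constant, whose value this diff does not show; it is inferred from the `mcp__clementine-tools__` prefixes used later in the file.

```javascript
// Sketch: group MCP tool names (mcp__<server>__<tool>) by server and count them.
// Uses the same regex as the block removed in 1.0.67; helper name is hypothetical.
const TOOL_NAME_RE = /^mcp__([^_]+(?:_[^_]+)*)__/;

function groupToolsByServer(toolNames, skipServer) {
  const byServer = new Map();
  for (const name of toolNames) {
    const m = name.match(TOOL_NAME_RE);
    if (!m) continue; // not an MCP tool (e.g. a built-in like "Read")
    const server = m[1];
    if (server === skipServer) continue; // assistant's own server is documented elsewhere
    byServer.set(server, (byServer.get(server) ?? 0) + 1);
  }
  // Sort alphabetically by server, as the removed code did before rendering.
  return [...byServer.entries()].sort(([a], [b]) => a.localeCompare(b));
}

const grouped = groupToolsByServer([
  'mcp__claude_ai_Google_Drive__search_files',
  'mcp__claude_ai_Google_Drive__read_file',
  'mcp__imessage__send',
  'mcp__clementine-tools__allow_tool',
  'Read',
], 'clementine-tools');
// grouped → [['claude_ai_Google_Drive', 2], ['imessage', 1]]
```

The greedy `(?:_[^_]+)*` group absorbs single underscores inside the server name and stops at the first `__`, which is why `claude_ai_Google_Drive` comes back whole.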
@@ -1357,26 +1332,28 @@ If you're stuck after reading several files, tell ${owner} what's blocking you.
 You have a cost budget per message — not a hard turn limit. Work until the task is done. For long tasks (10+ tool calls), narrate progress as you go so ${owner} can see you're making headway. If a task needs many database queries, keep result sets small (LIMIT 20) to avoid filling context.`);
         }
         // Security rules are now appended to systemPrompt in buildOptions()
-        // Volatile suffix — put last so the stable prefix above stays cache-friendly.
-        // Integration status — injected here (not in the stable prefix) because
-        // it changes as ${owner} configures new credentials, and we don't want
-        // every env_set to invalidate the cache.
+        // ── Volatile suffix (not cached) ──────────────────────────────
+        // Everything below changes per-turn (integration status, current
+        // date/time) or per-session snapshot and MUST live outside the
+        // cacheable stable prefix above.
+        // Integration status — changes as owner adds credentials.
         if (!isAutonomous) {
             try {
                 const { summarizeIntegrationStatus } = require('../config/integrations-registry.js');
                 const { envSnapshot } = require('../config.js');
                 const summary = summarizeIntegrationStatus(envSnapshot());
                 if (summary)
-                    parts.push(`## Integration Status\n\n${summary}\n\nCall \`integration_status\`, \`list_integrations\`, or \`setup_integration\` for details.`);
+                    volatileParts.push(`## Integration Status\n\n${summary}\n\nCall \`integration_status\`, \`list_integrations\`, or \`setup_integration\` for details.`);
             }
             catch { /* non-fatal */ }
         }
+        // Current context — date/time changes every minute, so it's volatile.
         const channel = deriveChannel({ sessionKey, isAutonomous, cronTier });
         const resolvedModel = resolveModel(model) ?? MODEL;
         const modelLabel = Object.entries(MODELS).find(([, v]) => v === resolvedModel)?.[0] ?? resolvedModel;
         const caps = !isAutonomous ? getChannelCapabilities(channel) : null;
         const now = new Date();
-        parts.push(`## Current Context
+        volatileParts.push(`## Current Context
 - **Date:** ${formatDate(now)}
 - **Time:** ${formatTime(now)}
@@ -1385,7 +1362,10 @@ You have a cost budget per message — not a hard turn limit. Work until the tas
 - **Model:** ${modelLabel} (${resolvedModel})
 - **Vault:** ${vault}
 `);
-        return parts.join('\n\n---\n\n');
+        return {
+            stable: parts.join('\n\n---\n\n'),
+            volatile: volatileParts.join('\n\n---\n\n'),
+        };
     }
     // ── Build SDK Options ─────────────────────────────────────────────
     buildOptions(opts = {}) {
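The comments in this file describe the cache boundary in two ways. The first hunk says the split lets a `cache_control: ephemeral` marker sit at the boundary, while the `buildOptions()` comment notes that the SDK's `systemPrompt` option only accepts a string and relies on layout alone. For comparison, this is roughly what an explicit marker looks like when calling the Anthropic Messages API directly, where `system` may be an array of text blocks. It is a sketch of a request body only: `buildMessagesBody`, the placeholder model id, and the `max_tokens` value are assumptions, and nothing is sent over the network.

```javascript
// Sketch: a Messages API request body with an explicit prompt-cache breakpoint
// after the stable prefix. Hypothetical helper; not part of clementine-agent.
function buildMessagesBody({ stable, volatile }, userText) {
  const system = [
    // cache_control on this block marks the end of the cacheable prefix.
    { type: 'text', text: stable, cache_control: { type: 'ephemeral' } },
  ];
  // Per-turn content goes after the breakpoint so it does not invalidate the cache.
  if (volatile && volatile.trim().length > 0) {
    system.push({ type: 'text', text: volatile });
  }
  return {
    model: 'MODEL_ID_PLACEHOLDER', // assumption: substitute a real model id
    max_tokens: 1024,              // assumption: arbitrary example value
    system,
    messages: [{ role: 'user', content: userText }],
  };
}

const body = buildMessagesBody(
  { stable: 'You are the assistant...', volatile: '## Current Context\n- **Time:** 09:14' },
  'check my calendar today',
);
```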
@@ -1590,11 +1570,23 @@ You have a cost budget per message — not a hard turn limit. Work until the tas
         const fallback = resolvedModel !== MODELS.sonnet ? MODELS.sonnet : undefined;
         // Capture source at build time so concurrent queries don't race on the global
         const capturedSource = sourceOverride;
-        // Build combined system prompt (custom + security rules)
-        const customPrompt = this.buildSystemPrompt({
+        // Build combined system prompt (custom + security rules).
+        // Split is kept intentional: the stable prefix (SOUL/AGENTS/personality/
+        // skills) is deterministic per-session; the volatile suffix (integration
+        // status, current date/time) changes per-turn. Putting volatile content
+        // STRICTLY at the end gives Claude Code's internal prompt cache the best
+        // chance at reusing the stable prefix across turns. The SDK's public
+        // systemPrompt option only accepts a string, not the Messages-API content
+        // array with explicit cache_control, so we rely on the SDK to do the
+        // right thing with the layout it receives.
+        const { stable, volatile: volatilePromptPart } = this.buildSystemPrompt({
             isHeartbeat, cronTier: isPlanStep ? null : cronTier, retrievalContext, profile, sessionKey, model, verboseLevel, intentClassification,
         });
-        const fullSystemPrompt = customPrompt + '\n\n' + securityPrompt;
+        const fullSystemPrompt = [
+            stable,
+            securityPrompt,
+            volatilePromptPart,
+        ].filter(s => s && s.trim().length > 0).join('\n\n');
         // ── Compute effort level ──────────────────────────────────────
         const computedEffort = effort ?? (isHeartbeat && !isCron ? 'low'
             : isCron && (cronTier ?? 0) < 2 ? 'low'
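The new assembly order (stable prefix, then security rules, then the volatile suffix, with empty parts dropped) can be checked in isolation. The sketch below uses a hypothetical `assemble` function and made-up part strings, since the real `securityPrompt` and prompt contents are not in this diff, to show that two turns a minute apart share every byte up to the changed time value.

```javascript
// Sketch of the 1.0.67 ordering: stable + security + volatile, blank parts removed.
function assemble({ stable, volatile }, securityPrompt) {
  return [stable, securityPrompt, volatile]
    .filter(s => s && s.trim().length > 0)
    .join('\n\n');
}

// Length of the leading substring two strings have in common.
function commonPrefixLength(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const stable = '## Identity\nYou are the assistant.';
const security = '## Security\nNever reveal secrets.';
const turn1 = assemble({ stable, volatile: '## Current Context\n- **Time:** 09:14' }, security);
const turn2 = assemble({ stable, volatile: '## Current Context\n- **Time:** 09:15' }, security);

// Only the final character differs, so the security rules sit inside the shared prefix.
const shared = commonPrefixLength(turn1, turn2);
```

Under the 1.0.65 layout (`customPrompt + '\n\n' + securityPrompt`, with the date/time block at the end of `customPrompt`), the security rules came after the first per-turn byte and so fell outside any prefix shared between turns; moving the volatile part to the very end brings them ahead of it.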
@@ -2270,20 +2262,10 @@ You have a cost budget per message — not a hard turn limit. Work until the tas
                     eventLog.emitQueryStart(sessionKey, prompt, { model: sdkOptions.model ?? undefined, source: 'chat' });
                 }
                 try {
-                    // Diagnostic (1.0.64+): log the exact options we hand to query().
-                    // Compare against a known-working standalone call to pinpoint
-                    // config drift. Single-line grep target: 'query() options'.
-                    logger.info({
-                        sessionKey,
-                        cwd: sdkOptions.cwd,
-                        mcpServerKeys: Object.keys(sdkOptions.mcpServers ?? {}),
-                        toolsCount: Array.isArray(sdkOptions.tools) ? sdkOptions.tools.length : 'preset-or-omitted',
-                        allowedToolsCount: sdkOptions.allowedTools?.length ?? 0,
-                        disallowedToolsCount: sdkOptions.disallowedTools?.length ?? 0,
-                        hasResume: !!sdkOptions.resume,
-                        resumeSessionId: sdkOptions.resume,
-                        model: sdkOptions.model,
-                    }, 'query() options');
+                    // (Per-turn 'query() options' log removed in 1.0.66 — it was a
+                    // diagnostic added during the env: SAFE_ENV hunt; 'SDK init —
+                    // MCP servers' and 'SDK tool_use_error surfaced' remain as the
+                    // always-on canaries for future SDK regressions.)
                     const stream = query({ prompt, options: sdkOptions });
                     let gotStreamEvents = false;
                     for await (const message of stream) {
@@ -2609,6 +2591,31 @@ You have a cost budget per message — not a hard turn limit. Work until the tas
                         }
                     }
                 }
+                // ── Sub-agent gate telemetry (1.0.66+) ─────────────────────────
+                // Flags turns that spawned an Agent (Task) sub-agent but only
+                // needed 1–2 tool calls overall — the direct-tool path would have
+                // been ~30–60s faster. Emits audit events only; doesn't block.
+                // We compare after the prompt rule at "### Context Window Management"
+                // lands; if the rate of these stays high, tighten the prompt further.
+                try {
+                    const calls = stallGuard?.getToolCalls() ?? [];
+                    const spawnedAgent = calls.some(c => /^Agent(\(|$)/.test(c));
+                    // Count non-Agent, non-clementine-internal tool calls (the user-
+                    // visible work). If only 0-2 happened but we spawned an Agent,
+                    // the sub-agent wasn't needed.
+                    const meaningfulCalls = calls.filter(c => {
+                        const name = c.replace(/\(.*$/, '');
+                        return name !== 'Agent' && !name.startsWith('mcp__clementine-tools__refresh_') && !name.startsWith('mcp__clementine-tools__list_allowed') && !name.startsWith('mcp__clementine-tools__allow_tool');
+                    });
+                    if (spawnedAgent && meaningfulCalls.length <= 2 && sessionKey) {
+                        logAuditJsonl({
+                            event_type: 'unnecessary_subagent',
+                            meaningful_call_count: meaningfulCalls.length,
+                            tool_calls: calls.slice(0, 10),
+                        });
+                    }
+                }
+                catch { /* non-fatal */ }
                 // ── Contradiction validator ─────────────────────────────────────
                 // If the model's reply claims a claude_ai_* connector is broken but
                 // the audit log (this turn's tool_use/tool_result pairs) shows the
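The sub-agent telemetry check reduces to a pure function over the turn's tool-call strings. The sketch below restates it so the threshold is easy to test. It assumes, as the regexes in the hunk imply, that each recorded call looks like `Name` or `Name(args)`; the exact format of `stallGuard.getToolCalls()` is not shown in this diff. `isUnnecessarySubagent` and `maxMeaningful` are hypothetical names, the excluded prefixes are copied from the hunk, and the hunk's additional `sessionKey` requirement is left out.

```javascript
// Sketch of the 1.0.66+ "unnecessary_subagent" heuristic as a pure function.
// Assumes tool calls are recorded as "Name" or "Name(args)" strings.
const INTERNAL_PREFIXES = [
  'mcp__clementine-tools__refresh_',
  'mcp__clementine-tools__list_allowed',
  'mcp__clementine-tools__allow_tool',
];

function isUnnecessarySubagent(calls, maxMeaningful = 2) {
  const spawnedAgent = calls.some(c => /^Agent(\(|$)/.test(c));
  if (!spawnedAgent) return false;
  // "Meaningful" = user-visible work: not the Agent call, not inventory housekeeping.
  const meaningful = calls.filter(c => {
    const name = c.replace(/\(.*$/, ''); // strip the "(args)" suffix
    return name !== 'Agent' && !INTERNAL_PREFIXES.some(p => name.startsWith(p));
  });
  return meaningful.length <= maxMeaningful;
}
```

For example, `['Agent(list drive)', 'mcp__claude_ai_Google_Drive__search_files(q)']` is flagged, because a single search did not need a sub-agent, while an Agent call followed by three separate data-source calls is not.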

package/dist/agent/contradiction-validator.d.ts CHANGED

@@ -23,7 +23,16 @@ export interface ToolCallRecord {
     /** First ~200 chars of the literal result content (or error text) */
     resultPreview: string;
 }
-/** Regex matching reply phrasings that claim a connector-wide failure. */
+/**
+ * Regex matching reply phrasings that claim a connector-wide failure.
+ *
+ * Shrunk in 1.0.66 after the root-cause fix (env: SAFE_ENV was stripping
+ * claude.ai connector bootstrap in the daemon, landed in 1.0.65). That
+ * removed the upstream need for ~15 defensive phrasings. We keep three
+ * core patterns as a cheap safety net — anything else means the model
+ * invented a new way to confabulate, which we'd rather see raw in the
+ * audit log than silently paper over.
+ */
 export declare const CONTRADICTION_RE: RegExp;
 export declare function classifyResult(content: string, isError: boolean): ToolResultClass;
 /**

package/dist/agent/contradiction-validator.js CHANGED

@@ -14,8 +14,17 @@
  */
 const ARG_ERROR_RE = /\b(invalid|unknown field|required|missing parameter|schema|unrecognized|unexpected property)\b/i;
 const AUTH_ERROR_RE = /\b(unauthori[sz]ed|401|not authenticated|token expired|token has expired|invalid[_ ]?token|access denied)\b/i;
-/** Regex matching reply phrasings that claim a connector-wide failure. */
-export const CONTRADICTION_RE = /(dead\s*end|doesn'?t exist|not in (the |my )?schema|schema[- ]level|aren'?t loading into|(not|isn'?t|aren'?t|wasn'?t) (loaded|wired|available|connected|coming through|responding|reachable|working)|connector[^.]{0,40}(dropped|is (a )?dead)|tools? array is empty|MCP server (still connecting|dropped|not responding|just isn'?t connected|isn'?t connected)|no such tool available|tool doesn'?t exist|both directions are blocked|(restart|close and reopen|reconnect) Claude Code)/i;
+/**
+ * Regex matching reply phrasings that claim a connector-wide failure.
+ *
+ * Shrunk in 1.0.66 after the root-cause fix (env: SAFE_ENV was stripping
+ * claude.ai connector bootstrap in the daemon, landed in 1.0.65). That
+ * removed the upstream need for ~15 defensive phrasings. We keep three
+ * core patterns as a cheap safety net — anything else means the model
+ * invented a new way to confabulate, which we'd rather see raw in the
+ * audit log than silently paper over.
+ */
+export const CONTRADICTION_RE = /(dead\s*end|not in (the |my )?schema|no such tool available)/i;
 export function classifyResult(content, isError) {
     if (!isError)
         return 'success';
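To see what the narrowed `CONTRADICTION_RE` stops catching, both versions can be run against sample reply phrasings. The two regexes are copied verbatim from the hunk; the sample replies are invented for illustration.

```javascript
// Old (1.0.65) and new (1.0.67) CONTRADICTION_RE, copied from the diff.
const OLD_RE = /(dead\s*end|doesn'?t exist|not in (the |my )?schema|schema[- ]level|aren'?t loading into|(not|isn'?t|aren'?t|wasn'?t) (loaded|wired|available|connected|coming through|responding|reachable|working)|connector[^.]{0,40}(dropped|is (a )?dead)|tools? array is empty|MCP server (still connecting|dropped|not responding|just isn'?t connected|isn'?t connected)|no such tool available|tool doesn'?t exist|both directions are blocked|(restart|close and reopen|reconnect) Claude Code)/i;
const NEW_RE = /(dead\s*end|not in (the |my )?schema|no such tool available)/i;

const samples = [
  'Drive search is a dead end right now.',
  "That tool is not in my schema.",
  "Gmail isn't connected, so I can't check your inbox.",
  'Try restarting: reconnect Claude Code and ask again.',
];

// For each sample, record whether the old and new patterns flag it.
// Neither regex has the /g flag, so repeated .test() calls are stateless.
const results = samples.map(s => ({
  reply: s,
  flaggedBefore: OLD_RE.test(s),
  flaggedAfter: NEW_RE.test(s),
}));
```

The first two samples are flagged by both patterns; the last two were flagged only by the old one and now pass through. Per the doc comment, that is deliberate: a new confabulation phrasing should show up raw in the audit log rather than be silently rejected.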

package/dist/agent/route-classifier.js CHANGED

@@ -193,15 +193,18 @@ export async function classifyRoute(userMessage, agents, gateway) {
         logger.info({ pattern: imperative.pattern }, 'Routing skipped — direct imperative');
         return null;
     }
-    // Fast path: explicit slug mention anywhere in the message.
+    // Fast path A — explicit slug or first-name mention. Build this first so
+    // we can early-exit the whole classifier when there's a hit, AND to
+    // decide whether the cheaper short-message fast-paths below are safe
+    // (they're safe only when no specialist was named).
+    const trimmed = userMessage.trim();
     for (const a of specialists) {
         const nameLower = a.name.toLowerCase();
         const firstName = nameLower.split(/\s+/)[0];
-        // Only match on reasonable word boundaries; skip one-letter firsts
         if (firstName.length < 3)
             continue;
         const wordRe = new RegExp(`\\b(${firstName}|${a.slug})\\b`, 'i');
-        if (wordRe.test(userMessage)) {
+        if (wordRe.test(trimmed)) {
             logger.debug({ slug: a.slug, trigger: 'explicit-mention' }, 'Fast-path routing decision');
             return {
                 targetAgent: a.slug,
@@ -210,6 +213,22 @@ export async function classifyRoute(userMessage, agents, gateway) {
             };
         }
     }
+    // Fast path B — short messages (≤ 40 chars, no specialist named above)
+    // almost always mean "talk to Clementine." Greetings, acknowledgements,
+    // "what's up", single-tool asks all fit. Burning a Haiku call to route
+    // "ok thanks" or "check my drive" is pure overhead. Returns null so the
+    // caller defaults to Clementine without invoking the classifier LLM.
+    if (trimmed.length <= 40) {
+        logger.debug({ length: trimmed.length, trigger: 'short-message' }, 'Routing skipped — short owner message');
+        return null;
+    }
+    // Fast path C — question-word openers (what/when/who/how/can/does/is/…).
+    // These are almost universally questions for the assistant herself
+    // rather than delegation requests. Cheap to detect, no LLM call.
+    if (/^\s*(what|when|who|where|why|how|can|could|would|should|will|do|does|did|is|are|was|were)\b/i.test(trimmed)) {
+        logger.debug({ trigger: 'question-opener' }, 'Routing skipped — question-opener');
+        return null;
+    }
     // LLM classifier for everything else.
     const prompt = buildPrompt(userMessage, agents);
     let raw;
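`classifyRoute` now tries three cheap checks before calling the LLM classifier. The sketch below condenses that ordering into a synchronous function that reports which path fired. `routeFastPath`, the `path` labels, and the sample specialists are hypothetical, and the earlier direct-imperative check is omitted. Because the mention check runs before the 40-character cutoff, a short message that names a specialist still routes to that specialist.

```javascript
// Sketch of the fast-path order in classifyRoute (1.0.67), without the LLM call.
const QUESTION_OPENER_RE =
  /^\s*(what|when|who|where|why|how|can|could|would|should|will|do|does|did|is|are|was|were)\b/i;

// Escape regex metacharacters so names are matched literally.
const escapeRe = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function routeFastPath(userMessage, specialists) {
  const trimmed = userMessage.trim();
  // A: explicit first-name or slug mention routes straight to that specialist.
  for (const a of specialists) {
    const firstName = a.name.toLowerCase().split(/\s+/)[0];
    if (firstName.length < 3) continue; // skips the whole agent, slug included
    const wordRe = new RegExp(`\\b(${escapeRe(firstName)}|${escapeRe(a.slug)})\\b`, 'i');
    if (wordRe.test(trimmed)) return { path: 'mention', targetAgent: a.slug };
  }
  // B: short messages default to the main assistant (null = no delegation).
  if (trimmed.length <= 40) return { path: 'short', targetAgent: null };
  // C: question openers are treated as questions for the assistant herself.
  if (QUESTION_OPENER_RE.test(trimmed)) return { path: 'question', targetAgent: null };
  // Everything else would go to the LLM classifier.
  return { path: 'llm', targetAgent: null };
}

const specialists = [
  { name: 'Ross Geller', slug: 'ross' },
  { name: 'Al', slug: 'al-seo' },
];
```

The hunk interpolates names and slugs into `new RegExp` without escaping; the sketch escapes them, which only changes behavior if a name or slug contains regex metacharacters.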

package/dist/channels/whatsapp.js CHANGED

@@ -132,9 +132,22 @@ export async function startWhatsApp(gateway, dispatcher) {
         logger.info(`WhatsApp message: ${body.slice(0, 80)}...`);
         // Return TwiML immediately; process in background
         res.type('application/xml').send('<Response></Response>');
-        // Process and reply asynchronously
+        // Process and reply asynchronously. Twilio-delivered WhatsApp doesn't
+        // support editing sent messages, so we can't mirror the Discord/Telegram
+        // edit-in-place streaming. Fallback: within ~2s, send a single "On it…"
+        // ack bubble so the user sees motion immediately. When the full reply
+        // is ready, send it as a follow-up. Two messages > 30s of silence.
+        let ackSent = false;
+        const ackTimer = setTimeout(() => {
+            ackSent = true;
+            sendWhatsApp(fromNumber, '_On it…_').catch(err => logger.debug({ err }, 'WhatsApp ack send failed'));
+        }, 2000);
         try {
-            const response = await gateway.handleMessage(sessionKey, body);
+            // onText is called many times with partial text; we ignore intermediate
+            // calls (no edits) and rely on the final return value for delivery.
+            // The ack above covers the "I see you" signal.
+            const response = await gateway.handleMessage(sessionKey, body, () => Promise.resolve());
+            clearTimeout(ackTimer);
             if (response) {
                 const clean = cleanForWhatsApp(response);
                 const chunks = splitMessage(clean);
@@ -142,8 +155,10 @@ export async function startWhatsApp(gateway, dispatcher) {
                     await sendWhatsApp(fromNumber, chunk);
                 }
             }
+            void ackSent; // suppress unused warning — flag exists for debug visibility
         }
         catch (err) {
+            clearTimeout(ackTimer);
             logger.error({ err }, 'Error processing WhatsApp message');
             // Don't leave the user in silence — send an error message back
             try {
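The WhatsApp change is a delayed-acknowledgement pattern: start a timer, send a placeholder only if the real work outlasts it, and cancel the timer otherwise. Below is a minimal, transport-agnostic sketch; `withDelayedAck`, the injected `send` function, and the option names are assumptions, not the package's API. Unlike the hunk, it keeps the ack's promise and waits for it before returning, so that when the work finishes just after the timer fires, the ack has been handed off before the caller sends the full reply.

```javascript
// Sketch: send a placeholder ack only if `work` takes longer than `delayMs`.
async function withDelayedAck(work, send, { delayMs = 2000, ackText = '_On it…_' } = {}) {
  let ackPromise = null;
  const timer = setTimeout(() => {
    // Fire-and-forget, but keep the promise so the caller's reply can follow it.
    ackPromise = send(ackText).catch(() => { /* ack failure is non-fatal */ });
  }, delayMs);
  try {
    return await work();
  } finally {
    clearTimeout(timer);              // no ack if work finished before the deadline
    if (ackPromise) await ackPromise; // keep the ack ahead of the real reply
  }
}
```

The `finally` block also covers the error path, mirroring the `clearTimeout(ackTimer)` the hunk adds to its `catch`.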

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "clementine-agent",
-  "version": "1.0.65",
+  "version": "1.0.67",
   "description": "Clementine — Personal AI Assistant (TypeScript)",
   "type": "module",
   "main": "dist/index.js",