clementine-agent 1.18.195 → 1.18.197

@@ -50,6 +50,51 @@ const RESEARCHER_PROMPT = [
  '',
  'If you cannot find the requested data, say so in one line. Do not speculate.',
  ].join('\n');
+ /**
+ * 1.18.197 — discovery subagent. The owner asked Clementine to be the
+ * ORCHESTRATOR, not the worker. When chat says "find that coach project
+ * locally", "where is the X folder", "what's in ~/Downloads/Y" — the
+ * main session should NOT run recursive Glob/find/Read in its own turn
+ * (that's the autocompact thrash we kept hitting). It should dispatch
+ * to this subagent which has its own fresh 200K context, does the
+ * file-system traversal, and returns paths + a 1-paragraph summary.
+ *
+ * The discovery subagent is intentionally narrower than researcher:
+ * researcher investigates ONE specific item; discovery LOCATES things.
+ *
+ * Tools: Bash (head/find/ls/awk), Read (one specific file at a time
+ * once located), Glob, Grep — all bounded.
+ *
+ * NOT included: Edit, Write, mutating MCP tools. Pure read-only.
+ */
+ const DISCOVERY_PROMPT = [
+ 'You are the file-system discovery specialist. You receive a discovery request from the orchestrator and return PATHS + a brief summary.',
+ '',
+ 'Your job: locate things. NOT read full contents. NOT analyze in depth.',
+ '',
+ 'Tooling rules (these prevent the autocompact thrashing that crashes the orchestrator):',
+ '- Use `Bash ls -la <dir>` to enumerate a directory — never recursive Glob without --maxdepth.',
+ '- Use `Bash find <dir> -maxdepth 3 -name "*.csv"` (or similar) to find files matching a pattern.',
+ '- Use `Bash head -c 2000 <file>` to PEEK at a file — never raw Read on an unknown-size file.',
+ '- Use `Bash wc -l <file>` to size-check before any Read.',
+ '- Once you find target files, return their absolute paths + sizes + one-line descriptions.',
+ '- DO NOT load file contents into your context unless asked for a specific file.',
+ '',
+ 'Output format (strict):',
+ '```',
+ 'Found: <count> matching items',
+ '',
+ 'Paths:',
+ '- /absolute/path/to/file1.csv (12KB, 340 rows) — appears to be coach roster',
+ '- /absolute/path/to/file2.md (3KB) — README describing the project',
+ '',
+ 'Recommendation: <which path the orchestrator should fetch next, if any>',
+ '```',
+ '',
+ 'If nothing matches, say so in one line.',
+ '',
+ 'You are bounded by max 15 turns. Use them wisely — list, scope, summarize, return.',
+ ].join('\n');
  const CRON_FIXER_PROMPT = [
  'You are the cron-fix specialist. You diagnose and apply fixes to broken cron jobs.',
  '',
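`DISCOVERY_PROMPT` pins the subagent's reply to a strict "Found / Paths / Recommendation" layout. Nothing in this diff shows the orchestrator parsing that reply (it may simply read it as text), but if a caller did want structured paths, a minimal parser could look like the sketch below. `parseDiscoveryOutput` and its return shape are hypothetical, not part of clementine-agent.

```javascript
// Hypothetical helper, not part of clementine-agent: turns the strict
// "Found / Paths / Recommendation" reply from the discovery subagent
// into a plain object.
function parseDiscoveryOutput(text) {
  const parsed = { count: null, paths: [], recommendation: null };
  let inPaths = false;

  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();

    const found = line.match(/^Found:\s*(\d+)/);
    if (found) {
      parsed.count = Number(found[1]);
      continue;
    }
    if (line === 'Paths:') {
      inPaths = true;
      continue;
    }
    const rec = line.match(/^Recommendation:\s*(.*)$/);
    if (rec) {
      inPaths = false;
      parsed.recommendation = rec[1] || null;
      continue;
    }

    // Entry shape: "- /abs/path (12KB, 340 rows)", then an em dash (U+2014)
    // and a description; the metadata and the description are both optional.
    if (inPaths && line.startsWith('- /')) {
      const m = line.match(/^- (\/.+?)(?:\s+\(([^)]*)\))?(?:\s+\u2014\s+(.*))?$/);
      if (m) {
        parsed.paths.push({ path: m[1], meta: m[2] ?? null, description: m[3] ?? null });
      }
    }
  }
  return parsed;
}
```

The lazy path match lets absolute paths contain spaces; the regex only stops at the optional parenthesised size block or the dash before the description.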
@@ -147,6 +192,31 @@ export function buildAgentMap(opts = {}) {
  effort: 'low',
  maxTurns: 15,
  };
+ // Discovery (1.18.197): file-system / project location. Owner's
+ // northstar: Clementine orchestrates, doesn't bulk-process. ANY
+ // local file-system traversal ("find the X project", "where is Y",
+ // "what's in ~/Downloads", "scan this directory") delegates here so
+ // the recursive find/Glob/Read outputs land in this subagent's
+ // 200K window instead of the orchestrator's chat session. Returns
+ // paths + brief summaries — never file contents.
+ map['discovery'] = {
+ description: [
+ 'Use this subagent for ANY local file-system or project discovery:',
+ '"find that X project", "locate the Y folder", "where is Z",',
+ '"scan ~/Downloads for W", "is there a file matching V", "list',
+ 'what is in directory U". The discovery subagent has its own',
+ 'fresh 200K context window and uses bounded `Bash` (ls, find,',
+ 'head, wc) — it returns absolute paths + brief descriptions but',
+ 'NEVER loads file contents into your main chat context. ALWAYS',
+ 'prefer this over running recursive Glob / `find -r` / Read on',
+ 'unknown-size files in your own turn — those are context bombs.',
+ ].join(' '),
+ prompt: DISCOVERY_PROMPT,
+ model: 'haiku',
+ tools: ['Bash', 'Read', 'Grep', 'Glob'],
+ effort: 'low',
+ maxTurns: 15,
+ };
  // Cron-fixer: sonnet, owns the broken-job diagnose+apply path.
  // Tools restricted to the canonical fix path (no parallel mechanisms).
  map['cron-fixer'] = {
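The comment above `DISCOVERY_PROMPT` calls the discovery subagent "Pure read-only", yet its allowlist includes `Bash`, which can run mutating shell commands. As far as this diff shows, the read-only guarantee therefore rests on the prompt rather than on the tool list. A check along these lines could at least pin the allowlist so a later edit does not slip in `Edit` or `Write`; `findWriteTools`, `isStrictlyReadOnly`, and the `WRITE_TOOLS` set are hypothetical.

```javascript
// Hypothetical guard, not part of clementine-agent. WRITE_TOOLS is an
// assumption taken from the "NOT included: Edit, Write" comment.
const WRITE_TOOLS = new Set(['Edit', 'Write']);

// Returns any write-capable tools present in an agent definition's allowlist.
function findWriteTools(agentDef) {
  return (agentDef.tools ?? []).filter((tool) => WRITE_TOOLS.has(tool));
}

// Bash can run any shell command (rm, mv, redirects), so an allowlist that
// contains it is read-only by prompt convention only, not by construction.
function isStrictlyReadOnly(agentDef) {
  const tools = agentDef.tools ?? [];
  return findWriteTools(agentDef).length === 0 && !tools.includes('Bash');
}

// Mirrors the fields of map['discovery'] in the hunk above (prompt omitted).
const discovery = {
  model: 'haiku',
  tools: ['Bash', 'Read', 'Grep', 'Glob'],
  effort: 'low',
  maxTurns: 15,
};

console.log(findWriteTools(discovery)); // []
console.log(isStrictlyReadOnly(discovery)); // false: Bash is allowed
```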
@@ -125,12 +125,23 @@ const BEHAVIORAL_POSTURE = `## How you operate
 
  **Verification posture for disputed claims.** If you see "Dispute mode" in the turn context, the owner is reporting that prior work FAILED. Past \`done\` claims in memory are NOT authoritative — your recall is biased. Before defending any past success, re-verify against reality: curl URLs, check file existence, run status commands. Saying "but my memory says it's live" without re-checking is a hallucination, not a defense.
 
- **Fan-out posture (1.18.194).** When the owner asks for 3+ similar operations (send N emails, pull N records, enrich N contacts, summarize N pages), dispatch subagents in PARALLEL via the Agent tool. One subagent per item. Don't loop in your own turn; that's slow, serializes I/O that should be concurrent, and burns context linearly. Available subagents (see Agent tool descriptions for the canonical list):
- - \`researcher\` (Haiku, parallel, read-only) — per-item investigation
- - \`planner\` (Opus, 1-turn, no tools) — decomposition before write/send batches
- - Hired agents (Ross, Nora, etc.) — cross-delegation when relevant
+ **Orchestrator posture (1.18.197).** You are the orchestrator, not the worker. Your job in chat is to UNDERSTAND what the owner wants, DELEGATE the heavy lifting to the right subagent, and ORCHESTRATE the final response. The main chat session is a small, focused context, not a workspace for bulk file reads or recursive directory traversal. Loading raw tool outputs into your own turn is the failure mode; delegating is the success mode.
 
- A 25-contact enrichment that fans out to 25 \`researcher\` calls finishes in ~30s. The same work done serially in your own turn takes 10+ minutes AND fills your context window with tool outputs. Default to fan-out for batch work.`;
+ **Tool-selection rubric.** Before running tools yourself, ask which bucket the request falls into:
+
+ 1. **Local discovery / file-system traversal** ("find the X project", "where is Y", "scan ~/Downloads", "what's in this folder", "is there a file matching Z") → dispatch \`discovery\` subagent via the Agent tool. It has its own 200K context and returns paths + summaries. Never run recursive \`Glob\`/\`find\`/\`Read\` on unknown-size files in your own turn — that's a context bomb.
+
+ 2. **Per-item batch work** (send N emails, pull N contacts, enrich N records, summarize N pages, "for each of these…") → dispatch \`researcher\` subagents in PARALLEL — one per item. A 25-item job that fans out finishes in ~30s. The same work done serially in your own turn takes 10+ minutes and fills your context with tool outputs.
+
+ 3. **Multi-step decomposition needed first** (Zach-style "find the project, build a report, deploy it, verify") → owner can opt into this via \`/plan\` which dispatches the \`planner\` subagent to decompose first, then chain workers per step. Don't auto-trigger plan mode yourself; respond directly and use subagents for the parts you can decompose.
+
+ 4. **Broken cron jobs** ("fix the X job", "what's failing", "re-run Y") → dispatch \`cron-fixer\` subagent — it owns the diagnose-and-apply flow with the right tools.
+
+ 5. **Cross-agent work** (work that belongs to Ross / Nora / Sasha / etc.) → dispatch the hired agent as a subagent so they execute with their own identity and tools.
+
+ 6. **Single, targeted action** (read this specific file, write this output, call this one MCP tool, send this one message) → do it yourself in your own turn. Direct tool use is correct when the scope is small and known.
+
+ **The northstar.** A request like "find that coach project locally and build a report" should look like: you dispatch \`discovery\` to find the project (returns paths), then you Read the specific README it returned (one targeted Read), then you dispatch a worker subagent or do the report-write yourself depending on scope. NOT: you run a recursive \`Glob\` then 20 Reads in your own turn.`;
  /**
  * Read the long-term memory block for an autonomous run (cron, team-task).
  * Returns the agent-specific MEMORY.md when a hired agent is active, the
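The rubric is prose for the model to apply; nothing in this diff routes requests in code. Purely to illustrate how the six buckets partition requests, a keyword router might look like the sketch below. `routeRequest`, its regexes, and the return shape are hypothetical, and keyword matching is a far cruder signal than the model's reading of intent.

```javascript
// Hypothetical illustration of the six buckets in the tool-selection rubric.
// Clementine leaves this choice to the model; these regexes are a crude
// stand-in and are checked in a fixed order purely for the example.
const HIRED_AGENTS = ['Ross', 'Nora', 'Sasha']; // names that appear in the prompt

function routeRequest(text) {
  const t = text.trim();

  // Bucket 3 is owner opt-in only, so it is keyed off the explicit /plan command.
  if (t.startsWith('/plan')) return { bucket: 3, target: 'planner' };

  // Bucket 1: locating things on the local file system.
  const locateVerb = /\b(find|locate|where is|scan|what's in|list what)\b/i;
  const fsNoun = /(project|folder|directory|file|~\/|\/)/i;
  if (locateVerb.test(t) && fsNoun.test(t)) return { bucket: 1, target: 'discovery' };

  // Bucket 2: N similar items, fanned out one researcher per item.
  if (/\bfor each\b|\b\d+\s+(emails|contacts|records|pages)\b/i.test(t)) {
    return { bucket: 2, target: 'researcher', parallel: true };
  }

  // Bucket 4: broken cron jobs.
  if (/\b(cron|job)\b/i.test(t) && /\b(fix|failing|re-run|broken)\b/i.test(t)) {
    return { bucket: 4, target: 'cron-fixer' };
  }

  // Bucket 5: work that belongs to a hired agent.
  const hired = HIRED_AGENTS.find((name) => new RegExp(`\\b${name}\\b`).test(t));
  if (hired) return { bucket: 5, target: hired };

  // Bucket 6: a single, small, known-scope action done directly.
  return { bucket: 6, target: 'self' };
}
```

Making the direct-action bucket the fallback mirrors the rubric's intent: the orchestrator does the work itself only when no delegation bucket applies.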
@@ -2606,20 +2606,68 @@ export class Gateway {
  // result with terminalReason), we surface a clean "rephrase or
  // /plan" message via the `case 'context_overflow':` handler
  // below. No more "Planning failed" half-finished chains.
- let runAgentResult;
- try {
- runAgentResult = await runAgent(finalPrompt, buildRunAgentChatOptions({
- ...(priorSdkSessionId ? { resumeSessionId: priorSdkSessionId } : {}),
+ // 1.18.196 — smart session reset on overflow.
+ //
+ // SDK session files on disk can grow to 80+ MB after months of
+ // chat. When we resume one of those, the SDK tries to load the
+ // entire accumulated JSONL into context — easily blows past
+ // 200K tokens on the very first user message after a daemon
+ // start. Zach hit this on a freshly-installed 1.18.195 because
+ // his topshelfbot session had months of history.
+ //
+ // The fix is the same thing any human would do: throw the
+ // resume away, start a fresh SDK session, send the same user
+ // message. The user loses chat memory of the prior session
+ // (it's gone anyway — we can't load it), but they get an
+ // actual response instead of a "rephrase smaller" message.
+ //
+ // This is ONE retry max. If the fresh session also overflows,
+ // that's a genuine ceiling — surface the recovery message.
+ // No planner queueing, no compression dance, no second retry.
+ //
+ // Distinct from the deleted-in-1.18.194 pipeline because:
+ // - No prompt compression — the prompt is fine
+ // - No planner background task — that was the failing piece
+ // - No second retry — one shot only
+ // - Triggered only when we WERE resuming (overflow on a
+ // virgin session is a real ceiling, not a stale-history
+ // problem)
+ let didFreshSessionRetry = false;
+ const runAgentOnce = async (resumeId) => {
+ return runAgent(finalPrompt, buildRunAgentChatOptions({
+ ...(resumeId ? { resumeSessionId: resumeId } : {}),
  ...(chatSystemAppend ? { systemPromptAppend: chatSystemAppend } : {}),
  }));
+ };
+ const maybeRetryFresh = async () => {
+ if (didFreshSessionRetry || !priorSdkSessionId)
+ return null;
+ didFreshSessionRetry = true;
+ logger.info({
+ sessionKey: effectiveSessionKey,
+ priorSdkSessionId,
+ }, 'Context overflow on resumed session — clearing session and retrying once with fresh SDK session');
+ this.assistant.clearSession(effectiveSessionKey);
+ if (onProgress) {
+ await onProgress('long conversation history — starting a fresh session...').catch(() => { });
+ }
+ return runAgentOnce(undefined);
+ };
+ let runAgentResult;
+ try {
+ runAgentResult = await runAgentOnce(priorSdkSessionId);
  }
  catch (err) {
  if (chatAc.signal.aborted || classifyChatError(err) !== 'context_overflow') {
  throw err;
  }
- // Re-throw so the outer catch's classifyChatError gets it
- // and routes to the 'context_overflow' case.
- throw err;
+ // First attempt overflowed. If we were resuming, try once
+ // with a fresh session. Otherwise (no resume to drop),
+ // surface the recovery message.
+ const retried = await maybeRetryFresh();
+ if (!retried)
+ throw err;
+ runAgentResult = retried;
  }
  if (!chatAc.signal.aborted && runAgentResultIndicatesContextOverflow(runAgentResult)) {
  logger.info({
@@ -2627,8 +2675,15 @@ export class Gateway {
  subtype: runAgentResult.subtype,
  terminalReason: runAgentResult.terminalReason,
  textPreview: runAgentResult.text?.slice(0, 240),
- }, 'Context overflow result — autocompact ceiling reached, surfacing recovery message');
- throw new Error('context_overflow_after_autocompact');
+ didFreshSessionRetry,
+ }, 'Context overflow result — attempting fresh-session retry or surfacing recovery');
+ const retried = await maybeRetryFresh();
+ if (retried && !runAgentResultIndicatesContextOverflow(retried)) {
+ runAgentResult = retried;
+ }
+ else {
+ throw new Error('context_overflow_after_autocompact');
+ }
  }
  if (ledgerRunMetadata) {
  ledgerRunMetadata.runId = runAgentResult.runId;
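Stripped of logging, progress callbacks, and the abort-signal checks, the 1.18.196 control flow reduces to: attempt with the resume id; on overflow, whether thrown or signalled in the result, retry exactly once without a resume id, and only if a resume id existed. The sketch below models that with injected stubs. `runWithFreshSessionFallback`, its option names, and the `{ overflow: true }` result shape used in testing are hypothetical stand-ins for `runAgentOnce`, `this.assistant.clearSession`, `classifyChatError`, and `runAgentResultIndicatesContextOverflow`.

```javascript
// Hypothetical model of the retry-once flow added in 1.18.196. Not the real
// Gateway code: run(resumeId) stands in for runAgentOnce, clearSession for
// this.assistant.clearSession, and the two predicates for classifyChatError
// and runAgentResultIndicatesContextOverflow. Abort handling is omitted.
async function runWithFreshSessionFallback({
  run,
  priorSessionId,
  clearSession,
  isOverflowError,
  isOverflowResult,
}) {
  let didRetry = false;

  // Shared by both overflow paths, so at most one fresh-session retry happens,
  // and only when there was a resumed session to throw away.
  const maybeRetryFresh = async () => {
    if (didRetry || !priorSessionId) return null;
    didRetry = true;
    clearSession();
    return run(undefined);
  };

  let result;
  try {
    result = await run(priorSessionId);
  } catch (err) {
    // Path 1: the overflow surfaces as a thrown error.
    if (!isOverflowError(err)) throw err;
    const retried = await maybeRetryFresh();
    if (!retried) throw err;
    result = retried;
  }

  // Path 2: the overflow surfaces as a flag on the result.
  if (isOverflowResult(result)) {
    const retried = await maybeRetryFresh();
    if (retried && !isOverflowResult(retried)) {
      result = retried;
    } else {
      throw new Error('context_overflow_after_autocompact');
    }
  }
  return { result, didRetry };
}
```

Routing both paths through one `maybeRetryFresh` closure is what enforces the "ONE retry max" rule from the comment: whichever path fires first sets the flag, and if the fresh session overflows again, the second path gets `null` back and surfaces the recovery error.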
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "clementine-agent",
- "version": "1.18.195",
+ "version": "1.18.197",
  "description": "Clementine — Personal AI Assistant (TypeScript)",
  "type": "module",
  "main": "dist/index.js",