@denizokcu/haze 0.1.0 → 0.2.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,23 @@
2
2
 
3
3
  ## Unreleased
4
4
 
5
+ ## 0.2.0 - 2026-06-07
6
+
7
+ - Improved coding-loop reliability with stronger continuation behavior after failed edits, failed validation, missing validation, tool-budget interruptions, and incomplete assistant responses.
8
+ - Added structured bash command classification for read-only, mutating, destructive, network, validation, and unknown commands, with cwd, duration, timeout, and classification metadata in bash results.
9
+ - Added validation-output parsing for common test, typecheck, lint, and build commands, including failed files, failed tests, diagnostics, summaries, and suggested next steps.
10
+ - Added shared structured tool result types and more specific file-edit failure reason codes so edit recovery can reread the affected file and retry with better guidance.
11
+ - Reworked the system prompt, subagent prompt, compaction prompt, and generated-skill guidance around autonomous expert developer workflows with concise final status reporting.
12
+ - Removed hard-coded `temperature: 0` from model calls so providers/models that reject temperature options can run without warning workarounds.
13
+ - Removed bash confirmation gates, including for destructive classifications; Haze now assumes expert users know what they asked for and relies on transparent tool output rather than permission prompts.
14
+ - Improved chat input editing with wrapped multi-line display, vertical cursor movement across wrapped lines, and better cursor mapping for compacted paste blocks.
15
+ - Added and updated tests for bash classification, bash execution behavior, validation parsing, edit recovery, system-prompt guidance, and skill generation.
16
+
17
+ ## 0.1.1 - 2026-06-07
18
+
19
+ - Bundled ripgrep with `@vscode/ripgrep` and updated the `grep` tool to use the package-provided binary path, removing the requirement for users to install `rg` separately or expose it on `PATH`.
20
+ - Updated release documentation and site copy for the 0.1.1 patch release.
21
+
5
22
  ## 0.1.0 - 2026-06-07
6
23
 
7
24
  - Added ripgrep-backed `grep` for fast workspace search with regex, glob, context-line, case-insensitive, and result-limit options.
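The 0.2.0 changelog entry on validation-output parsing can be illustrated with a small sketch. This is not Haze's parser: it handles only non-pretty `tsc` diagnostics, and apart from `summaryText` and `suggestedNextStep` (which appear in this diff), the field names and the function name are assumptions made for the example.

```typescript
// Hypothetical result shape. summaryText and suggestedNextStep appear in the
// diff; failedFiles and diagnostics are assumed names based on the changelog.
interface ValidationSummarySketch {
  failedFiles: string[];
  diagnostics: string[];
  summaryText: string;
  suggestedNextStep?: string;
}

// Matches non-pretty tsc lines such as:
//   src/a.ts(3,5): error TS2322: Type 'string' is not assignable to type 'number'.
const TSC_ERROR = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.*)$/;

function parseTypecheckOutput(stdout: string): ValidationSummarySketch {
  const failedFiles: string[] = [];
  const diagnostics: string[] = [];
  for (const line of stdout.split('\n')) {
    const match = TSC_ERROR.exec(line.trim());
    if (!match) continue; // ignore lines that are not diagnostics
    const file = match[1];
    const row = match[2];
    const code = match[4];
    const message = match[5];
    if (failedFiles.indexOf(file) === -1) failedFiles.push(file);
    diagnostics.push(`${file}:${row} ${code} ${message}`);
  }
  if (diagnostics.length === 0) {
    return { failedFiles, diagnostics, summaryText: 'no type errors found in output' };
  }
  return {
    failedFiles,
    diagnostics,
    summaryText: `${diagnostics.length} type error(s) in ${failedFiles.length} file(s)`,
    suggestedNextStep: `read ${failedFiles[0]} and fix the first diagnostic`,
  };
}
```

Keeping the summary and the next step as separate fields lets a caller show a one-line status while still handing the model a concrete follow-up action.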
package/README.md CHANGED
@@ -2,16 +2,17 @@
2
2
 
3
3
  A minimal LLM harness for your terminal.
4
4
 
5
- ## What's new in 0.1.0
5
+ ## What's new in 0.2.0
6
6
 
7
- Haze 0.1.0 is the foundation release: the agent can now *find*, *delegate*, and *show its work* without turning your terminal into soup.
7
+ Haze 0.2.0 is a reliability release for the everyday coding loop: inspect, edit, validate, and report what happened without getting timid or noisy.
8
8
 
9
- - `grep` gives the model fast, targeted codebase search with regex, globs, context lines, and `.gitignore` awareness; no more brute-force file spelunking.
10
- - Subagents let Haze fan out independent investigations into fresh contexts, then fold the result back into the main turn as a concise summary.
11
- - File edits now render compact, colorized inline diffs with one context line around the change; big diffs stay summarized so signal beats scrollback.
12
- - Long-turn handling is calmer: truncated model output and tool-heavy loops recover more gracefully.
9
+ - The agent loop is more persistent after failed edits, failed validation, missing validation, and tool-heavy turns. Haze now pushes toward a concrete final status instead of stopping at a vague recap.
10
+ - Bash execution now includes command classification, working directory, duration, timeout state, and parsed validation summaries for common test/typecheck/lint/build output.
11
+ - File-tool failures carry structured reason codes and recovery hints, making exact-edit failures easier for the model to repair with a fresh read and targeted retry.
12
+ - The system and subagent prompts now assume expert users: relevant commands should run directly, including mutating shell workflows, while blockers are reserved for concrete tool failures or real ambiguity.
13
+ - The chat input wraps across multiple visible lines and supports vertical cursor movement, which makes longer prompts and pasted context easier to edit.
13
14
 
14
- The result is a more capable agent loop while keeping the core small and inspectable. Haze gives an AI model transparent local tools — read, search, edit, write, list, and run commands — plus focused delegation when work can split safely. Tiny spell, sharper goblin.
15
+ The result is a sharper supervised coding loop while keeping the core small and inspectable. Haze gives an AI model transparent local tools — read, search, edit, write, list, and run commands — plus focused delegation when work can split safely. Tiny spell, steadier goblin.
15
16
 
16
17
  Haze works with OpenAI-compatible providers, including OpenRouter and local endpoints. Use `/provider` to choose or add one, then `/model` to select a model.
17
18
 
@@ -24,7 +25,7 @@ Haze works with OpenAI-compatible providers, including OpenRouter and local endp
24
25
  |_| |_|\__,_/___\___|
25
26
  ```
26
27
 
27
- Haze keeps guardrails light. The LLM can work from the terminal with freedoms close to yours, while trying to stay scoped to the current project. Watch the tool calls. Keep your hands near the wheel. Progress.
28
+ Haze keeps guardrails light. The LLM can work from the terminal with freedoms close to yours, while trying to stay scoped to the current project. It is aimed at developers who want an expert-oriented tool, not a permission dialog factory. Watch the tool calls. Keep your hands near the wheel. Progress.
28
29
 
29
30
  ## Getting started
30
31
 
@@ -77,7 +78,7 @@ Open a project and ask for work:
77
78
  create a calculator in calc-app in ruby with add subtract multiply divide
78
79
  ```
79
80
 
80
- Haze will inspect, search, write files, run commands, and show compact tool activity inline. Small file edits include a colorized line diff with one context line before and after the change; large diffs stay summarized so the transcript does not become a wall of noise. Sessions are saved by default so you can resume the latest workspace conversation with `haze --continue` or `/resume`.
81
+ Haze will inspect, search, write files, run commands, and show compact tool activity inline. Small file edits include a colorized line diff with one context line before and after the change; large diffs stay summarized so the transcript does not become a wall of noise. Bash validation output is summarized when possible so failures point at the relevant files, tests, or diagnostics. Sessions are saved by default so you can resume the latest workspace conversation with `haze --continue` or `/resume`.
81
82
 
82
83
  Use `/` to discover commands and skills. `Tab` completes the top suggestion.
83
84
 
@@ -194,10 +195,10 @@ Haze exposes a deliberately small toolset:
194
195
  - `editFile` — unique text replacements, with line-number-prefix tolerance for common model mistakes.
195
196
  - `replaceLines` — line-range edits when exact replacements are awkward; slightly-too-large EOF ranges are clamped.
196
197
  - `writeFile` — create files and parent directories.
197
- - `bash` — run tests, builds, git commands, and inspections.
198
+ - `bash` — run tests, builds, git commands, inspections, scripts, installs, and other shell workflows with command classification metadata.
198
199
  - `skill_*` — load Markdown skill instructions on demand.
199
200
 
200
- Tool calls are grouped in the transcript so you can see what happened without reading a novella. Successful targeted file edits show a compact diff with colored additions/removals and one context line around the change when the diff is small; larger diffs are summarized with a pointer to `git diff`. File-tool failures return structured recovery hints instead of mystery stack traces.
201
+ Tool calls are grouped in the transcript so you can see what happened without reading a novella. Successful targeted file edits show a compact diff with colored additions/removals and one context line around the change when the diff is small; larger diffs are summarized with a pointer to `git diff`. File-tool failures return structured reason codes and recovery hints instead of mystery stack traces. Bash validation commands can return parsed summaries with failed files, failed tests, diagnostics, and suggested next steps.
201
202
 
202
203
  ## Subagents
203
204
 
@@ -223,8 +224,8 @@ Use `AGENTS.md` for project conventions, commands, architecture notes, and thing
223
224
  - File tools are restricted to the current workspace.
224
225
  - File tools follow `.gitignore` by default.
225
226
  - Ignored files require an explicit override.
226
- - Bash mutations are discouraged by the tool contract.
227
- - Destructive actions should require explicit user confirmation.
227
+ - Bash commands are classified and shown with working-directory metadata, but Haze does not use command confirmation gates.
228
+ - Mutating and destructive commands can run when they are relevant to the user's request; this is intentional for expert users.
228
229
  - Haze is powerful enough to help and dumb enough to deserve supervision. Ideal software, basically.
229
230
 
230
231
  ## Local development
@@ -721,7 +721,7 @@ function ChatScreen({ debug = false, version, continueSession = false, noSession
721
721
  ];
722
722
  return _jsxs(Box, { flexDirection: "column", children: [_jsx(Static, { items: staticItems, children: item => item.kind === 'header'
723
723
  ? _jsx(Header, { subtitle: item.subtitle, version: version }, item.key)
724
- : _jsx(MessageView, { message: item.message, width: width }, item.key) }), activeLiveMessages.length > 0 && _jsx(Box, { flexDirection: "column", flexShrink: 0, children: activeLiveMessages.map((message, index) => _jsx(MessageView, { message: message, width: width }, messageKey(message, index))) }), debug && debugLogs.length > 0 && _jsxs(Box, { flexDirection: "column", flexShrink: 0, marginBottom: 1, borderStyle: "round", borderColor: theme.muted, paddingX: 1, children: [_jsx(Text, { color: theme.muted, bold: true, children: "Debug" }), debugLogs.map((line, index) => _jsxs(Text, { color: theme.muted, children: ["\u2022 ", line] }, index))] }), queuedFollowUps.length > 0 && _jsxs(Box, { flexDirection: "column", flexShrink: 0, marginBottom: 1, children: [_jsx(Text, { color: theme.muted, children: "Queued follow-ups:" }), queuedFollowUps.map((item, index) => _jsxs(Text, { color: theme.muted, dimColor: true, children: [" ", index + 1, ". ", item] }, `${index}-${item}`))] }), busy && _jsx(Box, { flexShrink: 0, marginBottom: 1, children: _jsxs(Text, { children: [_jsxs(Text, { color: theme.orange, bold: true, children: [_jsx(Spinner, { type: "dots" }), " ", busyLabel] }), _jsx(Text, { color: theme.muted, dimColor: true, children: " \u00B7 type to queue follow-up \u00B7 esc to interrupt" })] }) }), goalText && _jsx(Box, { flexShrink: 0, children: _jsxs(Text, { wrap: "truncate-end", children: [_jsx(Text, { color: theme.blue, bold: true, children: "Goal:" }), _jsxs(Text, { color: "white", children: [" ", goalRequest] }), goalStatusText ? 
_jsxs(Text, { color: theme.orange, children: [" \u00B7 ", goalStatusText] }) : null] }) }), _jsx(Box, { borderStyle: "round", borderColor: theme.deepPurple, paddingX: 1, flexShrink: 0, children: _jsx(Box, { flexGrow: 1, minWidth: 0, children: _jsx(TextInput, { placeholder: placeholder, disabled: busy && mode !== 'chat', mask: mode === 'providerAddKey', historyItems: inputHistory, recordHistory: mode === 'chat', suggestions: inputSuggestions, suggestionMode: mode === 'provider' || mode === 'providerAction' || mode === 'model' ? 'always' : 'slash', submitOnEmpty: mode === 'providerAddKey', onHistoryAdd: persistInputHistory, onCancel: cancelThinking, onEscape: () => {
724
+ : _jsx(MessageView, { message: item.message, width: width }, item.key) }), activeLiveMessages.length > 0 && _jsx(Box, { flexDirection: "column", flexShrink: 0, children: activeLiveMessages.map((message, index) => _jsx(MessageView, { message: message, width: width }, messageKey(message, index))) }), debug && debugLogs.length > 0 && _jsxs(Box, { flexDirection: "column", flexShrink: 0, marginBottom: 1, borderStyle: "round", borderColor: theme.muted, paddingX: 1, children: [_jsx(Text, { color: theme.muted, bold: true, children: "Debug" }), debugLogs.map((line, index) => _jsxs(Text, { color: theme.muted, children: ["\u2022 ", line] }, index))] }), queuedFollowUps.length > 0 && _jsxs(Box, { flexDirection: "column", flexShrink: 0, marginBottom: 1, children: [_jsx(Text, { color: theme.muted, children: "Queued follow-ups:" }), queuedFollowUps.map((item, index) => _jsxs(Text, { color: theme.muted, dimColor: true, children: [" ", index + 1, ". ", item] }, `${index}-${item}`))] }), busy && _jsx(Box, { flexShrink: 0, marginBottom: 1, children: _jsxs(Text, { children: [_jsxs(Text, { color: theme.orange, bold: true, children: [_jsx(Spinner, { type: "dots" }), " ", busyLabel] }), _jsx(Text, { color: theme.muted, dimColor: true, children: " \u00B7 type to queue follow-up \u00B7 esc to interrupt" })] }) }), goalText && _jsx(Box, { flexShrink: 0, children: _jsxs(Text, { wrap: "truncate-end", children: [_jsx(Text, { color: theme.blue, bold: true, children: "Goal:" }), _jsxs(Text, { color: "white", children: [" ", goalRequest] }), goalStatusText ? 
_jsxs(Text, { color: theme.orange, children: [" \u00B7 ", goalStatusText] }) : null] }) }), _jsx(Box, { borderStyle: "round", borderColor: theme.deepPurple, paddingX: 1, flexShrink: 0, children: _jsx(Box, { flexGrow: 1, minWidth: 0, children: _jsx(TextInput, { placeholder: placeholder, disabled: busy && mode !== 'chat', mask: mode === 'providerAddKey', historyItems: inputHistory, recordHistory: mode === 'chat', suggestions: inputSuggestions, suggestionMode: mode === 'provider' || mode === 'providerAction' || mode === 'model' ? 'always' : 'slash', submitOnEmpty: mode === 'providerAddKey', width: Math.max(20, width - 4), onHistoryAdd: persistInputHistory, onCancel: cancelThinking, onEscape: () => {
725
725
  if (busy)
726
726
  cancelThinking();
727
727
  else
@@ -50,8 +50,17 @@ export function toolResultSummary(event) {
50
50
  const count = output.totalMatches;
51
51
  return count === 0 ? 'no matches' : `${count} match${count === 1 ? '' : 'es'}`;
52
52
  }
53
- if (typeof output?.code === 'number')
54
- return `exited with code ${output.code}`;
53
+ if (typeof output?.validationSummary === 'object' && output.validationSummary != null && 'summaryText' in output.validationSummary) {
54
+ const summary = output.validationSummary;
55
+ const next = typeof summary.suggestedNextStep === 'string' ? `; next: ${summary.suggestedNextStep}` : '';
56
+ return `${String(summary.summaryText)}${next}`;
57
+ }
58
+ if (typeof output?.code === 'number') {
59
+ const risk = typeof output.classification?.riskLevel === 'string'
60
+ ? ` (${output.classification.riskLevel})`
61
+ : '';
62
+ return `exited with code ${output.code}${risk}`;
63
+ }
55
64
  if (typeof output?.status === 'string' && typeof output?.summary === 'string') {
56
65
  const summary = output.summary.split('\n')[0] ?? '';
57
66
  const preview = summary.length > 120 ? `${summary.slice(0, 120).trimEnd()}…` : summary;
@@ -69,7 +78,8 @@ export function toolResultSummary(event) {
69
78
  }
70
79
  return 'completed';
71
80
  }
72
- return typeof output.error === 'string' ? `failed: ${compact(output.error)}` : 'failed';
81
+ const reason = typeof output.reasonCode === 'string' ? ` (${output.reasonCode})` : '';
82
+ return typeof output.error === 'string' ? `failed${reason}: ${compact(output.error)}` : `failed${reason}`;
73
83
  }
74
84
  return 'completed';
75
85
  }
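The precedence introduced in the hunk above (a parsed validation summary wins over the plain exit-code line, which now carries the risk level) can be sketched in isolation. The interface and function names below are hypothetical, and the real `toolResultSummary` has further branches that this simplified version omits.

```typescript
// Hypothetical, trimmed-down shape of a bash tool result; only the fields
// read by the summary logic are included.
interface BashOutputSketch {
  code?: number;
  classification?: { riskLevel?: string };
  validationSummary?: { summaryText: string; suggestedNextStep?: string };
}

function summarizeBashOutput(output: BashOutputSketch): string {
  const summary = output.validationSummary;
  if (summary) {
    // A parsed validation summary takes priority over the exit code.
    const next = summary.suggestedNextStep ? `; next: ${summary.suggestedNextStep}` : '';
    return `${summary.summaryText}${next}`;
  }
  if (typeof output.code === 'number') {
    // Append the risk level in parentheses when a classification is present.
    const risk = output.classification?.riskLevel ? ` (${output.classification.riskLevel})` : '';
    return `exited with code ${output.code}${risk}`;
  }
  // Simplification: the real function checks other output shapes first.
  return 'completed';
}
```

For example, `summarizeBashOutput({ code: 0, classification: { riskLevel: 'read_only' } })` yields `exited with code 0 (read_only)`.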
@@ -82,7 +92,13 @@ export function toolOutputDetails(value) {
82
92
  const output = value;
83
93
  const stdout = output.stdout?.text?.trim();
84
94
  const stderr = output.stderr?.text?.trim();
95
+ const meta = [
96
+ output.cwd ? `cwd: ${output.cwd}` : '',
97
+ output.classification?.riskLevel ? `classification: ${output.classification.riskLevel}${output.classification.reason ? ` — ${output.classification.reason}` : ''}` : '',
98
+ output.validationSummary?.summaryText ? `validation: ${output.validationSummary.summaryText}${output.validationSummary.suggestedNextStep ? `\nnext: ${output.validationSummary.suggestedNextStep}` : ''}` : '',
99
+ ].filter(Boolean).join('\n');
85
100
  const parts = [
101
+ meta,
86
102
  stdout ? `stdout:\n${compact(stdout, 1200)}` : '',
87
103
  stderr ? `stderr:\n${compact(stderr, 1200)}` : '',
88
104
  ].filter(Boolean);
@@ -126,7 +126,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
126
126
  const likelyPlanImplementationRequest = isPlanImplementationRequest(value);
127
127
  const likelyActionRequest = isActionRequest(value);
128
128
  const likelyValidationRequest = isValidationRequest(value);
129
- const planImplementationGuidance = 'When implementing a plan file, first identify the concrete required checklist items and compare them with the current files. Do not edit source or tests when the required behavior is already present. Implement the smallest clearly required phase or required items, skip optional/design-question items unless explicitly requested, add tests rather than exploratory one-off scripts where possible, use file tools (not bash) for any file changes, run validation once after code/test edits, then update plan status with file tools if requested. Do not call unresolved optional scope a blocker.';
129
+ const planImplementationGuidance = 'Haze internal guidance for implementing plan files. The original user request remains authoritative. First identify the concrete required checklist items and compare them with the current files. Do not edit source or tests when the required behavior is already present. Implement the smallest clearly required phase or required items, skip optional/design-question items unless explicitly requested, add tests rather than exploratory one-off scripts where possible, prefer file tools for source changes, run validation once after code/test edits, then update plan status with file tools if requested. Do not call unresolved optional scope a blocker.';
130
130
  const requestMessages = retryingExistingRequest
131
131
  ? callbacks.getConversation()
132
132
  : likelyPlanImplementationRequest
@@ -151,6 +151,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
151
151
  let completionContinuationCount = 0;
152
152
  const maxCompletionContinuations = COMPLETION_CONTINUATION_LIMIT;
153
153
  let editRecoveryPath;
154
+ let editRecoveryReasonCode;
154
155
  let editRecoveryReadSatisfied = false;
155
156
  const toolSummaries = [];
156
157
  const visibleAssistantTexts = new Set();
@@ -294,6 +295,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
294
295
  if (!ok && ['editFile', 'replaceLines', 'writeFile'].includes(event.toolCall.toolName)) {
295
296
  editFileFailed = true;
296
297
  editRecoveryPath = path;
298
+ editRecoveryReasonCode = typeof event.output === 'object' && event.output != null && 'reasonCode' in event.output && typeof event.output.reasonCode === 'string' ? event.output.reasonCode : undefined;
297
299
  editRecoveryReadSatisfied = false;
298
300
  }
299
301
  if (ok && ['listFiles', 'readFile'].includes(event.toolCall.toolName))
@@ -305,6 +307,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
305
307
  mutatingToolSucceeded = true;
306
308
  if (!path || path === editRecoveryPath) {
307
309
  editRecoveryPath = undefined;
310
+ editRecoveryReasonCode = undefined;
308
311
  editRecoveryReadSatisfied = false;
309
312
  editFileFailed = false;
310
313
  }
@@ -329,7 +332,6 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
329
332
  ];
330
333
  const followUp = streamText({
331
334
  model: activeModel,
332
- temperature: 0,
333
335
  maxOutputTokens: DEFAULT_MAX_OUTPUT_TOKENS,
334
336
  system: buildSystemPrompt(contextFiles),
335
337
  messages: continuationMessages,
@@ -363,7 +365,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
363
365
  activeTools: ['readFile'],
364
366
  messages: [
365
367
  ...messages,
366
- { role: 'user', content: `A previous edit failed for ${editRecoveryPath}. Before any further edit or bash inspection, call readFile on exactly ${editRecoveryPath}. Bash/cat does not satisfy this recovery step.` },
368
+ { role: 'user', content: `A previous edit failed for ${editRecoveryPath}${editRecoveryReasonCode ? ` (${editRecoveryReasonCode})` : ''}. Before any further edit or bash inspection, call readFile on exactly ${editRecoveryPath}. Bash/cat does not satisfy this recovery step.` },
367
369
  ],
368
370
  };
369
371
  }
@@ -436,7 +438,6 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
436
438
  let lastFinishReason;
437
439
  const result = streamText({
438
440
  model: activeModel,
439
- temperature: 0,
440
441
  maxOutputTokens: DEFAULT_MAX_OUTPUT_TOKENS,
441
442
  system: buildSystemPrompt(contextFiles),
442
443
  messages: requestMessages,
@@ -467,7 +468,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
467
468
  activeTools: ['readFile'],
468
469
  messages: [
469
470
  ...messages,
470
- { role: 'user', content: `A previous edit failed for ${editRecoveryPath}. Before any further edit or bash inspection, call readFile on exactly ${editRecoveryPath}. Bash/cat does not satisfy this recovery step.` },
471
+ { role: 'user', content: `A previous edit failed for ${editRecoveryPath}${editRecoveryReasonCode ? ` (${editRecoveryReasonCode})` : ''}. Before any further edit or bash inspection, call readFile on exactly ${editRecoveryPath}. Bash/cat does not satisfy this recovery step.` },
471
472
  ],
472
473
  };
473
474
  }
@@ -597,6 +598,7 @@ export async function runAgentTurn(value, displayValue, contextFiles, callbacks,
597
598
  validationToolFailed,
598
599
  editFileFailed,
599
600
  editRecoveryPath,
601
+ editRecoveryReasonCode,
600
602
  });
601
603
  let decision = decideCompletion(combinedAssistantText);
602
604
  async function runCompletionLoop(seedConversation, seedText) {
@@ -18,7 +18,9 @@ export function compactModelMessages(messages, options = {}) {
18
18
  return text ? `- ${message.role}: ${text.slice(0, 500)}` : '';
19
19
  }).filter(Boolean).join('\n');
20
20
  const summary = [
21
- 'Compacted prior Haze conversation. Continue preserving the user goal, constraints, decisions, files touched, validation results, and unresolved next steps from this summary.',
21
+ 'Compacted prior Haze conversation. Treat this as continuity context, not a new user request.',
22
+ 'Preserve especially: current user goal and success condition; explicit user constraints/preferences/decisions; files created/changed/read; validation commands and pass/fail results; blockers or pending product decisions; exact next action if work was unfinished.',
23
+ 'Do not treat older tool outputs as current unless the recent conversation confirms they still apply.',
22
24
  options.instructions ? `User compaction instructions: ${options.instructions}` : undefined,
23
25
  '',
24
26
  'Older context summary:',
@@ -13,6 +13,7 @@ export interface CompletionPolicyInput {
13
13
  validationToolFailed: boolean;
14
14
  editFileFailed: boolean;
15
15
  editRecoveryPath?: string;
16
+ editRecoveryReasonCode?: string;
16
17
  }
17
18
  export interface CompletionDecision {
18
19
  needsActionContinuation: boolean;
@@ -25,4 +26,4 @@ export interface CompletionDecision {
25
26
  export declare function completionDecision(input: CompletionPolicyInput): CompletionDecision;
26
27
  export declare function toolLoopBudgetPrompt(): string;
27
28
  export declare function postContinuationPrompt(): string;
28
- export declare function noTextAfterToolPrompt(allowTools: boolean): "Continue the original request now. If it asks for a change, edit or write the necessary files. If it asks to run or verify tests, run the command. Do not provide only a retrospective summary unless blocked." | "Continue from the tool result and answer my original request. Do not call tools. Summarize only current-turn changes and validation; do not recap unrelated earlier tasks.";
29
+ export declare function noTextAfterToolPrompt(allowTools: boolean): "Continue the original request now. If it asks for a change, edit or write the necessary files. If it asks to run or verify tests, run the command. Do not provide only a retrospective summary unless blocked or needing a user decision." | "Continue from the tool result and answer my original request. Do not call tools. Use the final status template for implementation-like requests; summarize only current-turn changes and validation; do not recap unrelated earlier tasks.";
@@ -44,23 +44,30 @@ export function completionDecision(input) {
44
44
  && !requestCompletedByTools
45
45
  && !input.validationToolSucceeded
46
46
  && !assistantReportsBlocker;
47
+ const stateLines = [
48
+ `User goal: ${input.request}`,
49
+ input.editRecoveryPath ? `Edit recovery path: ${input.editRecoveryPath}` : undefined,
50
+ input.editRecoveryReasonCode ? `Edit failure reason: ${input.editRecoveryReasonCode}` : undefined,
51
+ input.mutatingToolSucceeded ? 'Files changed in this turn: yes' : 'Files changed in this turn: no',
52
+ input.validationToolSucceeded ? 'Validation status: passed' : input.validationToolFailed ? 'Validation status: failed' : 'Validation status: not run',
53
+ ].filter((line) => line !== undefined).join('\n');
47
54
  let continuationPrompt;
48
55
  if (input.editFileFailed) {
49
- continuationPrompt = 'Your editFile attempt failed. Use the latest readFile line-numbered output and replaceLines to complete the requested change. Continue with any remaining tests or validation if relevant. Do not stop with a summary.';
56
+ continuationPrompt = `State:\n${stateLines}\n\nRequired next action: call readFile on the exact edit recovery path first. Then use the latest line-numbered output with replaceLines, or a corrected editFile call, to complete the requested change. Continue with relevant validation if practical. Do not stop with a summary while tools are available.`;
50
57
  }
51
58
  else if (input.validationToolFailed && input.mutatingToolSucceeded) {
52
- continuationPrompt = 'Validation failed after files changed in this task. Inspect the failure output, fix failures that are plausibly caused by the current change, then rerun the relevant validation once. If the failure is clearly unrelated or environment-specific, summarize the blocker instead of expanding scope.';
59
+ continuationPrompt = `State:\n${stateLines}\n\nRequired next action: Validation failed after files changed in this task. Use the validation summary/output to inspect the first relevant failure, make one focused fix if it is plausibly caused by this change, then rerun the same relevant validation once. If it is an environment/dependency/unrelated failure, finish with Status: blocked or Status: partial and concrete evidence.`;
53
60
  }
54
61
  else if (needsValidationContinuation) {
55
62
  continuationPrompt = changedActionNeedsValidation
56
- ? 'Files changed for this request, but no validation has run yet. Continue by running the smallest relevant test/check command you can identify from the project. If no practical validation exists, state that concrete blocker briefly instead of claiming the goal is complete.'
57
- : 'You have not run the requested validation yet. Continue now by running the appropriate test/check command. Summarize only after the command finishes.';
63
+ ? `State:\n${stateLines}\n\nRequired next action: files changed for this request, but no validation has run. Run the smallest relevant test/typecheck/build command you can identify. If no practical validation exists, finish with the final status template and say why validation was not run.`
64
+ : `State:\n${stateLines}\n\nRequired next action: run the requested validation now. Summarize only after the command finishes.`;
58
65
  }
59
66
  else if (input.mutatingToolSucceeded && assistantAdmitsIncomplete) {
60
- continuationPrompt = 'Your previous response says the current request is incomplete. Continue now with the remaining edits and validation for this same request. Do not summarize a plan unless blocked.';
67
+ continuationPrompt = `State:\n${stateLines}\n\nRequired next action: your previous response described unfinished work. Continue with the remaining in-scope edits and validation for this same request. Do not summarize a plan unless concretely blocked.`;
61
68
  }
62
69
  else if (needsActionContinuation) {
63
- continuationPrompt = 'You inspected files but have not made the requested change yet. Continue now by editing or writing the necessary files. Do not summarize a plan unless blocked.';
70
+ continuationPrompt = `State:\n${stateLines}\n\nRequired next action: you inspected files but have not made the requested change yet. Edit or write the necessary files now. Do not summarize a plan unless concretely blocked.`;
64
71
  }
65
72
  return {
66
73
  needsActionContinuation,
@@ -72,13 +79,13 @@ export function completionDecision(input) {
72
79
  };
73
80
  }
74
81
  export function toolLoopBudgetPrompt() {
75
- return 'Tool slice reached for this model step. Do not output XML, JSON tool-call syntax, <tool_call> blocks, or function-call markup. If the current request is complete, summarize only current-turn changes and validation. If the requested change is incomplete, state the next concrete unfinished action briefly so Haze can continue autonomously in a fresh tool slice. Do not claim tools are unavailable, recap unrelated earlier tasks, or provide a generic remains list.';
82
+ return 'Tool slice reached for this model step. Do not output XML, JSON tool-call syntax, <tool_call> blocks, or function-call markup. If the current request is complete, answer with the final status template using only current-turn changes and validation evidence. If incomplete, state the single next concrete unfinished action so Haze can continue autonomously in a fresh tool slice. Do not claim tools are unavailable, recap unrelated earlier tasks, or provide a generic remains list.';
76
83
  }
77
84
  export function postContinuationPrompt() {
78
- return 'Your previous response still described unfinished work, missing validation, or a tool-budget issue. If any tools are still available, complete the remaining edit or run the final validation now. Only call something a blocker if a concrete tool failure prevents progress.';
85
+ return 'Your previous response still described unfinished work, missing validation, or a tool-budget issue. If tools are available, complete the remaining edit or run the final validation now. Only call something blocked for a concrete tool failure, missing dependency/permission, or unavoidable ambiguity.';
79
86
  }
80
87
  export function noTextAfterToolPrompt(allowTools) {
81
88
  return allowTools
82
- ? 'Continue the original request now. If it asks for a change, edit or write the necessary files. If it asks to run or verify tests, run the command. Do not provide only a retrospective summary unless blocked.'
83
- : 'Continue from the tool result and answer my original request. Do not call tools. Summarize only current-turn changes and validation; do not recap unrelated earlier tasks.';
89
+ ? 'Continue the original request now. If it asks for a change, edit or write the necessary files. If it asks to run or verify tests, run the command. Do not provide only a retrospective summary unless blocked or needing a user decision.'
90
+ : 'Continue from the tool result and answer my original request. Do not call tools. Use the final status template for implementation-like requests; summarize only current-turn changes and validation; do not recap unrelated earlier tasks.';
84
91
  }
@@ -0,0 +1,10 @@
1
+ export type BashRiskLevel = 'read_only' | 'mutating' | 'destructive' | 'network' | 'unknown';
2
+ export type BashTrait = 'reads_files' | 'writes_files' | 'deletes_files' | 'installs_dependencies' | 'runs_tests' | 'runs_build' | 'uses_network' | 'changes_git_state' | 'changes_permissions';
3
+ export type BashClassification = {
4
+ riskLevel: BashRiskLevel;
5
+ traits: BashTrait[];
6
+ confidence: 'high' | 'medium' | 'low';
7
+ reason: string;
8
+ };
9
+ export declare function classifyBashCommand(command: string): BashClassification;
10
+ export declare function isValidationClassification(classification: BashClassification): boolean;
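The declarations above can be exercised without the package itself. The following is a minimal sketch rather than the package's code: the types are copied from the declaration file, `isValidationClassification` mirrors the trait check in the accompanying implementation, and `summarizeClassification` is a hypothetical helper introduced only for illustration.

```typescript
// Types copied from the declaration file in this diff.
type BashRiskLevel = 'read_only' | 'mutating' | 'destructive' | 'network' | 'unknown';
type BashTrait =
  | 'reads_files' | 'writes_files' | 'deletes_files' | 'installs_dependencies'
  | 'runs_tests' | 'runs_build' | 'uses_network' | 'changes_git_state' | 'changes_permissions';
type BashClassification = {
  riskLevel: BashRiskLevel;
  traits: BashTrait[];
  confidence: 'high' | 'medium' | 'low';
  reason: string;
};

// Mirrors the implementation: validation is decided by traits, not by riskLevel.
function isValidationClassification(classification: BashClassification): boolean {
  return classification.traits.includes('runs_tests') || classification.traits.includes('runs_build');
}

// Hypothetical helper (not part of the package): a one-line label for tool output.
function summarizeClassification(c: BashClassification): string {
  const traits = c.traits.length > 0 ? c.traits.join(',') : 'none';
  const kind = isValidationClassification(c) ? 'validation' : c.riskLevel;
  return `${kind} [${traits}] (${c.confidence}): ${c.reason}`;
}

// Illustrative value shaped like the result of a test command.
const sample: BashClassification = {
  riskLevel: 'read_only',
  traits: ['runs_tests'],
  confidence: 'high',
  reason: 'validation command',
};
console.log(summarizeClassification(sample));
```

Because the risk level of a validation command is `read_only`, a consumer that wants to distinguish validation runs from plain inspection has to look at `traits`, which is what the sketch does.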
@@ -0,0 +1,51 @@
1
+ function has(command, pattern) {
2
+ return pattern.test(command);
3
+ }
4
+ function uniq(values) {
5
+ return [...new Set(values)];
6
+ }
7
+ export function classifyBashCommand(command) {
8
+ const trimmed = command.trim();
9
+ const traits = [];
10
+ const lower = trimmed.toLowerCase();
11
+ const complex = /[`$()]|\b(eval|xargs|sh\s+-c|bash\s+-c)\b/.test(trimmed);
12
+ if (!trimmed) {
13
+ return { riskLevel: 'unknown', traits: [], confidence: 'high', reason: 'empty command' };
14
+ }
15
+ if (has(lower, /(^|[;&|]\s*)(rm\b|rm\s+-|git\s+reset\s+--hard\b|git\s+clean\b|git\s+restore\s+\.|git\s+checkout\s+--\b)/) || has(lower, /push\b.*--force|drop\s+database|truncate\s+table/)) {
16
+ if (has(lower, /\brm\b|git\s+clean|git\s+restore|git\s+checkout\s+--|drop\s+database|truncate\s+table/))
17
+ traits.push('deletes_files');
18
+ if (has(lower, /\bgit\b/))
19
+ traits.push('changes_git_state');
20
+ return { riskLevel: 'destructive', traits: uniq(traits), confidence: complex ? 'medium' : 'high', reason: 'command can delete files or irreversibly change repository state' };
21
+ }
22
+ if (has(lower, /(^|[;&|]\s*)(curl\b|wget\b|scp\b|ssh\b|npm\s+(install|i|add)\b|pnpm\s+(install|add)\b|yarn\s+(add|install)\b|pip\s+install\b|brew\s+install\b)/)) {
23
+ traits.push('uses_network');
24
+ if (has(lower, /\b(npm|pnpm|yarn|pip|brew)\b/))
25
+ traits.push('installs_dependencies', 'writes_files');
26
+ return { riskLevel: has(lower, /\b(curl|wget|scp|ssh)\b/) && !has(lower, /\binstall|\badd\b/) ? 'network' : 'mutating', traits: uniq(traits), confidence: complex ? 'medium' : 'high', reason: 'command uses the network or installs dependencies' };
27
+ }
28
+ if (has(trimmed, /(^|\s)(>|>>)(\s|\S)/) || has(lower, /(^|[;&|]\s*)(sed\s+-i|perl\s+-pi|tee\b|chmod\b|mv\b|cp\b|mkdir\b|touch\b|git\s+(add|commit|merge|rebase|checkout|restore)\b)/) || has(trimmed, /\b(File\.write|writeFileSync|writeFile|appendFileSync|appendFile)\b/)) {
29
+ traits.push('writes_files');
30
+ if (has(lower, /\bchmod\b/))
31
+ traits.push('changes_permissions');
32
+ if (has(lower, /\bgit\b/))
33
+ traits.push('changes_git_state');
34
+ return { riskLevel: 'mutating', traits: uniq(traits), confidence: complex ? 'medium' : 'high', reason: 'command can modify files or repository state' };
35
+ }
36
+ if (has(lower, /(^|[;&|]\s*)(npm\s+test|npm\s+run\s+(test|typecheck|lint|build)|pnpm\s+(test|run\s+(test|typecheck|lint|build))|yarn\s+(test|run\s+(test|typecheck|lint|build))|vitest\b|jest\b|tsc\b|eslint\b)/)) {
37
+ if (has(lower, /test|vitest|jest/))
38
+ traits.push('runs_tests');
39
+ if (has(lower, /build|tsc|typecheck|lint|eslint/))
40
+ traits.push('runs_build');
41
+ return { riskLevel: 'read_only', traits: uniq(traits), confidence: complex ? 'medium' : 'high', reason: 'validation command' };
42
+ }
43
+ if (has(lower, /(^|[;&|]\s*)(git\s+(status|diff|log|show|branch)\b|rg\b|grep\b|find\b|ls\b|pwd\b|cat\b|head\b|tail\b|node\s+--version|npm\s+--version|which\b)/)) {
44
+ traits.push('reads_files');
45
+ return { riskLevel: complex ? 'unknown' : 'read_only', traits: uniq(traits), confidence: complex ? 'low' : 'high', reason: complex ? 'read-like command with complex shell syntax' : 'read-only inspection command' };
46
+ }
47
+ return { riskLevel: 'unknown', traits: [], confidence: 'low', reason: 'command did not match known safe patterns' };
48
+ }
49
+ export function isValidationClassification(classification) {
50
+ return classification.traits.includes('runs_tests') || classification.traits.includes('runs_build');
51
+ }
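The validation branch above assigns traits by substring match, so a lint or typecheck command receives `runs_build` and a command chain can receive both traits. A minimal sketch that reproduces only that branch, assuming the two patterns are copied correctly from this diff; `classifyValidationOnly` is a hypothetical name, and it returns `null` where the real function would already have matched an earlier branch or would fall through to the read-only and unknown branches.

```typescript
type ValidationTrait = 'runs_tests' | 'runs_build';

// Patterns copied from classifyBashCommand in this diff.
const VALIDATION_PATTERN = /(^|[;&|]\s*)(npm\s+test|npm\s+run\s+(test|typecheck|lint|build)|pnpm\s+(test|run\s+(test|typecheck|lint|build))|yarn\s+(test|run\s+(test|typecheck|lint|build))|vitest\b|jest\b|tsc\b|eslint\b)/;
const COMPLEX_PATTERN = /[`$()]|\b(eval|xargs|sh\s+-c|bash\s+-c)\b/;

// Hypothetical reduced classifier: the destructive, network, and mutating
// branches that run first in the real function are omitted here.
function classifyValidationOnly(
  command: string,
): { traits: ValidationTrait[]; confidence: 'high' | 'medium' } | null {
  const trimmed = command.trim();
  const lower = trimmed.toLowerCase();
  if (!VALIDATION_PATTERN.test(lower)) return null;
  const traits: ValidationTrait[] = [];
  if (/test|vitest|jest/.test(lower)) traits.push('runs_tests');
  if (/build|tsc|typecheck|lint|eslint/.test(lower)) traits.push('runs_build');
  // Shell substitution or eval-like syntax lowers confidence but keeps the traits.
  return { traits, confidence: COMPLEX_PATTERN.test(trimmed) ? 'medium' : 'high' };
}

console.log(classifyValidationOnly('npm run lint'));
console.log(classifyValidationOnly('npm test && npm run build'));
console.log(classifyValidationOnly('ls -la'));
```

The command must start with a known runner, or follow `;`, `&`, or `|`, for the pattern to apply; a runner invoked through a wrapper such as `npx` would not match this branch.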
@@ -28,6 +28,6 @@ export declare function createSubagentTool(options: {
28
28
  contextFiles: ContextFile[];
29
29
  }): import("ai").Tool<{
30
30
  task: string;
31
- tools?: ("editFile" | "replaceLines" | "writeFile" | "listFiles" | "readFile" | "grep" | "bash")[] | undefined;
31
+ tools?: ("listFiles" | "editFile" | "replaceLines" | "writeFile" | "readFile" | "grep" | "bash")[] | undefined;
32
32
  maxSteps?: number | undefined;
33
33
  }, SubagentResult>;
@@ -2,15 +2,17 @@ import { streamText, stepCountIs, tool } from 'ai';
2
2
  import { z } from 'zod';
3
3
  import { buildSystemPrompt } from '../../llm/systemPrompt.js';
4
4
  import { hazeTools } from '../../llm/hazeTools.js';
5
- const SUBAGENT_SYSTEM_PROMPT = `You are a focused subagent. Complete the assigned task using all tools available to you, then return a clear summary.
5
+ const SUBAGENT_SYSTEM_PROMPT = `You are a focused subagent for a professional developer workflow. Complete the assigned task autonomously using the available tools, then return a clear summary.
6
6
 
7
7
  Rules:
8
- - Use whatever tools you need. You have full access to file tools and bash.
9
- - If the task requires creating or modifying files, do it directly do not ask for permission.
10
- - If a tool fails, read the file again and retry with the correct content.
11
- - After completing the task, summarize what you did, what files you created or changed, and any important findings.
8
+ - Use whatever tools you need within the assigned scope. You have full access to file tools and bash.
9
+ - If the task requires creating or modifying files, do it directly with file tools or an efficient non-destructive shell command; do not ask for permission for ordinary edits.
10
+ - Assume expert users understand what they asked for. Do not ask for command confirmations; run relevant commands directly.
11
+ - Destructive and ordinary professional workflows should proceed without extra ceremony when they are relevant to the assigned task.
12
+ - If a file edit tool fails, read the exact file again and retry once with current content or line numbers.
13
+ - After completing the task, summarize what you did, what files you created or changed, validation run, and important findings.
12
14
  - If you cannot complete the task, explain exactly what blocked you and what you tried.
13
- - Your summary is all the parent agent will see. Be specific: include file paths, function names, and concrete results.`;
15
+ - Your summary is all the parent agent will see. Be specific: include file paths, function names, command results, and concrete next steps.`;
14
16
  const ALL_TOOLS = ['listFiles', 'readFile', 'grep', 'bash', 'editFile', 'replaceLines', 'writeFile'];
15
17
  const STEP_LIMIT = 25;
16
18
  const MAX_SUMMARY = 4000;
@@ -56,7 +58,6 @@ export async function runSubagent(task, options) {
56
58
  try {
57
59
  const result = streamText({
58
60
  model: options.model,
59
- temperature: 0,
60
61
  maxOutputTokens: 4096,
61
62
  system: `${SUBAGENT_SYSTEM_PROMPT}\n\n${buildSystemPrompt(options.contextFiles)}`,
62
63
  messages: [{ role: 'user', content: task }],
@@ -71,7 +72,7 @@ export async function runSubagent(task, options) {
71
72
  return {
72
73
  toolChoice: 'none',
73
74
  messages: [
74
- { role: 'user', content: 'You have done enough tool work. Summarize what you found or did right now.' },
75
+ { role: 'user', content: 'Tool budget reached for this subtask. Summarize what you found or changed, validation evidence, and the exact remaining action if incomplete. Do not claim tools are unavailable.' },
75
76
  ],
76
77
  };
77
78
  }
@@ -0,0 +1,12 @@
1
+ import type { BashClassification } from '../safety/bashClassifier.js';
2
+ import type { ValidationSummary } from '../../llm/toolResultTypes.js';
3
+ export declare function parseValidationOutput(input: {
4
+ command: string;
5
+ code: number | null;
6
+ stdout: string;
7
+ stderr: string;
8
+ timedOut?: boolean;
9
+ stdoutTruncated?: boolean;
10
+ stderrTruncated?: boolean;
11
+ classification?: BashClassification;
12
+ }): ValidationSummary;
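The `code` and `timedOut` fields declared above are what the implementation uses to derive the summary status. A minimal sketch of that mapping, assuming the conditional in `parseValidationOutput` is read correctly; `deriveStatus` and the `ValidationStatus` alias are hypothetical names, since the real `ValidationSummary` type is not shown here.

```typescript
// Hypothetical alias for the four status strings used by the implementation.
type ValidationStatus = 'passed' | 'failed' | 'timed_out' | 'unknown';

// Hypothetical helper; the order of checks mirrors parseValidationOutput.
function deriveStatus(input: { code: number | null; timedOut?: boolean }): ValidationStatus {
  if (input.timedOut) return 'timed_out';   // a timeout wins even when an exit code is present
  if (input.code === 0) return 'passed';
  if (input.code == null) return 'unknown'; // no exit code was reported
  return 'failed';
}

console.log(deriveStatus({ code: 0 }));
console.log(deriveStatus({ code: 1, timedOut: true }));
console.log(deriveStatus({ code: null }));
```

Only the `failed` status leads to a suggested next step in the implementation; `timed_out` and `unknown` leave it undefined.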
@@ -0,0 +1,79 @@
1
+ function uniq(values) {
2
+ return [...new Set(values.filter(Boolean))];
3
+ }
4
+ function inferKind(command, classification) {
5
+ const lower = command.toLowerCase();
6
+ if (/typecheck|\btsc\b/.test(lower))
7
+ return 'typecheck';
8
+ if (/\beslint\b|\blint\b/.test(lower))
9
+ return 'lint';
10
+ if (/\bbuild\b/.test(lower) || classification?.traits.includes('runs_build'))
11
+ return 'build';
12
+ if (/\b(test|vitest|jest)\b/.test(lower) || classification?.traits.includes('runs_tests'))
13
+ return 'test';
14
+ return 'generic';
15
+ }
16
+ export function parseValidationOutput(input) {
17
+ const text = `${input.stdout}\n${input.stderr}`;
18
+ const lines = text.split(/\r?\n/);
19
+ const diagnostics = [];
20
+ const failedTests = [];
21
+ const failedFiles = [];
22
+ const kind = inferKind(input.command, input.classification);
23
+ const status = input.timedOut ? 'timed_out' : input.code === 0 ? 'passed' : input.code == null ? 'unknown' : 'failed';
24
+ for (const line of lines) {
25
+ const ts = line.match(/^(.+?\.(?:ts|tsx|js|jsx|mts|cts))\((\d+),(\d+)\):\s+(error|warning)\s+TS\d+:\s+(.+)$/);
26
+ if (ts) {
27
+ const [, file, lineNo, column, severity, message] = ts;
28
+ diagnostics.push({ file, line: Number(lineNo), column: Number(column), severity: severity === 'warning' ? 'warning' : 'error', message: message ?? '' });
29
+ failedFiles.push(file ?? '');
30
+ continue;
31
+ }
32
+ const eslint = line.match(/^(.+?\.(?:ts|tsx|js|jsx|mts|cts))\s*$/);
33
+ if (eslint) {
34
+ const currentFile = eslint[1] ?? '';
35
+ const next = lines[lines.indexOf(line) + 1];
36
+ if (next && /^\s*\d+:\d+\s+/.test(next))
37
+ failedFiles.push(currentFile);
38
+ }
39
+ const eslintDiag = line.match(/^\s*(\d+):(\d+)\s+(error|warning)\s+(.+?)(?:\s{2,}\S+)?$/);
40
+ if (eslintDiag) {
41
+ const [, lineNo, column, severity, message] = eslintDiag;
42
+ diagnostics.push({ line: Number(lineNo), column: Number(column), severity: severity === 'warning' ? 'warning' : 'error', message: message ?? '' });
43
+ continue;
44
+ }
45
+ const vitestFile = line.match(/^\s*(?:FAIL|FAILED|✓|✗|❯)?\s*([^\s]+\.(?:test|spec)\.(?:ts|tsx|js|jsx))/i);
46
+ if (vitestFile)
47
+ failedFiles.push(vitestFile[1] ?? '');
48
+ const testName = line.match(/^\s*(?:FAIL|✗|×|●|-)\s+(.+)$/);
49
+ if (testName && !/^(FAIL|FAILED)\s+\S+\.(?:test|spec)\./i.test(line.trim()))
50
+ failedTests.push((testName[1] ?? '').trim());
51
+ const genericFile = line.match(/([^\s()]+\.(?:ts|tsx|js|jsx|mts|cts)):(\d+):(\d+)/);
52
+ if (genericFile) {
53
+ const [, file, lineNo, column] = genericFile;
54
+ failedFiles.push(file ?? '');
55
+ diagnostics.push({ file, line: Number(lineNo), column: Number(column), severity: /warn/i.test(line) ? 'warning' : 'error', message: line.trim() });
56
+ }
57
+ }
58
+ const uniqueFiles = uniq(failedFiles).slice(0, 10);
59
+ const uniqueTests = uniq(failedTests).slice(0, 10);
60
+ const diagCount = diagnostics.length;
61
+ const rawOutputTruncated = Boolean(input.stdoutTruncated || input.stderrTruncated);
62
+ let summaryText;
63
+ if (status === 'passed')
64
+ summaryText = `${kind} passed`;
65
+ else if (status === 'timed_out')
66
+ summaryText = `${kind} timed out`;
67
+ else if (uniqueTests.length > 0)
68
+ summaryText = `${kind} failed: ${uniqueTests.length} failed test${uniqueTests.length === 1 ? '' : 's'}${uniqueFiles.length ? ` in ${uniqueFiles.join(', ')}` : ''}`;
69
+ else if (diagCount > 0)
70
+ summaryText = `${kind} failed: ${diagCount} diagnostic${diagCount === 1 ? '' : 's'}${uniqueFiles.length ? ` in ${uniqueFiles.join(', ')}` : ''}`;
71
+ else
72
+ summaryText = `${kind} ${status}`;
73
+ const suggestedNextStep = status === 'failed'
74
+ ? uniqueFiles.length > 0
75
+ ? `Inspect ${uniqueFiles.slice(0, 3).join(', ')} and fix the first relevant failure.`
76
+ : 'Inspect the command output and fix the first relevant failure.'
77
+ : undefined;
78
+ return { kind, status, failedFiles: uniqueFiles, failedTests: uniqueTests, diagnostics: diagnostics.slice(0, 20), summaryText, suggestedNextStep, rawOutputTruncated };
79
+ }
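The first pattern in the loop above targets compiler-style lines of the form `file(line,column): severity TSxxxx: message`. A minimal sketch of that single step, assuming the pattern is copied correctly from this diff; `parseTscLine` and the `Diagnostic` type are hypothetical, and the sample line is illustrative rather than captured from a real run.

```typescript
// Hypothetical shape for one parsed diagnostic.
type Diagnostic = {
  file: string;
  line: number;
  column: number;
  severity: 'error' | 'warning';
  message: string;
};

// Pattern copied from parseValidationOutput in this diff.
const TSC_LINE = /^(.+?\.(?:ts|tsx|js|jsx|mts|cts))\((\d+),(\d+)\):\s+(error|warning)\s+TS\d+:\s+(.+)$/;

// Hypothetical helper: parse one line, or return null when it does not match.
function parseTscLine(line: string): Diagnostic | null {
  const match = line.match(TSC_LINE);
  if (!match) return null;
  const [, file, lineNo, column, severity, message] = match;
  return {
    file: file ?? '',
    line: Number(lineNo),
    column: Number(column),
    severity: severity === 'warning' ? 'warning' : 'error',
    message: message ?? '',
  };
}

console.log(parseTscLine("src/app.ts(12,5): error TS2322: Type 'string' is not assignable to type 'number'."));
```

Lines that do not match this pattern fall through to the ESLint, test-runner, and generic `file:line:column` patterns in the original loop.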
@@ -1,9 +1,4 @@
1
- type ToolDiffLine = {
2
- type: 'add' | 'remove' | 'context';
3
- oldLine?: number;
4
- newLine?: number;
5
- text: string;
6
- };
1
+ import type { ToolDiffLine, ToolFailureReasonCode } from './toolResultTypes.js';
7
2
  export declare const hazeTools: {
8
3
  listFiles: import("ai").Tool<{
9
4
  path: string;
@@ -16,8 +11,11 @@ export declare const hazeTools: {
16
11
  toolName: string;
17
12
  path: string | undefined;
18
13
  error: string;
14
+ reasonCode: ToolFailureReasonCode | undefined;
19
15
  recoverable: boolean;
20
16
  suggestedNextStep: string;
17
+ recoveryTool: string | undefined;
18
+ recoveryInput: unknown;
21
19
  } | {
22
20
  ok: true;
23
21
  duplicateSkipped: true;
@@ -47,8 +45,11 @@ export declare const hazeTools: {
47
45
  toolName: string;
48
46
  path: string | undefined;
49
47
  error: string;
48
+ reasonCode: ToolFailureReasonCode | undefined;
50
49
  recoverable: boolean;
51
50
  suggestedNextStep: string;
51
+ recoveryTool: string | undefined;
52
+ recoveryInput: unknown;
52
53
  } | {
53
54
  ok: true;
54
55
  duplicateSkipped: true;
@@ -85,8 +86,11 @@ export declare const hazeTools: {
85
86
  toolName: string;
86
87
  path: string | undefined;
87
88
  error: string;
89
+ reasonCode: ToolFailureReasonCode | undefined;
88
90
  recoverable: boolean;
89
91
  suggestedNextStep: string;
92
+ recoveryTool: string | undefined;
93
+ recoveryInput: unknown;
90
94
  } | {
91
95
  ok: true;
92
96
  duplicateSkipped: true;
@@ -117,8 +121,11 @@ export declare const hazeTools: {
117
121
  toolName: string;
118
122
  path: string | undefined;
119
123
  error: string;
124
+ reasonCode: ToolFailureReasonCode | undefined;
120
125
  recoverable: boolean;
121
126
  suggestedNextStep: string;
127
+ recoveryTool: string | undefined;
128
+ recoveryInput: unknown;
122
129
  } | {
123
130
  ok: true;
124
131
  duplicateSkipped: true;
@@ -148,8 +155,11 @@ export declare const hazeTools: {
148
155
  toolName: string;
149
156
  path: string | undefined;
150
157
  error: string;
158
+ reasonCode: ToolFailureReasonCode | undefined;
151
159
  recoverable: boolean;
152
160
  suggestedNextStep: string;
161
+ recoveryTool: string | undefined;
162
+ recoveryInput: unknown;
153
163
  } | {
154
164
  ok: true;
155
165
  duplicateSkipped: true;
@@ -173,8 +183,11 @@ export declare const hazeTools: {
173
183
  toolName: string;
174
184
  path: string | undefined;
175
185
  error: string;
186
+ reasonCode: ToolFailureReasonCode | undefined;
176
187
  recoverable: boolean;
177
188
  suggestedNextStep: string;
189
+ recoveryTool: string | undefined;
190
+ recoveryInput: unknown;
178
191
  } | {
179
192
  ok: true;
180
193
  duplicateSkipped: true;
@@ -197,4 +210,3 @@ export declare const hazeTools: {
197
210
  }, unknown>;
198
211
  };
199
212
  export type HazeTools = typeof hazeTools;
200
- export {};
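The failure variants above gain `reasonCode`, `recoveryTool`, and `recoveryInput`, which the changelog ties to rereading the affected file before retrying an edit. A minimal sketch of how a caller might act on those fields; `planRecovery` and `RecoveryPlan` are hypothetical, `ok: false` is assumed as the counterpart of the `ok: true` variant, and `ToolFailureReasonCode` is treated as a plain string because its definition is not shown here.

```typescript
// Failure shape assembled from the fields visible in this diff.
// Assumptions: `ok: false`, and reasonCode typed as string.
type ToolFailure = {
  ok: false;
  toolName: string;
  path: string | undefined;
  error: string;
  reasonCode: string | undefined;
  recoverable: boolean;
  suggestedNextStep: string;
  recoveryTool: string | undefined;
  recoveryInput: unknown;
};

// Hypothetical result type for the sketch.
type RecoveryPlan = { tool: string; input: unknown } | { tool: null; note: string };

// Hypothetical helper: choose the follow-up call from a failed tool result.
function planRecovery(failure: ToolFailure): RecoveryPlan {
  if (failure.recoverable && failure.recoveryTool !== undefined) {
    return { tool: failure.recoveryTool, input: failure.recoveryInput };
  }
  // Nothing machine-actionable: fall back to the textual guidance.
  return { tool: null, note: failure.suggestedNextStep };
}

// Illustrative failure; the reason code is a placeholder, not a real code from the package.
const failure: ToolFailure = {
  ok: false,
  toolName: 'editFile',
  path: 'src/app.ts',
  error: 'old text not found',
  reasonCode: 'example_reason',
  recoverable: true,
  suggestedNextStep: 'Read the file again and retry.',
  recoveryTool: 'readFile',
  recoveryInput: { path: 'src/app.ts' },
};
console.log(planRecovery(failure));
```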