@openharness/core 0.2.0 → 0.3.1

Files changed (38)
  1. package/dist/__tests__/to-response.test.d.ts +2 -0
  2. package/dist/__tests__/to-response.test.d.ts.map +1 -0
  3. package/dist/__tests__/to-response.test.js +67 -0
  4. package/dist/__tests__/to-response.test.js.map +1 -0
  5. package/dist/__tests__/ui-stream.test.d.ts +2 -0
  6. package/dist/__tests__/ui-stream.test.d.ts.map +1 -0
  7. package/dist/__tests__/ui-stream.test.js +436 -0
  8. package/dist/__tests__/ui-stream.test.js.map +1 -0
  9. package/dist/agent.d.ts +17 -2
  10. package/dist/agent.d.ts.map +1 -1
  11. package/dist/agent.js +23 -6
  12. package/dist/agent.js.map +1 -1
  13. package/dist/index.d.ts +3 -0
  14. package/dist/index.d.ts.map +1 -1
  15. package/dist/index.js +8 -0
  16. package/dist/index.js.map +1 -1
  17. package/dist/session.d.ts +15 -0
  18. package/dist/session.d.ts.map +1 -1
  19. package/dist/session.js +21 -1
  20. package/dist/session.js.map +1 -1
  21. package/dist/types/__tests__/stream-parts.test.d.ts +2 -0
  22. package/dist/types/__tests__/stream-parts.test.d.ts.map +1 -0
  23. package/dist/types/__tests__/stream-parts.test.js +232 -0
  24. package/dist/types/__tests__/stream-parts.test.js.map +1 -0
  25. package/dist/types/stream-parts.d.ts +68 -0
  26. package/dist/types/stream-parts.d.ts.map +1 -0
  27. package/dist/types/stream-parts.js +53 -0
  28. package/dist/types/stream-parts.js.map +1 -0
  29. package/dist/types/ui-message.d.ts +44 -0
  30. package/dist/types/ui-message.d.ts.map +1 -0
  31. package/dist/types/ui-message.js +2 -0
  32. package/dist/types/ui-message.js.map +1 -0
  33. package/dist/ui-stream.d.ts +15 -0
  34. package/dist/ui-stream.d.ts.map +1 -0
  35. package/dist/ui-stream.js +255 -0
  36. package/dist/ui-stream.js.map +1 -0
  37. package/package.json +8 -11
  38. package/README.md +0 -451
package/README.md DELETED
@@ -1,451 +0,0 @@
- # OpenHarness
-
- Claude Code, Codex, OpenCode et al. are amazing general purpose agent harnesses that go far beyond just software development.
-
- And while Anthropic offers the Claude Agent SDK, OpenAI now offers the Codex App Server, and OpenCode has a client to connect to an OpenCode instance, these harnesses are very "heavy" to use programmatically.
-
- OpenHarness is an open source project based on Vercel's AI SDK that aims to provide the building blocks to build very capable, general-purpose agents in code. It is inspired by all of the aforementioned coding agents.
-
- ## Agents
-
- The `Agent` class is the core primitive. An agent wraps a language model, a set of tools, and a multi-step execution loop into a stateless executor that you can `run()` with a message history and new input.
-
- ```typescript
- import { Agent } from "@openharness/core";
- import { openai } from "@ai-sdk/openai";
- import { fsTools } from "@openharness/core/tools/fs";
- import { bash } from "@openharness/core/tools/bash";
-
- const agent = new Agent({
-   name: "dev",
-   model: openai("gpt-5.2"),
-   systemPrompt: "You are a helpful coding assistant.",
-   tools: { ...fsTools, bash },
-   maxSteps: 20,
- });
- ```
-
- ### Running an agent
-
- `agent.run()` is an async generator that takes a message history and new input, and yields a stream of typed events as the agent works. The agent is **stateless** — it doesn't accumulate messages internally. You pass the conversation history in and get the updated history back in the `done` event.
-
- ```typescript
- import type { ModelMessage } from "ai";
-
- let messages: ModelMessage[] = [];
-
- for await (const event of agent.run(messages, "Refactor the auth module to use JWTs")) {
-   switch (event.type) {
-     case "text.delta":
-       process.stdout.write(event.text);
-       break;
-     case "tool.start":
-       console.log(`Calling ${event.toolName}...`);
-       break;
-     case "tool.done":
-       console.log(`${event.toolName} finished`);
-       break;
-     case "done":
-       messages = event.messages; // capture updated history for next turn
-       console.log(`Result: ${event.result}, tokens: ${event.totalUsage.totalTokens}`);
-       break;
-   }
- }
- ```
-
- This makes it easy to build multi-turn interactions — just pass the messages from the previous `done` event into the next `run()` call. It also means you have full control over the conversation history: you can inspect it, modify it, or share it between agents.
-
- ### Events
-
- The full set of events emitted by `run()`:
-
- | Event | Description |
- | --- | --- |
- | `text.delta` | Streamed text chunk from the model |
- | `text.done` | Full text for the current step is complete |
- | `reasoning.delta` | Streamed reasoning/thinking chunk (if the model supports it) |
- | `reasoning.done` | Full reasoning text for the step is complete |
- | `tool.start` | A tool call has been initiated |
- | `tool.done` | A tool call completed successfully |
- | `tool.error` | A tool call failed |
- | `step.start` | A new agentic step is starting |
- | `step.done` | A step completed (includes token usage and finish reason) |
- | `error` | An error occurred during execution |
- | `done` | The agent has finished. `result` is one of `"complete"`, `"stopped"`, `"max_steps"`, or `"error"` |
-
- ### Configuration
-
- | Option | Default | Description |
- | --- | --- | --- |
- | `name` | (required) | Agent name, used in logging and subagent selection |
- | `model` | (required) | Any Vercel AI SDK `LanguageModel` |
- | `systemPrompt` | — | System prompt prepended to every request |
- | `tools` | — | AI SDK `ToolSet` — the tools the agent can call |
- | `maxSteps` | `100` | Maximum agentic steps before stopping |
- | `temperature` | — | Sampling temperature |
- | `maxTokens` | — | Max output tokens per step |
- | `instructions` | `true` | Whether to load `AGENTS.md` / `CLAUDE.md` from the project directory |
- | `approve` | — | Callback for tool call approval (see [Permissions](#permissions)) |
- | `subagents` | — | Child agents available via the `task` tool (see [Subagents](#subagents)) |
- | `mcpServers` | — | MCP servers to connect to (see [MCP Servers](#mcp-servers)) |
-
- ## Sessions
-
- While Agent is a stateless executor, `Session` adds the statefulness and resilience you need for interactive, multi-turn conversations. It owns the message history and handles compaction, retry, persistence, and lifecycle hooks automatically.
-
- ```typescript
- import { Session } from "@openharness/core";
-
- const session = new Session({
-   agent,
-   contextWindow: 200_000,
- });
-
- for await (const event of session.send("Refactor the auth module")) {
-   switch (event.type) {
-     case "text.delta":
-       process.stdout.write(event.text);
-       break;
-     case "compaction.done":
-       console.log(`Compacted: ${event.tokensBefore} → ${event.tokensAfter} tokens`);
-       break;
-     case "retry":
-       console.log(`Retrying in ${event.delayMs}ms...`);
-       break;
-     case "turn.done":
-       console.log(`Turn ${event.turnNumber} complete`);
-       break;
-   }
- }
- ```
-
- `session.send()` yields all the same `AgentEvent` types as `agent.run()`, plus additional session lifecycle events.
-
- ### Session configuration
-
- | Option | Default | Description |
- | --- | --- | --- |
- | `agent` | (required) | The `Agent` to use for execution |
- | `contextWindow` | — | Model context window size in tokens. Required for auto-compaction |
- | `reservedTokens` | `min(20_000, agent.maxTokens ?? 20_000)` | Tokens reserved for output |
- | `autoCompact` | `true` when `contextWindow` is set | Enable auto-compaction |
- | `shouldCompact` | — | Custom overflow detection function |
- | `compactionStrategy` | `DefaultCompactionStrategy()` | Custom compaction strategy |
- | `retry` | — | Retry config for transient API errors |
- | `hooks` | — | Lifecycle hooks (see [Hooks](#hooks)) |
- | `sessionStore` | — | Pluggable persistence backend |
- | `sessionId` | auto-generated UUID | Session identifier |
-
- ### Session events
-
- In addition to all `AgentEvent` types, `session.send()` yields:
-
- | Event | Description |
- | --- | --- |
- | `turn.start` | A new turn is starting |
- | `turn.done` | Turn completed (includes token usage) |
- | `compaction.start` | Compaction triggered (includes reason and token count) |
- | `compaction.pruned` | Tool results pruned (phase 1) |
- | `compaction.summary` | Conversation summarized (phase 2) |
- | `compaction.done` | Compaction finished (includes before/after token counts) |
- | `retry` | Retrying after a transient error (includes attempt count and delay) |
-
- ### Compaction
-
- When a conversation approaches the context window limit, the session automatically compacts the message history. The default strategy works in two phases:
-
- 1. **Pruning** — replaces tool result content in older messages with `"[pruned]"`, preserving the most recent ~40K tokens of context. No LLM call needed.
- 2. **Summarization** — when pruning isn't enough, calls the model to generate a structured summary and replaces the entire history with it.
-
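The pruning phase described in the removed README can be sketched roughly as follows. This is an illustration only, not the package's implementation: the `SketchMessage` shape, the `estimateTokens` heuristic (about four characters per token), and the `pruneToolResults` name are all assumptions made for this example.

```typescript
// Hypothetical, simplified message shape (the real library works on AI SDK messages).
interface SketchMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Rough token estimate, assumed here to be ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk the history from newest to oldest. Messages that fit in the protected
// budget are kept untouched; once the budget runs out, every older tool
// result has its content replaced with "[pruned]".
function pruneToolResults(
  messages: SketchMessage[],
  protectedTokens: number,
): SketchMessage[] {
  let budget = protectedTokens;
  let exhausted = false;
  const newestFirst = [...messages].reverse().map((message) => {
    const cost = estimateTokens(message.content);
    if (!exhausted && cost <= budget) {
      budget -= cost;
      return message;
    }
    exhausted = true; // everything older than this point is unprotected
    return message.role === "tool" ? { ...message, content: "[pruned]" } : message;
  });
  return newestFirst.reverse();
}
```

Only tool results are rewritten; user and assistant messages survive pruning in this sketch, which is why a second summarization phase is still needed when the history itself is too long.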
- You can customize compaction at multiple levels:
-
- ```typescript
- import { DefaultCompactionStrategy } from "@openharness/core";
-
- // Tune the default strategy
- const session = new Session({
-   agent,
-   contextWindow: 128_000,
-   compactionStrategy: new DefaultCompactionStrategy({
-     protectedTokens: 60_000, // protect more recent context
-     summaryModel: cheapModel, // use a cheaper model for summarization
-   }),
- });
-
- // Or replace the strategy entirely
- const session = new Session({
-   agent,
-   contextWindow: 128_000,
-   compactionStrategy: {
-     async compact(context) {
-       // your own compaction logic
-       return { messages: [...], messagesRemoved: 0, tokensPruned: 0 };
-     },
-   },
- });
-
- // Or go fully manual
- const session = new Session({ agent, autoCompact: false });
- // ...later:
- for await (const event of session.compact()) { ... }
- ```
-
- ### Retry
-
- Transient API errors (429, 500, 502, 503, 504, 529, rate limits, timeouts) are retried automatically with exponential backoff and jitter. Retries only happen **before** any content has been streamed to the consumer — once the model starts producing output, the session commits to that attempt.
-
- ```typescript
- const session = new Session({
-   agent,
-   retry: {
-     maxRetries: 5,
-     initialDelayMs: 2000,
-     maxDelayMs: 60_000,
-     isRetryable: (error) => error.message.includes("overloaded"),
-   },
- });
- ```
-
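To make "exponential backoff and jitter" concrete, here is a minimal sketch of how a delay could be derived from the three numeric options shown above. The doubling factor, the jitter range (upper half of the capped delay), and the function names are assumptions for illustration; the README does not specify the exact formula the package uses.

```typescript
// Mirrors the numeric retry options from the README example.
interface RetryTiming {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
}

// Status codes the README lists as transient.
const TRANSIENT_STATUS_CODES = new Set([429, 500, 502, 503, 504, 529]);

function isTransientStatus(status: number): boolean {
  return TRANSIENT_STATUS_CODES.has(status);
}

// Delay before retry number `attempt` (1-based). The delay doubles per
// attempt, is capped at maxDelayMs, and is then jittered into the range
// [capped / 2, capped] so concurrent clients do not retry in lockstep.
function backoffDelayMs(
  attempt: number,
  timing: RetryTiming,
  random: () => number = Math.random,
): number {
  const exponential = timing.initialDelayMs * 2 ** (attempt - 1);
  const capped = Math.min(exponential, timing.maxDelayMs);
  return Math.round(capped / 2 + random() * (capped / 2));
}
```

The `random` parameter is injectable only so the behavior can be checked deterministically; in normal use the default `Math.random` applies.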
- ### Hooks
-
- Hooks let you intercept and customize the session lifecycle:
-
- ```typescript
- const session = new Session({
-   agent,
-   hooks: {
-     // Modify messages before each LLM call
-     onBeforeSend: (messages) => {
-       return messages.filter(m => !isStale(m));
-     },
-     // Post-processing after each turn
-     onAfterResponse: ({ turnNumber, messages, usage }) => {
-       console.log(`Turn ${turnNumber}: ${usage.totalTokens} tokens`);
-     },
-     // Custom compaction prompt
-     onCompaction: (context) => {
-       return "Summarize with emphasis on code changes and file paths.";
-     },
-     // Custom error handling (return true to suppress)
-     onError: (error, attempt) => {
-       logger.warn(`Attempt ${attempt} failed: ${error.message}`);
-     },
-   },
- });
- ```
-
- ### Persistence
-
- Plug in any storage backend by implementing the `SessionStore` interface:
-
- ```typescript
- const session = new Session({
-   agent,
-   sessionId: "user-123-conversation-1",
-   sessionStore: {
-     async load(id) { return db.get(id); },
-     async save(id, messages) { await db.set(id, messages); },
-     async delete(id) { await db.del(id); },
-   },
- });
-
- // Restore a previous session
- await session.load();
-
- // Messages are auto-saved after each turn, or save manually:
- await session.save();
- ```
-
- ### Direct state access
-
- The session's message history is directly readable and writable:
-
- ```typescript
- // Read current state
- console.log(session.messages.length, session.turns, session.totalUsage);
-
- // Inject or modify messages
- session.messages.push({ role: "user", content: "Remember: always use TypeScript." });
- ```
-
- ## Tools
-
- Tools use the Vercel AI SDK `tool()` helper with Zod schemas. OpenHarness ships a set of built-in tools that you can use as-is, compose, or replace entirely.
-
- ### Filesystem tools (`@openharness/core/tools/fs`)
-
- | Tool | Description |
- | --- | --- |
- | `readFile` | Read file contents (supports line offset/limit) |
- | `writeFile` | Write content to a file (creates parent dirs) |
- | `editFile` | Find-and-replace within a file |
- | `listFiles` | List files/directories (optionally recursive) |
- | `grep` | Regex search across files (skips `node_modules`, `.git`) |
- | `deleteFile` | Delete a file or directory |
-
- All are exported individually and also grouped as `fsTools`.
-
- ### Bash tool (`@openharness/core/tools/bash`)
-
- Runs arbitrary shell commands via `bash -c`. Configurable timeout (default 30s, max 5min) and automatic output truncation.
-
- ### Custom tools
-
- Any AI SDK-compatible tool works. Just define it with `tool()` from the `ai` package:
-
- ```typescript
- import { tool } from "ai";
- import { z } from "zod";
-
- const myTool = tool({
-   description: "Do something useful",
-   inputSchema: z.object({ query: z.string() }),
-   execute: async ({ query }) => {
-     return { result: `You asked: ${query}` };
-   },
- });
-
- const agent = new Agent({
-   name: "my-agent",
-   model: openai("gpt-5.2"),
-   tools: { myTool },
- });
- ```
-
- ## Permissions
-
- By default, all tool calls are allowed. To gate tool execution — for example, prompting a user for confirmation — pass an `approve` callback:
-
- ```typescript
- const agent = new Agent({
-   name: "safe-agent",
-   model: openai("gpt-5.2"),
-   tools: { ...fsTools, bash },
-   approve: async ({ toolName, toolCallId, input }) => {
-     // Return true to allow, false to deny
-     const answer = await askUser(`Allow ${toolName}?`);
-     return answer === "yes";
-   },
- });
- ```
-
- When a tool call is denied, a `ToolDeniedError` is thrown and surfaced to the model as a tool error, so it can adjust its approach.
-
- The callback receives a `ToolCallInfo` object:
-
- ```typescript
- interface ToolCallInfo {
-   toolName: string;
-   toolCallId: string;
-   input: unknown;
- }
- ```
-
- The callback can be async — you can prompt a user in a terminal, show a modal in a web UI, or call an external approval service.
-
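One way to use the `approve` callback without prompting on every call is a small allowlist policy. The sketch below is an assumption-laden illustration: `READ_ONLY_TOOLS`, `isAutoApproved`, and `makeApprove` are hypothetical names, and the `ToolCallInfo` interface is copied from the README text rather than imported from the package.

```typescript
// Shape copied from the README; redeclared here so the sketch is self-contained.
interface ToolCallInfo {
  toolName: string;
  toolCallId: string;
  input: unknown;
}

// Hypothetical allowlist: tools that only read state are approved silently.
const READ_ONLY_TOOLS = new Set(["readFile", "listFiles", "grep"]);

function isAutoApproved(call: ToolCallInfo): boolean {
  return READ_ONLY_TOOLS.has(call.toolName);
}

// Builds a callback suitable for the `approve` option: read-only tools pass
// immediately, everything else is delegated to `ask` (for example a terminal
// prompt or a web modal supplied by the caller).
function makeApprove(ask: (call: ToolCallInfo) => Promise<boolean>) {
  return async (call: ToolCallInfo): Promise<boolean> =>
    isAutoApproved(call) ? true : ask(call);
}
```

A caller would then pass something like `approve: makeApprove(promptUser)`, where `promptUser` is whatever confirmation mechanism the application already has.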
- ## Subagents
-
- Agents can delegate work to other agents. When you pass a `subagents` array, a `task` tool is automatically generated that lets the parent agent spawn child agents by name.
-
- ```typescript
- const explore = new Agent({
-   name: "explore",
-   description: "Read-only codebase exploration. Use for searching and reading files.",
-   model: openai("gpt-5.2"),
-   tools: { readFile, listFiles, grep },
-   maxSteps: 30,
- });
-
- const agent = new Agent({
-   name: "dev",
-   model: openai("gpt-5.2"),
-   tools: { ...fsTools, bash },
-   subagents: [explore],
- });
- ```
-
- The parent model sees a `task` tool with a description listing the available subagents and their descriptions. It can call `task` with an `agent` name and a `prompt`, and the subagent runs to completion autonomously.
-
- Key behaviors:
-
- - **Fresh instance per task** — each `task` call creates a new agent with no shared conversation state
- - **No approval** — subagents run autonomously without prompting for permission
- - **No nesting** — subagents cannot themselves have subagents
- - **Abort propagation** — the parent's abort signal is forwarded to the child
- - **Concurrent execution** — the model can call `task` multiple times in one response to run subagents in parallel
-
- ### Live subagent events
-
- To observe what subagents are doing in real time, pass an `onSubagentEvent` callback:
-
- ```typescript
- const agent = new Agent({
-   name: "dev",
-   model: openai("gpt-5.2"),
-   tools: { ...fsTools, bash },
-   subagents: [explore],
-   onSubagentEvent: (agentName, event) => {
-     if (event.type === "tool.done") {
-       console.log(`[${agentName}] ${event.toolName} completed`);
-     }
-   },
- });
- ```
-
- The callback receives the same `AgentEvent` types as the parent's `run()` generator.
-
- ## AGENTS.md
-
- OpenHarness supports the [AGENTS.md](https://agents.md) spec. On first run, the agent walks up from the current directory to the filesystem root looking for `AGENTS.md` or `CLAUDE.md`. The first file found is loaded and prepended to the system prompt.
-
- This is enabled by default. Set `instructions: false` to disable it.
-
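The directory walk described above can be sketched as below. This is a guess at the general shape, not the package's code: `findInstructionsFile` is a hypothetical name, and checking `AGENTS.md` before `CLAUDE.md` within the same directory is an assumption, since the README only says the first file found wins.

```typescript
import { existsSync } from "node:fs";
import { dirname, join } from "node:path";

// Candidate file names, checked in this (assumed) order in each directory.
const INSTRUCTION_FILES = ["AGENTS.md", "CLAUDE.md"];

// Walks from startDir up to the filesystem root and returns the path of the
// first instructions file found, or undefined if none exists.
// `exists` is injectable so the walk can be exercised without touching disk.
function findInstructionsFile(
  startDir: string,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  let dir = startDir;
  while (true) {
    for (const name of INSTRUCTION_FILES) {
      const candidate = join(dir, name);
      if (exists(candidate)) return candidate;
    }
    const parent = dirname(dir);
    if (parent === dir) return undefined; // dirname of the root is the root itself
    dir = parent;
  }
}
```

Calling `findInstructionsFile(process.cwd())` would mimic the default behavior; the returned path could then be read and prepended to a system prompt.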
- ## MCP Servers
-
- Agents can connect to [Model Context Protocol](https://modelcontextprotocol.io) servers. Tools from MCP servers are merged into the agent's toolset alongside any static tools.
-
- ```typescript
- const agent = new Agent({
-   name: "dev",
-   model: openai("gpt-5.2"),
-   tools: { ...fsTools, bash },
-   mcpServers: {
-     github: {
-       type: "stdio",
-       command: "npx",
-       args: ["-y", "@modelcontextprotocol/server-github"],
-       env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
-     },
-     weather: {
-       type: "http",
-       url: "https://weather-mcp.example.com/mcp",
-       headers: { Authorization: "Bearer ..." },
-     },
-   },
- });
-
- // MCP connections are established lazily on first run()
- for await (const event of agent.run([], "What PRs are open?")) { ... }
-
- // Clean up MCP connections when done
- await agent.close();
- ```
-
- Three transport types are supported:
-
- | Transport | Use case |
- | --- | --- |
- | `stdio` | Local servers — spawns a child process, communicates over stdin/stdout |
- | `http` | Remote servers via Streamable HTTP (recommended for production) |
- | `sse` | Remote servers via Server-Sent Events (legacy) |
-
- When multiple MCP servers are configured, tools are namespaced as `serverName_toolName` to avoid collisions. With a single server, tool names are used as-is.
-
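The namespacing rule in the last paragraph is easy to state in code. The sketch below assumes a simplified `ToolMap` type and a hypothetical `mergeMcpTools` helper; it only illustrates the naming behavior the README describes, not how the package actually connects to servers or merges toolsets.

```typescript
// Simplified stand-in for a set of tools keyed by name.
type ToolMap = Record<string, unknown>;

// Merges per-server tool maps into one. With a single server, names are kept
// as-is; with several, each tool becomes `serverName_toolName` so that two
// servers exposing the same tool name cannot collide.
function mergeMcpTools(servers: Record<string, ToolMap>): ToolMap {
  const entries = Object.entries(servers);
  const namespaced = entries.length > 1;
  const merged: ToolMap = {};
  for (const [serverName, tools] of entries) {
    for (const [toolName, tool] of Object.entries(tools)) {
      merged[namespaced ? `${serverName}_${toolName}` : toolName] = tool;
    }
  }
  return merged;
}
```

One consequence worth noting: adding a second server changes the names of the first server's tools, so prompts or approval rules that mention tool names may need updating.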
- ## Example CLI
-
- [`example/cli.ts`](example/cli.ts) is a fully working agent CLI that ties everything together — a `Session` wrapping an `Agent` with tool approval prompts, ora spinners, streamed output, live subagent display, and a `/compact` command for manual compaction. It's a good reference for how to wire up all the primitives into an interactive application.
-
- ```bash
- # requires a .env file with OPENAI_API_KEY
- pnpm cli
- ```