npm - typeclaw - Versions diffs - 0.37.2 → 0.37.4 - Mend

typeclaw 0.37.2 → 0.37.4


Files changed (30)

package/README.md +71 -47
package/package.json +1 -1
package/src/agent/compaction.ts +24 -15
package/src/agent/session-origin.ts +101 -173
package/src/agent/system-prompt.ts +46 -48
package/src/bundled-plugins/memory/index.ts +24 -27
package/src/bundled-plugins/memory/load-memory.ts +78 -35
package/src/bundled-plugins/memory/turn-dedup.ts +32 -29
package/src/bundled-plugins/tool-result-cap/README.md +7 -7
package/src/bundled-plugins/tool-result-cap/index.ts +1 -1
package/src/channels/adapters/discord-bot.ts +11 -4
package/src/channels/adapters/mention-hints.ts +58 -0
package/src/channels/adapters/slack-bot.ts +8 -2
package/src/channels/continuation-willingness.ts +265 -53
package/src/channels/router.ts +105 -3
package/src/cli/init.ts +41 -7
package/src/cli/qr.ts +4 -3
package/src/cli/ui.ts +8 -4
package/src/doctor/checks.ts +145 -2
package/src/hostd/tailscale.ts +12 -1
package/src/init/index.ts +35 -8
package/src/init/run-bun-install.ts +71 -37
package/src/inspect/transcript-view.ts +15 -2
package/src/portbroker/hostd-client.ts +32 -6
package/src/shared/index.ts +4 -0
package/src/shared/platform.ts +11 -0
package/src/shared/wsl.ts +139 -0
package/src/tui/index.ts +26 -8
package/src/tui/terminal-guard.ts +139 -0
package/typeclaw.schema.json +2 -2

package/README.md CHANGED

@@ -1,59 +1,85 @@
-# TypeClaw
 <p align="center">
-  <img src="./docs/public/typeclaw.png" alt="TypeClaw logo" width="240" />
+  <img src="./docs/public/typeclaw-transparent.png" alt="TypeClaw logo" width="240" />
 </p>
-> The agent for perfectionists — crafted in every detail. It behaves in your team's chat and gets sharper the longer it runs. Sandboxed and self-managing.
+<h3 align="center">TypeClaw: The agent for perfectionists</h3>
+<p align="center">Crafted in every detail – it behaves in your team's chat and<br />gets sharper the longer it runs. Sandboxed and self-managing.</p>
+<br />
+<br />
+## Self-improving — a learning loop, not a black box
+- 🌱 **Memory** — logs its own work to a daily stream as it goes
+- 💤 **Dreaming** — a subagent distills each day's work into long-term memory, committed to git as plain files you can read, diff, and revert
+- 🧠 **Muscle memory** — recurring procedures become reusable skills it writes for itself and loads on later runs
+- 🔎 **Optional embedding recall** — hybrid keyword-and-embedding search over the same markdown memory, off by default; the plain files remain the durable source of truth
+## Group chat — knows when not to talk
+- 👥 **Room awareness** — knows who's present and tells humans from bots, so it stays quiet when people are talking to each other rather than chiming in on messages it wasn't part of
+- 💬 **Sticky engagement** — holds an ongoing thread after replying without needing to be re-mentioned, then steps back when the conversation moves on; multilingual continuation detection, peer-bot loop guards, and flood filters keep it from spiraling
+## Channels — one agent, many inboxes
+- 📨 **Supported channels** — Slack, Discord, Telegram, LINE, KakaoTalk, GitHub, and a websocket TUI, driven by the same agent
+- ✅ **Pull-request review** — treats a GitHub PR as a conversation, reviewing as a participant, with guards against claiming a verdict it didn't actually post and against leaving a PR stranded
+## Web & research — reads the web like a person
+- 🔍 **Live web search & fetch** — pull a page as a readable article, a JSON query, a selected slice, a grep, or raw
+- 🪪 **Browser-like fetching** — replays a Chrome-like TLS/HTTP fingerprint to get past many generic-client blocks; CAPTCHA and IP-reputation gates can still fail
+- 🌐 **Interactive browser sessions** — drives a browser on live pages, with a dashboard you can step into for logins, 2FA, or CAPTCHA
+## Security — defense-in-depth for risky actions
+- 🛡 **Layered guards** — stop secret exfiltration, SSRF, prompt injection, rogue git pushes, and silent privilege escalation before they fire
+- 🪪 **Roles** — owner, trusted, member, and guest gate privileged actions
+- 🔑 **Permissions** — per-channel match rules decide who can ask for what; an untrusted channel user can't trigger privileged behavior
+- 🔒 **Encryption at rest** — sensitive channel passwords are sealed with authenticated encryption; the key is host-held and isn't passed into the container during normal operation
+## Isolation & sandbox — runs clean, stays out of each other's way
+- 🐳 **No machine clutter** — agent runtime state lives in its own folder and container; apart from the TypeClaw CLI install, it doesn't scatter services or config across your machine, and stopping it shuts the running pieces down, leaving a folder you can keep, copy, or delete
+- 🧩 **No cross-agent interference** — run as many as you like; each gets its own container, files, memory, and even its own browser, so one can read a page while another drives a different one
+- 📁 **Self-contained folder** — settings, memory, and connections live together in the agent's folder, kept as a version history you can review, undo, or back up
+## Subagents — delegation in a fresh context
-## Why?
+- 🪄 **A bench of specialists** — it hands off research, planning, code review, and hands-on execution to focused child sessions, each with its own prompt, tools, and model
+- 🔀 **Sync or background** — spawn and block for a result, or spawn in the background and collect completions later; coalescing prevents duplicate concurrent runs and depth limits keep delegation chains bounded
-There are great agents out there. None of them were quite the shape I wanted:
+## Extensibility — teach it new tricks in TypeScript
-- **OpenClaw** — feature-rich, but heavy
-- **NanoClaw** — simple, but no plugin system
-- **PicoClaw** — fast, but Go (so plugins live outside the runtime)
-- **ZeroClaw** — light, but Rust (same problem, different ecosystem)
-- **Hermes Agent** — awesome, but Python
+- 🔌 **Plugins are just imports** — a plugin is a plain TypeScript file that imports the runtime and adds tools, skills, channels, and commands; no IPC, no FFI, no DSL, distributed as packages and resolved like any dependency
+- 🛰 **MCP support** — connect external MCP servers over stdio or HTTP; their tools become the agent's tools
+- 📚 **Skills on demand** — markdown procedures load lazily when selected, so they avoid prompt-token cost until used; skills layer from bundled, your own, and what the agent learns
+- ⚙️ **Typed config with hot reload** — most config changes take effect live; boot-only fields are flagged restart-required
-None of that matters to most people. It matters to me. If you're like me, TypeClaw is the right choice.
+## Connectivity — reachable wherever you need it
-TypeClaw is the agent I wanted to use:
+- 🌍 **Auto port-forward** — services inside the container appear on your `localhost`, including loopback-only ones
+- 🚇 **Public tunnels** — a zero-signup public URL out of the box, or bring your own; webhooks self-register at the resulting URL
+- 🔗 **Private network access** — forwarded ports can publish to a private network when configured
-- **TypeScript end to end** — agent core, plugins, channel adapters, CLI, TUI all in one language
-- **Bun-native plugins** — plugins are just TS modules; no IPC, no FFI, hot-reloadable config
-- **Docker-friendly by default** — every agent runs in its own container; the host CLI is purely a launcher
-- **Self-improving** — the agent observes its own work, distills it into sharded long-term memory and reusable skills, and gets sharper over time without you writing prompts for it
+## Self-managing — operational autonomy, on a budget
-If you're like me, TypeClaw is the right choice. If not, that's fine too.
+- 💾 **Self-backup** — commits and pushes its own state during idle windows, with a generated commit message
+- 🔁 **Self-restart** — can rebuild and restart its own container when it needs to, through the host daemon
+- ♻️ **Self-continuation** — keeps working through an unfinished task list when you step away, bounded by a turn, token, and wall-clock budget
-## What you'd expect
+## Operator CLI — see what it's doing and what it costs
-- 🐳 **Sandboxed by default** — every agent runs in its own Docker container with `.env` injection and bind-mounted host folders
-- 🔌 **Plugin system** — plain TypeScript modules contribute tools, skills, subagents, channels, commands, and typed config
-- 💬 **Multi-channel** — Slack, Discord, Telegram, LINE, KakaoTalk, GitHub webhooks, and a websocket TUI; one agent, many inboxes
-- ⏰ **Cron** — schedule prompts or shell commands; per-job coalescing so slow jobs don't pile up
-- 📚 **Skills on demand** — markdown procedures the agent loads only when relevant; zero token cost until used
-- 🔎 **Web research** — bundled `scout` subagent plus first-class `web_search` and `web_fetch` tools (DuckDuckGo via curl-impersonate, Wikipedia)
-- 🛡 **Security guards** — bundled `tool.before` policies catch secret exfil, SSRF, prompt injection, tainted git remotes, and silent privilege escalation (role/cron promotion) before they fire
-- 📊 **Usage, inspect, doctor** — `typeclaw usage` reports token/$ spend per session, model, or day; `typeclaw inspect` replays a session transcript and tails live activity; `typeclaw doctor` diagnoses host, agent folder, and plugin state
+- 🩺 **doctor** — diagnoses host, agent folder, config, and channels, with auto-fix for managed files
+- 📊 **usage** — reports token and dollar spend by day, model, session, or origin
+- 🔍 **inspect** — replays a session transcript and tails live activity
+- 📜 **logs** — streams container logs with local-time prefixes
-## Where it goes further
+## Compose — manage a fleet from the CLI
-- 🌱 **Self-improving** — bundled `memory` plugin logs sessions to daily streams, then a `dreaming` subagent distills them into sharded long-term memory (`memory/topics/`) on its own schedule; no prompts to write
-- 🧠 **Muscle memory** — repeated procedures get distilled into reusable skills the agent writes for itself and loads on later runs
-- 💾 **Auto-backup** — the bundled `backup` plugin commits session logs and memory on every idle window with an LLM-generated commit subject
-- 🪄 **Subagents** — first-class child sessions with their own system prompt, payload schema, and per-payload coalescing; cron and the main agent fire them through one in-process Stream
-- 🪪 **Roles and permissions** — `owner` / `trusted` / `member` / `guest` with first-message match rules per channel; gates `channel.respond`, cron scheduling, and security bypasses, so a Slack stranger can't tell the agent to push to main
-- 👥 **Group chat awareness** — knows who's in the room, distinguishes humans from bots, and stays engaged after a reply without re-mentioning
-- 🧱 **Managed-file guards** — `typeclaw.json`, `cron.json`, memory shards, and bundled skills are protected from accidental rewrites; invalid config writes and silent role/cron privilege grants are rejected at the tool boundary
-- 🌐 **Headed browser inside the container** — bundled `agent-browser` plugin ships Chrome under Xvfb so the agent can drive real web pages past bot fingerprinting
-- 🌍 **Tunnels and auto port-forward** — dev servers inside the container appear on `localhost` (even loopback-only ones); public URLs via Cloudflare Quick (zero signup) or your own external URL, with GitHub webhooks self-registered at the resulting URL
-- 🔄 **Hot reload** — change `typeclaw.json`, run `typeclaw reload` — no restart for most fields
-- 🔁 **Self-restart** — the agent can bounce its own container when it updates itself
-- 🎼 **Compose** — orchestrate multiple agents across multiple folders
+- 🎼 **Fleet operations** — discover agent folders and start, stop, restart, check status, tail logs, report usage, and run diagnostics across them from the command line
-Memory loop and subagent architecture are covered in detail in [AGENTS.md](./AGENTS.md) and [`src/bundled-plugins/memory/README.md`](./src/bundled-plugins/memory/README.md).
+Memory loop and subagent architecture are covered in detail in the [Internals docs](https://typeclaw.dev/docs/internals) and [`src/bundled-plugins/memory/README.md`](./src/bundled-plugins/memory/README.md).
 ## Install
@@ -67,14 +93,12 @@ Requires Bun ≥ 1.1 and Docker (or OrbStack) on the host.
 ```sh
 mkdir my-agent && cd my-agent
-typeclaw init        # scaffold typeclaw.json, .env, Dockerfile, package.json
-typeclaw start       # build + run the container
-typeclaw tui         # attach a terminal UI to the running agent
+typeclaw init        # scaffold, build, run the container, and attach a TUI
 ```
-That's it. The agent is now alive, listening on a websocket, ready to receive prompts from the TUI or any wired channel.
+That's it. `init` hatches the agent end to end — it scaffolds the folder (`typeclaw.json`, `.env`, `Dockerfile`, `package.json`), builds and runs the container, then drops you into a terminal UI. The agent is now alive, listening on a websocket, ready to receive prompts from the TUI or any wired channel.
-See `typeclaw --help` for the full command surface, or [typeclaw.dev](https://typeclaw.dev) for guides and configuration reference.
+For later sessions, `typeclaw start` runs the container and `typeclaw tui` re-attaches. See `typeclaw --help` for the full command surface, or [typeclaw.dev](https://typeclaw.dev) for guides and configuration reference.
 ## Development
@@ -93,7 +117,7 @@ bun run lint
 bun run format
 ```
-See [CONTRIBUTING.md](./CONTRIBUTING.md) for the recommended local dev loop (`bun link` → `typeclaw init`), commit and PR conventions, and where to ask questions. See [AGENTS.md](./AGENTS.md) for the long-form architecture notes — stages, hostd internals, message stream, plugin contracts, and the testing philosophy. The docs site at [typeclaw.dev](https://typeclaw.dev) lives in [`docs/`](./docs/).
+See [CONTRIBUTING.md](./CONTRIBUTING.md) for the recommended local dev loop (`bun link` → `typeclaw init`), commit and PR conventions, and where to ask questions. The [Internals docs](https://typeclaw.dev/docs/internals) cover the long-form architecture notes — stages, hostd internals, message stream, plugin contracts, and the testing philosophy. The docs site at [typeclaw.dev](https://typeclaw.dev) lives in [`docs/`](./docs/).
 ## Acknowledgments

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "typeclaw",
-  "version": "0.37.2",
+  "version": "0.37.4",
   "homepage": "https://github.com/typeclaw/typeclaw#readme",
   "bugs": {
     "url": "https://github.com/typeclaw/typeclaw/issues"

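As a worked illustration of the `compaction.ts` change in the next file section, the sketch below re-implements the two functions from that diff against a hypothetical `ModelLike` stand-in. Only `contextWindow` is modelled here; the real code takes the SDK's `Model<TApi>`. The helper `oldReserveTokens` is an assumed name that mirrors the removed 0.37.2 line, and the window sizes are illustrative only.

```typescript
// Hypothetical stand-in for the SDK's Model type: only contextWindow is needed.
interface ModelLike {
  contextWindow: number
}

// Constants copied from the 0.37.4 side of the diff.
const COMPACTION_TRIGGER_PERCENT = 0.8
const COMPACTION_ABSOLUTE_TRIGGER_TOKENS = 64_000

// New behavior: window-relative trigger, capped at an absolute ceiling.
function compactionTriggerTokens(model: ModelLike): number {
  const windowRelative = Math.round(model.contextWindow * COMPACTION_TRIGGER_PERCENT)
  return Math.min(windowRelative, COMPACTION_ABSOLUTE_TRIGGER_TOKENS)
}

// New behavior: the reserve is whatever is left of the window above the trigger.
function reserveTokensForModel(model: ModelLike): number {
  return Math.max(1, model.contextWindow - compactionTriggerTokens(model))
}

// Assumed helper name: mirrors the removed 0.37.2 line for comparison.
function oldReserveTokens(model: ModelLike): number {
  return Math.max(1, Math.round(model.contextWindow * (1 - COMPACTION_TRIGGER_PERCENT)))
}

// Illustrative window sizes: one small window plus the three named in the diff comment.
for (const contextWindow of [32_000, 200_000, 256_000, 1_000_000]) {
  const model: ModelLike = { contextWindow }
  console.log(
    `window=${contextWindow}`,
    `trigger=${compactionTriggerTokens(model)}`,
    `reserve=${reserveTokensForModel(model)}`,
    `oldReserve=${oldReserveTokens(model)}`,
  )
}
```

Under these assumptions, a 200K window triggers compaction at 64,000 tokens instead of 160,000, so its reserve rises from 40,000 to 136,000. A 32K window is unaffected, because 80% of it (25,600) is already below the cap.
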
package/src/agent/compaction.ts CHANGED

@@ -1,27 +1,36 @@
 import type { KnownApi, Model } from '@mariozechner/pi-ai'
 import { SettingsManager } from '@mariozechner/pi-coding-agent'
-// Compaction trigger threshold expressed as a percentage of the model's
-// context window. pi-coding-agent's auto-compaction fires when
-// `contextTokens > contextWindow - reserveTokens`. To honor a percentage-
-// based intent across models with very different window sizes (200K Claude
-// vs. 1M Gemini vs. 256K Kimi), we derive `reserveTokens` per-model from
-// the model's `contextWindow`. SDK defaults (16384 reserve) are a fixed
-// number of tokens that drift in relative terms across models — at 256K
-// that's ~6% headroom (94% trigger), at 1M it's ~1.6% (98% trigger). A
-// percentage-derived reserve trips at the same fraction regardless of
-// model, which is what we actually want.
+// Compaction trigger expressed as a fraction of the model's context window.
+// pi-coding-agent auto-compaction fires when `contextTokens > contextWindow -
+// reserveTokens`; deriving `reserveTokens` from the window keeps the trigger at
+// the same fraction across models with very different windows (200K Claude vs.
+// 1M Gemini vs. 256K Kimi) instead of the SDK's fixed 16384 reserve, which
+// drifts to ~94% on a 256K window and ~98% on 1M.
 export const COMPACTION_TRIGGER_PERCENT = 0.8
+// Absolute ceiling on the compaction trigger, independent of window size. The
+// window-relative trigger alone optimizes for overflow avoidance, not token
+// cost: at 80% of a large window a session accumulates ~160K (200K window) to
+// ~800K (1M window) tokens of history that get re-shipped as `cacheRead` every
+// turn before compaction ever fires. Capping the trigger bounds that
+// steady-state re-read on big-window models; `min()` keeps the 80% behavior on
+// small ones. 64K is 3x keepRecent (invariant asserted in the test), leaving
+// growth room after a compaction so it does not retrigger immediately.
+export const COMPACTION_ABSOLUTE_TRIGGER_TOKENS = 64_000
 // Tokens to keep in the recent window after compaction. Fixed (not a
-// percentage) because "recent context" is a property of conversation
-// shape, not model capacity — the same recent ~20K is roughly the right
-// amount of history regardless of whether the model has 200K or 1M total.
-// Mirrors pi's DEFAULT_COMPACTION_SETTINGS.keepRecentTokens.
+// percentage) because "recent context" is a property of conversation shape, not
+// model capacity. Mirrors pi's DEFAULT_COMPACTION_SETTINGS.keepRecentTokens.
 export const COMPACTION_KEEP_RECENT_TOKENS = 20_000
+export function compactionTriggerTokens<TApi extends KnownApi>(model: Model<TApi>): number {
+  const windowRelative = Math.round(model.contextWindow * COMPACTION_TRIGGER_PERCENT)
+  return Math.min(windowRelative, COMPACTION_ABSOLUTE_TRIGGER_TOKENS)
+}
 export function reserveTokensForModel<TApi extends KnownApi>(model: Model<TApi>): number {
-  return Math.max(1, Math.round(model.contextWindow * (1 - COMPACTION_TRIGGER_PERCENT)))
+  return Math.max(1, model.contextWindow - compactionTriggerTokens(model))
 }
 export function createCompactionSettingsManager<TApi extends KnownApi>(model: Model<TApi>): SettingsManager {