compound-agent 1.7.4 → 1.8.0

This diff shows the content of publicly available package versions released to a supported registry. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.

Files changed (7)

package/CHANGELOG.md CHANGED

@@ -7,6 +7,56 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.8.0] - 2026-03-15
+### Added
+- **`ca improve` command**: Generates a bash script that autonomously improves the codebase using `improve/*.md` program files. Each program defines what to improve, how to find work, and how to validate changes. Options: `--topics` (filter specific topics), `--max-iters` (iterations per topic, default 5), `--time-budget` (total seconds, 0=unlimited), `--model`, `--force`, `--dry-run`. Includes `ca improve init` subcommand to scaffold an example program file.
+- **`ca watch` command**: Tails and pretty-prints live trace JSONL from infinity loop and improvement loop sessions. Supports `--epic <id>` to watch a specific epic, `--improve` to watch improvement loop traces, and `--no-follow` to print existing trace and exit. Formats tool calls, thinking blocks, token usage, and result markers into a compact, color-coded stream.
+### Fixed
+- **`git clean` scoping in improvement loop**: Bare `git clean -fd` on rollback was removing all untracked files including the script's own log directory, causing crashes. All three rollback paths now use `git clean -fd -e "$LOG_DIR/"` to exclude agent logs.
+- **Embedded dirty-worktree guard fallthrough**: In embedded mode (when the improvement loop runs inside `ca loop --improve`), setting `IMPROVE_RESULT=1` on a dirty worktree did not prevent the loop body from executing. The guard was restructured to use `if/else` so the loop body only runs inside the `else` branch.
+- **`ca watch --improve` ignoring `.latest` symlink**: The `--improve` code path had inline logic that only did reverse filename sort, bypassing the `.latest` symlink that the improvement loop maintains. Refactored `findLatestTraceFile()` with a `prefix` parameter to unify both code paths.
+- **`--topics` flag ignored in `get_topics()`**: The `TOPIC_FILTER` variable from the CLI `--topics` flag was not used in the generated bash `get_topics()` function, causing all topics to run regardless of filtering.
+- **Update-check hardening**: Switched to a lightweight npm registry endpoint, added CI environment guards, and corrected the update command shown to users.
+## [1.7.6] - 2026-03-12
+### Added
+- **`ca install-beads` command**: Standalone subcommand to install the beads CLI via the official script. Includes a platform guard (skips on Windows with `exitCode 1`), an "already installed" short-circuit, a `--yes` flag to bypass the confirmation hint (safe: never runs `curl | bash` without explicit opt-in), `spawnSync` with a 60-second timeout, and a post-install shell-reload warning. Non-TTY mode without `--yes` prints the install command as a copy-pasteable hint rather than silently doing nothing.
+### Fixed
+- **Beads hint display**: `printBeadsFullStatus` was silently swallowing the install hint message when the beads CLI was not found. The curl install command is now printed below the "not found" line.
+- **Beads hint text**: `checkBeadsAvailable` now returns the actual `curl -sSL ... | bash` install command in its message instead of a bare repo URL.
+- **Doctor fix message**: `ca doctor` now shows `Run: ca install-beads` for the missing-beads check instead of pointing to a URL.
+- **`ca knowledge` description**: Reframed from "Ask the project docs any question" to "Semantic search over project docs — use keyword phrases, not questions" in both the live prime template and the setup template, reflecting the underlying embedding RAG retrieval mechanism.
+## [1.7.5] - 2026-03-12
+### Added
+- **`ca feedback` command**: Surfaces the GitHub Discussions URL for bug reports and feature requests. `ca feedback --open` opens the page directly in the browser. Cross-platform (macOS `open`, Windows `start`, Linux `xdg-open`).
+- **Star and feedback prompt in `ca about`**: TTY sessions now see a star-us link and the GitHub Discussions URL after the changelog output.
+### Changed
+- **README overhaul**: Complete rewrite to present compound-agent as a full agentic development environment rather than a memory plugin.
+  - New thesis-driven one-liner that names category, mechanism, and benefit
+  - "What gets installed" inventory table (15 commands, 24 agent role skills, 7 hooks, 5 phase skills, 5 docs)
+  - Three principles section mapping each architecture layer to the problem it solves (Memory / Feedback Loops / Navigable Structure)
+  - "Agents are interchangeable" design principle explained in the overview
+  - Levels of use replacing flat Quick Start: memory-only, structured workflow, and factory mode with code examples
+  - `/compound:architect` promoted to its own section with 4-phase description and context-window motivation
+  - Infinity loop elevated from CLI table row to its own section with full flag examples and honest maturity note
+  - Automatic hooks table with per-hook descriptions
+  - Architecture diagram updated to reflect three-principle mapping and accurate counts
+  - Compound loop diagram updated with architect as optional upstream entry point
+  - "Open with an AI agent" entry point in the Documentation section
 ## [1.7.4] - 2026-03-11
 ### Added
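
Reviewer note on the `--topics` fix in the 1.8.0 entry above: the changelog says the generated `get_topics()` ignored `TOPIC_FILTER`. The following is a minimal sketch of what a filter-aware version could look like, not the script the package actually generates. It assumes that topic names are the basenames of the `improve/*.md` files and that `TOPIC_FILTER` holds a space-separated list; `IMPROVE_DIR` is a hypothetical variable introduced only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: only TOPIC_FILTER and get_topics are named in the changelog.
IMPROVE_DIR="${IMPROVE_DIR:-improve}"   # assumed location of the program files
TOPIC_FILTER="${TOPIC_FILTER:-}"        # assumed format: space-separated names; empty means all

get_topics() {
  local file topic wanted
  for file in "$IMPROVE_DIR"/*.md; do
    [ -e "$file" ] || continue          # the glob matched nothing
    topic="$(basename "$file" .md)"
    if [ -z "$TOPIC_FILTER" ]; then
      printf '%s\n' "$topic"            # no filter: every topic runs
      continue
    fi
    # Intentionally unquoted so the filter splits on whitespace.
    for wanted in $TOPIC_FILTER; do
      if [ "$wanted" = "$topic" ]; then
        printf '%s\n' "$topic"
        break
      fi
    done
  done
}
```

Under these assumptions, `TOPIC_FILTER="lint tests" get_topics` prints only the topics whose program files are `lint.md` and `tests.md`, while an empty filter prints every topic found.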

package/README.md CHANGED

@@ -1,51 +1,79 @@
 # Compound Agent
-**Memory. Knowledge. Structure. Accountability. For AI coding agents.**
+> compound-agent is a Claude Code plugin that ships a self-improving development factory into your repository — persistent memory, structured multi-agent workflows, and autonomous loop execution. Fully local. Everything in git.
 [![npm version](https://img.shields.io/npm/v/compound-agent)](https://www.npmjs.com/package/compound-agent)
 [![license](https://img.shields.io/npm/l/compound-agent)](LICENSE)
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.3+-blue)](https://www.typescriptlang.org/)
-- **Memory** -- capture mistakes once, surface them forever
-- **Knowledge** -- hybrid vector search over your project docs
-- **Structure** -- 5-phase workflows with 35+ specialized agents
-- **Accountability** -- git-tracked issues, multi-agent reviews, quality gates
+AI coding agents forget everything between sessions. Each session starts with whatever context was prepared for it — nothing more. Because agents carry no persistent state, that state must live in the codebase itself, and any agent that reads the same well-structured context should be able to pick up where another left off. Compound Agent implements this: it captures mistakes once, retrieves them precisely when relevant, and can hand entire systems to an autonomous loop that processes epic by epic with no human intervention.
-Fully local. Fully offline. Everything in git.
+## What gets installed
-## Overview
+`npx ca setup` injects a complete development environment into your repository:
-AI coding agents forget everything between sessions. Compound Agent fixes this with a three-layer system: issue tracking at the foundation, semantic memory with vector search in the middle, and structured workflows with multi-agent review on top. It captures knowledge from corrections, discoveries, and completed work, then retrieves it precisely when relevant. Every cycle through the loop makes subsequent cycles smarter.
+| Component | What ships |
+|-----------|-----------|
+| 15 slash commands | `/compound:architect`, `cook-it`, `spec-dev`, `plan`, `work`, `review`, `compound`, `learn-that`, `check-that`, and more |
+| 24 agent role skills | Security reviewers, TDD pair, decomposition convoy, spec writers, test analysers, drift detectors, and more |
+| 7 automatic hooks | Fire on session start, prompt submit, tool use, tool failure, pre-compact, phase guard, and session stop |
+| 5 phase skill files | Full workflow instructions for `architect`, `spec-dev`, `cook-it`, `work`, and `review` |
+| 5 deployed docs | Workflow reference, CLI reference, skills guide, integration guide, and overview |
+This is not a memory plugin bolted onto a text editor. It is the environment your agents run inside.
+## How it works
 ```mermaid
-graph LR
-    B[SPEC DEV] --> P[PLAN]
-    P --> W[WORK]
-    W --> R[REVIEW]
-    R --> C[COMPOUND]
-    C --> M[(MEMORY)]
-    M --> P
+flowchart TD
+    A["/compound:architect\nDecompose large system\ninto epics via DDD"] -->|produces epics| L
+    subgraph L["Compound Loop — one cycle per epic"]
+        direction LR
+        S[SPEC-DEV] --> P[PLAN]
+        P --> W[WORK]
+        W --> R[REVIEW]
+        R --> C[COMPOUND]
+    end
+    C -->|writes lessons| M[(MEMORY\nJSONL + SQLite\n+ embeddings)]
+    M -->|injects context| P
     style M fill:#f9f,stroke:#333
+    style A fill:#e8f4fd,stroke:#4a9ede
 ```
+Each cycle through the loop makes the next one smarter. The architect step is optional — use it for systems too large for a single feature cycle.
 ```mermaid
 block-beta
     columns 1
-    block:L3["Workflows"]
-        A["5-phase cycle"] B["35+ specialized agents"] C["Multi-model review"]
+    block:L3["Workflows  ·  Feedback Loops"]
+        A["15 slash commands"] B["24 specialized agents"] C["Autonomous loop"]
     end
-    block:L2["Semantic Memory"]
-        D["Vector search"] E["Hybrid retrieval"] F["Cross-cutting patterns"]
+    block:L2["Semantic Memory  ·  Codebase Memory"]
+        D["Vector search"] E["Hybrid retrieval"] F["Cross-session persistence"]
     end
-    block:L1["Foundation"]
-        G["Issue tracking"] H["Git-backed sync"] I["Quality gates"]
+    block:L1["Beads Foundation  ·  Navigable Structure"]
+        G["Issue tracking"] H["Git-backed sync"] I["Dependency graphs"]
     end
     L3 --> L2
     L2 --> L1
 ```
+## Three principles
+These constraints follow from how AI agents work, and each one maps to a layer of the architecture.
+| Principle | Without it | Layer |
+|-----------|-----------|-------|
+| **Memory** | Same mistakes every session. Architectural decisions re-derived from scratch. Knowledge locked in human heads where agents cannot reach it. | Semantic Memory |
+| **Feedback loops** | Agents cannot verify their own work. Manual review is the only quality gate. Drift is the default at agent-scale output. | Structured Workflows |
+| **Navigable structure** | Context windows fill with orientation work. Agents make unverifiable assumptions about dependencies and ordering. | Beads Foundation |
+The three are not independent. Memory without feedback loops is unreliable. Feedback without navigable structure fires blindly. The system works as a whole or not at all.
 ## Is this for you?
 **"It keeps making the same mistake every session."**
@@ -54,20 +82,159 @@ Capture it once. Compound Agent surfaces it automatically before the agent repea
 **"I explained our auth pattern three sessions ago. Now it's reimplementing from scratch."**
 Architectural decisions persist as searchable lessons. Next session, they inject into context before planning starts.
-**"My agent uses pandas when we standardized on Polars months ago."**
+**"My agent uses pandas when we standardised on Polars months ago."**
 Preferences survive across sessions and projects. Once captured, they appear at the right moment.
 **"Code reviews keep catching the same class of bugs."**
-35+ specialized review agents (security, performance, architecture, test coverage) run in parallel. Findings feed back as lessons that become test requirements in future work.
+24 specialised review agents (security, performance, architecture, test coverage) run in parallel. Findings feed back as lessons that become test requirements in future work.
 **"I have no idea what my agent actually learned or if it's reliable."**
-`ca list` shows all captured knowledge. `ca stats` shows health. `ca wrong <id>` invalidates bad lessons. Everything is git-tracked JSONL -- you can read, diff, and audit it.
+`ca list` shows all captured knowledge. `ca stats` shows health. `ca wrong <id>` invalidates bad lessons. Everything is git-tracked JSONL — you can read, diff, and audit it.
 **"I want structured phases, not just 'go build this'."**
 Five workflow phases (spec-dev, plan, work, review, compound) with mandatory gates between them. Each phase searches memory and docs for relevant context before starting.
 **"My agent doesn't read the project docs before making decisions."**
-`ca knowledge "auth flow"` runs hybrid search (vector + keyword) over your indexed docs. Agents query it automatically during planning -- ADRs, specs, and standards surface before code gets written.
+`ca knowledge "auth flow"` runs hybrid search (vector + keyword) over your indexed docs. Agents query it automatically during planning — ADRs, specs, and standards surface before code gets written.
+**"I want to hand a large system spec to the machine and walk away."**
+`/compound:architect` decomposes it into epics. `ca loop` processes them autonomously.
+## Levels of use
+### Level 1 — Memory only
+Two minutes to set up. Works in any session without changing your existing workflow.
+```bash
+# Capture a mistake or preference
+ca learn "Always use Polars, not pandas in this project" --severity high
+ca learn "Auth 401 fix: add X-Request-ID header" --type solution
+# Search manually anytime
+ca search "polars"
+# Or let hooks surface it automatically — no command needed
+```
+### Level 2 — Structured workflow
+One command runs all five phases on a single feature: spec-dev, plan, work (TDD + agent team), review (24 agents), and compound (capture lessons).
+```bash
+/compound:cook-it "Add rate limiting to the API"
+```
+Run phases individually when you want more control:
+```bash
+/compound:spec-dev "Add rate limiting"    # Socratic dialogue → EARS spec → Mermaid diagrams
+/compound:plan                            # Tasks enriched by memory search
+/compound:work                            # TDD with agent team
+/compound:review                          # 24 parallel agents with severity gates
+/compound:compound                        # Capture what was learned
+```
+### Level 3 — Factory mode
+For systems too large for a single feature cycle. `/compound:architect` decomposes the system; `ca loop` processes the resulting epics autonomously.
+```bash
+# Step 1: decompose the system into epics
+/compound:architect "Multi-tenant SaaS: auth, billing, API, admin dashboard"
+# → Socratic dialogue → system-level EARS spec → DDD decomposition
+# → N epics with dependency graph, interface contracts, and scope boundaries
+# Step 2: generate and run the loop
+ca loop --reviewers claude-sonnet --review-every 3
+./infinity-loop.sh
+# → Processes each epic in dependency order: spec-dev → plan → work → review → compound
+# → Captures lessons after every cycle, improving subsequent cycles
+```
+## The infinity loop
+`ca loop` generates a bash script that processes your beads epics sequentially, running the full cook-it cycle on each one. No human intervention required between epics.
+```bash
+# Generate script for all ready epics
+ca loop
+# With periodic review every 3 epics
+ca loop --reviewers claude-sonnet --review-every 3
+# Target specific epics
+ca loop --epics beads-abc beads-def beads-ghi --max-retries 2
+# Run it
+./infinity-loop.sh
+```
+The loop respects beads dependency graphs — it only processes epics whose dependencies are complete. If an epic fails after `--max-retries` attempts, it stops and reports before proceeding.
+**Current maturity**: the loop works and has been used to ship real projects, including compound-agent itself. Two things still required human involvement: specifications had to be written before the loop started, and a human applied fixes after the first review pass surfaced real problems (missing error handling, a migration gap, insufficient test coverage). Fully unattended long-duration runs across many epics are the current area of hardening.
+## The improvement loop
+`ca improve` generates a bash script that iterates over `improve/*.md` program files, spawning Claude Code sessions to make focused improvements. Each program file defines what to improve, how to find work, and how to validate changes.
+```bash
+# Scaffold an example program file
+ca improve init
+# Creates improve/example.md with a linting template
+# Generate the improvement script
+ca improve
+# Filter to specific topics
+ca improve --topics lint tests --max-iters 3
+# Preview without generating
+ca improve --dry-run
+# Run the generated script
+./improvement-loop.sh
+# Preview without executing Claude sessions
+IMPROVE_DRY_RUN=1 ./improvement-loop.sh
+```
+Each iteration makes one focused improvement, commits it, and moves on. If an iteration finds nothing to improve or fails validation, it reverts cleanly and moves to the next topic. The loop tracks consecutive no-improvement results and stops early to avoid diminishing returns.
+Monitor progress with `ca watch --improve` to see live trace output from improvement sessions.
+## Automatic hooks
+Once installed, seven Claude Code hooks fire without any commands:
+| Hook | When it fires | What it does |
+|------|--------------|--------------|
+| `SessionStart` | Every new session | Loads high-severity lessons into context before you type anything |
+| `PreCompact` | Before context compression | Saves phase state so cook-it survives compaction |
+| `UserPromptSubmit` | Every prompt | Injects relevant memory items into context |
+| `PreToolUse` | During cook-it | Enforces phase gates — prevents jumping ahead |
+| `PostToolUse` | After tool success | Clears failure tracking state |
+| `PostToolUseFailure` | After tool failure | Tracks failures; suggests memory search after repeated errors |
+| `Stop` | Session end | Audits session for uncaptured lessons and unclosed issues |
+No configuration needed. `npx ca setup` wires them into your `.claude/settings.json`.
+## `/compound:architect`
+AI agents work best on well-scoped problems. When a task exceeds what fits comfortably in one context window, quality degrades — not from lack of capability but from too many competing concerns pulling in different directions.
+`/compound:architect` addresses this before the cook-it cycle begins. It takes a large system description and produces cook-it-ready epics via a structured 4-phase process:
+1. **Socratic** — builds a domain glossary and discovery mindmap; classifies decisions by reversibility
+2. **Spec** — produces system-level EARS requirements, C4 architecture diagrams, and a scenario table
+3. **Decompose** — runs 6 parallel subagents (bounded context mapping, dependency analysis, scope sizing, interface design, STPA hazard analysis, structural-semantic gap analysis) then synthesises into a proposed epic structure
+4. **Materialise** — creates beads epics with scope boundaries, interface contracts, and wired dependencies
+Three human approval gates separate the phases. Each output epic is sized for one cook-it cycle and includes an EARS subset for traceability back to the system spec.
+```bash
+/compound:architect "Build a data pipeline: ingestion, transformation, storage, and API layer"
+```
 ## Installation
@@ -104,36 +271,6 @@ If you prefer to configure manually, add to your `package.json`:
 Then run `pnpm install`.
-## Quick Start
-The five-phase workflow:
-```
-1. /compound:spec-dev    -->  Develop precise specifications
-2. /compound:plan        -->  Create tasks enriched by memory search
-3. /compound:work        -->  Execute with agent teams + TDD
-4. /compound:review      -->  Multi-agent review with inter-communication
-5. /compound:compound    -->  Capture what was learned into memory
-```
-Or run all phases sequentially:
-```
-/compound:cook-it "Add auth to API"
-```
-Additional commands:
-```
-/compound:learn-that       -->  Capture a lesson from conversation context
-/compound:check-that       -->  Search lessons and apply to current work
-/compound:get-a-phd        -->  Deep research to build agent knowledge
-/compound:agentic-audit    -->  Score codebase against agentic manifesto
-/compound:agentic-setup    -->  Audit then set up agentic infrastructure
-```
-Each phase searches memory for relevant past knowledge and injects it into agent context. The compound phase captures new knowledge, closing the loop.
 ## CLI Reference
 The CLI binary is `ca` (alias: `compound-agent`).
@@ -195,6 +332,17 @@ The CLI binary is `ca` (alias: `compound-agent`).
 | `ca loop --max-review-cycles <n>` | Max review/fix iterations (default: 3) |
 | `ca loop --review-blocking` | Fail loop if review not approved after max cycles |
 | `ca loop --review-model <model>` | Model for implementer fix sessions (default: claude-opus-4-6) |
+| `ca improve` | Generate improvement loop script from `improve/*.md` programs |
+| `ca improve --topics <names...>` | Run only specific topics |
+| `ca improve --max-iters <n>` | Max iterations per topic (default: 5) |
+| `ca improve --time-budget <seconds>` | Total time budget, 0=unlimited (default: 0) |
+| `ca improve --dry-run` | Validate and print plan without generating |
+| `ca improve --force` | Overwrite existing script |
+| `ca improve init` | Scaffold an example `improve/*.md` program file |
+| `ca watch` | Tail and pretty-print live trace from loop sessions |
+| `ca watch --epic <id>` | Watch a specific epic trace |
+| `ca watch --improve` | Watch improvement loop traces |
+| `ca watch --no-follow` | Print existing trace and exit (no live tail) |
 ### Knowledge
@@ -210,7 +358,7 @@ The CLI binary is `ca` (alias: `compound-agent`).
 | `ca setup` | One-shot setup (hooks + git pre-commit + model) |
 | `ca setup --skip-model` | Setup without model download |
 | `ca setup --uninstall` | Remove all generated files |
-| `ca setup --update` | Regenerate files (preserves user customizations) |
+| `ca setup --update` | Regenerate files (preserves user customisations) |
 | `ca setup --status` | Show installation status |
 | `ca setup --dry-run` | Show what would change without changing |
 | `ca setup claude --status` | Check Claude Code integration health |
@@ -243,7 +391,7 @@ confirmation_boost: confirmed=1.3, unconfirmed=1.0
 ## FAQ
 **Q: How is this different from mem0?**
-A: mem0 is a cloud memory layer for general AI agents. Compound Agent is local-first with git-tracked storage and local embeddings -- no API keys or cloud services needed. It also goes beyond memory with structured workflows, multi-agent review, and issue tracking.
+A: mem0 is a cloud memory layer for general AI agents. Compound Agent is local-first with git-tracked storage and local embeddings — no API keys or cloud services needed. It also goes beyond memory with structured workflows, multi-agent review, and issue tracking.
 **Q: Does this work offline?**
 A: Yes, completely. Embeddings run locally via node-llama-cpp. No network requests after the initial model download.
@@ -257,6 +405,9 @@ A: The CLI (`ca`) works standalone with any tool. Full hook integration is avail
 **Q: What happens if the embedding model isn't available?**
 A: Search gracefully falls back to keyword-only mode. Other commands that require embeddings will tell you what's missing. Run `npx ca doctor` to diagnose issues.
+**Q: Is the loop production-ready?**
+A: The loop works and has been used to ship real projects, including compound-agent itself. Long-duration autonomous runs across many epics are the current area of hardening. For 3–5 epic sequences, it is reliable today.
 ## Development
 ```bash
@@ -297,13 +448,15 @@ pnpm lint             # Type check + ESLint
 | [CHANGELOG.md](https://github.com/Nathandela/compound-agent/blob/main/CHANGELOG.md) | Version history |
 | [AGENTS.md](https://github.com/Nathandela/compound-agent/blob/main/AGENTS.md) | Agent workflow instructions |
+The most direct way to explore the system is to open this repository with an AI agent and ask it to walk you through the design — the project is structured precisely for that.
 ## Acknowledgments
 Compound Agent builds on ideas and patterns from these projects:
 | Project | Influence |
 |---------|-----------|
-| [Compound Engineering Plugin](https://github.com/EveryInc/compound-engineering-plugin) | The "compound" philosophy -- each unit of work makes subsequent units easier. Multi-agent review workflows and skills as encoded knowledge. |
+| [Compound Engineering Plugin](https://github.com/EveryInc/compound-engineering-plugin) | The "compound" philosophy — each unit of work makes subsequent units easier. Multi-agent review workflows and skills as encoded knowledge. |
 | [Beads](https://github.com/steveyegge/beads) | Git-backed JSONL + SQLite hybrid storage model, hash-based conflict-free IDs, dependency graphs |
 | [OpenClaw](https://github.com/openclaw/openclaw) | Claude Code integration patterns and hook-based workflow architecture |
@@ -311,10 +464,10 @@ Also informed by research into [Reflexion](https://arxiv.org/abs/2303.11366) (ve
 ## Contributing
-Bug reports and feature requests are welcome via [Issues](https://github.com/Nathandela/compound-agent/issues). Pull requests are not accepted at this time -- see [CONTRIBUTING.md](CONTRIBUTING.md) for details.
+Bug reports and feature requests are welcome via [Issues](https://github.com/Nathandela/compound-agent/issues). Pull requests are not accepted at this time — see [CONTRIBUTING.md](CONTRIBUTING.md) for details.
 ## License
-MIT -- see [LICENSE](LICENSE) for details.
+MIT — see [LICENSE](LICENSE) for details.
 > The embedding model (EmbeddingGemma-300M) is downloaded on-demand and subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms). See [THIRD-PARTY-LICENSES.md](THIRD-PARTY-LICENSES.md) for full dependency license information.
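
Reviewer note on the improvement loop described in the README diff above: the text says a failed or empty iteration reverts and moves to the next topic, and that the loop stops early after consecutive no-improvement results. The sketch below shows one way that control flow could be arranged. It is an illustration under stated assumptions, not the generated `improvement-loop.sh`: `try_improve`, `run_loop`, and `MAX_NO_IMPROVE` are hypothetical names, the threshold of 2 is invented, and the stub replaces the real work of spawning a session and validating its changes.

```bash
#!/usr/bin/env bash
MAX_ITERS=5         # mirrors the documented --max-iters default
MAX_NO_IMPROVE=2    # hypothetical threshold; the real value is not stated in the diff

# Hypothetical stand-in for one improvement attempt plus validation.
# Here it succeeds only for the "lint" topic on its first two iterations.
try_improve() {
  [ "$1" = "lint" ] && [ "$2" -le 2 ]
}

run_loop() {
  local topic iter stale=0
  for topic in "$@"; do
    iter=0
    while [ "$iter" -lt "$MAX_ITERS" ]; do
      iter=$((iter + 1))
      if try_improve "$topic" "$iter"; then
        stale=0                               # an improvement resets the streak
        echo "improved $topic (iteration $iter)"
      else
        stale=$((stale + 1))
        echo "no improvement for $topic, reverting"
        break                                 # move on to the next topic
      fi
    done
    if [ "$stale" -ge "$MAX_NO_IMPROVE" ]; then
      echo "stopping early after $stale consecutive no-improvement results"
      return 0
    fi
  done
}
```

With the stub above, `run_loop lint tests docs` records two improvements for `lint`, then one miss each for `lint` and `tests`, and stops before `docs` is attempted. Counting the streak across topics rather than per topic is a design guess; the README wording is compatible with either reading.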