beth-copilot 2.0.0 → 2.1.0

package/CHANGELOG.md CHANGED
@@ -8,6 +8,18 @@ All notable changes to Beth are documented here. Format based on [Keep a Changel
 
  ## [Unreleased]
 
+ ## [2.1.0] - 2026-03-16
+
+ ### Added
+ - **`npx beth-copilot uninstall` command** — Cleanly removes all Beth-installed files from a project: `.github/agents/`, `.github/skills/`, `.github/hooks/`, `AGENTS.md`, `Backlog.md`, `.github/copilot-instructions.md`, `.vscode/settings.json`, `mcp.json.example`, and `backlog/` directory. Removes Beth guard block from pre-push hook (preserving non-Beth content). Cleans up empty `.github/` and `.vscode/` directories. 17 tests covering all removal paths.
+ - **Auto-derived backlog prefix** — `backlog init` during `beth-copilot init` now automatically derives a 6-letter prefix from the project name (e.g., `my-app` → `MYAPP`), eliminating the interactive prompt that blocked agent workflows.
+
+ ### Fixed
+ - **Shell command injection in backlog init** — Fixed GHAS-flagged command injection vulnerability where unsanitized project directory names were interpolated into shell commands. Now validates input against a strict allowlist pattern before use.
+
+ ### Changed
+ - **885 tests** — Up from 860 in v2.0.0. Added uninstall command tests and init prefix derivation coverage.
+
  ## [2.0.0] - 2026-03-16
 
  ### Breaking Changes
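The 2.1.0 entries above describe two related behaviours: a backlog prefix derived from the project name, and validation of the project directory name against an allowlist before it is used in a command. The sketch below shows one way these could fit together. It is not taken from the beth-copilot source: the function names, the exact pattern, and the 64-character limit are assumptions, and the six-character figure is read here as an upper bound because the changelog's own example (`my-app` → `MYAPP`) yields five characters.

```typescript
// Hypothetical helpers for illustration; names and pattern are assumptions,
// not the beth-copilot implementation.

// Allowlist: first character is a letter or digit, the rest may also
// include '.', '_' and '-'. Anything else (spaces, ';', '$', quotes) is rejected.
const SAFE_PROJECT_NAME = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

function isSafeProjectName(name: string): boolean {
  return name.length <= 64 && SAFE_PROJECT_NAME.test(name);
}

// Validate first, then keep only letters and digits, uppercase them,
// and cap the result at maxLength characters.
function deriveBacklogPrefix(projectName: string, maxLength = 6): string {
  if (!isSafeProjectName(projectName)) {
    throw new Error(`Unsafe project name: ${JSON.stringify(projectName)}`);
  }
  return projectName
    .replace(/[^A-Za-z0-9]/g, "")
    .toUpperCase()
    .slice(0, maxLength);
}
```

Under these assumptions, `deriveBacklogPrefix("my-app")` returns `MYAPP`, and a name such as `app; rm -rf /` is rejected before any command is built. Passing the validated value as a separate argument to Node's `child_process.execFile`, rather than interpolating it into a shell string, is a common complementary measure; the changelog does not say whether the package does this.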
package/README.md CHANGED
@@ -21,12 +21,12 @@ She commands seven specialized agents, each with their own expertise, tools, and
  | Layer | What It Does | Status |
  |-------|-------------|--------|
  | **Copilot Agents** | `.agent.md` definitions running in VS Code Agent Mode | Live |
- | **CLI Toolchain** | `beth init`, `beth doctor`, `beth close`, `beth land` — TypeScript commands | Live |
+ | **CLI Toolchain** | `beth init`, `beth doctor`, `beth land`, `beth update` — TypeScript commands | Live |
  | **Orchestration Engine** | Fan-out routing, tool calling loop, subagent spawning, handoffs | Live |
- | **Tool Abstraction** | 6 CLI tools + MCP bridge uniform interface for all agent capabilities | Live |
+ | **Agent Tools** | Copilot built-ins (codebase, readFile, editFiles, runSubagent) + optional MCP servers | Live |
  | **LLM Provider** | Azure OpenAI with Entra ID auth, streaming, retry, tool calling | Live |
 
- **478 tests.** 477 pass, 1 skip, 0 fail.
+ **860 tests.** All passing.
 
  ---
 
@@ -55,7 +55,7 @@ flowchart LR
  | **LLM Provider** | Azure OpenAI via `openai` SDK | Entra ID auth (no API keys), streaming + tool calling |
  | **Auth** | `@azure/identity` DefaultAzureCredential | az login, managed identity, VS Code creds |
  | **Frontmatter** | `gray-matter` | Parses `.agent.md` and `SKILL.md` YAML |
- | **Testing** | vitest + Node.js test runner | 478 tests — unit, integration, E2E |
+ | **Testing** | vitest | 860 tests — unit, integration, E2E |
  | **Task Tracking** | Backlog.md (`backlog` CLI) | Markdown-based task tracking for agents and humans |
  | **Package Manager** | npm | Lockfile committed |
 
@@ -97,9 +97,10 @@ For detailed setup (prerequisites, task tracking, MCP servers): [docs/INSTALLATI
  | `beth doctor` | Validate Node.js ≥18, agents frontmatter, skills |
  | `beth quickstart` | Run init + doctor in one shot |
  | `beth land` | Automate session completion: tests, commit, push, verify sync |
+ | `beth update` | Update project files to latest templates without full re-init |
  | `beth help` | Show all commands and options |
 
- **Flags:** `--force`, `--skip-backlog`, `--skip-mcp`, `--verbose`, `--skip-tests`, `--message/-m`, `--dry-run`
+ **Flags:** `--force`, `--skip-backlog`, `--skip-mcp`, `--verbose`, `--skip-tests`, `--message/-m`, `--dry-run`, `--check-only`
 
  ---
 
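The hunk above adds `beth update` together with a `--check-only` flag, and elsewhere the README associates the command with template diffing. The diff does not show how either is implemented, so the following is only a sketch of one plausible reading: compare template files against project files, report what would change, and write nothing when check-only is set. The types, function names, and the in-memory file maps are all assumptions made for illustration.

```typescript
// Hypothetical model of a template update; not the beth-copilot implementation.
// Files are modelled as plain objects mapping relative path -> contents.
type FileMap = Record<string, string>;

interface UpdatePlan {
  added: string[];     // present in templates, missing from the project
  changed: string[];   // present in both, contents differ
  unchanged: string[]; // present in both, contents identical
}

// Classify every template file against the current project state.
function planUpdate(templates: FileMap, project: FileMap): UpdatePlan {
  const plan: UpdatePlan = { added: [], changed: [], unchanged: [] };
  for (const path of Object.keys(templates)) {
    if (!(path in project)) {
      plan.added.push(path);
    } else if (project[path] !== templates[path]) {
      plan.changed.push(path);
    } else {
      plan.unchanged.push(path);
    }
  }
  return plan;
}

// With checkOnly set, only the plan is returned and the project is untouched.
// Otherwise added and changed files are copied from the templates.
function applyUpdate(templates: FileMap, project: FileMap, checkOnly: boolean): UpdatePlan {
  const plan = planUpdate(templates, project);
  if (!checkOnly) {
    for (const path of plan.added.concat(plan.changed)) {
      project[path] = templates[path];
    }
  }
  return plan;
}
```

Separating the planning step from the write step keeps the check-only path free of side effects, which is the property a flag of that name would be expected to guarantee.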
@@ -234,7 +235,7 @@ Skills are domain-knowledge modules that agents load automatically when trigger
  | **React/Next.js Best Practices** | React performance, Next.js patterns | Developer |
  | **shadcn/ui** | "shadcn", "ui component" | Developer |
  | **Security Analysis** | "security review", "OWASP", "threat model" | Security Reviewer |
- | **Azure Operations** | Azure resource management | Developer |
+ | **Azure Operations** | Azure resource management (27+ Azure skills) | Developer |
  | **Web Search** | Internet research via Brave | Researcher |
 
  ### Design & UI Skills
@@ -287,19 +288,21 @@ flowchart LR
 
  ---
 
- ## Tool Abstraction Layer
+ ## Agent Tools
 
- A uniform interface for all agent capabilities file I/O, terminal, search, task tracking, subagent spawning, and MCP server tools. Tools expose OpenAI-compatible function calling schemas so the LLM can invoke them directly.
+ Beth's agents leverage VS Code Copilot's built-in tools alongside task tracking through the `backlog` CLI. The orchestration layer delegates to these capabilities:
 
- | Tool | What It Does | Key Features |
- |------|-------------|-------------- |
- | **readFile** | Read file contents | Line ranges, path validation, traversal guards |
- | **editFile** | Atomic string replacement | Single-match enforcement, whitespace-safe |
- | **search** | Ripgrep search | Node.js fallback, regex support, file filtering |
- | **terminal** | Execute shell commands | `execFile('/bin/sh')` — no shell injection, timeouts |
- | **backlog** | Task tracking | `backlog task create`, `backlog board`, `backlog task edit` via CLI |
- | **subagent** | Spawn nested agents | Returns structured result for orchestrator to process |
- | **MCP Bridge** | External tool servers | JSON-RPC 2.0 over stdio, JSONC config, namespaced tools |
+ | Tool | What It Does |
+ |------|-------------|
+ | **codebase** | Semantic code search across the workspace |
+ | **readFile** | Read file contents with line ranges |
+ | **editFiles** | Atomic file modifications |
+ | **runInTerminal** | Shell command execution |
+ | **runSubagent** | Spawn specialist agents autonomously |
+ | **backlog CLI** | `backlog task create`, `backlog board`, `backlog task edit` for tracking |
+ | **MCP servers** | Optional external tools (shadcn, Playwright, Azure, Brave Search) |
+
+ ### Public API
 
  ```typescript
  import { loadAgents, loadSkills, getInferableAgents, buildTriggerMap } from 'beth-copilot';
@@ -329,112 +332,105 @@ flowchart LR
  CLI["beth"] --> Init["init"]
  CLI --> Doctor["doctor"]
  CLI --> QS["quickstart"]
+ CLI --> Land["land"]
+ CLI --> Update["update"]
  Init --> Templates[".agent.md · SKILL.md · settings"]
  Doctor --> Checks["Node ≥18 · agents · skills"]
  QS --> Init & Doctor
+ Update --> Diff["Template diffing"]
  ```
 
  **Commands:**
  - `beth init` — Scaffold agents, skills, VS Code settings, Backlog.md tracking
  - `beth doctor` — Validate Node.js, agent frontmatter, skill directories
  - `beth quickstart` — Run init + doctor in one shot
+ - `beth land` — Automated session completion: tests, commit, push, verify sync
+ - `beth update` — Update project files to latest templates (supports `--check-only`)
 
  ---
 
  ## TypeScript Core
 
- The engine that powers everything. Parses agent and skill definitions, manages conversations, routes requests, executes tools, and provides typed APIs for the full agentic loop.
+ The engine that powers Beth. Parses agent and skill definitions, provides typed APIs for the agentic loop, and drives the CLI toolchain.
 
  ### Project Structure
 
  ```
  beth/
  ├── bin/
- │ └── cli.js # CLI entry point (init, doctor, quickstart, help)
+ │ └── cli.js # CLI entry point (init, doctor, quickstart, land, update, help)
  ├── src/
  │ ├── index.ts # Barrel exports (all public API)
  │ ├── cli/commands/
  │ │ ├── doctor.ts # System health validation
- │ │ └── quickstart.ts # Guided setup flow
+ │ │ ├── land.ts # Automated session completion
+ │ │ ├── pre-push-guard.ts # Branch discipline enforcement
+ │ │ ├── quickstart.ts # Guided setup flow
+ │ │ └── update.ts # Template update diffing
  │ ├── core/
- │ │ ├── orchestrator.ts # Agentic loop: route → LLM → tools → response
- │ │ ├── router.ts # @mention routing, skill matching, agent lookup
- │ │ ├── context.ts # Conversation state, token truncation, skill injection
- │ │ ├── handoffs.ts # Agent handoff transfers, loop detection
  │ │ ├── agents/
  │ │ │ ├── types.ts # AgentDefinition, AgentFrontmatter, AgentHandoff
  │ │ │ └── loader.ts # Parse .agent.md → typed definitions
  │ │ └── skills/
  │ │ ├── types.ts # SkillDefinition, TriggerMap
  │ │ └── loader.ts # Parse SKILL.md, extract triggers, match queries
- ├── lib/
- └── pathValidation.ts # Traversal/injection guards
- │ ├── tools/
- │ │ ├── interface.ts # Tool interface + toToolDefinition()
- │ │ ├── types.ts # ToolError, ToolResult, ToolContext, ToolPermissions
- │ │ ├── registry.ts # ToolRegistry: register, get, list, getDefinitions
- │ │ ├── cli/
- │ │ │ ├── readFile.ts # File reading with line ranges
- │ │ │ ├── editFile.ts # Atomic string replacement
- │ │ │ ├── search.ts # Ripgrep with Node.js fallback
- │ │ │ ├── terminal.ts # Secure command execution
- │ │ │ ├── backlog.ts # Task tracking via backlog CLI
- │ │ │ └── subagent.ts # Agent spawning interface
- │ │ └── mcp/
- │ │ ├── client.ts # JSON-RPC 2.0 over stdio
- │ │ └── bridge.ts # JSONC config, tool namespacing
- │ └── providers/
- │ ├── interface.ts # LLMProviderBase abstract class
- │ ├── azure.ts # AzureOpenAIProvider (Entra ID, streaming, tools)
- │ ├── types.ts # 17 types: ChatMessage, ToolCall, LLMError, etc.
- │ ├── retry.ts # Exponential backoff with jitter
- │ ├── config.ts # Environment + dotfile config loader
- │ └── streaming.ts # StreamAccumulator, collectStream, mapStream
+ └── lib/
+ └── pathValidation.ts # Traversal/injection guards
  ├── templates/
  │ └── .github/
  │ ├── agents/ # 7 agent definitions (.agent.md)
- │ └── skills/ # 8 skill modules (SKILL.md)
+ │ └── skills/ # 6 core skill modules (SKILL.md)
  └── docs/
  ├── INSTALLATION.md
  ├── MCP-SETUP.md
  ├── CLI-ARCHITECTURE.md
- └── SYSTEM-FLOW.md
+ ├── SYSTEM-FLOW.md
+ ├── HOOKS-AND-HANDOFF-ENFORCEMENT.md
+ ├── E2E-SKILL-TESTS.md
+ ├── PR-REVIEW-PROCESS.md
+ └── SWARM-ARCHITECTURE.md
  ```
 
  ### Test Coverage
 
- **814 tests** (813 pass, 1 skip, 0 fail):
+ **860 tests** (860 pass, 0 fail):
 
  | Suite | Tests | What It Covers |
  |-------|-------|---------------|
- | **Orchestration** | | |
- | Orchestrator | 30+ | Agentic loop, tool calling, subagent spawning, iteration limits |
- | AgentRouter | 30+ | @mention routing, skill matching, agent resolution |
- | ConversationContext | 30+ | Token truncation, skill injection, tool call repair |
- | HandoffManager | 30+ | Context transfer, depth limits, ping-pong detection |
- | **Tools** | | |
- | Tool interface | 20+ | Tool ToolDefinition conversion, schema validation |
- | ToolRegistry | 20+ | Register, get, list, definitions, duplicate detection |
- | readFile | 30+ | Line ranges, path validation, encoding |
- | editFile | 30+ | String replacement, single-match enforcement |
- | search | 30+ | Ripgrep, Node.js fallback, regex, file filtering |
- | terminal | 30+ | Command execution, timeouts, output capture |
- | backlog | 30+ | Backlog.md CLI wrapper, task tracking |
- | subagent | 30+ | Spawn interface, result marking, agent validation |
- | MCP client | 30+ | JSON-RPC 2.0, protocol handshake, tool listing |
- | MCP bridge | 30+ | JSONC parsing, tool namespacing, error handling |
- | Tool suite | 10+ | createDefaultRegistry, integration tests |
- | **Providers** | | |
- | Provider types | 40+ | LLMError codes, ChatMessage shapes, ToolDefinition schemas |
- | Provider retry | 40+ | Exponential backoff, jitter, transient error detection |
- | Provider config | 30+ | Env precedence, dotenv parsing, URL validation |
- | Provider streaming | 40+ | Chunk accumulation, tool call delta assembly |
- | Provider Azure | 30+ | Message mapping, response mapping, error wrapping |
- | **Core & CLI** | | |
- | Agent loader | 30+ | Frontmatter parsing, validation, code fence stripping, handoffs |
- | Skill loader | 30+ | Trigger extraction, query matching, trigger map building |
- | CLI E2E | 52 | Init/doctor pipeline, MCP template validation, help output |
- | Path validation | 33 | Traversal detection, injection prevention, allowlists |
+ | **Skill Routing** | | |
+ | Hook injection | 51 | Deterministic skill injection via SubagentStart hook |
+ | Skill routing | 223 | Agent skill mapping, trigger phrase matching |
+ | Trigger coverage | 147 | All trigger phrases resolve to correct skills |
+ | Disambiguation | 28 | Overlapping trigger phrase resolution |
+ | Mapping completeness | 12 | Every agent has required skills mapped |
+ | Pipeline integration | 41 | End-to-end skill loading through full pipeline |
+ | Inject-skills hook | 20 | `inject-skills.mjs` unit tests |
+ | Verify-skills hook | 9 | `verify-skills.mjs` compliance gate |
+ | Smoke tests | 7 | Package exports, barrel imports |
+ | **Core** | | |
+ | Agent loader | 13 | `.agent.md` parsing, validation, code fence stripping |
+ | Agent frontmatter | 32 | YAML frontmatter extraction, required fields |
+ | Agent handoffs | 18 | Handoff chain validation, escalation patterns |
+ | Agent tools | 25 | Tool declarations, permission schemas |
+ | Agent types | 13 | Type definitions, discriminated unions |
+ | Agent suite | 18 | Integration: load all 7 agents, validate consistency |
+ | Skill loader | 20 | SKILL.md parsing, trigger extraction, query matching |
+ | Path validation | 26 | Traversal detection, injection prevention, allowlists |
+ | **CLI** | | |
+ | Init | 24 | File scaffolding, template copying, idempotency |
+ | Doctor | 15 | Node.js version, agent validation, skill checks |
+ | Land | 62 | Test commit → push pipeline, branch discipline |
+ | Pre-push guard | 46 | Branch protection, main/master blocking |
+ | Quickstart | 10 | Init + Doctor combined flow |
+ | **CLI E2E** | | |
+ | Init logic | 20 | End-to-end init with real filesystem |
+ | Doctor | 21 | Health checks against real project structure |
+ | Pipeline | 14 | Init → Doctor pipeline validation |
+ | Help | 24 | Help output format, command listing |
+ | MCP | 13 | MCP template validation and copying |
+ | Edge cases | 13 | Flag combinations, error scenarios |
+ | Pre-push guard | 11 | Git hook integration with temp repos |
+ | Quickstart expanded | 11 | Full quickstart flow E2E |
 
  ---
 
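The test table above lists a pre-push guard covering branch protection and main/master blocking, but the diff does not include its source. As a rough illustration of how such a guard can work, the sketch below parses the lines Git feeds to a `pre-push` hook on standard input (`<local ref> <local sha> <remote ref> <remote sha>`, one per ref being pushed) and reports any that target a protected branch. The function name and the choice to block by remote ref are assumptions; the actual `pre-push-guard.ts` may apply a different rule.

```typescript
// Hypothetical guard logic; not the beth-copilot implementation.
// Remote refs that a push is not allowed to update.
const PROTECTED_REFS = ["refs/heads/main", "refs/heads/master"];

// Takes the raw stdin of a Git pre-push hook and returns every protected
// remote ref that the push would update. An empty result means "allow".
function findBlockedPushes(hookStdin: string): string[] {
  const blocked: string[] = [];
  for (const line of hookStdin.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 4) continue; // skip blank or malformed lines
    const remoteRef = fields[2];     // third field is the remote ref
    if (PROTECTED_REFS.indexOf(remoteRef) !== -1) {
      blocked.push(remoteRef);
    }
  }
  return blocked;
}
```

A hook script built on this would read standard input, call `findBlockedPushes`, and exit with a non-zero status when the result is non-empty, since Git aborts the push if the `pre-push` hook fails.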
@@ -535,6 +531,10 @@ See [MCP Integrations](#mcp-integrations) above or [docs/MCP-SETUP.md](docs/MCP-
  | [MCP Setup](docs/MCP-SETUP.md) | Optional server integrations |
  | [CLI Architecture](docs/CLI-ARCHITECTURE.md) | Dual-interface design, implementation phases |
  | [System Flow](docs/SYSTEM-FLOW.md) | Agent orchestration diagrams |
+ | [Hooks & Handoffs](docs/HOOKS-AND-HANDOFF-ENFORCEMENT.md) | Skill injection hooks, hub-and-spoke enforcement |
+ | [E2E Skill Tests](docs/E2E-SKILL-TESTS.md) | Behavioral skill routing test plan |
+ | [PR Review Process](docs/PR-REVIEW-PROCESS.md) | Code review checklist and workflow |
+ | [Swarm Architecture](docs/SWARM-ARCHITECTURE.md) | Multi-agent swarm design (planned) |
  | [Contributing Guide](CONTRIBUTING.md) | How to contribute (PR process, review checklist) |
  | [Changelog](CHANGELOG.md) | Version history |
  | [Security Policy](SECURITY.md) | Vulnerability reporting |