npm - @plaited/acp-harness - Versions diffs - 0.2.6 → 0.3.1 - Mend

@plaited/acp-harness 0.2.6 → 0.3.1

This diff shows the changes between publicly released versions of the package as they appear in their public registries. It is provided for informational purposes only.

Files changed (57)

package/LICENSE +1 -1
package/README.md +120 -16
package/bin/cli.ts +105 -636
package/bin/tests/cli.spec.ts +218 -51
package/package.json +20 -4
package/src/acp-client.ts +5 -4
package/src/acp-transport.ts +14 -7
package/src/adapter-check.ts +542 -0
package/src/adapter-scaffold.ts +934 -0
package/src/balance.ts +232 -0
package/src/calibrate.ts +300 -0
package/src/capture.ts +457 -0
package/src/constants.ts +94 -0
package/src/grader-loader.ts +174 -0
package/src/harness.ts +35 -0
package/src/schemas-cli.ts +239 -0
package/src/schemas.ts +567 -0
package/src/summarize.ts +245 -0
package/src/tests/adapter-check.spec.ts +70 -0
package/src/tests/adapter-scaffold.spec.ts +112 -0
package/src/tests/fixtures/grader-bad-module.ts +5 -0
package/src/tests/fixtures/grader-exec-fail.py +9 -0
package/src/tests/fixtures/grader-exec-invalid.py +6 -0
package/src/tests/fixtures/grader-exec.py +29 -0
package/src/tests/fixtures/grader-module.ts +14 -0
package/src/tests/grader-loader.spec.ts +153 -0
package/src/trials.ts +395 -0
package/src/validate-refs.ts +188 -0
package/.claude/rules/accuracy.md +0 -43
package/.claude/rules/bun-apis.md +0 -80
package/.claude/rules/code-review.md +0 -254
package/.claude/rules/git-workflow.md +0 -37
package/.claude/rules/github.md +0 -154
package/.claude/rules/testing.md +0 -172
package/.claude/skills/acp-harness/SKILL.md +0 -310
package/.claude/skills/acp-harness/assets/Dockerfile.acp +0 -25
package/.claude/skills/acp-harness/assets/docker-compose.acp.yml +0 -19
package/.claude/skills/acp-harness/references/downstream.md +0 -288
package/.claude/skills/acp-harness/references/output-formats.md +0 -221
package/.claude-plugin/marketplace.json +0 -15
package/.claude-plugin/plugin.json +0 -16
package/.github/CODEOWNERS +0 -6
package/.github/workflows/ci.yml +0 -63
package/.github/workflows/publish.yml +0 -146
package/.mcp.json +0 -20
package/CLAUDE.md +0 -92
package/Dockerfile.test +0 -23
package/biome.json +0 -96
package/bun.lock +0 -513
package/docker-compose.test.yml +0 -21
package/scripts/bun-test-wrapper.sh +0 -46
package/src/acp.constants.ts +0 -56
package/src/acp.schemas.ts +0 -161
package/src/acp.types.ts +0 -28
package/src/tests/fixtures/.claude/settings.local.json +0 -8
package/src/tests/fixtures/.claude/skills/greeting/SKILL.md +0 -17
package/tsconfig.json +0 -32

package/.claude/rules/code-review.md DELETED

@@ -1,254 +0,0 @@
-# Code Review Standards
-The following standards are not automatically enforced by Biome but should be checked during code review.
-## Automated Validation
-Before completing a code review, run these validation scripts:
-### Plugin Skills Validation
-When changes touch `.claude/skills/`, validate against the AgentSkills spec by running:
-```
-/validate-skill
-```
-This checks:
-- SKILL.md exists and has required frontmatter
-- Scripts have proper structure
-- References are valid markdown files
-## TypeScript Style Conventions
-### Prefer `type` Over `interface`
-Use type aliases instead of interfaces for better consistency and flexibility:
-```typescript
-// ✅ Good
-type User = {
-  name: string
-  email: string
-}
-// ❌ Avoid
-interface User {
-  name: string
-  email: string
-}
-```
-**Rationale:** Type aliases are more flexible (unions, intersections, mapped types) and provide consistent syntax across the codebase.
-### No `any` Types
-Always use proper types; use `unknown` if type is truly unknown and add type guards:
-```typescript
-// ✅ Good
-const process = (data: unknown) => {
-  if (typeof data === 'string') {
-    return data.toUpperCase()
-  }
-}
-// ❌ Avoid
-const process = (data: any) => {
-  return data.toUpperCase()
-}
-```
-### PascalCase for Types and Schemas
-All type names use PascalCase. Zod schema names use `PascalCaseSchema` suffix:
-```typescript
-// ✅ Good
-type UserConfig = { /* ... */ }
-const UserConfigSchema = z.object({ /* ... */ })
-// ❌ Avoid
-type userConfig = { /* ... */ }
-const zUserConfig = z.object({ /* ... */ })  // Don't use z-prefix
-```
-### Arrow Functions Preferred
-```typescript
-// ✅ Good
-const greet = (name: string) => `Hello, ${name}!`
-// ❌ Avoid
-function greet(name: string) {
-  return `Hello, ${name}!`
-}
-```
-### Object Parameter Pattern
-For functions with more than two parameters, use a single object parameter:
-```typescript
-// ✅ Good: Object parameter pattern
-const createClient = ({
-  command,
-  timeout,
-  cwd,
-}: {
-  command: string[]
-  timeout: number
-  cwd?: string
-}): ACPClient => { /* ... */ }
-// ❌ Avoid: Multiple positional parameters
-const createClient = (
-  command: string[],
-  timeout: number,
-  cwd?: string
-): ACPClient => { /* ... */ }
-```
-## TypeScript Comment Directives
-### `@ts-ignore` Requires Description
-- When using `@ts-ignore`, always include a description explaining why the type error is being suppressed
-- This helps future maintainers understand the reasoning
-- Example:
-  ```typescript
-  // @ts-ignore - TypeScript incorrectly infers this as string when it's actually a number from the API
-  const value = response.data
-  ```
-**Rationale:** Lost TypeScript-ESLint rule `ban-ts-comment` with `allow-with-description` option during Biome migration
-## Expression Statements
-### Allow Short-Circuit and Ternary Expressions
-- Short-circuit evaluation is acceptable: `condition && doSomething()`
-- Ternary expressions are acceptable: `condition ? doThis() : doThat()`
-- These patterns are idiomatic and improve code readability
-**Rationale:** Lost TypeScript-ESLint rule `no-unused-expressions` with `allowShortCircuit` and `allowTernary` options during Biome migration
-## Empty Object Types
-### Allow Empty Object Types When Extending a Single Interface
-- Empty object types are acceptable when they extend exactly one other interface
-- This pattern is useful for creating branded types or extending third-party interfaces
-- Example:
-  ```typescript
-  // ✅ Acceptable: Extends single interface
-  interface CustomElement extends HTMLElement {}
-  // ❌ Avoid: Empty with no extends or multiple extends
-  interface Empty {}
-  interface Multi extends Foo, Bar {}
-  ```
-**Rationale:** Lost TypeScript-ESLint rule `no-empty-object-type` with `allowInterfaces: 'with-single-extends'` option during Biome migration
-## Modern JavaScript Standards
-### Prefer Private Fields Over `private` Keyword
-Use JavaScript private fields (`#field`) instead of TypeScript's `private` keyword:
-```typescript
-// ✅ Good: JavaScript private fields (ES2022+)
-class EventBus {
-  #listeners = new Map<string, Set<Function>>()
-  #count = 0
-  #emit(event: string) {
-    this.#count++
-    this.#listeners.get(event)?.forEach(fn => fn())
-  }
-}
-// ❌ Avoid: TypeScript private keyword
-class EventBus {
-  private listeners = new Map<string, Set<Function>>()
-  private count = 0
-  private emit(event: string) {
-    this.count++
-    this.listeners.get(event)?.forEach(fn => fn())
-  }
-}
-```
-**Rationale:** JavaScript private fields are a runtime feature (ES2022) providing true encapsulation. TypeScript's `private` is erased at compile time and can be bypassed. Prefer platform standards over TypeScript-only features
-### JSON Import Attributes
-Use import attributes when importing JSON files. This is required because the tsconfig uses `verbatimModuleSyntax: true` without `resolveJsonModule`:
-```typescript
-// ✅ Good: Import attribute for JSON
-import { version } from '../package.json' with { type: 'json' }
-import config from './config.json' with { type: 'json' }
-// ❌ Avoid: Missing import attribute
-import { version } from '../package.json'
-import config from './config.json'
-```
-**Rationale:** Import attributes (ES2025) explicitly declare module types to the runtime. The `with { type: 'json' }` syntax is the standard (replacing the deprecated `assert` keyword). This provides runtime enforcement—if the file isn't valid JSON, the import fails.
-## Module Organization
-Organize module exports into separate files by category:
-```
-module/
-├── module.types.ts      # Type definitions only
-├── module.schemas.ts    # Zod schemas and schema-inferred types
-├── module.constants.ts  # Constants and enum-like objects
-├── module-client.ts     # Implementation
-└── index.ts             # Public API exports (barrel file)
-```
-### Key Principles
-- **Types file**: Contains only type definitions, does NOT re-export from sibling files
-- **Schemas file**: Contains Zod schemas and their inferred types, does NOT export constants
-- **Constants file**: Contains all constant values (error codes, method names, defaults)
-- **Barrel file**: Uses wildcard exports; non-public files are simply not included
-```typescript
-// ✅ Good: Direct imports from specific files
-import type { Config } from './module.types.ts'
-import { ConfigSchema } from './module.schemas.ts'
-import { METHODS, ERROR_CODES } from './module.constants.ts'
-// ❌ Avoid: Expecting types file to re-export everything
-import { Config, ConfigSchema, METHODS } from './module.types.ts'
-```
-**Rationale:** Prevents circular dependencies, makes dependencies explicit, improves tree-shaking.
-## Documentation Standards
-### Mermaid Diagrams Only
-Use [mermaid](https://mermaid.js.org/) syntax for all diagrams in markdown files:
-```markdown
-\```mermaid
-flowchart TD
-    A[Start] --> B[Process]
-    B --> C[End]
-\```
-```
-**Avoid**: ASCII box-drawing characters (`┌`, `│`, `└`, `─`, etc.)
-**Rationale:** Token efficiency, clearer semantic meaning, easier maintenance.
-### No `@example` Sections in TSDoc
-Tests serve as living examples. Do not add `@example` sections to TSDoc comments.
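
The rationale the deleted rules give for preferring `#field` over `private` (runtime encapsulation versus compile-time-only checking) can be illustrated with a minimal sketch. `HardCounter` and `SoftCounter` are hypothetical names chosen for this illustration; neither appears in the package.

```typescript
// Hypothetical classes for illustration only; not part of @plaited/acp-harness.
class HardCounter {
  #count = 0 // JavaScript private field: enforced at runtime
  increment() {
    this.#count++
    return this.#count
  }
}

class SoftCounter {
  private count = 0 // TypeScript-only modifier: erased when compiled
  increment() {
    this.count++
    return this.count
  }
}

const hard = new HardCounter()
const soft = new SoftCounter()
hard.increment()
soft.increment()

// Bracket access sidesteps the `private` check and reads the value at runtime.
const leaked = soft['count']

// A #private field is not an ordinary property, so it never appears as an own key.
const hardKeys = Object.keys(hard)
const softKeys = Object.keys(soft)
```

Here `leaked` holds the counter value of `soft`, while nothing comparable is reachable on `hard` from outside the class body.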

package/.claude/rules/git-workflow.md DELETED

@@ -1,37 +0,0 @@
-# Git Workflow
-## Commit Message Format
-When creating commits with multi-line messages, use single-quoted strings instead of heredocs. The sandbox environment restricts temp file creation needed for heredocs.
-```bash
-# ✅ CORRECT: Single-quoted multi-line string
-git commit -m 'refactor: description here
-Additional context on second line.
-🤖 Generated with [Claude Code](https://claude.com/claude-code)
-Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>'
-# ❌ WRONG: Heredoc syntax (fails in sandbox)
-git commit -m "$(cat <<'EOF'
-refactor: description here
-EOF
-)"
-```
-The heredoc approach fails with:
-```
-(eval):1: can't create temp file for here document: operation not permitted
-```
-## Commit Conventions
-This project follows conventional commits:
-- `feat:` - New features
-- `fix:` - Bug fixes
-- `refactor:` - Code changes that neither fix bugs nor add features
-- `docs:` - Documentation only changes
-- `chore:` - Maintenance tasks
-- `test:` - Adding or updating tests
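
As a rough sketch of how the commit prefixes listed in the deleted file could be checked mechanically, the helper below tests the first line of a message against those six types. `isConventionalCommit` and `COMMIT_TYPES` are hypothetical names, and the exact `type: description` shape (no scope, one space after the colon) is an assumption taken from the example message in the file, not a rule the package is known to enforce.

```typescript
// Hypothetical helper; the package is not known to ship a commit-message validator.
// The types mirror the list in the deleted git-workflow.md.
const COMMIT_TYPES = ['feat', 'fix', 'refactor', 'docs', 'chore', 'test'] as const

// Assumption: a valid subject is `<type>: <description>` with a single space.
const SUBJECT_PATTERN = new RegExp(`^(${COMMIT_TYPES.join('|')}): \\S`)

const isConventionalCommit = (message: string): boolean => {
  // Only the first line (the subject) carries the prefix.
  const subject = message.split('\n')[0] ?? ''
  return SUBJECT_PATTERN.test(subject)
}

const multiLine = isConventionalCommit('refactor: description here\nAdditional context on second line.')
const missingPrefix = isConventionalCommit('Refactor the client')
const missingSpace = isConventionalCommit('feat:no space after colon')
```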

package/.claude/rules/github.md DELETED

@@ -1,154 +0,0 @@
-# GitHub Integration
-## Prefer GitHub CLI Over Web Fetch
-When given GitHub URLs (PRs, issues, repos), use the `gh` CLI instead of WebFetch for more reliable and complete data access.
-### PR Information
-```bash
-# Get PR details with comments and reviews (comprehensive)
-gh pr view <number> --repo <owner>/<repo> --json title,body,comments,reviews,reviewRequests,state,author,additions,deletions,changedFiles
-# Get PR diff
-gh pr diff <number> --repo <owner>/<repo>
-# Get PR files changed
-gh pr view <number> --repo <owner>/<repo> --json files
-# Get PR checks status
-gh pr checks <number> --repo <owner>/<repo>
-```
-### Complete PR Evaluation Workflow
-When asked to evaluate PR feedback, you MUST fetch **all** feedback sources. Do not just fetch comments/reviews - also check security alerts and inline code comments.
-**Step 1: Fetch PR comments and reviews**
-```bash
-gh pr view <number> --repo <owner>/<repo> --json title,body,comments,reviews,state
-```
-**Step 2: Fetch code scanning alerts (security vulnerabilities)**
-```bash
-gh api repos/<owner>/<repo>/code-scanning/alerts --jq '
-  .[] | select(.state == "open") | {
-    number: .number,
-    rule: .rule.description,
-    severity: .rule.severity,
-    file: .most_recent_instance.location.path,
-    line: .most_recent_instance.location.start_line
-  }
-'
-```
-**Step 3: Fetch inline review comments (code quality, suggestions)**
-```bash
-gh api repos/<owner>/<repo>/pulls/<number>/comments --jq '
-  .[] | {
-    id: .id,
-    user: .user.login,
-    file: .path,
-    line: .line,
-    body: .body
-  }
-'
-```
-**Step 4: Address ALL feedback**
-Create a checklist and address each item:
-- [ ] Human reviewer comments
-- [ ] Claude Code review comments
-- [ ] GitHub Advanced Security alerts (ReDoS, injection, etc.)
-- [ ] GitHub Code Quality comments (dead code, useless assignments)
-- [ ] Inline review suggestions
-### Comment Sources
-| Source | API/Location | Description |
-|--------|--------------|-------------|
-| Human reviewers | `gh pr view --json reviews` | Code owners, team members |
-| Claude Code | `gh pr view --json comments` | AI-generated review (login: `claude`) |
-| GitHub Advanced Security | `gh api .../code-scanning/alerts` | Security vulnerabilities (ReDoS, injection) |
-| GitHub Code Quality | `gh api .../pulls/.../comments` | Code quality issues (login: `github-code-quality[bot]`) |
-| Inline suggestions | `gh api .../pulls/.../comments` | Line-specific review comments |
-### Filtering by Author
-```bash
-# Get all automated reviews from PR
-gh pr view <number> --repo <owner>/<repo> --json reviews --jq '
-  .reviews[] | select(.author.login | test("github-|claude")) | {author: .author.login, state: .state}
-'
-# Get specific inline comment by ID
-gh api repos/<owner>/<repo>/pulls/<number>/comments --jq '
-  .[] | select(.id == <comment_id>)
-'
-```
-### URL Patterns for Specific Feedback
-| URL Pattern | How to Fetch |
-|-------------|--------------|
-| `.../pull/<n>#issuecomment-<id>` | `gh pr view <n> --json comments` |
-| `.../pull/<n>#discussion_r<id>` | `gh api repos/.../pulls/<n>/comments` |
-| `.../security/code-scanning/<id>` | `gh api repos/.../code-scanning/alerts/<id>` |
-### Review States
-- `APPROVED` - Reviewer approved changes
-- `CHANGES_REQUESTED` - Reviewer requested changes
-- `COMMENTED` - Review with comments only
-- `PENDING` - Review not yet submitted
-### Issue Information
-```bash
-# Get issue details
-gh issue view <number> --repo <owner>/<repo> --json title,body,comments,state,author
-# List issues
-gh issue list --repo <owner>/<repo> --json number,title,state
-```
-### Repository Information
-```bash
-# Get repo info
-gh repo view <owner>/<repo> --json name,description,url
-# List workflows
-gh workflow list --repo <owner>/<repo>
-# View run logs
-gh run view <run-id> --repo <owner>/<repo> --log
-```
-### URL Parsing
-When given a GitHub URL, extract the components:
-| URL Pattern | Command |
-|-------------|---------|
-| `github.com/<owner>/<repo>/pull/<number>` | `gh pr view <number> --repo <owner>/<repo>` |
-| `github.com/<owner>/<repo>/issues/<number>` | `gh issue view <number> --repo <owner>/<repo>` |
-| `github.com/<owner>/<repo>` | `gh repo view <owner>/<repo>` |
-| `github.com/<owner>/<repo>/actions/runs/<id>` | `gh run view <id> --repo <owner>/<repo>` |
-### JSON Output Fields
-Common useful JSON fields for PRs:
-- `title`, `body`, `state`, `author`
-- `comments` - PR comments
-- `reviews` - Review comments and approvals
-- `additions`, `deletions`, `changedFiles`
-- `files` - List of changed files
-- `commits` - Commit history
-### Benefits Over WebFetch
-1. **Complete data** - Access to all comments, reviews, and metadata
-2. **Authentication** - Uses configured GitHub credentials
-3. **Structured output** - JSON format for reliable parsing
-4. **No rate limiting issues** - Authenticated requests have higher limits
-5. **Access to private repos** - Works with repos you have access to
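
The URL Parsing table in the deleted file maps GitHub URL shapes to `gh` commands. One possible way to express that mapping in code is sketched below. `toGhCommand` is a hypothetical helper written for this illustration; the command strings are copied from the table, while the parsing approach (plain string splitting, with any query or fragment dropped) is an assumption.

```typescript
// Hypothetical helper; not part of the package or of the gh CLI.
const toGhCommand = (input: string): string | undefined => {
  // Drop the scheme and any query string or fragment (e.g. #issuecomment-<id>).
  const path = input.replace(/^https?:\/\//, '').replace(/[?#].*$/, '')
  const [host, owner, repo, kind, a, b] = path.split('/').filter((part) => part.length > 0)
  if (host !== 'github.com' || !owner || !repo) return undefined

  const slug = `${owner}/${repo}`
  if (kind === undefined) return `gh repo view ${slug}`
  if (kind === 'pull' && a) return `gh pr view ${a} --repo ${slug}`
  if (kind === 'issues' && a) return `gh issue view ${a} --repo ${slug}`
  if (kind === 'actions' && a === 'runs' && b) return `gh run view ${b} --repo ${slug}`
  return undefined // URL shape not covered by the table
}

const prCommand = toGhCommand('https://github.com/owner/repo/pull/12#discussion_r99')
const runCommand = toGhCommand('github.com/owner/repo/actions/runs/77')
```

URL shapes outside the table return `undefined` rather than a guess, so a caller can fall back to another strategy.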

package/.claude/rules/testing.md DELETED

@@ -1,172 +0,0 @@
-# Testing
-This project uses Bun's built-in test runner for unit and integration tests.
-## Test Types
-### Unit/Integration Tests (`*.spec.ts`)
-- Standard Bun tests using `*.spec.ts` extension
-- Run with `bun test` command
-- Used for testing business logic, utilities, and non-visual functionality
-### Docker Integration Tests (`*.docker.ts`)
-- Tests that require external services or API keys run in Docker containers
-- Use `*.docker.ts` extension
-- Run with `bun run test:docker`
-## Running Tests
-```bash
-# Run all unit tests
-bun test
-# Run a specific spec test file
-bun test path/to/file.spec.ts
-# Run tests matching a pattern
-bun test pattern
-# Run Docker integration tests (requires ANTHROPIC_API_KEY)
-ANTHROPIC_API_KEY=sk-... bun run test:docker
-```
-## Test Style Conventions
-### Use `test` Instead of `it`
-Use `test` instead of `it` in test files for consistency:
-```typescript
-// ✅ Good
-test('should create ACP client correctly', () => {
-  // ...
-})
-// ❌ Avoid
-it('should create ACP client correctly', () => {
-  // ...
-})
-```
-## Skill Script Tests
-Claude Code skills in `.claude/skills/` may include executable scripts. Tests for these scripts follow a specific structure:
-### Directory Structure
-```
-.claude/skills/<skill-name>/
-├── SKILL.md
-├── scripts/
-│   ├── script-name.ts        # Executable script
-│   └── tests/
-│       └── script-name.spec.ts  # Tests for the script
-```
-### Running Skill Script Tests
-```bash
-# From skill directory
-bun test scripts/tests/
-```
-### Test Pattern
-Scripts that output JSON can be tested using Bun's shell API:
-```typescript
-import { describe, test, expect } from 'bun:test'
-import { join } from 'node:path'
-import { $ } from 'bun'
-const scriptsDir = join(import.meta.dir, '..')
-describe('script-name', () => {
-  test('outputs expected JSON', async () => {
-    const result = await $`bun ${scriptsDir}/script-name.ts arg1 arg2`.json()
-    expect(result.filePath).toEndWith('expected.ts')
-  })
-  test('exits with error on invalid input', async () => {
-    const proc = Bun.spawn(['bun', `${scriptsDir}/script-name.ts`], {
-      stderr: 'pipe',
-    })
-    const exitCode = await proc.exited
-    expect(exitCode).toBe(1)
-  })
-})
-```
-## Docker Integration Tests
-Tests that require the Anthropic API run in Docker containers for consistent, isolated execution.
-### ACP Integration Tests
-The ACP client integration tests require the Anthropic API and run in a Docker container:
-```bash
-# Run locally with Docker (requires ANTHROPIC_API_KEY)
-ANTHROPIC_API_KEY=sk-... docker compose -f docker-compose.test.yml run --rm acp-test
-# Or using the npm script (still requires Docker)
-ANTHROPIC_API_KEY=sk-... bun run test:docker
-```
-### File Naming
-- **`*.docker.ts`**: Tests that run in Docker containers
-- These are excluded from `bun test` and run separately in CI
-### CI Workflow
-Docker tests use path filtering to reduce API costs:
-```yaml
-# .github/workflows/ci.yml
-jobs:
-  changes:
-    # Detects which paths changed
-    steps:
-      - uses: dorny/paths-filter@v3
-        with:
-          filters: |
-            acp:
-              - 'src/**'
-  test-acp-integration:
-    needs: changes
-    if: ${{ needs.changes.outputs.acp == 'true' }}
-    # Only runs when src/ files change
-```
-## Anti-Patterns
-### No Conditionals Around Assertions
-Never wrap assertions in conditionals. Tests should fail explicitly, not silently skip assertions.
-```typescript
-// ❌ WRONG: Conditional assertion
-if (result) {
-  expect(result.value).toBe(expected)
-}
-// ❌ WRONG: Optional chaining with assertion
-result?.value && expect(result.value).toBe(expected)
-// ✅ CORRECT: Assert the condition, then assert the value
-expect(result).toBeDefined()
-expect(result.value).toBe(expected)
-// ✅ CORRECT: Use type narrowing assertion
-expect(result).not.toBeNull()
-expect(result!.value).toBe(expected)
-```
-If a value might not exist, the test should either:
-1. Assert that it exists first, then check its value
-2. Assert that it doesn't exist (if that's the expected behavior)
-3. Restructure the test to ensure the value is always present
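
To see why the deleted rule forbids conditionals around assertions, the sketch below runs both styles against a value that is unexpectedly missing. It deliberately avoids `bun:test` so it can run anywhere: `assertEqual`, `findResult`, and `runs` are hypothetical stand-ins for a test runner's assertion, the code under test, and the runner itself.

```typescript
// Hypothetical stand-in for a test-runner assertion; throws on mismatch.
const assertEqual = <T>(actual: T, expected: T): void => {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`)
  }
}

type Result = { value: number } | undefined

// Simulates a regression: the code under test returns nothing.
const findResult = (): Result => undefined

// Minimal stand-in for a runner: a test passes unless it throws.
const runs = (fn: () => void): 'pass' | 'fail' => {
  try {
    fn()
    return 'pass'
  } catch {
    return 'fail'
  }
}

// Anti-pattern: the assertion is skipped, so the regression goes unnoticed.
const conditionalOutcome = runs(() => {
  const result = findResult()
  if (result) assertEqual(result.value, 42)
})

// Recommended: assert existence first, so the missing value fails the test.
const explicitOutcome = runs(() => {
  const result = findResult()
  assertEqual(result !== undefined, true)
  assertEqual(result?.value, 42)
})
```

With the value missing, `conditionalOutcome` is `'pass'` and `explicitOutcome` is `'fail'`, which is the silent skip the rule warns about.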