npm - codeharness - Versions diffs - 0.16.0 → 0.16.1 - Mend

codeharness 0.16.0 → 0.16.1


Files changed (4)

package/README.md CHANGED

@@ -1,66 +1,118 @@
 # codeharness
-## Quick Start
+Makes autonomous coding agents produce software that actually works — not software that passes tests.
-```bash
-# Install
-npm install -g codeharness
+codeharness is an **npm CLI** + **Claude Code plugin** that packages verification-driven development as an installable tool: black-box verification via Docker, agent-first observability via VictoriaMetrics, and mechanical enforcement via hooks that make skipping verification architecturally impossible.
-# Initialize the project
-codeharness init
+## What it does
-# Check project status
-codeharness status
-```
+1. **Verifies features work** — not just that tests pass. Black-box verification runs the built CLI inside a Docker container with no source code access. If the feature doesn't work from a user's perspective, verification fails.
+2. **Fixes what it finds** — verification failures with code bugs automatically return to development with specific findings. The dev agent gets told exactly what's broken and why.
+3. **Runs sprints autonomously** — reads your sprint plan, picks the highest-priority story, implements it, reviews it, verifies it, and moves to the next one. Cross-epic prioritization, retry management, and session handoff built in.
+4. **Makes agents see runtime** — ephemeral VictoriaMetrics stack (logs, metrics, traces) that agents query programmatically during development. No guessing at what the code does at runtime.
 ## Installation
+Two components — install both:
 ```bash
+# CLI (npm package)
 npm install -g codeharness
-```
-## Usage
+# Claude Code plugin (slash commands, hooks, skills)
+claude plugin install github:iVintik/codeharness
+```
-After installation, initialize codeharness in your project directory:
+## Quick Start
 ```bash
+# Initialize in your project
 codeharness init
+# Start autonomous sprint execution (inside Claude Code)
+/harness-run
 ```
-This sets up the harness with stack detection, observability, and documentation scaffolding.
+## How it works
+### As a CLI (`codeharness`)
+The CLI handles all mechanical work — stack detection, Docker management, verification, coverage, retry state.
+| Command | Purpose |
+|---------|---------|
+| `codeharness init` | Detect stack, install dependencies, start observability, scaffold docs |
+| `codeharness run` | Execute the autonomous coding loop (Ralph) |
+| `codeharness verify --story <key>` | Run verification pipeline for a story |
+| `codeharness status` | Show harness health, sprint progress, Docker stack |
+| `codeharness coverage` | Run tests with coverage and evaluate against targets |
+| `codeharness onboard epic` | Scan codebase for gaps, generate onboarding stories |
+| `codeharness retry --status` | Show retry counts and flagged stories |
+| `codeharness retry --reset` | Clear retry state for re-verification |
+| `codeharness verify-env build` | Build Docker image for black-box verification |
+| `codeharness stack start` | Start the shared observability stack |
+| `codeharness teardown` | Remove harness from project |
-## CLI Reference
+All commands support `--json` for machine-readable output.
+### As a Claude Code plugin (`/harness-*`)
+The plugin provides slash commands that orchestrate the CLI within Claude Code sessions:
+| Command | Purpose |
+|---------|---------|
+| `/harness-run` | Autonomous sprint execution — picks stories by priority, runs create → dev → review → verify loop |
+| `/harness-init` | Interactive project initialization |
+| `/harness-status` | Quick overview of sprint progress and harness health |
+| `/harness-onboard` | Scan project and generate onboarding plan |
+| `/harness-verify` | Verify a story with real-world evidence |
+### BMAD Method integration
+codeharness integrates with [BMAD Method](https://github.com/bmadcode/BMAD-METHOD) for structured sprint planning:
+| Phase | Commands |
+|-------|----------|
+| Analysis | `/create-brief`, `/brainstorm-project`, `/market-research` |
+| Planning | `/create-prd`, `/create-ux` |
+| Solutioning | `/create-architecture`, `/create-epics-stories` |
+| Implementation | `/sprint-planning`, `/create-story`, then `/harness-run` |
+## Verification architecture
 ```
-Usage: codeharness [options] [command]
-Makes autonomous coding agents produce software that actually works
-Options:
-  -V, --version            output the version number
-  --json                   Output in machine-readable JSON format
-  -h, --help               display help for command
-Commands:
-  init [options]           Initialize the harness in a project
-  bridge [options]         Bridge BMAD epics/stories into beads task store
-  run [options]            Execute the autonomous coding loop
-  verify [options]         Run verification pipeline on completed work
-  status [options]         Show current harness status and health
-  onboard [options]        Onboard an existing codebase into the harness
-  teardown [options]       Remove harness from a project
-  state                    Manage harness state
-  sync [options]           Synchronize beads issue statuses with story files and
-                           sprint-status.yaml
-  coverage [options]       Run tests with coverage and evaluate against targets
-  doc-health [options]     Scan documentation for freshness and quality issues
-  stack                    Manage the shared observability stack
-  query                    Query observability data (logs, metrics, traces)
-                           scoped to current project
-  retro-import [options]   Import retrospective action items as beads issues
-  github-import [options]  Import GitHub issues labeled for sprint planning into
-                           beads
-  verify-env               Manage verification environment (Docker image + clean
-                           workspace)
-  help [command]           display help for command
+┌─────────────────────────────────────────┐
+│  Claude Code Session                     │
+│  /harness-run picks next story           │
+│  → create-story → dev → review → verify  │
+└────────────────────┬────────────────────┘
+                     │ verify
+                     ▼
+┌─────────────────────────────────────────┐
+│  Docker Container (no source code)       │
+│  - codeharness CLI installed from tarball│
+│  - claude CLI for nested verification    │
+│  - curl/jq for observability queries     │
+│  Exercises CLI as a real user would      │
+└────────────────────┬────────────────────┘
+                     │ queries
+                     ▼
+┌─────────────────────────────────────────┐
+│  Observability Stack (VictoriaMetrics)   │
+│  - VictoriaLogs  :9428 (LogQL)          │
+│  - VictoriaMetrics :8428 (PromQL)       │
+│  - OTEL Collector :4318                  │
+└─────────────────────────────────────────┘
 ```
+When verification finds code bugs → story returns to dev with findings → dev fixes → re-verify. This loop runs up to 10 times per story. Infrastructure failures (timeouts, Docker errors) retry 3 times then skip.
+## Requirements
+- Node.js >= 18
+- Docker (for observability and verification)
+- Claude Code (for plugin features)
+## License
+MIT
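
The new README text gives two retry limits: code-bug findings send a story back to dev up to 10 times, and infrastructure failures are retried 3 times and then skipped. A minimal sketch of that decision, assuming a hypothetical `next_action` helper and hypothetical `code-bug` / `infra` labels (none of these names come from the package), might look like this. The README does not say what happens after the tenth code-bug attempt, so the `stop` outcome is an assumption.

```shell
#!/usr/bin/env bash
# Limits taken from the README text; all names below are hypothetical.
MAX_CODE_BUG_ATTEMPTS=10
MAX_INFRA_RETRIES=3

# next_action KIND ATTEMPTS_USED
# KIND is "code-bug" or "infra"; ATTEMPTS_USED counts attempts already spent.
next_action() {
    local kind=$1 attempts=$2
    case "$kind" in
        code-bug)
            if [ "$attempts" -lt "$MAX_CODE_BUG_ATTEMPTS" ]; then
                echo "return-to-dev"
            else
                echo "stop"   # assumption: behaviour after 10 attempts is not documented
            fi
            ;;
        infra)
            if [ "$attempts" -lt "$MAX_INFRA_RETRIES" ]; then
                echo "retry"
            else
                echo "skip"
            fi
            ;;
        *)
            echo "unknown failure kind: $kind" >&2
            return 1
            ;;
    esac
}

next_action code-bug 2   # prints "return-to-dev"
next_action infra 3      # prints "skip"
```

Keeping the two budgets separate means a flaky Docker daemon cannot use up the attempts reserved for real code defects.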

package/dist/index.js CHANGED

@@ -1402,7 +1402,7 @@ function getInstallCommand(stack) {
 }
 // src/commands/init.ts
-var HARNESS_VERSION = true ? "0.16.0" : "0.0.0-dev";
+var HARNESS_VERSION = true ? "0.16.1" : "0.0.0-dev";
 function getProjectName(projectDir) {
   try {
     const pkgPath = join6(projectDir, "package.json");
@@ -7675,7 +7675,7 @@ function handleStatus(dir, isJson, filterStory) {
 }
 // src/index.ts
-var VERSION = true ? "0.16.0" : "0.0.0-dev";
+var VERSION = true ? "0.16.1" : "0.0.0-dev";
 function createProgram() {
   const program = new Command();
   program.name("codeharness").description("Makes autonomous coding agents produce software that actually works").version(VERSION).option("--json", "Output in machine-readable JSON format");

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "codeharness",
-  "version": "0.16.0",
+  "version": "0.16.1",
   "type": "module",
   "description": "CLI for codeharness — makes autonomous coding agents produce software that actually works",
   "bin": {

package/ralph/ralph.sh CHANGED

@@ -7,6 +7,9 @@
 set -e
+# DEBUG: catch unexpected exits from set -e
+trap 'echo "[$(date "+%Y-%m-%d %H:%M:%S")] [FATAL] ralph.sh died at line $LINENO (exit code: $?)" >> "${LOG_DIR:-ralph/logs}/ralph_crash.log" 2>/dev/null' ERR
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 source "$SCRIPT_DIR/lib/date_utils.sh"
 source "$SCRIPT_DIR/lib/timeout_utils.sh"
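
The hunk above installs an `ERR` trap so that an exit caused by `set -e` leaves a line in a crash log. The following is a minimal standalone sketch of that technique, not the code from `ralph.sh`: `CRASH_LOG` is a hypothetical temporary file, and the failing script runs in a child `bash` so the demonstration itself keeps going. The handler saves `$?` into a variable as its first step, so later commands in the handler cannot overwrite it.

```shell
#!/usr/bin/env bash
# Hypothetical log location, used only for this sketch.
CRASH_LOG="$(mktemp)"
export CRASH_LOG

# Child shell: set -e plus an ERR trap that records where it died.
bash <<'EOF' || true
set -e
trap 'code=$?; echo "[FATAL] died at line $LINENO (exit code: $code)" >> "$CRASH_LOG"' ERR
false                  # fails, fires the ERR trap, then set -e exits the child
echo "never reached"
EOF

# The parent survives and can inspect what the child logged.
cat "$CRASH_LOG"
```

The `|| true` after the child invocation keeps the parent running even though the child exits non-zero.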
@@ -581,6 +584,11 @@ execute_iteration() {
     local deadline=$(( $(date +%s) + timeout_seconds ))
     echo "$deadline" > "ralph/.iteration_deadline"
+    # DEBUG: log the command being run
+    log_status "DEBUG" "Command: ${CLAUDE_CMD_ARGS[*]}"
+    log_status "DEBUG" "Output file: $output_file"
+    log_status "DEBUG" "LIVE_OUTPUT=$LIVE_OUTPUT, timeout=${timeout_seconds}s"
     log_status "INFO" "Starting $(driver_display_name) (timeout: ${ITERATION_TIMEOUT_MINUTES}m)..."
     # Execute with timeout
@@ -606,6 +614,8 @@ execute_iteration() {
         local claude_pid=$!
         local progress_counter=0
+        log_status "DEBUG" "Background PID: $claude_pid"
         while kill -0 $claude_pid 2>/dev/null; do
             progress_counter=$((progress_counter + 1))
             if [[ -f "$output_file" && -s "$output_file" ]]; then
@@ -614,8 +624,23 @@ execute_iteration() {
             sleep 10
         done
+        # Protect wait from set -e — capture exit code without crashing
+        set +e
         wait $claude_pid
         exit_code=$?
+        set -e
+        log_status "DEBUG" "Claude exited with code: $exit_code, output size: $(wc -c < "$output_file" 2>/dev/null || echo 0) bytes"
+        # If output is empty and exit code is non-zero, log diagnostic info
+        if [[ ! -s "$output_file" && $exit_code -ne 0 ]]; then
+            log_status "ERROR" "Claude produced no output and exited with code $exit_code"
+            log_status "DEBUG" "Checking if claude binary is responsive..."
+            if claude --version > /dev/null 2>&1; then
+                log_status "DEBUG" "claude binary OK: $(claude --version 2>&1)"
+            else
+                log_status "ERROR" "claude binary not responding"
+            fi
+        fi
     fi
     if [[ $exit_code -eq 0 ]]; then
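
The last hunk brackets `wait` with `set +e` and `set -e` so that a non-zero exit from the background process is recorded instead of terminating the script. A minimal standalone sketch of that pattern follows. `run_and_capture` and `LAST_EXIT_CODE` are hypothetical names used only here and do not appear in `ralph.sh`.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: run a command in the background, wait for it,
# and store its exit code without letting set -e abort the script.
run_and_capture() {
    "$@" &
    local pid=$!
    set +e                 # a failing wait must not kill the script
    wait "$pid"
    LAST_EXIT_CODE=$?      # exit status of the background command
    set -e                 # restore strict mode
}

run_and_capture sh -c 'exit 3'
echo "exit code: $LAST_EXIT_CODE"   # prints "exit code: 3"
```

An equivalent idiom initialises `code=0` and then runs `wait "$pid" || code=$?`, because a command on the left-hand side of `||` is exempt from `set -e`.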