codeharness 0.16.0 → 0.16.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,66 +1,118 @@
1
1
  # codeharness
2
2
 
3
- ## Quick Start
3
+ Makes autonomous coding agents produce software that actually works — not software that passes tests.
4
4
 
5
- ```bash
6
- # Install
7
- npm install -g codeharness
5
+ codeharness is an **npm CLI** + **Claude Code plugin** that packages verification-driven development as an installable tool: black-box verification via Docker, agent-first observability via VictoriaMetrics, and mechanical enforcement via hooks that make skipping verification architecturally impossible.
8
6
 
9
- # Initialize the project
10
- codeharness init
7
+ ## What it does
11
8
 
12
- # Check project status
13
- codeharness status
14
- ```
9
+ 1. **Verifies features work** — not just that tests pass. Black-box verification runs the built CLI inside a Docker container with no source code access. If the feature doesn't work from a user's perspective, verification fails.
10
+ 2. **Fixes what it finds** — verification failures with code bugs automatically return to development with specific findings. The dev agent gets told exactly what's broken and why.
11
+ 3. **Runs sprints autonomously** — reads your sprint plan, picks the highest-priority story, implements it, reviews it, verifies it, and moves to the next one. Cross-epic prioritization, retry management, and session handoff built in.
12
+ 4. **Makes agents see runtime** — ephemeral VictoriaMetrics stack (logs, metrics, traces) that agents query programmatically during development. No guessing at what the code does at runtime.
15
13
 
16
14
  ## Installation
17
15
 
16
+ Two components — install both:
17
+
18
18
  ```bash
19
+ # CLI (npm package)
19
20
  npm install -g codeharness
20
- ```
21
21
 
22
- ## Usage
22
+ # Claude Code plugin (slash commands, hooks, skills)
23
+ claude plugin install github:iVintik/codeharness
24
+ ```
23
25
 
24
- After installation, initialize codeharness in your project directory:
26
+ ## Quick Start
25
27
 
26
28
  ```bash
29
+ # Initialize in your project
27
30
  codeharness init
31
+
32
+ # Start autonomous sprint execution (inside Claude Code)
33
+ /harness-run
28
34
  ```
29
35
 
30
- This sets up the harness with stack detection, observability, and documentation scaffolding.
36
+ ## How it works
37
+
38
+ ### As a CLI (`codeharness`)
39
+
40
+ The CLI handles all mechanical work — stack detection, Docker management, verification, coverage, retry state.
41
+
42
+ | Command | Purpose |
43
+ |---------|---------|
44
+ | `codeharness init` | Detect stack, install dependencies, start observability, scaffold docs |
45
+ | `codeharness run` | Execute the autonomous coding loop (Ralph) |
46
+ | `codeharness verify --story <key>` | Run verification pipeline for a story |
47
+ | `codeharness status` | Show harness health, sprint progress, Docker stack |
48
+ | `codeharness coverage` | Run tests with coverage and evaluate against targets |
49
+ | `codeharness onboard epic` | Scan codebase for gaps, generate onboarding stories |
50
+ | `codeharness retry --status` | Show retry counts and flagged stories |
51
+ | `codeharness retry --reset` | Clear retry state for re-verification |
52
+ | `codeharness verify-env build` | Build Docker image for black-box verification |
53
+ | `codeharness stack start` | Start the shared observability stack |
54
+ | `codeharness teardown` | Remove harness from project |
31
55
 
32
- ## CLI Reference
56
+ All commands support `--json` for machine-readable output.
57
+
58
+ ### As a Claude Code plugin (`/harness-*`)
59
+
60
+ The plugin provides slash commands that orchestrate the CLI within Claude Code sessions:
61
+
62
+ | Command | Purpose |
63
+ |---------|---------|
64
+ | `/harness-run` | Autonomous sprint execution — picks stories by priority, runs create → dev → review → verify loop |
65
+ | `/harness-init` | Interactive project initialization |
66
+ | `/harness-status` | Quick overview of sprint progress and harness health |
67
+ | `/harness-onboard` | Scan project and generate onboarding plan |
68
+ | `/harness-verify` | Verify a story with real-world evidence |
69
+
70
+ ### BMAD Method integration
71
+
72
+ codeharness integrates with [BMAD Method](https://github.com/bmadcode/BMAD-METHOD) for structured sprint planning:
73
+
74
+ | Phase | Commands |
75
+ |-------|----------|
76
+ | Analysis | `/create-brief`, `/brainstorm-project`, `/market-research` |
77
+ | Planning | `/create-prd`, `/create-ux` |
78
+ | Solutioning | `/create-architecture`, `/create-epics-stories` |
79
+ | Implementation | `/sprint-planning`, `/create-story`, then `/harness-run` |
80
+
81
+ ## Verification architecture
33
82
 
34
83
  ```
35
- Usage: codeharness [options] [command]
36
-
37
- Makes autonomous coding agents produce software that actually works
38
-
39
- Options:
40
- -V, --version output the version number
41
- --json Output in machine-readable JSON format
42
- -h, --help display help for command
43
-
44
- Commands:
45
- init [options] Initialize the harness in a project
46
- bridge [options] Bridge BMAD epics/stories into beads task store
47
- run [options] Execute the autonomous coding loop
48
- verify [options] Run verification pipeline on completed work
49
- status [options] Show current harness status and health
50
- onboard [options] Onboard an existing codebase into the harness
51
- teardown [options] Remove harness from a project
52
- state Manage harness state
53
- sync [options] Synchronize beads issue statuses with story files and
54
- sprint-status.yaml
55
- coverage [options] Run tests with coverage and evaluate against targets
56
- doc-health [options] Scan documentation for freshness and quality issues
57
- stack Manage the shared observability stack
58
- query Query observability data (logs, metrics, traces)
59
- scoped to current project
60
- retro-import [options] Import retrospective action items as beads issues
61
- github-import [options] Import GitHub issues labeled for sprint planning into
62
- beads
63
- verify-env Manage verification environment (Docker image + clean
64
- workspace)
65
- help [command] display help for command
84
+ ┌─────────────────────────────────────────┐
85
+ │ Claude Code Session │
86
+ │ /harness-run picks next story │
87
+ │ → create-story → dev → review → verify │
88
+ └────────────────────┬────────────────────┘
89
+ verify
90
+
91
+ ┌─────────────────────────────────────────┐
92
+ │ Docker Container (no source code) │
93
+ │ - codeharness CLI installed from tarball│
94
+ - claude CLI for nested verification │
95
+ - curl/jq for observability queries │
96
+ Exercises CLI as a real user would │
97
+ └────────────────────┬────────────────────┘
98
+ queries
99
+
100
+ ┌─────────────────────────────────────────┐
101
+ Observability Stack (VictoriaMetrics) │
102
+ - VictoriaLogs :9428 (LogQL) │
103
+ - VictoriaMetrics :8428 (PromQL) │
104
+ - OTEL Collector :4318 │
105
+ └─────────────────────────────────────────┘
66
106
  ```
107
+
108
+ When verification finds code bugs → story returns to dev with findings → dev fixes → re-verify. This loop runs up to 10 times per story. Infrastructure failures (timeouts, Docker errors) retry 3 times then skip.
109
+
110
+ ## Requirements
111
+
112
+ - Node.js >= 18
113
+ - Docker (for observability and verification)
114
+ - Claude Code (for plugin features)
115
+
116
+ ## License
117
+
118
+ MIT
package/dist/index.js CHANGED
@@ -1402,7 +1402,7 @@ function getInstallCommand(stack) {
1402
1402
  }
1403
1403
 
1404
1404
  // src/commands/init.ts
1405
- var HARNESS_VERSION = true ? "0.16.0" : "0.0.0-dev";
1405
+ var HARNESS_VERSION = true ? "0.16.1" : "0.0.0-dev";
1406
1406
  function getProjectName(projectDir) {
1407
1407
  try {
1408
1408
  const pkgPath = join6(projectDir, "package.json");
@@ -7675,7 +7675,7 @@ function handleStatus(dir, isJson, filterStory) {
7675
7675
  }
7676
7676
 
7677
7677
  // src/index.ts
7678
- var VERSION = true ? "0.16.0" : "0.0.0-dev";
7678
+ var VERSION = true ? "0.16.1" : "0.0.0-dev";
7679
7679
  function createProgram() {
7680
7680
  const program = new Command();
7681
7681
  program.name("codeharness").description("Makes autonomous coding agents produce software that actually works").version(VERSION).option("--json", "Output in machine-readable JSON format");
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "codeharness",
3
- "version": "0.16.0",
3
+ "version": "0.16.1",
4
4
  "type": "module",
5
5
  "description": "CLI for codeharness — makes autonomous coding agents produce software that actually works",
6
6
  "bin": {
package/ralph/ralph.sh CHANGED
@@ -7,6 +7,9 @@
7
7
 
8
8
  set -e
9
9
 
10
+ # DEBUG: catch unexpected exits from set -e
11
+ trap 'echo "[$(date "+%Y-%m-%d %H:%M:%S")] [FATAL] ralph.sh died at line $LINENO (exit code: $?)" >> "${LOG_DIR:-ralph/logs}/ralph_crash.log" 2>/dev/null' ERR
12
+
10
13
  SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
11
14
  source "$SCRIPT_DIR/lib/date_utils.sh"
12
15
  source "$SCRIPT_DIR/lib/timeout_utils.sh"
@@ -581,6 +584,11 @@ execute_iteration() {
581
584
  local deadline=$(( $(date +%s) + timeout_seconds ))
582
585
  echo "$deadline" > "ralph/.iteration_deadline"
583
586
 
587
+ # DEBUG: log the command being run
588
+ log_status "DEBUG" "Command: ${CLAUDE_CMD_ARGS[*]}"
589
+ log_status "DEBUG" "Output file: $output_file"
590
+ log_status "DEBUG" "LIVE_OUTPUT=$LIVE_OUTPUT, timeout=${timeout_seconds}s"
591
+
584
592
  log_status "INFO" "Starting $(driver_display_name) (timeout: ${ITERATION_TIMEOUT_MINUTES}m)..."
585
593
 
586
594
  # Execute with timeout
@@ -606,6 +614,8 @@ execute_iteration() {
606
614
  local claude_pid=$!
607
615
  local progress_counter=0
608
616
 
617
+ log_status "DEBUG" "Background PID: $claude_pid"
618
+
609
619
  while kill -0 $claude_pid 2>/dev/null; do
610
620
  progress_counter=$((progress_counter + 1))
611
621
  if [[ -f "$output_file" && -s "$output_file" ]]; then
@@ -614,8 +624,23 @@ execute_iteration() {
614
624
  sleep 10
615
625
  done
616
626
 
627
+ # Protect wait from set -e — capture exit code without crashing
628
+ set +e
617
629
  wait $claude_pid
618
630
  exit_code=$?
631
+ set -e
632
+ log_status "DEBUG" "Claude exited with code: $exit_code, output size: $(wc -c < "$output_file" 2>/dev/null || echo 0) bytes"
633
+
634
+ # If output is empty and exit code is non-zero, log diagnostic info
635
+ if [[ ! -s "$output_file" && $exit_code -ne 0 ]]; then
636
+ log_status "ERROR" "Claude produced no output and exited with code $exit_code"
637
+ log_status "DEBUG" "Checking if claude binary is responsive..."
638
+ if claude --version > /dev/null 2>&1; then
639
+ log_status "DEBUG" "claude binary OK: $(claude --version 2>&1)"
640
+ else
641
+ log_status "ERROR" "claude binary not responding"
642
+ fi
643
+ fi
619
644
  fi
620
645
 
621
646
  if [[ $exit_code -eq 0 ]]; then