claude-evolve 1.0.1 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -3,7 +3,7 @@
  set -e

  # Parse arguments
- count=1
+ count=20
  no_ai=false

  while [[ $# -gt 0 ]]; do
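For context, the `count=1` → `count=20` change raises the default number of ideas generated per `ideate` run. The body of the `while` loop is outside this hunk, so the following is only a hypothetical sketch of how such a loop typically consumes `[N]` and `--no-ai`; the case patterns are assumptions, not the package's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an argument loop like the one opened in the hunk
# above (the real loop body is outside the diff context).
count=20      # new default number of ideas per run
no_ai=false

while [[ $# -gt 0 ]]; do
  case $1 in
    --no-ai) no_ai=true ;;                            # manual entry mode
    [0-9]*)  count=$1 ;;                              # positional N overrides the default
    *) echo "unknown argument: $1" >&2; exit 1 ;;
  esac
  shift
done

echo "count=$count no_ai=$no_ai"
```

With no arguments this prints `count=20 no_ai=false`, matching the new default.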
@@ -16,7 +16,7 @@ USAGE:
   claude-evolve ideate [N] [--no-ai]

  ARGUMENTS:
- N Number of ideas to generate (default: 1, max: 50)
+ N Number of ideas to generate (default: 20, max: 50)

  OPTIONS:
  --no-ai Use manual entry mode instead of AI generation
@@ -71,8 +71,8 @@ get_next_id() {
  # Find highest ID and increment (pure shell)
  local max_id=0
  while IFS=, read -r id rest; do
- if [[ $id =~ ^[0-9]+$ ]] && [[ $id -gt $max_id ]]; then
- max_id=$id
+ if [[ $id =~ ^[0-9]+$ ]] && (( 10#$id > max_id )); then
+ max_id=$((10#$id))
  fi
  done < <(tail -n +2 evolution/evolution.csv)
  echo $((max_id + 1))
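The switch from `[[ $id -gt $max_id ]]` to `(( 10#$id > max_id ))` matters when CSV IDs are zero-padded: bash arithmetic treats a number with a leading `0` as octal, so `08` and `09` raise "value too great for base" errors. A small standalone check of the new comparison (the sample ID is illustrative):

```shell
#!/usr/bin/env bash
# Zero-padded IDs like "09" break plain arithmetic comparison because a
# leading 0 means octal, and 9 is not an octal digit. The 10# prefix
# forces base-10 interpretation regardless of leading zeros.
id="09"
max_id=0

if [[ $id =~ ^[0-9]+$ ]] && (( 10#$id > max_id )); then
  max_id=$((10#$id))
fi

echo "$max_id"   # prints 9
```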
@@ -8,16 +8,20 @@ YELLOW='\033[0;33m'
  RED='\033[0;31m'
  NC='\033[0m' # No Color

- # Get script directory
- SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ # Get script directory (resolve symlinks for global install)
+ SCRIPT_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}" 2>/dev/null || echo "${BASH_SOURCE[0]}")")" && pwd)"
  PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"

  # Get version from package.json
  get_version() {
+ # Try package.json in project root first
  if [[ -f "$PROJECT_ROOT/package.json" ]]; then
  grep '"version"' "$PROJECT_ROOT/package.json" | sed 's/.*"version": *"\([^"]*\)".*/\1/'
+ # If not found, try npm list (for global installs)
+ elif command -v npm >/dev/null 2>&1; then
+ npm list -g claude-evolve --depth=0 2>/dev/null | grep claude-evolve | sed 's/.*@//' || echo "1.0.1"
  else
- echo "unknown"
+ echo "1.0.1"
  fi
  }

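The `readlink -f` wrapper exists because an npm global install puts a symlink to the script on `PATH`; without resolving it, `SCRIPT_DIR` would point at the symlink's directory rather than the installed package. A small demonstration of that resolution (the temporary paths are illustrative; the `|| echo` fallback in the diff covers platforms where `readlink -f` is unavailable):

```shell
#!/usr/bin/env bash
# Demonstrate why the symlink must be resolved: dirname of the symlink
# itself points at the link's directory, not the real package directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg/bin" "$tmp/link-bin"
printf '#!/bin/sh\necho hi\n' > "$tmp/pkg/bin/tool"
ln -s "$tmp/pkg/bin/tool" "$tmp/link-bin/tool"

target="$tmp/link-bin/tool"

# Same expression as the diff: resolve the link, fall back to the raw path.
resolved_dir="$(cd "$(dirname "$(readlink -f "$target" 2>/dev/null || echo "$target")")" && pwd)"

echo "$resolved_dir"   # ends in /pkg/bin, not /link-bin
rm -rf "$tmp"
```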
package/package.json CHANGED
@@ -1,9 +1,19 @@
  {
  "name": "claude-evolve",
- "version": "1.0.1",
+ "version": "1.0.3",
  "bin": {
- "claude-evolve": "./bin/claude-evolve"
+ "claude-evolve": "./bin/claude-evolve",
+ "claude-evolve-main": "./bin/claude-evolve-main",
+ "claude-evolve-setup": "./bin/claude-evolve-setup",
+ "claude-evolve-ideate": "./bin/claude-evolve-ideate",
+ "claude-evolve-run": "./bin/claude-evolve-run",
+ "claude-evolve-analyze": "./bin/claude-evolve-analyze"
  },
+ "files": [
+ "bin/",
+ "lib/",
+ "templates/"
+ ],
  "main": "index.js",
  "directories": {
  "doc": "docs"
package/BRIEF.md DELETED
@@ -1,41 +0,0 @@
- # Project Brief
-
- ## Vision
-
- We've been very excited by the possibilities and demonstrated performance of AlphaEvolve, and the potential
- of OpenEvolve, a clone of AlphaEvolve. However, in testing, it hasn't worked super well, at least for design
- of neural network architectures. It tended to evolve very slowly, and get caught up in local minima.
-
- We would like to try a new idea, based on the success of the claude-fsd approach (in
- ../claude-fsd for reference). The claude-fsd package is an npm package based on simple shell scripts
- that leans on `claude -p` to call the Claude Code coder to perform workhorse functions like planning,
- architecting, and running a plan-develop-test loop until the project is done. So with this `claude-evolve`
- variant, we take the same approach, but for algorithm development, using a
- plan-develop-run-record loop, running test after test and recording the quantitative results, all
- the while providing the human operator the opportunity to fill the tail end of the ideas pipeline
- using `claude-evolve ideate`, interactive conversations with an AI, or editing the file
- directly.
-
- ## Core Requirements
-
- Commands:
-
- - claude-evolve -- runs a menu, as with claude-fsd
- - claude-evolve setup -- sets up the baseline evolution/ files if they're not present, and allows for editing the brief
- - claude-evolve ideate [50] -- takes user input and launches `claude -p` to generate [param] new ideas
- - claude-evolve run -- runs the plan-develop-run-record loop
- - claude-evolve analyze -- shows a chart of performance and performance changes over time, and highlights the top candidate so far
-
- Files:
-
- - evolution/BRIEF.md -- a description of what the project is about and the goals of the thing to be evolved, as well as identifying the evolving algorithm baseline file and the evaluator file
- - evolution/evolution.csv -- a list of all iterations, with columns ID,basedonID,description,performance,status
- - evolution/evolution_details.md -- a description, for each ID, of the details of what should be changed or what did change, any commentary about the design or the performance, etc., all of which is optional
- - evolution/evolution_id[id].py -- a copy of the tested algorithm version at that ID
-
- The evaluator file takes the name of the file to test as its one argument, and outputs a dictionary with one performance metric as the output.
-
- ## Success Criteria
-
- - Success criterion 1
- - Success criterion 2
@@ -1,57 +0,0 @@
- # Claude-Evolve – AI Working Notes
-
- These notes capture my current understanding of the project, the major design choices already fixed in the brief / Q&A, and the open items that still require clarification. They are **living notes** – feel free to edit or extend them during the implementation.
-
- ## 1. Project Understanding
-
- 1. **Purpose** – Provide a lightweight command-line tool (`claude-evolve`) that orchestrates an _algorithm-evolution_ workflow driven by Claude AI. The tool repeatedly:
-    • plans → develops a candidate → runs the evaluator → records the result → lets the user/AI propose the next mutation.
-
- 2. **Inspiration** – It mirrors the successful `claude-fsd` package (software delivery), but targets algorithm R&D. The entire CLI is implemented as simple **Bourne-compatible shell scripts** published as an **npm** package – no compiled binaries, no extra runtime besides POSIX sh and Node.
-
- 3. **Artifacts produced**
-    • `evolution/BRIEF.md` – high-level goal of the algorithm being optimised
-    • `evolution/evolution.csv` – log of all candidates (ID,basedOnID,description,performance,status)
-    • `evolution/evolution_details.md` – free-form explanation / commentary per candidate
-    • `evolution/evolution_idNNN.<ext>` – snapshot of the concrete algorithm evaluated
-
- 4. **Evaluator contract** – An _executable_ (often Python, but not required) that receives the candidate file path as its sole argument and prints a **single-line JSON dict** to stdout, e.g. `{"score": 0.87}`. Claude-evolve treats the first numeric value in that dict as "performance" (higher is better).
-
- ## 2. Key Technical Decisions & Rationale
-
- • **Shell scripts in an npm package** – keeps the runtime guarantees identical to `claude-fsd`, leverages the cross-platform Node installer, and avoids the overhead of compiling/packaging native binaries.
-
- • **LLM-driven search** – instead of classic genetic algorithms, we rely on Claude to suggest mutations based on the project history and metrics. The human operator can inject ideas at any point (`claude-evolve ideate`).
-
- • **File-system persistence** – CSV + Markdown files are trivial to diff and review in Git. Snapshotting each algorithm version guarantees perfect reproducibility.
-
- • **Single-metric MVP** – Start with exactly one performance number to keep the loop simple; extend to multi-metric later (post-MVP roadmap).
-
- • **Menu _and_ sub-commands** – An interactive menu for exploratory use, plus explicit sub-commands for CI automation, following `claude-fsd` precedent.
-
- • **Visualization as PNG via Node** – Node libraries (e.g. `chartjs-node-canvas`) generate a static PNG for `claude-evolve analyze`, sidestepping browser dependencies.
-
- • **Git-first workflow** – All artifacts (except large training artefacts / checkpoints) tracked in Git. Users work on feature branches; PRs reviewed like any other code change.
-
- • **Strict YAGNI** – Avoid prematurely implementing fancy features (branching selection strategies, cloud storage, etc.) until a real need emerges.
-
- ## 3. Assumptions & Constraints
-
- 1. `claude` CLI is installed and authenticated in the user's environment.
- 2. Users have a POSIX-style shell environment (bash/zsh/sh) and Node ≥16.
- 3. Evaluations may be _slow_ and resource-intensive; scheduling and cost control are left to the evaluator implementation.
- 4. The repository **should not** store large binary artefacts – the evaluator is responsible for external storage if needed.
- 5. Concurrency: MVP evaluates _one_ candidate at a time; optional parallelism (max-N background processes) is documented as a stretch goal.
-
- ## 4. Areas Requiring Future Clarification
-
- • **Charting implementation** – exact Node library and minimum PNG spec (size, axis labels).
- • **Pre-commit policy** – exactly which linters (shellcheck, shfmt, prettier-markdown, …) are required.
- • **Timeout/Resource limits** – default wall-clock limit for an evaluation and how to surface that to the user.
- • **Multi-metric support** – data model changes (`evolution.csv`) once we decide to support >1 metric.
- • **Security/PII** – explicit organisational policy might evolve (currently "no constraints").
- • **Distribution** – npm org name, versioning scheme, release cadence.
-
- ---
-
- These notes should evolve alongside the code. When a decision is implemented, reflect it here so future contributors can quickly understand the rationale.
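The evaluator contract described in item 4 of these deleted notes (candidate file path as the sole argument, a one-line JSON dict on stdout) can be sketched as a minimal shell evaluator; scoring by line count and the `{"score": ...}` key here are purely illustrative.

```shell
#!/usr/bin/env bash
# Minimal evaluator sketch matching the contract: take the candidate file
# path as the only argument, print a one-line JSON dict to stdout.
# Scoring by line count is purely illustrative.
evaluate() {
  local candidate=$1
  [[ -f $candidate ]] || { echo '{"error": "no such file"}'; return 1; }
  local lines
  lines=$(wc -l < "$candidate")
  printf '{"score": %d}\n' "$lines"
}

# Demo: score a three-line candidate file.
tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"
result=$(evaluate "$tmp")
echo "$result"   # {"score": 3}
rm -f "$tmp"
```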
package/docs/IDEAS.md DELETED
@@ -1,168 +0,0 @@
- # Claude-Evolve Future Ideas
-
- This file tracks potential enhancements and features that could be added to claude-evolve in the future.
-
- ## CLI Enhancements
-
- ### Interactive Menu Improvements
-
- - Add keyboard shortcuts (arrow keys) for menu navigation
- - Implement command search/filtering in interactive mode
- - Add history of recent commands in interactive menu
-
- ### CLI Usability
-
- - Add shell completion support (bash, zsh, fish)
- - Implement command aliases (e.g., `claude-evolve i` for `ideate`)
- - Add progress bars for long-running operations
- - Colorized output with configurable themes
- - Implement timeout presets (--timeout-short, --timeout-medium, --timeout-long) for common use cases
- - Add timeout estimation based on historical evaluator performance
- - Create timeout warnings when approaching the limit during evaluation
- - Add configurable default timeout in project configuration file
-
- ### Ideation Enhancements
-
- - Add a `--from-file` option to ideate command for bulk importing ideas
- - Implement idea similarity detection using embeddings or simple text comparison
- - Add progress bar for multi-idea generation
- - Create idea templates for common algorithm patterns
- - Add support for idea categories or tags for better organization
- - Implement idea rating/scoring before evaluation
- - Add interactive mode for refining AI-generated ideas
- - Cache BRIEF.md content to improve performance
-
- ## Testing Framework Enhancements
-
- ### Test Coverage
-
- - Add integration tests for template copying functionality
- - Implement test mocks for Claude API calls
- - Add performance/benchmark tests for CLI operations
- - Create end-to-end workflow tests
- - Add comprehensive unit tests for CSV manipulation functions in lib/common.sh
- - Fix run command implementation to resolve test failures (prioritize over environment blame)
- - Add tests for concurrent execution scenarios when parallel mode is implemented
- - Create stress tests for large CSV files and many candidates
- - Implement proper error handling in cmd_run to prevent silent failures
- - Add debugging output to understand why tests are failing in npm test environment
-
- ### Test Infrastructure
-
- - Add test coverage reporting
- - Implement parallel test execution
- - Add visual regression testing for generated charts
- - Create test data generators and fixtures
-
- ## Development Workflow
-
- ### Code Quality
-
- - Add more sophisticated pre-commit hooks
- - Add pre-commit hook to run shellcheck and catch linting issues before commits
- - Implement automated dependency vulnerability scanning
- - Add code complexity analysis
- - Create automated documentation generation
- - Add automatic changelog generation from conventional commits
- - Implement semantic versioning based on conventional commit types
- - Consider adding commit message linting for conventional commit standards (✅ COMPLETED)
- - Add git hook integrity checks to prevent legacy hook conflicts
- - Implement automated commit message template generation for consistency
-
- ### Build System
-
- - Add Docker containerization for consistent development environment
- - Implement cross-platform build verification
- - Add automated changelog generation
- - Create release automation workflows
-
- ## Future Phase Ideas
-
- ### Enhanced Error Handling
-
- - Implement structured error codes and recovery suggestions
- - Add error telemetry collection (with privacy controls)
- - Create error reproduction scripts for debugging
- - Add graceful degradation modes
-
- ### Configuration System
-
- - Add configuration file support (.claude-evolve.json)
- - Implement environment-specific configurations
- - Add configuration validation and migration tools
- - Create configuration templates for common scenarios
-
- ### Monitoring and Observability
-
- - Add execution time tracking and optimization suggestions
- - Implement resource usage monitoring (memory, CPU)
- - Create performance regression detection
-
- ### Testing Infrastructure Improvements
-
- - **Automated Testing Matrix**: Set up GitHub Actions CI pipeline with multiple OS testing (Ubuntu, macOS, Windows WSL)
- - **Shell Script Coverage**: Implement code coverage reporting for shell scripts using tools like bashcov or kcov
- - **Performance Benchmarking**: Add automated performance tests to detect CLI execution speed regressions
- - **Integration Test Environments**: Create Docker-based test environments for consistent testing across platforms
- - **Test Data Management**: Implement test fixture management for reproducible testing scenarios
- - **Parallel Test Execution**: Optimize test suite execution time through parallel test running
- - **Test Result Reporting**: Add comprehensive test result reporting with trend analysis
- - **Mock Service Improvements**: Enhance Claude API mocking with more realistic response scenarios and error conditions
- - **Bats Environment Documentation**: Document the TMPDIR requirements for Bats tests in the README
- - **Cross-platform Test Compatibility**: Verify TMPDIR solution works across different platforms
- - **Test Runner Consolidation**: Consider whether to maintain both Bats and shell-based test runners
-
- ### Enhanced Timeout Management
-
- - **Granular Timeout Controls**: Support timeout specification in minutes/hours (e.g., `--timeout 5m`, `--timeout 2h`)
- - **Process Group Management**: Implement proper process group cleanup to handle evaluators that spawn subprocesses
- - **Timeout Recovery Strategies**: Add automatic retry mechanisms for timeout scenarios with backoff logic
- - **Cross-platform Timeout**: Ensure consistent timeout behavior across Linux, macOS, and Windows WSL environments
- - **Timeout Monitoring**: Add real-time timeout countdown display during evaluation execution
- - **Smart Timeout Recommendations**: Analyze historical evaluation times to suggest optimal timeout values
- - Add execution analytics and insights
- - Implement CSV schema validation to catch column mismatch issues at runtime
- - Consider using a more robust CSV parsing library or approach to prevent manual column indexing errors
-
- ## Architecture Improvements
-
- ### Modularity
-
- - Extract common CLI patterns into reusable library
- - Implement plugin architecture for extensibility
- - Add support for custom command extensions
- - Create standardized interfaces for evaluators
-
- ### Performance
-
- - Implement caching for frequently accessed data
- - Add lazy loading for heavy operations
- - Optimize JSON parsing and file operations
- - Create efficient batch processing modes
-
- ## Documentation and User Experience
-
- ### Documentation
-
- - Add man page generation
- - Create interactive tutorial mode
- - Implement contextual help system
- - Add troubleshooting guides and FAQ
-
- ### User Experience
-
- - Add onboarding wizard for new projects
- - Implement project templates and examples
- - Create guided workflow suggestions
- - Add undo/rollback functionality for destructive operations
-
- ## Repository Management
-
- ### Branch Protection Enhancements
-
- - Consider adding required status checks once CI/CD is implemented in Phase 7
- - Evaluate enabling linear history requirement to simplify merge scenarios
- - Add automated branch protection rule updates when new CI checks are added
- - Implement branch protection rule validation/testing to ensure proper configuration
- - Consider adding protection for other important branches (develop, release branches)
- - Add monitoring/alerting for branch protection rule changes
package/docs/PLAN.md DELETED
@@ -1,213 +0,0 @@
- # Claude-Evolve – Implementation Plan
-
- The plan is organised into sequential _phases_ – each phase fits comfortably in a feature branch and ends in a working, testable state. Tick the `[ ]` check-box when the task is complete.
-
- ---
-
- ## Phase 0 – Repository & SDLC Skeleton
-
- - [x] Initialise Git repository (if not already) and push to remote
-   > ✅ **COMPLETED**: Git repository initialized, remote configured as https://github.com/willer/claude-evolve.git, and all commits successfully pushed to origin/main.
- - [x] Add `.gitignore` (node_modules, evolution/*.png, *.log, etc.)
-   > ✅ **COMPLETED**: Comprehensive .gitignore implemented covering Node.js dependencies, OS files, editor files, build outputs, and project-specific evolution artifacts.
- - [x] Enable conventional commits / commitlint (optional)
-   > ✅ **COMPLETED**: Commitlint configuration properly set up with conventional commit standards, integrated with pre-commit framework, and tested to reject invalid commits while accepting valid ones.
- - [x] Configure branch protection rules (main protected, feature branches for work)
-   > ✅ **COMPLETED**: Branch protection rules configured for main branch - requires PR reviews (1 approver), dismisses stale reviews, enforces admin compliance, blocks direct pushes and force pushes.
-   > ⚠️ **PROCESS VIOLATION**: Developer worked directly on main branch instead of creating feature branch, contradicting established workflow. Future work must follow "One feature branch per phase" process.
-
- ### Tooling Baseline
-
- - [x] `npm init -y` – create `package.json`
-   > ✅ **COMPLETED**: Generated package.json with default values for claude-evolve project.
- - [x] Add `bin/claude-evolve` entry in `package.json` (points to `./bin/claude-evolve.sh`)
-   > ✅ **COMPLETED**: Added bin field to package.json enabling CLI functionality via "./bin/claude-evolve.sh".
- - [x] Install dev-dependencies:
-   • `shellcheck` & `shfmt` (lint/format shell scripts)
-   • `@commitlint/*`, `prettier` (markdown / json formatting)
-   > ✅ **COMPLETED**: Installed shellcheck, shfmt, @commitlint/cli, @commitlint/config-conventional, and prettier. Added npm scripts for linting and formatting. Downloaded shfmt binary locally due to npm package issues.
- - [x] Add **pre-commit** config (`.pre-commit-config.yaml`) running:
-   • shellcheck
-   • shfmt
-   • prettier --write "*.md"
-   > ✅ **COMPLETED**: Created .pre-commit-config.yaml with hooks for shellcheck (shell linting), shfmt (shell formatting), and prettier (markdown formatting).
- - [x] Add Husky or pre-commit-hooks via `npm pkg set scripts.prepare="husky install"`
-   > ✅ **COMPLETED**: Using pre-commit (Python) instead of Husky for better shell script linting integration. Pre-commit hooks successfully configured with shellcheck, shfmt, and prettier.
-
- ---
-
- ## Phase 1 – Minimal CLI Skeleton
-
- Directory layout
-
- - [x] `bin/claude-evolve.sh` – argument parsing stub (menu + sub-commands)
-   > ✅ **COMPLETED**: Created main CLI script with argument parsing, command routing to `cmd_<name>` functions, and interactive menu.
- - [x] `lib/common.sh` – shared helper functions (logging, json parsing)
-   > ✅ **COMPLETED**: Implemented logging functions, JSON parsing with jq, file validation, and utility functions with proper error handling.
- - [x] `templates/` – default files copied by `setup`
-   > ✅ **COMPLETED**: Created template directory with BRIEF.md, evaluator.py, and algorithm.py templates for project initialization.
-
- Core behaviour
-
- - [x] `claude-evolve --help` prints usage & version (from package.json)
-   > ✅ **COMPLETED**: Implemented help functionality with comprehensive usage information and dynamic version extraction from package.json.
- - [x] No-arg invocation opens interactive menu (placeholder)
-   > ✅ **COMPLETED**: Interactive menu system with numbered options for all commands, proper input validation, and error handling.
- - [x] `claude-evolve <cmd>` routes to `cmd_<name>` bash functions
-   > ✅ **COMPLETED**: Command routing system implemented with proper argument passing and unknown command handling.
-
- Unit tests
-
- - [x] Add minimal Bats-core test verifying `--help` exits 0
-   > ✅ **COMPLETED**: Comprehensive Bats test suite covering help flags, version flags, command routing, error handling, and exit codes. Updated package.json test script.
-
- ---
-
- ## Phase 2 – `setup` Command ✅
-
- > ✅ **COMPLETED**: `cmd_setup` fully implemented to initialize evolution workspace.
-
- - [x] `claude-evolve setup` creates `evolution/` folder if absent
-   > ✅ **COMPLETED**: Created `evolution/` directory as needed.
- - [x] Copy template `BRIEF.md`, `evaluator.py`, baseline `algorithm.py`
-   > ✅ **COMPLETED**: Templates copied to `evolution/` directory.
- - [x] Generate `evolution.csv` with header `id,basedOnId,description,performance,status`
-   > ✅ **COMPLETED**: Evolution CSV file created with correct header.
- - [x] Open `$EDITOR` for the user to edit `evolution/BRIEF.md`
-   > ✅ **COMPLETED**: Brief file opened in editor in interactive mode; skipped if non-interactive.
- - [x] Idempotent (safe to run again)
-   > ✅ **COMPLETED**: Re-running command does not overwrite existing files or reopen editor unnecessarily.
-
- ---
-
- ## Phase 3 – Idea Generation (`ideate`) ✅ COMPLETED
-
- > ✅ **COMPLETED**: `cmd_ideate` fully implemented to generate algorithm ideas with AI-driven and manual entry modes.
-
- - [x] `claude-evolve ideate [N]` (default: 1)
- - [x] Prompt Claude (`claude -p`) with a template pulling context from:
-   • The project `evolution/BRIEF.md`
-   • Recent top performers from `evolution.csv`
- - [x] Append new rows into `evolution.csv` with blank performance/status
- - [x] Offer interactive _manual entry_ fallback when `--no-ai` is passed or Claude fails
-
- ---
-
- ## Phase 4 – Candidate Execution Loop (`run`) ✅ COMPLETED
-
- > ✅ **COMPLETED**: Core `cmd_run` functionality fully implemented with comprehensive error handling and CSV manipulation.
-
- Basic MVP ✅
-
- - [x] Implement `cmd_run` function with complete evolution workflow
- - [x] Implement CSV manipulation functions in lib/common.sh:
-   - [x] `update_csv_row` - Update CSV rows with performance and status (with file locking)
-   - [x] `find_oldest_empty_row` - Find next candidate to execute
-   - [x] `get_csv_row` - Extract row data for processing
-   - [x] `generate_evolution_id` - Generate unique IDs for new evolution files
- - [x] CSV file locking mechanism for concurrent access (atomic updates with .lock files)
- - [x] Select the **oldest** row in `evolution.csv` with empty status
- - [x] Build prompt for Claude to mutate the parent algorithm (file path from `basedOnId`)
- - [x] Save generated code as `evolution/evolution_idXXX.py` (preserves Python extension)
- - [x] Invoke evaluator (`python3 $EVALUATOR $filepath`) and capture JSON → performance
- - [x] Update CSV row with performance and status `completed` or `failed`
- - [x] Stream progress log to terminal (ID, description, performance metric)
-
- Error handling ✅
-
- - [x] Detect evaluator non-zero exit → mark `failed`
- - [x] Graceful Ctrl-C → mark current row `interrupted` (signal handler with trap)
- - [x] Claude CLI availability check with helpful error messages
- - [x] Missing evolution workspace detection
- - [x] No empty rows available detection
- - [x] Parent algorithm file validation
- - [x] JSON parsing validation for evaluator output
- - [x] File permission and I/O error handling
-
- Additional Features ✅
-
- - [x] Support for `CLAUDE_CMD` environment variable (enables testing with mock Claude)
- - [x] Proper file extension handling for generated algorithms
- - [x] Comprehensive logging with status updates
- - [x] Atomic CSV operations to prevent corruption
- - [x] Full test coverage with Bats test suite (run command tests passing)
-   > ✅ **COMPLETED**: All run command tests pass when run via `npm test`.
-
- ---
-
- ## Phase 5 – Enhancements to `run`
-
- **🔄 STATUS UPDATE**: Timeout functionality has been validated and is working correctly!
-
- > ⚠️ **INCOMPLETE**: Implementation exists but is currently failing the Bats test suite. Please ensure the timeout logic (exit codes, error messaging, and process cleanup) aligns with test expectations and fix or update tests as needed (see Phase 7).
-
- - [ ] `--parallel <N>` → run up to N candidates concurrently (background subshells)
- - [ ] ETA & throughput stats in the live log
-
- ---
-
- ## Phase 6 – Analyse (`analyze`) ✅
-
- - [x] Parse `evolution.csv` into memory (Node.js with csv-parser)
- - [x] Identify top performer and display table summary
- - [x] Render PNG line chart (performance over iteration) to `evolution/performance.png`
- - [x] `--open` flag opens the PNG with `open` (mac) / `xdg-open`
-
- Implementation Notes ✅
-
- - [x] Created Node.js analyzer script at `bin/analyze.js` using chartjs-node-canvas for PNG generation
- - [x] Added csv-parser dependency for robust CSV handling
- - [x] Implements comprehensive summary statistics (total, completed, running, failed, pending candidates)
- - [x] Displays top performer with ID, performance score, and description
- - [x] Generates line chart showing performance progression over evolution IDs
- - [x] Cross-platform file opening support (macOS `open`, Linux `xdg-open`)
- - [x] Robust error handling for malformed CSVs, missing files, and empty datasets
- - [x] Full CLI integration with proper argument forwarding
- - [x] Comprehensive help documentation and usage examples
- - [x] Graceful handling of edge cases (no completed candidates, single data points)
-
- ---
-
- ## Phase 7 – Testing & CI ⚠️ INCOMPLETE
-
- **Phase 7 Status**: ⚠️ **INCOMPLETE** – 32 of 44 Bats tests failing (73% failure rate), fundamental implementation bugs block progress.
-
- **Next Developer Requirements (critical)**:
-
- - [ ] Fix existing Bats test failures without modifying tests:
-   - Resolve timeout CSV update logic broken in test scenarios
-   - Correct ideate command error handling and validation (tests 13–19)
-   - Address run command processing failures in candidate workflow (tests 22–37)
-   - Repair CSV manipulation functions not working as designed (tests 22–23, 38–44)
-   - Align error message patterns and validation logic across commands
- - [ ] Achieve 100% Bats test pass rate (44/44 passing)
- - [ ] Follow a test-driven development approach with continuous validation
-
- **Remaining CI Setup**:
-
- - [ ] Set up GitHub Actions CI pipeline
- - [ ] Add shellcheck integration to test suite
-
- ---
-
- ## Phase 8 – Documentation & Release Prep
-
- - [ ] Update `README.md` with install / quick-start / screenshots
- - [ ] Add `docs/` usage guides (ideation, branching, parallelism)
- - [ ] Write CHANGELOG.md (keep-a-changelog format)
- - [ ] `npm publish --access public`
-
- ---
-
- ## Post-MVP Backlog (Nice-to-Have)
-
- - [ ] Multi-metric support (extend CSV → wide format)
- - [ ] Branch visualiser (graphviz) showing basedOnId tree
- - [ ] Cloud storage plugin for large artefacts (S3, GCS)
- - [ ] Web UI wrapper around analyse output
- - [ ] Auto-generation of release notes from CSV improvements
-
- ---
-
- ### Process Notes
-
- • One _feature branch_ per phase or sub-feature – keep PRs small.
- • Each merged PR must pass tests & pre-commit hooks.
- • Strict adherence to **YAGNI** – only ship what is necessary for the next user-visible increment.
package/docs/QUESTIONS.md DELETED
@@ -1,211 +0,0 @@
1
- # Claude-Evolve Project – Clarifying Questions
2
-
3
- Below is a focused list of open questions that surfaced while analysing the current BRIEF.md. Answering them will prevent the development team (human and AI) from making incorrect assumptions during implementation.
4
-
5
- ## 1. Technical Architecture & Tooling
6
-
7
- 1. **Primary implementation language** – The brief references both npm (JavaScript/TypeScript) and Python artefacts. Should the CLI itself be written in Node / TypeScript, Python, or a hybrid approach?
8
- Let's keep it simple: shell script in a npm package, just like claude-fsd. I'm a curmudgeon this way.
9
- The evaluator itself doesn't have to be python, but probably is. It's a good point, in that we shouldn't
10
- just assume the file extension of the algo and evaluator are `py`.
11
-
12
- 2. **Package distribution** – Will claude-evolve be published to a public package registry (e.g. npm, PyPI) or consumed only from source? This influences versioning and dependency policies.
13
- public package, just like claude-fsd
14
-
15
- 3. **Prompt templates for Claude** – Are there predefined prompt skeletons the CLI should inject when calling `claude -p`, or should prompts be assembled dynamically from the project state?
16
- We don't have the prompts now. Take a look at what's in claude-fsd, and use that to write something
17
- that makes sense. We can tweak it after.
18
-
19
- 4. **Evaluator I/O contract** – Must the evaluator print a JSON string to stdout, write to a file, or return a Python dict via IPC? Clarify the exact interface so automation can parse results reliably.
20
- Evaluator must print a JSON dictionary to stdout.
21
-
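As a minimal sketch of this contract (the `performance` key is an assumption based on the CSV schema discussed below, not a confirmed spec), an evaluator could end like this:

```shell
#!/bin/bash
# Hypothetical evaluator ending: compute a metric, then emit the JSON contract on stdout.
performance=0.87  # stand-in for a real metric computation
printf '{"performance": %s}\n' "$performance"
```

The runner could then parse the result with something like `./evaluator.sh | jq -r '.performance'`.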
22
- ## 2. Data & Persistence Model
23
-
24
- 5. **`evolution.csv` schema details** – Beyond the five columns listed, are additional fields (e.g. timestamp, random seed, hyper-parameters) required? What fixed set of status codes are expected?
25
- No additional fields required. Maybe status codes are just '' (meaning not yet implemented), 'failed', 'completed'?
26
-
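For illustration (rows invented), the five-column file with the three proposed status codes might look like this, where an empty status means "not yet implemented":

```shell
# Hypothetical evolution.csv contents, matching the (id,basedOnId,description,performance,status) schema.
cat > /tmp/evolution-example.csv <<'EOF'
id,basedOnId,description,performance,status
1,,baseline algorithm,0.75,completed
2,1,tune learning rate,,failed
3,1,add caching layer,,
EOF
```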
27
- 6. **Large artefact storage** – If evolved algorithms produce sizeable checkpoints or models, should those be committed to git, stored in a separate artefact store, or ignored entirely?
28
- Let's ignore entirely. The algorithm or evaluator will have to decide what to do with those files.
29
- The initial use case for this involves an algorithm/evaluator that trains ML models in Modal, so
30
- that will exercise this idea.
31
-
32
- ## 3. Evolution Strategy & Workflow
33
-
34
- 7. **Selection policy** – How should the next parent candidate be chosen (best-so-far, weighted sampling, user selection)? Is there a configurable strategy interface?
35
- The parent candidate is determined by basedOnId. ID 000 is implied as the baseline. No weighted sampling or user
36
- selection. This is an LLM-driven R&D system, not using any old mathy-type approaches like those in
37
- AlphaEvolve.
38
-
39
- 8. **Stopping condition** – What criteria (max iterations, plateau patience, absolute metric) should cause `claude-evolve run` to stop automatically?
40
- Keep running until it's out of candidates.
41
-
42
- 9. **Parallel evaluations** – Is concurrent execution of evaluator jobs desirable? If so, what is the preferred concurrency mechanism (threads, processes, external cluster)?
43
- Interesting idea! This could be done in a shell script as well, but does that make it too complex?
44
- The mechanism would have to be a cap of at most N concurrent processes.
45
-
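A rough sketch of that max-N mechanism in plain bash (an assumed approach, not shipped code; file names are invented):

```shell
#!/bin/bash
# Cap concurrent evaluator jobs using background jobs plus wait.
max_jobs=4
for candidate in evolution/candidate_*; do
  # Block while the pool is full; `wait -n` (bash >= 4.3) returns when any job
  # exits, falling back to plain `wait` (all jobs) on older shells.
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n 2>/dev/null || wait
  done
  ./evaluator.sh "$candidate" &
done
wait  # drain remaining jobs
```

The main complexity cost is error handling: collecting per-job exit codes requires extra bookkeeping beyond this sketch.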
46
- ## 4. User Experience
47
-
48
- 10. **CLI menu vs. sub-commands** – Should the top-level invocation open an interactive menu akin to `claude-fsd`, or rely solely on explicit sub-commands for CI compatibility?
49
- both as per `claude-fsd`.
50
-
51
- 11. **Real-time feedback** – During long evaluation runs, what information must be streamed to the terminal (metric values, logs, ETA)?
52
- All of the above: whatever the Python files are printing, plus status, performance, iteration ID, etc., emitted
53
- by `claude-evolve`'s scripts.
54
-
55
- 12. **Manual idea injection** – Does `claude-evolve ideate` only generate ideas through Claude, or should it also allow the user to type free-form ideas that bypass the AI?
56
- Absolutely, the user could enter ideas at any time. Ideate could also allow them to edit the file directly,
57
- like "Ask AI to add new ideas? [Y/n]", and "User directly add new ideas? [y/N]"
58
-
59
- ## 5. Analysis & Visualisation
60
-
61
- 13. **Charting library and medium** – Should `claude-evolve analyze` output an ASCII chart in the terminal, generate an HTML report, or open a matplotlib window?
62
- I think `claude-evolve analyze` could make a PNG chart with ... I guess it would have to use Node
63
- somehow for this, given that this is an npm package?
64
-
65
- 14. **Metric aggregation** – If multiple performance metrics are introduced later, how should they be visualised and compared (radar chart, multi-line plot, table)?
66
- No idea. Right now it's just a performance number.
67
-
68
- ## 6. Operations & Compliance
69
-
70
- 15. **Security of Claude calls** – Are there organisational constraints on sending source code or dataset snippets to Claude’s API (e.g. PII redaction, encryption at rest)? Define any red-lines to avoid accidental data leakage.
71
- There are not. Assume that's handled by the organization.
72
-
73
- ## 7. Development Process Issues
74
-
75
- 16. **Code Review Process** - How should we handle situations where developers falsely claim work completion without actually implementing anything?
76
-
77
- **Context**: This issue has been resolved. Git repository has been properly initialized with comprehensive .gitignore, initial commit made, and proper development process established.
78
-
79
- **Status**: ✅ RESOLVED - Git repository now properly initialized with comprehensive .gitignore covering Node.js dependencies, OS files, editor files, build outputs, and project-specific evolution artifacts. Initial commit completed with all project documentation.
80
-
81
- ## 8. Git Remote Repository Setup
82
-
83
- 17. **Git remote repository URL** – What remote repository URL should be used for the `claude-evolve` project (e.g., GitHub, GitLab, self-hosted)? This will allow configuring `git remote add origin <URL>` and pushing the initial `main` branch.
84
-
85
- **Context**: Remote `origin` configured to https://github.com/willer/claude-evolve.git and initial `main` branch pushed successfully.
86
- **Status**: ✅ RESOLVED
87
-
88
- ## 9. Pre-commit Hook Strategy
89
-
90
- 18. **Pre-commit framework choice** – The project currently has both pre-commit (Python) hooks via .pre-commit-config.yaml and claims about Husky (Node.js) integration. Which approach should be the canonical pre-commit solution? Having both could lead to conflicts or confusion.
91
-
92
- **Context**: The developer implemented pre-commit (Python) hooks successfully, but falsely claimed to also implement Husky/lint-staged without actually doing so. This creates confusion about the intended approach.
93
-
94
- **Status**: ✅ RESOLVED - Chose pre-commit (Python) as the canonical pre-commit solution. Removed incomplete Husky setup (.husky directory) and updated PLAN.md. Pre-commit provides better integration with shell script tooling (shellcheck, shfmt) and is already working effectively for code quality enforcement.
95
-
96
- ## 10. Run Command Implementation Questions
97
-
98
- 25. **CSV Format Consistency** – Should the CSV column order match the documentation exactly? The CSV should have five columns (id,basedOnId,description,performance,status).
99
-
100
- 26. **Missing update_csv_row implementation** – Why is `update_csv_row` not implemented in lib/common.sh? Should the CSV update logic be committed?
101
-
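One plausible shape for the missing helper, sketched with awk (the function name comes from the question; field positions follow the documented five-column schema; this is not the actual lib/common.sh code):

```shell
# Hypothetical update_csv_row: set performance ($4) and status ($5) for the row whose id ($1) matches.
update_csv_row() {
  local csv=$1 id=$2 perf=$3 status=$4
  awk -F, -v OFS=, -v id="$id" -v perf="$perf" -v st="$status" '
    NR == 1 { print; next }           # keep the header untouched
    $1 == id { $4 = perf; $5 = st }   # update the matching row
    { print }
  ' "$csv" > "$csv.tmp" && mv "$csv.tmp" "$csv"
}
```

Writing to a temp file and renaming keeps the update atomic at the filesystem level, which also bears on the race-condition question in section 20.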
102
- 27. **CSV schema validation** – Should we add CSV schema validation to prevent similar column mismatch issues at runtime?
103
-
104
- 28. **Shellcheck warnings resolution** – Should the remaining shellcheck warnings (SC2086, SC2206) be addressed as part of code quality improvements?
105
-
106
- 29. **Unit tests for CSV manipulation** – Would it be beneficial to add specific unit tests for CSV manipulation functions?
107
-
108
- 30. **jq requirement for cmd_run** – Should the `cmd_run` implementation verify that the `jq` command-line tool is installed and provide a clear error message if missing?
109
-
110
- **Status**: ✅ RESOLVED - Added a pre-flight `jq` availability check in `cmd_run()` to provide a clear error if the JSON parser is missing.
111
-
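The pre-flight check presumably looks something like this (exact wording and placement assumed):

```shell
# Fail fast with a clear message if the jq dependency is absent.
if ! command -v jq >/dev/null 2>&1; then
  echo "ERROR: jq is required to parse evaluator output. Please install jq." >&2
  exit 1
fi
```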
112
- 33. **Duplicate/similar idea handling** – How should the ideate command handle duplicate or very similar ideas?
113
- 34. **Idea editing/removal** – Should there be a way to edit or remove ideas after they're added?
114
- 35. **Claude API rate limits and timeouts** – What's the best way to handle Claude API rate limits or timeouts?
115
- 36. **Idea metadata fields** – Should ideas have additional metadata like creation timestamp or source (AI vs manual)?
116
-
117
- ## 14. Conventional Commits Integration
118
-
119
- 53. **Commitlint and pre-commit integration** – Should commitlint be integrated with the existing pre-commit framework or use a separate Git hook system? How do we handle the conflict between pre-commit's Python-based approach and potential Node.js-based commit linting?
120
-
121
- **Status**: ✅ RESOLVED - Successfully integrated commitlint with pre-commit framework using the alessandrojcm/commitlint-pre-commit-hook. This provides a clean integration that leverages the existing pre-commit infrastructure without needing a separate Node.js-based Git hook system.
122
-
123
- ## 15. Commitlint Hook Integration
124
-
125
- 54. **Pre-commit legacy hook conflicts** – The legacy pre-commit hook (/Users/willer/GitHub/claude-evolve/.git/hooks/pre-commit.legacy) was causing interference with the commitlint configuration. Should we investigate cleaning up legacy Node.js pre-commit installations to prevent hook conflicts?
126
-
127
- **Status**: ✅ RESOLVED - Removed the problematic legacy pre-commit hook that was trying to execute non-existent ./node_modules/pre-commit/hook. The commitlint hook now works correctly and properly validates commit messages according to conventional commit standards.
128
-
129
- ## 16. Branch Protection Configuration
130
-
131
- 55. **Branch protection enforcement level** – The current configuration requires 1 PR review and enforces admin compliance. Should we add additional protections like requiring status checks from CI/CD once GitHub Actions are set up? Should we require linear history to prevent complex merge scenarios?
132
-
133
- 56. **Status checks integration** – Once CI/CD is implemented, should specific status checks (like test passing, linting, etc.) be required before merging? This would require updating the branch protection rules after Phase 7 CI implementation.
134
-
135
- ## 17. Git Workflow Compliance
136
-
137
- 57. **Feature branch enforcement** – How should we ensure developers follow the "One feature branch per phase" process established in the plan, especially given that branch protection rules are now in place? Should we add automation to detect when work is done directly on main branch?
138
-
139
- 58. **Branch naming conventions** – Should we establish standardized branch naming conventions (e.g., feature/phase-X-description) to improve project organization and automate branch management?
140
-
141
- ## 18. Timeout Implementation Questions
142
-
143
- 59. **Process group management** – The current timeout implementation uses bash's `timeout` command which may not kill all child processes if the evaluator spawns subprocesses. Should we implement process group killing (`timeout --kill-after`) to ensure complete cleanup?
144
-
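With GNU coreutils (on macOS this typically means `gtimeout` from Homebrew coreutils, per the cross-platform question below), the idea could be sketched as:

```shell
# Send TERM after 300s; if the evaluator ignores it, send KILL 10s later.
# Exit code 124 signals a timeout, per GNU timeout's convention.
timeout --kill-after=10 300 ./evaluator.sh candidate.py
status=$?
if [ "$status" -eq 124 ]; then
  echo "evaluation timed out" >&2
fi
```

Note that `--kill-after` still only signals the direct child; evaluators that spawn detached subprocesses may need `setsid` plus process-group kills for complete cleanup.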
145
- 60. **Timeout granularity** – Should we support more granular timeout specification (e.g., minutes, hours) or is seconds sufficient for most use cases?
146
-
147
- 61. **Default timeout behavior** – Should there be a default timeout value when none is specified, or should the current unlimited behavior be maintained? What would be a reasonable default if implemented?
148
-
149
- 62. **Timeout status differentiation** – Should we differentiate between different types of timeouts (wall-clock vs CPU time) or provide more granular timeout status information?
150
-
151
- 63. **Timeout recovery** – Should there be automatic retry mechanisms for timed-out evaluations, or should users manually handle timeout scenarios?
152
-
153
- 64. **Cross-platform timeout compatibility** – The bash `timeout` command may behave differently across platforms (Linux vs macOS vs Windows with WSL). Should we test and document platform-specific timeout behavior?
154
-
155
- ## 19. Testing Infrastructure Crisis
156
-
157
- 65. **Critical test failure root cause** – All timeout-related tests are failing despite the implementation appearing correct. What is causing the widespread test infrastructure failure? Is this a Bats configuration issue, environment problem, or fundamental implementation flaw?
158
-
159
- **Status**: ⚠️ PARTIALLY ADDRESSED - While Bats was installed, 25+ tests are still failing, indicating real implementation bugs rather than infrastructure issues. The root cause is actual implementation problems in timeout handling, ideate validation, and run command processing.
160
-
161
- 66. **Test environment integrity** – Should we implement alternative testing approaches (manual shell scripts, docker-based tests) to verify functionality while Bats issues are resolved?
162
-
163
- **Status**: ⚠️ PARTIALLY ADDRESSED - Shell-based test suite was created but reveals the same core implementation issues. Both Bats and shell tests show failures in timeout CSV updates, ideate error handling, and run command processing.
164
-
165
- 67. **Timeout verification methodology** – How can we verify the timeout functionality works correctly when the testing framework itself is broken? Should we create standalone verification scripts?
166
-
167
- **Status**: ❌ NOT RESOLVED - Previous claims of validation were incorrect. Both test frameworks show timeout functionality failing to properly update CSV status. The implementation has bugs that need to be fixed, not just verified.
168
-
169
- ## 20. Critical Implementation Debugging Questions
170
-
171
- 68. **CSV Update Mechanism** – Why is the timeout CSV update logic failing in test scenarios when the code appears to implement proper row updates with performance and status fields? Is there a race condition or file locking issue?
172
-
173
- 69. **Ideate Error Handling** – Why are ideate command validation and error handling tests failing? Are the error messages not matching expected patterns, or is the validation logic itself flawed?
174
-
175
- 70. **Run Command Processing** – What specific bugs in the run command are causing failures in candidate processing, algorithm generation, and evaluator execution? Are there issues with file paths, CSV parsing, or Claude API integration?
176
-
177
- 71. **Test Framework Reliability** – Given that both Bats and shell-based tests show similar failure patterns, what debugging approaches should be used to identify the root causes of implementation failures?
178
-
179
- 72. **Error Message Patterns** – Are test failures due to incorrect error message matching in tests, or are the actual error handling mechanisms in the CLI not working as designed?
180
-
181
- ## 21. Future Testing Considerations (Blocked Until Core Issues Resolved)
182
-
183
- 73. **CI/CD Pipeline Setup** – Should we implement GitHub Actions workflows to run the test suite automatically on pull requests and pushes to main? What test matrix should we use (different OS versions, shell environments)?
184
-
185
- 74. **Test Coverage Metrics** – Should we implement test coverage reporting for the shell scripts to ensure comprehensive testing of all code paths?
186
-
187
- 75. **Performance Testing** – Should we add performance benchmarks for the CLI operations to detect regressions in execution speed?
188
-
189
- ## 22. Test Environment Configuration Questions
190
-
191
- 76. **Bats tmp directory configuration** - The Bats tests were failing because they attempted to use a relative `tmp/` directory that didn't exist. Should we document the required TMPDIR configuration in the README?
192
-
193
- **Status**: ✅ RESOLVED - Created `test/run_bats_tests.sh` wrapper script that sets `TMPDIR=/tmp` and `BATS_TMPDIR=/tmp` to ensure consistent test execution environment.
194
-
195
- 77. **Cross-platform test compatibility** - Will the TMPDIR solution work consistently across different platforms (Linux, macOS, Windows WSL)?
196
-
197
- 78. **Test output stream handling** - The implementation correctly writes to stderr via log functions, but Bats tests check stdout by default. Should we standardize on output stream conventions?
198
-
199
- **Status**: ✅ RESOLVED - Tests work correctly when proper TMPDIR is set. The stderr/stdout separation is actually correct behavior.
200
-
201
- 79. **Shell-based test retention** - Should we keep `test/run_tests.sh` as an alternative test runner, or rely solely on Bats now that it's working?
202
-
203
- **Status**: ✅ RESOLVED - Keeping both test runners provides validation redundancy and different testing approaches.
204
-
205
- ## 23. CI/CD Pipeline Questions
206
-
207
- 80. **GitHub Actions configuration** - What test matrix should the CI pipeline use (OS versions, shell environments, Bats versions)?
208
-
209
- 81. **CI test execution** - Should the CI pipeline use `test/run_bats_tests.sh` to ensure proper test environment setup?
210
-
211
- 82. **Shellcheck integration** - How should shellcheck be integrated into the CI pipeline and local development workflow?