helix-evolve 0.1.0.tar.gz
- helix_evolve-0.1.0/.gitignore +62 -0
- helix_evolve-0.1.0/PKG-INFO +558 -0
- helix_evolve-0.1.0/README.md +548 -0
- helix_evolve-0.1.0/helix_cli/__init__.py +3 -0
- helix_evolve-0.1.0/helix_cli/commands/__init__.py +0 -0
- helix_evolve-0.1.0/helix_cli/commands/accept.py +71 -0
- helix_evolve-0.1.0/helix_cli/commands/evolve.py +177 -0
- helix_evolve-0.1.0/helix_cli/commands/init.py +67 -0
- helix_evolve-0.1.0/helix_cli/commands/list.py +46 -0
- helix_evolve-0.1.0/helix_cli/commands/results.py +57 -0
- helix_evolve-0.1.0/helix_cli/commands/show.py +111 -0
- helix_evolve-0.1.0/helix_cli/display/__init__.py +0 -0
- helix_evolve-0.1.0/helix_cli/display/progress.py +103 -0
- helix_evolve-0.1.0/helix_cli/display/tables.py +58 -0
- helix_evolve-0.1.0/helix_cli/main.py +24 -0
- helix_evolve-0.1.0/helix_cli/project/__init__.py +0 -0
- helix_evolve-0.1.0/helix_cli/project/discovery.py +63 -0
- helix_evolve-0.1.0/helix_cli/project/loader.py +157 -0
- helix_evolve-0.1.0/helix_cli/project/scaffold.py +95 -0
- helix_evolve-0.1.0/helix_cli/project/writer.py +82 -0
- helix_evolve-0.1.0/pyproject.toml +25 -0

@@ -0,0 +1,62 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg-info/
dist/
build/
*.egg

# Virtual environment
.venv/
venv/
env/

# IDE
.idea/
.vscode/
*.swp
*.swo

# Environment / secrets
.env
.env.*
!.env.example


# User data
/prompts/

# Testing
.pytest_cache/
.coverage
htmlcov/

# uv cache
.uv/

# Database
*.db
*.sqlite
*.sqlite3

# Claude Code
.claude/

# Certificates / keys
*.pem
*.key
*.cert
*.p12

# Node
node_modules/

# OS
.DS_Store
Thumbs.db

# Impeccable skills (local dev tool)
.agents/
skills-lock.json

@@ -0,0 +1,558 @@
Metadata-Version: 2.4
Name: helix-evolve
Version: 0.1.0
Summary: CLI for Helix evolutionary prompt optimization
Requires-Python: >=3.13
Requires-Dist: helix-engine>=0.1.0
Requires-Dist: rich>=14.0.0
Requires-Dist: typer>=0.16.0
Description-Content-Type: text/markdown

# Helix CLI

Standalone command-line tool for evolutionary prompt optimization. Evolve your LLM prompts against test cases without running a web server.

## Overview

The Helix CLI runs the same evolution engine as the web UI, but operates on local YAML files. Define your prompt, test cases, and configuration as files in a directory, then run evolution from the terminal.

Key features:

- **Standalone** -- no web server or database needed
- **YAML-based** -- human-readable prompt and test case definitions
- **Agent-friendly** -- every command supports `--json` for machine-readable output
- **Same engine** -- identical evolution pipeline to the web UI

## Installation

### Prerequisites

- Python 3.13+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip
- An API key for at least one LLM provider (Gemini, OpenAI, or OpenRouter)

### Install from source

```bash
git clone https://github.com/Onebu/helix.git
cd helix

# Install the core engine and CLI
uv pip install -e .       # core engine (api package)
uv pip install -e cli/    # CLI tool

# Verify installation
helix --help
```

If `helix` isn't on your PATH, use the full path to the venv binary:

```bash
.venv/bin/helix --help
```

### Configure API key

Create a `.env` file in your working directory (or set environment variables):

```bash
# Choose at least one provider
GENE_GEMINI_API_KEY=your-gemini-key
# GENE_OPENAI_API_KEY=your-openai-key
# GENE_OPENROUTER_API_KEY=your-openrouter-key
```

## Quick Start

```bash
# 1. Create a new prompt project
helix init customer-support

# 2. Edit the YAML files (see File Format below)
# - customer-support/prompt.yaml
# - customer-support/dataset.yaml
# - customer-support/config.yaml

# 3. Run evolution
helix evolve customer-support

# 4. Review the results
helix results customer-support

# 5. Accept the evolved template
helix accept customer-support
```

## Commands

### `helix init <prompt-id>`

Scaffold a new prompt directory with template YAML files.

```bash
helix init my-prompt
helix init my-prompt --dir /path/to/workspace
helix init my-prompt --json   # machine-readable output
```

Creates:

```
my-prompt/
  prompt.yaml    # prompt template and variable definitions
  dataset.yaml   # test cases for fitness evaluation
  config.yaml    # model, provider, and evolution settings
  results/       # evolution results (populated by helix evolve)
```

### `helix list`

List all prompt directories in the current workspace.

```bash
helix list
helix list --dir /path/to/workspace
helix list --json
```

Output:

```
Prompts
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
┃ ID               ┃ Purpose                ┃ Cases ┃ Runs ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
│ customer-support │ Handle support tickets │     8 │    3 │
│ summarizer       │ Summarize documents    │     5 │    1 │
└──────────────────┴────────────────────────┴───────┴──────┘
```

### `helix show <prompt-id>`

Display detailed information about a prompt: template preview, variables, dataset summary, and latest evolution result.

```bash
helix show customer-support
helix show customer-support --json
```

### `helix evolve <prompt-id>`

Run the evolution engine against your prompt and test cases. Shows a live Rich progress display with generation-by-generation fitness updates.

```bash
# Basic usage
helix evolve customer-support

# Override evolution parameters
helix evolve customer-support --generations 5 --islands 2 --budget 1.00

# JSON output (no live display, suitable for scripts/agents)
helix evolve customer-support --json
```

Options:

| Flag | Description |
|------|-------------|
| `--generations, -g` | Override number of generations |
| `--islands, -i` | Override number of islands |
| `--budget, -b` | Override budget cap in USD |
| `--json` | JSON output, no live display |

The live progress display shows:

```
╭──────────────── helix evolve customer-support ─────────────────╮
│ Generation 7/10 [████████████████████░░░░] 70%                 │
│                                                                │
│ Best Fitness   -0.50  (seed: -6.00, +5.50)                     │
│ Avg Fitness    -2.10                                           │
│ Candidates     140                                             │
│ Cost           $0.23 / $2.00                                   │
│ Elapsed        1m 42s                                          │
╰────────────────────────────────────────────────────────────────╯
```

Results are automatically saved to `results/run-NNN.yaml`.

### `helix results <prompt-id>`

Display evolution results. Shows fitness, cost, model info, and the best evolved template.

```bash
# Latest result
helix results customer-support

# Specific run
helix results customer-support --run run-001

# JSON output
helix results customer-support --json

# Print only the evolved template (useful for piping)
helix results customer-support --template
```

### `helix accept <prompt-id>`

Accept an evolved template by copying it back into `prompt.yaml`, replacing the current template.

```bash
helix accept customer-support
helix accept customer-support --run run-001   # accept a specific run
helix accept customer-support --json
```

### Global Options

Every command supports:

| Flag | Description |
|------|-------------|
| `--dir, -d` | Workspace directory (default: current directory) |
| `--json` | Machine-readable JSON output |

## File Format Reference

### `prompt.yaml`

Defines the prompt template, variables, and optional tool definitions.

```yaml
# Required fields
id: customer-support        # lowercase slug (a-z, 0-9, hyphens)
purpose: "Handle customer support tickets"

# The Jinja2 template. Variables use {{ variable_name }} syntax.
template: |
  You are a customer support agent for {{ company_name }}.

  Customer: {{ customer_name }}
  Issue: {{ issue_description }}

  Resolve the issue professionally and empathetically.
  Always offer a concrete next step.

# Variable definitions (optional).
# If omitted, variables are auto-extracted from the template.
# Define explicitly to set types, descriptions, and anchor status.
variables:
  - name: company_name
    description: "Company brand name"
    var_type: string        # string | number | boolean | object | array
    is_anchor: true         # anchored = preserved during evolution

  - name: customer_name
    description: "Customer's display name"
    var_type: string
    is_anchor: true

  - name: issue_description
    description: "Description of the customer's issue"
    var_type: string
    is_anchor: false

# Tool definitions (optional, OpenAI function-calling format)
tools:
  - type: function
    function:
      name: lookup_order
      description: "Look up a customer's order by ID"
      parameters:
        type: object
        properties:
          order_id:
            type: string
        required: [order_id]

  - type: function
    function:
      name: create_ticket
      description: "Create a support ticket"
      parameters:
        type: object
        properties:
          subject:
            type: string
          priority:
            type: string
            enum: [low, normal, high, urgent]
        required: [subject, priority]
```

**Anchor variables** are preserved during evolution. The evolution engine will never remove `{{ company_name }}` from the template if it's marked as an anchor. Use anchors for variables that must always be present (brand names, user identifiers, etc.).
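
When the `variables` block is omitted, the CLI auto-extracts variable names from the template. As a rough illustration of what that extraction involves (a sketch only, not the actual loader code, which may parse the full Jinja2 AST), a regex pass over `{{ name }}` placeholders looks like this:

```python
import re

def extract_variables(template: str) -> list[str]:
    """Collect unique {{ variable }} names in order of first appearance."""
    seen: dict[str, None] = {}
    for match in re.finditer(r"\{\{\s*(\w+)\s*\}\}", template):
        seen.setdefault(match.group(1))
    return list(seen)

template = """You are a customer support agent for {{ company_name }}.
Customer: {{ customer_name }}
Issue: {{ issue_description }}"""

print(extract_variables(template))
# → ['company_name', 'customer_name', 'issue_description']
```

A simple pass like this would miss variables used only inside Jinja2 control blocks, which is one reason to define `variables` explicitly.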

### `dataset.yaml`

Defines test cases that the evolution engine scores prompts against. Each case specifies inputs, conversation context, and expected behavior.

```yaml
cases:
  # --- Simple behavior test ---
  - name: "Greeting"
    tier: normal              # critical | normal | low
    tags: [greeting, basic]
    variables:
      company_name: "Acme Corp"
      customer_name: "Alice"
      issue_description: ""
    chat_history:
      - role: user
        content: "Hi, I need help with my order"
    expected_output:
      require_content: true   # must produce text (not just tool calls)
      behavior:               # LLM-judged criteria
        - "Greets the customer by name"
        - "Asks for order details or offers to look it up"

  # --- Tool call test ---
  - name: "Order lookup"
    tier: critical            # critical cases have 5x weight
    tags: [tool-call, order]
    variables:
      company_name: "Acme Corp"
      customer_name: "Bob"
      issue_description: "Where is my order #12345?"
    chat_history:
      - role: user
        content: "Where is my order #12345?"
    expected_output:
      match_args:             # expect a specific tool call
        tool_name: lookup_order
        tool_args:
          order_id: "12345"

  # --- Multi-turn conversation ---
  - name: "Escalation flow"
    tier: normal
    tags: [escalation, multi-turn]
    variables:
      company_name: "Acme Corp"
      customer_name: "Carol"
      issue_description: "Product is defective"
    chat_history:
      - role: user
        content: "My product broke after one day!"
      - role: assistant
        content: "I'm sorry to hear that. Can you describe the issue?"
      - role: user
        content: "The screen cracked on its own. I want a full refund NOW."
    expected_output:
      require_content: true
      behavior:
        - "Shows empathy for the customer's frustration"
        - "Offers a concrete resolution (refund, replacement, or escalation)"
        - "Does not make promises outside company policy"

  # --- Edge case ---
  - name: "Off-topic request"
    tier: low                 # low priority = 0.25x weight
    tags: [edge-case]
    variables:
      company_name: "Acme Corp"
      customer_name: "Dave"
      issue_description: ""
    chat_history:
      - role: user
        content: "What's the meaning of life?"
    expected_output:
      require_content: true
      behavior:
        - "Politely redirects to support topics"
        - "Does not engage with the off-topic question"
```

**Test case tiers** control how much each case affects the fitness score:

| Tier | Weight | Use for |
|------|--------|---------|
| `critical` | 5x | Must-pass cases, core functionality |
| `normal` | 1x | Standard behavior expectations |
| `low` | 0.25x | Nice-to-have, edge cases |

**Expected output fields:**

| Field | Type | Description |
|-------|------|-------------|
| `require_content` | bool | Response must include text content |
| `behavior` | list[str] | Natural language criteria judged by LLM |
| `match_args` | dict | Expected tool call (`tool_name` + `tool_args`) |
| `tool_calls` | list[dict] | Multiple expected tool calls |
| `must_contain` | str | Response must contain this substring |

A fitness score of **0.0** means all cases pass. Negative scores indicate failures (more negative = worse).
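
To make the weighting concrete, here is an illustrative sketch of how tier-weighted failures could aggregate into a non-positive score. This is an assumption for illustration, not the engine's actual scoring code; in particular, the penalty-per-failed-criterion model below is assumed.

```python
TIER_WEIGHTS = {"critical": 5.0, "normal": 1.0, "low": 0.25}

def fitness(case_results: list[dict]) -> float:
    """Each result: {'tier': ..., 'failed_criteria': int}.
    Returns 0.0 when every case passes; each failure subtracts its tier weight."""
    score = 0.0
    for result in case_results:
        score -= TIER_WEIGHTS[result["tier"]] * result["failed_criteria"]
    return score

print(fitness([
    {"tier": "critical", "failed_criteria": 1},  # -5.0
    {"tier": "normal",   "failed_criteria": 1},  # -1.0
    {"tier": "low",      "failed_criteria": 0},  # passes, no penalty
]))
# → -6.0
```

The takeaway is the shape, not the exact arithmetic: a single failing `critical` case dominates the score, so put must-pass behavior there.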

### `config.yaml`

Configures models, providers, and evolution hyperparameters. All fields are optional and override environment variables.

```yaml
# Model configuration per role
models:
  meta:                          # generates critique & refinement
    provider: gemini             # gemini | openrouter | openai
    model: gemini-2.5-pro
    thinking_budget: -1          # Gemini-specific (-1 = dynamic)
  target:                        # evaluates prompts against test cases
    provider: gemini
    model: gemini-2.5-flash
  judge:                         # scores evaluation results
    provider: gemini
    model: gemini-2.5-flash

# Evolution hyperparameters
evolution:
  generations: 10                # number of evolution generations
  islands: 4                     # parallel island populations
  conversations_per_island: 5    # RCC conversations per island per gen
  budget_cap_usd: 2.00           # stop when cost exceeds this
  # n_seq: 3                     # critic-author turns per conversation
  # temperature: 1.0             # Boltzmann selection temperature
  # structural_mutation_probability: 0.2
  # population_cap: 10           # max candidates per island
  # n_emigrate: 5                # migration count between islands
  # reset_interval: 3            # gens between island resets
  # adaptive_sampling: false     # enable adaptive case sampling

# Inference parameters (passed to LLM API calls)
generation:
  temperature: 0.7
  max_tokens: 4096
  # top_p: null
  # top_k: null
  # frequency_penalty: null
  # presence_penalty: null
```

### `results/run-NNN.yaml`

Written automatically by `helix evolve`. Contains the full evolution result.

```yaml
run_id: run-001
timestamp: "2026-03-27T22:35:28+00:00"
termination_reason: perfect_fitness   # or generations_complete, budget_exhausted

best_template: |
  You are a customer support agent for {{ company_name }}...
  (the full evolved template)

fitness:
  score: 0.0               # 0.0 = all cases pass
  normalized_score: 0.0

seed_fitness: -6.0         # original template's score

cost:
  total_calls: 342
  total_input_tokens: 1250000
  total_output_tokens: 85000
  total_cost_usd: 0.47

effective_models:
  meta_model: gemini-2.5-pro
  meta_provider: gemini
  target_model: gemini-2.5-flash
  target_provider: gemini
  judge_model: gemini-2.5-flash
  judge_provider: gemini

generation_records:
  - generation: 1
    best_fitness: -2.0
    avg_fitness: -4.5
    candidates_evaluated: 20
  - generation: 2
    best_fitness: -0.5
    avg_fitness: -1.8
    candidates_evaluated: 20

config_snapshot:
  evolution:
    generations: 10
    islands: 4
    conversations_per_island: 5
```

## Configuration Cascade

Settings are resolved in priority order (highest wins):

1. **CLI flags** (`--generations 5`, `--budget 1.00`)
2. **config.yaml** values
3. **Environment variables** (`GENE_GEMINI_API_KEY`, `GENE_META_MODEL`, etc.)
4. **Defaults** (built into the engine)

This means you can set defaults in `config.yaml`, override per-run with CLI flags, and keep secrets in `.env` or environment variables.
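
A minimal sketch of this fallback chain (the `resolve` helper and the `GENE_GENERATIONS` variable name are illustrative, not part of the CLI):

```python
import os

def resolve(key: str, cli_flags: dict, config: dict, env_var: str, default):
    """Resolve one setting: CLI flag > config.yaml value > env var > default."""
    if cli_flags.get(key) is not None:
        return cli_flags[key]
    if key in config:
        return config[key]
    if env_var in os.environ:
        return type(default)(os.environ[env_var])
    return default

# `--generations 5` on the command line wins over config.yaml's 10
print(resolve("generations", {"generations": 5}, {"generations": 10},
              "GENE_GENERATIONS", 10))
# → 5
```

Each source is consulted only when every higher-priority source is silent, which is what lets secrets live in the environment while tunables live in `config.yaml`.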

## Agent Integration

Every command supports `--json` for structured output, making the CLI usable by AI coding agents like Claude Code.

Example CLAUDE.md configuration for a project using Helix:

```markdown
## Prompt Optimization

Run `helix evolve <prompt-id> --json` to optimize prompts.
Run `helix results <prompt-id> --json` to check evolution results.
Run `helix accept <prompt-id>` to apply the best evolved template.
Run `helix show <prompt-id> --json` to inspect prompt configuration.
```

Example agent workflow:

```bash
# Agent creates a prompt
helix init my-agent-prompt --json

# Agent checks current state
helix show my-agent-prompt --json

# Agent runs evolution and parses result
helix evolve my-agent-prompt --generations 5 --json

# Agent reads the evolved template
helix results my-agent-prompt --template

# Agent accepts if fitness improved
helix accept my-agent-prompt --json
```
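
The last step hinges on comparing evolved fitness against the seed. The exact `--json` schema isn't documented here, so the field names below (`fitness.score`, `seed_fitness`) are assumptions modeled on the `results/run-NNN.yaml` layout; the decision logic an agent might apply is:

```python
def should_accept(result: dict) -> bool:
    """Accept only when the evolved template scored strictly better than the seed.
    Field names mirror results/run-NNN.yaml and are assumed, not documented."""
    evolved = result["fitness"]["score"]
    seed = result["seed_fitness"]
    return evolved > seed

# e.g. parsed from `helix results my-agent-prompt --json`
run = {"fitness": {"score": -0.5}, "seed_fitness": -6.0}
print(should_accept(run))
# → True
```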

## Typical Workflow

```
helix init <id>          Create prompt directory
        |
        v
Edit YAML files          Define template, test cases, config
        |
        v
helix evolve <id>        Run evolution (live progress)
        |
        v
helix results <id>       Review fitness, cost, evolved template
        |
       / \
      /   \
     v     v
  Accept   Iterate       Apply template or adjust test cases
```

1. **Initialize**: `helix init` creates a starter directory
2. **Define**: Write your prompt template in `prompt.yaml`, add test cases to `dataset.yaml`
3. **Configure**: Set models and budget in `config.yaml` (or use defaults)
4. **Evolve**: `helix evolve` runs the engine and saves results
5. **Review**: `helix results` shows what improved and by how much
6. **Accept or iterate**: `helix accept` applies the evolved template, or adjust test cases and re-run

## Troubleshooting

**"Missing API key for gemini"**
Set your API key in a `.env` file or environment:
```bash
export GENE_GEMINI_API_KEY=your-key-here
```

**"No test cases found"**
Add at least one test case to `dataset.yaml`. See the file format reference above.

**"Prompt not found"**
Make sure you're in the right directory. Use `helix list` to see available prompts, or `--dir` to specify the workspace.

**Evolution finishes instantly with perfect fitness**
Your test cases may be too easy. Add more specific `behavior` criteria or `critical`-tier cases to challenge the prompt.