helix-evolve 0.1.0__tar.gz

@@ -0,0 +1,62 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg-info/
dist/
build/
*.egg

# Virtual environment
.venv/
venv/
env/

# IDE
.idea/
.vscode/
*.swp
*.swo

# Environment / secrets
.env
.env.*
!.env.example

# User data
/prompts/

# Testing
.pytest_cache/
.coverage
htmlcov/

# uv cache
.uv/

# Database
*.db
*.sqlite
*.sqlite3

# Claude Code
.claude/

# Certificates / keys
*.pem
*.key
*.cert
*.p12

# Node
node_modules/

# OS
.DS_Store
Thumbs.db

# Impeccable skills (local dev tool)
.agents/
skills-lock.json
@@ -0,0 +1,558 @@
Metadata-Version: 2.4
Name: helix-evolve
Version: 0.1.0
Summary: CLI for Helix evolutionary prompt optimization
Requires-Python: >=3.13
Requires-Dist: helix-engine>=0.1.0
Requires-Dist: rich>=14.0.0
Requires-Dist: typer>=0.16.0
Description-Content-Type: text/markdown

# Helix CLI

Standalone command-line tool for evolutionary prompt optimization. Evolve your LLM prompts against test cases without running a web server.

## Overview

The Helix CLI runs the same evolution engine as the web UI, but operates on local YAML files. Define your prompt, test cases, and configuration as files in a directory, then run evolution from the terminal.

Key features:

- **Standalone** -- no web server or database needed
- **YAML-based** -- human-readable prompt and test case definitions
- **Agent-friendly** -- every command supports `--json` for machine-readable output
- **Same engine** -- uses the same evolution pipeline as the web UI

## Installation

### Prerequisites

- Python 3.13+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip
- An API key for at least one LLM provider (Gemini, OpenAI, or OpenRouter)

### Install from source

```bash
git clone https://github.com/Onebu/helix.git
cd helix

# Install the core engine and CLI
uv pip install -e .      # core engine (api package)
uv pip install -e cli/   # CLI tool

# Verify installation
helix --help
```

If `helix` isn't on your PATH, use the full path to the venv binary:

```bash
.venv/bin/helix --help
```

### Configure API key

Create a `.env` file in your working directory (or set environment variables):

```bash
# Choose at least one provider
GENE_GEMINI_API_KEY=your-gemini-key
# GENE_OPENAI_API_KEY=your-openai-key
# GENE_OPENROUTER_API_KEY=your-openrouter-key
```
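
For scripts that need to check which provider is usable before launching a run, here is a minimal sketch of picking the first configured key. The variable names come from this README; the priority order and the helper itself are illustrative assumptions, not documented engine behavior.

```python
import os

# Env var names as documented above; this priority order is an assumption
# made for illustration only.
PROVIDER_KEYS = [
    ("gemini", "GENE_GEMINI_API_KEY"),
    ("openai", "GENE_OPENAI_API_KEY"),
    ("openrouter", "GENE_OPENROUTER_API_KEY"),
]

def first_configured_provider(env=None):
    """Return (provider, key) for the first provider with a non-empty key."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_KEYS:
        key = env.get(var, "").strip()
        if key:
            return provider, key
    raise RuntimeError("no provider key set; add one to .env")
```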
## Quick Start

```bash
# 1. Create a new prompt project
helix init customer-support

# 2. Edit the YAML files (see File Format below)
#    - customer-support/prompt.yaml
#    - customer-support/dataset.yaml
#    - customer-support/config.yaml

# 3. Run evolution
helix evolve customer-support

# 4. Review the results
helix results customer-support

# 5. Accept the evolved template
helix accept customer-support
```

## Commands

### `helix init <prompt-id>`

Scaffold a new prompt directory with template YAML files.

```bash
helix init my-prompt
helix init my-prompt --dir /path/to/workspace
helix init my-prompt --json   # machine-readable output
```

Creates:
```
my-prompt/
  prompt.yaml    # prompt template and variable definitions
  dataset.yaml   # test cases for fitness evaluation
  config.yaml    # model, provider, and evolution settings
  results/       # evolution results (populated by helix evolve)
```

### `helix list`

List all prompt directories in the current workspace.

```bash
helix list
helix list --dir /path/to/workspace
helix list --json
```

Output:
```
Prompts
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
┃ ID               ┃ Purpose                ┃ Cases ┃ Runs ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
│ customer-support │ Handle support tickets │     8 │    3 │
│ summarizer       │ Summarize documents    │     5 │    1 │
└──────────────────┴────────────────────────┴───────┴──────┘
```

### `helix show <prompt-id>`

Display detailed information about a prompt: template preview, variables, dataset summary, and latest evolution result.

```bash
helix show customer-support
helix show customer-support --json
```

### `helix evolve <prompt-id>`

Run the evolution engine against your prompt and test cases. Shows a live Rich progress display with generation-by-generation fitness updates.

```bash
# Basic usage
helix evolve customer-support

# Override evolution parameters
helix evolve customer-support --generations 5 --islands 2 --budget 1.00

# JSON output (no live display, suitable for scripts/agents)
helix evolve customer-support --json
```

Options:
| Flag | Description |
|------|-------------|
| `--generations, -g` | Override number of generations |
| `--islands, -i` | Override number of islands |
| `--budget, -b` | Override budget cap in USD |
| `--json` | JSON output, no live display |

The live progress display shows:
```
╭──────────────── helix evolve customer-support ─────────────────╮
│ Generation 7/10  [████████████████████░░░░] 70%                │
│                                                                │
│ Best Fitness   -0.50  (seed: -6.00, +5.50)                     │
│ Avg Fitness    -2.10                                           │
│ Candidates     140                                             │
│ Cost           $0.23 / $2.00                                   │
│ Elapsed        1m 42s                                          │
╰────────────────────────────────────────────────────────────────╯
```

Results are automatically saved to `results/run-NNN.yaml`.
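
Since run files are numbered sequentially, a wrapper script can locate the newest result without invoking the CLI. A small sketch -- only the `results/run-NNN.yaml` naming comes from this README; the helper itself is hypothetical:

```python
from pathlib import Path

def latest_run(prompt_dir):
    """Return the newest results/run-NNN.yaml path, or None if there are no runs."""
    results = Path(prompt_dir) / "results"
    if not results.is_dir():
        return None
    runs = sorted(results.glob("run-*.yaml"))  # zero-padded names sort correctly
    return runs[-1] if runs else None
```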
174
+
175
+ ### `helix results <prompt-id>`
176
+
177
+ Display evolution results. Shows fitness, cost, model info, and the best evolved template.
178
+
179
+ ```bash
180
+ # Latest result
181
+ helix results customer-support
182
+
183
+ # Specific run
184
+ helix results customer-support --run run-001
185
+
186
+ # JSON output
187
+ helix results customer-support --json
188
+
189
+ # Print only the evolved template (useful for piping)
190
+ helix results customer-support --template
191
+ ```
192
+
193
+ ### `helix accept <prompt-id>`
194
+
195
+ Accept an evolved template by copying it back into `prompt.yaml`, replacing the current template.
196
+
197
+ ```bash
198
+ helix accept customer-support
199
+ helix accept customer-support --run run-001 # accept a specific run
200
+ helix accept customer-support --json
201
+ ```
202
+
203
+ ### Global Options
204
+
205
+ Every command supports:
206
+ | Flag | Description |
207
+ |------|-------------|
208
+ | `--dir, -d` | Workspace directory (default: current directory) |
209
+ | `--json` | Machine-readable JSON output |
210
+
211
+ ## File Format Reference
212
+
213
+ ### `prompt.yaml`
214
+
215
+ Defines the prompt template, variables, and optional tool definitions.
216
+
217
+ ```yaml
218
+ # Required fields
219
+ id: customer-support # lowercase slug (a-z, 0-9, hyphens)
220
+ purpose: "Handle customer support tickets"
221
+
222
+ # The Jinja2 template. Variables use {{ variable_name }} syntax.
223
+ template: |
224
+ You are a customer support agent for {{ company_name }}.
225
+
226
+ Customer: {{ customer_name }}
227
+ Issue: {{ issue_description }}
228
+
229
+ Resolve the issue professionally and empathetically.
230
+ Always offer a concrete next step.
231
+
232
+ # Variable definitions (optional).
233
+ # If omitted, variables are auto-extracted from the template.
234
+ # Define explicitly to set types, descriptions, and anchor status.
235
+ variables:
236
+ - name: company_name
237
+ description: "Company brand name"
238
+ var_type: string # string | number | boolean | object | array
239
+ is_anchor: true # anchored = preserved during evolution
240
+
241
+ - name: customer_name
242
+ description: "Customer's display name"
243
+ var_type: string
244
+ is_anchor: true
245
+
246
+ - name: issue_description
247
+ description: "Description of the customer's issue"
248
+ var_type: string
249
+ is_anchor: false
250
+
251
+ # Tool definitions (optional, OpenAI function-calling format)
252
+ tools:
253
+ - type: function
254
+ function:
255
+ name: lookup_order
256
+ description: "Look up a customer's order by ID"
257
+ parameters:
258
+ type: object
259
+ properties:
260
+ order_id:
261
+ type: string
262
+ required: [order_id]
263
+
264
+ - type: function
265
+ function:
266
+ name: create_ticket
267
+ description: "Create a support ticket"
268
+ parameters:
269
+ type: object
270
+ properties:
271
+ subject:
272
+ type: string
273
+ priority:
274
+ type: string
275
+ enum: [low, normal, high, urgent]
276
+ required: [subject, priority]
277
+ ```
278
+
279
+ **Anchor variables** are preserved during evolution. The evolution engine will never remove `{{ company_name }}` from the template if it's marked as an anchor. Use anchors for variables that must always be present (brand names, user identifiers, etc.).
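
If you post-process templates outside the CLI, you can sanity-check that every anchor survived. A regex-based sketch -- the engine uses Jinja2, and this simple pattern only matches plain `{{ name }}` references, so it is an illustration rather than the engine's own validation:

```python
import re

# Matches plain {{ name }} references (with or without inner spaces).
VAR_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def missing_anchors(template, anchors):
    """Return the anchor variable names that no longer appear in the template."""
    present = set(VAR_PATTERN.findall(template))
    return [name for name in anchors if name not in present]
```

An empty return value means every anchor is still referenced in the evolved template.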
### `dataset.yaml`

Defines test cases that the evolution engine scores prompts against. Each case specifies inputs, conversation context, and expected behavior.

```yaml
cases:
  # --- Simple behavior test ---
  - name: "Greeting"
    tier: normal   # critical | normal | low
    tags: [greeting, basic]
    variables:
      company_name: "Acme Corp"
      customer_name: "Alice"
      issue_description: ""
    chat_history:
      - role: user
        content: "Hi, I need help with my order"
    expected_output:
      require_content: true   # must produce text (not just tool calls)
      behavior:               # LLM-judged criteria
        - "Greets the customer by name"
        - "Asks for order details or offers to look it up"

  # --- Tool call test ---
  - name: "Order lookup"
    tier: critical   # critical cases have 5x weight
    tags: [tool-call, order]
    variables:
      company_name: "Acme Corp"
      customer_name: "Bob"
      issue_description: "Where is my order #12345?"
    chat_history:
      - role: user
        content: "Where is my order #12345?"
    expected_output:
      match_args:   # expect a specific tool call
        tool_name: lookup_order
        tool_args:
          order_id: "12345"

  # --- Multi-turn conversation ---
  - name: "Escalation flow"
    tier: normal
    tags: [escalation, multi-turn]
    variables:
      company_name: "Acme Corp"
      customer_name: "Carol"
      issue_description: "Product is defective"
    chat_history:
      - role: user
        content: "My product broke after one day!"
      - role: assistant
        content: "I'm sorry to hear that. Can you describe the issue?"
      - role: user
        content: "The screen cracked on its own. I want a full refund NOW."
    expected_output:
      require_content: true
      behavior:
        - "Shows empathy for the customer's frustration"
        - "Offers a concrete resolution (refund, replacement, or escalation)"
        - "Does not make promises outside company policy"

  # --- Edge case ---
  - name: "Off-topic request"
    tier: low   # low priority = 0.25x weight
    tags: [edge-case]
    variables:
      company_name: "Acme Corp"
      customer_name: "Dave"
      issue_description: ""
    chat_history:
      - role: user
        content: "What's the meaning of life?"
    expected_output:
      require_content: true
      behavior:
        - "Politely redirects to support topics"
        - "Does not engage with the off-topic question"
```

**Test case tiers** control how much each case affects the fitness score:
| Tier | Weight | Use for |
|------|--------|---------|
| `critical` | 5x | Must-pass cases, core functionality |
| `normal` | 1x | Standard behavior expectations |
| `low` | 0.25x | Nice-to-have, edge cases |

**Expected output fields:**
| Field | Type | Description |
|-------|------|-------------|
| `require_content` | bool | Response must include text content |
| `behavior` | list[str] | Natural language criteria judged by LLM |
| `match_args` | dict | Expected tool call (`tool_name` + `tool_args`) |
| `tool_calls` | list[dict] | Multiple expected tool calls |
| `must_contain` | str | Response must contain this substring |

A fitness score of **0.0** means all cases pass. Negative scores indicate failures (more negative = worse).
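
To make the weighting concrete, here is an illustrative scoring sketch: each failing case subtracts its tier weight, so 0.0 means everything passed. Only the tier weights and the sign convention are taken from this README; the engine's actual fitness computation is not shown here and is likely more involved.

```python
# Tier weights as documented in the table above.
TIER_WEIGHTS = {"critical": 5.0, "normal": 1.0, "low": 0.25}

def fitness(case_results):
    """case_results: iterable of (tier, passed) pairs. Returns 0.0 when all pass."""
    return -sum(TIER_WEIGHTS[tier] for tier, passed in case_results if not passed)
```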
378
+
379
+ ### `config.yaml`
380
+
381
+ Configures models, providers, and evolution hyperparameters. All fields are optional and override environment variables.
382
+
383
+ ```yaml
384
+ # Model configuration per role
385
+ models:
386
+ meta: # generates critique & refinement
387
+ provider: gemini # gemini | openrouter | openai
388
+ model: gemini-2.5-pro
389
+ thinking_budget: -1 # Gemini-specific (-1 = dynamic)
390
+ target: # evaluates prompts against test cases
391
+ provider: gemini
392
+ model: gemini-2.5-flash
393
+ judge: # scores evaluation results
394
+ provider: gemini
395
+ model: gemini-2.5-flash
396
+
397
+ # Evolution hyperparameters
398
+ evolution:
399
+ generations: 10 # number of evolution generations
400
+ islands: 4 # parallel island populations
401
+ conversations_per_island: 5 # RCC conversations per island per gen
402
+ budget_cap_usd: 2.00 # stop when cost exceeds this
403
+ # n_seq: 3 # critic-author turns per conversation
404
+ # temperature: 1.0 # Boltzmann selection temperature
405
+ # structural_mutation_probability: 0.2
406
+ # population_cap: 10 # max candidates per island
407
+ # n_emigrate: 5 # migration count between islands
408
+ # reset_interval: 3 # gens between island resets
409
+ # adaptive_sampling: false # enable adaptive case sampling
410
+
411
+ # Inference parameters (passed to LLM API calls)
412
+ generation:
413
+ temperature: 0.7
414
+ max_tokens: 4096
415
+ # top_p: null
416
+ # top_k: null
417
+ # frequency_penalty: null
418
+ # presence_penalty: null
419
+ ```
420
+
421
+ ### `results/run-NNN.yaml`
422
+
423
+ Written automatically by `helix evolve`. Contains the full evolution result.
424
+
425
+ ```yaml
426
+ run_id: run-001
427
+ timestamp: "2026-03-27T22:35:28+00:00"
428
+ termination_reason: perfect_fitness # or generations_complete, budget_exhausted
429
+
430
+ best_template: |
431
+ You are a customer support agent for {{ company_name }}...
432
+ (the full evolved template)
433
+
434
+ fitness:
435
+ score: 0.0 # 0.0 = all cases pass
436
+ normalized_score: 0.0
437
+
438
+ seed_fitness: -6.0 # original template's score
439
+
440
+ cost:
441
+ total_calls: 342
442
+ total_input_tokens: 1250000
443
+ total_output_tokens: 85000
444
+ total_cost_usd: 0.47
445
+
446
+ effective_models:
447
+ meta_model: gemini-2.5-pro
448
+ meta_provider: gemini
449
+ target_model: gemini-2.5-flash
450
+ target_provider: gemini
451
+ judge_model: gemini-2.5-flash
452
+ judge_provider: gemini
453
+
454
+ generation_records:
455
+ - generation: 1
456
+ best_fitness: -2.0
457
+ avg_fitness: -4.5
458
+ candidates_evaluated: 20
459
+ - generation: 2
460
+ best_fitness: -0.5
461
+ avg_fitness: -1.8
462
+ candidates_evaluated: 20
463
+
464
+ config_snapshot:
465
+ evolution:
466
+ generations: 10
467
+ islands: 4
468
+ conversations_per_island: 5
469
+ ```
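
A script deciding whether to run `helix accept` can compare `fitness.score` against `seed_fitness` from this file. A sketch using a plain dict standing in for the parsed YAML, to avoid assuming a particular YAML library; the field names match the example above:

```python
def improved(run):
    """True if the evolved template scored better than the original (seed) template."""
    return run["fitness"]["score"] > run["seed_fitness"]

# A run record mirroring the YAML fields shown above.
run = {
    "run_id": "run-001",
    "fitness": {"score": 0.0, "normalized_score": 0.0},
    "seed_fitness": -6.0,
}
```

Here `improved(run)` is true, so accepting the run would be reasonable.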

## Configuration Cascade

Settings are resolved in priority order (highest wins):

1. **CLI flags** (`--generations 5`, `--budget 1.00`)
2. **config.yaml** values
3. **Environment variables** (`GENE_GEMINI_API_KEY`, `GENE_META_MODEL`, etc.)
4. **Defaults** (built into the engine)

This means you can set defaults in `config.yaml`, override per-run with CLI flags, and keep secrets in `.env` or environment variables.
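
The cascade amounts to a first-non-empty lookup. An illustrative resolver sketch -- the engine's real resolution code is not part of this README, and the env var name used below is hypothetical:

```python
import os

def resolve(name, cli_flags, config, env_var, default):
    """Resolve one setting: CLI flag > config.yaml value > env var > default."""
    if cli_flags.get(name) is not None:
        return cli_flags[name]
    if config.get(name) is not None:
        return config[name]
    if os.environ.get(env_var):
        return os.environ[env_var]
    return default
```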

## Agent Integration

Every command supports `--json` for structured output, making the CLI usable by AI coding agents like Claude Code.

Example CLAUDE.md configuration for a project using Helix:

```markdown
## Prompt Optimization

Run `helix evolve <prompt-id> --json` to optimize prompts.
Run `helix results <prompt-id> --json` to check evolution results.
Run `helix accept <prompt-id>` to apply the best evolved template.
Run `helix show <prompt-id> --json` to inspect prompt configuration.
```

Example agent workflow:

```bash
# Agent creates a prompt
helix init my-agent-prompt --json

# Agent checks current state
helix show my-agent-prompt --json

# Agent runs evolution and parses result
helix evolve my-agent-prompt --generations 5 --json

# Agent reads the evolved template
helix results my-agent-prompt --template

# Agent accepts if fitness improved
helix accept my-agent-prompt --json
```
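
In practice an agent wraps each of those calls in a helper that runs the command and parses its JSON stdout. A generic sketch -- not part of the helix package; it works with any command that prints a JSON object:

```python
import json
import subprocess

def run_json(cmd):
    """Run a command that prints JSON on stdout and return the parsed result."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

For example, `run_json(["helix", "results", "my-agent-prompt", "--json"])` would give the agent a dict to inspect before deciding whether to accept.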

## Typical Workflow

```
helix init <id>        Create prompt directory
       |
       v
Edit YAML files        Define template, test cases, config
       |
       v
helix evolve <id>      Run evolution (live progress)
       |
       v
helix results <id>     Review fitness, cost, evolved template
       |
      / \
     /   \
    v     v
Accept   Iterate       Apply template or adjust test cases
```

1. **Initialize**: `helix init` creates a starter directory
2. **Define**: Write your prompt template in `prompt.yaml`, add test cases to `dataset.yaml`
3. **Configure**: Set models and budget in `config.yaml` (or use defaults)
4. **Evolve**: `helix evolve` runs the engine and saves results
5. **Review**: `helix results` shows what improved and by how much
6. **Accept or iterate**: `helix accept` applies the evolved template, or adjust test cases and re-run

## Troubleshooting

**"Missing API key for gemini"**
Set your API key in a `.env` file or environment:
```bash
export GENE_GEMINI_API_KEY=your-key-here
```

**"No test cases found"**
Add at least one test case to `dataset.yaml`. See the file format reference above.

**"Prompt not found"**
Make sure you're in the right directory. Use `helix list` to see available prompts, or `--dir` to specify the workspace.

**Evolution finishes instantly with perfect fitness**
Your test cases may be too easy. Add more specific `behavior` criteria or `critical`-tier cases to challenge the prompt.