weco 0.2.17__tar.gz → 0.2.19__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (46)
  1. {weco-0.2.17 → weco-0.2.19}/PKG-INFO +30 -51
  2. {weco-0.2.17 → weco-0.2.19}/README.md +28 -50
  3. {weco-0.2.17 → weco-0.2.19}/examples/cuda/README.md +2 -2
  4. weco-0.2.19/examples/prompt/README.md +51 -0
  5. {weco-0.2.17 → weco-0.2.19}/examples/spaceship-titanic/README.md +9 -34
  6. weco-0.2.19/examples/spaceship-titanic/data/sample_submission.csv +4278 -0
  7. weco-0.2.19/examples/spaceship-titanic/data/test.csv +4278 -0
  8. weco-0.2.19/examples/spaceship-titanic/data/train.csv +8694 -0
  9. {weco-0.2.17 → weco-0.2.19}/examples/spaceship-titanic/requirements-test.txt +1 -2
  10. {weco-0.2.17 → weco-0.2.19}/examples/triton/README.md +2 -2
  11. {weco-0.2.17 → weco-0.2.19}/pyproject.toml +5 -7
  12. {weco-0.2.17 → weco-0.2.19}/weco/__init__.py +2 -1
  13. weco-0.2.19/weco/api.py +162 -0
  14. {weco-0.2.17 → weco-0.2.19}/weco/cli.py +211 -23
  15. {weco-0.2.17 → weco-0.2.19}/weco/panels.py +1 -1
  16. {weco-0.2.17 → weco-0.2.19}/weco/utils.py +32 -0
  17. {weco-0.2.17 → weco-0.2.19}/weco.egg-info/PKG-INFO +30 -51
  18. {weco-0.2.17 → weco-0.2.19}/weco.egg-info/SOURCES.txt +3 -2
  19. {weco-0.2.17 → weco-0.2.19}/weco.egg-info/requires.txt +1 -0
  20. weco-0.2.17/examples/prompt/README.md +0 -99
  21. weco-0.2.17/examples/spaceship-titanic/get_data.py +0 -16
  22. weco-0.2.17/examples/spaceship-titanic/submit.py +0 -14
  23. weco-0.2.17/weco/api.py +0 -86
  24. {weco-0.2.17 → weco-0.2.19}/.github/workflows/lint.yml +0 -0
  25. {weco-0.2.17 → weco-0.2.19}/.github/workflows/release.yml +0 -0
  26. {weco-0.2.17 → weco-0.2.19}/.gitignore +0 -0
  27. {weco-0.2.17 → weco-0.2.19}/.repomixignore +0 -0
  28. {weco-0.2.17 → weco-0.2.19}/LICENSE +0 -0
  29. {weco-0.2.17 → weco-0.2.19}/assets/example-optimization.gif +0 -0
  30. {weco-0.2.17 → weco-0.2.19}/examples/cuda/evaluate.py +0 -0
  31. {weco-0.2.17 → weco-0.2.19}/examples/cuda/guide.md +0 -0
  32. {weco-0.2.17 → weco-0.2.19}/examples/cuda/optimize.py +0 -0
  33. {weco-0.2.17 → weco-0.2.19}/examples/hello-kernel-world/evaluate.py +0 -0
  34. {weco-0.2.17 → weco-0.2.19}/examples/hello-kernel-world/optimize.py +0 -0
  35. {weco-0.2.17 → weco-0.2.19}/examples/prompt/eval.py +0 -0
  36. {weco-0.2.17 → weco-0.2.19}/examples/prompt/optimize.py +0 -0
  37. {weco-0.2.17 → weco-0.2.19}/examples/prompt/prompt_guide.md +0 -0
  38. {weco-0.2.17 → weco-0.2.19}/examples/spaceship-titanic/competition_description.md +0 -0
  39. {weco-0.2.17 → weco-0.2.19}/examples/spaceship-titanic/evaluate.py +0 -0
  40. {weco-0.2.17 → weco-0.2.19}/examples/triton/evaluate.py +0 -0
  41. {weco-0.2.17 → weco-0.2.19}/examples/triton/optimize.py +0 -0
  42. {weco-0.2.17 → weco-0.2.19}/setup.cfg +0 -0
  43. {weco-0.2.17 → weco-0.2.19}/weco/auth.py +0 -0
  44. {weco-0.2.17 → weco-0.2.19}/weco.egg-info/dependency_links.txt +0 -0
  45. {weco-0.2.17 → weco-0.2.19}/weco.egg-info/entry_points.txt +0 -0
  46. {weco-0.2.17 → weco-0.2.19}/weco.egg-info/top_level.txt +0 -0
{weco-0.2.17 → weco-0.2.19}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: weco
- Version: 0.2.17
+ Version: 0.2.19
  Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
  Author-email: Weco AI Team <contact@weco.ai>
  License: MIT
@@ -14,6 +14,7 @@ Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: requests
  Requires-Dist: rich
+ Requires-Dist: packaging
  Provides-Extra: dev
  Requires-Dist: ruff; extra == "dev"
  Requires-Dist: build; extra == "dev"
@@ -22,15 +23,13 @@ Dynamic: license-file

  <div align="center">

- # Weco: The AI Code Optimizer
+ # Weco: The Platform for Self-Improving Code

  [![Python](https://img.shields.io/badge/Python-3.8.0+-blue)](https://www.python.org)
  [![docs](https://img.shields.io/website?url=https://docs.weco.ai/&label=docs)](https://docs.weco.ai/)
  [![PyPI version](https://badge.fury.io/py/weco.svg)](https://badge.fury.io/py/weco)
  [![AIDE](https://img.shields.io/badge/AI--Driven_Exploration-arXiv-orange?style=flat-square&logo=arxiv)](https://arxiv.org/abs/2502.13138)

- <code>pip install weco</code>
-
  </div>

  ---
@@ -39,9 +38,9 @@ Weco systematically optimizes your code, guided directly by your evaluation metr

  Example applications include:

- - **GPU Kernel Optimization**: Reimplement PyTorch functions using CUDA or Triton optimizing for `latency`, `throughput`, or `memory_bandwidth`.
- - **Model Development**: Tune feature transformations or architectures, optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
- - **Prompt Engineering**: Refine prompts for LLMs, optimizing for `win_rate`, `relevance`, or `format_adherence`
+ - **GPU Kernel Optimization**: Reimplement PyTorch functions using [CUDA](/examples/cuda/README.md) or [Triton](/examples/triton/README.md), optimizing for `latency`, `throughput`, or `memory_bandwidth`.
+ - **Model Development**: Tune feature transformations, architectures or [the whole training pipeline](/examples/spaceship-titanic/README.md), optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
+ - **Prompt Engineering**: Refine prompts for LLMs (e.g., for [math problems](/examples/prompt/README.md)), optimizing for `win_rate`, `relevance`, or `format_adherence`

  ![image](assets/example-optimization.gif)

@@ -71,29 +70,9 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models
  - **Anthropic:** `export ANTHROPIC_API_KEY="your_key_here"`
  - **Google DeepMind:** `export GEMINI_API_KEY="your_key_here"` (Google AI Studio has a free API usage quota. Create a key [here](https://aistudio.google.com/apikey) to use `weco` for free.)

- The optimization process will fail if the necessary keys for the chosen model are not found in your environment.
-
- 3. **Log In to Weco (Optional):**
-
- To associate your optimization runs with your Weco account and view them on the Weco dashboard, you can log in. `weco` uses a device authentication flow:
-
- - When you first run `weco run`, you'll be prompted if you want to log in or proceed anonymously.
- - If you choose to log in (by pressing `l`), you'll be shown a URL and `weco` will attempt to open it in your default web browser.
- - You then authenticate in the browser. Once authenticated, the CLI will detect this and complete the login.
- - This saves a Weco-specific API key locally (typically at `~/.config/weco/credentials.json`).
-
- If you choose to skip login (by pressing Enter or `s`), `weco` will still function using the environment variable LLM keys, but the run history will not be linked to a Weco account.
-
- To log out and remove your saved Weco API key, use the `weco logout` command.
-
  ---

- ## Usage
-
- The CLI has two main commands:
-
- - `weco run`: Initiates the code optimization process.
- - `weco logout`: Logs you out of your Weco account.
+ ## Get Started

  <div style="background-color: #fff3cd; border: 1px solid #ffeeba; padding: 15px; border-radius: 4px; margin-bottom: 15px;">
  <strong>⚠️ Warning: Code Modification</strong><br>
@@ -102,10 +81,6 @@ The CLI has two main commands:

  ---

- ### `weco run` Command
-
- This command starts the optimization process.
-
  **Example: Optimizing Simple PyTorch Operations**

  This basic example shows how to optimize a simple PyTorch function for speedup.
@@ -123,9 +98,8 @@ pip install torch
  weco run --source optimize.py \
  --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
  --metric speedup \
- --maximize true \
+ --goal maximize \
  --steps 15 \
- --model gemini-2.5-pro-exp-03-25 \
  --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
  ```

@@ -133,28 +107,33 @@ weco run --source optimize.py \

  ---

- **Arguments for `weco run`:**
+ ### Arguments for `weco run`

- | Argument | Description | Required |
- | :-------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :------- |
- | `--source` | Path to the source code file that will be optimized (e.g., `optimize.py`). | Yes |
- | `--eval-command` | Command to run for evaluating the code in `--source`. This command should print the target `--metric` and its value to the terminal (stdout/stderr). See note below. | Yes |
- | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
- | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
- | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
- | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). Recommended models to try include `o3-mini`, `claude-3-haiku`, and `gemini-2.5-pro-exp-03-25`. | Yes |
- | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
- | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ **Required:**

- ---
+ | Argument | Description |
+ | :------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `-s, --source` | Path to the source code file that will be optimized (e.g., `optimize.py`). |
+ | `-c, --eval-command`| Command to run for evaluating the code in `--source`. This command should print the target `--metric` and its value to the terminal (stdout/stderr). See note below. |
+ | `-m, --metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. |
+ | `-g, --goal` | `maximize`/`max` to maximize the `--metric` or `minimize`/`min` to minimize it. |

- ### `weco logout` Command
+ <br>

- This command logs you out by removing the locally stored Weco API key.
+ **Optional:**

- ```bash
- weco logout
- ```
+ | Argument | Description | Default |
+ | :----------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------ |
+ | `-n, --steps` | Number of optimization steps (LLM iterations) to run. | 100 |
+ | `-M, --model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). | `o4-mini` when `OPENAI_API_KEY` is set; `claude-3-7-sonnet-20250219` when `ANTHROPIC_API_KEY` is set; `gemini-2.5-pro-exp-03-25` when `GEMINI_API_KEY` is set (priority: `OPENAI_API_KEY` > `ANTHROPIC_API_KEY` > `GEMINI_API_KEY`). |
+ | `-i, --additional-instructions`| Natural language description of specific instructions **or** path to a file containing detailed instructions to guide the LLM. | `None` |
+ | `-l, --log-dir` | Path to the directory to log intermediate steps and final optimization result. | `.runs/` |
+
+ ---
+
+ ### Weco Dashboard
+ To associate your optimization runs with your Weco account and view them on the Weco dashboard, you can log in. `weco` uses a device authentication flow
+ ![image (16)](https://github.com/user-attachments/assets/8a0a285b-4894-46fa-b6a2-4990017ca0c6)

  ---

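Note on the new `-M, --model` default: the table row added in this hunk describes a provider-key priority order. A minimal Python sketch of that rule follows, assuming the selection is a simple first-match lookup over environment variables. `DEFAULT_MODELS` and `pick_default_model` are hypothetical names chosen for illustration; they are not taken from `weco/cli.py` or `weco/utils.py`, whose contents this diff excerpt does not show.

```python
import os
from typing import Mapping, Optional

# (environment variable, default model) pairs, listed in the priority order
# stated in the README table: OPENAI > ANTHROPIC > GEMINI.
# Hypothetical structure; the package's real implementation may differ.
DEFAULT_MODELS = [
    ("OPENAI_API_KEY", "o4-mini"),
    ("ANTHROPIC_API_KEY", "claude-3-7-sonnet-20250219"),
    ("GEMINI_API_KEY", "gemini-2.5-pro-exp-03-25"),
]


def pick_default_model(env: Mapping[str, str]) -> Optional[str]:
    """Return the default model for the first provider key that is set, else None."""
    for key_name, model in DEFAULT_MODELS:
        if env.get(key_name):  # treats an empty string as "not set"
            return model
    return None


if __name__ == "__main__":
    # Prints the model that would be chosen for the current shell environment.
    print(pick_default_model(os.environ))
```

Passing the environment as an argument, rather than reading `os.environ` inside the function, keeps the rule easy to check against the table with plain dictionaries.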
{weco-0.2.17 → weco-0.2.19}/README.md
@@ -1,14 +1,12 @@
  <div align="center">

- # Weco: The AI Code Optimizer
+ # Weco: The Platform for Self-Improving Code

  [![Python](https://img.shields.io/badge/Python-3.8.0+-blue)](https://www.python.org)
  [![docs](https://img.shields.io/website?url=https://docs.weco.ai/&label=docs)](https://docs.weco.ai/)
  [![PyPI version](https://badge.fury.io/py/weco.svg)](https://badge.fury.io/py/weco)
  [![AIDE](https://img.shields.io/badge/AI--Driven_Exploration-arXiv-orange?style=flat-square&logo=arxiv)](https://arxiv.org/abs/2502.13138)

- <code>pip install weco</code>
-
  </div>

  ---
@@ -17,9 +15,9 @@ Weco systematically optimizes your code, guided directly by your evaluation metr

  Example applications include:

- - **GPU Kernel Optimization**: Reimplement PyTorch functions using CUDA or Triton optimizing for `latency`, `throughput`, or `memory_bandwidth`.
- - **Model Development**: Tune feature transformations or architectures, optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
- - **Prompt Engineering**: Refine prompts for LLMs, optimizing for `win_rate`, `relevance`, or `format_adherence`
+ - **GPU Kernel Optimization**: Reimplement PyTorch functions using [CUDA](/examples/cuda/README.md) or [Triton](/examples/triton/README.md), optimizing for `latency`, `throughput`, or `memory_bandwidth`.
+ - **Model Development**: Tune feature transformations, architectures or [the whole training pipeline](/examples/spaceship-titanic/README.md), optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
+ - **Prompt Engineering**: Refine prompts for LLMs (e.g., for [math problems](/examples/prompt/README.md)), optimizing for `win_rate`, `relevance`, or `format_adherence`

  ![image](assets/example-optimization.gif)

@@ -49,29 +47,9 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models
  - **Anthropic:** `export ANTHROPIC_API_KEY="your_key_here"`
  - **Google DeepMind:** `export GEMINI_API_KEY="your_key_here"` (Google AI Studio has a free API usage quota. Create a key [here](https://aistudio.google.com/apikey) to use `weco` for free.)

- The optimization process will fail if the necessary keys for the chosen model are not found in your environment.
-
- 3. **Log In to Weco (Optional):**
-
- To associate your optimization runs with your Weco account and view them on the Weco dashboard, you can log in. `weco` uses a device authentication flow:
-
- - When you first run `weco run`, you'll be prompted if you want to log in or proceed anonymously.
- - If you choose to log in (by pressing `l`), you'll be shown a URL and `weco` will attempt to open it in your default web browser.
- - You then authenticate in the browser. Once authenticated, the CLI will detect this and complete the login.
- - This saves a Weco-specific API key locally (typically at `~/.config/weco/credentials.json`).
-
- If you choose to skip login (by pressing Enter or `s`), `weco` will still function using the environment variable LLM keys, but the run history will not be linked to a Weco account.
-
- To log out and remove your saved Weco API key, use the `weco logout` command.
-
  ---

- ## Usage
-
- The CLI has two main commands:
-
- - `weco run`: Initiates the code optimization process.
- - `weco logout`: Logs you out of your Weco account.
+ ## Get Started

  <div style="background-color: #fff3cd; border: 1px solid #ffeeba; padding: 15px; border-radius: 4px; margin-bottom: 15px;">
  <strong>⚠️ Warning: Code Modification</strong><br>
@@ -80,10 +58,6 @@ The CLI has two main commands:

  ---

- ### `weco run` Command
-
- This command starts the optimization process.
-
  **Example: Optimizing Simple PyTorch Operations**

  This basic example shows how to optimize a simple PyTorch function for speedup.
@@ -101,9 +75,8 @@ pip install torch
  weco run --source optimize.py \
  --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
  --metric speedup \
- --maximize true \
+ --goal maximize \
  --steps 15 \
- --model gemini-2.5-pro-exp-03-25 \
  --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
  ```

@@ -111,28 +84,33 @@ weco run --source optimize.py \

  ---

- **Arguments for `weco run`:**
+ ### Arguments for `weco run`

- | Argument | Description | Required |
- | :-------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :------- |
- | `--source` | Path to the source code file that will be optimized (e.g., `optimize.py`). | Yes |
- | `--eval-command` | Command to run for evaluating the code in `--source`. This command should print the target `--metric` and its value to the terminal (stdout/stderr). See note below. | Yes |
- | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
- | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
- | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
- | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). Recommended models to try include `o3-mini`, `claude-3-haiku`, and `gemini-2.5-pro-exp-03-25`. | Yes |
- | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
- | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ **Required:**

- ---
+ | Argument | Description |
+ | :------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `-s, --source` | Path to the source code file that will be optimized (e.g., `optimize.py`). |
+ | `-c, --eval-command`| Command to run for evaluating the code in `--source`. This command should print the target `--metric` and its value to the terminal (stdout/stderr). See note below. |
+ | `-m, --metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. |
+ | `-g, --goal` | `maximize`/`max` to maximize the `--metric` or `minimize`/`min` to minimize it. |

- ### `weco logout` Command
+ <br>

- This command logs you out by removing the locally stored Weco API key.
+ **Optional:**

- ```bash
- weco logout
- ```
+ | Argument | Description | Default |
+ | :----------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------ |
+ | `-n, --steps` | Number of optimization steps (LLM iterations) to run. | 100 |
+ | `-M, --model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). | `o4-mini` when `OPENAI_API_KEY` is set; `claude-3-7-sonnet-20250219` when `ANTHROPIC_API_KEY` is set; `gemini-2.5-pro-exp-03-25` when `GEMINI_API_KEY` is set (priority: `OPENAI_API_KEY` > `ANTHROPIC_API_KEY` > `GEMINI_API_KEY`). |
+ | `-i, --additional-instructions`| Natural language description of specific instructions **or** path to a file containing detailed instructions to guide the LLM. | `None` |
+ | `-l, --log-dir` | Path to the directory to log intermediate steps and final optimization result. | `.runs/` |
+
+ ---
+
+ ### Weco Dashboard
+ To associate your optimization runs with your Weco account and view them on the Weco dashboard, you can log in. `weco` uses a device authentication flow
+ ![image (16)](https://github.com/user-attachments/assets/8a0a285b-4894-46fa-b6a2-4990017ca0c6)

  ---

{weco-0.2.17 → weco-0.2.19}/examples/cuda/README.md
@@ -21,7 +21,7 @@ Run the following command to start the optimization process:
  weco run --source optimize.py \
  --eval-command "python evaluate.py --solution-path optimize.py" \
  --metric speedup \
- --maximize true \
+ --goal maximize \
  --steps 30 \
  --model gemini-2.5-pro-exp-03-25 \
  --additional-instructions guide.md
@@ -32,7 +32,7 @@ weco run --source optimize.py \
  * `--source optimize.py`: The initial PyTorch self-attention code to be optimized with CUDA.
  * `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script, which compiles (if necessary) and benchmarks the CUDA-enhanced code in `optimize.py` against a baseline, printing the `speedup`.
  * `--metric speedup`: The optimization target metric.
- * `--maximize true`: Weco aims to increase the speedup.
+ * `--goal maximize`: Weco aims to increase the speedup.
  * `--steps 30`: The number of optimization iterations.
  * `--model gemini-2.5-pro-exp-03-25`: The LLM used for code generation.
  * `--additional-instructions guide.md`: Points Weco to a file containing detailed instructions for the LLM on how to write the CUDA kernels, handle compilation (e.g., using `torch.utils.cpp_extension`), manage data types, and ensure correctness.
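Note on the evaluation contract described in this hunk: the `--eval-command` script benchmarks the candidate against a baseline and prints the `speedup`. The sketch below shows that shape using only the standard library, so it runs without a GPU. It is not the package's `evaluate.py`, which is unchanged in this release and not shown here; `baseline`, `candidate`, and `measure_speedup` are hypothetical stand-ins, and the `speedup: <value>` output format is an assumption modelled on the `accuracy:` line in the prompt example.

```python
import timeit


def baseline(xs):
    """Reference implementation: an explicit Python loop."""
    total = 0
    for x in xs:
        total += x * x
    return total


def candidate(xs):
    """Stand-in for the optimized code under test."""
    return sum(x * x for x in xs)


def measure_speedup(data, repeats=5, number=20):
    """Return baseline_time / candidate_time, using the best of several runs."""
    # Check correctness first: a fast but wrong candidate should not score.
    assert baseline(data) == candidate(data), "candidate output differs from baseline"
    t_base = min(timeit.repeat(lambda: baseline(data), repeat=repeats, number=number))
    t_cand = min(timeit.repeat(lambda: candidate(data), repeat=repeats, number=number))
    # Guard against a zero reading on very coarse timers.
    return t_base / max(t_cand, 1e-12)


if __name__ == "__main__":
    # The metric name printed here must match the value passed to --metric.
    print(f"speedup: {measure_speedup(list(range(10_000))):.4f}")
```

Taking the minimum over repeated timings is a common way to reduce noise from other processes; a real GPU benchmark would also need device synchronization before reading the clock.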
weco-0.2.19/examples/prompt/README.md
@@ -0,0 +1,51 @@
+ # AIME Prompt Engineering Example with Weco
+
+ This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and aims to improve the accuracy metric.
+
+ This example uses `gpt-4o-mini` via the OpenAI API by default. Ensure your `OPENAI_API_KEY` environment variable is set.
+
+ ## Files in this folder
+
+ | File | Purpose |
+ | :------------ | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the mutable `EXTRA_INSTRUCTIONS` string. Weco edits **only** this file during the search. |
+ | `eval.py` | Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel, parses the LLM output (looking for `\\boxed{}`), compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |
+
+
+ ## Quick start
+
+ 1. **Clone the repository and enter the folder.**
+ ```bash
+ git clone https://github.com/your‑fork/weco‑examples.git
+ cd weco‑examples/aime‑2024
+ ```
+ 2. **Run Weco.** The command below edits `EXTRA_INSTRUCTIONS` in `optimize.py`, invokes `eval.py` on every iteration, reads the printed accuracy, and keeps the best variants.
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python eval.py" \
+ --metric accuracy \
+ --goal maximize \
+ --steps 40 \
+ --model gemini-2.5-flash-preview-04-17 \
+ --additional-instructions prompt_guide.md
+ ```
+
+ During each evaluation round you will see log lines similar to the following.
+
+ ```text
+ [setup] loading 20 problems from AIME 2024 …
+ [progress] 5/20 completed, elapsed 7.3 s
+ [progress] 10/20 completed, elapsed 14.6 s
+ [progress] 15/20 completed, elapsed 21.8 s
+ [progress] 20/20 completed, elapsed 28.9 s
+ accuracy: 0.0500
+ ```
+
+ Weco then mutates the config, tries again, and gradually pushes the accuracy higher. On a modern laptop you can usually double the baseline score within thirty to forty iterations.
+
+ ## How it works
+
+ * `eval_aime.py` slices the **Maxwell‑Jia/AIME_2024** dataset to twenty problems for fast feedback. You can change the slice in one line.
+ * The script sends model calls in parallel via `ThreadPoolExecutor`, so network latency is hidden.
+ * Every five completed items, the script logs progress and elapsed time.
+ * The final line `accuracy: value` is the only part Weco needs for guidance.
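Note on the "How it works" bullets above: the evaluation pattern they describe (parallel calls through `ThreadPoolExecutor`, a progress line every five completed items, `\boxed{}` answer parsing, and a final `accuracy:` line) can be sketched as follows. This is not the package's `eval.py`, which is unchanged in this release and not shown in the diff. `extract_boxed`, `evaluate`, and `fake_solve` are hypothetical names, and `fake_solve` replaces the real LLM call so the sketch runs offline.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Matches \boxed{...} with no nested braces (a simplifying assumption).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text):
    """Return the content of the last boxed answer in text, or None if absent."""
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None


def evaluate(problems, solve, max_workers=8, log_every=5):
    """Score solve() on a non-empty list of (question, answer) pairs, in parallel."""
    start = time.time()
    correct = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its ground-truth answer.
        futures = {pool.submit(solve, question): answer for question, answer in problems}
        for done, future in enumerate(as_completed(futures), start=1):
            if extract_boxed(future.result()) == str(futures[future]):
                correct += 1
            if done % log_every == 0:
                elapsed = time.time() - start
                print(f"[progress] {done}/{len(problems)} completed, elapsed {elapsed:.1f} s")
    return correct / len(problems)


def fake_solve(question):
    """Hypothetical stand-in for the LLM call: answers with the question length."""
    return f"The answer is \\boxed{{{len(question)}}}"


if __name__ == "__main__":
    # Odd-length questions carry the right answer; even-length ones do not.
    demo = [("a" * n, n if n % 2 else -1) for n in range(1, 11)]
    print(f"accuracy: {evaluate(demo, fake_solve):.4f}")
```

Threads suit this workload because each call mostly waits on the network, so the interpreter lock is rarely the bottleneck.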
{weco-0.2.17 → weco-0.2.19}/examples/spaceship-titanic/README.md
@@ -1,33 +1,16 @@
- # Example: Optimizing a Kaggle Classification Model (Spaceship Titanic)
+ # Example: Solving a Kaggle Competition (Spaceship Titanic)

  This example demonstrates using Weco to optimize a Python script designed for the [Spaceship Titanic Kaggle competition](https://www.kaggle.com/competitions/spaceship-titanic/overview). The goal is to improve the model's `accuracy` metric by directly optimizing the evaluate.py

  ## Setup

  1. Ensure you are in the `examples/spaceship-titanic` directory.
- 2. **Kaggle Credentials:** You need your Kaggle API credentials (`kaggle.json`) configured to download the competition dataset. Place the `kaggle.json` file in `~/.kaggle/` or set the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. See [Kaggle API documentation](https://github.com/Kaggle/kaggle-api#api-credentials) for details.
- 3. **Install Dependencies:** Install the required Python packages:
+ 2. `pip install weco`
+ 3. Set up LLM API Key, `export OPENAI_API_KEY="your_key_here"`
+ 4. **Install Dependencies:** Install the required Python packages:
  ```bash
  pip install -r requirements-test.txt
  ```
- 4. **Prepare Data:** Run the utility script once to download the dataset from Kaggle and place it in the expected `./data/` subdirectories:
- ```bash
- python get_data.py
- ```
- After running `get_data.py`, your directory structure should look like this:
- ```
- .
- ├── competition_description.md
- ├── data
- │ ├── sample_submission.csv
- │ ├── test.csv
- │ └── train.csv
- ├── evaluate.py
- ├── get_data.py
- ├── README.md # This file
- ├── requirements-test.txt
- └── submit.py
- ```

  ## Optimization Command

@@ -37,21 +20,13 @@ Run the following command to start optimizing the model:
  weco run --source evaluate.py \
  --eval-command "python evaluate.py --data-dir ./data" \
  --metric accuracy \
- --maximize true \
- --steps 10 \
- --model gemini-2.5-pro-exp-03-25 \
+ --goal maximize \
+ --steps 20 \
+ --model o4-mini \
  --additional-instructions "Improve feature engineering, model choice and hyper-parameters."
  --log-dir .runs/spaceship-titanic
  ```

- ## Submit the solution
-
- Once the optimization finished, you can submit your predictions to kaggle to see the results. Make sure `submission.csv` is present and then simply run the following command.
-
- ```bash
- python submit.py
- ```
-
  ### Explanation

  * `--source evaluate.py`: The script provides a baseline as root node and directly optimize the evaluate.py
@@ -59,9 +34,9 @@ python submit.py
  * [optional] `--data-dir`: path to the train and test data.
  * [optional] `--seed`: Seed for reproduce the experiment.
  * `--metric accuracy`: The target metric Weco should optimize.
- * `--maximize true`: Weco aims to increase the accuracy.
+ * `--goal maximize`: Weco aims to increase the accuracy.
  * `--steps 10`: The number of optimization iterations.
  * `--model gemini-2.5-pro-exp-03-25`: The LLM driving the optimization.
  * `--additional-instructions "Improve feature engineering, model choice and hyper-parameters."`: A simple instruction for model improvement or you can put the path to [`comptition_description.md`](./competition_description.md) within the repo to feed the agent more detailed information.

- Weco will iteratively modify the feature engineering or modeling code within `evaluate.py`, run the evaluation pipeline, and use the resulting `accuracy` to guide further improvements.
+ Weco will iteratively modify the feature engineering or modeling code within `evaluate.py`, run the evaluation pipeline, and use the resulting `accuracy` to guide further improvements.
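Note on the feedback loop described in the last line: each step runs the evaluation command, reads the metric from its output, and compares it with the best value so far according to `--goal`. The sketch below illustrates one such step. It is an illustration only and does not reproduce Weco's implementation (`weco/cli.py` and `weco/api.py` are not shown in this excerpt). `parse_metric`, `run_and_score`, `is_improvement`, and `GOAL_ALIASES` are hypothetical names, and the `name: value` output format is an assumption based on the `accuracy:` line shown in the prompt example.

```python
import re
import subprocess

# The README lists `maximize`/`max` and `minimize`/`min` as accepted --goal values.
GOAL_ALIASES = {
    "maximize": "maximize",
    "max": "maximize",
    "minimize": "minimize",
    "min": "minimize",
}


def parse_metric(output, metric):
    """Return the last `metric: value` number found in output, or None."""
    number = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
    pattern = re.compile(rf"{re.escape(metric)}\s*[:=]\s*({number})")
    matches = pattern.findall(output)
    return float(matches[-1]) if matches else None


def run_and_score(eval_command, metric):
    """Run the evaluation command in a shell and read the metric from stdout/stderr."""
    proc = subprocess.run(eval_command, shell=True, capture_output=True, text=True)
    return parse_metric(proc.stdout + "\n" + proc.stderr, metric)


def is_improvement(new, best, goal):
    """Decide whether `new` beats `best` under the given goal (aliases accepted)."""
    direction = GOAL_ALIASES[goal]  # raises KeyError on an unknown goal
    if best is None:
        return True
    return new > best if direction == "maximize" else new < best
```

For example, `run_and_score("python evaluate.py --data-dir ./data", "accuracy")` would return the accuracy as a float under these assumptions, or `None` if the script printed no matching line.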