weco 0.2.7__tar.gz → 0.2.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. {weco-0.2.7 → weco-0.2.9}/.github/workflows/release.yml +2 -2
  2. {weco-0.2.7 → weco-0.2.9}/PKG-INFO +24 -130
  3. weco-0.2.9/README.md +152 -0
  4. weco-0.2.9/examples/cuda/README.md +40 -0
  5. weco-0.2.9/examples/metal/README.md +39 -0
  6. weco-0.2.9/examples/prompt/README.md +100 -0
  7. weco-0.2.9/examples/prompt/eval.py +135 -0
  8. weco-0.2.9/examples/prompt/optimize.py +34 -0
  9. weco-0.2.9/examples/prompt/prompt_guide.md +45 -0
  10. weco-0.2.9/examples/spaceship-titanic/README.md +62 -0
  11. weco-0.2.9/examples/triton/README.md +38 -0
  12. {weco-0.2.7 → weco-0.2.9}/pyproject.toml +2 -2
  13. {weco-0.2.7 → weco-0.2.9}/weco/__init__.py +1 -1
  14. {weco-0.2.7 → weco-0.2.9}/weco/api.py +3 -8
  15. {weco-0.2.7 → weco-0.2.9}/weco/cli.py +34 -17
  16. {weco-0.2.7 → weco-0.2.9}/weco/panels.py +12 -3
  17. {weco-0.2.7 → weco-0.2.9}/weco.egg-info/PKG-INFO +24 -130
  18. {weco-0.2.7 → weco-0.2.9}/weco.egg-info/SOURCES.txt +7 -0
  19. weco-0.2.7/README.md +0 -258
  20. weco-0.2.7/examples/spaceship-titanic/README.md +0 -93
  21. {weco-0.2.7 → weco-0.2.9}/.github/workflows/lint.yml +0 -0
  22. {weco-0.2.7 → weco-0.2.9}/.gitignore +0 -0
  23. {weco-0.2.7 → weco-0.2.9}/LICENSE +0 -0
  24. {weco-0.2.7 → weco-0.2.9}/examples/cuda/evaluate.py +0 -0
  25. {weco-0.2.7 → weco-0.2.9}/examples/cuda/guide.md +0 -0
  26. {weco-0.2.7 → weco-0.2.9}/examples/cuda/optimize.py +0 -0
  27. {weco-0.2.7 → weco-0.2.9}/examples/hello-kernel-world/evaluate.py +0 -0
  28. {weco-0.2.7 → weco-0.2.9}/examples/hello-kernel-world/optimize.py +0 -0
  29. {weco-0.2.7 → weco-0.2.9}/examples/metal/evaluate.py +0 -0
  30. {weco-0.2.7 → weco-0.2.9}/examples/metal/examples.rst +0 -0
  31. {weco-0.2.7 → weco-0.2.9}/examples/metal/optimize.py +0 -0
  32. {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/baseline.py +0 -0
  33. {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/evaluate.py +0 -0
  34. {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/optimize.py +0 -0
  35. {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/requirements-test.txt +0 -0
  36. {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/utils.py +0 -0
  37. {weco-0.2.7 → weco-0.2.9}/examples/triton/evaluate.py +0 -0
  38. {weco-0.2.7 → weco-0.2.9}/examples/triton/optimize.py +0 -0
  39. {weco-0.2.7 → weco-0.2.9}/setup.cfg +0 -0
  40. {weco-0.2.7 → weco-0.2.9}/weco/utils.py +0 -0
  41. {weco-0.2.7 → weco-0.2.9}/weco.egg-info/dependency_links.txt +0 -0
  42. {weco-0.2.7 → weco-0.2.9}/weco.egg-info/entry_points.txt +0 -0
  43. {weco-0.2.7 → weco-0.2.9}/weco.egg-info/requires.txt +0 -0
  44. {weco-0.2.7 → weco-0.2.9}/weco.egg-info/top_level.txt +0 -0

{weco-0.2.7 → weco-0.2.9}/.github/workflows/release.yml
@@ -90,7 +90,7 @@ jobs:
  GITHUB_TOKEN: ${{ github.token }}
  run: >-
  gh release create
- 'v0.2.7'
+ 'v0.2.9'
  --repo '${{ github.repository }}'
  --notes ""

@@ -102,5 +102,5 @@ jobs:
  # sigstore-produced signatures and certificates.
  run: >-
  gh release upload
- 'v0.2.7' dist/**
+ 'v0.2.9' dist/**
  --repo '${{ github.repository }}'

{weco-0.2.7 → weco-0.2.9}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: weco
- Version: 0.2.7
+ Version: 0.2.9
  Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
  Author-email: Weco AI Team <contact@weco.ai>
  License: MIT
@@ -9,7 +9,7 @@ Keywords: AI,Code Optimization,Code Generation
  Classifier: Programming Language :: Python :: 3
  Classifier: Operating System :: OS Independent
  Classifier: License :: OSI Approved :: MIT License
- Requires-Python: >=3.12
+ Requires-Python: >=3.8
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: requests
@@ -20,13 +20,19 @@ Requires-Dist: build; extra == "dev"
  Requires-Dist: setuptools_scm; extra == "dev"
  Dynamic: license-file

- # Weco CLI Code Optimizer for Machine Learning Engineers
+ # Weco: The Evaluation-Driven AI Code Optimizer

  [![Python](https://img.shields.io/badge/Python-3.12.0-blue)](https://www.python.org)
- [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
  [![PyPI version](https://badge.fury.io/py/weco.svg)](https://badge.fury.io/py/weco)
+ [![AIDE](https://img.shields.io/badge/AI--Driven_Exploration-arXiv-orange?style=flat-square&logo=arxiv)](https://arxiv.org/abs/2502.13138)

- `weco` is a command-line interface for interacting with Weco AI's code optimizer, powered by [AI-Driven Exploration](https://arxiv.org/abs/2502.13138). It helps you automate the improvement of your code for tasks like GPU kernel optimization, feature engineering, model development, and prompt engineering.
+ Weco systematically optimizes your code, guided directly by your evaluation metrics.
+
+ Example applications include:
+
+ - **GPU Kernel Optimization**: Reimplement PyTorch functions using CUDA, Triton or Metal, optimizing for `latency`, `throughput`, or `memory_bandwidth`.
+ - **Model Development**: Tune feature transformations or architectures, optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
+ - **Prompt Engineering**: Refine prompts for LLMs, optimizing for `win_rate`, `relevance`, or `format_adherence`

  https://github.com/user-attachments/assets/cb724ef1-bff6-4757-b457-d3b2201ede81

@@ -40,37 +46,6 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models

  ---

- ## Example Use Cases
-
- Here's how `weco` can be applied to common ML engineering tasks:
-
- * **GPU Kernel Optimization:**
-   * **Goal:** Improve the speed or efficiency of low-level GPU code.
-   * **How:** `weco` iteratively refines CUDA, Triton, Metal, or other kernel code specified in your `--source` file.
-   * **`--eval-command`:** Typically runs a script that compiles the kernel, executes it, and benchmarks performance (e.g., latency, throughput).
-   * **`--metric`:** Examples include `latency`, `throughput`, `TFLOPS`, `memory_bandwidth`. Optimize to `minimize` latency or `maximize` throughput.
-
- * **Feature Engineering:**
-   * **Goal:** Discover better data transformations or feature combinations for your machine learning models.
-   * **How:** `weco` explores different processing steps or parameters within your feature transformation code (`--source`).
-   * **`--eval-command`:** Executes a script that applies the features, trains/validates a model using those features, and prints a performance score.
-   * **`--metric`:** Examples include `accuracy`, `AUC`, `F1-score`, `validation_loss`. Usually optimized to `maximize` accuracy/AUC/F1 or `minimize` loss.
-
- * **Model Development:**
-   * **Goal:** Tune hyperparameters or experiment with small architectural changes directly within your model's code.
-   * **How:** `weco` modifies hyperparameter values (like learning rate, layer sizes if defined in the code) or structural elements in your model definition (`--source`).
-   * **`--eval-command`:** Runs your model training and evaluation script, printing the key performance indicator.
-   * **`--metric`:** Examples include `validation_accuracy`, `test_loss`, `inference_time`, `perplexity`. Optimize according to the metric's nature (e.g., `maximize` accuracy, `minimize` loss).
-
- * **Prompt Engineering:**
-   * **Goal:** Refine prompts used within larger systems (e.g., for LLM interactions) to achieve better or more consistent outputs.
-   * **How:** `weco` modifies prompt templates, examples, or instructions stored in the `--source` file.
-   * **`--eval-command`:** Executes a script that uses the prompt, generates an output, evaluates that output against desired criteria (e.g., using another LLM, checking for keywords, format validation), and prints a score.
-   * **`--metric`:** Examples include `quality_score`, `relevance`, `task_success_rate`, `format_adherence`. Usually optimized to `maximize`.
-
- ---
-
-
  ## Setup

  1. **Install the Package:**
@@ -97,13 +72,20 @@ Here's how `weco` can be applied to common ML engineering tasks:

  ---

- ### Examples
+ ### Example: Optimizing Simple PyTorch Operations
+
+ This basic example shows how to optimize a simple PyTorch function for speedup.

- **Example 1: Optimizing PyTorch simple operations**
+ For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.

  ```bash
+ # Navigate to the example directory
  cd examples/hello-kernel-world
- pip install torch
+
+ # Install dependencies
+ pip install torch
+
+ # Run Weco
  weco --source optimize.py \
  --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
  --metric speedup \
@@ -113,96 +95,7 @@ weco --source optimize.py \
  --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
  ```

- Note that if you have an NVIDIA gpu, change the device to `cuda`. If you are running this on Apple Silicon, set it to `mps`.
-
- **Example 2: Optimizing MLX operations with instructions from a file**
-
- Lets optimize a 2D convolution operation in [`mlx`](https://github.com/ml-explore/mlx) using [Metal](https://developer.apple.com/documentation/metal/). Sometimes, additional context or instructions are too complex for a single command-line string. You can provide a path to a file containing these instructions.
-
- ```bash
- cd examples/metal
- pip install mlx
- weco --source optimize.py \
- --eval-command "python evaluate.py --solution-path optimize.py" \
- --metric speedup \
- --maximize true \
- --steps 30 \
- --model gemini-2.5-pro-exp-03-25 \
- --additional-instructions examples.rst
- ```
-
- **Example 3: Level Agnostic Optimization: Causal Self Attention with Triton & CUDA**
-
- Given how useful causal multihead self attention is to transformers, we've seen its wide adoption across ML engineering and AI research. Its great to keep things at a high-level (in PyTorch) when doing research, but when moving to production you often need to write highly customized low-level kernels to make things run as fast as they can. The `weco` CLI can optimize kernels across a variety of different abstraction levels and frameworks. Example 2 uses Metal but lets explore two more frameworks:
-
- 1. [Triton](https://github.com/triton-lang/triton)
- ```bash
- cd examples/triton
- pip install torch triton
- weco --source optimize.py \
- --eval-command "python evaluate.py --solution-path optimize.py" \
- --metric speedup \
- --maximize true \
- --steps 30 \
- --model gemini-2.5-pro-exp-03-25 \
- --additional-instructions "Use triton to optimize the code while ensuring a small max float diff. Maintain the same code format."
- ```
-
- 2. [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
- ```bash
- cd examples/cuda
- pip install torch
- weco --source optimize.py \
- --eval-command "python evaluate.py --solution-path optimize.py" \
- --metric speedup \
- --maximize true \
- --steps 30 \
- --model gemini-2.5-pro-exp-03-25 \
- --additional-instructions guide.md
- ```
-
- **Example 4: Optimizing a Classification Model**
-
- This example demonstrates optimizing a script for a Kaggle competition ([Spaceship Titanic](https://www.kaggle.com/competitions/spaceship-titanic/overview)) to improve classification accuracy. The additional instructions are provided via a separate file (`examples/spaceship-titanic/README.md`).
-
- First, install the requirements for the example environment:
- ```bash
- pip install -r examples/spaceship-titanic/requirements-test.txt
- ```
- And run utility function once to prepare the dataset
- ```bash
- python examples/spaceship-titanic/utils.py
- ```
-
- You should see the following structure at `examples/spaceship-titanic`. You need to prepare the kaggle credentials for downloading the dataset.
- ```
- .
- ├── baseline.py
- ├── evaluate.py
- ├── optimize.py
- ├── private
- │   └── test.csv
- ├── public
- │   ├── sample_submission.csv
- │   ├── test.csv
- │   └── train.csv
- ├── README.md
- ├── requirements-test.txt
- └── utils.py
- ```
-
- Then, execute the optimization command:
- ```bash
- weco --source examples/spaceship-titanic/optimize.py \
- --eval-command "python examples/spaceship-titanic/optimize.py && python examples/spaceship-titanic/evaluate.py" \
- --metric accuracy \
- --maximize true \
- --steps 10 \
- --model gemini-2.5-pro-exp-03-25 \
- --additional-instructions examples/spaceship-titanic/README.md
- ```
-
- *The [baseline.py](examples/spaceship-titanic/baseline.py) is provided as a start point for optimization*
+ **Note:** If you have an NVIDIA GPU, change the device in the `--eval-command` to `cuda`. If you are running this on Apple Silicon, set it to `mps`.

  ---

@@ -215,9 +108,10 @@ weco --source examples/spaceship-titanic/optimize.py \
  | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
  | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
  | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
- | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). Recommended models to try include `o3-mini`, `claude-3-haiku`, and `gemini-2.5-pro-exp-03-25`.| Yes |
+ | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`.| Yes |
  | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
  | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ | `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |

  ---

weco-0.2.9/README.md ADDED
@@ -0,0 +1,152 @@
+ # Weco: The Evaluation-Driven AI Code Optimizer
+
+ [![Python](https://img.shields.io/badge/Python-3.12.0-blue)](https://www.python.org)
+ [![PyPI version](https://badge.fury.io/py/weco.svg)](https://badge.fury.io/py/weco)
+ [![AIDE](https://img.shields.io/badge/AI--Driven_Exploration-arXiv-orange?style=flat-square&logo=arxiv)](https://arxiv.org/abs/2502.13138)
+
+ Weco systematically optimizes your code, guided directly by your evaluation metrics.
+
+ Example applications include:
+
+ - **GPU Kernel Optimization**: Reimplement PyTorch functions using CUDA, Triton or Metal, optimizing for `latency`, `throughput`, or `memory_bandwidth`.
+ - **Model Development**: Tune feature transformations or architectures, optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
+ - **Prompt Engineering**: Refine prompts for LLMs, optimizing for `win_rate`, `relevance`, or `format_adherence`
+
+ https://github.com/user-attachments/assets/cb724ef1-bff6-4757-b457-d3b2201ede81
+
+ ---
+
+ ## Overview
+
+ The `weco` CLI leverages a tree search approach guided by Large Language Models (LLMs) to iteratively explore and refine your code. It automatically applies changes, runs your evaluation script, parses the results, and proposes further improvements based on the specified goal.
+
+ ![image](https://github.com/user-attachments/assets/a6ed63fa-9c40-498e-aa98-a873e5786509)
+
+ ---
+
+ ## Setup
+
+ 1. **Install the Package:**
+
+ ```bash
+ pip install weco
+ ```
+
+ 2. **Configure API Keys:**
+
+ Set the appropriate environment variables for your desired language model provider:
+
+ - **OpenAI:** `export OPENAI_API_KEY="your_key_here"`
+ - **Anthropic:** `export ANTHROPIC_API_KEY="your_key_here"`
+ - **Google DeepMind:** `export GEMINI_API_KEY="your_key_here"` (Google AI Studio has a free API usage quota. Create a key [here](https://aistudio.google.com/apikey) to use weco for free.)
+
+ ---
+
+ ## Usage
+ <div style="background-color: #fff3cd; border: 1px solid #ffeeba; padding: 15px; border-radius: 4px; margin-bottom: 15px;">
+ <strong>⚠️ Warning: Code Modification</strong><br>
+ <code>weco</code> directly modifies the file specified by <code>--source</code> during the optimization process. It is <strong>strongly recommended</strong> to use version control (like Git) to track changes and revert if needed. Alternatively, ensure you have a backup of your original file before running the command. Upon completion, the file will contain the best-performing version of the code found during the run.
+ </div>
+
+ ---
+
+ ### Example: Optimizing Simple PyTorch Operations
+
+ This basic example shows how to optimize a simple PyTorch function for speedup.
+
+ For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
+
+ ```bash
+ # Navigate to the example directory
+ cd examples/hello-kernel-world
+
+ # Install dependencies
+ pip install torch
+
+ # Run Weco
+ weco --source optimize.py \
+ --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
+ --metric speedup \
+ --maximize true \
+ --steps 15 \
+ --model gemini-2.5-pro-exp-03-25 \
+ --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
+ ```
+
+ **Note:** If you have an NVIDIA GPU, change the device in the `--eval-command` to `cuda`. If you are running this on Apple Silicon, set it to `mps`.
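
> Editor's aside: to make the workflow above concrete, the module below is a hypothetical sketch of the kind of `--source` file this example targets. It is an assumption for this write-up, not the actual `examples/hello-kernel-world/optimize.py` shipped in the package. Weco would rewrite `forward` (for instance, fusing the elementwise chain into a single expression or a compiled kernel) while the evaluation script checks that outputs still match and reports `speedup`.

```python
# Hypothetical sketch of a --source file for this kind of example; the real
# examples/hello-kernel-world/optimize.py shipped in the package may differ.
import torch
import torch.nn as nn


class Model(nn.Module):
    """A deliberately naive chain of elementwise ops that Weco could fuse."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each line is a separate kernel launch; a fused rewrite (a single
        # expression, or torch.compile) should match these values but run faster.
        y = x * 0.5
        y = y + 1.0
        y = torch.relu(y)
        return y * x


if __name__ == "__main__":
    print(Model()(torch.randn(1024, 1024)).shape)
```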
+
+ ---
+
+ ### Command Line Arguments
+
+ | Argument | Description | Required |
+ | :-------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- |
+ | `--source` | Path to the source code file that will be optimized (e.g., `optimize.py`). | Yes |
+ | `--eval-command` | Command to run for evaluating the code in `--source`. This command should print the target `--metric` and its value to the terminal (stdout/stderr). See note below. | Yes |
+ | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
+ | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
+ | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
+ | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`.| Yes |
+ | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
+ | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ | `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |
+
+ ---
+
+ ### Performance & Expectations
+
+ Weco, powered by the AIDE algorithm, optimizes code iteratively based on your evaluation results. Achieving significant improvements, especially on complex research-level tasks, often requires substantial exploration time.
+
+ The following plot from the independent [Research Engineering Benchmark (RE-Bench)](https://metr.org/AI_R_D_Evaluation_Report.pdf) report shows the performance of AIDE (the algorithm behind Weco) on challenging ML research engineering tasks over different time budgets.
+ <p align="center">
+ <img src="https://github.com/user-attachments/assets/ff0e471d-2f50-4e2d-b718-874862f533df" alt="RE-Bench Performance Across Time" width="60%"/>
+ </p>
+
+ As shown, AIDE demonstrates strong performance gains over time, surpassing lower human expert percentiles within hours and continuing to improve. This highlights the potential of evaluation-driven optimization but also indicates that reaching high levels of performance comparable to human experts on difficult benchmarks can take considerable time (tens of hours in this specific benchmark, corresponding to many `--steps` in the Weco CLI). Factor this into your planning when setting the number of `--steps` for your optimization runs.
+
+ ---
+
+ ### Important Note on Evaluation
+
+ The command specified by `--eval-command` is crucial. It's responsible for executing the potentially modified code from `--source` and assessing its performance. **This command MUST print the metric you specified with `--metric` along with its numerical value to the terminal (standard output or standard error).** Weco reads this output to understand how well each code version performs and guide the optimization process.
+
+ For example, if you set `--metric speedup`, your evaluation script (`eval.py` in the examples) should output a line like:
+
+ ```
+ speedup: 1.5
+ ```
+
+ or
+
+ ```
+ Final speedup value = 1.5
+ ```
+
+ Weco will parse this output to extract the numerical value (1.5 in this case) associated with the metric name ('speedup').
+
+
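
> Editor's aside: a minimal evaluation script in the spirit of this contract might look like the sketch below. The module name `optimize` and class `Model` are assumptions carried over from the illustrative sketch earlier, not the package's actual example files; the only hard requirement is the final `print` of the metric line.

```python
# Illustrative evaluation script: it benchmarks a candidate against a baseline
# and prints the "metric: value" line Weco parses. The module name `optimize`
# and class `Model` are assumptions for this sketch, not the package's files.
import time

import torch

import optimize  # the --source module Weco is editing (assumed name)


def bench(fn, x, iters=50):
    for _ in range(5):          # warm-up
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters


def baseline(x):
    # Eager reference implementation the candidate must match numerically.
    return torch.relu(x * 0.5 + 1.0) * x


if __name__ == "__main__":
    x = torch.randn(1024, 1024)
    candidate = optimize.Model()

    # Reject solutions that change the numerics.
    assert torch.allclose(baseline(x), candidate(x), atol=1e-5), "output mismatch"

    speedup = bench(baseline, x) / bench(candidate, x)
    # The only line Weco strictly needs: "<metric name>: <value>"
    print(f"speedup: {speedup:.3f}")
```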
+ ## Contributing
+
+ We welcome contributions! To get started:
+
+ 1. **Fork and Clone the Repository:**
+ ```bash
+ git clone https://github.com/WecoAI/weco-cli.git
+ cd weco-cli
+ ```
+
+ 2. **Install Development Dependencies:**
+ ```bash
+ pip install -e ".[dev]"
+ ```
+
+ 3. **Create a Feature Branch:**
+ ```bash
+ git checkout -b feature/your-feature-name
+ ```
+
+ 4. **Make Your Changes:** Ensure your code adheres to our style guidelines and includes relevant tests.
+
+ 5. **Commit and Push** your changes, then open a pull request with a clear description of your enhancements.
+
+ ---

weco-0.2.9/examples/cuda/README.md ADDED
@@ -0,0 +1,40 @@
+ # Example: Optimizing PyTorch Self-Attention with CUDA
+
+ This example showcases using Weco to optimize a PyTorch causal multi-head self-attention implementation by generating custom [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) kernels. This approach aims for low-level optimization beyond standard PyTorch or even Triton for potentially higher performance on NVIDIA GPUs.
+
+ This example uses a separate Markdown file (`guide.md`) to provide detailed instructions and context to the LLM.
+
+ ## Setup
+
+ 1. Ensure you are in the `examples/cuda` directory.
+ 2. Install the required dependency:
+ ```bash
+ pip install torch
+ ```
+ *(Note: This example requires a compatible NVIDIA GPU and the CUDA Toolkit installed on your system for compiling and running the generated CUDA code.)*
+
+ ## Optimization Command
+
+ Run the following command to start the optimization process:
+
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python evaluate.py --solution-path optimize.py" \
+ --metric speedup \
+ --maximize true \
+ --steps 30 \
+ --model gemini-2.5-pro-exp-03-25 \
+ --additional-instructions guide.md
+ ```
+
+ ### Explanation
+
+ * `--source optimize.py`: The initial PyTorch self-attention code to be optimized with CUDA.
+ * `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script, which compiles (if necessary) and benchmarks the CUDA-enhanced code in `optimize.py` against a baseline, printing the `speedup`.
+ * `--metric speedup`: The optimization target metric.
+ * `--maximize true`: Weco aims to increase the speedup.
+ * `--steps 30`: The number of optimization iterations.
+ * `--model gemini-2.5-pro-exp-03-25`: The LLM used for code generation.
+ * `--additional-instructions guide.md`: Points Weco to a file containing detailed instructions for the LLM on how to write the CUDA kernels, handle compilation (e.g., using `torch.utils.cpp_extension`), manage data types, and ensure correctness.
+
+ Weco will iteratively modify `optimize.py`, potentially generating and integrating CUDA C++ code, guided by the evaluation results and the instructions in `guide.md`.
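
> Editor's aside: for readers unfamiliar with the compilation route mentioned above, the snippet below is a minimal, self-contained sketch of the `torch.utils.cpp_extension.load_inline` pattern such a guide typically describes. It is illustrative only (a trivial elementwise kernel, not attention), the shipped `guide.md`/`optimize.py` may structure things differently, and it requires an NVIDIA GPU plus a CUDA toolchain PyTorch can use for JIT compilation.

```python
# Minimal sketch of inline-CUDA JIT compilation via torch.utils.cpp_extension.
# Illustrative only; not the package's actual optimize.py.
import torch
from torch.utils.cpp_extension import load_inline

cuda_source = r"""
#include <torch/extension.h>

__global__ void scale_add_kernel(const float* x, float* out, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * x[i] + 1.0f;
}

torch::Tensor scale_add(torch::Tensor x, float alpha) {
    auto out = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add_kernel<<<blocks, threads>>>(
        x.data_ptr<float>(), out.data_ptr<float>(), alpha, n);
    return out;
}
"""

cpp_source = "torch::Tensor scale_add(torch::Tensor x, float alpha);"

# JIT-compile the extension the first time this module is imported.
ext = load_inline(
    name="scale_add_ext",
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions=["scale_add"],
)

if __name__ == "__main__":
    x = torch.randn(1 << 20, device="cuda", dtype=torch.float32)
    y = ext.scale_add(x, 2.0)
    # Sanity-check against the eager PyTorch reference.
    assert torch.allclose(y, 2.0 * x + 1.0, atol=1e-6)
    print("max abs diff:", (y - (2.0 * x + 1.0)).abs().max().item())
```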

weco-0.2.9/examples/metal/README.md ADDED
@@ -0,0 +1,39 @@
+ # Example: Optimizing MLX Convolution with Metal
+
+ This example demonstrates how to use Weco to optimize a 2D convolution operation implemented in [`mlx`](https://github.com/ml-explore/mlx), targeting Apple's [Metal](https://developer.apple.com/documentation/metal/) framework for execution on Apple Silicon GPUs.
+
+ It showcases using a separate file (`examples.rst`) to provide detailed context and instructions to the optimizing LLM.
+
+ ## Setup
+
+ 1. Ensure you are in the `examples/metal` directory.
+ 2. Install the required dependency:
+ ```bash
+ pip install mlx
+ ```
+
+ ## Optimization Command
+
+ Run the following command to start the optimization process:
+
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python evaluate.py --solution-path optimize.py" \
+ --metric speedup \
+ --maximize true \
+ --steps 30 \
+ --model gemini-2.5-pro-exp-03-25 \
+ --additional-instructions examples.rst
+ ```
+
+ ### Explanation
+
+ * `--source optimize.py`: Specifies the Python file containing the MLX convolution code to be optimized.
+ * `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script. `evaluate.py` executes the code in `optimize.py`, measures its performance against a baseline, and prints the `speedup` metric.
+ * `--metric speedup`: Tells Weco to target the 'speedup' value printed by the evaluation command.
+ * `--maximize true`: Instructs Weco to aim for a higher speedup value.
+ * `--steps 30`: Defines the number of iterative optimization steps Weco will perform.
+ * `--model gemini-2.5-pro-exp-03-25`: Selects the LLM used for proposing code modifications.
+ * `--additional-instructions examples.rst`: Provides a path to a file containing detailed guidance for the LLM during optimization (e.g., constraints, preferred Metal techniques).
+
+ Weco will iteratively modify `optimize.py`, run `evaluate.py`, parse the `speedup`, and generate new code versions based on the results and the instructions in `examples.rst`.
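
> Editor's aside: as a rough illustration of how an `evaluate.py` for this kind of example could compute `speedup` in MLX, the sketch below times a candidate implementation against `mx.conv2d`, verifies numerical agreement, and prints the metric line. The structure, shapes, and tolerances are assumptions for this write-up (assuming a recent MLX release); the shipped `evaluate.py` may differ.

```python
# Illustrative MLX speedup harness (assumed structure, not the shipped script).
import time

import mlx.core as mx


def reference_conv(x, w):
    # Baseline: MLX's built-in 2D convolution (inputs NHWC, weights OHWI).
    return mx.conv2d(x, w, stride=1, padding=1)


def candidate_conv(x, w):
    # Stand-in for the implementation Weco evolves in optimize.py.
    return mx.conv2d(x, w, stride=1, padding=1)


def bench(fn, x, w, iters=20):
    mx.eval(fn(x, w))  # warm-up; forces lazy graph evaluation
    start = time.perf_counter()
    for _ in range(iters):
        mx.eval(fn(x, w))
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    x = mx.random.normal((8, 64, 64, 32))   # batch, height, width, channels
    w = mx.random.normal((64, 3, 3, 32))    # out_channels, kH, kW, in_channels

    # Reject candidates whose outputs drift from the reference.
    assert mx.allclose(reference_conv(x, w), candidate_conv(x, w), atol=1e-4).item()

    speedup = bench(reference_conv, x, w) / bench(candidate_conv, x, w)
    print(f"speedup: {speedup:.3f}")
```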

weco-0.2.9/examples/prompt/README.md ADDED
@@ -0,0 +1,100 @@
+ # weco-cli/examples/prompt/README.md
+ # AIME Prompt Engineering Example with Weco
+
+ This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and aims to improve the accuracy metric.
+
+ This example uses `gpt-4o-mini` via the OpenAI API by default. Ensure your `OPENAI_API_KEY` environment variable is set.
+
+ ## Files in this folder
+
+ | File | Purpose |
+ | :------------ | :-------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the mutable `EXTRA_INSTRUCTIONS` string. Weco edits **only** this file during the search. |
+ | `eval.py` | Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel, parses the LLM output (looking for `\\boxed{}`), compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |
+
+ ## Quick start
+
+ 1. **Clone the repository and enter the folder.**
+ ```bash
+ # If you cloned the main weco-cli repo already:
+ cd examples/prompt
+
+ # Otherwise:
+ # git clone https://github.com/WecoAI/weco-cli.git
+ # cd weco-cli/examples/prompt
+ ```
+ 2. **Install dependencies.**
+ ```bash
+ # Ensure you have weco installed: pip install weco
+ pip install openai datasets # Add any other dependencies if needed
+ ```
+ 3. **Set your OpenAI API Key.**
+ ```bash
+ export OPENAI_API_KEY="your_openai_api_key_here"
+ ```
+ 4. **Run Weco.** The command below iteratively modifies `EXTRA_INSTRUCTIONS` in `optimize.py`, runs `eval.py` to evaluate the prompt's effectiveness, reads the printed accuracy, and keeps the best prompt variations found.
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python eval.py" \
+ --metric accuracy \
+ --maximize true \
+ --steps 40 \
+ --model gemini-2.5-pro-exp-03-25
+ ```
+ *Note: You can replace `--model gemini-2.5-pro-exp-03-25` with another powerful model like `o3` if you have the respective API keys set.*
+
+ During each evaluation round, you will see log lines similar to the following:
+
+ ```text
+ [setup] loading 20 problems from AIME 2024 …
+ [progress] 5/20 completed, accuracy: 0.0000, elapsed 7.3 s
+ [progress] 10/20 completed, accuracy: 0.1000, elapsed 14.6 s
+ [progress] 15/20 completed, accuracy: 0.0667, elapsed 21.8 s
+ [progress] 20/20 completed, accuracy: 0.0500, elapsed 28.9 s
+ accuracy: 0.0500
+ ```
+
+ Weco then mutates the prompt, tries again, and gradually pushes the accuracy higher. On a modern laptop you can usually double the baseline score within thirty to forty iterations.
+
+ ## How it works
+
+ * `eval.py` slices the **Maxwell-Jia/AIME_2024** dataset to twenty problems for fast feedback. You can change the slice in one line.
+ * The script sends model calls in parallel via `ThreadPoolExecutor`, so network latency is hidden.
+ * Every five completed items, the script logs progress and elapsed time.
+ * The final line `accuracy: value` is the only part Weco needs for guidance.
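
> Editor's aside: to ground the description above, here is a hedged sketch of the pieces `optimize.py` is said to contain: a prompt template with a mutable `EXTRA_INSTRUCTIONS` string, a `solve` function that calls the OpenAI API, and a parser for the `\boxed{}` answer. All names and the exact prompt wording are assumptions; the files shipped in the package may differ.

```python
# Illustrative sketch of the prompt-engineering pieces described above
# (assumed names and structure; not the package's actual optimize.py/eval.py).
import re

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The part Weco would be allowed to mutate between optimization steps.
EXTRA_INSTRUCTIONS = "Show your work carefully and double-check arithmetic."

PROMPT_TEMPLATE = (
    "You are an expert competition mathematician.\n"
    "{extra}\n"
    "Solve the following AIME problem. Reason step by step and put the final "
    "integer answer in \\boxed{{}}.\n\nProblem: {problem}"
)


def solve(problem: str, model: str = "gpt-4o-mini") -> str:
    """Send one problem to the LLM and return the raw completion text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(extra=EXTRA_INSTRUCTIONS, problem=problem),
        }],
    )
    return response.choices[0].message.content


def extract_answer(text: str):
    """Pull the integer out of the last \\boxed{...} in the completion."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    if not matches:
        return None
    digits = re.sub(r"[^0-9]", "", matches[-1])
    return int(digits) if digits else None
```

An `eval.py` built around this would call `solve` over the dataset slice (for example with `concurrent.futures.ThreadPoolExecutor`), compare `extract_answer` against the ground truth, and finish by printing `accuracy: <value>` for Weco to read.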