weco 0.2.7__tar.gz → 0.2.9__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {weco-0.2.7 → weco-0.2.9}/.github/workflows/release.yml +2 -2
- {weco-0.2.7 → weco-0.2.9}/PKG-INFO +24 -130
- weco-0.2.9/README.md +152 -0
- weco-0.2.9/examples/cuda/README.md +40 -0
- weco-0.2.9/examples/metal/README.md +39 -0
- weco-0.2.9/examples/prompt/README.md +100 -0
- weco-0.2.9/examples/prompt/eval.py +135 -0
- weco-0.2.9/examples/prompt/optimize.py +34 -0
- weco-0.2.9/examples/prompt/prompt_guide.md +45 -0
- weco-0.2.9/examples/spaceship-titanic/README.md +62 -0
- weco-0.2.9/examples/triton/README.md +38 -0
- {weco-0.2.7 → weco-0.2.9}/pyproject.toml +2 -2
- {weco-0.2.7 → weco-0.2.9}/weco/__init__.py +1 -1
- {weco-0.2.7 → weco-0.2.9}/weco/api.py +3 -8
- {weco-0.2.7 → weco-0.2.9}/weco/cli.py +34 -17
- {weco-0.2.7 → weco-0.2.9}/weco/panels.py +12 -3
- {weco-0.2.7 → weco-0.2.9}/weco.egg-info/PKG-INFO +24 -130
- {weco-0.2.7 → weco-0.2.9}/weco.egg-info/SOURCES.txt +7 -0
- weco-0.2.7/README.md +0 -258
- weco-0.2.7/examples/spaceship-titanic/README.md +0 -93
- {weco-0.2.7 → weco-0.2.9}/.github/workflows/lint.yml +0 -0
- {weco-0.2.7 → weco-0.2.9}/.gitignore +0 -0
- {weco-0.2.7 → weco-0.2.9}/LICENSE +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/cuda/evaluate.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/cuda/guide.md +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/cuda/optimize.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/hello-kernel-world/evaluate.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/hello-kernel-world/optimize.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/metal/evaluate.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/metal/examples.rst +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/metal/optimize.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/baseline.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/evaluate.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/optimize.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/requirements-test.txt +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/spaceship-titanic/utils.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/triton/evaluate.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/examples/triton/optimize.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/setup.cfg +0 -0
- {weco-0.2.7 → weco-0.2.9}/weco/utils.py +0 -0
- {weco-0.2.7 → weco-0.2.9}/weco.egg-info/dependency_links.txt +0 -0
- {weco-0.2.7 → weco-0.2.9}/weco.egg-info/entry_points.txt +0 -0
- {weco-0.2.7 → weco-0.2.9}/weco.egg-info/requires.txt +0 -0
- {weco-0.2.7 → weco-0.2.9}/weco.egg-info/top_level.txt +0 -0
{weco-0.2.7 → weco-0.2.9}/.github/workflows/release.yml

@@ -90,7 +90,7 @@ jobs:
           GITHUB_TOKEN: ${{ github.token }}
         run: >-
           gh release create
-          'v0.2.7'
+          'v0.2.9'
           --repo '${{ github.repository }}'
           --notes ""

@@ -102,5 +102,5 @@ jobs:
           # sigstore-produced signatures and certificates.
         run: >-
           gh release upload
-          'v0.2.7' dist/**
+          'v0.2.9' dist/**
           --repo '${{ github.repository }}'
{weco-0.2.7 → weco-0.2.9}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: weco
-Version: 0.2.7
+Version: 0.2.9
 Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
 Author-email: Weco AI Team <contact@weco.ai>
 License: MIT

@@ -9,7 +9,7 @@ Keywords: AI,Code Optimization,Code Generation
 Classifier: Programming Language :: Python :: 3
 Classifier: Operating System :: OS Independent
 Classifier: License :: OSI Approved :: MIT License
-Requires-Python: >=3.
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: requests

@@ -20,13 +20,19 @@ Requires-Dist: build; extra == "dev"
 Requires-Dist: setuptools_scm; extra == "dev"
 Dynamic: license-file
 
-# Weco
+# Weco: The Evaluation-Driven AI Code Optimizer
 
 [](https://www.python.org)
-[](LICENSE)
 [](https://badge.fury.io/py/weco)
+[](https://arxiv.org/abs/2502.13138)
 
-
+Weco systematically optimizes your code, guided directly by your evaluation metrics.
+
+Example applications include:
+
+- **GPU Kernel Optimization**: Reimplement PyTorch functions using CUDA, Triton or Metal, optimizing for `latency`, `throughput`, or `memory_bandwidth`.
+- **Model Development**: Tune feature transformations or architectures, optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
+- **Prompt Engineering**: Refine prompts for LLMs, optimizing for `win_rate`, `relevance`, or `format_adherence`
 
 https://github.com/user-attachments/assets/cb724ef1-bff6-4757-b457-d3b2201ede81
 

@@ -40,37 +46,6 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models
 
 ---
 
-## Example Use Cases
-
-Here's how `weco` can be applied to common ML engineering tasks:
-
-* **GPU Kernel Optimization:**
-    * **Goal:** Improve the speed or efficiency of low-level GPU code.
-    * **How:** `weco` iteratively refines CUDA, Triton, Metal, or other kernel code specified in your `--source` file.
-    * **`--eval-command`:** Typically runs a script that compiles the kernel, executes it, and benchmarks performance (e.g., latency, throughput).
-    * **`--metric`:** Examples include `latency`, `throughput`, `TFLOPS`, `memory_bandwidth`. Optimize to `minimize` latency or `maximize` throughput.
-
-* **Feature Engineering:**
-    * **Goal:** Discover better data transformations or feature combinations for your machine learning models.
-    * **How:** `weco` explores different processing steps or parameters within your feature transformation code (`--source`).
-    * **`--eval-command`:** Executes a script that applies the features, trains/validates a model using those features, and prints a performance score.
-    * **`--metric`:** Examples include `accuracy`, `AUC`, `F1-score`, `validation_loss`. Usually optimized to `maximize` accuracy/AUC/F1 or `minimize` loss.
-
-* **Model Development:**
-    * **Goal:** Tune hyperparameters or experiment with small architectural changes directly within your model's code.
-    * **How:** `weco` modifies hyperparameter values (like learning rate, layer sizes if defined in the code) or structural elements in your model definition (`--source`).
-    * **`--eval-command`:** Runs your model training and evaluation script, printing the key performance indicator.
-    * **`--metric`:** Examples include `validation_accuracy`, `test_loss`, `inference_time`, `perplexity`. Optimize according to the metric's nature (e.g., `maximize` accuracy, `minimize` loss).
-
-* **Prompt Engineering:**
-    * **Goal:** Refine prompts used within larger systems (e.g., for LLM interactions) to achieve better or more consistent outputs.
-    * **How:** `weco` modifies prompt templates, examples, or instructions stored in the `--source` file.
-    * **`--eval-command`:** Executes a script that uses the prompt, generates an output, evaluates that output against desired criteria (e.g., using another LLM, checking for keywords, format validation), and prints a score.
-    * **`--metric`:** Examples include `quality_score`, `relevance`, `task_success_rate`, `format_adherence`. Usually optimized to `maximize`.
-
----
-
-
 ## Setup
 
 1. **Install the Package:**

@@ -97,13 +72,20 @@ Here's how `weco` can be applied to common ML engineering tasks:
 
 ---
 
-### 
+### Example: Optimizing Simple PyTorch Operations
+
+This basic example shows how to optimize a simple PyTorch function for speedup.
 
-**
+For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
 
 ```bash
+# Navigate to the example directory
 cd examples/hello-kernel-world
-
+
+# Install dependencies
+pip install torch
+
+# Run Weco
 weco --source optimize.py \
 --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
 --metric speedup \

@@ -113,96 +95,7 @@ weco --source optimize.py \
 --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
 ```
 
-Note
-
-**Example 2: Optimizing MLX operations with instructions from a file**
-
-Lets optimize a 2D convolution operation in [`mlx`](https://github.com/ml-explore/mlx) using [Metal](https://developer.apple.com/documentation/metal/). Sometimes, additional context or instructions are too complex for a single command-line string. You can provide a path to a file containing these instructions.
-
-```bash
-cd examples/metal
-pip install mlx
-weco --source optimize.py \
---eval-command "python evaluate.py --solution-path optimize.py" \
---metric speedup \
---maximize true \
---steps 30 \
---model gemini-2.5-pro-exp-03-25 \
---additional-instructions examples.rst
-```
-
-**Example 3: Level Agnostic Optimization: Causal Self Attention with Triton & CUDA**
-
-Given how useful causal multihead self attention is to transformers, we've seen its wide adoption across ML engineering and AI research. Its great to keep things at a high-level (in PyTorch) when doing research, but when moving to production you often need to write highly customized low-level kernels to make things run as fast as they can. The `weco` CLI can optimize kernels across a variety of different abstraction levels and frameworks. Example 2 uses Metal but lets explore two more frameworks:
-
-1. [Triton](https://github.com/triton-lang/triton)
-```bash
-cd examples/triton
-pip install torch triton
-weco --source optimize.py \
---eval-command "python evaluate.py --solution-path optimize.py" \
---metric speedup \
---maximize true \
---steps 30 \
---model gemini-2.5-pro-exp-03-25 \
---additional-instructions "Use triton to optimize the code while ensuring a small max float diff. Maintain the same code format."
-```
-
-2. [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)
-```bash
-cd examples/cuda
-pip install torch
-weco --source optimize.py \
---eval-command "python evaluate.py --solution-path optimize.py" \
---metric speedup \
---maximize true \
---steps 30 \
---model gemini-2.5-pro-exp-03-25 \
---additional-instructions guide.md
-```
-
-**Example 4: Optimizing a Classification Model**
-
-This example demonstrates optimizing a script for a Kaggle competition ([Spaceship Titanic](https://www.kaggle.com/competitions/spaceship-titanic/overview)) to improve classification accuracy. The additional instructions are provided via a separate file (`examples/spaceship-titanic/README.md`).
-
-First, install the requirements for the example environment:
-```bash
-pip install -r examples/spaceship-titanic/requirements-test.txt
-```
-And run utility function once to prepare the dataset
-```bash
-python examples/spaceship-titanic/utils.py
-```
-
-You should see the following structure at `examples/spaceship-titanic`. You need to prepare the kaggle credentials for downloading the dataset.
-```
-.
-├── baseline.py
-├── evaluate.py
-├── optimize.py
-├── private
-│   └── test.csv
-├── public
-│   ├── sample_submission.csv
-│   ├── test.csv
-│   └── train.csv
-├── README.md
-├── requirements-test.txt
-└── utils.py
-```
-
-Then, execute the optimization command:
-```bash
-weco --source examples/spaceship-titanic/optimize.py \
---eval-command "python examples/spaceship-titanic/optimize.py && python examples/spaceship-titanic/evaluate.py" \
---metric accuracy \
---maximize true \
---steps 10 \
---model gemini-2.5-pro-exp-03-25 \
---additional-instructions examples/spaceship-titanic/README.md
-```
-
-*The [baseline.py](examples/spaceship-titanic/baseline.py) is provided as a start point for optimization*
+**Note:** If you have an NVIDIA GPU, change the device in the `--eval-command` to `cuda`. If you are running this on Apple Silicon, set it to `mps`.
 
 ---
 

@@ -215,9 +108,10 @@ weco --source examples/spaceship-titanic/optimize.py \
 | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
 | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
 | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
-| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.
+| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`. | Yes |
 | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
 | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+| `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |
 
 ---
 
weco-0.2.9/README.md ADDED

@@ -0,0 +1,152 @@

# Weco: The Evaluation-Driven AI Code Optimizer

[](https://www.python.org)
[](https://badge.fury.io/py/weco)
[](https://arxiv.org/abs/2502.13138)

Weco systematically optimizes your code, guided directly by your evaluation metrics.

Example applications include:

- **GPU Kernel Optimization**: Reimplement PyTorch functions using CUDA, Triton or Metal, optimizing for `latency`, `throughput`, or `memory_bandwidth`.
- **Model Development**: Tune feature transformations or architectures, optimizing for `validation_accuracy`, `AUC`, or `Sharpe Ratio`.
- **Prompt Engineering**: Refine prompts for LLMs, optimizing for `win_rate`, `relevance`, or `format_adherence`

https://github.com/user-attachments/assets/cb724ef1-bff6-4757-b457-d3b2201ede81

---

## Overview

The `weco` CLI leverages a tree search approach guided by Large Language Models (LLMs) to iteratively explore and refine your code. It automatically applies changes, runs your evaluation script, parses the results, and proposes further improvements based on the specified goal.


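To make that loop concrete, here is a minimal conceptual sketch of an evaluation-driven search of this kind. It is not Weco's implementation; `propose_change` is a hypothetical stand-in for the LLM proposal step:

```python
# Conceptual sketch of an evaluation-driven optimization loop (not Weco's actual code).
import re
import subprocess

def propose_change(code: str) -> str:
    # Hypothetical stand-in for the LLM step that rewrites the source code.
    return code

def parse_metric(output: str, metric: str) -> float:
    # Find e.g. "speedup: 1.5" or "Final speedup value = 1.5" in the eval output.
    match = re.search(rf"{metric}\D*?([-+]?\d+(?:\.\d+)?)", output, re.IGNORECASE)
    if match is None:
        raise ValueError(f"metric '{metric}' not found in evaluation output")
    return float(match.group(1))

def optimize(source: str, eval_command: str, metric: str, steps: int, maximize: bool) -> float:
    best_score, best_code = None, open(source).read()
    for _ in range(steps):
        candidate = propose_change(best_code)           # ask for a new version
        open(source, "w").write(candidate)              # apply it to --source
        result = subprocess.run(eval_command, shell=True, capture_output=True, text=True)
        score = parse_metric(result.stdout + result.stderr, metric)
        if best_score is None or (score > best_score) == maximize:
            best_score, best_code = score, candidate    # keep the best version found
    open(source, "w").write(best_code)                  # leave the best code in place
    return best_score
```
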
---

## Setup

1. **Install the Package:**

   ```bash
   pip install weco
   ```

2. **Configure API Keys:**

   Set the appropriate environment variables for your desired language model provider:

   - **OpenAI:** `export OPENAI_API_KEY="your_key_here"`
   - **Anthropic:** `export ANTHROPIC_API_KEY="your_key_here"`
   - **Google DeepMind:** `export GEMINI_API_KEY="your_key_here"` (Google AI Studio has a free API usage quota. Create a key [here](https://aistudio.google.com/apikey) to use weco for free.)

---

## Usage

<div style="background-color: #fff3cd; border: 1px solid #ffeeba; padding: 15px; border-radius: 4px; margin-bottom: 15px;">
<strong>⚠️ Warning: Code Modification</strong><br>
<code>weco</code> directly modifies the file specified by <code>--source</code> during the optimization process. It is <strong>strongly recommended</strong> to use version control (like Git) to track changes and revert if needed. Alternatively, ensure you have a backup of your original file before running the command. Upon completion, the file will contain the best-performing version of the code found during the run.
</div>

---

### Example: Optimizing Simple PyTorch Operations

This basic example shows how to optimize a simple PyTorch function for speedup.

For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.

```bash
# Navigate to the example directory
cd examples/hello-kernel-world

# Install dependencies
pip install torch

# Run Weco
weco --source optimize.py \
     --eval-command "python evaluate.py --solution-path optimize.py --device cpu" \
     --metric speedup \
     --maximize true \
     --steps 15 \
     --model gemini-2.5-pro-exp-03-25 \
     --additional-instructions "Fuse operations in the forward method while ensuring the max float deviation remains small. Maintain the same format of the code."
```

**Note:** If you have an NVIDIA GPU, change the device in the `--eval-command` to `cuda`. If you are running this on Apple Silicon, set it to `mps`.

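The example's `optimize.py` and `evaluate.py` are not reproduced in this diff. As a rough idea of the kind of source file such a run operates on, here is a hypothetical sketch (names and contents are assumptions, not the packaged files):

```python
# Hypothetical sketch of a --source file for this kind of run (not the packaged optimize.py).
# Weco would rewrite forward(), e.g. fusing the element-wise ops, while evaluate.py
# benchmarks the result against the original and prints "speedup: <value>".
import torch
import torch.nn as nn

class Model(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Several separate element-wise ops: a natural target for fusion.
        y = x * 0.5
        y = y + 1.0
        return torch.relu(y)

if __name__ == "__main__":
    print(Model()(torch.randn(4, 16)).shape)
```
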
---

### Command Line Arguments

| Argument | Description | Required |
| :-------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- |
| `--source` | Path to the source code file that will be optimized (e.g., `optimize.py`). | Yes |
| `--eval-command` | Command to run for evaluating the code in `--source`. This command should print the target `--metric` and its value to the terminal (stdout/stderr). See note below. | Yes |
| `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
| `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
| `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`. | Yes |
| `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
| `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
| `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |

---

### Performance & Expectations

Weco, powered by the AIDE algorithm, optimizes code iteratively based on your evaluation results. Achieving significant improvements, especially on complex research-level tasks, often requires substantial exploration time.

The following plot from the independent [Research Engineering Benchmark (RE-Bench)](https://metr.org/AI_R_D_Evaluation_Report.pdf) report shows the performance of AIDE (the algorithm behind Weco) on challenging ML research engineering tasks over different time budgets.

<p align="center">
  <img src="https://github.com/user-attachments/assets/ff0e471d-2f50-4e2d-b718-874862f533df" alt="RE-Bench Performance Across Time" width="60%"/>
</p>

As shown, AIDE demonstrates strong performance gains over time, surpassing lower human expert percentiles within hours and continuing to improve. This highlights the potential of evaluation-driven optimization but also indicates that reaching high levels of performance comparable to human experts on difficult benchmarks can take considerable time (tens of hours in this specific benchmark, corresponding to many `--steps` in the Weco CLI). Factor this into your planning when setting the number of `--steps` for your optimization runs.

---

### Important Note on Evaluation

The command specified by `--eval-command` is crucial. It's responsible for executing the potentially modified code from `--source` and assessing its performance. **This command MUST print the metric you specified with `--metric` along with its numerical value to the terminal (standard output or standard error).** Weco reads this output to understand how well each code version performs and guide the optimization process.

For example, if you set `--metric speedup`, your evaluation script (`eval.py` in the examples) should output a line like:

```
speedup: 1.5
```

or

```
Final speedup value = 1.5
```

Weco will parse this output to extract the numerical value (1.5 in this case) associated with the metric name ('speedup').

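A minimal script satisfying this contract might look like the following sketch (illustrative only; the packaged `evaluate.py` files are more involved, and `candidate` here simply reuses the baseline):

```python
# Minimal sketch of an evaluation script that prints the metric Weco parses.
import time
import torch

def baseline(x):
    return torch.relu(x * 0.5 + 1.0)

def candidate(x):
    # In a real run this would import and call the code from the --source file.
    return baseline(x)

def bench(fn, x, iters=100):
    for _ in range(5):          # warm-up
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

x = torch.randn(1024, 1024)
assert torch.allclose(baseline(x), candidate(x), atol=1e-5)  # correctness guard

# The only line Weco strictly needs: "<metric>: <value>".
print(f"speedup: {bench(baseline, x) / bench(candidate, x):.3f}")
```
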
## Contributing

We welcome contributions! To get started:

1. **Fork and Clone the Repository:**
   ```bash
   git clone https://github.com/WecoAI/weco-cli.git
   cd weco-cli
   ```

2. **Install Development Dependencies:**
   ```bash
   pip install -e ".[dev]"
   ```

3. **Create a Feature Branch:**
   ```bash
   git checkout -b feature/your-feature-name
   ```

4. **Make Your Changes:** Ensure your code adheres to our style guidelines and includes relevant tests.

5. **Commit and Push** your changes, then open a pull request with a clear description of your enhancements.

---
weco-0.2.9/examples/cuda/README.md ADDED

@@ -0,0 +1,40 @@

# Example: Optimizing PyTorch Self-Attention with CUDA

This example showcases using Weco to optimize a PyTorch causal multi-head self-attention implementation by generating custom [CUDA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) kernels. This approach aims for low-level optimization beyond standard PyTorch or even Triton for potentially higher performance on NVIDIA GPUs.

This example uses a separate Markdown file (`guide.md`) to provide detailed instructions and context to the LLM.

## Setup

1. Ensure you are in the `examples/cuda` directory.
2. Install the required dependency:
   ```bash
   pip install torch
   ```
   *(Note: This example requires a compatible NVIDIA GPU and the CUDA Toolkit installed on your system for compiling and running the generated CUDA code.)*

## Optimization Command

Run the following command to start the optimization process:

```bash
weco --source optimize.py \
     --eval-command "python evaluate.py --solution-path optimize.py" \
     --metric speedup \
     --maximize true \
     --steps 30 \
     --model gemini-2.5-pro-exp-03-25 \
     --additional-instructions guide.md
```

### Explanation

* `--source optimize.py`: The initial PyTorch self-attention code to be optimized with CUDA.
* `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script, which compiles (if necessary) and benchmarks the CUDA-enhanced code in `optimize.py` against a baseline, printing the `speedup`.
* `--metric speedup`: The optimization target metric.
* `--maximize true`: Weco aims to increase the speedup.
* `--steps 30`: The number of optimization iterations.
* `--model gemini-2.5-pro-exp-03-25`: The LLM used for code generation.
* `--additional-instructions guide.md`: Points Weco to a file containing detailed instructions for the LLM on how to write the CUDA kernels, handle compilation (e.g., using `torch.utils.cpp_extension`), manage data types, and ensure correctness.

Weco will iteratively modify `optimize.py`, potentially generating and integrating CUDA C++ code, guided by the evaluation results and the instructions in `guide.md`.
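For reference, the compilation mechanism mentioned above (`torch.utils.cpp_extension`) can be used roughly as in the sketch below. This is a hypothetical, trivial kernel for illustration, not the self-attention code this example ships or generates; it requires an NVIDIA GPU and the CUDA Toolkit:

```python
# Hypothetical sketch: compile and call a tiny CUDA kernel with torch.utils.cpp_extension.
import torch
from torch.utils.cpp_extension import load_inline

cuda_source = r"""
__global__ void scale_kernel(const float* x, float* y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i];
}

torch::Tensor scale(torch::Tensor x, float a) {
    auto y = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(), a, n);
    return y;
}
"""

cpp_source = "torch::Tensor scale(torch::Tensor x, float a);"

# Builds the extension at import time and exposes the listed functions.
ext = load_inline(name="scale_ext", cpp_sources=cpp_source,
                  cuda_sources=cuda_source, functions=["scale"], verbose=False)

x = torch.randn(1024, device="cuda", dtype=torch.float32)
print(torch.allclose(ext.scale(x, 2.0), 2.0 * x))  # correctness check against PyTorch
```
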
weco-0.2.9/examples/metal/README.md ADDED

@@ -0,0 +1,39 @@

# Example: Optimizing MLX Convolution with Metal

This example demonstrates how to use Weco to optimize a 2D convolution operation implemented in [`mlx`](https://github.com/ml-explore/mlx), targeting Apple's [Metal](https://developer.apple.com/documentation/metal/) framework for execution on Apple Silicon GPUs.

It showcases using a separate file (`examples.rst`) to provide detailed context and instructions to the optimizing LLM.

## Setup

1. Ensure you are in the `examples/metal` directory.
2. Install the required dependency:
   ```bash
   pip install mlx
   ```

## Optimization Command

Run the following command to start the optimization process:

```bash
weco --source optimize.py \
     --eval-command "python evaluate.py --solution-path optimize.py" \
     --metric speedup \
     --maximize true \
     --steps 30 \
     --model gemini-2.5-pro-exp-03-25 \
     --additional-instructions examples.rst
```

### Explanation

* `--source optimize.py`: Specifies the Python file containing the MLX convolution code to be optimized.
* `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script. `evaluate.py` executes the code in `optimize.py`, measures its performance against a baseline, and prints the `speedup` metric.
* `--metric speedup`: Tells Weco to target the 'speedup' value printed by the evaluation command.
* `--maximize true`: Instructs Weco to aim for a higher speedup value.
* `--steps 30`: Defines the number of iterative optimization steps Weco will perform.
* `--model gemini-2.5-pro-exp-03-25`: Selects the LLM used for proposing code modifications.
* `--additional-instructions examples.rst`: Provides a path to a file containing detailed guidance for the LLM during optimization (e.g., constraints, preferred Metal techniques).

Weco will iteratively modify `optimize.py`, run `evaluate.py`, parse the `speedup`, and generate new code versions based on the results and the instructions in `examples.rst`.
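For context, the operation being tuned here is a 2D convolution expressed with MLX. A minimal, hypothetical baseline is sketched below (not the shipped `optimize.py`); it assumes `mx.conv2d` with channels-last inputs:

```python
# Hypothetical MLX baseline for the kind of 2D convolution this example optimizes.
import mlx.core as mx

def conv2d(x: mx.array, w: mx.array) -> mx.array:
    # Assumed layouts: x is (N, H, W, C_in), w is (C_out, kH, kW, C_in).
    return mx.conv2d(x, w, stride=1, padding=1)

x = mx.random.normal((8, 32, 32, 3))
w = mx.random.normal((16, 3, 3, 3))
y = conv2d(x, w)
mx.eval(y)       # force lazy evaluation so the kernel actually runs
print(y.shape)   # expected (8, 32, 32, 16) with 3x3 kernel, stride 1, padding 1
```
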
weco-0.2.9/examples/prompt/README.md ADDED

@@ -0,0 +1,100 @@

# weco-cli/examples/prompt/README.md
# AIME Prompt Engineering Example with Weco

This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and aims to improve the accuracy metric.

This example uses `gpt-4o-mini` via the OpenAI API by default. Ensure your `OPENAI_API_KEY` environment variable is set.

## Files in this folder

| File | Purpose |
| :------------ | :------- |
| `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the mutable `EXTRA_INSTRUCTIONS` string. Weco edits **only** this file during the search. |
| `eval.py` | Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel, parses the LLM output (looking for `\\boxed{}`), compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |

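As a rough illustration of the structure described in the table above, a file like `optimize.py` might look as follows (a hypothetical sketch using the OpenAI client; the packaged file may differ):

```python
# Hypothetical sketch of optimize.py: a fixed template, a mutable EXTRA_INSTRUCTIONS
# string that Weco edits between steps, and a solve() helper that calls the LLM.
from openai import OpenAI

# Weco mutates this string during the search.
EXTRA_INSTRUCTIONS = "Think step by step and double-check your arithmetic."

PROMPT_TEMPLATE = (
    "You are an expert competition mathematician.\n"
    "{extra}\n\n"
    "Problem: {problem}\n"
    "Reason step by step and give the final answer as \\boxed{{...}}."
)

def solve(problem: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(extra=EXTRA_INSTRUCTIONS, problem=problem),
        }],
    )
    return response.choices[0].message.content
```
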
## Quick start

1. **Clone the repository and enter the folder.**
   ```bash
   # If you cloned the main weco-cli repo already:
   cd examples/prompt

   # Otherwise:
   # git clone https://github.com/WecoAI/weco-cli.git
   # cd weco-cli/examples/prompt
   ```
2. **Install dependencies.**
   ```bash
   # Ensure you have weco installed: pip install weco
   pip install openai datasets # Add any other dependencies if needed
   ```
3. **Set your OpenAI API Key.**
   ```bash
   export OPENAI_API_KEY="your_openai_api_key_here"
   ```
4. **Run Weco.** The command below iteratively modifies `EXTRA_INSTRUCTIONS` in `optimize.py`, runs `eval.py` to evaluate the prompt's effectiveness, reads the printed accuracy, and keeps the best prompt variations found.
   ```bash
   weco --source optimize.py \
        --eval-command "python eval.py" \
        --metric accuracy \
        --maximize true \
        --steps 40 \
        --model gemini-2.5-pro-exp-03-25
   ```
   *Note: You can replace `--model gemini-2.5-pro-exp-03-25` with another powerful model like `o3` if you have the respective API keys set.*

During each evaluation round, you will see log lines similar to the following:

```text
[setup] loading 20 problems from AIME 2024 …
[progress] 5/20 completed, accuracy: 0.0000, elapsed 7.3 s
[progress] 10/20 completed, accuracy: 0.1000, elapsed 14.6 s
[progress] 15/20 completed, accuracy: 0.0667, elapsed 21.8 s
[progress] 20/20 completed, accuracy: 0.0500, elapsed 28.9 s
accuracy: 0.0500
```

# AIME 2024 Prompt‑Engineering Example

This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and finishes in a few hours on a laptop.

## Files in this folder

| File | Purpose |
| :------------ | :------- |
| `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the function to call the LLM. Weco edits **only** this file during the search to refine the prompt template. |
| `eval.py` | Defines the LLM model to use (`MODEL_TO_USE`). Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel (passing the chosen model), parses the LLM output, compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |

## Quick start

1. **Clone the repository and enter the folder.**
   ```bash
   git clone https://github.com/your‑fork/weco‑examples.git
   cd weco‑examples/aime‑2024
   ```
2. **Run Weco.** The command below edits `EXTRA_INSTRUCTIONS` in `optimize.py`, invokes `eval.py` on every iteration, reads the printed accuracy, and keeps the best variants.
   ```bash
   weco --source optimize.py \
        --eval-command "python eval.py" \
        --metric accuracy \
        --maximize true \
        --steps 40 \
        --model gemini-2.5-flash-preview-04-17 \
        --additional-instructions prompt_guide.md
   ```

During each evaluation round you will see log lines similar to the following.

```text
[setup] loading 20 problems from AIME 2024 …
[progress] 5/20 completed, elapsed 7.3 s
[progress] 10/20 completed, elapsed 14.6 s
[progress] 15/20 completed, elapsed 21.8 s
[progress] 20/20 completed, elapsed 28.9 s
accuracy: 0.0500
```

Weco then mutates the config, tries again, and gradually pushes the accuracy higher. On a modern laptop you can usually double the baseline score within thirty to forty iterations.

## How it works

* `eval_aime.py` slices the **Maxwell‑Jia/AIME_2024** dataset to twenty problems for fast feedback. You can change the slice in one line.
* The script sends model calls in parallel via `ThreadPoolExecutor`, so network latency is hidden.
* Every five completed items, the script logs progress and elapsed time.
* The final line `accuracy: value` is the only part Weco needs for guidance.