weco 0.2.8__tar.gz → 0.2.9__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {weco-0.2.8 → weco-0.2.9}/.github/workflows/release.yml +2 -2
- {weco-0.2.8 → weco-0.2.9}/PKG-INFO +4 -3
- {weco-0.2.8 → weco-0.2.9}/README.md +3 -2
- weco-0.2.9/examples/metal/README.md +39 -0
- weco-0.2.9/examples/prompt/README.md +100 -0
- weco-0.2.9/examples/prompt/eval.py +135 -0
- weco-0.2.9/examples/prompt/optimize.py +34 -0
- weco-0.2.9/examples/prompt/prompt_guide.md +45 -0
- weco-0.2.9/examples/triton/README.md +38 -0
- {weco-0.2.8 → weco-0.2.9}/pyproject.toml +1 -1
- {weco-0.2.8 → weco-0.2.9}/weco/__init__.py +1 -1
- {weco-0.2.8 → weco-0.2.9}/weco/cli.py +27 -10
- {weco-0.2.8 → weco-0.2.9}/weco.egg-info/PKG-INFO +4 -3
- {weco-0.2.8 → weco-0.2.9}/weco.egg-info/SOURCES.txt +4 -0
- weco-0.2.8/examples/metal/README.md +0 -0
- weco-0.2.8/examples/triton/README.md +0 -0
- {weco-0.2.8 → weco-0.2.9}/.github/workflows/lint.yml +0 -0
- {weco-0.2.8 → weco-0.2.9}/.gitignore +0 -0
- {weco-0.2.8 → weco-0.2.9}/LICENSE +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/cuda/README.md +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/cuda/evaluate.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/cuda/guide.md +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/cuda/optimize.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/hello-kernel-world/evaluate.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/hello-kernel-world/optimize.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/metal/evaluate.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/metal/examples.rst +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/metal/optimize.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/spaceship-titanic/README.md +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/spaceship-titanic/baseline.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/spaceship-titanic/evaluate.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/spaceship-titanic/optimize.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/spaceship-titanic/requirements-test.txt +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/spaceship-titanic/utils.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/triton/evaluate.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/examples/triton/optimize.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/setup.cfg +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco/api.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco/panels.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco/utils.py +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco.egg-info/dependency_links.txt +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco.egg-info/entry_points.txt +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco.egg-info/requires.txt +0 -0
- {weco-0.2.8 → weco-0.2.9}/weco.egg-info/top_level.txt +0 -0
{weco-0.2.8 → weco-0.2.9}/.github/workflows/release.yml

@@ -90,7 +90,7 @@ jobs:
 GITHUB_TOKEN: ${{ github.token }}
 run: >-
 gh release create
-'v0.2.8'
+'v0.2.9'
 --repo '${{ github.repository }}'
 --notes ""
 

@@ -102,5 +102,5 @@ jobs:
 # sigstore-produced signatures and certificates.
 run: >-
 gh release upload
-'v0.2.8' dist/**
+'v0.2.9' dist/**
 --repo '${{ github.repository }}'
{weco-0.2.8 → weco-0.2.9}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: weco
-Version: 0.2.8
+Version: 0.2.9
 Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
 Author-email: Weco AI Team <contact@weco.ai>
 License: MIT

@@ -76,7 +76,7 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models
 
 This basic example shows how to optimize a simple PyTorch function for speedup.
 
-For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)
+For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
 
 ```bash
 # Navigate to the example directory

@@ -108,9 +108,10 @@ weco --source optimize.py \
 | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
 | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
 | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
-| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.
+| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`.| Yes |
 | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
 | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+| `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |
 
 ---
 
{weco-0.2.8 → weco-0.2.9}/README.md

@@ -54,7 +54,7 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models
 
 This basic example shows how to optimize a simple PyTorch function for speedup.
 
-For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)
+For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
 
 ```bash
 # Navigate to the example directory

@@ -86,9 +86,10 @@ weco --source optimize.py \
 | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
 | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
 | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
-| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.
+| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`.| Yes |
 | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
 | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+| `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |
 
 ---
 
weco-0.2.9/examples/metal/README.md (new file)

@@ -0,0 +1,39 @@
+# Example: Optimizing MLX Convolution with Metal
+
+This example demonstrates how to use Weco to optimize a 2D convolution operation implemented in [`mlx`](https://github.com/ml-explore/mlx), targeting Apple's [Metal](https://developer.apple.com/documentation/metal/) framework for execution on Apple Silicon GPUs.
+
+It showcases using a separate file (`examples.rst`) to provide detailed context and instructions to the optimizing LLM.
+
+## Setup
+
+1. Ensure you are in the `examples/metal` directory.
+2. Install the required dependency:
+   ```bash
+   pip install mlx
+   ```
+
+## Optimization Command
+
+Run the following command to start the optimization process:
+
+```bash
+weco --source optimize.py \
+--eval-command "python evaluate.py --solution-path optimize.py" \
+--metric speedup \
+--maximize true \
+--steps 30 \
+--model gemini-2.5-pro-exp-03-25 \
+--additional-instructions examples.rst
+```
+
+### Explanation
+
+* `--source optimize.py`: Specifies the Python file containing the MLX convolution code to be optimized.
+* `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script. `evaluate.py` executes the code in `optimize.py`, measures its performance against a baseline, and prints the `speedup` metric.
+* `--metric speedup`: Tells Weco to target the 'speedup' value printed by the evaluation command.
+* `--maximize true`: Instructs Weco to aim for a higher speedup value.
+* `--steps 30`: Defines the number of iterative optimization steps Weco will perform.
+* `--model gemini-2.5-pro-exp-03-25`: Selects the LLM used for proposing code modifications.
+* `--additional-instructions examples.rst`: Provides a path to a file containing detailed guidance for the LLM during optimization (e.g., constraints, preferred Metal techniques).
+
+Weco will iteratively modify `optimize.py`, run `evaluate.py`, parse the `speedup`, and generate new code versions based on the results and the instructions in `examples.rst`.
weco-0.2.9/examples/prompt/README.md (new file)

@@ -0,0 +1,100 @@
+# weco-cli/examples/prompt/README.md
+# AIME Prompt Engineering Example with Weco
+
+This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and aims to improve the accuracy metric.
+
+This example uses `gpt-4o-mini` via the OpenAI API by default. Ensure your `OPENAI_API_KEY` environment variable is set.
+
+## Files in this folder
+
+| File | Purpose |
+| :------------ | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the mutable `EXTRA_INSTRUCTIONS` string. Weco edits **only** this file during the search. |
+| `eval.py` | Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel, parses the LLM output (looking for `\\boxed{}`), compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |
+
+## Quick start
+
+1. **Clone the repository and enter the folder.**
+   ```bash
+   # If you cloned the main weco-cli repo already:
+   cd examples/prompt
+
+   # Otherwise:
+   # git clone https://github.com/WecoAI/weco-cli.git
+   # cd weco-cli/examples/prompt
+   ```
+2. **Install dependencies.**
+   ```bash
+   # Ensure you have weco installed: pip install weco
+   pip install openai datasets # Add any other dependencies if needed
+   ```
+3. **Set your OpenAI API Key.**
+   ```bash
+   export OPENAI_API_KEY="your_openai_api_key_here"
+   ```
+4. **Run Weco.** The command below iteratively modifies `EXTRA_INSTRUCTIONS` in `optimize.py`, runs `eval.py` to evaluate the prompt's effectiveness, reads the printed accuracy, and keeps the best prompt variations found.
+   ```bash
+   weco --source optimize.py \
+   --eval-command "python eval.py" \
+   --metric accuracy \
+   --maximize true \
+   --steps 40 \
+   --model gemini-2.5-pro-exp-03-25
+   ```
+   *Note: You can replace `--model gemini-2.5-pro-exp-03-25` with another powerful model like `o3` if you have the respective API keys set.*
+
+During each evaluation round, you will see log lines similar to the following:
+
+```text
+[setup] loading 20 problems from AIME 2024 …
+[progress] 5/20 completed, accuracy: 0.0000, elapsed 7.3 s
+[progress] 10/20 completed, accuracy: 0.1000, elapsed 14.6 s
+[progress] 15/20 completed, accuracy: 0.0667, elapsed 21.8 s
+[progress] 20/20 completed, accuracy: 0.0500, elapsed 28.9 s
+accuracy: 0.0500# AIME 2024 Prompt‑Engineering Example
+This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and finishes in a few hours on a laptop.
+
+## Files in this folder
+
+| File | Purpose |
+| :------------ | :---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the function to call the LLM. Weco edits **only** this file during the search to refine the prompt template. |
+| `eval.py` | Defines the LLM model to use (`MODEL_TO_USE`). Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel (passing the chosen model), parses the LLM output, compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |
+
+## Quick start
+
+1. **Clone the repository and enter the folder.**
+   ```bash
+   git clone https://github.com/your‑fork/weco‑examples.git
+   cd weco‑examples/aime‑2024
+   ```
+2. **Run Weco.** The command below edits `EXTRA_INSTRUCTIONS` in `optimize.py`, invokes `eval.py` on every iteration, reads the printed accuracy, and keeps the best variants.
+   ```bash
+   weco --source optimize.py \
+   --eval-command "python eval.py" \
+   --metric accuracy \
+   --maximize true \
+   --steps 40 \
+   --model gemini-2.5-flash-preview-04-17 \
+   --addtional-instructions prompt_guide.md
+   ```
+
+During each evaluation round you will see log lines similar to the following.
+
+```text
+[setup] loading 20 problems from AIME 2024 …
+[progress] 5/20 completed, elapsed 7.3 s
+[progress] 10/20 completed, elapsed 14.6 s
+[progress] 15/20 completed, elapsed 21.8 s
+[progress] 20/20 completed, elapsed 28.9 s
+accuracy: 0.0500
+```
+
+Weco then mutates the config, tries again, and gradually pushes the accuracy higher. On a modern laptop you can usually double the baseline score within thirty to forty iterations.
+
+## How it works
+
+* `eval_aime.py` slices the **Maxwell‑Jia/AIME_2024** dataset to twenty problems for fast feedback. You can change the slice in one line.
+* The script sends model calls in parallel via `ThreadPoolExecutor`, so network latency is hidden.
+* Every five completed items, the script logs progress and elapsed time.
+* The final line `accuracy: value` is the only part Weco needs for guidance.
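For a quick sanity check of this example outside of Weco, the evaluation script can be run on its own. The sketch below is illustrative only (it is not part of the diff) and assumes the dependencies and `OPENAI_API_KEY` from the Quick start above are in place.

```bash
# Run the evaluation once to confirm the metric line Weco parses is produced.
cd examples/prompt
python eval.py   # prints progress lines and ends with a line such as: accuracy: 0.0500
```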
weco-0.2.9/examples/prompt/eval.py (new file)

@@ -0,0 +1,135 @@
+# weco-cli/examples/prompt/eval.py
+"""
+eval.py (parallel with progress logs)
+
+Downloads a slice of AIME 2024, calls optimize.solve in parallel,
+prints progress every N samples, and finally prints accuracy
+in the format that Weco expects.
+The LLM model to use is defined in this file.
+"""
+
+import re
+import time
+from concurrent.futures import ThreadPoolExecutor, as_completed
+import sys
+import concurrent.futures
+
+from datasets import load_dataset
+import optimize # the file Weco mutates
+
+# ---------------------------------------------------------------------
+# Configuration
+TOTAL_SAMPLES = 30 # how many problems to load
+NUM_WORKERS = 30 # concurrent LLM calls
+LOG_EVERY = 5 # print progress after this many
+MODEL_TO_USE = "gpt-4.1" # Define the model to use HERE
+TASK_TIMEOUT = 300 # seconds per LLM call
+# ---------------------------------------------------------------------
+
+print(f"[setup] loading {TOTAL_SAMPLES} problems from AIME 2024 …", flush=True)
+DATA = load_dataset("Maxwell-Jia/AIME_2024", split=f"train[:{TOTAL_SAMPLES}]", cache_dir=".cache")
+
+
+def extract_final_answer(text: str) -> str:
+    """
+    Extracts the final AIME answer (000-999) from the LLM response.
+    Prioritizes answers within \boxed{}, then looks for patterns,
+    and falls back to finding the last 3-digit number.
+    """
+    # 1. Check for \boxed{...}
+    boxed_match = re.search(r"\\boxed\{(\d{1,3})\}", text)
+    if boxed_match:
+        return boxed_match.group(1).zfill(3) # Pad with leading zeros if needed
+
+    # 2. Check for "final answer is ..." patterns (case-insensitive)
+    # Make sure pattern captures potential variations like "is: 123", "is 123."
+    answer_pattern = r"(?:final|answer is|result is)[:\s]*(\d{1,3})\b"
+    answer_match = re.search(answer_pattern, text, re.IGNORECASE)
+    if answer_match:
+        return answer_match.group(1).zfill(3)
+
+    # 3. Fallback: Find the last occurrence of a 1-3 digit number in the text
+    # This is less reliable but can be a fallback.
+    # Let's refine the fallback regex to be slightly more specific
+    # Look for isolated 1-3 digit numbers, possibly at the end or after keywords.
+    fallback_matches = re.findall(r"\b(\d{1,3})\b", text)
+    if fallback_matches:
+        # Return the last found number, assuming it's the most likely answer candidate
+        return fallback_matches[-1].zfill(3)
+
+    return "" # Return empty if no answer found
+
+
+def grade_answer(llm_output: str, ground_truth_answer: str) -> bool:
+    """Compares the extracted LLM answer to the ground truth."""
+    extracted_guess = extract_final_answer(llm_output)
+    # Ground truth answers in AIME are typically strings "000" to "999"
+    # Ensure comparison is consistent (e.g., both as strings, potentially padded)
+    # The ground truth from the dataset seems to be string integers already.
+    # Let's ensure the extracted guess is also treated as a simple integer string for comparison.
+    # The ground truth might not be zero-padded in the dataset, so compare integers.
+    try:
+        # Check if both can be converted to integers for comparison
+        return int(extracted_guess) == int(ground_truth_answer)
+    except ValueError:
+        # If conversion fails (e.g., empty string), they don't match
+        return False
+
+
+def run_evaluation() -> float:
+    """Runs the evaluation on the dataset and returns the accuracy."""
+    correct = 0
+    start = time.time()
+    results = [] # Store results for potential later analysis if needed
+
+    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
+        # Submit all tasks, passing the MODEL_TO_USE
+        futures = {
+            pool.submit(optimize.solve, row["Problem"], MODEL_TO_USE): row["Answer"] for row in DATA
+        } # Pass MODEL_TO_USE here
+
+        try:
+            # Process completed tasks
+            for idx, future in enumerate(as_completed(futures), 1):
+                problem_answer = futures[future] # Get the corresponding ground truth answer
+                try:
+                    # Wait up to TASK_TIMEOUT seconds for each LLM call
+                    llm_raw_output = future.result(timeout=TASK_TIMEOUT)
+                    is_correct = grade_answer(llm_raw_output, str(problem_answer))
+                    if is_correct:
+                        correct += 1
+                    results.append({"raw_output": llm_raw_output, "correct_answer": problem_answer, "is_correct": is_correct})
+
+                except Exception as exc:
+                    print(f"[error] Generated an exception: {exc}")
+                    results.append({"raw_output": f"Error: {exc}", "correct_answer": problem_answer, "is_correct": False})
+
+                if idx % LOG_EVERY == 0 or idx == TOTAL_SAMPLES:
+                    elapsed = time.time() - start
+                    current_accuracy = correct / idx if idx > 0 else 0
+                    print(
+                        f"[progress] {idx}/{TOTAL_SAMPLES} completed, accuracy: {current_accuracy:.4f}, elapsed {elapsed:.1f} s",
+                        flush=True,
+                    )
+        except concurrent.futures.TimeoutError:
+            # Abort any stuck LLM calls
+            print(f"[error] LLM call timed out after {TASK_TIMEOUT}s", flush=True)
+            # Cancel all pending futures and exit
+            for f in futures:
+                f.cancel()
+            print("Exiting due to timeout", file=sys.stderr)
+            sys.exit(1)
+        except KeyboardInterrupt:
+            print("\nEvaluation interrupted by user", file=sys.stderr)
+            sys.exit(1)
+
+    # Final accuracy calculation
+    total_evaluated = len(results)
+    final_accuracy = correct / total_evaluated if total_evaluated > 0 else 0
+    return final_accuracy
+
+
+if __name__ == "__main__":
+    acc = run_evaluation()
+    # Weco parses this exact line format
+    print(f"accuracy: {acc:.4f}")
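A minimal standalone sketch of the `\boxed{}` extraction used by `extract_final_answer` above; the sample string is invented, while the regex and zero-padding mirror the code in the diff.

```python
import re

# Invented sample response; the pattern below is the one eval.py checks first.
sample = "Careful casework gives 42, so the final answer is \\boxed{42}."
boxed = re.search(r"\\boxed\{(\d{1,3})\}", sample)
print(boxed.group(1).zfill(3) if boxed else "")  # -> 042
```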
weco-0.2.9/examples/prompt/optimize.py (new file)

@@ -0,0 +1,34 @@
+# weco-cli/examples/prompt/optimize.py
+"""
+optimize.py
+This module holds the prompt template and the LLM call.
+Weco modifies this file to optimize the prompt instructions.
+The model used for the LLM call is passed in from eval.py.
+"""
+
+from openai import OpenAI
+
+client = OpenAI() # API key must be in OPENAI_API_KEY
+# MODEL constant removed from here
+
+PROMPT_TEMPLATE = """You are an expert competition mathematician tasked with solving an AIME problem.
+The final answer must be a three-digit integer between 000 and 999, inclusive.
+Please reason step-by-step towards the solution. Keep your reasoning concise.
+Conclude your response with the final answer enclosed in \\boxed{{}}. For example: The final answer is \\boxed{{042}}.
+
+Problem:
+{problem}
+
+Solution:
+"""
+
+
+def solve(problem: str, model_name: str) -> str:
+    """Return the model's raw text answer for one problem using the specified model."""
+    prompt = PROMPT_TEMPLATE.format(problem=problem)
+
+    response = client.chat.completions.create(
+        model=model_name, # Use the passed-in model name
+        messages=[{"role": "user", "content": prompt}],
+    )
+    return response.choices[0].message.content.strip()
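A hypothetical call into the new module, mirroring how `eval.py` invokes it; the problem text is made up, and `OPENAI_API_KEY` must be set for the request to succeed.

```python
import optimize  # the module added in this release

# Hypothetical problem text; eval.py passes MODEL_TO_USE ("gpt-4.1") as the second argument.
raw = optimize.solve("Find the remainder when 7**100 is divided by 1000.", "gpt-4.1")
print(raw)  # expected to end with a final answer wrapped in \boxed{...}
```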
weco-0.2.9/examples/prompt/prompt_guide.md (new file)

@@ -0,0 +1,45 @@
+# Weco Prompt Optimization Guidelines for AIME (Targeting GPT-4.1)
+
+## 1. Goal
+
+Your objective is to modify the the `optimize.py` file to improve the `accuracy` metric when solving AIME math problems. The modifications should leverage the capabilities of the target model, **GPT-4.1**.
+
+## 2. Files and Workflow
+
+* **Target File for Modification:** `optimize.py`. * **Evaluation Script:** `eval.py`. This script:
+    * Defines the actual LLM used for solving (`MODEL_TO_USE`, which is set to `gpt-4.1` in this context).
+    * Calls `optimize.solve(problem, model_name="gpt-4.1")`.
+    * Parses the output from `optimize.solve`. **Crucially, it expects the final 3-digit answer (000-999) to be enclosed in `\boxed{XXX}`.** For example: `\boxed{042}`. Your prompt modifications *must* ensure the model consistently produces this format for the final answer.
+    * Compares the extracted answer to the ground truth and prints the `accuracy:` metric, which Weco uses for guidance.
+
+## 3. Target Model: GPT-4.1
+
+You are optimizing the prompt for `gpt-4.1`. Based on its characteristics, consider the following:
+
+* **Strengths:**
+    * **Significantly Improved Instruction Following:** GPT-4.1 is better at adhering to complex instructions, formats, and constraints compared to previous models. This is key for AIME where precision is vital. It excels on hard instruction-following tasks.
+    * **Stronger Coding & Reasoning:** Its improved coding performance (e.g., SWE-bench) suggests enhanced logical reasoning capabilities applicable to mathematical problem-solving.
+    * **Refreshed Knowledge:** Knowledge cutoff is June 2024.
+* **Considerations:**
+    * **Literal Interpretation:** GPT-4.1 can be more literal. Prompts should be explicit and specific about the desired reasoning process and output format. Avoid ambiguity.
+
+## 4. Optimization Strategies (Focus on `PROMPT_TEMPLATE` in `optimize.py`)
+
+The primary goal is to enhance the model's reasoning process for these challenging math problems. Focus on Chain-of-Thought (CoT) designs within the `PROMPT_TEMPLATE`.
+
+**Ideas to Explore:**
+You don't have to implement all of them, but the following ideas might be helpful:
+* **Workflow Patterns** try to use some of the following patterns:
+    * **Linear**: Linear workflow, standarded CoT E.g. considering the following thinking steps (you don't have to include all of them), "1. Understand the problem constraints. 2. Identify relevant theorems/formulas. 3. Formulate a plan. 4. Execute calculations step-by-step. 5. Verify intermediate results. 6. State the final answer in the required format."
+    * **List Candidates**: You can ask the model to propose a few solutions in a particular step and pick the best solution. You can potentially also set the criterias in the prompt.
+    * **Code** Use pesudo code to define even more complex workflows with loops, conditional statement, or go to statement.
+* **Other CoT Techniques:**
+    * Self-Correction/Reflection
+    * Plan Generation
+    * Debate, simulating multiple characters
+    * Tree of thought
+* **Few-Shot Examples:** You *could* experiment with adding 1-2 high-quality AIME problem/solution examples directly into the `PROMPT_TEMPLATE` string (similar to how Weco attempted in one of the runs). Ensure the examples clearly show the desired reasoning style and the final `\boxed{XXX}` format.
+* **Play with format:** The way you format the prompt. Markdown, xml, json, code or natural language. Similarly for the thinking tokens themselves you can also try out different formats.
+
+## 5. Constraints
+* **Ensure the final output reliably contains `\boxed{XXX}` as the evaluation script depends on it.**
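To make the "Linear" workflow idea above concrete, one hypothetical rewrite of `PROMPT_TEMPLATE` in `optimize.py` might look like the sketch below; it is illustrative only and not part of the released files.

```python
# Hypothetical variant of PROMPT_TEMPLATE (keeps the {problem} placeholder and the
# escaped \\boxed{{}} requirement that eval.py's parser depends on).
PROMPT_TEMPLATE = """You are an expert competition mathematician solving an AIME problem.
Work through the following steps:
1. Restate the problem constraints in your own words.
2. Identify the relevant theorems or formulas.
3. Carry out the calculations step by step, verifying intermediate results.
4. State the final answer as a three-digit integer inside \\boxed{{}}, e.g. \\boxed{{042}}.

Problem:
{problem}

Solution:
"""
```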
weco-0.2.9/examples/triton/README.md (new file)

@@ -0,0 +1,38 @@
+# Example: Optimizing PyTorch Self-Attention with Triton
+
+This example demonstrates using Weco to optimize a causal multi-head self-attention mechanism, a core component of Transformer models, implemented in PyTorch. The optimization target is to leverage [Triton](https://github.com/triton-lang/triton), a language and compiler for writing highly efficient GPU code, to accelerate the operation.
+
+## Setup
+
+1. Ensure you are in the `examples/triton` directory.
+2. Install the required dependencies:
+   ```bash
+   pip install torch triton
+   ```
+   *(Note: Triton installation might require specific CUDA versions. Refer to the official Triton documentation if you encounter issues.)*
+
+## Optimization Command
+
+Run the following command to start the optimization process:
+
+```bash
+weco --source optimize.py \
+--eval-command "python evaluate.py --solution-path optimize.py" \
+--metric speedup \
+--maximize true \
+--steps 30 \
+--model gemini-2.5-pro-exp-03-25 \
+--additional-instructions "Use triton to optimize the code while ensuring a small max float diff. Maintain the same code format."
+```
+
+### Explanation
+
+* `--source optimize.py`: The PyTorch self-attention implementation to be optimized.
+* `--eval-command "python evaluate.py --solution-path optimize.py"`: Executes the evaluation script, which benchmarks the `optimize.py` code against a baseline and prints the `speedup`.
+* `--metric speedup`: The target metric for optimization.
+* `--maximize true`: Weco should maximize the speedup.
+* `--steps 30`: The number of optimization iterations.
+* `--model gemini-2.5-pro-exp-03-25`: The LLM driving the optimization.
+* `--additional-instructions "..."`: Provides specific guidance to the LLM, instructing it to use Triton, maintain numerical accuracy ("small max float diff"), and preserve the code structure.
+
+Weco will iteratively refine `optimize.py` using Triton, guided by the evaluation results and the provided instructions.
{weco-0.2.8 → weco-0.2.9}/pyproject.toml

@@ -10,7 +10,7 @@ authors = [
 ]
 description = "Documentation for `weco`, a CLI for using Weco AI's code optimizer."
 readme = "README.md"
-version = "0.2.8"
+version = "0.2.9"
 license = {text = "MIT"}
 requires-python = ">=3.8"
 dependencies = ["requests", "rich"]
{weco-0.2.8 → weco-0.2.9}/weco/cli.py

@@ -57,6 +57,11 @@ def main() -> None:
         type=str,
         help="Description of additional instruction or path to a file containing additional instructions",
     )
+    parser.add_argument(
+        "--preserve-source",
+        action="store_true",
+        help="If set, do not overwrite the original source file; only save modified versions in the runs directory",
+    )
     args = parser.parse_args()
 
     try:

@@ -73,15 +78,16 @@ def main() -> None:
         "debug_prob": 0.5,
         "max_debug_depth": max(1, math.ceil(0.1 * steps)), # 10% of steps
     }
+    # Read API keys
+    api_keys = read_api_keys_from_env()
+    # API request timeout
+    timeout = 800
+
     # Read additional instructions
     additional_instructions = read_additional_instructions(additional_instructions=args.additional_instructions)
     # Read source code
     source_fp = pathlib.Path(args.source)
     source_code = read_from_path(fp=source_fp, is_json=False)
-    # Read API keys
-    api_keys = read_api_keys_from_env()
-    # API request timeout
-    timeout = 800
 
     # Initialize panels
     summary_panel = SummaryPanel(

@@ -124,7 +130,8 @@ def main() -> None:
 
     # Write the code string to the source file path
    # Do this after the original code is saved
-    write_to_path(fp=source_fp, content=session_response["code"])
+    if not args.preserve_source:
+        write_to_path(fp=source_fp, content=session_response["code"])
 
     # Update the panels with the initial solution
     # Add session id now that we have it

@@ -191,20 +198,25 @@ def main() -> None:
     )
 
     for step in range(1, steps):
+        # Re-read instructions from the original source (file path or string) BEFORE each suggest call
+        current_additional_instructions = read_additional_instructions(
+            additional_instructions=args.additional_instructions
+        )
         # Evaluate the current output and get the next solution
         eval_and_next_solution_response = evaluate_feedback_then_suggest_next_solution(
             console=console,
            session_id=session_id,
             execution_output=term_out,
-            additional_instructions=additional_instructions,
+            additional_instructions=current_additional_instructions,
             api_keys=api_keys,
             timeout=timeout,
         )
         # Save next solution (.runs/<session-id>/step_<step>.<extension>)
-        write_to_path(fp=runs_dir / f"step_{step}
+        write_to_path(fp=runs_dir / f"step_{step}{source_fp.suffix}", content=eval_and_next_solution_response["code"])
 
         # Write the next solution to the source file
-        write_to_path(fp=source_fp, content=eval_and_next_solution_response["code"])
+        if not args.preserve_source:
+            write_to_path(fp=source_fp, content=eval_and_next_solution_response["code"])
 
         # Get the optimization session status for
         # the best solution, its score, and the history to plot the tree

@@ -283,12 +295,16 @@ def main() -> None:
         transition_delay=0.1, # Slightly longer delay for evaluation results
     )
 
+    # Re-read instructions before the final feedback step
+    current_additional_instructions = read_additional_instructions(
+        additional_instructions=args.additional_instructions
+    )
     # Ensure we pass evaluation results for the last step's generated solution
     eval_and_next_solution_response = evaluate_feedback_then_suggest_next_solution(
         console=console,
         session_id=session_id,
         execution_output=term_out,
-        additional_instructions=additional_instructions,
+        additional_instructions=current_additional_instructions,
         api_keys=api_keys,
         timeout=timeout,
     )

@@ -355,7 +371,8 @@ def main() -> None:
     write_to_path(fp=runs_dir / f"best.{source_fp.suffix}", content=best_solution_content)
 
     # write the best solution to the source file
-    write_to_path(fp=source_fp, content=best_solution_content)
+    if not args.preserve_source:
+        write_to_path(fp=source_fp, content=best_solution_content)
 
     console.print(end_optimization_layout)
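Taken together with the README table above, the new flag can be exercised as in the sketch below; the file names, step count, and model are placeholders rather than values from this diff.

```bash
# With --preserve-source, the original file is left untouched; per-step candidates and the
# best solution are still written under the log directory (.runs/<session-id>/ by default).
weco --source optimize.py \
     --eval-command "python evaluate.py" \
     --metric speedup \
     --maximize true \
     --steps 10 \
     --model gpt-4o \
     --preserve-source
```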
{weco-0.2.8 → weco-0.2.9}/weco.egg-info/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: weco
-Version: 0.2.8
+Version: 0.2.9
 Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
 Author-email: Weco AI Team <contact@weco.ai>
 License: MIT

@@ -76,7 +76,7 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models
 
 This basic example shows how to optimize a simple PyTorch function for speedup.
 
-For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)
+For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
 
 ```bash
 # Navigate to the example directory

@@ -108,9 +108,10 @@ weco --source optimize.py \
 | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
 | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
 | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
-| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.
+| `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini`, and `gemini-2.5-pro-exp-03-25`.| Yes |
 | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
 | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+| `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |
 
 ---
 
{weco-0.2.8 → weco-0.2.9}/weco.egg-info/SOURCES.txt

@@ -14,6 +14,10 @@ examples/metal/README.md
 examples/metal/evaluate.py
 examples/metal/examples.rst
 examples/metal/optimize.py
+examples/prompt/README.md
+examples/prompt/eval.py
+examples/prompt/optimize.py
+examples/prompt/prompt_guide.md
 examples/spaceship-titanic/README.md
 examples/spaceship-titanic/baseline.py
 examples/spaceship-titanic/evaluate.py