weco 0.2.8__tar.gz → 0.2.10__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. {weco-0.2.8 → weco-0.2.10}/.github/workflows/release.yml +2 -2
  2. {weco-0.2.8 → weco-0.2.10}/PKG-INFO +4 -3
  3. {weco-0.2.8 → weco-0.2.10}/README.md +3 -2
  4. weco-0.2.10/examples/metal/README.md +39 -0
  5. weco-0.2.10/examples/prompt/README.md +100 -0
  6. weco-0.2.10/examples/prompt/eval.py +135 -0
  7. weco-0.2.10/examples/prompt/optimize.py +34 -0
  8. weco-0.2.10/examples/prompt/prompt_guide.md +45 -0
  9. weco-0.2.10/examples/triton/README.md +38 -0
  10. {weco-0.2.8 → weco-0.2.10}/pyproject.toml +1 -1
  11. {weco-0.2.8 → weco-0.2.10}/weco/__init__.py +1 -1
  12. {weco-0.2.8 → weco-0.2.10}/weco/cli.py +29 -12
  13. {weco-0.2.8 → weco-0.2.10}/weco.egg-info/PKG-INFO +4 -3
  14. {weco-0.2.8 → weco-0.2.10}/weco.egg-info/SOURCES.txt +4 -0
  15. weco-0.2.8/examples/metal/README.md +0 -0
  16. weco-0.2.8/examples/triton/README.md +0 -0
  17. {weco-0.2.8 → weco-0.2.10}/.github/workflows/lint.yml +0 -0
  18. {weco-0.2.8 → weco-0.2.10}/.gitignore +0 -0
  19. {weco-0.2.8 → weco-0.2.10}/LICENSE +0 -0
  20. {weco-0.2.8 → weco-0.2.10}/examples/cuda/README.md +0 -0
  21. {weco-0.2.8 → weco-0.2.10}/examples/cuda/evaluate.py +0 -0
  22. {weco-0.2.8 → weco-0.2.10}/examples/cuda/guide.md +0 -0
  23. {weco-0.2.8 → weco-0.2.10}/examples/cuda/optimize.py +0 -0
  24. {weco-0.2.8 → weco-0.2.10}/examples/hello-kernel-world/evaluate.py +0 -0
  25. {weco-0.2.8 → weco-0.2.10}/examples/hello-kernel-world/optimize.py +0 -0
  26. {weco-0.2.8 → weco-0.2.10}/examples/metal/evaluate.py +0 -0
  27. {weco-0.2.8 → weco-0.2.10}/examples/metal/examples.rst +0 -0
  28. {weco-0.2.8 → weco-0.2.10}/examples/metal/optimize.py +0 -0
  29. {weco-0.2.8 → weco-0.2.10}/examples/spaceship-titanic/README.md +0 -0
  30. {weco-0.2.8 → weco-0.2.10}/examples/spaceship-titanic/baseline.py +0 -0
  31. {weco-0.2.8 → weco-0.2.10}/examples/spaceship-titanic/evaluate.py +0 -0
  32. {weco-0.2.8 → weco-0.2.10}/examples/spaceship-titanic/optimize.py +0 -0
  33. {weco-0.2.8 → weco-0.2.10}/examples/spaceship-titanic/requirements-test.txt +0 -0
  34. {weco-0.2.8 → weco-0.2.10}/examples/spaceship-titanic/utils.py +0 -0
  35. {weco-0.2.8 → weco-0.2.10}/examples/triton/evaluate.py +0 -0
  36. {weco-0.2.8 → weco-0.2.10}/examples/triton/optimize.py +0 -0
  37. {weco-0.2.8 → weco-0.2.10}/setup.cfg +0 -0
  38. {weco-0.2.8 → weco-0.2.10}/weco/api.py +0 -0
  39. {weco-0.2.8 → weco-0.2.10}/weco/panels.py +0 -0
  40. {weco-0.2.8 → weco-0.2.10}/weco/utils.py +0 -0
  41. {weco-0.2.8 → weco-0.2.10}/weco.egg-info/dependency_links.txt +0 -0
  42. {weco-0.2.8 → weco-0.2.10}/weco.egg-info/entry_points.txt +0 -0
  43. {weco-0.2.8 → weco-0.2.10}/weco.egg-info/requires.txt +0 -0
  44. {weco-0.2.8 → weco-0.2.10}/weco.egg-info/top_level.txt +0 -0
@@ -90,7 +90,7 @@ jobs:
  GITHUB_TOKEN: ${{ github.token }}
  run: >-
  gh release create
- 'v0.2.8'
+ 'v0.2.10'
  --repo '${{ github.repository }}'
  --notes ""

@@ -102,5 +102,5 @@ jobs:
  # sigstore-produced signatures and certificates.
  run: >-
  gh release upload
- 'v0.2.8' dist/**
+ 'v0.2.10' dist/**
  --repo '${{ github.repository }}'
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: weco
- Version: 0.2.8
+ Version: 0.2.10
  Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
  Author-email: Weco AI Team <contact@weco.ai>
  License: MIT
@@ -76,7 +76,7 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models

  This basic example shows how to optimize a simple PyTorch function for speedup.

- For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)t**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
+ For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.

  ```bash
  # Navigate to the example directory
@@ -108,9 +108,10 @@ weco --source optimize.py \
  | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
  | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
  | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
- | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). Recommended models to try include `o3-mini`, `claude-3-haiku`, and `gemini-2.5-pro-exp-03-25`.| Yes |
+ | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini` and `gemini-2.5-pro-exp-03-25`. | Yes |
  | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
  | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ | `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |

  ---

@@ -54,7 +54,7 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models

  This basic example shows how to optimize a simple PyTorch function for speedup.

- For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)t**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
+ For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.

  ```bash
  # Navigate to the example directory
@@ -86,9 +86,10 @@ weco --source optimize.py \
  | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
  | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
  | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
- | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). Recommended models to try include `o3-mini`, `claude-3-haiku`, and `gemini-2.5-pro-exp-03-25`.| Yes |
+ | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini` and `gemini-2.5-pro-exp-03-25`. | Yes |
  | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
  | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ | `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |

  ---

@@ -0,0 +1,39 @@
+ # Example: Optimizing MLX Convolution with Metal
+
+ This example demonstrates how to use Weco to optimize a 2D convolution operation implemented in [`mlx`](https://github.com/ml-explore/mlx), targeting Apple's [Metal](https://developer.apple.com/documentation/metal/) framework for execution on Apple Silicon GPUs.
+
+ It showcases using a separate file (`examples.rst`) to provide detailed context and instructions to the optimizing LLM.
+
+ ## Setup
+
+ 1. Ensure you are in the `examples/metal` directory.
+ 2. Install the required dependency:
+ ```bash
+ pip install mlx
+ ```
+
+ ## Optimization Command
+
+ Run the following command to start the optimization process:
+
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python evaluate.py --solution-path optimize.py" \
+ --metric speedup \
+ --maximize true \
+ --steps 30 \
+ --model gemini-2.5-pro-exp-03-25 \
+ --additional-instructions examples.rst
+ ```
+
+ ### Explanation
+
+ * `--source optimize.py`: Specifies the Python file containing the MLX convolution code to be optimized.
+ * `--eval-command "python evaluate.py --solution-path optimize.py"`: Runs the evaluation script. `evaluate.py` executes the code in `optimize.py`, measures its performance against a baseline, and prints the `speedup` metric.
+ * `--metric speedup`: Tells Weco to target the 'speedup' value printed by the evaluation command.
+ * `--maximize true`: Instructs Weco to aim for a higher speedup value.
+ * `--steps 30`: Defines the number of iterative optimization steps Weco will perform.
+ * `--model gemini-2.5-pro-exp-03-25`: Selects the LLM used for proposing code modifications.
+ * `--additional-instructions examples.rst`: Provides a path to a file containing detailed guidance for the LLM during optimization (e.g., constraints, preferred Metal techniques).
+
+ Weco will iteratively modify `optimize.py`, run `evaluate.py`, parse the `speedup`, and generate new code versions based on the results and the instructions in `examples.rst`.
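The `evaluate.py` referenced above is not shown in this diff; the only contract Weco relies on is that the eval command prints a line containing the chosen metric. A minimal, self-contained sketch of that contract (the timings and function names below are placeholders, not the packaged benchmark):

```python
# Sketch of the evaluation contract only -- NOT the packaged evaluate.py.
# Weco just needs the metric name printed once by the eval command.
import time

def run_baseline() -> float:
    time.sleep(0.2)   # stand-in for the reference implementation
    return 0.2

def run_candidate() -> float:
    time.sleep(0.1)   # stand-in for the code Weco is editing
    return 0.1

baseline_s = run_baseline()
candidate_s = run_candidate()
print(f"speedup: {baseline_s / candidate_s:.3f}")  # the line Weco parses
```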
@@ -0,0 +1,100 @@
+ # weco-cli/examples/prompt/README.md
+ # AIME Prompt Engineering Example with Weco
+
+ This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and aims to improve the accuracy metric.
+
+ This example uses `gpt-4o-mini` via the OpenAI API by default. Ensure your `OPENAI_API_KEY` environment variable is set.
+
+ ## Files in this folder
+
+ | File | Purpose |
+ | :------------ | :---------------------------------------------------------------------------------------------------------------- |
+ | `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the mutable `EXTRA_INSTRUCTIONS` string. Weco edits **only** this file during the search. |
+ | `eval.py` | Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel, parses the LLM output (looking for `\\boxed{}`), compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |
+
+ ## Quick start
+
+ 1. **Clone the repository and enter the folder.**
+ ```bash
+ # If you cloned the main weco-cli repo already:
+ cd examples/prompt
+
+ # Otherwise:
+ # git clone https://github.com/WecoAI/weco-cli.git
+ # cd weco-cli/examples/prompt
+ ```
+ 2. **Install dependencies.**
+ ```bash
+ # Ensure you have weco installed: pip install weco
+ pip install openai datasets # Add any other dependencies if needed
+ ```
+ 3. **Set your OpenAI API Key.**
+ ```bash
+ export OPENAI_API_KEY="your_openai_api_key_here"
+ ```
+ 4. **Run Weco.** The command below iteratively modifies `EXTRA_INSTRUCTIONS` in `optimize.py`, runs `eval.py` to evaluate the prompt's effectiveness, reads the printed accuracy, and keeps the best prompt variations found.
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python eval.py" \
+ --metric accuracy \
+ --maximize true \
+ --steps 40 \
+ --model gemini-2.5-pro-exp-03-25
+ ```
+ *Note: You can replace `--model gemini-2.5-pro-exp-03-25` with another powerful model like `o3` if you have the respective API keys set.*
+
+ During each evaluation round, you will see log lines similar to the following:
+
+ ```text
+ [setup] loading 20 problems from AIME 2024 …
+ [progress] 5/20 completed, accuracy: 0.0000, elapsed 7.3 s
+ [progress] 10/20 completed, accuracy: 0.1000, elapsed 14.6 s
+ [progress] 15/20 completed, accuracy: 0.0667, elapsed 21.8 s
+ [progress] 20/20 completed, accuracy: 0.0500, elapsed 28.9 s
+ accuracy: 0.0500
+ ```
+
+ # AIME 2024 Prompt-Engineering Example
+ This example shows how **Weco** can iteratively improve a prompt for solving American Invitational Mathematics Examination (AIME) problems. The experiment runs locally, requires only two short Python files, and finishes in a few hours on a laptop.
+
+ ## Files in this folder
+
+ | File | Purpose |
+ | :------------ | :---------------------------------------------------------------------------------------------------------------- |
+ | `optimize.py` | Holds the prompt template (instructing the LLM to reason step-by-step and use `\\boxed{}` for the final answer) and the function to call the LLM. Weco edits **only** this file during the search to refine the prompt template. |
+ | `eval.py` | Defines the LLM model to use (`MODEL_TO_USE`). Downloads a small slice of the 2024 AIME dataset, calls `optimize.solve` in parallel (passing the chosen model), parses the LLM output, compares it to the ground truth, prints progress logs, and finally prints an `accuracy:` line that Weco reads. |
+
+ ## Quick start
+
+ 1. **Clone the repository and enter the folder.**
+ ```bash
+ git clone https://github.com/your-fork/weco-examples.git
+ cd weco-examples/aime-2024
+ ```
+ 2. **Run Weco.** The command below edits `EXTRA_INSTRUCTIONS` in `optimize.py`, invokes `eval.py` on every iteration, reads the printed accuracy, and keeps the best variants.
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python eval.py" \
+ --metric accuracy \
+ --maximize true \
+ --steps 40 \
+ --model gemini-2.5-flash-preview-04-17 \
+ --additional-instructions prompt_guide.md
+ ```
+
+ During each evaluation round you will see log lines similar to the following.
+
+ ```text
+ [setup] loading 20 problems from AIME 2024 …
+ [progress] 5/20 completed, elapsed 7.3 s
+ [progress] 10/20 completed, elapsed 14.6 s
+ [progress] 15/20 completed, elapsed 21.8 s
+ [progress] 20/20 completed, elapsed 28.9 s
+ accuracy: 0.0500
+ ```
+
+ Weco then mutates the config, tries again, and gradually pushes the accuracy higher. On a modern laptop you can usually double the baseline score within thirty to forty iterations.
+
+ ## How it works
+
+ * `eval.py` slices the **Maxwell-Jia/AIME_2024** dataset to twenty problems for fast feedback. You can change the slice in one line.
+ * The script sends model calls in parallel via `ThreadPoolExecutor`, so network latency is hidden.
+ * Every five completed items, the script logs progress and elapsed time.
+ * The final line `accuracy: value` is the only part Weco needs for guidance.
@@ -0,0 +1,135 @@
+ # weco-cli/examples/prompt/eval.py
+ """
+ eval.py (parallel with progress logs)
+
+ Downloads a slice of AIME 2024, calls optimize.solve in parallel,
+ prints progress every N samples, and finally prints accuracy
+ in the format that Weco expects.
+ The LLM model to use is defined in this file.
+ """
+
+ import re
+ import time
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+ import sys
+ import concurrent.futures
+
+ from datasets import load_dataset
+ import optimize # the file Weco mutates
+
+ # ---------------------------------------------------------------------
+ # Configuration
+ TOTAL_SAMPLES = 30 # how many problems to load
+ NUM_WORKERS = 30 # concurrent LLM calls
+ LOG_EVERY = 5 # print progress after this many
+ MODEL_TO_USE = "gpt-4.1" # Define the model to use HERE
+ TASK_TIMEOUT = 300 # seconds per LLM call
+ # ---------------------------------------------------------------------
+
+ print(f"[setup] loading {TOTAL_SAMPLES} problems from AIME 2024 …", flush=True)
+ DATA = load_dataset("Maxwell-Jia/AIME_2024", split=f"train[:{TOTAL_SAMPLES}]", cache_dir=".cache")
+
+
+ def extract_final_answer(text: str) -> str:
+     """
+     Extracts the final AIME answer (000-999) from the LLM response.
+     Prioritizes answers within \boxed{}, then looks for patterns,
+     and falls back to finding the last 3-digit number.
+     """
+     # 1. Check for \boxed{...}
+     boxed_match = re.search(r"\\boxed\{(\d{1,3})\}", text)
+     if boxed_match:
+         return boxed_match.group(1).zfill(3) # Pad with leading zeros if needed
+
+     # 2. Check for "final answer is ..." patterns (case-insensitive)
+     # Make sure pattern captures potential variations like "is: 123", "is 123."
+     answer_pattern = r"(?:final|answer is|result is)[:\s]*(\d{1,3})\b"
+     answer_match = re.search(answer_pattern, text, re.IGNORECASE)
+     if answer_match:
+         return answer_match.group(1).zfill(3)
+
+     # 3. Fallback: Find the last occurrence of a 1-3 digit number in the text
+     # This is less reliable but can be a fallback.
+     # Let's refine the fallback regex to be slightly more specific
+     # Look for isolated 1-3 digit numbers, possibly at the end or after keywords.
+     fallback_matches = re.findall(r"\b(\d{1,3})\b", text)
+     if fallback_matches:
+         # Return the last found number, assuming it's the most likely answer candidate
+         return fallback_matches[-1].zfill(3)
+
+     return "" # Return empty if no answer found
+
+
+ def grade_answer(llm_output: str, ground_truth_answer: str) -> bool:
+     """Compares the extracted LLM answer to the ground truth."""
+     extracted_guess = extract_final_answer(llm_output)
+     # Ground truth answers in AIME are typically strings "000" to "999"
+     # Ensure comparison is consistent (e.g., both as strings, potentially padded)
+     # The ground truth from the dataset seems to be string integers already.
+     # Let's ensure the extracted guess is also treated as a simple integer string for comparison.
+     # The ground truth might not be zero-padded in the dataset, so compare integers.
+     try:
+         # Check if both can be converted to integers for comparison
+         return int(extracted_guess) == int(ground_truth_answer)
+     except ValueError:
+         # If conversion fails (e.g., empty string), they don't match
+         return False
+
+
+ def run_evaluation() -> float:
+     """Runs the evaluation on the dataset and returns the accuracy."""
+     correct = 0
+     start = time.time()
+     results = [] # Store results for potential later analysis if needed
+
+     with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
+         # Submit all tasks, passing the MODEL_TO_USE
+         futures = {
+             pool.submit(optimize.solve, row["Problem"], MODEL_TO_USE): row["Answer"] for row in DATA
+         } # Pass MODEL_TO_USE here
+
+         try:
+             # Process completed tasks
+             for idx, future in enumerate(as_completed(futures), 1):
+                 problem_answer = futures[future] # Get the corresponding ground truth answer
+                 try:
+                     # Wait up to TASK_TIMEOUT seconds for each LLM call
+                     llm_raw_output = future.result(timeout=TASK_TIMEOUT)
+                     is_correct = grade_answer(llm_raw_output, str(problem_answer))
+                     if is_correct:
+                         correct += 1
+                     results.append({"raw_output": llm_raw_output, "correct_answer": problem_answer, "is_correct": is_correct})
+
+                 except Exception as exc:
+                     print(f"[error] Generated an exception: {exc}")
+                     results.append({"raw_output": f"Error: {exc}", "correct_answer": problem_answer, "is_correct": False})
+
+                 if idx % LOG_EVERY == 0 or idx == TOTAL_SAMPLES:
+                     elapsed = time.time() - start
+                     current_accuracy = correct / idx if idx > 0 else 0
+                     print(
+                         f"[progress] {idx}/{TOTAL_SAMPLES} completed, accuracy: {current_accuracy:.4f}, elapsed {elapsed:.1f} s",
+                         flush=True,
+                     )
+         except concurrent.futures.TimeoutError:
+             # Abort any stuck LLM calls
+             print(f"[error] LLM call timed out after {TASK_TIMEOUT}s", flush=True)
+             # Cancel all pending futures and exit
+             for f in futures:
+                 f.cancel()
+             print("Exiting due to timeout", file=sys.stderr)
+             sys.exit(1)
+         except KeyboardInterrupt:
+             print("\nEvaluation interrupted by user", file=sys.stderr)
+             sys.exit(1)
+
+     # Final accuracy calculation
+     total_evaluated = len(results)
+     final_accuracy = correct / total_evaluated if total_evaluated > 0 else 0
+     return final_accuracy
+
+
+ if __name__ == "__main__":
+     acc = run_evaluation()
+     # Weco parses this exact line format
+     print(f"accuracy: {acc:.4f}")
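As a quick sanity check of the answer-extraction logic added above, a snippet like the one below (not part of the package) exercises `extract_final_answer` and `grade_answer` on a hand-written response; note that importing `eval` also triggers the dataset download at module level.

```python
# Illustrative local check of the extraction/grading helpers defined in eval.py.
# Importing eval runs its module-level load_dataset call, so this needs network access.
from eval import extract_final_answer, grade_answer

sample = r"Reasoning omitted. The final answer is \boxed{42}."
print(extract_final_answer(sample))  # -> "042" (zero-padded by zfill(3))
print(grade_answer(sample, "42"))    # -> True (compared as integers)
```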
@@ -0,0 +1,34 @@
+ # weco-cli/examples/prompt/optimize.py
+ """
+ optimize.py
+ This module holds the prompt template and the LLM call.
+ Weco modifies this file to optimize the prompt instructions.
+ The model used for the LLM call is passed in from eval.py.
+ """
+
+ from openai import OpenAI
+
+ client = OpenAI() # API key must be in OPENAI_API_KEY
+ # MODEL constant removed from here
+
+ PROMPT_TEMPLATE = """You are an expert competition mathematician tasked with solving an AIME problem.
+ The final answer must be a three-digit integer between 000 and 999, inclusive.
+ Please reason step-by-step towards the solution. Keep your reasoning concise.
+ Conclude your response with the final answer enclosed in \\boxed{{}}. For example: The final answer is \\boxed{{042}}.
+
+ Problem:
+ {problem}
+
+ Solution:
+ """
+
+
+ def solve(problem: str, model_name: str) -> str:
+     """Return the model's raw text answer for one problem using the specified model."""
+     prompt = PROMPT_TEMPLATE.format(problem=problem)
+
+     response = client.chat.completions.create(
+         model=model_name, # Use the passed-in model name
+         messages=[{"role": "user", "content": prompt}],
+     )
+     return response.choices[0].message.content.strip()
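For a one-off manual test of the `solve` helper added above, something like the following should work. This is an illustrative call, not shipped in the package; it issues a real OpenAI request, the problem text is a placeholder, and the model name is simply borrowed from `MODEL_TO_USE` in `eval.py`.

```python
# Hypothetical manual invocation of optimize.solve; requires OPENAI_API_KEY.
import optimize

answer_text = optimize.solve(
    problem="Find the remainder when 7^2024 is divided by 1000.",  # placeholder problem
    model_name="gpt-4.1",
)
print(answer_text)  # expected to end with something like \boxed{XYZ}
```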
@@ -0,0 +1,45 @@
+ # Weco Prompt Optimization Guidelines for AIME (Targeting GPT-4.1)
+
+ ## 1. Goal
+
+ Your objective is to modify the `optimize.py` file to improve the `accuracy` metric when solving AIME math problems. The modifications should leverage the capabilities of the target model, **GPT-4.1**.
+
+ ## 2. Files and Workflow
+
+ * **Target File for Modification:** `optimize.py`.
+ * **Evaluation Script:** `eval.py`. This script:
+   * Defines the actual LLM used for solving (`MODEL_TO_USE`, which is set to `gpt-4.1` in this context).
+   * Calls `optimize.solve(problem, model_name="gpt-4.1")`.
+   * Parses the output from `optimize.solve`. **Crucially, it expects the final 3-digit answer (000-999) to be enclosed in `\boxed{XXX}`.** For example: `\boxed{042}`. Your prompt modifications *must* ensure the model consistently produces this format for the final answer.
+   * Compares the extracted answer to the ground truth and prints the `accuracy:` metric, which Weco uses for guidance.
+
+ ## 3. Target Model: GPT-4.1
+
+ You are optimizing the prompt for `gpt-4.1`. Based on its characteristics, consider the following:
+
+ * **Strengths:**
+   * **Significantly Improved Instruction Following:** GPT-4.1 is better at adhering to complex instructions, formats, and constraints compared to previous models. This is key for AIME where precision is vital. It excels on hard instruction-following tasks.
+   * **Stronger Coding & Reasoning:** Its improved coding performance (e.g., SWE-bench) suggests enhanced logical reasoning capabilities applicable to mathematical problem-solving.
+   * **Refreshed Knowledge:** Knowledge cutoff is June 2024.
+ * **Considerations:**
+   * **Literal Interpretation:** GPT-4.1 can be more literal. Prompts should be explicit and specific about the desired reasoning process and output format. Avoid ambiguity.
+
+ ## 4. Optimization Strategies (Focus on `PROMPT_TEMPLATE` in `optimize.py`)
+
+ The primary goal is to enhance the model's reasoning process for these challenging math problems. Focus on Chain-of-Thought (CoT) designs within the `PROMPT_TEMPLATE`.
+
+ **Ideas to Explore:**
+ You don't have to implement all of them, but the following ideas might be helpful:
+ * **Workflow Patterns:** Try some of the following patterns:
+   * **Linear:** A linear workflow with standard CoT, e.g. considering thinking steps such as (you don't have to include all of them): "1. Understand the problem constraints. 2. Identify relevant theorems/formulas. 3. Formulate a plan. 4. Execute calculations step-by-step. 5. Verify intermediate results. 6. State the final answer in the required format."
+   * **List Candidates:** Ask the model to propose a few solutions at a particular step and pick the best one. You can also set the selection criteria in the prompt.
+   * **Code:** Use pseudocode to define even more complex workflows with loops, conditionals, or goto statements.
+ * **Other CoT Techniques:**
+   * Self-Correction/Reflection
+   * Plan Generation
+   * Debate, simulating multiple characters
+   * Tree of Thought
+ * **Few-Shot Examples:** You *could* experiment with adding 1-2 high-quality AIME problem/solution examples directly into the `PROMPT_TEMPLATE` string (similar to how Weco attempted in one of the runs). Ensure the examples clearly show the desired reasoning style and the final `\boxed{XXX}` format.
+ * **Play with format:** Consider how you format the prompt: Markdown, XML, JSON, code, or natural language. Similarly, for the thinking tokens themselves, you can try different formats.
+
+ ## 5. Constraints
+ * **Ensure the final output reliably contains `\boxed{XXX}` as the evaluation script depends on it.**
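To make the "Linear" workflow idea above concrete, here is one possible rewrite of `PROMPT_TEMPLATE` in `optimize.py` — a sketch of the kind of edit Weco might propose, not the template shipped in this release:

```python
# Sketch only: an alternative PROMPT_TEMPLATE following the "Linear" CoT workflow.
# Double braces keep literal \boxed{} after .format(), as in the original template.
PROMPT_TEMPLATE = """You are an expert competition mathematician solving an AIME problem.
Work through the problem explicitly:
1. Restate the constraints.
2. Identify relevant theorems or formulas.
3. Formulate a plan.
4. Execute the calculations step by step.
5. Verify intermediate results.
6. State the final answer as a three-digit integer in \\boxed{{}}, e.g. \\boxed{{042}}.

Problem:
{problem}

Solution:
"""

print(PROMPT_TEMPLATE.format(problem="(problem text here)"))  # quick formatting check
```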
@@ -0,0 +1,38 @@
+ # Example: Optimizing PyTorch Self-Attention with Triton
+
+ This example demonstrates using Weco to optimize a causal multi-head self-attention mechanism, a core component of Transformer models, implemented in PyTorch. The optimization target is to leverage [Triton](https://github.com/triton-lang/triton), a language and compiler for writing highly efficient GPU code, to accelerate the operation.
+
+ ## Setup
+
+ 1. Ensure you are in the `examples/triton` directory.
+ 2. Install the required dependencies:
+ ```bash
+ pip install torch triton
+ ```
+ *(Note: Triton installation might require specific CUDA versions. Refer to the official Triton documentation if you encounter issues.)*
+
+ ## Optimization Command
+
+ Run the following command to start the optimization process:
+
+ ```bash
+ weco --source optimize.py \
+ --eval-command "python evaluate.py --solution-path optimize.py" \
+ --metric speedup \
+ --maximize true \
+ --steps 30 \
+ --model gemini-2.5-pro-exp-03-25 \
+ --additional-instructions "Use triton to optimize the code while ensuring a small max float diff. Maintain the same code format."
+ ```
+
+ ### Explanation
+
+ * `--source optimize.py`: The PyTorch self-attention implementation to be optimized.
+ * `--eval-command "python evaluate.py --solution-path optimize.py"`: Executes the evaluation script, which benchmarks the `optimize.py` code against a baseline and prints the `speedup`.
+ * `--metric speedup`: The target metric for optimization.
+ * `--maximize true`: Weco should maximize the speedup.
+ * `--steps 30`: The number of optimization iterations.
+ * `--model gemini-2.5-pro-exp-03-25`: The LLM driving the optimization.
+ * `--additional-instructions "..."`: Provides specific guidance to the LLM, instructing it to use Triton, maintain numerical accuracy ("small max float diff"), and preserve the code structure.
+
+ Weco will iteratively refine `optimize.py` using Triton, guided by the evaluation results and the provided instructions.
@@ -10,7 +10,7 @@ authors = [
  ]
  description = "Documentation for `weco`, a CLI for using Weco AI's code optimizer."
  readme = "README.md"
- version = "0.2.8"
+ version = "0.2.10"
  license = {text = "MIT"}
  requires-python = ">=3.8"
  dependencies = ["requests", "rich"]
@@ -1,4 +1,4 @@
  # DO NOT EDIT
- __pkg_version__ = "0.2.8"
+ __pkg_version__ = "0.2.10"
  __api_version__ = "v1"
  __base_url__ = f"https://api.aide.weco.ai/{__api_version__}"
@@ -57,6 +57,11 @@ def main() -> None:
  type=str,
  help="Description of additional instruction or path to a file containing additional instructions",
  )
+ parser.add_argument(
+ "--preserve-source",
+ action="store_true",
+ help="If set, do not overwrite the original source file; only save modified versions in the runs directory",
+ )
  args = parser.parse_args()

  try:
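The hunk above registers the new `--preserve-source` flag. Since it uses argparse's `store_true` action, the attribute defaults to `False` and flips to `True` only when the flag is passed; the standalone sketch below (not weco itself) illustrates that behaviour:

```python
# Standalone sketch of the store_true behaviour mirrored from the hunk above.
import argparse

parser = argparse.ArgumentParser(prog="weco-sketch")
parser.add_argument(
    "--preserve-source",
    action="store_true",
    help="If set, do not overwrite the original source file",
)

print(parser.parse_args([]).preserve_source)                     # False (default)
print(parser.parse_args(["--preserve-source"]).preserve_source)  # True
```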
@@ -73,15 +78,16 @@ def main() -> None:
  "debug_prob": 0.5,
  "max_debug_depth": max(1, math.ceil(0.1 * steps)), # 10% of steps
  }
+ # Read API keys
+ api_keys = read_api_keys_from_env()
+ # API request timeout
+ timeout = 800
+
  # Read additional instructions
  additional_instructions = read_additional_instructions(additional_instructions=args.additional_instructions)
  # Read source code
  source_fp = pathlib.Path(args.source)
  source_code = read_from_path(fp=source_fp, is_json=False)
- # Read API keys
- api_keys = read_api_keys_from_env()
- # API request timeout
- timeout = 800

  # Initialize panels
  summary_panel = SummaryPanel(
@@ -119,12 +125,13 @@ def main() -> None:
  runs_dir.mkdir(parents=True, exist_ok=True)

  # Save the original code (.runs/<session-id>/original.<extension>)
- runs_copy_source_fp = runs_dir / f"original.{source_fp.suffix}"
+ runs_copy_source_fp = runs_dir / f"original{source_fp.suffix}"
  write_to_path(fp=runs_copy_source_fp, content=source_code)

  # Write the code string to the source file path
  # Do this after the original code is saved
- write_to_path(fp=source_fp, content=session_response["code"])
+ if not args.preserve_source:
+ write_to_path(fp=source_fp, content=session_response["code"])

  # Update the panels with the initial solution
  # Add session id now that we have it
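The rename from `f"original.{source_fp.suffix}"` to `f"original{source_fp.suffix}"` fixes a doubled dot: `pathlib.Path.suffix` already includes the leading period. A two-line illustration:

```python
# Why the filename templates dropped the extra dot: Path.suffix already contains it.
from pathlib import Path

src = Path("optimize.py")
print(f"original.{src.suffix}")  # original..py  (the old, doubled-dot behaviour)
print(f"original{src.suffix}")   # original.py   (the corrected form)
```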
@@ -191,20 +198,25 @@ def main() -> None:
  )

  for step in range(1, steps):
+ # Re-read instructions from the original source (file path or string) BEFORE each suggest call
+ current_additional_instructions = read_additional_instructions(
+ additional_instructions=args.additional_instructions
+ )
  # Evaluate the current output and get the next solution
  eval_and_next_solution_response = evaluate_feedback_then_suggest_next_solution(
  console=console,
  session_id=session_id,
  execution_output=term_out,
- additional_instructions=additional_instructions,
+ additional_instructions=current_additional_instructions,
  api_keys=api_keys,
  timeout=timeout,
  )
  # Save next solution (.runs/<session-id>/step_<step>.<extension>)
- write_to_path(fp=runs_dir / f"step_{step}.{source_fp.suffix}", content=eval_and_next_solution_response["code"])
+ write_to_path(fp=runs_dir / f"step_{step}{source_fp.suffix}", content=eval_and_next_solution_response["code"])

  # Write the next solution to the source file
- write_to_path(fp=source_fp, content=eval_and_next_solution_response["code"])
+ if not args.preserve_source:
+ write_to_path(fp=source_fp, content=eval_and_next_solution_response["code"])

  # Get the optimization session status for
  # the best solution, its score, and the history to plot the tree
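Re-reading the instructions before every suggest call means that when `--additional-instructions` points to a file, edits made to that file while a run is in progress are picked up on the next step. The helper below is a hypothetical illustration of the path-or-literal-string pattern, not weco's actual `read_additional_instructions`:

```python
# Hypothetical illustration only: treat the argument as a file path when one exists,
# otherwise as literal instruction text, so re-reading each step picks up live edits.
from pathlib import Path
from typing import Optional

def load_instructions(arg: Optional[str]) -> Optional[str]:
    if arg is None:
        return None
    p = Path(arg)
    if p.is_file():
        return p.read_text()
    return arg
```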
@@ -283,12 +295,16 @@ def main() -> None:
  transition_delay=0.1, # Slightly longer delay for evaluation results
  )

+ # Re-read instructions before the final feedback step
+ current_additional_instructions = read_additional_instructions(
+ additional_instructions=args.additional_instructions
+ )
  # Ensure we pass evaluation results for the last step's generated solution
  eval_and_next_solution_response = evaluate_feedback_then_suggest_next_solution(
  console=console,
  session_id=session_id,
  execution_output=term_out,
- additional_instructions=additional_instructions,
+ additional_instructions=current_additional_instructions,
  api_keys=api_keys,
  timeout=timeout,
  )
@@ -352,10 +368,11 @@ def main() -> None:
  best_solution_content = f"# Best solution from Weco with a score of {best_score_str}\n\n{best_solution_code}"

  # Save best solution to .runs/<session-id>/best.<extension>
- write_to_path(fp=runs_dir / f"best.{source_fp.suffix}", content=best_solution_content)
+ write_to_path(fp=runs_dir / f"best{source_fp.suffix}", content=best_solution_content)

  # write the best solution to the source file
- write_to_path(fp=source_fp, content=best_solution_content)
+ if not args.preserve_source:
+ write_to_path(fp=source_fp, content=best_solution_content)

  console.print(end_optimization_layout)

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: weco
- Version: 0.2.8
+ Version: 0.2.10
  Summary: Documentation for `weco`, a CLI for using Weco AI's code optimizer.
  Author-email: Weco AI Team <contact@weco.ai>
  License: MIT
@@ -76,7 +76,7 @@ The `weco` CLI leverages a tree search approach guided by Large Language Models

  This basic example shows how to optimize a simple PyTorch function for speedup.

- For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)t**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.
+ For more advanced examples, including **[Metal/MLX](/examples/metal/README.md), [Triton](/examples/triton/README.md), [CUDA kernel optimization](/examples/cuda/README.md)**, and **[ML model optimization](/examples/spaceship-titanic/README.md)**, please see the `README.md` files within the corresponding subdirectories under the [`examples/`](./examples/) folder.

  ```bash
  # Navigate to the example directory
@@ -108,9 +108,10 @@ weco --source optimize.py \
  | `--metric` | The name of the metric you want to optimize (e.g., 'accuracy', 'speedup', 'loss'). This metric name should match what's printed by your `--eval-command`. | Yes |
  | `--maximize` | Whether to maximize (`true`) or minimize (`false`) the metric. | Yes |
  | `--steps` | Number of optimization steps (LLM iterations) to run. | Yes |
- | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.5-sonnet`). Recommended models to try include `o3-mini`, `claude-3-haiku`, and `gemini-2.5-pro-exp-03-25`.| Yes |
+ | `--model` | Model identifier for the LLM to use (e.g., `gpt-4o`, `claude-3.7-sonnet`). Recommended models to try include `o4-mini` and `gemini-2.5-pro-exp-03-25`. | Yes |
  | `--additional-instructions` | (Optional) Natural language description of specific instructions OR path to a file containing detailed instructions to guide the LLM. | No |
  | `--log-dir` | (Optional) Path to the directory to log intermediate steps and final optimization result. Defaults to `.runs/`. | No |
+ | `--preserve-source` | (Optional) If set, do not overwrite the original `--source` file. Modifications and the best solution will still be saved in the `--log-dir`. | No |

  ---

@@ -14,6 +14,10 @@ examples/metal/README.md
  examples/metal/evaluate.py
  examples/metal/examples.rst
  examples/metal/optimize.py
+ examples/prompt/README.md
+ examples/prompt/eval.py
+ examples/prompt/optimize.py
+ examples/prompt/prompt_guide.md
  examples/spaceship-titanic/README.md
  examples/spaceship-titanic/baseline.py
  examples/spaceship-titanic/evaluate.py