prompt-better 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- prompt_better-0.1.0/LICENSE +21 -0
- prompt_better-0.1.0/PKG-INFO +600 -0
- prompt_better-0.1.0/README.md +575 -0
- prompt_better-0.1.0/prompt_better/__init__.py +1 -0
- prompt_better-0.1.0/prompt_better/cli.py +292 -0
- prompt_better-0.1.0/prompt_better/config.py +270 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/__init__.py +12 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/dataset-schema.json +47 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/golden-schema.json +23 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/golden_generator.py +82 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/loader.py +129 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/metrics.py +41 -0
- prompt_better-0.1.0/prompt_better/dataset_manager/models.py +20 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/__init__.py +7 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/evaluator.py +237 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/fallbacks.py +115 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/models.py +49 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/openai_structured.py +272 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/optimizer.py +508 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/optimizers.py +135 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/reporter.py +122 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/templates/prompt_summary.j2 +49 -0
- prompt_better-0.1.0/prompt_better/dspy_manager/utils.py +29 -0
- prompt_better-0.1.0/prompt_better/prompt_json/__init__.py +12 -0
- prompt_better-0.1.0/prompt_better/prompt_json/codegen.py +20 -0
- prompt_better-0.1.0/prompt_better/prompt_json/dspy_converter.py +125 -0
- prompt_better-0.1.0/prompt_better/prompt_json/generator.py +35 -0
- prompt_better-0.1.0/prompt_better/prompt_json/loader.py +44 -0
- prompt_better-0.1.0/prompt_better/prompt_json/models.py +142 -0
- prompt_better-0.1.0/prompt_better/prompt_json/prompt-schema.json +122 -0
- prompt_better-0.1.0/prompt_better/prompt_json/templates/swift.jinja2 +40 -0
- prompt_better-0.1.0/prompt_better.egg-info/PKG-INFO +600 -0
- prompt_better-0.1.0/prompt_better.egg-info/SOURCES.txt +37 -0
- prompt_better-0.1.0/prompt_better.egg-info/dependency_links.txt +1 -0
- prompt_better-0.1.0/prompt_better.egg-info/entry_points.txt +2 -0
- prompt_better-0.1.0/prompt_better.egg-info/requires.txt +6 -0
- prompt_better-0.1.0/prompt_better.egg-info/top_level.txt +1 -0
- prompt_better-0.1.0/pyproject.toml +72 -0
- prompt_better-0.1.0/setup.cfg +4 -0
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Paul Hackenberger Contributors
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,600 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: prompt-better
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: DSPy-based prompt optimization prompts.
|
|
5
|
+
Author-email: Paul Hackenberger <paul.hackenberger@gmail.com>
|
|
6
|
+
License: MIT
|
|
7
|
+
Project-URL: Homepage, https://github.com/pkcpkc/prompt-better
|
|
8
|
+
Project-URL: Bug Tracker, https://github.com/pkcpkc/prompt-better/issues
|
|
9
|
+
Classifier: Programming Language :: Python :: 3
|
|
10
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
11
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
12
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
13
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
14
|
+
Classifier: Operating System :: OS Independent
|
|
15
|
+
Requires-Python: >=3.9
|
|
16
|
+
Description-Content-Type: text/markdown
|
|
17
|
+
License-File: LICENSE
|
|
18
|
+
Requires-Dist: dspy>=2.6.0
|
|
19
|
+
Requires-Dist: openai>=1.50.0
|
|
20
|
+
Requires-Dist: pydantic>=2.0.0
|
|
21
|
+
Requires-Dist: Jinja2>=3.1.0
|
|
22
|
+
Requires-Dist: beautifulsoup4>=4.12.0
|
|
23
|
+
Requires-Dist: requests>=2.31.0
|
|
24
|
+
Dynamic: license-file
|
|
25
|
+
|
|
26
|
+
# JSON-Driven Prompt Optimization Framework based on DSPy
|
|
27
|
+
|
|
28
|
+
[](https://pypi.org/project/prompt-better/)
|
|
29
|
+
[](https://opensource.org/licenses/MIT)
|
|
30
|
+
[](https://pypi.org/project/prompt-better/)
|
|
31
|
+
|
|
32
|
+
**_ANYTHING YOU CAN PROMPT, I CAN PROMPT BETTER!_**
|
|
33
|
+
|
|
34
|
+
---
|
|
35
|
+
|
|
36
|
+
## 1. What This Is About
|
|
37
|
+
|
|
38
|
+
`prompt-better` is a generic, reusable, and platform-agnostic framework designed to validate, execute, and optimize Large Language Model (LLM) prompts. Instead of hardcoding prompts inside application source files, prompts are defined in a language-agnostic data asset (`prompt.json`), establishing a **Single Source of Truth (SSOT)**.
|
|
39
|
+
|
|
40
|
+
### Built on Top of DSPy & Pydantic
|
|
41
|
+
At runtime, `prompt-better` parses the JSON specification and builds type-safe Pydantic models on-the-fly. It automatically generates a corresponding **DSPy Signature** (`Inputs -> PydanticSchemaClass`), mapping unstructured model outputs into strictly validated objects.
|
|
42
|
+
|
|
43
|
+
### Coached Student-Teacher Model Paradigm
|
|
44
|
+
To support resource-constrained target devices (like local 3B–8B weights), the framework uses a **Coached Student-Teacher Pipeline**:
|
|
45
|
+
* **Teacher Model**: A high-capacity cloud model (e.g., GPT-4o) that drafts prompt/instruction variations, synthesizes few-shot demonstrations, and evaluates execution quality against custom rubrics.
|
|
46
|
+
* **Student Model**: The target local or on-device model being optimized. The student executes prompt candidates during the compilation loop.
|
|
47
|
+
|
|
48
|
+
### iOS and macOS On-Device Foundation Models
|
|
49
|
+
`prompt-better` integrates natively with Apple's local **LanguageModelSession** API (iOS 26+ and macOS 26+) using Vapor-based HTTP bridges:
|
|
50
|
+
* [macOS Vapor Bridge](AIBridges/macOS): Exposes Apple's on-device foundation models as an OpenAI-compatible REST API.
|
|
51
|
+
* [iOS Vapor Bridge](AIBridges/iOS): Runs directly on physical iOS devices or simulators to serve local model weights.
|
|
52
|
+
|
|
53
|
+
This allows the optimization engine to tune prompts directly for the specific hardware, quantization limits, and quirks of Apple's on-device silicon runtimes.
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## 2. Short Example for Optimize
|
|
58
|
+
|
|
59
|
+
Follow these steps to optimize instructions for the sample `TopicClassifierPrompt`. A concrete reference implementation can be found in the [example/](example/) folder, which contains the [TopicClassifier](example/prompts/TopicClassifier) example prompt specification, its dataset, and automated scripts.
|
|
60
|
+
|
|
61
|
+
### Step 1: Set Up Python Environment
|
|
62
|
+
Install `prompt-better` in editable mode using `uv` and trust python runtimes configured in `mise.toml`:
|
|
63
|
+
```bash
|
|
64
|
+
# Trust and install local python versions
|
|
65
|
+
mise trust && mise install
|
|
66
|
+
|
|
67
|
+
# Install prompt-better locally
|
|
68
|
+
mise exec -- uv pip install -e .
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Step 2: Start Student Vapor Bridge (Example: macOS Bridge)
|
|
72
|
+
Build and run the Apple Silicon bridge server (translates on-device models to `/v1/chat/completions` endpoints):
|
|
73
|
+
```bash
|
|
74
|
+
cd AIBridges/macOS
|
|
75
|
+
swift build && swift run App serve --hostname 127.0.0.1 --port 8080
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
### Step 3: Setup Configuration & Credentials
|
|
79
|
+
|
|
80
|
+
The framework resolves configurations hierarchically: CLI arguments > Environment variables > Configuration files (`prompt-better.json`).
|
|
81
|
+
|
|
82
|
+
1. **Configuration File (`prompt-better.json`)**:
|
|
83
|
+
Create a `prompt-better.json` file in the parent folder of your prompts directory (for this example, `example/prompt-better.json`). This defines student/teacher runtimes and defaults:
|
|
84
|
+
```json
|
|
85
|
+
{
|
|
86
|
+
"student": {
|
|
87
|
+
"base_url": "http://127.0.0.1:8080/v1",
|
|
88
|
+
"model": "apple-intelligence",
|
|
89
|
+
"temperature": 0.2
|
|
90
|
+
},
|
|
91
|
+
"teacher": {
|
|
92
|
+
"base_url": "https://api.openai.com/v1",
|
|
93
|
+
"model": "gpt-4o",
|
|
94
|
+
"temperature": 0.2,
|
|
95
|
+
"eval_temperature": 0.0
|
|
96
|
+
},
|
|
97
|
+
"auto_mode": "light",
|
|
98
|
+
"num_threads": 1
|
|
99
|
+
}
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
2. **Credentials (API Keys)**:
|
|
103
|
+
> [!IMPORTANT]
|
|
104
|
+
> For security, API keys **cannot** be stored in the `prompt-better.json` configuration file. Doing so will trigger a validation error.
|
|
105
|
+
|
|
106
|
+
Provide API keys using either environment variables or direct CLI arguments:
|
|
107
|
+
* **Via Environment Variables** (Recommended):
|
|
108
|
+
```bash
|
|
109
|
+
export PROMPT_BETTER_TEACHER_API_KEY="sk-proj-..."
|
|
110
|
+
```
|
|
111
|
+
* **Via CLI Flags**:
|
|
112
|
+
Pass `--teacher-api-key "sk-proj-..."` directly during command execution.
|
|
113
|
+
|
|
114
|
+
### Step 4: Run DSPy Optimization
|
|
115
|
+
Run the `optimize` command. Pass `--no-requires-permission-to-run` to bypass estimated token cost warnings when targeting free local endpoints:
|
|
116
|
+
```bash
|
|
117
|
+
python3 -m prompt_better.cli optimize \
|
|
118
|
+
--prompts-dir example/prompts \
|
|
119
|
+
--prompt TopicClassifierPrompt \
|
|
120
|
+
--no-requires-permission-to-run
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
During execution, the DSPy `MIPROv2` compiler will propose instruction re-writes via the Teacher model, run them on the Student bridge, and evaluate output quality.
|
|
124
|
+
|
|
125
|
+
The optimization output is written to:
|
|
126
|
+
* `results/optimized-prompt.json`: Optimized prompt with winning instructions.
|
|
127
|
+
* `results/optimize-report.json`: Metrics report and validation summary.
|
|
128
|
+
* `results/dspy.json`: Serialized DSPy compile state.
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
## 3. Subcommand `validate`
|
|
133
|
+
|
|
134
|
+
The `validate` subcommand evaluates the **status quo** of your prompt's baseline instructions against a target dataset. It does not perform instruction rewriting or few-shot compilation. Instead, it measures how well the target Student model conforms to structure and accuracy guidelines under the current prompt instructions.
|
|
135
|
+
|
|
136
|
+
> [!TIP]
|
|
137
|
+
> For a deeper conceptual foundation on evaluating AI systems, we recommend referring to the book **"AI Engineering - Building Application with Foundation Models"** by **Chip Huyen** (O'Reilly), specifically **Chapter 4. Evaluate AI Systems**.
|
|
138
|
+
|
|
139
|
+
### Mathematical Scoring Formulas
|
|
140
|
+
|
|
141
|
+
For each evaluation case, the candidate output is rated between `0.0` and `1.0` using weighted scores. Since a Teacher Model is required, the validator fetches a semantic grade from the teacher and averages it with structural and similarity scores:
|
|
142
|
+
|
|
143
|
+
$$\text{Aggregate Score} = \frac{(0.55 \times S_{\text{structural}} + 0.45 \times S_{\text{similarity}}) + S_{\text{teacher}}}{2}$$
|
|
144
|
+
|
|
145
|
+
### Scoring Metrics & Code References
|
|
146
|
+
|
|
147
|
+
The scores are resolved in code inside [evaluator.py](prompt_better/dspy_manager/evaluator.py) via a resolved `Evaluator` instance (by default, `DefaultEvaluator` inherits from `BaseEvaluator`):
|
|
148
|
+
|
|
149
|
+
1. **Validation Loop**: [validate_prompt](prompt_better/dspy_manager/optimizer.py) iterates through prompt files and gathers results using `_validate_single_example`.
|
|
150
|
+
2. **Structural Score ($S_{\text{structural}}$)**: Calculated in `structural_score`. It verifies:
|
|
151
|
+
* Fields map precisely to target JSON schema types.
|
|
152
|
+
* Array counts match specified boundaries (e.g. `min_count`, `max_count`).
|
|
153
|
+
3. **Similarity Score ($S_{\text{similarity}}$)**: Calculated in `similarity_score`. It measures token-level F1 overlap (precision and recall of matching tokens) between the generated values and the golden-truth references.
|
|
154
|
+
4. **Teacher Score ($S_{\text{teacher}}$)**: Resolved via `teacher_score`. The Teacher model receives a structured grading schema containing the prompt instructions, inputs, candidate output, reference output, and quality rubric. It responds with a numeric score (`0.0` to `1.0`) and a text justification.
|
|
155
|
+
|
|
156
|
+
### Custom Evaluator Implementations
|
|
157
|
+
|
|
158
|
+
You can customize the evaluation and scoring by providing your own evaluator subclassing `BaseEvaluator`:
|
|
159
|
+
|
|
160
|
+
```python
|
|
161
|
+
from prompt_better.dspy_manager import BaseEvaluator
|
|
162
|
+
|
|
163
|
+
class CustomEvaluator(BaseEvaluator):
|
|
164
|
+
def structural_score(self, spec, candidate) -> float:
|
|
165
|
+
# Custom structural scoring
|
|
166
|
+
return 1.0
|
|
167
|
+
|
|
168
|
+
def similarity_score(self, spec, reference, candidate) -> float:
|
|
169
|
+
# Custom similarity scoring
|
|
170
|
+
return 0.8
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
#### Setting the Custom Evaluator
|
|
174
|
+
|
|
175
|
+
You can configure the custom evaluator dynamically in three ways (resolved hierarchically):
|
|
176
|
+
|
|
177
|
+
1. **CLI Flag**: Specify `--evaluator path.to.module:CustomEvaluator` or file path `path/to/script.py:CustomEvaluator` (or simply `path/to/script.py` which auto-detects the subclass).
|
|
178
|
+
2. **Environment Variable**: `export PROMPT_BETTER_EVALUATOR="path.to.module:CustomEvaluator"`
|
|
179
|
+
3. **Global Config (`prompt-better.json`)**:
|
|
180
|
+
```json
|
|
181
|
+
{
|
|
182
|
+
"evaluator": "path.to.module:CustomEvaluator"
|
|
183
|
+
}
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
### Validation Flow Diagram
|
|
187
|
+
|
|
188
|
+
```mermaid
|
|
189
|
+
flowchart TD
|
|
190
|
+
Start["Start cli validate"] --> Scan["Scan --prompts-dir for prompt.json"]
|
|
191
|
+
Scan --> LoadDataset["Load dataset/ and golden-truth/ case cases"]
|
|
192
|
+
LoadDataset --> Iterate["For each test case..."]
|
|
193
|
+
Iterate --> BuildPayload["Compile prompt instructions + case inputs"]
|
|
194
|
+
BuildPayload --> RunStudent["Invoke Student Model via HTTP Bridge"]
|
|
195
|
+
RunStudent --> ParseOutput["Parse output JSON schema"]
|
|
196
|
+
ParseOutput --> CalcStruct["Calculate Structural Score (55% weight)"]
|
|
197
|
+
ParseOutput --> CalcSim["Calculate Token F1 Similarity (45% weight)"]
|
|
198
|
+
|
|
199
|
+
CalcStruct --> CallTeacher["Invoke Teacher Model to grade output against rubric"]
|
|
200
|
+
CalcSim --> CallTeacher
|
|
201
|
+
CallTeacher --> CalcTeacher["Average (Base Scores + Teacher Score)"]
|
|
202
|
+
CalcTeacher --> SaveResult["Save ValidationResult metrics"]
|
|
203
|
+
|
|
204
|
+
SaveResult --> CheckMore{More cases?}
|
|
205
|
+
CheckMore -- Yes --> Iterate
|
|
206
|
+
CheckMore -- No --> GenerateReport["Output JSON report & print summary stats"]
|
|
207
|
+
GenerateReport --> End["End validate"]
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
---
|
|
211
|
+
|
|
212
|
+
## 4. Subcommand `optimize`
|
|
213
|
+
|
|
214
|
+
The `optimize` subcommand applies per default the DSPy **MIPROv2** (Multi-prompt Instruction Proposal and Few-shot Optimization) compiler to find the best instructions and few-shot examples for your target student model.
|
|
215
|
+
|
|
216
|
+
> [!TIP]
|
|
217
|
+
> For a deeper conceptual foundation on engineering and compiling prompts, we recommend referring to the book **"AI Engineering - Building Application with Foundation Models"** by **Chip Huyen** (O'Reilly), specifically **Chapter 5. Prompt Engineering**.
|
|
218
|
+
|
|
219
|
+
> [!NOTE]
|
|
220
|
+
> By default, the final evaluation at the end of optimization runs on **all dataset cases** to print a complete baseline vs optimized comparison table. To run the final evaluation only on the held-out evaluation set (`evalset`) slice, specify the `--eval-cases-only` flag.
|
|
221
|
+
|
|
222
|
+
> [!IMPORTANT]
|
|
223
|
+
> **iOS & On-Device Model Recommendation**: If you compile your prompt specification to a Swift type conformant to `GenerableWithPrompt` (which utilizes Apple's native schema-guided structured output), you have two options for handling optimization:
|
|
224
|
+
>
|
|
225
|
+
> * **Option 1: Direct Prediction (Recommended for speed/simplicity)**: Optimize your prompt using `--optimizer predict` (or set `"optimizer": "predict"` in `prompt-better.json`). This compiles the prompt using `dspy.Predict` instead of the default `dspy.ChainOfThought` (which uses `"chain-of-thought"`). Because the default CoT generates formatting instructions instructing the model to output intermediate reasoning with text prefixes (e.g. `Reasoning:` and `Output:`), it conflicts with Apple's native structured JSON schema constraint (where no `reasoning` field exists), leading to parsing or validation errors. Running with `predict` compiles cleanly without these text prefixes.
|
|
226
|
+
>
|
|
227
|
+
> * **Option 2: Schema-guided Chain of Thought (Recommended for accuracy)**: If the target model needs step-by-step reasoning to deliver accurate outputs, you must explicitly model the reasoning field inside your `prompt.json` output schema:
|
|
228
|
+
> ```json
|
|
229
|
+
> "outputs": [
|
|
230
|
+
> {
|
|
231
|
+
> "name": "reasoning",
|
|
232
|
+
> "type": "string",
|
|
233
|
+
> "desc": "Explanation of the context based on domain-specific lexical cues."
|
|
234
|
+
> },
|
|
235
|
+
> {
|
|
236
|
+
> "name": "topic",
|
|
237
|
+
> "type": "string",
|
|
238
|
+
> "desc": "The final classified topic category."
|
|
239
|
+
> }
|
|
240
|
+
> ]
|
|
241
|
+
> ```
|
|
242
|
+
> When compiled, the generated Swift struct will contain both fields as `@Guide` properties:
|
|
243
|
+
> ```swift
|
|
244
|
+
> @Generable
|
|
245
|
+
> struct TopicClassifierPrompt: GenerableWithPrompt {
|
|
246
|
+
> @Guide(description: "Explanation of the context...")
|
|
247
|
+
> var reasoning: String
|
|
248
|
+
>
|
|
249
|
+
> @Guide(description: "The final classified topic category.")
|
|
250
|
+
> var topic: String
|
|
251
|
+
> }
|
|
252
|
+
> ```
|
|
253
|
+
> This aligns the prompt's reasoning instructions with the Swift schema structure, allowing Apple's native session to parse the intermediate reasoning step successfully.
|
|
254
|
+
|
|
255
|
+
|
|
256
|
+
|
|
257
|
+
### Optimization Workflow
|
|
258
|
+
1. **Splitting Dataset**: The command loads optimization cases and splits them into training and evaluation sets based on `--train-ratio` (default `0.8`).
|
|
259
|
+
2. **Generating Candidates (Teacher)**: The high-capacity Teacher model reads the baseline specifications, analyzes errors from initial runs, and generates instruction proposals (candidates).
|
|
260
|
+
3. **Compiling Few-Shot Demonstrations**: DSPy selects successful execution traces from the student model running on the training dataset to include as few-shot bootstrap examples in the compiled prompt.
|
|
261
|
+
4. **Evaluation Iteration**: Candidate instruction proposals are evaluated against the training and validation sets on the student model.
|
|
262
|
+
5. **Selecting the Winner**: The combination of instructions and few-shot demonstrations yielding the highest aggregate score is compiled and saved.
|
|
263
|
+
|
|
264
|
+
### Optimization Flow Diagram
|
|
265
|
+
|
|
266
|
+
```mermaid
|
|
267
|
+
flowchart TD
|
|
268
|
+
Start["Start cli optimize"] --> Load["Load specifications & split dataset (Train vs. Eval)"]
|
|
269
|
+
Load --> CreateSignature["Map prompt context & outputs to DSPy Signature"]
|
|
270
|
+
CreateSignature --> EvalBaseline["Run Student baseline to calculate baseline score"]
|
|
271
|
+
|
|
272
|
+
EvalBaseline --> InitMIPRO["Initialize DSPy MIPROv2 Optimizer"]
|
|
273
|
+
InitMIPRO --> TeacherGenerate["Teacher proposes new instruction variations"]
|
|
274
|
+
|
|
275
|
+
subgraph CompLoop ["Optimization Compilation Loop (num_trials)"]
|
|
276
|
+
TeacherGenerate --> CompileCandidate["Combine candidate instructions + few-shot traces"]
|
|
277
|
+
CompileCandidate --> ExecStudent["Execute candidates on Student model over Trainset"]
|
|
278
|
+
ExecStudent --> GradeCandidate["Calculate metric score (Structural + Similarity)"]
|
|
279
|
+
GradeCandidate --> SelectBest["Track best-performing prompt configuration"]
|
|
280
|
+
end
|
|
281
|
+
|
|
282
|
+
SelectBest --> TestWinner["Evaluate winning prompt against Evalset"]
|
|
283
|
+
TestWinner --> SaveDSPy["Serialize compiled weights to dspy.json"]
|
|
284
|
+
TestWinner --> SaveOptimized["Write prompt.json replacement to optimized-prompt.json"]
|
|
285
|
+
TestWinner --> WriteReport["Generate optimization optimize-report.json"]
|
|
286
|
+
|
|
287
|
+
WriteReport --> ApplyFlag{--apply flag set?}
|
|
288
|
+
ApplyFlag -- Yes --> OverwriteSource["Overwrite source prompt.json with winning instructions"]
|
|
289
|
+
ApplyFlag -- No --> End["End optimize"]
|
|
290
|
+
OverwriteSource --> End
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
### Custom Optimizer Implementations
|
|
294
|
+
|
|
295
|
+
By default, prompt optimization uses the DSPy `MIPROv2` compiler. You can customize the optimization and compilation process by providing your own optimizer subclassing `BaseOptimizer`:
|
|
296
|
+
|
|
297
|
+
```python
|
|
298
|
+
from prompt_better.dspy_manager import BaseOptimizer
|
|
299
|
+
|
|
300
|
+
class CustomOptimizer(BaseOptimizer):
|
|
301
|
+
def compile(
|
|
302
|
+
self,
|
|
303
|
+
config,
|
|
304
|
+
spec,
|
|
305
|
+
specs,
|
|
306
|
+
student_lm,
|
|
307
|
+
teacher_lm,
|
|
308
|
+
trainset,
|
|
309
|
+
evalset,
|
|
310
|
+
metric,
|
|
311
|
+
module,
|
|
312
|
+
):
|
|
313
|
+
# Implement custom compilation or training loops
|
|
314
|
+
...
|
|
315
|
+
return compiled_module
|
|
316
|
+
```
|
|
317
|
+
|
|
318
|
+
#### Setting the Custom Optimizer
|
|
319
|
+
|
|
320
|
+
You can configure the custom optimizer dynamically in three ways (resolved hierarchically):
|
|
321
|
+
|
|
322
|
+
1. **CLI Flag**: Specify `--optimizer path.to.module:CustomOptimizer` or file path `path/to/script.py:CustomOptimizer` (or simply `path/to/script.py` which auto-detects the subclass).
|
|
323
|
+
2. **Environment Variable**: `export PROMPT_BETTER_OPTIMIZER="path.to.module:CustomOptimizer"`
|
|
324
|
+
3. **Global Config (`prompt-better.json`)**:
|
|
325
|
+
```json
|
|
326
|
+
{
|
|
327
|
+
"optimizer": "path.to.module:CustomOptimizer"
|
|
328
|
+
}
|
|
329
|
+
```
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
## 5. JSON Specifications & Schemas
|
|
334
|
+
|
|
335
|
+
`prompt-better` uses three distinct JSON models.
|
|
336
|
+
|
|
337
|
+
### A. Prompt Specification (`prompt.json`)
|
|
338
|
+
Defines the name, model configs, dynamic placeholders (context), and structured outputs.
|
|
339
|
+
|
|
340
|
+
* **Schema Location**: [prompt-schema.json](prompt_better/prompt_json/prompt-schema.json)
|
|
341
|
+
* **JSON Schema**:
|
|
342
|
+
```json
|
|
343
|
+
{
|
|
344
|
+
"$schema": "http://json-schema.org/draft-07/schema#",
|
|
345
|
+
"title": "Prompt Definition",
|
|
346
|
+
"type": "object",
|
|
347
|
+
"properties": {
|
|
348
|
+
"name": { "type": "string", "description": "Unique identifier for the prompt." },
|
|
349
|
+
"instructions": {
|
|
350
|
+
"type": "object",
|
|
351
|
+
"properties": {
|
|
352
|
+
"prompt": { "type": "string", "description": "The system instructions or template for the model." },
|
|
353
|
+
"context": {
|
|
354
|
+
"type": "array",
|
|
355
|
+
"items": {
|
|
356
|
+
"type": "object",
|
|
357
|
+
"properties": {
|
|
358
|
+
"name": { "type": "string" },
|
|
359
|
+
"type": { "type": "string", "enum": ["string", "integer", "number", "boolean", "array"] },
|
|
360
|
+
"desc": { "type": "string" }
|
|
361
|
+
},
|
|
362
|
+
"required": ["name", "type", "desc"]
|
|
363
|
+
}
|
|
364
|
+
}
|
|
365
|
+
},
|
|
366
|
+
"required": ["prompt", "context"]
|
|
367
|
+
},
|
|
368
|
+
"config": {
|
|
369
|
+
"type": "object",
|
|
370
|
+
"properties": {
|
|
371
|
+
"model_id": { "type": "string" },
|
|
372
|
+
"temperature": { "type": "number" },
|
|
373
|
+
"top_p": { "type": "number" },
|
|
374
|
+
"top_k": { "type": "integer" },
|
|
375
|
+
"max_tokens": { "type": "integer" },
|
|
376
|
+
"stop_sequences": { "type": "array", "items": { "type": "string" } }
|
|
377
|
+
}
|
|
378
|
+
},
|
|
379
|
+
"outputs": {
|
|
380
|
+
"type": "array",
|
|
381
|
+
"items": {
|
|
382
|
+
"type": "object",
|
|
383
|
+
"properties": {
|
|
384
|
+
"name": { "type": "string" },
|
|
385
|
+
"type": { "type": "string", "enum": ["string", "integer", "number", "boolean", "array"] },
|
|
386
|
+
"items": { "type": "string" },
|
|
387
|
+
"desc": { "type": "string" }
|
|
388
|
+
},
|
|
389
|
+
"required": ["name", "type", "desc"]
|
|
390
|
+
}
|
|
391
|
+
}
|
|
392
|
+
},
|
|
393
|
+
"required": ["name", "instructions", "outputs"]
|
|
394
|
+
}
|
|
395
|
+
```
|
|
396
|
+
* **Example Specification**: [prompt.json](example/prompts/TopicClassifier/prompt.json)
|
|
397
|
+
```json
|
|
398
|
+
{
|
|
399
|
+
"name": "TopicClassifierPrompt",
|
|
400
|
+
"config": {
|
|
401
|
+
"temperature": 0.0,
|
|
402
|
+
"max_tokens": 100
|
|
403
|
+
},
|
|
404
|
+
"instructions": {
|
|
405
|
+
"prompt": "Classify the topic of the following text.\n\nText: {{text}}",
|
|
406
|
+
"context": [
|
|
407
|
+
{
|
|
408
|
+
"name": "text",
|
|
409
|
+
"type": "string",
|
|
410
|
+
"desc": "The raw input text to analyze."
|
|
411
|
+
}
|
|
412
|
+
]
|
|
413
|
+
},
|
|
414
|
+
"outputs": [
|
|
415
|
+
{
|
|
416
|
+
"name": "topic",
|
|
417
|
+
"type": "string",
|
|
418
|
+
"desc": "Must be one of: Politics, Sports, Technology, Science, Entertainment."
|
|
419
|
+
}
|
|
420
|
+
]
|
|
421
|
+
}
|
|
422
|
+
```
|
|
423
|
+
|
|
424
|
+
---
|
|
425
|
+
|
|
426
|
+
### B. Dataset Case Specification (`caseX.json`)
|
|
427
|
+
Defines the inputs mapped to prompt placeholders, and optional conversation history messages.
|
|
428
|
+
|
|
429
|
+
> [!TIP]
|
|
430
|
+
> For a deeper dive into dataset design and curation, we recommend reading **Chapter 8. Dataset Engineering** in the book **"AI Engineering - Building Application with Foundation Models"** by **Chip Huyen** (O'Reilly).
|
|
431
|
+
|
|
432
|
+
* **Schema Location**: [dataset-schema.json](prompt_better/dataset_manager/dataset-schema.json)
|
|
433
|
+
* **JSON Schema**:
|
|
434
|
+
```json
|
|
435
|
+
{
|
|
436
|
+
"$schema": "http://json-schema.org/draft-07/schema#",
|
|
437
|
+
"title": "Dataset Case Definition",
|
|
438
|
+
"type": "object",
|
|
439
|
+
"properties": {
|
|
440
|
+
"id": { "type": "string", "description": "Unique identifier for this test case." },
|
|
441
|
+
"inputs": {
|
|
442
|
+
"type": "object",
|
|
443
|
+
"additionalProperties": { "type": "string" },
|
|
444
|
+
"description": "Key-value mapping of input placeholders to values."
|
|
445
|
+
},
|
|
446
|
+
"history": {
|
|
447
|
+
"type": "array",
|
|
448
|
+
"items": {
|
|
449
|
+
"type": "object",
|
|
450
|
+
"properties": {
|
|
451
|
+
"role": { "type": "string", "enum": ["user", "assistant", "system"] },
|
|
452
|
+
"content": { "type": "string" },
|
|
453
|
+
"prompt_name": { "type": "string" },
|
|
454
|
+
"inputs": { "type": "object", "additionalProperties": { "type": "string" } }
|
|
455
|
+
},
|
|
456
|
+
"required": ["role"]
|
|
457
|
+
}
|
|
458
|
+
}
|
|
459
|
+
},
|
|
460
|
+
"required": ["inputs"]
|
|
461
|
+
}
|
|
462
|
+
```
|
|
463
|
+
* **Example Payload** (`dataset/case1.json`): [dataset/case1.json](example/prompts/TopicClassifier/dataset/case1.json)
|
|
464
|
+
```json
|
|
465
|
+
{
|
|
466
|
+
"id": "case1",
|
|
467
|
+
"inputs": {
|
|
468
|
+
"text": "Mars rover successfully collects rock sample."
|
|
469
|
+
}
|
|
470
|
+
}
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
---
|
|
474
|
+
|
|
475
|
+
### C. Golden Truth Reference Specification (`caseX.json` next to dataset cases)
|
|
476
|
+
Defines expected ground truth values and human-written grading rubrics.
|
|
477
|
+
|
|
478
|
+
* **Schema Location**: [golden-schema.json](prompt_better/dataset_manager/golden-schema.json)
|
|
479
|
+
* **JSON Schema**:
|
|
480
|
+
```json
|
|
481
|
+
{
|
|
482
|
+
"$schema": "http://json-schema.org/draft-07/schema#",
|
|
483
|
+
"title": "Golden Truth Definition",
|
|
484
|
+
"type": "object",
|
|
485
|
+
"properties": {
|
|
486
|
+
"id": { "type": "string", "description": "Unique identifier matching the test case." },
|
|
487
|
+
"reference_output": {
|
|
488
|
+
"type": "object",
|
|
489
|
+
"description": "Expected structured output key-value mapping."
|
|
490
|
+
},
|
|
491
|
+
"rubric": {
|
|
492
|
+
"type": "array",
|
|
493
|
+
"items": { "type": "string" },
|
|
494
|
+
"description": "Quality criteria for evaluation."
|
|
495
|
+
}
|
|
496
|
+
},
|
|
497
|
+
"required": ["reference_output"]
|
|
498
|
+
}
|
|
499
|
+
```
|
|
500
|
+
* **Example Payload** (`golden-truth/case1.json`): [golden-truth/case1.json](example/prompts/TopicClassifier/golden-truth/case1.json)
|
|
501
|
+
```json
|
|
502
|
+
{
|
|
503
|
+
"reference_output": {
|
|
504
|
+
"topic": "Science"
|
|
505
|
+
},
|
|
506
|
+
"rubric": [
|
|
507
|
+
"The output topic must be exactly Science."
|
|
508
|
+
]
|
|
509
|
+
}
|
|
510
|
+
```
|
|
511
|
+
|
|
512
|
+
---
|
|
513
|
+
|
|
514
|
+
## 6. References
|
|
515
|
+
|
|
516
|
+
### CLI Subcommands Reference
|
|
517
|
+
|
|
518
|
+
| Command | Description | Key Arguments |
|
|
519
|
+
| :--- | :--- | :--- |
|
|
520
|
+
| `list-prompts` | Scans for prompt specifications inside the target directory. | `--prompts-dir` |
|
|
521
|
+
| `preview-schema` | Emits the parsed Pydantic schema structure derived from the spec. | `--prompts-dir`, `--prompt` |
|
|
522
|
+
| `validate-spec` | Runs validator checks on prompt JSONs against `prompt-schema.json`. | `--prompts-dir` |
|
|
523
|
+
| `validate` | Runs evaluation cases against the Student model. | `--prompts-dir`, `--prompt`, `--dataset`, `--student-temperature`, `--teacher-eval-temperature` |
|
|
524
|
+
| `optimize` | Compiles and optimizes instructions via DSPy MIPROv2. | `--prompts-dir`, `--prompt`, `--dataset`, `--student-temperature`, `--teacher-temperature`, `--teacher-eval-temperature`, `--eval-cases-only`, `--optimizer` |
|
|
525
|
+
| `generate-golden-truth` | Generates placeholder schemas inside the target `golden-truth/` path. | `--prompts-dir`, `--dataset-dir`, `--prompt`, `--case-id`, `--teacher-temperature` |
|
|
526
|
+
| `generate` | Compiles Swift class structs conformant to `AIPromptCore`. | `--source`, `--target`, `--language swift` |
|
|
527
|
+
|
|
528
|
+
### Environment Variables
|
|
529
|
+
|
|
530
|
+
* `PROMPT_BETTER_STUDENT_BASE_URL`: API root for student completions (e.g. Vapor server: `http://localhost:8080/v1`).
|
|
531
|
+
* `PROMPT_BETTER_STUDENT_MODEL`: Model ID identifier (e.g. `apple-intelligence`).
|
|
532
|
+
* `PROMPT_BETTER_STUDENT_API_KEY`: Key used for authentication (optional/blank for localhost).
|
|
533
|
+
* `PROMPT_BETTER_STUDENT_TEMPERATURE`: Default temperature for student model completion calls (defaults to `0.2`).
|
|
534
|
+
* `PROMPT_BETTER_TEACHER_BASE_URL`: API root for the cloud teacher model (e.g. `https://api.openai.com/v1`).
|
|
535
|
+
* `PROMPT_BETTER_TEACHER_MODEL`: Teacher model ID (e.g. `gpt-4o`).
|
|
536
|
+
* `PROMPT_BETTER_TEACHER_API_KEY`: API token authorization key.
|
|
537
|
+
* `PROMPT_BETTER_TEACHER_TEMPERATURE`: General/MIPRO temperature for the teacher model when proposing prompt variations and creating samples (defaults to `0.2`).
|
|
538
|
+
* `PROMPT_BETTER_TEACHER_EVAL_TEMPERATURE`: Validation/eval temperature for the teacher model when grading/evaluating candidate outputs (defaults to `0.0` as recommended).
|
|
539
|
+
* `PROMPT_BETTER_OPTIMIZER`: Import path or file path to custom Optimizer class, or built-in modes: `"chain-of-thought"` (default) or `"predict"`.
|
|
540
|
+
|
|
541
|
+
### Scenario-Specific CLI Presets
|
|
542
|
+
|
|
543
|
+
#### On-Device & Local Silicon Testing
|
|
544
|
+
Run evaluations sequentially to avoid core contention, disable token budget validations, and disable Chain-of-Thought formatting constraints for schema-guided output targets:
|
|
545
|
+
```bash
|
|
546
|
+
python3 -m prompt_better.cli optimize \
|
|
547
|
+
--prompts-dir ./prompts \
|
|
548
|
+
--prompt MyPrompt \
|
|
549
|
+
--num-threads 1 \
|
|
550
|
+
--no-requires-permission-to-run \
|
|
551
|
+
--optimizer predict
|
|
552
|
+
```
|
|
553
|
+
|
|
554
|
+
|
|
555
|
+
#### Multi-threaded Cloud Pipelines
|
|
556
|
+
Speed up calls over remote APIs by increasing parallel compilation threads:
|
|
557
|
+
```bash
|
|
558
|
+
python3 -m prompt_better.cli optimize \
|
|
559
|
+
--prompts-dir ./prompts \
|
|
560
|
+
--prompt MyPrompt \
|
|
561
|
+
--num-threads 8 \
|
|
562
|
+
--auto medium
|
|
563
|
+
```
|
|
564
|
+
|
|
565
|
+
### Gradle Pipeline Reference
|
|
566
|
+
|
|
567
|
+
For developers using Gradle (e.g., Android, iOS, or Kotlin Multiplatform projects), a generic, reusable Gradle script plugin helper is available in [contrib/gradle/](contrib/gradle/):
|
|
568
|
+
* Refer to the [contrib/gradle/README.md](contrib/gradle/README.md) for setup and integration instructions.
|
|
569
|
+
* Once integrated, tasks like `promptBetterList`, `promptBetterValidate`, `promptBetterOptimize`, and `promptBetterGenerateSwift` will be available in your project's Gradle build pipeline.
|
|
570
|
+
|
|
571
|
+
### iOS Integration Setup (`AIPromptCore`)
|
|
572
|
+
See [AIPromptCore Framework](frameworks/AIPromptCore) for details.
|
|
573
|
+
|
|
574
|
+
> [!NOTE]
|
|
575
|
+
> Using the `AIPromptCore` framework is recommended to ensure exactly the same execution interface (parameters, parsing, and call structures) to the Apple foundational model as used during optimization. However, it is not strictly required; you can extract the optimized instruction text from the results JSON files and run them in any custom LLM setup.
|
|
576
|
+
|
|
577
|
+
1. Compile Swift targets to framework binaries:
|
|
578
|
+
```bash
|
|
579
|
+
cd frameworks/AIPromptCore && ./build_xcframework.sh
|
|
580
|
+
```
|
|
581
|
+
2. Link the binary or package reference into your application project (`Package.swift`):
|
|
582
|
+
```swift
|
|
583
|
+
.package(path: "path/to/prompt-better/frameworks/AIPromptCore")
|
|
584
|
+
```
|
|
585
|
+
3. Include generated `@Generable` Swift structs:
|
|
586
|
+
```bash
|
|
587
|
+
python3 -m prompt_better.cli generate \
|
|
588
|
+
--source example/prompts/TopicClassifier/results/optimized-prompt.json \
|
|
589
|
+
--target example/prompts/TopicClassifier/results/TopicClassifierPrompt.swift \
|
|
590
|
+
--language swift
|
|
591
|
+
```
|
|
592
|
+
|
|
593
|
+
### Books & Literature
|
|
594
|
+
|
|
595
|
+
* **AI Engineering - Building Application with Foundation Models** by *Chip Huyen* (O'Reilly)
|
|
596
|
+
* Chapter 4. Evaluate AI Systems
|
|
597
|
+
* Chapter 5. Prompt Engineering
|
|
598
|
+
* Chapter 8. Dataset Engineering
|
|
599
|
+
|
|
600
|
+
|