prompt-better 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. prompt_better-0.1.0/LICENSE +21 -0
  2. prompt_better-0.1.0/PKG-INFO +600 -0
  3. prompt_better-0.1.0/README.md +575 -0
  4. prompt_better-0.1.0/prompt_better/__init__.py +1 -0
  5. prompt_better-0.1.0/prompt_better/cli.py +292 -0
  6. prompt_better-0.1.0/prompt_better/config.py +270 -0
  7. prompt_better-0.1.0/prompt_better/dataset_manager/__init__.py +12 -0
  8. prompt_better-0.1.0/prompt_better/dataset_manager/dataset-schema.json +47 -0
  9. prompt_better-0.1.0/prompt_better/dataset_manager/golden-schema.json +23 -0
  10. prompt_better-0.1.0/prompt_better/dataset_manager/golden_generator.py +82 -0
  11. prompt_better-0.1.0/prompt_better/dataset_manager/loader.py +129 -0
  12. prompt_better-0.1.0/prompt_better/dataset_manager/metrics.py +41 -0
  13. prompt_better-0.1.0/prompt_better/dataset_manager/models.py +20 -0
  14. prompt_better-0.1.0/prompt_better/dspy_manager/__init__.py +7 -0
  15. prompt_better-0.1.0/prompt_better/dspy_manager/evaluator.py +237 -0
  16. prompt_better-0.1.0/prompt_better/dspy_manager/fallbacks.py +115 -0
  17. prompt_better-0.1.0/prompt_better/dspy_manager/models.py +49 -0
  18. prompt_better-0.1.0/prompt_better/dspy_manager/openai_structured.py +272 -0
  19. prompt_better-0.1.0/prompt_better/dspy_manager/optimizer.py +508 -0
  20. prompt_better-0.1.0/prompt_better/dspy_manager/optimizers.py +135 -0
  21. prompt_better-0.1.0/prompt_better/dspy_manager/reporter.py +122 -0
  22. prompt_better-0.1.0/prompt_better/dspy_manager/templates/prompt_summary.j2 +49 -0
  23. prompt_better-0.1.0/prompt_better/dspy_manager/utils.py +29 -0
  24. prompt_better-0.1.0/prompt_better/prompt_json/__init__.py +12 -0
  25. prompt_better-0.1.0/prompt_better/prompt_json/codegen.py +20 -0
  26. prompt_better-0.1.0/prompt_better/prompt_json/dspy_converter.py +125 -0
  27. prompt_better-0.1.0/prompt_better/prompt_json/generator.py +35 -0
  28. prompt_better-0.1.0/prompt_better/prompt_json/loader.py +44 -0
  29. prompt_better-0.1.0/prompt_better/prompt_json/models.py +142 -0
  30. prompt_better-0.1.0/prompt_better/prompt_json/prompt-schema.json +122 -0
  31. prompt_better-0.1.0/prompt_better/prompt_json/templates/swift.jinja2 +40 -0
  32. prompt_better-0.1.0/prompt_better.egg-info/PKG-INFO +600 -0
  33. prompt_better-0.1.0/prompt_better.egg-info/SOURCES.txt +37 -0
  34. prompt_better-0.1.0/prompt_better.egg-info/dependency_links.txt +1 -0
  35. prompt_better-0.1.0/prompt_better.egg-info/entry_points.txt +2 -0
  36. prompt_better-0.1.0/prompt_better.egg-info/requires.txt +6 -0
  37. prompt_better-0.1.0/prompt_better.egg-info/top_level.txt +1 -0
  38. prompt_better-0.1.0/pyproject.toml +72 -0
  39. prompt_better-0.1.0/setup.cfg +4 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Paul Hackenberger Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,600 @@
1
+ Metadata-Version: 2.4
2
+ Name: prompt-better
3
+ Version: 0.1.0
4
+ Summary: DSPy-based prompt optimization prompts.
5
+ Author-email: Paul Hackenberger <paul.hackenberger@gmail.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/pkcpkc/prompt-better
8
+ Project-URL: Bug Tracker, https://github.com/pkcpkc/prompt-better/issues
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Programming Language :: Python :: 3.9
11
+ Classifier: Programming Language :: Python :: 3.10
12
+ Classifier: Programming Language :: Python :: 3.11
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Operating System :: OS Independent
15
+ Requires-Python: >=3.9
16
+ Description-Content-Type: text/markdown
17
+ License-File: LICENSE
18
+ Requires-Dist: dspy>=2.6.0
19
+ Requires-Dist: openai>=1.50.0
20
+ Requires-Dist: pydantic>=2.0.0
21
+ Requires-Dist: Jinja2>=3.1.0
22
+ Requires-Dist: beautifulsoup4>=4.12.0
23
+ Requires-Dist: requests>=2.31.0
24
+ Dynamic: license-file
25
+
26
+ # JSON-Driven Prompt Optimization Framework based on DSPy
27
+
28
+ [![PyPI version](https://img.shields.io/pypi/v/prompt-better.svg)](https://pypi.org/project/prompt-better/)
29
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
30
+ [![Python Support](https://img.shields.io/pypi/pyversions/prompt-better.svg)](https://pypi.org/project/prompt-better/)
31
+
32
+ **_ANYTHING YOU CAN PROMPT, I CAN PROMPT BETTER!_**
33
+
34
+ ---
35
+
36
+ ## 1. What This Is About
37
+
38
+ `prompt-better` is a generic, reusable, and platform-agnostic framework designed to validate, execute, and optimize Large Language Model (LLM) prompts. Instead of hardcoding prompts inside application source files, prompts are defined in a language-agnostic data asset (`prompt.json`), establishing a **Single Source of Truth (SSOT)**.
39
+
40
+ ### Built on Top of DSPy & Pydantic
41
+ At runtime, `prompt-better` parses the JSON specification and builds type-safe Pydantic models on-the-fly. It automatically generates a corresponding **DSPy Signature** (`Inputs -> PydanticSchemaClass`), mapping unstructured model outputs into strictly validated objects.
42
+
43
+ ### Coached Student-Teacher Model Paradigm
44
+ To support resource-constrained target devices (like local 3B–8B weights), the framework uses a **Coached Student-Teacher Pipeline**:
45
+ * **Teacher Model**: A high-capacity cloud model (e.g., GPT-4o) that drafts prompt/instruction variations, synthesizes few-shot demonstrations, and evaluates execution quality against custom rubrics.
46
+ * **Student Model**: The target local or on-device model being optimized. The student executes prompt candidates during the compilation loop.
47
+
48
+ ### iOS and macOS On-Device Foundation Models
49
+ `prompt-better` integrates natively with Apple's local **LanguageModelSession** API (iOS 26+ and macOS 26+) using Vapor-based HTTP bridges:
50
+ * [macOS Vapor Bridge](AIBridges/macOS): Exposes Apple's on-device foundation models as an OpenAI-compatible REST API.
51
+ * [iOS Vapor Bridge](AIBridges/iOS): Runs directly on physical iOS devices or simulators to serve local model weights.
52
+
53
+ This allows the optimization engine to tune prompts directly for the specific hardware, quantization limits, and quirks of Apple's on-device silicon runtimes.
54
+
55
+ ---
56
+
57
+ ## 2. Short Example for Optimize
58
+
59
+ Follow these steps to optimize instructions for the sample `TopicClassifierPrompt`. A concrete reference implementation can be found in the [example/](example/) folder, which contains the [TopicClassifier](example/prompts/TopicClassifier) example prompt specification, its dataset, and automated scripts.
60
+
61
+ ### Step 1: Set Up Python Environment
62
+ Install `prompt-better` in editable mode using `uv` and trust python runtimes configured in `mise.toml`:
63
+ ```bash
64
+ # Trust and install local python versions
65
+ mise trust && mise install
66
+
67
+ # Install prompt-better locally
68
+ mise exec -- uv pip install -e .
69
+ ```
70
+
71
+ ### Step 2: Start Student Vapor Bridge (Example: macOS Bridge)
72
+ Build and run the Apple Silicon bridge server (translates on-device models to `/v1/chat/completions` endpoints):
73
+ ```bash
74
+ cd AIBridges/macOS
75
+ swift build && swift run App serve --hostname 127.0.0.1 --port 8080
76
+ ```
77
+
78
+ ### Step 3: Setup Configuration & Credentials
79
+
80
+ The framework resolves configurations hierarchically: CLI arguments > Environment variables > Configuration files (`prompt-better.json`).
81
+
82
+ 1. **Configuration File (`prompt-better.json`)**:
83
+ Create a `prompt-better.json` file in the parent folder of your prompts directory (for this example, `example/prompt-better.json`). This defines student/teacher runtimes and defaults:
84
+ ```json
85
+ {
86
+ "student": {
87
+ "base_url": "http://127.0.0.1:8080/v1",
88
+ "model": "apple-intelligence",
89
+ "temperature": 0.2
90
+ },
91
+ "teacher": {
92
+ "base_url": "https://api.openai.com/v1",
93
+ "model": "gpt-4o",
94
+ "temperature": 0.2,
95
+ "eval_temperature": 0.0
96
+ },
97
+ "auto_mode": "light",
98
+ "num_threads": 1
99
+ }
100
+ ```
101
+
102
+ 2. **Credentials (API Keys)**:
103
+ > [!IMPORTANT]
104
+ > For security, API keys **cannot** be stored in the `prompt-better.json` configuration file. Doing so will trigger a validation error.
105
+
106
+ Provide API keys using either environment variables or direct CLI arguments:
107
+ * **Via Environment Variables** (Recommended):
108
+ ```bash
109
+ export PROMPT_BETTER_TEACHER_API_KEY="sk-proj-..."
110
+ ```
111
+ * **Via CLI Flags**:
112
+ Pass `--teacher-api-key "sk-proj-..."` directly during command execution.
113
+
114
+ ### Step 4: Run DSPy Optimization
115
+ Run the `optimize` command. Pass `--no-requires-permission-to-run` to bypass estimated token cost warnings when targeting free local endpoints:
116
+ ```bash
117
+ python3 -m prompt_better.cli optimize \
118
+ --prompts-dir example/prompts \
119
+ --prompt TopicClassifierPrompt \
120
+ --no-requires-permission-to-run
121
+ ```
122
+
123
+ During execution, the DSPy `MIPROv2` compiler will propose instruction re-writes via the Teacher model, run them on the Student bridge, and evaluate output quality.
124
+
125
+ The optimization output is written to:
126
+ * `results/optimized-prompt.json`: Optimized prompt with winning instructions.
127
+ * `results/optimize-report.json`: Metrics report and validation summary.
128
+ * `results/dspy.json`: Serialized DSPy compile state.
129
+
130
+ ---
131
+
132
+ ## 3. Subcommand `validate`
133
+
134
+ The `validate` subcommand evaluates the **status quo** of your prompt's baseline instructions against a target dataset. It does not perform instruction rewriting or few-shot compilation. Instead, it measures how well the target Student model conforms to structure and accuracy guidelines under the current prompt instructions.
135
+
136
+ > [!TIP]
137
+ > For a deeper conceptual foundation on evaluating AI systems, we recommend referring to the book **"AI Engineering - Building Application with Foundation Models"** by **Chip Huyen** (O'Reilly), specifically **Chapter 4. Evaluate AI Systems**.
138
+
139
+ ### Mathematical Scoring Formulas
140
+
141
+ For each evaluation case, the candidate output is rated between `0.0` and `1.0` using weighted scores. Since a Teacher Model is required, the validator fetches a semantic grade from the teacher and averages it with structural and similarity scores:
142
+
143
+ $$\text{Aggregate Score} = \frac{(0.55 \times S_{\text{structural}} + 0.45 \times S_{\text{similarity}}) + S_{\text{teacher}}}{2}$$
144
+
145
+ ### Scoring Metrics & Code References
146
+
147
+ The scores are resolved in code inside [evaluator.py](prompt_better/dspy_manager/evaluator.py) via a resolved `Evaluator` instance (by default, `DefaultEvaluator` inherits from `BaseEvaluator`):
148
+
149
+ 1. **Validation Loop**: [validate_prompt](prompt_better/dspy_manager/optimizer.py) iterates through prompt files and gathers results using `_validate_single_example`.
150
+ 2. **Structural Score ($S_{\text{structural}}$)**: Calculated in `structural_score`. It verifies:
151
+ * Fields map precisely to target JSON schema types.
152
+ * Array counts match specified boundaries (e.g. `min_count`, `max_count`).
153
+ 3. **Similarity Score ($S_{\text{similarity}}$)**: Calculated in `similarity_score`. It measures token-level F1 overlap (precision and recall of matching tokens) between the generated values and the golden-truth references.
154
+ 4. **Teacher Score ($S_{\text{teacher}}$)**: Resolved via `teacher_score`. The Teacher model receives a structured grading schema containing the prompt instructions, inputs, candidate output, reference output, and quality rubric. It responds with a numeric score (`0.0` to `1.0`) and a text justification.
155
+
156
+ ### Custom Evaluator Implementations
157
+
158
+ You can customize the evaluation and scoring by providing your own evaluator subclassing `BaseEvaluator`:
159
+
160
+ ```python
161
+ from prompt_better.dspy_manager import BaseEvaluator
162
+
163
+ class CustomEvaluator(BaseEvaluator):
164
+ def structural_score(self, spec, candidate) -> float:
165
+ # Custom structural scoring
166
+ return 1.0
167
+
168
+ def similarity_score(self, spec, reference, candidate) -> float:
169
+ # Custom similarity scoring
170
+ return 0.8
171
+ ```
172
+
173
+ #### Setting the Custom Evaluator
174
+
175
+ You can configure the custom evaluator dynamically in three ways (resolved hierarchically):
176
+
177
+ 1. **CLI Flag**: Specify `--evaluator path.to.module:CustomEvaluator` or file path `path/to/script.py:CustomEvaluator` (or simply `path/to/script.py` which auto-detects the subclass).
178
+ 2. **Environment Variable**: `export PROMPT_BETTER_EVALUATOR="path.to.module:CustomEvaluator"`
179
+ 3. **Global Config (`prompt-better.json`)**:
180
+ ```json
181
+ {
182
+ "evaluator": "path.to.module:CustomEvaluator"
183
+ }
184
+ ```
185
+
186
+ ### Validation Flow Diagram
187
+
188
+ ```mermaid
189
+ flowchart TD
190
+ Start["Start cli validate"] --> Scan["Scan --prompts-dir for prompt.json"]
191
+ Scan --> LoadDataset["Load dataset/ and golden-truth/ case cases"]
192
+ LoadDataset --> Iterate["For each test case..."]
193
+ Iterate --> BuildPayload["Compile prompt instructions + case inputs"]
194
+ BuildPayload --> RunStudent["Invoke Student Model via HTTP Bridge"]
195
+ RunStudent --> ParseOutput["Parse output JSON schema"]
196
+ ParseOutput --> CalcStruct["Calculate Structural Score (55% weight)"]
197
+ ParseOutput --> CalcSim["Calculate Token F1 Similarity (45% weight)"]
198
+
199
+ CalcStruct --> CallTeacher["Invoke Teacher Model to grade output against rubric"]
200
+ CalcSim --> CallTeacher
201
+ CallTeacher --> CalcTeacher["Average (Base Scores + Teacher Score)"]
202
+ CalcTeacher --> SaveResult["Save ValidationResult metrics"]
203
+
204
+ SaveResult --> CheckMore{More cases?}
205
+ CheckMore -- Yes --> Iterate
206
+ CheckMore -- No --> GenerateReport["Output JSON report & print summary stats"]
207
+ GenerateReport --> End["End validate"]
208
+ ```
209
+
210
+ ---
211
+
212
+ ## 4. Subcommand `optimize`
213
+
214
+ The `optimize` subcommand applies per default the DSPy **MIPROv2** (Multi-prompt Instruction Proposal and Few-shot Optimization) compiler to find the best instructions and few-shot examples for your target student model.
215
+
216
+ > [!TIP]
217
+ > For a deeper conceptual foundation on engineering and compiling prompts, we recommend referring to the book **"AI Engineering - Building Application with Foundation Models"** by **Chip Huyen** (O'Reilly), specifically **Chapter 5. Prompt Engineering**.
218
+
219
+ > [!NOTE]
220
+ > By default, the final evaluation at the end of optimization runs on **all dataset cases** to print a complete baseline vs optimized comparison table. To run the final evaluation only on the held-out evaluation set (`evalset`) slice, specify the `--eval-cases-only` flag.
221
+
222
+ > [!IMPORTANT]
223
+ > **iOS & On-Device Model Recommendation**: If you compile your prompt specification to a Swift type conformant to `GenerableWithPrompt` (which utilizes Apple's native schema-guided structured output), you have two options for handling optimization:
224
+ >
225
+ > * **Option 1: Direct Prediction (Recommended for speed/simplicity)**: Optimize your prompt using `--optimizer predict` (or set `"optimizer": "predict"` in `prompt-better.json`). This compiles the prompt using `dspy.Predict` instead of the default `dspy.ChainOfThought` (which uses `"chain-of-thought"`). Because the default CoT generates formatting instructions instructing the model to output intermediate reasoning with text prefixes (e.g. `Reasoning:` and `Output:`), it conflicts with Apple's native structured JSON schema constraint (where no `reasoning` field exists), leading to parsing or validation errors. Running with `predict` compiles cleanly without these text prefixes.
226
+ >
227
+ > * **Option 2: Schema-guided Chain of Thought (Recommended for accuracy)**: If the target model needs step-by-step reasoning to deliver accurate outputs, you must explicitly model the reasoning field inside your `prompt.json` output schema:
228
+ > ```json
229
+ > "outputs": [
230
+ > {
231
+ > "name": "reasoning",
232
+ > "type": "string",
233
+ > "desc": "Explanation of the context based on domain-specific lexical cues."
234
+ > },
235
+ > {
236
+ > "name": "topic",
237
+ > "type": "string",
238
+ > "desc": "The final classified topic category."
239
+ > }
240
+ > ]
241
+ > ```
242
+ > When compiled, the generated Swift struct will contain both fields as `@Guide` properties:
243
+ > ```swift
244
+ > @Generable
245
+ > struct TopicClassifierPrompt: GenerableWithPrompt {
246
+ > @Guide(description: "Explanation of the context...")
247
+ > var reasoning: String
248
+ >
249
+ > @Guide(description: "The final classified topic category.")
250
+ > var topic: String
251
+ > }
252
+ > ```
253
+ > This aligns the prompt's reasoning instructions with the Swift schema structure, allowing Apple's native session to parse the intermediate reasoning step successfully.
254
+
255
+
256
+
257
+ ### Optimization Workflow
258
+ 1. **Splitting Dataset**: The command loads optimization cases and splits them into training and evaluation sets based on `--train-ratio` (default `0.8`).
259
+ 2. **Generating Candidates (Teacher)**: The high-capacity Teacher model reads the baseline specifications, analyzes errors from initial runs, and generates instruction proposals (candidates).
260
+ 3. **Compiling Few-Shot Demonstrations**: DSPy selects successful execution traces from the student model running on the training dataset to include as few-shot bootstrap examples in the compiled prompt.
261
+ 4. **Evaluation Iteration**: Candidate instruction proposals are evaluated against the training and validation sets on the student model.
262
+ 5. **Selecting the Winner**: The combination of instructions and few-shot demonstrations yielding the highest aggregate score is compiled and saved.
263
+
264
+ ### Optimization Flow Diagram
265
+
266
+ ```mermaid
267
+ flowchart TD
268
+ Start["Start cli optimize"] --> Load["Load specifications & split dataset (Train vs. Eval)"]
269
+ Load --> CreateSignature["Map prompt context & outputs to DSPy Signature"]
270
+ CreateSignature --> EvalBaseline["Run Student baseline to calculate baseline score"]
271
+
272
+ EvalBaseline --> InitMIPRO["Initialize DSPy MIPROv2 Optimizer"]
273
+ InitMIPRO --> TeacherGenerate["Teacher proposes new instruction variations"]
274
+
275
+ subgraph CompLoop ["Optimization Compilation Loop (num_trials)"]
276
+ TeacherGenerate --> CompileCandidate["Combine candidate instructions + few-shot traces"]
277
+ CompileCandidate --> ExecStudent["Execute candidates on Student model over Trainset"]
278
+ ExecStudent --> GradeCandidate["Calculate metric score (Structural + Similarity)"]
279
+ GradeCandidate --> SelectBest["Track best-performing prompt configuration"]
280
+ end
281
+
282
+ SelectBest --> TestWinner["Evaluate winning prompt against Evalset"]
283
+ TestWinner --> SaveDSPy["Serialize compiled weights to dspy.json"]
284
+ TestWinner --> SaveOptimized["Write prompt.json replacement to optimized-prompt.json"]
285
+ TestWinner --> WriteReport["Generate optimization optimize-report.json"]
286
+
287
+ WriteReport --> ApplyFlag{--apply flag set?}
288
+ ApplyFlag -- Yes --> OverwriteSource["Overwrite source prompt.json with winning instructions"]
289
+ ApplyFlag -- No --> End["End optimize"]
290
+ OverwriteSource --> End
291
+ ```
292
+
293
+ ### Custom Optimizer Implementations
294
+
295
+ By default, prompt optimization uses the DSPy `MIPROv2` compiler. You can customize the optimization and compilation process by providing your own optimizer subclassing `BaseOptimizer`:
296
+
297
+ ```python
298
+ from prompt_better.dspy_manager import BaseOptimizer
299
+
300
+ class CustomOptimizer(BaseOptimizer):
301
+ def compile(
302
+ self,
303
+ config,
304
+ spec,
305
+ specs,
306
+ student_lm,
307
+ teacher_lm,
308
+ trainset,
309
+ evalset,
310
+ metric,
311
+ module,
312
+ ):
313
+ # Implement custom compilation or training loops
314
+ ...
315
+ return compiled_module
316
+ ```
317
+
318
+ #### Setting the Custom Optimizer
319
+
320
+ You can configure the custom optimizer dynamically in three ways (resolved hierarchically):
321
+
322
+ 1. **CLI Flag**: Specify `--optimizer path.to.module:CustomOptimizer` or file path `path/to/script.py:CustomOptimizer` (or simply `path/to/script.py` which auto-detects the subclass).
323
+ 2. **Environment Variable**: `export PROMPT_BETTER_OPTIMIZER="path.to.module:CustomOptimizer"`
324
+ 3. **Global Config (`prompt-better.json`)**:
325
+ ```json
326
+ {
327
+ "optimizer": "path.to.module:CustomOptimizer"
328
+ }
329
+ ```
330
+
331
+ ---
332
+
333
+ ## 5. JSON Specifications & Schemas
334
+
335
+ `prompt-better` uses three distinct JSON models.
336
+
337
+ ### A. Prompt Specification (`prompt.json`)
338
+ Defines the name, model configs, dynamic placeholders (context), and structured outputs.
339
+
340
+ * **Schema Location**: [prompt-schema.json](prompt_better/prompt_json/prompt-schema.json)
341
+ * **JSON Schema**:
342
+ ```json
343
+ {
344
+ "$schema": "http://json-schema.org/draft-07/schema#",
345
+ "title": "Prompt Definition",
346
+ "type": "object",
347
+ "properties": {
348
+ "name": { "type": "string", "description": "Unique identifier for the prompt." },
349
+ "instructions": {
350
+ "type": "object",
351
+ "properties": {
352
+ "prompt": { "type": "string", "description": "The system instructions or template for the model." },
353
+ "context": {
354
+ "type": "array",
355
+ "items": {
356
+ "type": "object",
357
+ "properties": {
358
+ "name": { "type": "string" },
359
+ "type": { "type": "string", "enum": ["string", "integer", "number", "boolean", "array"] },
360
+ "desc": { "type": "string" }
361
+ },
362
+ "required": ["name", "type", "desc"]
363
+ }
364
+ }
365
+ },
366
+ "required": ["prompt", "context"]
367
+ },
368
+ "config": {
369
+ "type": "object",
370
+ "properties": {
371
+ "model_id": { "type": "string" },
372
+ "temperature": { "type": "number" },
373
+ "top_p": { "type": "number" },
374
+ "top_k": { "type": "integer" },
375
+ "max_tokens": { "type": "integer" },
376
+ "stop_sequences": { "type": "array", "items": { "type": "string" } }
377
+ }
378
+ },
379
+ "outputs": {
380
+ "type": "array",
381
+ "items": {
382
+ "type": "object",
383
+ "properties": {
384
+ "name": { "type": "string" },
385
+ "type": { "type": "string", "enum": ["string", "integer", "number", "boolean", "array"] },
386
+ "items": { "type": "string" },
387
+ "desc": { "type": "string" }
388
+ },
389
+ "required": ["name", "type", "desc"]
390
+ }
391
+ }
392
+ },
393
+ "required": ["name", "instructions", "outputs"]
394
+ }
395
+ ```
396
+ * **Example Specification**: [prompt.json](example/prompts/TopicClassifier/prompt.json)
397
+ ```json
398
+ {
399
+ "name": "TopicClassifierPrompt",
400
+ "config": {
401
+ "temperature": 0.0,
402
+ "max_tokens": 100
403
+ },
404
+ "instructions": {
405
+ "prompt": "Classify the topic of the following text.\n\nText: {{text}}",
406
+ "context": [
407
+ {
408
+ "name": "text",
409
+ "type": "string",
410
+ "desc": "The raw input text to analyze."
411
+ }
412
+ ]
413
+ },
414
+ "outputs": [
415
+ {
416
+ "name": "topic",
417
+ "type": "string",
418
+ "desc": "Must be one of: Politics, Sports, Technology, Science, Entertainment."
419
+ }
420
+ ]
421
+ }
422
+ ```
423
+
424
+ ---
425
+
426
+ ### B. Dataset Case Specification (`caseX.json`)
427
+ Defines the inputs mapped to prompt placeholders, and optional conversation history messages.
428
+
429
+ > [!TIP]
430
+ > For a deeper dive into dataset design and curation, we recommend reading **Chapter 8. Dataset Engineering** in the book **"AI Engineering - Building Application with Foundation Models"** by **Chip Huyen** (O'Reilly).
431
+
432
+ * **Schema Location**: [dataset-schema.json](prompt_better/dataset_manager/dataset-schema.json)
433
+ * **JSON Schema**:
434
+ ```json
435
+ {
436
+ "$schema": "http://json-schema.org/draft-07/schema#",
437
+ "title": "Dataset Case Definition",
438
+ "type": "object",
439
+ "properties": {
440
+ "id": { "type": "string", "description": "Unique identifier for this test case." },
441
+ "inputs": {
442
+ "type": "object",
443
+ "additionalProperties": { "type": "string" },
444
+ "description": "Key-value mapping of input placeholders to values."
445
+ },
446
+ "history": {
447
+ "type": "array",
448
+ "items": {
449
+ "type": "object",
450
+ "properties": {
451
+ "role": { "type": "string", "enum": ["user", "assistant", "system"] },
452
+ "content": { "type": "string" },
453
+ "prompt_name": { "type": "string" },
454
+ "inputs": { "type": "object", "additionalProperties": { "type": "string" } }
455
+ },
456
+ "required": ["role"]
457
+ }
458
+ }
459
+ },
460
+ "required": ["inputs"]
461
+ }
462
+ ```
463
+ * **Example Payload** (`dataset/case1.json`): [dataset/case1.json](example/prompts/TopicClassifier/dataset/case1.json)
464
+ ```json
465
+ {
466
+ "id": "case1",
467
+ "inputs": {
468
+ "text": "Mars rover successfully collects rock sample."
469
+ }
470
+ }
471
+ ```
472
+
473
+ ---
474
+
475
+ ### C. Golden Truth Reference Specification (`caseX.json` next to dataset cases)
476
+ Defines expected ground truth values and human-written grading rubrics.
477
+
478
+ * **Schema Location**: [golden-schema.json](prompt_better/dataset_manager/golden-schema.json)
479
+ * **JSON Schema**:
480
+ ```json
481
+ {
482
+ "$schema": "http://json-schema.org/draft-07/schema#",
483
+ "title": "Golden Truth Definition",
484
+ "type": "object",
485
+ "properties": {
486
+ "id": { "type": "string", "description": "Unique identifier matching the test case." },
487
+ "reference_output": {
488
+ "type": "object",
489
+ "description": "Expected structured output key-value mapping."
490
+ },
491
+ "rubric": {
492
+ "type": "array",
493
+ "items": { "type": "string" },
494
+ "description": "Quality criteria for evaluation."
495
+ }
496
+ },
497
+ "required": ["reference_output"]
498
+ }
499
+ ```
500
+ * **Example Payload** (`golden-truth/case1.json`): [golden-truth/case1.json](example/prompts/TopicClassifier/golden-truth/case1.json)
501
+ ```json
502
+ {
503
+ "reference_output": {
504
+ "topic": "Science"
505
+ },
506
+ "rubric": [
507
+ "The output topic must be exactly Science."
508
+ ]
509
+ }
510
+ ```
511
+
512
+ ---
513
+
514
+ ## 6. References
515
+
516
+ ### CLI Subcommands Reference
517
+
518
+ | Command | Description | Key Arguments |
519
+ | :--- | :--- | :--- |
520
+ | `list-prompts` | Scans for prompt specifications inside the target directory. | `--prompts-dir` |
521
+ | `preview-schema` | Emits the parsed Pydantic schema structure derived from the spec. | `--prompts-dir`, `--prompt` |
522
+ | `validate-spec` | Runs validator checks on prompt JSONs against `prompt-schema.json`. | `--prompts-dir` |
523
+ | `validate` | Runs evaluation cases against the Student model. | `--prompts-dir`, `--prompt`, `--dataset`, `--student-temperature`, `--teacher-eval-temperature` |
524
+ | `optimize` | Compiles and optimizes instructions via DSPy MIPROv2. | `--prompts-dir`, `--prompt`, `--dataset`, `--student-temperature`, `--teacher-temperature`, `--teacher-eval-temperature`, `--eval-cases-only`, `--optimizer` |
525
+ | `generate-golden-truth` | Generates placeholder schemas inside the target `golden-truth/` path. | `--prompts-dir`, `--dataset-dir`, `--prompt`, `--case-id`, `--teacher-temperature` |
526
+ | `generate` | Compiles Swift class structs conformant to `AIPromptCore`. | `--source`, `--target`, `--language swift` |
527
+
528
+ ### Environment Variables
529
+
530
+ * `PROMPT_BETTER_STUDENT_BASE_URL`: API root for student completions (e.g. Vapor server: `http://localhost:8080/v1`).
531
+ * `PROMPT_BETTER_STUDENT_MODEL`: Model ID identifier (e.g. `apple-intelligence`).
532
+ * `PROMPT_BETTER_STUDENT_API_KEY`: Key used for authentication (optional/blank for localhost).
533
+ * `PROMPT_BETTER_STUDENT_TEMPERATURE`: Default temperature for student model completion calls (defaults to `0.2`).
534
+ * `PROMPT_BETTER_TEACHER_BASE_URL`: API root for the cloud teacher model (e.g. `https://api.openai.com/v1`).
535
+ * `PROMPT_BETTER_TEACHER_MODEL`: Teacher model ID (e.g. `gpt-4o`).
536
+ * `PROMPT_BETTER_TEACHER_API_KEY`: API token authorization key.
537
+ * `PROMPT_BETTER_TEACHER_TEMPERATURE`: General/MIPRO temperature for the teacher model when proposing prompt variations and creating samples (defaults to `0.2`).
538
+ * `PROMPT_BETTER_TEACHER_EVAL_TEMPERATURE`: Validation/eval temperature for the teacher model when grading/evaluating candidate outputs (defaults to `0.0` as recommended).
539
+ * `PROMPT_BETTER_OPTIMIZER`: Import path or file path to custom Optimizer class, or built-in modes: `"chain-of-thought"` (default) or `"predict"`.
540
+
541
+ ### Scenario-Specific CLI Presets
542
+
543
+ #### On-Device & Local Silicon Testing
544
+ Run evaluations sequentially to avoid core contention, disable token budget validations, and disable Chain-of-Thought formatting constraints for schema-guided output targets:
545
+ ```bash
546
+ python3 -m prompt_better.cli optimize \
547
+ --prompts-dir ./prompts \
548
+ --prompt MyPrompt \
549
+ --num-threads 1 \
550
+ --no-requires-permission-to-run \
551
+ --optimizer predict
552
+ ```
553
+
554
+
555
+ #### Multi-threaded Cloud Pipelines
556
+ Speed up calls over remote APIs by increasing parallel compilation threads:
557
+ ```bash
558
+ python3 -m prompt_better.cli optimize \
559
+ --prompts-dir ./prompts \
560
+ --prompt MyPrompt \
561
+ --num-threads 8 \
562
+ --auto medium
563
+ ```
564
+
565
+ ### Gradle Pipeline Reference
566
+
567
+ For developers using Gradle (e.g., Android, iOS, or Kotlin Multiplatform projects), a generic, reusable Gradle script plugin helper is available in [contrib/gradle/](contrib/gradle/):
568
+ * Refer to the [contrib/gradle/README.md](contrib/gradle/README.md) for setup and integration instructions.
569
+ * Once integrated, tasks like `promptBetterList`, `promptBetterValidate`, `promptBetterOptimize`, and `promptBetterGenerateSwift` will be available in your project's Gradle build pipeline.
570
+
571
+ ### iOS Integration Setup (`AIPromptCore`)
572
+ See [AIPromptCore Framework](frameworks/AIPromptCore) for details.
573
+
574
+ > [!NOTE]
575
+ > Using the `AIPromptCore` framework is recommended to ensure exactly the same execution interface (parameters, parsing, and call structures) to the Apple foundational model as used during optimization. However, it is not strictly required; you can extract the optimized instruction text from the results JSON files and run them in any custom LLM setup.
576
+
577
+ 1. Compile Swift targets to framework binaries:
578
+ ```bash
579
+ cd frameworks/AIPromptCore && ./build_xcframework.sh
580
+ ```
581
+ 2. Link the binary or package reference into your application project (`Package.swift`):
582
+ ```swift
583
+ .package(path: "path/to/prompt-better/frameworks/AIPromptCore")
584
+ ```
585
+ 3. Include generated `@Generable` Swift structs:
586
+ ```bash
587
+ python3 -m prompt_better.cli generate \
588
+ --source example/prompts/TopicClassifier/results/optimized-prompt.json \
589
+ --target example/prompts/TopicClassifier/results/TopicClassifierPrompt.swift \
590
+ --language swift
591
+ ```
592
+
593
+ ### Books & Literature
594
+
595
+ * **AI Engineering - Building Application with Foundation Models** by *Chip Huyen* (O'Reilly)
596
+ * Chapter 4. Evaluate AI Systems
597
+ * Chapter 5. Prompt Engineering
598
+ * Chapter 8. Dataset Engineering
599
+
600
+