compare-prompts 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Omar Mashal
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,266 @@
1
+ Metadata-Version: 2.4
2
+ Name: compare-prompts
3
+ Version: 0.1.0
4
+ Summary: Compare LLM prompts side by side — no config, no dashboard, just a table
5
+ Author-email: Omar Mashal <omarmashal@example.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/OmarMashal0/promptdiff
8
+ Project-URL: Repository, https://github.com/OmarMashal0/promptdiff
9
+ Project-URL: Issues, https://github.com/OmarMashal0/promptdiff/issues
10
+ Project-URL: Documentation, https://github.com/OmarMashal0/promptdiff#readme
11
+ Project-URL: Changelog, https://github.com/OmarMashal0/promptdiff/blob/main/CHANGELOG.md
12
+ Keywords: llm,prompt,comparison,diff,ai,openai,anthropic,gemini,evaluation
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Classifier: Topic :: Software Development :: Testing
24
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
25
+ Requires-Python: >=3.9
26
+ Description-Content-Type: text/markdown
27
+ License-File: LICENSE
28
+ Requires-Dist: litellm>=1.0.0
29
+ Requires-Dist: rich>=13.0.0
30
+ Requires-Dist: python-dotenv>=1.0.0
31
+ Requires-Dist: textstat>=0.7.0
32
+ Requires-Dist: click>=8.0.0
33
+ Provides-Extra: dev
34
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
35
+ Requires-Dist: pytest-mock>=3.0.0; extra == "dev"
36
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
37
+ Dynamic: license-file
38
+
39
+ # promptdiff
40
+
41
+ [![PyPI version](https://badge.fury.io/py/compare-prompts.svg)](https://pypi.org/project/compare-prompts/)
42
+ [![CI](https://github.com/OmarMashal0/promptdiff/actions/workflows/ci.yml/badge.svg)](https://github.com/OmarMashal0/promptdiff/actions)
43
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
44
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
45
+
46
+ **Compare LLM prompts side by side. No config files. No dashboards. No signup.**
47
+
48
+ ```bash
49
+ pip install compare-prompts
50
+ ```
51
+
52
+ ---
53
+
54
+ ## The problem
55
+
56
+ You have two (or more) prompts. You changed one word. Did it actually change anything? Right now:
57
+ - Running them manually and eyeballing outputs takes 30 minutes
58
+ - Setting up promptfoo requires YAML config and predefined "correct" answers
59
+ - Platforms like Braintrust/LangSmith require signup and send data to a dashboard
60
+
61
+ **promptdiff is the missing middle ground** — run it in your script, get a table in your terminal.
62
+
63
+ ---
64
+
65
+ ## Quickstart
66
+
67
+ ### Step 1 — Install
68
+
69
+ ```bash
70
+ pip install compare-prompts
71
+ ```
72
+
73
+ ### Step 2 — Generate a starter file (optional)
74
+
75
+ ```bash
76
+ promptdiff init
77
+ ```
78
+
79
+ This creates a `test_prompts.py` file you can edit immediately.
80
+
81
+ ### Step 3 — Or write your own comparison
82
+
83
+ ```python
84
+ from promptdiff import compare
85
+
86
+ compare(
87
+ prompts={
88
+ "original": "You are a helpful assistant.",
89
+ "concise": "You are a concise helpful assistant.",
90
+ },
91
+ inputs=[
92
+ "Explain what a database is.",
93
+ "What is recursion?",
94
+ "Write a short poem about coding.",
95
+ ],
96
+ model="gpt-4o-mini"
97
+ )
98
+ ```
99
+
100
+ ### Step 4 — Run it
101
+
102
+ ```bash
103
+ python test_prompts.py
104
+ ```
105
+
106
+ ### Step 5 — See results
107
+
108
+ ```
109
+ Running 2 prompts x 3 inputs = 6 calls... done
110
+
111
+ Prompt Comparison Results
112
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
113
+ avg length (tokens) 187 61 (-67%)
114
+ tone warm neutral
115
+ uses lists 67% 33%
116
+ uses headers 33% 0%
117
+ avg cost (USD) $0.0021 $0.0009
118
+ refusal rate 0% 0%
119
+ reading level high school middle school
120
+ ```
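The call count and the `(-67%)` figure in the sample output above can be checked by hand. The sketch below is not promptdiff code: it assumes every prompt is paired with every input, and that the percentage is the change of the second column relative to the first. The helper name `percent_change` is made up for illustration.

```python
from itertools import product

prompts = {
    "original": "You are a helpful assistant.",
    "concise": "You are a concise helpful assistant.",
}
inputs = [
    "Explain what a database is.",
    "What is recursion?",
    "Write a short poem about coding.",
]

# Every prompt is paired with every input, so the number of calls is
# len(prompts) * len(inputs).
calls = list(product(prompts, inputs))
print(f"Running {len(prompts)} prompts x {len(inputs)} inputs = {len(calls)} calls")


def percent_change(baseline: float, value: float) -> int:
    """Relative change of value against baseline, rounded to a whole percent."""
    return round(100 * (value - baseline) / baseline)


# 187 and 61 are the average token counts from the sample table.
print(percent_change(187, 61))  # -67
```

Under these assumptions, adding a third prompt would raise the call count from 6 to 9.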
121
+
122
+ ---
123
+
124
+ ## Where to put this in your project
125
+
126
+ ```
127
+ your-project/ <- your existing project
128
+ ├── main.py <- don't touch this
129
+ ├── prompts.py <- don't touch this
130
+ ├── .env <- don't touch this (already has your API key)
131
+ └── test_prompts.py <- create this one new file
132
+ ```
133
+
134
+ Import your prompts directly from your existing code:
135
+
136
+ ```python
137
+ from promptdiff import compare
138
+ from prompts import PROMPT_V1, PROMPT_V2
139
+
140
+ compare(
141
+ prompts={"v1": PROMPT_V1, "v2": PROMPT_V2},
142
+ inputs=["your test questions here"],
143
+ model="gpt-4o-mini"
144
+ )
145
+ ```
146
+
147
+ ---
148
+
149
+ ## Set up your API key
150
+
151
+ Create a `.env` file in your project root (or use an existing one):
152
+
153
+ ```bash
154
+ # Only one key is needed — whichever provider you use
155
+ OPENAI_API_KEY=sk-...
156
+ ```
157
+
158
+ promptdiff automatically reads `.env` files. No extra configuration.
159
+
160
+ ### Get an API key
161
+
162
+ | Provider | Link | Env variable | Free tier? |
163
+ |---|---|---|---|
164
+ | OpenAI | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` | No |
165
+ | Anthropic | [console.anthropic.com](https://console.anthropic.com/settings/keys) | `ANTHROPIC_API_KEY` | No |
166
+ | Google Gemini | [aistudio.google.com/apikey](https://aistudio.google.com/apikey) | `GEMINI_API_KEY` | Yes |
167
+ | Groq | [console.groq.com/keys](https://console.groq.com/keys) | `GROQ_API_KEY` | Yes |
168
+ | Ollama | [ollama.com](https://ollama.com) | None needed | Yes (local) |
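Before spending time on a run, it can help to confirm that the key for your chosen provider is actually set. The sketch below is a hypothetical pre-flight check, not part of promptdiff. The env variable names come from the table above, while matching providers by model-name prefix is an assumption based on the model strings used elsewhere in this README.

```python
import os
from typing import Mapping, Optional

# (model-name prefix, env variable) pairs. The variable names come from the
# table above; the prefix matching is an assumption made for this sketch.
# None means no key is needed (Ollama runs locally).
PREFIX_TO_ENV_VAR = [
    ("gpt-", "OPENAI_API_KEY"),
    ("claude-", "ANTHROPIC_API_KEY"),
    ("gemini/", "GEMINI_API_KEY"),
    ("groq/", "GROQ_API_KEY"),
    ("ollama/", None),
]


def required_env_var(model: str) -> Optional[str]:
    """Return the env variable a model needs, or None if it needs no key."""
    for prefix, env_var in PREFIX_TO_ENV_VAR:
        if model.startswith(prefix):
            return env_var
    raise KeyError(f"no entry for model {model!r} in this sketch")


def missing_key(model: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the env variable that is unset, or None if all is well."""
    env_var = required_env_var(model)
    if env_var is None or environ.get(env_var):
        return None
    return env_var


if __name__ == "__main__":
    name = missing_key("gpt-4o-mini")
    print(f"Set {name} in your .env file" if name else "API key found")
```

Passing `environ` explicitly keeps the check easy to test without touching the real environment.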
169
+
170
+ ---
171
+
172
+ ## Supported models
173
+
174
+ Any model supported by [LiteLLM](https://litellm.ai) works (2,600+ models):
175
+
176
+ ```python
177
+ compare(..., model="gpt-4o-mini") # OpenAI
178
+ compare(..., model="gpt-4o") # OpenAI
179
+ compare(..., model="claude-haiku-4-5") # Anthropic
180
+ compare(..., model="claude-sonnet-4-6") # Anthropic
181
+ compare(..., model="gemini/gemini-2.0-flash") # Google Gemini
182
+ compare(..., model="groq/llama-3.3-70b-versatile") # Groq (free)
183
+ compare(..., model="ollama/llama3") # Ollama (local, free)
184
+ compare(..., model="deepseek/deepseek-chat") # DeepSeek
185
+ ```
186
+
187
+ Full list of all supported models: [models.litellm.ai](https://models.litellm.ai)
188
+
189
+ ---
190
+
191
+ ## Compare more than 2 prompts
192
+
193
+ ```python
194
+ compare(
195
+ prompts={
196
+ "baseline": "You are a helpful assistant.",
197
+ "concise": "You are a concise helpful assistant.",
198
+ "formal": "You are a professional formal assistant.",
199
+ "friendly": "You are a warm friendly assistant.",
200
+ },
201
+ inputs=["your test questions"]
202
+ )
203
+ ```
204
+
205
+ Each prompt becomes a column. Same table, more columns.
206
+
207
+ ---
208
+
209
+ ## See raw outputs
210
+
211
+ ```python
212
+ compare(
213
+ prompts={...},
214
+ inputs=[...],
215
+ show_outputs=True
216
+ )
217
+ ```
218
+
219
+ Prints each raw LLM response below the table, grouped by input.
220
+
221
+ ---
222
+
223
+ ## Faster execution with async
224
+
225
+ For many prompt+input combinations, run calls concurrently:
226
+
227
+ ```python
228
+ compare(
229
+ prompts={...},
230
+ inputs=[...],
231
+ use_async=True
232
+ )
233
+ ```
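The README does not show how `use_async=True` works internally. As a rough illustration of the general technique, the sketch below runs every prompt and input pair concurrently with `asyncio.gather`, capped by a semaphore. `fake_llm_call`, `run_all`, and `max_concurrency` are hypothetical names invented for this example, and the sleep stands in for a real network request.

```python
import asyncio
from itertools import product


async def fake_llm_call(system_prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for a network call to a model."""
    await asyncio.sleep(0.01)
    return f"[{system_prompt}] {user_input}"


async def run_all(prompts: dict, inputs: list, max_concurrency: int = 4) -> list:
    """Run every (prompt, input) pair concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(name: str, text: str):
        async with semaphore:
            return name, text, await fake_llm_call(prompts[name], text)

    tasks = [one(name, text) for name, text in product(prompts, inputs)]
    # gather returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(
    run_all({"v1": "Be helpful.", "v2": "Be concise."}, ["Q1", "Q2", "Q3"])
)
for name, text, output in results:
    print(name, text, output)
```

With the semaphore in place, total wall time scales with the number of calls divided by the concurrency limit rather than with the number of calls alone, which is the usual reason to prefer concurrent execution for larger grids.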
234
+
235
+ ---
236
+
237
+ ## What it measures
238
+
239
+ | Metric | Description |
240
+ |---|---|
241
+ | avg length (tokens) | Average response length in tokens |
242
+ | tone | Detected tone: neutral, formal, warm, or technical |
243
+ | uses lists | % of responses using bullet points or numbered lists |
244
+ | uses headers | % of responses using markdown headers |
245
+ | uses code blocks | % of responses using fenced code blocks |
246
+ | avg cost (USD) | Estimated cost per response based on token usage |
247
+ | refusal rate | % of responses that refused to answer |
248
+ | reading level | elementary / middle school / high school / college |
249
+ | avg sentence length | Average number of words per sentence |
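The README does not say how these metrics are detected. To make the structural ones concrete, here is a minimal sketch using regular expressions. The patterns and function names are assumptions chosen for illustration and may differ from what promptdiff actually does.

```python
import re

# Heuristic patterns (assumptions for this sketch, not promptdiff's own rules).
LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+\S", re.MULTILINE)  # bullets or "1." / "1)"
HEADER_RE = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)              # markdown headers
FENCE = "`" * 3                                                    # fenced code block marker


def share(responses, predicate) -> int:
    """Percentage (0-100, rounded) of responses for which predicate is true."""
    if not responses:
        return 0
    return round(100 * sum(1 for r in responses if predicate(r)) / len(responses))


def avg_sentence_length(text: str) -> float:
    """Average number of words per sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def structure_metrics(responses) -> dict:
    """Compute the three formatting metrics for a list of response strings."""
    return {
        "uses lists": share(responses, lambda r: bool(LIST_RE.search(r))),
        "uses headers": share(responses, lambda r: bool(HEADER_RE.search(r))),
        "uses code blocks": share(responses, lambda r: FENCE in r),
    }


sample = ["- one\n- two", "# Title\nPlain text.", "No structure here."]
print(structure_metrics(sample))
print(avg_sentence_length("One two three. Four five."))
```

Heuristics like these are cheap and deterministic, but they can misfire, for example on a sentence that happens to start with a number and a period.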
250
+
251
+ ---
252
+
253
+ ## Why not promptfoo?
254
+
255
+ promptfoo is excellent. Use it if you need CI/CD integration, red-teaming,
256
+ or assertion-based testing with expected outputs.
257
+
258
+ **promptdiff is for when you just want to run prompts right now** and see how they
259
+ behave differently — no YAML, no config, no web server, no predefined "correct"
260
+ answers. Just a table in your terminal.
261
+
262
+ ---
263
+
264
+ ## License
265
+
266
+ MIT
@@ -0,0 +1,228 @@
1
+ # promptdiff
2
+
3
+ [![PyPI version](https://badge.fury.io/py/compare-prompts.svg)](https://pypi.org/project/compare-prompts/)
4
+ [![CI](https://github.com/OmarMashal0/promptdiff/actions/workflows/ci.yml/badge.svg)](https://github.com/OmarMashal0/promptdiff/actions)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
7
+
8
+ **Compare LLM prompts side by side. No config files. No dashboards. No signup.**
9
+
10
+ ```bash
11
+ pip install compare-prompts
12
+ ```
13
+
14
+ ---
15
+
16
+ ## The problem
17
+
18
+ You have two (or more) prompts. You changed one word. Did it actually change anything? Right now:
19
+ - Running them manually and eyeballing outputs takes 30 minutes
20
+ - Setting up promptfoo requires YAML config and predefined "correct" answers
21
+ - Platforms like Braintrust/LangSmith require signup and send data to a dashboard
22
+
23
+ **promptdiff is the missing middle ground** — run it in your script, get a table in your terminal.
24
+
25
+ ---
26
+
27
+ ## Quickstart
28
+
29
+ ### Step 1 — Install
30
+
31
+ ```bash
32
+ pip install compare-prompts
33
+ ```
34
+
35
+ ### Step 2 — Generate a starter file (optional)
36
+
37
+ ```bash
38
+ promptdiff init
39
+ ```
40
+
41
+ This creates a `test_prompts.py` file you can edit immediately.
42
+
43
+ ### Step 3 — Or write your own comparison
44
+
45
+ ```python
46
+ from promptdiff import compare
47
+
48
+ compare(
49
+ prompts={
50
+ "original": "You are a helpful assistant.",
51
+ "concise": "You are a concise helpful assistant.",
52
+ },
53
+ inputs=[
54
+ "Explain what a database is.",
55
+ "What is recursion?",
56
+ "Write a short poem about coding.",
57
+ ],
58
+ model="gpt-4o-mini"
59
+ )
60
+ ```
61
+
62
+ ### Step 4 — Run it
63
+
64
+ ```bash
65
+ python test_prompts.py
66
+ ```
67
+
68
+ ### Step 5 — See results
69
+
70
+ ```
71
+ Running 2 prompts x 3 inputs = 6 calls... done
72
+
73
+ Prompt Comparison Results
74
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
75
+ avg length (tokens) 187 61 (-67%)
76
+ tone warm neutral
77
+ uses lists 67% 33%
78
+ uses headers 33% 0%
79
+ avg cost (USD) $0.0021 $0.0009
80
+ refusal rate 0% 0%
81
+ reading level high school middle school
82
+ ```
83
+
84
+ ---
85
+
86
+ ## Where to put this in your project
87
+
88
+ ```
89
+ your-project/ <- your existing project
90
+ ├── main.py <- don't touch this
91
+ ├── prompts.py <- don't touch this
92
+ ├── .env <- don't touch this (already has your API key)
93
+ └── test_prompts.py <- create this one new file
94
+ ```
95
+
96
+ Import your prompts directly from your existing code:
97
+
98
+ ```python
99
+ from promptdiff import compare
100
+ from prompts import PROMPT_V1, PROMPT_V2
101
+
102
+ compare(
103
+ prompts={"v1": PROMPT_V1, "v2": PROMPT_V2},
104
+ inputs=["your test questions here"],
105
+ model="gpt-4o-mini"
106
+ )
107
+ ```
108
+
109
+ ---
110
+
111
+ ## Set up your API key
112
+
113
+ Create a `.env` file in your project root (or use an existing one):
114
+
115
+ ```bash
116
+ # Only one key is needed — whichever provider you use
117
+ OPENAI_API_KEY=sk-...
118
+ ```
119
+
120
+ promptdiff automatically reads `.env` files. No extra configuration.
121
+
122
+ ### Get an API key
123
+
124
+ | Provider | Link | Env variable | Free tier? |
125
+ |---|---|---|---|
126
+ | OpenAI | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` | No |
127
+ | Anthropic | [console.anthropic.com](https://console.anthropic.com/settings/keys) | `ANTHROPIC_API_KEY` | No |
128
+ | Google Gemini | [aistudio.google.com/apikey](https://aistudio.google.com/apikey) | `GEMINI_API_KEY` | Yes |
129
+ | Groq | [console.groq.com/keys](https://console.groq.com/keys) | `GROQ_API_KEY` | Yes |
130
+ | Ollama | [ollama.com](https://ollama.com) | None needed | Yes (local) |
131
+
132
+ ---
133
+
134
+ ## Supported models
135
+
136
+ Any model supported by [LiteLLM](https://litellm.ai) works (2,600+ models):
137
+
138
+ ```python
139
+ compare(..., model="gpt-4o-mini") # OpenAI
140
+ compare(..., model="gpt-4o") # OpenAI
141
+ compare(..., model="claude-haiku-4-5") # Anthropic
142
+ compare(..., model="claude-sonnet-4-6") # Anthropic
143
+ compare(..., model="gemini/gemini-2.0-flash") # Google Gemini
144
+ compare(..., model="groq/llama-3.3-70b-versatile") # Groq (free)
145
+ compare(..., model="ollama/llama3") # Ollama (local, free)
146
+ compare(..., model="deepseek/deepseek-chat") # DeepSeek
147
+ ```
148
+
149
+ Full list of all supported models: [models.litellm.ai](https://models.litellm.ai)
150
+
151
+ ---
152
+
153
+ ## Compare more than 2 prompts
154
+
155
+ ```python
156
+ compare(
157
+ prompts={
158
+ "baseline": "You are a helpful assistant.",
159
+ "concise": "You are a concise helpful assistant.",
160
+ "formal": "You are a professional formal assistant.",
161
+ "friendly": "You are a warm friendly assistant.",
162
+ },
163
+ inputs=["your test questions"]
164
+ )
165
+ ```
166
+
167
+ Each prompt becomes a column. Same table, more columns.
168
+
169
+ ---
170
+
171
+ ## See raw outputs
172
+
173
+ ```python
174
+ compare(
175
+ prompts={...},
176
+ inputs=[...],
177
+ show_outputs=True
178
+ )
179
+ ```
180
+
181
+ Prints each raw LLM response below the table, grouped by input.
182
+
183
+ ---
184
+
185
+ ## Faster execution with async
186
+
187
+ For many prompt+input combinations, run calls concurrently:
188
+
189
+ ```python
190
+ compare(
191
+ prompts={...},
192
+ inputs=[...],
193
+ use_async=True
194
+ )
195
+ ```
196
+
197
+ ---
198
+
199
+ ## What it measures
200
+
201
+ | Metric | Description |
202
+ |---|---|
203
+ | avg length (tokens) | Average response length in tokens |
204
+ | tone | Detected tone: neutral, formal, warm, or technical |
205
+ | uses lists | % of responses using bullet points or numbered lists |
206
+ | uses headers | % of responses using markdown headers |
207
+ | uses code blocks | % of responses using fenced code blocks |
208
+ | avg cost (USD) | Estimated cost per response based on token usage |
209
+ | refusal rate | % of responses that refused to answer |
210
+ | reading level | elementary / middle school / high school / college |
211
+ | avg sentence length | Average number of words per sentence |
212
+
213
+ ---
214
+
215
+ ## Why not promptfoo?
216
+
217
+ promptfoo is excellent. Use it if you need CI/CD integration, red-teaming,
218
+ or assertion-based testing with expected outputs.
219
+
220
+ **promptdiff is for when you just want to run prompts right now** and see how they
221
+ behave differently — no YAML, no config, no web server, no predefined "correct"
222
+ answers. Just a table in your terminal.
223
+
224
+ ---
225
+
226
+ ## License
227
+
228
+ MIT