PyPI - compare-prompts - Versions diffs - 0.1.0__tar.gz - Mend

compare-prompts 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

compare_prompts-0.1.0/LICENSE +21 -0
compare_prompts-0.1.0/PKG-INFO +266 -0
compare_prompts-0.1.0/README.md +228 -0
compare_prompts-0.1.0/compare_prompts.egg-info/PKG-INFO +266 -0
compare_prompts-0.1.0/compare_prompts.egg-info/SOURCES.txt +17 -0
compare_prompts-0.1.0/compare_prompts.egg-info/dependency_links.txt +1 -0
compare_prompts-0.1.0/compare_prompts.egg-info/entry_points.txt +2 -0
compare_prompts-0.1.0/compare_prompts.egg-info/requires.txt +10 -0
compare_prompts-0.1.0/compare_prompts.egg-info/top_level.txt +1 -0
compare_prompts-0.1.0/promptdiff/__init__.py +149 -0
compare_prompts-0.1.0/promptdiff/cli.py +75 -0
compare_prompts-0.1.0/promptdiff/display.py +244 -0
compare_prompts-0.1.0/promptdiff/metrics.py +138 -0
compare_prompts-0.1.0/promptdiff/runner.py +200 -0
compare_prompts-0.1.0/pyproject.toml +53 -0
compare_prompts-0.1.0/setup.cfg +4 -0
compare_prompts-0.1.0/tests/test_cli.py +61 -0
compare_prompts-0.1.0/tests/test_metrics.py +158 -0
compare_prompts-0.1.0/tests/test_runner.py +84 -0
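
The file list above follows a simple `path +added -removed` pattern, so it can be tallied mechanically. The following is a minimal sketch, assuming that exact line shape; `FileStat` and `parse_file_stats` are hypothetical helper names introduced here for illustration and are not part of the package or of the diff tool.

```python
import re
from typing import NamedTuple


class FileStat(NamedTuple):
    """One row of the 'Files changed' list (hypothetical helper type)."""
    path: str
    added: int
    removed: int


# Matches rows such as "compare_prompts-0.1.0/LICENSE +21 -0".
_STAT_RE = re.compile(r"^(?P<path>\S+)\s+\+(?P<added>\d+)\s+-(?P<removed>\d+)$")


def parse_file_stats(text: str) -> list[FileStat]:
    """Return one FileStat per row that matches the assumed pattern."""
    stats = []
    for line in text.splitlines():
        match = _STAT_RE.match(line.strip())
        if match:
            stats.append(
                FileStat(match["path"], int(match["added"]), int(match["removed"]))
            )
    return stats


# Three rows copied from the list above, used as sample input.
sample = """\
compare_prompts-0.1.0/LICENSE +21 -0
compare_prompts-0.1.0/promptdiff/cli.py +75 -0
compare_prompts-0.1.0/promptdiff/metrics.py +138 -0
"""

stats = parse_file_stats(sample)
total_added = sum(s.added for s in stats)
print(len(stats), "files,", total_added, "lines added")
```

Rows that do not match the pattern are skipped silently, which keeps the sketch tolerant of headings and blank lines around the list.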

compare_prompts-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Omar Mashal
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

compare_prompts-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,266 @@
+Metadata-Version: 2.4
+Name: compare-prompts
+Version: 0.1.0
+Summary: Compare LLM prompts side by side — no config, no dashboard, just a table
+Author-email: Omar Mashal <omarmashal@example.com>
+License: MIT
+Project-URL: Homepage, https://github.com/OmarMashal0/promptdiff
+Project-URL: Repository, https://github.com/OmarMashal0/promptdiff
+Project-URL: Issues, https://github.com/OmarMashal0/promptdiff/issues
+Project-URL: Documentation, https://github.com/OmarMashal0/promptdiff#readme
+Project-URL: Changelog, https://github.com/OmarMashal0/promptdiff/blob/main/CHANGELOG.md
+Keywords: llm,prompt,comparison,diff,ai,openai,anthropic,gemini,evaluation
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Topic :: Software Development :: Testing
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: litellm>=1.0.0
+Requires-Dist: rich>=13.0.0
+Requires-Dist: python-dotenv>=1.0.0
+Requires-Dist: textstat>=0.7.0
+Requires-Dist: click>=8.0.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: pytest-mock>=3.0.0; extra == "dev"
+Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
+Dynamic: license-file
+# promptdiff
+[![PyPI version](https://badge.fury.io/py/compare-prompts.svg)](https://pypi.org/project/compare-prompts/)
+[![CI](https://github.com/OmarMashal0/promptdiff/actions/workflows/ci.yml/badge.svg)](https://github.com/OmarMashal0/promptdiff/actions)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+**Compare LLM prompts side by side. No config files. No dashboards. No signup.**
+```bash
+pip install compare-prompts
+```
+---
+## The problem
+You have two (or more) prompts. You changed one word. Did it actually change anything? Right now:
+- Running them manually and eyeballing outputs takes 30 minutes
+- Setting up promptfoo requires YAML config and predefined "correct" answers
+- Platforms like Braintrust/LangSmith require signup and send data to a dashboard
+**promptdiff is the missing middle ground** — run it in your script, get a table in your terminal.
+---
+## Quickstart
+### Step 1 — Install
+```bash
+pip install compare-prompts
+```
+### Step 2 — Generate a starter file (optional)
+```bash
+promptdiff init
+```
+This creates a `test_prompts.py` file you can edit immediately.
+### Step 3 — Or write your own comparison
+```python
+from promptdiff import compare
+compare(
+    prompts={
+        "original": "You are a helpful assistant.",
+        "concise":  "You are a concise helpful assistant.",
+    },
+    inputs=[
+        "Explain what a database is.",
+        "What is recursion?",
+        "Write a short poem about coding.",
+    ],
+    model="gpt-4o-mini"
+)
+```
+### Step 4 — Run it
+```bash
+python test_prompts.py
+```
+### Step 5 — See results
+```
+Running 2 prompts x 3 inputs = 6 calls...  done
+                   Prompt Comparison Results
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  avg length (tokens)    187                  61  (-67%)
+  tone                   warm                 neutral
+  uses lists             67%                  33%
+  uses headers           33%                  0%
+  avg cost (USD)         $0.0021              $0.0009
+  refusal rate           0%                   0%
+  reading level          high school          middle school
+```
+---
+## Where to put this in your project
+```
+your-project/                <- your existing project
+├── main.py                  <- don't touch this
+├── prompts.py               <- don't touch this
+├── .env                     <- don't touch this (already has your API key)
+└── test_prompts.py          <- create this one new file
+```
+Import your prompts directly from your existing code:
+```python
+from promptdiff import compare
+from prompts import PROMPT_V1, PROMPT_V2
+compare(
+    prompts={"v1": PROMPT_V1, "v2": PROMPT_V2},
+    inputs=["your test questions here"],
+    model="gpt-4o-mini"
+)
+```
+---
+## Setup your API key
+Create a `.env` file in your project root (or use an existing one):
+```bash
+# Only one key is needed — whichever provider you use
+OPENAI_API_KEY=sk-...
+```
+promptdiff automatically reads `.env` files. No extra configuration.
+### Get an API key
+| Provider | Link | Env variable | Free tier? |
+|---|---|---|---|
+| OpenAI | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` | No |
+| Anthropic | [console.anthropic.com](https://console.anthropic.com/settings/keys) | `ANTHROPIC_API_KEY` | No |
+| Google Gemini | [aistudio.google.com/apikey](https://aistudio.google.com/apikey) | `GEMINI_API_KEY` | Yes |
+| Groq | [console.groq.com/keys](https://console.groq.com/keys) | `GROQ_API_KEY` | Yes |
+| Ollama | [ollama.com](https://ollama.com) | None needed | Yes (local) |
+---
+## Supported models
+Any model supported by [LiteLLM](https://litellm.ai) works (2,600+ models):
+```python
+compare(..., model="gpt-4o-mini")                      # OpenAI
+compare(..., model="gpt-4o")                            # OpenAI
+compare(..., model="claude-haiku-4-5")                  # Anthropic
+compare(..., model="claude-sonnet-4-6")                 # Anthropic
+compare(..., model="gemini/gemini-2.0-flash")           # Google Gemini
+compare(..., model="groq/llama-3.3-70b-versatile")      # Groq (free)
+compare(..., model="ollama/llama3")                     # Ollama (local, free)
+compare(..., model="deepseek/deepseek-chat")            # DeepSeek
+```
+Full list of all supported models: [models.litellm.ai](https://models.litellm.ai)
+---
+## Compare more than 2 prompts
+```python
+compare(
+    prompts={
+        "baseline": "You are a helpful assistant.",
+        "concise":  "You are a concise helpful assistant.",
+        "formal":   "You are a professional formal assistant.",
+        "friendly": "You are a warm friendly assistant.",
+    },
+    inputs=["your test questions"]
+)
+```
+Each prompt becomes a column. Same table, more columns.
+---
+## See raw outputs
+```python
+compare(
+    prompts={...},
+    inputs=[...],
+    show_outputs=True
+)
+```
+Prints each raw LLM response below the table, grouped by input.
+---
+## Faster execution with async
+For many prompt+input combinations, run calls concurrently:
+```python
+compare(
+    prompts={...},
+    inputs=[...],
+    use_async=True
+)
+```
+---
+## What it measures
+| Metric | Description |
+|---|---|
+| avg length (tokens) | Average response length in tokens |
+| tone | Detected tone: neutral, formal, warm, or technical |
+| uses lists | % of responses using bullet points or numbered lists |
+| uses headers | % of responses using markdown headers |
+| uses code blocks | % of responses using fenced code blocks |
+| avg cost (USD) | Estimated cost per response based on token usage |
+| refusal rate | % of responses that refused to answer |
+| reading level | elementary / middle school / high school / college |
+| avg sentence length | Average number of words per sentence |
+---
+## Why not promptfoo?
+promptfoo is excellent. Use it if you need CI/CD integration, red-teaming,
+or assertion-based testing with expected outputs.
+**promptdiff is for when you just want to run prompts right now** and see how they
+behave differently — no YAML, no config, no web server, no predefined "correct"
+answers. Just a table in your terminal.
+---
+## License
+MIT
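
The PKG-INFO header shown above uses the RFC 822-style `Key: value` layout of Python core metadata, so the standard library's `email.parser` can read it. The sketch below is illustrative only: it parses a shortened excerpt of the header (not the full file), and the split into runtime and dev-only dependencies based on the `extra ==` marker is a simplifying assumption rather than full environment-marker evaluation.

```python
from email.parser import Parser

# Shortened excerpt of the PKG-INFO header shown above (not the full file).
# A blank line separates the header fields from the long description.
pkg_info = """\
Metadata-Version: 2.4
Name: compare-prompts
Version: 0.1.0
Requires-Python: >=3.9
Requires-Dist: litellm>=1.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pytest>=7.0.0; extra == "dev"
Provides-Extra: dev

# promptdiff
"""

msg = Parser().parsestr(pkg_info)

name = msg["Name"]
version = msg["Version"]

# Requires-Dist may repeat, so get_all() is needed to collect every value.
requires = msg.get_all("Requires-Dist") or []

# Simplification: treat any requirement carrying an "extra ==" marker as optional.
runtime = [r for r in requires if "extra ==" not in r]
dev_only = [r for r in requires if "extra ==" in r]

print(name, version)
print("runtime:", runtime)
print("dev only:", dev_only)
```

For an installed distribution, `importlib.metadata.metadata("compare-prompts")` exposes the same fields without reading the file by hand.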

compare_prompts-0.1.0/README.md ADDED

@@ -0,0 +1,228 @@
+# promptdiff
+[![PyPI version](https://badge.fury.io/py/compare-prompts.svg)](https://pypi.org/project/compare-prompts/)
+[![CI](https://github.com/OmarMashal0/promptdiff/actions/workflows/ci.yml/badge.svg)](https://github.com/OmarMashal0/promptdiff/actions)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+**Compare LLM prompts side by side. No config files. No dashboards. No signup.**
+```bash
+pip install compare-prompts
+```
+---
+## The problem
+You have two (or more) prompts. You changed one word. Did it actually change anything? Right now:
+- Running them manually and eyeballing outputs takes 30 minutes
+- Setting up promptfoo requires YAML config and predefined "correct" answers
+- Platforms like Braintrust/LangSmith require signup and send data to a dashboard
+**promptdiff is the missing middle ground** — run it in your script, get a table in your terminal.
+---
+## Quickstart
+### Step 1 — Install
+```bash
+pip install compare-prompts
+```
+### Step 2 — Generate a starter file (optional)
+```bash
+promptdiff init
+```
+This creates a `test_prompts.py` file you can edit immediately.
+### Step 3 — Or write your own comparison
+```python
+from promptdiff import compare
+compare(
+    prompts={
+        "original": "You are a helpful assistant.",
+        "concise":  "You are a concise helpful assistant.",
+    },
+    inputs=[
+        "Explain what a database is.",
+        "What is recursion?",
+        "Write a short poem about coding.",
+    ],
+    model="gpt-4o-mini"
+)
+```
+### Step 4 — Run it
+```bash
+python test_prompts.py
+```
+### Step 5 — See results
+```
+Running 2 prompts x 3 inputs = 6 calls...  done
+                   Prompt Comparison Results
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  avg length (tokens)    187                  61  (-67%)
+  tone                   warm                 neutral
+  uses lists             67%                  33%
+  uses headers           33%                  0%
+  avg cost (USD)         $0.0021              $0.0009
+  refusal rate           0%                   0%
+  reading level          high school          middle school
+```
+---
+## Where to put this in your project
+```
+your-project/                <- your existing project
+├── main.py                  <- don't touch this
+├── prompts.py               <- don't touch this
+├── .env                     <- don't touch this (already has your API key)
+└── test_prompts.py          <- create this one new file
+```
+Import your prompts directly from your existing code:
+```python
+from promptdiff import compare
+from prompts import PROMPT_V1, PROMPT_V2
+compare(
+    prompts={"v1": PROMPT_V1, "v2": PROMPT_V2},
+    inputs=["your test questions here"],
+    model="gpt-4o-mini"
+)
+```
+---
+## Setup your API key
+Create a `.env` file in your project root (or use an existing one):
+```bash
+# Only one key is needed — whichever provider you use
+OPENAI_API_KEY=sk-...
+```
+promptdiff automatically reads `.env` files. No extra configuration.
+### Get an API key
+| Provider | Link | Env variable | Free tier? |
+|---|---|---|---|
+| OpenAI | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) | `OPENAI_API_KEY` | No |
+| Anthropic | [console.anthropic.com](https://console.anthropic.com/settings/keys) | `ANTHROPIC_API_KEY` | No |
+| Google Gemini | [aistudio.google.com/apikey](https://aistudio.google.com/apikey) | `GEMINI_API_KEY` | Yes |
+| Groq | [console.groq.com/keys](https://console.groq.com/keys) | `GROQ_API_KEY` | Yes |
+| Ollama | [ollama.com](https://ollama.com) | None needed | Yes (local) |
+---
+## Supported models
+Any model supported by [LiteLLM](https://litellm.ai) works (2,600+ models):
+```python
+compare(..., model="gpt-4o-mini")                      # OpenAI
+compare(..., model="gpt-4o")                            # OpenAI
+compare(..., model="claude-haiku-4-5")                  # Anthropic
+compare(..., model="claude-sonnet-4-6")                 # Anthropic
+compare(..., model="gemini/gemini-2.0-flash")           # Google Gemini
+compare(..., model="groq/llama-3.3-70b-versatile")      # Groq (free)
+compare(..., model="ollama/llama3")                     # Ollama (local, free)
+compare(..., model="deepseek/deepseek-chat")            # DeepSeek
+```
+Full list of all supported models: [models.litellm.ai](https://models.litellm.ai)
+---
+## Compare more than 2 prompts
+```python
+compare(
+    prompts={
+        "baseline": "You are a helpful assistant.",
+        "concise":  "You are a concise helpful assistant.",
+        "formal":   "You are a professional formal assistant.",
+        "friendly": "You are a warm friendly assistant.",
+    },
+    inputs=["your test questions"]
+)
+```
+Each prompt becomes a column. Same table, more columns.
+---
+## See raw outputs
+```python
+compare(
+    prompts={...},
+    inputs=[...],
+    show_outputs=True
+)
+```
+Prints each raw LLM response below the table, grouped by input.
+---
+## Faster execution with async
+For many prompt+input combinations, run calls concurrently:
+```python
+compare(
+    prompts={...},
+    inputs=[...],
+    use_async=True
+)
+```
+---
+## What it measures
+| Metric | Description |
+|---|---|
+| avg length (tokens) | Average response length in tokens |
+| tone | Detected tone: neutral, formal, warm, or technical |
+| uses lists | % of responses using bullet points or numbered lists |
+| uses headers | % of responses using markdown headers |
+| uses code blocks | % of responses using fenced code blocks |
+| avg cost (USD) | Estimated cost per response based on token usage |
+| refusal rate | % of responses that refused to answer |
+| reading level | elementary / middle school / high school / college |
+| avg sentence length | Average number of words per sentence |
+---
+## Why not promptfoo?
+promptfoo is excellent. Use it if you need CI/CD integration, red-teaming,
+or assertion-based testing with expected outputs.
+**promptdiff is for when you just want to run prompts right now** and see how they
+behave differently — no YAML, no config, no web server, no predefined "correct"
+answers. Just a table in your terminal.
+---
+## License
+MIT
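
The "What it measures" table in the README above lists several structural metrics (uses lists, uses headers, uses code blocks, avg sentence length). The source of `promptdiff/metrics.py` is not shown in this view, so the following is only a rough sketch of how such checks could be written with regular expressions; every function name and pattern here is an assumption, not the package's implementation.

```python
import re

# Assumed patterns for Markdown structure; the real package may differ.
LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+\S", re.MULTILINE)
HEADER_RE = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)
CODE_FENCE_RE = re.compile(r"^`{3}", re.MULTILINE)
SENTENCE_SPLIT_RE = re.compile(r"[.!?]+(?:\s+|$)")


def uses_lists(text: str) -> bool:
    """True if any line starts with a bullet or a numbered-list marker."""
    return bool(LIST_RE.search(text))


def uses_headers(text: str) -> bool:
    """True if any line looks like a Markdown header."""
    return bool(HEADER_RE.search(text))


def uses_code_blocks(text: str) -> bool:
    """True if the text has at least one opening and one closing fence."""
    return len(CODE_FENCE_RE.findall(text)) >= 2


def avg_sentence_length(text: str) -> float:
    """Average number of whitespace-separated words per sentence."""
    sentences = [s for s in SENTENCE_SPLIT_RE.split(text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def percent(flags) -> float:
    """Share of True values, as a percentage."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


# Made-up responses standing in for model outputs.
responses = [
    "# Databases\n- tables\n- rows",
    "Recursion is a function calling itself. It needs a base case.",
    "1. Write code\n2. Run it",
]

print(f"uses lists: {percent(uses_lists(r) for r in responses):.0f}%")
print(f"uses headers: {percent(uses_headers(r) for r in responses):.0f}%")
print(f"avg sentence length: {avg_sentence_length(responses[1])}")
```

Pattern-based checks like these are cheap and deterministic, but they only approximate structure; tone, refusal rate, and reading level would need different techniques and are left out of this sketch.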