fasteval-core 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. fasteval_core-1.0.0/PKG-INFO +234 -0
  2. fasteval_core-1.0.0/README.md +184 -0
  3. fasteval_core-1.0.0/fasteval/__init__.py +304 -0
  4. fasteval_core-1.0.0/fasteval/cache/__init__.py +5 -0
  5. fasteval_core-1.0.0/fasteval/cache/memory.py +226 -0
  6. fasteval_core-1.0.0/fasteval/core/__init__.py +52 -0
  7. fasteval_core-1.0.0/fasteval/core/decorators.py +2019 -0
  8. fasteval_core-1.0.0/fasteval/core/evaluator.py +348 -0
  9. fasteval_core-1.0.0/fasteval/core/scoring.py +420 -0
  10. fasteval_core-1.0.0/fasteval/metrics/__init__.py +153 -0
  11. fasteval_core-1.0.0/fasteval/metrics/audio.py +428 -0
  12. fasteval_core-1.0.0/fasteval/metrics/base.py +66 -0
  13. fasteval_core-1.0.0/fasteval/metrics/conversation.py +326 -0
  14. fasteval_core-1.0.0/fasteval/metrics/deterministic.py +997 -0
  15. fasteval_core-1.0.0/fasteval/metrics/llm.py +1467 -0
  16. fasteval_core-1.0.0/fasteval/metrics/multimodal.py +652 -0
  17. fasteval_core-1.0.0/fasteval/metrics/vision.py +746 -0
  18. fasteval_core-1.0.0/fasteval/models/__init__.py +32 -0
  19. fasteval_core-1.0.0/fasteval/models/config.py +26 -0
  20. fasteval_core-1.0.0/fasteval/models/evaluation.py +199 -0
  21. fasteval_core-1.0.0/fasteval/models/multimodal.py +174 -0
  22. fasteval_core-1.0.0/fasteval/providers/__init__.py +17 -0
  23. fasteval_core-1.0.0/fasteval/providers/base.py +42 -0
  24. fasteval_core-1.0.0/fasteval/providers/openai.py +73 -0
  25. fasteval_core-1.0.0/fasteval/providers/registry.py +118 -0
  26. fasteval_core-1.0.0/fasteval/py.typed +3 -0
  27. fasteval_core-1.0.0/fasteval/testing/__init__.py +8 -0
  28. fasteval_core-1.0.0/fasteval/testing/plugin.py +34 -0
  29. fasteval_core-1.0.0/fasteval/utils/__init__.py +27 -0
  30. fasteval_core-1.0.0/fasteval/utils/async_helpers.py +28 -0
  31. fasteval_core-1.0.0/fasteval/utils/audio.py +396 -0
  32. fasteval_core-1.0.0/fasteval/utils/formatting.py +118 -0
  33. fasteval_core-1.0.0/fasteval/utils/image.py +325 -0
  34. fasteval_core-1.0.0/fasteval/utils/json_parsing.py +105 -0
  35. fasteval_core-1.0.0/fasteval/utils/terminal_ui.py +335 -0
  36. fasteval_core-1.0.0/fasteval/utils/text.py +38 -0
  37. fasteval_core-1.0.0/pyproject.toml +118 -0
@@ -0,0 +1,234 @@
+ Metadata-Version: 2.3
+ Name: fasteval-core
+ Version: 1.0.0
+ Summary: A decorator-first LLM evaluation library for testing AI agents
+ Keywords: llm,evaluation,testing,ai,agents,pytest
+ Author: Intuit
+ License: Apache-2.0
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: Apache Software License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Classifier: Topic :: Software Development :: Testing
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Dist: openai>=1.0
+ Requires-Dist: pydantic>=2.0
+ Requires-Dist: rouge-score>=0.1
+ Requires-Dist: pytest>=8.0
+ Requires-Dist: anthropic>=0.30 ; extra == 'anthropic'
+ Requires-Dist: jiwer>=3.0 ; extra == 'audio'
+ Requires-Dist: pydub>=0.25 ; extra == 'audio'
+ Requires-Dist: soundfile>=0.12 ; extra == 'audio'
+ Requires-Dist: pillow>=10.0 ; extra == 'image-gen'
+ Requires-Dist: transformers>=4.30 ; extra == 'image-gen'
+ Requires-Dist: torch>=2.0 ; extra == 'image-gen'
+ Requires-Dist: langfuse>=2.0 ; extra == 'langfuse'
+ Requires-Dist: fasteval-core[vision] ; extra == 'multimodal'
+ Requires-Dist: fasteval-core[audio] ; extra == 'multimodal'
+ Requires-Dist: pytesseract>=0.3 ; extra == 'ocr'
+ Requires-Dist: pillow>=10.0 ; extra == 'vision'
+ Requires-Dist: httpx>=0.25 ; extra == 'vision'
+ Requires-Python: >=3.10, <4.0
+ Project-URL: Homepage, https://github.com/intuit/fasteval
+ Project-URL: Repository, https://github.com/intuit/fasteval
+ Project-URL: Documentation, https://github.com/intuit/fasteval/tree/main/docs
+ Project-URL: Issues, https://github.com/intuit/fasteval/issues
+ Project-URL: Changelog, https://github.com/intuit/fasteval/blob/main/CHANGELOG.md
+ Provides-Extra: anthropic
+ Provides-Extra: audio
+ Provides-Extra: image-gen
+ Provides-Extra: langfuse
+ Provides-Extra: multimodal
+ Provides-Extra: ocr
+ Provides-Extra: vision
+ Description-Content-Type: text/markdown
+
+ # fasteval
+
+ [![PyPI version](https://img.shields.io/pypi/v/fasteval-core.svg)](https://pypi.org/project/fasteval-core/)
+ ![Python versions](https://img.shields.io/badge/python-3.10_|_3.11_|_3.12_|_3.13_|_3.14-blue?logo=python)
+ [![CI](https://github.com/intuit/fasteval/actions/workflows/ci.yml/badge.svg)](https://github.com/intuit/fasteval/actions/workflows/ci.yml)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+ A **decorator-first LLM evaluation library** for testing AI agents and LLMs. Stack decorators to define evaluation criteria, then run them with pytest.
+
+ ## Features
+
+ - **Decorator-based metrics** -- stack `@fe.correctness`, `@fe.relevance`, `@fe.hallucination`, and 30+ more
+ - **pytest native** -- run evaluations with `pytest`, get familiar pass/fail output
+ - **LLM-as-judge + deterministic** -- semantic LLM metrics alongside ROUGE, exact match, JSON schema, regex
+ - **Multi-modal** -- evaluate vision, audio, and image generation models
+ - **Conversation metrics** -- context retention, topic drift, consistency for multi-turn agents
+ - **RAG metrics** -- faithfulness, contextual precision, contextual recall, answer correctness
+ - **Tool trajectory** -- verify agent tool calls, argument matching, call sequences
+ - **Pluggable providers** -- OpenAI (default), Anthropic, Azure OpenAI, Ollama
+
+ ## Quick Start
+
+ ```bash
+ pip install fasteval-core
+ ```
+
+ Set your LLM provider key:
+
+ ```bash
+ export OPENAI_API_KEY=sk-your-key-here
+ ```
+
+ Write your first evaluation test:
+
+ ```python
+ import fasteval as fe
+
+ @fe.correctness(threshold=0.8)
+ @fe.relevance(threshold=0.7)
+ def test_qa_agent():
+     response = my_agent("What is the capital of France?")
+     fe.score(response, expected_output="Paris", input="What is the capital of France?")
+ ```
+
+ Run it:
+
+ ```bash
+ pytest test_qa_agent.py -v
+ ```
+
+ ## Installation
+
+ ```bash
+ # pip
+ pip install fasteval-core
+
+ # uv
+ uv add fasteval-core
+ ```
+
+ ### Optional Extras
+
+ ```bash
+ # Anthropic provider
+ pip install fasteval-core[anthropic]
+
+ # Vision-language evaluation (GPT-4V, Claude Vision)
+ pip install fasteval-core[vision]
+
+ # Audio/speech evaluation (Whisper, ASR)
+ pip install fasteval-core[audio]
+
+ # Image generation evaluation (DALL-E, Stable Diffusion)
+ pip install fasteval-core[image-gen]
+
+ # All multi-modal features
+ pip install fasteval-core[multimodal]
+ ```
+
+ ## Usage Examples
+
+ ### Deterministic Metrics
+
+ ```python
+ import fasteval as fe
+
+ @fe.contains()
+ def test_keyword_present():
+     fe.score("The answer is 42", expected_output="42")
+
+ @fe.rouge(threshold=0.6, rouge_type="rougeL")
+ def test_summary_quality():
+     fe.score(actual_output=summary, expected_output=reference)
+ ```
+
+ ### RAG Evaluation
+
+ ```python
+ @fe.faithfulness(threshold=0.8)
+ @fe.contextual_precision(threshold=0.7)
+ def test_rag_pipeline():
+     result = rag_pipeline("How does photosynthesis work?")
+     fe.score(
+         actual_output=result.answer,
+         context=result.retrieved_docs,
+         input="How does photosynthesis work?",
+     )
+ ```
+
+ ### Tool Trajectory
+
+ ```python
+ @fe.tool_call_accuracy(threshold=0.9)
+ def test_agent_tools():
+     result = agent.run("Book a flight to Paris")
+     fe.score(
+         actual_tools=result.tool_calls,
+         expected_tools=[
+             {"name": "search_flights", "args": {"destination": "Paris"}},
+             {"name": "book_flight"},
+         ],
+     )
+ ```
+
+ ### Metric Stacks
+
+ ```python
+ @fe.correctness(threshold=0.8, weight=2.0)
+ @fe.relevance(threshold=0.7, weight=1.0)
+ @fe.coherence(threshold=0.6, weight=1.0)
+ def test_comprehensive():
+     response = agent("Explain quantum computing")
+     fe.score(response, expected_output=reference_answer, input="Explain quantum computing")
+ ```
+
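The `weight` parameters imply some weighted aggregate across the stacked metrics; this README does not say how fasteval combines them. One plausible reading, as a plain-Python sketch (hypothetical, not the library's scoring code):

```python
def aggregate(scores: dict[str, tuple[float, float, float]]) -> tuple[float, bool]:
    """scores maps metric name -> (score, threshold, weight).
    Returns the weight-averaged overall score, plus whether every
    metric individually cleared its own threshold."""
    total_weight = sum(w for _, _, w in scores.values())
    overall = sum(s * w for s, _, w in scores.values()) / total_weight
    all_passed = all(s >= t for s, t, _ in scores.values())
    return overall, all_passed

overall, passed = aggregate({
    "correctness": (0.9, 0.8, 2.0),
    "relevance":   (0.8, 0.7, 1.0),
    "coherence":   (0.7, 0.6, 1.0),
})
# overall = (0.9*2.0 + 0.8*1.0 + 0.7*1.0) / 4.0 = 0.825
```

Under this reading, `weight` shifts the reported aggregate while each per-metric `threshold` still gates pass/fail on its own.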
+ ## Plugins
+
+ | Plugin | Description | Install |
+ |--------|-------------|---------|
+ | [fasteval-langfuse](./plugins/fasteval-langfuse/) | Evaluate Langfuse production traces with fasteval metrics | `pip install fasteval-langfuse` |
+ | [fasteval-langgraph](./plugins/fasteval-langgraph/) | Test harness for LangGraph agents | `pip install fasteval-langgraph` |
+ | [fasteval-observe](./plugins/fasteval-observe/) | Runtime monitoring with async sampling | `pip install fasteval-observe` |
+
+ ## Local Development
+
+ ```bash
+ # Install uv
+ brew install uv
+
+ # Create virtual environment and install dependencies
+ uv sync --all-extras
+
+ # Run the test suite
+ uv run tox
+
+ # Format code
+ uv run black .
+ uv run isort .
+
+ # Type checking
+ uv run mypy .
+ ```
+
+ ## Documentation
+
+ Full documentation is available in the [docs/](./docs/) directory, covering:
+
+ - [Getting Started](./docs/getting-started/) -- installation, quickstart
+ - [Core Concepts](./docs/core-concepts/) -- decorators, metrics, scoring, data sources
+ - [LLM Metrics](./docs/llm-metrics/) -- correctness, relevance, hallucination, and more
+ - [Deterministic Metrics](./docs/deterministic-metrics/) -- ROUGE, exact match, regex, JSON schema
+ - [RAG Metrics](./docs/rag-metrics/) -- faithfulness, contextual precision/recall
+ - [Conversation Metrics](./docs/conversation-metrics/) -- context retention, consistency
+ - [Multi-Modal](./docs/multimodal/) -- vision, audio, image generation evaluation
+ - [Plugins](./docs/plugins/) -- Langfuse, LangGraph, Observe
+ - [API Reference](./docs/api-reference/) -- decorators, evaluator, models, score
+
+ ## Contributing
+
+ See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup, coding standards, and how to submit pull requests.
+
+ ## License
+
+ Apache License 2.0 -- see [LICENSE](./LICENSE) for details.