fasteval_core-1.0.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- fasteval_core-1.0.0/PKG-INFO +234 -0
- fasteval_core-1.0.0/README.md +184 -0
- fasteval_core-1.0.0/fasteval/__init__.py +304 -0
- fasteval_core-1.0.0/fasteval/cache/__init__.py +5 -0
- fasteval_core-1.0.0/fasteval/cache/memory.py +226 -0
- fasteval_core-1.0.0/fasteval/core/__init__.py +52 -0
- fasteval_core-1.0.0/fasteval/core/decorators.py +2019 -0
- fasteval_core-1.0.0/fasteval/core/evaluator.py +348 -0
- fasteval_core-1.0.0/fasteval/core/scoring.py +420 -0
- fasteval_core-1.0.0/fasteval/metrics/__init__.py +153 -0
- fasteval_core-1.0.0/fasteval/metrics/audio.py +428 -0
- fasteval_core-1.0.0/fasteval/metrics/base.py +66 -0
- fasteval_core-1.0.0/fasteval/metrics/conversation.py +326 -0
- fasteval_core-1.0.0/fasteval/metrics/deterministic.py +997 -0
- fasteval_core-1.0.0/fasteval/metrics/llm.py +1467 -0
- fasteval_core-1.0.0/fasteval/metrics/multimodal.py +652 -0
- fasteval_core-1.0.0/fasteval/metrics/vision.py +746 -0
- fasteval_core-1.0.0/fasteval/models/__init__.py +32 -0
- fasteval_core-1.0.0/fasteval/models/config.py +26 -0
- fasteval_core-1.0.0/fasteval/models/evaluation.py +199 -0
- fasteval_core-1.0.0/fasteval/models/multimodal.py +174 -0
- fasteval_core-1.0.0/fasteval/providers/__init__.py +17 -0
- fasteval_core-1.0.0/fasteval/providers/base.py +42 -0
- fasteval_core-1.0.0/fasteval/providers/openai.py +73 -0
- fasteval_core-1.0.0/fasteval/providers/registry.py +118 -0
- fasteval_core-1.0.0/fasteval/py.typed +3 -0
- fasteval_core-1.0.0/fasteval/testing/__init__.py +8 -0
- fasteval_core-1.0.0/fasteval/testing/plugin.py +34 -0
- fasteval_core-1.0.0/fasteval/utils/__init__.py +27 -0
- fasteval_core-1.0.0/fasteval/utils/async_helpers.py +28 -0
- fasteval_core-1.0.0/fasteval/utils/audio.py +396 -0
- fasteval_core-1.0.0/fasteval/utils/formatting.py +118 -0
- fasteval_core-1.0.0/fasteval/utils/image.py +325 -0
- fasteval_core-1.0.0/fasteval/utils/json_parsing.py +105 -0
- fasteval_core-1.0.0/fasteval/utils/terminal_ui.py +335 -0
- fasteval_core-1.0.0/fasteval/utils/text.py +38 -0
- fasteval_core-1.0.0/pyproject.toml +118 -0
@@ -0,0 +1,234 @@
Metadata-Version: 2.3
Name: fasteval-core
Version: 1.0.0
Summary: A decorator-first LLM evaluation library for testing AI agents
Keywords: llm,evaluation,testing,ai,agents,pytest
Author: Intuit
License: Apache-2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: openai>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rouge-score>=0.1
Requires-Dist: pytest>=8.0
Requires-Dist: anthropic>=0.30 ; extra == 'anthropic'
Requires-Dist: jiwer>=3.0 ; extra == 'audio'
Requires-Dist: pydub>=0.25 ; extra == 'audio'
Requires-Dist: soundfile>=0.12 ; extra == 'audio'
Requires-Dist: pillow>=10.0 ; extra == 'image-gen'
Requires-Dist: transformers>=4.30 ; extra == 'image-gen'
Requires-Dist: torch>=2.0 ; extra == 'image-gen'
Requires-Dist: langfuse>=2.0 ; extra == 'langfuse'
Requires-Dist: fasteval-core[vision] ; extra == 'multimodal'
Requires-Dist: fasteval-core[audio] ; extra == 'multimodal'
Requires-Dist: pytesseract>=0.3 ; extra == 'ocr'
Requires-Dist: pillow>=10.0 ; extra == 'vision'
Requires-Dist: httpx>=0.25 ; extra == 'vision'
Requires-Python: >=3.10, <4.0
Project-URL: Homepage, https://github.com/intuit/fasteval
Project-URL: Repository, https://github.com/intuit/fasteval
Project-URL: Documentation, https://github.com/intuit/fasteval/tree/main/docs
Project-URL: Issues, https://github.com/intuit/fasteval/issues
Project-URL: Changelog, https://github.com/intuit/fasteval/blob/main/CHANGELOG.md
Provides-Extra: anthropic
Provides-Extra: audio
Provides-Extra: image-gen
Provides-Extra: langfuse
Provides-Extra: multimodal
Provides-Extra: ocr
Provides-Extra: vision
Description-Content-Type: text/markdown

# fasteval

[](https://pypi.org/project/fasteval-core/)
[](https://github.com/intuit/fasteval/actions/workflows/ci.yml)
[](https://opensource.org/licenses/Apache-2.0)

A **decorator-first LLM evaluation library** for testing AI agents and LLMs. Stack decorators to define evaluation criteria, run with pytest.

## Features

- **Decorator-based metrics** -- stack `@fe.correctness`, `@fe.relevance`, `@fe.hallucination`, and 30+ more
- **pytest native** -- run evaluations with `pytest`, get familiar pass/fail output
- **LLM-as-judge + deterministic** -- semantic LLM metrics alongside ROUGE, exact match, JSON schema, regex
- **Multi-modal** -- evaluate vision, audio, and image generation models
- **Conversation metrics** -- context retention, topic drift, consistency for multi-turn agents
- **RAG metrics** -- faithfulness, contextual precision, contextual recall, answer correctness
- **Tool trajectory** -- verify agent tool calls, argument matching, call sequences
- **Pluggable providers** -- OpenAI (default), Anthropic, Azure OpenAI, Ollama

## Quick Start

```bash
pip install fasteval-core
```

Set your LLM provider key:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

Write your first evaluation test:

```python
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.relevance(threshold=0.7)
def test_qa_agent():
    response = my_agent("What is the capital of France?")
    fe.score(response, expected_output="Paris", input="What is the capital of France?")
```

Run it:

```bash
pytest test_qa_agent.py -v
```

## Installation

```bash
# pip
pip install fasteval-core

# uv
uv add fasteval-core
```

### Optional Extras

```bash
# Anthropic provider
pip install fasteval-core[anthropic]

# Vision-language evaluation (GPT-4V, Claude Vision)
pip install fasteval-core[vision]

# Audio/speech evaluation (Whisper, ASR)
pip install fasteval-core[audio]

# Image generation evaluation (DALL-E, Stable Diffusion)
pip install fasteval-core[image-gen]

# All multi-modal features
pip install fasteval-core[multimodal]
```

## Usage Examples

### Deterministic Metrics

```python
import fasteval as fe

@fe.contains()
def test_keyword_present():
    fe.score("The answer is 42", expected_output="42")

@fe.rouge(threshold=0.6, rouge_type="rougeL")
def test_summary_quality():
    fe.score(actual_output=summary, expected_output=reference)
```
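
Deterministic metrics need no LLM calls; they reduce to plain string and n-gram arithmetic. As a rough sketch of what a `rougeL` comparison computes (an illustration only -- the package depends on `rouge-score`, which fasteval presumably delegates to):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]


def rouge_l_f1(actual: str, expected: str) -> float:
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall
    # over whitespace tokens.
    a, e = actual.split(), expected.split()
    lcs = lcs_len(a, e)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(a), lcs / len(e)
    return 2 * precision * recall / (precision + recall)
```

A `threshold=0.6` on `rougeL` therefore asks that the summary share a long in-order word subsequence with the reference, without requiring an exact match.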

### RAG Evaluation

```python
@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
def test_rag_pipeline():
    result = rag_pipeline("How does photosynthesis work?")
    fe.score(
        actual_output=result.answer,
        context=result.retrieved_docs,
        input="How does photosynthesis work?",
    )
```
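
Faithfulness is an LLM-as-judge metric: a judge model checks whether each claim in the answer is supported by the retrieved context. A crude token-overlap stand-in (purely illustrative, nothing like the library's actual judge) shows the shape of the computation:

```python
def naive_faithfulness(answer: str, context_docs: list[str]) -> float:
    """Fraction of answer sentences whose every word appears in the
    retrieved context. A real faithfulness metric uses an LLM judge for
    claim-level entailment; this proxy only illustrates the idea."""
    vocab = set(" ".join(context_docs).lower().split())
    sentences = [
        s for s in answer.replace("!", ".").replace("?", ".").split(".") if s.strip()
    ]
    supported = sum(all(w in vocab for w in s.lower().split()) for s in sentences)
    return supported / len(sentences) if sentences else 0.0
```

Sentences with no support in the context pull the score down, which is exactly the failure mode (hallucinated claims) the metric is meant to catch.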

### Tool Trajectory

```python
@fe.tool_call_accuracy(threshold=0.9)
def test_agent_tools():
    result = agent.run("Book a flight to Paris")
    fe.score(
        actual_tools=result.tool_calls,
        expected_tools=[
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight"},
        ],
    )
```
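
One plausible matching scheme for this metric -- assumed here for illustration, not taken from fasteval's implementation -- matches expected calls in order, treating the expected `args` as a subset of the actual call's arguments:

```python
def tool_call_accuracy(actual: list[dict], expected: list[dict]) -> float:
    """Fraction of expected calls matched, in order, by the actual trajectory.
    A call matches when names agree and every arg listed in the expected
    call appears with the same value in the actual call; expected calls
    that omit "args" (like book_flight above) match on name alone."""
    matched, i = 0, 0
    for exp in expected:
        while i < len(actual):
            call = actual[i]
            i += 1
            if call["name"] == exp["name"] and all(
                call.get("args", {}).get(k) == v
                for k, v in exp.get("args", {}).items()
            ):
                matched += 1
                break
    return matched / len(expected) if expected else 1.0
```

Subset matching lets the agent pass extra arguments (dates, IDs) without failing the test, while still pinning down the arguments you care about.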

### Metric Stacks

```python
@fe.correctness(threshold=0.8, weight=2.0)
@fe.relevance(threshold=0.7, weight=1.0)
@fe.coherence(threshold=0.6, weight=1.0)
def test_comprehensive():
    response = agent("Explain quantum computing")
    fe.score(response, expected_output=reference_answer, input="Explain quantum computing")
```
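
How a stack rolls up is an implementation detail of `fasteval/core/scoring.py`; a plausible scheme (assumed here for illustration) keeps per-metric thresholds for pass/fail while the weights shape a single headline score:

```python
def combine(results: list[tuple[float, float, float]]) -> tuple[float, bool]:
    """results: one (score, threshold, weight) triple per stacked metric.
    Each metric passes or fails against its own threshold; the weights
    only affect the aggregate score used for reporting."""
    total_weight = sum(w for _, _, w in results)
    overall = sum(s * w for s, _, w in results) / total_weight
    passed = all(s >= t for s, t, _ in results)
    return overall, passed


# Mirroring the stack above: correctness (weight 2.0) counts double
# against relevance and coherence (weight 1.0 each).
overall, passed = combine([(0.9, 0.8, 2.0), (0.75, 0.7, 1.0), (0.65, 0.6, 1.0)])
# overall is about 0.8; passed is True since every metric clears its threshold
```

Under this scheme a high-weight metric can dominate the headline number, but a single metric dipping below its own threshold still fails the test.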

## Plugins

| Plugin | Description | Install |
|--------|-------------|---------|
| [fasteval-langfuse](./plugins/fasteval-langfuse/) | Evaluate Langfuse production traces with fasteval metrics | `pip install fasteval-langfuse` |
| [fasteval-langgraph](./plugins/fasteval-langgraph/) | Test harness for LangGraph agents | `pip install fasteval-langgraph` |
| [fasteval-observe](./plugins/fasteval-observe/) | Runtime monitoring with async sampling | `pip install fasteval-observe` |

## Local Development

```bash
# Install uv
brew install uv

# Create virtual environment and install dependencies
uv sync --all-extras

# Run the test suite
uv run tox

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy .
```

## Documentation

Full documentation is available in the [docs/](./docs/) directory, covering:

- [Getting Started](./docs/getting-started/) -- installation, quickstart
- [Core Concepts](./docs/core-concepts/) -- decorators, metrics, scoring, data sources
- [LLM Metrics](./docs/llm-metrics/) -- correctness, relevance, hallucination, and more
- [Deterministic Metrics](./docs/deterministic-metrics/) -- ROUGE, exact match, regex, JSON schema
- [RAG Metrics](./docs/rag-metrics/) -- faithfulness, contextual precision/recall
- [Conversation Metrics](./docs/conversation-metrics/) -- context retention, consistency
- [Multi-Modal](./docs/multimodal/) -- vision, audio, image generation evaluation
- [Plugins](./docs/plugins/) -- Langfuse, LangGraph, Observe
- [API Reference](./docs/api-reference/) -- decorators, evaluator, models, score

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup, coding standards, and how to submit pull requests.

## License

Apache License 2.0 -- see [LICENSE](./LICENSE) for details.

@@ -0,0 +1,184 @@
# fasteval

[](https://pypi.org/project/fasteval-core/)
[](https://github.com/intuit/fasteval/actions/workflows/ci.yml)
[](https://opensource.org/licenses/Apache-2.0)

A **decorator-first LLM evaluation library** for testing AI agents and LLMs. Stack decorators to define evaluation criteria, run with pytest.

## Features

- **Decorator-based metrics** -- stack `@fe.correctness`, `@fe.relevance`, `@fe.hallucination`, and 30+ more
- **pytest native** -- run evaluations with `pytest`, get familiar pass/fail output
- **LLM-as-judge + deterministic** -- semantic LLM metrics alongside ROUGE, exact match, JSON schema, regex
- **Multi-modal** -- evaluate vision, audio, and image generation models
- **Conversation metrics** -- context retention, topic drift, consistency for multi-turn agents
- **RAG metrics** -- faithfulness, contextual precision, contextual recall, answer correctness
- **Tool trajectory** -- verify agent tool calls, argument matching, call sequences
- **Pluggable providers** -- OpenAI (default), Anthropic, Azure OpenAI, Ollama

## Quick Start

```bash
pip install fasteval-core
```

Set your LLM provider key:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

Write your first evaluation test:

```python
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.relevance(threshold=0.7)
def test_qa_agent():
    response = my_agent("What is the capital of France?")
    fe.score(response, expected_output="Paris", input="What is the capital of France?")
```

Run it:

```bash
pytest test_qa_agent.py -v
```

## Installation

```bash
# pip
pip install fasteval-core

# uv
uv add fasteval-core
```

### Optional Extras

```bash
# Anthropic provider
pip install fasteval-core[anthropic]

# Vision-language evaluation (GPT-4V, Claude Vision)
pip install fasteval-core[vision]

# Audio/speech evaluation (Whisper, ASR)
pip install fasteval-core[audio]

# Image generation evaluation (DALL-E, Stable Diffusion)
pip install fasteval-core[image-gen]

# All multi-modal features
pip install fasteval-core[multimodal]
```

## Usage Examples

### Deterministic Metrics

```python
import fasteval as fe

@fe.contains()
def test_keyword_present():
    fe.score("The answer is 42", expected_output="42")

@fe.rouge(threshold=0.6, rouge_type="rougeL")
def test_summary_quality():
    fe.score(actual_output=summary, expected_output=reference)
```

### RAG Evaluation

```python
@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
def test_rag_pipeline():
    result = rag_pipeline("How does photosynthesis work?")
    fe.score(
        actual_output=result.answer,
        context=result.retrieved_docs,
        input="How does photosynthesis work?",
    )
```

### Tool Trajectory

```python
@fe.tool_call_accuracy(threshold=0.9)
def test_agent_tools():
    result = agent.run("Book a flight to Paris")
    fe.score(
        actual_tools=result.tool_calls,
        expected_tools=[
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight"},
        ],
    )
```

### Metric Stacks

```python
@fe.correctness(threshold=0.8, weight=2.0)
@fe.relevance(threshold=0.7, weight=1.0)
@fe.coherence(threshold=0.6, weight=1.0)
def test_comprehensive():
    response = agent("Explain quantum computing")
    fe.score(response, expected_output=reference_answer, input="Explain quantum computing")
```

## Plugins

| Plugin | Description | Install |
|--------|-------------|---------|
| [fasteval-langfuse](./plugins/fasteval-langfuse/) | Evaluate Langfuse production traces with fasteval metrics | `pip install fasteval-langfuse` |
| [fasteval-langgraph](./plugins/fasteval-langgraph/) | Test harness for LangGraph agents | `pip install fasteval-langgraph` |
| [fasteval-observe](./plugins/fasteval-observe/) | Runtime monitoring with async sampling | `pip install fasteval-observe` |

## Local Development

```bash
# Install uv
brew install uv

# Create virtual environment and install dependencies
uv sync --all-extras

# Run the test suite
uv run tox

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy .
```

## Documentation

Full documentation is available in the [docs/](./docs/) directory, covering:

- [Getting Started](./docs/getting-started/) -- installation, quickstart
- [Core Concepts](./docs/core-concepts/) -- decorators, metrics, scoring, data sources
- [LLM Metrics](./docs/llm-metrics/) -- correctness, relevance, hallucination, and more
- [Deterministic Metrics](./docs/deterministic-metrics/) -- ROUGE, exact match, regex, JSON schema
- [RAG Metrics](./docs/rag-metrics/) -- faithfulness, contextual precision/recall
- [Conversation Metrics](./docs/conversation-metrics/) -- context retention, consistency
- [Multi-Modal](./docs/multimodal/) -- vision, audio, image generation evaluation
- [Plugins](./docs/plugins/) -- Langfuse, LangGraph, Observe
- [API Reference](./docs/api-reference/) -- decorators, evaluator, models, score

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md) for development setup, coding standards, and how to submit pull requests.

## License

Apache License 2.0 -- see [LICENSE](./LICENSE) for details.